'Chaning a value in a pandas DataFrame using the .loc method and = operator

I have a dataframe with missing values that I am trying to fill in. I have made another dataframe that contains the necessary data to fill the missing values. Both dataframes have year & month columns that I use to decide what value to fill in. The dataframe with the missing values is called full_arranged and the dataframe with the values I wish to import to the former is called median_values.

To fill in the missing data, I wrote the following code:

for column in full_arranged.drop(['year','month','day', 'date'], axis = 1).columns:
    for year in range(2016,2019):
            for month in range(1,13):
                temp_val = median_values.loc[(median_values['year'] == year) & (median_values['month'] == month), column]
                full_arranged.loc[(full_arranged[column].isnull()) & (full_arranged['year'] == year) & (full_arranged['month'] == month), column] = temp_val

After I run this code, I check for missing values, but none of them have changed. What am I missing?

I have also tried to do the same but with the pandas method pd.fillna and got the same results (the missing values are of type np.nan but I also tired to run the same code with pd.fillna and pd.nan and it did not work).

full_arranged looks likes this:

year month value_1 value_2
2017 3 Nan 3.52

median_values looks like this:

year month median_value_for_value_1 median_value_for_value_2
2017 3 5 4


Solution 1:[1]

A minimal example of both DFs would be helpful.

However, the condition & (median_values[column] == column) seems strange.

Don't you want to do something like this?:

= median_values.loc[(median_values['year'] == year) & (median_values['month'] == month),column]

So select the row by year and month and return the value of the column of the median_values dataframe?

Update:

More efficient would be the Pandas way:

    #Build Dummy Dataframes
    full_arranged = pd.DataFrame([[2017,3,np.NAN,3.52],[2017,3,4.5,9],[2017,4,4.5,np.NAN]], columns = ['year','month','value_1','value_2'])
    median_values = pd.DataFrame([[2017,3,5,4],[2017,4,6,7]], columns = ['year','month','value_1','value_2'])
    
    #Set Year/Month Index
    full_arranged.index = pd.to_datetime((full_arranged['year'].astype(str) + "-" + full_arranged['month'].astype(str)))
    median_values.index = pd.to_datetime((median_values['year'].astype(str) + "-" + median_values['month'].astype(str)))
    
    #Update NA Values
    full_arranged = full_arranged.update(median_values, overwrite=False)

pd.to_datetime is optional.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1