'Chaning a value in a pandas DataFrame using the .loc method and = operator
I have a dataframe with missing values that I am trying to fill in.
I have made another dataframe that contains the necessary data to fill the missing values. Both dataframes have year & month columns that I use to decide what value to fill in.
The dataframe with the missing values is called full_arranged and the dataframe with the values I wish to import to the former is called median_values.
To fill in the missing data, I wrote the following code:
for column in full_arranged.drop(['year','month','day', 'date'], axis = 1).columns:
for year in range(2016,2019):
for month in range(1,13):
temp_val = median_values.loc[(median_values['year'] == year) & (median_values['month'] == month), column]
full_arranged.loc[(full_arranged[column].isnull()) & (full_arranged['year'] == year) & (full_arranged['month'] == month), column] = temp_val
After I run this code, I check for missing values, but none of them have changed. What am I missing?
I have also tried to do the same but with the pandas method pd.fillna and got the same results (the missing values are of type np.nan but I also tired to run the same code with pd.fillna and pd.nan and it did not work).
full_arranged looks likes this:
| year | month | value_1 | value_2 |
|---|---|---|---|
| 2017 | 3 | Nan | 3.52 |
median_values looks like this:
| year | month | median_value_for_value_1 | median_value_for_value_2 |
|---|---|---|---|
| 2017 | 3 | 5 | 4 |
Solution 1:[1]
A minimal example of both DFs would be helpful.
However, the condition & (median_values[column] == column) seems strange.
Don't you want to do something like this?:
= median_values.loc[(median_values['year'] == year) & (median_values['month'] == month),column]
So select the row by year and month and return the value of the column of the median_values dataframe?
Update:
More efficient would be the Pandas way:
#Build Dummy Dataframes
full_arranged = pd.DataFrame([[2017,3,np.NAN,3.52],[2017,3,4.5,9],[2017,4,4.5,np.NAN]], columns = ['year','month','value_1','value_2'])
median_values = pd.DataFrame([[2017,3,5,4],[2017,4,6,7]], columns = ['year','month','value_1','value_2'])
#Set Year/Month Index
full_arranged.index = pd.to_datetime((full_arranged['year'].astype(str) + "-" + full_arranged['month'].astype(str)))
median_values.index = pd.to_datetime((median_values['year'].astype(str) + "-" + median_values['month'].astype(str)))
#Update NA Values
full_arranged = full_arranged.update(median_values, overwrite=False)
pd.to_datetime is optional.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
