Update a dataframe in pandas while iterating row by row
I have a pandas data frame that looks like this (it's a pretty big one):
            date      exer exp     ifor         mat
1092  2014-03-17  American   M  528.205  2014-04-19
1093  2014-03-17  American   M  528.205  2014-04-19
1094  2014-03-17  American   M  528.205  2014-04-19
1095  2014-03-17  American   M  528.205  2014-04-19
1096  2014-03-17  American   M  528.205  2014-05-17
Now I would like to iterate row by row, and as I go through each row, the value of ifor
in that row can change depending on some conditions, for which I need to look up another dataframe.
How do I update the value as I iterate? I tried a few things, but none of them worked.
for i, row in df.iterrows():
    if <something>:
        row['ifor'] = x
    else:
        row['ifor'] = y

    df.ix[i]['ifor'] = x
None of these approaches seem to work. I don't see the values updated in the dataframe.
Solution 1:[1]
You can use df.at:
for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.at[i, 'ifor'] = ifor_val
For versions before 0.21.0, use df.set_value:
for i, row in df.iterrows():
    ifor_val = something
    if <condition>:
        ifor_val = something_else
    df.set_value(i, 'ifor', ifor_val)
If you don't need the row values, you could simply iterate over the indices of df, but I kept the original for-loop in case you need the row value for something not shown here.
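As a concrete illustration of the df.at pattern, here is a minimal runnable sketch. The sample data, the condition on exer, and the replacement value 0.0 are assumptions made up for the example; they are not part of the original question.

```python
import pandas as pd

# Small stand-in for the question's frame (values are illustrative only).
df = pd.DataFrame(
    {
        "exer": ["American", "American", "European"],
        "ifor": [528.205, 528.205, 528.205],
    },
    index=[1092, 1093, 1094],
)

for i, row in df.iterrows():
    ifor_val = row["ifor"]            # default: keep the current value
    if row["exer"] == "European":     # hypothetical condition
        ifor_val = 0.0                # hypothetical replacement value
    df.at[i, "ifor"] = ifor_val       # write through the frame, not the row copy

print(df)
```

The key point is that the write goes through df.at[i, 'ifor'], using the index label yielded by iterrows, rather than through row.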
Solution 2:[2]
A pandas DataFrame object should be thought of as a Series of Series. In other words, you should think of it in terms of columns. This matters because when you use pd.DataFrame.iterrows you are iterating through rows as Series. These are not the Series that the data frame is storing; they are new Series created for you while you iterate. That implies that when you attempt to assign to them, those edits won't be reflected in the original data frame.
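The effect can be checked with a small experiment. This is a sketch on made-up data; the frame deliberately mixes a string column and a float column, in which case each row yielded by iterrows is a freshly built Series.

```python
import pandas as pd

# Mixed column types, so each row Series is a new object built per iteration.
df = pd.DataFrame({"exp": ["M", "M", "W"], "ifor": [528.205, 528.205, 530.0]})

for i, row in df.iterrows():
    row["ifor"] = 0.0   # changes only the temporary row Series

# The original frame still holds the old values.
unchanged = df["ifor"].tolist()
print(unchanged)
```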
Ok, now that that is out of the way: What do we do?
Suggestions prior to this post include:
- pd.DataFrame.set_value is deprecated as of Pandas version 0.21
- pd.DataFrame.ix is deprecated
- pd.DataFrame.loc is fine but can work on array indexers and you can do better
My recommendation
Use pd.DataFrame.at
for i in df.index:
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y
You can even change this to:
for i in df.index:
    df.at[i, 'ifor'] = x if <something> else y
Response to comment
and what if I need to use the value of the previous row for the if condition?
# Column position is loop-invariant, so look it up once.
j = df.columns.get_loc('ifor')
for i in range(1, len(df) + 1):
    if <something>:
        df.iat[i - 1, j] = x
    else:
        df.iat[i - 1, j] = y
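To make the previous-row case concrete, here is a sketch in which the condition actually reads the preceding row. The rule used (carry ifor forward when mat equals the previous row's mat) and the data are assumptions chosen for illustration. Unlike the loop above, this one starts at the second row, since the first row has no predecessor.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "mat": ["2014-04-19", "2014-04-19", "2014-05-17"],
        "ifor": [1.0, 5.0, 7.0],
    }
)

# Integer positions of the columns, looked up once.
j = df.columns.get_loc("ifor")
k = df.columns.get_loc("mat")

for i in range(1, len(df)):
    # Hypothetical rule: same maturity as the previous row -> reuse its ifor.
    if df.iat[i, k] == df.iat[i - 1, k]:
        df.iat[i, j] = df.iat[i - 1, j]

print(df)
```

Because each write lands in the frame immediately, later iterations see the values already updated by earlier ones.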
Solution 3:[3]
A method you can use is itertuples(). It iterates over DataFrame rows as namedtuples, with the index value as the first element of the tuple, and it is much faster than iterrows(). With itertuples(), each row contains its Index in the DataFrame, and you can use loc or at to set the value.
for row in df.itertuples():
    if <something>:
        df.at[row.Index, 'ifor'] = x
    else:
        df.at[row.Index, 'ifor'] = y

    # The same assignment with loc also works, but is slower than at:
    # df.loc[row.Index, 'ifor'] = x
In most cases, itertuples() is faster than iat or at.
Thanks @SantiStSupery, using .at is much faster than loc.
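A runnable version of the itertuples pattern might look like the following. The data and the rule (1.0 for exp == "M", otherwise 2.0) are invented for the example; column values are read as attributes of the namedtuple and the write uses row.Index.

```python
import pandas as pd

df = pd.DataFrame(
    {"exp": ["M", "W", "M"], "ifor": [528.205, 528.205, 528.205]},
    index=[1092, 1093, 1094],
)

for row in df.itertuples():
    # row.Index is the index label; row.exp is the value of the 'exp' column.
    new_val = 1.0 if row.exp == "M" else 2.0   # hypothetical rule
    df.at[row.Index, "ifor"] = new_val

print(df)
```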
Solution 4:[4]
You should assign the value with df.ix[i, 'exp'] = X or df.loc[i, 'exp'] = X instead of df.ix[i]['ifor'] = x.
Otherwise you are using chained indexing, which may operate on a copy, and you should get a warning:
-c:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
But certainly, the loop should probably be replaced by a vectorized algorithm to make full use of DataFrame, as @Phillip Cloud suggested.
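One possible shape for such a vectorized version is sketched below, assuming the condition can be written as a boolean mask and the lookup is a key-to-value mapping held in a second frame. The lookup frame, its rate column, and the condition are all hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"exp": ["M", "W", "M"], "ifor": [528.205, 528.205, 528.205]})

# Hypothetical second dataframe to look values up in.
lookup = pd.DataFrame({"exp": ["M", "W"], "rate": [530.0, 540.0]})

# Align the looked-up value with every row of df in one step.
rate = df["exp"].map(lookup.set_index("exp")["rate"])

# Where the (hypothetical) condition holds, take the looked-up value;
# elsewhere keep the existing one.
df["ifor"] = np.where(df["exp"] == "M", rate, df["ifor"])

print(df)
```

This replaces the per-row Python loop with whole-column operations, which is usually much faster on a large frame.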
Solution 5:[5]
It's better to use lambda functions with df.apply():
df["ifor"] = df.apply(lambda x: {value} if {condition} else x["ifor"], axis=1)
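Filled in with a concrete condition and value, the template above could look like this. The sample data, the test on exp, and the value 0.0 are placeholders chosen for the sketch.

```python
import pandas as pd

df = pd.DataFrame({"exp": ["M", "W"], "ifor": [528.205, 528.205]})

# With axis=1 the lambda receives one row at a time as a Series.
df["ifor"] = df.apply(
    lambda r: 0.0 if r["exp"] == "W" else r["ifor"],  # hypothetical rule
    axis=1,
)

print(df)
```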
Solution 6:[6]
Well, if you are going to iterate anyhow, why not use the simplest method of all, df['Column'].values[i]:
df['Column'] = ''
for i in range(len(df)):
    df['Column'].values[i] = something/update/new_value
Or, if you want to compare the new values with the old ones or anything like that, why not store them in a list and assign the list at the end?
mylist, df['Column'] = [], ''
for <condition>:
    mylist.append(something/update/new_value)
df['Column'] = mylist
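A runnable sketch of the collect-then-assign variant follows. It loops over two columns with zip and writes the column once at the end; the doubling rule and the data are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({"exp": ["M", "W", "M"], "ifor": [1.0, 2.0, 3.0]})

new_values = []
for exp, ifor in zip(df["exp"], df["ifor"]):
    # Hypothetical rule: double the value for 'M' rows, keep it otherwise.
    new_values.append(ifor * 2 if exp == "M" else ifor)

# A single assignment at the end; the list length must equal len(df).
df["ifor"] = new_values

print(df)
```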
Solution 7:[7]
for i, row in df.iterrows():
    if <something>:
        df.at[i, 'ifor'] = x
    else:
        df.at[i, 'ifor'] = y
Solution 8:[8]
A list comprehension could be an option.
df['new_column'] = [your_func(x) for x in df['column']]
This iterates over the column df['column'], calls the function your_func with each value, and assigns the result to the corresponding row of the new column df['new_column'].
Don't forget to define your_func first.
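For completeness, a self-contained sketch with a trivial your_func. The upper-casing body and the sample data are only placeholders; any function of a single value would do.

```python
import pandas as pd


def your_func(value):
    # Placeholder transformation; replace with the real per-value logic.
    return value.upper()


df = pd.DataFrame({"column": ["american", "european"]})
df["new_column"] = [your_func(x) for x in df["column"]]

print(df)
```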
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Viglione |
Solution 2 | |
Solution 3 | Martin He |
Solution 4 | CT Zhu |
Solution 5 | davidbilla |
Solution 6 | Pranzell |
Solution 7 | Duane |
Solution 8 | Kskarz |