'Is there a way subtract the values of one column from another column, and get the results in a new column in pandas using python?
So, I was messing around with operations in Pandas, and I reached conditional operations. For reference, I have two dataframes like this:
df_1:
| Time | Coupons_Sold |
|---|---|
| First_Quarter-2021 | 1041 |
| Second_Quarter-2021 | 2145 |
| Third_Quarter-2021 | 1809 |
| Fourth_Quarter-2021 | 1104 |
df_2:
| Time | Coupons_Sold |
|---|---|
| First_Quarter-2022 | 861 |
| Second_Quarter-2022 | 1024 |
| Third_Quarter-2021 | 902 |
| Fourth_Quarter-2021 | 1011 |
I wanted to do a conditional subtraction on these two datasets, such that the new column contains the absolute values from subtraction of the individual elements of the two columns, if and only if the time periods match.
I want something like:
| Time | Coupons_Sold |
|---|---|
| Third_Quarter-2021 | 907 |
| Fourth_Quarter-2021 | 93 |
because there are mappings for third and fourth quarters in both dataframes.
I tried this piece of code:
new_column = df_1['Coupons_Sold'] - df_2['Coupons_Sold']
But, this just gave me:
| center |
|---|
| 180 |
| 1121 |
| 907 |
| 93 |
Then I tried a few conditional statements like we do in python:
if df_1['Time'] == df_2['Time']:
df_1['Coupons_Sold'] - df_2['Coupons_Sold']
I tried the above code with in keyword, but got error.
but these conditional statements just gave me errors. Is there any way to do these kind of operations(py 2.7 or py3.7, both are okay)?
Thanks in advance.
If you need any more info, please ask and I will add the same.
Solution 1:[1]
The subtraction is done on the row index. By default its just 0, 1, 2, etc... You could make Time the index of both dataframes and then the subtraction will work. You'll get a Series with values and NaN. If you want the new column to match the shape of the original dataframes, you can just use it. Otherwise, apply .dropna() to collapse it.
>>> df_1.set_index("Time", inplace=True)
>>> df_2.set_index("Time", inplace=True)
>>> df_1["Coupons_Sold"] - df_2["Coupons_Sold"]
Time
First_Quarter-2021 NaN
First_Quarter-2022 NaN
Fourth_Quarter-2021 93.0
Second_Quarter-2021 NaN
Second_Quarter-2022 NaN
Third_Quarter-2021 907.0
Name: Coupons_Sold, dtype: float64
>>> (df_1["Coupons_Sold"] - df_2["Coupons_Sold"]).dropna()
Time
Fourth_Quarter-2021 93.0
Third_Quarter-2021 907.0
Name: Coupons_Sold, dtype: float64
Solution 2:[2]
You could use merge + diff for the specific columns:
cols = ['Time','Coupons_Sold']
out = df1[cols].merge(df2[cols], on='Time', suffixes=('_','')).set_index('Time').diff(axis=1).abs().dropna(axis=1).reset_index()
Output:
Time Coupons_Sold
2 Third_Quarter-2021 907
3 Fourth_Quarter-2021 93
Solution 3:[3]
You may try:
tset = set(df1['Time']).intersection(set(df2['Time']))
df3 = df1.loc[df1['Time'].isin(tset)].merge(df2.loc[df2['Time'].isin(tset)], on='Time')
df3['Coupons_Sold']=df3['Coupons_Sold_x']-df3['Coupons_Sold_y']
df3.drop(['Coupons_Sold_x','Coupons_Sold_y'], axis=1,inplace=True)
Output (df3):
Time Coupons_Sold
0 Third_Quarter-2021 907
1 Fourth_Quarter-2021 93
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 | |
| Solution 3 | SomeDude |
