'Pandas: Perform a Formula Calculation on Different Columns then Sum All of It
I have set of data as follow
import pandas as pd
dataset = [['01-01-2015', 234, '25-05-2017', 633, '03-06-2016', 935, '30-10-2019', 673, '16-12-2020', 825, '06-07-2019']]
ds = pd.DataFrame(dataset, columns = ['Start_Date', 'A', 'End_Date_A', 'B', 'End_Date_B', 'C', 'End_Date_C',
'D', 'End_Date_D', 'E', 'End_Date_E'])
Output:
Start_Date A End_Date_A B End_Date_B C End_Date_C D End_Date_D E End_Date_E
0 01-01-2015 234 25-05-2017 633 03-06-2016 935 30-10-2019 673 16-12-2020 825 06-07-2019
Then, I want to perform a calculation whereby for every end date of letter need to be minus with the Start_Date, then divide with 365 to get duration in years. Then the duration of each letter need to be use as a 'power of' of the value of their letter, respectively. Then the result of each letter need to be sum up to get the total.
Is there a way to perform a loop of this calculation?
Solution 1:[1]
You need to use a parser. I'm assuming the data looks as you have posted.
For this example, I'm using dateutil - but you can use any other in a similar way.
import pandas as pd
dataset = [['01-01-2015', 234, '25-05-2017', 633, '03-06-2016', 935, '30-10-2019', 673, '16-12-2020', 825, '06-07-2019']]
ds = pd.DataFrame(dataset, columns = ['Start_Date', 'A', 'End_Date_A', 'B', 'End_Date_B', 'C', 'End_Date_C',
'D', 'End_Date_D', 'E', 'End_Date_E'])
from dateutil import parser
import math
letters = ["A", "B", "C", "D", "E"]
start_date = parser.parse(ds["Start_Date"][0])
total = 0
for letter in letters:
end_date = parser.parse(ds["End_Date_" + letter][0])
years = (end_date - start_date).days / 365
power = math.pow(int(ds[letter][0]), years)
total+= power
print(letter, ":",power)
print("total: ", total)
I just printed the result, but you can play with the concept to do what you actually need.
Solution 2:[2]
With Pandas, I would organize the data as follows:
| End_date | Value |
|---|---|
| 25-05-2017 | 234 |
| 03-06-2016 | 633 |
| 30-10-2019 | 935 |
| 16-12-2020 | 673 |
| 06-07-2019 | 825 |
You may want to set the letters 'A' etc. as index or an extra column, though not strictly required. There is no need to put the Start_Date in dataframe if it is constant.
Assume that all dates are date object. Now, subtract the end date by start date and divide by 365:
df['Duration'] = (df['End_date'] - Start_Date).dt.days / 365
Do the 'Power of' row-wise:
df['Value_powered'] = df['Value'] ** df['Duration']
Finally, sum:
result = df['Value_powered'].sum()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | I M |
| Solution 2 |
