'Combine date present in two different columns to generate mean for a column

I have a dataset of the following format which has the Starting column values ranging from 2021-01-01 to 2022-03-13 and same goes for the Ending column where my values begin from 2021-01-01 to 2022-03-13.

The data for rainfall gets collected on a daily basis such that the entries are as follows:

This is the format of the data in my pandas df.

Apologies for the format. I want to get the display of the new dataframe like this

I am trying to combine and form monthly average values for the dataset. I cannot find a way where I am able to take monthly average values and store them in a different pandas dataframe such that it appears as follows:

The Monthly Rainfall is found using Total rainfall/ Total days in the month

Any help would be appreciated!

I have tried to use groupy and mean together from pandas library to find the output but it doesn't appear in the format I want.

df=df.groupby(['Starting','Ending','Location_id'])['rainfall'].mean().reset_index()



Solution 1:[1]

To solve the problem, you can write a function like this:

import math
from datetime import datetime

def to_date(x, y):
  lists = zip([datetime.strptime(dt, '%Y-%m-%d').date() for dt in x], [datetime.strptime(dt, '%Y-%m-%d').date() for dt in y])
  return [0 if math.isinf((x-y).days) else (x-y).days for x,y in lists]

Basically this function takes two lists (x,y) and turn every item in those into date() objects. And returns a new lists with items as days object. For your information, if you deduct same dates, Python returns an inf integer, which is infinite. To go over this, you can check if the item is an infitine integer, if so return 0 else return days.

Here's the code snippet I wrote, since you didn't provide a dataset, I wrote using the images you provided:

import pandas as pd

d = {
    'New_Starting': ['2021-01-01','2021-01-01','2021-01-01'],
    'New_Ending': ['2021-01-31','2021-01-31','2021-01-31'],
    'Location_id': [45, 52, 30],
    'Rainfall': [4.07, 6.53, 3.71]
}

d = pd.DataFrame(d)
d['Monthly_Rainfall'] = d['Rainfall'] / to_date(d['New_Ending'], d['New_Starting'])

Output:

    New_Starting    New_Ending  Location_id Rainfall    Monthly_Rainfall
0   2021-01-01      2021-01-31       45     4.07        0.135667
1   2021-01-01      2021-01-31       52     6.53        0.217667
2   2021-01-01      2021-01-31       30     3.71        0.123667

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Kerem Nayman