Find Pandas column largest/smallest values where dates don't overlap
I have a DataFrame like:
import pandas as pd
df = pd.DataFrame(index=[0, 1, 2, 3, 4, 5])
df['XYZ'] = [2, 8, 6, 5, 9, 10]
df['Date2'] = ["2005-01-06", "2005-01-07", "2005-01-08", "1994-06-08", "1999-06-15", "2005-01-09"]
df['Date1'] = ["2005-01-02", "2005-01-03", "2005-01-04", "1994-06-04", "1999-06-12", "2005-01-05"]
df['Date1'] = pd.to_datetime(df['Date1'])
df['Date2'] = pd.to_datetime(df['Date2'])
I need to find the 3 largest values of XYZ whose date ranges (Date1 to Date2) do not overlap. The expected output would be:
XYZ  Date1       Date2
10   2005-01-05  2005-01-09
9    1999-06-12  1999-06-15
5    1994-06-04  1994-06-08
I tried to sort by "XYZ":
df.sort_values(by="XYZ", ascending=False, inplace=True)
And then compare dates:
df['overlap'] = (df['Date1'] <= df['Date2'].shift()) & (df['Date2'] >= df['Date1'].shift())
And then drop any True values in df['overlap'] and take the nlargest() values; however, that results in cases that do overlap.
Any help would be much appreciated.
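One likely reason the shift() comparison misses overlaps is that it only checks each row against the row directly above it in the sorted order, so a row can pass the check while still overlapping a higher-ranked row further up. A minimal sketch of one possible fix is a greedy pass: walk the rows from largest XYZ downward and keep a row only if its date range is disjoint from every range already kept. The function name top_non_overlapping and its parameters are hypothetical, introduced here for illustration, and the sketch assumes both endpoints of each range are inclusive.

```python
import pandas as pd

df = pd.DataFrame({
    "XYZ": [2, 8, 6, 5, 9, 10],
    "Date1": pd.to_datetime(["2005-01-02", "2005-01-03", "2005-01-04",
                             "1994-06-04", "1999-06-12", "2005-01-05"]),
    "Date2": pd.to_datetime(["2005-01-06", "2005-01-07", "2005-01-08",
                             "1994-06-08", "1999-06-15", "2005-01-09"]),
})


def top_non_overlapping(frame, n, value="XYZ", start="Date1", end="Date2"):
    """Hypothetical helper: greedily keep up to n rows with the largest
    `value` whose closed [start, end] date ranges do not overlap."""
    kept_ranges = []  # (start, end) pairs of rows accepted so far
    kept_idx = []     # index labels of accepted rows, in acceptance order
    for idx, row in frame.sort_values(value, ascending=False).iterrows():
        # Two closed ranges overlap when each starts on or before the other ends.
        clash = any(row[start] <= e and row[end] >= s for s, e in kept_ranges)
        if not clash:
            kept_ranges.append((row[start], row[end]))
            kept_idx.append(idx)
            if len(kept_idx) == n:
                break
    return frame.loc[kept_idx]


result = top_non_overlapping(df, 3)
print(result)
```

On the sample data this keeps the rows with XYZ 10, 9 and 5, skipping 8 and 6 because their ranges overlap the 2005-01-05 to 2005-01-09 range of the top row. Sorting with ascending=True inside the helper would give the smallest values instead. The loop is quadratic in the number of kept rows, which is fine for small n but may need an interval-based approach for large frames.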
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow