Delete specific rows from a pandas DataFrame based on conditions on the rows
I want to delete specific rows from a pandas DataFrame based on conditions on the rows.
For example, since I have several currency pairs at the same time, I want to keep only one of the pairs that share the same time.
The priority order (highest first) is: EUR, USD, GBP, CHF.
currency  timebuy              buyprice
CNHUSD    2021-01-05 08:30:00  0,00005073
CNHGBP    2021-01-05 08:30:00  1,588
ZARGBP    2021-01-07 05:15:00  0,2727
ZARUSD    2021-01-07 05:15:00  300
ZAREUR    2021-01-07 13:00:00  0,1936
ZARCHF    2021-01-07 13:00:00  0,0000052
JPYCHF    2021-01-13 06:00:00  0,0002222
JPYUSD    2021-01-13 06:00:00  8
JPYGBP    2021-01-13 06:00:00  8

The result should be:
currency  timebuy              buyprice
CNHUSD    2021-01-05 08:30:00  0,00005073
ZAREUR    2021-01-07 13:00:00  0,1936
JPYUSD    2021-01-13 06:00:00  8
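To make the example reproducible, the sample frame can be built as below. This is a sketch that assumes timebuy should be parsed as datetimes; buyprice is kept as the decimal-comma strings shown above, since converting them to floats (for example with .str.replace(',', '.')) would be an extra assumption about the data.

```python
import pandas as pd

# Sample data from the question; buyprice keeps the decimal-comma strings as given
df = pd.DataFrame({
    'currency': ['CNHUSD', 'CNHGBP', 'ZARGBP', 'ZARUSD', 'ZAREUR',
                 'ZARCHF', 'JPYCHF', 'JPYUSD', 'JPYGBP'],
    'timebuy': pd.to_datetime([
        '2021-01-05 08:30:00', '2021-01-05 08:30:00',
        '2021-01-07 05:15:00', '2021-01-07 05:15:00',
        '2021-01-07 13:00:00', '2021-01-07 13:00:00',
        '2021-01-13 06:00:00', '2021-01-13 06:00:00', '2021-01-13 06:00:00',
    ]),
    'buyprice': ['0,00005073', '1,588', '0,2727', '300', '0,1936',
                 '0,0000052', '0,0002222', '8', '8'],
})
print(df)
```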

Solution 1:[1]
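With a priority list like this, it's easiest to work with numbers. Create a numeric mapping from the priority list (EUR -> 0, USD -> 1, and so on), rank each row by its quote currency (the last three letters of the currency column), and within each base currency (the first three letters) keep the row with the lowest rank: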
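priority = ['EUR', 'USD', 'GBP', 'CHF']
# Rank each quote currency by its position in the list: EUR -> 0, USD -> 1, ...
mapping = {p: i for i, p in enumerate(priority)}
# Rank every row by its quote currency, take the index label of the best-ranked row
# per base currency (first 3 letters), then sort the labels to keep the original order
indexes = df['currency'].str[-3:].map(mapping).groupby(df['currency'].str[:3]).idxmin().sort_values()
selected = df.loc[indexes]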
Output:
>>> selected
   currency  timebuy              buyprice
0  CNHUSD    2021-01-05 08:30:00  0,00005073
4  ZAREUR    2021-01-07 13:00:00  0,1936
7  JPYUSD    2021-01-13 06:00:00  8
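The same selection can also be written without idxmin by sorting on the rank and keeping the first row per base currency with drop_duplicates. This is a swapped-in alternative, not the approach above; the helper column names prio and base are hypothetical, and only the currency column is rebuilt because the other columns don't affect which rows are chosen.

```python
import pandas as pd

# Only the currency column matters for the selection
df = pd.DataFrame({
    'currency': ['CNHUSD', 'CNHGBP', 'ZARGBP', 'ZARUSD', 'ZAREUR',
                 'ZARCHF', 'JPYCHF', 'JPYUSD', 'JPYGBP'],
})

priority = ['EUR', 'USD', 'GBP', 'CHF']
mapping = {p: i for i, p in enumerate(priority)}

# Hypothetical helper columns: rank of the quote currency and the base currency
ranked = df.assign(prio=df['currency'].str[-3:].map(mapping),
                   base=df['currency'].str[:3])

# A stable sort keeps the original order among equal ranks, so keep='first'
# picks the best-ranked row per base currency; sort_index restores the row order
selected = (ranked.sort_values('prio', kind='stable')
                  .drop_duplicates('base', keep='first')
                  .sort_index()
                  .drop(columns=['prio', 'base']))
print(selected)
```

Like idxmin, the stable sort resolves ties in favour of the earliest row, so both versions pick the same rows here.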
One-liner:
priority = ['EUR', 'USD', 'GBP', 'CHF']
filtered = df.loc[df['currency'].str[-3:].map({p: i for i, p in enumerate(priority)}).groupby(df['currency'].str[:3]).idxmin().sort_values()]
If you want to group by each timestamp instead of by the first 3 letters of currency, group by df['timebuy'] (the frame has no timestamp column) instead of df['currency'].str[:3], i.e.:
indexes = df['currency'].str[-3:].map(mapping).groupby(df['timebuy']).idxmin().sort_values()
#                                                      ^^^^^^^^^^^^^
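A self-contained sketch of that per-timestamp variant on the question's data follows. Note that it keeps one row per timebuy value, so ZARUSD at 05:15 survives alongside ZAREUR at 13:00, which is four rows rather than the three in the question's expected output.

```python
import pandas as pd

df = pd.DataFrame({
    'currency': ['CNHUSD', 'CNHGBP', 'ZARGBP', 'ZARUSD', 'ZAREUR',
                 'ZARCHF', 'JPYCHF', 'JPYUSD', 'JPYGBP'],
    'timebuy': pd.to_datetime([
        '2021-01-05 08:30:00', '2021-01-05 08:30:00',
        '2021-01-07 05:15:00', '2021-01-07 05:15:00',
        '2021-01-07 13:00:00', '2021-01-07 13:00:00',
        '2021-01-13 06:00:00', '2021-01-13 06:00:00', '2021-01-13 06:00:00',
    ]),
})

priority = ['EUR', 'USD', 'GBP', 'CHF']
mapping = {p: i for i, p in enumerate(priority)}

# Rank each row by its quote currency, then pick the best-ranked row per timestamp
indexes = (df['currency'].str[-3:].map(mapping)
             .groupby(df['timebuy']).idxmin()
             .sort_values())
selected = df.loc[indexes]
print(selected)
```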
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
