'How to set the value of a cell in a dataframe based on values in another dataframe?

I have these two dataframes:

ALL_df
Out[20]: 
                              value  dow  doy  ...  start  to  date_helper
timestamp                                      ...                        
2019-01-01 00:00:00+00:00  0.015262    1    1  ...      1   1   2019-01-01
2019-01-01 01:00:00+00:00  0.016534    1    1  ...      1   1   2019-01-01
2019-01-01 02:00:00+00:00  0.013991    1    1  ...      1   1   2019-01-01
2019-01-01 03:00:00+00:00  0.038156    1    1  ...      1   1   2019-01-01
2019-01-01 04:00:00+00:00  0.021622    1    1  ...      1   1   2019-01-01
                            ...  ...  ...  ...    ...  ..          ...
2019-12-30 19:00:00+00:00  0.181823    0  364  ...      2   2   2019-12-30
2019-12-30 20:00:00+00:00  0.144769    0  364  ...      2   1   2019-12-30
2019-12-30 21:00:00+00:00  0.052565    0  364  ...      1   1   2019-12-30
2019-12-30 22:00:00+00:00  0.084449    0  364  ...      1   1   2019-12-30
2019-12-30 23:00:00+00:00  0.076693    0  364  ...      1   2   2019-12-30

[192192 rows x 19 columns]


ferientage_df
Out[19]: 
       Unnamed: 0          Bundesland  year            Ferien      dates
0               0   Baden-W�rttemberg  1995       Osterferien 1995-04-18
1               1   Baden-W�rttemberg  1995       Osterferien 1995-04-19
2               2   Baden-W�rttemberg  1995       Osterferien 1995-04-20
3               3   Baden-W�rttemberg  1995       Osterferien 1995-04-21
4               4   Baden-W�rttemberg  1995       Osterferien 1995-04-22
          ...                 ...   ...               ...        ...
45125       45125  Schleswig-Holstein  2023  Weihnachtsferien 2024-01-02
45126       45126  Schleswig-Holstein  2023  Weihnachtsferien 2024-01-03
45127       45127  Schleswig-Holstein  2023  Weihnachtsferien 2024-01-04
45128       45128  Schleswig-Holstein  2023  Weihnachtsferien 2024-01-05
45129       45129  Schleswig-Holstein  2023  Weihnachtsferien 2024-01-06

[45130 rows x 5 columns]

I want to add a column, called "Holiday" to ALL_df based on if the date in date_helper exists in the column dates of the ferientage_df (German holidays)

Is it possible to do this in 1-2 lines or do I have to loop through the whole thing somehow?



Solution 1:[1]

GIven the following toy dataframes:

import pandas as pd

ALL_df = pd.DataFrame(
    {
        "value": {
            0: "0.016534",
            1: "0.013991",
            2: "0.038156",
            3: "0.021622",
        },
        "date_helper": {
            0: "2019-01-01",
            1: "1995-04-18",
            2: "2019-01-01",
            3: "2024-01-03",
        },
    }
)

ferientage_df = pd.DataFrame(
    {
        "Ferien": {
            0: "Osterferien",
            1: "Osterferien",
            2: "Weihnachtsferien",
            3: "Weihnachtsferien",
        },
        "dates": {
            0: "1995-04-18",
            1: "1995-04-19",
            2: "2024-01-03",
            3: "2024-01-04",
        },
    }
)

You can do this:

ALL_df = ALL_df.assign(
    Holiday=lambda df: df["date_helper"].isin(ferientage_df["dates"])
)

print(ALL_df)

# Output
      value date_helper  Holiday
0  0.016534  2019-01-01    False
1  0.013991  1995-04-18     True
2  0.038156  2019-01-01    False
3  0.021622  2024-01-03     True

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Laurent