Conditional logic with PySpark `when`
I have the dataframe below:
| customer_id | person_id | type_person | type_person2 | insert_date2 | anterior_type | update_date |
|---|---|---|---|---|---|---|
| abcdefghijklmnopqrst | 4a5ae8a5-6682-467 | Online | Online | 2022-03-02 | null | null |
| abcdefghijklmnopqrst | 1be8d3e8-8075-438 | Online | Online | 2022-03-02 | null | null |
| abcdefghijklmnopqrst | 6912dadc-1692-4bd | Online | Offline | 2022-03-02 | Online | 2022-03-03 |
| abcdefghijklmnopqrst | e48cba37-113c-4bd | Online | Online | 2022-03-02 | null | null |
| abcdefghijklmnopqrst | 831cb669-b2ae-4e8 | Online | Online | 2022-03-02 | null | null |
| abcdefghijklmnopqrst | 69161fe5-62ac-400 | Online | Online | 2022-03-02 | null | null |
| abcdefghijklmnopqrst | b48b59a0-92eb-410 | Online | Online | 2022-03-02 | null | null |
I need to look at the `type_person` and `type_person2` columns and create a new column with the following rules:
- If both are online, then online
- If both are offline, then offline
- If one is online and the other is offline (in either order), then hybrid
- If either of the two is hybrid, then hybrid
How do I do this?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow