How do I create a pandas column using a different function depending on the month?
I have been given a JSON file with information about flight delays at seven different airports, and I have loaded it into a pandas DataFrame called flights. The data doesn't accurately show how many flights were delayed by weather, so I have been asked to recalculate that figure. For the months April through August, it is calculated differently than for the rest of the year.
I first tried a lambda with a condition of the form if flights["month"] in delay_40. Second, I tried np.where without an in statement, and then np.select using dot notation instead of bracket notation. Every implementation gives me the same error message:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
Just a heads up, the indentation here is to improve readability. I understand that you can't have indentation in a lambda statement, and I don't know if it affects np.select or np.where.
import numpy as np

delay_40 = ["April", "May", "June", "July", "August"]
weather_delay_total = flights

# Attempt 1: lambda with an if/else and an in test
weather_delay_total = flights.assign(
    improved_delays_weather = lambda row:
        (round(row["num_of_delays_weather"] + (.3 * row["num_of_delays_late_aircraft"]) + (.4 * row["num_of_delays_nas"]))) if (row["month"] in delay_40)
        else (round(row["num_of_delays_weather"] + (.3 * row["num_of_delays_late_aircraft"]) + (.65 * row["num_of_delays_nas"])))
)

# Attempt 2: np.where with chained or comparisons
weather_delay_total["improved_delays_weather"] = np.where(
    flights.month == "April" or flights.month == "May" or flights.month == "June" or flights.month == "July" or flights.month == "August",
    round(flights["num_of_delays_weather"] + (.3 * flights["num_of_delays_late_aircraft"]) + (.4 * flights["num_of_delays_nas"])),
    round(flights["num_of_delays_weather"] + (.3 * flights["num_of_delays_late_aircraft"]) + (.65 * flights["num_of_delays_nas"])))

# Attempt 3: np.select with dot notation
weather_delay_total = flights.assign(
    improved_delays_weather = np.select(
        flights.month == "April" or flights.month == "May" or flights.month == "June" or flights.month == "July" or flights.month == "August",
        round(flights.num_of_delays_weather + (.3 * flights.num_of_delays_late_aircraft) + (.4 * flights.num_of_delays_nas)),
        round(flights.num_of_delays_weather + (.3 * flights.num_of_delays_late_aircraft) + (.65 * flights.num_of_delays_nas))
    )
)
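For readers hitting the same message, here is a minimal sketch of why all three attempts fail in the same way. It uses a tiny hypothetical frame in place of the real flights data. Python's or and in both need a single True/False, so pandas is asked to collapse a whole boolean Series into one value and refuses. The same applies to the lambda, because a callable passed to assign receives the entire DataFrame rather than one row at a time.

```python
import pandas as pd

# Hypothetical stand-in for the real flights frame.
flights = pd.DataFrame({"month": ["January", "April", "July"]})
delay_40 = ["April", "May", "June", "July", "August"]

# `or` calls bool() on the left-hand Series, which pandas refuses to do.
try:
    flights.month == "April" or flights.month == "May"
    or_message = None
except ValueError as err:
    or_message = str(err)

# `in` on a list compares the Series with each element and then needs a
# single True/False, so it ends in the same ValueError.
try:
    flights["month"] in delay_40
    in_message = None
except ValueError as err:
    in_message = str(err)

# The element-wise operator | keeps one boolean per row instead.
either = (flights.month == "April") | (flights.month == "July")
```

Note that each comparison is wrapped in parentheses before combining with |, since | binds more tightly than ==.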
Solution 1:[1]
This question was answered by Parfait and mozway in the comments. I used np.where with flights["month"].isin(delay_40) as the condition, and it worked perfectly. Thank you, Parfait and mozway!
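A minimal sketch of what that fix might look like, assuming the column names from the question. The sample rows are made up for illustration, the real flights frame comes from the JSON file, and the helper names base and in_delay_40 are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical sample rows standing in for the real flights data.
flights = pd.DataFrame({
    "month": ["January", "April", "July", "December"],
    "num_of_delays_weather": [10, 20, 30, 40],
    "num_of_delays_late_aircraft": [100, 200, 300, 400],
    "num_of_delays_nas": [100, 100, 100, 100],
})

delay_40 = ["April", "May", "June", "July", "August"]

# Terms shared by both formulas.
base = flights["num_of_delays_weather"] + 0.3 * flights["num_of_delays_late_aircraft"]

# isin returns one boolean per row, so np.where can choose element-wise.
in_delay_40 = flights["month"].isin(delay_40)

weather_delay_total = flights.assign(
    improved_delays_weather=np.where(
        in_delay_40,
        base + 0.4 * flights["num_of_delays_nas"],   # April through August
        base + 0.65 * flights["num_of_delays_nas"],  # all other months
    ).round()
)
```

With only two formulas, np.where is enough. np.select takes a list of conditions and a list of choices, so it becomes the better fit once there are three or more cases.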
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Riley S |