'How to count the occurrences of certain rows in a column from Python dataframe
So I have a dataframe of wildfires in California by county. It looks a little something like:
| Year | Counties | Name |
|---|---|---|
| 2016 | Monterey | Soberanes |
| Kern | Erskine | |
| San Luis Bishop | Chimney | |
| San Bernardino | Blue Cut | |
| Siskiyou | Gap | |
| ... | ... | ... |
I also have some lists of certain counties I'm most interested in.
relevant_counties = ['Monterey', 'Kern', ' San Bernardino']
If I wanted to see how many wildfires occurred in my list of relevant counties, how would I go about doing that? I have tried df['Counties'].value_counts() and it gives me a series with a count for each county, but how do I write a function that will then add the number of wildfires that occurred in a county in my relevant_counties list?
Thank you!
Solution 1:[1]
Ideally filter before counting the values:
df.loc[df['Counties'].isin(relevant_counties), 'counties'].value_counts()
Unless you plan on reusing the counts with various lists of counties:
counts = df['counties'].value_counts()
counts[relevant_counties]
counts[relevant_counties_2]
Note on the spaces
I notice that you have leading spaces in the list:
relevant_counties = ['Monterey', 'Kern', ' San Bernardino']
Try to avoid this, if you potentially have those spaces or uncertain case in the data, you can homogenize using:
# no leading/trailing spaces
relevant_counties = ['Monterey', 'Kern', 'San Bernardino']
clean = df['counties'].str.strip().str.title() # capitalize each word "Abc Def"
clean[clean.isin(relevant_counties)].value_counts()
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | mozway |
