How to create a new column with the percentage of occurrences of a particular value in another column of a DataFrame
I have a column that holds either 'Y' or 'N' for yes or no. I want to calculate the percentage of occurrences of 'Y' and store it as the value of a new column called "Percentage".
I have come up with this so far. Although it computes what I need, I don't know how to get the result into the form I describe:
port_merge_lic_df.groupby(['Port'])['Shellfish Licence licence (Y/N)'].value_counts(normalize=True) * 100
Port       Shellfish Licence licence (Y/N)
ABERDEEN   Y                                   80.731789
           N                                   19.268211
AYR        N                                   94.736842
           Y                                    5.263158
BELFAST    N                                   81.654676
                                                 ...
STORNOWAY  N                                   23.362692
                                                0.383857
ULLAPOOL   N                                   56.936826
           Y                                   43.063174
WICK       N                                  100.000000
Name: Shellfish Licence licence (Y/N), Length: 87, dtype: float64
The dataframe is in the form:
import pandas as pd

df1 = pd.DataFrame({
    'Port': {0: 'NORTH SHIELDS', 1: 'NORTH SHIELDS', 2: 'NORTH SHIELDS',
             3: 'NORTH SHIELDS', 4: 'NORTH SHIELDS'},
    'Shellfish Licence licence (Y/N)': {0: 'N', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Scallop Licence (Y/N)': {0: 'N', 1: 'N', 2: 'N', 3: 'N', 4: 'N'},
    'Length Group': {0: 'Over10m', 1: 'Over10m', 2: 'Over10m',
                     3: 'Over10m', 4: 'Over10m'},
})
df1
Solution 1:[1]
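You can use a lambda.
Group by both columns to count each Port/licence combination, then divide each count by the total for its Port:
res = port_merge_lic_df.groupby(['Port', 'Shellfish Licence licence (Y/N)']).size().groupby(level='Port').transform(lambda x: 100 * x / x.sum())
As a last step, turn the result into a DataFrame with a "Percentage" column:
res = res.reset_index(name='Percentage')
This should give one row per Port and licence value, with its share of that Port's rows.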
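If the goal is a single "percentage of 'Y'" figure per port, attached to every row as a new column, one possible approach is to convert the column to booleans and take the group mean. The following is only a sketch: the small `df` is made-up illustration data (not the asker's real frame), and the `summary` name is hypothetical.

```python
import pandas as pd

# Hypothetical toy data, standing in for port_merge_lic_df.
df = pd.DataFrame({
    "Port": ["ABERDEEN", "ABERDEEN", "ABERDEEN", "ABERDEEN", "AYR", "AYR"],
    "Shellfish Licence licence (Y/N)": ["Y", "Y", "Y", "N", "N", "N"],
})

col = "Shellfish Licence licence (Y/N)"

# True where the licence flag is 'Y'; the mean of booleans is the share of True.
is_yes = df[col].eq("Y")

# Row-level column: every row gets the 'Y' percentage of its own port.
df["Percentage"] = is_yes.groupby(df["Port"]).transform("mean") * 100

# One row per port, if a summary table is preferred instead.
summary = is_yes.groupby(df["Port"]).mean().mul(100).reset_index(name="Percentage")

print(df)
print(summary)
```

Ports with no 'Y' rows come out as 0 here rather than being missing, which is one difference from filtering a `value_counts` result for 'Y'.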
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jpg997 |
