Create a new column for a given range of numbers with same frequency
I haven't been able to find the exact solution to my problem. My data set has a column called 'Priority' which contains string values.
Priority
low
low
low
low
low
medium
medium
medium
medium
medium
high
high
high
high
I want to add a column "Num" that gives each 'Priority' row a number from a fixed range (1 to 3). As a result, within each 'Priority' the numbers 1 to 3 would appear with the same frequency.
For example :
Priority Num
low 1
low 2
low 3
low 1
low 2
low 3
medium 1
medium 2
medium 3
medium 1
medium 2
high 1
high 2
high 3
high 1
After many other attempts, this is my best solution so far, but it returns duplicates in range (1, 3):
x = pd.DataFrame(
[[l, n] for l in data.Priority for n in range(1,6)],
columns=['Priority', 'Num'])
Do you have another idea?
Solution 1:[1]
You can use a lambda function, assuming df is the name of your data set:
df['num'] = df.Priority.apply(lambda x: 1 if x == 'low' else (2 if x == 'medium' else 3))
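The same fixed mapping can also be written with Series.map and a dictionary. This is only a minimal sketch: the sample frame and the name priority_to_num are assumptions for illustration, not part of the original answer. Note that, like the lambda above, it assigns one constant number per priority rather than cycling through 1 to 3 within each priority.

```python
import pandas as pd

# Sample data mirroring the question (5 low, 5 medium, 4 high).
df = pd.DataFrame({"Priority": ["low"] * 5 + ["medium"] * 5 + ["high"] * 4})

# Hypothetical lookup table; the name is illustrative only.
priority_to_num = {"low": 1, "medium": 2, "high": 3}

# Series.map replaces each label with its dictionary value;
# a label missing from the dictionary would become NaN.
df["num"] = df["Priority"].map(priority_to_num)
```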
Solution 2:[2]
To read your data, I use StringIO together with pandas.read_csv():
data = '''Priority
low
low
low
low
low
medium
medium
medium
medium
medium
high
high
high
high
'''
Then call groupby on the 'Priority' column and apply the cumcount() method, as explained by @Raymond Kwok:
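import pandas as pd
from io import StringIO
# read the data from the string
df = pd.read_csv(StringIO(data), header=0)
# cumcount() numbers the rows within each Priority group (0, 1, 2, ...);
# % 3 wraps that counter and + 1 shifts it to the range 1..3
df['num'] = df.groupby('Priority').cumcount() % 3 + 1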
output:
>>> df
Priority num
0 low 1
1 low 2
2 low 3
3 low 1
4 low 2
5 medium 1
6 medium 2
7 medium 3
8 medium 1
9 medium 2
10 high 1
11 high 2
12 high 3
13 high 1
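To reuse the idea with a different range or column, the cumcount approach can be wrapped in a small helper. This is a sketch under assumptions: the function name add_cycling_number and its parameters are hypothetical, and the sample frame is rebuilt inline so the block runs on its own.

```python
import pandas as pd

def add_cycling_number(df, group_col="Priority", n=3, out_col="num"):
    """Return a copy of df with out_col cycling 1..n within each group."""
    result = df.copy()
    # cumcount() is a 0-based row counter per group, in original row order.
    result[out_col] = result.groupby(group_col).cumcount() % n + 1
    return result

df = pd.DataFrame({"Priority": ["low"] * 5 + ["medium"] * 5 + ["high"] * 4})
out = add_cycling_number(df)

# Count how often each number appears within each priority.
counts = out.groupby(["Priority", "num"]).size().unstack(fill_value=0)
```

When a group's size is not a multiple of n, the counts cannot be exactly equal; with this approach they differ by at most one, with the lower numbers appearing first.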
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Yuval Regev |
| Solution 2 | ellhe-blaster |
