'count frequency of items in one column in relation to criteria in another column in python

I have a data frame that looks something like this:

Category	Topic
Category1	Topic1
Category2	Topic2
Category1	Topic2
Category3	Topic3
Category2	Topic3
Category3	Topic3

And I want an output like this:

Category	Topic	Frequency
Category1	Topic1
	Topic2
	Topic3
Catgeory2	Topic1
	Topic2
	Topic3
Category3	Topic1
	Topic2
	Topic3

I am just starting out with python and I'd really appreciate it if someone could help me out with this.

python

Solution 1:^[1]

If the frequency is meant to capture the frequency of topic within each category, then, a basic approch involves:

df.groupby('Category')['Topic'].value_counts(normalize=True)

Which is a Series. For example, on your input data, we get:

Category   Topic 
Category1  Topic1    0.5
           Topic2    0.5
Category2  Topic2    0.5
           Topic3    0.5
Category3  Topic3    1.0
Name: Topic, dtype: float64

For an output organized as per your example, that appears to be a DataFrame with three columns:

out = (
    df
    .groupby('Category')['Topic']
    .value_counts(normalize=True)
    .to_frame('frequency')
    .reset_index()
)

Again, on your input sample:

>>> out
    Category   Topic  frequency
0  Category1  Topic1        0.5
1  Category1  Topic2        0.5
2  Category2  Topic2        0.5
3  Category2  Topic3        0.5
4  Category3  Topic3        1.0

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Pierre D

'count frequency of items in one column in relation to criteria in another column in python

Solution 1:[1]

Sources

Related Questions

Solution 1:^[1]