Most frequent word in a CSV file column with MapReduce in Python

Does anyone know how to create a mapred Python script that shows the most frequent word in a CSV column? For example, the CSV file has columns A, B, and C, and I want the script to output the most frequent word in column C. Any help would be much appreciated.



Solution 1:[1]

To work with CSV files easily, you could use the pandas library. To count the occurrences, try Counter from the collections module.


import pandas as pd
from collections import Counter

df = pd.read_csv("csv_path")  # Load the csv into a dataframe

# Count every word in the 'C' column. Empty cells are dropped and each cell
# is split on whitespace, so cells holding several words are counted word by word.
occurrences = Counter(
    word for cell in df['C'].dropna().astype(str) for word in cell.split()
)

# Now you have a dictionary-like structure with words as keys, and the number of occurrences as the value.
# If you want only the most frequent, you could use:

most_used_word = max(occurrences, key=occurrences.get)
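
Since the question asks for a mapred script specifically, here is a minimal sketch of the same count written in map/reduce style. Treat it as an illustration under assumptions, not a tested Hadoop job: the names mapper, reducer and most_frequent_word are hypothetical, column C is assumed to be the third column (index 2), the file is assumed to start with a header row, and the sorted() call only stands in for the sort/shuffle step that a framework would run between the two phases.

```python
import csv
import io
from itertools import groupby


def mapper(lines, column_index=2):
    """Emit (word, 1) for every word in the chosen column (index 2 = column C, assumed)."""
    reader = csv.reader(lines)
    # Assumption: the input starts with a header row. In a real distributed job
    # only the first input split would contain it.
    next(reader, None)
    for row in reader:
        if len(row) > column_index:
            for word in row[column_index].split():
                yield word.lower(), 1


def reducer(pairs):
    """Sum the counts per word. The pairs must arrive sorted by word."""
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def most_frequent_word(lines, column_index=2):
    """Run map, a local sort in place of the shuffle, then reduce, and keep the top word."""
    shuffled = sorted(mapper(lines, column_index))
    return max(reducer(shuffled), key=lambda kv: kv[1])


# Small in-memory CSV used only for demonstration.
sample = io.StringIO("A,B,C\n1,x,red apple\n2,y,green apple\n3,z,apple pie\n")
result = most_frequent_word(sample)
print(result)
```

With something like Hadoop Streaming you would probably split this into two scripts that read sys.stdin and print tab-separated word/count pairs. Picking the overall maximum then needs either a single reducer or a small follow-up step over the reducer output.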

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1