Most frequent word in a CSV file column with MapReduce (mapred) in Python
Does anyone know how to create a MapReduce (mapred) Python script that shows the most frequent word in a CSV column? For example, if a CSV file has columns A, B, and C, I want the script to output the most frequent word in column C. Any help would be very much appreciated.
Solution 1:[1]
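To work with CSV files easily, you could use the pandas library. To count the occurrences, try `collections.Counter`. Note that counting the column directly counts whole cell values, so this assumes each cell of column C holds a single word.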
```python
import pandas as pd
from collections import Counter

df = pd.read_csv("csv_path")  # Load the CSV into a DataFrame ("csv_path" is the path to your file)
occurrences = Counter(df['C'].dropna())  # Count each value in column 'C', skipping empty cells
# occurrences is a dict-like object with words as keys and their number of occurrences as values.
# If you only want the most frequent word, you could use:
most_used_word = max(occurrences, key=occurrences.get)
```
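The question asks for a MapReduce-style script, which the pandas approach above does not structure as map and reduce phases. The following is a minimal local sketch, not a Hadoop job: `mapper`, `reducer`, and `most_frequent_word` are hypothetical names, and it assumes the CSV has a header row naming a column `C`, that words in a cell are whitespace-separated, and that words should be compared case-insensitively. Sorting the mapper output stands in for the shuffle/sort step that Hadoop performs between the two phases.

```python
import csv
import io
from itertools import groupby
from operator import itemgetter


def mapper(lines, column="C"):
    """Map phase: emit (word, 1) for every word in the given column.

    Assumes the first CSV row is a header naming the columns.
    """
    for row in csv.DictReader(lines):
        cell = row.get(column) or ""  # missing or empty cells contribute no words
        for word in cell.lower().split():  # assumption: case-insensitive, whitespace-separated words
            yield word, 1


def reducer(pairs):
    """Reduce phase: sum the counts for each word.

    Sorting by key first plays the role of Hadoop's shuffle/sort step,
    so all pairs for the same word arrive next to each other.
    """
    for word, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def most_frequent_word(lines, column="C"):
    """Run map -> sort -> reduce locally and return the top (word, count) pair."""
    return max(reducer(mapper(lines, column)), key=itemgetter(1))


if __name__ == "__main__":
    sample = io.StringIO(
        "A,B,C\n"
        "1,x,apple banana\n"
        "2,y,banana\n"
        "3,z,cherry banana apple\n"
    )
    print(most_frequent_word(sample))  # ('banana', 3)
```

With Hadoop Streaming, the mapper and reducer would instead be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output; the per-phase logic would stay roughly the same, and picking the single most frequent word would need either one reducer or a small final pass over the reducer output.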
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
