Group similar words and replace with one word
My dataset has 500 thousand records. When I tried to calculate the co-occurrence, I ran into a memory issue. I then noticed that some of the words represent the same meaning. How do I replace those words with a single word and then calculate the co-occurrence?
0 1 2 \
0 product management rpm progress 4gl
1 user experience interaction design 3d rendering
2 social media excel powerpoint
3 government research economic history
4 continuous improvement data governance quality management
3 4 5 \
0 ip camel prince2 foundation
1 event team graphic design
2 microsoft word word real estate
3 planning policy None
4 test foreign affairs capacity management
6 7 8 \
0 continuous integration gsm(hlr msc)
1 engineering user experience design sales
2 teamwork microsoft office None
3 None None None
4 lean industrial engineering data
9 ... 99 100 101 102 103 104 105 106 \
0 programming ... None None None None None None None None
1 3d modeling ... None None None None None None None None
2 None ... None None None None None None None None
3 None ... None None None None None None None None
4 process optimisation ... None None None None None None None None
107 108
0 None None
1 None None
2 None None
3 None None
4 None None
[5 rows x 109 columns]
For example, if "word" and "microsoft word" represent the same meaning, how can I replace them with one word and generate a new column name for them? I am very new to this kind of problem and am not sure how to proceed with the next step.
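One possible way to do the replacement is a lookup table from each variant to a single canonical label, applied row by row. The sketch below is only an illustration: the `CANONICAL` mapping and the `normalise_row` helper are hypothetical names, the "ms word" entry is invented, and the rows are assumed to be plain lists (for example from `df.values.tolist()`).

```python
# Hypothetical synonym map: variant -> canonical label.
# These entries are assumptions; build the real map from your own data.
CANONICAL = {
    "word": "microsoft word",
    "ms word": "microsoft word",
}


def normalise_row(row, mapping=CANONICAL):
    """Return the unique canonical labels of one row, skipping None cells."""
    seen = []
    for skill in row:
        if skill is None:
            continue
        key = skill.strip().lower()
        label = mapping.get(key, key)  # unmapped words are kept unchanged
        if label not in seen:          # avoid duplicates after merging
            seen.append(label)
    return seen


# Sample rows modelled on row 2 of the data shown above.
rows = [
    ["social media", "excel", "powerpoint", "microsoft word", "word", "real estate", None],
    ["teamwork", "microsoft office", None],
]
cleaned = [normalise_row(r) for r in rows]
print(cleaned)
```

Merging variants this way shrinks the vocabulary, and dropping the `None` padding means each row only carries the labels it actually has.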
My plan is to replace those words so that I can use the data for co-occurrence and word2vec, and then generate a graph for community detection.
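For the co-occurrence step, counting only the pairs that actually occur may avoid the memory issue of a dense word-by-word matrix. This is a minimal sketch under that assumption, using only the standard library; `cooccurrence_counts` is a hypothetical helper name and the sample rows are made up for illustration.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(rows):
    """Count how often each unordered pair of labels appears in the same row."""
    counts = Counter()
    for row in rows:
        # Drop None padding, deduplicate, and sort so (a, b) and (b, a) share a key.
        labels = sorted({s for s in row if s is not None})
        counts.update(combinations(labels, 2))
    return counts


# Made-up sample rows; in practice these would be the rows of the data set.
rows = [
    ["excel", "powerpoint", "microsoft word", None],
    ["excel", "microsoft word", None, None],
    ["teamwork", "microsoft office", None],
]
pairs = cooccurrence_counts(rows)
print(pairs[("excel", "microsoft word")])  # 2
```

Each `(label_a, label_b)` key with its count can serve as a weighted edge when building the graph for community detection.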
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
