How would I go about iterating through each row in a column and keeping a running tally of every substring that comes up? (Python)
Essentially, what I am trying to do is go through the "External_Name" column, row by row, and count how often each whitespace-separated substring (word) occurs across all the strings, kind of like .value_counts().
| External_Name | Specialty |
|---|---|
| ABMC Hyperbaric Medicine and Wound Care | Hyperbaric/Wound Care |
| ABMC Kaukauna Laboratory Services | Laboratory |
| AHCM Sinai Bariatric Surgery Clinic | General Surgery |
| ........... | ........... |
| n | n |
For example, after running through the first three rows in "External_Name", the output would be something like:
| Output | Count |
|---|---|
| ABMC | 2 |
| Hyperbaric | 1 |
| Medicine | 1 |
| and | 1 |
| Wound | 1 |
| Care | 1 |
So on and so forth. Any help would be really appreciated!
Solution 1:[1]
You can split each string at whitespace with str.split(), then explode() the resulting word lists into individual rows and count the values with value_counts().
>>> df.External_Name.str.split().explode().value_counts()
ABMC          2
Hyperbaric    1
Medicine      1
and           1
Wound         1
Care          1
Kaukauna      1
Laboratory    1
Services      1
AHCM          1
Sinai         1
Bariatric     1
Surgery       1
Clinic        1
Name: External_Name, dtype: int64
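For comparison, the explicit row-by-row running tally described in the question can be sketched with collections.Counter from the standard library. This is not part of the original answer, only a minimal illustration: the list `names` stands in for the "External_Name" column, and `tally_words` is a hypothetical helper name.

```python
from collections import Counter

# Stand-in for df["External_Name"]; values copied from the question's sample rows.
names = [
    "ABMC Hyperbaric Medicine and Wound Care",
    "ABMC Kaukauna Laboratory Services",
    "AHCM Sinai Bariatric Surgery Clinic",
]


def tally_words(rows):
    """Return a Counter of whitespace-separated words across all rows.

    `tally_words` is a hypothetical helper, not a pandas function.
    """
    tally = Counter()
    for row in rows:
        # Counter.update adds one per word, so the tally grows row by row.
        tally.update(row.split())
    return tally


word_counts = tally_words(names)

# most_common() lists the words from highest to lowest count.
for word, count in word_counts.most_common():
    print(word, count)
```

With a real DataFrame, passing df["External_Name"].dropna() instead of `names` should behave the same way, since iterating over a Series yields its values. The dropna() call is an assumption about the data: a missing value would not have a split() method. For large columns, the vectorized split/explode/value_counts chain above is likely the more convenient choice.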
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | fsimonjetz |
