How would I go about iterating through each row in a column and keeping a running tally of every substring that comes up? Python

Essentially, what I am trying to do is go through the "External_Name" column row by row and count how often each whitespace-separated substring (word) occurs across the column, similar to what .value_counts() does.

External_Name	Specialty
ABMC Hyperbaric Medicine and Wound Care	Hyperbaric/Wound Care
ABMC Kaukauna Laboratory Services	Laboratory
AHCM Sinai Bariatric Surgery Clinic	General Surgery
...........	...........
n	n

For example, after running through the first three rows of "External_Name", the output would be something like this:

Output	Count
ABMC	2
Hyperbaric	1
Medicine	1
and	1
Wound	1
Care	1

So on and so forth. Any help would be really appreciated!

Solution 1: [1]

You can split each string at whitespace with str.split(), then use explode() to turn the resulting word lists into one row per word, and count the occurrences with value_counts():

>>> df.External_Name.str.split().explode().value_counts()
ABMC          2
Hyperbaric    1
Medicine      1
and           1
Wound         1
Care          1
Kaukauna      1
Laboratory    1
Services      1
AHCM          1
Sinai         1
Bariatric     1
Surgery       1
Clinic        1
Name: External_Name, dtype: int64
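
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It assumes a DataFrame built by hand from the three sample rows in the question (the real data would be loaded from wherever it lives). The variable names `word_counts` and `tally` are only illustrative. The explicit loop with `collections.Counter` is not part of the original answer; it is included as a plain-Python equivalent that is closer to the "row by row" wording of the question.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the first three rows shown in the question.
df = pd.DataFrame(
    {
        "External_Name": [
            "ABMC Hyperbaric Medicine and Wound Care",
            "ABMC Kaukauna Laboratory Services",
            "AHCM Sinai Bariatric Surgery Clinic",
        ],
        "Specialty": ["Hyperbaric/Wound Care", "Laboratory", "General Surgery"],
    }
)

# Vectorized approach from the answer: split each name on whitespace,
# put every word on its own row, then count how often each word occurs.
word_counts = df["External_Name"].str.split().explode().value_counts()

# Equivalent explicit loop (an alternative, not the answer's method):
# keep a running tally while walking the column row by row.
tally = Counter()
for name in df["External_Name"].dropna():
    tally.update(name.split())

print(word_counts)
print(tally.most_common(3))
```

Both approaches should give the same counts. The pandas one-liner is usually the more convenient choice on a large column, while the loop is easier to adapt if each row needs custom handling (for example, lowercasing or stripping punctuation before counting). The exact `Name:` line printed for the Series can differ between pandas versions.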

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	fsimonjetz
