'convert for-loop output into dataframe python
I am trying to convert the output of this code into a dataframe, but do not know how. What is a good way to turn the output columns (string and frequency) into a dataframe?
# Extract data
url = "https://www.cbinsights.com/research-unicorn-companies"
df = pd.read_html(url)
first_df = df[0]
column = "Select Investors"
all_investor = []
for i in first_df[column]:
all_investor += str(i).lower().split(',')
# Calculate frequency
for string in all_investor:
string = string.strip()
frequency = first_df[column].apply(
lambda x: string in str(x).lower()).sum()
print(string, frequency)
Output:
andreessen horowitz 41
new enterprise associates 21
battery ventures 14
index ventures 30
dst global 19
ribbit capital 8
forerunner ventures 4
crosslink capital 4
homebrew 2
sequoia capital 115
thoma bravo 3
softbank 50
tencent holdings 28
lightspeed india partners 4
sequoia capital india 25
ggv capital 14
....
Solution 1:[1]
Use str.split and value_counts:
>>> df['Select Investors'].str.split(', ').explode().str.lower().value_counts()
accel 54
tiger global management 48
sequoia capital china 46
andreessen horowitz 42
sequoia capital 41
..
.406 ventures 1
transamerica ventures 1
crane venture partners 1
geekdom fund 1
endiya partners 1
Name: Select Investors, Length: 1187, dtype: int64
Solution 2:[2]
# Extract data
url = "https://www.cbinsights.com/research-unicorn-companies"
df = pd.read_html(url)
first_df = df[0]
column = "Select Investors"
all_investor = []
for i in first_df[column]:
all_investor += [j.strip() for j in str(i).lower().split(',') if j.strip()]
all_investor = set(all_investor)
all_data = []
# Calculate frequency
for string in all_investor:
frequency = first_df[column].apply(
lambda x: string in str(x).lower()).sum()
all_data.append([string, frequency])
new_df = pd.DataFrame(all_data, columns=["Investor", "Frequency"])
new_df = new_df.sort_values(by='Frequency',ascending=False).reset_index()
new_df
Output:
index Investor Frequency
0 854 sequoia capital 115
1 626 ing 90
2 765 accel 62
3 1025 tiger global 50
4 964 softbank 50
... ... ... ...
1180 486 quiet capital 1
1181 487 york capital management 1
1182 489 ewtp capital 1
1183 490 kleiner perkins caulfield & byers 1
1184 1184 google capital 1
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 | Mazhar |
