Adding a new column to a dataframe and converting it to a list
Based on the given dataset, I must create a function that takes the original dataframe as input and returns the same dataframe with one additional column called subr_faved_by_as_list, which holds the same information as subr_faved_by, but as a Python list instead of a string.
This is my code:
from urllib import request
import pandas as pd
module_url = f"https://raw.githubusercontent.com/luisespinosaanke/cmt309-portfolio/master/data_portfolio_21.csv"
module_name = module_url.split('/')[-1]
print(f'Fetching {module_url}')
#with open("file_1.txt") as f1, open("file_2.txt") as f2
with request.urlopen(module_url) as f, open(module_name,'w') as outf:
    a = f.read()
    outf.write(a.decode('utf-8'))
df = pd.read_csv('data_portfolio_21.csv')
# this fills empty cells with empty strings
df = df.fillna('')
df.info()
Data columns (total 17 columns):
 #   Column                 Non-Null Count  Dtype
---  ------                 --------------  -----
 0   author                 19940 non-null  object
 1   posted_at              19940 non-null  object
 2   num_comments           19940 non-null  int64
 3   score                  19940 non-null  int64
 4   selftext               19940 non-null  object
 5   subr_created_at        19940 non-null  object
 6   subr_description       19940 non-null  object
 7   subr_faved_by          19940 non-null  object
 8   subr_numb_members      19940 non-null  int64
 9   subr_numb_posts        19940 non-null  int64
 10  subreddit              19940 non-null  object
 11  title                  19940 non-null  object
 12  total_awards_received  19940 non-null  int64
 13  upvote_ratio           19940 non-null  float64
 14  user_num_posts         19940 non-null  int64
 15  user_registered_at     19940 non-null  object
 16  user_upvote_ratio      19940 non-null  float64
dtypes: float64(2), int64(6), object(9)
And this is the function:
def transform_faves(df):
    df['subr_faved_by_as_list'] = df['subr_faved_by'].str.split(' ', n = 1, expand = True)
    return df
df = transform_faves(df)
I am getting the following error:
list indices must be integers or slices, not str
Solution 1:[1]
With expand=True, str.split returns a DataFrame, which cannot be assigned to a single column, so omit it. The space separator can also be omitted, because str.split splits on whitespace by default:
def transform_faves(df):
    df['subr_faved_by_as_list'] = df['subr_faved_by'].str.split(n = 1)
    return df
df = transform_faves(df)
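To see the difference on a small example, here is a minimal sketch. The sample rows are made up for illustration and are not taken from the real dataset. Keep in mind that n=1 splits only at the first separator, so each list has at most two elements. The sketch drops n so that every whitespace-separated name becomes its own element, and it works on a copy so the original dataframe is left untouched. The last part is an assumption about the data: if the cells actually hold a list literal such as "['alice', 'bob']", parsing with ast.literal_eval is probably a better fit than splitting. The helper parse_list_literal is a hypothetical name introduced here.

```python
import ast

import pandas as pd

# Hypothetical sample data; the real column contents may look different.
df = pd.DataFrame({"subr_faved_by": ["alice bob carol", "dave"]})

# expand=True spreads the pieces over several columns and returns a DataFrame.
expanded = df["subr_faved_by"].str.split(" ", n=1, expand=True)
print(type(expanded).__name__)  # DataFrame


def transform_faves(df):
    """Return a copy of df with subr_faved_by split into one list per row."""
    out = df.copy()
    # No n argument: split on every run of whitespace, not only the first.
    out["subr_faved_by_as_list"] = out["subr_faved_by"].str.split()
    return out


result = transform_faves(df)
print(result["subr_faved_by_as_list"].tolist())
# [['alice', 'bob', 'carol'], ['dave']]


def parse_list_literal(text):
    """Hypothetical helper: parse "['a', 'b']" into a list, '' into []."""
    return ast.literal_eval(text) if text else []


# Assumption: cells that store a list literal instead of plain words.
literal_df = pd.DataFrame({"subr_faved_by": ["['alice', 'bob']", ""]})
parsed = literal_df["subr_faved_by"].apply(parse_list_literal)
print(parsed.tolist())  # [['alice', 'bob'], []]
```

Which variant applies depends on what df['subr_faved_by'].head() actually shows, so it is worth checking a few rows first.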
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | jezrael |
