Adding a new column to a dataframe and converting it to a list

Given the dataset below, I need to write a function that takes the original dataframe as input and returns the same dataframe with one additional column called subr_faved_by_as_list. The new column should hold the same information as subr_faved_by, but as a Python list instead of a string.

That's my code:

from urllib import request
import pandas as pd

module_url = "https://raw.githubusercontent.com/luisespinosaanke/cmt309-portfolio/master/data_portfolio_21.csv"
# use the last URL segment as the local file name
module_name = module_url.split('/')[-1]
print(f'Fetching {module_url}')
# download the CSV and save it locally
with request.urlopen(module_url) as f, open(module_name, 'w') as outf:
  a = f.read()
  outf.write(a.decode('utf-8'))


df = pd.read_csv('data_portfolio_21.csv')
# this fills empty cells with empty strings
df = df.fillna('')
df.info()

Data columns (total 17 columns):
 #   Column                 Non-Null Count  Dtype  
---  ------                 --------------  -----  
 0   author                 19940 non-null  object 
 1   posted_at              19940 non-null  object 
 2   num_comments           19940 non-null  int64  
 3   score                  19940 non-null  int64  
 4   selftext               19940 non-null  object 
 5   subr_created_at        19940 non-null  object 
 6   subr_description       19940 non-null  object 
 7   subr_faved_by          19940 non-null  object 
 8   subr_numb_members      19940 non-null  int64  
 9   subr_numb_posts        19940 non-null  int64  
 10  subreddit              19940 non-null  object 
 11  title                  19940 non-null  object 
 12  total_awards_received  19940 non-null  int64  
 13  upvote_ratio           19940 non-null  float64
 14  user_num_posts         19940 non-null  int64  
 15  user_registered_at     19940 non-null  object 
 16  user_upvote_ratio      19940 non-null  float64
dtypes: float64(2), int64(6), object(9)

And that's the function

def transform_faves(df):
    df['subr_faved_by_as_list'] = df['subr_faved_by'].str.split(' ', n = 1, expand = True)

    return df

df = transform_faves(df)

I am getting the following error:

list indices must be integers or slices, not str


Solution 1:[1]

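With expand=True, str.split returns a DataFrame (one column per piece) rather than a Series of lists, so omit it. Keep in mind that n = 1 stops after the first split, so each list has at most two elements; drop n if every item should be separate. You can also omit the ' ' separator, because str.split splits on whitespace by default: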

def transform_faves(df):
    df['subr_faved_by_as_list'] = df['subr_faved_by'].str.split(n = 1)

    return df

df = transform_faves(df)
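
To see the difference between the two calls, here is a minimal sketch on a small made-up dataframe. The sample values are an assumption (the real subr_faved_by strings may be formatted differently), and this variant of transform_faves drops n and works on a copy, which is a design choice rather than part of the original answer:

```python
import pandas as pd

# Hypothetical sample data; the real column contents may look different.
df = pd.DataFrame({"subr_faved_by": ["alice bob carol", "dave", ""]})

# expand=True spreads the pieces over several columns, so the result
# is a DataFrame and does not fit into a single new column.
expanded = df["subr_faved_by"].str.split(" ", n=1, expand=True)
print(type(expanded).__name__, expanded.shape)

# Without expand, the result is a Series whose cells are Python lists.
as_lists = df["subr_faved_by"].str.split()
print(type(as_lists).__name__)


def transform_faves(df):
    """Return a copy of df with subr_faved_by split into a list column."""
    out = df.copy()  # leave the caller's dataframe unchanged
    # No n argument: split on every run of whitespace.
    out["subr_faved_by_as_list"] = out["subr_faved_by"].str.split()
    return out


result = transform_faves(df)
print(result["subr_faved_by_as_list"].tolist())
```

Because the split uses the default whitespace separator, an empty string becomes an empty list, which fits the earlier fillna('') step.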

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 jezrael