Python Pandas: how to concatenate horizontally on the same row
I'm having issues with the formatting of a CSV I am trying to create. Here is the code I have so far:
import os.path
import pandas as pd

# CSV with the list of users that posted in the greatawakening subreddit
usernamesDF = pd.read_csv('C:\\Users\\jotam\\Desktop\\Modeling Fanaticism\\User List\\users.csv')
users = set(usernamesDF['author'].tolist())  # removes duplicates by converting the column to a set
# CSV with a list of bots
botsDF = pd.read_csv('C:\\Users\\jotam\\Desktop\\Modeling Fanaticism\\User List\\Bot list.csv')
bots = set(botsDF['author'].tolist())  # removes any potential duplicates
users = list(users.difference(bots))  # removes any known bots from the user list
# creates a dataframe of the top 500 subreddits
subreddits = pd.read_csv(r'C:\Users\jotam\Desktop\Modeling Fanaticism\Focus Group\Top Subreddits\Top 500 Subreddits.csv')
subreddits = subreddits['subreddit'].tolist()[:100]  # converts to list and slices the first 100
dfList = []
for user in users[:1]:
    print(user)
    for subreddit in subreddits:
        path = "C:\\Users\\jotam\\Desktop\\Modeling Fanaticism\\Focus Group\\Subreddit CSVs\\" + str(subreddit) + '.csv'
        if os.path.exists(path):  # checks if the CSV for each subreddit exists in the folder
            print(subreddit)
            df = pd.read_csv(path, index_col=False)
            df = df.loc[df['author'] == user]  # selects the current user's rows in the subreddit CSV
            if len(df.index) == 0:  # user has not posted in this subreddit: build a zero-filled row, since the model needs one
                if_empty = {'author': user, 'subreddit': subreddit, 'post_count': '0', 'total_score': '0', 'times_gilded': '0', 'has_flair': '0', 'is_distinguished': '0'}
                df = pd.DataFrame(data=if_empty, index=[0], columns=['author', 'subreddit', 'post_count', 'total_score', 'times_gilded', 'has_flair', 'is_distinguished'])
            dfList.append(df)
concatDF = pd.concat(dfList, axis=1, ignore_index=False)  # concatenates horizontally
# concatDF = concatDF.groupby(by='author')
concatDF.to_csv(r'C:\Users\jotam\Desktop\Modeling Fanaticism\Focus Group\NEWCONCAT.csv', index=False)
This image is the goal of what I am trying to make with my code
This image is what I currently have
For context, I am trying to concatenate horizontally because, in the first image, every 5 columns after the first author column correspond to a specific subreddit, in the same order as a list I have of subreddits from most posted to least posted. I want each row to correspond to one user, but as the second image shows, my current output breaks a single user up into 4 rows. Thanks for the help!
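One possible cause, sketched here under assumptions rather than confirmed against the real CSVs: pd.concat with axis=1 aligns frames on their row index. Rows selected with df.loc keep the index labels they had in the subreddit CSV, while the zero-filled placeholder is built with index=[0], so pieces with different labels would land on different rows. The frames, the author name and the label 57 below are hypothetical stand-ins, not data from the question.

```python
import pandas as pd

# Hypothetical stand-ins for two per-subreddit frames for one user.
# The first keeps the row label it had in its source CSV (here 57);
# the second is the zero-filled placeholder, built with index=[0].
found = pd.DataFrame(
    {"author": ["alice"], "subreddit": ["news"], "post_count": [3]},
    index=[57],
)
placeholder = pd.DataFrame(
    {"author": ["alice"], "subreddit": ["pics"], "post_count": [0]},
    index=[0],
)

# axis=1 aligns on the row index, so labels 57 and 0 become two rows.
misaligned = pd.concat([found, placeholder], axis=1)

# Resetting each index first gives both pieces the label 0,
# so they end up side by side on a single row.
pieces = [df.reset_index(drop=True) for df in (found, placeholder)]
aligned = pd.concat(pieces, axis=1)

print(misaligned.shape)  # (2, 6)
print(aligned.shape)     # (1, 6)
```

If this is what is happening, appending df.reset_index(drop=True) to dfList instead of df may be enough. It assumes each subreddit CSV holds at most one row per author; a user with several rows in one file would still produce several output rows.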
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
