'For loop in a dataframe and keep each iteration in Python
I'm creating a template to process SurveyMonkey surveys into a Tableau ready format. I'm breaking down the surveys into their question types. I want to automate the script as much as possible so I'm trying to use a for loop for each question type.
For our purposes let's stick to the Ranking type question.
Let's say I have a dataframe like this:
d = {'Respondent ID': [123, 234, 345], 'rank question 1': [3, 5, 4], 'rank question 2': [1, 6, 7]}
df = pd.DataFrame(data=d)
df
I want the final dataframe to look like this:
rankfinal = {'Respondent ID': [123, 234, 345, 123, 234, 345], 'answer': [3, 5, 4, 1, 6, 7], 'question': ['rank question 1', 'rank question 1', 'rank question 1', 'rank question 2', 'rank question 2', 'rank question 2']}
rank1 = pd.DataFrame(data=rankfinal)
rank1
I've tried several attempts, but here is my best:
ranking = [1,2] # These are the column positions in the original survey dataframe
hold = []
for i in range(len(ranking)):
hold.append(i)
respondent_id = []
questions = []
answers = []
for i in hold:
if len(hold) < 1:
print('No Ranking Questions! Moving on...')
else:
respondent_id.append(Respondent_ID)
questions.append(df.columns[ranking[i]])
answers.append(df.iloc[1:, ranking[i]])
While the code works, I don't think I can end up doing anything with the outputs to get them into a single dataframe. I've always struggled with loops so hopefully you might be able to help me get this project done.
Thanks in advance.
Solution 1:[1]
I would approach this problem by consolidating rank questions.
Loop through all "rank question" columns, and consolidate their values.
- you will end up with a list [3, 5, 4, 1, 6, 7]
Duplicate Respondent ID field n times where n == num of "rank question" columns.
- you will obtain a list [123, 234, 345, 123, 234, 345]
Create a list where you repeat "rank question" field names by the number of rows.
- you will obtain a list [rank question 1, ..., rank question2, ...]
Finally assign these lists as json and pass to pandas Dataframe.
Solution 2:[2]
I worked out a solution I'm mostly happy with:
rank_fun = {}
if len(ranking) < 1:
print('No Ranking Questions! Moving on...')
else:
for i in ranking:
rank_fun[i] = pd.concat([Respondent_ID, df.iloc[1:,i]], axis=1)
rank_fun[i]['question'] = rank_fun[i].columns[1]
rank1 = pd.DataFrame()
for i in ranking:
rank1 = rank_fun[i].append(rank_fun[i])
rank1.rename(columns={rank1.columns[1]: "answer" }, inplace = True)
rank1['answer option'] = "Rank"
rank1 = rank1[rank1['answer'].str.contains("nan")==False]
rank1
My only annoyance now is when there are no ranking questions I wish it wouldn't throw an error. Any ideas?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | SW Jeong |
| Solution 2 | CIHAnalytics |
