How do I measure the length of the lists per UserID using pandas?
I am trying to measure the length of each list in the Original Query column and then compute the mean and standard deviation per user, but I cannot work out how to get the lengths. How do I do it?
This is what I tried:
import numpy as np
import pandas as pd

filepath = "yandex_users_paired_queries.csv"  # path to the csv with the query dataset
queries = pd.read_csv(filepath)
totalNum = queries.groupby('Original Query').size().reset_index(name='counts')
sessions = queries.groupby(['UserID','Original Query'])
print(sessions.size())
print("----------------------------------------------------------------")
print("~~~Mean & Average~~~")
sessionsDF = sessions.size().to_frame('counts')
sessionsDFbyBool = sessionsDF.groupby(['Original Query'])
print(sessionsDFbyBool["counts"].agg([np.mean,np.std]))
And this is my output:
UserID Original Query
154 [1228124, 388107, 1244921, 3507784] 1
[1237207, 1974238, 1493311, 1222688, 733390, 868851, 428547, 110871, 868851, 235307] 1
[1237207, 1974238, 1493311, 1222688, 733390, 868851, 428547] 1
[1237207, 1974238, 1493311, 1222688, 733390] 1
[1237207] 1
..
343 [919873, 551537, 1841361, 1377305, 610887, 1196372, 3724298] 1
[919873, 551537, 1841361, 1377305, 610887, 1196372] 1
345 [3078369, 3613096, 4249887, 2383044, 2366003, 4043437] 1
[3531370, 3078369, 284354, 4300636] 1
347 [1617419] 1
Length: 612, dtype: int64
Solution 1:[1]
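Apply the built-in len function to each value in the 'Original Query' column, then aggregate the resulting lengths per UserID (this assumes each value is an actual list rather than a string).
queries['oq_len'] = queries['Original Query'].apply(len)
sessionsDF = queries.groupby('UserID')['oq_len'].agg(['mean', 'std'])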
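One caveat worth checking: a column loaded with pd.read_csv holds strings such as "[1228124, 388107]", not lists, so len would count characters instead of elements. The sketch below parses each value with ast.literal_eval before measuring it. The sample DataFrame and the to_list helper are hypothetical stand-ins for illustration; the real CSV is not shown in full, so treat this as an assumption about its layout rather than a definitive fix.

```python
import ast

import pandas as pd

# Hypothetical sample standing in for the CSV; values are strings, as read_csv would return them.
queries = pd.DataFrame({
    "UserID": [154, 154, 154, 343, 343],
    "Original Query": [
        "[1228124, 388107, 1244921, 3507784]",
        "[1237207, 1974238, 1493311]",
        "[1237207]",
        "[919873, 551537]",
        "[919873, 551537, 1841361, 1377305]",
    ],
})


def to_list(value):
    """Parse a string such as "[1, 2, 3]" into a list; pass real lists through unchanged."""
    return ast.literal_eval(value) if isinstance(value, str) else value


# Number of elements in each parsed list.
queries["oq_len"] = queries["Original Query"].apply(to_list).apply(len)

# Mean and sample standard deviation (ddof=1) of list length per user.
stats = queries.groupby("UserID")["oq_len"].agg(["mean", "std"])
print(stats)
```

If the column already contains real lists (for example, when the DataFrame was built in memory), the helper leaves them untouched and the result matches the shorter version above.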
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Arnau |
