How do I measure the length of the lists per userId using pandas?

I am trying to measure the length of each list in the Original Query column and then find the mean and standard deviation of those lengths per UserID, but I cannot work out how to measure the length. How do I do it?

This is what I tried:

import numpy as np
import pandas as pd

filepath = "yandex_users_paired_queries.csv"  # path to the csv with the query dataset
queries = pd.read_csv(filepath)
totalNum = queries.groupby('Original Query').size().reset_index(name='counts')
sessions = queries.groupby(['UserID','Original Query'])
print(sessions.size())
print("----------------------------------------------------------------")
print("~~~Mean & Average~~~")
sessionsDF = sessions.size().to_frame('counts')
sessionsDFbyBool = sessionsDF.groupby(['Original Query'])
print(sessionsDFbyBool["counts"].agg([np.mean,np.std]))

And this is my output:

UserID  Original Query                                                                      
154     [1228124, 388107, 1244921, 3507784]                                                     1
        [1237207, 1974238, 1493311, 1222688, 733390, 868851, 428547, 110871, 868851, 235307]    1
        [1237207, 1974238, 1493311, 1222688, 733390, 868851, 428547]                            1
        [1237207, 1974238, 1493311, 1222688, 733390]                                            1
        [1237207]                                                                               1
                                                                                               ..
343     [919873, 551537, 1841361, 1377305, 610887, 1196372, 3724298]                            1
        [919873, 551537, 1841361, 1377305, 610887, 1196372]                                     1
345     [3078369, 3613096, 4249887, 2383044, 2366003, 4043437]                                  1
        [3531370, 3078369, 284354, 4300636]                                                     1
347     [1617419]                                                                               1
Length: 612, dtype: int64
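The counts of 1 are expected: groupby([...]).size() counts the rows in each (UserID, Original Query) group, and in the output above each pair occurs only once, so it never looks inside the list. A minimal sketch, using a small made-up in-memory CSV as a stand-in for yandex_users_paired_queries.csv (the sample rows are an assumption), shows this and also that read_csv returns each list as a plain string:

```python
import io

import pandas as pd

# Hypothetical sample standing in for yandex_users_paired_queries.csv
csv_text = '''UserID,Original Query
154,"[1228124, 388107, 1244921, 3507784]"
154,[1237207]
345,"[3531370, 3078369, 284354, 4300636]"
'''

queries = pd.read_csv(io.StringIO(csv_text))

# size() counts rows per (UserID, Original Query) group, not list elements
group_sizes = queries.groupby(['UserID', 'Original Query']).size()
print(group_sizes.tolist())  # [1, 1, 1]

# read_csv leaves each list as a string, so len() counts characters
first = queries.loc[0, 'Original Query']
print(type(first).__name__, len(first))  # str 35
```

So neither the group size nor a plain len() on the column gives the number of items in each list.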


Solution 1:[1]

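You want to apply the len function to each value in the 'Original Query' column. Because read_csv loads each list as a string such as "[1237207, 1974238]", len on the raw column would count characters, so parse the strings back into lists with ast.literal_eval first:

import ast

queries['oq_len'] = queries['Original Query'].apply(ast.literal_eval).apply(len)
sessionsDF = queries.groupby('UserID').oq_len.agg(['mean', 'std'])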
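As a self-contained check, the sketch below runs the parse-then-len approach end to end on a small made-up sample (the CSV text and its values are assumptions, not the real dataset) and aggregates per user. Note that pandas' std is the sample standard deviation (ddof=1), so a user with a single query gets NaN.

```python
import ast
import io

import pandas as pd

# Hypothetical sample standing in for yandex_users_paired_queries.csv
csv_text = '''UserID,Original Query
154,"[1228124, 388107, 1244921, 3507784]"
154,"[1237207, 1974238, 1493311, 1222688, 733390]"
154,[1237207]
347,[1617419]
'''

queries = pd.read_csv(io.StringIO(csv_text))

# Turn each "[a, b, c]" string back into a Python list, then count its items
queries['oq_len'] = queries['Original Query'].apply(ast.literal_eval).apply(len)

# Mean and sample standard deviation (ddof=1) of list length per user
stats = queries.groupby('UserID')['oq_len'].agg(['mean', 'std'])
print(stats)
```

Here user 154 has lists of length 4, 5 and 1 (mean 10/3), while user 347 has a single list, so its std is NaN.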

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Arnau