Pandas: Why does the length of an empty list equal 1?

In the example DataFrame below, why is the length of an empty list 1? I'd expect an empty list to have length 0, since len([]) == 0.

Use case:

I'm trying to count the number of values in each row, where each row holds a string of comma-separated values, either integers or alphanumeric.


Example:

Create the sample dataset:

import pandas as pd

df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C', 
                   '1,1-33', '26a, Green House', '** All **', '', '']})

df['col1']

0             1,2,3,4
1               1,2,3
2                 1,2
3            1A, 363C
4              1,1-33
5    26a, Green House
6           ** All **
7                    
8                    
Name: col1, dtype: object

Split the string on comma to create lists of values:

df['col1'].str.split(',')

0           [1, 2, 3, 4]
1              [1, 2, 3]
2                 [1, 2]
3            [1A,  363C]
4              [1, 1-33]
5    [26a,  Green House]
6            [** All **]
7                     []
8                     []
Name: col1, dtype: object

Try to determine the length of each list:

df['col1'].str.split(',').map(len)

0    4
1    3
2    2
3    2
4    2
5    2
6    1
7    1  <-- Expected to be 0
8    1  <-- Expected to be 0
Name: col1, dtype: int64

Questions:

  • Why is the length of an empty list 1?
    • As pointed out by @Timus, using .map(repr) shows the list isn't empty: ['']. Thank you.
  • What would be a better approach for this use-case?


Solution 1:[1]

We can try str.count, which counts the runs of non-comma characters in each string, so an empty string gives 0:

df['count'] = df['col1'].str.count(r'[^,]+')

      col1  count
0  1,2,3,4      4
1    1,2,3      3
2      1,2      2
3       1A      1
4               0
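
As a rough check, the sketch below applies the same pattern to the full col1 column from the question. The last part is an added caveat, not something stated in the answer: the pattern counts non-empty fields, so it can differ from a split-based count when two commas are adjacent. The name gap is only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Count runs of one or more non-comma characters in each string.
counts = df['col1'].str.count(r'[^,]+')
print(counts.tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]

# Caveat (assumption about the data): an empty field between two commas
# is not counted by the regex, but it is counted by split.
gap = pd.Series(['1,,2'])
print(gap.str.count(r'[^,]+').tolist())       # [2]
print(gap.str.split(',').str.len().tolist())  # [3]
```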

Solution 2:[2]

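The last rows hold an empty string, and splitting an empty string on a separator returns a list containing one empty string, not an empty list: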

>>> ''.split(',')
['']
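
A minimal sketch of where the [''] comes from, first in plain Python and then in pandas. The two-row Series is a cut-down stand-in for the question's data, and .map(repr) is the trick mentioned in the question for showing what each list really contains.

```python
import pandas as pd

# With an explicit separator, splitting an empty string yields one empty field.
print(''.split(','))       # ['']
print(len(''.split(',')))  # 1

# Only the no-argument form (split on whitespace) returns an empty list.
print(''.split())          # []

# The same thing happens element-wise in pandas.
s = pd.Series(['1,2', ''])
lists = s.str.split(',')

# repr keeps the quotes, so the empty string inside the list becomes visible.
reprs = lists.map(repr)
print(reprs.tolist())           # ["['1', '2']", "['']"]
print(lists.map(len).tolist())  # [2, 1]
```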

Solution 3:[3]

If you want to count the empty strings as 0 you can mask them:

df['col1'].str.split(',').str.len().mask(df['col1'].eq(''),0)

Note however that split+len is not the most straightforward approach. You can just count the separators (,). Then add 1 wherever the string is not empty:

df['col1'].str.count(',').add(df['col1'].ne(''))

Output:

0    4
1    3
2    2
3    1
4    0
Name: col1, dtype: int64
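
For comparison, here is a small sketch that runs both expressions on the question's full sample data and checks that they agree. The variable names by_split and by_count are only illustrative.

```python
import pandas as pd

# Sample data copied from the question.
s = pd.Series(['1,2,3,4', '1,2,3', '1,2', '1A, 363C', '1,1-33',
               '26a, Green House', '** All **', '', ''], name='col1')

# Approach 1: split, take each list's length, then force empty strings to 0.
by_split = s.str.split(',').str.len().mask(s.eq(''), 0)

# Approach 2: count the separators and add 1 for every non-empty string
# (the boolean Series from ne('') is added as 0 or 1).
by_count = s.str.count(',').add(s.ne(''))

print(by_count.tolist())                       # [4, 3, 2, 2, 2, 2, 1, 0, 0]
print(by_split.tolist() == by_count.tolist())  # True
```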

Solution 4:[4]

Thank you @Timus for the insight to use .map(repr) to reveal the non-empty list as [''].


Solution:

Replace all empty string values with NaN:

df['col1'] = df['col1'].replace('', float('nan'))

Apply a lambda to split and count, returning 0 where the value is a float (NaN):

df['count'] = df['col1'].apply(lambda x: len(x.split(',')) if not isinstance(x, float) else 0)

Result:

    col1    count
0   1,2,3,4     4
1   1,2,3       3
2   1,2         2
3   1A          1
4   NaN         0
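
The same idea can also be sketched without apply. This variant is an alternative, not part of the answer above: the .str methods pass NaN through unchanged, so the missing rows can be filled with 0 after taking the lengths. It uses the question's sample data.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Replace empty strings with NaN, assigning the result back to the column.
df['col1'] = df['col1'].replace('', float('nan'))

# .str.split and .str.len leave NaN as NaN; fill those with 0 and cast to int.
df['count'] = df['col1'].str.split(',').str.len().fillna(0).astype(int)
print(df['count'].tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]
```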

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Shubham Sharma
Solution 2 Emma
Solution 3 mozway
Solution 4 S3DEV