Pandas: Why does the length of an empty list equal 1?
In the example DataFrame below, why is the length of an empty list 1? I'd expect an empty list to have length 0, since len([]) == 0.
Use case:
I'm trying to count the number of values in each row, where each row holds a string of comma-separated values that are either integers or alphanumeric.
Example:
Create the sample dataset:
import pandas as pd
df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
'1,1-33', '26a, Green House', '** All **', '', '']})
df['col1']
0 1,2,3,4
1 1,2,3
2 1,2
3 1A, 363C
4 1,1-33
5 26a, Green House
6 ** All **
7
8
Name: col1, dtype: object
Split the string on comma to create lists of values:
df['col1'].str.split(',')
0 [1, 2, 3, 4]
1 [1, 2, 3]
2 [1, 2]
3 [1A, 363C]
4 [1, 1-33]
5 [26a, Green House]
6 [** All **]
7 []
8 []
Name: col1, dtype: object
Try to determine the length of each list:
df['col1'].str.split(',').map(len)
0 4
1 3
2 2
3 2
4 2
5 2
6 1
7 1 <-- Expected to be 0
8 1 <-- Expected to be 0
Name: col1, dtype: int64
Questions:
- Why is the length of an empty list 1?
  - As pointed out by @Timus, using .map(repr) shows the list isn't empty: ['']. Thank you.
- What would be a better approach for this use-case?
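The .map(repr) check mentioned in the first question can be reproduced with a minimal sketch like the one below. The three-row Series `s` is a shortened stand-in for the question's column, chosen here only for illustration.

```python
import pandas as pd

# Shortened stand-in for df['col1'] (illustrative data, not the full dataset)
s = pd.Series(['1,2', '** All **', ''])
lists = s.str.split(',')

# repr() makes the empty string inside the last list visible
reprs = lists.map(repr).tolist()
print(reprs[2])   # ['']

# The last list holds one element (an empty string), so its length is 1
lengths = lists.map(len).tolist()
print(lengths)    # [2, 1, 1]
```

The default Series display prints the list as [] because the single element it contains is an empty string, which renders as nothing.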
Solution 1:[1]
We can try str.count with a pattern that matches each run of non-comma characters:
df['count'] = df['col1'].str.count(r'[^,]+')
col1 count
0 1,2,3,4 4
1 1,2,3 3
2 1,2 2
3 1A 1
4 0
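As a rough check of how this pattern behaves, the sketch below runs it on a few strings from the question plus one extra value, '1,,2', which is an assumption added here to show an edge case: consecutive commas produce no match between them, so the result can differ from a split-based count.

```python
import pandas as pd

# A few values from the question, plus '1,,2' (added here for illustration)
col = pd.Series(['1,2,3,4', '1A, 363C', '** All **', '', '1,,2'])

# Each match is one run of characters that are not commas
counts = col.str.count(r'[^,]+').tolist()
print(counts)        # [4, 2, 1, 0, 2]

# For comparison: splitting '1,,2' yields three parts, one of them empty
split_len = len('1,,2'.split(','))
print(split_len)     # 3
```

If empty fields between commas should count as values, this regex approach is probably not the right fit.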
Solution 2:[2]
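The last rows contain an empty string, and splitting an empty string on a separator returns a list holding one empty string, not an empty list: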
>>> ''.split(',')
['']
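The same behaviour can be seen in plain Python, without pandas. The sketch below also shows two contrasting cases that are not part of the original answer: calling split() with no separator, and filtering out empty parts after splitting.

```python
# Splitting an empty string on an explicit separator keeps one empty field
empty_split = ''.split(',')
print(empty_split, len(empty_split))            # [''] 1

# With no separator argument, split() drops empty results entirely
whitespace_split = ''.split()
print(whitespace_split, len(whitespace_split))  # [] 0

# Filtering out empty parts gives the count the question expected
filtered = [part for part in ''.split(',') if part]
print(len(filtered))                            # 0
```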
Solution 3:[3]
If you want to count the empty strings as 0 you can mask them:
df['col1'].str.split(',').str.len().mask(df['col1'].eq(''),0)
Note however that split+len is not the most straightforward approach. You can just count the separators (,). Then add 1 wherever the string is not empty:
df['col1'].str.count(',').add(df['col1'].ne(''))
Output:
0 4
1 3
2 2
3 1
4 0
Name: col1, dtype: int64
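For comparison, here is a sketch that runs both expressions from this answer on the full sample dataset from the question. The variable names `masked` and `counted` are introduced here for illustration only.

```python
import pandas as pd

# Sample dataset from the question
df = pd.DataFrame({'col1': ['1,2,3,4', '1,2,3', '1,2', '1A, 363C',
                            '1,1-33', '26a, Green House', '** All **', '', '']})

# Approach 1: split, take the list length, then force empty strings to 0
masked = df['col1'].str.split(',').str.len().mask(df['col1'].eq(''), 0)

# Approach 2: count the commas, then add 1 (True) where the string is not empty
counted = df['col1'].str.count(',').add(df['col1'].ne(''))

print(masked.tolist())   # [4, 3, 2, 2, 2, 2, 1, 0, 0]
print(counted.tolist())  # [4, 3, 2, 2, 2, 2, 1, 0, 0]
```

Both give the same counts here. The second avoids building intermediate lists, which may matter on larger columns.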
Solution 4:[4]
Thank you @Timus for the insight to use .map(repr) to reveal the non-empty list as [''].
Solution:
Replace all empty string values with NaN:
df['col1'] = df['col1'].replace('', float('nan'))
Apply a lambda that splits and counts when the value is not a float (NaN is a float), and returns 0 otherwise:
df['count'] = df['col1'].apply(lambda x: len(x.split(',')) if not isinstance(x, float) else 0)
Result:
col1 count
0 1,2,3,4 4
1 1,2,3 3
2 1,2 2
3 1A 1
4 NaN 0
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shubham Sharma |
| Solution 2 | Emma |
| Solution 3 | mozway |
| Solution 4 | S3DEV |
