Pandas DataFrame column (Series) has a different index than the DataFrame?
Consider this small script:
import pandas as pd
aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
print(aa)
print(aa.a - aa.b)
The output is:
a b
0 1 NaN
1 2 1.0
2 3 2.0
0 NaN
1 0.0
2 0.0
3 NaN
while I was expecting aa.a - aa.b to be:
0 NaN
1 1.0
2 1.0
How is this possible? Is it a Pandas bug?
Solution 1:[1]
import pandas as pd
aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a
bb.index = bb.index + 1
aa['b'] = bb
aa = aa.reset_index(drop=True)  # add this (reset_index returns a new DataFrame, so assign it back)
Your indexes do not match: aa.a and aa.b carry different index labels, so pandas cannot pair their values row by row.
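A related sketch, not taken from the answer itself: if what you want is to pair values by position and ignore the labels entirely, you can call `Series.reset_index(drop=True)` on both operands before subtracting. The two Series below are built by hand to mimic `aa.a` and `aa.b` as they looked in the question, so the example does not depend on how a given pandas version caches DataFrame columns.

```python
import numpy as np
import pandas as pd

# Hand-built stand-ins for aa.a (index shifted to 1..3) and aa.b (index 0..2).
a = pd.Series([1, 2, 3], index=[1, 2, 3], name="a")
b = pd.Series([np.nan, 1.0, 2.0], index=[0, 1, 2], name="b")

# reset_index(drop=True) replaces each index with 0..n-1 and discards the old
# labels, so the subtraction below pairs elements by position, not by label.
result = a.reset_index(drop=True) - b.reset_index(drop=True)
print(result)
```

This yields NaN, 1.0, 1.0 on the index 0, 1, 2, which is the result the question expected.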
Solution 2:[2]
When you do aa.a - aa.b, you're subtracting two pandas.Series that have the same length but not the same index:
aa.a
1 1
2 2
3 3
Name: a, dtype: int64
Whereas:
aa.b
0 NaN
1 1.0
2 2.0
Name: b, dtype: float64
And when you do:
print(aa.a - aa.b)
pandas first aligns the two Series on their index labels (whatever the operation: addition or subtraction), so the result's index is the union of [0,1,2] and [1,2,3], namely [0,1,2,3]. Labels that exist in only one of the two Series produce NaN.
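The alignment step can be made visible with `Series.align`, which by default performs the same outer join on the index that arithmetic operators do implicitly. A minimal sketch, again using hand-built Series that mimic `aa.a` and `aa.b` from above:

```python
import numpy as np
import pandas as pd

# Same length, different index labels, as with aa.a and aa.b above.
a = pd.Series([1, 2, 3], index=[1, 2, 3], name="a")
b = pd.Series([np.nan, 1.0, 2.0], index=[0, 1, 2], name="b")

# align() (join="outer" by default) reindexes both Series to the union
# of their labels; a label missing on one side becomes NaN there.
a_aligned, b_aligned = a.align(b)
print(a_aligned.index.tolist())  # [0, 1, 2, 3]

# Subtraction aligns the same way before computing element by element.
diff = a - b
print(diff)
```

The printed `diff` has the same four rows (NaN, 0.0, 0.0, NaN) as the output in the question.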
For instance, if you shift bb.index by 2 instead of 1:
bb.index = bb.index + 2
then the resulting pandas.Series has 5 rows instead of 4, and so on:
bb.index = bb.index + 2
aa['b'] = bb
print(aa.a - aa.b)
0 NaN
1 NaN
2 0.0
3 NaN
4 NaN
dtype: float64
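The "and so on" pattern can be checked with a small sketch. The helper `shifted_difference` is hypothetical (not part of pandas or the answer): `a` stands in for `aa.a` after its index was shifted by `k`, and `b` stands in for `aa.b` after assignment realigned it to the original index 0..n-1. For a shift `k` smaller than the length, the union of the two indexes, and therefore the result, has n + k rows.

```python
import pandas as pd

def shifted_difference(values, k):
    """Hypothetical helper reproducing the question's pattern for a shift of k."""
    n = len(values)
    # Like aa.a after bb.index = bb.index + k: labels k..k+n-1.
    a = pd.Series(values, index=pd.RangeIndex(n) + k)
    # Like aa.b: the shifted Series realigned to the original labels 0..n-1.
    b = a.reindex(pd.RangeIndex(n))
    return a - b

for k in (1, 2):
    diff = shifted_difference([1, 2, 3], k)
    print(f"shift {k}: {len(diff)} rows")
    print(diff)
```

For `k = 2` this reproduces the five-row output above, with 0.0 only at label 2.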
Solution 3:[3]
Use this code to get what you expect:
import pandas as pd
aa = pd.DataFrame({'a': [1,2,3]})
bb = aa.a.copy()  # a separate Series, so changing bb.index leaves aa.a alone
bb.index = bb.index + 1
aa['b'] = bb
print(aa)
print(aa.a - aa.b)
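The surprise in the question appears to come from `bb` and `aa.a` being the same Series object in the pandas version used there, so relabeling one relabeled the other. The sketch below checks that, with `.copy()`, `aa.a` keeps the index 0, 1, 2. The last lines use `Series.shift(1)`, an alternative not taken from the answers, which moves the values down one row while keeping the original labels, so no index is modified at all.

```python
import pandas as pd

aa = pd.DataFrame({'a': [1, 2, 3]})

# copy() returns a new Series object, so reassigning bb.index changes only
# the copy and not the Series that aa.a refers to.
bb = aa.a.copy()
bb.index = bb.index + 1
aa['b'] = bb  # aligned on aa's index 0..2, giving NaN, 1.0, 2.0

print(aa.a.index.tolist())  # [0, 1, 2]
print(aa.a - aa.b)          # NaN, 1.0, 1.0

# Alternative: shift(1) produces the same values as column 'b' directly.
aa['c'] = aa['a'].shift(1)
print(aa)
```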
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | User |
| Solution 2 | Pascal G. Bernard |
| Solution 3 | Pythonic2020 |
