Pandas and Sets - ValueError: Length of values does not match length of index
I am trying to create a new column in my dataframe that contains the intersection of two sets, one from each of two separate columns. The columns themselves hold sets.
dfc['INTERSECTION'] = set(dfc.TABS1).intersection(set(dfc.TABS2))
I get a ValueError. I was able to do
dfc['LEFT'] = set(dfc.TABS1) - set(dfc.TABS2)
no problem. TABS1 and TABS2 have values.
Any thoughts? Thanks.
I am adding example data below.
GROUP TABS1 TABS2
A {'T1','T2','T3'} {'T2','T3','T4'}
B {'T5', 'T6'} {'T6'}
Chris gave an example, but it uses a very different data set. I am looking for the intersection of TABS1 and TABS2 in a third column, 'INTERSECTION'. As mentioned above, I have no problems with
dfc['LEFT'] = set(dfc.TABS1) - set(dfc.TABS2)
This looks like it should be so straightforward...
Solution 1:[1]
set removes duplicates, so you end up with a set whose length is smaller than the length of your dataframe. You need to make sure the length of the array you assign to a new column is equal to the length of the dataframe. If you want, you can replace the non-intersecting values with NaN using a list comprehension:
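import numpy as np
import pandas as pd
# sample data
df = pd.DataFrame([[1,2,3], [1,2,3], [2,3,4], [3,4,5]], columns=list('abc'))
# list comprehension: keep each value of column a that also appears in column b, else NaN
b_values = set(df['b'])
df['intersection'] = [a if a in b_values else np.nan for a in df['a']]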
   a  b  c  intersection
0  1  2  3           NaN
1  1  2  3           NaN
2  2  3  4           2.0
3  3  4  5           3.0
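For the data in the question, where each cell holds its own set, a row-by-row intersection may be closer to what is wanted. The following is only a sketch: it assumes TABS1 and TABS2 contain actual Python set objects, and the dfc frame here is rebuilt by hand from the sample rows in the question. Pairing the two columns with zip produces exactly one result per row, so the new column always matches the length of the index.

```python
import pandas as pd

# Hypothetical frame rebuilt from the sample data in the question
dfc = pd.DataFrame({
    "GROUP": ["A", "B"],
    "TABS1": [{"T1", "T2", "T3"}, {"T5", "T6"}],
    "TABS2": [{"T2", "T3", "T4"}, {"T6"}],
})

# Walk the two columns in parallel and intersect the sets of each row.
# The list has one element per row, so its length matches the index.
dfc["INTERSECTION"] = [a & b for a, b in zip(dfc["TABS1"], dfc["TABS2"])]

# The same pattern gives a per-row set difference.
dfc["LEFT"] = [a - b for a, b in zip(dfc["TABS1"], dfc["TABS2"])]

print(dfc[["GROUP", "INTERSECTION", "LEFT"]])
```

If the columns actually hold strings that only look like sets, they would need to be converted to sets first; that step is not shown here.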
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | It_is_Chris |
