Join two columns of integers in a pandas dataframe to a column of tuples
I want to combine two columns of features into one column, where each row will represent a data point as a tuple.
For example, here is my data frame:
Weather Temp Play
0 2 1 0
1 2 1 0
2 0 1 1
3 1 2 1
4 1 0 1
5 1 0 0
I want it to look something like this:
x Play
0 (2,1) 0
1 (2,1) 0
2 (0,1) 1
3 (1,2) 1
4 (1,0) 1
5 (1,0) 0
I then want to use this in model.fit(df['x'], df['Play']) for Bernoulli Naive Bayes.
Is this at all possible? I am trying to avoid using lists. How can I do this for n columns next time?
Solution 1:[1]
You can use zip to pair the values of the two columns row by row:
df['x'] = list(zip(df.Weather, df.Temp))
Output (using different sample values than the question):
Weather Temp Play x
0 1 1 4 (1, 1)
1 2 1 5 (2, 1)
2 3 1 6 (3, 1)
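The question also asks about n columns, which the one-liner above does not cover directly. A minimal sketch of one way to generalize it, assuming the column names are collected in a list (the names `cols` and `x_alt` are introduced here for illustration only):

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Weather": [2, 2, 0, 1, 1, 1],
    "Temp": [1, 1, 1, 2, 0, 0],
    "Play": [0, 0, 1, 1, 1, 0],
})

cols = ["Weather", "Temp"]  # hypothetical: any number of column names

# Unpack one Series per column into zip; each row becomes one tuple.
df["x"] = list(zip(*(df[c] for c in cols)))

# Alternative with the same result: itertuples yields plain tuples
# when index=False and name=None.
x_alt = list(df[cols].itertuples(index=False, name=None))
```

Adding a third column name to `cols` would produce 3-tuples without any other change.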
Solution 2:[2]
df.apply() can be used for a variety of unusual cases such as this one:
df['x'] = df.apply(lambda x: (x.Weather, x.Temp), axis=1)
Output:
Weather Temp Play x
0 2 1 0 (2, 1)
1 2 1 0 (2, 1)
2 0 1 1 (0, 1)
3 1 2 1 (1, 2)
4 1 0 1 (1, 0)
5 1 0 0 (1, 0)
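If the columns should not be named one by one inside the lambda, the same idea can probably be written by passing `tuple` itself to apply on a column subset. This is a sketch, not part of the original answer; `cols` is a name assumed here for illustration:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Weather": [2, 2, 0, 1, 1, 1],
    "Temp": [1, 1, 1, 2, 0, 0],
    "Play": [0, 0, 1, 1, 1, 0],
})

cols = ["Weather", "Temp"]  # hypothetical: any number of column names

# With axis=1, apply calls tuple() on each row of the selected columns,
# so the result is a Series holding one tuple per row.
df["x"] = df[cols].apply(tuple, axis=1)
```

Row-wise apply calls a Python function once per row, so on large frames the zip approach is usually the faster of the two.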
Solution 3:[3]
To complement the answer of @SruthiV, if you want to obtain the format shown in the question (where the two columns are replaced by a new one), you can remove the columns while using them with pop:
df['x'] = list(zip(df.pop('Weather'), df.pop('Temp')))
Output:
Play x
0 0 (2, 1)
1 0 (2, 1)
2 1 (0, 1)
3 1 (1, 2)
4 1 (1, 0)
5 0 (1, 0)
Similarly, if you want to insert the new column in the position of the (first) previous one:
df.insert(df.columns.get_loc('Weather'), 'x',
          list(zip(df.pop('Weather'), df.pop('Temp'))))
NB. This operation modifies df in place.
Output:
x Play
0 (2, 1) 0
1 (2, 1) 0
2 (0, 1) 1
3 (1, 2) 1
4 (1, 0) 1
5 (1, 0) 0
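When the original frame should stay untouched, or when more than two columns are involved, the pop and insert steps can be wrapped in a small helper. The function below is a hypothetical sketch (the name `combine_columns` and its parameters are not from the answer) and assumes column names are unique:

```python
import pandas as pd


def combine_columns(df, cols, name="x"):
    """Return a copy of df where `cols` are replaced by one tuple column.

    The new column is placed where the leftmost of `cols` used to be.
    Assumes unique column names, so get_loc returns an integer.
    """
    out = df.copy()
    # Leftmost position among the columns that will be removed.
    pos = min(out.columns.get_loc(c) for c in cols)
    # pop removes each column from the copy and returns it as a Series.
    values = list(zip(*(out.pop(c) for c in cols)))
    out.insert(pos, name, values)
    return out


# Sample data copied from the question.
df = pd.DataFrame({
    "Weather": [2, 2, 0, 1, 1, 1],
    "Temp": [1, 1, 1, 2, 0, 0],
    "Play": [0, 0, 1, 1, 1, 0],
})

result = combine_columns(df, ["Weather", "Temp"])
```

Because the helper works on a copy, `df` keeps its Weather and Temp columns, while `result` has only x and Play.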
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Sruthi V |
| Solution 2 | BeRT2me |
| Solution 3 | mozway |
