Pandas: Create a tuple column from multiple columns
I have the following data frame my_df:
Person  event  time
---------------------------------
John    A      2017-10-11
John    B      2017-10-12
John    C      2017-10-14
John    D      2017-10-15
Ann     X      2017-09-01
Ann     Y      2017-09-02
Dave    M      2017-10-05
Dave    N      2017-10-07
Dave    Q      2017-10-20
I want to create a new column holding the (event, time) pair for each row. It should look like this:
Person  event  time        event_time
------------------------------------------------------
John    A      2017-10-11  (A, 2017-10-11)
John    B      2017-10-12  (B, 2017-10-12)
John    C      2017-10-14  (C, 2017-10-14)
John    D      2017-10-15  (D, 2017-10-15)
Ann     X      2017-09-01  (X, 2017-09-01)
Ann     Y      2017-09-02  (Y, 2017-09-02)
Dave    M      2017-10-05  (M, 2017-10-05)
Dave    N      2017-10-07  (N, 2017-10-07)
Dave    Q      2017-10-20  (Q, 2017-10-20)
Here is my code:
my_df['event_time'] = my_df.apply(lambda row: (row['event'] , row['time']), axis=1)
But I got the following error:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
4309 blocks = form_blocks(arrays, names, axes)
-> 4310 mgr = BlockManager(blocks, axes)
4311 mgr._consolidate_inplace()
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
2794 if do_integrity_check:
-> 2795 self._verify_integrity()
2796
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in _verify_integrity(self)
3005 if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3006 construction_error(tot_items, block.shape[1:], self.axes)
3007 if len(self.items) != tot_items:
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
4279 raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280 passed, implied))
4281
ValueError: Shape of passed values is (128, 2), indices imply (128, 3)
Any idea what I did wrong in my code? Thanks!
Solution 1:[1]
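Without apply: zip the two columns into a list of tuples and pass it to assign. Note that assign returns a new DataFrame rather than modifying my_df, so reassign the result (my_df = my_df.assign(...)) to keep the column.
my_df.assign(event_time=list(zip(my_df.event, my_df.time)))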
Out[1011]:
Person event time event_time
0 John A 2017-10-11 (A, 2017-10-11)
1 John B 2017-10-12 (B, 2017-10-12)
2 John C 2017-10-14 (C, 2017-10-14)
3 John D 2017-10-15 (D, 2017-10-15)
4 Ann X 2017-09-01 (X, 2017-09-01)
5 Ann Y 2017-09-02 (Y, 2017-09-02)
6 Dave M 2017-10-05 (M, 2017-10-05)
7 Dave N 2017-10-07 (N, 2017-10-07)
8 Dave Q 2017-10-20 (Q, 2017-10-20)
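A self-contained sketch of Solution 1, assuming the time column holds plain strings (the question does not show its dtype; with a datetime64 column each tuple would contain a pandas Timestamp instead). Assigning straight to the column, rather than calling assign, modifies my_df in place:

```python
import pandas as pd

# Rebuild the question's frame; storing 'time' as strings is an assumption.
my_df = pd.DataFrame({
    "Person": ["John", "John", "John", "John", "Ann", "Ann", "Dave", "Dave", "Dave"],
    "event": ["A", "B", "C", "D", "X", "Y", "M", "N", "Q"],
    "time": ["2017-10-11", "2017-10-12", "2017-10-14", "2017-10-15",
             "2017-09-01", "2017-09-02", "2017-10-05", "2017-10-07", "2017-10-20"],
})

# zip pairs the two columns row by row; list() materialises the tuples,
# and pandas stores one tuple per row in an object-dtype column.
my_df["event_time"] = list(zip(my_df["event"], my_df["time"]))

print(my_df)
```

Because zip works on the column values directly, no Python function is called per row through apply.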
Solution 2:[2]
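my_df['event_time'] = my_df.apply(lambda x: tuple(x[['event', 'time']]), axis=1)
This is the approach I would use if you want to keep using apply with a lambda. Be aware that a row-wise apply calls the lambda once per row, so it is generally slower than the zip approach.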
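A runnable sketch of Solution 2 on the first three rows of the question's data (again assuming string dates). It also runs the question's original lambda with result_type="reduce", a DataFrame.apply argument available since pandas 0.23 that returns a Series where possible instead of expanding list-like results into columns. The traceback points at a Python 3.4 install, so an older pandas that tried to spread each 2-tuple across the frame's 3 columns is a likely, though unconfirmed, cause of the shape error:

```python
import pandas as pd

# First three rows of the question's data; string dates are an assumption.
my_df = pd.DataFrame({
    "Person": ["John", "John", "John"],
    "event": ["A", "B", "C"],
    "time": ["2017-10-11", "2017-10-12", "2017-10-14"],
})

# Solution 2: select the two fields from each row Series and turn them into a tuple.
my_df["event_time"] = my_df.apply(lambda row: tuple(row[["event", "time"]]), axis=1)

# The question's own lambda; result_type="reduce" asks apply to return a
# Series of tuples rather than spreading each tuple across columns.
reduced = my_df.apply(lambda row: (row["event"], row["time"]), axis=1, result_type="reduce")

print(my_df)
print(reduced.tolist() == my_df["event_time"].tolist())
```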
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | BENY |
| Solution 2 | rad15f |
