Pandas: Create a tuple column from multiple columns

I have the following data frame my_df:

Person       event         time
---------------------------------
John          A        2017-10-11
John          B        2017-10-12
John          C        2017-10-14
John          D        2017-10-15
Ann           X        2017-09-01
Ann           Y        2017-09-02
Dave          M        2017-10-05
Dave          N        2017-10-07
Dave          Q        2017-10-20

I want to create a new column, which is the (event, time) pair. It should look like:

Person       event         time        event_time
------------------------------------------------------
John          A        2017-10-11     (A, 2017-10-11)
John          B        2017-10-12     (B, 2017-10-12)
John          C        2017-10-14     (C, 2017-10-14)
John          D        2017-10-15     (D, 2017-10-15)
Ann           X        2017-09-01     (X, 2017-09-01)
Ann           Y        2017-09-02     (Y, 2017-09-02)
Dave          M        2017-10-05     (M, 2017-10-05)
Dave          N        2017-10-07     (N, 2017-10-07)
Dave          Q        2017-10-20     (Q, 2017-10-20)

Here is my code:

my_df['event_time'] = my_df.apply(lambda row: (row['event'] , row['time']), axis=1)

But I got the following error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in create_block_manager_from_arrays(arrays, names, axes)
   4309         blocks = form_blocks(arrays, names, axes)
-> 4310         mgr = BlockManager(blocks, axes)
   4311         mgr._consolidate_inplace()

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in __init__(self, blocks, axes, do_integrity_check, fastpath)
   2794         if do_integrity_check:
-> 2795             self._verify_integrity()
   2796 

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in _verify_integrity(self)
   3005             if block._verify_integrity and block.shape[1:] != mgr_shape[1:]:
-> 3006                 construction_error(tot_items, block.shape[1:], self.axes)
   3007         if len(self.items) != tot_items:

/usr/local/lib/python3.4/dist-packages/pandas/core/internals.py in construction_error(tot_items, block_shape, axes, e)
   4279     raise ValueError("Shape of passed values is {0}, indices imply {1}".format(
-> 4280         passed, implied))
   4281 

ValueError: Shape of passed values is (128, 2), indices imply (128, 3)

Any idea what I did wrong in my code? Thanks!



Solution 1:[1]

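Without apply, zip the two columns and assign the resulting list of tuples as the new column (df below is the question's my_df):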

df.assign(event_time=list(zip(df.event,df.time)))
Out[1011]: 
  Person event        time        event_time
0   John     A  2017-10-11  (A, 2017-10-11)
1   John     B  2017-10-12  (B, 2017-10-12)
2   John     C  2017-10-14  (C, 2017-10-14)
3   John     D  2017-10-15  (D, 2017-10-15)
4    Ann     X  2017-09-01  (X, 2017-09-01)
5    Ann     Y  2017-09-02  (Y, 2017-09-02)
6   Dave     M  2017-10-05  (M, 2017-10-05)
7   Dave     N  2017-10-07  (N, 2017-10-07)
8   Dave     Q  2017-10-20  (Q, 2017-10-20)
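
For a self-contained check, the same idea can be run against a small frame rebuilt from the sample data in the question. This is only a sketch: the frame is reconstructed by hand, `time` is kept as plain strings, and the name `result` is introduced here for illustration.

```python
import pandas as pd

# Sample data reconstructed by hand from the question (time kept as strings).
df = pd.DataFrame({
    "Person": ["John", "John", "John", "John", "Ann", "Ann", "Dave", "Dave", "Dave"],
    "event": ["A", "B", "C", "D", "X", "Y", "M", "N", "Q"],
    "time": ["2017-10-11", "2017-10-12", "2017-10-14", "2017-10-15",
             "2017-09-01", "2017-09-02", "2017-10-05", "2017-10-07", "2017-10-20"],
})

# zip pairs the two columns element by element; list() materializes the tuples
# so pandas stores one tuple per row in an object-dtype column.
result = df.assign(event_time=list(zip(df["event"], df["time"])))

# assign returns a new frame and leaves df untouched; plain column assignment
# is the in-place equivalent.
df["event_time"] = list(zip(df["event"], df["time"]))
```

`assign` is convenient in method chains, while the column assignment on the last line matches the `my_df['event_time'] = ...` form used in the question.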

Solution 2:[2]

my_df['event_time'] = my_df.apply(lambda x: tuple(x[['event','time']]),axis = 1)

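This would be my approach if you want to keep using apply with a lambda. For running efficiency on large frames, building the tuples with zip is usually faster than a row-wise apply.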
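
As a hedged aside on the original error: the shape mismatch in the traceback (2 columns passed, 3 implied) suggests that the older pandas version tried to expand each returned tuple into columns instead of keeping it as a single value. In pandas 0.23 and later, `DataFrame.apply` accepts a `result_type` argument, and `result_type='reduce'` asks for a Series rather than expanded columns. The sketch below assumes such a version and a small hand-made frame; it is not what either answer above used.

```python
import pandas as pd

# Small hand-made frame using the question's column names (assumed sample).
my_df = pd.DataFrame({
    "Person": ["John", "Ann", "Dave"],
    "event": ["A", "X", "M"],
    "time": ["2017-10-11", "2017-09-01", "2017-10-05"],
})

# result_type="reduce" keeps each returned tuple as one value per row,
# so apply returns a Series of tuples instead of trying to build columns.
my_df["event_time"] = my_df.apply(
    lambda row: (row["event"], row["time"]),
    axis=1,
    result_type="reduce",
)
```

On those versions the question's original lambda should behave the same way without the extra argument; passing it only makes the intent explicit.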

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution      Source
----------------------
Solution 1    BENY
Solution 2    rad15f