Assign values from NumPy array to pandas DataFrame based on almost matching unix timestamp
I am given a 2D numpy array and a huge pandas DataFrame. A dummy example of them would look somewhat like this:
```python
arr = np.array([[1648137283, 0],
                [1648137284, 1],
                [1648137285, 2],
                [1648137286, 3],
                .....
                [1658137287, 4],
                [1658137288, 5],
                [1658137289, 6]])
```
```
df.head(-6)
                unix  ...  value_a
0         1643137283  ...       23
1         1643137284  ...       54
2         1643137285  ...       25
...              ...  ...      ...
10036787  1653174068  ...       75
10036788  1653174069  ...       65
10036789  1653174070  ...       23
```
The first column of `arr` holds a unix timestamp and the second an id-value. The DataFrame also has a column with unix timestamps. My goal is to map the id-values from `arr` to the rows of `df` with matching timestamps, written into a separate new column called `'index'`.
Now, these are probably important notes:
- `df` contains only a portion of all timestamps from `arr`
- `df` and `arr` have different lengths along `axis=0`
- the timestamps in `df` are ordered in sequences and repeat themselves
- `arr` contains all unix timestamps from `df`, but not the other way around
- about 1% of the unix values do not match perfectly. My unix is in `unit='ms'`; some timestamps are off by +/-1 or +/-2, however, in my use case they can be seen as identical
I could do this within a loop or with np.where(). However, as arr and df are quite large, I was hoping for a fast solution.
Solution 1:
The idea is to convert the numpy array into a mapping of key-value pairs, where the key is the unix timestamp and the value is the corresponding id. You can then use `Series.map` to substitute the values in the given dataframe:
```python
df['index'] = df['unix'].map(dict(arr))
```
Sample output
```
                unix  ...  value_a  index
0         1643137283  ...       23      0
1         1643137284  ...       54      1
2         1643137285  ...       25      2
10036787  1653174068  ...       75      3
10036788  1653174069  ...       65      5
10036789  1653174070  ...       23      6
```
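Note that `Series.map` only handles exact key matches: the roughly 1% of timestamps that are off by +/-1 or +/-2 would come out as `NaN`. A minimal sketch of one way to absorb those small offsets, using `pd.merge_asof` with `direction='nearest'` and a tolerance (toy data below stands in for the real `arr` and `df`; the values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Toy stand-ins for the real arrays.
arr = np.array([[1648137283, 0],
                [1648137284, 1],
                [1648137287, 2]])

# 1648137286 has no exact match in arr; it is off by -1 from 1648137287.
df = pd.DataFrame({'unix': [1648137283, 1648137284, 1648137286],
                   'value_a': [23, 54, 25]})

# merge_asof requires both frames to be sorted on the join key.
lookup = pd.DataFrame(arr, columns=['unix', 'index']).sort_values('unix')
df = pd.merge_asof(df.sort_values('unix'), lookup,
                   on='unix', direction='nearest', tolerance=2)

# df now carries an 'index' column: [0, 1, 2] — the near-miss
# timestamp was matched to its closest neighbour within +/-2.
print(df)
```

Exact matches behave the same as the dict-based `map`; timestamps farther than the tolerance from any entry in `arr` still yield `NaN`, so nothing is silently mis-assigned.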
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Shubham Sharma |
