'Pandas - select column using other column value as column name

I have a dataframe that contains a column, let's call it "names". "names" has the name of other columns. I would like to add a new column that would have for each row the value based on the column name contained on that "names" column.

Example:

Input dataframe: pd.DataFrame.from_dict({"a": [1, 2, 3,4], "b": [-1,-2,-3,-4], "names":['a','b','a','b']})

  a  |  b  | names |
 --- | --- | ----  |
  1  |  -1 | 'a'   |
  2  |  -2 | 'b'   |
  3  |  -3 | 'a'   |
  4  |  -4 | 'b'   |

Output dataframe: pd.DataFrame.from_dict({"a": [1, 2, 3,4], "b": [-1,-2,-3,-4], "names":['a','b','a','b'], "new_col":[1,-2,3,-4]})

  a  |  b  | names | new_col | 
 --- | --- | ----  | ------  |
  1  |  -1 | 'a'   |    1    |
  2  |  -2 | 'b'   |   -2    |
  3  |  -3 | 'a'   |    3    |
  4  |  -4 | 'b'   |   -4    |


Solution 1:[1]

Because DataFrame.lookup is deprecated as of Pandas 1.2.0, the following is what I came up with using DataFrame.melt:

df['new_col'] = df.melt(id_vars='names', value_vars=['a', 'b'], ignore_index=False).query('names == variable').loc[df.index, 'value']

Output:

>>> df
   a  b names  new_col
0  1 -1     a        1
1  2 -2     b       -2
2  3 -3     a        3
3  4 -4     b       -4

Can this be simplified? For correctness, the index must not be ignored.

Additional reference:

Solution 2:[2]

Solution using pd.factorize (from https://github.com/pandas-dev/pandas/issues/39171#issuecomment-773477244):

idx, cols = pd.factorize(df['names'])
df.reindex(cols, axis=1).to_numpy()[np.arange(len(df)), idx]

Solution 3:[3]

With the straightforward and easy solution (lookup) deprecated, another alternative to the pandas-based ones proposed here is to convert df into a numpy array and use numpy indexing:

df['new_col'] = df.values[df.index.get_indexer(df['names'].index), df.columns.get_indexer(df['names'])]

Let me explain what this does. df.values is a numpy array based on the DataFrame. As numpy arrays have to be indexed numerically, we need to use the get_indexer function to convert the pandas row and column index names to index numbers that can be used with numpy:

>>> df.index.get_indexer(df['names'].index)
array([0, 1, 2, 3], dtype=int64)

>>> df.columns.get_indexer(df['names'])
array([0, 1, 0, 1], dtype=int64)

(In this case, where the row index is already numerical, you could get away with simply using df.index as the first argument inside the bracket, but this does not work generally.)

Solution 4:[4]

Here's a short solution using df.melt and df.merge:

df.merge(df.melt(var_name='names', ignore_index=False), on=[None, 'names'])

Outputs:

   key_0  a  b names value
0      0  1 -1     a     1
1      1  2 -2     b    -2
2      2  3 -3     a     3
3      3  4 -4     b    -4

There's a redundant key_0 column which you need to drop with df.drop.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 ignoring_gravity
Solution 3 Erlend Magnus Viggen
Solution 4 Leif Metcalf