Map column lists to dictionary and create new column with padded strings

Given this dataframe and word_index dictionary:

import pandas as pd

df = pd.DataFrame(data={'text_ids': [
                                     [1, 2, 3, 2, 7, 2, 8, 2, 0],
                                     [1, 2, 4, 2, 7, 2, 8, 2, 0],
                                     [1, 2, 5, 2, 6, 2, 8, 2, 0],
                                     [1, 2, 9, 2, 6, 2, 10, 2, 11, 2, 8, 0]
                                    ]})

word_index = {0: '<eos>', 1: '<sos>', 2: '/s', 3: 'he', 4: 'she', 5:'they', 6:'love', 7:'loves', 8: 'cats', 9: 'we', 10: 'talking', 11: 'about', 12: '<pad>'}

How can I map each sequence in text_ids to its corresponding value(s) in word_index, while making sure that each /s token becomes an actual space in the resulting string? I also need to append <pad> tokens to every string whose id sequence is shorter than the longest sequence.

Expected output:

                                 text_ids                                       text
0             [1, 2, 3, 2, 7, 2, 8, 2, 0]   <sos> he loves cats <eos><pad><pad><pad>
1             [1, 2, 4, 2, 7, 2, 8, 2, 0]  <sos> she loves cats <eos><pad><pad><pad>
2             [1, 2, 5, 2, 6, 2, 8, 2, 0]  <sos> they love cats <eos><pad><pad><pad>
3  [1, 2, 9, 2, 6, 2, 10, 2, 11, 2, 8, 0]     <sos> we love talking about cats <eos>

Solution 1:^[1]

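Another option, which maps each id to its token and joins the tokens of each row with spaces (it leaves '/s' as a literal token and does not add padding):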

(df["text_ids"]
    .explode()
    .map(word_index)
    .groupby(level=0)
    .apply(lambda q: " ".join(q)))
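
To also handle the '/s' separators and the padding asked for in the question, one possible sketch follows. It is not part of the original answer. The names tokens, max_len and decode are made up for illustration, and it assumes that '/s' should become a single space and that padding is counted in ids rather than characters.

```python
import pandas as pd

df = pd.DataFrame(data={'text_ids': [
    [1, 2, 3, 2, 7, 2, 8, 2, 0],
    [1, 2, 4, 2, 7, 2, 8, 2, 0],
    [1, 2, 5, 2, 6, 2, 8, 2, 0],
    [1, 2, 9, 2, 6, 2, 10, 2, 11, 2, 8, 0],
]})

word_index = {0: '<eos>', 1: '<sos>', 2: '/s', 3: 'he', 4: 'she', 5: 'they',
              6: 'love', 7: 'loves', 8: 'cats', 9: 'we', 10: 'talking',
              11: 'about', 12: '<pad>'}

# Assumption: the '/s' token stands for one literal space.
tokens = {k: (' ' if v == '/s' else v) for k, v in word_index.items()}

# Length (in ids) of the longest sequence in the column.
max_len = max(map(len, df['text_ids']))

def decode(ids):
    """Map ids to tokens, join without a separator, then right-pad with <pad>."""
    text = ''.join(tokens[i] for i in ids)
    return text + word_index[12] * (max_len - len(ids))

df['text'] = df['text_ids'].apply(decode)
print(df['text'].tolist())
```

The first three rows each receive three <pad> tokens. The last row has no 2 between 8 and 0, so with this sketch it decodes to "...cats<eos>" without a space before <eos>.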

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution	Source
Solution 1	Mark Moretto
