'replicate rows by n times in python

I have a dataframe that the first column are strings, the 2nd column are the numbers that I want to replicate each string.

df = pd.DataFrame({'String':['a','b', 'c', 'd', 'e', 'f', 'g'],
        'Times':[2, 3, 5, 3, 2, 4, 5]})
df
String Times
a 2
b 3
c 5
d 3
e 2
f 4
g 5

How can I create a data frame into this in python? (stopped at d but you know the pattern). Thanks!

String Times
a 1
a 2
b 1
b 2
b 3
c 1
c 2
c 3
c 4
c 5


Solution 1:[1]

A bit messy, but seems to do what you want:

new_df = pd.DataFrame(df.String.str.repeat(df.Times).apply(list).explode().reset_index(drop=True))

new_df["Times"] = new_df.groupby("String").cumcount() + 1

Output:

   String  Times
0       a      1
1       a      2
2       b      1
3       b      2
4       b      3
5       c      1
6       c      2
7       c      3
8       c      4
9       c      5
10      d      1
11      d      2
12      d      3
13      e      1
14      e      2
15      f      1
16      f      2
17      f      3
18      f      4
19      g      1
20      g      2
21      g      3
22      g      4
23      g      5

Solution 2:[2]

Use Index.repeat to replicate the rows and groupby+cumcount: to enumerate them:

(df.loc[df.index.repeat(df['Times'])]
   .assign(Times=lambda d: d.groupby('String').cumcount().add(1))
 )

Output:

  String  Times
0      a      1
0      a      2
1      b      1
1      b      2
1      b      3
2      c      1
2      c      2
2      c      3
2      c      4
2      c      5
3      d      1
...
comparison of all answers (7k rows):
@mozway
11.2 ms ± 1.63 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)

@Onyambu
16 ms ± 685 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

@ifly6
27.3 ms ± 2.42 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

@ddejohn
28.5 ms ± 4.2 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)

On 700k rows: 1s, 1.43s, 2.34s, 2.94s in the same order

Solution 3:[3]

Use df.explode. Make this work by converting your repetitions to list:

s = ['a','b', 'c', 'd', 'e', 'f', 'g']
t = [2, 3, 5, 3, 2, 4, 5]

Then,

times = [list(range(i + 1)) for i in t]
df = pd.DataFrame({'string': s, 'times': times}).explode('times')
>>> df.head(10)
  string times
0      a     0
0      a     1
0      a     2
1      b     0
1      b     1
1      b     2
1      b     3
2      c     0
2      c     1
2      c     2

If you don't want the index repeated like that, df.reset_index(drop=True, inplace=True) (if in place, otherwise, reassign). If you want 1-indexing rather than 0-indexing, for list(range(i + 1)) substitute list(range(1, i + 2)). Alternatively, just add one to the times column at the end. The times column doesn't really matter for df.explode, it explodes based on repetitions anyway.

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 ddejohn
Solution 2
Solution 3 ifly6