Duplicating row in DataFrame and slicing a string value

I have the following DataFrame:

df = pd.DataFrame({
"Name" : ["Foo", "SomeString", "Bar"], 
"value1":[1, 2, 3], 
"value2":[0, 1, 2]})

I want to check whether a string in the 'Name' column is longer than 4 characters. If it is, I want to duplicate the entire row and split the Name string so that I get the following output:

df = pd.DataFrame({
"Name" : ["Foo", "Some", "String", "Bar"], 
"value1":[1, 2, 2, 3], 
"value2":[0, 1, 1, 2]})


Solution 1:[1]

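One option is to insert a space after the 4th character, then split on whitespace and explode. Names of 4 characters or fewer only gain a trailing space, which split() discards, so they stay intact: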

out = (df.assign(Name=(df['Name'].str[:4] + ' ' + df['Name'].str[4:]).str.split())
       .explode('Name').reset_index(drop=True))

Output:

     Name  value1  value2
0     Foo       1       0
1    Some       2       1
2  String       2       1
3     Bar       3       2
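
For reuse, the same idea could be wrapped in a small function. The sketch below is only an illustration: the helper name `split_at` and its `col` and `pos` parameters are hypothetical and not part of the original answer. It assumes the names contain no whitespace of their own, since `str.split()` with no arguments would also split on that.

```python
import pandas as pd


def split_at(frame, col="Name", pos=4):
    """Hypothetical helper: split each string in `col` after `pos` characters."""
    s = frame[col]
    # Insert a space after the first `pos` characters, then split on whitespace.
    # Short strings only get a trailing space, which split() drops.
    parts = (s.str[:pos] + " " + s.str[pos:]).str.split()
    # explode() repeats the other columns once per list element.
    return frame.assign(**{col: parts}).explode(col).reset_index(drop=True)


df1 = pd.DataFrame({
    "Name": ["Foo", "SomeString", "Bar"],
    "value1": [1, 2, 3],
    "value2": [0, 1, 2]})

out1 = split_at(df1)
print(out1)
```

Note that this cuts purely by position, so a name such as "LongerString" would be split into "Long" and "erString".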

Solution 2:[2]

First split the string at camel-case boundaries (assuming the name contains only alphabetical characters), then split on whitespace and explode the DataFrame.

Altogether this would be:

import re

df['Name'] = df['Name'].apply(lambda x: re.sub('(?:([a-z])([A-Z]))', '\\1 \\2', x) if len(x) > 4 else x)
df['Name'] = df['Name'].str.split()
df = df.explode("Name").reset_index(drop=True)

Output:

     Name  value1  value2
0     Foo       1       0
1    Some       2       1
2  String       2       1
3     Bar       3       2

The separate steps are shown below:

df['Name'] = df['Name'].apply(lambda x: re.sub('(?:([a-z])([A-Z]))', '\\1 \\2', x) if len(x) > 4 else x)

Output:

>>> df
          Name  value1  value2
0          Foo       1       0
1  Some String       2       1
2          Bar       3       2

df['Name'] = df['Name'].str.split()

Output:

>>> df
             Name  value1  value2
0           [Foo]       1       0
1  [Some, String]       2       1
2           [Bar]       3       2

df.explode("Name").reset_index(drop=True)

Output:

     Name  value1  value2
0     Foo       1       0
1    Some       2       1
2  String       2       1
3     Bar       3       2
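
The camel-case approach can also be written as a named function that returns the list of parts directly, which skips the intermediate space-joined string column. This is a sketch under assumptions: the names `CAMEL_BOUNDARY`, `split_camel` and `min_len` are hypothetical, and it relies, like the answer above, on names being purely alphabetical.

```python
import re

import pandas as pd

# A lowercase letter directly followed by an uppercase letter marks a boundary.
CAMEL_BOUNDARY = re.compile(r"([a-z])([A-Z])")


def split_camel(name, min_len=4):
    """Hypothetical helper: return the camel-case parts of `name` as a list."""
    if len(name) <= min_len:
        return [name]  # short names are left untouched
    # Put a space at every boundary, then split on whitespace.
    return CAMEL_BOUNDARY.sub(r"\1 \2", name).split()


df2 = pd.DataFrame({
    "Name": ["Foo", "SomeString", "Bar"],
    "value1": [1, 2, 3],
    "value2": [0, 1, 2]})

out2 = (df2.assign(Name=df2["Name"].map(split_camel))
           .explode("Name")
           .reset_index(drop=True))
print(out2)
```

Unlike the fixed-position split in Solution 1, a long name without a camel-case boundary (for example "somestring") is left as a single row here.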

Solution 3:[3]

You can use a regex to extract the chunks of your string, then explode:

(df
 .assign(Name=df['Name'].str.findall('(?:^.{,4})|(?:.+)'))
 .explode('Name')
)

This is easy to adapt to other rules. For example, to split the words on a capital letter, use the pattern '[A-Z][a-z]+'.

Output of the first pattern (the original index is kept, so the duplicated row repeats label 1):

     Name  value1  value2
0     Foo       1       0
1    Some       2       1
1  String       2       1
2     Bar       3       2
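
As an illustration of the capital-letter rule mentioned above, here is a minimal sketch using the '[A-Z][a-z]+' pattern on the sample data. The `reset_index(drop=True)` call is an addition to get a fresh 0..n-1 index. One caveat worth keeping in mind: a name that does not match the pattern at all (for example an all-lowercase string) yields an empty list, which `explode` turns into NaN.

```python
import pandas as pd

df3 = pd.DataFrame({
    "Name": ["Foo", "SomeString", "Bar"],
    "value1": [1, 2, 3],
    "value2": [0, 1, 2]})

# Each match is one capital letter followed by one or more lowercase letters.
out3 = (df3
        .assign(Name=df3["Name"].str.findall(r"[A-Z][a-z]+"))
        .explode("Name")
        .reset_index(drop=True))
print(out3)
```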

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2 Saint
Solution 3 mozway