Using str.split with regex to split between uppercase and propercase strings

I have a column of strings containing full names. Lastnames are distinguished as groups of all-uppercase letters while Firstnames are given in propercase. The majority of names are ordered as (Firstname, LASTNAME) but many contain LASTNAME information in the middle or at the beginning of the string, as in the last entries here.

0       Manuel JOSE
1       Vincent MUANDUMBA
2       Alejandro DE LORRES
3       Luis FILIPE da Rivera
4       LIM Jock Hoi

I would like to split this column into separate Firstname and Lastname columns according to whether the text in the string is in the propercase (Firstname) or in all-caps (Lastname).

new = df["FullName"].str.split(pat=r'(?=[A-Z][a-z])', n=1, expand = True)
df['FirstName'] = new[0]
df['LastName'] = new[1]

All words in proper case or lowercase should end up in new[0], while all words in uppercase should end up in new[1].

However, I can't seem to achieve this desired output since my regex isn't right. I've also tried pat=r'[A-Z](?:[A-Z]*(?![a-z])|[a-z]*)'



Solution 1:[1]

This code is a bit longer than a single regex pattern, but it reliably sends every part of the name string to the first name or the last name as you want. The trick is to use the str.istitle() method.

# Split every string in FullName column by returning a list of words
new = df["FullName"].str.split(' ')

# Create empty lists to keep new columns for df
FirstName = []
LastName = []

# Iterate over every split name (one list of words per row)
for name in new:
    Propercase = []  # Words for FirstName (proper case or lowercase)
    Allcaps = []  # Words for LastName (all-caps)
    # Iterate over every word in the name
    for n in name:
        # Check if the word is proper case or lowercase (e.g. 'da')
        if n.istitle() or n.islower():
            Propercase.append(n)
        # Otherwise, treat it as all-caps
        else:
            Allcaps.append(n)
    # Add propercase words to FirstName list
    FirstName.append(' '.join(Propercase))
    # All-caps words to LastName list
    LastName.append(' '.join(Allcaps))

# Create columns
df['FirstName'] = FirstName
df['LastName'] = LastName

Output:

                FullName       FirstName   LastName
0            Manuel JOSE          Manuel       JOSE
1      Vincent MUANDUMBA         Vincent  MUANDUMBA
2    Alejandro DE LORRES       Alejandro  DE LORRES
3  Luis FILIPE da Rivera  Luis da Rivera     FILIPE
4           LIM Jock Hoi        Jock Hoi        LIM
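Since the question asked for a regex, here is a minimal sketch of a vectorized alternative that uses str.findall instead of the loop above. It is not the method from this answer, only a possible variant. It assumes the names contain plain ASCII letters separated by spaces; the pattern names ALL_CAPS and NOT_ALL_CAPS are made up for illustration.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "FullName": [
        "Manuel JOSE",
        "Vincent MUANDUMBA",
        "Alejandro DE LORRES",
        "Luis FILIPE da Rivera",
        "LIM Jock Hoi",
    ]
})

# Assumption: ASCII letters only. Both pattern names are hypothetical.
ALL_CAPS = r"\b[A-Z]+\b"            # whole words made only of uppercase letters
NOT_ALL_CAPS = r"\b[A-Z]?[a-z]+\b"  # title-case words or fully lowercase words

# findall returns a list of matching words per row; str.join glues each list back together
df["FirstName"] = df["FullName"].str.findall(NOT_ALL_CAPS).str.join(" ")
df["LastName"] = df["FullName"].str.findall(ALL_CAPS).str.join(" ")

print(df)
```

On the sample data this should give the same columns as the loop. It differs on edge cases: a single-letter initial such as "J" counts as all-caps here, and a hyphenated first name is split at the hyphen and rejoined with a space, so the loop with istitle() is the safer choice for messy data.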

This can be faster if you are sure that the first word of the name is either the complete first name or the complete last name (true in most cultures, but less general):

# Split each name once, at the first space (n must be passed by keyword in pandas 2.x)
new = df["FullName"].str.split(' ', n=1)

FirstName = []
LastName = []
for name in new:
    if name[0].istitle():
        FirstName.append(name[0])
        LastName.append(name[1])
    else:
        FirstName.append(name[1])
        LastName.append(name[0])

df['FirstName'] = FirstName
df['LastName'] = LastName
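The loop above raises an IndexError when a name has no space, because name[1] does not exist. A minimal sketch of the same first-word shortcut without an explicit loop, which also tolerates single-word names, could look like this. The entry "Madonna" is invented purely to exercise that case, and the variable names are illustrative.

```python
import pandas as pd

# Three names from the question plus one invented single-word entry ("Madonna")
df = pd.DataFrame({
    "FullName": ["Manuel JOSE", "LIM Jock Hoi", "Alejandro DE LORRES", "Madonna"]
})

# expand=True returns one column per piece; a name without a space
# gets a missing value in column 1, which is replaced by an empty string
parts = df["FullName"].str.split(" ", n=1, expand=True)
head = parts[0]
tail = parts[1].fillna("")

# True where the first word is title case, i.e. the name starts with the first name
first_is_given = head.str.istitle()

# where() keeps the value where the condition is True and takes the other value elsewhere
df["FirstName"] = head.where(first_is_given, tail)
df["LastName"] = tail.where(first_is_given, head)

print(df)
```

This sketch assumes at least one name in the column contains a space, otherwise parts has no column 1. It also inherits the limitation of the shortcut: a name such as "Luis FILIPE da Rivera" would end up with "FILIPE da Rivera" as the last name.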

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 marc_s