Checking a Pandas dataframe for single-word strings and moving those rows to another dataframe
I have a dataframe like this (called df):
OU
CORP:Jenny Smith:
"CORP:John Smith:,John Smith:"
CORP:LINK:
CORP:Harry Linkster:
STORE:Mary Poppins:
STORE:Tony Stark:
STORE:Carmen Sandiego:
NEWS:Peter Parker:
NEWS:PARK:
NEWS:Clark Kent:
I want to parse it and find any rows where the value between the colons is a single word, such as LINK and PARK.
This is the logic I have:
for i in df.iteritems():
    # if the string between the ':' separators is a single word
    # drop that row, and move it to another dataframe, df2
df should look like this after:
OU
CORP:Jenny Smith:
"CORP:John Smith:,John Smith:"
CORP:Harry Linkster:
STORE:Mary Poppins:
STORE:Tony Stark:
STORE:Carmen Sandiego:
NEWS:Peter Parker:
NEWS:Clark Kent:
df2 should look like this
OU
CORP:LINK:
NEWS:PARK:
Solution 1:[1]
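If I understand correctly, you want to flag the rows where the text between the first and second colon is a single word: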
m = df['OU'].str.split(':').str[1].str.split().str.len() == 1
df2 = df[m]
df = df[~m]
Output:
>>> df
OU
0 CORP:Jenny Smith:
1 "CORP:John Smith:,John Smith:"
3 CORP:Harry Linkster:
4 STORE:Mary Poppins:
5 STORE:Tony Stark:
6 STORE:Carmen Sandiego:
7 NEWS:Peter Parker:
9 NEWS:Clark Kent:
>>> df2
OU
2 CORP:LINK:
8 NEWS:PARK:
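For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same approach. The dataframe is rebuilt by hand from the question's sample, and the CSV-style quotes around the second row are dropped, which is an assumption about how the data was loaded.

```python
import pandas as pd

# Sample data retyped from the question (assumed to be loaded without the CSV quotes).
df = pd.DataFrame({"OU": [
    "CORP:Jenny Smith:",
    "CORP:John Smith:,John Smith:",
    "CORP:LINK:",
    "CORP:Harry Linkster:",
    "STORE:Mary Poppins:",
    "STORE:Tony Stark:",
    "STORE:Carmen Sandiego:",
    "NEWS:Peter Parker:",
    "NEWS:PARK:",
    "NEWS:Clark Kent:",
]})

# Take the text between the first and second colon, split it on whitespace,
# and flag rows where that yields exactly one word.
m = df["OU"].str.split(":").str[1].str.split().str.len() == 1

df2 = df[m]   # single-word rows
df = df[~m]   # everything else

print(df2["OU"].tolist())  # ['CORP:LINK:', 'NEWS:PARK:']
```

Note that only the first field after the prefix is checked, so a row such as "CORP:John Smith:,LINK:" would not be flagged. The original index labels (2 and 8) are kept; call reset_index(drop=True) on either frame if a fresh index is wanted.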
Solution 2:[2]
data
OU
0 CORP:Jenny Smith:
1 CORP:John Smith:,John Smith:
2 CORP:LINK:
3 CORP:Harry Linkster:
4 STORE:Mary Poppins:
5 STORE:Tony Stark:
6 STORE:Carmen Sandiego:
7 NEWS:Peter Parker:
8 NEWS:PARK:
9 NEWS:Clark Kent:
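Solution: split each string on either the leading prefix (the word characters up to and including the first colon) or on whitespace, then take the length of the resulting list. A single-word row splits into exactly two pieces, so comparing that length to 2 gives a boolean mask for selecting the two dataframes. Note that here df1 holds the single-word rows and df2 holds the rest.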
m=df['OU'].str.split(r'^[\w]+\:|\s').str.len()==2
df1=df[m]
df2=df[~m]
print(df1)
OU
2 CORP:LINK:
8 NEWS:PARK:
print(df2)
OU
0 CORP:Jenny Smith:
1 CORP:John Smith:,John Smith:
3 CORP:Harry Linkster:
4 STORE:Mary Poppins:
5 STORE:Tony Stark:
6 STORE:Carmen Sandiego:
7 NEWS:Peter Parker:
9 NEWS:Clark Kent:
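To see why the length comparison is against 2, it can help to look at what the split returns for individual rows. The sketch below uses Python's re.split on two of the sample strings and then applies the same pattern through pandas. The pattern is written as a raw string so the backslashes reach the regex engine unchanged, and the variable name pattern is just an illustrative choice.

```python
import re
import pandas as pd

# Leading "PREFIX:" at the start of the string, or any single whitespace character.
pattern = r"^[\w]+\:|\s"

# The prefix match leaves an empty string at the front of the list.
print(re.split(pattern, "CORP:LINK:"))         # ['', 'LINK:']
print(re.split(pattern, "CORP:Jenny Smith:"))  # ['', 'Jenny', 'Smith:']

df = pd.DataFrame({"OU": [
    "CORP:Jenny Smith:",
    "CORP:John Smith:,John Smith:",
    "CORP:LINK:",
    "CORP:Harry Linkster:",
    "STORE:Mary Poppins:",
    "STORE:Tony Stark:",
    "STORE:Carmen Sandiego:",
    "NEWS:Peter Parker:",
    "NEWS:PARK:",
    "NEWS:Clark Kent:",
]})

# Two pieces (empty prefix slot + one word) means a single-word row.
m = df["OU"].str.split(pattern).str.len() == 2
df1 = df[m]    # single-word rows
df2 = df[~m]   # everything else

print(df1["OU"].tolist())  # ['CORP:LINK:', 'NEWS:PARK:']
```

Because the pattern splits on every whitespace character, this version looks at the whole string rather than only the first field, so any row containing a space anywhere is treated as multi-word.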
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Corralien |
| Solution 2 |
