'Get index of column containing string of key:value pairs
I have a data frame df:
Name Student_info School
Rajat {"FirstName":"Rajat", "LastName":"Sinha", "birthDate":"1999-05-01"} XYZ
Vivek {"FirstName":"Vivek", "LastName":"Vishwa", "birthDate":"1999-07-09"} ABC
Ram {"FirstName":"", "LastName":"Ram", "birthDate":"1999-05-09"} ABC
John {"FirstName":"", "LastName":"Mac", "birthDate":"1999-08-03"} ABC
I want to get the index of FirstName and LastName (indices of the keys and corresponding values). How can I get this? e.g: index of "FirstName":"Rajat" inside the Student_info column.
I tried:
index = [i for i in df.columns if isinstance(data[i][0], dict)]
But this gives index as null (as it's not a dictionary, but a string of values in the form of "key":"value" pairs)
I want to check if "FirstName":"" is empty:
ValueToCheck = ""
ValuesInDataframe = ? # get the index of empty string of key FirstName
if (ValueToCheck == ValuesInDataframe):
return true
What is the best way to get the index to compare the strings?
Solution 1:[1]
I assume your df is being read from a csv (or similar) file, something along the lines of:
data = StringIO('''
Name Student_info School
Rajat {"FirstName":"Rajat","LastName":"Sinha","birthDate":"1999-05-01"} XYZ
Vivek {"FirstName":"Vivek","LastName":"Vishwa","birthDate":"1999-07-09"} ABC
Ram {"FirstName":"","LastName":"Ram","birthDate":"1999-05-09"} ABC
John {"FirstName":"","LastName":"Mac","birthDate":"1999-08-03"} ABC
''')
df = pd.read_csv(data, sep = '\s+')
In this case 'Student_info' has strings that 'look like' dicts.
We can convert them to actual dicts by pply-ing eval and then expanding into separate columns:
df_si = df['Student_info'].apply(eval).apply(pd.Series)
df_si looks like this:
FirstName LastName birthDate
0 Rajat Sinha 1999-05-01
1 Vivek Vishwa 1999-07-09
2 Ram 1999-05-09
3 Mac 1999-08-03
You can combine this with the rest of df using join:
df = df[['Name', 'School']].join(df_si)
Now df looks like this:
Name School FirstName LastName birthDate
-- ------ -------- ----------- ---------- -----------
0 Rajat XYZ Rajat Sinha 1999-05-01
1 Vivek ABC Vivek Vishwa 1999-07-09
2 Ram ABC Ram 1999-05-09
3 John ABC Mac 1999-08-03
You can now extract those rows where Firstname is an empty string:
df[df['FirstName']=='']
output:
Name School FirstName LastName birthDate
-- ------ -------- ----------- ---------- -----------
2 Ram ABC Ram 1999-05-09
3 John ABC Mac 1999-08-03
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | piterbarg |
