What is this format and how do I read it in pandas?
I have a file like this:
{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}
The objects are all on the same line. What kind of format is this? How can I read it into a pandas DataFrame so that it looks like this:
name age
John 15
Anna 12
Thanks!
Solution 1:[1]
Approach 1 (use regex)
In your case, you can read the content of your file using:
with open('file_path', 'r') as f:
    content = f.read()
For this test, I will just assign your example line to content:
content = '''{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}'''
Then use re.findall to extract the data into a list of tuples:
import re
data = re.findall(r'{"Name": "([^"]*)", "age": (\d+)}', content)
print(data)
[('John', '15'), ('Anna', '12')]
Then build the DataFrame with:
import pandas as pd
pd.DataFrame(data, columns=['Name', 'age'])
Note: re.findall searches content for the pattern {"Name": "([^"]*)", "age": (\d+)}, and anything inside the parentheses () is captured. ([^"]*) is used for Name and matches a string of any length that contains no " (so I am assuming a name never contains a "). For age, (\d+) matches one or more digits.
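One thing to watch with the regex approach: re.findall returns every captured group as a string, so the age column ends up as text rather than numbers. Below is a minimal sketch of the whole pipeline with an explicit integer conversion. It assumes every record has exactly these two fields in this order, and it escapes the braces in the pattern purely for readability.

```python
import re

import pandas as pd

content = '{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}'

# Each match is a tuple of the two captured groups, both as strings.
# The braces are escaped here only to make the literal intent explicit.
pattern = r'\{"Name": "([^"]*)", "age": (\d+)\}'
data = re.findall(pattern, content)

df = pd.DataFrame(data, columns=["Name", "age"])

# findall yields strings, so convert age to an integer column.
df["age"] = df["age"].astype(int)

print(df)
```

Without the astype(int) step, numeric operations such as df["age"].mean() would not behave as expected, because the values would still be strings.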
Approach 2 (use json)
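Another way is to turn your content into valid JSON: wrap it in square brackets and insert a comma between the objects. This relies on every object starting with the "Name" key.
import json
import pandas as pd
pd.DataFrame(json.loads('[' + content.replace('}{"Name": ', '},{"Name": ') + ']'))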
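As for the format itself, data like this is often described as concatenated JSON: several JSON objects written back to back with no separator. If you would rather not depend on the key order, one possible alternative is json.JSONDecoder.raw_decode from the standard library, which parses a single value and reports the index where it ended. The sketch below is only an illustration; iter_json_objects is a hypothetical helper name, not part of json or pandas.

```python
import json

import pandas as pd


def iter_json_objects(text):
    """Yield each JSON value from a string of back-to-back JSON documents."""
    decoder = json.JSONDecoder()
    pos = 0
    while pos < len(text):
        # raw_decode does not accept leading whitespace, so skip it manually.
        if text[pos].isspace():
            pos += 1
            continue
        # Returns the parsed value and the index just past its end.
        obj, pos = decoder.raw_decode(text, pos)
        yield obj


content = '{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}'
df = pd.DataFrame(list(iter_json_objects(content)))
print(df)
```

Because the values are parsed as real JSON, age arrives as an integer, and names containing quotes or braces should not confuse the parser the way a string replace or regex could.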
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 |
