What is this format and how to read it in pandas

I have a file like this:

{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}

The objects are all on the same line. What format is this file in, and how can I read it into a pandas DataFrame so that I get:

name    age
John    15
Anna    12

Thanks!



Solution 1:[1]

Approach 1 (use regex)

In your case, you may read the content of your file using:

with open('file_path', 'r') as f:
    content = f.read()

For my test, I will just assign your example line to content:

content = '''{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}'''

Then use re.findall to extract the data into a list of tuples:

import re
data = re.findall(r'{"Name": "([^"]*)", "age": (\d+)}', content)

print(data)
# [('John', '15'), ('Anna', '12')]

Then build the DataFrame with:

import pandas as pd

# Each tuple becomes one row; age values are still strings at this point
pd.DataFrame(data, columns=['Name', 'age'])

Note: re.findall searches content for the pattern {"Name": "([^"]*)", "age": (\d+)} and returns whatever matches inside each pair of parentheses (). ([^"]*) captures the name: a string of any length that does not contain a " (so my assumption is that a name field never contains a "). For age, (\d+) captures one or more digits.
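
Because re.findall returns every captured value as a string, the age column ends up holding '15' and '12' rather than numbers. A minimal sketch of one way around that, assuming the same fixed key order as in your example: the named groups and the variable names pattern and rows are my own additions, not something the original pattern requires.

```python
import re

content = '{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}'

# Same pattern as above, but with named groups and explicitly escaped braces
pattern = re.compile(r'\{"Name": "(?P<Name>[^"]*)", "age": (?P<age>\d+)\}')

# Build one dict per match and convert age to int on the way
rows = [
    {"Name": m.group("Name"), "age": int(m.group("age"))}
    for m in pattern.finditer(content)
]

print(rows)
# [{'Name': 'John', 'age': 15}, {'Name': 'Anna', 'age': 12}]
```

Passing rows to pd.DataFrame(rows) should then give an integer age column without a separate conversion step.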

Approach 2 (use json)

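Another way is to turn your content into valid JSON: insert a comma between adjacent objects and wrap the whole thing in [ ] so that it becomes a JSON array.

import json
import pandas as pd

# '}{"Name": ' marks the boundary between two objects; this relies on "Name" always being the first key
pd.DataFrame(json.loads('[' + content.replace('}{"Name": ', '},{"Name": ') + ']'))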
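
As for the name of the format: objects written back to back like this are often called concatenated JSON, though I am not aware of a formal standard for it. If you would rather not depend on the key order or on a string replace, a possible alternative is to let the standard library's json.JSONDecoder.raw_decode parse one object at a time and report where it stopped. This is only a sketch; the helper name iter_concatenated_json is hypothetical and the whitespace handling is my assumption about what your real file might contain.

```python
import json

def iter_concatenated_json(text):
    """Yield each JSON value found back to back in text."""
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so do it manually
        while pos < end and text[pos].isspace():
            pos += 1
        if pos == end:
            break
        # Returns the parsed value and the index just past it
        obj, pos = decoder.raw_decode(text, pos)
        yield obj

content = '{"Name": "John", "age": 15}{"Name": "Anna", "age": 12}'
records = list(iter_concatenated_json(content))

print(records)
# [{'Name': 'John', 'age': 15}, {'Name': 'Anna', 'age': 12}]
```

pd.DataFrame(records) should then produce the Name and age columns directly, and because the values are parsed as JSON, age stays an integer and names containing escaped quotes are handled by the parser rather than by a regex.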

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1