JSONDecodeError: Extra data: line 1 column 65670 - line 1 column 66476 (char 65669 - 66475) & Trailing data error when using pandas.read_json

Firstly, I want to highlight that I have already tried all the suggested solutions, even though this question seems similar to many others. I have four folders with subfolders, each containing JSON files, and I want to produce a CSV for each file. The code below works just fine with three of these folders; however, I have spent far too long trying to achieve the same for the largest one. I have tried everything out there, from json.load(json_file) to pandas.read_json(json_file), and they give me the errors in the title: "Extra data: line 1 column 65670" even though Notepad shows the files are smaller than that, and "Trailing data" when using pandas.read_json.

Notably, these errors appear only when I access the files through for loops, as there are 8,000+ of them. If I load a single file, it loads with no issues, and the code works fine when pointed at the smaller folders; it fails only with this particular one. Based on all the tests I have made, the error seems to be triggered by loading several files at a time. The strange part is that the same code works on the other folders, which contain similar JSON files that pass JSON checkers with nothing flagged. I hope someone can read the code and give me some guidance on what I am doing wrong here.
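For context on the first error: json raises "Extra data" when a complete JSON document is followed by more content, for example two JSON arrays concatenated in one file. A minimal reproduction (the literal string here is made up for illustration):

```python
import json

# Two complete JSON documents concatenated in one string: the parser
# finishes the first array, then finds "extra data" it cannot attach.
try:
    json.loads('[{"a": 1}][{"a": 2}]')
except json.JSONDecodeError as e:
    print(e.msg)    # Extra data
    print(e.colno)  # 11 -- the column where the second document starts
```

The column number in the exception points at where the first valid document ended, which is why it can be smaller than the file's total length.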

I wish I could include the JSON as well, but I am not permitted to upload files here. I can say, however, that it is a simple .json file without nested objects, 62,732 characters long, and it looks entirely like this:

[{"space_Id":"FB_05_001","booking_Id":null,"node_Id":4,"summary_Date":"2022-04-01T00:00:00Z","record_Id":"1_20220401_4_FB_05_001","isWorking_Hours":0,"hourly_Index":1,"quarter_Hourly_Index":1,"avg_Occupancy":0.0,"max_Occupancy":0.0,"s5":0.0,"s6":0.0,"s7":0.0,"s8":0.0,"s9":0.0,"s10":0.0,"s11":0.0,"s12":0.0,"s13":0.0,"s14":0.0,"s15":0.0,"s16":0.0,"s17":0.0,"s18":0.0,"s19":0.0,"s24":900.0,"s25":0.0,"s26":0.0,"s27":0.0,"s28":0.0,"s29":0.0,"s30":0.0,"s31":0.0,"s36":0.0,"s37":0.0,"s38":0.0,"s39":0.0,"avg_Capacity_Utilisation":0.0,"space_Work_Type":null,"space_Type":null,"space_Class":null,"space_Name":null,"env_Zone_Id":null,"space_Type_Label":null},{"space_Id"...
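Since the failure only shows up when looping over thousands of files, it may help to pin down exactly which files the parser rejects before debugging the conversion itself. A triage sketch (find_bad_json is a hypothetical helper, not part of the original code; utf-8-sig is used so a stray byte-order mark does not trip the parser):

```python
import json
import os

def find_bad_json(folder):
    """Walk folder recursively; return (path, error) for unparseable .json files."""
    bad = []
    for root, _dirs, files in os.walk(folder):
        for name in files:
            if not name.endswith('.json'):
                continue
            path = os.path.join(root, name)
            with open(path, 'r', encoding='utf-8-sig') as f:
                try:
                    json.load(f)
                except json.JSONDecodeError as e:
                    bad.append((path, str(e)))
    return bad
```

Running this over the problem folder and printing the result narrows the error down to specific files rather than the loop as a whole.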

import csv
import errno
import json
import os

def convert_json_with_new_line(path_to_folder, path_to_store):
    main_folder_name = os.path.basename(path_to_folder)
    for subdirectory in os.listdir(path_to_folder):
        current_path = os.path.join(path_to_folder, subdirectory)

        if os.path.isdir(current_path):
            folder_name = os.path.basename(current_path)
            json_files = [pos_json for pos_json in os.listdir(current_path) if pos_json.endswith('.json')]

            for json_name in json_files:
                file_name = os.path.splitext(json_name)[0] + '.csv'
                with open(os.path.join(current_path, json_name), 'r') as file:
                    records = json.load(file)

                # Create the mirrored output folder if it does not exist yet
                out_dir = os.path.join(path_to_store, main_folder_name, folder_name)
                try:
                    os.makedirs(out_dir)
                except OSError as e:
                    if e.errno != errno.EEXIST:
                        raise

                # Open the output CSV under a distinct name so the handle does
                # not shadow the loop variable (the original reused json_file)
                with open(os.path.join(out_dir, file_name), 'w', newline='') as out_file:
                    csv_writer = csv.writer(out_file)

                    # Counter used so the header row is written only once
                    count = 0
                    for record in records:
                        if count == 0:
                            # Writing headers to csv
                            csv_writer.writerow(record.keys())
                            count += 1
                        csv_writer.writerow(record.values())
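As a side note on the CSV-writing step: csv.DictWriter removes the need for the header counter and tolerates records whose keys differ between objects. A sketch, assuming every record is a flat dict like those in the sample above (records_to_csv is a hypothetical helper, not part of the original code):

```python
import csv

def records_to_csv(records, csv_path):
    """Write a list of flat dicts to CSV, using the union of all keys as the header."""
    fieldnames = []
    for rec in records:
        for key in rec:
            if key not in fieldnames:
                fieldnames.append(key)
    with open(csv_path, 'w', newline='') as out:
        # restval='' fills in cells for records that lack a given key
        writer = csv.DictWriter(out, fieldnames=fieldnames, restval='')
        writer.writeheader()
        writer.writerows(records)
```

With this, the inner write loop reduces to a single records_to_csv(records, os.path.join(out_dir, file_name)) call per input file.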


Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
