Converting a non-conventional CSV into a compatible CSV

I have a file that looks like a CSV but is not a conventional one. Each line of the output holds a column name and its value, in the form column_name,value.

e.g.

created_at,2022-02-10T16:53:49+01:00
updated_at,2022-04-28T00:35:30+02:00
timezone,Europe/Berlin

I need to convert it into an Excel-friendly CSV. The required output is:

"created_at","updated_at","timezone"
"2022-02-10T16:53:49+01:00","2022-04-28T00:35:30+02:00","Europe/Berlin"

I found a way using this code:

pd.read_csv('in.txt', header=None).T.to_csv('output.csv', header=False, index=False)

Which outputs a file like this:

created_at,updated_at,timezone
2022-02-10T16:53:49+01:00,2022-04-28T00:35:30+02:00,Europe/Berlin

That is close, but the double quotes around each field are missing, and so is the CR at the end of each line. When I open the CSV in Excel, go to DATA > TEXT TO COLUMN and try to lay it out as a table, it does not work because of the missing double quotes and the CR.

I need to process a 2 MB file and want to do it automatically, but I have not found a proper way to do it.



Solution 1:[1]

Unless you are using pandas for something else, there is no need for it here: the stdlib csv module gives full control over the output formatting.

All you need in Python is:

import csv

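# newline="" leaves line-ending handling to the csv module, as its docs recommend.
with open("input_file.csv", newline="") as src, open("output_file.csv", "w", newline="") as dst:
    # QUOTE_ALL wraps every field in double quotes; the writer ends lines with \r\n by default.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL)
    # zip(*rows) transposes the name,value rows into one header row and one value row.
    writer.writerows(zip(*csv.reader(src)))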

The formatting options are a little hard to find in the docs; they are listed at the end of the csv module section: https://docs.python.org/3/library/csv.html#csv-fmt-params
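As a rough, self-contained sketch of those formatting parameters, the snippet below transposes name,value rows held in memory and writes them with every field quoted and an explicit CRLF line ending. The function name transpose_csv and the shortened sample data are illustrative assumptions, not part of the original question or answer.

```python
import csv
import io


def transpose_csv(src, dst):
    """Turn name,value rows read from src into one header row and one value row in dst."""
    # Skip blank lines: an empty row would make zip() produce no output at all.
    rows = [row for row in csv.reader(src) if row]
    # QUOTE_ALL quotes every field; lineterminator spells out the CRLF ending.
    writer = csv.writer(dst, quoting=csv.QUOTE_ALL, lineterminator="\r\n")
    # zip(*rows) transposes: the first tuple holds the names, the second the values.
    writer.writerows(zip(*rows))


# Hypothetical two-line sample in the same shape as the question's input.
sample = "created_at,2022-02-10T16:53:49+01:00\ntimezone,Europe/Berlin\n"
out = io.StringIO()
transpose_csv(io.StringIO(sample), out)
result = out.getvalue()
print(result)
```

When writing to a real file instead of an in-memory buffer, open it with newline="" so the CRLF is not translated a second time on Windows.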

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1