'How to null missing columns when importing CSV

I have a very large(200+GB), incorrectly formatted CSV file. Some columns have fewer values than the other columns, i.e.

col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9

Obviously, when I try to import this into postgres it will throw an error about missing data for columns. I would like to avoid fixing this CSV file, as it is very large and will likely take quite a bit of time. How do I get the importer to simply insert null values for the missing data instead of throwing an error?



Solution 1:[1]

You could use awk to edit the .csv file.


#!/bin/sh

cat - <<OMG > omg.csv
col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9
OMG

awk -F, '{printf($0); for (i=NF;i<4;i++) {printf(",");} printf("\n"); }' < omg.csv # >out.csv

Result:


$ sh awk.sh
col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9
$

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 wildplasser