'How to null missing columns when importing CSV
I have a very large(200+GB), incorrectly formatted CSV file. Some columns have fewer values than the other columns, i.e.
col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9
Obviously, when I try to import this into postgres it will throw an error about missing data for columns. I would like to avoid fixing this CSV file, as it is very large and will likely take quite a bit of time. How do I get the importer to simply insert null values for the missing data instead of throwing an error?
Solution 1:[1]
You could use awk to edit the .csv file.
#!/bin/sh
cat - <<OMG > omg.csv
col1,col2,col3,col4
val2,val3,val5,val6
val2
val2,val3
val2,val4,val8,val9
OMG
awk -F, '{printf($0); for (i=NF;i<4;i++) {printf(",");} printf("\n"); }' < omg.csv # >out.csv
Result:
$ sh awk.sh
col1,col2,col3,col4
val2,val3,val5,val6
val2,,,
val2,val3,,
val2,val4,val8,val9
$
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | wildplasser |
