'How do I import NON-CSV-DELIMITED ASCII data into SQLite 3?
How do I import non-CSV ASCII data into SQLite 3?
I have a large amount of newline (0x0A)-terminated ASCII data that I want to import to SQLite 3. I can't use CSV format, because many of the text fields have embedded commas and quotes. I can reformat the data any which way, but I'd rather not reformat as SQL insert statements (over 400 million rows).
I want the use the ".import" command, which supposedly supports an "ascii" format where a column separator of 0x1F and row separator of 0x1E is used. Somehow, it doesn't work. I keep getting "Expected 120 columns but found 1".
All the Google results that I'm getting are about CSV import, which is why I'm asking here.
I'm including a simplified example of my problem below. As mentioned, the sample data can contain quotes (single and double), tabs and commas, so I can't use those as delimiters. I can use any other byte value for column and row delimiters.
Example.
Table
create table testtable ( item char(20), descr char(30), misc char(40) );
Data
| item | descr | misc |
|---|---|---|
| apple | a fruit | no "other" comments, for now |
| grape | another fruit | Yes, I like grapes |
Solution 1:[1]
I read through the SQLite3 forums and stumbled on this solution, which isn't really documented fully.
The -ascii feature is useless/doesn't work.
-- First, I clear my test table
delete from testtable;
-- specify the column delimiter character. I used x1F \036.
-- newline remains as the record separator (visually easier)
.separator \036
-- import away
.import xxtestdat testtable
-- verify the load
.headers on
.mode column
select * from testtable;
Solution 2:[2]
Background
Where a CSV has (,,\n) for awk-termed field and record separators, or column and row separators, respectively, Sqlite's .import --ascii [file] [db] expects (\x37, \x36) which are ASCII Unit Separator (us) and Record Separator (rs) of the standard. See man ascii or other reference.
Demo
Obtain a plain example CSV using seq 9 | rs -C, 3 | sed 's/,$//' | tee seq.csv, note no trailing column delimiter.
<seq.csv tr ',' '\037' | tr '\n' '\036' | tee seq.ascii | less -RF translates the CSV row and column separators to ASCII unit- and record- separators respectively.
sqlite3 seq.db '.import --ascii seq.ascii tbl' imports.
sqlite3 seq.db '.dump tbl' shows the imported table.
Other hints
In sqlite, if the table does not exist before .import, the first row will become the header. Tables need column names.
In processing other large files, less -R, head -c [char #s], and vim may be your friend in munging these files and previewing changes.
I found quoting more finicky than I care to understand. bash or command interpret escape sequences in some contexts. tr correctly accepts $'\x1f' and '\037', but not "\x1f". Note tr accepts multichar arguments, so quoting is essential, tr , \x1f vs tr , '\x1f' makes a difference. gnu-sed adds a later stage of interpretation.
Other selected references
$ printf "\036\037 \x1e\x1f" | xxd
00000000: 1e1f 201e 1f .. ..
from man ascii,
in octal:
034 fs 035 gs 036 rs 037 usand hexadecimal:
1c fs 1d gs 1e rs 1f us.
Sqlite also allows specifying other column, and optionally row, separators.
.separator COL ?ROW? Change the column and row separators
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | |
| Solution 2 |
