'How do I import NON-CSV-DELIMITED ASCII data into SQLite 3?

How do I import non-CSV ASCII data into SQLite 3?

I have a large amount of newline (0x0A)-terminated ASCII data that I want to import to SQLite 3. I can't use CSV format, because many of the text fields have embedded commas and quotes. I can reformat the data any which way, but I'd rather not reformat as SQL insert statements (over 400 million rows).

I want the use the ".import" command, which supposedly supports an "ascii" format where a column separator of 0x1F and row separator of 0x1E is used. Somehow, it doesn't work. I keep getting "Expected 120 columns but found 1".

All the Google results that I'm getting are about CSV import, which is why I'm asking here.

I'm including a simplified example of my problem below. As mentioned, the sample data can contain quotes (single and double), tabs and commas, so I can't use those as delimiters. I can use any other byte value for column and row delimiters.

Example.

Table

create table testtable ( item char(20), descr char(30), misc char(40) );

Data

item descr misc
apple a fruit no "other" comments, for now
grape another fruit Yes, I like grapes


Solution 1:[1]

I read through the SQLite3 forums and stumbled on this solution, which isn't really documented fully.

The -ascii feature is useless/doesn't work.

-- First, I clear my test table
delete from testtable;

-- specify the column delimiter character. I used x1F \036.
-- newline  remains as the record separator (visually easier)
.separator \036

-- import away
.import xxtestdat testtable 

-- verify the load 
.headers on
.mode column

select * from testtable;

Solution 2:[2]

Background

Where a CSV has (,,\n) for awk-termed field and record separators, or column and row separators, respectively, Sqlite's .import --ascii [file] [db] expects (\x37, \x36) which are ASCII Unit Separator (us) and Record Separator (rs) of the standard. See man ascii or other reference.

Demo

Obtain a plain example CSV using seq 9 | rs -C, 3 | sed 's/,$//' | tee seq.csv, note no trailing column delimiter.

<seq.csv tr ',' '\037' | tr '\n' '\036' | tee seq.ascii | less -RF translates the CSV row and column separators to ASCII unit- and record- separators respectively.

sqlite3 seq.db '.import --ascii seq.ascii tbl' imports.

sqlite3 seq.db '.dump tbl' shows the imported table.

Other hints

In sqlite, if the table does not exist before .import, the first row will become the header. Tables need column names.

In processing other large files, less -R, head -c [char #s], and vim may be your friend in munging these files and previewing changes.

I found quoting more finicky than I care to understand. bash or command interpret escape sequences in some contexts. tr correctly accepts $'\x1f' and '\037', but not "\x1f". Note tr accepts multichar arguments, so quoting is essential, tr , \x1f vs tr , '\x1f' makes a difference. gnu-sed adds a later stage of interpretation.

Other selected references

$ printf "\036\037 \x1e\x1f" | xxd
00000000: 1e1f 201e 1f                             .. ..

from man ascii,

  • in octal:

    034 fs 035 gs 036 rs 037 us

  • and hexadecimal:

    1c fs 1d gs 1e rs 1f us.

Sqlite also allows specifying other column, and optionally row, separators. .separator COL ?ROW? Change the column and row separators

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1
Solution 2