TDengine import from csv file

I just found that importing a sorted csv file into a TDengine database is faster than importing an unsorted one. Each csv file has 1000000 rows; the only difference is that one file has its timestamps sorted and the other does not.

Can anyone explain why importing the sorted csv file is faster?

taos> create table if not exists t1(ts timestamp, c1 int, c2 float, c3 int, c4 int);
Query OK, 0 of 0 row(s) in database (0.001659s)

taos> insert into t1 file 'unsorted.csv';
Query OK, 1000000 of 1000000 row(s) in database (2.025508s)

taos> create table if not exists t2(ts timestamp, c1 int, c2 float, c3 int, c4 int);
Query OK, 0 of 0 row(s) in database (0.001335s)

taos> insert into t2 file 'sorted.csv';
Query OK, 1000000 of 1000000 row(s) in database (0.994504s)


Solution 1:[1]

My guess is that TDengine's storage uses an LSM-tree structure. The imported data is time-series data, and records are kept sorted by the primary timestamp key. Writing already-ordered data takes advantage of that design, because new rows are simply appended to disk blocks. Rows that arrive in random order cannot just be appended, so they carry a penalty.
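To make the append-versus-insert argument concrete, here is a toy model in Python. It is only a sketch of the general idea and not TDengine's actual storage code: the `ingest` function and its in-memory `buffer` are hypothetical, and the buffer is assumed to be a plain list kept sorted by timestamp. In-order rows take a cheap append path, while out-of-order rows need a search plus a positioned insert.

```python
import bisect
import random


def ingest(timestamps):
    """Feed timestamps into a buffer that is kept sorted at all times.

    Returns (buffer, appends, inserts): rows that could simply be
    appended versus rows that had to be placed in the middle.
    """
    buffer = []
    appends = 0
    inserts = 0
    for ts in timestamps:
        if not buffer or ts >= buffer[-1]:
            # In-order row: goes straight onto the end of the buffer.
            buffer.append(ts)
            appends += 1
        else:
            # Out-of-order row: binary search, then shift later rows.
            bisect.insort(buffer, ts)
            inserts += 1
    return buffer, appends, inserts


random.seed(0)
sorted_ts = list(range(10_000))
unsorted_ts = sorted_ts[:]
random.shuffle(unsorted_ts)

buf_sorted, app_s, ins_s = ingest(sorted_ts)
buf_unsorted, app_u, ins_u = ingest(unsorted_ts)

print("sorted input:   appends =", app_s, "inserts =", ins_s)
print("unsorted input: appends =", app_u, "inserts =", ins_u)
```

Both runs end with the same sorted buffer, but only the shuffled input pays for positioned inserts. That is the kind of extra work that could explain the slower import of `unsorted.csv`, if the guess about the storage layout is right.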

Solution 2:[2]

Sorted records always work better with a time-based database or data structure. I think it mostly depends on your business scenario: if sorted records are easy to produce, use them; if not, let the time-series database (such as TDengine) handle the ordering.
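If producing sorted input is cheap in your case, one way to do it is to sort the csv file by its first column before importing. The sketch below is an illustration under several assumptions: the file has no header row, the timestamp is the first column and contains no comma, and the whole file fits in memory. The helper names `ts_key` and `sort_csv_by_timestamp` are made up for this example. Rows are reordered as whole lines, so their contents are left unchanged.

```python
def ts_key(line):
    """Sort key taken from the first column of a csv line.

    Integer epoch values compare numerically; anything else (for
    example '2021-01-01 00:00:00.000') compares as text.
    """
    first = line.split(",", 1)[0].strip().strip("'\"")
    try:
        return (0, int(first), "")
    except ValueError:
        return (1, 0, first)


def sort_csv_by_timestamp(src_path, dst_path):
    """Write a copy of src_path to dst_path with rows ordered by timestamp.

    Returns the number of rows written. Blank lines are dropped.
    """
    with open(src_path, newline="") as src:
        lines = [line.rstrip("\r\n") for line in src if line.strip()]
    lines.sort(key=ts_key)
    with open(dst_path, "w", newline="") as dst:
        dst.writelines(line + "\n" for line in lines)
    return len(lines)
```

For example, `sort_csv_by_timestamp("unsorted.csv", "sorted.csv")` would produce a file that can then be loaded with `insert into t2 file 'sorted.csv';` as in the question. Whether the time spent sorting is worth the faster import depends on your data volume.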

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 GeorgeWill93
Solution 2 zitsen