Most efficient way to bind several CSVs of 4 million rows each, up to 80 million rows?
I have several files containing ~4 million rows each, all with the same 4 column IDs. I am looking for the most efficient way to bind all of them (the total would be around 80 million rows); I think this is equivalent to concatenating all the rows. In R I would simply use
rbind(csv1, csv2)
but I've tried that and it took really long. I don't know if there is a more efficient way to do this, even considering other tools. I am running this on my laptop (8 GB RAM).
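Roughly, what I tried looks like this (just a sketch; the file names, the .csv pattern and the comma separator are placeholders, since my real files may be delimited differently):

files <- list.files(pattern = "\\.csv$")            # placeholder pattern for my per-chromosome files
combined <- NULL
for (f in files) {
  part <- read.csv(f, stringsAsFactors = FALSE)     # adjust sep/header to the real format
  combined <- rbind(combined, part)                 # this rbind step is what gets slow as `combined` grows
}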
The number of rows differs between files, ranging from 4 to 2 million each. A sample file looks like this:
id chr pos genotype
rs7349153 1 565490 TC
rs568632519 1 565596 GA
rs534091456 1 565619 AT
rs539860681 1 565643 TC
rs572552962 1 565658 TC
rs375428604 1 565696 CA
where id chr pos genotype are the column names. All rows are different; the only pattern is that the data is split across files by the chr column (so there is one file with chr1, another with chr2, etc.). The final output I expect is a txt file with all those rows concatenated, such as the sample below (a sketch of the write-out step follows it):
id chr pos genotype
rs4349153 1 565490 TC
rs468622519 1 565396 GA
rs534091456 2 565319 TT
rs639810381 2 565443 TT
rs572552362 3 564658 AC
rs675422304 3 565396 CA
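For the write-out step I was thinking of something like this (again just a sketch; the output file name is made up and it assumes the `combined` data frame from the loop above):

# write the combined table to a single whitespace-delimited txt (file name is made up)
write.table(combined, "all_chromosomes.txt", sep = " ",
            quote = FALSE, row.names = FALSE)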
I am open to using any other tools. I've never used a database but I can give it a try.
I also thought about using bash's cat, but I don't know whether I'd run into the same problems as with rbind.
Thank you for your insights!
EDIT: Added more details.