'Is read and insert operation faster than dump in mongodb
I need to clean a mongodb collection of 200Tb, and delete older timstamp. I am trying to build a new collection from the new, and run a delete query, since, running a del on the present collection that is in use, will slow down the other requests to it. I have thought of cloning a new collection either by taking a dump of the following collection, or by create a read and and write script, such that, it will read from the present collection and write to the cloned collection. My question is is a read/write operation of a batch ex: 1000 read and write faster than a dump ?
EDIT:
I found this, this and this article, and want to know, if writing a script in the above mentioned way the same as creating a ssh pipe of read and write ? ex: is a node/python script to fetch 1000 rows from a collection and insert that to a clone collection the same as ssh *** ". /etc/profile; mongodump -h sourceHost -d yourDatabase … | mongorestore -h targetHost -d yourDatabase ?
Solution 1:[1]
I would suggest this approach:
- Rename the collection. Your application will immediately create a new empty collection with the old name when it tries to insert some data. You may create some indexes.
- Run
mongoexport/mongoimportto import the valid data, i.e. skip the outdated.
Yes, in general mongodump/mongorestore might be faster, however at mongoexport you can define a query and limit the data which is exported. Could be like this:
mongoexport --uri "..." --db=yourDatabase --collection=collection --query='{timestamp: {$gt: ISODate("2022-01-010")}}' | mongoimport --uri "..." --db=yourDatabase --collection=collection --numInsertionWorkers=10
Utilize parameter numInsertionWorkers to run multiple workers. It will speed up your inserts.
So you run a sharded cluster? If yes, then you should use sh.splitAt() on the new collection, see How to copy a collection from one database to another in MongoDB
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
| Solution | Source |
|---|---|
| Solution 1 | Wernfried Domscheit |
