MongoDB findAndModify causes many write conflicts, which cause 100% CPU usage
I'm using MongoDB with the official Go driver as storage for background batch processing of documents. The DB setup is: 1 primary and 2 secondaries.
Each document in the processing collection has the following structure:
{
_id: ObjectId("6239ba1f4f6a3f8e20d16243"),
data: { ... },
created_at: ISODate("2022-03-22T11:59:27.795Z"),
processed_times: 1,
status: 2, // status of processing (new, failed, in_progress)
updated_at: ISODate("2022-03-22T11:59:27.795Z"),
last_picked_at: ISODate("2022-03-22T11:59:32.164Z") // date of last "pick" for processing
}
The current flow of document processing in that collection looks like this:
- A consumer (separate process) reads messages from the event bus and writes them into the processing collection as documents with status new.
- A processor (separate worker process) periodically "picks" 50 documents with status new from the processing collection. Basically, it runs a loop until 50 documents (or fewer, if no more are present in the collection) are retrieved for processing. The query for "picking" each document is:
{
  findAndModify: 'processing',
  new: true,
  query: { status: 0, last_picked_at: { '$exists': false } },
  sort: { _id: 1 },
  update: {
    '$set': {
      status: 2,
      last_picked_at: current_date
    },
    '$inc': { processed_times: 1 }
  }
}
UPD: I'm using findAndModify in a loop because MongoDB doesn't have an SQL-like SELECT ... FOR UPDATE that locks multiple rows. There are multiple instances of the worker, and I needed some built-in locking mechanism for picking documents. Also, I can't fetch only one entry at a time end-to-end, because later the processor service must retrieve additional information about these documents by sending their IDs to another service in a batch handler. Pseudo-code of the document processing:
// pick items
itemsToProcess = []
for i = 1 to 50:
    item = findAndModify(...)
    if item is not found:
        break
    itemsToProcess.append(item)
endfor

// run business logic which processes items and adds data to them
failedToProcess = processPickedItems(itemsToProcess)
if failedToProcess is not empty:
    // return all failed items to the processing collection for re-processing
    for failedItem in failedToProcess:
        returnItemToProcessing(failedItem)
    endfor

run transaction:
    save processed items in a different collection
    delete successfully processed items from the processing collection
- After fetching all the documents, each of them is processed, and the service runs a transaction that stores all processed documents in a different collection and removes them from the processing collection.
All operations (including transactions) use the following read and write concerns:
write: {w: majority, j: false}
read: majority
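For reference, this is how I configure those concerns on the client. A sketch assuming the mongo-driver v1 API (the connection URI is hypothetical); this is a configuration fragment, not a complete program:

```go
// {w: majority, j: false} for writes, majority for reads,
// using go.mongodb.org/mongo-driver/mongo/{options,writeconcern,readconcern}.
wc := writeconcern.New(writeconcern.WMajority(), writeconcern.J(false))
rc := readconcern.Majority()

opts := options.Client().
	ApplyURI("mongodb://host1,host2,host3/?replicaSet=rs0"). // hypothetical URI
	SetWriteConcern(wc).
	SetReadConcern(rc)

client, err := mongo.Connect(context.TODO(), opts)
```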
When we launched the service everything was fine, but on the 4th day we noticed a burst of write conflicts from the findAndModify operation that picks documents for processing; CPU usage on the primary hit 100%, and the execution time of these operations kept growing. We stepped down the primary node and the overload went away, but then we experienced the same behaviour again after 5 days. I've tried to understand what is wrong with the code and why such bursts of write conflicts appear after a long period of normal operation with few conflicts, but I've run out of guesses.
What would you suggest to identify potential causes of this behaviour?
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow