Temporal/Cadence performance tuning
Could anyone help me understand the following situation? I have one worker with this configuration:
workerOptions := worker.Options{
    BackgroundActivityContext:               ctx,
    MaxConcurrentWorkflowTaskPollers:        10,
    MaxConcurrentActivityTaskPollers:        20,
    MaxConcurrentWorkflowTaskExecutionSize:  256,
    MaxConcurrentLocalActivityExecutionSize: 256,
    MaxConcurrentActivityExecutionSize:      256,
}
If I set MaxConcurrentWorkflowTaskExecutionSize and MaxConcurrentActivityExecutionSize to 1024, the worker becomes noticeably slower. I expected that increasing these two options would let the worker handle more Activities and Workflow Tasks, but it does not work that way. The worker has enough CPU and RAM and is not overloaded at all.
In the Temporal UI I can see that some workflows stay stuck for a while in the following history state:
1 WorkflowExecutionStarted Aug 10th 10:40:17 am CLOSE TIMEOUT 30m
2 WorkflowTaskScheduled Aug 10th 10:40:17 am TASKQUEUE temporal-basic
I also adjusted the following matching parameters:
matching.numTaskqueueReadPartitions:
  - value: 100
    constraints: {}
matching.numTaskqueueWritePartitions:
  - value: 100
Also, when I experiment with different worker configurations, I occasionally get errors like this from the history service:
temporal-history-5f8757cc4f-v8h94 temporal-history {"level":"error","ts":"2021-08-09T22:26:09.181Z","msg":"Fail to process task","service":"history","shard-id":255,"address":"10.218.13.7:7234","shard-item":"0xc09d263700","component":"transfer-queue-processor","cluster-name":"active","shard-id":255,"queue-task-id":2213997,"queue-task-visibility-timestamp":"2021-08-09T22:26:00.658Z","xdc-failover-version":0,"queue-task-type":"TransferActivityTask","wf-namespace-id":"4b775794-a076-499e-aa11-177db696d780","wf-id":"basic-workflow-30-0-5-3523","wf-run-id":"fc82334c-b57d-4d08-8c0d-480b6156b995","error":"context deadline exceeded","lifecycle":"ProcessingFailed","logging-call-at":"taskProcessor.go:332","stacktrace":"go.temporal.io/server/common/log.(*zapLogger).Error\n\t/temporal/common/log/zap_logger.go:143\ngo.temporal.io/server/service/history.(*taskProcessor).handleTaskError\n\t/temporal/service/history/taskProcessor.go:332\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck.func1\n\t/temporal/service/history/taskProcessor.go:218\ngo.temporal.io/server/common/backoff.Retry\n\t/temporal/common/backoff/retry.go:103\ngo.temporal.io/server/service/history.(*taskProcessor).processTaskAndAck\n\t/temporal/service/history/taskProcessor.go:244\ngo.temporal.io/server/service/history.(*taskProcessor).taskWorker\n\t/temporal/service/history/taskProcessor.go:167"}
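When many lines like the one above show up, it can help to group them by task type and error before changing any settings. The following is only a rough sketch using the Go standard library: the JSON field names are taken from the log line above, while the struct and the function names (historyLogEntry, parseHistoryLogLine, countFailures) are hypothetical helpers, not part of Temporal. It assumes each line may carry a non-JSON prefix such as the pod and container name.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// historyLogEntry holds the few fields of a history-service log line that
// are useful for a first pass. The JSON keys come from the log line quoted
// in the question; the struct itself is a hypothetical helper.
type historyLogEntry struct {
	Level     string `json:"level"`
	Msg       string `json:"msg"`
	Component string `json:"component"`
	ShardID   int    `json:"shard-id"`
	TaskType  string `json:"queue-task-type"`
	Error     string `json:"error"`
}

// parseHistoryLogLine skips any non-JSON prefix (for example the pod and
// container names) and decodes the JSON payload that follows it.
func parseHistoryLogLine(line string) (historyLogEntry, error) {
	var entry historyLogEntry
	start := strings.Index(line, "{")
	if start < 0 {
		return entry, fmt.Errorf("no JSON object found in line")
	}
	if err := json.Unmarshal([]byte(line[start:]), &entry); err != nil {
		return entry, fmt.Errorf("decode log line: %w", err)
	}
	return entry, nil
}

// countFailures tallies error-level entries by "task type: error".
// Lines that cannot be parsed or are not errors are skipped.
func countFailures(lines []string) map[string]int {
	counts := make(map[string]int)
	for _, line := range lines {
		entry, err := parseHistoryLogLine(line)
		if err != nil || entry.Level != "error" {
			continue
		}
		counts[entry.TaskType+": "+entry.Error]++
	}
	return counts
}

func main() {
	// Shortened copy of the log line from the question.
	lines := []string{
		`temporal-history-5f8757cc4f-v8h94 temporal-history {"level":"error","msg":"Fail to process task","shard-id":255,"component":"transfer-queue-processor","queue-task-type":"TransferActivityTask","error":"context deadline exceeded"}`,
	}
	for key, n := range countFailures(lines) {
		fmt.Printf("%d x %s\n", n, key)
	}
}
```

Feeding it the full history-service log would show whether the timeouts are concentrated in one task type, which narrows down where to look next.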
My goal is to understand which options or configs I should adjust to get more performance out of Temporal.
I would appreciate any tips on where to look for the problem.
Solution 1:[1]
Here's a guide on how to think about Worker tuning. If it doesn't cover your case, please submit an issue!
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow
Solution | Source |
---|---|
Solution 1 | Loren |