Hints get generated past max_hint_window_in_ms

I have a Cassandra cluster with 8 nodes, 2 of which I took down. The Cassandra version is 3.11.11.

Here is my cassandra.yaml config:

hinted_handoff_enabled: true
max_hint_window_in_ms: 3600000
hinted_handoff_throttle_in_kb: 1024
max_hints_delivery_threads: 2
hints_flush_period_in_ms: 10000
max_hints_file_size_in_mb: 128

One hour after taking the nodes down (max_hint_window_in_ms: 3600000), I expect no further changes in the hints directory, but Cassandra keeps generating new hint files.

Every 5.0s: ls -lh                        host: Thu Feb  3 18:07:54 2022

total 118M
-rw-r--r-- 1 root root    8 Feb  2 08:05 0c197a36-04d0-436a-b4ce-e63742f5fe19-1643789092767-1.crc32
-rw-r--r-- 1 root root    8 Feb  2 08:29 0c197a36-04d0-436a-b4ce-e63742f5fe19-1643789477717-1.crc32
-rw-r--r-- 1 root root 6.2M Feb  2 08:29 0c197a36-04d0-436a-b4ce-e63742f5fe19-1643789477717-1.hints
-rw-r--r-- 1 root root    8 Feb  2 14:08 0c197a36-04d0-436a-b4ce-e63742f5fe19-1643790595904-1.crc32
-rw-r--r-- 1 root root  20M Feb  2 14:08 0c197a36-04d0-436a-b4ce-e63742f5fe19-1643790595904-1.hints
-rw-r----- 1 root root  31M Feb  3 18:07 0c197a36-04d0-436a-b4ce-e63742f5fe19-1643810952057-1.hints
-rw-r--r-- 1 root root    8 Feb  2 08:29 6ded456d-9d09-4097-8c37-8e46dcd915c4-1643789707920-1.crc32
-rw-r--r-- 1 root root 4.9M Feb  2 08:29 6ded456d-9d09-4097-8c37-8e46dcd915c4-1643789707920-1.hints
-rw-r--r-- 1 root root    8 Feb  2 14:08 6ded456d-9d09-4097-8c37-8e46dcd915c4-1643790595910-1.crc32
-rw-r--r-- 1 root root  22M Feb  2 14:08 6ded456d-9d09-4097-8c37-8e46dcd915c4-1643790595910-1.hints
-rw-r----- 1 root root  35M Feb  3 18:07 6ded456d-9d09-4097-8c37-8e46dcd915c4-1643810952061-1.hints

You can see in the output that the first hint file was generated at 08:05 on 2 Feb and hints are still being written (the current time is 18:07 on 3 Feb). This continues until my storage fills up and the node crashes. I then have to purge the whole hints directory and restart the node. The hints directory grows to 5 GB.

How can I stop further generation of hint files? What is the correct configuration or solution for this?



Solution 1:[1]

Coordinators will store hints again if a node was previously down, came back up, then went down again. Every time a node goes down, the timer resets and coordinators will store hints until max_hint_window_in_ms has expired.
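
As a rough illustration of the behaviour described above, here is a minimal Python sketch. It is a simplified model, not Cassandra's actual implementation: the function name hint_windows and the shape of its input are made up for this example, and it assumes each outage opens a fresh window of max_hint_window_in_ms that closes early if the node returns.

```python
from datetime import datetime, timedelta

MAX_HINT_WINDOW_MS = 3_600_000  # value from the question's cassandra.yaml


def hint_windows(down_events, window_ms=MAX_HINT_WINDOW_MS):
    """Return (start, end) pairs during which hints would be stored.

    down_events is a list of (went_down, came_back_up) datetimes, where
    came_back_up is None if the node is still down. Simplified model:
    every outage opens a fresh window of window_ms, which is cut short
    if the node comes back before the window expires.
    """
    window = timedelta(milliseconds=window_ms)
    result = []
    for went_down, came_back_up in down_events:
        end = went_down + window
        if came_back_up is not None and came_back_up < end:
            end = came_back_up
        result.append((went_down, end))
    return result
```

Under this model, a node that flaps four times produces four separate windows, so hints keep accumulating well past one hour after the first outage.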

Based on the timestamps you posted, the nodes went down multiple times at:

  1. Feb 2 08:05
  2. Feb 2 08:29
  3. Feb 2 14:08
  4. Feb 3 18:07

This indicates that the nodes were going up and down, which caused the timer to reset. You can confirm this by checking the logs on the other 6 nodes.
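
Besides the logs, the hint file names themselves may help. The sketch below assumes the names in the listing follow the layout <host ID>-<epoch milliseconds>-<version>.hints and that the middle number is the time the file was created. That layout is inferred from the listing in the question, and the helper names parse_hint_file and creation_times are hypothetical.

```python
import re
from datetime import datetime, timezone

# Assumed name layout: <host-id UUID>-<epoch millis>-<version>.hints
HINT_FILE = re.compile(r"^([0-9a-f-]{36})-(\d+)-(\d+)\.hints$")


def parse_hint_file(name):
    """Return (host_id, created_utc) for a hints file name, or None."""
    match = HINT_FILE.match(name)
    if match is None:
        return None  # e.g. .crc32 files or unrelated names
    host_id, millis, _version = match.groups()
    created = datetime.fromtimestamp(int(millis) / 1000, tz=timezone.utc)
    return host_id, created


def creation_times(names):
    """Group hints file creation times by target host ID, oldest first."""
    by_host = {}
    for name in names:
        parsed = parse_hint_file(name)
        if parsed is not None:
            host_id, created = parsed
            by_host.setdefault(host_id, []).append(created)
    return {host: sorted(times) for host, times in by_host.items()}
```

Passing the output of os.listdir() on the hints directory to creation_times gives, per down node, the times at which a coordinator started a new hints file. Those times can then be compared against the up/down events in the logs.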

On a slightly different note, this scenario should highlight to you that your cluster doesn't have enough capacity to tolerate node outages. If your nodes run out of space when other nodes go down even for just a few hours, then that's a real problem you need to address. You should review the size of your cluster, including the size of the nodes' hints_directory. Cheers!

Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow

Solution Source
Solution 1 Erick Ramirez