RT Linux jitter when receiving UDP after upgrading the kernel from 3.14 to 5.10
We have an old product running a Linux 3.14 PREEMPT kernel. One application polls field devices one by one: it sends one UDP packet to one IP, then sleeps 2 ms, and expects the response UDP packet to have arrived by the time the sleep finishes. On kernel 3.14 everything is fine.
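Simplified, the polling loop looks roughly like this (just a sketch to show the pattern; the real code is larger and the names here are placeholders, with sock being a bound non-blocking UDP socket):

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <time.h>

static unsigned long no_response;              /* the counter that increases on 5.10-rt */

static void poll_device(int sock, const char *ip)
{
    struct sockaddr_in dst = { .sin_family = AF_INET, .sin_port = htons(37000) };
    inet_pton(AF_INET, ip, &dst.sin_addr);

    char req[15] = { 0 };                      /* 15-byte request, as in the capture */
    sendto(sock, req, sizeof(req), 0, (struct sockaddr *)&dst, sizeof(dst));

    struct timespec ts = { .tv_sec = 0, .tv_nsec = 2 * 1000 * 1000 };
    nanosleep(&ts, NULL);                      /* sleep 2 ms */

    char resp[512];                            /* the 271-byte reply must be queued by now */
    if (recv(sock, resp, sizeof(resp), MSG_DONTWAIT) < 0)
        no_response++;
}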
But after upgrading the kernel to 5.10 with the RT patch, we observe jitter and the no_response counter in the application increases. With Wireshark running on the same Linux box I can see the following (the second column is the time since the previous packet):
44186 0.001031 172.23.0.17 172.23.7.17 UDP 57 37000 → 37000 Len=15
44187 0.002450 172.23.0.17 172.23.7.18 UDP 57 37000 → 37000 Len=15
44188 0.000118 172.23.7.17 172.23.0.17 UDP 313 37000 → 37000 Len=271
44189 0.000926 172.23.7.18 172.23.0.17 UDP 313 37000 → 37000 Len=271
What we want looks like this:
44170 0.002116 172.23.0.17 172.23.1.17 UDP 57 37000 → 37000 Len=15
44171 0.001115 172.23.1.17 172.23.0.17 UDP 313 37000 → 37000 Len=271
44172 0.001042 172.23.0.17 172.23.1.18 UDP 57 37000 → 37000 Len=15
44173 0.001104 172.23.1.18 172.23.0.17 UDP 313 37000 → 37000 Len=271
So the response from 172.23.7.17 arrives too late. After some testing I think this delay is caused not by the field devices but by the kernel or something on our side (since Wireshark runs on the same Linux box, its timestamps may not always be accurate; see the timestamping sketch after the top output below). The si% shown by top is about 3 times what it was on the old kernel. In particular, when I use hping3 to put load on the CPU (the CPU has only one core), si% is 17% on the new kernel versus 6% on the old one:
top - 21:51:48 up 41 min, 3 users, load average: 1.52, 0.92, 0.66
Tasks: 126 total, 3 running, 123 sleeping, 0 stopped, 0 zombie
%Cpu(s): 12.2 us, 36.7 sy, 0.0 ni, 33.3 id, 1.1 wa, 0.0 hi, 16.7 si, 0.0 st
MiB Mem : 1910.4 total, 1612.3 free, 132.3 used, 165.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1660.7 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
5794 root 20 0 11692 5516 5236 R 39.7 0.3 0:52.08 hping3
541 root 20 0 127828 126912 81048 S 10.1 6.5 3:51.52 our_app
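Because Wireshark runs on the same box, I also want to cross-check the receive time inside the application itself, independent of Wireshark. This is only a sketch of what I have in mind (not what the application currently does), using SO_TIMESTAMPNS to read the kernel receive timestamp of each datagram:

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>

#ifndef SCM_TIMESTAMPNS
#define SCM_TIMESTAMPNS SO_TIMESTAMPNS         /* same value in the kernel ABI */
#endif

static void recv_with_kernel_timestamp(int sock)
{
    int on = 1;
    setsockopt(sock, SOL_SOCKET, SO_TIMESTAMPNS, &on, sizeof(on));

    char buf[512];
    char cbuf[CMSG_SPACE(sizeof(struct timespec))];
    struct iovec iov = { .iov_base = buf, .iov_len = sizeof(buf) };
    struct msghdr msg = {
        .msg_iov = &iov, .msg_iovlen = 1,
        .msg_control = cbuf, .msg_controllen = sizeof(cbuf),
    };

    if (recvmsg(sock, &msg, 0) < 0)
        return;

    /* The kernel attaches the rx timestamp as a control message. */
    for (struct cmsghdr *c = CMSG_FIRSTHDR(&msg); c; c = CMSG_NXTHDR(&msg, c)) {
        if (c->cmsg_level == SOL_SOCKET && c->cmsg_type == SCM_TIMESTAMPNS) {
            struct timespec ts;
            memcpy(&ts, CMSG_DATA(c), sizeof(ts));
            printf("kernel rx time: %ld.%09ld\n", (long)ts.tv_sec, ts.tv_nsec);
        }
    }
}

Comparing this timestamp with the send time should show whether the delay happens before or after the packet reaches the socket queue.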
With cyclictest, the jitter on the new kernel is much larger:
# ./cyclictest -a 0 --policy fifo -p 50 -N -t 1
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
policy: fifo: loadavg: 1.81 1.95 1.56 2/136 8285
T: 0 ( 6260) P:50 I:1000 C: 981767 Min: 9082 Act: 13414 Avg: 14979 Max: 1850722
On the old kernel the Max is only around 200000 ns (0.2 ms), whereas here it is about 1.85 ms. iptables is not running. This is the output of perf record:
2.46% [kernel] [k] restore_all_switch_stack
1.62% [kernel] [k] check_preemption_disabled
1.27% [kernel] [k] __copy_user_ll
1.17% [kernel] [k] entry_INT80_32
0.97% libapt-pkg.so.6.0.0 [.] pkgCache::FindGrp
0.91% libapt-pkg.so.6.0.0 [.] debListParser::ParseDepends
0.85% ld-2.31.so (deleted) [.] 0x00001090
0.83% libapt-pkg.so.6.0.0 [.] pkgTagSection::Scan
0.78% libapt-pkg.so.6.0.0 [.] 0x0017c32c
0.76% [kernel] [k] __sched_text_start
0.66% [kernel] [k] __local_bh_enable_ip
0.64% libapt-pkg.so.6.0.0 [.] pkgCache::sHash
0.62% [kernel] [k] preempt_count_add
0.59% [kernel] [k] preempt_count_sub
0.56% [kernel] [k] __rcu_read_unlock
0.54% [kernel] [k] rt_spin_unlock
0.51% [kernel] [k] avc_has_perm_noaudit
0.50% libc-2.31.so [.] malloc
0.50% [kernel] [k] siphash_2u64
0.46% ld-2.31.so [.] 0x00001090
0.42% [kernel] [k] raw_sendmsg
0.41% [kernel] [k] syscall_exit_to_user_mode
0.40% [kernel] [k] __local_bh_disable_ip
0.39% [kernel] [k] ip_route_output_key_hash_rcu
0.39% [kernel] [k] fib_table_lookup
0.38% libapt-pkg.so.6.0.0 [.] pkgCache::GrpIterator::FindPkg
0.38% [kernel] [k] kmem_cache_alloc
0.38% [kernel] [k] exit_to_user_mode_prepare
0.38% [kernel] [k] __rcu_read_lock
0.38% [kernel] [k] sched_clock
0.38% [kernel] [k] kallsyms_expand_symbol.constprop.0
0.37% perf_5.10 [.] 0x0016ec37
0.35% [kernel] [k] try_to_wake_up
0.35% [kernel] [k] __switch_to_asm
For a higher level overview, try: perf top --sort comm,dso
Could you give me some advice? Thanks in advance.