RT Linux jitter when receiving UDP after upgrading the kernel from 3.14 to 5.10

We have an old product running a Linux 3.14 Preempt kernel. One application polls field devices one by one: it sends one UDP packet to one IP address, sleeps 2 ms, and expects the response UDP packet to have arrived by the time the sleep finishes. With the 3.14 kernel everything works fine.
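To make the polling pattern concrete, here is a minimal sketch of one polling round. It is not our real application code: the function name `poll_devices`, the 2048-byte receive buffer, and the rule that a reply only counts if it comes from the device just polled are assumptions made for illustration.

```python
import socket
import time

def poll_devices(sock, devices, request, sleep_s=0.002):
    """Poll each (ip, port) in devices once: send, sleep, then try to receive.

    Returns (replies, no_response), where replies maps device address to payload.
    """
    sock.setblocking(False)           # the receive must not wait past the sleep
    replies = {}
    no_response = 0
    for addr in devices:
        sock.sendto(request, addr)
        time.sleep(sleep_s)           # 2 ms by default, as in the description
        try:
            data, peer = sock.recvfrom(2048)
        except BlockingIOError:
            no_response += 1          # nothing had arrived when the sleep ended
            continue
        if peer == addr:
            replies[addr] = data
        else:
            no_response += 1          # assumption: a late reply from another device is a miss
    return replies, no_response
```

With this structure, any extra wake-up latency after the sleep, or any delay in delivering the received packet to the socket, shows up directly as a missed reply.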

But after we upgraded the kernel to 5.10 with the RT patch, we observe some jitter, and the no_response counter in the application increases. With Wireshark running on the same Linux machine, I can see the following (the second column is the time since the previous packet):

44186   0.001031    172.23.0.17 172.23.7.17 UDP 57  37000 → 37000 Len=15
44187   0.002450    172.23.0.17 172.23.7.18 UDP 57  37000 → 37000 Len=15
44188   0.000118    172.23.7.17 172.23.0.17 UDP 313 37000 → 37000 Len=271
44189   0.000926    172.23.7.18 172.23.0.17 UDP 313 37000 → 37000 Len=271

What we want looks like this:

44170   0.002116    172.23.0.17 172.23.1.17 UDP 57  37000 → 37000 Len=15
44171   0.001115    172.23.1.17 172.23.0.17 UDP 313 37000 → 37000 Len=271
44172   0.001042    172.23.0.17 172.23.1.18 UDP 57  37000 → 37000 Len=15
44173   0.001104    172.23.1.18 172.23.0.17 UDP 313 37000 → 37000 Len=271
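To put numbers on the two captures above, the delta column can be accumulated into a per-device response time. This is only a sketch: the function name and the tuple layout are illustrative, and the timestamps come from a capture taken on the polling machine itself, so they may not be fully accurate.

```python
def response_times_ms(rows, host="172.23.0.17"):
    """rows: (delta_seconds, src, dst) in capture order.

    Returns {device_ip: milliseconds between the request and its response}.
    """
    now_us = 0
    sent_at = {}
    result = {}
    for delta, src, dst in rows:
        now_us += round(delta * 1_000_000)   # integer microseconds avoid float drift
        if src == host:
            sent_at[dst] = now_us            # request leaving the polling host
        elif src in sent_at:
            result[src] = (now_us - sent_at.pop(src)) / 1000
    return result

new_kernel = [
    (0.001031, "172.23.0.17", "172.23.7.17"),
    (0.002450, "172.23.0.17", "172.23.7.18"),
    (0.000118, "172.23.7.17", "172.23.0.17"),
    (0.000926, "172.23.7.18", "172.23.0.17"),
]
wanted = [
    (0.002116, "172.23.0.17", "172.23.1.17"),
    (0.001115, "172.23.1.17", "172.23.0.17"),
    (0.001042, "172.23.0.17", "172.23.1.18"),
    (0.001104, "172.23.1.18", "172.23.0.17"),
]

print(response_times_ms(new_kernel))  # {'172.23.7.17': 2.568, '172.23.7.18': 1.044}
print(response_times_ms(wanted))      # {'172.23.1.17': 1.115, '172.23.1.18': 1.104}
```

In the first capture the reply from 172.23.7.17 is recorded about 2.57 ms after its request, which is past the 2 ms sleep. In the second capture both replies are recorded after roughly 1.1 ms.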

So the response from 172.23.7.17 arrives too late. But after some testing, I think this delay is caused not by the field devices but by the kernel or something else (we run Wireshark on the same Linux machine, so I think the timestamps may not always be accurate). The si% shown in top is 3 times that of the old kernel. In particular, when I use hping3 to stress the CPU (the CPU has only one core), si% is 17% on the new kernel versus 6% on the old kernel:

top - 21:51:48 up 41 min,  3 users,  load average: 1.52, 0.92, 0.66 
Tasks: 126 total,   3 running, 123 sleeping,   0 stopped,   0 zombie
%Cpu(s): 12.2 us, 36.7 sy,  0.0 ni, 33.3 id,  1.1 wa,  0.0 hi, 16.7 si,  0.0 st
MiB Mem :   1910.4 total,   1612.3 free,    132.3 used,    165.8 buff/cache
MiB Swap:      0.0 total,      0.0 free,      0.0 used.   1660.7 avail Mem 
  PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
 5794 root      20   0   11692   5516   5236 R  39.7   0.3   0:52.08 hping3    
  541 root      20   0  127828 126912  81048 S  10.1   6.5   3:51.52 our_app
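For reference, the si figure in top is the share of CPU time spent servicing softirqs, which can also be derived from two reads of the aggregate `cpu` line in /proc/stat. The sketch below assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal). The two sample lines are made-up counters chosen to reproduce the 16.7 si shown above, not values read from our system.

```python
def softirq_percent(before, after):
    """before/after: the aggregate 'cpu' line of /proc/stat at two moments."""
    def fields(line):
        # user nice system idle iowait irq softirq steal
        return [int(x) for x in line.split()[1:9]]
    deltas = [a - b for b, a in zip(fields(before), fields(after))]
    total = sum(deltas)
    return 100.0 * deltas[6] / total if total else 0.0

# Hypothetical counters, for illustration only.
sample_before = "cpu  100 0 200 1000 10 0 50 0 0 0"
sample_after = "cpu  222 0 567 1333 21 0 217 0 0 0"
print(softirq_percent(sample_before, sample_after))  # 16.7
```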

With cyclictest, the jitter on the new kernel is large:

# ./cyclictest -a 0 --policy fifo -p 50 -N -t 1
WARN: stat /dev/cpu_dma_latency failed: No such file or directory
policy: fifo: loadavg: 1.81 1.95 1.56 2/136 8285          

T: 0 ( 6260) P:50 I:1000 C: 981767 Min:   9082 Act:   13414 Avg:   14979 Max: 1850722
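Because of the -N option, cyclictest prints these latencies in nanoseconds. A small sketch to pull the numbers out of the summary line and convert the maximum to milliseconds (the helper name `parse_cyclictest` is made up for this example):

```python
import re

def parse_cyclictest(line):
    """Extract Min/Act/Avg/Max from a cyclictest summary line, as printed."""
    return {k: int(v) for k, v in re.findall(r"(Min|Act|Avg|Max):\s*(\d+)", line)}

line = ("T: 0 ( 6260) P:50 I:1000 C: 981767 "
        "Min:   9082 Act:   13414 Avg:   14979 Max: 1850722")
stats = parse_cyclictest(line)
max_ms = stats["Max"] / 1_000_000   # nanoseconds to milliseconds
print(max_ms)  # 1.850722
```

So the worst-case wake-up latency is about 1.85 ms, which is almost the whole 2 ms polling sleep.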

On the old kernel, Max is only about 200000 (0.2 ms). iptables is not running. Here is the output of perf record:

   2.46%  [kernel]                       [k] restore_all_switch_stack
   1.62%  [kernel]                       [k] check_preemption_disabled
   1.27%  [kernel]                       [k] __copy_user_ll
   1.17%  [kernel]                       [k] entry_INT80_32
   0.97%  libapt-pkg.so.6.0.0            [.] pkgCache::FindGrp
   0.91%  libapt-pkg.so.6.0.0            [.] debListParser::ParseDepends
   0.85%  ld-2.31.so (deleted)           [.] 0x00001090
   0.83%  libapt-pkg.so.6.0.0            [.] pkgTagSection::Scan
   0.78%  libapt-pkg.so.6.0.0            [.] 0x0017c32c
   0.76%  [kernel]                       [k] __sched_text_start
   0.66%  [kernel]                       [k] __local_bh_enable_ip
   0.64%  libapt-pkg.so.6.0.0            [.] pkgCache::sHash
   0.62%  [kernel]                       [k] preempt_count_add
   0.59%  [kernel]                       [k] preempt_count_sub
   0.56%  [kernel]                       [k] __rcu_read_unlock
   0.54%  [kernel]                       [k] rt_spin_unlock
   0.51%  [kernel]                       [k] avc_has_perm_noaudit
   0.50%  libc-2.31.so                   [.] malloc
   0.50%  [kernel]                       [k] siphash_2u64
   0.46%  ld-2.31.so                     [.] 0x00001090
   0.42%  [kernel]                       [k] raw_sendmsg
   0.41%  [kernel]                       [k] syscall_exit_to_user_mode
   0.40%  [kernel]                       [k] __local_bh_disable_ip
   0.39%  [kernel]                       [k] ip_route_output_key_hash_rcu
   0.39%  [kernel]                       [k] fib_table_lookup
   0.38%  libapt-pkg.so.6.0.0            [.] pkgCache::GrpIterator::FindPkg
   0.38%  [kernel]                       [k] kmem_cache_alloc
   0.38%  [kernel]                       [k] exit_to_user_mode_prepare
   0.38%  [kernel]                       [k] __rcu_read_lock
   0.38%  [kernel]                       [k] sched_clock
   0.38%  [kernel]                       [k] kallsyms_expand_symbol.constprop.0
   0.37%  perf_5.10                      [.] 0x0016ec37
   0.35%  [kernel]                       [k] try_to_wake_up
   0.35%  [kernel]                       [k] __switch_to_asm
For a higher level overview, try: perf top --sort comm,dso

Could you give me some advice on how to track down this latency? Thanks in advance.



Sources

This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.

Source: Stack Overflow
