Network performance between two hosts slow with Linux bridges and VXLAN
I'm trying to debug a network performance problem between two hosts:
- Dell R6515
- Mellanox ConnectX-5 2x25Gb SFP28
- Ubuntu 20.04
- Kernel 5.13.0-40-generic
Both hosts are running FRR with BGP+EVPN+VXLAN.
When I run a benchmark with iperf3 between the two hosts:
Connecting to host 10.255.254.53, port 5201
[ 5] local 10.255.254.47 port 42990 connected to 10.255.254.53 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 2.86 GBytes 24.5 Gbits/sec 1 2.98 MBytes
[ 5] 1.00-2.00 sec 2.88 GBytes 24.8 Gbits/sec 0 3.00 MBytes
[ 5] 2.00-3.00 sec 2.88 GBytes 24.7 Gbits/sec 0 3.00 MBytes
[ 5] 3.00-4.00 sec 2.88 GBytes 24.8 Gbits/sec 0 3.00 MBytes
[ 5] 4.00-5.00 sec 2.88 GBytes 24.8 Gbits/sec 0 3.00 MBytes
[ 5] 5.00-6.00 sec 2.88 GBytes 24.8 Gbits/sec 0 3.00 MBytes
[ 5] 6.00-7.00 sec 2.88 GBytes 24.7 Gbits/sec 0 3.00 MBytes
[ 5] 7.00-8.00 sec 2.88 GBytes 24.8 Gbits/sec 0 3.00 MBytes
[ 5] 8.00-9.00 sec 2.88 GBytes 24.7 Gbits/sec 0 3.00 MBytes
[ 5] 9.00-10.00 sec 2.88 GBytes 24.8 Gbits/sec 0 3.00 MBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 28.8 GBytes 24.7 Gbits/sec 1 sender
[ 5] 0.00-10.00 sec 28.8 GBytes 24.7 Gbits/sec receiver
Both hosts have an IP address configured on their loopback interface, which they announce via BGP.
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet 10.255.254.53/32 brd 10.255.254.53 scope global lo
valid_lft forever preferred_lft forever
inet6 2a05:xxx:700:2::53/128 scope global
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
Via this IPv4 or IPv6 address we reach the expected 25Gb/s.
Now, there is also cloudbr1 (we use Apache CloudStack) on these hosts:
6: cloudbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 master cloudbr1 cloudbr1
7: vxlan: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 master cloudbr1 state forwarding priority 32 cost 100
hairpin off guard off root_block off fastleave off learning on flood on mcast_flood on mcast_to_unicast off neigh_suppress off vlan_tunnel off isolated off vxlan
8: cloud0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master cloud0 cloud0
6: cloudbr1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9000 qdisc noqueue state UP group default qlen 1000
link/ether aa:88:51:dc:e2:91 brd ff:ff:ff:ff:ff:ff
inet 10.100.33.53/20 brd 10.100.47.255 scope global cloudbr1
valid_lft forever preferred_lft forever
When I benchmark between these Linux bridges on both hosts:
Connecting to host 10.100.33.53, port 5201
[ 5] local 10.100.33.47 port 48908 connected to 10.100.33.53 port 5201
[ ID] Interval Transfer Bitrate Retr Cwnd
[ 5] 0.00-1.00 sec 102 MBytes 853 Mbits/sec 35 236 KBytes
[ 5] 1.00-2.00 sec 101 MBytes 846 Mbits/sec 57 236 KBytes
[ 5] 2.00-3.00 sec 101 MBytes 850 Mbits/sec 12 315 KBytes
[ 5] 3.00-4.00 sec 101 MBytes 846 Mbits/sec 23 157 KBytes
[ 5] 4.00-5.00 sec 101 MBytes 850 Mbits/sec 11 166 KBytes
[ 5] 5.00-6.00 sec 100 MBytes 842 Mbits/sec 41 236 KBytes
[ 5] 6.00-7.00 sec 101 MBytes 850 Mbits/sec 8 140 KBytes
[ 5] 7.00-8.00 sec 101 MBytes 846 Mbits/sec 24 306 KBytes
[ 5] 8.00-9.00 sec 101 MBytes 851 Mbits/sec 11 280 KBytes
[ 5] 9.00-10.00 sec 100 MBytes 842 Mbits/sec 64 253 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval Transfer Bitrate Retr
[ 5] 0.00-10.00 sec 1011 MBytes 848 Mbits/sec 286 sender
[ 5] 0.00-10.00 sec 1009 MBytes 847 Mbits/sec receiver
Here the performance drops to about 850Mb/s, with retransmits in every interval.
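To compare the two paths without reading the tables by eye, the iperf3 runs can be repeated with JSON output (`iperf3 -c <host> -J`) and summarized. The snippet below is only a minimal sketch: the function names are made up for illustration, and it assumes a TCP test whose report contains the usual `end.sum_sent` and `end.sum_received` sections.

```python
import json


def summarize_iperf3(report_json: str) -> dict:
    """Pull receiver throughput and sender retransmits out of `iperf3 -J` output.

    Hypothetical helper; assumes a TCP test report.
    """
    report = json.loads(report_json)
    end = report["end"]
    return {
        # Receiver-side rate, converted from bit/s to Gbit/s.
        "gbits_per_sec": end["sum_received"]["bits_per_second"] / 1e9,
        # Retransmits are only reported on the sender side.
        "retransmits": end["sum_sent"].get("retransmits", 0),
    }


def slowdown(fast: dict, slow: dict) -> float:
    """How many times slower the second path is than the first."""
    return fast["gbits_per_sec"] / slow["gbits_per_sec"]
```

Feeding it one report for the loopback addresses and one for the bridge addresses puts the throughput ratio and the retransmit counts side by side.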
What we have checked:
- MTU is 9216 on the underlay network and 1500 on the bridge (needed for VXLAN)
- VXLAN offloading is enabled on the network interfaces
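The two checks above can be written down as a small script. This is a sketch under stated assumptions, not a definitive test: the helper names are hypothetical, the header sizes are the standard VXLAN encapsulation overhead, and the feature names are the UDP tunnel segmentation flags as they usually appear in `ethtool -k` output (other drivers or kernels may list more).

```python
# Bytes the underlay has to carry on top of the inner IP packet.
INNER_ETHERNET = 14
VXLAN_HEADER = 8
UDP_HEADER = 8
OUTER_IP = {4: 20, 6: 40}  # outer header size per IP version

# Assumed feature names, as printed by `ethtool -k <interface>`.
TUNNEL_FEATURES = ("tx-udp_tnl-segmentation", "tx-udp_tnl-csum-segmentation")


def max_overlay_mtu(underlay_mtu: int, ip_version: int = 4) -> int:
    """Largest MTU usable on the VXLAN/bridge side for a given underlay MTU."""
    overhead = OUTER_IP[ip_version] + UDP_HEADER + VXLAN_HEADER + INNER_ETHERNET
    return underlay_mtu - overhead


def disabled_tunnel_offloads(ethtool_k_output: str) -> list:
    """Return the tunnel offload features that are not reported as 'on'."""
    states = {}
    for line in ethtool_k_output.splitlines():
        name, sep, value = line.strip().partition(":")
        fields = value.split()
        if sep and fields:
            states[name] = fields[0]  # "on" or "off", ignoring "[fixed]"
    return [feature for feature in TUNNEL_FEATURES if states.get(feature) != "on"]
```

With a 9216-byte underlay, `max_overlay_mtu(9216)` gives 9166 for an IPv4 underlay and `max_overlay_mtu(9216, 6)` gives 9146 for IPv6, so both 1500 and 9000 on the bridge fit. `disabled_tunnel_offloads` expects the text printed by `ethtool -k` for the physical underlay interface; an empty result means both features are reported as on.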
The weird thing is that VMs connected to the bridge reach about 8Gb/s, which seems normal given the overhead of virtio_net.
I am looking for pointers on where to look next. At the moment I am out of ideas.
Sources
This article follows the attribution requirements of Stack Overflow and is licensed under CC BY-SA 3.0.
Source: Stack Overflow