Message ID | 20240201080516.3585867-1-idosch@nvidia.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] selftests: net: Fix bridge backup port test flakiness | expand |
On Thu, 2024-02-01 at 10:05 +0200, Ido Schimmel wrote: > The test toggles the carrier of a bridge port in order to test the > bridge backup port feature. > > Due to the linkwatch delayed work the carrier change is not always > reflected fast enough to the bridge driver and packets are not forwarded > as the test expects, resulting in failures [1]. > > Fix by adding a one second delay after a carrier change in places where > a packet is sent immediately after the carrier change. What about adding an helper alike wait_local_port_listen(), checking for bridge link status in short intervals, to likely reduce the overall wait time? Thanks, Paolo
On Thu, Feb 01, 2024 at 10:34:52AM +0100, Paolo Abeni wrote: > What about adding an helper alike wait_local_port_listen(), checking > for bridge link status in short intervals, to likely reduce the overall > wait time? What about the below? diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh index 70a7d87ba2d2..1b3f89e2b86e 100755 --- a/tools/testing/selftests/net/test_bridge_backup_port.sh +++ b/tools/testing/selftests/net/test_bridge_backup_port.sh @@ -124,6 +124,16 @@ tc_check_packets() [[ $pkts == $count ]] } +bridge_link_check() +{ + local ns=$1; shift + local dev=$1; shift + local state=$1; shift + + bridge -n $ns -d -j link show dev $dev | \ + jq -e ".[][\"state\"] == \"$state\"" &> /dev/null +} + ################################################################################ # Setup @@ -259,6 +269,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -268,6 +279,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on" # Configure vx0 as the backup port of swp1 and check that packets are @@ -284,6 +296,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -293,6 +306,7 @@ backup_port() log_test $? 0 "Forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -314,6 +328,7 @@ backup_port() log_test $? 0 "No forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -369,6 +384,7 @@ backup_nhid() log_test $? 0 "No forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -382,6 +398,7 @@ backup_nhid() log_test $? 0 "Forwarding using VXLAN FDB entry" run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on" # Configure nexthop ID 10 as the backup nexthop ID of swp1 and check @@ -398,6 +415,7 @@ backup_nhid() log_test $? 0 "No forwarding out of vx0" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -411,6 +429,7 @@ backup_nhid() log_test $? 0 "No forwarding using VXLAN FDB entry" run_cmd "ip -n $sw1 link set dev swp1 carrier on" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 forwarding log_test $? 0 "swp1 carrier on" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -441,6 +460,7 @@ backup_nhid() log_test $? 0 "No forwarding using VXLAN FDB entry" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -497,6 +517,7 @@ backup_nhid_invalid() log_test $? 0 "Valid nexthop as backup nexthop" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled log_test $? 0 "swp1 carrier off" run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" @@ -604,7 +625,9 @@ backup_nhid_ping() run_cmd "bridge -n $sw2 link set dev swp1 backup_nhid 10" run_cmd "ip -n $sw1 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw1 swp1 disabled run_cmd "ip -n $sw2 link set dev swp1 carrier off" + busywait $BUSYWAIT_TIMEOUT bridge_link_check $sw2 swp1 disabled run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66" log_test $? 0 "Ping with backup nexthop ID"
On Thu, 2024-02-01 at 13:16 +0200, Ido Schimmel wrote: > On Thu, Feb 01, 2024 at 10:34:52AM +0100, Paolo Abeni wrote: > > What about adding an helper alike wait_local_port_listen(), checking > > for bridge link status in short intervals, to likely reduce the overall > > wait time? > > What about the below? > > diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh > index 70a7d87ba2d2..1b3f89e2b86e 100755 > --- a/tools/testing/selftests/net/test_bridge_backup_port.sh > +++ b/tools/testing/selftests/net/test_bridge_backup_port.sh > @@ -124,6 +124,16 @@ tc_check_packets() > [[ $pkts == $count ]] > } > > +bridge_link_check() > +{ > + local ns=$1; shift > + local dev=$1; shift > + local state=$1; shift > + > + bridge -n $ns -d -j link show dev $dev | \ > + jq -e ".[][\"state\"] == \"$state\"" &> /dev/null > +} I was wondering more about a sleeping loop, something alike: wait_bridge_link() { for i in $(seq 10); do if bridge_link_check $1 $2 $3; then break fi sleep 0.1 done } but no strong preference Cheers, Paolo
diff --git a/tools/testing/selftests/net/test_bridge_backup_port.sh b/tools/testing/selftests/net/test_bridge_backup_port.sh index 70a7d87ba2d2..92078b56ae0a 100755 --- a/tools/testing/selftests/net/test_bridge_backup_port.sh +++ b/tools/testing/selftests/net/test_bridge_backup_port.sh @@ -260,6 +260,7 @@ backup_port() run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 1 @@ -285,6 +286,7 @@ backup_port() run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 2 @@ -294,6 +296,7 @@ backup_port() run_cmd "ip -n $sw1 link set dev swp1 carrier on" log_test $? 0 "swp1 carrier on" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 3 @@ -315,6 +318,7 @@ backup_port() run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 4 @@ -370,6 +374,7 @@ backup_nhid() run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 1 @@ -399,6 +404,7 @@ backup_nhid() run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 2 @@ -412,6 +418,7 @@ backup_nhid() run_cmd "ip -n $sw1 link set dev swp1 carrier on" log_test $? 0 "swp1 carrier on" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 3 @@ -442,6 +449,7 @@ backup_nhid() run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 4 @@ -498,6 +506,7 @@ backup_nhid_invalid() run_cmd "ip -n $sw1 link set dev swp1 carrier off" log_test $? 0 "swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 mausezahn br0.10 -a $smac -b $dmac -A 198.51.100.1 -B 198.51.100.2 -t ip -p 100 -q -c 1" tc_check_packets $sw1 "dev swp1 egress" 101 0 @@ -605,6 +614,7 @@ backup_nhid_ping() run_cmd "ip -n $sw1 link set dev swp1 carrier off" run_cmd "ip -n $sw2 link set dev swp1 carrier off" + sleep 1 run_cmd "ip netns exec $sw1 ping -i 0.1 -c 10 -w $PING_TIMEOUT 192.0.2.66" log_test $? 0 "Ping with backup nexthop ID"
The test toggles the carrier of a bridge port in order to test the bridge backup port feature. Due to the linkwatch delayed work the carrier change is not always reflected fast enough to the bridge driver and packets are not forwarded as the test expects, resulting in failures [1]. Fix by adding a one second delay after a carrier change in places where a packet is sent immediately after the carrier change. [1] # Backup port # ----------- [...] # TEST: swp1 carrier off [ OK ] # TEST: No forwarding out of swp1 [FAIL] [ 641.995910] br0: port 1(swp1) entered disabled state # TEST: No forwarding out of vx0 [ OK ] Fixes: b408453053fb ("selftests: net: Add bridge backup port and backup nexthop ID test") Signed-off-by: Ido Schimmel <idosch@nvidia.com> --- Jakub, targeting at net-next to see if it helps the CI, but can be applied to net. I'm unable to reproduce the failure locally. --- tools/testing/selftests/net/test_bridge_backup_port.sh | 10 ++++++++++ 1 file changed, 10 insertions(+)