diff mbox series

[net] selftests: bonding: Add more missing config options

Message ID 20240116154926.202164-1-bpoirier@nvidia.com (mailing list archive)
State Accepted
Commit dd2d40acdbb2b9e6bcddd5424b0e00c1760ecf26
Delegated to: Netdev Maintainers
Headers show
Series [net] selftests: bonding: Add more missing config options | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net
netdev/ynl success SINGLE THREAD; Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/cc_maintainers success CCed 0 of 0 maintainers
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 10 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-01-18--03-00 (tests: 403)

Commit Message

Benjamin Poirier Jan. 16, 2024, 3:49 p.m. UTC
As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
build configs"), add more networking-specific config options which are
needed for bonding tests.

For testing, I used the minimal config generated by virtme-ng and I added
the options in the config file. All bonding tests passed.

Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6
Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options
Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
---
 tools/testing/selftests/drivers/net/bonding/config | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Jakub Kicinski Jan. 16, 2024, 6:44 p.m. UTC | #1
On Tue, 16 Jan 2024 10:49:26 -0500 Benjamin Poirier wrote:
> As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
> build configs"), add more networking-specific config options which are
> needed for bonding tests.
> 
> For testing, I used the minimal config generated by virtme-ng and I added
> the options in the config file. All bonding tests passed.
> 
> Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6
> Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options
> Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon
> Suggested-by: Jakub Kicinski <kuba@kernel.org>
> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>

With this applied the only remaining bonding test which fails in our CI
is bond-options [1], good progress! :) Looks like it doesn't finish in
time:

not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds

The tests run in a VM without HW virtualization support. Any opinions
about bumping the timeout for bonding? If we enable KASAN etc. things
will get even slower.

[1]
https://netdev-2.bots.linux.dev/vmksft-bonding/results/423800/7-bond-options-sh
Jay Vosburgh Jan. 16, 2024, 7:03 p.m. UTC | #2
Jakub Kicinski <kuba@kernel.org> wrote:

>On Tue, 16 Jan 2024 10:49:26 -0500 Benjamin Poirier wrote:
>> As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
>> build configs"), add more networking-specific config options which are
>> needed for bonding tests.
>> 
>> For testing, I used the minimal config generated by virtme-ng and I added
>> the options in the config file. All bonding tests passed.
>> 
>> Fixes: bbb774d921e2 ("net: Add tests for bonding and team address list management") # for ipv6
>> Fixes: 6cbe791c0f4e ("kselftest: bonding: add num_grat_arp test") # for tc options
>> Fixes: 222c94ec0ad4 ("selftests: bonding: add tests for ether type changes") # for nlmon
>> Suggested-by: Jakub Kicinski <kuba@kernel.org>
>> Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com>
>
>With this applied the only remaining bonding test which fails in our CI
>is bond-options [1], good progress! :) Looks like it doesn't finish in
>time:
>
>not ok 7 selftests: drivers/net/bonding: bond_options.sh # TIMEOUT 120 seconds
>
>The tests run in a VM without HW virtualization support. Any opinions
>about bumping the timeout for bonding? If we enable KASAN etc. things
>will get even slower.

	Reading the grat_arp test, it looks like it has long sleep times
built into it:

garp_test()
{
[...]
	exp_num=$(echo "${param}" | cut -f6 -d ' ')
	sleep $((exp_num + 2))

num_grat_arp()
{
	local val
	for val in 10 20 30 50; do
		garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"

	If I'm reading it right, this will sleep for 12, 22, 32 and 52
seconds for the passes through the loop in num_grat_arp(), so that would
be 118 seconds just for that.

	-J

>[1]
>https://netdev-2.bots.linux.dev/vmksft-bonding/results/423800/7-bond-options-sh

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Jakub Kicinski Jan. 16, 2024, 7:20 p.m. UTC | #3
On Tue, 16 Jan 2024 11:03:30 -0800 Jay Vosburgh wrote:
> 	If I'm reading it right, this will sleep for 12, 22, 32 and 52
> seconds for the passes through the loop in num_grat_arp(), so that would
> be 118 seconds just for that.

Hah, that's really tight if the default timeout is 120.
Let me send a patch to bump it to 300..
Benjamin Poirier Jan. 16, 2024, 7:21 p.m. UTC | #4
On 2024-01-16 11:20 -0800, Jakub Kicinski wrote:
> On Tue, 16 Jan 2024 11:03:30 -0800 Jay Vosburgh wrote:
> > 	If I'm reading it right, this will sleep for 12, 22, 32 and 52
> > seconds for the passes through the loop in num_grat_arp(), so that would
> > be 118 seconds just for that.
> 
> Hah, that's really tight if the default timeout is 120.
> Let me send a patch to bump it to 300..

I also ran into that timeout when using `make run_tests` so I executed
the test script directly:

# time ./bond_options.sh
TEST: prio (active-backup miimon primary_reselect 0)                [ OK ]
TEST: prio (active-backup miimon primary_reselect 1)                [ OK ]
TEST: prio (active-backup miimon primary_reselect 2)                [ OK ]
TEST: prio (active-backup arp_ip_target primary_reselect 0)         [ OK ]
TEST: prio (active-backup arp_ip_target primary_reselect 1)         [ OK ]
TEST: prio (active-backup arp_ip_target primary_reselect 2)         [ OK ]
TEST: prio (active-backup ns_ip6_target primary_reselect 0)         [ OK ]
TEST: prio (active-backup ns_ip6_target primary_reselect 1)         [ OK ]
TEST: prio (active-backup ns_ip6_target primary_reselect 2)         [ OK ]
TEST: prio (balance-tlb miimon primary_reselect 0)                  [ OK ]
TEST: prio (balance-tlb miimon primary_reselect 1)                  [ OK ]
TEST: prio (balance-tlb miimon primary_reselect 2)                  [ OK ]
TEST: prio (balance-tlb arp_ip_target primary_reselect 0)           [ OK ]
TEST: prio (balance-tlb arp_ip_target primary_reselect 1)           [ OK ]
TEST: prio (balance-tlb arp_ip_target primary_reselect 2)           [ OK ]
TEST: prio (balance-tlb ns_ip6_target primary_reselect 0)           [ OK ]
TEST: prio (balance-tlb ns_ip6_target primary_reselect 1)           [ OK ]
TEST: prio (balance-tlb ns_ip6_target primary_reselect 2)           [ OK ]
TEST: prio (balance-alb miimon primary_reselect 0)                  [ OK ]
TEST: prio (balance-alb miimon primary_reselect 1)                  [ OK ]
TEST: prio (balance-alb miimon primary_reselect 2)                  [ OK ]
TEST: prio (balance-alb arp_ip_target primary_reselect 0)           [ OK ]
TEST: prio (balance-alb arp_ip_target primary_reselect 1)           [ OK ]
TEST: prio (balance-alb arp_ip_target primary_reselect 2)           [ OK ]
TEST: prio (balance-alb ns_ip6_target primary_reselect 0)           [ OK ]
TEST: prio (balance-alb ns_ip6_target primary_reselect 1)           [ OK ]
TEST: prio (balance-alb ns_ip6_target primary_reselect 2)           [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 0)     [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 1)     [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 2)     [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 3)     [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 4)     [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 5)     [ OK ]
TEST: arp_validate (active-backup arp_ip_target arp_validate 6)     [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 0)     [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 1)     [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 2)     [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 3)     [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 4)     [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 5)     [ OK ]
TEST: arp_validate (active-backup ns_ip6_target arp_validate 6)     [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 10)           [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 20)           [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 30)           [ OK ]
TEST: num_grat_arp (active-backup miimon num_grat_arp 50)           [ OK ]

real    13m35.065s
user    0m1.657s
sys     0m27.918s

The test is not cpu bound; as Jay pointed out, it spends most of its
time sleeping.
Jakub Kicinski Jan. 16, 2024, 7:29 p.m. UTC | #5
On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote:
> real    13m35.065s
> user    0m1.657s
> sys     0m27.918s
> 
> The test is not cpu bound; as Jay pointed out, it spends most of its
> time sleeping.

Ugh, so it does multiple iterations of 118 sec?

Could you send a patch to bump the timeout to 900 or 1200 in this case?
Jay Vosburgh Jan. 16, 2024, 7:43 p.m. UTC | #6
Jakub Kicinski <kuba@kernel.org> wrote:

>On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote:
>> real    13m35.065s
>> user    0m1.657s
>> sys     0m27.918s
>> 
>> The test is not cpu bound; as Jay pointed out, it spends most of its
>> time sleeping.
>
>Ugh, so it does multiple iterations of 118 sec?
>
>Could you send a patch to bump the timeout to 900 or 1200 in this case?

	We could also lower the interval or number of notifications;
right now "peer_notif_delay 1000" puts 1 second between the ARPs in the
num_grat_arp() test.  I'm not sure why that value was chosen, but the
peer_notify_delay is rounded to units of the miimon interval, and in
this test miimon is 100 msec.

	I haven't tested this at all, but something like

diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
index c54d1697f439..95eb77aebc3c 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
@@ -277,7 +277,7 @@ garp_test()
 	ip -n ${s_ns} link set ${active_slave} down
 
 	exp_num=$(echo "${param}" | cut -f6 -d ' ')
-	sleep $((exp_num + 2))
+	sleep $((exp_num / 5 + 2))
 
 	active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
 
@@ -297,7 +297,7 @@ num_grat_arp()
 {
 	local val
 	for val in 10 20 30 50; do
-		garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"
+		garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 200"
 		log_test "num_grat_arp" "active-backup miimon num_grat_arp $val"
 	done
 }

	could substantially reduce the time to run the test.  It's kind
of icky with magic numbers, but that could be cleaned up.

	-J

---
	-Jay Vosburgh, jay.vosburgh@canonical.com
Benjamin Poirier Jan. 16, 2024, 7:47 p.m. UTC | #7
On 2024-01-16 11:29 -0800, Jakub Kicinski wrote:
> On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote:
> > real    13m35.065s
> > user    0m1.657s
> > sys     0m27.918s
> > 
> > The test is not cpu bound; as Jay pointed out, it spends most of its
> > time sleeping.
> 
> Ugh, so it does multiple iterations of 118 sec?

There are other test functions in the script which include a lot of
sleeping.

> Could you send a patch to bump the timeout to 900 or 1200 in this case?

Sure but I'd like to give a chance for Hangbin to reply first. Would the
test be just as good if it was shortened by removing some cases or
reducing the time intervals? Or is increasing the timeout the best
approach?
Hangbin Liu Jan. 17, 2024, 3:15 a.m. UTC | #8
On Tue, Jan 16, 2024 at 02:47:46PM -0500, Benjamin Poirier wrote:
> On 2024-01-16 11:29 -0800, Jakub Kicinski wrote:
> > On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote:
> > > real    13m35.065s
> > > user    0m1.657s
> > > sys     0m27.918s
> > > 
> > > The test is not cpu bound; as Jay pointed out, it spends most of its
> > > time sleeping.
> > 
> > Ugh, so it does multiple iterations of 118 sec?
> 
> There are other test functions in the script which include a lot of
> sleeping.

The arp_validate_test need to check the mii_status, which sleep too much time.
Maybe we can use busywait to save more time.

> 
> > Could you send a patch to bump the timeout to 900 or 1200 in this case?
> 
> Sure but I'd like to give a chance for Hangbin to reply first. Would the
> test be just as good if it was shortened by removing some cases or
> reducing the time intervals? Or is increasing the timeout the best
> approach?

The purpose of grat_arp is testing commit 9949e2efb54e ("bonding: fix
send_peer_notif overflow"). As the send_peer_notif was defined to u8,
to overflow it, we need to

send_peer_notif = num_peer_notif * peer_notif_delay = num_grat_arp * peer_notify_delay / miimon > 255
  (kernel)           (kernel parameter)                   (user parameter)

e.g. 30 (num_grat_arp) * 1000 (peer_notify_delay) / 100 (miimon) > 255.

Which need 30s to complete sending garp messages. To save the testing time,
the only way is reduce the miimon number. Something like
30 (num_grat_arp) * 500 (peer_notify_delay) / 50 (miimon) > 255.

To save more time, we can remove the 50 num_grat_arp testing. The patch would
like

diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
index c54d1697f439..20c4d862c436 100755
--- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh
+++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
@@ -277,7 +277,7 @@ garp_test()
        ip -n ${s_ns} link set ${active_slave} down

        exp_num=$(echo "${param}" | cut -f6 -d ' ')
-       sleep $((exp_num + 2))
+       sleep $((exp_num / 2 + 2))

        active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")

@@ -296,8 +296,8 @@ garp_test()
 num_grat_arp()
 {
        local val
-       for val in 10 20 30 50; do
-               garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"
+       for val in 10 20 30; do
+               garp_test "mode active-backup miimon 50 num_grat_arp $val peer_notify_delay 500"
                log_test "num_grat_arp" "active-backup miimon num_grat_arp $val"
        done
 }

With this we can save 100s.

Thanks
Hangbin
Benjamin Poirier Jan. 18, 2024, 12:16 a.m. UTC | #9
On 2024-01-17 11:15 +0800, Hangbin Liu wrote:
> On Tue, Jan 16, 2024 at 02:47:46PM -0500, Benjamin Poirier wrote:
> > On 2024-01-16 11:29 -0800, Jakub Kicinski wrote:
> > > On Tue, 16 Jan 2024 14:21:51 -0500 Benjamin Poirier wrote:
> > > > real    13m35.065s
> > > > user    0m1.657s
> > > > sys     0m27.918s
> > > > 
> > > > The test is not cpu bound; as Jay pointed out, it spends most of its
> > > > time sleeping.
> > > 
> > > Ugh, so it does multiple iterations of 118 sec?
> > 
> > There are other test functions in the script which include a lot of
> > sleeping.
> 
> The arp_validate_test need to check the mii_status, which sleep too much time.
> Maybe we can use busywait to save more time.
> 
> > 
> > > Could you send a patch to bump the timeout to 900 or 1200 in this case?
> > 
> > Sure but I'd like to give a chance for Hangbin to reply first. Would the
> > test be just as good if it was shortened by removing some cases or
> > reducing the time intervals? Or is increasing the timeout the best
> > approach?
> 
> The purpose of grat_arp is testing commit 9949e2efb54e ("bonding: fix
> send_peer_notif overflow"). As the send_peer_notif was defined to u8,
> to overflow it, we need to
> 
> send_peer_notif = num_peer_notif * peer_notif_delay = num_grat_arp * peer_notify_delay / miimon > 255
>   (kernel)           (kernel parameter)                   (user parameter)
> 
> e.g. 30 (num_grat_arp) * 1000 (peer_notify_delay) / 100 (miimon) > 255.
> 
> Which need 30s to complete sending garp messages. To save the testing time,
> the only way is reduce the miimon number. Something like
> 30 (num_grat_arp) * 500 (peer_notify_delay) / 50 (miimon) > 255.
> 
> To save more time, we can remove the 50 num_grat_arp testing. The patch would
> like
> 
> diff --git a/tools/testing/selftests/drivers/net/bonding/bond_options.sh b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
> index c54d1697f439..20c4d862c436 100755
> --- a/tools/testing/selftests/drivers/net/bonding/bond_options.sh
> +++ b/tools/testing/selftests/drivers/net/bonding/bond_options.sh
> @@ -277,7 +277,7 @@ garp_test()
>         ip -n ${s_ns} link set ${active_slave} down
> 
>         exp_num=$(echo "${param}" | cut -f6 -d ' ')
> -       sleep $((exp_num + 2))
> +       sleep $((exp_num / 2 + 2))
> 
>         active_slave=$(cmd_jq "ip -n ${s_ns} -d -j link show bond0" ".[].linkinfo.info_data.active_slave")
> 
> @@ -296,8 +296,8 @@ garp_test()
>  num_grat_arp()
>  {
>         local val
> -       for val in 10 20 30 50; do
> -               garp_test "mode active-backup miimon 100 num_grat_arp $val peer_notify_delay 1000"
> +       for val in 10 20 30; do
> +               garp_test "mode active-backup miimon 50 num_grat_arp $val peer_notify_delay 500"
>                 log_test "num_grat_arp" "active-backup miimon num_grat_arp $val"
>         done
>  }
> 
> With this we can save 100s.
> 

Thanks for looking into it. This change got the runtime down from 13m35s
to 12m7s on the same system I used to test yesterday. That's a start but
since it's still well above the current timeout of 120s, I sent a patch
to increase the timeout to 1200s.
patchwork-bot+netdevbpf@kernel.org Jan. 18, 2024, 11:10 a.m. UTC | #10
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Tue, 16 Jan 2024 10:49:26 -0500 you wrote:
> As a followup to commit 03fb8565c880 ("selftests: bonding: add missing
> build configs"), add more networking-specific config options which are
> needed for bonding tests.
> 
> For testing, I used the minimal config generated by virtme-ng and I added
> the options in the config file. All bonding tests passed.
> 
> [...]

Here is the summary with links:
  - [net] selftests: bonding: Add more missing config options
    https://git.kernel.org/netdev/net/c/dd2d40acdbb2

You are awesome, thank you!
diff mbox series

Patch

diff --git a/tools/testing/selftests/drivers/net/bonding/config b/tools/testing/selftests/drivers/net/bonding/config
index f85b16fc5128..899d7fb6ea8e 100644
--- a/tools/testing/selftests/drivers/net/bonding/config
+++ b/tools/testing/selftests/drivers/net/bonding/config
@@ -1,5 +1,10 @@ 
 CONFIG_BONDING=y
 CONFIG_BRIDGE=y
 CONFIG_DUMMY=y
+CONFIG_IPV6=y
 CONFIG_MACVLAN=y
+CONFIG_NET_ACT_GACT=y
+CONFIG_NET_CLS_FLOWER=y
+CONFIG_NET_SCH_INGRESS=y
+CONFIG_NLMON=y
 CONFIG_VETH=y