
[net] selftests/net: packetdrill: increase timing tolerance in debug mode

Message ID: 20240919124412.3014326-1-willemdebruijn.kernel@gmail.com
State: Accepted
Commit: 72ef07554c5dcabb0053a147c4fd221a8e39bcfd
Series: [net] selftests/net: packetdrill: increase timing tolerance in debug mode

Commit Message

Willem de Bruijn Sept. 19, 2024, 12:43 p.m. UTC
From: Willem de Bruijn <willemb@google.com>

Some packetdrill tests are flaky in debug mode. As discussed, increase
tolerance.

We have been doing this for debug builds outside ksft too.

Previous setting was 10000. 50 manual runs in virtme-ng showed two
failures that needed 12000. To be on the safe side, increase to 14000.

Link: https://lore.kernel.org/netdev/Zuhhe4-MQHd3EkfN@mini-arch/
Fixes: 1e42f73fd3c2 ("selftests/net: packetdrill: import tcp/zerocopy")
Reported-by: Stanislav Fomichev <sdf@fomichev.me>
Signed-off-by: Willem de Bruijn <willemb@google.com>
---
 tools/testing/selftests/net/packetdrill/ksft_runner.sh | 9 +++++++--
 1 file changed, 7 insertions(+), 2 deletions(-)
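The flakiness estimate above comes from repeated manual runs. A loop along these lines is one way to reproduce that kind of measurement. `count_failures` is a hypothetical helper, not part of the selftests, and the script name in the usage line is a placeholder.

```shell
#!/bin/bash
# Hypothetical helper (not part of the kernel selftests): run a command
# N times and print how many of those runs exited with a non-zero status.
count_failures() {
	local runs="$1"
	shift
	local failures=0
	local i
	for ((i = 0; i < runs; i++)); do
		# Discard output; only the exit status matters here.
		"$@" > /dev/null 2>&1 || failures=$((failures + 1))
	done
	echo "$failures"
}
```

For example, `count_failures 50 unshare -n packetdrill --tolerance_usecs=12000 example.pkt` would report how many of 50 runs of the placeholder script `example.pkt` still fail at a given tolerance.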

Comments

Simon Horman Sept. 19, 2024, 1:55 p.m. UTC | #1
On Thu, Sep 19, 2024 at 08:43:42AM -0400, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Some packetdrill tests are flaky in debug mode. As discussed, increase
> tolerance.
> 
> We have been doing this for debug builds outside ksft too.
> 
> Previous setting was 10000. 50 manual runs in virtme-ng showed two
> failures that needed 12000. To be on the safe side, increase to 14000.
> 
> Link: https://lore.kernel.org/netdev/Zuhhe4-MQHd3EkfN@mini-arch/
> Fixes: 1e42f73fd3c2 ("selftests/net: packetdrill: import tcp/zerocopy")
> Reported-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Reviewed-by: Simon Horman <horms@kernel.org>
Stanislav Fomichev Sept. 19, 2024, 9:04 p.m. UTC | #2
On 09/19, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Some packetdrill tests are flaky in debug mode. As discussed, increase
> tolerance.
> 
> We have been doing this for debug builds outside ksft too.
> 
> Previous setting was 10000. 50 manual runs in virtme-ng showed two
> failures that needed 12000. To be on the safe side, increase to 14000.
> 
> Link: https://lore.kernel.org/netdev/Zuhhe4-MQHd3EkfN@mini-arch/
> Fixes: 1e42f73fd3c2 ("selftests/net: packetdrill: import tcp/zerocopy")
> Reported-by: Stanislav Fomichev <sdf@fomichev.me>
> Signed-off-by: Willem de Bruijn <willemb@google.com>

Acked-by: Stanislav Fomichev <sdf@fomichev.me>

Thanks! Should this go to net-next instead? (Not sure what the bar
is for selftest fixes in 'net'.)
Matthieu Baerts (NGI0) Sept. 19, 2024, 10:03 p.m. UTC | #3
Hi Willem,

On 19/09/2024 14:43, Willem de Bruijn wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Some packetdrill tests are flaky in debug mode. As discussed, increase
> tolerance.

Thank you for the patch!

> We have been doing this for debug builds outside ksft too.
> 
> Previous setting was 10000. 50 manual runs in virtme-ng showed two
> failures that needed 12000. To be on the safe side, increase to 14000.

So far (in 3 runs), it looks like 14000 is enough. But I guess it is
still a bit too early to conclude that.

https://netdev.bots.linux.dev/contest.html?executor=vmksft-packetdrill-dbg

(Your patch has been introduced in the net-next-2024-09-19--15-00 branch.)


Personally, I would not be shocked if the tolerance was even 10x higher
to cope with this very slow environment, where I think we care less
about timing. But if less works, that's good:

Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>


Just one question for later: in the GitHub repo, some tests set the
tolerance in the .pkt file. Will that be OK for these tests? I guess so,
because the max they set is 10k, but I just want to double-check.
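One way to double-check the "max they set is 10k" premise is to scan the scripts for per-file overrides. This sketch assumes such overrides appear as lines starting with `--tolerance_usecs=N` at the top of a .pkt script; `max_pkt_tolerance` is a hypothetical helper, and it does not say whether an in-file value overrides the command-line one.

```shell
#!/bin/bash
# Hypothetical helper: print the largest --tolerance_usecs=N value found in
# the given .pkt files, or 0 if none of them set one.
# Assumption: overrides are written as lines beginning with "--tolerance_usecs=".
max_pkt_tolerance() {
	local max=0
	local val
	while read -r val; do
		# Force base 10 so values with leading zeros are not read as octal.
		if (( 10#$val > max )); then
			max=$((10#$val))
		fi
	done < <(grep -hoE -- '^--tolerance_usecs=[0-9]+' "$@" 2>/dev/null | cut -d= -f2)
	echo "$max"
}
```

Running `max_pkt_tolerance path/to/tests/*.pkt` (placeholder path) and comparing the result with 14000 would answer the question above.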


Note that it is now easier to spot other errors :) For example:


https://netdev-3.bots.linux.dev/vmksft-packetdrill-dbg/results/779660/22-tcp-zerocopy-epoll-exclusive-pkt/stdout


Cheers,
Matt
Paolo Abeni Sept. 26, 2024, 9:03 a.m. UTC | #4
On 9/19/24 23:04, Stanislav Fomichev wrote:
> On 09/19, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>
>>
>> Some packetdrill tests are flaky in debug mode. As discussed, increase
>> tolerance.
>>
>> We have been doing this for debug builds outside ksft too.
>>
>> Previous setting was 10000. 50 manual runs in virtme-ng showed two
>> failures that needed 12000. To be on the safe side, increase to 14000.
>>
>> Link: https://lore.kernel.org/netdev/Zuhhe4-MQHd3EkfN@mini-arch/
>> Fixes: 1e42f73fd3c2 ("selftests/net: packetdrill: import tcp/zerocopy")
>> Reported-by: Stanislav Fomichev <sdf@fomichev.me>
>> Signed-off-by: Willem de Bruijn <willemb@google.com>
> 
> Acked-by: Stanislav Fomichev <sdf@fomichev.me>
> 
> Thanks! Should this go to net-next instead? (Not sure what the bar
> is for selftest fixes in 'net'.)

FTR, we want this kind of fix in net, to reach self-test stability in
both trees ASAP.

Cheers,

Paolo
patchwork-bot+netdevbpf@kernel.org Sept. 26, 2024, 9:10 a.m. UTC | #5
Hello:

This patch was applied to netdev/net.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Thu, 19 Sep 2024 08:43:42 -0400 you wrote:
> From: Willem de Bruijn <willemb@google.com>
> 
> Some packetdrill tests are flaky in debug mode. As discussed, increase
> tolerance.
> 
> We have been doing this for debug builds outside ksft too.
> 
> [...]

Here is the summary with links:
  - [net] selftests/net: packetdrill: increase timing tolerance in debug mode
    https://git.kernel.org/netdev/net/c/72ef07554c5d

You are awesome, thank you!
Matthieu Baerts (NGI0) Sept. 26, 2024, 9:12 a.m. UTC | #6
Hi Willem,

On 20/09/2024 00:03, Matthieu Baerts wrote:
> On 19/09/2024 14:43, Willem de Bruijn wrote:
>> From: Willem de Bruijn <willemb@google.com>

(...)

>> We have been doing this for debug builds outside ksft too.
>>
>> Previous setting was 10000. 50 manual runs in virtme-ng showed two
>> failures that needed 12000. To be on the safe side, increase to 14000.
> 
> So far (in 3 runs), it looks like 14000 is enough. But I guess it is
> still a bit too early to conclude that.
> 
> https://netdev.bots.linux.dev/contest.html?executor=vmksft-packetdrill-dbg
> 
> (Your patch has been introduced in the net-next-2024-09-19--15-00 branch.)

One week after the introduction of this patch and >50 builds, the
results look good, with only one timing-related issue:

https://netdev-3.bots.linux.dev/vmksft-packetdrill-dbg/results/782181/1-tcp-slow-start-slow-start-after-win-update-pkt/stdout

And it passed after a retry.

https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=packetdrill

Cheers,
Matt
Willem de Bruijn Sept. 26, 2024, 9:28 a.m. UTC | #7
Matthieu Baerts wrote:
> Hi Willem,
> 
> On 20/09/2024 00:03, Matthieu Baerts wrote:
> > On 19/09/2024 14:43, Willem de Bruijn wrote:
> >> From: Willem de Bruijn <willemb@google.com>
> 
> (...)
> 
> >> We have been doing this for debug builds outside ksft too.
> >>
> >> Previous setting was 10000. 50 manual runs in virtme-ng showed two
> >> failures that needed 12000. To be on the safe side, increase to 14000.
> > 
> > So far (in 3 runs), it looks like 14000 is enough. But I guess it is
> > still a bit too early to conclude that.
> > 
> > https://netdev.bots.linux.dev/contest.html?executor=vmksft-packetdrill-dbg
> > 
> > (Your patch has been introduced in the net-next-2024-09-19--15-00 branch.)
>
> One week after the introduction of this patch and >50 builds, the
> results look good, with only one timing-related issue:
> 
> https://netdev-3.bots.linux.dev/vmksft-packetdrill-dbg/results/782181/1-tcp-slow-start-slow-start-after-win-update-pkt/stdout
> 
> And it passed after a retry.
> 
> https://netdev.bots.linux.dev/flakes.html?min-flip=0&tn-needle=packetdrill

Thanks Matthieu.

Patch

diff --git a/tools/testing/selftests/net/packetdrill/ksft_runner.sh b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
index 7478c0c0c9aa..4071c133f29e 100755
--- a/tools/testing/selftests/net/packetdrill/ksft_runner.sh
+++ b/tools/testing/selftests/net/packetdrill/ksft_runner.sh
@@ -30,12 +30,17 @@  if [ -z "$(which packetdrill)" ]; then
 	exit "$KSFT_SKIP"
 fi
 
+declare -a optargs
+if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
+	optargs+=('--tolerance_usecs=14000')
+fi
+
 ktap_print_header
 ktap_set_plan 2
 
-unshare -n packetdrill ${ipv4_args[@]} $(basename $script) > /dev/null \
+unshare -n packetdrill ${ipv4_args[@]} ${optargs[@]} $(basename $script) > /dev/null \
 	&& ktap_test_pass "ipv4" || ktap_test_fail "ipv4"
-unshare -n packetdrill ${ipv6_args[@]} $(basename $script) > /dev/null \
+unshare -n packetdrill ${ipv6_args[@]} ${optargs[@]} $(basename $script) > /dev/null \
 	&& ktap_test_pass "ipv6" || ktap_test_fail "ipv6"
 
 ktap_finished
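The conditional option handling added by the patch can be exercised in isolation. The sketch below reimplements it as a hypothetical `build_optargs` function (not part of ksft_runner.sh) so the effect of `KSFT_MACHINE_SLOW` is easy to check.

```shell
#!/bin/bash
# Hypothetical standalone version of the patch's logic: add a wider packetdrill
# timing tolerance only when KSFT_MACHINE_SLOW is set to a non-empty value.
build_optargs() {
	local -a optargs=()
	if [[ -n "${KSFT_MACHINE_SLOW}" ]]; then
		optargs+=('--tolerance_usecs=14000')
	fi
	# Print one argument per line; an empty array yields no visible output.
	printf '%s\n' "${optargs[@]}"
}
```

In the runner itself, quoting the expansion as `"${optargs[@]}"` would be the more defensive idiom. The patch's unquoted form works because the single option contains no whitespace, and an empty array expands to no words either way.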