Message ID | 20250218195048.74692-3-kuba@kernel.org (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | selftests: drv-net: improve the queue test for XSK | expand |
On 02/18, Jakub Kicinski wrote: > We use wait_port_listen() extensively to wait for a process > we spawned to be ready. Not all processes will open listening > sockets. Add a method of explicitly waiting for a child to > be ready. Pass a FD to the spawned process and wait for it > to write a message to us. FD number is passed via KSFT_READY_FD > env variable. > > Make use of this method in the queues test to make it less flaky. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > .../selftests/drivers/net/xdp_helper.c | 22 ++++++- > tools/testing/selftests/drivers/net/queues.py | 46 ++++++--------- > tools/testing/selftests/net/lib/py/utils.py | 58 +++++++++++++++++-- > 3 files changed, 93 insertions(+), 33 deletions(-) > > diff --git a/tools/testing/selftests/drivers/net/xdp_helper.c b/tools/testing/selftests/drivers/net/xdp_helper.c > index cf06a88b830b..8f77da4f798f 100644 > --- a/tools/testing/selftests/drivers/net/xdp_helper.c > +++ b/tools/testing/selftests/drivers/net/xdp_helper.c > @@ -14,6 +14,25 @@ > #define UMEM_SZ (1U << 16) > #define NUM_DESC (UMEM_SZ / 2048) > > +/* Move this to a common header when reused! */ > +static void ksft_ready(void) > +{ > + const char msg[7] = "ready\n"; > + char *env_str; > + int fd; [..] > + env_str = getenv("KSFT_READY_FD"); > + if (!env_str) > + return; > + > + fd = atoi(env_str); > + if (!fd) > + return; optional nit: should these fail with error() instead of silent return? Should guarantee that the caller is doing everything correctly. (passing wait_init vs waiting for a port)
On Tue, 18 Feb 2025 13:10:14 -0800 Stanislav Fomichev wrote: > > + env_str = getenv("KSFT_READY_FD"); > > + if (!env_str) > > + return; > > + > > + fd = atoi(env_str); > > + if (!fd) > > + return; > > optional nit: should these fail with error() instead of silent return? > Should guarantee that the caller is doing everything correctly. > (passing wait_init vs waiting for a port) My thinking was that you may want to run the helper manually during development and then the env variable won't be set. Given that we currently don't expose the stdout/stderr if wait fails adding a print seemed a little performative.. We need to go back and provide better support for debug (/verbose output) at some stage.
On 02/18, Jakub Kicinski wrote: > On Tue, 18 Feb 2025 13:10:14 -0800 Stanislav Fomichev wrote: > > > + env_str = getenv("KSFT_READY_FD"); > > > + if (!env_str) > > > + return; > > > + > > > + fd = atoi(env_str); > > > + if (!fd) > > > + return; > > > > optional nit: should these fail with error() instead of silent return? > > Should guarantee that the caller is doing everything correctly. > > (passing wait_init vs waiting for a port) > > My thinking was that you may want to run the helper manually during > development and then the env variable won't be set. > > Given that we currently don't expose the stdout/stderr if wait fails > adding a print seemed a little performative.. We need to go back > and provide better support for debug (/verbose output) at some stage. Good point. In this case, we can fail only on the first case (!env_str); should still catch the wrong invocation and let you do KSFT_READY_FD= ./helper manual run. But agreed, that not worth the respin..
On Tue, Feb 18, 2025 at 11:50:46AM -0800, Jakub Kicinski wrote: > We use wait_port_listen() extensively to wait for a process > we spawned to be ready. Not all processes will open listening > sockets. Add a method of explicitly waiting for a child to > be ready. Pass a FD to the spawned process and wait for it > to write a message to us. FD number is passed via KSFT_READY_FD > env variable. > > Make use of this method in the queues test to make it less flaky. > > Signed-off-by: Jakub Kicinski <kuba@kernel.org> > --- > .../selftests/drivers/net/xdp_helper.c | 22 ++++++- > tools/testing/selftests/drivers/net/queues.py | 46 ++++++--------- > tools/testing/selftests/net/lib/py/utils.py | 58 +++++++++++++++++-- > 3 files changed, 93 insertions(+), 33 deletions(-) > > diff --git a/tools/testing/selftests/drivers/net/xdp_helper.c b/tools/testing/selftests/drivers/net/xdp_helper.c > index cf06a88b830b..8f77da4f798f 100644 > --- a/tools/testing/selftests/drivers/net/xdp_helper.c > +++ b/tools/testing/selftests/drivers/net/xdp_helper.c > @@ -14,6 +14,25 @@ > #define UMEM_SZ (1U << 16) > #define NUM_DESC (UMEM_SZ / 2048) > > +/* Move this to a common header when reused! */ > +static void ksft_ready(void) > +{ > + const char msg[7] = "ready\n"; > + char *env_str; > + int fd; > + > + env_str = getenv("KSFT_READY_FD"); > + if (!env_str) > + return; > + > + fd = atoi(env_str); > + if (!fd) > + return; > + > + write(fd, msg, sizeof(msg)); > + close(fd); > +} > + > /* this is a simple helper program that creates an XDP socket and does the > * minimum necessary to get bind() to succeed. > * > @@ -85,8 +104,7 @@ int main(int argc, char **argv) > return 1; > } > > - /* give the parent program some data when the socket is ready*/ > - fprintf(stdout, "%d\n", sock_fd); > + ksft_ready(); > > /* parent program will write a byte to stdin when its ready for this > * helper to exit > diff --git a/tools/testing/selftests/drivers/net/queues.py b/tools/testing/selftests/drivers/net/queues.py > index b6896a57a5fd..91e344d108ee 100755 > --- a/tools/testing/selftests/drivers/net/queues.py > +++ b/tools/testing/selftests/drivers/net/queues.py > @@ -5,13 +5,12 @@ from lib.py import ksft_disruptive, ksft_exit, ksft_run > from lib.py import ksft_eq, ksft_raises, KsftSkipEx, KsftFailEx > from lib.py import EthtoolFamily, NetdevFamily, NlError > from lib.py import NetDrvEnv > -from lib.py import cmd, defer, ip > +from lib.py import bkg, cmd, defer, ip > import errno > import glob > import os > import socket > import struct > -import subprocess > > def sys_get_queues(ifname, qtype='rx') -> int: > folders = glob.glob(f'/sys/class/net/{ifname}/queues/{qtype}-*') > @@ -25,37 +24,30 @@ import subprocess > return None > > def check_xdp(cfg, nl, xdp_queue_id=0) -> None: > - xdp = subprocess.Popen([cfg.rpath("xdp_helper"), f"{cfg.ifindex}", f"{xdp_queue_id}"], > - stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=1, > - text=True) > - defer(xdp.kill) > + with bkg(f'{cfg.rpath("xdp_helper")} {cfg.ifindex} {xdp_queue_id}', > + wait_init=3): > > - stdout, stderr = xdp.communicate(timeout=10) > - rx = tx = False > + rx = tx = False > > - if xdp.returncode == 255: > - raise KsftSkipEx('AF_XDP unsupported') > - elif xdp.returncode > 0: Removing this check causes a stack trace on my XDP-disabled kernel, whereas with the existing code it caused a skip. Maybe that's OK, though? The issue is that xdp_helper.c fails and exits with return -1 before the call to ksft_ready() which results in the following: # Exception| Traceback (most recent call last): # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/ksft.py", line 223, in ksft_run # Exception| case(*args) # Exception| File "/home/jdamato/code/net-next/./tools/testing/selftests/drivers/net/queues.py", line 27, in check_xsk # Exception| with bkg(f'{cfg.rpath("xdp_helper")} {cfg.ifindex} {xdp_queue_id}', # Exception| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/utils.py", line 108, in __init__ # Exception| super().__init__(comm, background=True, # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/utils.py", line 63, in __init__ # Exception| raise Exception("Did not receive ready message") # Exception| Exception: Did not receive ready message not ok 4 queues.check_xsk # Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0 I had originally modified the test so that if XDP is disabled in the kernel it would skip, but I think you mentioned in a previous thread that this was a "non-goal", IIRC ? No strong opinion on my side as to what the behavior should be when XDP is disabled, but wanted to mention this so that the behavior change was known. Separately: I retested this on a machine with XDP enabled, both with and without NETIF set and the test seems to hang because the helper is blocked on: read(STDIN_FILENO, &byte, 1); according to strace: strace: Process 14198 attached 21:50:02 read(0, So, I think this patch needs to be tweaked to write a byte to the helper so it exits (I assume before the defer was killing it?) or the helper needs to be modified in way?
On Tue, 18 Feb 2025 16:52:39 -0500 Joe Damato wrote: > Removing this check causes a stack trace on my XDP-disabled kernel, > whereas with the existing code it caused a skip. > > Maybe that's OK, though? > > The issue is that xdp_helper.c fails and exits with return -1 before > the call to ksft_ready() which results in the following: > > # Exception| Traceback (most recent call last): > # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/ksft.py", line 223, in ksft_run > # Exception| case(*args) > # Exception| File "/home/jdamato/code/net-next/./tools/testing/selftests/drivers/net/queues.py", line 27, in check_xsk > # Exception| with bkg(f'{cfg.rpath("xdp_helper")} {cfg.ifindex} {xdp_queue_id}', > # Exception| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/utils.py", line 108, in __init__ > # Exception| super().__init__(comm, background=True, > # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/utils.py", line 63, in __init__ > # Exception| raise Exception("Did not receive ready message") > # Exception| Exception: Did not receive ready message > not ok 4 queues.check_xsk > # Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0 > > I had originally modified the test so that if XDP is disabled in the > kernel it would skip, but I think you mentioned in a previous thread > that this was a "non-goal", IIRC ? > > No strong opinion on my side as to what the behavior should be when > XDP is disabled, but wanted to mention this so that the behavior > change was known. I thought of doing this: diff --git a/tools/testing/selftests/drivers/net/xdp_helper.c b/tools/testing/selftests/drivers/net/xdp_helper.c index 8f77da4f798f..8c34e8915fc4 100644 --- a/tools/testing/selftests/drivers/net/xdp_helper.c +++ b/tools/testing/selftests/drivers/net/xdp_helper.c @@ -53,7 +53,7 @@ int main(int argc, char **argv) int queue; char byte; - if (argc != 3) { + if (argc > 1 && argc != 3) { fprintf(stderr, "Usage: %s ifindex queue_id", argv[0]); return 1; } @@ -69,6 +69,13 @@ int main(int argc, char **argv) return 1; } + + if (argc == 1) { + printf("AF_XDP support detected\n"); + close(sock_fd); + return 0; + } + ifindex = atoi(argv[1]); queue = atoi(argv[2]); Then we can run the helper with no arguments, just to check if af_xdp is supported. If that returns 0 we go on, otherwise we print your nice error. LMK if that sounds good, assuming a respin is needed I can add that :) > Separately: I retested this on a machine with XDP enabled, both with > and without NETIF set and the test seems to hang because the helper > is blocked on: > > read(STDIN_FILENO, &byte, 1); > > according to strace: > > strace: Process 14198 attached > 21:50:02 read(0, > > So, I think this patch needs to be tweaked to write a byte to the > helper so it exits (I assume before the defer was killing it?) or > the helper needs to be modified in way? What Python version do you have? For me the xdp process doesn't wait at all. Running this under vng and Python 3.13 the read returns 0 immediately. Even if it doesn't we run bkg() with default params, so exit_wait=False init will set: self.terminate = not exit_wait and then __exit__ will do: return self.process(terminate=self.terminate, fail=self.check_fail) which does: if self.terminate: self.proc.terminate() so the helper should get a SIGINT, no? We shall find out if NIPA agrees with my local system at 4p.
On Tue, 18 Feb 2025 15:05:12 -0800 Jakub Kicinski wrote:
> We shall find out if NIPA agrees with my local system at 4p.
NIPA agrees with you, I'll take another look tomorrow.
On Tue, Feb 18, 2025 at 03:05:12PM -0800, Jakub Kicinski wrote: > On Tue, 18 Feb 2025 16:52:39 -0500 Joe Damato wrote: > > Removing this check causes a stack trace on my XDP-disabled kernel, > > whereas with the existing code it caused a skip. > > > > Maybe that's OK, though? > > > > The issue is that xdp_helper.c fails and exits with return -1 before > > the call to ksft_ready() which results in the following: > > > > # Exception| Traceback (most recent call last): > > # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/ksft.py", line 223, in ksft_run > > # Exception| case(*args) > > # Exception| File "/home/jdamato/code/net-next/./tools/testing/selftests/drivers/net/queues.py", line 27, in check_xsk > > # Exception| with bkg(f'{cfg.rpath("xdp_helper")} {cfg.ifindex} {xdp_queue_id}', > > # Exception| ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > > # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/utils.py", line 108, in __init__ > > # Exception| super().__init__(comm, background=True, > > # Exception| File "/home/jdamato/code/net-next/tools/testing/selftests/net/lib/py/utils.py", line 63, in __init__ > > # Exception| raise Exception("Did not receive ready message") > > # Exception| Exception: Did not receive ready message > > not ok 4 queues.check_xsk > > # Totals: pass:3 fail:1 xfail:0 xpass:0 skip:0 error:0 > > > > I had originally modified the test so that if XDP is disabled in the > > kernel it would skip, but I think you mentioned in a previous thread > > that this was a "non-goal", IIRC ? > > > > No strong opinion on my side as to what the behavior should be when > > XDP is disabled, but wanted to mention this so that the behavior > > change was known. > > I thought of doing this: > > diff --git a/tools/testing/selftests/drivers/net/xdp_helper.c b/tools/testing/selftests/drivers/net/xdp_helper.c > index 8f77da4f798f..8c34e8915fc4 100644 > --- a/tools/testing/selftests/drivers/net/xdp_helper.c > +++ b/tools/testing/selftests/drivers/net/xdp_helper.c > @@ -53,7 +53,7 @@ int main(int argc, char **argv) > int queue; > char byte; > > - if (argc != 3) { > + if (argc > 1 && argc != 3) { > fprintf(stderr, "Usage: %s ifindex queue_id", argv[0]); > return 1; > } > @@ -69,6 +69,13 @@ int main(int argc, char **argv) > return 1; > } > > + > + if (argc == 1) { > + printf("AF_XDP support detected\n"); > + close(sock_fd); > + return 0; > + } > + > ifindex = atoi(argv[1]); > queue = atoi(argv[2]); > > > Then we can run the helper with no arguments, just to check if af_xdp > is supported. If that returns 0 we go on, otherwise we print your nice > error. > > LMK if that sounds good, assuming a respin is needed I can add that :) That seems to fine; if you do decide to go this route in a re-spin, would you mind also adding a "\n" in the fprintf after "ifindex queue_id" ? Sorry I missed that on my initial implementation. That missing \n was mentioned by Kurt in another thread: https://lore.kernel.org/netdev/878qq22xk3.fsf@kurt.kurt.home/T/#m4d0778b9e849bc72064074105182f4c84ba55eb2 > > Separately: I retested this on a machine with XDP enabled, both with > > and without NETIF set and the test seems to hang because the helper > > is blocked on: > > > > read(STDIN_FILENO, &byte, 1); > > > > according to strace: > > > > strace: Process 14198 attached > > 21:50:02 read(0, > > > > So, I think this patch needs to be tweaked to write a byte to the > > helper so it exits (I assume before the defer was killing it?) or > > the helper needs to be modified in way? > > What Python version do you have? Python 3.12.3 (via Ubuntu 24.04.1 LTS). I can re-test with a different version using pyenv if you'd like? Are there docs which mention which python version tests should be compatible with? If so, could you pass along a link? Sorry if I missed that. > > For me the xdp process doesn't wait at all. Running this under vng > and Python 3.13 the read returns 0 immediately. Interesting, so something changed between 3.12.3 and 3.13, I suppose. > Even if it doesn't we run bkg() with default params, so exit_wait=False > init will set: > > self.terminate = not exit_wait > > and then __exit__ will do: > > return self.process(terminate=self.terminate, fail=self.check_fail) > > which does: > > if self.terminate: > self.proc.terminate() > > so the helper should get a SIGINT, no? (Minor nit, the python docs say terminate delivers a SIGTERM, but please continue reading) I'm not a python programmer but did try a bit to debug this. Looking at strace on my system, it seems that the script forks twice and the SIGTERM is sent to the wrong pid? I suppose it forks twice because the helper is invoked via /bin/sh: 18:27:15 vfork(strace: Process 448303 attached <unfinished ...> So, pid 448303 is the first forked process, which does an execve: [pid 448303] 18:27:15 execve("/bin/sh", ["/bin/sh", "-c", "/home/jdamato/code/net-next/tools/testing/selftests/drivers/net/xdp_helper 5 0"], 0x7f661511fad0 /* 26 vars */ <unfinished ...> So /bin/sh runs and that then forks again to run the helper: [pid 448278] 18:27:15 <... vfork resumed>) = 448303 [pid 448303] 18:27:15 vfork(strace: Process 448304 attached <unfinished ...> [pid 448304] 18:27:15 execve("/home/jdamato/code/net-next/tools/testing/selftests/drivers/net/xdp_helper", ["/home/jdamato/code/net-next/tools/testing/selftests/drivers/net/xdp_helper", "5", "0"], 0x55faf4f81918 /* 26 vars */ <unfinished ...> So pid 448304 is the process doing the execve to run xdp_helper and would need to be the one getting the SIGTERM/SIGINT We can see it does the read: [pid 448304] 18:27:15 read(0, <unfinished ...> However, later the python script does the terminate flow you've described in your message but sends the signal to the /bin/sh that was forked off: [pid 448278] 18:27:15 kill(448303, SIGTERM) = 0 [...] [pid 448303] 18:27:15 +++ killed by SIGTERM +++ But pid 448304 is xdp_helper, which is still running and should be the one to get the TERM. I have no idea why this would be different on your system vs mine. Maybe something changed with Python between Python versions? > We shall find out if NIPA agrees with my local system at 4p. Sorry for the noob question, but is there a NIPA url or something I can look at to see if this worked / if future tests I submit work?
On Tue, Feb 18, 2025 at 05:37:39PM -0800, Jakub Kicinski wrote: > On Tue, 18 Feb 2025 15:05:12 -0800 Jakub Kicinski wrote: > > We shall find out if NIPA agrees with my local system at 4p. > > NIPA agrees with you, I'll take another look tomorrow. I think I debugged why it happens on my system, hopefully you see that message before you go through your own debug cycle :)
On Wed, 19 Feb 2025 13:39:51 -0500 Joe Damato wrote: > On Tue, Feb 18, 2025 at 03:05:12PM -0800, Jakub Kicinski wrote: > > On Tue, 18 Feb 2025 16:52:39 -0500 Joe Damato wrote: > > Then we can run the helper with no arguments, just to check if af_xdp > > is supported. If that returns 0 we go on, otherwise we print your nice > > error. > > > > LMK if that sounds good, assuming a respin is needed I can add that :) > > That seems to fine; if you do decide to go this route in a re-spin, > would you mind also adding a "\n" in the fprintf after "ifindex > queue_id" ? Sorry I missed that on my initial implementation. > > That missing \n was mentioned by Kurt in another thread: Will do! > > > Separately: I retested this on a machine with XDP enabled, both with > > > and without NETIF set and the test seems to hang because the helper > > > is blocked on: > > > > > > read(STDIN_FILENO, &byte, 1); > > > > > > according to strace: > > > > > > strace: Process 14198 attached > > > 21:50:02 read(0, > > > > > > So, I think this patch needs to be tweaked to write a byte to the > > > helper so it exits (I assume before the defer was killing it?) or > > > the helper needs to be modified in way? > > > > What Python version do you have? > > Python 3.12.3 (via Ubuntu 24.04.1 LTS). > > I can re-test with a different version using pyenv if you'd like? > > Are there docs which mention which python version tests should be > compatible with? If so, could you pass along a link? Sorry if I > missed that. You're good, sorry, tests are supposed to run on any non-EOL version of python. <snip> > [pid 448278] 18:27:15 kill(448303, SIGTERM) = 0 > [...] > [pid 448303] 18:27:15 +++ killed by SIGTERM +++ > > But pid 448304 is xdp_helper, which is still running and should be > the one to get the TERM. Very interesting. I dug deeper into this, and it turns out its shell dependent. I'm guessing you're using one of the cool shells, I use bash. bash does a direct exec for "sh -c X", other shells fork first. I'll add a warning in bkg() for combining shell=True and terminate=True. > I have no idea why this would be different on your system vs mine. > Maybe something changed with Python between Python versions? More digging still necessary here, as NIPA also runs on bash. So your problem is different than NIPA's. NIPA runs: make -C tools/testing/selftests TARGETS="drivers/net" \ TEST_PROGS=queues.py TEST_GEN_PROGS="" run_tests which runs thru a layer of perl for output prefixing: tools/testing/selftests/kselftest/prefix.pl which in turn send a SIGTTIN when we call read(), and hangs the helper. > > We shall find out if NIPA agrees with my local system at 4p. > > Sorry for the noob question, but is there a NIPA url or something I > can look at to see if this worked / if future tests I submit work? https://netdev.bots.linux.dev/status.html With the disclaimer that we discourage people from looking at it. It tests everything on the list combined, we can't support the corporate "try until CI is green" development model. Not that I'd accuse you of such practices :)
On Wed, Feb 19, 2025 at 02:48:44PM -0800, Jakub Kicinski wrote: > On Wed, 19 Feb 2025 13:39:51 -0500 Joe Damato wrote: > > On Tue, Feb 18, 2025 at 03:05:12PM -0800, Jakub Kicinski wrote: > > > On Tue, 18 Feb 2025 16:52:39 -0500 Joe Damato wrote: [...] > <snip> > > [pid 448278] 18:27:15 kill(448303, SIGTERM) = 0 > > [...] > > [pid 448303] 18:27:15 +++ killed by SIGTERM +++ > > > > But pid 448304 is xdp_helper, which is still running and should be > > the one to get the TERM. > > Very interesting. I dug deeper into this, and it turns out its shell > dependent. I'm guessing you're using one of the cool shells, I use > bash. bash does a direct exec for "sh -c X", other shells fork first. I am using bash, as well. The version comes with Ubuntu 24.04.1: GNU bash, version 5.2.21(1)-release (x86_64-pc-linux-gnu) I wonder if it's something about the environment that causes bash to act this on my machine and NIPA but not yours? > I'll add a warning in bkg() for combining shell=True and terminate=True. > > > I have no idea why this would be different on your system vs mine. > > Maybe something changed with Python between Python versions? > > More digging still necessary here, as NIPA also runs on bash. > So your problem is different than NIPA's. > NIPA runs: > > make -C tools/testing/selftests TARGETS="drivers/net" \ > TEST_PROGS=queues.py TEST_GEN_PROGS="" run_tests > > which runs thru a layer of perl for output prefixing: > > tools/testing/selftests/kselftest/prefix.pl > > which in turn send a SIGTTIN when we call read(), and hangs the helper. > > > > We shall find out if NIPA agrees with my local system at 4p. > > > > Sorry for the noob question, but is there a NIPA url or something I > > can look at to see if this worked / if future tests I submit work? > > https://netdev.bots.linux.dev/status.html > > With the disclaimer that we discourage people from looking at it. > It tests everything on the list combined, we can't support the > corporate "try until CI is green" development model. Not that > I'd accuse you of such practices :) Yea I wasn't sure what the right process is for selftests; the ones I've hacked on or written I've run locally until they passed, but I suppose it is possible that in cases like this upstream CI will fail for some reason or another.
diff --git a/tools/testing/selftests/drivers/net/xdp_helper.c b/tools/testing/selftests/drivers/net/xdp_helper.c index cf06a88b830b..8f77da4f798f 100644 --- a/tools/testing/selftests/drivers/net/xdp_helper.c +++ b/tools/testing/selftests/drivers/net/xdp_helper.c @@ -14,6 +14,25 @@ #define UMEM_SZ (1U << 16) #define NUM_DESC (UMEM_SZ / 2048) +/* Move this to a common header when reused! */ +static void ksft_ready(void) +{ + const char msg[7] = "ready\n"; + char *env_str; + int fd; + + env_str = getenv("KSFT_READY_FD"); + if (!env_str) + return; + + fd = atoi(env_str); + if (!fd) + return; + + write(fd, msg, sizeof(msg)); + close(fd); +} + /* this is a simple helper program that creates an XDP socket and does the * minimum necessary to get bind() to succeed. * @@ -85,8 +104,7 @@ int main(int argc, char **argv) return 1; } - /* give the parent program some data when the socket is ready*/ - fprintf(stdout, "%d\n", sock_fd); + ksft_ready(); /* parent program will write a byte to stdin when its ready for this * helper to exit diff --git a/tools/testing/selftests/drivers/net/queues.py b/tools/testing/selftests/drivers/net/queues.py index b6896a57a5fd..91e344d108ee 100755 --- a/tools/testing/selftests/drivers/net/queues.py +++ b/tools/testing/selftests/drivers/net/queues.py @@ -5,13 +5,12 @@ from lib.py import ksft_disruptive, ksft_exit, ksft_run from lib.py import ksft_eq, ksft_raises, KsftSkipEx, KsftFailEx from lib.py import EthtoolFamily, NetdevFamily, NlError from lib.py import NetDrvEnv -from lib.py import cmd, defer, ip +from lib.py import bkg, cmd, defer, ip import errno import glob import os import socket import struct -import subprocess def sys_get_queues(ifname, qtype='rx') -> int: folders = glob.glob(f'/sys/class/net/{ifname}/queues/{qtype}-*') @@ -25,37 +24,30 @@ import subprocess return None def check_xdp(cfg, nl, xdp_queue_id=0) -> None: - xdp = subprocess.Popen([cfg.rpath("xdp_helper"), f"{cfg.ifindex}", f"{xdp_queue_id}"], - stdin=subprocess.PIPE, stdout=subprocess.PIPE, bufsize=1, - text=True) - defer(xdp.kill) + with bkg(f'{cfg.rpath("xdp_helper")} {cfg.ifindex} {xdp_queue_id}', + wait_init=3): - stdout, stderr = xdp.communicate(timeout=10) - rx = tx = False + rx = tx = False - if xdp.returncode == 255: - raise KsftSkipEx('AF_XDP unsupported') - elif xdp.returncode > 0: - raise KsftFailEx('unable to create AF_XDP socket') + queues = nl.queue_get({'ifindex': cfg.ifindex}, dump=True) + if not queues: + raise KsftSkipEx("Netlink reports no queues") - queues = nl.queue_get({'ifindex': cfg.ifindex}, dump=True) - if not queues: - raise KsftSkipEx("Netlink reports no queues") + for q in queues: + if q['id'] == 0: + if q['type'] == 'rx': + rx = True + if q['type'] == 'tx': + tx = True - for q in queues: - if q['id'] == 0: - if q['type'] == 'rx': - rx = True - if q['type'] == 'tx': - tx = True + ksft_eq(q['xsk'], {}) + else: + if 'xsk' in q: + _fail("Check failed: xsk attribute set.") - ksft_eq(q['xsk'], {}) - else: - if 'xsk' in q: - _fail("Check failed: xsk attribute set.") + ksft_eq(rx, True) + ksft_eq(tx, True) - ksft_eq(rx, True) - ksft_eq(tx, True) def get_queues(cfg, nl) -> None: snl = NetdevFamily(recv_size=4096) diff --git a/tools/testing/selftests/net/lib/py/utils.py b/tools/testing/selftests/net/lib/py/utils.py index 9e3bcddcf3e8..efb9b8fc1447 100644 --- a/tools/testing/selftests/net/lib/py/utils.py +++ b/tools/testing/selftests/net/lib/py/utils.py @@ -2,8 +2,10 @@ import errno import json as _json +import os import random import re +import select import socket import subprocess import time @@ -15,8 +17,22 @@ import time self.cmd = cmd_obj +def fd_read_timeout(fd, timeout): + rlist, _, _ = select.select([fd], [], [], timeout) + if rlist: + return os.read(fd, 1024) + else: + raise TimeoutError("Timeout waiting for fd read") + + class cmd: - def __init__(self, comm, shell=True, fail=True, ns=None, background=False, host=None, timeout=5): + """ + Execute a command on local or remote host. + + Use bkg() instead to run a command in the background. + """ + def __init__(self, comm, shell=True, fail=True, ns=None, background=False, + host=None, timeout=5, wait_init=None): if ns: comm = f'ip netns exec {ns} ' + comm @@ -28,8 +44,23 @@ import time if host: self.proc = host.cmd(comm) else: + # wait_init lets us wait for the background process to fully start, + # we pass an FD to the child process, and wait for it to write back + pass_fds = () + env = os.environ.copy() + if wait_init is not None: + rfd, wfd = os.pipe() + pass_fds = (wfd, ) + env["KSFT_READY_FD"] = str(wfd) self.proc = subprocess.Popen(comm, shell=shell, stdout=subprocess.PIPE, - stderr=subprocess.PIPE) + stderr=subprocess.PIPE, pass_fds=pass_fds, + env=env) + if wait_init is not None: + os.close(wfd) + msg = fd_read_timeout(rfd, wait_init) + os.close(rfd) + if not msg: + raise Exception("Did not receive ready message") if not background: self.process(terminate=False, fail=fail, timeout=timeout) @@ -54,10 +85,29 @@ import time class bkg(cmd): + """ + Run a command in the background. + + Examples usage: + + Run a command on remote host, and wait for it to finish. + This is usually paired with wait_port_listen() to make sure + the command has initialized: + + with bkg("socat ...", exit_wait=True, host=cfg.remote) as nc: + ... + + Run a command and expect it to let us know that it's ready + by writing to a special file decriptor passed via KSFT_READY_FD. + Command will be terminated when we exit the context manager: + + with bkg("my_binary", wait_init=5): + """ def __init__(self, comm, shell=True, fail=None, ns=None, host=None, - exit_wait=False): + exit_wait=False, wait_init=None): super().__init__(comm, background=True, - shell=shell, fail=fail, ns=ns, host=host) + shell=shell, fail=fail, ns=ns, host=host, + wait_init=wait_init) self.terminate = not exit_wait self.check_fail = fail
We use wait_port_listen() extensively to wait for a process we spawned to be ready. Not all processes will open listening sockets. Add a method of explicitly waiting for a child to be ready. Pass a FD to the spawned process and wait for it to write a message to us. FD number is passed via KSFT_READY_FD env variable. Make use of this method in the queues test to make it less flaky. Signed-off-by: Jakub Kicinski <kuba@kernel.org> --- .../selftests/drivers/net/xdp_helper.c | 22 ++++++- tools/testing/selftests/drivers/net/queues.py | 46 ++++++--------- tools/testing/selftests/net/lib/py/utils.py | 58 +++++++++++++++++-- 3 files changed, 93 insertions(+), 33 deletions(-)