Message ID | 20220228114325.818294-1-thuth@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | tests/avocado: Cancel BootLinux tests in case there is no free port | expand |
Hi, Thomas, sorry for the late reply, I was in PTO. Just in case it is still needed: On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote: > The BootLinux tests are currently failing with an ugly python > stack trace on my RHEL8 system since they cannot get a free port > (likely due to the firewall settings on my system). Let's properly > check the return value of find_free_port() instead and cancel the > test gracefully if it cannot get a free port. > > Signed-off-by: Thomas Huth <thuth@redhat.com> > --- > Unfortunately, it still takes > 70 seconds for each and every > tests from tests/avocado/boot_linux.py to get canceled, so > tests/avocado/boot_linux.py still renders "make check-avocado" > for me pretty unusable... looking at the implementation of > find_free_port() in Avocado, I wonder whether there isn't a > better way to get a free port number in Python? Brute-forcing > all ports between 1024 and 65536 seems just quite cumbersome > to me... This is something that also bothers me with this method, and maybe we could get a free port using something like this: ``` with socket() as s: s.bind(('',0)) port = s.getsockname()[1] ``` I haven't benchmarked both solutions yet nor looked at socket module code, but I just created an issue[1] on Avocado's side so that we can evaluate alternatives. [1] - https://github.com/avocado-framework/avocado/issues/5273 > tests/avocado/avocado_qemu/__init__.py | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py > index 75063c0c30..9b056b5ce5 100644 > --- a/tests/avocado/avocado_qemu/__init__.py > +++ b/tests/avocado/avocado_qemu/__init__.py > @@ -603,6 +603,8 @@ def prepare_cloudinit(self, ssh_pubkey=None): > try: > cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso') > self.phone_home_port = network.find_free_port() > + if not self.phone_home_port: > + self.cancel('Failed to get a free port') > pubkey_content = None > if ssh_pubkey: > with open(ssh_pubkey) as pubkey: In any case, this LGTM. Reviewed-by: Beraldo Leal <bleal@redhat.com> -- Beraldo
On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote: > The BootLinux tests are currently failing with an ugly python > stack trace on my RHEL8 system since they cannot get a free port > (likely due to the firewall settings on my system). Let's properly > check the return value of find_free_port() instead and cancel the > test gracefully if it cannot get a free port. > > Signed-off-by: Thomas Huth <thuth@redhat.com> > --- > Unfortunately, it still takes > 70 seconds for each and every > tests from tests/avocado/boot_linux.py to get canceled, so > tests/avocado/boot_linux.py still renders "make check-avocado" > for me pretty unusable... looking at the implementation of > find_free_port() in Avocado, I wonder whether there isn't a > better way to get a free port number in Python? Brute-forcing > all ports between 1024 and 65536 seems just quite cumbersome > to me... Even in the worst case of testing every single port, for INET and INET6 and for STREAM and DGRAM sockets, that find_free_port port completes in a couple of seconds. The impl though is totally overkill, in checking for both INET and INET6 and STREAM and DGRAM. The code using the phone_home_port hardcodes 0.0.0.0 and TCP. So it would be sufficient to let the kernel tell you what port is free: import socket s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.bind(("0.0.0.0", 0)) freeport = s.getsockname()[1] s.close() return freeport This code is all inherantly racy though, because as designed it is checking for a port that's available and then later calls wait_for_phone_home which spins up a HTTP server listening on the port. The port can be used in between the check and use. This can be the case if running many things in parallel on the host. It would be better to spin up that server using kernel port auto-selection at the start eliminating the race entirely. Then just record the port that was allocated and use that when building thue cloudinit config for the guest. Regards, Daniel
On 07/03/2022 13.50, Daniel P. Berrangé wrote: > On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote: >> The BootLinux tests are currently failing with an ugly python >> stack trace on my RHEL8 system since they cannot get a free port >> (likely due to the firewall settings on my system). Let's properly >> check the return value of find_free_port() instead and cancel the >> test gracefully if it cannot get a free port. >> >> Signed-off-by: Thomas Huth <thuth@redhat.com> >> --- >> Unfortunately, it still takes > 70 seconds for each and every >> tests from tests/avocado/boot_linux.py to get canceled, so >> tests/avocado/boot_linux.py still renders "make check-avocado" >> for me pretty unusable... looking at the implementation of >> find_free_port() in Avocado, I wonder whether there isn't a >> better way to get a free port number in Python? Brute-forcing >> all ports between 1024 and 65536 seems just quite cumbersome >> to me... > > Even in the worst case of testing every single port, > for INET and INET6 and for STREAM and DGRAM sockets, > that find_free_port port completes in a couple of > seconds. Weird, on my system, the test runs for 70 seconds, just to finally discovered that there was no free port available. > This code is all inherantly racy though, because as > designed it is checking for a port that's available > and then later calls wait_for_phone_home which spins > up a HTTP server listening on the port. The port can > be used in between the check and use. This can be > the case if running many things in parallel on the > host. > > It would be better to spin up that server using > kernel port auto-selection at the start eliminating > the race entirely. Then just record the port that > was allocated and use that when building thue > cloudinit config for the guest. That sounds much better, indeed... now we just need a volunteer to fix it ;-) Thomas
On Mon, Mar 07, 2022 at 07:31:50PM +0100, Thomas Huth wrote: > On 07/03/2022 13.50, Daniel P. Berrangé wrote: > > On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote: > > > The BootLinux tests are currently failing with an ugly python > > > stack trace on my RHEL8 system since they cannot get a free port > > > (likely due to the firewall settings on my system). Let's properly > > > check the return value of find_free_port() instead and cancel the > > > test gracefully if it cannot get a free port. > > > > > > Signed-off-by: Thomas Huth <thuth@redhat.com> > > > --- > > > Unfortunately, it still takes > 70 seconds for each and every > > > tests from tests/avocado/boot_linux.py to get canceled, so > > > tests/avocado/boot_linux.py still renders "make check-avocado" > > > for me pretty unusable... looking at the implementation of > > > find_free_port() in Avocado, I wonder whether there isn't a > > > better way to get a free port number in Python? Brute-forcing > > > all ports between 1024 and 65536 seems just quite cumbersome > > > to me... > > > > Even in the worst case of testing every single port, > > for INET and INET6 and for STREAM and DGRAM sockets, > > that find_free_port port completes in a couple of > > seconds. > > Weird, on my system, the test runs for 70 seconds, just to finally > discovered that there was no free port available. Incidentally I'm really suprised you even hit the 'no free port' scenario. I've never seen that happen unless the machine was basically doomed due to something leaking open sockets by the 10's of 1000's. You mention firewall settings above, but I don't think that's relevant. The find_free_port() call is with no args, so it gets set to 'address=localhost' which means is_port_free takes the bind() codepath. The firewall has no interaction with the bind() codepath in the kernel AFAIK. Regards, Daniel
On 07/03/2022 19.48, Daniel P. Berrangé wrote: > On Mon, Mar 07, 2022 at 07:31:50PM +0100, Thomas Huth wrote: >> On 07/03/2022 13.50, Daniel P. Berrangé wrote: >>> On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote: >>>> The BootLinux tests are currently failing with an ugly python >>>> stack trace on my RHEL8 system since they cannot get a free port >>>> (likely due to the firewall settings on my system). Let's properly >>>> check the return value of find_free_port() instead and cancel the >>>> test gracefully if it cannot get a free port. >>>> >>>> Signed-off-by: Thomas Huth <thuth@redhat.com> >>>> --- >>>> Unfortunately, it still takes > 70 seconds for each and every >>>> tests from tests/avocado/boot_linux.py to get canceled, so >>>> tests/avocado/boot_linux.py still renders "make check-avocado" >>>> for me pretty unusable... looking at the implementation of >>>> find_free_port() in Avocado, I wonder whether there isn't a >>>> better way to get a free port number in Python? Brute-forcing >>>> all ports between 1024 and 65536 seems just quite cumbersome >>>> to me... >>> >>> Even in the worst case of testing every single port, >>> for INET and INET6 and for STREAM and DGRAM sockets, >>> that find_free_port port completes in a couple of >>> seconds. >> >> Weird, on my system, the test runs for 70 seconds, just to finally >> discovered that there was no free port available. > > Incidentally I'm really suprised you even hit the 'no free port' > scenario. I've never seen that happen unless the machine was > basically doomed due to something leaking open sockets by the > 10's of 1000's. > > You mention firewall settings above, but I don't think that's > relevant. The find_free_port() call is with no args, so it > gets set to 'address=localhost' which means is_port_free > takes the bind() codepath. The firewall has no interaction > with the bind() codepath in the kernel AFAIK. Yes, I've now had another closer look, and indeed, the firewall is not the problem here - but is_port_free() seems to be very buggy. The problem is that I do not have any IPv6 address configured on my system, and in that case the current code fails. See https://github.com/avocado-framework/avocado/issues/5273 ... I need this patch to fix it: diff --git a/avocado/utils/network/ports.py b/avocado/utils/network/ports.py --- a/avocado/utils/network/ports.py +++ b/avocado/utils/network/ports.py @@ -60,7 +60,7 @@ def is_port_free(port, address): if localhost: return False sock.close() - return True + return True finally: if sock is not None: sock.close() Thomas
Thomas Huth <thuth@redhat.com> writes: > The BootLinux tests are currently failing with an ugly python > stack trace on my RHEL8 system since they cannot get a free port > (likely due to the firewall settings on my system). Let's properly > check the return value of find_free_port() instead and cancel the > test gracefully if it cannot get a free port. > > Signed-off-by: Thomas Huth <thuth@redhat.com> > --- > Unfortunately, it still takes > 70 seconds for each and every > tests from tests/avocado/boot_linux.py to get canceled, so > tests/avocado/boot_linux.py still renders "make check-avocado" > for me pretty unusable... looking at the implementation of > find_free_port() in Avocado, I wonder whether there isn't a > better way to get a free port number in Python? Brute-forcing > all ports between 1024 and 65536 seems just quite cumbersome > to me... > > tests/avocado/avocado_qemu/__init__.py | 2 ++ > 1 file changed, 2 insertions(+) > LGTM, despite the root issue is being addressed in Avocado. Reviewed-by: Cleber Rosa <crosa@redhat.com>
diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py index 75063c0c30..9b056b5ce5 100644 --- a/tests/avocado/avocado_qemu/__init__.py +++ b/tests/avocado/avocado_qemu/__init__.py @@ -603,6 +603,8 @@ def prepare_cloudinit(self, ssh_pubkey=None): try: cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso') self.phone_home_port = network.find_free_port() + if not self.phone_home_port: + self.cancel('Failed to get a free port') pubkey_content = None if ssh_pubkey: with open(ssh_pubkey) as pubkey:
The BootLinux tests are currently failing with an ugly python stack trace on my RHEL8 system since they cannot get a free port (likely due to the firewall settings on my system). Let's properly check the return value of find_free_port() instead and cancel the test gracefully if it cannot get a free port. Signed-off-by: Thomas Huth <thuth@redhat.com> --- Unfortunately, it still takes > 70 seconds for each and every tests from tests/avocado/boot_linux.py to get canceled, so tests/avocado/boot_linux.py still renders "make check-avocado" for me pretty unusable... looking at the implementation of find_free_port() in Avocado, I wonder whether there isn't a better way to get a free port number in Python? Brute-forcing all ports between 1024 and 65536 seems just quite cumbersome to me... tests/avocado/avocado_qemu/__init__.py | 2 ++ 1 file changed, 2 insertions(+)