diff mbox series

tests/avocado: Cancel BootLinux tests in case there is no free port

Message ID 20220228114325.818294-1-thuth@redhat.com (mailing list archive)
State New, archived
Headers show
Series tests/avocado: Cancel BootLinux tests in case there is no free port | expand

Commit Message

Thomas Huth Feb. 28, 2022, 11:43 a.m. UTC
The BootLinux tests are currently failing with an ugly python
stack trace on my RHEL8 system since they cannot get a free port
(likely due to the firewall settings on my system). Let's properly
check the return value of find_free_port() instead and cancel the
test gracefully if it cannot get a free port.

Signed-off-by: Thomas Huth <thuth@redhat.com>
---
 Unfortunately, it still takes > 70 seconds for each and every
 tests from tests/avocado/boot_linux.py to get canceled, so
 tests/avocado/boot_linux.py still renders "make check-avocado"
 for me pretty unusable... looking at the implementation of
 find_free_port() in Avocado, I wonder whether there isn't a
 better way to get a free port number in Python? Brute-forcing
 all ports between 1024 and 65536 seems just quite cumbersome
 to me...

 tests/avocado/avocado_qemu/__init__.py | 2 ++
 1 file changed, 2 insertions(+)

Comments

Beraldo Leal March 7, 2022, 12:31 p.m. UTC | #1
Hi, Thomas, sorry for the late reply, I was in PTO.

Just in case it is still needed:

On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote:
> The BootLinux tests are currently failing with an ugly python
> stack trace on my RHEL8 system since they cannot get a free port
> (likely due to the firewall settings on my system). Let's properly
> check the return value of find_free_port() instead and cancel the
> test gracefully if it cannot get a free port.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  Unfortunately, it still takes > 70 seconds for each and every
>  tests from tests/avocado/boot_linux.py to get canceled, so
>  tests/avocado/boot_linux.py still renders "make check-avocado"
>  for me pretty unusable... looking at the implementation of
>  find_free_port() in Avocado, I wonder whether there isn't a
>  better way to get a free port number in Python? Brute-forcing
>  all ports between 1024 and 65536 seems just quite cumbersome
>  to me...

This is something that also bothers me with this method, and maybe we
could get a free port using something like this:

```
with socket() as s: 
    s.bind(('',0)) 
    port = s.getsockname()[1]
```  

I haven't benchmarked both solutions yet nor looked at socket module
code, but I just created an issue[1] on Avocado's side so that we can
evaluate alternatives.

[1] - https://github.com/avocado-framework/avocado/issues/5273

>  tests/avocado/avocado_qemu/__init__.py | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
> index 75063c0c30..9b056b5ce5 100644
> --- a/tests/avocado/avocado_qemu/__init__.py
> +++ b/tests/avocado/avocado_qemu/__init__.py
> @@ -603,6 +603,8 @@ def prepare_cloudinit(self, ssh_pubkey=None):
>          try:
>              cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
>              self.phone_home_port = network.find_free_port()
> +            if not self.phone_home_port:
> +                self.cancel('Failed to get a free port')
>              pubkey_content = None
>              if ssh_pubkey:
>                  with open(ssh_pubkey) as pubkey:

In any case, this LGTM.

Reviewed-by: Beraldo Leal <bleal@redhat.com>

--
Beraldo
Daniel P. Berrangé March 7, 2022, 12:50 p.m. UTC | #2
On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote:
> The BootLinux tests are currently failing with an ugly python
> stack trace on my RHEL8 system since they cannot get a free port
> (likely due to the firewall settings on my system). Let's properly
> check the return value of find_free_port() instead and cancel the
> test gracefully if it cannot get a free port.
> 
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  Unfortunately, it still takes > 70 seconds for each and every
>  tests from tests/avocado/boot_linux.py to get canceled, so
>  tests/avocado/boot_linux.py still renders "make check-avocado"
>  for me pretty unusable... looking at the implementation of
>  find_free_port() in Avocado, I wonder whether there isn't a
>  better way to get a free port number in Python? Brute-forcing
>  all ports between 1024 and 65536 seems just quite cumbersome
>  to me...

Even in the worst case of testing every single port,
for INET and INET6 and for STREAM and DGRAM sockets,
that find_free_port port completes in a couple of
seconds. 

The impl though is totally overkill, in checking
for both INET and INET6 and STREAM and DGRAM.

The code using the phone_home_port hardcodes
0.0.0.0 and TCP. So it would be sufficient to
let the kernel tell you what port is free:

   import socket
   s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
   s.bind(("0.0.0.0", 0))

   freeport = s.getsockname()[1]
   s.close()
   return freeport


This code is all inherantly racy though, because as
designed it is checking for a port that's available
and then later calls wait_for_phone_home which spins
up a HTTP server listening on the port. The port can
be used in between the check and use. This can be
the case if running many things in parallel on the
host.

It would be better to spin up that server using
kernel port auto-selection at the start eliminating
the race entirely.  Then just record the port that
was allocated and use that when building thue
cloudinit config for the guest.

Regards,
Daniel
Thomas Huth March 7, 2022, 6:31 p.m. UTC | #3
On 07/03/2022 13.50, Daniel P. Berrangé wrote:
> On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote:
>> The BootLinux tests are currently failing with an ugly python
>> stack trace on my RHEL8 system since they cannot get a free port
>> (likely due to the firewall settings on my system). Let's properly
>> check the return value of find_free_port() instead and cancel the
>> test gracefully if it cannot get a free port.
>>
>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>> ---
>>   Unfortunately, it still takes > 70 seconds for each and every
>>   tests from tests/avocado/boot_linux.py to get canceled, so
>>   tests/avocado/boot_linux.py still renders "make check-avocado"
>>   for me pretty unusable... looking at the implementation of
>>   find_free_port() in Avocado, I wonder whether there isn't a
>>   better way to get a free port number in Python? Brute-forcing
>>   all ports between 1024 and 65536 seems just quite cumbersome
>>   to me...
> 
> Even in the worst case of testing every single port,
> for INET and INET6 and for STREAM and DGRAM sockets,
> that find_free_port port completes in a couple of
> seconds.

Weird, on my system, the test runs for 70 seconds, just to finally 
discovered that there was no free port available.

> This code is all inherantly racy though, because as
> designed it is checking for a port that's available
> and then later calls wait_for_phone_home which spins
> up a HTTP server listening on the port. The port can
> be used in between the check and use. This can be
> the case if running many things in parallel on the
> host.
> 
> It would be better to spin up that server using
> kernel port auto-selection at the start eliminating
> the race entirely.  Then just record the port that
> was allocated and use that when building thue
> cloudinit config for the guest.

That sounds much better, indeed... now we just need a volunteer to fix it ;-)

  Thomas
Daniel P. Berrangé March 7, 2022, 6:48 p.m. UTC | #4
On Mon, Mar 07, 2022 at 07:31:50PM +0100, Thomas Huth wrote:
> On 07/03/2022 13.50, Daniel P. Berrangé wrote:
> > On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote:
> > > The BootLinux tests are currently failing with an ugly python
> > > stack trace on my RHEL8 system since they cannot get a free port
> > > (likely due to the firewall settings on my system). Let's properly
> > > check the return value of find_free_port() instead and cancel the
> > > test gracefully if it cannot get a free port.
> > > 
> > > Signed-off-by: Thomas Huth <thuth@redhat.com>
> > > ---
> > >   Unfortunately, it still takes > 70 seconds for each and every
> > >   tests from tests/avocado/boot_linux.py to get canceled, so
> > >   tests/avocado/boot_linux.py still renders "make check-avocado"
> > >   for me pretty unusable... looking at the implementation of
> > >   find_free_port() in Avocado, I wonder whether there isn't a
> > >   better way to get a free port number in Python? Brute-forcing
> > >   all ports between 1024 and 65536 seems just quite cumbersome
> > >   to me...
> > 
> > Even in the worst case of testing every single port,
> > for INET and INET6 and for STREAM and DGRAM sockets,
> > that find_free_port port completes in a couple of
> > seconds.
> 
> Weird, on my system, the test runs for 70 seconds, just to finally
> discovered that there was no free port available.

Incidentally I'm really suprised you even hit the 'no free port'
scenario. I've never seen that happen unless the machine was
basically doomed due to something leaking open sockets by the
10's of 1000's.

You mention firewall settings above, but I don't think that's
relevant. The find_free_port() call is with no args, so it
gets set to 'address=localhost' which means is_port_free
takes the bind() codepath. The firewall has no interaction
with the bind() codepath in the kernel AFAIK. 

Regards,
Daniel
Thomas Huth March 8, 2022, 7:13 p.m. UTC | #5
On 07/03/2022 19.48, Daniel P. Berrangé wrote:
> On Mon, Mar 07, 2022 at 07:31:50PM +0100, Thomas Huth wrote:
>> On 07/03/2022 13.50, Daniel P. Berrangé wrote:
>>> On Mon, Feb 28, 2022 at 12:43:25PM +0100, Thomas Huth wrote:
>>>> The BootLinux tests are currently failing with an ugly python
>>>> stack trace on my RHEL8 system since they cannot get a free port
>>>> (likely due to the firewall settings on my system). Let's properly
>>>> check the return value of find_free_port() instead and cancel the
>>>> test gracefully if it cannot get a free port.
>>>>
>>>> Signed-off-by: Thomas Huth <thuth@redhat.com>
>>>> ---
>>>>    Unfortunately, it still takes > 70 seconds for each and every
>>>>    tests from tests/avocado/boot_linux.py to get canceled, so
>>>>    tests/avocado/boot_linux.py still renders "make check-avocado"
>>>>    for me pretty unusable... looking at the implementation of
>>>>    find_free_port() in Avocado, I wonder whether there isn't a
>>>>    better way to get a free port number in Python? Brute-forcing
>>>>    all ports between 1024 and 65536 seems just quite cumbersome
>>>>    to me...
>>>
>>> Even in the worst case of testing every single port,
>>> for INET and INET6 and for STREAM and DGRAM sockets,
>>> that find_free_port port completes in a couple of
>>> seconds.
>>
>> Weird, on my system, the test runs for 70 seconds, just to finally
>> discovered that there was no free port available.
> 
> Incidentally I'm really suprised you even hit the 'no free port'
> scenario. I've never seen that happen unless the machine was
> basically doomed due to something leaking open sockets by the
> 10's of 1000's.
> 
> You mention firewall settings above, but I don't think that's
> relevant. The find_free_port() call is with no args, so it
> gets set to 'address=localhost' which means is_port_free
> takes the bind() codepath. The firewall has no interaction
> with the bind() codepath in the kernel AFAIK.

Yes, I've now had another closer look, and indeed, the firewall is not the 
problem here - but is_port_free() seems to be very buggy. The problem is 
that I do not have any IPv6 address configured on my system, and in that 
case the current code fails.
See https://github.com/avocado-framework/avocado/issues/5273 ... I need this 
patch to fix it:

diff --git a/avocado/utils/network/ports.py b/avocado/utils/network/ports.py
--- a/avocado/utils/network/ports.py
+++ b/avocado/utils/network/ports.py
@@ -60,7 +60,7 @@ def is_port_free(port, address):
                      if localhost:
                          return False
                  sock.close()
-        return True
+                return True
      finally:
          if sock is not None:
              sock.close()

  Thomas
Cleber Rosa March 10, 2022, 3:28 p.m. UTC | #6
Thomas Huth <thuth@redhat.com> writes:

> The BootLinux tests are currently failing with an ugly python
> stack trace on my RHEL8 system since they cannot get a free port
> (likely due to the firewall settings on my system). Let's properly
> check the return value of find_free_port() instead and cancel the
> test gracefully if it cannot get a free port.
>
> Signed-off-by: Thomas Huth <thuth@redhat.com>
> ---
>  Unfortunately, it still takes > 70 seconds for each and every
>  tests from tests/avocado/boot_linux.py to get canceled, so
>  tests/avocado/boot_linux.py still renders "make check-avocado"
>  for me pretty unusable... looking at the implementation of
>  find_free_port() in Avocado, I wonder whether there isn't a
>  better way to get a free port number in Python? Brute-forcing
>  all ports between 1024 and 65536 seems just quite cumbersome
>  to me...
>
>  tests/avocado/avocado_qemu/__init__.py | 2 ++
>  1 file changed, 2 insertions(+)
>

LGTM, despite  the root issue is being addressed in Avocado.

Reviewed-by: Cleber Rosa <crosa@redhat.com>
diff mbox series

Patch

diff --git a/tests/avocado/avocado_qemu/__init__.py b/tests/avocado/avocado_qemu/__init__.py
index 75063c0c30..9b056b5ce5 100644
--- a/tests/avocado/avocado_qemu/__init__.py
+++ b/tests/avocado/avocado_qemu/__init__.py
@@ -603,6 +603,8 @@  def prepare_cloudinit(self, ssh_pubkey=None):
         try:
             cloudinit_iso = os.path.join(self.workdir, 'cloudinit.iso')
             self.phone_home_port = network.find_free_port()
+            if not self.phone_home_port:
+                self.cancel('Failed to get a free port')
             pubkey_content = None
             if ssh_pubkey:
                 with open(ssh_pubkey) as pubkey: