diff mbox series

[v1,5/7] xen-bus: Set offline if backend's state is XenbusStateClosed

Message ID 20231110204207.2927514-6-volodymyr_babchuk@epam.com (mailing list archive)
State New, archived
Headers show
Series [v1,1/7] xen-block: Do not write frontend nodes | expand

Commit Message

Volodymyr Babchuk Nov. 10, 2023, 8:42 p.m. UTC
From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>

Both state (XenbusStateClosed) and online (0) are expected by
toolstack/xl devd to completely destroy the device. But "offline"
is never being set by the backend resulting in timeout during
domain destruction, garbage in Xestore and still running Qemu
instance.

Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
---
 hw/xen/xen-bus.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

David Woodhouse Nov. 11, 2023, 11:42 a.m. UTC | #1
(When sending a series, if you Cc someone on one message please could you Cc them on the whole thread for context? Thanks.)

> Both state (XenbusStateClosed) and online (0) are expected by
> toolstack/xl devd to completely destroy the device. But "offline"
> is never being set by the backend resulting in timeout during
> domain destruction, garbage in Xestore and still running Qemu
> instance.

I would dearly love to see a clear state diagram showing all the possible state transitions during device creation, fe/be reconnecting, and hot plug/unplug. Does anything like that exist? Any test cases for the common and the less common transitions, and even the illegal ones which need to be handled gracefully?

Fantasy aside, can you please confirm that this patch series was tested with hotplug (device_add in the monitor) *and* unplug in QEMU, both under real Xen and ideally also emulated Xen in KVM? 

Thanks.
Paul Durrant Nov. 12, 2023, 8:37 p.m. UTC | #2
On 10/11/2023 15:42, Volodymyr Babchuk wrote:
> From: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> 
> Both state (XenbusStateClosed) and online (0) are expected by
> toolstack/xl devd to completely destroy the device. But "offline"
> is never being set by the backend resulting in timeout during
> domain destruction, garbage in Xestore and still running Qemu
> instance.
> 
> Signed-off-by: Oleksandr Tyshchenko <oleksandr_tyshchenko@epam.com>
> Signed-off-by: Volodymyr Babchuk <volodymyr_babchuk@epam.com>
> ---
>   hw/xen/xen-bus.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
> index 75474d4b43..6e7ec3af64 100644
> --- a/hw/xen/xen-bus.c
> +++ b/hw/xen/xen-bus.c
> @@ -519,6 +519,10 @@ static void xen_device_backend_changed(void *opaque, const char *path)
>           xen_device_backend_set_state(xendev, XenbusStateClosed);
>       }
>   
> +    if (xen_device_backend_get_state(xendev) == XenbusStateClosed) {
> +        xen_device_backend_set_online(xendev, false);
> +    }
> +
>       /*
>        * If a backend is still 'online' then we should leave it alone but,
>        * if a backend is not 'online', then the device is a candidate

I don't understand what you're trying to do here. Just a few lines up 
from this hunk there is:

  506    if (xen_device_backend_scanf(xendev, "online", "%u", &online) 
!= 1) {
  507        online = 0;
  508    }
  509
  510    xen_device_backend_set_online(xendev, !!online);

Why is this not sufficient? What happens if the frontend decides to stop 
and start (e.g. for a driver update)? I'm guessing the backend will be 
destroyed... which is not very friendly.

   Paul
diff mbox series

Patch

diff --git a/hw/xen/xen-bus.c b/hw/xen/xen-bus.c
index 75474d4b43..6e7ec3af64 100644
--- a/hw/xen/xen-bus.c
+++ b/hw/xen/xen-bus.c
@@ -519,6 +519,10 @@  static void xen_device_backend_changed(void *opaque, const char *path)
         xen_device_backend_set_state(xendev, XenbusStateClosed);
     }
 
+    if (xen_device_backend_get_state(xendev) == XenbusStateClosed) {
+        xen_device_backend_set_online(xendev, false);
+    }
+
     /*
      * If a backend is still 'online' then we should leave it alone but,
      * if a backend is not 'online', then the device is a candidate