diff mbox series

[6/6] pcie: expire pending delete

Message ID 20211011120504.254053-7-kraxel@redhat.com (mailing list archive)
State New, archived
Headers show
Series RfC: try improve native hotplug for pcie root ports | expand

Commit Message

Gerd Hoffmann Oct. 11, 2021, 12:05 p.m. UTC
Add an expire time for pending delete, once the time is over allow
pressing the attention button again.

This makes pcie hotplug behave more like acpi hotplug, where one can
try sending an 'device_del' monitor command again in case the guest
didn't respond to the first attempt.

Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
---
 include/hw/qdev-core.h | 1 +
 hw/pci/pcie.c          | 2 ++
 softmmu/qdev-monitor.c | 4 +++-
 3 files changed, 6 insertions(+), 1 deletion(-)

Comments

Michael S. Tsirkin Oct. 11, 2021, 12:49 p.m. UTC | #1
On Mon, Oct 11, 2021 at 02:05:04PM +0200, Gerd Hoffmann wrote:
> Add an expire time for pending delete, once the time is over allow
> pressing the attention button again.
> 
> This makes pcie hotplug behave more like acpi hotplug, where one can
> try sending an 'device_del' monitor command again in case the guest
> didn't respond to the first attempt.
> 
> Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
> ---
>  include/hw/qdev-core.h | 1 +
>  hw/pci/pcie.c          | 2 ++
>  softmmu/qdev-monitor.c | 4 +++-
>  3 files changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
> index 4ff19c714bd8..d082a9a00aca 100644
> --- a/include/hw/qdev-core.h
> +++ b/include/hw/qdev-core.h
> @@ -180,6 +180,7 @@ struct DeviceState {
>      char *canonical_path;
>      bool realized;
>      bool pending_deleted_event;
> +    int64_t pending_deleted_expires_ms;
>      QemuOpts *opts;
>      int hotplugged;
>      bool allow_unplug_during_migration;
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index f3ac04399969..477c8776aa27 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -549,6 +549,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
>      }
>  
>      dev->pending_deleted_event = true;
> +    dev->pending_deleted_expires_ms =
> +        qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
>  
>      /* In case user cancel the operation of multi-function hot-add,
>       * remove the function that is unexposed to guest individually,


Well this will be barely enough, right?

	Once the Power
	Indicator begins blinking, a 5-second abort interval exists during which a second depression of the
	Attention Button cancels the operation.

So I guess it needs to be more. Problem is of course if guest is
busy because of interrupts and whatnot, it might not get to
handling that in time ...


> diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
> index 0705f008466d..5e7960c52a0a 100644
> --- a/softmmu/qdev-monitor.c
> +++ b/softmmu/qdev-monitor.c
> @@ -910,7 +910,9 @@ void qmp_device_del(const char *id, Error **errp)
>  {
>      DeviceState *dev = find_device_state(id, errp);
>      if (dev != NULL) {
> -        if (dev->pending_deleted_event) {
> +        if (dev->pending_deleted_event &&
> +            (dev->pending_deleted_expires_ms == 0 ||
> +             dev->pending_deleted_expires_ms > qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL))) {
>              error_setg(errp, "Device %s is already in the "
>                               "process of unplug", id);
>              return;
> -- 
> 2.31.1
Gerd Hoffmann Oct. 12, 2021, 5:30 a.m. UTC | #2
> > index f3ac04399969..477c8776aa27 100644
> > --- a/hw/pci/pcie.c
> > +++ b/hw/pci/pcie.c
> > @@ -549,6 +549,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
> >      }
> >  
> >      dev->pending_deleted_event = true;
> > +    dev->pending_deleted_expires_ms =
> > +        qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
> >  
> >      /* In case user cancel the operation of multi-function hot-add,
> >       * remove the function that is unexposed to guest individually,
> 
> 
> Well this will be barely enough, right?
> 
> 	Once the Power
> 	Indicator begins blinking, a 5-second abort interval exists during which a second depression of the
> 	Attention Button cancels the operation.

Well, canceling the hot-plug is not supported in qemu right now (there
is no qmp command for that).  I'm also not sure it makes sense in the
first place for virtual machines.

> So I guess it needs to be more. Problem is of course if guest is
> busy because of interrupts and whatnot, it might not get to
> handling that in time ...

See patch #3, that one should take care of a busy guest ...

take care,
  Gerd
Michael S. Tsirkin Oct. 12, 2021, 5:46 a.m. UTC | #3
On Tue, Oct 12, 2021 at 07:30:34AM +0200, Gerd Hoffmann wrote:
> > > index f3ac04399969..477c8776aa27 100644
> > > --- a/hw/pci/pcie.c
> > > +++ b/hw/pci/pcie.c
> > > @@ -549,6 +549,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
> > >      }
> > >  
> > >      dev->pending_deleted_event = true;
> > > +    dev->pending_deleted_expires_ms =
> > > +        qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
> > >  
> > >      /* In case user cancel the operation of multi-function hot-add,
> > >       * remove the function that is unexposed to guest individually,
> > 
> > 
> > Well this will be barely enough, right?
> > 
> > 	Once the Power
> > 	Indicator begins blinking, a 5-second abort interval exists during which a second depression of the
> > 	Attention Button cancels the operation.
> 
> Well, canceling the hot-plug is not supported in qemu right now (there
> is no qmp command for that).  I'm also not sure it makes sense in the
> first place for virtual machines.

Yes. However if you resend an attention button press within the
5 second window, guest will think you cancelled hot-plug
and act accordingly.
It's a fundamentally racy algorithm :(


> > So I guess it needs to be more. Problem is of course if guest is
> > busy because of interrupts and whatnot, it might not get to
> > handling that in time ...
> 
> See patch #3, that one should take care of a busy guest ...
> 
> take care,
>   Gerd
Gerd Hoffmann Oct. 12, 2021, 6:44 a.m. UTC | #4
On Tue, Oct 12, 2021 at 01:46:35AM -0400, Michael S. Tsirkin wrote:
> On Tue, Oct 12, 2021 at 07:30:34AM +0200, Gerd Hoffmann wrote:
> > > > index f3ac04399969..477c8776aa27 100644
> > > > --- a/hw/pci/pcie.c
> > > > +++ b/hw/pci/pcie.c
> > > > @@ -549,6 +549,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
> > > >      }
> > > >  
> > > >      dev->pending_deleted_event = true;
> > > > +    dev->pending_deleted_expires_ms =
> > > > +        qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
> > > >  
> > > >      /* In case user cancel the operation of multi-function hot-add,
> > > >       * remove the function that is unexposed to guest individually,
> > > 
> > > 
> > > Well this will be barely enough, right?
> > > 
> > > 	Once the Power
> > > 	Indicator begins blinking, a 5-second abort interval exists during which a second depression of the
> > > 	Attention Button cancels the operation.
> > 
> > Well, canceling the hot-plug is not supported in qemu right now (there
> > is no qmp command for that).  I'm also not sure it makes sense in the
> > first place for virtual machines.
> 
> Yes. However if you resend an attention button press within the
> 5 second window, guest will think you cancelled hot-plug
> and act accordingly.
> It's a fundamentally racy algorithm :(

That's why re-sending an attention button press is blocked
for 5 seconds.

It's also blocked in case the guest blinks the power
indicator (see patch #3).

Both together work well in my testing, I can flood a (linux) guest
with device_del commands without bad side effects:

First device_del command sends attention button press.
Then device_del is rejected because the 5 secs are not over yet.
Then device_del is rejected because the indicator blinks.
Then unplug completes (and qemu sends event).
Then device_del fails because the device doesn't exist any more.

take care,
  Gerd
Michael S. Tsirkin Oct. 12, 2021, 7:01 a.m. UTC | #5
On Tue, Oct 12, 2021 at 08:44:45AM +0200, Gerd Hoffmann wrote:
> On Tue, Oct 12, 2021 at 01:46:35AM -0400, Michael S. Tsirkin wrote:
> > On Tue, Oct 12, 2021 at 07:30:34AM +0200, Gerd Hoffmann wrote:
> > > > > index f3ac04399969..477c8776aa27 100644
> > > > > --- a/hw/pci/pcie.c
> > > > > +++ b/hw/pci/pcie.c
> > > > > @@ -549,6 +549,8 @@ void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
> > > > >      }
> > > > >  
> > > > >      dev->pending_deleted_event = true;
> > > > > +    dev->pending_deleted_expires_ms =
> > > > > +        qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
> > > > >  
> > > > >      /* In case user cancel the operation of multi-function hot-add,
> > > > >       * remove the function that is unexposed to guest individually,
> > > > 
> > > > 
> > > > Well this will be barely enough, right?
> > > > 
> > > > 	Once the Power
> > > > 	Indicator begins blinking, a 5-second abort interval exists during which a second depression of the
> > > > 	Attention Button cancels the operation.
> > > 
> > > Well, canceling the hot-plug is not supported in qemu right now (there
> > > is no qmp command for that).  I'm also not sure it makes sense in the
> > > first place for virtual machines.
> > 
> > Yes. However if you resend an attention button press within the
> > 5 second window, guest will think you cancelled hot-plug
> > and act accordingly.
> > It's a fundamentally racy algorithm :(
> 
> That's why re-sending an attention button press is blocked
> for 5 seconds.
> It's also blocked in case the guest blinks the power
> indicator (see patch #3).
> 
> Both together work well in my testing, I can flood a (linux) guest
> with device_del commands without bad side effects:
> 
> First device_del command sends attention button press.
> Then device_del is rejected because the 5 secs are not over yet.
> Then device_del is rejected because the indicator blinks.

Ah, I see. 5 secs is a lot so yea, most likely it's gonnu
be ok, worst case we'll wait another 5 seconds and
send again, right?

Worth checking with windows guests BTW, we saw lots of
races with these too.


> Then unplug completes (and qemu sends event).
> Then device_del fails because the device doesn't exist any more.
> 
> take care,
>   Gerd
diff mbox series

Patch

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 4ff19c714bd8..d082a9a00aca 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -180,6 +180,7 @@  struct DeviceState {
     char *canonical_path;
     bool realized;
     bool pending_deleted_event;
+    int64_t pending_deleted_expires_ms;
     QemuOpts *opts;
     int hotplugged;
     bool allow_unplug_during_migration;
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index f3ac04399969..477c8776aa27 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -549,6 +549,8 @@  void pcie_cap_slot_unplug_request_cb(HotplugHandler *hotplug_dev,
     }
 
     dev->pending_deleted_event = true;
+    dev->pending_deleted_expires_ms =
+        qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) + 5000; /* 5 secs */
 
     /* In case user cancel the operation of multi-function hot-add,
      * remove the function that is unexposed to guest individually,
diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index 0705f008466d..5e7960c52a0a 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -910,7 +910,9 @@  void qmp_device_del(const char *id, Error **errp)
 {
     DeviceState *dev = find_device_state(id, errp);
     if (dev != NULL) {
-        if (dev->pending_deleted_event) {
+        if (dev->pending_deleted_event &&
+            (dev->pending_deleted_expires_ms == 0 ||
+             dev->pending_deleted_expires_ms > qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL))) {
             error_setg(errp, "Device %s is already in the "
                              "process of unplug", id);
             return;