diff mbox

[v2,1/1] hw/ppc/spapr.c: abort unplug_request if previous unplug isn't done

Message ID 20171009211136.31640-2-danielhb@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Daniel Henrique Barboza Oct. 9, 2017, 9:11 p.m. UTC
LMB removal is completed only when the spapr_lmb_release callback
is called after all DRCs of the dimm are detached. During this
time, it is possible that a unplug request for the same dimm
arrives, trying to detach DRCs that were detached by the guest
in the first unplug_request.

BQL doesn't help in this case - the lock will prevent any concurrent
removal from happening until the end of spapr_memory_unplug_request
only. What happens is that the second unplug_request ends up calling
spapr_drc_detach in a DRC that were detached already, causing an
assert error in spapr_drc_detach (e.g
https://bugs.launchpad.net/qemu/+bug/1718118).

spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the
spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left
to be detached by the guest. When there are no more DRCs left, this
structure is deleted and the pc-dimm unplug handler is called to
finish the process.

This patch reuses the sPAPRDIMMState to allow unplug_request to know
if there is an ongoing unplug process for a given dimm, aborting the
unplug request in this case, by doing the following changes:

- in spapr_lmb_release callback, move the dimm state removal to the
end, after pc-dimm unplug handler. With this change we can check for
the existence of the dimm state to see if the unplug process is
done.

- use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request
to check if the dimm state exists. If positive, there is an unplug
operation already in progress for this dimm, meaning that we should
abort it and warn the user about it.

Fixes: https://bugs.launchpad.net/qemu/+bug/1718118
Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 16 ++++++++++++++--
 1 file changed, 14 insertions(+), 2 deletions(-)

Comments

David Gibson Oct. 10, 2017, 1:36 a.m. UTC | #1
On Mon, Oct 09, 2017 at 06:11:36PM -0300, Daniel Henrique Barboza wrote:
> LMB removal is completed only when the spapr_lmb_release callback
> is called after all DRCs of the dimm are detached. During this
> time, it is possible that a unplug request for the same dimm
> arrives, trying to detach DRCs that were detached by the guest
> in the first unplug_request.
> 
> BQL doesn't help in this case - the lock will prevent any concurrent
> removal from happening until the end of spapr_memory_unplug_request
> only. What happens is that the second unplug_request ends up calling
> spapr_drc_detach in a DRC that were detached already, causing an
> assert error in spapr_drc_detach (e.g
> https://bugs.launchpad.net/qemu/+bug/1718118).
> 
> spapr_lmb_release uses a structure called sPAPRDIMMState, stored in the
> spapr->pending_dimm_unplugs QTAIL, to track how many LMB DRCs are left
> to be detached by the guest. When there are no more DRCs left, this
> structure is deleted and the pc-dimm unplug handler is called to
> finish the process.
> 
> This patch reuses the sPAPRDIMMState to allow unplug_request to know
> if there is an ongoing unplug process for a given dimm, aborting the
> unplug request in this case, by doing the following changes:
> 
> - in spapr_lmb_release callback, move the dimm state removal to the
> end, after pc-dimm unplug handler. With this change we can check for
> the existence of the dimm state to see if the unplug process is
> done.
> 
> - use spapr_pending_dimm_unplugs_find in spapr_memory_unplug_request
> to check if the dimm state exists. If positive, there is an unplug
> operation already in progress for this dimm, meaning that we should
> abort it and warn the user about it.
> 
> Fixes: https://bugs.launchpad.net/qemu/+bug/1718118
> Signed-off-by: Daniel Henrique Barboza <danielhb@linux.vnet.ibm.com>

Thanks for the updated description.  I've applied to ppc-for-2.11.

I'm not totally certain the first change is necessary - because each
of the pieces here should be called with the BQL, I think releasing
the DIMMState should be atomic with the rest of the stuff in
spapr_lmb_release().  But it makes it more obvious what's going on
here, so that's fine anyway.

> ---
>  hw/ppc/spapr.c | 16 ++++++++++++++--
>  1 file changed, 14 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index ff87f155d5..7f9274d57d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -3054,14 +3054,13 @@ void spapr_lmb_release(DeviceState *dev)
>          return;
>      }
>  
> -    spapr_pending_dimm_unplugs_remove(spapr, ds);
> -
>      /*
>       * Now that all the LMBs have been removed by the guest, call the
>       * pc-dimm unplug handler to cleanup up the pc-dimm device.
>       */
>      pc_dimm_memory_unplug(dev, &spapr->hotplug_memory, mr);
>      object_unparent(OBJECT(dev));
> +    spapr_pending_dimm_unplugs_remove(spapr, ds);
>  }
>  
>  static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
> @@ -3090,6 +3089,19 @@ static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
>          goto out;
>      }
>  
> +    /*
> +     * An existing pending dimm state for this DIMM means that there is an
> +     * unplug operation in progress, waiting for the spapr_lmb_release
> +     * callback to complete the job (BQL can't cover that far). In this case,
> +     * bail out to avoid detaching DRCs that were already released.
> +     */
> +    if (spapr_pending_dimm_unplugs_find(spapr, dimm)) {
> +        error_setg(&local_err,
> +                   "Memory unplug already in progress for device %s",
> +                   dev->id);
> +        goto out;
> +    }
> +
>      spapr_pending_dimm_unplugs_add(spapr, nr_lmbs, dimm);
>  
>      addr = addr_start;
diff mbox

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index ff87f155d5..7f9274d57d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -3054,14 +3054,13 @@  void spapr_lmb_release(DeviceState *dev)
         return;
     }
 
-    spapr_pending_dimm_unplugs_remove(spapr, ds);
-
     /*
      * Now that all the LMBs have been removed by the guest, call the
      * pc-dimm unplug handler to cleanup up the pc-dimm device.
      */
     pc_dimm_memory_unplug(dev, &spapr->hotplug_memory, mr);
     object_unparent(OBJECT(dev));
+    spapr_pending_dimm_unplugs_remove(spapr, ds);
 }
 
 static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
@@ -3090,6 +3089,19 @@  static void spapr_memory_unplug_request(HotplugHandler *hotplug_dev,
         goto out;
     }
 
+    /*
+     * An existing pending dimm state for this DIMM means that there is an
+     * unplug operation in progress, waiting for the spapr_lmb_release
+     * callback to complete the job (BQL can't cover that far). In this case,
+     * bail out to avoid detaching DRCs that were already released.
+     */
+    if (spapr_pending_dimm_unplugs_find(spapr, dimm)) {
+        error_setg(&local_err,
+                   "Memory unplug already in progress for device %s",
+                   dev->id);
+        goto out;
+    }
+
     spapr_pending_dimm_unplugs_add(spapr, nr_lmbs, dimm);
 
     addr = addr_start;