diff mbox

[FIX] spapr: Fix QEMU abort during memory unplug

Message ID 1500454449-29557-1-git-send-email-bharata@linux.vnet.ibm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Bharata B Rao July 19, 2017, 8:54 a.m. UTC
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
device that is marked for removal. Since this commit we can hit the
assert in spapr_pending_dimm_unplugs_add() in the following situation:

- DIMM device removal fails as the guest doesn't allow the removal.
- Subsequent attempt to remove the same DIMM would hit the assert
  as the corresponding sPAPRDIMMState is still part of the
  pending_dimm_unplugs list.

Fix this by removing the assert and conditionally adding the
sPAPRDIMMState to pending_dimm_unplugs list only when it is not
already present.

Fixes: 0cffce56ae3501c5783d779f97993ce478acf856
Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
---
 hw/ppc/spapr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Daniel Henrique Barboza July 19, 2017, 6:03 p.m. UTC | #1
On 07/19/2017 05:54 AM, Bharata B Rao wrote:
> Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
> sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
> device that is marked for removal. Since this commit we can hit the
> assert in spapr_pending_dimm_unplugs_add() in the following situation:
>
> - DIMM device removal fails as the guest doesn't allow the removal.
> - Subsequent attempt to remove the same DIMM would hit the assert
>    as the corresponding sPAPRDIMMState is still part of the
>    pending_dimm_unplugs list.

I've asked Mike if there was any way of knowing if the unplug were 
successful
in the guest, and aside from the DRC state changes we don't have
any other way. I wanted to propose a simple cleanup of the remaining
sPAPRDIMMState from the failed unplug to avoid this situation altogether.

It appears that we'll have to deal with it with extra logic to cover this.

> Fix this by removing the assert and conditionally adding the
> sPAPRDIMMState to pending_dimm_unplugs list only when it is not
> already present.

I think there is a better place to put this verification. The function
spapr_pending_dimm_unplugs_add is called in two places:

1- in spapr_memory_unplug_request, when it first creates the structure
before starting the DRC unplug requests;

2- in spapr_recover_pending_dimm_state, to restore the DIMM state when
it is lost after a migration for example.

spapr_recover_pending_dimm_state is called as follows from 
spapr_lmb_release:


     sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, 
PC_DIMM(dev));

     /* This information will get lost if a migration occurs
      * during the unplug process. In this case recover it. */
     if (ds == NULL) {
         ds = spapr_recover_pending_dimm_state(spapr, PC_DIMM(dev));
         /* The DRC being examined by the caller at least must be counted */
         g_assert(ds->nr_lmbs);
     }


As we can see there is a find() call before calling recover(). This 
means that if you
do the verification inside unplugs_add() you're end up doing find() 
twice when
this recover() condition happens. This also means that you're most 
likely hitting
this situation from the regular LMB unplug flow from 
spapr_memory_unplug_request.

My suggestion would be to move this verification to 
spapr_memory_unplug_request
like this:

     (...)
     if (local_err) {
         goto out;
     }

     /* If there were prior unplugs attempts from this same DIMM that 
failed,
        there will be a dimm state object inside 
spapr->pending_dimm_unplugs list.
        In this case, do not add a new sPAPRDIMMState. */
     ds = spapr_recover_pending_dimm_state(spapr, dimm);
     if (ds == NULL) {
         ds = g_malloc0(sizeof(sPAPRDIMMState));
         ds->nr_lmbs = nr_lmbs;
         ds->dimm = dimm;
         spapr_pending_dimm_unplugs_add(spapr, ds);
     }


I prefer this way because you avoid a g_malloc0 that would be thrown away in
the unplugs_add().

Now, if we decide to stick with the verification inside unplugs_add() like
you proposed ....

>
> Fixes: 0cffce56ae3501c5783d779f97993ce478acf856
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
> ---
>   hw/ppc/spapr.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1cb09e7..990bb2d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2853,8 +2853,9 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
>   static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
>                                              sPAPRDIMMState *dimm_state)
>   {
> -    g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm));
> -    QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
> +        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    }
... you'll need to free() the incoming dimm_state object that was allocated
outside of unplugs_add() if you're not going to add it:


+    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
+        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+    }

     else {
         g_free(dimm_state);
     }



Now that I think about it, in this approach you'll waste not just a 
g_malloc()
but also a g_free(). Making the verification inside 
spapr_memory_unplug_request
like I said above would be my way to go in this case.



Daniel

>   }
>
>   static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
David Gibson July 20, 2017, 3:09 a.m. UTC | #2
On Wed, Jul 19, 2017 at 02:24:09PM +0530, Bharata B Rao wrote:
> Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to
> sPAPRMachineState) introduced a new way to track pending LMBs of DIMM
> device that is marked for removal. Since this commit we can hit the
> assert in spapr_pending_dimm_unplugs_add() in the following situation:
> 
> - DIMM device removal fails as the guest doesn't allow the removal.
> - Subsequent attempt to remove the same DIMM would hit the assert
>   as the corresponding sPAPRDIMMState is still part of the
>   pending_dimm_unplugs list.
> 
> Fix this by removing the assert and conditionally adding the
> sPAPRDIMMState to pending_dimm_unplugs list only when it is not
> already present.
> 
> Fixes: 0cffce56ae3501c5783d779f97993ce478acf856
> Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>

Sounds like a reasonable change based on the rationale above.
However, can you add a comment here explaining the situation in which
the entry already exists.

> ---
>  hw/ppc/spapr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> index 1cb09e7..990bb2d 100644
> --- a/hw/ppc/spapr.c
> +++ b/hw/ppc/spapr.c
> @@ -2853,8 +2853,9 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
>  static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
>                                             sPAPRDIMMState *dimm_state)
>  {
> -    g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm));
> -    QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
> +        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
> +    }
>  }
>  
>  static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
diff mbox

Patch

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 1cb09e7..990bb2d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2853,8 +2853,9 @@  static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s,
 static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr,
                                            sPAPRDIMMState *dimm_state)
 {
-    g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm));
-    QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+    if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) {
+        QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next);
+    }
 }
 
 static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,