Message ID | 1500454449-29557-1-git-send-email-bharata@linux.vnet.ibm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 07/19/2017 05:54 AM, Bharata B Rao wrote: > Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to > sPAPRMachineState) introduced a new way to track pending LMBs of DIMM > device that is marked for removal. Since this commit we can hit the > assert in spapr_pending_dimm_unplugs_add() in the following situation: > > - DIMM device removal fails as the guest doesn't allow the removal. > - Subsequent attempt to remove the same DIMM would hit the assert > as the corresponding sPAPRDIMMState is still part of the > pending_dimm_unplugs list. I've asked Mike if there was any way of knowing if the unplug were successful in the guest, and aside from the DRC state changes we don't have any other way. I wanted to propose a simple cleanup of the remaining sPAPRDIMMState from the failed unplug to avoid this situation altogether. It appears that we'll have to deal with it with extra logic to cover this. > Fix this by removing the assert and conditionally adding the > sPAPRDIMMState to pending_dimm_unplugs list only when it is not > already present. I think there is a better place to put this verification. The function spapr_pending_dimm_unplugs_add is called in two places: 1- in spapr_memory_unplug_request, when it first creates the structure before starting the DRC unplug requests; 2- in spapr_recover_pending_dimm_state, to restore the DIMM state when it is lost after a migration for example. spapr_recover_pending_dimm_state is called as follows from spapr_lmb_release: sPAPRDIMMState *ds = spapr_pending_dimm_unplugs_find(spapr, PC_DIMM(dev)); /* This information will get lost if a migration occurs * during the unplug process. In this case recover it. */ if (ds == NULL) { ds = spapr_recover_pending_dimm_state(spapr, PC_DIMM(dev)); /* The DRC being examined by the caller at least must be counted */ g_assert(ds->nr_lmbs); } As we can see there is a find() call before calling recover(). This means that if you do the verification inside unplugs_add() you're end up doing find() twice when this recover() condition happens. This also means that you're most likely hitting this situation from the regular LMB unplug flow from spapr_memory_unplug_request. My suggestion would be to move this verification to spapr_memory_unplug_request like this: (...) if (local_err) { goto out; } /* If there were prior unplugs attempts from this same DIMM that failed, there will be a dimm state object inside spapr->pending_dimm_unplugs list. In this case, do not add a new sPAPRDIMMState. */ ds = spapr_recover_pending_dimm_state(spapr, dimm); if (ds == NULL) { ds = g_malloc0(sizeof(sPAPRDIMMState)); ds->nr_lmbs = nr_lmbs; ds->dimm = dimm; spapr_pending_dimm_unplugs_add(spapr, ds); } I prefer this way because you avoid a g_malloc0 that would be thrown away in the unplugs_add(). Now, if we decide to stick with the verification inside unplugs_add() like you proposed .... > > Fixes: 0cffce56ae3501c5783d779f97993ce478acf856 > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> > --- > hw/ppc/spapr.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 1cb09e7..990bb2d 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2853,8 +2853,9 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s, > static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr, > sPAPRDIMMState *dimm_state) > { > - g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)); > - QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); > + if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) { > + QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); > + } ... you'll need to free() the incoming dimm_state object that was allocated outside of unplugs_add() if you're not going to add it: + if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) { + QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); + } else { g_free(dimm_state); } Now that I think about it, in this approach you'll waste not just a g_malloc() but also a g_free(). Making the verification inside spapr_memory_unplug_request like I said above would be my way to go in this case. Daniel > } > > static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
On Wed, Jul 19, 2017 at 02:24:09PM +0530, Bharata B Rao wrote: > Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to > sPAPRMachineState) introduced a new way to track pending LMBs of DIMM > device that is marked for removal. Since this commit we can hit the > assert in spapr_pending_dimm_unplugs_add() in the following situation: > > - DIMM device removal fails as the guest doesn't allow the removal. > - Subsequent attempt to remove the same DIMM would hit the assert > as the corresponding sPAPRDIMMState is still part of the > pending_dimm_unplugs list. > > Fix this by removing the assert and conditionally adding the > sPAPRDIMMState to pending_dimm_unplugs list only when it is not > already present. > > Fixes: 0cffce56ae3501c5783d779f97993ce478acf856 > Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> Sounds like a reasonable change based on the rationale above. However, can you add a comment here explaining the situation in which the entry already exists. > --- > hw/ppc/spapr.c | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > index 1cb09e7..990bb2d 100644 > --- a/hw/ppc/spapr.c > +++ b/hw/ppc/spapr.c > @@ -2853,8 +2853,9 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s, > static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr, > sPAPRDIMMState *dimm_state) > { > - g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)); > - QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); > + if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) { > + QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); > + } > } > > static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c index 1cb09e7..990bb2d 100644 --- a/hw/ppc/spapr.c +++ b/hw/ppc/spapr.c @@ -2853,8 +2853,9 @@ static sPAPRDIMMState *spapr_pending_dimm_unplugs_find(sPAPRMachineState *s, static void spapr_pending_dimm_unplugs_add(sPAPRMachineState *spapr, sPAPRDIMMState *dimm_state) { - g_assert(!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)); - QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); + if (!spapr_pending_dimm_unplugs_find(spapr, dimm_state->dimm)) { + QTAILQ_INSERT_HEAD(&spapr->pending_dimm_unplugs, dimm_state, next); + } } static void spapr_pending_dimm_unplugs_remove(sPAPRMachineState *spapr,
Commit 0cffce56 (hw/ppc/spapr.c: adding pending_dimm_unplugs to sPAPRMachineState) introduced a new way to track pending LMBs of DIMM device that is marked for removal. Since this commit we can hit the assert in spapr_pending_dimm_unplugs_add() in the following situation: - DIMM device removal fails as the guest doesn't allow the removal. - Subsequent attempt to remove the same DIMM would hit the assert as the corresponding sPAPRDIMMState is still part of the pending_dimm_unplugs list. Fix this by removing the assert and conditionally adding the sPAPRDIMMState to pending_dimm_unplugs list only when it is not already present. Fixes: 0cffce56ae3501c5783d779f97993ce478acf856 Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com> --- hw/ppc/spapr.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)