[v3,0/7] CPU unplug timeout/LMB unplug cleanup in DRC reconfiguration

Message ID 20210211225246.17315-1-danielhb413@gmail.com (mailing list archive)

Daniel Henrique Barboza Feb. 11, 2021, 10:52 p.m. UTC
Hi,

This is marked as v3 because it grew out of the discussions that
followed v2 [1].

The idea with this series is to add a CPU hotunplug timeout to handle
the situations where the kernel refuses to release the CPU. The reasoning
for a timeout approach is described in patch 05.

While investigating putting a timeout in memory hotunplug, I found
out that we have a way to determine, at least in some cases, when the kernel
refuses to release the DIMM during a memory hotunplug. This alleviates one
of the most common issues (at least AFAIK) with memory hotunplug, and it
made me give up on putting a timeout in memory hotunplug altogether.

At this point I haven't added timeouts for PCI hotunplug operations, but it
would be trivial to do so if desired.

The series goes as follows:

- Patches 1-4: DRC simplifications/cleanups. The idea with these
  cleanups was to trim the spapr_drc_detach() use as much as possible,
  since the function would be used to start the timeout timer.

- Patch 5: timeout timer infrastructure

- Patch 6: add CPU unplug timeout

- Patch 7: reset DIMM unplug state when the kernel reconfigures the DRC
  connector



v2 link: [1] https://lists.gnu.org/archive/html/qemu-devel/2021-01/msg04400.html


Daniel Henrique Barboza (7):
  spapr_drc.c: do not call spapr_drc_detach() in drc_isolate_logical()
  spapr_pci.c: simplify spapr_pci_unplug_request() function handling
  spapr_drc.c: use spapr_drc_release() in isolate_physical/set_unusable
  spapr: rename spapr_drc_detach() to spapr_drc_unplug_request()
  spapr_drc.c: introduce unplug_timeout_timer
  spapr_drc.c: add hotunplug timeout for CPUs
  spapr_drc.c: use DRC reconfiguration to cleanup DIMM unplug state

 hw/ppc/spapr.c             |  40 ++++++++++++-
 hw/ppc/spapr_drc.c         | 116 +++++++++++++++++++++++++++----------
 hw/ppc/spapr_pci.c         |  44 +++++---------
 hw/ppc/trace-events        |   2 +-
 include/hw/ppc/spapr.h     |   2 +
 include/hw/ppc/spapr_drc.h |   7 ++-
 6 files changed, 147 insertions(+), 64 deletions(-)

Comments

David Gibson Feb. 17, 2021, 2:33 a.m. UTC | #1
On Thu, Feb 11, 2021 at 07:52:39PM -0300, Daniel Henrique Barboza wrote:
> Hi,
> 
> This is marked as a v3 as it started as a result of discussions that
> followed the v2 [1]. 
> 
> The idea with this series is to add CPU hotunplug timeout to avoid the
> situations where the kernel refuses to release the CPU. The reasoning
> for a timeout approach is described in patch 05.
> 
> While investigating putting a timeout in memory hotunplug, I have found
> out that we have a way to determine, at least in some cases, when the kernel
> refuses to release the DIMM during a memory hotunplug. This alleviates one
> of the most common issues (at least AFAIK) with memory hotunplug and it
> made me give up attempting to put a timeout in memory hotunplug altogether.
> 
> At this point I didn't add timeouts for PCI hotunplug operations, but it
> is trivial to do so if desirable.
> 
> The series goes as follows:
> 
> - Patches 1-4: DRC simplifications/cleanups. The idea with these
>   cleanups was to trim the spapr_drc_detach use as much as possible,
>   since the function would be used to start the timeout timer
> 
> - Patch 5: timeout timer infrastructure
> 
> - Patch 6: add cpu unplug timeout
> 
> - Patch 7: reset DIMM unplug state when the kernel reconfigures the DRC
>   connector

Very nice start.  More comments throughout.
