mbox series

[ndctl,v5,0/6] Add support for reporting papr nvdimm health

Message ID 20200529220600.225320-1-vaibhav@linux.ibm.com (mailing list archive)
Headers show
Series Add support for reporting papr nvdimm health | expand

Message

Vaibhav Jain May 29, 2020, 10:05 p.m. UTC
Changes since v4 [1]:
* Updated proposed changes to remove usage of term 'SCM' due to
  ambiguity with 'PMEM' and 'NVDIMM'. [ Dan Williams ]
* Replaced the usage of term 'SCM' with 'PMEM' in most contexts.
  [ Aneesh ]
* Updates to various newly introduced identifiers in 'papr.c'
  removing the 'SCM' prefix from their names.
* Renamed NVDIMM_FAMILY_PAPR_SCM to NVDIMM_FAMILY_PAPR
* Renamed PAPR_SCM_PDSM_HEALTH PAPR_PDSM_HEALTH
* Renamed 'papr_scm.c' to 'papr.c'
* Renamed 'papr_scm_pdsm.h' to 'papr_pdsm.h'

[1] https://lore.kernel.org/linux-nvdimm/20200527044737.40615-1-vaibhav@linux.ibm.com

---
This patch-set proposes changes to libndctl to add support for reporting
health for nvdimms that support the PAPR standard[2]. The standard defines
machenism (HCALL) through which a guest kernel can query and fetch health
and performance stats of an nvdimm attached to the hypervisor[3]. Until
now 'ndctl' was unable to report these stats for papr_scm dimms on PPC64
guests due to absence of ACPI/NFIT, a limitation which this patch-set tries
to address.

The patch-set introduces support for the new PAPR PDSM family
defined at [4] & [5] via a new dimm-ops named
'papr_dimm_ops'. Infrastructure to probe and distinguish papr-scm
dimms from other dimm families that may support ACPI/NFIT is
implemented by updating the 'struct ndctl_dimm' initialization
routines to bifurcate based on the nvdimm type. We also introduce two
new dimm-ops member for handling initialization of dimm specific data
for specific DSM families.

These changes coupled with proposed kernel changes located at Ref[1] should
provide a way for the user to retrieve NVDIMM health status using ndtcl for
pseries guests. Below is a sample output using proposed kernel + ndctl
changes:

 # ndctl list -DH
[
  {
    "dev":"nmem0",
    "flag_smart_event":true,
    "health":{
      "health_state":"fatal",
      "shutdown_state":"dirty"
    }
  }
]

Structure of the patchset
=========================

We start with a re-factoring patch that splits the 'add_dimm()' function
into two functions one that take care of allocating and initializing
'struct ndctl_dimm' and another that takes care of initializing nfit
specific dimm attributes.

Patch-2 introduces probe function of papr nvdimms and assigning
'papr_dimm_ops' defined in 'papr.c' to 'dimm->ops' if
needed. The patch also code to parse the dimm flags specific to
papr nvdimms

Patch-3 introduces new dimm ops 'dimm_init()' & 'dimm_uninit()' to handle
DSM family specific initialization of 'struct ndctl_dimm'.

Patches-4,5 implements scaffolding to add support for PAPR PDSM
requests and pull in their definitions from the kernel.

Finally Patch-6 add support for issuing and handling the result of
'struct ndctl_cmd' to request dimm health stats from papr_scm kernel module
and returning appropriate health status to libndctl for reporting.

References
==========
[2] "Power Architecture Platform Reference"
https://en.wikipedia.org/wiki/Power_Architecture_Platform_Reference

[3] "Hypercall Op-codes (hcalls)"
https://github.com/torvalds/linux/blob/master/Documentation/powerpc/papr_hcalls.rst

[4] "powerpc/papr_scm: Add support for reporting nvdimm health"
https://lore.kernel.org/linux-nvdimm/20200529214719.223344-1-vaibhav@linux.ibm.com

[5] "ndctl/papr_scm,uapi: Add support for PAPR nvdimm specific methods"
https://lore.kernel.org/linux-nvdimm/20200529214719.223344-5-vaibhav@linux.ibm.com

Vaibhav Jain (6):
  libndctl: Refactor out add_dimm() to handle NFIT specific init
  libncdtl: Add initial support for NVDIMM_FAMILY_PAPR nvdimm family
  libndctl: Introduce new dimm-ops dimm_init() & dimm_uninit()
  libndctl,papr_scm: Add definitions for PAPR nvdimm specific methods
  papr: Add scaffolding to issue and handle PDSM requests
  libndctl,papr_scm: Implement support for PAPR_PDSM_HEALTH

 ndctl/lib/Makefile.am |   1 +
 ndctl/lib/libndctl.c  | 260 ++++++++++++++++++++++++++----------
 ndctl/lib/papr.c      | 301 ++++++++++++++++++++++++++++++++++++++++++
 ndctl/lib/papr_pdsm.h | 175 ++++++++++++++++++++++++
 ndctl/lib/private.h   |   9 ++
 ndctl/libndctl.h      |   1 +
 ndctl/ndctl.h         |   1 +
 7 files changed, 678 insertions(+), 70 deletions(-)
 create mode 100644 ndctl/lib/papr.c
 create mode 100644 ndctl/lib/papr_pdsm.h