
[v2,0/8] cxl: support CXL memory RAS features

Message ID: 20250320180450.539-1-shiju.jose@huawei.com
Series: cxl: support CXL memory RAS features

Message

Shiju Jose March 20, 2025, 6:04 p.m. UTC
From: Shiju Jose <shiju.jose@huawei.com>

Support for the following CXL memory RAS features: patrol scrub, ECS
(Error Check Scrub), soft PPR (Post Package Repair) and memory sparing.

This CXL series was previously part of the EDAC series [1].

The code is based on cxl.git next branch [2] merged with ras.git edac-cxl
branch [3].

[1]: https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
[2]: https://web.git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=next
[3]: https://web.git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-cxl

Userspace code for the CXL memory repair features is at [4], and a
sample boot script for CXL memory repair is at [5].

[4]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
[5]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/

Changes
=======
v1 -> v2:
1. Feedback from Dan Williams on v1:
   https://lore.kernel.org/linux-mm/20250307091137.00006a0a@huawei.com/T/
  - Fixed locking issues in region scrubbing; added local cxl_acquire()
    and cxl_unlock.
  - Replaced the CXL examples using cat and echo in the EDAC .rst docs
    with a short description and a reference to the ABI docs. Also
    corrected existing descriptions as suggested by Dan.
  - Added a policy description for the scrub control feature.
    However, this may require input from CXL experts.
  - Replaced CONFIG_CXL_RAS_FEATURES with CONFIG_CXL_EDAC_MEM_FEATURES.
  - A few changes to the "depends on" part of CONFIG_CXL_EDAC_MEM_FEATURES.
  - Renamed drivers/cxl/core/memfeatures.c to drivers/cxl/core/edac.c.
  - snprintf() -> kasprintf() in a few places.

2. Feedback from Alison on v1:
  - Changed cxl_get_feature_entry() (patch 1) to return NULL on failure
    and reintroduced its checks.
  - Changed the logic of the for loop in the region-based scrubbing code.
  - Replaced cxl_are_decoders_committed() with cxl_is_memdev_memory_online()
    and added it as a local function in drivers/cxl/core/edac.c.
  - Changed a few multiline comments to single-line comments.
  - Removed unnecessary comments from the code.
  - Reduced the line length of a few macros in the ECS and memory repair code.
  - In new files, changed "GPL-2.0-or-later" -> "GPL-2.0-only".
  - Ran clang-format on the new files and applied the result.
3. Feedback from Jonathan on v1:
  - Changed a few multiline comments to single-line comments.

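The v2 behaviour of cxl_get_feature_entry() from item 2 (return NULL on failure) can be illustrated with a small userspace sketch. It assumes the helper looks up a cached feature entry by its feature UUID; feat_uuid_t, struct feat_entry, struct feat_table and get_feature_entry() are hypothetical simplifications, not the kernel's types or signature.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical 16-byte feature identifier, standing in for uuid_t. */
typedef struct {
	unsigned char b[16];
} feat_uuid_t;

/* Hypothetical, simplified feature entry (other fields omitted). */
struct feat_entry {
	feat_uuid_t uuid;
};

/* Hypothetical table of feature entries reported by the device. */
struct feat_table {
	size_t num_features;
	struct feat_entry *ent;
};

/* Return the entry matching @uuid, or NULL on any failure or no match. */
static struct feat_entry *get_feature_entry(const struct feat_table *tbl,
					    const feat_uuid_t *uuid)
{
	if (!tbl || !tbl->ent || !uuid)
		return NULL;

	for (size_t i = 0; i < tbl->num_features; i++) {
		if (!memcmp(&tbl->ent[i].uuid, uuid, sizeof(*uuid)))
			return &tbl->ent[i];
	}
	return NULL;
}
```

Returning NULL for every failure case keeps each caller's error handling to a single "if (!entry)" check instead of decoding an error pointer.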
Shiju Jose (8):
  cxl: Add helper function to retrieve a feature entry
  EDAC: Update documentation for the CXL memory patrol scrub control
    feature
  cxl/edac: Add CXL memory device patrol scrub control feature
  cxl/edac: Add CXL memory device ECS control feature
  cxl/mbox: Add support for PERFORM_MAINTENANCE mailbox command
  cxl: Support for finding memory operation attributes from the current
    boot
  cxl/memfeature: Add CXL memory device soft PPR control feature
  cxl/memfeature: Add CXL memory device memory sparing control feature

 Documentation/edac/memory_repair.rst |   31 +
 Documentation/edac/scrub.rst         |   47 +
 drivers/cxl/Kconfig                  |   27 +
 drivers/cxl/core/Makefile            |    1 +
 drivers/cxl/core/core.h              |    2 +
 drivers/cxl/core/edac.c              | 1730 ++++++++++++++++++++++++++
 drivers/cxl/core/features.c          |   23 +
 drivers/cxl/core/mbox.c              |   45 +-
 drivers/cxl/core/memdev.c            |    9 +
 drivers/cxl/core/ras.c               |  145 +++
 drivers/cxl/core/region.c            |    5 +
 drivers/cxl/cxlmem.h                 |   73 ++
 drivers/cxl/mem.c                    |    4 +
 drivers/cxl/pci.c                    |    3 +
 drivers/edac/mem_repair.c            |    9 +
 include/linux/edac.h                 |    7 +
 16 files changed, 2159 insertions(+), 2 deletions(-)
 create mode 100644 drivers/cxl/core/edac.c

Comments

Christophe Leroy March 21, 2025, 7:39 a.m. UTC
On 20/03/2025 at 19:04, shiju.jose@huawei.com wrote:
> From: Shiju Jose <shiju.jose@huawei.com>
> 
> Support for CXL memory RAS features: patrol scrub, ECS, soft-PPR and
> memory sparing.
> 
> This CXL series was part of the EDAC series [1].
> 
> The code is based on cxl.git next branch [2] merged with ras.git edac-cxl
> branch [3].
> 
> 1. https://lore.kernel.org/linux-cxl/20250212143654.1893-1-shiju.jose@huawei.com/
> 2. https://web.git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=next
> 3. https://web.git.kernel.org/pub/scm/linux/kernel/git/ras/ras.git/log/?h=edac-cxl
> 
> Userspace code for CXL memory repair features [4] and
> sample boot-script for CXL memory repair [5].
> 
> [4]: https://lore.kernel.org/lkml/20250207143028.1865-1-shiju.jose@huawei.com/
> [5]: https://lore.kernel.org/lkml/20250207143028.1865-5-shiju.jose@huawei.com/

The title of the series is quite confusing; CXL seems to be something
else. There is a series [1] that removes the CXL driver, but after
looking at it, that seems to be something completely different.

[1] https://lore.kernel.org/all/20250219070007.177725-1-ajd@linux.ibm.com/

Christophe
