[v3,for-next,0/5] RDMA/hns: Supports recovery of on-chip RAM 1bit ECC errors

Message ID 20220714134353.16700-1-liangwenpeng@huawei.com

Wenpeng Liang July 14, 2022, 1:43 p.m. UTC
Add support for 1-bit ECC error recovery via abnormal interrupt reporting,
and adjust the structure of the abnormal interrupt handler.

The following is the outline of each patch:
(1) #1~#4: Cleanups and bug fixes for the abnormal interrupt handler.
(2) #5: Support for 1-bit ECC error recovery.

Changes since v2:
* Optimize the exit logic of fmea_recover_others() in #5.
* v2 Link: https://patchwork.kernel.org/project/linux-rdma/cover/20220713092630.1657-1-liangwenpeng@huawei.com/

Changes since v1:
* Embed ecc_work in struct hns_roce_dev instead of allocating it dynamically in #5.
* Add the const keyword to the string array that is never modified in #5.
* v1 Link: https://patchwork.kernel.org/project/linux-rdma/cover/20220624110845.48184-1-liangwenpeng@huawei.com/
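
For readers unfamiliar with the pattern behind the v1 changes, here is a
minimal userspace sketch, not the driver code. It assumes stand-ins for the
kernel's struct work_struct, container_of(), INIT_WORK() and queue_work();
the field ecc_recovered, the resource names, and the helpers init_ecc_work(),
run_work() and ecc_res_name() are hypothetical. Only the names ecc_work and
hns_roce_dev come from this series.

```c
#include <stddef.h>

/* Userspace stand-in for the kernel's struct work_struct (assumption). */
struct work_struct {
	void (*func)(struct work_struct *work);
};

/* Same idea as the kernel's container_of(): member pointer -> parent. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical, heavily trimmed device structure. */
struct hns_roce_dev {
	int ecc_recovered;           /* hypothetical counter, demo only */
	struct work_struct ecc_work; /* embedded: nothing to allocate or free */
};

/* Never modified, so both the pointers and the strings are const. */
static const char * const ecc_res_names[] = { "RAM_A", "RAM_B" };

/* Return the (hypothetical) name for a resource index, or NULL. */
const char *ecc_res_name(size_t idx)
{
	size_t n = sizeof(ecc_res_names) / sizeof(ecc_res_names[0]);

	return idx < n ? ecc_res_names[idx] : NULL;
}

/* The handler recovers the owning device from the embedded member. */
static void ecc_work_handler(struct work_struct *work)
{
	struct hns_roce_dev *hr_dev =
		container_of(work, struct hns_roce_dev, ecc_work);

	hr_dev->ecc_recovered++;
}

/* Stand-in for INIT_WORK(): bind the handler once at device setup. */
void init_ecc_work(struct hns_roce_dev *hr_dev)
{
	hr_dev->ecc_recovered = 0;
	hr_dev->ecc_work.func = ecc_work_handler;
}

/* Stand-in for queue_work(): here the work simply runs synchronously. */
void run_work(struct work_struct *work)
{
	work->func(work);
}
```

Because the work item lives inside the device structure, the interrupt path
has no allocation that can fail and no per-event object to free; the
trade-off is that a single item serves all events for that device.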

Haoyue Xu (5):
  RDMA/hns: Remove unused abnormal interrupt of type RAS
  RDMA/hns: Fix the wrong type of return value of the interrupt handler
  RDMA/hns: Fix incorrect clearing of interrupt status register
  RDMA/hns: Refactor the abnormal interrupt handler function
  RDMA/hns: Recover 1bit-ECC error of RAM on chip

 drivers/infiniband/hw/hns/hns_roce_device.h |   1 +
 drivers/infiniband/hw/hns/hns_roce_hw_v2.c  | 248 +++++++++++++++++---
 drivers/infiniband/hw/hns/hns_roce_hw_v2.h  |  13 +-
 3 files changed, 227 insertions(+), 35 deletions(-)

--
2.33.0

Comments

Leon Romanovsky July 18, 2022, 11:17 a.m. UTC | #1
Thanks, applied.