[v2,0/9] crypto: qat - improve recovery flows

Message ID 20240202105324.50391-1-mun.chun.yep@intel.com (mailing list archive)

Message

Mun Chun Yep Feb. 2, 2024, 10:53 a.m. UTC
This set improves the error recovery flows in the QAT drivers and
adds a mechanism to test them through a heartbeat error simulator.

When a QAT device reports a fatal error or an AER fatal error, or fails
a heartbeat check, the PF driver sends an error notification to the VFs
through PFVF comms. If `auto_reset` is enabled, the device then goes
through the reset flows for error recovery. If SR-IOV was enabled when
the error was encountered, it is re-enabled after the reset cycle is done.
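
The flow described above can be sketched as a small userspace model. This
is only an illustration of the ordering (notify VFs, optionally reset,
restore SR-IOV), not the driver code: every name below (`sketch_dev`,
`sketch_handle_error`, and so on) is hypothetical, and the model assumes
that VF notifications are only sent while SR-IOV is enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical device state; field names are illustrative only. */
struct sketch_dev {
	bool auto_reset;    /* mirrors the `auto_reset` setting */
	bool sriov_enabled; /* whether VFs are currently enabled */
	int vfs_notified;   /* error notifications sent to the VFs */
	int resets;         /* completed reset cycles */
};

/* The three error sources named in the cover letter. */
enum sketch_err { SKETCH_FATAL, SKETCH_AER_FATAL, SKETCH_HB_FAIL };

/* Assumption: there is nobody to notify unless SR-IOV is enabled. */
static void sketch_notify_vfs(struct sketch_dev *dev)
{
	if (dev->sriov_enabled)
		dev->vfs_notified++;
}

/* Model a reset cycle that drops SR-IOV and restores it afterwards. */
static void sketch_reset(struct sketch_dev *dev)
{
	bool had_sriov = dev->sriov_enabled;

	dev->sriov_enabled = false; /* reset tears the VFs down */
	dev->resets++;
	if (had_sriov)
		dev->sriov_enabled = true; /* re-enable after the reset */
}

/* All three error sources take the same path in this model. */
static void sketch_handle_error(struct sketch_dev *dev, enum sketch_err err)
{
	assert(dev != NULL);
	(void)err;

	sketch_notify_vfs(dev);
	if (dev->auto_reset)
		sketch_reset(dev);
}
```

In this model, calling `sketch_handle_error()` on a device with
`auto_reset` set and SR-IOV enabled sends one notification, runs one
reset cycle, and leaves SR-IOV enabled; with `auto_reset` clear, only the
notification happens.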

Changed in v2:
- Removed redundant default value in Kconfig
- Removed ccflags define, use the CONFIG option directly in the code
- Reworked the AER reset and recovery flow

Damian Muszynski (2):
  crypto: qat - add heartbeat error simulator
  crypto: qat - add auto reset on error

Furong Zhou (3):
  crypto: qat - add fatal error notify method
  crypto: qat - disable arbitration before reset
  crypto: qat - limit heartbeat notifications

Mun Chun Yep (4):
  crypto: qat - update PFVF protocol for recovery
  crypto: qat - re-enable sriov after pf reset
  crypto: qat - add fatal error notification
  crypto: qat - improve aer error reset handling

 Documentation/ABI/testing/debugfs-driver-qat  |  26 ++++
 Documentation/ABI/testing/sysfs-driver-qat    |  20 +++
 drivers/crypto/intel/qat/Kconfig              |  14 +++
 drivers/crypto/intel/qat/qat_common/Makefile  |   2 +
 .../intel/qat/qat_common/adf_accel_devices.h  |   2 +
 drivers/crypto/intel/qat/qat_common/adf_aer.c | 116 +++++++++++++++++-
 .../intel/qat/qat_common/adf_cfg_strings.h    |   1 +
 .../intel/qat/qat_common/adf_common_drv.h     |  10 ++
 .../intel/qat/qat_common/adf_heartbeat.c      |  20 ++-
 .../intel/qat/qat_common/adf_heartbeat.h      |  21 ++++
 .../qat/qat_common/adf_heartbeat_dbgfs.c      |  52 ++++++++
 .../qat/qat_common/adf_heartbeat_inject.c     |  76 ++++++++++++
 .../intel/qat/qat_common/adf_hw_arbiter.c     |  25 ++++
 .../crypto/intel/qat/qat_common/adf_init.c    |  12 ++
 drivers/crypto/intel/qat/qat_common/adf_isr.c |   7 +-
 .../intel/qat/qat_common/adf_pfvf_msg.h       |   7 +-
 .../intel/qat/qat_common/adf_pfvf_pf_msg.c    |  64 +++++++++-
 .../intel/qat/qat_common/adf_pfvf_pf_msg.h    |  21 ++++
 .../intel/qat/qat_common/adf_pfvf_pf_proto.c  |   8 ++
 .../intel/qat/qat_common/adf_pfvf_vf_proto.c  |   6 +
 .../crypto/intel/qat/qat_common/adf_sriov.c   |  38 +++++-
 .../crypto/intel/qat/qat_common/adf_sysfs.c   |  37 ++++++
 22 files changed, 571 insertions(+), 14 deletions(-)
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c

Comments

Herbert Xu Feb. 9, 2024, 5:01 a.m. UTC | #1
All applied.  Thanks.