[v3,0/5] crypto: qat - add heartbeat feature

Message ID 20230622180405.133298-1-damian.muszynski@intel.com (mailing list archive)

Message

Damian Muszynski June 22, 2023, 6:04 p.m. UTC
This set introduces support for the QAT heartbeat feature, which detects
when the device firmware or an acceleration unit hangs.
We're adding this feature to give our clients a tool with which
they can verify that all of the Quick Assist hardware resources are
healthy and operational.

QAT device firmware periodically writes counters to a specified physical
memory location. A pair of counters per thread is incremented at
the start and end of the main processing loop within the firmware.
Checking for Heartbeat consists of checking the validity of the pair
of counter values for each thread. Stagnant counters indicate
a firmware hang.
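
A minimal sketch of such a check, under stated assumptions: the struct
layout, the names hb_pair and hb_all_alive, and the exact rule (a thread is
flagged when its pair is mid-iteration and its end-of-loop counter has not
moved since the previous sample) are illustrative and are not taken from the
patches themselves.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout: one counter pair per firmware thread. */
struct hb_pair {
	uint16_t req;  /* incremented at the start of the processing loop */
	uint16_t resp; /* incremented at the end of the processing loop */
};

/*
 * Compare the previous snapshot with the current one.
 * Assumed rule: a thread is considered hung when it has entered the loop
 * (req != resp) but its resp counter is unchanged since the last sample.
 * Returns true when no thread matches that condition.
 */
bool hb_all_alive(const struct hb_pair *prev, const struct hb_pair *curr,
		  size_t nthreads)
{
	for (size_t i = 0; i < nthreads; i++) {
		bool busy = curr[i].req != curr[i].resp;
		bool stagnant = curr[i].resp == prev[i].resp;

		if (busy && stagnant)
			return false;
	}
	return true;
}
```

A caller would take a snapshot of the counter memory on every poll, pass the
previous and current snapshots to the check, and then keep the current one
for the next round.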

The first patch adds timestamp synchronization to the firmware.
The second patch removes historical, never-used HB definitions.
The third patch implements the hardware clock frequency measurement
interface.
The fourth introduces the main heartbeat implementation with the debugfs
interface.
The last patch implements an algorithm that allows the code to detect
which version of the heartbeat API is used by the currently loaded firmware.

Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>
Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

Changes since v2:
- fixed build error on a few architectures - reduced unnecessary
  64-bit division.

Changes since v1:
- fixed build errors on a few architectures - replaced macro
  DIV_ROUND_CLOSEST with DIV_ROUND_CLOSEST_ULL
- included prerequisite patch "add internal timer for qat 4xxx", which was
  initially sent separately while this patchset was still in development.
  - timer patch reworked to use delayed work as suggested by Herbert Xu
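
For context on the division fixes in both changelogs: on 32-bit
architectures the kernel does not provide the helper that a plain 64-bit
`/` compiles to, so 64-bit dividends go through macros such as
DIV_ROUND_CLOSEST_ULL. The sketch below is a userspace stand-in for that
rounding, applied to a made-up frequency calculation. The names
div_round_closest_u64 and measure_freq_hz and the tick/microsecond inputs
are assumptions for illustration, not code from adf_clock.c.

```c
#include <stdint.h>

/*
 * Userspace stand-in for the kernel's DIV_ROUND_CLOSEST_ULL():
 * unsigned division of a 64-bit value by a 32-bit divisor, rounded to
 * the nearest integer. The divisor must be non-zero.
 */
uint64_t div_round_closest_u64(uint64_t x, uint32_t divisor)
{
	return (x + divisor / 2) / divisor;
}

/*
 * Hypothetical helper: derive a clock frequency in Hz from the number of
 * ticks counted over an elapsed time given in microseconds.
 * Assumes ticks * 1000000 does not overflow 64 bits.
 */
uint64_t measure_freq_hz(uint64_t ticks, uint32_t elapsed_us)
{
	return div_round_closest_u64(ticks * 1000000ULL, elapsed_us);
}
```

Rounding to the nearest value rather than truncating keeps the measured
frequency from being biased low when the sampling window does not divide
the tick count evenly.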

Damian Muszynski (5):
  crypto: qat - add internal timer for qat 4xxx
  crypto: qat - drop obsolete heartbeat interface
  crypto: qat - add measure clock frequency
  crypto: qat - add heartbeat feature
  crypto: qat - add heartbeat counters check

 Documentation/ABI/testing/debugfs-driver-qat  |  51 +++
 .../intel/qat/qat_4xxx/adf_4xxx_hw_data.c     |  14 +
 .../intel/qat/qat_4xxx/adf_4xxx_hw_data.h     |   4 +
 drivers/crypto/intel/qat/qat_4xxx/adf_drv.c   |   3 +
 .../intel/qat/qat_c3xxx/adf_c3xxx_hw_data.c   |  28 ++
 .../intel/qat/qat_c3xxx/adf_c3xxx_hw_data.h   |   7 +
 .../intel/qat/qat_c62x/adf_c62x_hw_data.c     |  28 ++
 .../intel/qat/qat_c62x/adf_c62x_hw_data.h     |   7 +
 drivers/crypto/intel/qat/qat_common/Makefile  |   4 +
 .../intel/qat/qat_common/adf_accel_devices.h  |  13 +
 .../crypto/intel/qat/qat_common/adf_admin.c   |  43 +++
 .../intel/qat/qat_common/adf_cfg_strings.h    |   2 +
 .../crypto/intel/qat/qat_common/adf_clock.c   | 131 +++++++
 .../crypto/intel/qat/qat_common/adf_clock.h   |  14 +
 .../intel/qat/qat_common/adf_common_drv.h     |   5 +
 .../crypto/intel/qat/qat_common/adf_dbgfs.c   |   9 +-
 .../intel/qat/qat_common/adf_gen2_config.c    |   7 +
 .../intel/qat/qat_common/adf_gen2_hw_data.h   |   3 +
 .../intel/qat/qat_common/adf_gen4_hw_data.h   |   3 +
 .../intel/qat/qat_common/adf_gen4_timer.c     |  70 ++++
 .../intel/qat/qat_common/adf_gen4_timer.h     |  21 ++
 .../intel/qat/qat_common/adf_heartbeat.c      | 336 ++++++++++++++++++
 .../intel/qat/qat_common/adf_heartbeat.h      |  79 ++++
 .../qat/qat_common/adf_heartbeat_dbgfs.c      | 194 ++++++++++
 .../qat/qat_common/adf_heartbeat_dbgfs.h      |  12 +
 .../crypto/intel/qat/qat_common/adf_init.c    |  28 ++
 drivers/crypto/intel/qat/qat_common/adf_isr.c |   6 +
 .../qat/qat_common/icp_qat_fw_init_admin.h    |  23 +-
 .../qat/qat_dh895xcc/adf_dh895xcc_hw_data.c   |  13 +
 .../qat/qat_dh895xcc/adf_dh895xcc_hw_data.h   |   5 +
 30 files changed, 1147 insertions(+), 16 deletions(-)
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_clock.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_clock.h
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_timer.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_gen4_timer.h
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat.h
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c
 create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.h


base-commit: 0e2456dbf11f1d5427fc6c585ac149b3e9b816e7

Comments

Andy Shevchenko June 22, 2023, 6:21 p.m. UTC | #1
On Thu, Jun 22, 2023 at 08:04:01PM +0200, Damian Muszynski wrote:
> Signed-off-by: Damian Muszynski <damian.muszynski@intel.com>

> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>

These tags do not belong in a cover letter.
Andy Shevchenko June 22, 2023, 6:27 p.m. UTC | #2
I made a few last minute nit-picks, feel free to ignore them if it's okay
with the maintainers.
Damian Muszynski June 26, 2023, 11:22 a.m. UTC | #3
On 2023-06-22 at 21:27:36 +0300, Andy Shevchenko wrote:
> I made a few last minute nit-picks, feel free to ignore them if it's okay
> with the maintainers.

Thanks, I will implement those. 

--- 
Best Regards,
Damian Muszynski
Cabiddu, Giovanni June 26, 2023, 11:31 a.m. UTC | #4
On Thu, Jun 22, 2023 at 09:27:36PM +0300, Andy Shevchenko wrote:
> I made a few last minute nit-picks, feel free to ignore them if it's okay
> with the maintainers.
Thanks.

Herbert, if you decide to include this in the PR for 6.5, we will send a
patch on top to clarify the comment.
Otherwise, we will resend the set, also including a version update
(6.5->6.6) in the Documentation.

Regards,