From patchwork Fri Feb 2 10:53:16 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542722 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 65F097CF02 for ; Fri, 2 Feb 2024 10:55:04 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871306; cv=none; b=bN4MJPOFB9p1mcWW8RD+qs5FUvFspQfeOlM33YC8URyEajJ9v2bju4OHQhigpEmLEGjnlm9qbcQdm9J18LIsv0WCgTvDj07IlUz2NRCckvoahhYwNVEevhOCvufu0lYwyWiUnhHumQK15LtKAU6eqJZMDTjm4rdcP3U/J5/hQlU= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871306; c=relaxed/simple; bh=8CXeeh0xLX4xQVAlOh3NmP5xHpNGKk9nKpGECSI7DAc=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=P8KrJe1FBzwNx2QwuGsicTohr9+szGm8UqJQ0Wbvfon2rV9sMMv//hyRypH3Q+2rAdnF/4KLjVsqa4rRsOv8JXbrgFOOSsmcONwCQizCyAAwh12VQRFFPM8h7NSRxTofm1bpboADX2/+3Pdx7NJY+eE507VqYctHM34heH97tY4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=L0giS/Yk; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="L0giS/Yk" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871304; x=1738407304; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=8CXeeh0xLX4xQVAlOh3NmP5xHpNGKk9nKpGECSI7DAc=; b=L0giS/Yk2jvFzugdh1G/6fCQZ+94gkR1lnZrgB+jtNwxOnkcCkenM1HL c+JqzZjEd2TpUsBDL7tU6egnyhKcKN6v7w44k1NNyVrD/bhUQnb6yXb3i MDqonlPvtTozVfLFXc2o9NC4507AiClIpnJeN7UwpM7vPl2i6i5U03t5M Xgx+AACMZza1+Ux5eFlsFIBEYszXetxw0an6LOR95WtsJlHKOuPdnahEU ump5qH+2amSv2psbPwh7cnRayPxMZPKycJLZcQciXhGs0NPc9P7Ou0KWO APwUG7a5sWjldBr05ZntMApeqmAJXRHFm0ncRjZpeH9YklqSxJdznkEGi w==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787280" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787280" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:04 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53540" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:02 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Damian Muszynski , Giovanni Cabiddu , Lucas Segarra Fernandez , Ahsan Atta , Markas Rapoportas , Mun Chun Yep Subject: [PATCH v2 1/9] crypto: qat - add heartbeat error simulator Date: Fri, 2 Feb 2024 18:53:16 +0800 Message-Id: <20240202105324.50391-2-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Damian Muszynski Add a mechanism that allows to inject a heartbeat error for testing purposes. A new attribute `inject_error` is added to debugfs for each QAT device. Upon a write on this attribute, the driver will inject an error on the device which can then be detected by the heartbeat feature. Errors are breaking the device functionality thus they require a device reset in order to be recovered. This functionality is not compiled by default, to enable it CRYPTO_DEV_QAT_ERROR_INJECTION must be set. Signed-off-by: Damian Muszynski Reviewed-by: Giovanni Cabiddu Reviewed-by: Lucas Segarra Fernandez Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Signed-off-by: Mun Chun Yep --- Documentation/ABI/testing/debugfs-driver-qat | 26 +++++++ drivers/crypto/intel/qat/Kconfig | 14 ++++ drivers/crypto/intel/qat/qat_common/Makefile | 2 + .../intel/qat/qat_common/adf_common_drv.h | 1 + .../intel/qat/qat_common/adf_heartbeat.c | 6 -- .../intel/qat/qat_common/adf_heartbeat.h | 18 +++++ .../qat/qat_common/adf_heartbeat_dbgfs.c | 52 +++++++++++++ .../qat/qat_common/adf_heartbeat_inject.c | 76 +++++++++++++++++++ .../intel/qat/qat_common/adf_hw_arbiter.c | 25 ++++++ 9 files changed, 214 insertions(+), 6 deletions(-) create mode 100644 drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c diff --git a/Documentation/ABI/testing/debugfs-driver-qat b/Documentation/ABI/testing/debugfs-driver-qat index b2db010d851e..bd6793760f29 100644 --- a/Documentation/ABI/testing/debugfs-driver-qat +++ b/Documentation/ABI/testing/debugfs-driver-qat @@ -81,3 +81,29 @@ Description: (RO) Read returns, for each Acceleration Engine (AE), the number : Number of Compress and Verify (CnV) errors and type of the last CnV error detected by Acceleration Engine N. + +What: /sys/kernel/debug/qat__/heartbeat/inject_error +Date: March 2024 +KernelVersion: 6.8 +Contact: qat-linux@intel.com +Description: (WO) Write to inject an error that simulates an heartbeat + failure. This is to be used for testing purposes. + + After writing this file, the driver stops arbitration on a + random engine and disables the fetching of heartbeat counters. + If a workload is running on the device, a job submitted to the + accelerator might not get a response and a read of the + `heartbeat/status` attribute might report -1, i.e. device + unresponsive. + The error is unrecoverable thus the device must be restarted to + restore its functionality. + + This attribute is available only when the kernel is built with + CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION=y. + + A write of 1 enables error injection. + + The following example shows how to enable error injection:: + + # cd /sys/kernel/debug/qat__ + # echo 1 > heartbeat/inject_error diff --git a/drivers/crypto/intel/qat/Kconfig b/drivers/crypto/intel/qat/Kconfig index c120f6715a09..02fb8abe4e6e 100644 --- a/drivers/crypto/intel/qat/Kconfig +++ b/drivers/crypto/intel/qat/Kconfig @@ -106,3 +106,17 @@ config CRYPTO_DEV_QAT_C62XVF To compile this as a module, choose M here: the module will be called qat_c62xvf. + +config CRYPTO_DEV_QAT_ERROR_INJECTION + bool "Support for Intel(R) QAT Devices Heartbeat Error Injection" + depends on CRYPTO_DEV_QAT + depends on DEBUG_FS + help + Enables a mechanism that allows to inject a heartbeat error on + Intel(R) QuickAssist devices for testing purposes. + + This is intended for developer use only. + If unsure, say N. + + This functionality is available via debugfs entry of the Intel(R) + QuickAssist device diff --git a/drivers/crypto/intel/qat/qat_common/Makefile b/drivers/crypto/intel/qat/qat_common/Makefile index 6908727bff3b..5915cde8a7aa 100644 --- a/drivers/crypto/intel/qat/qat_common/Makefile +++ b/drivers/crypto/intel/qat/qat_common/Makefile @@ -53,3 +53,5 @@ intel_qat-$(CONFIG_PCI_IOV) += adf_sriov.o adf_vf_isr.o adf_pfvf_utils.o \ adf_pfvf_pf_msg.o adf_pfvf_pf_proto.o \ adf_pfvf_vf_msg.o adf_pfvf_vf_proto.o \ adf_gen2_pfvf.o adf_gen4_pfvf.o + +intel_qat-$(CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION) += adf_heartbeat_inject.o diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index f06188033a93..0baae42deb3a 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -90,6 +90,7 @@ void adf_exit_aer(void); int adf_init_arb(struct adf_accel_dev *accel_dev); void adf_exit_arb(struct adf_accel_dev *accel_dev); void adf_update_ring_arb(struct adf_etr_ring_data *ring); +int adf_disable_arb_thd(struct adf_accel_dev *accel_dev, u32 ae, u32 thr); int adf_dev_get(struct adf_accel_dev *accel_dev); void adf_dev_put(struct adf_accel_dev *accel_dev); diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c index 13f48d2f6da8..f88b1bc6857e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c @@ -23,12 +23,6 @@ #define ADF_HB_EMPTY_SIG 0xA5A5A5A5 -/* Heartbeat counter pair */ -struct hb_cnt_pair { - __u16 resp_heartbeat_cnt; - __u16 req_heartbeat_cnt; -}; - static int adf_hb_check_polling_freq(struct adf_accel_dev *accel_dev) { u64 curr_time = adf_clock_get_current_time(); diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h index b22e3cb29798..24c3f4f24c86 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h @@ -19,6 +19,12 @@ enum adf_device_heartbeat_status { HB_DEV_UNSUPPORTED, }; +/* Heartbeat counter pair */ +struct hb_cnt_pair { + __u16 resp_heartbeat_cnt; + __u16 req_heartbeat_cnt; +}; + struct adf_heartbeat { unsigned int hb_sent_counter; unsigned int hb_failed_counter; @@ -35,6 +41,9 @@ struct adf_heartbeat { struct dentry *cfg; struct dentry *sent; struct dentry *failed; +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION + struct dentry *inject_error; +#endif } dbgfs; }; @@ -51,6 +60,15 @@ void adf_heartbeat_status(struct adf_accel_dev *accel_dev, enum adf_device_heartbeat_status *hb_status); void adf_heartbeat_check_ctrs(struct adf_accel_dev *accel_dev); +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION +int adf_heartbeat_inject_error(struct adf_accel_dev *accel_dev); +#else +static inline int adf_heartbeat_inject_error(struct adf_accel_dev *accel_dev) +{ + return -EPERM; +} +#endif + #else static inline int adf_heartbeat_init(struct adf_accel_dev *accel_dev) { diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c index 2661af6a2ef6..5cd6c2d6f90a 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_dbgfs.c @@ -155,6 +155,43 @@ static const struct file_operations adf_hb_cfg_fops = { .write = adf_hb_cfg_write, }; +static ssize_t adf_hb_error_inject_write(struct file *file, + const char __user *user_buf, + size_t count, loff_t *ppos) +{ + struct adf_accel_dev *accel_dev = file->private_data; + size_t written_chars; + char buf[3]; + int ret; + + /* last byte left as string termination */ + if (count != 2) + return -EINVAL; + + written_chars = simple_write_to_buffer(buf, sizeof(buf) - 1, + ppos, user_buf, count); + if (buf[0] != '1') + return -EINVAL; + + ret = adf_heartbeat_inject_error(accel_dev); + if (ret) { + dev_err(&GET_DEV(accel_dev), + "Heartbeat error injection failed with status %d\n", + ret); + return ret; + } + + dev_info(&GET_DEV(accel_dev), "Heartbeat error injection enabled\n"); + + return written_chars; +} + +static const struct file_operations adf_hb_error_inject_fops = { + .owner = THIS_MODULE, + .open = simple_open, + .write = adf_hb_error_inject_write, +}; + void adf_heartbeat_dbgfs_add(struct adf_accel_dev *accel_dev) { struct adf_heartbeat *hb = accel_dev->heartbeat; @@ -171,6 +208,17 @@ void adf_heartbeat_dbgfs_add(struct adf_accel_dev *accel_dev) &hb->hb_failed_counter, &adf_hb_stats_fops); hb->dbgfs.cfg = debugfs_create_file("config", 0600, hb->dbgfs.base_dir, accel_dev, &adf_hb_cfg_fops); + + if (IS_ENABLED(CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION)) { + struct dentry *inject_error __maybe_unused; + + inject_error = debugfs_create_file("inject_error", 0200, + hb->dbgfs.base_dir, accel_dev, + &adf_hb_error_inject_fops); +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION + hb->dbgfs.inject_error = inject_error; +#endif + } } EXPORT_SYMBOL_GPL(adf_heartbeat_dbgfs_add); @@ -189,6 +237,10 @@ void adf_heartbeat_dbgfs_rm(struct adf_accel_dev *accel_dev) hb->dbgfs.failed = NULL; debugfs_remove(hb->dbgfs.cfg); hb->dbgfs.cfg = NULL; +#ifdef CONFIG_CRYPTO_DEV_QAT_ERROR_INJECTION + debugfs_remove(hb->dbgfs.inject_error); + hb->dbgfs.inject_error = NULL; +#endif debugfs_remove(hb->dbgfs.base_dir); hb->dbgfs.base_dir = NULL; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c new file mode 100644 index 000000000000..a3b474bdef6c --- /dev/null +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat_inject.c @@ -0,0 +1,76 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* Copyright(c) 2023 Intel Corporation */ +#include + +#include "adf_admin.h" +#include "adf_common_drv.h" +#include "adf_heartbeat.h" + +#define MAX_HB_TICKS 0xFFFFFFFF + +static int adf_hb_set_timer_to_max(struct adf_accel_dev *accel_dev) +{ + struct adf_hw_device_data *hw_data = accel_dev->hw_device; + + accel_dev->heartbeat->hb_timer = 0; + + if (hw_data->stop_timer) + hw_data->stop_timer(accel_dev); + + return adf_send_admin_hb_timer(accel_dev, MAX_HB_TICKS); +} + +static void adf_set_hb_counters_fail(struct adf_accel_dev *accel_dev, u32 ae, + u32 thr) +{ + struct hb_cnt_pair *stats = accel_dev->heartbeat->dma.virt_addr; + struct adf_hw_device_data *hw_device = accel_dev->hw_device; + const size_t max_aes = hw_device->get_num_aes(hw_device); + const size_t hb_ctrs = hw_device->num_hb_ctrs; + size_t thr_id = ae * hb_ctrs + thr; + u16 num_rsp = stats[thr_id].resp_heartbeat_cnt; + + /* + * Inject live.req != live.rsp and live.rsp == last.rsp + * to trigger the heartbeat error detection + */ + stats[thr_id].req_heartbeat_cnt++; + stats += (max_aes * hb_ctrs); + stats[thr_id].resp_heartbeat_cnt = num_rsp; +} + +int adf_heartbeat_inject_error(struct adf_accel_dev *accel_dev) +{ + struct adf_hw_device_data *hw_device = accel_dev->hw_device; + const size_t max_aes = hw_device->get_num_aes(hw_device); + const size_t hb_ctrs = hw_device->num_hb_ctrs; + u32 rand, rand_ae, rand_thr; + unsigned long ae_mask; + int ret; + + ae_mask = hw_device->ae_mask; + + do { + /* Ensure we have a valid ae */ + get_random_bytes(&rand, sizeof(rand)); + rand_ae = rand % max_aes; + } while (!test_bit(rand_ae, &ae_mask)); + + get_random_bytes(&rand, sizeof(rand)); + rand_thr = rand % hb_ctrs; + + /* Increase the heartbeat timer to prevent FW updating HB counters */ + ret = adf_hb_set_timer_to_max(accel_dev); + if (ret) + return ret; + + /* Configure worker threads to stop processing any packet */ + ret = adf_disable_arb_thd(accel_dev, rand_ae, rand_thr); + if (ret) + return ret; + + /* Change HB counters memory to simulate a hang */ + adf_set_hb_counters_fail(accel_dev, rand_ae, rand_thr); + + return 0; +} diff --git a/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c b/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c index da6956699246..65bd26b25abc 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c +++ b/drivers/crypto/intel/qat/qat_common/adf_hw_arbiter.c @@ -103,3 +103,28 @@ void adf_exit_arb(struct adf_accel_dev *accel_dev) csr_ops->write_csr_ring_srv_arb_en(csr, i, 0); } EXPORT_SYMBOL_GPL(adf_exit_arb); + +int adf_disable_arb_thd(struct adf_accel_dev *accel_dev, u32 ae, u32 thr) +{ + void __iomem *csr = accel_dev->transport->banks[0].csr_addr; + struct adf_hw_device_data *hw_data = accel_dev->hw_device; + const u32 *thd_2_arb_cfg; + struct arb_info info; + u32 ae_thr_map; + + if (ADF_AE_STRAND0_THREAD == thr || ADF_AE_STRAND1_THREAD == thr) + thr = ADF_AE_ADMIN_THREAD; + + hw_data->get_arb_info(&info); + thd_2_arb_cfg = hw_data->get_arb_mapping(accel_dev); + if (!thd_2_arb_cfg) + return -EFAULT; + + /* Disable scheduling for this particular AE and thread */ + ae_thr_map = *(thd_2_arb_cfg + ae); + ae_thr_map &= ~(GENMASK(3, 0) << (thr * BIT(2))); + + WRITE_CSR_ARB_WT2SAM(csr, info.arb_offset, info.wt2sam_offset, ae, + ae_thr_map); + return 0; +} From patchwork Fri Feb 2 10:53:17 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542723 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 614A07CF02 for ; Fri, 2 Feb 2024 10:55:07 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871309; cv=none; b=TG5XXNiXDDVKtWaHY2l+Cb+e5yfDUkkixw/1O4yn+soN+Ou8Roauzcvr7QsVgSa7s+PYU2VtFF+hLJssfhdohfsE/BHM754eIW4jNZhm+qHBup2swTUwXpyxI3ETddb2CwEEn8L3WEYgizepHi+6NV8S0way0ZE+NF4BbRIdcfQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871309; c=relaxed/simple; bh=yqIOeHWNH6R8ce78erp1JUeBQuApZbpPiCUmsHB1hd8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=VrLu5R253ZAHs0X+0YwBObf6aGBVGSQN60WOnyuIUb86nr+4PTEhy9+9h1+NBovV8TxqLp3+GKDptC3FUhp8DZSQu0Fp40oZqjwmfJL625eqll5JU77VF2j2rd9kRVkD+ybvk526PAQcBHg7gb2PcehQGL0E9+ExyigCSJajrqo= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hVMMjPAH; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hVMMjPAH" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871307; x=1738407307; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=yqIOeHWNH6R8ce78erp1JUeBQuApZbpPiCUmsHB1hd8=; b=hVMMjPAHo1k00ftavZPlp+mMqnkzHNO8eNaCxZNVdhDm28bAcshi77zl R/0XhDK30n46UzBMIM1x6FqC14b79XnhLAnaGgcmugJVO7nV8CVMLBtaR QrxP4WwrfgRMw1BCEK/U21vdtVyITFO2ksA0uaJQtUhNf9Uwm9w5Wc5MC PFqqrPrL1MNqdjDWtaMtcIAUl5KSOmYTcE1JqggNohWSOeGDsjzK7gG2K UHIbH6Y6KhWoOyzDhU3MCgBzXBx6fVNPvq1v7FgdGEA5JED3Xutcn+q6U c6HXkzMMKVvi1N6iKMyn5lTZ8oEOU4zKxsSghhc8vUrq+ER96s7uupVEW w==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787286" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787286" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:06 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53566" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:04 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Furong Zhou , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu , Mun Chun Yep Subject: [PATCH v2 2/9] crypto: qat - add fatal error notify method Date: Fri, 2 Feb 2024 18:53:17 +0800 Message-Id: <20240202105324.50391-3-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Furong Zhou Add error notify method to report a fatal error event to all the subsystems registered. In addition expose an API, adf_notify_fatal_error(), that allows to trigger a fatal error notification asynchronously in the context of a workqueue. This will be invoked when a fatal error is detected by the ISR or through Heartbeat. Signed-off-by: Furong Zhou Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 30 +++++++++++++++++++ .../intel/qat/qat_common/adf_common_drv.h | 3 ++ .../crypto/intel/qat/qat_common/adf_init.c | 12 ++++++++ 3 files changed, 45 insertions(+) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index a39e70bd4b21..22a43b4b8315 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -8,6 +8,11 @@ #include "adf_accel_devices.h" #include "adf_common_drv.h" +struct adf_fatal_error_data { + struct adf_accel_dev *accel_dev; + struct work_struct work; +}; + static struct workqueue_struct *device_reset_wq; static pci_ers_result_t adf_error_detected(struct pci_dev *pdev, @@ -171,6 +176,31 @@ const struct pci_error_handlers adf_err_handler = { }; EXPORT_SYMBOL_GPL(adf_err_handler); +static void adf_notify_fatal_error_worker(struct work_struct *work) +{ + struct adf_fatal_error_data *wq_data = + container_of(work, struct adf_fatal_error_data, work); + struct adf_accel_dev *accel_dev = wq_data->accel_dev; + + adf_error_notifier(accel_dev); + kfree(wq_data); +} + +int adf_notify_fatal_error(struct adf_accel_dev *accel_dev) +{ + struct adf_fatal_error_data *wq_data; + + wq_data = kzalloc(sizeof(*wq_data), GFP_ATOMIC); + if (!wq_data) + return -ENOMEM; + + wq_data->accel_dev = accel_dev; + INIT_WORK(&wq_data->work, adf_notify_fatal_error_worker); + adf_misc_wq_queue_work(&wq_data->work); + + return 0; +} + int adf_init_aer(void) { device_reset_wq = alloc_workqueue("qat_device_reset_wq", diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index 0baae42deb3a..8c062d5a8db2 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -40,6 +40,7 @@ enum adf_event { ADF_EVENT_SHUTDOWN, ADF_EVENT_RESTARTING, ADF_EVENT_RESTARTED, + ADF_EVENT_FATAL_ERROR, }; struct service_hndl { @@ -60,6 +61,8 @@ int adf_dev_restart(struct adf_accel_dev *accel_dev); void adf_devmgr_update_class_index(struct adf_hw_device_data *hw_data); void adf_clean_vf_map(bool); +int adf_notify_fatal_error(struct adf_accel_dev *accel_dev); +void adf_error_notifier(struct adf_accel_dev *accel_dev); int adf_devmgr_add_dev(struct adf_accel_dev *accel_dev, struct adf_accel_dev *pf); void adf_devmgr_rm_dev(struct adf_accel_dev *accel_dev, diff --git a/drivers/crypto/intel/qat/qat_common/adf_init.c b/drivers/crypto/intel/qat/qat_common/adf_init.c index f43ae9111553..74f0818c0703 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_init.c +++ b/drivers/crypto/intel/qat/qat_common/adf_init.c @@ -433,6 +433,18 @@ int adf_dev_restarted_notify(struct adf_accel_dev *accel_dev) return 0; } +void adf_error_notifier(struct adf_accel_dev *accel_dev) +{ + struct service_hndl *service; + + list_for_each_entry(service, &service_table, list) { + if (service->event_hld(accel_dev, ADF_EVENT_FATAL_ERROR)) + dev_err(&GET_DEV(accel_dev), + "Failed to send error event to %s.\n", + service->name); + } +} + static int adf_dev_shutdown_cache_cfg(struct adf_accel_dev *accel_dev) { char services[ADF_CFG_MAX_VAL_LEN_IN_BYTES] = {0}; From patchwork Fri Feb 2 10:53:18 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542724 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5DF4F13BEAE for ; Fri, 2 Feb 2024 10:55:09 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871310; cv=none; b=qTLESVoJb/UY8W8IABWsLerH8DBylmFaQ9zpD/YrKrEjy1dpzuglW6B9HvYOgPj000TwGcnPU5gcPqD2USh9ER4VR5V+Q6w6CYY4aBbVhA/T/4Ye7p5uYdmqCCQTPOmQD0TGKCQlLIctUHguO1gj+u2GTx+ITVkR9sUG03lCI+s= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871310; c=relaxed/simple; bh=tTcxQM8kmIDo3CaLAzaXTDwkgvDAxFeXC364HbFq7z4=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=ZfIuQeIud1W1B8ty0T3Oic0Z1NElR6zT0PbQy5BARwpnUZ0ZYsK3EHhsMfZ9jW4tIxCWO7emi3VzzxqDSm2J93MUSK3bjx656ocY1oxUQrVhVDQ0YoHbLEjxSHts6rdC81xojE4YGr+Vf+ttkFxn1RBNTukK4fiprPyCQDgj+Ps= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=AcCQqijC; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="AcCQqijC" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871309; x=1738407309; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=tTcxQM8kmIDo3CaLAzaXTDwkgvDAxFeXC364HbFq7z4=; b=AcCQqijCdODB9ydQxW7KfKoGzdGXqe18zVbygkjBZ7R4hH2G1t5TCJIY kqIFCcEW7RN5ykCCiFN5582TLEf8xwhW7wMhYe60XVOQjJqz0nciJkDcS utMJS8/VWvvN+isVCSeO8VJ1HGAbtBoReyJA7nv9lx9ECaCdnePxm2QVF Y+aBUgXlcuTkFD9SOlzXVLOWUiRINI1nbpMOwqKBo1dUD16xn4f0YOTZx zf9nK20RcRIrF4tt2gC/BeCPU5ODc7sJdqEQuFcLqvD4sfMQgchJEBAXs YR8xr2XfR0rirFsi6nemaHDy4S9eOxbzFmL5rFzYunIYhexQcIsZwfBBF Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787298" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787298" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:09 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53583" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:07 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Furong Zhou , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu , Mun Chun Yep Subject: [PATCH v2 3/9] crypto: qat - disable arbitration before reset Date: Fri, 2 Feb 2024 18:53:18 +0800 Message-Id: <20240202105324.50391-4-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Furong Zhou Disable arbitration to avoid new requests to be processed before resetting a device. This is needed so that new requests are not fetched when an error is detected. Signed-off-by: Furong Zhou Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index 22a43b4b8315..acbbd32bd815 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -181,8 +181,16 @@ static void adf_notify_fatal_error_worker(struct work_struct *work) struct adf_fatal_error_data *wq_data = container_of(work, struct adf_fatal_error_data, work); struct adf_accel_dev *accel_dev = wq_data->accel_dev; + struct adf_hw_device_data *hw_device = accel_dev->hw_device; adf_error_notifier(accel_dev); + + if (!accel_dev->is_vf) { + /* Disable arbitration to stop processing of new requests */ + if (hw_device->exit_arb) + hw_device->exit_arb(accel_dev); + } + kfree(wq_data); } From patchwork Fri Feb 2 10:53:19 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542725 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id F149F13BEB0 for ; Fri, 2 Feb 2024 10:55:11 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871313; cv=none; b=fxeBs6Ghg9DbyB/HlXYLF2JVFPUovuws6S9l/49Eb4gCfPh1SHYNPWi6wyj8iL5XEF/TgRfAu3CpECRT/hsLKteXf4hQ7l4mkk94QWJk44VKgJfSbO3YKzOJI9ueeYcuJp53YpdffSOaLllB+1fEWktDdxk7qXFJrqCgb0ts27o= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871313; c=relaxed/simple; bh=q1ckhf4tF9nXGLVI/HYNhoEeNl1p98L8Uik789wGNbI=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=KFBa9sbXGob6RKpq+UQrDQdPreeBRH/r10YJXHyxel6jChGnZ341s9YRkDg7fK1O9we17xk9IugklBXeH40updyCfAghjBLlhCKiR5cGKEeSmA4EM6BC/RPAUIfQ+OL/jdnyrs+ZkyX4Cqv2mHNBa+ePoFG17/yGVufwzVHFnXI= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=YlJM5JjX; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="YlJM5JjX" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871312; x=1738407312; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=q1ckhf4tF9nXGLVI/HYNhoEeNl1p98L8Uik789wGNbI=; b=YlJM5JjXc/W24OJQb9EvhdV7iW1UE25R+F56JUGxt8av5lFojMrEtev/ y7hsrGKnwQbw60e+7AprcoMoOCeQTRv+nJXXfPSsB0yndjxXo6uZJ6YN7 r3nd0Y2d8/hvl59HpAwXLNG3SMRbQzkZBFEB0FCkxDHi0c9++S9Y9KQpD EBIXJeAQ4UkTD+uF5OW9X/egeSNwUFlQh4Nx+UB0c14m0JoKxDb25KJh+ OP1yHtakYFLLgTJ/qagUx2TiqfBT0wEDCVFSW/jYqbomt0i1/gGdlz8hx lYsY+XWgVPMK5Yg+mY4PNrFt47GjETPcZ5CwsjP+ET9hjDnyMH+q6EaQV Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787310" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787310" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53597" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:10 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Mun Chun Yep , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu Subject: [PATCH v2 4/9] crypto: qat - update PFVF protocol for recovery Date: Fri, 2 Feb 2024 18:53:19 +0800 Message-Id: <20240202105324.50391-5-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Update the PFVF logic to handle restart and recovery. This adds the following functions: * adf_pf2vf_notify_fatal_error(): allows the PF to notify VFs that the device detected a fatal error and requires a reset. This sends to VF the event `ADF_PF2VF_MSGTYPE_FATAL_ERROR`. * adf_pf2vf_wait_for_restarting_complete(): allows the PF to wait for `ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE` events from active VFs before proceeding with a reset. * adf_pf2vf_notify_restarted(): enables the PF to notify VFs with an `ADF_PF2VF_MSGTYPE_RESTARTED` event after recovery, indicating that the device is back to normal. This prompts VF drivers switch back to use the accelerator for workload processing. These changes improve the communication and synchronization between PF and VF drivers during system restart and recovery processes. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu --- .../intel/qat/qat_common/adf_accel_devices.h | 1 + drivers/crypto/intel/qat/qat_common/adf_aer.c | 3 + .../intel/qat/qat_common/adf_pfvf_msg.h | 7 +- .../intel/qat/qat_common/adf_pfvf_pf_msg.c | 64 ++++++++++++++++++- .../intel/qat/qat_common/adf_pfvf_pf_msg.h | 21 ++++++ .../intel/qat/qat_common/adf_pfvf_pf_proto.c | 8 +++ .../intel/qat/qat_common/adf_pfvf_vf_proto.c | 6 ++ .../crypto/intel/qat/qat_common/adf_sriov.c | 1 + 8 files changed, 109 insertions(+), 2 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h index a16c7e6edc65..4a3c36aaa7ca 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h +++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h @@ -332,6 +332,7 @@ struct adf_accel_vf_info { struct ratelimit_state vf2pf_ratelimit; u32 vf_nr; bool init; + bool restarting; u8 vf_compat_ver; }; diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index acbbd32bd815..ecb114e1b59f 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -7,6 +7,7 @@ #include #include "adf_accel_devices.h" #include "adf_common_drv.h" +#include "adf_pfvf_pf_msg.h" struct adf_fatal_error_data { struct adf_accel_dev *accel_dev; @@ -189,6 +190,8 @@ static void adf_notify_fatal_error_worker(struct work_struct *work) /* Disable arbitration to stop processing of new requests */ if (hw_device->exit_arb) hw_device->exit_arb(accel_dev); + if (accel_dev->pf.vf_info) + adf_pf2vf_notify_fatal_error(accel_dev); } kfree(wq_data); diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h b/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h index 204a42438992..d1b3ef9cadac 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_msg.h @@ -99,6 +99,8 @@ enum pf2vf_msgtype { ADF_PF2VF_MSGTYPE_RESTARTING = 0x01, ADF_PF2VF_MSGTYPE_VERSION_RESP = 0x02, ADF_PF2VF_MSGTYPE_BLKMSG_RESP = 0x03, + ADF_PF2VF_MSGTYPE_FATAL_ERROR = 0x04, + ADF_PF2VF_MSGTYPE_RESTARTED = 0x05, /* Values from 0x10 are Gen4 specific, message type is only 4 bits in Gen2 devices. */ ADF_PF2VF_MSGTYPE_RP_RESET_RESP = 0x10, }; @@ -112,6 +114,7 @@ enum vf2pf_msgtype { ADF_VF2PF_MSGTYPE_LARGE_BLOCK_REQ = 0x07, ADF_VF2PF_MSGTYPE_MEDIUM_BLOCK_REQ = 0x08, ADF_VF2PF_MSGTYPE_SMALL_BLOCK_REQ = 0x09, + ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE = 0x0a, /* Values from 0x10 are Gen4 specific, message type is only 4 bits in Gen2 devices. */ ADF_VF2PF_MSGTYPE_RP_RESET = 0x10, }; @@ -124,8 +127,10 @@ enum pfvf_compatibility_version { ADF_PFVF_COMPAT_FAST_ACK = 0x03, /* Ring to service mapping support for non-standard mappings */ ADF_PFVF_COMPAT_RING_TO_SVC_MAP = 0x04, + /* Fallback compat */ + ADF_PFVF_COMPAT_FALLBACK = 0x05, /* Reference to the latest version */ - ADF_PFVF_COMPAT_THIS_VERSION = 0x04, + ADF_PFVF_COMPAT_THIS_VERSION = 0x05, }; /* PF->VF Version Response */ diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c index 14c069f0d71a..0e31f4b41844 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.c @@ -1,21 +1,83 @@ // SPDX-License-Identifier: (BSD-3-Clause OR GPL-2.0-only) /* Copyright(c) 2015 - 2021 Intel Corporation */ +#include #include #include "adf_accel_devices.h" #include "adf_pfvf_msg.h" #include "adf_pfvf_pf_msg.h" #include "adf_pfvf_pf_proto.h" +#define ADF_PF_WAIT_RESTARTING_COMPLETE_DELAY 100 +#define ADF_VF_SHUTDOWN_RETRY 100 + void adf_pf2vf_notify_restarting(struct adf_accel_dev *accel_dev) { struct adf_accel_vf_info *vf; struct pfvf_message msg = { .type = ADF_PF2VF_MSGTYPE_RESTARTING }; int i, num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + dev_dbg(&GET_DEV(accel_dev), "pf2vf notify restarting\n"); for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) { - if (vf->init && adf_send_pf2vf_msg(accel_dev, i, msg)) + vf->restarting = false; + if (!vf->init) + continue; + if (adf_send_pf2vf_msg(accel_dev, i, msg)) dev_err(&GET_DEV(accel_dev), "Failed to send restarting msg to VF%d\n", i); + else if (vf->vf_compat_ver >= ADF_PFVF_COMPAT_FALLBACK) + vf->restarting = true; + } +} + +void adf_pf2vf_wait_for_restarting_complete(struct adf_accel_dev *accel_dev) +{ + int num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + int i, retries = ADF_VF_SHUTDOWN_RETRY; + struct adf_accel_vf_info *vf; + bool vf_running; + + dev_dbg(&GET_DEV(accel_dev), "pf2vf wait for restarting complete\n"); + do { + vf_running = false; + for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) + if (vf->restarting) + vf_running = true; + if (!vf_running) + break; + msleep(ADF_PF_WAIT_RESTARTING_COMPLETE_DELAY); + } while (--retries); + + if (vf_running) + dev_warn(&GET_DEV(accel_dev), "Some VFs are still running\n"); +} + +void adf_pf2vf_notify_restarted(struct adf_accel_dev *accel_dev) +{ + struct pfvf_message msg = { .type = ADF_PF2VF_MSGTYPE_RESTARTED }; + int i, num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + struct adf_accel_vf_info *vf; + + dev_dbg(&GET_DEV(accel_dev), "pf2vf notify restarted\n"); + for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) { + if (vf->init && vf->vf_compat_ver >= ADF_PFVF_COMPAT_FALLBACK && + adf_send_pf2vf_msg(accel_dev, i, msg)) + dev_err(&GET_DEV(accel_dev), + "Failed to send restarted msg to VF%d\n", i); + } +} + +void adf_pf2vf_notify_fatal_error(struct adf_accel_dev *accel_dev) +{ + struct pfvf_message msg = { .type = ADF_PF2VF_MSGTYPE_FATAL_ERROR }; + int i, num_vfs = pci_num_vf(accel_to_pci_dev(accel_dev)); + struct adf_accel_vf_info *vf; + + dev_dbg(&GET_DEV(accel_dev), "pf2vf notify fatal error\n"); + for (i = 0, vf = accel_dev->pf.vf_info; i < num_vfs; i++, vf++) { + if (vf->init && vf->vf_compat_ver >= ADF_PFVF_COMPAT_FALLBACK && + adf_send_pf2vf_msg(accel_dev, i, msg)) + dev_err(&GET_DEV(accel_dev), + "Failed to send fatal error msg to VF%d\n", i); } } diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h index e8982d1ac896..f203d88c919c 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_msg.h @@ -5,7 +5,28 @@ #include "adf_accel_devices.h" +#if defined(CONFIG_PCI_IOV) void adf_pf2vf_notify_restarting(struct adf_accel_dev *accel_dev); +void adf_pf2vf_wait_for_restarting_complete(struct adf_accel_dev *accel_dev); +void adf_pf2vf_notify_restarted(struct adf_accel_dev *accel_dev); +void adf_pf2vf_notify_fatal_error(struct adf_accel_dev *accel_dev); +#else +static inline void adf_pf2vf_notify_restarting(struct adf_accel_dev *accel_dev) +{ +} + +static inline void adf_pf2vf_wait_for_restarting_complete(struct adf_accel_dev *accel_dev) +{ +} + +static inline void adf_pf2vf_notify_restarted(struct adf_accel_dev *accel_dev) +{ +} + +static inline void adf_pf2vf_notify_fatal_error(struct adf_accel_dev *accel_dev) +{ +} +#endif typedef int (*adf_pf2vf_blkmsg_provider)(struct adf_accel_dev *accel_dev, u8 *buffer, u8 compat); diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c index 388e58bcbcaf..9ab93fbfefde 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_pf_proto.c @@ -291,6 +291,14 @@ static int adf_handle_vf2pf_msg(struct adf_accel_dev *accel_dev, u8 vf_nr, vf_info->init = false; } break; + case ADF_VF2PF_MSGTYPE_RESTARTING_COMPLETE: + { + dev_dbg(&GET_DEV(accel_dev), + "Restarting Complete received from VF%d\n", vf_nr); + vf_info->restarting = false; + vf_info->init = false; + } + break; case ADF_VF2PF_MSGTYPE_LARGE_BLOCK_REQ: case ADF_VF2PF_MSGTYPE_MEDIUM_BLOCK_REQ: case ADF_VF2PF_MSGTYPE_SMALL_BLOCK_REQ: diff --git a/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c b/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c index 1015155b6374..dc284a089c88 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c +++ b/drivers/crypto/intel/qat/qat_common/adf_pfvf_vf_proto.c @@ -308,6 +308,12 @@ static bool adf_handle_pf2vf_msg(struct adf_accel_dev *accel_dev, adf_pf2vf_handle_pf_restarting(accel_dev); return false; + case ADF_PF2VF_MSGTYPE_RESTARTED: + dev_dbg(&GET_DEV(accel_dev), "Restarted message received from PF\n"); + return true; + case ADF_PF2VF_MSGTYPE_FATAL_ERROR: + dev_err(&GET_DEV(accel_dev), "Fatal error received from PF\n"); + return true; case ADF_PF2VF_MSGTYPE_VERSION_RESP: case ADF_PF2VF_MSGTYPE_BLKMSG_RESP: case ADF_PF2VF_MSGTYPE_RP_RESET_RESP: diff --git a/drivers/crypto/intel/qat/qat_common/adf_sriov.c b/drivers/crypto/intel/qat/qat_common/adf_sriov.c index f44025bb6f99..cb2a9830f192 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_sriov.c +++ b/drivers/crypto/intel/qat/qat_common/adf_sriov.c @@ -103,6 +103,7 @@ void adf_disable_sriov(struct adf_accel_dev *accel_dev) return; adf_pf2vf_notify_restarting(accel_dev); + adf_pf2vf_wait_for_restarting_complete(accel_dev); pci_disable_sriov(accel_to_pci_dev(accel_dev)); /* Disable VF to PF interrupts */ From patchwork Fri Feb 2 10:53:20 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542726 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 612AF13B7B4 for ; Fri, 2 Feb 2024 10:55:14 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871316; cv=none; b=HMS/1nbDlse2j0eCSyhNkPcq+zaRZ5I2cO+LJC8JpTakl/15bNGoCu8DqTS+3a1YoQZRb1NkexIQ7cVHIABoWCHZOQGBKc5ZmeW3jjQS+ON7j12pDuOPvel/zt+FH4BuhwUEvcuVLQdQOp3bB8SmluoV+EdVNA//BElNTe+XEJo= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871316; c=relaxed/simple; bh=KHwfFJlIOHLwoabu7yGFzna1/qTm3EkrCCAGubo59p8=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=dU0XRQCZZqx4LUvdyrwtmIVZZVo0TeadWeu4JRcCE9fIpMuvR8gNho3KAlRHN8V/p8TDNqq8GyBNUsZ+LcfBjhx/W86KNR1zLZoKK5UWZs0E7sfqnUbl801pQKNACI+Ml9j1tIMnmfzoastzOirW6d4oQspf7hzXJe0N1gT4DJ4= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=d35bsrGb; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="d35bsrGb" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871314; x=1738407314; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KHwfFJlIOHLwoabu7yGFzna1/qTm3EkrCCAGubo59p8=; b=d35bsrGbLcNxn+wqW9xnXu0BbgLDo6KYwOXuij6VJENWpB3p4t4OFUTl dvfWESFakdiwH7eqThIrGFWUJ/EGc69ujWuXNYuoG3DgzgEftBo2DDBOA 5cep8p39z+wtRscsn9uzYw1qOGAMBf5Kcv31FM5BcgaPtpdhIugtp4TVI KE9PG8tDH2Zy4RtjDs2DzV+Rdb+Iy+pDiNmrim1pI22AQCnk4nzyJC3In /11YJVcuuH1okLguQBoxASonLTAoJAgiN8Eg5Jg6DHIq8wh/zcq9o2Yow HDYDPcAsytGF2N6bq5JW/bnQrxH0I3CWnpbf1KfeUkigqO03tJZJbrf+j g==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787328" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787328" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:14 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53610" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:12 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Mun Chun Yep , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu Subject: [PATCH v2 5/9] crypto: qat - re-enable sriov after pf reset Date: Fri, 2 Feb 2024 18:53:20 +0800 Message-Id: <20240202105324.50391-6-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 When a Physical Function (PF) is reset, SR-IOV gets disabled, making the associated Virtual Functions (VFs) unavailable. Even after reset and using pci_restore_state, VFs remain uncreated because the numvfs still at 0. Therefore, it's necessary to reconfigure SR-IOV to re-enable VFs. This commit introduces the ADF_SRIOV_ENABLED configuration flag to cache the SR-IOV enablement state. SR-IOV is only re-enabled if it was previously configured. This commit also introduces a dedicated workqueue without `WQ_MEM_RECLAIM` flag for enabling SR-IOV during Heartbeat and CPM error resets, preventing workqueue flushing warning. This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 40 ++++++++++++++++++- .../intel/qat/qat_common/adf_cfg_strings.h | 1 + .../intel/qat/qat_common/adf_common_drv.h | 5 +++ .../crypto/intel/qat/qat_common/adf_sriov.c | 37 +++++++++++++++-- 4 files changed, 79 insertions(+), 4 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index ecb114e1b59f..cd273b31db0e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -15,6 +15,7 @@ struct adf_fatal_error_data { }; static struct workqueue_struct *device_reset_wq; +static struct workqueue_struct *device_sriov_wq; static pci_ers_result_t adf_error_detected(struct pci_dev *pdev, pci_channel_state_t state) @@ -43,6 +44,13 @@ struct adf_reset_dev_data { struct work_struct reset_work; }; +/* sriov dev data */ +struct adf_sriov_dev_data { + struct adf_accel_dev *accel_dev; + struct completion compl; + struct work_struct sriov_work; +}; + void adf_reset_sbr(struct adf_accel_dev *accel_dev) { struct pci_dev *pdev = accel_to_pci_dev(accel_dev); @@ -88,11 +96,22 @@ void adf_dev_restore(struct adf_accel_dev *accel_dev) } } +static void adf_device_sriov_worker(struct work_struct *work) +{ + struct adf_sriov_dev_data *sriov_data = + container_of(work, struct adf_sriov_dev_data, sriov_work); + + adf_reenable_sriov(sriov_data->accel_dev); + complete(&sriov_data->compl); +} + static void adf_device_reset_worker(struct work_struct *work) { struct adf_reset_dev_data *reset_data = container_of(work, struct adf_reset_dev_data, reset_work); struct adf_accel_dev *accel_dev = reset_data->accel_dev; + unsigned long wait_jiffies = msecs_to_jiffies(10000); + struct adf_sriov_dev_data sriov_data; adf_dev_restarting_notify(accel_dev); if (adf_dev_restart(accel_dev)) { @@ -103,6 +122,14 @@ static void adf_device_reset_worker(struct work_struct *work) WARN(1, "QAT: device restart failed. Device is unusable\n"); return; } + + sriov_data.accel_dev = accel_dev; + init_completion(&sriov_data.compl); + INIT_WORK(&sriov_data.sriov_work, adf_device_sriov_worker); + queue_work(device_sriov_wq, &sriov_data.sriov_work); + if (wait_for_completion_timeout(&sriov_data.compl, wait_jiffies)) + adf_pf2vf_notify_restarted(accel_dev); + adf_dev_restarted_notify(accel_dev); clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status); @@ -216,7 +243,14 @@ int adf_init_aer(void) { device_reset_wq = alloc_workqueue("qat_device_reset_wq", WQ_MEM_RECLAIM, 0); - return !device_reset_wq ? -EFAULT : 0; + if (!device_reset_wq) + return -EFAULT; + + device_sriov_wq = alloc_workqueue("qat_device_sriov_wq", 0, 0); + if (!device_sriov_wq) + return -EFAULT; + + return 0; } void adf_exit_aer(void) @@ -224,4 +258,8 @@ void adf_exit_aer(void) if (device_reset_wq) destroy_workqueue(device_reset_wq); device_reset_wq = NULL; + + if (device_sriov_wq) + destroy_workqueue(device_sriov_wq); + device_sriov_wq = NULL; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h b/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h index 322b76903a73..e015ad6cace2 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h +++ b/drivers/crypto/intel/qat/qat_common/adf_cfg_strings.h @@ -49,5 +49,6 @@ ADF_ETRMGR_BANK "%d" ADF_ETRMGR_CORE_AFFINITY #define ADF_ACCEL_STR "Accelerator%d" #define ADF_HEARTBEAT_TIMER "HeartbeatTimer" +#define ADF_SRIOV_ENABLED "SriovEnabled" #endif diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index 8c062d5a8db2..10891c9da6e7 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -192,6 +192,7 @@ bool adf_misc_wq_queue_delayed_work(struct delayed_work *work, #if defined(CONFIG_PCI_IOV) int adf_sriov_configure(struct pci_dev *pdev, int numvfs); void adf_disable_sriov(struct adf_accel_dev *accel_dev); +void adf_reenable_sriov(struct adf_accel_dev *accel_dev); void adf_enable_vf2pf_interrupts(struct adf_accel_dev *accel_dev, u32 vf_mask); void adf_disable_all_vf2pf_interrupts(struct adf_accel_dev *accel_dev); bool adf_recv_and_handle_pf2vf_msg(struct adf_accel_dev *accel_dev); @@ -212,6 +213,10 @@ static inline void adf_disable_sriov(struct adf_accel_dev *accel_dev) { } +static inline void adf_reenable_sriov(struct adf_accel_dev *accel_dev) +{ +} + static inline int adf_init_pf_wq(void) { return 0; diff --git a/drivers/crypto/intel/qat/qat_common/adf_sriov.c b/drivers/crypto/intel/qat/qat_common/adf_sriov.c index cb2a9830f192..87a70c00c41e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_sriov.c +++ b/drivers/crypto/intel/qat/qat_common/adf_sriov.c @@ -60,7 +60,6 @@ static int adf_enable_sriov(struct adf_accel_dev *accel_dev) /* This ptr will be populated when VFs will be created */ vf_info->accel_dev = accel_dev; vf_info->vf_nr = i; - vf_info->vf_compat_ver = 0; mutex_init(&vf_info->pf2vf_lock); ratelimit_state_init(&vf_info->vf2pf_ratelimit, @@ -84,6 +83,32 @@ static int adf_enable_sriov(struct adf_accel_dev *accel_dev) return pci_enable_sriov(pdev, totalvfs); } +void adf_reenable_sriov(struct adf_accel_dev *accel_dev) +{ + struct pci_dev *pdev = accel_to_pci_dev(accel_dev); + char cfg[ADF_CFG_MAX_VAL_LEN_IN_BYTES] = {0}; + unsigned long val = 0; + + if (adf_cfg_get_param_value(accel_dev, ADF_GENERAL_SEC, + ADF_SRIOV_ENABLED, cfg)) + return; + + if (!accel_dev->pf.vf_info) + return; + + if (adf_cfg_add_key_value_param(accel_dev, ADF_KERNEL_SEC, ADF_NUM_CY, + &val, ADF_DEC)) + return; + + if (adf_cfg_add_key_value_param(accel_dev, ADF_KERNEL_SEC, ADF_NUM_DC, + &val, ADF_DEC)) + return; + + set_bit(ADF_STATUS_CONFIGURED, &accel_dev->status); + dev_dbg(&pdev->dev, "Re-enabling SRIOV\n"); + adf_enable_sriov(accel_dev); +} + /** * adf_disable_sriov() - Disable SRIOV for the device * @accel_dev: Pointer to accel device. @@ -116,8 +141,10 @@ void adf_disable_sriov(struct adf_accel_dev *accel_dev) for (i = 0, vf = accel_dev->pf.vf_info; i < totalvfs; i++, vf++) mutex_destroy(&vf->pf2vf_lock); - kfree(accel_dev->pf.vf_info); - accel_dev->pf.vf_info = NULL; + if (!test_bit(ADF_STATUS_RESTARTING, &accel_dev->status)) { + kfree(accel_dev->pf.vf_info); + accel_dev->pf.vf_info = NULL; + } } EXPORT_SYMBOL_GPL(adf_disable_sriov); @@ -195,6 +222,10 @@ int adf_sriov_configure(struct pci_dev *pdev, int numvfs) if (ret) return ret; + val = 1; + adf_cfg_add_key_value_param(accel_dev, ADF_GENERAL_SEC, ADF_SRIOV_ENABLED, + &val, ADF_DEC); + return numvfs; } EXPORT_SYMBOL_GPL(adf_sriov_configure); From patchwork Fri Feb 2 10:53:21 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542727 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id B56D27D418 for ; Fri, 2 Feb 2024 10:55:16 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871318; cv=none; b=qwd+UjRHYWgiBvocZ8BMVKwXrZpd/pv4XsV3ILUgFN2ax7uDyDCbTIWxgWmKJkEbGSXxLOZIOxx1NROIT9cdDPqpZM3PSca/BAUrYM2ACW5LYvnBl3O5i1WRLrOtNJBdC0g0Ml20lidsw7BnUumENYBDJVV5Msf6TUPTogklUAg= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871318; c=relaxed/simple; bh=aNdfu8a76RK9rIHvl77AB1xtT6MV9sknou2JE8sop7I=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=B+AUD7b9GEvoKt9m5AGB5RFRHCPnunUEuW+JkzSWOLwg7quYmZIed2Gy8FuB91gxdbFc4aPtfbdLuo1cALPrPOBQq20WaJSlAa9CFXplP/G0ZGQA2HinTs1f6CrKOl/fGFklmLV/ChMPpfEZIqGul7p7WiAtbQ+oViU6SbdXlPA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=ZNJneuj0; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ZNJneuj0" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871317; x=1738407317; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=aNdfu8a76RK9rIHvl77AB1xtT6MV9sknou2JE8sop7I=; b=ZNJneuj0K/kNcd0ou9b/T6+oVYWBgoDrrQ7Vw2jXadL3wP8YVjuo51zq CUhSPbXuD+GDz442v6K0nAN5VXbjRsndccIbZGAOxB7HEyz00yxXnTcYK tH+vfvC34+Mdgko3NLUKPL1oClQQxZSqWFFhnijJB3qIR3DFr07BehN/f 4zAkY/QG6a2FrqHLkHck0cyNkeGTzWBQF8aRjjxCZjjr+nS2p7yQ73tII L9N2CQx8NVM57EN9xqVcsy0jU8ws63+somNFA2pw7gPbKKzKlHcan417i KqCRgk2dPwviutngxUHb2ERTKaRyVuSXBUvQVVXvtCODH6sQB5x+gpSNu w==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787332" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787332" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:16 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53626" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:15 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Mun Chun Yep , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu Subject: [PATCH v2 6/9] crypto: qat - add fatal error notification Date: Fri, 2 Feb 2024 18:53:21 +0800 Message-Id: <20240202105324.50391-7-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Notify a fatal error condition and optionally reset the device in the following cases: * if the device reports an uncorrectable fatal error through an interrupt * if the heartbeat feature detects that the device is not responding This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu --- drivers/crypto/intel/qat/qat_common/adf_heartbeat.c | 3 +++ drivers/crypto/intel/qat/qat_common/adf_isr.c | 7 ++++++- 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c index f88b1bc6857e..fe8428d4ff39 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c @@ -229,6 +229,9 @@ void adf_heartbeat_status(struct adf_accel_dev *accel_dev, "Heartbeat ERROR: QAT is not responding.\n"); *hb_status = HB_DEV_UNRESPONSIVE; hb->hb_failed_counter++; + if (adf_notify_fatal_error(accel_dev)) + dev_err(&GET_DEV(accel_dev), + "Failed to notify fatal error\n"); return; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_isr.c b/drivers/crypto/intel/qat/qat_common/adf_isr.c index a13d9885d60f..020d213f4c99 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_isr.c +++ b/drivers/crypto/intel/qat/qat_common/adf_isr.c @@ -139,8 +139,13 @@ static bool adf_handle_ras_int(struct adf_accel_dev *accel_dev) if (ras_ops->handle_interrupt && ras_ops->handle_interrupt(accel_dev, &reset_required)) { - if (reset_required) + if (reset_required) { dev_err(&GET_DEV(accel_dev), "Fatal error, reset required\n"); + if (adf_notify_fatal_error(accel_dev)) + dev_err(&GET_DEV(accel_dev), + "Failed to notify fatal error\n"); + } + return true; } From patchwork Fri Feb 2 10:53:22 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542728 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 798E71AAB1 for ; Fri, 2 Feb 2024 10:55:19 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871321; cv=none; b=bE9LaZwgjl+3jZ5TJQNQwRfMG5IKSyiLynY5es6y0YnXq5RuhiW9h1/TnaH3TLzwLWgl1WZVrZunclzNKqWkyLF9b1xWYRVWhYvBL7zEk7cSQ5oPNuWwWxCx+12dIXmVXasTWSTJ8a2mv1YO746t38n2XWPDn+idkOnumflRO/k= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871321; c=relaxed/simple; bh=4eVCZxNR793rgSi1WiVtKc0LjycNbDTrHhVkTPYPbns=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=usGrIbS+MwBz7jc04tV33mrJawhStPFPo8YKkicOa3zYSOEPXVFXa/OsKuGYNwNeEql+ibjKPlEYjiNYQM9DvAr9ilYw79K9QD1aoGomlw8Iwm7JL4GOZIM5slLzPtFtj8ioWE4klmrTM/nRb6lRhhQPR1x35qZkGE4ZKziOxno= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=Gue1gwIm; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Gue1gwIm" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871320; x=1738407320; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=4eVCZxNR793rgSi1WiVtKc0LjycNbDTrHhVkTPYPbns=; b=Gue1gwIm9gaF9broqtySugFOnvMQ8lBJzCkMVfHCHQ+W9GH1dTkBXAhD Y4xxlYViocT6S6oFUiQK+oC5VaICH7LItZNjsuqys/PM1F5AgYr7B9cM2 qlBlKR6yXLek96ibKo61uDgTE7neILRBeREqa/KYXvMbVrv4JqASKGrr/ dvEJFlmRgZiHHj/ujWjKBBupq1uFNbPrysrirG9LGquHMwqIBUoC6Ag3H wPhKJ2Wx279ot3b6R9Hv4UszZzxEB+XeYCE9SAEhIp/37v4t+Bk3K84kt yrab9180DClSG1nrCo/MBJ7LAOxAoxqPGuTSsLnaqEF8QPEkvlZiv674C Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787349" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787349" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:19 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53637" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:17 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Damian Muszynski , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu , Mun Chun Yep Subject: [PATCH v2 7/9] crypto: qat - add auto reset on error Date: Fri, 2 Feb 2024 18:53:22 +0800 Message-Id: <20240202105324.50391-8-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Damian Muszynski Expose the `auto_reset` sysfs attribute to configure the driver to reset the device when a fatal error is detected. When auto reset is enabled, the driver resets the device when it detects either an heartbeat failure or a fatal error through an interrupt. This patch is based on earlier work done by Shashank Gupta. Signed-off-by: Damian Muszynski Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep --- Documentation/ABI/testing/sysfs-driver-qat | 20 ++++++++++ .../intel/qat/qat_common/adf_accel_devices.h | 1 + drivers/crypto/intel/qat/qat_common/adf_aer.c | 11 +++++- .../intel/qat/qat_common/adf_common_drv.h | 1 + .../crypto/intel/qat/qat_common/adf_sysfs.c | 37 +++++++++++++++++++ 5 files changed, 69 insertions(+), 1 deletion(-) diff --git a/Documentation/ABI/testing/sysfs-driver-qat b/Documentation/ABI/testing/sysfs-driver-qat index bbf329cf0d67..6778f1fea874 100644 --- a/Documentation/ABI/testing/sysfs-driver-qat +++ b/Documentation/ABI/testing/sysfs-driver-qat @@ -141,3 +141,23 @@ Description: 64 This attribute is only available for qat_4xxx devices. + +What: /sys/bus/pci/devices//qat/auto_reset +Date: March 2024 +KernelVersion: 6.8 +Contact: qat-linux@intel.com +Description: (RW) Reports the current state of the autoreset feature + for a QAT device + + Write to the attribute to enable or disable device auto reset. + + Device auto reset is disabled by default. + + The values are:: + + * 1/Yy/on: auto reset enabled. If the device encounters an + unrecoverable error, it will be reset automatically. + * 0/Nn/off: auto reset disabled. If the device encounters an + unrecoverable error, it will not be reset. + + This attribute is only available for qat_4xxx devices. diff --git a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h index 4a3c36aaa7ca..0f26aa976c8c 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h +++ b/drivers/crypto/intel/qat/qat_common/adf_accel_devices.h @@ -402,6 +402,7 @@ struct adf_accel_dev { struct adf_error_counters ras_errors; struct mutex state_lock; /* protect state of the device */ bool is_vf; + bool autoreset_on_error; u32 accel_id; }; #endif diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index cd273b31db0e..b3d4b6b99c65 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -204,6 +204,14 @@ const struct pci_error_handlers adf_err_handler = { }; EXPORT_SYMBOL_GPL(adf_err_handler); +int adf_dev_autoreset(struct adf_accel_dev *accel_dev) +{ + if (accel_dev->autoreset_on_error) + return adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_ASYNC); + + return 0; +} + static void adf_notify_fatal_error_worker(struct work_struct *work) { struct adf_fatal_error_data *wq_data = @@ -215,10 +223,11 @@ static void adf_notify_fatal_error_worker(struct work_struct *work) if (!accel_dev->is_vf) { /* Disable arbitration to stop processing of new requests */ - if (hw_device->exit_arb) + if (accel_dev->autoreset_on_error && hw_device->exit_arb) hw_device->exit_arb(accel_dev); if (accel_dev->pf.vf_info) adf_pf2vf_notify_fatal_error(accel_dev); + adf_dev_autoreset(accel_dev); } kfree(wq_data); diff --git a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h index 10891c9da6e7..57328249c89e 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_common_drv.h +++ b/drivers/crypto/intel/qat/qat_common/adf_common_drv.h @@ -87,6 +87,7 @@ int adf_ae_stop(struct adf_accel_dev *accel_dev); extern const struct pci_error_handlers adf_err_handler; void adf_reset_sbr(struct adf_accel_dev *accel_dev); void adf_reset_flr(struct adf_accel_dev *accel_dev); +int adf_dev_autoreset(struct adf_accel_dev *accel_dev); void adf_dev_restore(struct adf_accel_dev *accel_dev); int adf_init_aer(void); void adf_exit_aer(void); diff --git a/drivers/crypto/intel/qat/qat_common/adf_sysfs.c b/drivers/crypto/intel/qat/qat_common/adf_sysfs.c index d450dad32c9e..4e7f70d4049d 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_sysfs.c +++ b/drivers/crypto/intel/qat/qat_common/adf_sysfs.c @@ -204,6 +204,42 @@ static ssize_t pm_idle_enabled_store(struct device *dev, struct device_attribute } static DEVICE_ATTR_RW(pm_idle_enabled); +static ssize_t auto_reset_show(struct device *dev, struct device_attribute *attr, + char *buf) +{ + char *auto_reset; + struct adf_accel_dev *accel_dev; + + accel_dev = adf_devmgr_pci_to_accel_dev(to_pci_dev(dev)); + if (!accel_dev) + return -EINVAL; + + auto_reset = accel_dev->autoreset_on_error ? "on" : "off"; + + return sysfs_emit(buf, "%s\n", auto_reset); +} + +static ssize_t auto_reset_store(struct device *dev, struct device_attribute *attr, + const char *buf, size_t count) +{ + struct adf_accel_dev *accel_dev; + bool enabled = false; + int ret; + + ret = kstrtobool(buf, &enabled); + if (ret) + return ret; + + accel_dev = adf_devmgr_pci_to_accel_dev(to_pci_dev(dev)); + if (!accel_dev) + return -EINVAL; + + accel_dev->autoreset_on_error = enabled; + + return count; +} +static DEVICE_ATTR_RW(auto_reset); + static DEVICE_ATTR_RW(state); static DEVICE_ATTR_RW(cfg_services); @@ -291,6 +327,7 @@ static struct attribute *qat_attrs[] = { &dev_attr_pm_idle_enabled.attr, &dev_attr_rp2srv.attr, &dev_attr_num_rps.attr, + &dev_attr_auto_reset.attr, NULL, }; From patchwork Fri Feb 2 10:53:23 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542729 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0CDC01427B for ; Fri, 2 Feb 2024 10:55:22 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871323; cv=none; b=sjP43jqn+5oI5zZ2HHIYwbsAkuaSVUdW2GxrxEL5GFW2VrsA3yucC/US3VAvoB3w9Ev3Vjw39a5JI2hvVZFXfIN0TjSwt8Nv0n49mbEd2MfuaNAFBD+4UHw/OrUvpj5A25sAcnapmAuvcgDSTuqkQN2rlPWv+ukUAKPLRk2D1BQ= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871323; c=relaxed/simple; bh=KngnFF4I4XEPY1rp+UV+AMgeWe/ZmDkbkgVvIvHrm+k=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=EUkFAUgguSqYqJLsoBmFpFvxESe4sAfeBclbIRueGoqwkM5AXkxgLqZcH8QtihCn0Vt4xa9t3Sl4K5NTVapnwc/FFAtFcRmzMvIEVhZDOLcVjphGBgIAZ/kjtw+1UQgzb8TALKI14XhONDu0l8VnIQ6i0iFOC/ly7TYJ7B9rjvA= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=SmvaNzAa; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="SmvaNzAa" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871322; x=1738407322; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KngnFF4I4XEPY1rp+UV+AMgeWe/ZmDkbkgVvIvHrm+k=; b=SmvaNzAa9KBWg+A8iqnQZtOOv4FQUQnNOVbkupWTNFVPzFc9VE4EEmHr GESSFBgNl8oZycVg54OSXzHs2CXDc9wz5qlV0mifkbKJVtLH1ZdZR4vh5 JifDcFOi3Y+lOodiAWHf+T/h7XWGQIN+OoEixvlaSX8HyAEI9uH2ZSC6T jsQRYsMdesblQW7Q9MmM1sXEYZXUCwM2vpkxYF0s6oVFEeJk5whXqJdll t3wb/zDM5sc7mIkxyIThR1DkZJ3jk/nsJ9ceaMiabyb8258TRPxWf9YzR Z7PfwHGYDDiIDHvBQY/9pv1J/ENkinN6wCzRrV/FbdL3QMaGWaAqBf9Qb w==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787374" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787374" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:22 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53648" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:20 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Furong Zhou , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu , Mun Chun Yep Subject: [PATCH v2 8/9] crypto: qat - limit heartbeat notifications Date: Fri, 2 Feb 2024 18:53:23 +0800 Message-Id: <20240202105324.50391-9-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Furong Zhou When the driver detects an heartbeat failure, it starts the recovery flow. Set a limit so that the number of events is limited in case the heartbeat status is read too frequently. Signed-off-by: Furong Zhou Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu Signed-off-by: Mun Chun Yep --- .../crypto/intel/qat/qat_common/adf_heartbeat.c | 17 ++++++++++++++--- .../crypto/intel/qat/qat_common/adf_heartbeat.h | 3 +++ 2 files changed, 17 insertions(+), 3 deletions(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c index fe8428d4ff39..b19aa1ef8eee 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.c @@ -205,6 +205,19 @@ static int adf_hb_get_status(struct adf_accel_dev *accel_dev) return ret; } +static void adf_heartbeat_reset(struct adf_accel_dev *accel_dev) +{ + u64 curr_time = adf_clock_get_current_time(); + u64 time_since_reset = curr_time - accel_dev->heartbeat->last_hb_reset_time; + + if (time_since_reset < ADF_CFG_HB_RESET_MS) + return; + + accel_dev->heartbeat->last_hb_reset_time = curr_time; + if (adf_notify_fatal_error(accel_dev)) + dev_err(&GET_DEV(accel_dev), "Failed to notify fatal error\n"); +} + void adf_heartbeat_status(struct adf_accel_dev *accel_dev, enum adf_device_heartbeat_status *hb_status) { @@ -229,9 +242,7 @@ void adf_heartbeat_status(struct adf_accel_dev *accel_dev, "Heartbeat ERROR: QAT is not responding.\n"); *hb_status = HB_DEV_UNRESPONSIVE; hb->hb_failed_counter++; - if (adf_notify_fatal_error(accel_dev)) - dev_err(&GET_DEV(accel_dev), - "Failed to notify fatal error\n"); + adf_heartbeat_reset(accel_dev); return; } diff --git a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h index 24c3f4f24c86..16fdfb48b196 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h +++ b/drivers/crypto/intel/qat/qat_common/adf_heartbeat.h @@ -13,6 +13,8 @@ struct dentry; #define ADF_CFG_HB_TIMER_DEFAULT_MS 500 #define ADF_CFG_HB_COUNT_THRESHOLD 3 +#define ADF_CFG_HB_RESET_MS 5000 + enum adf_device_heartbeat_status { HB_DEV_UNRESPONSIVE = 0, HB_DEV_ALIVE, @@ -30,6 +32,7 @@ struct adf_heartbeat { unsigned int hb_failed_counter; unsigned int hb_timer; u64 last_hb_check_time; + u64 last_hb_reset_time; bool ctrs_cnt_checked; struct hb_dma_addr { dma_addr_t phy_addr; From patchwork Fri Feb 2 10:53:24 2024 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Mun Chun Yep X-Patchwork-Id: 13542730 X-Patchwork-Delegate: herbert@gondor.apana.org.au Received: from mgamail.intel.com (mgamail.intel.com [192.198.163.11]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id 5E32F7A73B for ; Fri, 2 Feb 2024 10:55:24 +0000 (UTC) Authentication-Results: smtp.subspace.kernel.org; arc=none smtp.client-ip=192.198.163.11 ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871325; cv=none; b=qgxJ7T2zVr/v7FOAHUnDCHBeAT7VekNvhlOerIzYTa0ATBXVNXvL84e64UFKRFQ12oeKaTM/1wMg2rywJda9dCOzT26QtRdBmrdODkJgonzLStQVZhykcUeCKh8yMlMZzhbfdPYoacDtfM09VmZyRy4quNKwwLuOXSTQrRSROB0= ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116; t=1706871325; c=relaxed/simple; bh=iK5Dg+kPYmmsKf18NNwrJk+3HijZcLRbaN0Q5bzl8yE=; h=From:To:Cc:Subject:Date:Message-Id:In-Reply-To:References: MIME-Version; b=FKA17EqMlXlolKZ09Q6qwaxvNGT2bJ+JWYJ+xn/LBlu0MYxVtHGTsRYaymEDYsfYxOszVxFNCfVegNCVxowC+bk7xmiw2at+tFN7JLjygEco9UpqSSu0N6yMlXERqAXAq2l+K7vAKRxyrF8yMZ7F+pGVNGQPUhveSeT8Z3AF7Ss= ARC-Authentication-Results: i=1; smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com; spf=pass smtp.mailfrom=intel.com; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b=hULeksqM; arc=none smtp.client-ip=192.198.163.11 Authentication-Results: smtp.subspace.kernel.org; dmarc=pass (p=none dis=none) header.from=intel.com Authentication-Results: smtp.subspace.kernel.org; spf=pass smtp.mailfrom=intel.com Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="hULeksqM" DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1706871324; x=1738407324; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=iK5Dg+kPYmmsKf18NNwrJk+3HijZcLRbaN0Q5bzl8yE=; b=hULeksqMGaNkEhPJpEeBVHsQd6Q2bcLkDi7qdxOHenIAajES6xfN6hRx X6DQY8hxMX93twv5sk0eBZJF0NNLwRI4g4rwCVWYK/x2UWUhHmwGr6OJN 1VPlh3Q8OE5EeJJWig5kx9Ajw7eldVLsJ6mlViSgy8a50SagH/99/45+8 I9IZa3CPXAyxOsp+PD1rBNccrWVW8Z+kdnkiprz9IAEzG5ZiguZuC0NZ0 Siyrv8uoYtMjt+FGEc6hscSwFNmxdMv/TaqqiDYadkzsvRndkRgsyDENw dIbiNyJlnSbWCKag/9lkxyhMHZabARKHdr6sGzZo+WxnigIZBwMaWkdLL g==; X-IronPort-AV: E=McAfee;i="6600,9927,10971"; a="10787396" X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="10787396" Received: from orviesa009.jf.intel.com ([10.64.159.149]) by fmvoesa105.fm.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:24 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=Sophos;i="6.05,237,1701158400"; d="scan'208";a="53658" Received: from myep-mobl1.png.intel.com ([10.107.10.166]) by orviesa009-auth.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 02 Feb 2024 02:55:22 -0800 From: Mun Chun Yep To: herbert@gondor.apana.org.au Cc: linux-crypto@vger.kernel.org, qat-linux@intel.com, Mun Chun Yep , Ahsan Atta , Markas Rapoportas , Giovanni Cabiddu Subject: [PATCH v2 9/9] crypto: qat - improve aer error reset handling Date: Fri, 2 Feb 2024 18:53:24 +0800 Message-Id: <20240202105324.50391-10-mun.chun.yep@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20240202105324.50391-1-mun.chun.yep@intel.com> References: <20240202105324.50391-1-mun.chun.yep@intel.com> Precedence: bulk X-Mailing-List: linux-crypto@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 Rework the AER reset and recovery flow to take into account root port integrated devices that gets reset between the error detected and the slot reset callbacks. In adf_error_detected() the devices is gracefully shut down. The worker threads are disabled, the error conditions are notified to listeners and through PFVF comms and finally the device is reset as part of adf_dev_down(). In adf_slot_reset(), the device is brought up again. If SRIOV VFs were enabled before reset, these are re-enabled and VFs are notified of restarting through PFVF comms. Signed-off-by: Mun Chun Yep Reviewed-by: Ahsan Atta Reviewed-by: Markas Rapoportas Reviewed-by: Giovanni Cabiddu --- drivers/crypto/intel/qat/qat_common/adf_aer.c | 26 ++++++++++++++++++- 1 file changed, 25 insertions(+), 1 deletion(-) diff --git a/drivers/crypto/intel/qat/qat_common/adf_aer.c b/drivers/crypto/intel/qat/qat_common/adf_aer.c index b3d4b6b99c65..3597e7605a14 100644 --- a/drivers/crypto/intel/qat/qat_common/adf_aer.c +++ b/drivers/crypto/intel/qat/qat_common/adf_aer.c @@ -33,6 +33,19 @@ static pci_ers_result_t adf_error_detected(struct pci_dev *pdev, return PCI_ERS_RESULT_DISCONNECT; } + set_bit(ADF_STATUS_RESTARTING, &accel_dev->status); + if (accel_dev->hw_device->exit_arb) { + dev_dbg(&pdev->dev, "Disabling arbitration\n"); + accel_dev->hw_device->exit_arb(accel_dev); + } + adf_error_notifier(accel_dev); + adf_pf2vf_notify_fatal_error(accel_dev); + adf_dev_restarting_notify(accel_dev); + adf_pf2vf_notify_restarting(accel_dev); + adf_pf2vf_wait_for_restarting_complete(accel_dev); + pci_clear_master(pdev); + adf_dev_down(accel_dev, false); + return PCI_ERS_RESULT_NEED_RESET; } @@ -180,14 +193,25 @@ static int adf_dev_aer_schedule_reset(struct adf_accel_dev *accel_dev, static pci_ers_result_t adf_slot_reset(struct pci_dev *pdev) { struct adf_accel_dev *accel_dev = adf_devmgr_pci_to_accel_dev(pdev); + int res = 0; if (!accel_dev) { pr_err("QAT: Can't find acceleration device\n"); return PCI_ERS_RESULT_DISCONNECT; } - if (adf_dev_aer_schedule_reset(accel_dev, ADF_DEV_RESET_SYNC)) + + if (!pdev->is_busmaster) + pci_set_master(pdev); + pci_restore_state(pdev); + pci_save_state(pdev); + res = adf_dev_up(accel_dev, false); + if (res && res != -EALREADY) return PCI_ERS_RESULT_DISCONNECT; + adf_reenable_sriov(accel_dev); + adf_pf2vf_notify_restarted(accel_dev); + adf_dev_restarted_notify(accel_dev); + clear_bit(ADF_STATUS_RESTARTING, &accel_dev->status); return PCI_ERS_RESULT_RECOVERED; }