From patchwork Tue Nov 21 02:51:00 2023
X-Patchwork-Submitter: "Cao, Yahui"
X-Patchwork-Id: 13462479
From: Yahui Cao
To: intel-wired-lan@lists.osuosl.org
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Subject: [PATCH iwl-next v4 01/12] ice: Add function to get RX queue context
Date: Tue, 21 Nov 2023 02:51:00 +0000
Message-Id: <20231121025111.257597-2-yahui.cao@intel.com>
In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com>
References: <20231121025111.257597-1-yahui.cao@intel.com>

Export the RX queue context get function, which is consumed by the Linux live migration driver to save and load device state.
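For illustration, a migration driver could use the new helper roughly as follows. This is a sketch only; the save_rxq_state() wrapper and its parameters are hypothetical and not part of this patch, only ice_read_rxq_ctx() is added here.

static int save_rxq_state(struct ice_hw *hw, u32 rxq_index,
			  struct ice_rlan_ctx *saved_ctx)
{
	int err;

	/* Read the dense per-queue registers and unpack them into the
	 * sparse ice_rlan_ctx structure.
	 */
	err = ice_read_rxq_ctx(hw, saved_ctx, rxq_index);
	if (err)
		return err;

	/* saved_ctx can now be serialized into the device state buffer
	 * and written back with ice_write_rxq_ctx() on the target.
	 */
	return 0;
}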
Signed-off-by: Yahui Cao Signed-off-by: Lingyu Liu --- drivers/net/ethernet/intel/ice/ice_common.c | 268 ++++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_common.h | 5 + 2 files changed, 273 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c index 9a6c25f98632..d0a3bed00921 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.c +++ b/drivers/net/ethernet/intel/ice/ice_common.c @@ -1540,6 +1540,34 @@ ice_copy_rxq_ctx_to_hw(struct ice_hw *hw, u8 *ice_rxq_ctx, u32 rxq_index) return 0; } +/** + * ice_copy_rxq_ctx_from_hw - Copy rxq context register from HW + * @hw: pointer to the hardware structure + * @ice_rxq_ctx: pointer to the rxq context + * @rxq_index: the index of the Rx queue + * + * Copy rxq context from HW register space to dense structure + */ +static int +ice_copy_rxq_ctx_from_hw(struct ice_hw *hw, u8 *ice_rxq_ctx, u32 rxq_index) +{ + u8 i; + + if (!ice_rxq_ctx || rxq_index > QRX_CTRL_MAX_INDEX) + return -EINVAL; + + /* Copy each dword separately from HW */ + for (i = 0; i < ICE_RXQ_CTX_SIZE_DWORDS; i++) { + u32 *ctx = (u32 *)(ice_rxq_ctx + (i * sizeof(u32))); + + *ctx = rd32(hw, QRX_CONTEXT(i, rxq_index)); + + ice_debug(hw, ICE_DBG_QCTX, "qrxdata[%d]: %08X\n", i, *ctx); + } + + return 0; +} + /* LAN Rx Queue Context */ static const struct ice_ctx_ele ice_rlan_ctx_info[] = { /* Field Width LSB */ @@ -1591,6 +1619,32 @@ ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx, return ice_copy_rxq_ctx_to_hw(hw, ctx_buf, rxq_index); } +/** + * ice_read_rxq_ctx - Read rxq context from HW + * @hw: pointer to the hardware structure + * @rlan_ctx: pointer to the rxq context + * @rxq_index: the index of the Rx queue + * + * Read rxq context from HW register space and then converts it from dense + * structure to sparse + */ +int +ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx, + u32 rxq_index) +{ + u8 ctx_buf[ICE_RXQ_CTX_SZ] = { 0 }; + int status; + + if (!rlan_ctx) + return -EINVAL; + + status = ice_copy_rxq_ctx_from_hw(hw, ctx_buf, rxq_index); + if (status) + return status; + + return ice_get_ctx(ctx_buf, (u8 *)rlan_ctx, ice_rlan_ctx_info); +} + /* LAN Tx Queue Context */ const struct ice_ctx_ele ice_tlan_ctx_info[] = { /* Field Width LSB */ @@ -4743,6 +4797,220 @@ ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx, return 0; } +/** + * ice_read_byte - read context byte into struct + * @src_ctx: the context structure to read from + * @dest_ctx: the context to be written to + * @ce_info: a description of the struct to be filled + */ +static void +ice_read_byte(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info) +{ + u8 dest_byte, mask; + u8 *src, *target; + u16 shift_width; + + /* prepare the bits and mask */ + shift_width = ce_info->lsb % 8; + mask = (u8)(BIT(ce_info->width) - 1); + + /* shift to correct alignment */ + mask <<= shift_width; + + /* get the current bits from the src bit string */ + src = src_ctx + (ce_info->lsb / 8); + + memcpy(&dest_byte, src, sizeof(dest_byte)); + + dest_byte &= mask; + + dest_byte >>= shift_width; + + /* get the address from the struct field */ + target = dest_ctx + ce_info->offset; + + /* put it back in the struct */ + memcpy(target, &dest_byte, sizeof(dest_byte)); +} + +/** + * ice_read_word - read context word into struct + * @src_ctx: the context structure to read from + * @dest_ctx: the context to be written to + * @ce_info: a description of the struct to be filled + */ +static void +ice_read_word(u8 *src_ctx, u8 *dest_ctx, const 
struct ice_ctx_ele *ce_info) +{ + u16 dest_word, mask; + u8 *src, *target; + __le16 src_word; + u16 shift_width; + + /* prepare the bits and mask */ + shift_width = ce_info->lsb % 8; + mask = BIT(ce_info->width) - 1; + + /* shift to correct alignment */ + mask <<= shift_width; + + /* get the current bits from the src bit string */ + src = src_ctx + (ce_info->lsb / 8); + + memcpy(&src_word, src, sizeof(src_word)); + + /* the data in the memory is stored as little endian so mask it + * correctly + */ + src_word &= cpu_to_le16(mask); + + /* get the data back into host order before shifting */ + dest_word = le16_to_cpu(src_word); + + dest_word >>= shift_width; + + /* get the address from the struct field */ + target = dest_ctx + ce_info->offset; + + /* put it back in the struct */ + memcpy(target, &dest_word, sizeof(dest_word)); +} + +/** + * ice_read_dword - read context dword into struct + * @src_ctx: the context structure to read from + * @dest_ctx: the context to be written to + * @ce_info: a description of the struct to be filled + */ +static void +ice_read_dword(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info) +{ + u32 dest_dword, mask; + __le32 src_dword; + u8 *src, *target; + u16 shift_width; + + /* prepare the bits and mask */ + shift_width = ce_info->lsb % 8; + + /* if the field width is exactly 32 on an x86 machine, then the shift + * operation will not work because the SHL instructions count is masked + * to 5 bits so the shift will do nothing + */ + if (ce_info->width < 32) + mask = BIT(ce_info->width) - 1; + else + mask = (u32)~0; + + /* shift to correct alignment */ + mask <<= shift_width; + + /* get the current bits from the src bit string */ + src = src_ctx + (ce_info->lsb / 8); + + memcpy(&src_dword, src, sizeof(src_dword)); + + /* the data in the memory is stored as little endian so mask it + * correctly + */ + src_dword &= cpu_to_le32(mask); + + /* get the data back into host order before shifting */ + dest_dword = le32_to_cpu(src_dword); + + dest_dword >>= shift_width; + + /* get the address from the struct field */ + target = dest_ctx + ce_info->offset; + + /* put it back in the struct */ + memcpy(target, &dest_dword, sizeof(dest_dword)); +} + +/** + * ice_read_qword - read context qword into struct + * @src_ctx: the context structure to read from + * @dest_ctx: the context to be written to + * @ce_info: a description of the struct to be filled + */ +static void +ice_read_qword(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info) +{ + u64 dest_qword, mask; + __le64 src_qword; + u8 *src, *target; + u16 shift_width; + + /* prepare the bits and mask */ + shift_width = ce_info->lsb % 8; + + /* if the field width is exactly 64 on an x86 machine, then the shift + * operation will not work because the SHL instructions count is masked + * to 6 bits so the shift will do nothing + */ + if (ce_info->width < 64) + mask = BIT_ULL(ce_info->width) - 1; + else + mask = (u64)~0; + + /* shift to correct alignment */ + mask <<= shift_width; + + /* get the current bits from the src bit string */ + src = src_ctx + (ce_info->lsb / 8); + + memcpy(&src_qword, src, sizeof(src_qword)); + + /* the data in the memory is stored as little endian so mask it + * correctly + */ + src_qword &= cpu_to_le64(mask); + + /* get the data back into host order before shifting */ + dest_qword = le64_to_cpu(src_qword); + + dest_qword >>= shift_width; + + /* get the address from the struct field */ + target = dest_ctx + ce_info->offset; + + /* put it back in the struct */ + memcpy(target, 
&dest_qword, sizeof(dest_qword)); +} + +/** + * ice_get_ctx - extract context bits from a packed structure + * @src_ctx: pointer to a generic packed context structure + * @dest_ctx: pointer to a generic non-packed context structure + * @ce_info: a description of the structure to be read from + */ +int +ice_get_ctx(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info) +{ + int i; + + for (i = 0; ce_info[i].width; i++) { + switch (ce_info[i].size_of) { + case 1: + ice_read_byte(src_ctx, dest_ctx, &ce_info[i]); + break; + case 2: + ice_read_word(src_ctx, dest_ctx, &ce_info[i]); + break; + case 4: + ice_read_dword(src_ctx, dest_ctx, &ce_info[i]); + break; + case 8: + ice_read_qword(src_ctx, dest_ctx, &ce_info[i]); + break; + default: + return -EINVAL; + } + } + + return 0; +} + /** * ice_get_lan_q_ctx - get the LAN queue context for the given VSI and TC * @hw: pointer to the HW struct diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h index 31fdcac33986..df9c7f30592a 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.h +++ b/drivers/net/ethernet/intel/ice/ice_common.h @@ -55,6 +55,9 @@ void ice_set_safe_mode_caps(struct ice_hw *hw); int ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx, u32 rxq_index); +int +ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx, + u32 rxq_index); int ice_aq_get_rss_lut(struct ice_hw *hw, struct ice_aq_get_set_rss_lut_params *get_params); @@ -74,6 +77,8 @@ extern const struct ice_ctx_ele ice_tlan_ctx_info[]; int ice_set_ctx(struct ice_hw *hw, u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info); +int +ice_get_ctx(u8 *src_ctx, u8 *dest_ctx, const struct ice_ctx_ele *ce_info); extern struct mutex ice_global_cfg_lock_sw; From patchwork Tue Nov 21 02:51:01 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462480 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ll4ySK7p" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 35A4BCB; Mon, 20 Nov 2023 18:49:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700534995; x=1732070995; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Vsbg5WYpgIdaFM7kidb5HJYGh+XfqpqCp0zWMK3Lung=; b=ll4ySK7p3tEGIehaIYtQ+FRVV67Gbf46sLjEcsfI3nfMTX0k9j0kGPOl gYp29Hb/6DWN9+tUb61rfUW0VkkOR1HwCSAsOog/3/s5RZFtjdyYTHS2e MzXjOgf2YsQ1WqrHr8tupgSn+H/yBiPQ7ez3ElW6tGxJhGmREnSdMX2a1 wBi/cCNW8LiPSN6FXKt+fspWhCG3XlGLzcUm46Me+mQJ3s+eHLsVXOxK9 tSw3k3bckGOslW+2lRnN9SlLN1nVWqBDZBeooxWXU8uhGDDrCQuwIzbjY IWN3L6moSb3W4MMwecnpK6fW11m3L75S4u58xfyd7nhBjAx8/iYZnAqrY Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458245861" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458245861" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:49:54 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488229" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488229" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:49:49 -0800 From: Yahui Cao To: 
intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 02/12] ice: Add function to get and set TX queue context Date: Tue, 21 Nov 2023 02:51:01 +0000 Message-Id: <20231121025111.257597-3-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org Export TX queue context get and set function which is consumed by linux live migration driver to save and load device state. TX queue context contains static fields which does not change during TX traffic and dynamic fields which may change during TX traffic. Signed-off-by: Yahui Cao --- drivers/net/ethernet/intel/ice/ice_common.c | 216 +++++++++++++++++- drivers/net/ethernet/intel/ice/ice_common.h | 6 + .../net/ethernet/intel/ice/ice_hw_autogen.h | 15 ++ .../net/ethernet/intel/ice/ice_lan_tx_rx.h | 3 + 4 files changed, 239 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c index d0a3bed00921..8577a5ef423e 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.c +++ b/drivers/net/ethernet/intel/ice/ice_common.c @@ -1645,7 +1645,10 @@ ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx, return ice_get_ctx(ctx_buf, (u8 *)rlan_ctx, ice_rlan_ctx_info); } -/* LAN Tx Queue Context */ +/* LAN Tx Queue Context used for set Tx config by ice_aqc_opc_add_txqs, + * Bit[0-175] is valid + */ + const struct ice_ctx_ele ice_tlan_ctx_info[] = { /* Field Width LSB */ ICE_CTX_STORE(ice_tlan_ctx, base, 57, 0), @@ -1679,6 +1682,217 @@ const struct ice_ctx_ele ice_tlan_ctx_info[] = { { 0 } }; +/* LAN Tx Queue Context used for get Tx config from QTXCOMM_CNTX data, + * Bit[0-292] is valid, including internal queue state. 
Since internal + * queue state is dynamic field, its value will be cleared once queue + * is disabled + */ +static const struct ice_ctx_ele ice_tlan_ctx_data_info[] = { + /* Field Width LSB */ + ICE_CTX_STORE(ice_tlan_ctx, base, 57, 0), + ICE_CTX_STORE(ice_tlan_ctx, port_num, 3, 57), + ICE_CTX_STORE(ice_tlan_ctx, cgd_num, 5, 60), + ICE_CTX_STORE(ice_tlan_ctx, pf_num, 3, 65), + ICE_CTX_STORE(ice_tlan_ctx, vmvf_num, 10, 68), + ICE_CTX_STORE(ice_tlan_ctx, vmvf_type, 2, 78), + ICE_CTX_STORE(ice_tlan_ctx, src_vsi, 10, 80), + ICE_CTX_STORE(ice_tlan_ctx, tsyn_ena, 1, 90), + ICE_CTX_STORE(ice_tlan_ctx, internal_usage_flag, 1, 91), + ICE_CTX_STORE(ice_tlan_ctx, alt_vlan, 1, 92), + ICE_CTX_STORE(ice_tlan_ctx, cpuid, 8, 93), + ICE_CTX_STORE(ice_tlan_ctx, wb_mode, 1, 101), + ICE_CTX_STORE(ice_tlan_ctx, tphrd_desc, 1, 102), + ICE_CTX_STORE(ice_tlan_ctx, tphrd, 1, 103), + ICE_CTX_STORE(ice_tlan_ctx, tphwr_desc, 1, 104), + ICE_CTX_STORE(ice_tlan_ctx, cmpq_id, 9, 105), + ICE_CTX_STORE(ice_tlan_ctx, qnum_in_func, 14, 114), + ICE_CTX_STORE(ice_tlan_ctx, itr_notification_mode, 1, 128), + ICE_CTX_STORE(ice_tlan_ctx, adjust_prof_id, 6, 129), + ICE_CTX_STORE(ice_tlan_ctx, qlen, 13, 135), + ICE_CTX_STORE(ice_tlan_ctx, quanta_prof_idx, 4, 148), + ICE_CTX_STORE(ice_tlan_ctx, tso_ena, 1, 152), + ICE_CTX_STORE(ice_tlan_ctx, tso_qnum, 11, 153), + ICE_CTX_STORE(ice_tlan_ctx, legacy_int, 1, 164), + ICE_CTX_STORE(ice_tlan_ctx, drop_ena, 1, 165), + ICE_CTX_STORE(ice_tlan_ctx, cache_prof_idx, 2, 166), + ICE_CTX_STORE(ice_tlan_ctx, pkt_shaper_prof_idx, 3, 168), + ICE_CTX_STORE(ice_tlan_ctx, tail, 13, 184), + { 0 } +}; + +/** + * ice_copy_txq_ctx_from_hw - Copy txq context register from HW + * @hw: pointer to the hardware structure + * @ice_txq_ctx: pointer to the txq context + * + * Copy txq context from HW register space to dense structure + */ +static int +ice_copy_txq_ctx_from_hw(struct ice_hw *hw, u8 *ice_txq_ctx) +{ + u8 i; + + if (!ice_txq_ctx) + return -EINVAL; + + /* Copy each dword separately from HW */ + for (i = 0; i < ICE_TXQ_CTX_SIZE_DWORDS; i++) { + u32 *ctx = (u32 *)(ice_txq_ctx + (i * sizeof(u32))); + + *ctx = rd32(hw, GLCOMM_QTX_CNTX_DATA(i)); + + ice_debug(hw, ICE_DBG_QCTX, "qtxdata[%d]: %08X\n", i, *ctx); + } + + return 0; +} + +/** + * ice_copy_txq_ctx_to_hw - Copy txq context register into HW + * @hw: pointer to the hardware structure + * @ice_txq_ctx: pointer to the txq context + * + * Copy txq context from dense structure to HW register space + */ +static int +ice_copy_txq_ctx_to_hw(struct ice_hw *hw, u8 *ice_txq_ctx) +{ + u8 i; + + if (!ice_txq_ctx) + return -EINVAL; + + /* Copy each dword separately to HW */ + for (i = 0; i < ICE_TXQ_CTX_SIZE_DWORDS; i++) { + u32 *ctx = (u32 *)(ice_txq_ctx + (i * sizeof(u32))); + + wr32(hw, GLCOMM_QTX_CNTX_DATA(i), *ctx); + + ice_debug(hw, ICE_DBG_QCTX, "qtxdata[%d]: %08X\n", i, *ctx); + } + + return 0; +} + +/* Configuration access to tx ring context(from PF) is done via indirect + * interface, GLCOMM_QTX_CNTX_CTL/DATA registers. However, there registers + * are shared by all the PFs with single PCI card. Hence multiplied PF may + * access there registers simultaneously, causing access conflicts. Then + * card-level grained locking is required to protect these registers from + * being competed by PF devices within the same card. However, there is no + * such kind of card-level locking supported. Introduce a coarse grained + * global lock which is shared by all the PF driver. 
+ * + * The overall flow is to acquire the lock, read/write TXQ context through + * GLCOMM_QTX_CNTX_CTL/DATA indirect interface and release the lock once + * access is completed. In this way, only one PF can have access to TXQ + * context safely. + */ +static DEFINE_MUTEX(ice_global_txq_ctx_lock); + +/** + * ice_read_txq_ctx - Read txq context from HW + * @hw: pointer to the hardware structure + * @tlan_ctx: pointer to the txq context + * @txq_index: the index of the Tx queue + * + * Read txq context from HW register space and then convert it from dense + * structure to sparse + */ +int +ice_read_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx, + u32 txq_index) +{ + u8 ctx_buf[ICE_TXQ_CTX_SZ] = { 0 }; + int status; + u32 txq_base; + u32 cmd, reg; + + if (!tlan_ctx) + return -EINVAL; + + if (txq_index > QTX_COMM_HEAD_MAX_INDEX) + return -EINVAL; + + /* Get TXQ base within card space */ + txq_base = rd32(hw, PFLAN_TX_QALLOC(hw->pf_id)); + txq_base = (txq_base & PFLAN_TX_QALLOC_FIRSTQ_M) >> + PFLAN_TX_QALLOC_FIRSTQ_S; + + cmd = (GLCOMM_QTX_CNTX_CTL_CMD_READ + << GLCOMM_QTX_CNTX_CTL_CMD_S) & GLCOMM_QTX_CNTX_CTL_CMD_M; + reg = cmd | GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M | + (((txq_base + txq_index) << GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S) & + GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M); + + mutex_lock(&ice_global_txq_ctx_lock); + + wr32(hw, GLCOMM_QTX_CNTX_CTL, reg); + ice_flush(hw); + + status = ice_copy_txq_ctx_from_hw(hw, ctx_buf); + if (status) { + mutex_unlock(&ice_global_txq_ctx_lock); + return status; + } + + mutex_unlock(&ice_global_txq_ctx_lock); + + return ice_get_ctx(ctx_buf, (u8 *)tlan_ctx, ice_tlan_ctx_data_info); +} + +/** + * ice_write_txq_ctx - Write txq context to HW + * @hw: pointer to the hardware structure + * @tlan_ctx: pointer to the txq context + * @txq_index: the index of the Tx queue + * + * Convert txq context from sparse to dense structure and then write + * it to HW register space + */ +int +ice_write_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx, + u32 txq_index) +{ + u8 ctx_buf[ICE_TXQ_CTX_SZ] = { 0 }; + int status; + u32 txq_base; + u32 cmd, reg; + + if (!tlan_ctx) + return -EINVAL; + + if (txq_index > QTX_COMM_HEAD_MAX_INDEX) + return -EINVAL; + + ice_set_ctx(hw, (u8 *)tlan_ctx, ctx_buf, ice_tlan_ctx_info); + + /* Get TXQ base within card space */ + txq_base = rd32(hw, PFLAN_TX_QALLOC(hw->pf_id)); + txq_base = (txq_base & PFLAN_TX_QALLOC_FIRSTQ_M) >> + PFLAN_TX_QALLOC_FIRSTQ_S; + + cmd = (GLCOMM_QTX_CNTX_CTL_CMD_WRITE_NO_DYN + << GLCOMM_QTX_CNTX_CTL_CMD_S) & GLCOMM_QTX_CNTX_CTL_CMD_M; + reg = cmd | GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M | + (((txq_base + txq_index) << GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S) & + GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M); + + mutex_lock(&ice_global_txq_ctx_lock); + + status = ice_copy_txq_ctx_to_hw(hw, ctx_buf); + if (status) { + mutex_unlock(&ice_global_txq_ctx_lock); + return status; + } + + wr32(hw, GLCOMM_QTX_CNTX_CTL, reg); + ice_flush(hw); + + mutex_unlock(&ice_global_txq_ctx_lock); + + return 0; +} /* Sideband Queue command wrappers */ /** diff --git a/drivers/net/ethernet/intel/ice/ice_common.h b/drivers/net/ethernet/intel/ice/ice_common.h index df9c7f30592a..40fbb9088475 100644 --- a/drivers/net/ethernet/intel/ice/ice_common.h +++ b/drivers/net/ethernet/intel/ice/ice_common.h @@ -58,6 +58,12 @@ ice_write_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx, int ice_read_rxq_ctx(struct ice_hw *hw, struct ice_rlan_ctx *rlan_ctx, u32 rxq_index); +int +ice_read_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx, + u32 txq_index); +int
+ice_write_txq_ctx(struct ice_hw *hw, struct ice_tlan_ctx *tlan_ctx, + u32 txq_index); int ice_aq_get_rss_lut(struct ice_hw *hw, struct ice_aq_get_set_rss_lut_params *get_params); diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h index 86936b758ade..7410da715ad4 100644 --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h @@ -8,6 +8,7 @@ #define QTX_COMM_DBELL(_DBQM) (0x002C0000 + ((_DBQM) * 4)) #define QTX_COMM_HEAD(_DBQM) (0x000E0000 + ((_DBQM) * 4)) +#define QTX_COMM_HEAD_MAX_INDEX 16383 #define QTX_COMM_HEAD_HEAD_S 0 #define QTX_COMM_HEAD_HEAD_M ICE_M(0x1FFF, 0) #define PF_FW_ARQBAH 0x00080180 @@ -258,6 +259,9 @@ #define VPINT_ALLOC_PCI_VALID_M BIT(31) #define VPINT_MBX_CTL(_VSI) (0x0016A000 + ((_VSI) * 4)) #define VPINT_MBX_CTL_CAUSE_ENA_M BIT(30) +#define PFLAN_TX_QALLOC(_PF) (0x001D2580 + ((_PF) * 4)) +#define PFLAN_TX_QALLOC_FIRSTQ_S 0 +#define PFLAN_TX_QALLOC_FIRSTQ_M ICE_M(0x3FFF, 0) #define GLLAN_RCTL_0 0x002941F8 #define QRX_CONTEXT(_i, _QRX) (0x00280000 + ((_i) * 8192 + (_QRX) * 4)) #define QRX_CTRL(_QRX) (0x00120000 + ((_QRX) * 4)) @@ -362,6 +366,17 @@ #define GLNVM_ULD_POR_DONE_1_M BIT(8) #define GLNVM_ULD_PCIER_DONE_2_M BIT(9) #define GLNVM_ULD_PE_DONE_M BIT(10) +#define GLCOMM_QTX_CNTX_CTL 0x002D2DC8 +#define GLCOMM_QTX_CNTX_CTL_QUEUE_ID_S 0 +#define GLCOMM_QTX_CNTX_CTL_QUEUE_ID_M ICE_M(0x3FFF, 0) +#define GLCOMM_QTX_CNTX_CTL_CMD_S 16 +#define GLCOMM_QTX_CNTX_CTL_CMD_M ICE_M(0x7, 16) +#define GLCOMM_QTX_CNTX_CTL_CMD_READ 0 +#define GLCOMM_QTX_CNTX_CTL_CMD_WRITE 1 +#define GLCOMM_QTX_CNTX_CTL_CMD_RESET 3 +#define GLCOMM_QTX_CNTX_CTL_CMD_WRITE_NO_DYN 4 +#define GLCOMM_QTX_CNTX_CTL_CMD_EXEC_M BIT(19) +#define GLCOMM_QTX_CNTX_DATA(_i) (0x002D2D40 + ((_i) * 4)) #define GLPCI_CNF2 0x000BE004 #define GLPCI_CNF2_CACHELINE_SIZE_M BIT(1) #define PF_FUNC_RID 0x0009E880 diff --git a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h index 89f986a75cc8..79e07c863ae0 100644 --- a/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h +++ b/drivers/net/ethernet/intel/ice/ice_lan_tx_rx.h @@ -431,6 +431,8 @@ enum ice_rx_flex_desc_status_error_1_bits { #define ICE_RXQ_CTX_SIZE_DWORDS 8 #define ICE_RXQ_CTX_SZ (ICE_RXQ_CTX_SIZE_DWORDS * sizeof(u32)) +#define ICE_TXQ_CTX_SIZE_DWORDS 10 +#define ICE_TXQ_CTX_SZ (ICE_TXQ_CTX_SIZE_DWORDS * sizeof(u32)) #define ICE_TX_CMPLTNQ_CTX_SIZE_DWORDS 22 #define ICE_TX_DRBELL_Q_CTX_SIZE_DWORDS 5 #define GLTCLAN_CQ_CNTX(i, CQ) (GLTCLAN_CQ_CNTX0(CQ) + ((i) * 0x0800)) @@ -649,6 +651,7 @@ struct ice_tlan_ctx { u8 cache_prof_idx; u8 pkt_shaper_prof_idx; u8 int_q_state; /* width not needed - internal - DO NOT WRITE!!! 
*/ + u16 tail; }; /* The ice_ptype_lkup table is used to convert from the 10-bit ptype in the
From patchwork Tue Nov 21 02:51:02 2023
X-Patchwork-Submitter: "Cao, Yahui"
X-Patchwork-Id: 13462481
From: Yahui Cao
To: intel-wired-lan@lists.osuosl.org
Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com
Subject: [PATCH iwl-next v4 03/12] ice: Introduce VF state ICE_VF_STATE_REPLAYING_VC for migration
Date: Tue, 21 Nov 2023 02:51:02 +0000
Message-Id: <20231121025111.257597-4-yahui.cao@intel.com>
In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com>
References: <20231121025111.257597-1-yahui.cao@intel.com>

From: Lingyu Liu

During the migration device resume stage, part of the device state is loaded by replaying logged virtual channel messages. By default, once a virtual channel message is processed successfully, the PF sends a response message to the VF. In addition, the PF notifies the VF about link state while handling the GET_VF_RESOURCES and ENABLE_QUEUES virtual channel messages, and the VF driver prints link state change info once it receives the notification from the PF. However, none of these messages are needed during the device resume stage. Stop the PF from sending messages to the VF while the VF is in the replay state.
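As a rough sketch of the intended resume-side usage (not part of this patch; the replay helper is added later in this series, and replay_logged_msgs() below is a hypothetical placeholder for it):

static void ice_migration_replay_vc_msgs(struct ice_vf *vf)
{
	/* While this bit is set, ice_vc_respond_to_vf() drops successful
	 * responses and ice_vc_notify_vf_link_state() returns early, so
	 * replaying logged messages generates no traffic toward the VF.
	 */
	set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);

	/* Placeholder: feed each logged virtchnl message back through the
	 * normal PF message handler.
	 */
	replay_logged_msgs(vf);

	clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states);
}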
Signed-off-by: Lingyu Liu Signed-off-by: Yahui Cao --- drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 + drivers/net/ethernet/intel/ice/ice_virtchnl.c | 179 +++++++++++------- drivers/net/ethernet/intel/ice/ice_virtchnl.h | 8 +- .../ethernet/intel/ice/ice_virtchnl_fdir.c | 28 +-- 4 files changed, 127 insertions(+), 89 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h index 93c774f2f437..c7e7df7baf38 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h @@ -37,6 +37,7 @@ enum ice_vf_states { ICE_VF_STATE_DIS, ICE_VF_STATE_MC_PROMISC, ICE_VF_STATE_UC_PROMISC, + ICE_VF_STATE_REPLAYING_VC, ICE_VF_STATES_NBITS }; diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index cdf17b1e2f25..661ca86c3032 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -233,6 +233,9 @@ void ice_vc_notify_vf_link_state(struct ice_vf *vf) struct virtchnl_pf_event pfe = { 0 }; struct ice_hw *hw = &vf->pf->hw; + if (test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) + return; + pfe.event = VIRTCHNL_EVENT_LINK_CHANGE; pfe.severity = PF_EVENT_SEVERITY_INFO; @@ -282,7 +285,7 @@ void ice_vc_notify_reset(struct ice_pf *pf) } /** - * ice_vc_send_msg_to_vf - Send message to VF + * ice_vc_send_response_to_vf - Send response message to VF * @vf: pointer to the VF info * @v_opcode: virtual channel opcode * @v_retval: virtual channel return value @@ -291,9 +294,10 @@ void ice_vc_notify_reset(struct ice_pf *pf) * * send msg to VF */ -int -ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode, - enum virtchnl_status_code v_retval, u8 *msg, u16 msglen) +static int +ice_vc_send_response_to_vf(struct ice_vf *vf, u32 v_opcode, + enum virtchnl_status_code v_retval, + u8 *msg, u16 msglen) { struct device *dev; struct ice_pf *pf; @@ -314,6 +318,39 @@ ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode, return 0; } +/** + * ice_vc_respond_to_vf - Respond to VF + * @vf: pointer to the VF info + * @v_opcode: virtual channel opcode + * @v_retval: virtual channel return value + * @msg: pointer to the msg buffer + * @msglen: msg length + * + * Respond to VF. If it is replaying, return directly. + * + * Return 0 for success, negative for error. + */ +int +ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode, + enum virtchnl_status_code v_retval, u8 *msg, u16 msglen) +{ + struct device *dev; + struct ice_pf *pf = vf->pf; + + dev = ice_pf_to_dev(pf); + + if (test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) { + if (v_retval == VIRTCHNL_STATUS_SUCCESS) + return 0; + + dev_dbg(dev, "Unable to replay virt channel command, VF ID %d, virtchnl status code %d. 
op code %d, len %d.\n", + vf->vf_id, v_retval, v_opcode, msglen); + return -EIO; + } + + return ice_vc_send_response_to_vf(vf, v_opcode, v_retval, msg, msglen); +} + /** * ice_vc_get_ver_msg * @vf: pointer to the VF info @@ -332,9 +369,9 @@ static int ice_vc_get_ver_msg(struct ice_vf *vf, u8 *msg) if (VF_IS_V10(&vf->vf_ver)) info.minor = VIRTCHNL_VERSION_MINOR_NO_VF_CAPS; - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_VERSION, - VIRTCHNL_STATUS_SUCCESS, (u8 *)&info, - sizeof(struct virtchnl_version_info)); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_VERSION, + VIRTCHNL_STATUS_SUCCESS, (u8 *)&info, + sizeof(struct virtchnl_version_info)); } /** @@ -522,8 +559,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg) err: /* send the response back to the VF */ - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, v_ret, - (u8 *)vfres, len); + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_VF_RESOURCES, v_ret, + (u8 *)vfres, len); kfree(vfres); return ret; @@ -892,7 +929,7 @@ static int ice_vc_handle_rss_cfg(struct ice_vf *vf, u8 *msg, bool add) } error_param: - return ice_vc_send_msg_to_vf(vf, v_opcode, v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, v_opcode, v_ret, NULL, 0); } /** @@ -938,8 +975,8 @@ static int ice_vc_config_rss_key(struct ice_vf *vf, u8 *msg) if (ice_set_rss_key(vsi, vrk->key)) v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR; error_param: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, v_ret, - NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_KEY, v_ret, + NULL, 0); } /** @@ -984,7 +1021,7 @@ static int ice_vc_config_rss_lut(struct ice_vf *vf, u8 *msg) if (ice_set_rss_lut(vsi, vrl->lut, ICE_LUT_VSI_SIZE)) v_ret = VIRTCHNL_STATUS_ERR_ADMIN_QUEUE_ERROR; error_param: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, v_ret, + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_RSS_LUT, v_ret, NULL, 0); } @@ -1124,8 +1161,8 @@ static int ice_vc_cfg_promiscuous_mode_msg(struct ice_vf *vf, u8 *msg) } error_param: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE, + v_ret, NULL, 0); } /** @@ -1165,8 +1202,8 @@ static int ice_vc_get_stats_msg(struct ice_vf *vf, u8 *msg) error_param: /* send the response to the VF */ - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_STATS, v_ret, - (u8 *)&stats, sizeof(stats)); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_STATS, v_ret, + (u8 *)&stats, sizeof(stats)); } /** @@ -1315,8 +1352,8 @@ static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8 *msg) error_param: /* send the response to the VF */ - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, v_ret, - NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_QUEUES, v_ret, + NULL, 0); } /** @@ -1455,8 +1492,8 @@ static int ice_vc_dis_qs_msg(struct ice_vf *vf, u8 *msg) error_param: /* send the response to the VF */ - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, v_ret, - NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_QUEUES, v_ret, + NULL, 0); } /** @@ -1586,8 +1623,8 @@ static int ice_vc_cfg_irq_map_msg(struct ice_vf *vf, u8 *msg) error_param: /* send the response to the VF */ - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, v_ret, - NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_IRQ_MAP, v_ret, + NULL, 0); } /** @@ -1730,8 +1767,8 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg) } /* send the response to the VF */ - return 
ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES, - VIRTCHNL_STATUS_SUCCESS, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES, + VIRTCHNL_STATUS_SUCCESS, NULL, 0); error_param: /* disable whatever we can */ for (; i >= 0; i--) { @@ -1746,8 +1783,8 @@ static int ice_vc_cfg_qs_msg(struct ice_vf *vf, u8 *msg) ice_lag_move_new_vf_nodes(vf); /* send the response to the VF */ - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES, - VIRTCHNL_STATUS_ERR_PARAM, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_VSI_QUEUES, + VIRTCHNL_STATUS_ERR_PARAM, NULL, 0); } /** @@ -2049,7 +2086,7 @@ ice_vc_handle_mac_addr_msg(struct ice_vf *vf, u8 *msg, bool set) handle_mac_exit: /* send the response to the VF */ - return ice_vc_send_msg_to_vf(vf, vc_op, v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, vc_op, v_ret, NULL, 0); } /** @@ -2132,8 +2169,8 @@ static int ice_vc_request_qs_msg(struct ice_vf *vf, u8 *msg) error_param: /* send the response to the VF */ - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES, - v_ret, (u8 *)vfres, sizeof(*vfres)); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_REQUEST_QUEUES, + v_ret, (u8 *)vfres, sizeof(*vfres)); } /** @@ -2398,11 +2435,11 @@ static int ice_vc_process_vlan_msg(struct ice_vf *vf, u8 *msg, bool add_v) error_param: /* send the response to the VF */ if (add_v) - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, v_ret, - NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_VLAN, v_ret, + NULL, 0); else - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, v_ret, - NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_VLAN, v_ret, + NULL, 0); } /** @@ -2477,8 +2514,8 @@ static int ice_vc_ena_vlan_stripping(struct ice_vf *vf) vf->vlan_strip_ena |= ICE_INNER_VLAN_STRIP_ENA; error_param: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING, + v_ret, NULL, 0); } /** @@ -2514,8 +2551,8 @@ static int ice_vc_dis_vlan_stripping(struct ice_vf *vf) vf->vlan_strip_ena &= ~ICE_INNER_VLAN_STRIP_ENA; error_param: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING, + v_ret, NULL, 0); } /** @@ -2550,8 +2587,8 @@ static int ice_vc_get_rss_hena(struct ice_vf *vf) vrh->hena = ICE_DEFAULT_RSS_HENA; err: /* send the response back to the VF */ - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_RSS_HENA_CAPS, v_ret, - (u8 *)vrh, len); + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_RSS_HENA_CAPS, v_ret, + (u8 *)vrh, len); kfree(vrh); return ret; } @@ -2616,8 +2653,8 @@ static int ice_vc_set_rss_hena(struct ice_vf *vf, u8 *msg) /* send the response to the VF */ err: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, v_ret, - NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_SET_RSS_HENA, v_ret, + NULL, 0); } /** @@ -2672,8 +2709,8 @@ static int ice_vc_query_rxdid(struct ice_vf *vf) pf->supported_rxdids = rxdid->supported_rxdids; err: - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS, - v_ret, (u8 *)rxdid, len); + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_SUPPORTED_RXDIDS, + v_ret, (u8 *)rxdid, len); kfree(rxdid); return ret; } @@ -2909,8 +2946,8 @@ static int ice_vc_get_offload_vlan_v2_caps(struct ice_vf *vf) memcpy(&vf->vlan_v2_caps, caps, sizeof(*caps)); out: - err = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS, - v_ret, (u8 *)caps, 
len); + err = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_GET_OFFLOAD_VLAN_V2_CAPS, + v_ret, (u8 *)caps, len); kfree(caps); return err; } @@ -3151,8 +3188,8 @@ static int ice_vc_remove_vlan_v2_msg(struct ice_vf *vf, u8 *msg) v_ret = VIRTCHNL_STATUS_ERR_PARAM; out: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_VLAN_V2, v_ret, NULL, - 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_VLAN_V2, + v_ret, NULL, 0); } /** @@ -3293,8 +3330,8 @@ static int ice_vc_add_vlan_v2_msg(struct ice_vf *vf, u8 *msg) v_ret = VIRTCHNL_STATUS_ERR_PARAM; out: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_VLAN_V2, v_ret, NULL, - 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_VLAN_V2, + v_ret, NULL, 0); } /** @@ -3525,8 +3562,8 @@ static int ice_vc_ena_vlan_stripping_v2_msg(struct ice_vf *vf, u8 *msg) vf->vlan_strip_ena |= ICE_INNER_VLAN_STRIP_ENA; out: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_STRIPPING_V2, + v_ret, NULL, 0); } /** @@ -3600,8 +3637,8 @@ static int ice_vc_dis_vlan_stripping_v2_msg(struct ice_vf *vf, u8 *msg) vf->vlan_strip_ena &= ~ICE_INNER_VLAN_STRIP_ENA; out: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_STRIPPING_V2, + v_ret, NULL, 0); } /** @@ -3659,8 +3696,8 @@ static int ice_vc_ena_vlan_insertion_v2_msg(struct ice_vf *vf, u8 *msg) } out: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ENABLE_VLAN_INSERTION_V2, + v_ret, NULL, 0); } /** @@ -3714,8 +3751,8 @@ static int ice_vc_dis_vlan_insertion_v2_msg(struct ice_vf *vf, u8 *msg) } out: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DISABLE_VLAN_INSERTION_V2, + v_ret, NULL, 0); } static const struct ice_virtchnl_ops ice_virtchnl_dflt_ops = { @@ -3812,8 +3849,8 @@ static int ice_vc_repr_add_mac(struct ice_vf *vf, u8 *msg) } handle_mac_exit: - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_ETH_ADDR, - v_ret, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_ETH_ADDR, + v_ret, NULL, 0); } /** @@ -3832,8 +3869,8 @@ ice_vc_repr_del_mac(struct ice_vf __always_unused *vf, u8 __always_unused *msg) ice_update_legacy_cached_mac(vf, &al->list[0]); - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR, - VIRTCHNL_STATUS_SUCCESS, NULL, 0); + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_ETH_ADDR, + VIRTCHNL_STATUS_SUCCESS, NULL, 0); } static int @@ -3842,8 +3879,8 @@ ice_vc_repr_cfg_promiscuous_mode(struct ice_vf *vf, u8 __always_unused *msg) dev_dbg(ice_pf_to_dev(vf->pf), "Can't config promiscuous mode in switchdev mode for VF %d\n", vf->vf_id); - return ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE, - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, + return ice_vc_respond_to_vf(vf, VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE, + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL, 0); } @@ -3986,16 +4023,16 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, error_handler: if (err) { - ice_vc_send_msg_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM, - NULL, 0); + ice_vc_respond_to_vf(vf, v_opcode, VIRTCHNL_STATUS_ERR_PARAM, + NULL, 0); dev_err(dev, "Invalid message from VF %d, opcode %d, len %d, error %d\n", vf_id, v_opcode, msglen, err); goto finish; } if (!ice_vc_is_opcode_allowed(vf, v_opcode)) { - 
ice_vc_send_msg_to_vf(vf, v_opcode, - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL, + ice_vc_respond_to_vf(vf, v_opcode, + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, NULL, 0); goto finish; } @@ -4106,9 +4143,9 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, default: dev_err(dev, "Unsupported opcode %d from VF %d\n", v_opcode, vf_id); - err = ice_vc_send_msg_to_vf(vf, v_opcode, - VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, - NULL, 0); + err = ice_vc_respond_to_vf(vf, v_opcode, + VIRTCHNL_STATUS_ERR_NOT_SUPPORTED, + NULL, 0); break; } if (err) { diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h index cd747718de73..a2b6094e2f2f 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h @@ -60,8 +60,8 @@ void ice_vc_notify_vf_link_state(struct ice_vf *vf); void ice_vc_notify_link_state(struct ice_pf *pf); void ice_vc_notify_reset(struct ice_pf *pf); int -ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode, - enum virtchnl_status_code v_retval, u8 *msg, u16 msglen); +ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode, + enum virtchnl_status_code v_retval, u8 *msg, u16 msglen); bool ice_vc_isvalid_vsi_id(struct ice_vf *vf, u16 vsi_id); void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, struct ice_mbx_data *mbxdata); @@ -73,8 +73,8 @@ static inline void ice_vc_notify_link_state(struct ice_pf *pf) { } static inline void ice_vc_notify_reset(struct ice_pf *pf) { } static inline int -ice_vc_send_msg_to_vf(struct ice_vf *vf, u32 v_opcode, - enum virtchnl_status_code v_retval, u8 *msg, u16 msglen) +ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode, + enum virtchnl_status_code v_retval, u8 *msg, u16 msglen) { return -EOPNOTSUPP; } diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c index 24b23b7ef04a..816d8bf8bec4 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl_fdir.c @@ -1584,8 +1584,8 @@ ice_vc_add_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx, resp->flow_id = conf->flow_id; vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]++; - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret, - (u8 *)resp, len); + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret, + (u8 *)resp, len); kfree(resp); dev_dbg(dev, "VF %d: flow_id:0x%X, FDIR %s success!\n", @@ -1600,8 +1600,8 @@ ice_vc_add_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx, ice_vc_fdir_remove_entry(vf, conf, conf->flow_id); devm_kfree(dev, conf); - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret, - (u8 *)resp, len); + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret, + (u8 *)resp, len); kfree(resp); return ret; } @@ -1648,8 +1648,8 @@ ice_vc_del_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx, ice_vc_fdir_remove_entry(vf, conf, conf->flow_id); vf->fdir.fdir_fltr_cnt[conf->input.flow_type][is_tun]--; - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret, - (u8 *)resp, len); + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret, + (u8 *)resp, len); kfree(resp); dev_dbg(dev, "VF %d: flow_id:0x%X, FDIR %s success!\n", @@ -1665,8 +1665,8 @@ ice_vc_del_fdir_fltr_post(struct ice_vf *vf, struct ice_vf_fdir_ctx *ctx, if (success) devm_kfree(dev, conf); - ret = ice_vc_send_msg_to_vf(vf, ctx->v_opcode, v_ret, - (u8 *)resp, len); + ret = ice_vc_respond_to_vf(vf, ctx->v_opcode, v_ret, + (u8 *)resp, len); kfree(resp); return 
ret; } @@ -1863,8 +1863,8 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg) v_ret = VIRTCHNL_STATUS_SUCCESS; stat->status = VIRTCHNL_FDIR_SUCCESS; devm_kfree(dev, conf); - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, - v_ret, (u8 *)stat, len); + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, + v_ret, (u8 *)stat, len); goto exit; } @@ -1922,8 +1922,8 @@ int ice_vc_add_fdir_fltr(struct ice_vf *vf, u8 *msg) err_free_conf: devm_kfree(dev, conf); err_exit: - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, v_ret, - (u8 *)stat, len); + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_ADD_FDIR_FILTER, v_ret, + (u8 *)stat, len); kfree(stat); return ret; } @@ -2006,8 +2006,8 @@ int ice_vc_del_fdir_fltr(struct ice_vf *vf, u8 *msg) err_del_tmr: ice_vc_fdir_clear_irq_ctx(vf); err_exit: - ret = ice_vc_send_msg_to_vf(vf, VIRTCHNL_OP_DEL_FDIR_FILTER, v_ret, - (u8 *)stat, len); + ret = ice_vc_respond_to_vf(vf, VIRTCHNL_OP_DEL_FDIR_FILTER, v_ret, + (u8 *)stat, len); kfree(stat); return ret; } From patchwork Tue Nov 21 02:51:03 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462482 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="H1Y/EjXg" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 8C69BE8; Mon, 20 Nov 2023 18:50:21 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535021; x=1732071021; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=GEBquv8g8svvSh4MvuUqEVBc8Cbfc2Qs0nRwb8PM+JM=; b=H1Y/EjXgRxu1hoWwojREr9T3vkkT/AO/hTVANqDkQz7/VaTxD0MmGuwL pxj9IiuJ9Gk+G5BhmnwS2M+YNZBup+fg7uEv4bfsfB5LLRPxRE+Hydz/N 6mUJ90anZJrS9SAzRM5p4VAgqTZcB5hAApuR1KMhJqIajQsh8kQcOGe/S D7kYPf5EqmnxTes6vwBiUASwUI6x9kHo2u31xgLajrjFA/X/5RJC9Zk/m VE20JQRcRL66D3pKWRf44Byc7mbWVWGClRQrunlkogoyVmFAWvVExE1or iEdaskTno+Nf1pAQ/qEvl52xTbufJgs0seVX4xgpftsGbu7iPG5hSXEiw Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458245942" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458245942" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:03 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488388" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488388" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:49:58 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 04/12] ice: Add fundamental migration init and exit function Date: Tue, 21 Nov 2023 02:51:03 +0000 Message-Id: <20231121025111.257597-5-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org 
List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Lingyu Liu Add basic entry point for live migration functionality initialization, uninitialization and add helper function for vfio driver to reach pf driver data. Signed-off-by: Lingyu Liu Reviewed-by: Michal Swiatkowski Signed-off-by: Yahui Cao --- drivers/net/ethernet/intel/ice/Makefile | 1 + drivers/net/ethernet/intel/ice/ice.h | 3 + drivers/net/ethernet/intel/ice/ice_main.c | 15 ++++ .../net/ethernet/intel/ice/ice_migration.c | 82 +++++++++++++++++++ .../intel/ice/ice_migration_private.h | 21 +++++ drivers/net/ethernet/intel/ice/ice_vf_lib.c | 4 + drivers/net/ethernet/intel/ice/ice_vf_lib.h | 2 + include/linux/net/intel/ice_migration.h | 27 ++++++ 8 files changed, 155 insertions(+) create mode 100644 drivers/net/ethernet/intel/ice/ice_migration.c create mode 100644 drivers/net/ethernet/intel/ice/ice_migration_private.h create mode 100644 include/linux/net/intel/ice_migration.h diff --git a/drivers/net/ethernet/intel/ice/Makefile b/drivers/net/ethernet/intel/ice/Makefile index 0679907980f7..c536a9a896c0 100644 --- a/drivers/net/ethernet/intel/ice/Makefile +++ b/drivers/net/ethernet/intel/ice/Makefile @@ -49,3 +49,4 @@ ice-$(CONFIG_RFS_ACCEL) += ice_arfs.o ice-$(CONFIG_XDP_SOCKETS) += ice_xsk.o ice-$(CONFIG_ICE_SWITCHDEV) += ice_eswitch.o ice_eswitch_br.o ice-$(CONFIG_GNSS) += ice_gnss.o +ice-$(CONFIG_ICE_VFIO_PCI) += ice_migration.o diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h index 351e0d36df44..13f6ce51985c 100644 --- a/drivers/net/ethernet/intel/ice/ice.h +++ b/drivers/net/ethernet/intel/ice/ice.h @@ -55,6 +55,7 @@ #include #include #include +#include #include "ice_devids.h" #include "ice_type.h" #include "ice_txrx.h" @@ -77,6 +78,7 @@ #include "ice_gnss.h" #include "ice_irq.h" #include "ice_dpll.h" +#include "ice_migration_private.h" #define ICE_BAR0 0 #define ICE_REQ_DESC_MULTIPLE 32 @@ -963,6 +965,7 @@ void ice_service_task_schedule(struct ice_pf *pf); int ice_load(struct ice_pf *pf); void ice_unload(struct ice_pf *pf); void ice_adv_lnk_speed_maps_init(void); +struct ice_pf *ice_get_pf_from_vf_pdev(struct pci_dev *pdev); /** * ice_set_rdma_cap - enable RDMA support diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c index 6607fa6fe556..2daa4d2b1dd1 100644 --- a/drivers/net/ethernet/intel/ice/ice_main.c +++ b/drivers/net/ethernet/intel/ice/ice_main.c @@ -9313,3 +9313,18 @@ static const struct net_device_ops ice_netdev_ops = { .ndo_xdp_xmit = ice_xdp_xmit, .ndo_xsk_wakeup = ice_xsk_wakeup, }; + +/** + * ice_get_pf_from_vf_pdev - Get PF structure from PCI device + * @pdev: pointer to PCI device + * + * Return pointer to ice PF structure, NULL for failure + */ +struct ice_pf *ice_get_pf_from_vf_pdev(struct pci_dev *pdev) +{ + struct ice_pf *pf; + + pf = pci_iov_get_pf_drvdata(pdev, &ice_driver); + + return !IS_ERR(pf) ? pf : NULL; +} diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c new file mode 100644 index 000000000000..2b9b5a2ce367 --- /dev/null +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -0,0 +1,82 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2018-2023 Intel Corporation */ + +#include "ice.h" + +/** + * ice_migration_get_pf - Get ice PF structure pointer by pdev + * @pdev: pointer to ice vfio pci VF pdev structure + * + * Return nonzero for success, NULL for failure. 
+ */ +struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev) +{ + return ice_get_pf_from_vf_pdev(pdev); +} +EXPORT_SYMBOL(ice_migration_get_pf); + +/** + * ice_migration_init_vf - init ice VF device state data + * @vf: pointer to VF + */ +void ice_migration_init_vf(struct ice_vf *vf) +{ + vf->migration_enabled = true; +} + +/** + * ice_migration_uninit_vf - uninit VF device state data + * @vf: pointer to VF + */ +void ice_migration_uninit_vf(struct ice_vf *vf) +{ + if (!vf->migration_enabled) + return; + + vf->migration_enabled = false; +} + +/** + * ice_migration_init_dev - init ice migration device + * @pf: pointer to PF of migration device + * @vf_id: VF index of migration device + * + * Return 0 for success, negative for failure + */ +int ice_migration_init_dev(struct ice_pf *pf, int vf_id) +{ + struct device *dev = ice_pf_to_dev(pf); + struct ice_vf *vf; + + vf = ice_get_vf_by_id(pf, vf_id); + if (!vf) { + dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id); + return -EINVAL; + } + + ice_migration_init_vf(vf); + ice_put_vf(vf); + return 0; +} +EXPORT_SYMBOL(ice_migration_init_dev); + +/** + * ice_migration_uninit_dev - uninit ice migration device + * @pf: pointer to PF of migration device + * @vf_id: VF index of migration device + */ +void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id) +{ + struct device *dev = ice_pf_to_dev(pf); + struct ice_vf *vf; + + vf = ice_get_vf_by_id(pf, vf_id); + if (!vf) { + dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id); + return; + } + + ice_migration_uninit_vf(vf); + ice_put_vf(vf); +} +EXPORT_SYMBOL(ice_migration_uninit_dev); diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h new file mode 100644 index 000000000000..2cc2f515fc5e --- /dev/null +++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h @@ -0,0 +1,21 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (C) 2018-2023 Intel Corporation */ + +#ifndef _ICE_MIGRATION_PRIVATE_H_ +#define _ICE_MIGRATION_PRIVATE_H_ + +/* This header file is for exposing functions in ice_migration.c to + * files which will be compiled in ice.ko. + * Functions which may be used by other files which will be compiled + * in ice-vfio-pic.ko should be exposed as part of ice_migration.h. 
+ */ + +#if IS_ENABLED(CONFIG_ICE_VFIO_PCI) +void ice_migration_init_vf(struct ice_vf *vf); +void ice_migration_uninit_vf(struct ice_vf *vf); +#else +static inline void ice_migration_init_vf(struct ice_vf *vf) { } +static inline void ice_migration_uninit_vf(struct ice_vf *vf) { } +#endif /* CONFIG_ICE_VFIO_PCI */ + +#endif /* _ICE_MIGRATION_PRIVATE_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.c b/drivers/net/ethernet/intel/ice/ice_vf_lib.c index aca1f2ea5034..8e571280831e 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.c +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.c @@ -243,6 +243,10 @@ static void ice_vf_pre_vsi_rebuild(struct ice_vf *vf) if (vf->vf_ops->irq_close) vf->vf_ops->irq_close(vf); + if (vf->migration_enabled) { + ice_migration_uninit_vf(vf); + ice_migration_init_vf(vf); + } ice_vf_clear_counters(vf); vf->vf_ops->clear_reset_trigger(vf); } diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h index c7e7df7baf38..431fd28787e8 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h @@ -139,6 +139,8 @@ struct ice_vf { struct devlink_port devlink_port; u16 num_msix; /* num of MSI-X configured on this VF */ + + u8 migration_enabled:1; }; /* Flags for controlling behavior of ice_reset_vf */ diff --git a/include/linux/net/intel/ice_migration.h b/include/linux/net/intel/ice_migration.h new file mode 100644 index 000000000000..7ea11a8714d6 --- /dev/null +++ b/include/linux/net/intel/ice_migration.h @@ -0,0 +1,27 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (C) 2018-2023 Intel Corporation */ + +#ifndef _ICE_MIGRATION_H_ +#define _ICE_MIGRATION_H_ + +struct ice_pf; + +#if IS_ENABLED(CONFIG_ICE_VFIO_PCI) +struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev); +int ice_migration_init_dev(struct ice_pf *pf, int vf_id); +void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id); +#else +static inline struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev) +{ + return NULL; +} + +static inline int ice_migration_init_dev(struct ice_pf *pf, int vf_id) +{ + return 0; +} + +static inline void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id) { } +#endif /* CONFIG_ICE_VFIO_PCI */ + +#endif /* _ICE_MIGRATION_H_ */ From patchwork Tue Nov 21 02:51:04 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462483 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="iF+MaB2l" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 85E1E110; Mon, 20 Nov 2023 18:50:23 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535023; x=1732071023; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=HHNJajCyGqWDmA4jUcAW4Ov4HeI7WNEYAdgBvZUAsEw=; b=iF+MaB2ljeBJDm7OXMju4mk0KR1mnVT49ATsLsjf7ZIjJweUd/9W2SuK Dy7zG+Kfx+KDOn7lxOwAbQv36cdd10bAYBM/6vmVTbSnLtm2hOv9W7Vlx O/hRagl5wC+YJ2mhR+1ywOR+Q3Jo5pPhiHqBtDWnRezwHHMxdTRMaZ6uB qX5NvvJnHuHenC3bD/bUQpfsDCSquZTPmJmxBrZkEe167xqvFu/q7ICXA 8lPX3sTjrkL32QHWrjGOjWtvV9wZ80wZYtGFBLZoYJm2z7qE3Q8QHVuBc VFQAqApIPHcWtxEMPrR3V8EPVpG/O+Cv/DWSuSV1v4ltYvFU/OB04d3i4 A==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458245978" X-IronPort-AV: 
E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458245978" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:08 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488418" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488418" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:03 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 05/12] ice: Log virtual channel messages in PF Date: Tue, 21 Nov 2023 02:51:04 +0000 Message-Id: <20231121025111.257597-6-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Lingyu Liu Save the virtual channel messages sent by VF on the source side during runtime. The logged virtchnl messages will be transferred and loaded into the device on the destination side during the device resume stage. For the feature which can not be migrated yet, it must be disabled or blocked to prevent from being abused by VF. Otherwise, it may introduce functional and security issue. Mask unsupported VF capability flags in the VF-PF negotiaion stage. 
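
To make the logging scheme concrete, here is a self-contained sketch in plain C of the idea: allow-list the opcode, keep a copy of the raw message for later replay, and drop that copy again if the PF ends up failing the command. The layout mirrors the patch (opcode, length, flexible-array payload), but the type names, opcode numbers and list handling below are invented for illustration and are not the driver code, which uses list_head and struct_size().

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* One logged command: fixed header followed by a copy of the raw payload. */
struct logged_msg {
	struct logged_msg *next;
	uint32_t opcode;
	uint16_t len;
	uint8_t payload[];              /* flexible array member */
};

struct msg_log {
	struct logged_msg *head, *tail;
	unsigned int count;
};

/* Only commands that can safely be replayed on the destination are logged. */
static bool loggable(uint32_t opcode)
{
	switch (opcode) {
	case 3:                         /* illustrative opcode numbers only */
	case 6:
	case 8:
		return true;
	default:
		return false;
	}
}

/* Append a copy of the message; the kernel code sizes this with struct_size(). */
static int log_msg(struct msg_log *log, uint32_t opcode,
		   const void *buf, uint16_t len)
{
	struct logged_msg *m;

	if (!loggable(opcode))
		return 0;

	m = calloc(1, sizeof(*m) + len);
	if (!m)
		return -1;
	m->opcode = opcode;
	m->len = len;
	memcpy(m->payload, buf, len);

	if (log->tail)
		log->tail->next = m;
	else
		log->head = m;
	log->tail = m;
	log->count++;
	return 0;
}

/* Revert path: if the PF rejects the command, the newest entry is removed so
 * that a failed command is never replayed on the destination.
 */
static void unlog_last_msg(struct msg_log *log)
{
	struct logged_msg *m = log->head, *prev = NULL;

	if (!m)
		return;
	while (m->next) {
		prev = m;
		m = m->next;
	}
	if (prev)
		prev->next = NULL;
	else
		log->head = NULL;
	log->tail = prev;
	log->count--;
	free(m);
}

int main(void)
{
	struct msg_log log = { 0 };
	uint8_t payload[4] = { 0 };

	if (log_msg(&log, 6, payload, sizeof(payload)))
		return 1;
	/* Suppose the PF then failed to handle the command: revert the entry. */
	unlog_last_msg(&log);
	return 0;
}
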
Signed-off-by: Lingyu Liu Signed-off-by: Yahui Cao --- .../net/ethernet/intel/ice/ice_migration.c | 167 ++++++++++++++++++ .../intel/ice/ice_migration_private.h | 17 ++ drivers/net/ethernet/intel/ice/ice_vf_lib.h | 5 + drivers/net/ethernet/intel/ice/ice_virtchnl.c | 31 ++++ 4 files changed, 220 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c index 2b9b5a2ce367..18ec4ec7d147 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration.c +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -3,6 +3,17 @@ #include "ice.h" +struct ice_migration_virtchnl_msg_slot { + u32 opcode; + u16 msg_len; + char msg_buffer[]; +}; + +struct ice_migration_virtchnl_msg_listnode { + struct list_head node; + struct ice_migration_virtchnl_msg_slot msg_slot; +}; + /** * ice_migration_get_pf - Get ice PF structure pointer by pdev * @pdev: pointer to ice vfio pci VF pdev structure @@ -22,6 +33,9 @@ EXPORT_SYMBOL(ice_migration_get_pf); void ice_migration_init_vf(struct ice_vf *vf) { vf->migration_enabled = true; + INIT_LIST_HEAD(&vf->virtchnl_msg_list); + vf->virtchnl_msg_num = 0; + vf->virtchnl_msg_size = 0; } /** @@ -30,10 +44,24 @@ void ice_migration_init_vf(struct ice_vf *vf) */ void ice_migration_uninit_vf(struct ice_vf *vf) { + struct ice_migration_virtchnl_msg_listnode *msg_listnode; + struct ice_migration_virtchnl_msg_listnode *dtmp; + if (!vf->migration_enabled) return; vf->migration_enabled = false; + + if (list_empty(&vf->virtchnl_msg_list)) + return; + list_for_each_entry_safe(msg_listnode, dtmp, + &vf->virtchnl_msg_list, + node) { + list_del(&msg_listnode->node); + kfree(msg_listnode); + } + vf->virtchnl_msg_num = 0; + vf->virtchnl_msg_size = 0; } /** @@ -80,3 +108,142 @@ void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id) ice_put_vf(vf); } EXPORT_SYMBOL(ice_migration_uninit_dev); + +/** + * ice_migration_is_loggable_msg - is this message loggable or not + * @v_opcode: virtchnl message operation code + * + * Return true if this message logging is supported, otherwise return false + */ +static inline bool ice_migration_is_loggable_msg(u32 v_opcode) +{ + switch (v_opcode) { + case VIRTCHNL_OP_VERSION: + case VIRTCHNL_OP_GET_VF_RESOURCES: + case VIRTCHNL_OP_CONFIG_VSI_QUEUES: + case VIRTCHNL_OP_CONFIG_IRQ_MAP: + case VIRTCHNL_OP_ADD_ETH_ADDR: + case VIRTCHNL_OP_DEL_ETH_ADDR: + case VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE: + case VIRTCHNL_OP_ENABLE_QUEUES: + case VIRTCHNL_OP_DISABLE_QUEUES: + case VIRTCHNL_OP_ADD_VLAN: + case VIRTCHNL_OP_DEL_VLAN: + case VIRTCHNL_OP_ENABLE_VLAN_STRIPPING: + case VIRTCHNL_OP_DISABLE_VLAN_STRIPPING: + case VIRTCHNL_OP_CONFIG_RSS_KEY: + case VIRTCHNL_OP_CONFIG_RSS_LUT: + case VIRTCHNL_OP_GET_SUPPORTED_RXDIDS: + return true; + default: + return false; + } +} + +/** + * ice_migration_log_vf_msg - Log request message from VF + * @vf: pointer to the VF structure + * @event: pointer to the AQ event + * + * Log VF message for later device state loading during live migration + * + * Return 0 for success, negative for error + */ +int ice_migration_log_vf_msg(struct ice_vf *vf, + struct ice_rq_event_info *event) +{ + struct ice_migration_virtchnl_msg_listnode *msg_listnode; + u32 v_opcode = le32_to_cpu(event->desc.cookie_high); + struct device *dev = ice_pf_to_dev(vf->pf); + u16 msglen = event->msg_len; + u8 *msg = event->msg_buf; + + if (!ice_migration_is_loggable_msg(v_opcode)) + return 0; + + if (vf->virtchnl_msg_num >= VIRTCHNL_MSG_MAX) { + dev_warn(dev, "VF %d has maximum number virtual channel 
commands\n", + vf->vf_id); + return -ENOMEM; + } + + msg_listnode = (struct ice_migration_virtchnl_msg_listnode *) + kzalloc(struct_size(msg_listnode, + msg_slot.msg_buffer, + msglen), + GFP_KERNEL); + if (!msg_listnode) { + dev_err(dev, "VF %d failed to allocate memory for msg listnode\n", + vf->vf_id); + return -ENOMEM; + } + dev_dbg(dev, "VF %d save virtual channel command, op code: %d, len: %d\n", + vf->vf_id, v_opcode, msglen); + msg_listnode->msg_slot.opcode = v_opcode; + msg_listnode->msg_slot.msg_len = msglen; + memcpy(msg_listnode->msg_slot.msg_buffer, msg, msglen); + list_add_tail(&msg_listnode->node, &vf->virtchnl_msg_list); + vf->virtchnl_msg_num++; + vf->virtchnl_msg_size += struct_size(&msg_listnode->msg_slot, + msg_buffer, + msglen); + return 0; +} + +/** + * ice_migration_unlog_vf_msg - revert logged message + * @vf: pointer to the VF structure + * @v_opcode: virtchnl message operation code + * + * Remove the last virtual channel message logged before. + */ +void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode) +{ + struct ice_migration_virtchnl_msg_listnode *msg_listnode; + + if (!ice_migration_is_loggable_msg(v_opcode)) + return; + + if (WARN_ON_ONCE(list_empty(&vf->virtchnl_msg_list))) + return; + + msg_listnode = + list_last_entry(&vf->virtchnl_msg_list, + struct ice_migration_virtchnl_msg_listnode, + node); + if (WARN_ON_ONCE(msg_listnode->msg_slot.opcode != v_opcode)) + return; + + list_del(&msg_listnode->node); + kfree(msg_listnode); + vf->virtchnl_msg_num--; + vf->virtchnl_msg_size -= struct_size(&msg_listnode->msg_slot, + msg_buffer, + msg_listnode->msg_slot.msg_len); +} + +#define VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE \ + (VIRTCHNL_VF_OFFLOAD_L2 | \ + VIRTCHNL_VF_OFFLOAD_RSS_PF | \ + VIRTCHNL_VF_OFFLOAD_RSS_AQ | \ + VIRTCHNL_VF_OFFLOAD_RSS_REG | \ + VIRTCHNL_VF_OFFLOAD_RSS_PCTYPE_V2 | \ + VIRTCHNL_VF_OFFLOAD_ENCAP | \ + VIRTCHNL_VF_OFFLOAD_ENCAP_CSUM | \ + VIRTCHNL_VF_OFFLOAD_RX_POLLING | \ + VIRTCHNL_VF_OFFLOAD_WB_ON_ITR | \ + VIRTCHNL_VF_CAP_ADV_LINK_SPEED | \ + VIRTCHNL_VF_OFFLOAD_VLAN | \ + VIRTCHNL_VF_OFFLOAD_RX_FLEX_DESC | \ + VIRTCHNL_VF_OFFLOAD_USO) + +/** + * ice_migration_supported_caps - get migration supported VF capabilities + * + * When migration is activated, some VF capabilities are not supported. + * Hence unmask those capability flags for VF resources. 
+ */ +u32 ice_migration_supported_caps(void) +{ + return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE; +} diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h index 2cc2f515fc5e..676eb2d6c12e 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration_private.h +++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h @@ -13,9 +13,26 @@ #if IS_ENABLED(CONFIG_ICE_VFIO_PCI) void ice_migration_init_vf(struct ice_vf *vf); void ice_migration_uninit_vf(struct ice_vf *vf); +int ice_migration_log_vf_msg(struct ice_vf *vf, + struct ice_rq_event_info *event); +void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode); +u32 ice_migration_supported_caps(void); #else static inline void ice_migration_init_vf(struct ice_vf *vf) { } static inline void ice_migration_uninit_vf(struct ice_vf *vf) { } +static inline int ice_migration_log_vf_msg(struct ice_vf *vf, + struct ice_rq_event_info *event) +{ + return 0; +} + +static inline void +ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode) { } +static inline u32 +ice_migration_supported_caps(void) +{ + return 0xFFFFFFFF; +} #endif /* CONFIG_ICE_VFIO_PCI */ #endif /* _ICE_MIGRATION_PRIVATE_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h index 431fd28787e8..318b6dfc016d 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h @@ -77,6 +77,7 @@ struct ice_vfs { unsigned long last_printed_mdd_jiffies; /* MDD message rate limit */ }; +#define VIRTCHNL_MSG_MAX 1000 /* VF information structure */ struct ice_vf { struct hlist_node entry; @@ -141,6 +142,10 @@ struct ice_vf { u16 num_msix; /* num of MSI-X configured on this VF */ u8 migration_enabled:1; + struct list_head virtchnl_msg_list; + u64 virtchnl_msg_num; + u64 virtchnl_msg_size; + u32 virtchnl_retval; }; /* Flags for controlling behavior of ice_reset_vf */ diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 661ca86c3032..730eeaea8c89 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -348,6 +348,12 @@ ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode, return -EIO; } + /* v_retval will not be returned in this function, store it in the + * per VF field to be used by migration logging logic later. + */ + if (vf->migration_enabled) + vf->virtchnl_retval = v_retval; + return ice_vc_send_response_to_vf(vf, v_opcode, v_retval, msg, msglen); } @@ -480,6 +486,8 @@ static int ice_vc_get_vf_res_msg(struct ice_vf *vf, u8 *msg) VIRTCHNL_VF_OFFLOAD_RSS_REG | VIRTCHNL_VF_OFFLOAD_VLAN; + if (vf->migration_enabled) + vf->driver_caps &= ice_migration_supported_caps(); vfres->vf_cap_flags = VIRTCHNL_VF_OFFLOAD_L2; vsi = ice_get_vf_vsi(vf); if (!vsi) { @@ -4037,6 +4045,17 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, goto finish; } + if (vf->migration_enabled) { + if (ice_migration_log_vf_msg(vf, event)) { + u32 status_code = VIRTCHNL_STATUS_ERR_NO_MEMORY; + + err = ice_vc_respond_to_vf(vf, v_opcode, + status_code, + NULL, 0); + goto finish; + } + } + switch (v_opcode) { case VIRTCHNL_OP_VERSION: err = ops->get_ver_msg(vf, msg); @@ -4156,6 +4175,18 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, vf_id, v_opcode, err); } + /* All of the loggable virtual channel messages are logged by + * ice_migration_unlog_vf_msg() before they are processed. 
+ * + * Two kinds of error may happen, virtual channel message's result + * is failure after processed by PF or message is not sent to VF + * successfully. If error happened, fallback here by reverting logged + * messages. + */ + if (vf->migration_enabled && + (vf->virtchnl_retval != VIRTCHNL_STATUS_SUCCESS || err)) + ice_migration_unlog_vf_msg(vf, v_opcode); + finish: mutex_unlock(&vf->cfg_lock); ice_put_vf(vf); From patchwork Tue Nov 21 02:51:05 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462484 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MFLFGm4v" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 4DED1D8; Mon, 20 Nov 2023 18:50:27 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535027; x=1732071027; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=WtUoAVlpdEKgZFTz0tk9mfNmhk98t+A/5sj4rLyqYzM=; b=MFLFGm4v7vPnX+ku0mG8CH68lLIB7Ji6FGjT0MKrj0sL4qYWWN1oi2Ut dvTflcO2yJODKenaSs+8I03w0VKwxZJmsqSWwojLxTAnui8TQp6L8AVMU gpFRY7uHAcHYZXSUiWcFLgLWgiJM78kQVSFze/FPiOkAemrpz60sSNEMX 6hdC5hjDPQ8xC1rSBa2lGOcAkzK8vCgeoj3SXBcj1a9VTBGy+Mwntjdzi VgYTs7W5D4hMBj8cqeh+2i1bYBAc/gSUdmLb820nZtjgOan+c9dplwU2r eLqRr1Kr3vmCA7ADhNqLbyHo6O3dIOcTfSCuij2hsUZbfSw6smEgT7mcR Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458246021" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458246021" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:12 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488431" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488431" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:07 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 06/12] ice: Add device state save/load function for migration Date: Tue, 21 Nov 2023 02:51:05 +0000 Message-Id: <20231121025111.257597-7-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Lingyu Liu Add device state save/load function to adapter vfio migration stack when device is in stop-copy/resume stage. Device state saving handler is called by vfio driver in device stop copy stage. It snapshots the device state, translates device state into device specific data and fills the data into migration buffer. Device state loading handler is called by vfio driver in device resume stage. 
It gets device specific data from the migration buffer, translates the data into the device state and recover the device with the state. Currently only the virtual channel messages are handled. Signed-off-by: Lingyu Liu Signed-off-by: Yahui Cao --- .../net/ethernet/intel/ice/ice_migration.c | 226 ++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_virtchnl.c | 27 ++- drivers/net/ethernet/intel/ice/ice_virtchnl.h | 7 +- include/linux/net/intel/ice_migration.h | 15 ++ 4 files changed, 266 insertions(+), 9 deletions(-) diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c index 18ec4ec7d147..981aa92bbe86 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration.c +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -3,6 +3,9 @@ #include "ice.h" +#define ICE_MIG_DEVSTAT_MAGIC 0xE8000001 +#define ICE_MIG_DEVSTAT_VERSION 0x1 + struct ice_migration_virtchnl_msg_slot { u32 opcode; u16 msg_len; @@ -14,6 +17,17 @@ struct ice_migration_virtchnl_msg_listnode { struct ice_migration_virtchnl_msg_slot msg_slot; }; +struct ice_migration_dev_state { + u32 magic; + u32 version; + u64 total_size; + u32 vf_caps; + u16 num_txq; + u16 num_rxq; + + u8 virtchnl_msgs[]; +} __aligned(8); + /** * ice_migration_get_pf - Get ice PF structure pointer by pdev * @pdev: pointer to ice vfio pci VF pdev structure @@ -247,3 +261,215 @@ u32 ice_migration_supported_caps(void) { return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE; } + +/** + * ice_migration_save_devstate - save device state to migration buffer + * @pf: pointer to PF of migration device + * @vf_id: VF index of migration device + * @buf: pointer to VF msg in migration buffer + * @buf_sz: size of migration buffer + * + * Return 0 for success, negative for error + */ +int +ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz) +{ + struct ice_migration_virtchnl_msg_listnode *msg_listnode; + struct ice_migration_virtchnl_msg_slot *dummy_op; + struct ice_migration_dev_state *devstate; + struct device *dev = ice_pf_to_dev(pf); + struct ice_vsi *vsi; + struct ice_vf *vf; + u64 total_sz; + int ret = 0; + + vf = ice_get_vf_by_id(pf, vf_id); + if (!vf) { + dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id); + return -EINVAL; + } + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + ret = -EINVAL; + goto out_put_vf; + } + + /* Reserve space to store device state */ + total_sz = sizeof(struct ice_migration_dev_state) + + vf->virtchnl_msg_size + sizeof(*dummy_op); + if (total_sz > buf_sz) { + dev_err(dev, "Insufficient buffer to store device state for VF %d\n", + vf->vf_id); + ret = -ENOBUFS; + goto out_put_vf; + } + + devstate = (struct ice_migration_dev_state *)buf; + devstate->magic = ICE_MIG_DEVSTAT_MAGIC; + devstate->version = ICE_MIG_DEVSTAT_VERSION; + devstate->total_size = total_sz; + devstate->vf_caps = ice_migration_supported_caps(); + devstate->num_txq = vsi->num_txq; + devstate->num_rxq = vsi->num_rxq; + buf = devstate->virtchnl_msgs; + + list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) { + struct ice_migration_virtchnl_msg_slot *msg_slot; + u64 slot_size; + + msg_slot = &msg_listnode->msg_slot; + slot_size = struct_size(msg_slot, msg_buffer, + msg_slot->msg_len); + dev_dbg(dev, "VF %d copy virtchnl message to migration buffer op: %d, len: %d\n", + vf->vf_id, msg_slot->opcode, msg_slot->msg_len); + memcpy(buf, msg_slot, slot_size); + buf += slot_size; + } + + /* Use op code unknown to mark end of vc messages */ + dummy_op = 
(struct ice_migration_virtchnl_msg_slot *)buf; + dummy_op->opcode = VIRTCHNL_OP_UNKNOWN; + +out_put_vf: + ice_put_vf(vf); + return ret; +} +EXPORT_SYMBOL(ice_migration_save_devstate); + +/** + * ice_migration_check_match - check if configuration is matched or not + * @vf: pointer to VF + * @buf: pointer to device state buffer + * @buf_sz: size of buffer + * + * Return 0 for success, negative for error + */ +static int +ice_migration_check_match(struct ice_vf *vf, const u8 *buf, u64 buf_sz) +{ + u32 supported_caps = ice_migration_supported_caps(); + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_migration_dev_state *devstate; + struct ice_vsi *vsi; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + if (sizeof(struct ice_migration_dev_state) > buf_sz) { + dev_err(dev, "VF %d devstate header exceeds buffer size\n", + vf->vf_id); + return -EINVAL; + } + + devstate = (struct ice_migration_dev_state *)buf; + if (devstate->magic != ICE_MIG_DEVSTAT_MAGIC) { + dev_err(dev, "VF %d devstate has invalid magic 0x%x\n", + vf->vf_id, devstate->magic); + return -EINVAL; + } + + if (devstate->version != ICE_MIG_DEVSTAT_VERSION) { + dev_err(dev, "VF %d devstate has invalid version 0x%x\n", + vf->vf_id, devstate->version); + return -EINVAL; + } + + if (devstate->num_txq != vsi->num_txq) { + dev_err(dev, "Failed to match VF %d tx queue number, request %d, support %d\n", + vf->vf_id, devstate->num_txq, vsi->num_txq); + return -EINVAL; + } + + if (devstate->num_rxq != vsi->num_rxq) { + dev_err(dev, "Failed to match VF %d rx queue number, request %d, support %d\n", + vf->vf_id, devstate->num_rxq, vsi->num_rxq); + return -EINVAL; + } + + if ((devstate->vf_caps & supported_caps) != devstate->vf_caps) { + dev_err(dev, "Failed to match VF %d caps, request 0x%x, support 0x%x\n", + vf->vf_id, devstate->vf_caps, supported_caps); + return -EINVAL; + } + + if (devstate->total_size > buf_sz) { + dev_err(dev, "VF %d devstate exceeds buffer size\n", + vf->vf_id); + return -EINVAL; + } + + return 0; +} + +/** + * ice_migration_load_devstate - load device state at destination + * @pf: pointer to PF of migration device + * @vf_id: VF index of migration device + * @buf: pointer to device state buf in migration buffer + * @buf_sz: size of migration buffer + * + * This function uses the device state saved in migration buffer + * to load device state at destination VM + * + * Return 0 for success, negative for error + */ +int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, + const u8 *buf, u64 buf_sz) +{ + struct ice_migration_virtchnl_msg_slot *msg_slot; + struct ice_migration_dev_state *devstate; + struct device *dev = ice_pf_to_dev(pf); + struct ice_vf *vf; + int ret = 0; + + if (!buf) + return -EINVAL; + + vf = ice_get_vf_by_id(pf, vf_id); + if (!vf) { + dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id); + return -EINVAL; + } + + ret = ice_migration_check_match(vf, buf, buf_sz); + if (ret) + goto out_put_vf; + + devstate = (struct ice_migration_dev_state *)buf; + msg_slot = (struct ice_migration_virtchnl_msg_slot *) + devstate->virtchnl_msgs; + set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states); + + while (msg_slot->opcode != VIRTCHNL_OP_UNKNOWN) { + struct ice_rq_event_info event; + u64 slot_sz; + + slot_sz = struct_size(msg_slot, msg_buffer, msg_slot->msg_len); + dev_dbg(dev, "VF %d replay virtchnl message op code: %d, msg len: %d\n", + vf->vf_id, msg_slot->opcode, msg_slot->msg_len); + event.desc.cookie_high = 
cpu_to_le32(msg_slot->opcode); + event.msg_len = msg_slot->msg_len; + event.desc.retval = cpu_to_le16(vf->vf_id); + event.msg_buf = (unsigned char *)msg_slot->msg_buffer; + ret = ice_vc_process_vf_msg(vf->pf, &event, NULL); + if (ret) { + dev_err(dev, "VF %d failed to replay virtchnl message op code: %d\n", + vf->vf_id, msg_slot->opcode); + goto out_clear_replay; + } + event.msg_buf = NULL; + msg_slot = (struct ice_migration_virtchnl_msg_slot *) + ((char *)msg_slot + slot_sz); + } +out_clear_replay: + clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states); +out_put_vf: + ice_put_vf(vf); + return ret; +} +EXPORT_SYMBOL(ice_migration_load_devstate); diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 730eeaea8c89..54f441daa87e 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -3982,11 +3982,24 @@ ice_is_malicious_vf(struct ice_vf *vf, struct ice_mbx_data *mbxdata) * @event: pointer to the AQ event * @mbxdata: information used to detect VF attempting mailbox overflow * - * called from the common asq/arq handler to - * process request from VF + * This function will be called from: + * 1. the common asq/arq handler to process request from VF + * + * The return value is ignored, as the command handler will send the status + * of the request as a response to the VF. This flow sets the mbxdata to + * a non-NULL value and must call ice_is_malicious_vf to determine if this + * VF might be attempting to overflow the PF message queue. + * + * 2. replay virtual channel commamds during live migration + * + * The return value is used to indicate failure to replay vc commands and + * that the migration failed. This flow sets mbxdata to NULL and skips the + * ice_is_malicious_vf checks which are unnecessary during replay. + * + * Return 0 if success, negative for failure. */ -void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, - struct ice_mbx_data *mbxdata) +int ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, + struct ice_mbx_data *mbxdata) { u32 v_opcode = le32_to_cpu(event->desc.cookie_high); s16 vf_id = le16_to_cpu(event->desc.retval); @@ -4003,13 +4016,14 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, if (!vf) { dev_err(dev, "Unable to locate VF for message from VF ID %d, opcode %d, len %d\n", vf_id, v_opcode, msglen); - return; + return -EINVAL; } mutex_lock(&vf->cfg_lock); /* Check if the VF is trying to overflow the mailbox */ - if (ice_is_malicious_vf(vf, mbxdata)) + if (!test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states) && + ice_is_malicious_vf(vf, mbxdata)) goto finish; /* Check if VF is disabled. 
*/ @@ -4190,4 +4204,5 @@ void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, finish: mutex_unlock(&vf->cfg_lock); ice_put_vf(vf); + return err; } diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.h b/drivers/net/ethernet/intel/ice/ice_virtchnl.h index a2b6094e2f2f..4b151a228c52 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.h +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.h @@ -63,8 +63,8 @@ int ice_vc_respond_to_vf(struct ice_vf *vf, u32 v_opcode, enum virtchnl_status_code v_retval, u8 *msg, u16 msglen); bool ice_vc_isvalid_vsi_id(struct ice_vf *vf, u16 vsi_id); -void ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, - struct ice_mbx_data *mbxdata); +int ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, + struct ice_mbx_data *mbxdata); #else /* CONFIG_PCI_IOV */ static inline void ice_virtchnl_set_dflt_ops(struct ice_vf *vf) { } static inline void ice_virtchnl_set_repr_ops(struct ice_vf *vf) { } @@ -84,10 +84,11 @@ static inline bool ice_vc_isvalid_vsi_id(struct ice_vf *vf, u16 vsi_id) return false; } -static inline void +static inline int ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, struct ice_mbx_data *mbxdata) { + return -EOPNOTSUPP; } #endif /* !CONFIG_PCI_IOV */ diff --git a/include/linux/net/intel/ice_migration.h b/include/linux/net/intel/ice_migration.h index 7ea11a8714d6..a142b78283a8 100644 --- a/include/linux/net/intel/ice_migration.h +++ b/include/linux/net/intel/ice_migration.h @@ -10,6 +10,10 @@ struct ice_pf; struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev); int ice_migration_init_dev(struct ice_pf *pf, int vf_id); void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id); +int ice_migration_save_devstate(struct ice_pf *pf, int vf_id, + u8 *buf, u64 buf_sz); +int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, + const u8 *buf, u64 buf_sz); #else static inline struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev) { @@ -22,6 +26,17 @@ static inline int ice_migration_init_dev(struct ice_pf *pf, int vf_id) } static inline void ice_migration_uninit_dev(struct ice_pf *pf, int vf_id) { } +static inline int +ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz) +{ + return 0; +} + +static inline int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, + const u8 *buf, u64 buf_sz) +{ + return 0; +} #endif /* CONFIG_ICE_VFIO_PCI */ #endif /* _ICE_MIGRATION_H_ */ From patchwork Tue Nov 21 02:51:06 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462485 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="MGdB1+BT" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 94D55CB; Mon, 20 Nov 2023 18:50:30 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535030; x=1732071030; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=5FesQkUHYNJNnWuZOuOfrR5+GtijYybwUR/0BVOAmsQ=; b=MGdB1+BTiDRUXBy45GRecq5NWmqMqVHNqBGRXNlxo6N9vxGUPpxH8ZhX UjPf0QKfPpv8wo2LNbevtjZasrX72/Tc3zca7G+kNx5M65HqQPBFxnbki e/2dvZLE6KW/4Y3uBswz2zC1SA1n2yOtfy3JoYB6ICIJaB2mF4+lRzbyZ 3qiSU0PyLL88+6/su54rw56kNMj+gM0YdH9PN82LhKknZpbXnHH4cH0Ay 
MMEr/3jJ/fwEOxXumrwbCUDDPhSDhsGsnN9IYLf0tMvxjEkH8mBsEkIxI FKOGSINd7+uT65RQYPmsOlW1EB44sjRXp7D8n/jodsObyXZgg5qldOghN w==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458246065" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458246065" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:17 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488451" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488451" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:12 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 07/12] ice: Fix VSI id in virtual channel message for migration Date: Tue, 21 Nov 2023 02:51:06 +0000 Message-Id: <20231121025111.257597-8-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Lingyu Liu VSI id is a resource id for each VF and it is an absolute hardware id per PCI card. It is exposed to VF driver through virtual channel messages at the VF-PF negotiation stage. It is constant during the whole device lifecycle unless driver re-init. Almost all of the virtual channel messages will contain the VSI id. Once PF receives message, it will check if VSI id in the message is equal to the VF's VSI id for security and other reason. If a VM backed by VF VSI A is migrated to a VM backed by VF with VSI B, while in messages replaying stage, all the messages will be rejected by PF due to the invalid VSI id. Even after migration, VM runtime will get failure as well. Fix this gap by modifying the VSI id in the virtual channel message at migration device resuming stage and VM runtime stage. In most cases the VSI id will vary between migration source and destination side. And this is a slow path anyway. 
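
The runtime part of the fix-up reduces to one comparison, sketched below as self-contained C. The field names follow the patch, but the struct and the example ids are invented for illustration; messages that carry the VSI id inside nested queue or IRQ structures apply the same rule to each element.

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Minimal view of the state the fix-up needs. */
struct vf_state {
	uint16_t vm_vsi_num;    /* VSI id the guest driver was told on the source */
	uint16_t lan_vsi_num;   /* VSI id actually backing the VF on this host */
	bool replaying;         /* true while logged messages are being replayed */
};

/* At runtime the id is only replaced when it matches the id the guest was
 * given (anything else is left for normal validation to reject); during
 * replay every logged message is trusted and rewritten unconditionally.
 */
static void fix_msg_vsi(const struct vf_state *vf, uint16_t *msg_vsi_id)
{
	if (*msg_vsi_id == vf->vm_vsi_num || vf->replaying)
		*msg_vsi_id = vf->lan_vsi_num;
}

int main(void)
{
	struct vf_state vf = { .vm_vsi_num = 5, .lan_vsi_num = 12 };
	uint16_t vsi_in_msg = 5;        /* stale id carried over from the source */

	fix_msg_vsi(&vf, &vsi_in_msg);
	printf("message now targets VSI %u\n", vsi_in_msg);    /* prints 12 */
	return 0;
}
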
Signed-off-by: Lingyu Liu Signed-off-by: Yahui Cao --- .../net/ethernet/intel/ice/ice_migration.c | 95 +++++++++++++++++++ .../intel/ice/ice_migration_private.h | 4 + drivers/net/ethernet/intel/ice/ice_vf_lib.h | 1 + drivers/net/ethernet/intel/ice/ice_virtchnl.c | 1 + 4 files changed, 101 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c index 981aa92bbe86..780d2183011a 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration.c +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -25,6 +25,7 @@ struct ice_migration_dev_state { u16 num_txq; u16 num_rxq; + u16 vsi_id; u8 virtchnl_msgs[]; } __aligned(8); @@ -50,6 +51,7 @@ void ice_migration_init_vf(struct ice_vf *vf) INIT_LIST_HEAD(&vf->virtchnl_msg_list); vf->virtchnl_msg_num = 0; vf->virtchnl_msg_size = 0; + vf->vm_vsi_num = vf->lan_vsi_num; } /** @@ -314,6 +316,7 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz) devstate->num_txq = vsi->num_txq; devstate->num_rxq = vsi->num_rxq; buf = devstate->virtchnl_msgs; + devstate->vsi_id = vf->vm_vsi_num; list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) { struct ice_migration_virtchnl_msg_slot *msg_slot; @@ -441,6 +444,8 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, goto out_put_vf; devstate = (struct ice_migration_dev_state *)buf; + vf->vm_vsi_num = devstate->vsi_id; + dev_dbg(dev, "VF %d vm vsi num is:%d\n", vf->vf_id, vf->vm_vsi_num); msg_slot = (struct ice_migration_virtchnl_msg_slot *) devstate->virtchnl_msgs; set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states); @@ -473,3 +478,93 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, return ret; } EXPORT_SYMBOL(ice_migration_load_devstate); + +/** + * ice_migration_fix_msg_vsi - change virtual channel msg VSI id + * + * @vf: pointer to the VF structure + * @v_opcode: virtchnl message operation code + * @msg: pointer to the virtual channel message + * + * After migration, the VSI id saved by VF driver may be different from VF + * VSI id. Some virtual channel commands will fail due to unmatch VSI id. + * Change virtual channel message payload VSI id to real VSI id. + */ +void ice_migration_fix_msg_vsi(struct ice_vf *vf, u32 v_opcode, u8 *msg) +{ + if (!vf->migration_enabled) + return; + + switch (v_opcode) { + case VIRTCHNL_OP_ADD_ETH_ADDR: + case VIRTCHNL_OP_DEL_ETH_ADDR: + case VIRTCHNL_OP_ENABLE_QUEUES: + case VIRTCHNL_OP_DISABLE_QUEUES: + case VIRTCHNL_OP_CONFIG_RSS_KEY: + case VIRTCHNL_OP_CONFIG_RSS_LUT: + case VIRTCHNL_OP_GET_STATS: + case VIRTCHNL_OP_CONFIG_PROMISCUOUS_MODE: + case VIRTCHNL_OP_ADD_FDIR_FILTER: + case VIRTCHNL_OP_DEL_FDIR_FILTER: + case VIRTCHNL_OP_ADD_VLAN: + case VIRTCHNL_OP_DEL_VLAN: { + /* Read the beginning two bytes of message for VSI id */ + u16 *vsi_id = (u16 *)msg; + + /* For VM runtime stage, vsi_id in the virtual channel message + * should be equal to the PF logged vsi_id and vsi_id is + * replaced by VF's VSI id to guarantee that messages are + * processed successfully. If vsi_id is not equal to the PF + * logged vsi_id, then this message must be sent by malicious + * VF and no replacement is needed. Just let virtual channel + * handler to fail this message. + * + * For virtual channel replaying stage, all of the PF logged + * virtual channel messages are trusted and vsi_id is replaced + * anyway to guarantee the messages are processed successfully. 
+ */ + if (*vsi_id == vf->vm_vsi_num || + test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) + *vsi_id = vf->lan_vsi_num; + break; + } + case VIRTCHNL_OP_CONFIG_IRQ_MAP: { + struct virtchnl_irq_map_info *irqmap_info; + u16 num_q_vectors_mapped; + int i; + + irqmap_info = (struct virtchnl_irq_map_info *)msg; + num_q_vectors_mapped = irqmap_info->num_vectors; + for (i = 0; i < num_q_vectors_mapped; i++) { + struct virtchnl_vector_map *map; + + map = &irqmap_info->vecmap[i]; + if (map->vsi_id == vf->vm_vsi_num || + test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) + map->vsi_id = vf->lan_vsi_num; + } + break; + } + case VIRTCHNL_OP_CONFIG_VSI_QUEUES: { + struct virtchnl_vsi_queue_config_info *qci; + + qci = (struct virtchnl_vsi_queue_config_info *)msg; + if (qci->vsi_id == vf->vm_vsi_num || + test_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states)) { + int i; + + qci->vsi_id = vf->lan_vsi_num; + for (i = 0; i < qci->num_queue_pairs; i++) { + struct virtchnl_queue_pair_info *qpi; + + qpi = &qci->qpair[i]; + qpi->txq.vsi_id = vf->lan_vsi_num; + qpi->rxq.vsi_id = vf->lan_vsi_num; + } + } + break; + } + default: + break; + } +} diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h index 676eb2d6c12e..f72a488d9002 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration_private.h +++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h @@ -17,6 +17,7 @@ int ice_migration_log_vf_msg(struct ice_vf *vf, struct ice_rq_event_info *event); void ice_migration_unlog_vf_msg(struct ice_vf *vf, u32 v_opcode); u32 ice_migration_supported_caps(void); +void ice_migration_fix_msg_vsi(struct ice_vf *vf, u32 v_opcode, u8 *msg); #else static inline void ice_migration_init_vf(struct ice_vf *vf) { } static inline void ice_migration_uninit_vf(struct ice_vf *vf) { } @@ -33,6 +34,9 @@ ice_migration_supported_caps(void) { return 0xFFFFFFFF; } + +static inline void +ice_migration_fix_msg_vsi(struct ice_vf *vf, u32 v_opcode, u8 *msg) { } #endif /* CONFIG_ICE_VFIO_PCI */ #endif /* _ICE_MIGRATION_PRIVATE_H_ */ diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h index 318b6dfc016d..49d99694e91f 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h @@ -146,6 +146,7 @@ struct ice_vf { u64 virtchnl_msg_num; u64 virtchnl_msg_size; u32 virtchnl_retval; + u16 vm_vsi_num; }; /* Flags for controlling behavior of ice_reset_vf */ diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 54f441daa87e..8dbe558790af 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -4060,6 +4060,7 @@ int ice_vc_process_vf_msg(struct ice_pf *pf, struct ice_rq_event_info *event, } if (vf->migration_enabled) { + ice_migration_fix_msg_vsi(vf, v_opcode, msg); if (ice_migration_log_vf_msg(vf, event)) { u32 status_code = VIRTCHNL_STATUS_ERR_NO_MEMORY; From patchwork Tue Nov 21 02:51:07 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462486 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="cF6Vef0y" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id B8CA5DC; Mon, 20 Nov 2023 18:50:32 
-0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535032; x=1732071032; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=chDhMbwAFTAOzcJ9fqd2u6cVrr/ul9wT1vFdAEszvfg=; b=cF6Vef0yMIj3B28st8Tdxnvm1TaK2FooeOYIQ1opPRKxCNy+9PrN4JNh LC/nL4IY/8ILcL6EBm/wdHYExVY/ep0/UgHrLbbLj3JZ+y6FD0YZWsM2X nlDsS8qmasxtnHkhtFAw4o9Cu+xAy/TlzMC4ezNxQylJBYir9kNLvSpeP h/UO5GGEbcpAWgaFvyzgbJl9kko0abMsSsml6XWSPX67KHK0g0Wz1Vw/J IUntWW+vSzkfapPl8Re0hapwh5kHj08OEcHPvp9b082dk+LKij7xPX1TV +O9r8bdqOlLG+QxF47Whis01sVIxgrk4Ki4lbec8HAS6565PkSmJ2zZjF A==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458246089" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458246089" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:21 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488481" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488481" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:16 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 08/12] ice: Save and load RX Queue head Date: Tue, 21 Nov 2023 02:51:07 +0000 Message-Id: <20231121025111.257597-9-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Lingyu Liu RX Queue head is a fundamental dma ring context which determines the next RX descriptor to be fetched. However, RX Queue head is not visible to VF while it is only visible in PF. As a result, PF needs to save and load RX Queue Head explicitly. Since network packets may come in at any time once RX Queue is enabled, RX Queue head needs to be loaded before Queue is enabled. RX Queue head loading handler is implemented by reading and then overwriting queue context with specific HEAD value. 
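
Condensed, the load path is a read-modify-write of the RX queue context using the context helpers exported earlier in this series. The sketch below trims the per-queue loop, VF/VSI lookup and error logging found in the patch, and assumes rxq_index is the queue's absolute register index.

int restore_rx_head(struct ice_hw *hw, u16 rxq_index, u16 saved_head)
{
	struct ice_rlan_ctx rlan_ctx = {};
	int err;

	/* Fetch the whole RX queue context from the HW registers ... */
	err = ice_read_rxq_ctx(hw, &rlan_ctx, rxq_index);
	if (err)
		return err;

	/* ... patch only the head field with the value saved on the source ... */
	rlan_ctx.head = saved_head;

	/* ... and write the context back before the queue is enabled. */
	return ice_write_rxq_ctx(hw, &rlan_ctx, rxq_index);
}
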
Signed-off-by: Lingyu Liu Signed-off-by: Yahui Cao --- .../net/ethernet/intel/ice/ice_migration.c | 125 ++++++++++++++++++ 1 file changed, 125 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c index 780d2183011a..473be6a83cf3 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration.c +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -2,9 +2,11 @@ /* Copyright (C) 2018-2023 Intel Corporation */ #include "ice.h" +#include "ice_base.h" #define ICE_MIG_DEVSTAT_MAGIC 0xE8000001 #define ICE_MIG_DEVSTAT_VERSION 0x1 +#define ICE_MIG_VF_QRX_TAIL_MAX 256 struct ice_migration_virtchnl_msg_slot { u32 opcode; @@ -26,6 +28,8 @@ struct ice_migration_dev_state { u16 num_rxq; u16 vsi_id; + /* next RX desc index to be processed by the device */ + u16 rx_head[ICE_MIG_VF_QRX_TAIL_MAX]; u8 virtchnl_msgs[]; } __aligned(8); @@ -264,6 +268,54 @@ u32 ice_migration_supported_caps(void) return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE; } +/** + * ice_migration_save_rx_head - save rx head into device state buffer + * @vf: pointer to VF structure + * @devstate: pointer to migration buffer + * + * Return 0 for success, negative for error + */ +static int +ice_migration_save_rx_head(struct ice_vf *vf, + struct ice_migration_dev_state *devstate) +{ + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_vsi *vsi; + int i; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + ice_for_each_rxq(vsi, i) { + struct ice_rx_ring *rx_ring = vsi->rx_rings[i]; + struct ice_rlan_ctx rlan_ctx = {}; + struct ice_hw *hw = &vf->pf->hw; + u16 rxq_index; + int status; + + if (WARN_ON_ONCE(!rx_ring)) + return -EINVAL; + + devstate->rx_head[i] = 0; + if (!test_bit(i, vf->rxq_ena)) + continue; + + rxq_index = rx_ring->reg_idx; + status = ice_read_rxq_ctx(hw, &rlan_ctx, rxq_index); + if (status) { + dev_err(dev, "Failed to read RXQ[%d] context, err=%d\n", + rx_ring->q_index, status); + return -EIO; + } + devstate->rx_head[i] = rlan_ctx.head; + } + + return 0; +} + /** * ice_migration_save_devstate - save device state to migration buffer * @pf: pointer to PF of migration device @@ -318,6 +370,12 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz) buf = devstate->virtchnl_msgs; devstate->vsi_id = vf->vm_vsi_num; + ret = ice_migration_save_rx_head(vf, devstate); + if (ret) { + dev_err(dev, "VF %d failed to save rxq head\n", vf->vf_id); + goto out_put_vf; + } + list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) { struct ice_migration_virtchnl_msg_slot *msg_slot; u64 slot_size; @@ -409,6 +467,57 @@ ice_migration_check_match(struct ice_vf *vf, const u8 *buf, u64 buf_sz) return 0; } +/** + * ice_migration_load_rx_head - load rx head from device state buffer + * @vf: pointer to VF structure + * @devstate: pointer to migration device state + * + * Return 0 for success, negative for error + */ +static int +ice_migration_load_rx_head(struct ice_vf *vf, + struct ice_migration_dev_state *devstate) +{ + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_vsi *vsi; + int i; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + ice_for_each_rxq(vsi, i) { + struct ice_rx_ring *rx_ring = vsi->rx_rings[i]; + struct ice_rlan_ctx rlan_ctx = {}; + struct ice_hw *hw = &vf->pf->hw; + u16 rxq_index; + int status; + + if (WARN_ON_ONCE(!rx_ring)) + return -EINVAL; + + rxq_index = rx_ring->reg_idx; + status = 
ice_read_rxq_ctx(hw, &rlan_ctx, rxq_index); + if (status) { + dev_err(dev, "Failed to read RXQ[%d] context, err=%d\n", + rx_ring->q_index, status); + return -EIO; + } + + rlan_ctx.head = devstate->rx_head[i]; + status = ice_write_rxq_ctx(hw, &rlan_ctx, rxq_index); + if (status) { + dev_err(dev, "Failed to set LAN RXQ[%d] context, err=%d\n", + rx_ring->q_index, status); + return -EIO; + } + } + + return 0; +} + /** * ice_migration_load_devstate - load device state at destination * @pf: pointer to PF of migration device @@ -467,6 +576,22 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, vf->vf_id, msg_slot->opcode); goto out_clear_replay; } + + /* Once RX Queue is enabled, network traffic may come in at any + * time. As a result, RX Queue head needs to be loaded before + * RX Queue is enabled. + * For simplicity and integration, overwrite RX head just after + * RX ring context is configured. + */ + if (msg_slot->opcode == VIRTCHNL_OP_CONFIG_VSI_QUEUES) { + ret = ice_migration_load_rx_head(vf, devstate); + if (ret) { + dev_err(dev, "VF %d failed to load rx head\n", + vf->vf_id); + goto out_clear_replay; + } + } + event.msg_buf = NULL; msg_slot = (struct ice_migration_virtchnl_msg_slot *) ((char *)msg_slot + slot_sz); From patchwork Tue Nov 21 02:51:08 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462488 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="ce/n7FJN" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id 1E7CFC4; Mon, 20 Nov 2023 18:50:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535034; x=1732071034; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=Pq73yuf6JeL2kdbR+BSOiKELUA3aE1vbhSDePzfws90=; b=ce/n7FJN45FYMPVMgFIp1i+ZH2YWJa1aUNcTULp7wIY468kzICRFUHep q9Uzwp9FB9KBJ+EyfndJxeMuGQUUNMWIGD8BqNotaa9Eq9VQpU8NWMZSw ilApDrCTQeRT0ERLEeEdc0HZ2cEIefn2YyGTj3Lx/UWoV1ufg6F0e+n4h 9mNGPt4Z1Az3nhv7YYq40aFbLJ5hswi+mB1d20cA/uaJk+38FlqOkXH+K 91c/dqNl8AJgft+j9LcRtTLo+SXsTQsD9+cLjh8aG5BuBuDQ1CsmyA0Ny ScDeHjNtBI0qX+igi1E8WzzRUNSv/VCcHSXZECeNl2tyfYe4s0U3V7n/W Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458246112" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458246112" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:25 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488539" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488539" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:20 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 09/12] ice: Save and load TX Queue head Date: Tue, 21 Nov 2023 02:51:08 +0000 Message-Id: <20231121025111.257597-10-yahui.cao@intel.com> 
X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Lingyu Liu TX Queue head is a fundamental DMA ring context which determines the next TX descriptor to be fetched. However, TX Queue head is not visible to VF while it is only visible in PF. As a result, PF needs to save and load TX Queue head explicitly. Unfortunately, due to HW limitation, TX Queue head can't be recovered through writing mmio registers. Since sending one packet will make TX head advanced by 1 index, TX Queue head can be advanced by N index through sending N packets. Filling in DMA ring with NOP descriptors and bumping doorbell can be used to change TX Queue head indirectly. And this method has no side effects except changing TX head value. To advance TX Head queue, HW needs to touch memory by DMA. But directly touching VM's memory to advance TX Queue head does not follow vfio migration protocol design, because vIOMMU state is not defined by the protocol. Even this may introduce functional and security issue under hostile guest circumstances. In order not to touch any VF memory or IO page table, TX Queue head loading is using PF managed memory and PF isolation domain. This will also introduce another dependency that while switching TX Queue between PF space and VF space, TX Queue head value is not changed. HW provides an indirect context access so that head value can be kept while switching context. In virtual channel model, VF driver only send TX queue ring base and length info to PF, while rest of the TX queue context are managed by PF. TX queue length must be verified by PF during virtual channel message processing. When PF uses dummy descriptors to advance TX head, it will configure the TX ring base as the new address managed by PF itself. As a result, all of the TX queue context is taken control of by PF and this method won't generate any attacking vulnerability The overall steps for TX head loading handler are: 1. Backup TX context, switch TX queue context as PF space and PF DMA ring base with interrupt disabled 2. Fill the DMA ring with dummy descriptors and bump doorbell to advance TX head. Once kicking doorbell, HW will issue DMA and send PCI upstream memory transaction tagged by PF BDF. Since ring base is PF's managed DMA buffer, DMA can work successfully and TX Head is advanced as expected. 3. Overwrite TX context by the backup context in step 1. Since TX queue head value is not changed while context switch, TX queue head is successfully loaded. Since everything is happening inside PF context, it is transparent to vfio driver and has no effects outside of PF. 
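
Why posting dummy descriptors lands the head at the saved index can be seen in a toy ring model (self-contained C, purely illustrative): hardware fetches and completes every descriptor up to the doorbell, so filling the ring with dummies and ringing the doorbell at the saved head value leaves the head exactly there. The context backup/switch, doorbell and interrupt-register programming are the three steps listed above and are not modeled here.

#include <stdint.h>
#include <stdio.h>

#define RING_LEN 256u

/* Toy TX queue: hardware consumes descriptors up to the doorbell (tail),
 * so head ends up equal to tail once the dummy descriptors complete.
 */
struct toy_txq {
	uint16_t head;          /* next descriptor HW would fetch */
	uint16_t tail;          /* doorbell value written by software */
};

static void post_dummies_and_ring(struct toy_txq *q, uint16_t target_head)
{
	/* In the driver, the ring is first filled with DUMMY descriptors that
	 * point at a PF-owned scratch buffer; here that step is implied.
	 */
	q->tail = target_head % RING_LEN;
	q->head = q->tail;      /* HW has processed everything up to tail */
}

int main(void)
{
	struct toy_txq q = { 0 };
	uint16_t saved_head = 37;       /* value captured on the source side */

	post_dummies_and_ring(&q, saved_head);
	printf("tx head restored to %u\n", q.head);
	return 0;
}
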
Co-developed-by: Yahui Cao Signed-off-by: Yahui Cao Signed-off-by: Lingyu Liu --- .../net/ethernet/intel/ice/ice_migration.c | 306 ++++++++++++++++++ drivers/net/ethernet/intel/ice/ice_virtchnl.c | 18 ++ 2 files changed, 324 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c index 473be6a83cf3..082ae2b79f60 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration.c +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -3,10 +3,14 @@ #include "ice.h" #include "ice_base.h" +#include "ice_txrx_lib.h" #define ICE_MIG_DEVSTAT_MAGIC 0xE8000001 #define ICE_MIG_DEVSTAT_VERSION 0x1 #define ICE_MIG_VF_QRX_TAIL_MAX 256 +#define QTX_HEAD_RESTORE_DELAY_MAX 100 +#define QTX_HEAD_RESTORE_DELAY_SLEEP_US_MIN 10 +#define QTX_HEAD_RESTORE_DELAY_SLEEP_US_MAX 10 struct ice_migration_virtchnl_msg_slot { u32 opcode; @@ -30,6 +34,8 @@ struct ice_migration_dev_state { u16 vsi_id; /* next RX desc index to be processed by the device */ u16 rx_head[ICE_MIG_VF_QRX_TAIL_MAX]; + /* next TX desc index to be processed by the device */ + u16 tx_head[ICE_MIG_VF_QRX_TAIL_MAX]; u8 virtchnl_msgs[]; } __aligned(8); @@ -316,6 +322,62 @@ ice_migration_save_rx_head(struct ice_vf *vf, return 0; } +/** + * ice_migration_save_tx_head - save tx head in migration region + * @vf: pointer to VF structure + * @devstate: pointer to migration device state + * + * Return 0 for success, negative for error + */ +static int +ice_migration_save_tx_head(struct ice_vf *vf, + struct ice_migration_dev_state *devstate) +{ + struct ice_vsi *vsi = ice_get_vf_vsi(vf); + struct ice_pf *pf = vf->pf; + struct device *dev; + int i = 0; + + dev = ice_pf_to_dev(pf); + + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + ice_for_each_txq(vsi, i) { + u16 tx_head; + u32 reg; + + devstate->tx_head[i] = 0; + if (!test_bit(i, vf->txq_ena)) + continue; + + reg = rd32(&pf->hw, QTX_COMM_HEAD(vsi->txq_map[i])); + tx_head = (reg & QTX_COMM_HEAD_HEAD_M) + >> QTX_COMM_HEAD_HEAD_S; + + /* 1. If TX head is QTX_COMM_HEAD_HEAD_M marker, which means + * it is the value written by software and there are no + * descriptors write back happened, then there are no + * packets sent since queue enabled. + * 2. If TX head is ring length minus 1, then it just returns + * to the start of the ring. 
+ */ + if (tx_head == QTX_COMM_HEAD_HEAD_M || + tx_head == (vsi->tx_rings[i]->count - 1)) + tx_head = 0; + else + /* Add compensation since value read from TX Head + * register is always the real TX head minus 1 + */ + tx_head++; + + devstate->tx_head[i] = tx_head; + } + return 0; +} + /** * ice_migration_save_devstate - save device state to migration buffer * @pf: pointer to PF of migration device @@ -376,6 +438,12 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz) goto out_put_vf; } + ret = ice_migration_save_tx_head(vf, devstate); + if (ret) { + dev_err(dev, "VF %d failed to save txq head\n", vf->vf_id); + goto out_put_vf; + } + list_for_each_entry(msg_listnode, &vf->virtchnl_msg_list, node) { struct ice_migration_virtchnl_msg_slot *msg_slot; u64 slot_size; @@ -518,6 +586,234 @@ ice_migration_load_rx_head(struct ice_vf *vf, return 0; } +/** + * ice_migration_init_dummy_desc - init dma ring by dummy descriptor + * @tx_desc: tx ring descriptor array + * @len: array length + * @tx_pkt_dma: dummy packet dma address + */ +static inline void +ice_migration_init_dummy_desc(struct ice_tx_desc *tx_desc, + u16 len, + dma_addr_t tx_pkt_dma) +{ + int i; + + /* Init ring with dummy descriptors */ + for (i = 0; i < len; i++) { + u32 td_cmd; + + td_cmd = ICE_TXD_LAST_DESC_CMD | ICE_TX_DESC_CMD_DUMMY; + tx_desc[i].cmd_type_offset_bsz = + ice_build_ctob(td_cmd, 0, SZ_256, 0); + tx_desc[i].buf_addr = cpu_to_le64(tx_pkt_dma); + } +} + +/** + * ice_migration_wait_for_tx_completion - wait for TX transmission completion + * @hw: pointer to the device HW structure + * @tx_ring: tx ring instance + * @head: expected tx head position when transmission completion + * + * Return 0 for success, negative for error. + */ +static int +ice_migration_wait_for_tx_completion(struct ice_hw *hw, + struct ice_tx_ring *tx_ring, u16 head) +{ + u32 tx_head; + int i; + + tx_head = rd32(hw, QTX_COMM_HEAD(tx_ring->reg_idx)); + tx_head = (tx_head & QTX_COMM_HEAD_HEAD_M) + >> QTX_COMM_HEAD_HEAD_S; + + for (i = 0; i < QTX_HEAD_RESTORE_DELAY_MAX && tx_head != (head - 1); + i++) { + usleep_range(QTX_HEAD_RESTORE_DELAY_SLEEP_US_MIN, + QTX_HEAD_RESTORE_DELAY_SLEEP_US_MAX); + + tx_head = rd32(hw, QTX_COMM_HEAD(tx_ring->reg_idx)); + tx_head = (tx_head & QTX_COMM_HEAD_HEAD_M) + >> QTX_COMM_HEAD_HEAD_S; + } + + if (i == QTX_HEAD_RESTORE_DELAY_MAX) + return -EBUSY; + + return 0; +} + +/** + * ice_migration_inject_dummy_desc - inject dummy descriptors + * @vf: pointer to VF structure + * @tx_ring: tx ring instance + * @head: tx head to be loaded + * @tx_desc_dma:tx descriptor ring base dma address + * + * For each TX queue, load the TX head by following below steps: + * 1. Backup TX context, switch TX queue context as PF space and PF + * DMA ring base with interrupt disabled + * 2. Fill the DMA ring with dummy descriptors and bump doorbell to + * advance TX head. Once kicking doorbell, HW will issue DMA and + * send PCI upstream memory transaction tagged by PF BDF. Since + * ring base is PF's managed DMA buffer, DMA can work successfully + * and TX Head is advanced as expected. + * 3. Overwrite TX context by the backup context in step 1. Since TX + * queue head value is not changed while context switch, TX queue + * head is successfully loaded. + * + * Return 0 for success, negative for error. 
+ */ +static int +ice_migration_inject_dummy_desc(struct ice_vf *vf, struct ice_tx_ring *tx_ring, + u16 head, dma_addr_t tx_desc_dma) +{ + struct ice_tlan_ctx tlan_ctx, tlan_ctx_orig; + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_hw *hw = &vf->pf->hw; + u32 dynctl; + u32 tqctl; + int status; + int ret; + + /* 1.1 Backup TX Queue context */ + status = ice_read_txq_ctx(hw, &tlan_ctx, tx_ring->reg_idx); + if (status) { + dev_err(dev, "Failed to read TXQ[%d] context, err=%d\n", + tx_ring->q_index, status); + return -EIO; + } + memcpy(&tlan_ctx_orig, &tlan_ctx, sizeof(tlan_ctx)); + tqctl = rd32(hw, QINT_TQCTL(tx_ring->reg_idx)); + if (tx_ring->q_vector) + dynctl = rd32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx)); + + /* 1.2 switch TX queue context as PF space and PF DMA ring base */ + tlan_ctx.vmvf_type = ICE_TLAN_CTX_VMVF_TYPE_PF; + tlan_ctx.vmvf_num = 0; + tlan_ctx.base = tx_desc_dma >> ICE_TLAN_CTX_BASE_S; + status = ice_write_txq_ctx(hw, &tlan_ctx, tx_ring->reg_idx); + if (status) { + dev_err(dev, "Failed to write TXQ[%d] context, err=%d\n", + tx_ring->q_index, status); + return -EIO; + } + + /* 1.3 Disable TX queue interrupt */ + wr32(hw, QINT_TQCTL(tx_ring->reg_idx), QINT_TQCTL_ITR_INDX_M); + + /* To disable tx queue interrupt during run time, software should + * write mmio to trigger a MSIX interrupt. + */ + if (tx_ring->q_vector) + wr32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx), + (ICE_ITR_NONE << GLINT_DYN_CTL_ITR_INDX_S) | + GLINT_DYN_CTL_SWINT_TRIG_M | + GLINT_DYN_CTL_INTENA_M); + + /* Force memory writes to complete before letting h/w know there + * are new descriptors to fetch. + */ + wmb(); + + /* 2.1 Bump doorbell to advance TX Queue head */ + writel(head, tx_ring->tail); + + /* 2.2 Wait until TX Queue head move to expected place */ + ret = ice_migration_wait_for_tx_completion(hw, tx_ring, head); + if (ret) { + dev_err(dev, "VF %d txq[%d] head loading timeout\n", + vf->vf_id, tx_ring->q_index); + return ret; + } + + /* 3. Overwrite TX Queue context with backup context */ + status = ice_write_txq_ctx(hw, &tlan_ctx_orig, tx_ring->reg_idx); + if (status) { + dev_err(dev, "Failed to write TXQ[%d] context, err=%d\n", + tx_ring->q_index, status); + return -EIO; + } + wr32(hw, QINT_TQCTL(tx_ring->reg_idx), tqctl); + if (tx_ring->q_vector) + wr32(hw, GLINT_DYN_CTL(tx_ring->q_vector->reg_idx), dynctl); + + return 0; +} + +/** + * ice_migration_load_tx_head - load tx head + * @vf: pointer to VF structure + * @devstate: pointer to migration device state + * + * Return 0 for success, negative for error + */ +static int +ice_migration_load_tx_head(struct ice_vf *vf, + struct ice_migration_dev_state *devstate) +{ + struct device *dev = ice_pf_to_dev(vf->pf); + u16 ring_len = ICE_MAX_NUM_DESC; + dma_addr_t tx_desc_dma, tx_pkt_dma; + struct ice_tx_desc *tx_desc; + struct ice_vsi *vsi; + char *tx_pkt; + int ret = 0; + int i = 0; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + /* Allocate DMA ring and descriptor by PF */ + tx_desc = dma_alloc_coherent(dev, ring_len * sizeof(struct ice_tx_desc), + &tx_desc_dma, GFP_KERNEL | __GFP_ZERO); + tx_pkt = dma_alloc_coherent(dev, SZ_4K, &tx_pkt_dma, + GFP_KERNEL | __GFP_ZERO); + if (!tx_desc || !tx_pkt) { + dev_err(dev, "PF failed to allocate memory for VF %d\n", + vf->vf_id); + ret = -ENOMEM; + goto err; + } + + ice_for_each_txq(vsi, i) { + struct ice_tx_ring *tx_ring = vsi->tx_rings[i]; + u16 *tx_heads = devstate->tx_head; + + /* 1. 
Skip if TX Queue is not enabled */ + if (!test_bit(i, vf->txq_ena) || tx_heads[i] == 0) + continue; + + if (tx_heads[i] >= tx_ring->count) { + dev_err(dev, "VF %d: invalid tx ring length to load\n", + vf->vf_id); + ret = -EINVAL; + goto err; + } + + /* Dummy descriptors must be re-initialized after use, since + * it may be written back by HW + */ + ice_migration_init_dummy_desc(tx_desc, ring_len, tx_pkt_dma); + ret = ice_migration_inject_dummy_desc(vf, tx_ring, tx_heads[i], + tx_desc_dma); + if (ret) + goto err; + } + +err: + dma_free_coherent(dev, ring_len * sizeof(struct ice_tx_desc), + tx_desc, tx_desc_dma); + dma_free_coherent(dev, SZ_4K, tx_pkt, tx_pkt_dma); + + return ret; +} + /** * ice_migration_load_devstate - load device state at destination * @pf: pointer to PF of migration device @@ -596,6 +892,16 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, msg_slot = (struct ice_migration_virtchnl_msg_slot *) ((char *)msg_slot + slot_sz); } + + /* Only load the TX Queue head after rest of device state is loaded + * successfully. + */ + ret = ice_migration_load_tx_head(vf, devstate); + if (ret) { + dev_err(dev, "VF %d failed to load tx head\n", vf->vf_id); + goto out_clear_replay; + } + out_clear_replay: clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states); out_put_vf: diff --git a/drivers/net/ethernet/intel/ice/ice_virtchnl.c b/drivers/net/ethernet/intel/ice/ice_virtchnl.c index 8dbe558790af..e588712f585e 100644 --- a/drivers/net/ethernet/intel/ice/ice_virtchnl.c +++ b/drivers/net/ethernet/intel/ice/ice_virtchnl.c @@ -1351,6 +1351,24 @@ static int ice_vc_ena_qs_msg(struct ice_vf *vf, u8 *msg) continue; ice_vf_ena_txq_interrupt(vsi, vf_q_id); + + /* TX head register is a shadow copy of on-die TX head which + * maintains the accurate location. And TX head register is + * updated only after a packet is sent. If nothing is sent + * after the queue is enabled, then the value is the one + * updated last time and out-of-date. + * + * QTX_COMM_HEAD.HEAD rang value from 0x1fe0 to 0x1fff is + * reserved and will never be used by HW. Manually write a + * reserved value into TX head and use this as a marker for + * the case that there's no packets sent. + * + * This marker is only used in live migration use case. 
+ */ + if (vf->migration_enabled) + wr32(&vsi->back->hw, + QTX_COMM_HEAD(vsi->txq_map[vf_q_id]), + QTX_COMM_HEAD_HEAD_M); set_bit(vf_q_id, vf->txq_ena); } From patchwork Tue Nov 21 02:51:09 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462487 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="PeDlf3uy" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id E474EE3; Mon, 20 Nov 2023 18:50:34 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535035; x=1732071035; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=sGKE6kuIb+hb0CTMJZeKnZPATK+YHZfkRj39W2TIkQA=; b=PeDlf3uy6kBWxQf2WjS0ax8xQODnoRaT6HZLkrzL68NrPjJZXwT/o5sD XcW22EKRRIchPAwsriuf9aAXF/+c++6tKSVxosvTNG90sYwta343vwDEn ld9nOAcktV2DmQPphbx7dwOlC0bQNClVKHlR1nD6lsDrbr4EPu6lHagqF VQnySxobZa9KE5i3S0AD6eQnU1CcIi2WUoyBisUCItKYlajpcGvzFK4tB PuavZyXcOPFxEZAXGGJHmLxn2C0NeaTMRYZXxoayUFd8Gy3EPmI43hpjo nAmz+wm4sOnlxr+W+l780gFR1EUaQ1n4UgtYnHy1buh2Ddu5oAKN9Pday Q==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458246136" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458246136" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:30 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488585" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488585" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:25 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 10/12] ice: Add device suspend function for migration Date: Tue, 21 Nov 2023 02:51:09 +0000 Message-Id: <20231121025111.257597-11-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org From: Lingyu Liu Device suspend handler is called by vfio driver before saving device state. Typical operation includes stopping TX/RX queue. 
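The ordering implied here matters: the suspend handler must quiesce the TX/RX queues before any device state is captured. A minimal sketch of the expected caller sequence, using only functions exported by this series (the wrapper name, buffer handling and error handling are illustrative and not part of the patch):

#include <linux/pci.h>
#include <linux/net/intel/ice_migration.h>

/* Illustrative wrapper: quiesce the VF first, then snapshot its state. */
static int example_suspend_and_save(struct pci_dev *vf_pdev, u8 *buf, u64 buf_sz)
{
	struct ice_pf *pf = ice_migration_get_pf(vf_pdev);
	int vf_id = pci_iov_vf_id(vf_pdev);
	int ret;

	if (!pf || vf_id < 0)
		return -EINVAL;

	/* Stop TX/RX rings and drop filters before touching device state */
	ret = ice_migration_suspend_dev(pf, vf_id);
	if (ret)
		return ret;

	/* Save queue state and the logged virtchnl messages into the buffer */
	return ice_migration_save_devstate(pf, vf_id, buf, buf_sz);
}
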
Signed-off-by: Lingyu Liu Signed-off-by: Yahui Cao --- .../net/ethernet/intel/ice/ice_migration.c | 69 +++++++++++++++++++ include/linux/net/intel/ice_migration.h | 6 ++ 2 files changed, 75 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c index 082ae2b79f60..a11cd0d3ad3d 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration.c +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -2,6 +2,8 @@ /* Copyright (C) 2018-2023 Intel Corporation */ #include "ice.h" +#include "ice_lib.h" +#include "ice_fltr.h" #include "ice_base.h" #include "ice_txrx_lib.h" @@ -274,6 +276,73 @@ u32 ice_migration_supported_caps(void) return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE; } +/** + * ice_migration_suspend_dev - suspend device + * @pf: pointer to PF of migration device + * @vf_id: VF index of migration device + * + * Return 0 for success, negative for error + */ +int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id) +{ + struct device *dev = ice_pf_to_dev(pf); + struct ice_vsi *vsi; + struct ice_vf *vf; + int ret; + + vf = ice_get_vf_by_id(pf, vf_id); + if (!vf) { + dev_err(dev, "Unable to locate VF from VF ID%d\n", vf_id); + return -EINVAL; + } + + if (!test_bit(ICE_VF_STATE_QS_ENA, vf->vf_states)) { + ret = 0; + goto out_put_vf; + } + + if (vf->virtchnl_msg_num > VIRTCHNL_MSG_MAX) { + dev_err(dev, "SR-IOV live migration disabled on VF %d. Migration buffer exceeded\n", + vf->vf_id); + ret = -EIO; + goto out_put_vf; + } + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + ret = -EINVAL; + goto out_put_vf; + } + + /* Prevent VSI from queuing incoming packets by removing all filters */ + ice_fltr_remove_all(vsi); + + /* MAC based filter rule is disabled at this point. 
Set MAC to zero + * to keep consistency with VF mac address info shown by ip link + */ + eth_zero_addr(vf->hw_lan_addr); + eth_zero_addr(vf->dev_lan_addr); + + ret = ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf->vf_id); + if (ret) { + dev_err(dev, "VF %d failed to stop tx rings\n", vf->vf_id); + ret = -EIO; + goto out_put_vf; + } + ret = ice_vsi_stop_all_rx_rings(vsi); + if (ret) { + dev_err(dev, "VF %d failed to stop rx rings\n", vf->vf_id); + ret = -EIO; + goto out_put_vf; + } + +out_put_vf: + ice_put_vf(vf); + return ret; +} +EXPORT_SYMBOL(ice_migration_suspend_dev); + /** * ice_migration_save_rx_head - save rx head into device state buffer * @vf: pointer to VF structure diff --git a/include/linux/net/intel/ice_migration.h b/include/linux/net/intel/ice_migration.h index a142b78283a8..47f46dca07ae 100644 --- a/include/linux/net/intel/ice_migration.h +++ b/include/linux/net/intel/ice_migration.h @@ -14,6 +14,7 @@ int ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz); int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, const u8 *buf, u64 buf_sz); +int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id); #else static inline struct ice_pf *ice_migration_get_pf(struct pci_dev *pdev) { @@ -37,6 +38,11 @@ static inline int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, { return 0; } + +static inline int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id) +{ + return 0; +} #endif /* CONFIG_ICE_VFIO_PCI */ #endif /* _ICE_MIGRATION_H_ */ From patchwork Tue Nov 21 02:51:10 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462489 X-Patchwork-Delegate: kuba@kernel.org Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="R8UW+tGi" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id F15ACFA; Mon, 20 Nov 2023 18:50:35 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535036; x=1732071036; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=i8LrPqnDw7KutM9dMmDrGpORGJHz+x3QGV0jkRtM6zs=; b=R8UW+tGi2VQc2Jvnx2u9uolYwnsz7e4bnn+DeJDrJtxXx64ChTp5i/tw DTdQSCTfTdFsAUf4Or+HQfEykHsCfYNrRgYM5Vlupj/UghspaoUK8xvhA AP18aPANKGWSXtxoz7A8LFzFgnIUCbIdgbpehJKcKppcgrQBsTWeGxiqu QB/mKRqlpLk5KRC7tjB5d86KFTMIRV9VyDWqcGp7Y7gq1zlOoRo567h+p ex/uefj6hQAVWPc05C2Og+pX+To0D91xMhGypq+tRvZi7e8/PR3t4YFph ur4ErSzF2WH5WoMa1VnLxyD7d7H6a0XGDz+PM7VPuQmTEq9PT2lIouqJ0 w==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458246155" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458246155" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:34 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488615" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488615" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:29 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, shameerali.kolothum.thodi@huawei.com, 
brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 11/12] ice: Save and load mmio registers Date: Tue, 21 Nov 2023 02:51:10 +0000 Message-Id: <20231121025111.257597-12-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 X-Patchwork-Delegate: kuba@kernel.org In the E800 device model, the VF directly controls the AdminQ, irq control, TX tail and RX tail context by accessing VF PCI MMIO. All remaining state can only be set up by the PF: the VF sends its configuration to the PF through virtual channel messages, and the PF programs the rest of the state on its behalf. To migrate the AdminQ/irq context, only the AdminQ/irq registers need to be loaded; the rest, such as generic MSI-X state, is handled by the migration stack. To migrate an RX DMA ring, the RX ring base and length (set up via virtual channel messages) and the tail register (set up via VF PCI MMIO) must be loaded before the RX queue is enabled. To migrate a TX DMA ring, the TX ring base and length (set up via virtual channel messages) must be loaded before the TX queue is enabled; the TX tail (set up via VF PCI MMIO) does not need to be loaded, since the TX queue is drained before migration and the TX tail is stateless. For simplicity, load all the VF PCI MMIO registers before the virtual channel messages are replayed so that the whole TX/RX ring context is in place before any queue is enabled. However, two corner cases need to be taken care of: - During device suspension, irq registers may be dirtied while stopping the queues. Hence save the irq registers into an internal pre-saved area before the queues are stopped and fetch the pre-saved values at the device saving stage. - When the PF processes the virtual channel message VIRTCHNL_OP_CONFIG_VSI_QUEUES, irq registers may be dirtied. Hence load the affected irq registers again after the virtual channel messages are replayed. 
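These constraints boil down to a fixed ordering on the destination. A condensed sketch of what ice_migration_load_devstate() ends up doing with the helpers added by this patch (the virtchnl replay loop is elided to a comment; the wrapper itself is illustrative):

/* Illustrative wrapper, not part of the patch: destination-side ordering. */
static int example_load_ordering(struct ice_vf *vf,
				 struct ice_migration_dev_state *devstate)
{
	int ret;

	/* 1. Load every saved VF MMIO register (AdminQ, irq, RX tail)
	 *    before any queue can be enabled by the replay below.
	 */
	ret = ice_migration_load_regs(vf, devstate);
	if (ret)
		return ret;

	/* 2. Replay the logged virtchnl messages (elided here). While
	 *    VIRTCHNL_OP_CONFIG_VSI_QUEUES is processed, the data queue
	 *    irq registers get dirtied again.
	 */

	/* 3. Re-load the affected irq registers from the snapshot taken
	 *    into the pre-saved area at suspend time.
	 */
	return ice_migration_load_dirty_regs(vf, devstate);
}
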
Signed-off-by: Yahui Cao --- .../net/ethernet/intel/ice/ice_hw_autogen.h | 8 + .../net/ethernet/intel/ice/ice_migration.c | 308 ++++++++++++++++++ .../intel/ice/ice_migration_private.h | 7 + drivers/net/ethernet/intel/ice/ice_vf_lib.h | 2 + 4 files changed, 325 insertions(+) diff --git a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h index 7410da715ad4..389bf00411ff 100644 --- a/drivers/net/ethernet/intel/ice/ice_hw_autogen.h +++ b/drivers/net/ethernet/intel/ice/ice_hw_autogen.h @@ -31,8 +31,16 @@ #define PF_FW_ATQLEN_ATQVFE_M BIT(28) #define PF_FW_ATQLEN_ATQOVFL_M BIT(29) #define PF_FW_ATQLEN_ATQCRIT_M BIT(30) +#define VF_MBX_ARQBAH(_VF) (0x0022B800 + ((_VF) * 4)) +#define VF_MBX_ARQBAL(_VF) (0x0022B400 + ((_VF) * 4)) +#define VF_MBX_ARQH(_VF) (0x0022C000 + ((_VF) * 4)) #define VF_MBX_ARQLEN(_VF) (0x0022BC00 + ((_VF) * 4)) +#define VF_MBX_ARQT(_VF) (0x0022C400 + ((_VF) * 4)) +#define VF_MBX_ATQBAH(_VF) (0x0022A400 + ((_VF) * 4)) +#define VF_MBX_ATQBAL(_VF) (0x0022A000 + ((_VF) * 4)) +#define VF_MBX_ATQH(_VF) (0x0022AC00 + ((_VF) * 4)) #define VF_MBX_ATQLEN(_VF) (0x0022A800 + ((_VF) * 4)) +#define VF_MBX_ATQT(_VF) (0x0022B000 + ((_VF) * 4)) #define PF_FW_ATQLEN_ATQENABLE_M BIT(31) #define PF_FW_ATQT 0x00080400 #define PF_MBX_ARQBAH 0x0022E400 diff --git a/drivers/net/ethernet/intel/ice/ice_migration.c b/drivers/net/ethernet/intel/ice/ice_migration.c index a11cd0d3ad3d..127d45be6767 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration.c +++ b/drivers/net/ethernet/intel/ice/ice_migration.c @@ -25,6 +25,27 @@ struct ice_migration_virtchnl_msg_listnode { struct ice_migration_virtchnl_msg_slot msg_slot; }; +struct ice_migration_mmio_regs { + /* VF Interrupts */ + u32 int_dyn_ctl[ICE_MIG_VF_MSIX_MAX]; + u32 int_intr[ICE_MIG_VF_ITR_NUM][ICE_MIG_VF_MSIX_MAX]; + + /* VF Control Queues */ + u32 asq_bal; + u32 asq_bah; + u32 asq_len; + u32 asq_head; + u32 asq_tail; + u32 arq_bal; + u32 arq_bah; + u32 arq_len; + u32 arq_head; + u32 arq_tail; + + /* VF LAN RX */ + u32 rx_tail[ICE_MIG_VF_QRX_TAIL_MAX]; +}; + struct ice_migration_dev_state { u32 magic; u32 version; @@ -33,6 +54,7 @@ struct ice_migration_dev_state { u16 num_txq; u16 num_rxq; + struct ice_migration_mmio_regs regs; u16 vsi_id; /* next RX desc index to be processed by the device */ u16 rx_head[ICE_MIG_VF_QRX_TAIL_MAX]; @@ -276,6 +298,57 @@ u32 ice_migration_supported_caps(void) return VIRTCHNL_VF_MIGRATION_SUPPORT_FEATURE; } +/** + * ice_migration_save_dirty_regs - save registers which may be dirtied + * @vf: pointer to VF structure + * + * Return 0 for success, negative for error + */ +static int ice_migration_save_dirty_regs(struct ice_vf *vf) +{ + struct ice_migration_dirty_regs *dirty_regs = &vf->dirty_regs; + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_hw *hw = &vf->pf->hw; + struct ice_vsi *vsi; + int itr, v_id; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF > + ICE_MIG_VF_MSIX_MAX)) + return -EINVAL; + + /* Save Mailbox Q vectors */ + dirty_regs->int_dyn_ctl[0] = + rd32(hw, GLINT_DYN_CTL(vf->first_vector_idx)); + for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++) + dirty_regs->int_intr[itr][0] = + rd32(hw, GLINT_ITR(itr, vf->first_vector_idx)); + + /* Save Data Q vectors */ + for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) { + int irq = v_id + ICE_NONQ_VECS_VF; + struct ice_q_vector *q_vector; + + q_vector = vsi->q_vectors[v_id]; + if 
(!q_vector) { + dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id); + return -EINVAL; + } + dirty_regs->int_dyn_ctl[irq] = + rd32(hw, GLINT_DYN_CTL(q_vector->reg_idx)); + for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++) + dirty_regs->int_intr[itr][irq] = + rd32(hw, GLINT_ITR(itr, q_vector->reg_idx)); + } + + return 0; +} + /** * ice_migration_suspend_dev - suspend device * @pf: pointer to PF of migration device @@ -324,6 +397,15 @@ int ice_migration_suspend_dev(struct ice_pf *pf, int vf_id) eth_zero_addr(vf->hw_lan_addr); eth_zero_addr(vf->dev_lan_addr); + /* Irq register may be dirtied when stopping queue. Hence save irq + * register into pre-saved area before queue is stopped. + */ + ret = ice_migration_save_dirty_regs(vf); + if (ret) { + dev_err(dev, "VF %d failed to save dirty register copy\n", + vf->vf_id); + goto out_put_vf; + } ret = ice_vsi_stop_lan_tx_rings(vsi, ICE_NO_RESET, vf->vf_id); if (ret) { dev_err(dev, "VF %d failed to stop tx rings\n", vf->vf_id); @@ -447,6 +529,84 @@ ice_migration_save_tx_head(struct ice_vf *vf, return 0; } +/** + * ice_migration_save_regs - save mmio registers in migration region + * @vf: pointer to VF structure + * @devstate: pointer to migration device state + * + * Return 0 for success, negative for error + */ +static int +ice_migration_save_regs(struct ice_vf *vf, + struct ice_migration_dev_state *devstate) +{ + struct ice_migration_dirty_regs *dirty_regs = &vf->dirty_regs; + struct ice_migration_mmio_regs *regs = &devstate->regs; + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_hw *hw = &vf->pf->hw; + struct ice_vsi *vsi; + int i, itr, v_id; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF > + ICE_MIG_VF_MSIX_MAX)) + return -EINVAL; + + /* For irq registers which may be dirtied when virtual channel message + * VIRTCHNL_OP_CONFIG_VSI_QUEUES is processed, load values from + * pre-saved area. 
+ */ + + /* Save Mailbox Q vectors */ + regs->int_dyn_ctl[0] = dirty_regs->int_dyn_ctl[0]; + for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++) + regs->int_intr[itr][0] = dirty_regs->int_intr[itr][0]; + + /* Save Data Q vectors */ + for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) { + int irq = v_id + ICE_NONQ_VECS_VF; + struct ice_q_vector *q_vector; + + q_vector = vsi->q_vectors[v_id]; + if (!q_vector) { + dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id); + return -EINVAL; + } + regs->int_dyn_ctl[irq] = dirty_regs->int_dyn_ctl[irq]; + for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++) + regs->int_intr[itr][irq] = + dirty_regs->int_intr[itr][irq]; + } + + regs->asq_bal = rd32(hw, VF_MBX_ATQBAL(vf->vf_id)); + regs->asq_bah = rd32(hw, VF_MBX_ATQBAH(vf->vf_id)); + regs->asq_len = rd32(hw, VF_MBX_ATQLEN(vf->vf_id)); + regs->asq_head = rd32(hw, VF_MBX_ATQH(vf->vf_id)); + regs->asq_tail = rd32(hw, VF_MBX_ATQT(vf->vf_id)); + regs->arq_bal = rd32(hw, VF_MBX_ARQBAL(vf->vf_id)); + regs->arq_bah = rd32(hw, VF_MBX_ARQBAH(vf->vf_id)); + regs->arq_len = rd32(hw, VF_MBX_ARQLEN(vf->vf_id)); + regs->arq_head = rd32(hw, VF_MBX_ARQH(vf->vf_id)); + regs->arq_tail = rd32(hw, VF_MBX_ARQT(vf->vf_id)); + + ice_for_each_rxq(vsi, i) { + struct ice_rx_ring *rx_ring = vsi->rx_rings[i]; + + regs->rx_tail[i] = 0; + if (!test_bit(i, vf->rxq_ena)) + continue; + + regs->rx_tail[i] = rd32(hw, QRX_TAIL(rx_ring->reg_idx)); + } + + return 0; +} + /** * ice_migration_save_devstate - save device state to migration buffer * @pf: pointer to PF of migration device @@ -501,6 +661,12 @@ ice_migration_save_devstate(struct ice_pf *pf, int vf_id, u8 *buf, u64 buf_sz) buf = devstate->virtchnl_msgs; devstate->vsi_id = vf->vm_vsi_num; + ret = ice_migration_save_regs(vf, devstate); + if (ret) { + dev_err(dev, "VF %d failed to save mmio register\n", vf->vf_id); + goto out_put_vf; + } + ret = ice_migration_save_rx_head(vf, devstate); if (ret) { dev_err(dev, "VF %d failed to save rxq head\n", vf->vf_id); @@ -883,6 +1049,125 @@ ice_migration_load_tx_head(struct ice_vf *vf, return ret; } +/** + * ice_migration_load_regs - load mmio registers from device state buffer + * @vf: pointer to VF structure + * @devstate: pointer to migration device state + * + * Return 0 for success, negative for error + */ +static int +ice_migration_load_regs(struct ice_vf *vf, + struct ice_migration_dev_state *devstate) +{ + struct ice_migration_mmio_regs *regs = &devstate->regs; + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_hw *hw = &vf->pf->hw; + struct ice_vsi *vsi; + int i, itr, v_id; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF > + ICE_MIG_VF_MSIX_MAX)) + return -EINVAL; + + /* Restore Mailbox Q vectors */ + wr32(hw, GLINT_DYN_CTL(vf->first_vector_idx), regs->int_dyn_ctl[0]); + for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++) + wr32(hw, GLINT_ITR(itr, vf->first_vector_idx), + regs->int_intr[itr][0]); + + /* Restore Data Q vectors */ + for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) { + int irq = v_id + ICE_NONQ_VECS_VF; + struct ice_q_vector *q_vector; + + q_vector = vsi->q_vectors[v_id]; + if (!q_vector) { + dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id); + return -EINVAL; + } + wr32(hw, GLINT_DYN_CTL(q_vector->reg_idx), + regs->int_dyn_ctl[irq]); + for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++) + wr32(hw, GLINT_ITR(itr, q_vector->reg_idx), + regs->int_intr[itr][irq]); + } + + wr32(hw, VF_MBX_ATQBAL(vf->vf_id), 
regs->asq_bal); + wr32(hw, VF_MBX_ATQBAH(vf->vf_id), regs->asq_bah); + wr32(hw, VF_MBX_ATQLEN(vf->vf_id), regs->asq_len); + wr32(hw, VF_MBX_ATQH(vf->vf_id), regs->asq_head); + /* Since Mailbox ctrl tx queue tail is bumped by VF driver to notify + * HW to send pks, VF_MBX_ATQT is not necessry to be loaded here. + */ + wr32(hw, VF_MBX_ARQBAL(vf->vf_id), regs->arq_bal); + wr32(hw, VF_MBX_ARQBAH(vf->vf_id), regs->arq_bah); + wr32(hw, VF_MBX_ARQLEN(vf->vf_id), regs->arq_len); + wr32(hw, VF_MBX_ARQH(vf->vf_id), regs->arq_head); + wr32(hw, VF_MBX_ARQT(vf->vf_id), regs->arq_tail); + + ice_for_each_rxq(vsi, i) { + struct ice_rx_ring *rx_ring = vsi->rx_rings[i]; + + wr32(hw, QRX_TAIL(rx_ring->reg_idx), regs->rx_tail[i]); + } + + return 0; +} + +/** + * ice_migration_load_dirty_regs - load registers which may be dirtied + * @vf: pointer to VF structure + * @devstate: pointer to migration device state + * + * Return 0 for success, negative for error + */ +static int +ice_migration_load_dirty_regs(struct ice_vf *vf, + struct ice_migration_dev_state *devstate) +{ + struct ice_migration_mmio_regs *regs = &devstate->regs; + struct device *dev = ice_pf_to_dev(vf->pf); + struct ice_hw *hw = &vf->pf->hw; + struct ice_vsi *vsi; + int itr, v_id; + + vsi = ice_get_vf_vsi(vf); + if (!vsi) { + dev_err(dev, "VF %d VSI is NULL\n", vf->vf_id); + return -EINVAL; + } + + if (WARN_ON_ONCE(vsi->num_q_vectors + ICE_NONQ_VECS_VF > + ICE_MIG_VF_MSIX_MAX)) + return -EINVAL; + + /* Restore Data Q vectors */ + for (v_id = 0; v_id < vsi->num_q_vectors; v_id++) { + int irq = v_id + ICE_NONQ_VECS_VF; + struct ice_q_vector *q_vector; + + q_vector = vsi->q_vectors[v_id]; + if (!q_vector) { + dev_err(dev, "VF %d invalid q vectors\n", vf->vf_id); + return -EINVAL; + } + wr32(hw, GLINT_DYN_CTL(q_vector->reg_idx), + regs->int_dyn_ctl[irq]); + for (itr = 0; itr < ICE_MIG_VF_ITR_NUM; itr++) + wr32(hw, GLINT_ITR(itr, q_vector->reg_idx), + regs->int_intr[itr][irq]); + } + + return 0; +} + /** * ice_migration_load_devstate - load device state at destination * @pf: pointer to PF of migration device @@ -920,6 +1205,18 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, devstate = (struct ice_migration_dev_state *)buf; vf->vm_vsi_num = devstate->vsi_id; dev_dbg(dev, "VF %d vm vsi num is:%d\n", vf->vf_id, vf->vm_vsi_num); + + /* RX tail register must be loaded before queue is enabled. For + * simplicity, just load all the mmio before virtual channel messages + * are replayed. + */ + ret = ice_migration_load_regs(vf, devstate); + if (ret) { + dev_err(dev, "VF %d failed to load mmio registers\n", + vf->vf_id); + goto out_put_vf; + } + msg_slot = (struct ice_migration_virtchnl_msg_slot *) devstate->virtchnl_msgs; set_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states); @@ -971,6 +1268,17 @@ int ice_migration_load_devstate(struct ice_pf *pf, int vf_id, goto out_clear_replay; } + /* When PF processes virtual channel VIRTCHNL_OP_CONFIG_VSI_QUEUES, irq + * register may be dirtied. Hence load the affacted irq register again + * after virtual channel messages are replayed. 
+ */ + ret = ice_migration_load_dirty_regs(vf, devstate); + if (ret) { + dev_err(dev, "VF %d failed to load dirty registers\n", + vf->vf_id); + goto out_clear_replay; + } + out_clear_replay: clear_bit(ICE_VF_STATE_REPLAYING_VC, vf->vf_states); out_put_vf: diff --git a/drivers/net/ethernet/intel/ice/ice_migration_private.h b/drivers/net/ethernet/intel/ice/ice_migration_private.h index f72a488d9002..b76eb05747c8 100644 --- a/drivers/net/ethernet/intel/ice/ice_migration_private.h +++ b/drivers/net/ethernet/intel/ice/ice_migration_private.h @@ -10,6 +10,13 @@ * in ice-vfio-pic.ko should be exposed as part of ice_migration.h. */ +#define ICE_MIG_VF_MSIX_MAX 65 +#define ICE_MIG_VF_ITR_NUM 4 +struct ice_migration_dirty_regs { + u32 int_dyn_ctl[ICE_MIG_VF_MSIX_MAX]; + u32 int_intr[ICE_MIG_VF_ITR_NUM][ICE_MIG_VF_MSIX_MAX]; +}; + #if IS_ENABLED(CONFIG_ICE_VFIO_PCI) void ice_migration_init_vf(struct ice_vf *vf); void ice_migration_uninit_vf(struct ice_vf *vf); diff --git a/drivers/net/ethernet/intel/ice/ice_vf_lib.h b/drivers/net/ethernet/intel/ice/ice_vf_lib.h index 49d99694e91f..c971fb47c2ff 100644 --- a/drivers/net/ethernet/intel/ice/ice_vf_lib.h +++ b/drivers/net/ethernet/intel/ice/ice_vf_lib.h @@ -14,6 +14,7 @@ #include "ice_type.h" #include "ice_virtchnl_fdir.h" #include "ice_vsi_vlan_ops.h" +#include "ice_migration_private.h" #define ICE_MAX_SRIOV_VFS 256 @@ -147,6 +148,7 @@ struct ice_vf { u64 virtchnl_msg_size; u32 virtchnl_retval; u16 vm_vsi_num; + struct ice_migration_dirty_regs dirty_regs; }; /* Flags for controlling behavior of ice_reset_vf */ From patchwork Tue Nov 21 02:51:11 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: "Cao, Yahui" X-Patchwork-Id: 13462490 Authentication-Results: smtp.subspace.kernel.org; dkim=pass (2048-bit key) header.d=intel.com header.i=@intel.com header.b="Lff22QW5" Received: from mgamail.intel.com (mgamail.intel.com [134.134.136.100]) by lindbergh.monkeyblade.net (Postfix) with ESMTPS id C9D13D8; Mon, 20 Nov 2023 18:50:40 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=intel.com; i=@intel.com; q=dns/txt; s=Intel; t=1700535040; x=1732071040; h=from:to:cc:subject:date:message-id:in-reply-to: references:mime-version:content-transfer-encoding; bh=KCbEkyQgARUTCvODq4McBGNPO2St50S+X16hHLy9bas=; b=Lff22QW5gepqaGiZCoPFd/RXen/o6Wny0YjJBxJ1RtCxLYk+76bbgIL3 cX5gfhBOafepmAhYYwgw8ol4ALDP1BcAaGC0ZNoi+f3k9ZLHFVYteAPu5 jGTg8cMCi/wAoVOxTpeuKr6pxQNSSJgRFFG0LL4jq9MwEIxhU8/rPH8he P6Vv+yULP5EZYbW99mXaKxgNpwPG4NTMwBT8LZ1RxcapqJzp76KoHjSNM W8oTrpH2AOhkzV8324/PpfFkgU1tUIUkQAr+PjHoPE/j6lr5An40AcJKj tHWbTM3IKLe9dNYjCoBAykMu4DRse5ePozRnBVAaWGBgnV/f5O7tu2ULZ g==; X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="458246179" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="458246179" Received: from fmsmga008.fm.intel.com ([10.253.24.58]) by orsmga105.jf.intel.com with ESMTP/TLS/ECDHE-RSA-AES256-GCM-SHA384; 20 Nov 2023 18:50:40 -0800 X-ExtLoop1: 1 X-IronPort-AV: E=McAfee;i="6600,9927,10900"; a="832488648" X-IronPort-AV: E=Sophos;i="6.04,215,1695711600"; d="scan'208";a="832488648" Received: from dpdk-yahui-icx1.sh.intel.com ([10.67.111.85]) by fmsmga008.fm.intel.com with ESMTP; 20 Nov 2023 18:50:34 -0800 From: Yahui Cao To: intel-wired-lan@lists.osuosl.org Cc: kvm@vger.kernel.org, netdev@vger.kernel.org, lingyu.liu@intel.com, kevin.tian@intel.com, madhu.chittim@intel.com, sridhar.samudrala@intel.com, alex.williamson@redhat.com, jgg@nvidia.com, yishaih@nvidia.com, 
shameerali.kolothum.thodi@huawei.com, brett.creeley@amd.com, davem@davemloft.net, edumazet@google.com, kuba@kernel.org, pabeni@redhat.com Subject: [PATCH iwl-next v4 12/12] vfio/ice: Implement vfio_pci driver for E800 devices Date: Tue, 21 Nov 2023 02:51:11 +0000 Message-Id: <20231121025111.257597-13-yahui.cao@intel.com> X-Mailer: git-send-email 2.34.1 In-Reply-To: <20231121025111.257597-1-yahui.cao@intel.com> References: <20231121025111.257597-1-yahui.cao@intel.com> Precedence: bulk X-Mailing-List: netdev@vger.kernel.org List-Id: List-Subscribe: List-Unsubscribe: MIME-Version: 1.0 From: Lingyu Liu Add a vendor-specific vfio_pci driver for E800 devices. It uses vfio_pci_core to register to the VFIO subsystem and then implements the E800 specific logic to support VF live migration. It implements the device state transition flow for live migration. Signed-off-by: Lingyu Liu Signed-off-by: Yahui Cao --- MAINTAINERS | 7 + drivers/vfio/pci/Kconfig | 2 + drivers/vfio/pci/Makefile | 2 + drivers/vfio/pci/ice/Kconfig | 10 + drivers/vfio/pci/ice/Makefile | 4 + drivers/vfio/pci/ice/ice_vfio_pci.c | 707 ++++++++++++++++++++++++++++ 6 files changed, 732 insertions(+) create mode 100644 drivers/vfio/pci/ice/Kconfig create mode 100644 drivers/vfio/pci/ice/Makefile create mode 100644 drivers/vfio/pci/ice/ice_vfio_pci.c diff --git a/MAINTAINERS b/MAINTAINERS index 97f51d5ec1cf..c8faf7fe1bd1 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -22860,6 +22860,13 @@ L: kvm@vger.kernel.org S: Maintained F: drivers/vfio/pci/mlx5/ +VFIO ICE PCI DRIVER +M: Yahui Cao +M: Lingyu Liu +L: kvm@vger.kernel.org +S: Maintained +F: drivers/vfio/pci/ice/ + VFIO PCI DEVICE SPECIFIC DRIVERS R: Jason Gunthorpe R: Yishai Hadas diff --git a/drivers/vfio/pci/Kconfig b/drivers/vfio/pci/Kconfig index 8125e5f37832..6618208947af 100644 --- a/drivers/vfio/pci/Kconfig +++ b/drivers/vfio/pci/Kconfig @@ -65,4 +65,6 @@ source "drivers/vfio/pci/hisilicon/Kconfig" source "drivers/vfio/pci/pds/Kconfig" +source "drivers/vfio/pci/ice/Kconfig" + endmenu diff --git a/drivers/vfio/pci/Makefile b/drivers/vfio/pci/Makefile index 45167be462d8..fc1df82df3ac 100644 --- a/drivers/vfio/pci/Makefile +++ b/drivers/vfio/pci/Makefile @@ -13,3 +13,5 @@ obj-$(CONFIG_MLX5_VFIO_PCI) += mlx5/ obj-$(CONFIG_HISI_ACC_VFIO_PCI) += hisilicon/ obj-$(CONFIG_PDS_VFIO_PCI) += pds/ + +obj-$(CONFIG_ICE_VFIO_PCI) += ice/ diff --git a/drivers/vfio/pci/ice/Kconfig b/drivers/vfio/pci/ice/Kconfig new file mode 100644 index 000000000000..0b8cd1489073 --- /dev/null +++ b/drivers/vfio/pci/ice/Kconfig @@ -0,0 +1,10 @@ +# SPDX-License-Identifier: GPL-2.0-only +config ICE_VFIO_PCI + tristate "VFIO support for Intel(R) Ethernet Connection E800 Series" + depends on ICE + select VFIO_PCI_CORE + help + This provides migration support for Intel(R) Ethernet connection E800 + series devices using the VFIO framework. + + If you don't know what to do here, say N. 
diff --git a/drivers/vfio/pci/ice/Makefile b/drivers/vfio/pci/ice/Makefile new file mode 100644 index 000000000000..259d4ab89105 --- /dev/null +++ b/drivers/vfio/pci/ice/Makefile @@ -0,0 +1,4 @@ +# SPDX-License-Identifier: GPL-2.0-only +obj-$(CONFIG_ICE_VFIO_PCI) += ice-vfio-pci.o +ice-vfio-pci-y := ice_vfio_pci.o + diff --git a/drivers/vfio/pci/ice/ice_vfio_pci.c b/drivers/vfio/pci/ice/ice_vfio_pci.c new file mode 100644 index 000000000000..28a181aa2f3f --- /dev/null +++ b/drivers/vfio/pci/ice/ice_vfio_pci.c @@ -0,0 +1,707 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (C) 2018-2023 Intel Corporation */ + +#include +#include +#include +#include +#include +#include +#include +#include + +#define DRIVER_DESC "ICE VFIO PCI - User Level meta-driver for Intel E800 device family" + +struct ice_vfio_pci_migration_file { + struct file *filp; + struct mutex lock; /* protect migration file access */ + bool disabled; + + u8 mig_data[SZ_128K]; + size_t total_length; +}; + +struct ice_vfio_pci_core_device { + struct vfio_pci_core_device core_device; + u8 deferred_reset:1; + struct mutex state_mutex; /* protect migration state */ + enum vfio_device_mig_state mig_state; + /* protect the reset_done flow */ + spinlock_t reset_lock; + struct ice_vfio_pci_migration_file *resuming_migf; + struct ice_vfio_pci_migration_file *saving_migf; + struct vfio_device_migration_info mig_info; + u8 *mig_data; + struct ice_pf *pf; + int vf_id; +}; + +/** + * ice_vfio_pci_load_state - VFIO device state reloading + * @ice_vdev: pointer to ice vfio pci core device structure + * + * Load device state. This function is called when the userspace VFIO uAPI + * consumer wants to load the device state info from VFIO migration region and + * load them into the device. This function should make sure all the device + * state info is loaded successfully. As a result, return value is mandatory + * to be checked. + * + * Return 0 for success, negative value for failure. + */ +static int __must_check +ice_vfio_pci_load_state(struct ice_vfio_pci_core_device *ice_vdev) +{ + struct ice_vfio_pci_migration_file *migf = ice_vdev->resuming_migf; + + return ice_migration_load_devstate(ice_vdev->pf, + ice_vdev->vf_id, + migf->mig_data, + migf->total_length); +} + +/** + * ice_vfio_pci_save_state - VFIO device state saving + * @ice_vdev: pointer to ice vfio pci core device structure + * @migf: pointer to migration file + * + * Snapshot the device state and save it. This function is called when the + * VFIO uAPI consumer wants to snapshot the current device state and saves + * it into the VFIO migration region. This function should make sure all + * of the device state info is collectted and saved successfully. As a + * result, return value is mandatory to be checked. + * + * Return 0 for success, negative value for failure. 
+ */ +static int __must_check +ice_vfio_pci_save_state(struct ice_vfio_pci_core_device *ice_vdev, + struct ice_vfio_pci_migration_file *migf) +{ + migf->total_length = SZ_128K; + + return ice_migration_save_devstate(ice_vdev->pf, + ice_vdev->vf_id, + migf->mig_data, + migf->total_length); +} + +/** + * ice_vfio_migration_init - Initialization for live migration function + * @ice_vdev: pointer to ice vfio pci core device structure + * + * Returns 0 on success, negative value on error + */ +static int ice_vfio_migration_init(struct ice_vfio_pci_core_device *ice_vdev) +{ + struct pci_dev *pdev = ice_vdev->core_device.pdev; + + ice_vdev->pf = ice_migration_get_pf(pdev); + if (!ice_vdev->pf) + return -EFAULT; + + ice_vdev->vf_id = pci_iov_vf_id(pdev); + if (ice_vdev->vf_id < 0) + return -EINVAL; + + return ice_migration_init_dev(ice_vdev->pf, ice_vdev->vf_id); +} + +/** + * ice_vfio_migration_uninit - Cleanup for live migration function + * @ice_vdev: pointer to ice vfio pci core device structure + */ +static void ice_vfio_migration_uninit(struct ice_vfio_pci_core_device *ice_vdev) +{ + ice_migration_uninit_dev(ice_vdev->pf, ice_vdev->vf_id); +} + +/** + * ice_vfio_pci_disable_fd - Close migration file + * @migf: pointer to ice vfio pci migration file + */ +static void ice_vfio_pci_disable_fd(struct ice_vfio_pci_migration_file *migf) +{ + mutex_lock(&migf->lock); + migf->disabled = true; + migf->total_length = 0; + migf->filp->f_pos = 0; + mutex_unlock(&migf->lock); +} + +/** + * ice_vfio_pci_disable_fds - Close migration files of ice vfio pci device + * @ice_vdev: pointer to ice vfio pci core device structure + */ +static void ice_vfio_pci_disable_fds(struct ice_vfio_pci_core_device *ice_vdev) +{ + if (ice_vdev->resuming_migf) { + ice_vfio_pci_disable_fd(ice_vdev->resuming_migf); + fput(ice_vdev->resuming_migf->filp); + ice_vdev->resuming_migf = NULL; + } + if (ice_vdev->saving_migf) { + ice_vfio_pci_disable_fd(ice_vdev->saving_migf); + fput(ice_vdev->saving_migf->filp); + ice_vdev->saving_migf = NULL; + } +} + +/* + * This function is called in all state_mutex unlock cases to + * handle a 'deferred_reset' if exists. + * @ice_vdev: pointer to ice vfio pci core device structure + */ +static void +ice_vfio_pci_state_mutex_unlock(struct ice_vfio_pci_core_device *ice_vdev) +{ +again: + spin_lock(&ice_vdev->reset_lock); + if (ice_vdev->deferred_reset) { + ice_vdev->deferred_reset = false; + spin_unlock(&ice_vdev->reset_lock); + ice_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING; + ice_vfio_pci_disable_fds(ice_vdev); + goto again; + } + mutex_unlock(&ice_vdev->state_mutex); + spin_unlock(&ice_vdev->reset_lock); +} + +static void ice_vfio_pci_reset_done(struct pci_dev *pdev) +{ + struct ice_vfio_pci_core_device *ice_vdev = + (struct ice_vfio_pci_core_device *)dev_get_drvdata(&pdev->dev); + + /* + * As the higher VFIO layers are holding locks across reset and using + * those same locks with the mm_lock we need to prevent ABBA deadlock + * with the state_mutex and mm_lock. + * In case the state_mutex was taken already we defer the cleanup work + * to the unlock flow of the other running context. 
+ */ + spin_lock(&ice_vdev->reset_lock); + ice_vdev->deferred_reset = true; + if (!mutex_trylock(&ice_vdev->state_mutex)) { + spin_unlock(&ice_vdev->reset_lock); + return; + } + spin_unlock(&ice_vdev->reset_lock); + ice_vfio_pci_state_mutex_unlock(ice_vdev); +} + +/** + * ice_vfio_pci_open_device - Called when a vfio device is probed by VFIO UAPI + * @core_vdev: the vfio device to open + * + * Initialization of the vfio device + * + * Returns 0 on success, negative value on error + */ +static int ice_vfio_pci_open_device(struct vfio_device *core_vdev) +{ + struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev, + struct ice_vfio_pci_core_device, core_device.vdev); + struct vfio_pci_core_device *vdev = &ice_vdev->core_device; + int ret; + + ret = vfio_pci_core_enable(vdev); + if (ret) + return ret; + + ret = ice_vfio_migration_init(ice_vdev); + if (ret) { + vfio_pci_core_disable(vdev); + return ret; + } + ice_vdev->mig_state = VFIO_DEVICE_STATE_RUNNING; + vfio_pci_core_finish_enable(vdev); + + return 0; +} + +/** + * ice_vfio_pci_close_device - Called when a vfio device fd is closed + * @core_vdev: the vfio device to close + */ +static void ice_vfio_pci_close_device(struct vfio_device *core_vdev) +{ + struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev, + struct ice_vfio_pci_core_device, core_device.vdev); + + ice_vfio_pci_disable_fds(ice_vdev); + vfio_pci_core_close_device(core_vdev); + ice_vfio_migration_uninit(ice_vdev); +} + +/** + * ice_vfio_pci_release_file - release ice vfio pci migration file + * @inode: pointer to inode + * @filp: pointer to the file to release + * + * Return 0 for success, negative for error + */ +static int ice_vfio_pci_release_file(struct inode *inode, struct file *filp) +{ + struct ice_vfio_pci_migration_file *migf = filp->private_data; + + ice_vfio_pci_disable_fd(migf); + mutex_destroy(&migf->lock); + kfree(migf); + return 0; +} + +/** + * ice_vfio_pci_save_read - save migration file data to user space + * @filp: pointer to migration file + * @buf: pointer to user space buffer + * @len: data length to be saved + * @pos: should be 0 + * + * Return len of saved data, negative for error + */ +static ssize_t ice_vfio_pci_save_read(struct file *filp, char __user *buf, + size_t len, loff_t *pos) +{ + struct ice_vfio_pci_migration_file *migf = filp->private_data; + loff_t *off = &filp->f_pos; + ssize_t done = 0; + int ret; + + if (pos) + return -ESPIPE; + + mutex_lock(&migf->lock); + if (*off > migf->total_length) { + done = -EINVAL; + goto out_unlock; + } + + if (migf->disabled) { + done = -ENODEV; + goto out_unlock; + } + + len = min_t(size_t, migf->total_length - *off, len); + if (len) { + ret = copy_to_user(buf, migf->mig_data + *off, len); + if (ret) { + done = -EFAULT; + goto out_unlock; + } + *off += len; + done = len; + } +out_unlock: + mutex_unlock(&migf->lock); + return done; +} + +static const struct file_operations ice_vfio_pci_save_fops = { + .owner = THIS_MODULE, + .read = ice_vfio_pci_save_read, + .release = ice_vfio_pci_release_file, + .llseek = no_llseek, +}; + +/** + * ice_vfio_pci_stop_copy - create migration file and save migration state to it + * @ice_vdev: pointer to ice vfio pci core device structure + * + * Return migration file handler + */ +static struct ice_vfio_pci_migration_file * +ice_vfio_pci_stop_copy(struct ice_vfio_pci_core_device *ice_vdev) +{ + struct ice_vfio_pci_migration_file *migf; + int ret; + + migf = kzalloc(sizeof(*migf), GFP_KERNEL); + if (!migf) + return ERR_PTR(-ENOMEM); + + migf->filp = 
anon_inode_getfile("ice_vfio_pci_mig", + &ice_vfio_pci_save_fops, migf, + O_RDONLY); + if (IS_ERR(migf->filp)) { + int err = PTR_ERR(migf->filp); + + kfree(migf); + return ERR_PTR(err); + } + + stream_open(migf->filp->f_inode, migf->filp); + mutex_init(&migf->lock); + + ret = ice_vfio_pci_save_state(ice_vdev, migf); + if (ret) { + fput(migf->filp); + kfree(migf); + return ERR_PTR(ret); + } + + return migf; +} + +/** + * ice_vfio_pci_resume_write- copy migration file data from user space + * @filp: pointer to migration file + * @buf: pointer to user space buffer + * @len: data length to be copied + * @pos: should be 0 + * + * Return len of saved data, negative for error + */ +static ssize_t +ice_vfio_pci_resume_write(struct file *filp, const char __user *buf, + size_t len, loff_t *pos) +{ + struct ice_vfio_pci_migration_file *migf = filp->private_data; + loff_t *off = &filp->f_pos; + loff_t requested_length; + ssize_t done = 0; + int ret; + + if (pos) + return -ESPIPE; + + if (*off < 0 || + check_add_overflow((loff_t)len, *off, &requested_length)) + return -EINVAL; + + if (requested_length > sizeof(migf->mig_data)) + return -ENOMEM; + + mutex_lock(&migf->lock); + if (migf->disabled) { + done = -ENODEV; + goto out_unlock; + } + + ret = copy_from_user(migf->mig_data + *off, buf, len); + if (ret) { + done = -EFAULT; + goto out_unlock; + } + *off += len; + done = len; + migf->total_length += len; +out_unlock: + mutex_unlock(&migf->lock); + return done; +} + +static const struct file_operations ice_vfio_pci_resume_fops = { + .owner = THIS_MODULE, + .write = ice_vfio_pci_resume_write, + .release = ice_vfio_pci_release_file, + .llseek = no_llseek, +}; + +/** + * ice_vfio_pci_resume - create resuming migration file + * @ice_vdev: pointer to ice vfio pci core device structure + * + * Return migration file handler, negative value for failure + */ +static struct ice_vfio_pci_migration_file * +ice_vfio_pci_resume(struct ice_vfio_pci_core_device *ice_vdev) +{ + struct ice_vfio_pci_migration_file *migf; + + migf = kzalloc(sizeof(*migf), GFP_KERNEL); + if (!migf) + return ERR_PTR(-ENOMEM); + + migf->filp = anon_inode_getfile("ice_vfio_pci_mig", + &ice_vfio_pci_resume_fops, migf, + O_WRONLY); + if (IS_ERR(migf->filp)) { + int err = PTR_ERR(migf->filp); + + kfree(migf); + return ERR_PTR(err); + } + + stream_open(migf->filp->f_inode, migf->filp); + mutex_init(&migf->lock); + return migf; +} + +/** + * ice_vfio_pci_step_device_state_locked - process device state change + * @ice_vdev: pointer to ice vfio pci core device structure + * @new: new device state + * @final: final device state + * + * Return migration file handler or NULL for success, negative value for failure + */ +static struct file * +ice_vfio_pci_step_device_state_locked(struct ice_vfio_pci_core_device *ice_vdev, + u32 new, u32 final) +{ + u32 cur = ice_vdev->mig_state; + int ret; + + if (cur == VFIO_DEVICE_STATE_RUNNING && + new == VFIO_DEVICE_STATE_RUNNING_P2P) { + ice_migration_suspend_dev(ice_vdev->pf, ice_vdev->vf_id); + return NULL; + } + + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && + new == VFIO_DEVICE_STATE_STOP) + return NULL; + + if (cur == VFIO_DEVICE_STATE_STOP && + new == VFIO_DEVICE_STATE_STOP_COPY) { + struct ice_vfio_pci_migration_file *migf; + + migf = ice_vfio_pci_stop_copy(ice_vdev); + if (IS_ERR(migf)) + return ERR_CAST(migf); + get_file(migf->filp); + ice_vdev->saving_migf = migf; + return migf->filp; + } + + if (cur == VFIO_DEVICE_STATE_STOP_COPY && + new == VFIO_DEVICE_STATE_STOP) { + ice_vfio_pci_disable_fds(ice_vdev); + 
return NULL; + } + + if (cur == VFIO_DEVICE_STATE_STOP && + new == VFIO_DEVICE_STATE_RESUMING) { + struct ice_vfio_pci_migration_file *migf; + + migf = ice_vfio_pci_resume(ice_vdev); + if (IS_ERR(migf)) + return ERR_CAST(migf); + get_file(migf->filp); + ice_vdev->resuming_migf = migf; + return migf->filp; + } + + if (cur == VFIO_DEVICE_STATE_RESUMING && new == VFIO_DEVICE_STATE_STOP) + return NULL; + + if (cur == VFIO_DEVICE_STATE_STOP && + new == VFIO_DEVICE_STATE_RUNNING_P2P) { + ret = ice_vfio_pci_load_state(ice_vdev); + if (ret) + return ERR_PTR(ret); + ice_vfio_pci_disable_fds(ice_vdev); + return NULL; + } + + if (cur == VFIO_DEVICE_STATE_RUNNING_P2P && + new == VFIO_DEVICE_STATE_RUNNING) + return NULL; + + /* + * vfio_mig_get_next_state() does not use arcs other than the above + */ + WARN_ON(true); + return ERR_PTR(-EINVAL); +} + +/** + * ice_vfio_pci_set_device_state - Config device state + * @vdev: pointer to vfio pci device + * @new_state: device state + * + * Return 0 for success, negative value for failure. + */ +static struct file * +ice_vfio_pci_set_device_state(struct vfio_device *vdev, + enum vfio_device_mig_state new_state) +{ + struct ice_vfio_pci_core_device *ice_vdev = + container_of(vdev, + struct ice_vfio_pci_core_device, + core_device.vdev); + enum vfio_device_mig_state next_state; + struct file *res = NULL; + int ret; + + mutex_lock(&ice_vdev->state_mutex); + while (new_state != ice_vdev->mig_state) { + ret = vfio_mig_get_next_state(vdev, ice_vdev->mig_state, + new_state, &next_state); + if (ret) { + res = ERR_PTR(ret); + break; + } + res = ice_vfio_pci_step_device_state_locked(ice_vdev, + next_state, + new_state); + if (IS_ERR(res)) + break; + ice_vdev->mig_state = next_state; + if (WARN_ON(res && new_state != ice_vdev->mig_state)) { + fput(res); + res = ERR_PTR(-EINVAL); + break; + } + } + ice_vfio_pci_state_mutex_unlock(ice_vdev); + return res; +} + +/** + * ice_vfio_pci_get_device_state - get device state + * @vdev: pointer to vfio pci device + * @curr_state: device state + * + * Return 0 for success + */ +static int ice_vfio_pci_get_device_state(struct vfio_device *vdev, + enum vfio_device_mig_state *curr_state) +{ + struct ice_vfio_pci_core_device *ice_vdev = + container_of(vdev, + struct ice_vfio_pci_core_device, + core_device.vdev); + mutex_lock(&ice_vdev->state_mutex); + *curr_state = ice_vdev->mig_state; + ice_vfio_pci_state_mutex_unlock(ice_vdev); + return 0; +} + +/** + * ice_vfio_pci_get_data_size - get migration data size + * @vdev: pointer to vfio pci device + * @stop_copy_length: migration data size + * + * Return 0 for success + */ +static int +ice_vfio_pci_get_data_size(struct vfio_device *vdev, + unsigned long *stop_copy_length) +{ + *stop_copy_length = SZ_128K; + return 0; +} + +static const struct vfio_migration_ops ice_vfio_pci_migrn_state_ops = { + .migration_set_state = ice_vfio_pci_set_device_state, + .migration_get_state = ice_vfio_pci_get_device_state, + .migration_get_data_size = ice_vfio_pci_get_data_size, +}; + +/** + * ice_vfio_pci_core_init_dev - initialize vfio device + * @core_vdev: pointer to vfio device + * + * Return 0 for success + */ +static int ice_vfio_pci_core_init_dev(struct vfio_device *core_vdev) +{ + struct ice_vfio_pci_core_device *ice_vdev = container_of(core_vdev, + struct ice_vfio_pci_core_device, core_device.vdev); + + mutex_init(&ice_vdev->state_mutex); + spin_lock_init(&ice_vdev->reset_lock); + + core_vdev->migration_flags = + VFIO_MIGRATION_STOP_COPY | VFIO_MIGRATION_P2P; + core_vdev->mig_ops = 
&ice_vfio_pci_migrn_state_ops; + + return vfio_pci_core_init_dev(core_vdev); +} + +static const struct vfio_device_ops ice_vfio_pci_ops = { + .name = "ice-vfio-pci", + .init = ice_vfio_pci_core_init_dev, + .release = vfio_pci_core_release_dev, + .open_device = ice_vfio_pci_open_device, + .close_device = ice_vfio_pci_close_device, + .device_feature = vfio_pci_core_ioctl_feature, + .read = vfio_pci_core_read, + .write = vfio_pci_core_write, + .ioctl = vfio_pci_core_ioctl, + .mmap = vfio_pci_core_mmap, + .request = vfio_pci_core_request, + .match = vfio_pci_core_match, + .bind_iommufd = vfio_iommufd_physical_bind, + .unbind_iommufd = vfio_iommufd_physical_unbind, + .attach_ioas = vfio_iommufd_physical_attach_ioas, + .detach_ioas = vfio_iommufd_physical_detach_ioas, +}; + +/** + * ice_vfio_pci_probe - Device initialization routine + * @pdev: PCI device information struct + * @id: entry in ice_vfio_pci_table + * + * Returns 0 on success, negative on failure + */ +static int +ice_vfio_pci_probe(struct pci_dev *pdev, const struct pci_device_id *id) +{ + struct ice_vfio_pci_core_device *ice_vdev; + int ret; + + ice_vdev = vfio_alloc_device(ice_vfio_pci_core_device, core_device.vdev, + &pdev->dev, &ice_vfio_pci_ops); + if (!ice_vdev) + return -ENOMEM; + + dev_set_drvdata(&pdev->dev, &ice_vdev->core_device); + + ret = vfio_pci_core_register_device(&ice_vdev->core_device); + if (ret) + goto out_free; + + return 0; + +out_free: + vfio_put_device(&ice_vdev->core_device.vdev); + return ret; +} + +/** + * ice_vfio_pci_remove - Device removal routine + * @pdev: PCI device information struct + */ +static void ice_vfio_pci_remove(struct pci_dev *pdev) +{ + struct ice_vfio_pci_core_device *ice_vdev = + (struct ice_vfio_pci_core_device *)dev_get_drvdata(&pdev->dev); + + vfio_pci_core_unregister_device(&ice_vdev->core_device); + vfio_put_device(&ice_vdev->core_device.vdev); +} + +/* ice_pci_tbl - PCI Device ID Table + * + * Wildcard entries (PCI_ANY_ID) should come last + * Last entry must be all 0s + * + * { Vendor ID, Device ID, SubVendor ID, SubDevice ID, + * Class, Class Mask, private data (not used) } + */ +static const struct pci_device_id ice_vfio_pci_table[] = { + { PCI_DRIVER_OVERRIDE_DEVICE_VFIO(PCI_VENDOR_ID_INTEL, 0x1889) }, + {} +}; +MODULE_DEVICE_TABLE(pci, ice_vfio_pci_table); + +static const struct pci_error_handlers ice_vfio_pci_core_err_handlers = { + .reset_done = ice_vfio_pci_reset_done, + .error_detected = vfio_pci_core_aer_err_detected, +}; + +static struct pci_driver ice_vfio_pci_driver = { + .name = "ice-vfio-pci", + .id_table = ice_vfio_pci_table, + .probe = ice_vfio_pci_probe, + .remove = ice_vfio_pci_remove, + .err_handler = &ice_vfio_pci_core_err_handlers, + .driver_managed_dma = true, +}; + +module_pci_driver(ice_vfio_pci_driver); + +MODULE_LICENSE("GPL"); +MODULE_AUTHOR("Intel Corporation, "); +MODULE_DESCRIPTION(DRIVER_DESC);
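
For completeness, this is roughly how a userspace VMM drives the migration ops registered above through the VFIO migration v2 uAPI; a hedged sketch against include/uapi/linux/vfio.h, with device fd setup and error handling omitted:

#include <sys/ioctl.h>
#include <linux/vfio.h>

/* Illustrative helper: ask the variant driver to move the device to
 * new_state. Returns the data_fd (or -1) that carries the 128K state
 * blob on the STOP_COPY and RESUMING arcs, i.e. what the kernel side
 * serves through ice_vfio_pci_save_read()/ice_vfio_pci_resume_write().
 */
static int set_mig_state(int device_fd, enum vfio_device_mig_state new_state)
{
	char buf[sizeof(struct vfio_device_feature) +
		 sizeof(struct vfio_device_feature_mig_state)]
		__attribute__((aligned(8))) = { 0 };
	struct vfio_device_feature *feature = (void *)buf;
	struct vfio_device_feature_mig_state *mig = (void *)feature->data;

	feature->argsz = sizeof(buf);
	feature->flags = VFIO_DEVICE_FEATURE_SET |
			 VFIO_DEVICE_FEATURE_MIG_DEVICE_STATE;
	mig->device_state = new_state;

	if (ioctl(device_fd, VFIO_DEVICE_FEATURE, feature))
		return -1;

	return mig->data_fd;
}

On the save side the expected arc sequence is RUNNING -> RUNNING_P2P -> STOP -> STOP_COPY (then read the returned data_fd); on the restore side it is STOP -> RESUMING (write the blob) -> STOP -> RUNNING_P2P -> RUNNING, matching the arcs handled in ice_vfio_pci_step_device_state_locked().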