From patchwork Mon Jul 5 16:43:35 2021
X-Patchwork-Submitter: Maciej Fijalkowski
X-Patchwork-Id: 12359445
X-Patchwork-Delegate: kuba@kernel.org
From: Maciej Fijalkowski
To: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
    anthony.l.nguyen@intel.com, kuba@kernel.org, bjorn@kernel.org,
    magnus.karlsson@intel.com, joamaki@gmail.com, toke@redhat.com,
    Maciej Fijalkowski
Subject: [PATCH v2 intel-next 1/4] ice: unify xdp_rings accesses
Date: Mon, 5 Jul 2021 18:43:35 +0200
Message-Id: <20210705164338.58313-2-maciej.fijalkowski@intel.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
References: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
X-Mailing-List: bpf@vger.kernel.org

There has been a long-standing issue of improper xdp_rings indexing for
XDP_TX and XDP_REDIRECT actions. Given that rx_ring->q_index is currently
mixed with smp_processor_id() when picking the XDP Tx ring, there could be
a situation where Tx descriptors are produced onto an XDP Tx ring but the
tail is never bumped - for example when a particular queue id is pinned to
a non-matching IRQ line.

Address this problem by ignoring the user ring count setting and always
initializing the xdp_rings array to num_possible_cpus() entries. Then,
always use smp_processor_id() as the index into the xdp_rings array. This
provides serialization, since at any given time only a single softirq can
run on a particular CPU.
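
In other words, the XDP Tx ring is derived purely from the executing CPU.
A minimal sketch of the idea (the helper name is illustrative and not part
of this patch):

	/* Illustrative only: with vsi->xdp_rings sized to
	 * num_possible_cpus(), smp_processor_id() is always a valid index
	 * and also acts as the serialization point, because a given CPU
	 * runs at most one softirq at a time.
	 */
	static struct ice_ring *ice_xdp_ring_for_cpu(struct ice_vsi *vsi)
	{
		return vsi->xdp_rings[smp_processor_id()];
	}
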
Signed-off-by: Maciej Fijalkowski
---
 drivers/net/ethernet/intel/ice/ice_lib.c      | 2 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 2 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index dde9802c6c72..20d02e9d6635 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3154,7 +3154,7 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
 	ice_vsi_map_rings_to_vectors(vsi);
 
 	if (ice_is_xdp_ena_vsi(vsi)) {
-		vsi->num_xdp_txq = vsi->alloc_rxq;
+		vsi->num_xdp_txq = num_possible_cpus();
 		ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog);
 		if (ret)
 			goto err_vectors;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index ef8d1815af56..f26db305b797 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -2626,7 +2626,7 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	if (!ice_is_xdp_ena_vsi(vsi) && prog) {
-		vsi->num_xdp_txq = vsi->alloc_rxq;
+		vsi->num_xdp_txq = num_possible_cpus();
 		xdp_ring_err = ice_prepare_xdp_rings(vsi, prog);
 		if (xdp_ring_err)
 			NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed");
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 171397dcf00a..4fd01a153d35 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -295,7 +295,7 @@ void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res)
 	if (xdp_res & ICE_XDP_TX) {
 		struct ice_ring *xdp_ring =
-			rx_ring->vsi->xdp_rings[rx_ring->q_index];
+			rx_ring->vsi->xdp_rings[smp_processor_id()];
 
 		ice_xdp_ring_update_tail(xdp_ring);
 	}

From patchwork Mon Jul 5 16:43:36 2021
X-Patchwork-Submitter: Maciej Fijalkowski
X-Patchwork-Id: 12359447
X-Patchwork-Delegate: kuba@kernel.org
From: Maciej Fijalkowski
To: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
    anthony.l.nguyen@intel.com, kuba@kernel.org, bjorn@kernel.org,
    magnus.karlsson@intel.com, joamaki@gmail.com, toke@redhat.com,
    Maciej Fijalkowski
Subject: [PATCH v2 intel-next 2/4] ice: optimize XDP_TX descriptor processing
Date: Mon, 5 Jul 2021 18:43:36 +0200
Message-Id: <20210705164338.58313-3-maciej.fijalkowski@intel.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
References: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
X-Mailing-List: bpf@vger.kernel.org

Improve the performance of XDP_TX by reducing the number of times the RS
bit is set. Instead of setting it for each packet, set it at the end of
the batch, right before the tail bump. This reduces PCIe traffic, as HW
will issue fewer writebacks.

For that purpose, introduce a next_rs_idx field in struct ice_ring so that
the descriptor carrying the RS bit can be tracked. This allows calculating
the number of descriptors that are ready to be cleaned; the DD bit only
needs to be checked on the descriptor that next_rs_idx points to.

This logic cannot be combined into the existing ice_clean_tx_irq()
routine, so introduce a separate function dedicated to cleaning XDP rings
and drop the XDP-specific bits from the existing one.

Signed-off-by: Maciej Fijalkowski
---
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 79 ++++++++++++++++---
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  5 +-
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 13 ++-
 3 files changed, 76 insertions(+), 21 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 6ee8e0032d52..23b97c9579fb 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -194,6 +194,63 @@ void ice_free_tx_ring(struct ice_ring *tx_ring)
 	}
 }
 
+/**
+ * ice_clean_xdp_irq - Reclaim resources after transmit completes on XDP ring
+ * @xdp_ring: XDP ring to clean
+ *
+ * Returns true if there's any budget left (e.g. the clean is finished)
+ */
+static bool ice_clean_xdp_irq(struct ice_ring *xdp_ring)
+{
+	unsigned int total_bytes = 0, total_pkts = 0;
+	u16 next_rs_idx = xdp_ring->next_rs_idx;
+	u16 ntc = xdp_ring->next_to_clean;
+	struct ice_tx_desc *next_rs_desc;
+	struct ice_tx_buf *tx_buf;
+	u16 frames_ready = 0;
+	u16 frames_to_clean;
+	int i;
+
+	next_rs_desc = ICE_TX_DESC(xdp_ring, next_rs_idx);
+	if (next_rs_desc->cmd_type_offset_bsz &
+	    cpu_to_le64(ICE_TX_DESC_DTYPE_DESC_DONE)) {
+		if (next_rs_idx >= ntc)
+			frames_ready = next_rs_idx - ntc;
+		else
+			frames_ready = next_rs_idx + xdp_ring->count - ntc;
+	}
+
+	if (!frames_ready)
+		return true;
+
+	frames_to_clean = min_t(u16, frames_ready, ICE_DFLT_IRQ_WORK);
+
+	for (i = 0; i < frames_to_clean; i++) {
+		tx_buf = &xdp_ring->tx_buf[ntc];
+
+		total_bytes += tx_buf->bytecount;
+		/* normally tx_buf->gso_segs was taken but at this point
+		 * it's always 1 for us
+		 */
+		total_pkts++;
+
+		page_frag_free(tx_buf->raw_buf);
+		dma_unmap_single(xdp_ring->dev, dma_unmap_addr(tx_buf, dma),
+				 dma_unmap_len(tx_buf, len), DMA_TO_DEVICE);
+		dma_unmap_len_set(tx_buf, len, 0);
+		tx_buf->raw_buf = NULL;
+
+		ntc++;
+		if (ntc >= xdp_ring->count)
+			ntc = 0;
+	}
+
+	xdp_ring->next_to_clean = ntc;
+	ice_update_tx_ring_stats(xdp_ring, total_pkts, total_bytes);
+
+	return frames_ready <= frames_to_clean;
+}
+
 /**
  * ice_clean_tx_irq - Reclaim resources after transmit completes
  * @tx_ring: Tx ring to clean
@@ -238,11 +295,8 @@ static bool ice_clean_tx_irq(struct ice_ring *tx_ring, int napi_budget)
 		total_bytes += tx_buf->bytecount;
 		total_pkts += tx_buf->gso_segs;
 
-		if (ice_ring_is_xdp(tx_ring))
-			page_frag_free(tx_buf->raw_buf);
-		else
-			/* free the skb */
-			napi_consume_skb(tx_buf->skb, napi_budget);
+		/* free the skb */
+		napi_consume_skb(tx_buf->skb, napi_budget);
 
 		/* unmap skb header data */
 		dma_unmap_single(tx_ring->dev,
@@ -297,10 +351,6 @@ static bool ice_clean_tx_irq(struct ice_ring *tx_ring, int napi_budget)
 	tx_ring->next_to_clean = i;
 
 	ice_update_tx_ring_stats(tx_ring, total_pkts, total_bytes);
-
-	if (ice_ring_is_xdp(tx_ring))
-		return !!budget;
-
 	netdev_tx_completed_queue(txring_txq(tx_ring), total_pkts,
 				  total_bytes);
@@ -1396,9 +1446,14 @@ int ice_napi_poll(struct napi_struct *napi, int budget)
 	 * budget and be more aggressive about cleaning up the Tx descriptors.
 	 */
 	ice_for_each_ring(ring, q_vector->tx) {
-		bool wd = ring->xsk_pool ?
-			  ice_clean_tx_irq_zc(ring, budget) :
-			  ice_clean_tx_irq(ring, budget);
+		bool wd;
+
+		if (ring->xsk_pool)
+			wd = ice_clean_tx_irq_zc(ring, budget);
+		else if (ice_ring_is_xdp(ring))
+			wd = ice_clean_xdp_irq(ring);
+		else
+			wd = ice_clean_tx_irq(ring, budget);
 
 		if (!wd)
 			clean_complete = false;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index 1e46e80f3d6f..b43d471ce05d 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -283,7 +283,10 @@ struct ice_ring {
 	/* used in interrupt processing */
 	u16 next_to_use;
 	u16 next_to_clean;
-	u16 next_to_alloc;
+	union {
+		u16 next_to_alloc;
+		u16 next_rs_idx;
+	};
 
 	/* stats structs */
 	struct ice_q_stats stats;
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 4fd01a153d35..0b3d51c9869b 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -244,19 +244,13 @@ int ice_xmit_xdp_ring(void *data, u16 size, struct ice_ring *xdp_ring)
 	tx_desc = ICE_TX_DESC(xdp_ring, i);
 	tx_desc->buf_addr = cpu_to_le64(dma);
-	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TXD_LAST_DESC_CMD, 0,
+	tx_desc->cmd_type_offset_bsz = ice_build_ctob(ICE_TX_DESC_CMD_EOP, 0,
 						      size, 0);
 
-	/* Make certain all of the status bits have been updated
-	 * before next_to_watch is written.
-	 */
-	smp_wmb();
-
+	xdp_ring->next_rs_idx = i;
 	i++;
 	if (i == xdp_ring->count)
 		i = 0;
 
-	tx_buf->next_to_watch = tx_desc;
-
 	xdp_ring->next_to_use = i;
 	return ICE_XDP_TX;
@@ -296,7 +290,10 @@ void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res)
 	if (xdp_res & ICE_XDP_TX) {
 		struct ice_ring *xdp_ring =
 			rx_ring->vsi->xdp_rings[smp_processor_id()];
+		struct ice_tx_desc *next_rs_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs_idx);
 
+		next_rs_desc->cmd_type_offset_bsz |=
+			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
 		ice_xdp_ring_update_tail(xdp_ring);
 	}
 }

From patchwork Mon Jul 5 16:43:37 2021
X-Patchwork-Submitter: Maciej Fijalkowski
X-Patchwork-Id: 12359449
X-Patchwork-Delegate: kuba@kernel.org
From: Maciej Fijalkowski
To: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
    anthony.l.nguyen@intel.com, kuba@kernel.org, bjorn@kernel.org,
    magnus.karlsson@intel.com, joamaki@gmail.com, toke@redhat.com,
    Maciej Fijalkowski
Subject: [PATCH v2 intel-next 3/4] ice: do not create xdp_frame on XDP_TX
Date: Mon, 5 Jul 2021 18:43:37 +0200
Message-Id: <20210705164338.58313-4-maciej.fijalkowski@intel.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
References: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
X-Mailing-List: bpf@vger.kernel.org

xdp_frame is not needed for the XDP_TX data path in the ice driver. For
this data path, cleaning of the sent descriptors never happens outside of
the driver, so the information about the underlying memory model that
xdp_frame carries is never used. Therefore, this conversion can simply be
dropped, which relieves the CPU a bit.

Call ice_xmit_xdp_ring() directly, instead of ice_xmit_xdp_buff(), for the
XDP_TX action returned from the BPF program.

Signed-off-by: Maciej Fijalkowski
---
 drivers/net/ethernet/intel/ice/ice_txrx.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index 23b97c9579fb..fef1f74562e5 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -586,7 +586,7 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
 		return ICE_XDP_PASS;
 	case XDP_TX:
 		xdp_ring = rx_ring->vsi->xdp_rings[smp_processor_id()];
-		result = ice_xmit_xdp_buff(xdp, xdp_ring);
+		result = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
 		if (result == ICE_XDP_CONSUMED)
 			goto out_failure;
 		return result;

From patchwork Mon Jul 5 16:43:38 2021
X-Patchwork-Submitter: Maciej Fijalkowski
X-Patchwork-Id: 12359451
X-Patchwork-Delegate: kuba@kernel.org
From: Maciej Fijalkowski
To: intel-wired-lan@lists.osuosl.org
Cc: netdev@vger.kernel.org, bpf@vger.kernel.org, davem@davemloft.net,
    anthony.l.nguyen@intel.com, kuba@kernel.org, bjorn@kernel.org,
    magnus.karlsson@intel.com, joamaki@gmail.com, toke@redhat.com,
    Maciej Fijalkowski
Subject: [PATCH v2 intel-next 4/4] ice: introduce XDP_TX fallback path
Date: Mon, 5 Jul 2021 18:43:38 +0200
Message-Id: <20210705164338.58313-5-maciej.fijalkowski@intel.com>
X-Mailer: git-send-email 2.20.1
In-Reply-To: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
References: <20210705164338.58313-1-maciej.fijalkowski@intel.com>
X-Mailing-List: bpf@vger.kernel.org

Under rare circumstances, the requirement of one XDP Tx queue per CPU
cannot be fulfilled and some of the Tx resources have to be shared between
CPUs. This creates a need to place accesses to the xdp_rings array inside
a critical section protected by a spinlock. These accesses happen to be in
the hot path, so introduce a static branch that is enabled from the
control plane when the driver cannot provide a Tx queue dedicated to XDP
on each CPU.

The design that has been picked is to allow any number of XDP Tx queues
that is at least half the CPU count of the platform. For a lower number,
the driver bails out and reports to the user that there were not enough
Tx resources to configure XDP. Ring sharing is signalled by enabling the
static branch, which in turn indicates that the lock protecting xdp_ring
accesses needs to be taken in the hot path. The static-branch approach has
no impact on the performance of the non-fallback path.

The static branch acts as a global driver switch, meaning that if one PF
runs out of Tx resources, the other PFs serviced by the ice driver will
suffer as well. However, given that the HW handled by the ice driver has
1024 Tx queues per PF, this is currently an unlikely scenario.

One thing to note about the fallback path is that one NAPI instance can
end up producing descriptors onto an xdp_ring that belongs to a different
NAPI instance, which means that an explicit software interrupt needs to be
raised so that the remote NAPI is scheduled and the descriptors on that
xdp_ring are cleaned.
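
The hot-path pattern that the fallback introduces can be summarized by the
sketch below; the two helper names are illustrative only, as the actual
changes are open-coded in ice_run_xdp(), ice_xdp_xmit() and
ice_finalize_xdp_rx():

	/* Illustrative sketch of the fallback locking scheme. When the
	 * static branch is off (one XDP ring per CPU), this is a plain
	 * array lookup and no lock is taken.
	 */
	static struct ice_ring *ice_xdp_ring_get(struct ice_vsi *vsi)
	{
		u16 idx = smp_processor_id();

		if (static_branch_unlikely(&ice_xdp_locking_key)) {
			idx %= vsi->num_xdp_txq;	/* rings are shared */
			spin_lock(&vsi->xdp_rings[idx]->tx_lock);
		}
		return vsi->xdp_rings[idx];
	}

	static void ice_xdp_ring_put(struct ice_vsi *vsi, struct ice_ring *xdp_ring)
	{
		if (static_branch_unlikely(&ice_xdp_locking_key)) {
			spin_unlock(&xdp_ring->tx_lock);
			/* the ring may belong to another NAPI instance;
			 * kick it so the produced descriptors get cleaned
			 */
			ice_trigger_sw_intr(&vsi->back->hw, xdp_ring->q_vector);
		}
	}
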
Signed-off-by: Maciej Fijalkowski
---
 drivers/net/ethernet/intel/ice/ice.h          |  3 ++
 drivers/net/ethernet/intel/ice/ice_lib.c      |  4 +-
 drivers/net/ethernet/intel/ice/ice_main.c     | 45 +++++++++++++++++--
 drivers/net/ethernet/intel/ice/ice_txrx.c     | 44 +++++++++++++++---
 drivers/net/ethernet/intel/ice/ice_txrx.h     |  1 +
 drivers/net/ethernet/intel/ice/ice_txrx_lib.c | 22 ++++++---
 drivers/net/ethernet/intel/ice/ice_txrx_lib.h | 16 +++++++
 7 files changed, 117 insertions(+), 18 deletions(-)

diff --git a/drivers/net/ethernet/intel/ice/ice.h b/drivers/net/ethernet/intel/ice/ice.h
index a450343fbb92..3ddf079a6bf3 100644
--- a/drivers/net/ethernet/intel/ice/ice.h
+++ b/drivers/net/ethernet/intel/ice/ice.h
@@ -158,6 +158,8 @@
 #define ice_pf_to_dev(pf) (&((pf)->pdev->dev))
 
+DECLARE_STATIC_KEY_FALSE(ice_xdp_locking_key);
+
 struct ice_txq_meta {
 	u32 q_teid;	/* Tx-scheduler element identifier */
 	u16 q_id;	/* Entry in VSI's txq_map bitmap */
@@ -647,6 +649,7 @@ int ice_up(struct ice_vsi *vsi);
 int ice_down(struct ice_vsi *vsi);
 int ice_vsi_cfg(struct ice_vsi *vsi);
 struct ice_vsi *ice_lb_vsi_setup(struct ice_pf *pf, struct ice_port_info *pi);
+int ice_vsi_determine_xdp_res(struct ice_vsi *vsi);
 int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog);
 int ice_destroy_xdp_rings(struct ice_vsi *vsi);
 int
diff --git a/drivers/net/ethernet/intel/ice/ice_lib.c b/drivers/net/ethernet/intel/ice/ice_lib.c
index 20d02e9d6635..9aa8e711aaef 100644
--- a/drivers/net/ethernet/intel/ice/ice_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_lib.c
@@ -3154,7 +3154,9 @@ int ice_vsi_rebuild(struct ice_vsi *vsi, bool init_vsi)
 	ice_vsi_map_rings_to_vectors(vsi);
 
 	if (ice_is_xdp_ena_vsi(vsi)) {
-		vsi->num_xdp_txq = num_possible_cpus();
+		ret = ice_vsi_determine_xdp_res(vsi);
+		if (ret)
+			goto err_vectors;
 		ret = ice_prepare_xdp_rings(vsi, vsi->xdp_prog);
 		if (ret)
 			goto err_vectors;
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index f26db305b797..151868c7c485 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -42,6 +42,8 @@ MODULE_PARM_DESC(debug, "netif level (0=none,...,16=all)");
 #endif /* !CONFIG_DYNAMIC_DEBUG */
 
 static DEFINE_IDA(ice_aux_ida);
+DEFINE_STATIC_KEY_FALSE(ice_xdp_locking_key);
+EXPORT_SYMBOL(ice_xdp_locking_key);
 
 static struct workqueue_struct *ice_wq;
 static const struct net_device_ops ice_netdev_safe_mode_ops;
@@ -2382,6 +2384,7 @@ static int ice_xdp_alloc_setup_rings(struct ice_vsi *vsi)
 			goto free_xdp_rings;
 		ice_set_ring_xdp(xdp_ring);
 		xdp_ring->xsk_pool = ice_xsk_pool(xdp_ring);
+		spin_lock_init(&xdp_ring->tx_lock);
 	}
 
 	return 0;
@@ -2447,6 +2450,10 @@ int ice_prepare_xdp_rings(struct ice_vsi *vsi, struct bpf_prog *prog)
 	if (__ice_vsi_get_qs(&xdp_qs_cfg))
 		goto err_map_xdp;
 
+	if (static_key_enabled(&ice_xdp_locking_key))
+		netdev_warn(vsi->netdev,
+			    "Could not allocate one XDP Tx ring per CPU, XDP_TX/XDP_REDIRECT actions will be slower\n");
+
 	if (ice_xdp_alloc_setup_rings(vsi))
 		goto clear_xdp_rings;
@@ -2563,6 +2570,9 @@ int ice_destroy_xdp_rings(struct ice_vsi *vsi)
 	devm_kfree(ice_pf_to_dev(pf), vsi->xdp_rings);
 	vsi->xdp_rings = NULL;
 
+	if (static_key_enabled(&ice_xdp_locking_key))
+		static_branch_dec(&ice_xdp_locking_key);
+
 	if (ice_is_reset_in_progress(pf->state) || !vsi->q_vectors[0])
 		return 0;
@@ -2597,6 +2607,29 @@ static void ice_vsi_rx_napi_schedule(struct ice_vsi *vsi)
 	}
 }
 
+/**
+ * ice_vsi_determine_xdp_res - figure out how many Tx qs can XDP have
+ * @vsi: VSI to determine the count of XDP Tx qs
+ *
+ * returns 0 if Tx qs count is higher than at least half of CPU count,
+ * -ENOMEM otherwise
+ */
+int ice_vsi_determine_xdp_res(struct ice_vsi *vsi)
+{
+	u16 avail = ice_get_avail_txq_count(vsi->back);
+	u16 cpus = num_possible_cpus();
+
+	if (avail < cpus / 2)
+		return -ENOMEM;
+
+	vsi->num_xdp_txq = min_t(u16, avail, cpus);
+
+	if (vsi->num_xdp_txq < cpus)
+		static_branch_inc(&ice_xdp_locking_key);
+
+	return 0;
+}
+
 /**
  * ice_xdp_setup_prog - Add or remove XDP eBPF program
  * @vsi: VSI to setup XDP for
@@ -2626,10 +2659,14 @@ ice_xdp_setup_prog(struct ice_vsi *vsi, struct bpf_prog *prog,
 	}
 
 	if (!ice_is_xdp_ena_vsi(vsi) && prog) {
-		vsi->num_xdp_txq = num_possible_cpus();
-		xdp_ring_err = ice_prepare_xdp_rings(vsi, prog);
-		if (xdp_ring_err)
-			NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed");
+		xdp_ring_err = ice_vsi_determine_xdp_res(vsi);
+		if (xdp_ring_err) {
+			NL_SET_ERR_MSG_MOD(extack, "Not enough Tx resources for XDP");
+		} else {
+			xdp_ring_err = ice_prepare_xdp_rings(vsi, prog);
+			if (xdp_ring_err)
+				NL_SET_ERR_MSG_MOD(extack, "Setting up XDP Tx resources failed");
+		}
 	} else if (ice_is_xdp_ena_vsi(vsi) && !prog) {
 		xdp_ring_err = ice_destroy_xdp_rings(vsi);
 		if (xdp_ring_err)
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.c b/drivers/net/ethernet/intel/ice/ice_txrx.c
index fef1f74562e5..97d205fb19da 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.c
@@ -577,19 +577,35 @@ ice_run_xdp(struct ice_ring *rx_ring, struct xdp_buff *xdp,
 	    struct bpf_prog *xdp_prog)
 {
 	struct ice_ring *xdp_ring;
-	int err, result;
+	int err;
 	u32 act;
+	u16 idx;
 
 	act = bpf_prog_run_xdp(xdp_prog, xdp);
 	switch (act) {
 	case XDP_PASS:
 		return ICE_XDP_PASS;
 	case XDP_TX:
-		xdp_ring = rx_ring->vsi->xdp_rings[smp_processor_id()];
-		result = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
-		if (result == ICE_XDP_CONSUMED)
+		idx = smp_processor_id();
+		if (static_branch_unlikely(&ice_xdp_locking_key)) {
+			idx %= rx_ring->vsi->num_xdp_txq;
+			xdp_ring = rx_ring->vsi->xdp_rings[idx];
+			spin_lock(&xdp_ring->tx_lock);
+		} else {
+			xdp_ring = rx_ring->vsi->xdp_rings[idx];
+		}
+		err = ice_xmit_xdp_ring(xdp->data, xdp->data_end - xdp->data, xdp_ring);
+		if (static_branch_unlikely(&ice_xdp_locking_key)) {
+			spin_unlock(&xdp_ring->tx_lock);
+			if (err == ICE_XDP_CONSUMED) {
+				/* kick the remote NAPI that xdp_ring belongs to */
+				ice_trigger_sw_intr(&xdp_ring->vsi->back->hw, xdp_ring->q_vector);
+				goto out_failure;
+			}
+		}
+		if (err == ICE_XDP_CONSUMED)
 			goto out_failure;
-		return result;
+		return err;
 	case XDP_REDIRECT:
 		err = xdp_do_redirect(rx_ring->netdev, xdp, xdp_prog);
 		if (err)
@@ -638,7 +654,14 @@ ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 	if (unlikely(flags & ~XDP_XMIT_FLAGS_MASK))
 		return -EINVAL;
 
-	xdp_ring = vsi->xdp_rings[queue_index];
+	if (static_branch_unlikely(&ice_xdp_locking_key)) {
+		queue_index %= vsi->num_xdp_txq;
+		xdp_ring = vsi->xdp_rings[queue_index];
+		spin_lock(&xdp_ring->tx_lock);
+	} else {
+		xdp_ring = vsi->xdp_rings[queue_index];
+	}
+
 	for (i = 0; i < n; i++) {
 		struct xdp_frame *xdpf = frames[i];
 		int err;
@@ -649,8 +672,15 @@ ice_xdp_xmit(struct net_device *dev, int n, struct xdp_frame **frames,
 		nxmit++;
 	}
 
-	if (unlikely(flags & XDP_XMIT_FLUSH))
+	if (unlikely(flags & XDP_XMIT_FLUSH)) {
+		ice_xdp_ring_set_rs(xdp_ring);
 		ice_xdp_ring_update_tail(xdp_ring);
+	}
+
+	if (static_branch_unlikely(&ice_xdp_locking_key)) {
+		spin_unlock(&xdp_ring->tx_lock);
+		ice_trigger_sw_intr(&vsi->back->hw, xdp_ring->q_vector);
+	}
 
 	return nxmit;
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx.h b/drivers/net/ethernet/intel/ice/ice_txrx.h
index b43d471ce05d..63538cb3b632 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx.h
@@ -303,6 +303,7 @@ struct ice_ring {
 	u16 rx_offset;
 	/* CL3 - 3rd cacheline starts here */
 	struct xdp_rxq_info xdp_rxq;
+	spinlock_t tx_lock;
 	struct sk_buff *skb;
 	/* CLX - the below items are only accessed infrequently and should be
 	 * in their own cache line if possible
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
index 0b3d51c9869b..164008e26869 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.c
@@ -2,6 +2,7 @@
 /* Copyright (c) 2019, Intel Corporation. */
 
 #include "ice_txrx_lib.h"
+#include "ice_base.h"
 
 /**
  * ice_release_rx_desc - Store the new tail and head values
@@ -288,12 +289,21 @@ void ice_finalize_xdp_rx(struct ice_ring *rx_ring, unsigned int xdp_res)
 		xdp_do_flush_map();
 
 	if (xdp_res & ICE_XDP_TX) {
-		struct ice_ring *xdp_ring =
-			rx_ring->vsi->xdp_rings[smp_processor_id()];
-		struct ice_tx_desc *next_rs_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs_idx);
-
-		next_rs_desc->cmd_type_offset_bsz |=
-			cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
+		u16 idx = smp_processor_id();
+		struct ice_ring *xdp_ring;
+
+		if (static_branch_unlikely(&ice_xdp_locking_key)) {
+			idx %= rx_ring->vsi->num_xdp_txq;
+			xdp_ring = rx_ring->vsi->xdp_rings[idx];
+			spin_lock(&xdp_ring->tx_lock);
+		} else {
+			xdp_ring = rx_ring->vsi->xdp_rings[idx];
+		}
+		ice_xdp_ring_set_rs(xdp_ring);
 		ice_xdp_ring_update_tail(xdp_ring);
+		if (static_branch_unlikely(&ice_xdp_locking_key)) {
+			spin_unlock(&xdp_ring->tx_lock);
+			ice_trigger_sw_intr(&rx_ring->vsi->back->hw, xdp_ring->q_vector);
+		}
 	}
 }
diff --git a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
index 05ac30752902..319b4bfcecfc 100644
--- a/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
+++ b/drivers/net/ethernet/intel/ice/ice_txrx_lib.h
@@ -31,6 +31,22 @@ ice_build_ctob(u64 td_cmd, u64 td_offset, unsigned int size, u64 td_tag)
 			   (td_tag << ICE_TXD_QW1_L2TAG1_S));
 }
 
+/**
+ * ice_xdp_ring_set_rs - Sets RS bit on a Tx descriptor
+ * @xdp_ring: XDP Tx ring
+ *
+ * This function sets RS bit on the last descriptor of a batch,
+ * before the update of HW tail register
+ */
+static inline void ice_xdp_ring_set_rs(struct ice_ring *xdp_ring)
+{
+	struct ice_tx_desc *next_rs_desc;
+
+	next_rs_desc = ICE_TX_DESC(xdp_ring, xdp_ring->next_rs_idx);
+	next_rs_desc->cmd_type_offset_bsz |=
+		cpu_to_le64(ICE_TX_DESC_CMD_RS << ICE_TXD_QW1_CMD_S);
+}
+
 /**
  * ice_xdp_ring_update_tail - Updates the XDP Tx ring tail register
  * @xdp_ring: XDP Tx ring