diff mbox series

[net-next,10/10] nfp: nfdk: implement xdp tx path for NFDK

Message ID 20220318101302.113419-11-simon.horman@corigine.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series nfp: support for NFP-3800 | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 1 this patch: 3
netdev/cc_maintainers warning 8 maintainers not CCed: daniel@iogearbox.net ast@kernel.org yinjun.zhang@corigine.com hawk@kernel.org john.fastabend@gmail.com fei.qin@corigine.com pabeni@redhat.com bpf@vger.kernel.org
netdev/build_clang success Errors and warnings before: 1 this patch: 1
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1 this patch: 1
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 1 this patch: 1
netdev/source_inline success Was 0 now: 0

Commit Message

Simon Horman March 18, 2022, 10:13 a.m. UTC
From: Yinjun Zhang <yinjun.zhang@corigine.com>

Due to the different definition of txbuf in NFDK compared to NFD3,
there are no pre-allocated txbufs for XDP use in the NFDK
implementation; the existing rxbuf is reused instead and recycled
once XDP TX completes.

For each packet transmitted on the XDP path we cannot use more than
`NFDK_TX_DESC_PER_SIMPLE_PKT` txbufs: one stashes the virtual address
and the other the DMA address, so the amount of transmitted bytes is
currently not accumulated. We also borrow the lowest bit of the
virtual address to mark the start of a new transmitted packet, which
the address's alignment guarantees to be free.

Signed-off-by: Yinjun Zhang <yinjun.zhang@corigine.com>
Signed-off-by: Fei Qin <fei.qin@corigine.com>
Signed-off-by: Simon Horman <simon.horman@corigine.com>
---
 drivers/net/ethernet/netronome/nfp/nfdk/dp.c  | 196 +++++++++++++++++-
 .../net/ethernet/netronome/nfp/nfdk/nfdk.h    |  16 ++
 2 files changed, 207 insertions(+), 5 deletions(-)

Comments

Jakub Kicinski March 18, 2022, 5:56 p.m. UTC | #1
On Fri, 18 Mar 2022 11:13:02 +0100 Simon Horman wrote:
> From: Yinjun Zhang <yinjun.zhang@corigine.com>
> 
> [commit message and sign-offs snipped]

Breaks 32 bit :(
Yinjun Zhang March 19, 2022, 1:55 a.m. UTC | #2
On Fri, Mar 18, 2022 at 10:56:45AM -0700, Jakub Kicinski wrote:
> On Fri, 18 Mar 2022 11:13:02 +0100 Simon Horman wrote:
> > From: Yinjun Zhang <yinjun.zhang@corigine.com>
> > 
> > [commit message and sign-offs snipped]
> 
> Breaks 32 bit :(

You mean a 32-bit arch? I'd thought of that, but then why doesn't
`NFCT_PTRMASK` need to take it into account?
Yinjun Zhang March 19, 2022, 3:15 a.m. UTC | #3
On Sat, Mar 19, 2022 at 09:55:46AM +0800, Yinjun Zhang wrote:
> On Fri, Mar 18, 2022 at 10:56:45AM -0700, Jakub Kicinski wrote:
> > On Fri, 18 Mar 2022 11:13:02 +0100 Simon Horman wrote:
> > > From: Yinjun Zhang <yinjun.zhang@corigine.com>
> > > 
> > > [commit message and sign-offs snipped]
> > 
> > Breaks 32 bit :(
> 
> You mean a 32-bit arch? I'd thought of that, but then why doesn't
> `NFCT_PTRMASK` need to take it into account?

I see, it's because `nf_conn` is allocated from a kmem_cache created
with 8-byte alignment. I'll modify our implementation.
Jakub Kicinski March 19, 2022, 4:30 a.m. UTC | #4
On Sat, 19 Mar 2022 09:55:46 +0800 Yinjun Zhang wrote:
> On Fri, Mar 18, 2022 at 10:56:45AM -0700, Jakub Kicinski wrote:
> > On Fri, 18 Mar 2022 11:13:02 +0100 Simon Horman wrote:  
> > > From: Yinjun Zhang <yinjun.zhang@corigine.com>
> > > 
> > > [commit message and sign-offs snipped]
> > 
> > Breaks 32 bit :(  
> 
> You mean a 32-bit arch? I'd thought of that, but then why doesn't
> `NFCT_PTRMASK` need to take it into account?

Simpler than that, I just meant the build.
It's about casting between 64b types and pointers:

drivers/net/ethernet/netronome/nfp/nfdk/dp.c: In function ‘nfp_nfdk_xdp_complete’:
drivers/net/ethernet/netronome/nfp/nfdk/dp.c:827:38: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
  827 |                                      (void *)NFDK_TX_BUF_PTR(txbuf[0].raw),
      |                                      ^
drivers/net/ethernet/netronome/nfp/nfdk/dp.c: In function ‘nfp_nfdk_tx_xdp_buf’:
drivers/net/ethernet/netronome/nfp/nfdk/dp.c:909:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
  909 |         txbuf[0].raw = (u64)rxbuf->frag | NFDK_TX_BUF_INFO_SOP;
      |                        ^
Yinjun Zhang March 19, 2022, 6:26 a.m. UTC | #5
On Fri, Mar 18, 2022 at 09:30:12PM -0700, Jakub Kicinski wrote:
> On Sat, 19 Mar 2022 09:55:46 +0800 Yinjun Zhang wrote:
> > On Fri, Mar 18, 2022 at 10:56:45AM -0700, Jakub Kicinski wrote:
> > > On Fri, 18 Mar 2022 11:13:02 +0100 Simon Horman wrote:  
> > > > From: Yinjun Zhang <yinjun.zhang@corigine.com>
> > > > 
> > > > [commit message and sign-offs snipped]
> > > 
> > > Breaks 32 bit :(  
> > 
> > You mean a 32-bit arch? I'd thought of that, but then why doesn't
> > `NFCT_PTRMASK` need to take it into account?
> 
> Simpler than that, I just meant the build.
> It's about casting between 64b types and pointers:
> 
> drivers/net/ethernet/netronome/nfp/nfdk/dp.c: In function ‘nfp_nfdk_xdp_complete’:
> drivers/net/ethernet/netronome/nfp/nfdk/dp.c:827:38: warning: cast to pointer from integer of different size [-Wint-to-pointer-cast]
>   827 |                                      (void *)NFDK_TX_BUF_PTR(txbuf[0].raw),
>       |                                      ^
> drivers/net/ethernet/netronome/nfp/nfdk/dp.c: In function ‘nfp_nfdk_tx_xdp_buf’:
> drivers/net/ethernet/netronome/nfp/nfdk/dp.c:909:24: warning: cast from pointer to integer of different size [-Wpointer-to-int-cast]
>   909 |         txbuf[0].raw = (u64)rxbuf->frag | NFDK_TX_BUF_INFO_SOP;
>       |                        ^

I see, thanks for catching this; I'll fix it in the next version.
diff mbox series

Patch

diff --git a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c
index d107458d1fd3..5a9310f770af 100644
--- a/drivers/net/ethernet/netronome/nfp/nfdk/dp.c
+++ b/drivers/net/ethernet/netronome/nfp/nfdk/dp.c
@@ -541,11 +541,6 @@  static void nfp_nfdk_tx_complete(struct nfp_net_tx_ring *tx_ring, int budget)
 		  tx_ring->rd_p, tx_ring->wr_p, tx_ring->cnt);
 }
 
-static bool nfp_nfdk_xdp_complete(struct nfp_net_tx_ring *tx_ring)
-{
-	return true;
-}
-
 /* Receive processing */
 static void *
 nfp_nfdk_napi_alloc_one(struct nfp_net_dp *dp, dma_addr_t *dma_addr)
@@ -791,6 +786,185 @@  nfp_nfdk_rx_drop(const struct nfp_net_dp *dp, struct nfp_net_r_vector *r_vec,
 		dev_kfree_skb_any(skb);
 }
 
+static bool nfp_nfdk_xdp_complete(struct nfp_net_tx_ring *tx_ring)
+{
+	struct nfp_net_r_vector *r_vec = tx_ring->r_vec;
+	struct nfp_net_dp *dp = &r_vec->nfp_net->dp;
+	struct nfp_net_rx_ring *rx_ring;
+	u32 qcp_rd_p, done = 0;
+	bool done_all;
+	int todo;
+
+	/* Work out how many descriptors have been transmitted */
+	qcp_rd_p = nfp_net_read_tx_cmpl(tx_ring, dp);
+	if (qcp_rd_p == tx_ring->qcp_rd_p)
+		return true;
+
+	todo = D_IDX(tx_ring, qcp_rd_p - tx_ring->qcp_rd_p);
+
+	done_all = todo <= NFP_NET_XDP_MAX_COMPLETE;
+	todo = min(todo, NFP_NET_XDP_MAX_COMPLETE);
+
+	rx_ring = r_vec->rx_ring;
+	while (todo > 0) {
+		int idx = D_IDX(tx_ring, tx_ring->rd_p + done);
+		struct nfp_nfdk_tx_buf *txbuf;
+		unsigned int step = 1;
+
+		txbuf = &tx_ring->ktxbufs[idx];
+		if (!txbuf->raw)
+			goto next;
+
+		if (NFDK_TX_BUF_INFO(txbuf->raw) != NFDK_TX_BUF_INFO_SOP) {
+			WARN_ONCE(1, "Unexpected TX buffer in XDP TX ring\n");
+			goto next;
+		}
+
+		/* Two successive txbufs are used to stash virtual and dma
+		 * address respectively, recycle and clean them here.
+		 */
+		nfp_nfdk_rx_give_one(dp, rx_ring,
+				     (void *)NFDK_TX_BUF_PTR(txbuf[0].raw),
+				     txbuf[1].dma_addr);
+		txbuf[0].raw = 0;
+		txbuf[1].raw = 0;
+		step = 2;
+
+		u64_stats_update_begin(&r_vec->tx_sync);
+		/* Note: tx_bytes not accumulated. */
+		r_vec->tx_pkts++;
+		u64_stats_update_end(&r_vec->tx_sync);
+next:
+		todo -= step;
+		done += step;
+	}
+
+	tx_ring->qcp_rd_p = D_IDX(tx_ring, tx_ring->qcp_rd_p + done);
+	tx_ring->rd_p += done;
+
+	WARN_ONCE(tx_ring->wr_p - tx_ring->rd_p > tx_ring->cnt,
+		  "XDP TX ring corruption rd_p=%u wr_p=%u cnt=%u\n",
+		  tx_ring->rd_p, tx_ring->wr_p, tx_ring->cnt);
+
+	return done_all;
+}
+
+static bool
+nfp_nfdk_tx_xdp_buf(struct nfp_net_dp *dp, struct nfp_net_rx_ring *rx_ring,
+		    struct nfp_net_tx_ring *tx_ring,
+		    struct nfp_net_rx_buf *rxbuf, unsigned int dma_off,
+		    unsigned int pkt_len, bool *completed)
+{
+	unsigned int dma_map_sz = dp->fl_bufsz - NFP_NET_RX_BUF_NON_DATA;
+	unsigned int dma_len, type, cnt, dlen_type, tmp_dlen;
+	struct nfp_nfdk_tx_buf *txbuf;
+	struct nfp_nfdk_tx_desc *txd;
+	unsigned int n_descs;
+	dma_addr_t dma_addr;
+	int wr_idx;
+
+	/* Reject if xdp_adjust_tail grew the packet beyond the DMA area */
+	if (pkt_len + dma_off > dma_map_sz)
+		return false;
+
+	/* Make sure there's still at least one block available after
+	 * aligning to block boundary, so that the txds used below
+	 * won't wrap around the tx_ring.
+	 */
+	if (unlikely(nfp_net_tx_full(tx_ring, NFDK_TX_DESC_STOP_CNT))) {
+		if (!*completed) {
+			nfp_nfdk_xdp_complete(tx_ring);
+			*completed = true;
+		}
+
+		if (unlikely(nfp_net_tx_full(tx_ring, NFDK_TX_DESC_STOP_CNT))) {
+			nfp_nfdk_rx_drop(dp, rx_ring->r_vec, rx_ring, rxbuf,
+					 NULL);
+			return false;
+		}
+	}
+
+	/* Check if cross block boundary */
+	n_descs = nfp_nfdk_headlen_to_segs(pkt_len);
+	if ((round_down(tx_ring->wr_p, NFDK_TX_DESC_BLOCK_CNT) !=
+	     round_down(tx_ring->wr_p + n_descs, NFDK_TX_DESC_BLOCK_CNT)) ||
+	    ((u32)tx_ring->data_pending + pkt_len >
+	     NFDK_TX_MAX_DATA_PER_BLOCK)) {
+		unsigned int nop_slots = D_BLOCK_CPL(tx_ring->wr_p);
+
+		wr_idx = D_IDX(tx_ring, tx_ring->wr_p);
+		txd = &tx_ring->ktxds[wr_idx];
+		memset(txd, 0,
+		       array_size(nop_slots, sizeof(struct nfp_nfdk_tx_desc)));
+
+		tx_ring->data_pending = 0;
+		tx_ring->wr_p += nop_slots;
+		tx_ring->wr_ptr_add += nop_slots;
+	}
+
+	wr_idx = D_IDX(tx_ring, tx_ring->wr_p);
+
+	txbuf = &tx_ring->ktxbufs[wr_idx];
+
+	txbuf[0].raw = (u64)rxbuf->frag | NFDK_TX_BUF_INFO_SOP;
+	txbuf[1].dma_addr = rxbuf->dma_addr;
+	/* Note: pkt len not stored */
+
+	dma_sync_single_for_device(dp->dev, rxbuf->dma_addr + dma_off,
+				   pkt_len, DMA_BIDIRECTIONAL);
+
+	/* Build TX descriptor */
+	txd = &tx_ring->ktxds[wr_idx];
+	dma_len = pkt_len;
+	dma_addr = rxbuf->dma_addr + dma_off;
+
+	if (dma_len < NFDK_TX_MAX_DATA_PER_HEAD)
+		type = NFDK_DESC_TX_TYPE_SIMPLE;
+	else
+		type = NFDK_DESC_TX_TYPE_GATHER;
+
+	/* FIELD_PREP() implicitly truncates to chunk */
+	dma_len -= 1;
+	dlen_type = FIELD_PREP(NFDK_DESC_TX_DMA_LEN_HEAD, dma_len) |
+		    FIELD_PREP(NFDK_DESC_TX_TYPE_HEAD, type);
+
+	txd->dma_len_type = cpu_to_le16(dlen_type);
+	nfp_desc_set_dma_addr(txd, dma_addr);
+
+	tmp_dlen = dlen_type & NFDK_DESC_TX_DMA_LEN_HEAD;
+	dma_len -= tmp_dlen;
+	dma_addr += tmp_dlen + 1;
+	txd++;
+
+	while (dma_len > 0) {
+		dma_len -= 1;
+		dlen_type = FIELD_PREP(NFDK_DESC_TX_DMA_LEN, dma_len);
+		txd->dma_len_type = cpu_to_le16(dlen_type);
+		nfp_desc_set_dma_addr(txd, dma_addr);
+
+		dlen_type &= NFDK_DESC_TX_DMA_LEN;
+		dma_len -= dlen_type;
+		dma_addr += dlen_type + 1;
+		txd++;
+	}
+
+	(txd - 1)->dma_len_type = cpu_to_le16(dlen_type | NFDK_DESC_TX_EOP);
+
+	/* Metadata desc */
+	txd->raw = 0;
+	txd++;
+
+	cnt = txd - tx_ring->ktxds - wr_idx;
+	tx_ring->wr_p += cnt;
+	if (tx_ring->wr_p % NFDK_TX_DESC_BLOCK_CNT)
+		tx_ring->data_pending += pkt_len;
+	else
+		tx_ring->data_pending = 0;
+
+	tx_ring->wr_ptr_add += cnt;
+	return true;
+}
+
 /**
  * nfp_nfdk_rx() - receive up to @budget packets on @rx_ring
  * @rx_ring:   RX ring to receive from
@@ -903,6 +1077,7 @@  static int nfp_nfdk_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 
 		if (xdp_prog && !meta.portid) {
 			void *orig_data = rxbuf->frag + pkt_off;
+			unsigned int dma_off;
 			int act;
 
 			xdp_prepare_buff(&xdp,
@@ -919,6 +1094,17 @@  static int nfp_nfdk_rx(struct nfp_net_rx_ring *rx_ring, int budget)
 			case XDP_PASS:
 				meta_len_xdp = xdp.data - xdp.data_meta;
 				break;
+			case XDP_TX:
+				dma_off = pkt_off - NFP_NET_RX_BUF_HEADROOM;
+				if (unlikely(!nfp_nfdk_tx_xdp_buf(dp, rx_ring,
+								  tx_ring,
+								  rxbuf,
+								  dma_off,
+								  pkt_len,
+								  &xdp_tx_cmpl)))
+					trace_xdp_exception(dp->netdev,
+							    xdp_prog, act);
+				continue;
 			default:
 				bpf_warn_invalid_xdp_action(dp->netdev, xdp_prog, act);
 				fallthrough;
diff --git a/drivers/net/ethernet/netronome/nfp/nfdk/nfdk.h b/drivers/net/ethernet/netronome/nfp/nfdk/nfdk.h
index 5107c4f03feb..82bfbe7062b8 100644
--- a/drivers/net/ethernet/netronome/nfp/nfdk/nfdk.h
+++ b/drivers/net/ethernet/netronome/nfp/nfdk/nfdk.h
@@ -71,6 +71,22 @@  struct nfp_nfdk_tx_desc {
 	};
 };
 
+/* The device doesn't use the three least significant bits of the address
+ * due to alignment constraints. The driver can make use of those bits to carry
+ * information about the buffer before giving it to the device.
+ *
+ * NOTE: The driver must clear the lower bits before handing the buffer to the
+ * device.
+ *
+ * - NFDK_TX_BUF_INFO_SOP - Start of a packet
+ *   Mark the buffer as a start of a packet. This is used in the XDP TX process
+ *   to stash virtual and DMA address so that they can be recycled when the TX
+ *   operation is completed.
+ */
+#define NFDK_TX_BUF_PTR(raw) ((raw) & ~7UL)
+#define NFDK_TX_BUF_INFO(raw) ((raw) & 7UL)
+#define NFDK_TX_BUF_INFO_SOP BIT(0)
+
 struct nfp_nfdk_tx_buf {
 	union {
 		/* First slot */