diff mbox series

[net-next,v4,4/6] enic: Use the Page Pool API for RX when MTU is less than page size

Message ID 20250102222427.28370-5-johndale@cisco.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series enic: Use Page Pool API for receiving packets | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1 this patch: 1
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 14 this patch: 14
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 18 this patch: 18
netdev/checkpatch warning WARNING: line length of 82 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

John Daley Jan. 2, 2025, 10:24 p.m. UTC
The Page Pool API improves bandwidth and CPU overhead by recycling
pages instead of allocating new buffers in the driver. Make use of
page pool fragment allocation for smaller MTUs so that multiple
packets can share a page.

Added 'pp_alloc_error' per RQ ethtool statistic to count
page_pool_dev_alloc() failures.

Co-developed-by: Nelson Escobar <neescoba@cisco.com>
Signed-off-by: Nelson Escobar <neescoba@cisco.com>
Co-developed-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: Satish Kharat <satishkh@cisco.com>
Signed-off-by: John Daley <johndale@cisco.com>
---
 drivers/net/ethernet/cisco/enic/enic.h        |  10 ++
 .../net/ethernet/cisco/enic/enic_ethtool.c    |   1 +
 drivers/net/ethernet/cisco/enic/enic_main.c   |  51 ++++++-
 drivers/net/ethernet/cisco/enic/enic_rq.c     | 140 ++++++++++++++++++
 drivers/net/ethernet/cisco/enic/enic_rq.h     |   5 +
 drivers/net/ethernet/cisco/enic/vnic_rq.h     |   2 +
 6 files changed, 203 insertions(+), 6 deletions(-)

Comments

Jakub Kicinski Jan. 5, 2025, 1:41 a.m. UTC | #1
On Thu,  2 Jan 2025 14:24:25 -0800 John Daley wrote:
> The Page Pool API improves bandwidth and CPU overhead by recycling
> pages instead of allocating new buffers in the driver. Make use of
> page pool fragment allocation for smaller MTUs so that multiple
> packets can share a page.

Why the MTU limitation? You can set page_pool_params.order 
to appropriate value always use the page pool.

> Added 'pp_alloc_error' per RQ ethtool statistic to count
> page_pool_dev_alloc() failures.

SG, but please don't report it via ethtool. Add it in 
enic_get_queue_stats_rx() as alloc_fail (and enic_get_base_stats()).
As one of the benefits you'll be able to use
tools/testing/selftests/drivers/net/hw/pp_alloc_fail.py
to test this stat and error handling in the driver.

> +void enic_rq_page_cleanup(struct enic_rq *rq)
> +{
> +	struct vnic_rq *vrq = &rq->vrq;
> +	struct enic *enic = vnic_dev_priv(vrq->vdev);
> +	struct napi_struct *napi = &enic->napi[vrq->index];
> +
> +	napi_free_frags(napi);

why?

> +	page_pool_destroy(rq->pool);
> +}
John Daley Jan. 6, 2025, 9:54 p.m. UTC | #2
>> The Page Pool API improves bandwidth and CPU overhead by recycling
>> pages instead of allocating new buffers in the driver. Make use of
>> page pool fragment allocation for smaller MTUs so that multiple
>> packets can share a page.
>
>Why the MTU limitation? You can set page_pool_params.order
>to appropriate value always use the page pool.

I thought it might waste memory, e.g. allocating 16K for 9000 mtu.
But now that you mention it, I see that the added code complexity is
probably not worth it. I am unclear on what to set pp_params.max_len
to when MTU > PAGE_SIZE. Order * PAGE_SIZE or MTU size? In this case
the pages won't be fragmented so isn't only necessary for the MTU sized
area to be DMA SYNC'ed?

>
>> Added 'pp_alloc_error' per RQ ethtool statistic to count
>> page_pool_dev_alloc() failures.
>
>SG, but please don't report it via ethtool. Add it in
>enic_get_queue_stats_rx() as alloc_fail (and enic_get_base_stats()).
>As one of the benefits you'll be able to use
>tools/testing/selftests/drivers/net/hw/pp_alloc_fail.py
>to test this stat and error handling in the driver.

ok, will do.
>
>> +void enic_rq_page_cleanup(struct enic_rq *rq)
>> +{
>> +    struct vnic_rq *vrq = &rq->vrq;
>> +    struct enic *enic = vnic_dev_priv(vrq->vdev);
>> +    struct napi_struct *napi = &enic->napi[vrq->index];
>> +
>> +    napi_free_frags(napi);
>
>why?

Mistake, left over from previous patch. Also, I will remove enic_rq_error_reset()
which calls napi_free_frags at a time when napi->skb is not owned by driver.
>
>> +    page_pool_destroy(rq->pool);
>> +}

I will make a v5 shortly. Would you recommend I split the patchset into 2 parts
as I think @andrew+netdev was suggesting? The last 2 patches are kind of unrelated
to the first 4.
Jakub Kicinski Jan. 7, 2025, 12:19 a.m. UTC | #3
On Mon,  6 Jan 2025 13:54:25 -0800 John Daley wrote:
> >> The Page Pool API improves bandwidth and CPU overhead by recycling
> >> pages instead of allocating new buffers in the driver. Make use of
> >> page pool fragment allocation for smaller MTUs so that multiple
> >> packets can share a page.  
> >
> >Why the MTU limitation? You can set page_pool_params.order
> >to appropriate value always use the page pool.  
> 
> I thought it might waste memory, e.g. allocating 16K for 9000 mtu.
> But now that you mention it, I see that the added code complexity is
> probably not worth it. I am unclear on what to set pp_params.max_len
> to when MTU > PAGE_SIZE. Order * PAGE_SIZE or MTU size? In this case
> the pages won't be fragmented so isn't only necessary for the MTU sized
> area to be DMA SYNC'ed?

Good point, once fragmentation is no longer possible you can
set .max_len to the size of the fragment HW may clobber,
and .offset to the reserved headroom.

> >  
> >> +    page_pool_destroy(rq->pool);
> >> +}  
> 
> I will make a v5 shortly. Would you recommend I split the patchset into 2 parts
> as I think @andrew+netdev was suggesting? The last 2 patches are kind of unrelated
> to the first 4.

Yes, seems like a good idea, patches 5 and 6 would probably have been
merged a while back if they were separate.
John Daley Jan. 7, 2025, 3 a.m. UTC | #4
>> >> The Page Pool API improves bandwidth and CPU overhead by recycling
>> >> pages instead of allocating new buffers in the driver. Make use of
>> >> page pool fragment allocation for smaller MTUs so that multiple
>> >> packets can share a page.  
>> >
>> >Why the MTU limitation? You can set page_pool_params.order
>> >to appropriate value always use the page pool.  
>> 
>> I thought it might waste memory, e.g. allocating 16K for 9000 mtu.
>> But now that you mention it, I see that the added code complexity is
>> probably not worth it. I am unclear on what to set pp_params.max_len
>> to when MTU > PAGE_SIZE. Order * PAGE_SIZE or MTU size? In this case
>> the pages won't be fragmented so isn't only necessary for the MTU sized
>> area to be DMA SYNC'ed?
>
>Good point, once fragmentation is no longer possible you can
>set .max_len to the size of the fragment HW may clobber,
>and .offset to the reserved headroom.

Ok, testing going good so far, but need another day.
>
>> >  
>> >> +    page_pool_destroy(rq->pool);
>> >> +}  
>> 
>> I will make a v5 shortly. Would you recommend I split the patchset into 2 parts
>> as I think @andrew+netdev was suggesting? The last 2 patches are kind of unrelated
>> to the first 4.
>
>Yes, seems like a good idea, patches 5 and 6 would probably have been
>merged a while back if they were separate.

Ok, I submitted patches 5 and 6 as separate trivial patchset. Hopefully it will be
merged by the time my testing for the page_pool changes are complete so I can submit
on top of them. thanks!
diff mbox series

Patch

diff --git a/drivers/net/ethernet/cisco/enic/enic.h b/drivers/net/ethernet/cisco/enic/enic.h
index 51f80378d928..19e22aba71a8 100644
--- a/drivers/net/ethernet/cisco/enic/enic.h
+++ b/drivers/net/ethernet/cisco/enic/enic.h
@@ -17,6 +17,8 @@ 
 #include "vnic_nic.h"
 #include "vnic_rss.h"
 #include <linux/irq.h>
+#include <linux/if_vlan.h>
+#include <net/page_pool/helpers.h>
 
 #define DRV_NAME		"enic"
 #define DRV_DESCRIPTION		"Cisco VIC Ethernet NIC Driver"
@@ -158,6 +160,7 @@  struct enic_rq_stats {
 	u64 pkt_truncated;		/* truncated pkts */
 	u64 no_skb;			/* out of skbs */
 	u64 desc_skip;			/* Rx pkt went into later buffer */
+	u64 pp_alloc_error;		/* page alloc error */
 };
 
 struct enic_wq {
@@ -169,6 +172,7 @@  struct enic_wq {
 struct enic_rq {
 	struct vnic_rq vrq;
 	struct enic_rq_stats stats;
+	struct page_pool *pool;
 } ____cacheline_aligned;
 
 /* Per-instance private data structure */
@@ -231,8 +235,14 @@  struct enic {
 			       void *opaque);
 	int (*rq_alloc_buf)(struct vnic_rq *rq);
 	void (*rq_free_buf)(struct vnic_rq *rq, struct vnic_rq_buf *buf);
+	void (*rq_cleanup)(struct enic_rq *rq);
 };
 
+static inline unsigned int get_max_pkt_len(struct enic *enic)
+{
+	return enic->netdev->mtu + VLAN_ETH_HLEN;
+}
+
 static inline struct net_device *vnic_get_netdev(struct vnic_dev *vdev)
 {
 	struct enic *enic = vdev->priv;
diff --git a/drivers/net/ethernet/cisco/enic/enic_ethtool.c b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
index d607b4f0542c..799f44b95bfc 100644
--- a/drivers/net/ethernet/cisco/enic/enic_ethtool.c
+++ b/drivers/net/ethernet/cisco/enic/enic_ethtool.c
@@ -51,6 +51,7 @@  static const struct enic_stat enic_per_rq_stats[] = {
 	ENIC_PER_RQ_STAT(napi_repoll),
 	ENIC_PER_RQ_STAT(no_skb),
 	ENIC_PER_RQ_STAT(desc_skip),
+	ENIC_PER_RQ_STAT(pp_alloc_error),
 };
 
 #define NUM_ENIC_PER_RQ_STATS   ARRAY_SIZE(enic_per_rq_stats)
diff --git a/drivers/net/ethernet/cisco/enic/enic_main.c b/drivers/net/ethernet/cisco/enic/enic_main.c
index 45ab6b670563..5bfd89749237 100644
--- a/drivers/net/ethernet/cisco/enic/enic_main.c
+++ b/drivers/net/ethernet/cisco/enic/enic_main.c
@@ -1282,6 +1282,11 @@  static int enic_get_vf_port(struct net_device *netdev, int vf,
 	return -EMSGSIZE;
 }
 
+/* nothing to do for buffers based allocation */
+static void enic_rq_buf_cleanup(struct enic_rq *rq)
+{
+}
+
 static void enic_free_rq_buf(struct vnic_rq *rq, struct vnic_rq_buf *buf)
 {
 	struct enic *enic = vnic_dev_priv(rq->vdev);
@@ -1881,10 +1886,33 @@  static int enic_open(struct net_device *netdev)
 	struct enic *enic = netdev_priv(netdev);
 	unsigned int i;
 	int err, ret;
-
-	enic->rq_buf_service = enic_rq_indicate_buf;
-	enic->rq_alloc_buf = enic_rq_alloc_buf;
-	enic->rq_free_buf = enic_free_rq_buf;
+	bool use_page_pool;
+	struct page_pool_params pp_params = { 0 };
+
+	/* Use the Page Pool API for MTUs <= PAGE_SIZE */
+	use_page_pool = (get_max_pkt_len(enic) <= PAGE_SIZE);
+
+	if (use_page_pool) {
+		/* use the page pool API */
+		pp_params.order = 0;
+		pp_params.pool_size = enic->config.rq_desc_count;
+		pp_params.nid = dev_to_node(&enic->pdev->dev);
+		pp_params.dev = &enic->pdev->dev;
+		pp_params.dma_dir = DMA_FROM_DEVICE;
+		pp_params.max_len = PAGE_SIZE;
+		pp_params.netdev = netdev;
+		pp_params.flags = PP_FLAG_DMA_MAP | PP_FLAG_DMA_SYNC_DEV;
+
+		enic->rq_buf_service = enic_rq_indicate_page;
+		enic->rq_alloc_buf = enic_rq_alloc_page;
+		enic->rq_free_buf = enic_rq_free_page;
+		enic->rq_cleanup = enic_rq_page_cleanup;
+	} else {
+		enic->rq_buf_service = enic_rq_indicate_buf;
+		enic->rq_alloc_buf = enic_rq_alloc_buf;
+		enic->rq_free_buf = enic_free_rq_buf;
+		enic->rq_cleanup = enic_rq_buf_cleanup;
+	}
 
 	err = enic_request_intr(enic);
 	if (err) {
@@ -1902,6 +1930,13 @@  static int enic_open(struct net_device *netdev)
 	}
 
 	for (i = 0; i < enic->rq_count; i++) {
+		/* create a page pool for each RQ */
+		if (use_page_pool) {
+			pp_params.napi = &enic->napi[i];
+			pp_params.queue_idx = i;
+			enic->rq[i].pool = page_pool_create(&pp_params);
+		}
+
 		/* enable rq before updating rq desc */
 		vnic_rq_enable(&enic->rq[i].vrq);
 		vnic_rq_fill(&enic->rq[i].vrq, enic->rq_alloc_buf);
@@ -1942,8 +1977,10 @@  static int enic_open(struct net_device *netdev)
 err_out_free_rq:
 	for (i = 0; i < enic->rq_count; i++) {
 		ret = vnic_rq_disable(&enic->rq[i].vrq);
-		if (!ret)
+		if (!ret) {
 			vnic_rq_clean(&enic->rq[i].vrq, enic->rq_free_buf);
+			enic->rq_cleanup(&enic->rq[i]);
+		}
 	}
 	enic_dev_notify_unset(enic);
 err_out_free_intr:
@@ -2001,8 +2038,10 @@  static int enic_stop(struct net_device *netdev)
 
 	for (i = 0; i < enic->wq_count; i++)
 		vnic_wq_clean(&enic->wq[i].vwq, enic_free_wq_buf);
-	for (i = 0; i < enic->rq_count; i++)
+	for (i = 0; i < enic->rq_count; i++) {
 		vnic_rq_clean(&enic->rq[i].vrq, enic->rq_free_buf);
+		enic->rq_cleanup(&enic->rq[i]);
+	}
 	for (i = 0; i < enic->cq_count; i++)
 		vnic_cq_clean(&enic->cq[i]);
 	for (i = 0; i < enic->intr_count; i++)
diff --git a/drivers/net/ethernet/cisco/enic/enic_rq.c b/drivers/net/ethernet/cisco/enic/enic_rq.c
index ae2ab5af87e9..d450d2f4f694 100644
--- a/drivers/net/ethernet/cisco/enic/enic_rq.c
+++ b/drivers/net/ethernet/cisco/enic/enic_rq.c
@@ -7,6 +7,7 @@ 
 #include "enic_rq.h"
 #include "vnic_rq.h"
 #include "cq_enet_desc.h"
+#include "enic_res.h"
 
 #define ENIC_LARGE_PKT_THRESHOLD		1000
 
@@ -118,3 +119,142 @@  int enic_rq_service(struct vnic_dev *vdev, struct cq_desc *cq_desc,
 
 	return 0;
 }
+
+void enic_rq_page_cleanup(struct enic_rq *rq)
+{
+	struct vnic_rq *vrq = &rq->vrq;
+	struct enic *enic = vnic_dev_priv(vrq->vdev);
+	struct napi_struct *napi = &enic->napi[vrq->index];
+
+	napi_free_frags(napi);
+	page_pool_destroy(rq->pool);
+}
+
+void enic_rq_free_page(struct vnic_rq *vrq, struct vnic_rq_buf *buf)
+{
+	struct enic *enic = vnic_dev_priv(vrq->vdev);
+	struct enic_rq *rq = &enic->rq[vrq->index];
+
+	if (!buf->os_buf)
+		return;
+
+	page_pool_put_full_page(rq->pool, (struct page *)buf->os_buf, true);
+	buf->os_buf = NULL;
+}
+
+int enic_rq_alloc_page(struct vnic_rq *vrq)
+{
+	struct enic *enic = vnic_dev_priv(vrq->vdev);
+	struct enic_rq *rq = &enic->rq[vrq->index];
+	struct enic_rq_stats *rqstats = &rq->stats;
+	struct vnic_rq_buf *buf = vrq->to_use;
+	dma_addr_t dma_addr;
+	struct page *page;
+	unsigned int offset = 0;
+	unsigned int len;
+	unsigned int truesize;
+
+	len = get_max_pkt_len(enic);
+	truesize = len;
+
+	if (buf->os_buf) {
+		dma_addr = buf->dma_addr;
+	} else {
+		page = page_pool_dev_alloc(rq->pool, &offset, &truesize);
+		if (unlikely(!page)) {
+			rqstats->pp_alloc_error++;
+			return -ENOMEM;
+		}
+		buf->os_buf = (void *)page;
+		buf->offset = offset;
+		buf->truesize = truesize;
+		dma_addr = page_pool_get_dma_addr(page) + offset;
+	}
+
+	enic_queue_rq_desc(vrq, buf->os_buf, dma_addr, len);
+
+	return 0;
+}
+
+/* Unmap and free pages fragments making up the error packet.
+ */
+static void enic_rq_error_reset(struct vnic_rq *vrq)
+{
+	struct enic *enic = vnic_dev_priv(vrq->vdev);
+	struct napi_struct *napi = &enic->napi[vrq->index];
+
+	napi_free_frags(napi);
+}
+
+void enic_rq_indicate_page(struct vnic_rq *vrq, struct cq_desc *cq_desc,
+			   struct vnic_rq_buf *buf, int skipped, void *opaque)
+{
+	struct enic *enic = vnic_dev_priv(vrq->vdev);
+	struct sk_buff *skb;
+	struct enic_rq *rq = &enic->rq[vrq->index];
+	struct enic_rq_stats *rqstats = &rq->stats;
+	struct vnic_cq *cq = &enic->cq[enic_cq_rq(enic, vrq->index)];
+	struct napi_struct *napi;
+	u8 type, color, eop, sop, ingress_port, vlan_stripped;
+	u8 fcoe, fcoe_sof, fcoe_fc_crc_ok, fcoe_enc_error, fcoe_eof;
+	u8 tcp_udp_csum_ok, udp, tcp, ipv4_csum_ok;
+	u8 ipv6, ipv4, ipv4_fragment, fcs_ok, rss_type, csum_not_calc;
+	u8 packet_error;
+	u16 q_number, completed_index, bytes_written, vlan_tci, checksum;
+	u32 rss_hash;
+
+	if (skipped) {
+		rqstats->desc_skip++;
+		return;
+	}
+
+	if (!buf || !buf->dma_addr) {
+		net_warn_ratelimited("%s: !buf || !buf->dma_addr!!\n",
+				     enic->netdev->name);
+		return;
+	}
+
+	cq_enet_rq_desc_dec((struct cq_enet_rq_desc *)cq_desc,
+			    &type, &color, &q_number, &completed_index,
+			    &ingress_port, &fcoe, &eop, &sop, &rss_type,
+			    &csum_not_calc, &rss_hash, &bytes_written,
+			    &packet_error, &vlan_stripped, &vlan_tci, &checksum,
+			    &fcoe_sof, &fcoe_fc_crc_ok, &fcoe_enc_error,
+			    &fcoe_eof, &tcp_udp_csum_ok, &udp, &tcp,
+			    &ipv4_csum_ok, &ipv6, &ipv4, &ipv4_fragment,
+			    &fcs_ok);
+
+	if (enic_rq_pkt_error(vrq, packet_error, fcs_ok, bytes_written)) {
+		enic_rq_error_reset(vrq);
+		return;
+	}
+
+	napi = &enic->napi[vrq->index];
+	skb = napi_get_frags(napi);
+	if (unlikely(!skb)) {
+		net_warn_ratelimited("%s: skb alloc error rq[%d], desc[%d]\n",
+				     enic->netdev->name, vrq->index,
+				     completed_index);
+		rqstats->no_skb++;
+		return;
+	}
+
+	dma_sync_single_for_cpu(&enic->pdev->dev, buf->dma_addr, bytes_written,
+				DMA_FROM_DEVICE);
+	skb_add_rx_frag(skb, skb_shinfo(skb)->nr_frags, (struct page *)buf->os_buf,
+			buf->offset, bytes_written, buf->truesize);
+
+	buf->os_buf = NULL;
+	buf->dma_addr = 0;
+	buf = buf->next;
+
+	enic_rq_set_skb_flags(vrq, type, rss_hash, rss_type, fcoe, fcoe_fc_crc_ok,
+			      vlan_stripped, csum_not_calc, tcp_udp_csum_ok, ipv6,
+			      ipv4_csum_ok, vlan_tci, skb);
+	if (enic->rx_coalesce_setting.use_adaptive_rx_coalesce)
+		enic_intr_update_pkt_size(&cq->pkt_size_counter, skb->len);
+	skb_mark_for_recycle(skb);
+	skb_record_rx_queue(skb, vrq->index);
+	napi_gro_frags(napi);
+	rqstats->packets++;
+}
diff --git a/drivers/net/ethernet/cisco/enic/enic_rq.h b/drivers/net/ethernet/cisco/enic/enic_rq.h
index 46ab75fd74a0..f429f31b6172 100644
--- a/drivers/net/ethernet/cisco/enic/enic_rq.h
+++ b/drivers/net/ethernet/cisco/enic/enic_rq.h
@@ -19,4 +19,9 @@  int enic_rq_service(struct vnic_dev *vdev, struct cq_desc *cq_desc,
 		    u8 type, u16 q_number, u16 completed_index, void *opaque);
 void enic_rq_indicate_buf(struct vnic_rq *rq, struct cq_desc *cq_desc,
 			  struct vnic_rq_buf *buf, int skipped, void *opaque);
+void enic_rq_indicate_page(struct vnic_rq *rq, struct cq_desc *cq_desc,
+			   struct vnic_rq_buf *buf, int skipped, void *opaque);
+int enic_rq_alloc_page(struct vnic_rq *rq);
+void enic_rq_free_page(struct vnic_rq *rq, struct vnic_rq_buf *buf);
+void enic_rq_page_cleanup(struct enic_rq *rq);
 #endif /* _ENIC_RQ_H_ */
diff --git a/drivers/net/ethernet/cisco/enic/vnic_rq.h b/drivers/net/ethernet/cisco/enic/vnic_rq.h
index 0bc595abc03b..2ee4be2b9a34 100644
--- a/drivers/net/ethernet/cisco/enic/vnic_rq.h
+++ b/drivers/net/ethernet/cisco/enic/vnic_rq.h
@@ -61,6 +61,8 @@  struct vnic_rq_buf {
 	unsigned int index;
 	void *desc;
 	uint64_t wr_id;
+	unsigned int offset;
+	unsigned int truesize;
 };
 
 enum enic_poll_state {