diff mbox series

[net-next] net: mana: Add page pool for RX buffers

Message ID 1689259687-5231-1-git-send-email-haiyangz@microsoft.com (mailing list archive)
State Superseded
Headers show
Series [net-next] net: mana: Add page pool for RX buffers | expand

Commit Message

Haiyang Zhang July 13, 2023, 2:48 p.m. UTC
Add page pool for RX buffers for faster buffer cycle and reduce CPU
usage.

Get an extra ref count of a page after allocation, so after upper
layers put the page, it's still referenced by the pool. We can reuse
it as RX buffer without alloc a new page.

Signed-off-by: Haiyang Zhang <haiyangz@microsoft.com>
---
 drivers/net/ethernet/microsoft/mana/mana_en.c | 73 ++++++++++++++++++-
 include/net/mana/mana.h                       |  5 ++
 2 files changed, 77 insertions(+), 1 deletion(-)

Comments

Jakub Kicinski July 14, 2023, 3:53 a.m. UTC | #1
On Thu, 13 Jul 2023 14:48:45 +0000 Haiyang Zhang wrote:
> Add page pool for RX buffers for faster buffer cycle and reduce CPU
> usage.
> 
> Get an extra ref count of a page after allocation, so after upper
> layers put the page, it's still referenced by the pool. We can reuse
> it as RX buffer without alloc a new page.

Please use the real page_pool API from include/net/page_pool.h
We've moved past every driver reinventing the wheel, sorry.
Jesper Dangaard Brouer July 14, 2023, 7:43 a.m. UTC | #2
On 14/07/2023 05.53, Jakub Kicinski wrote:
> On Thu, 13 Jul 2023 14:48:45 +0000 Haiyang Zhang wrote:
>> Add page pool for RX buffers for faster buffer cycle and reduce CPU
>> usage.
>>
>> Get an extra ref count of a page after allocation, so after upper
>> layers put the page, it's still referenced by the pool. We can reuse
>> it as RX buffer without alloc a new page.
> 
> Please use the real page_pool API from include/net/page_pool.h
> We've moved past every driver reinventing the wheel, sorry.

+1

Quoting[1]: Documentation/networking/page_pool.rst

  Basic use involves replacing alloc_pages() calls with the 
page_pool_alloc_pages() call.
  Drivers should use page_pool_dev_alloc_pages() replacing 
dev_alloc_pages().


[1] https://kernel.org/doc/html/latest/networking/page_pool.html
Haiyang Zhang July 14, 2023, 12:51 p.m. UTC | #3
> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> On 14/07/2023 05.53, Jakub Kicinski wrote:
> > On Thu, 13 Jul 2023 14:48:45 +0000 Haiyang Zhang wrote:
> >> Add page pool for RX buffers for faster buffer cycle and reduce CPU
> >> usage.
> >>
> >> Get an extra ref count of a page after allocation, so after upper
> >> layers put the page, it's still referenced by the pool. We can reuse
> >> it as RX buffer without alloc a new page.
> >
> > Please use the real page_pool API from include/net/page_pool.h
> > We've moved past every driver reinventing the wheel, sorry.
> 
> +1
> 
> Quoting[1]: Documentation/networking/page_pool.rst
> 
>   Basic use involves replacing alloc_pages() calls with the
> page_pool_alloc_pages() call.
>   Drivers should use page_pool_dev_alloc_pages() replacing
> dev_alloc_pages().
 
Thank Jakub and Jesper for the reviews.
I'm aware of the page_pool.rst doc, and actually tried it before this 
patch, but I got lower perf. If I understand correctly, we should call 
page_pool_release_page() before passing the SKB to napi_gro_receive().

I found the page_pool_dev_alloc_pages() goes through the slow path, 
because the page_pool_release_page() let the page leave the pool.

Do we have to call page_pool_release_page() before passing the SKB 
to napi_gro_receive()? Any better way to recycle the pages from the 
upper layer of non-XDP case?

Thanks,
- Haiyang
Jesper Dangaard Brouer July 14, 2023, 1:13 p.m. UTC | #4
On 14/07/2023 14.51, Haiyang Zhang wrote:
> 
> 
>> -----Original Message-----
>> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
>> On 14/07/2023 05.53, Jakub Kicinski wrote:
>>> On Thu, 13 Jul 2023 14:48:45 +0000 Haiyang Zhang wrote:
>>>> Add page pool for RX buffers for faster buffer cycle and reduce CPU
>>>> usage.
>>>>
>>>> Get an extra ref count of a page after allocation, so after upper
>>>> layers put the page, it's still referenced by the pool. We can reuse
>>>> it as RX buffer without alloc a new page.
>>>
>>> Please use the real page_pool API from include/net/page_pool.h
>>> We've moved past every driver reinventing the wheel, sorry.
>>
>> +1
>>
>> Quoting[1]: Documentation/networking/page_pool.rst
>>
>>    Basic use involves replacing alloc_pages() calls with the
>> page_pool_alloc_pages() call.
>>    Drivers should use page_pool_dev_alloc_pages() replacing
>> dev_alloc_pages().
>   
> Thank Jakub and Jesper for the reviews.
> I'm aware of the page_pool.rst doc, and actually tried it before this
> patch, but I got lower perf. If I understand correctly, we should call
> page_pool_release_page() before passing the SKB to napi_gro_receive().
> 
> I found the page_pool_dev_alloc_pages() goes through the slow path,
> because the page_pool_release_page() let the page leave the pool.
> 
> Do we have to call page_pool_release_page() before passing the SKB
> to napi_gro_receive()? Any better way to recycle the pages from the
> upper layer of non-XDP case?
> 

Today SKB "upper layers" can recycle page_pool backed packet data/page.

Just use skb_mark_for_recycle(skb), then you don't need 
page_pool_release_page().

I guess, we should update the documentation, mentioning this.

--Jesper
Jakub Kicinski July 14, 2023, 3:31 p.m. UTC | #5
On Fri, 14 Jul 2023 15:13:15 +0200 Jesper Dangaard Brouer wrote:
> > Thank Jakub and Jesper for the reviews.
> > I'm aware of the page_pool.rst doc, and actually tried it before this
> > patch, but I got lower perf. If I understand correctly, we should call
> > page_pool_release_page() before passing the SKB to napi_gro_receive().
> > 
> > I found the page_pool_dev_alloc_pages() goes through the slow path,
> > because the page_pool_release_page() let the page leave the pool.
> > 
> > Do we have to call page_pool_release_page() before passing the SKB
> > to napi_gro_receive()? Any better way to recycle the pages from the
> > upper layer of non-XDP case?
> >   
> 
> Today SKB "upper layers" can recycle page_pool backed packet data/page.
> 
> Just use skb_mark_for_recycle(skb), then you don't need 
> page_pool_release_page().
> 
> I guess, we should update the documentation, mentioning this.

Ah, I should probably send in the few cleanups form the huge page
series. It looks like all users of page_pool_release_page() can
be converted to skb recycling, so we should hide it and remove 
from docs?
Haiyang Zhang July 17, 2023, 6:26 p.m. UTC | #6
> -----Original Message-----
> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> Sent: Friday, July 14, 2023 9:13 AM
> To: Haiyang Zhang <haiyangz@microsoft.com>; Jesper Dangaard Brouer
> <jbrouer@redhat.com>; Jakub Kicinski <kuba@kernel.org>
> Cc: brouer@redhat.com; linux-hyperv@vger.kernel.org; netdev@vger.kernel.org;
> Dexuan Cui <decui@microsoft.com>; KY Srinivasan <kys@microsoft.com>; Paul
> Rosswurm <paulros@microsoft.com>; olaf@aepfle.de; vkuznets@redhat.com;
> davem@davemloft.net; wei.liu@kernel.org; edumazet@google.com;
> pabeni@redhat.com; leon@kernel.org; Long Li <longli@microsoft.com>;
> ssengar@linux.microsoft.com; linux-rdma@vger.kernel.org;
> daniel@iogearbox.net; john.fastabend@gmail.com; bpf@vger.kernel.org;
> ast@kernel.org; Ajay Sharma <sharmaajay@microsoft.com>; hawk@kernel.org;
> tglx@linutronix.de; shradhagupta@linux.microsoft.com; linux-
> kernel@vger.kernel.org; Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Subject: Re: [PATCH net-next] net: mana: Add page pool for RX buffers
> 
> [You don't often get email from jbrouer@redhat.com. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
> 
> On 14/07/2023 14.51, Haiyang Zhang wrote:
> >
> >
> >> -----Original Message-----
> >> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
> >> On 14/07/2023 05.53, Jakub Kicinski wrote:
> >>> On Thu, 13 Jul 2023 14:48:45 +0000 Haiyang Zhang wrote:
> >>>> Add page pool for RX buffers for faster buffer cycle and reduce CPU
> >>>> usage.
> >>>>
> >>>> Get an extra ref count of a page after allocation, so after upper
> >>>> layers put the page, it's still referenced by the pool. We can reuse
> >>>> it as RX buffer without alloc a new page.
> >>>
> >>> Please use the real page_pool API from include/net/page_pool.h
> >>> We've moved past every driver reinventing the wheel, sorry.
> >>
> >> +1
> >>
> >> Quoting[1]: Documentation/networking/page_pool.rst
> >>
> >>    Basic use involves replacing alloc_pages() calls with the
> >> page_pool_alloc_pages() call.
> >>    Drivers should use page_pool_dev_alloc_pages() replacing
> >> dev_alloc_pages().
> >
> > Thank Jakub and Jesper for the reviews.
> > I'm aware of the page_pool.rst doc, and actually tried it before this
> > patch, but I got lower perf. If I understand correctly, we should call
> > page_pool_release_page() before passing the SKB to napi_gro_receive().
> >
> > I found the page_pool_dev_alloc_pages() goes through the slow path,
> > because the page_pool_release_page() let the page leave the pool.
> >
> > Do we have to call page_pool_release_page() before passing the SKB
> > to napi_gro_receive()? Any better way to recycle the pages from the
> > upper layer of non-XDP case?
> >
> 
> Today SKB "upper layers" can recycle page_pool backed packet data/page.
> 
> Just use skb_mark_for_recycle(skb), then you don't need
> page_pool_release_page().

Will do. Thanks a lot!

- Haiyang
Zhu Yanjun July 17, 2023, 11:59 p.m. UTC | #7
在 2023/7/14 20:51, Haiyang Zhang 写道:
> 
> 
>> -----Original Message-----
>> From: Jesper Dangaard Brouer <jbrouer@redhat.com>
>> On 14/07/2023 05.53, Jakub Kicinski wrote:
>>> On Thu, 13 Jul 2023 14:48:45 +0000 Haiyang Zhang wrote:
>>>> Add page pool for RX buffers for faster buffer cycle and reduce CPU
>>>> usage.
>>>>
>>>> Get an extra ref count of a page after allocation, so after upper
>>>> layers put the page, it's still referenced by the pool. We can reuse
>>>> it as RX buffer without alloc a new page.
>>>
>>> Please use the real page_pool API from include/net/page_pool.h
>>> We've moved past every driver reinventing the wheel, sorry.
>>
>> +1
>>
>> Quoting[1]: Documentation/networking/page_pool.rst
>>
>>    Basic use involves replacing alloc_pages() calls with the
>> page_pool_alloc_pages() call.
>>    Drivers should use page_pool_dev_alloc_pages() replacing
>> dev_alloc_pages().
>   
> Thank Jakub and Jesper for the reviews.
> I'm aware of the page_pool.rst doc, and actually tried it before this
> patch, but I got lower perf. If I understand correctly, we should call
> page_pool_release_page() before passing the SKB to napi_gro_receive().
> 

If I get this commit correctly, this commit is to use page pool to get 
better performance.

IIRC, folio is to make memory optimization. From the performance 
results, with folio, the performance will get about 10%.

So not sure if the folio can be used in this commit to get better 
performance.

That is my 2 cent.

Zhu Yanjun

> I found the page_pool_dev_alloc_pages() goes through the slow path,
> because the page_pool_release_page() let the page leave the pool.
> 
> Do we have to call page_pool_release_page() before passing the SKB
> to napi_gro_receive()? Any better way to recycle the pages from the
> upper layer of non-XDP case?
> 
> Thanks,
> - Haiyang
>
diff mbox series

Patch

diff --git a/drivers/net/ethernet/microsoft/mana/mana_en.c b/drivers/net/ethernet/microsoft/mana/mana_en.c
index a499e460594b..6444a8e47852 100644
--- a/drivers/net/ethernet/microsoft/mana/mana_en.c
+++ b/drivers/net/ethernet/microsoft/mana/mana_en.c
@@ -1507,6 +1507,34 @@  static void mana_rx_skb(void *buf_va, struct mana_rxcomp_oob *cqe,
 	return;
 }
 
+static struct page *mana_get_page_from_pool(struct mana_rxq *rxq)
+{
+	struct page *page;
+	int i;
+
+	i = rxq->pl_last + 1;
+	if (i >= MANA_POOL_SIZE)
+		i = 0;
+
+	rxq->pl_last = i;
+
+	page = rxq->pool[i];
+	if (page_ref_count(page) == 1) {
+		get_page(page);
+		return page;
+	}
+
+	page = dev_alloc_page();
+	if (page) {
+		put_page(rxq->pool[i]);
+
+		get_page(page);
+		rxq->pool[i] = page;
+	}
+
+	return page;
+}
+
 static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
 			     dma_addr_t *da, bool is_napi)
 {
@@ -1533,7 +1561,7 @@  static void *mana_get_rxfrag(struct mana_rxq *rxq, struct device *dev,
 			return NULL;
 		}
 	} else {
-		page = dev_alloc_page();
+		page = mana_get_page_from_pool(rxq);
 		if (!page)
 			return NULL;
 
@@ -1873,6 +1901,21 @@  static int mana_create_txq(struct mana_port_context *apc,
 	return err;
 }
 
+static void mana_release_rxq_pool(struct mana_rxq *rxq)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < MANA_POOL_SIZE; i++) {
+		page = rxq->pool[i];
+
+		if (page)
+			put_page(page);
+
+		rxq->pool[i] = NULL;
+	}
+}
+
 static void mana_destroy_rxq(struct mana_port_context *apc,
 			     struct mana_rxq *rxq, bool validate_state)
 
@@ -1917,6 +1960,8 @@  static void mana_destroy_rxq(struct mana_port_context *apc,
 		rx_oob->buf_va = NULL;
 	}
 
+	mana_release_rxq_pool(rxq);
+
 	if (rxq->gdma_rq)
 		mana_gd_destroy_queue(gc, rxq->gdma_rq);
 
@@ -2008,6 +2053,27 @@  static int mana_push_wqe(struct mana_rxq *rxq)
 	return 0;
 }
 
+static int mana_alloc_rxq_pool(struct mana_rxq *rxq)
+{
+	struct page *page;
+	int i;
+
+	for (i = 0; i < MANA_POOL_SIZE; i++) {
+		page = dev_alloc_page();
+		if (!page)
+			goto err;
+
+		rxq->pool[i] = page;
+	}
+
+	return 0;
+
+err:
+	mana_release_rxq_pool(rxq);
+
+	return -ENOMEM;
+}
+
 static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 					u32 rxq_idx, struct mana_eq *eq,
 					struct net_device *ndev)
@@ -2029,6 +2095,11 @@  static struct mana_rxq *mana_create_rxq(struct mana_port_context *apc,
 	if (!rxq)
 		return NULL;
 
+	if (mana_alloc_rxq_pool(rxq)) {
+		kfree(rxq);
+		return NULL;
+	}
+
 	rxq->ndev = ndev;
 	rxq->num_rx_buf = RX_BUFFERS_PER_QUEUE;
 	rxq->rxq_idx = rxq_idx;
diff --git a/include/net/mana/mana.h b/include/net/mana/mana.h
index 024ad8ddb27e..8f1f09f9e4ab 100644
--- a/include/net/mana/mana.h
+++ b/include/net/mana/mana.h
@@ -297,6 +297,8 @@  struct mana_recv_buf_oob {
 
 #define MANA_XDP_MTU_MAX (PAGE_SIZE - MANA_RXBUF_PAD - XDP_PACKET_HEADROOM)
 
+#define MANA_POOL_SIZE (RX_BUFFERS_PER_QUEUE * 2)
+
 struct mana_rxq {
 	struct gdma_queue *gdma_rq;
 	/* Cache the gdma receive queue id */
@@ -330,6 +332,9 @@  struct mana_rxq {
 	bool xdp_flush;
 	int xdp_rc; /* XDP redirect return code */
 
+	struct page *pool[MANA_POOL_SIZE];
+	int pl_last;
+
 	/* MUST BE THE LAST MEMBER:
 	 * Each receive buffer has an associated mana_recv_buf_oob.
 	 */