[rfc,v2,4/5] page_pool: support page frag API for page pool

Message ID	1625903002-31619-5-git-send-email-linyunsheng@huawei.com (mailing list archive)
State	RFC
Delegated to:	Netdev Maintainers
Headers	show Return-Path: <bpf-owner@kernel.org> From: Yunsheng Lin <linyunsheng@huawei.com> To: <davem@davemloft.net>, <kuba@kernel.org> CC: <alexander.duyck@gmail.com>, <linux@armlinux.org.uk>, <mw@semihalf.com>, <linuxarm@openeuler.org>, <yisen.zhuang@huawei.com>, <salil.mehta@huawei.com>, <thomas.petazzoni@bootlin.com>, <hawk@kernel.org>, <ilias.apalodimas@linaro.org>, <ast@kernel.org>, <daniel@iogearbox.net>, <john.fastabend@gmail.com>, <akpm@linux-foundation.org>, <peterz@infradead.org>, <will@kernel.org>, <willy@infradead.org>, <vbabka@suse.cz>, <fenghua.yu@intel.com>, <guro@fb.com>, <peterx@redhat.com>, <feng.tang@intel.com>, <jgg@ziepe.ca>, <mcroce@microsoft.com>, <hughd@google.com>, <jonathan.lemon@gmail.com>, <alobakin@pm.me>, <willemb@google.com>, <wenxu@ucloud.cn>, <cong.wang@bytedance.com>, <haokexin@gmail.com>, <nogikh@google.com>, <elver@google.com>, <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>, <bpf@vger.kernel.org> Subject: [PATCH rfc v2 4/5] page_pool: support page frag API for page pool Date: Sat, 10 Jul 2021 15:43:21 +0800 Message-ID: <1625903002-31619-5-git-send-email-linyunsheng@huawei.com> In-Reply-To: <1625903002-31619-1-git-send-email-linyunsheng@huawei.com> References: <1625903002-31619-1-git-send-email-linyunsheng@huawei.com> MIME-Version: 1.0 Content-Type: text/plain Precedence: bulk
Series	add elevated refcnt support for page pool \| expand [rfc,v2,0/5] add elevated refcnt support for page pool [rfc,v2,1/5] page_pool: keep pp info as long as page pool owns the page [rfc,v2,2/5] page_pool: add interface for getting and setting pagecnt_bias [rfc,v2,3/5] page_pool: add page recycling support based on elevated refcnt [rfc,v2,4/5] page_pool: support page frag API for page pool [rfc,v2,5/5] net: hns3: support skb's frag page recycling based on page pool

Message ID

1625903002-31619-5-git-send-email-linyunsheng@huawei.com (mailing list archive)

State

RFC

Delegated to:

Netdev Maintainers

Headers

From: Yunsheng Lin <linyunsheng@huawei.com>
To: <davem@davemloft.net>, <kuba@kernel.org>
CC: <alexander.duyck@gmail.com>, <linux@armlinux.org.uk>,
        <mw@semihalf.com>, <linuxarm@openeuler.org>,
        <yisen.zhuang@huawei.com>, <salil.mehta@huawei.com>,
        <thomas.petazzoni@bootlin.com>, <hawk@kernel.org>,
        <ilias.apalodimas@linaro.org>, <ast@kernel.org>,
        <daniel@iogearbox.net>, <john.fastabend@gmail.com>,
        <akpm@linux-foundation.org>, <peterz@infradead.org>,
        <will@kernel.org>, <willy@infradead.org>, <vbabka@suse.cz>,
        <fenghua.yu@intel.com>, <guro@fb.com>, <peterx@redhat.com>,
        <feng.tang@intel.com>, <jgg@ziepe.ca>, <mcroce@microsoft.com>,
        <hughd@google.com>, <jonathan.lemon@gmail.com>, <alobakin@pm.me>,
        <willemb@google.com>, <wenxu@ucloud.cn>, <cong.wang@bytedance.com>,
        <haokexin@gmail.com>, <nogikh@google.com>, <elver@google.com>,
        <netdev@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
        <bpf@vger.kernel.org>
Subject: [PATCH rfc v2 4/5] page_pool: support page frag API for page pool
Date: Sat, 10 Jul 2021 15:43:21 +0800
Message-ID: <1625903002-31619-5-git-send-email-linyunsheng@huawei.com>
In-Reply-To: <1625903002-31619-1-git-send-email-linyunsheng@huawei.com>
References: <1625903002-31619-1-git-send-email-linyunsheng@huawei.com>
MIME-Version: 1.0
Content-Type: text/plain
Precedence: bulk

Series

add elevated refcnt support for page pool | expand

Checks

Context	Check	Description
netdev/cover_letter	success	Link
netdev/fixes_present	success	Link
netdev/patch_count	success	Link
netdev/tree_selection	success	Guessed tree name to be net-next
netdev/subject_prefix	warning	Target tree name not specified in the subject
netdev/cc_maintainers	warning	5 maintainers not CCed: yhs@fb.com kpsingh@kernel.org andrii@kernel.org kafai@fb.com songliubraving@fb.com
netdev/source_inline	success	Was 0 now: 0
netdev/verify_signedoff	success	Link
netdev/module_param	success	Was 0 now: 0
netdev/build_32bit	success	Errors and warnings before: 6118 this patch: 6118
netdev/kdoc	success	Errors and warnings before: 0 this patch: 0
netdev/verify_fixes	success	Link
netdev/checkpatch	success	total: 0 errors, 0 warnings, 0 checks, 93 lines checked
netdev/build_allmodconfig_warn	success	Errors and warnings before: 6175 this patch: 6175
netdev/header_inline	success	Link

Context

Check

Description

netdev/cover_letter

success

Link

netdev/fixes_present

success

Link

netdev/patch_count

success

Link

netdev/tree_selection

success

Guessed tree name to be net-next

netdev/subject_prefix

warning

Target tree name not specified in the subject

netdev/cc_maintainers

warning

5 maintainers not CCed: yhs@fb.com kpsingh@kernel.org andrii@kernel.org kafai@fb.com songliubraving@fb.com

netdev/source_inline

success

Was 0 now: 0

netdev/verify_signedoff

success

Link

netdev/module_param

success

Was 0 now: 0

netdev/build_32bit

success

Errors and warnings before: 6118 this patch: 6118

netdev/kdoc

success

Errors and warnings before: 0 this patch: 0

netdev/verify_fixes

success

Link

netdev/checkpatch

success

total: 0 errors, 0 warnings, 0 checks, 93 lines checked

netdev/build_allmodconfig_warn

success

Errors and warnings before: 6175 this patch: 6175

netdev/header_inline

success

Link

Commit Message

Yunsheng Lin July 10, 2021, 7:43 a.m. UTC

Currently each desc use a whole page to do ping-pong page
reusing in most driver. As the page pool has support page
recycling based on elevated refcnt, it makes sense to add
a page frag API in page pool to split a page to different
frag to serve multi descriptions.

This means a huge memory saving for kernel with page size of
64K, as a page can be used by 32 descriptions with 2k buffer
size, comparing to each desc using one page currently.

Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
---
 include/net/page_pool.h | 14 ++++++++++++++
 net/core/page_pool.c    | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 63 insertions(+)

Comments

Alexander Duyck July 10, 2021, 5:43 p.m. UTC | #1

On Sat, Jul 10, 2021 at 12:44 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>
> Currently each desc use a whole page to do ping-pong page
> reusing in most driver. As the page pool has support page
> recycling based on elevated refcnt, it makes sense to add
> a page frag API in page pool to split a page to different
> frag to serve multi descriptions.
>
> This means a huge memory saving for kernel with page size of
> 64K, as a page can be used by 32 descriptions with 2k buffer
> size, comparing to each desc using one page currently.
>
> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
> ---
>  include/net/page_pool.h | 14 ++++++++++++++
>  net/core/page_pool.c    | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>  2 files changed, 63 insertions(+)
>
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index f0e708d..06a5e43 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -80,6 +80,7 @@ struct page_pool_params {
>         enum dma_data_direction dma_dir; /* DMA mapping direction */
>         unsigned int    max_len; /* max DMA sync memory size */
>         unsigned int    offset;  /* DMA addr offset */
> +       unsigned int    frag_size;
>  };
>
>  struct page_pool {
> @@ -91,6 +92,8 @@ struct page_pool {
>         unsigned long defer_warn;
>
>         u32 pages_state_hold_cnt;
> +       unsigned int frag_offset;
> +       struct page *frag_page;
>
>         /*
>          * Data structure for allocation side
> @@ -140,6 +143,17 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
>         return page_pool_alloc_pages(pool, gfp);
>  }
>
> +struct page *page_pool_alloc_frag(struct page_pool *pool,
> +                                 unsigned int *offset, gfp_t gfp);
> +
> +static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
> +                                                   unsigned int *offset)
> +{
> +       gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
> +
> +       return page_pool_alloc_frag(pool, offset, gfp);
> +}
> +
>  /* get the stored dma direction. A driver might decide to treat this locally and
>   * avoid the extra cache line from page_pool to determine the direction
>   */
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index a87cbe1..b787033 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -350,6 +350,53 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
>  }
>  EXPORT_SYMBOL(page_pool_alloc_pages);
>
> +struct page *page_pool_alloc_frag(struct page_pool *pool,
> +                                 unsigned int *offset, gfp_t gfp)
> +{
> +       unsigned int frag_offset = pool->frag_offset;
> +       unsigned int frag_size = pool->p.frag_size;
> +       struct page *frag_page = pool->frag_page;
> +       unsigned int max_len = pool->p.max_len;
> +
> +       if (!frag_page || frag_offset + frag_size > max_len) {
> +               frag_page = page_pool_alloc_pages(pool, gfp);
> +               if (unlikely(!frag_page)) {
> +                       pool->frag_page = NULL;
> +                       return NULL;
> +               }
> +
> +               pool->frag_page = frag_page;
> +               frag_offset = 0;
> +
> +               page_pool_sub_bias(pool, frag_page,
> +                                  max_len / frag_size - 1);
> +       }
> +
> +       *offset = frag_offset;
> +       pool->frag_offset = frag_offset + frag_size;
> +
> +       return frag_page;
> +}
> +EXPORT_SYMBOL(page_pool_alloc_frag);

I'm still not a fan of the fixed implementation. For the cost of the
division as I said before you could make this flexible like
page_frag_alloc_align and just decrement the bias by one per
allocation instead of trying to batch it.

I'm sure there would likely be implementations that might need to
operate at two different sizes, for example a header and payload size.

> +static void page_pool_empty_frag(struct page_pool *pool)
> +{
> +       unsigned int frag_offset = pool->frag_offset;
> +       unsigned int frag_size = pool->p.frag_size;
> +       struct page *frag_page = pool->frag_page;
> +       unsigned int max_len = pool->p.max_len;
> +
> +       if (!frag_page)
> +               return;
> +
> +       while (frag_offset + frag_size <= max_len) {
> +               page_pool_put_full_page(pool, frag_page, false);
> +               frag_offset += frag_size;
> +       }
> +

Having to call this to free the page seems confusing. Rather than
reserving multiple and having to free the page multiple times I really
think you would be better off just holding one bias reservation on the
page at a time.

> +       pool->frag_page = NULL;
> +}
> +
>  /* Calculate distance between two u32 values, valid if distance is below 2^(31)
>   *  https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution
>   */
> @@ -670,6 +717,8 @@ void page_pool_destroy(struct page_pool *pool)
>         if (!page_pool_put(pool))
>                 return;
>
> +       page_pool_empty_frag(pool);
> +
>         if (!page_pool_release(pool))
>                 return;
>
> --
> 2.7.4
>

Yunsheng Lin July 12, 2021, 7:57 a.m. UTC | #2

On 2021/7/11 1:43, Alexander Duyck wrote:
> On Sat, Jul 10, 2021 at 12:44 AM Yunsheng Lin <linyunsheng@huawei.com> wrote:
>>
>> Currently each desc use a whole page to do ping-pong page
>> reusing in most driver. As the page pool has support page
>> recycling based on elevated refcnt, it makes sense to add
>> a page frag API in page pool to split a page to different
>> frag to serve multi descriptions.
>>
>> This means a huge memory saving for kernel with page size of
>> 64K, as a page can be used by 32 descriptions with 2k buffer
>> size, comparing to each desc using one page currently.
>>
>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com>
>> ---
>>  include/net/page_pool.h | 14 ++++++++++++++
>>  net/core/page_pool.c    | 49 +++++++++++++++++++++++++++++++++++++++++++++++++
>>  2 files changed, 63 insertions(+)
>>
>> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
>> index f0e708d..06a5e43 100644
>> --- a/include/net/page_pool.h
>> +++ b/include/net/page_pool.h
>> @@ -80,6 +80,7 @@ struct page_pool_params {
>>         enum dma_data_direction dma_dir; /* DMA mapping direction */
>>         unsigned int    max_len; /* max DMA sync memory size */
>>         unsigned int    offset;  /* DMA addr offset */
>> +       unsigned int    frag_size;
>>  };
>>
>>  struct page_pool {
>> @@ -91,6 +92,8 @@ struct page_pool {
>>         unsigned long defer_warn;
>>
>>         u32 pages_state_hold_cnt;
>> +       unsigned int frag_offset;
>> +       struct page *frag_page;
>>
>>         /*
>>          * Data structure for allocation side
>> @@ -140,6 +143,17 @@ static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
>>         return page_pool_alloc_pages(pool, gfp);
>>  }
>>
>> +struct page *page_pool_alloc_frag(struct page_pool *pool,
>> +                                 unsigned int *offset, gfp_t gfp);
>> +
>> +static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
>> +                                                   unsigned int *offset)
>> +{
>> +       gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
>> +
>> +       return page_pool_alloc_frag(pool, offset, gfp);
>> +}
>> +
>>  /* get the stored dma direction. A driver might decide to treat this locally and
>>   * avoid the extra cache line from page_pool to determine the direction
>>   */
>> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
>> index a87cbe1..b787033 100644
>> --- a/net/core/page_pool.c
>> +++ b/net/core/page_pool.c
>> @@ -350,6 +350,53 @@ struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
>>  }
>>  EXPORT_SYMBOL(page_pool_alloc_pages);
>>
>> +struct page *page_pool_alloc_frag(struct page_pool *pool,
>> +                                 unsigned int *offset, gfp_t gfp)
>> +{
>> +       unsigned int frag_offset = pool->frag_offset;
>> +       unsigned int frag_size = pool->p.frag_size;
>> +       struct page *frag_page = pool->frag_page;
>> +       unsigned int max_len = pool->p.max_len;
>> +
>> +       if (!frag_page || frag_offset + frag_size > max_len) {
>> +               frag_page = page_pool_alloc_pages(pool, gfp);
>> +               if (unlikely(!frag_page)) {
>> +                       pool->frag_page = NULL;
>> +                       return NULL;
>> +               }
>> +
>> +               pool->frag_page = frag_page;
>> +               frag_offset = 0;
>> +
>> +               page_pool_sub_bias(pool, frag_page,
>> +                                  max_len / frag_size - 1);
>> +       }
>> +
>> +       *offset = frag_offset;
>> +       pool->frag_offset = frag_offset + frag_size;
>> +
>> +       return frag_page;
>> +}
>> +EXPORT_SYMBOL(page_pool_alloc_frag);
> 
> I'm still not a fan of the fixed implementation. For the cost of the
> division as I said before you could make this flexible like
> page_frag_alloc_align and just decrement the bias by one per
> allocation instead of trying to batch it.
> 
> I'm sure there would likely be implementations that might need to
> operate at two different sizes, for example a header and payload size.

Will try to implement the frag allocation of different sizes
in new version.

> 
>> +static void page_pool_empty_frag(struct page_pool *pool)
>> +{
>> +       unsigned int frag_offset = pool->frag_offset;
>> +       unsigned int frag_size = pool->p.frag_size;
>> +       struct page *frag_page = pool->frag_page;
>> +       unsigned int max_len = pool->p.max_len;
>> +
>> +       if (!frag_page)
>> +               return;
>> +
>> +       while (frag_offset + frag_size <= max_len) {
>> +               page_pool_put_full_page(pool, frag_page, false);
>> +               frag_offset += frag_size;
>> +       }
>> +
> 
> Having to call this to free the page seems confusing. Rather than
> reserving multiple and having to free the page multiple times I really
> think you would be better off just holding one bias reservation on the
> page at a time.

will remove the above freeing the page multiple times.

> 
>> +       pool->frag_page = NULL;
>> +}
>> +
>>  /* Calculate distance between two u32 values, valid if distance is below 2^(31)
>>   *  https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution
>>   */
>> @@ -670,6 +717,8 @@ void page_pool_destroy(struct page_pool *pool)
>>         if (!page_pool_put(pool))
>>                 return;
>>
>> +       page_pool_empty_frag(pool);
>> +
>>         if (!page_pool_release(pool))
>>                 return;
>>
>> --
>> 2.7.4
>>
> .
>

diff --git a/include/net/page_pool.h b/include/net/page_pool.h
index f0e708d..06a5e43 100644
--- a/include/net/page_pool.h
+++ b/include/net/page_pool.h
@@ -80,6 +80,7 @@  struct page_pool_params {
 	enum dma_data_direction dma_dir; /* DMA mapping direction */
 	unsigned int	max_len; /* max DMA sync memory size */
 	unsigned int	offset;  /* DMA addr offset */
+	unsigned int	frag_size;
 };
 
 struct page_pool {
@@ -91,6 +92,8 @@  struct page_pool {
 	unsigned long defer_warn;
 
 	u32 pages_state_hold_cnt;
+	unsigned int frag_offset;
+	struct page *frag_page;
 
 	/*
 	 * Data structure for allocation side
@@ -140,6 +143,17 @@  static inline struct page *page_pool_dev_alloc_pages(struct page_pool *pool)
 	return page_pool_alloc_pages(pool, gfp);
 }
 
+struct page *page_pool_alloc_frag(struct page_pool *pool,
+				  unsigned int *offset, gfp_t gfp);
+
+static inline struct page *page_pool_dev_alloc_frag(struct page_pool *pool,
+						    unsigned int *offset)
+{
+	gfp_t gfp = (GFP_ATOMIC | __GFP_NOWARN);
+
+	return page_pool_alloc_frag(pool, offset, gfp);
+}
+
 /* get the stored dma direction. A driver might decide to treat this locally and
  * avoid the extra cache line from page_pool to determine the direction
  */
diff --git a/net/core/page_pool.c b/net/core/page_pool.c
index a87cbe1..b787033 100644
--- a/net/core/page_pool.c
+++ b/net/core/page_pool.c
@@ -350,6 +350,53 @@  struct page *page_pool_alloc_pages(struct page_pool *pool, gfp_t gfp)
 }
 EXPORT_SYMBOL(page_pool_alloc_pages);
 
+struct page *page_pool_alloc_frag(struct page_pool *pool,
+				  unsigned int *offset, gfp_t gfp)
+{
+	unsigned int frag_offset = pool->frag_offset;
+	unsigned int frag_size = pool->p.frag_size;
+	struct page *frag_page = pool->frag_page;
+	unsigned int max_len = pool->p.max_len;
+
+	if (!frag_page || frag_offset + frag_size > max_len) {
+		frag_page = page_pool_alloc_pages(pool, gfp);
+		if (unlikely(!frag_page)) {
+			pool->frag_page = NULL;
+			return NULL;
+		}
+
+		pool->frag_page = frag_page;
+		frag_offset = 0;
+
+		page_pool_sub_bias(pool, frag_page,
+				   max_len / frag_size - 1);
+	}
+
+	*offset = frag_offset;
+	pool->frag_offset = frag_offset + frag_size;
+
+	return frag_page;
+}
+EXPORT_SYMBOL(page_pool_alloc_frag);
+
+static void page_pool_empty_frag(struct page_pool *pool)
+{
+	unsigned int frag_offset = pool->frag_offset;
+	unsigned int frag_size = pool->p.frag_size;
+	struct page *frag_page = pool->frag_page;
+	unsigned int max_len = pool->p.max_len;
+
+	if (!frag_page)
+		return;
+
+	while (frag_offset + frag_size <= max_len) {
+		page_pool_put_full_page(pool, frag_page, false);
+		frag_offset += frag_size;
+	}
+
+	pool->frag_page = NULL;
+}
+
 /* Calculate distance between two u32 values, valid if distance is below 2^(31)
  *  https://en.wikipedia.org/wiki/Serial_number_arithmetic#General_Solution
  */
@@ -670,6 +717,8 @@  void page_pool_destroy(struct page_pool *pool)
 	if (!page_pool_put(pool))
 		return;
 
+	page_pool_empty_frag(pool);
+
 	if (!page_pool_release(pool))
 		return;