Message ID | 20231206105419.27952-5-liangchen.linux@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | skbuff: Optimize SKB coalescing for page pool |
On Wed, Dec 6, 2023 at 2:54 AM Liang Chen <liangchen.linux@gmail.com> wrote:
>
> In order to address the issues encountered with commit 1effe8ca4e34
> ("skbuff: fix coalescing for page_pool fragment recycling"), the
> combination of the following condition was excluded from skb coalescing:
>
>   from->pp_recycle = 1
>   from->cloned = 1
>   to->pp_recycle = 1
>
> However, with page pool environments, the aforementioned combination can
> be quite common (ex. NetworkManager may lead to the additional
> packet_type being registered, thus the cloning). In scenarios with a
> higher number of small packets, it can significantly affect the success
> rate of coalescing. For example, considering packets of 256 bytes size,
> our comparison of coalescing success rate is as follows:
>
>   Without page pool: 70%
>   With page pool: 13%
>
> Consequently, this has an impact on performance:
>
>   Without page pool: 2.57 Gbits/sec
>   With page pool: 2.26 Gbits/sec
>
> Therefore, it seems worthwhile to optimize this scenario and enable
> coalescing of this particular combination. To achieve this, we need to
> ensure the correct increment of the "from" SKB page's page pool
> reference count (pp_ref_count).
>
> Following this optimization, the success rate of coalescing measured in
> our environment has improved as follows:
>
>   With page pool: 60%
>
> This success rate is approaching the rate achieved without using page
> pool, and the performance has also been improved:
>
>   With page pool: 2.52 Gbits/sec
>
> Below is the performance comparison for small packets before and after
> this optimization. We observe no impact to packets larger than 4K.
>
>   packet size   before        after         improved
>   (bytes)       (Gbits/sec)   (Gbits/sec)
>   128           1.19          1.27          7.13%
>   256           2.26          2.52          11.75%
>   512           4.13          4.81          16.50%
>   1024          6.17          6.73          9.05%
>   2048          14.54         15.47         6.45%
>   4096          25.44         27.87         9.52%
>
> Signed-off-by: Liang Chen <liangchen.linux@gmail.com>
> Reviewed-by: Yunsheng Lin <linyunsheng@huawei.com>
> Suggested-by: Jason Wang <jasowang@redhat.com>
> ---
>  include/net/page_pool/helpers.h |  5 ++++
>  net/core/skbuff.c               | 41 +++++++++++++++++++++++----------
>  2 files changed, 34 insertions(+), 12 deletions(-)
>
> diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
> index 9dc8eaf8a959..268bc9d9ffd3 100644
> --- a/include/net/page_pool/helpers.h
> +++ b/include/net/page_pool/helpers.h
> @@ -278,6 +278,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
>  	return ret;
>  }
>
> +static inline void page_pool_ref_page(struct page *page)
> +{
> +	atomic_long_inc(&page->pp_ref_count);
> +}
> +
>  static inline bool page_pool_is_last_ref(struct page *page)
>  {
>  	/* If page_pool_unref_page() returns 0, we were the last user */
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 7e26b56cda38..3c2515a29376 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -947,6 +947,24 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
>  	return napi_pp_put_page(virt_to_page(data), napi_safe);
>  }
>
> +/**
> + * skb_pp_frag_ref() - Increase fragment reference count of a page
> + * @page: page of the fragment on which to increase a reference
> + *
> + * Increase fragment reference count (pp_ref_count) on a page, but if it is
> + * not a page pool page, fallback to increase a reference(_refcount) on a
> + * normal page.
> + */
> +static void skb_pp_frag_ref(struct page *page)
> +{
> +	struct page *head_page = compound_head(page);
> +
> +	if (likely(is_pp_page(head_page)))
> +		page_pool_ref_page(head_page);
> +	else
> +		page_ref_inc(head_page);
> +}
> +

I am confused by this. Why add a new helper instead of modifying the
existing helper, skb_frag_ref()?

My mental model is that if the net stack wants to acquire a reference
on a frag, it calls skb_frag_ref(), and if it wants to drop a
reference on a frag, it calls skb_frag_unref(). Internally,
skb_frag_ref/unref() can do all sorts of checking to decide whether to
increment page->_refcount or page->pp_ref_count. I can't wrap my head
around the introduction of skb_pp_frag_ref() with no equivalent
skb_pp_frag_unref().

But even if skb_pp_frag_unref() were added, when should the net stack
use skb_frag_ref/unref, and when should it use
skb_pp_frag_ref/unref? The docs currently describe what the function
does, but not when a programmer unfamiliar with the page pool should
use it.

>  static void skb_kfree_head(void *head, unsigned int end_offset)
>  {
>  	if (end_offset == SKB_SMALL_HEAD_HEADROOM)
> @@ -5769,17 +5787,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  		return false;
>
>  	/* In general, avoid mixing page_pool and non-page_pool allocated
> -	 * pages within the same SKB. Additionally avoid dealing with clones
> -	 * with page_pool pages, in case the SKB is using page_pool fragment
> -	 * references (page_pool_alloc_frag()). Since we only take full page
> -	 * references for cloned SKBs at the moment that would result in
> -	 * inconsistent reference counts.
> -	 * In theory we could take full references if @from is cloned and
> -	 * !@to->pp_recycle but its tricky (due to potential race with
> -	 * the clone disappearing) and rare, so not worth dealing with.
> +	 * pages within the same SKB. In theory we could take full
> +	 * references if @from is cloned and !@to->pp_recycle but its
> +	 * tricky (due to potential race with the clone disappearing) and
> +	 * rare, so not worth dealing with.
>  	 */
> -	if (to->pp_recycle != from->pp_recycle ||
> -	    (from->pp_recycle && skb_cloned(from)))
> +	if (to->pp_recycle != from->pp_recycle)
>  		return false;
>
>  	if (len <= skb_tailroom(to)) {
> @@ -5836,8 +5849,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
>  	/* if the skb is not cloned this does nothing
>  	 * since we set nr_frags to 0.
>  	 */
> -	for (i = 0; i < from_shinfo->nr_frags; i++)
> -		__skb_frag_ref(&from_shinfo->frags[i]);
> +	if (from->pp_recycle)
> +		for (i = 0; i < from_shinfo->nr_frags; i++)
> +			skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
> +	else
> +		for (i = 0; i < from_shinfo->nr_frags; i++)
> +			__skb_frag_ref(&from_shinfo->frags[i]);

You added a check here to use skb_pp_frag_ref() instead of
skb_frag_ref(), but it's not clear to me why other callsites of
skb_frag_ref() don't need to be modified in the same way after your
patch.

After your patch:

skb_frag_ref() will always increment page->_refcount.
skb_frag_unref() will either decrement page->_refcount or decrement
  page->pp_ref_count (depending on the value of skb->pp_recycle).
skb_pp_frag_ref() will either increment page->_refcount or increment
  page->pp_ref_count (depending on the value of is_pp_page(), not
  skb->pp_recycle).
skb_pp_frag_unref() doesn't exist.

Is this not confusing? Can we streamline things:

skb_frag_ref() increments page->pp_ref_count for skb->pp_recycle,
  page->_refcount otherwise.
skb_frag_unref() decrements page->pp_ref_count for skb->pp_recycle,
  page->_refcount otherwise.

Or am I missing something that causes us to require this asymmetric
reference counting?

>
>  	to->truesize += delta;
>  	to->len += len;
> --
> 2.31.1
>
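The pairing problem raised in this review can be modelled in user space. The following is a minimal sketch, not kernel code: `struct toy_page` and every `toy_*` function are hypothetical stand-ins that only track which counter each helper touches, and the `is_pp` check in the streamlined variant is an assumption added so that it mirrors the unref path.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the two counters under discussion. */
struct toy_page {
	long refcount;		/* models page->_refcount */
	long pp_ref_count;	/* models page->pp_ref_count */
	bool is_pp;		/* models the result of is_pp_page() */
};

/* Models skb_frag_ref() after the patch: always bumps _refcount. */
void toy_frag_ref(struct toy_page *p)
{
	p->refcount++;
}

/* Models skb_pp_frag_ref(): picks the counter from the page type. */
void toy_pp_frag_ref(struct toy_page *p)
{
	if (p->is_pp)
		p->pp_ref_count++;
	else
		p->refcount++;
}

/* Models skb_frag_unref(): picks the counter from skb->pp_recycle. */
void toy_frag_unref(struct toy_page *p, bool pp_recycle)
{
	if (pp_recycle && p->is_pp)
		p->pp_ref_count--;
	else
		p->refcount--;
	/* Neither counter may go negative in this model. */
	assert(p->refcount >= 0 && p->pp_ref_count >= 0);
}

/* Hypothetical streamlined ref from the review: mirrors toy_frag_unref(). */
void toy_frag_ref_streamlined(struct toy_page *p, bool pp_recycle)
{
	if (pp_recycle && p->is_pp)
		p->pp_ref_count++;
	else
		p->refcount++;
}
```

In this model, pairing `toy_frag_ref()` with `toy_frag_unref(p, true)` on a page pool page raises one counter and lowers the other, while `toy_pp_frag_ref()` or `toy_frag_ref_streamlined(p, true)` followed by the same unref leaves both counters where they started.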
On Sat, Dec 9, 2023 at 10:18 AM Mina Almasry <almasrymina@google.com> wrote:
>
> [...]
>
> Is this not confusing? Can we streamline things:
>
> skb_frag_ref() increments page->pp_ref_count for skb->pp_recycle,
> page->_refcount otherwise.
> skb_frag_unref() decrement page->pp_ref_count for skb->pp_recycle,
> page->_refcount otherwise.
>
> Or am I missing something that causes us to require this asymmetric
> reference counting?
>

This idea was previously implemented, as shown here:
https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/.
But implementing this would result in some unnecessary overhead, since
currently, 'skb_try_coalesce' is the only place where the page pool
reference count for skb frag might be increased. I would prefer to
move the logic to '__skb_frag_ref' when such a need becomes more
common. Thanks!
On Sun, Dec 10, 2023 at 7:38 PM Liang Chen <liangchen.linux@gmail.com> wrote:
>
> [...]
>
> This idea was previously implemented, as shown here:
> https://lore.kernel.org/all/20211009093724.10539-5-linyunsheng@huawei.com/.
> But implementing this would result in some unnecessary overhead, since
> currently, 'skb_try_coalesce' is the only place where the page pool
> reference count for skb frag might be increased. I would prefer to
> move the logic to '__skb_frag_ref' when such a need becomes more
> common. Thanks!
>

Is it possible/desirable to add a comment to skb_frag_ref() that it
should not be used with skb->pp_recycle? At least I was tripped up by
this, but maybe it's considered obvious somehow.

But I feel like this maybe needs to be fixed. Why does the page_pool
need a separate page->pp_ref_count? Why not use page->_refcount like
the rest of the code? Is there a history here behind this decision
that you can point me to? It seems to me that
incrementing/decrementing page->pp_ref_count may be equivalent to
doing the same on page->_refcount.
On Sun, 10 Dec 2023 20:21:21 -0800 Mina Almasry wrote:
> Is it possible/desirable to add a comment to skb_frag_ref() that it
> should not be used with skb->pp_recycle? At least I was tripped by
> this, but maybe it's considered obvious somehow.
>
> But I feel like this maybe needs to be fixed. Why does the page_pool
> need a separate page->pp_ref_count? Why not use page->_refcount like
> the rest of the code? Is there a history here behind this decision
> that you can point me to? It seems to me that
> incrementing/decrementing page->pp_ref_count may be equivalent to
> doing the same on page->_refcount.

Does reading the contents of the comment I proposed here:
https://lore.kernel.org/all/20231208173816.2f32ad0f@kernel.org/
elucidate it? The pp_ref_count means the holder is aware that
they can't release the reference by calling put_page().
Because (a) we may need to clean up the pp state, unmap DMA etc.
and (b) one day it may not even be a real page (your work).

TBH I'm partial to the rename from patch 1, so I wouldn't delay this
work any more :) But you have a point that we should inspect the code
and consider making the semantics of skb_frag_ref() stronger all by
itself, without the need to add a new flavor of the helper..
Are you okay with leaving that as a follow up or do you reckon it's
easy enough we should push for it now?
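The distinction drawn here, that a pp_ref_count holder must not release via put_page() because the pool may have state to clean up, can be illustrated with a small user-space model. This is a sketch under stated assumptions rather than the page_pool implementation: `struct toy_pp_page`, `toy_put_page()` and `toy_pp_release()` are hypothetical names, and the `dma_mapped` and `recycled` flags only stand in for "pp state, unmap DMA etc.".

```c
#include <assert.h>
#include <stdbool.h>

/* Toy page carrying both counters plus the pool state that needs cleanup. */
struct toy_pp_page {
	long refcount;		/* models page->_refcount */
	long pp_ref_count;	/* models page->pp_ref_count */
	bool dma_mapped;	/* stand-in for a live DMA mapping */
	bool recycled;		/* stand-in for "returned to the pool" */
};

/* Models put_page(): drops _refcount and knows nothing about pool state. */
void toy_put_page(struct toy_pp_page *p)
{
	assert(p->refcount > 0);
	p->refcount--;
}

/*
 * Models a pool-aware release: only the last pp reference tears down the
 * pool state. Returns true when this call was the last user.
 */
bool toy_pp_release(struct toy_pp_page *p)
{
	assert(p->pp_ref_count > 0);
	if (--p->pp_ref_count > 0)
		return false;
	p->dma_mapped = false;	/* pool cleanup happens here, and only here */
	p->recycled = true;
	return true;
}
```

In this model, a holder that calls `toy_put_page()` on a reference it took through the pool counter leaves `pp_ref_count` elevated and the mapping in place, which is the kind of mismatch the separate counter is meant to make visible.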
On Mon, Dec 11, 2023 at 11:32 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Sun, 10 Dec 2023 20:21:21 -0800 Mina Almasry wrote:
> > Is it possible/desirable to add a comment to skb_frag_ref() that it
> > should not be used with skb->pp_recycle? At least I was tripped by
> > this, but maybe it's considered obvious somehow.
> >
> > But I feel like this maybe needs to be fixed. Why does the page_pool
> > need a separate page->pp_ref_count? Why not use page->_refcount like
> > the rest of the code? Is there a history here behind this decision
> > that you can point me to? It seems to me that
> > incrementing/decrementing page->pp_ref_count may be equivalent to
> > doing the same on page->_refcount.
>
> Does reading the contents of the comment I proposed here:
> https://lore.kernel.org/all/20231208173816.2f32ad0f@kernel.org/
> elucidate it? The pp_ref_count means the holder is aware that
> they can't release the reference by calling put_page().
> Because (a) we may need to clean up the pp state, unmap DMA etc.
> and (b) one day it may not even be a real page (your work).
>

Thank you, that makes sense.

> TBH I'm partial to the rename from patch 1, so I wouldn't delay this
> work any more :) But you have a point that we should inspect the code
> and consider making the semantics of skb_frag_ref() stronger all by
> itself, without the need to add a new flavor of the helper..
> Are you okay with leaving that as a follow up or do you reckon it's
> easy enough we should push for it now?

I think the rename from pp_frag_count -> pp_ref_count is a huge
improvement, and I think the fact that the netstack has a way to
obtain a reference on a pp frag is also a huge improvement. Please go
ahead with merging this if you like; I was asking questions for my own
education and for follow-up work to consider.
diff --git a/include/net/page_pool/helpers.h b/include/net/page_pool/helpers.h
index 9dc8eaf8a959..268bc9d9ffd3 100644
--- a/include/net/page_pool/helpers.h
+++ b/include/net/page_pool/helpers.h
@@ -278,6 +278,11 @@ static inline long page_pool_unref_page(struct page *page, long nr)
 	return ret;
 }
 
+static inline void page_pool_ref_page(struct page *page)
+{
+	atomic_long_inc(&page->pp_ref_count);
+}
+
 static inline bool page_pool_is_last_ref(struct page *page)
 {
 	/* If page_pool_unref_page() returns 0, we were the last user */
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7e26b56cda38..3c2515a29376 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -947,6 +947,24 @@ static bool skb_pp_recycle(struct sk_buff *skb, void *data, bool napi_safe)
 	return napi_pp_put_page(virt_to_page(data), napi_safe);
 }
 
+/**
+ * skb_pp_frag_ref() - Increase fragment reference count of a page
+ * @page: page of the fragment on which to increase a reference
+ *
+ * Increase fragment reference count (pp_ref_count) on a page, but if it is
+ * not a page pool page, fallback to increase a reference(_refcount) on a
+ * normal page.
+ */
+static void skb_pp_frag_ref(struct page *page)
+{
+	struct page *head_page = compound_head(page);
+
+	if (likely(is_pp_page(head_page)))
+		page_pool_ref_page(head_page);
+	else
+		page_ref_inc(head_page);
+}
+
 static void skb_kfree_head(void *head, unsigned int end_offset)
 {
 	if (end_offset == SKB_SMALL_HEAD_HEADROOM)
@@ -5769,17 +5787,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 		return false;
 
 	/* In general, avoid mixing page_pool and non-page_pool allocated
-	 * pages within the same SKB. Additionally avoid dealing with clones
-	 * with page_pool pages, in case the SKB is using page_pool fragment
-	 * references (page_pool_alloc_frag()). Since we only take full page
-	 * references for cloned SKBs at the moment that would result in
-	 * inconsistent reference counts.
-	 * In theory we could take full references if @from is cloned and
-	 * !@to->pp_recycle but its tricky (due to potential race with
-	 * the clone disappearing) and rare, so not worth dealing with.
+	 * pages within the same SKB. In theory we could take full
+	 * references if @from is cloned and !@to->pp_recycle but its
+	 * tricky (due to potential race with the clone disappearing) and
+	 * rare, so not worth dealing with.
 	 */
-	if (to->pp_recycle != from->pp_recycle ||
-	    (from->pp_recycle && skb_cloned(from)))
+	if (to->pp_recycle != from->pp_recycle)
 		return false;
 
 	if (len <= skb_tailroom(to)) {
@@ -5836,8 +5849,12 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	/* if the skb is not cloned this does nothing
 	 * since we set nr_frags to 0.
 	 */
-	for (i = 0; i < from_shinfo->nr_frags; i++)
-		__skb_frag_ref(&from_shinfo->frags[i]);
+	if (from->pp_recycle)
+		for (i = 0; i < from_shinfo->nr_frags; i++)
+			skb_pp_frag_ref(skb_frag_page(&from_shinfo->frags[i]));
+	else
+		for (i = 0; i < from_shinfo->nr_frags; i++)
+			__skb_frag_ref(&from_shinfo->frags[i]);
 
 	to->truesize += delta;
 	to->len += len;
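The effect of the last hunk on a cloned, page-pool-backed "from" SKB can be traced with a simplified user-space model. This is a sketch only: `struct toy_skb`, `struct toy_page`, `TOY_MAX_FRAGS` and the `toy_*` functions are hypothetical, and the model ignores everything in skb_try_coalesce() other than the pp_recycle check, the frag hand-over, and the per-frag reference taken when "from" is cloned.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_FRAGS 4	/* arbitrary bound for the model */

struct toy_page {
	long refcount;		/* models page->_refcount */
	long pp_ref_count;	/* models page->pp_ref_count */
	bool is_pp;		/* models the result of is_pp_page() */
};

struct toy_skb {
	bool pp_recycle;
	bool cloned;
	int nr_frags;
	struct toy_page *frags[TOY_MAX_FRAGS];
};

/* Models skb_pp_frag_ref(): pool counter for pool pages, else _refcount. */
void toy_pp_frag_ref(struct toy_page *p)
{
	if (p->is_pp)
		p->pp_ref_count++;
	else
		p->refcount++;
}

/* Models only the frag hand-over part of skb_try_coalesce(). */
bool toy_try_coalesce(struct toy_skb *to, struct toy_skb *from)
{
	int i;

	/* Never mix page_pool and non-page_pool SKBs. */
	if (to->pp_recycle != from->pp_recycle)
		return false;
	if (to->nr_frags + from->nr_frags > TOY_MAX_FRAGS)
		return false;

	for (i = 0; i < from->nr_frags; i++)
		to->frags[to->nr_frags + i] = from->frags[i];
	to->nr_frags += from->nr_frags;

	/* An uncloned "from" gives its frags away, so the loop below is a no-op. */
	if (!from->cloned)
		from->nr_frags = 0;

	/* A cloned "from" still shares its frags: take one reference per frag. */
	for (i = 0; i < from->nr_frags; i++) {
		if (from->pp_recycle)
			toy_pp_frag_ref(from->frags[i]);
		else
			from->frags[i]->refcount++;
	}
	return true;
}
```

With a cloned page-pool "from" holding two frags, the model bumps `pp_ref_count` on both pages and leaves `refcount` untouched, which matches the intent stated in the commit message of incrementing the "from" SKB page's page pool reference count.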