Message ID | 20240407130850.19625-13-linyunsheng@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | First try to replace page_frag with page_frag_cache | expand |
On Sun, 2024-04-07 at 21:08 +0800, Yunsheng Lin wrote: > Update documentation about design, implementation and API usages > for page_frag. > > Also update MAINTAINERS for page_frag. Alexander seems to be the > orginal author for page_frag, we can add him to the MAINTAINERS > later if we have an ack from him. > > CC: Alexander Duyck <alexander.duyck@gmail.com> > Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> Again, this seems more like 2 different pathches at least. One for the Documentation and MAINTAINERS changes, and one for the function documentation. > --- > Documentation/mm/page_frags.rst | 115 ++++++++++++++++++---------- > MAINTAINERS | 10 +++ > include/linux/page_frag_cache.h | 128 ++++++++++++++++++++++++++++++++ > mm/page_frag_cache.c | 51 ++++++++++--- > 4 files changed, 256 insertions(+), 48 deletions(-) > > diff --git a/Documentation/mm/page_frags.rst b/Documentation/mm/page_frags.rst > index 503ca6cdb804..77256dfb58bf 100644 > --- a/Documentation/mm/page_frags.rst > +++ b/Documentation/mm/page_frags.rst > @@ -1,43 +1,80 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > ============== > Page fragments > ============== > > -A page fragment is an arbitrary-length arbitrary-offset area of memory > -which resides within a 0 or higher order compound page. Multiple > -fragments within that page are individually refcounted, in the page's > -reference counter. > - > -The page_frag functions, page_frag_alloc and page_frag_free, provide a > -simple allocation framework for page fragments. This is used by the > -network stack and network device drivers to provide a backing region of > -memory for use as either an sk_buff->head, or to be used in the "frags" > -portion of skb_shared_info. > - > -In order to make use of the page fragment APIs a backing page fragment > -cache is needed. This provides a central point for the fragment allocation > -and tracks allows multiple calls to make use of a cached page. The > -advantage to doing this is that multiple calls to get_page can be avoided > -which can be expensive at allocation time. However due to the nature of > -this caching it is required that any calls to the cache be protected by > -either a per-cpu limitation, or a per-cpu limitation and forcing interrupts > -to be disabled when executing the fragment allocation. > - > -The network stack uses two separate caches per CPU to handle fragment > -allocation. The netdev_alloc_cache is used by callers making use of the > -netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is > -used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The > -main difference between these two calls is the context in which they may be > -called. The "netdev" prefixed functions are usable in any context as these > -functions will disable interrupts, while the "napi" prefixed functions are > -only usable within the softirq context. > - > -Many network device drivers use a similar methodology for allocating page > -fragments, but the page fragments are cached at the ring or descriptor > -level. In order to enable these cases it is necessary to provide a generic > -way of tearing down a page cache. For this reason __page_frag_cache_drain > -was implemented. It allows for freeing multiple references from a single > -page via a single call. The advantage to doing this is that it allows for > -cleaning up the multiple references that were added to a page in order to > -avoid calling get_page per allocation. > - > -Alexander Duyck, Nov 29, 2016. What is the point of removing this just to add it to a C file further down in the diff? Honestly I am not a fan of all the noise this is adding to these diffs. Can we do a little less moving of lines for the sake of moving them? All it does is pollute the git blame if you try to figure out the origin of the lines. > +.. kernel-doc:: mm/page_frag_cache.c > + :doc: page_frag allocator > + > +Architecture overview > +===================== > + > +.. code-block:: none > + > + +----------------------+ > + | page_frag API caller | > + +----------------------+ > + ^ > + | > + | > + | > + v > + +----------------------------------------------+ > + | request page fragment | > + +----------------------------------------------+ > + ^ ^ > + | | > + | Cache empty or not enough | > + | | > + v | > + +--------------------------------+ | > + | refill cache with order 3 page | | > + +--------------------------------+ | > + ^ ^ | > + | | | > + | | Refill failed | > + | | | Cache is enough > + | | | > + | v | > + | +----------------------------------+ | > + | | refill cache with order 0 page | | > + | +----------------------------------+ | > + | ^ | > + | Refill succeed | | > + | | Refill succeed | > + | | | > + v v v > + +----------------------------------------------+ > + | allocate fragment from cache | > + +----------------------------------------------+ > + +1 for the simple visualization of how this works. > +API interface > +============= > +As the design and implementation of page_frag API, the allocation side does not > +allow concurrent calling, it is assumed that the caller must ensure there is not > +concurrent alloc calling to the same page_frag_cache instance by using it's own > +lock or rely on some lockless guarantee like NAPI softirq. > + > +Depending on different use cases, callers expecting to deal with va, page or > +both va and page for them may call page_frag_alloc_va(), page_frag_alloc_pg(), > +or page_frag_alloc() accordingly. > + So the new documentation is good up to here. > +There is also a use case that need minimum memory in order for forward > +progressing, but can do better if there is more memory available. Introduce > +page_frag_alloc_prepare() and page_frag_alloc_commit() related API, the caller > +requests the minimum memory it need and the prepare API will return the maximum > +size of the fragment returned, caller need to report back to the page_frag core > +how much memory it actually use by calling commit API, or not calling the commit > +API if deciding to not use any memory. > + This part is as clear as mud to me. It sounds like kind of a convoluted setup where you are having the caller have to know a fair bit about the internal structure of the cache and it is essentially checking the state and then performing a commit. Not a huge fan. I would almost prefer to see something more like what we used to do with msix where you just had a range you could request and if it can't give you at least the minimum it fails. I assume the patch is somewhere here in the set. Will take a look at it later. > +.. kernel-doc:: include/linux/page_frag_cache.h > + :identifiers: page_frag_cache_init page_frag_cache_is_pfmemalloc > + page_frag_alloc_va __page_frag_alloc_va_align > + page_frag_alloc_va_align page_frag_alloc_va_prepare > + page_frag_alloc_va_prepare_align page_frag_alloc_pg_prepare > + page_frag_alloc_prepare page_frag_alloc_commit > + page_frag_alloc_commit_noref page_frag_free_va > + > +.. kernel-doc:: mm/page_frag_cache.c > + :identifiers: page_frag_cache_drain > diff --git a/MAINTAINERS b/MAINTAINERS > index 4745ea94d463..2f84aba59428 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c > F: mm/readahead.c > F: mm/truncate.c > > +PAGE FRAG > +M: Yunsheng Lin <linyunsheng@huawei.com> > +L: linux-mm@kvack.org > +L: netdev@vger.kernel.org > +S: Supported > +F: Documentation/mm/page_frags.rst > +F: include/linux/page_frag_cache.h > +F: mm/page_frag_cache.c > +F: mm/page_frag_test.c > + I would appreciate it if you could add me as I usually am having to deal with issues people have with this anyway. You can probably just go with: Alexander Duyck <alexander.duyck@gmail.com> > PAGE POOL > M: Jesper Dangaard Brouer <hawk@kernel.org> > M: Ilias Apalodimas <ilias.apalodimas@linaro.org> > diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h > index 28185969cd2c..d8edbecdd179 100644 > --- a/include/linux/page_frag_cache.h > +++ b/include/linux/page_frag_cache.h > @@ -31,11 +31,23 @@ struct page_frag_cache { > #endif > }; > > +/** > + * page_frag_cache_init() - Init page_frag cache. > + * @nc: page_frag cache from which to init > + * > + * Inline helper to init the page_frag cache. > + */ > static inline void page_frag_cache_init(struct page_frag_cache *nc) > { > nc->va = NULL; > } > > +/** > + * page_frag_cache_is_pfmemalloc() - Check for pfmemalloc. > + * @nc: page_frag cache from which to check > + * > + * Used to check if the current page in page_frag cache is pfmemalloc'ed. > + */ > static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc) > { > return !!nc->pfmemalloc; > @@ -46,6 +58,17 @@ void __page_frag_cache_drain(struct page *page, unsigned int count); > void *page_frag_cache_refill(struct page_frag_cache *nc, unsigned int fragsz, > gfp_t gfp_mask); > > +/** > + * page_frag_alloc_va() - Alloc a page fragment. > + * @nc: page_frag cache from which to allocate > + * @fragsz: the requested fragment size > + * @gfp_mask: the allocation gfp to use when cache need to be refilled > + * > + * Get a page fragment from page_frag cache. > + * > + * Return: > + * Return va of the page fragment, otherwise return NULL. > + */ > static inline void *page_frag_alloc_va(struct page_frag_cache *nc, > unsigned int fragsz, gfp_t gfp_mask) > { > @@ -63,6 +86,19 @@ static inline void *page_frag_alloc_va(struct page_frag_cache *nc, > return va + offset; > } > > +/** > + * __page_frag_alloc_va_align() - Alloc a page fragment with aligning > + * requirement. > + * @nc: page_frag cache from which to allocate > + * @fragsz: the requested fragment size > + * @gfp_mask: the allocation gfp to use when cache need to be refilled > + * @align: the requested aligning requirement > + * > + * Get a page fragment from page_frag cache with aligning requirement. > + * > + * Return: > + * Return va of the page fragment, otherwise return NULL. > + */ > static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc, > unsigned int fragsz, > gfp_t gfp_mask, > @@ -75,6 +111,19 @@ static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc, > return page_frag_alloc_va(nc, fragsz, gfp_mask); > } > > +/** > + * page_frag_alloc_va_align() - Alloc a page fragment with aligning requirement. > + * @nc: page_frag cache from which to allocate > + * @fragsz: the requested fragment size > + * @gfp_mask: the allocation gfp to use when cache need to be refilled > + * @align: the requested aligning requirement > + * > + * WARN_ON_ONCE() checking for align and fragsz before getting a page fragment > + * from page_frag cache with aligning requirement. > + * > + * Return: > + * Return va of the page fragment, otherwise return NULL. > + */ > static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc, > unsigned int fragsz, > gfp_t gfp_mask, > @@ -86,6 +135,19 @@ static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc, > return __page_frag_alloc_va_align(nc, fragsz, gfp_mask, align); > } > > +/** > + * page_frag_alloc_va_prepare() - Prepare allocing a page fragment. > + * @nc: page_frag cache from which to prepare > + * @offset: out as the offset of the page fragment > + * @size: in as the requested size, out as the available size > + * @gfp_mask: the allocation gfp to use when cache need to be refilled > + * > + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to > + * report the maximum size of the page fragment the caller can use. > + * > + * Return: > + * Return va of the page fragment, otherwise return NULL. > + */ > static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc, > unsigned int *offset, > unsigned int *size, > @@ -108,6 +170,21 @@ static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc, > return va + *offset; > } > > +/** > + * page_frag_alloc_va_prepare_align() - Prepare allocing a page fragment with > + * aligning requirement. > + * @nc: page_frag cache from which to prepare > + * @offset: out as the offset of the page fragment > + * @size: in as the requested size, out as the available size > + * @align: the requested aligning requirement > + * @gfp_mask: the allocation gfp to use when cache need to be refilled > + * > + * Prepare an aligned page fragment with minimum size of ‘size’, 'size' is also > + * used to report the maximum size of the page fragment the caller can use. > + * > + * Return: > + * Return va of the page fragment, otherwise return NULL. > + */ > static inline void *page_frag_alloc_va_prepare_align(struct page_frag_cache *nc, > unsigned int *offset, > unsigned int *size, > @@ -144,6 +221,19 @@ static inline void *__page_frag_alloc_pg_prepare(struct page_frag_cache *nc, > return va; > } > > +/** > + * page_frag_alloc_pg_prepare - Prepare allocing a page fragment. > + * @nc: page_frag cache from which to prepare > + * @offset: out as the offset of the page fragment > + * @size: in as the requested size, out as the available size > + * @gfp: the allocation gfp to use when cache need to be refilled > + * > + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to > + * report the maximum size of the page fragment the caller can use. > + * > + * Return: > + * Return the page fragment, otherwise return NULL. > + */ > #define page_frag_alloc_pg_prepare(nc, offset, size, gfp) \ > ({ \ > struct page *__page = NULL; \ > @@ -179,6 +269,21 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc, > return nc_va; > } > > +/** > + * page_frag_alloc_prepare - Prepare allocing a page fragment. > + * @nc: page_frag cache from which to prepare > + * @offset: out as the offset of the page fragment > + * @size: in as the requested size, out as the available size > + * @va: out as the va of the returned page fragment > + * @gfp: the allocation gfp to use when cache need to be refilled > + * > + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to > + * report the maximum size of the page fragment. Return both 'page' and 'va' of > + * the fragment to the caller. > + * > + * Return: > + * Return the page fragment, otherwise return NULL. > + */ > #define page_frag_alloc_prepare(nc, offset, size, va, gfp) \ > ({ \ > struct page *__page = NULL; \ > @@ -191,6 +296,14 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc, > __page; \ > }) > > +/** > + * page_frag_alloc_commit - Commit allocing a page fragment. > + * @nc: page_frag cache from which to commit > + * @offset: offset of the page fragment > + * @size: size of the page fragment has been used > + * > + * Commit the alloc preparing by passing offset and the actual used size. > + */ > static inline void page_frag_alloc_commit(struct page_frag_cache *nc, > unsigned int offset, > unsigned int size) > @@ -199,6 +312,17 @@ static inline void page_frag_alloc_commit(struct page_frag_cache *nc, > nc->offset = offset + size; > } > > +/** > + * page_frag_alloc_commit_noref - Commit allocing a page fragment without taking > + * page refcount. > + * @nc: page_frag cache from which to commit > + * @offset: offset of the page fragment > + * @size: size of the page fragment has been used > + * > + * Commit the alloc preparing by passing offset and the actual used size, but > + * not taking page refcount. Mostly used for fragmemt coaleasing case when the > + * current fragmemt can share the same refcount with previous fragmemt. > + */ > static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc, > unsigned int offset, > unsigned int size) > @@ -206,6 +330,10 @@ static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc, > nc->offset = offset + size; > } > > +/** > + * page_frag_free_va - Free a page fragment by va. > + * @addr: va of page fragment to be freed > + */ > void page_frag_free_va(void *addr); > > #endif > diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c > index cbd0ed82a596..0c76ec006c22 100644 > --- a/mm/page_frag_cache.c > +++ b/mm/page_frag_cache.c > @@ -1,15 +1,44 @@ > // SPDX-License-Identifier: GPL-2.0-only > -/* Page fragment allocator > + > +/** > + * DOC: page_frag allocator > + * > + * A page fragment is an arbitrary-length arbitrary-offset area of memory which > + * resides within a 0 or higher order compound page. Multiple fragments within > + * that page are individually refcounted, in the page's reference counter. > + * > + * The page_frag functions, page_frag_alloc* and page_frag_free*, provide a > + * simple allocation framework for page fragments. This is used by the network > + * stack and network device drivers to provide a backing region of memory for > + * use as either an sk_buff->head, or to be used in the "frags" portion of > + * skb_shared_info. > * > - * Page Fragment: > - * An arbitrary-length arbitrary-offset area of memory which resides within a > - * 0 or higher order page. Multiple fragments within that page are > - * individually refcounted, in the page's reference counter. > + * In order to make use of the page fragment APIs a backing page fragment cache > + * is needed. This provides a central point for the fragment allocation and > + * tracks allows multiple calls to make use of a cached page. The advantage to > + * doing this is that multiple calls to get_page can be avoided which can be > + * expensive at allocation time. However due to the nature of this caching it > + * is required that any calls to the cache be protected by either a per-cpu > + * limitation, or a per-cpu limitation and forcing interrupts to be disabled > + * when executing the fragment allocation. > * > - * The page_frag functions provide a simple allocation framework for page > - * fragments. This is used by the network stack and network device drivers to > - * provide a backing region of memory for use as either an sk_buff->head, or to > - * be used in the "frags" portion of skb_shared_info. > + * The network stack uses two separate caches per CPU to handle fragment > + * allocation. The netdev_alloc_cache is used by callers making use of the > + * netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is > + * used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The > + * main difference between these two calls is the context in which they may be > + * called. The "netdev" prefixed functions are usable in any context as these > + * functions will disable interrupts, while the "napi" prefixed functions are > + * only usable within the softirq context. > + * > + * Many network device drivers use a similar methodology for allocating page > + * fragments, but the page fragments are cached at the ring or descriptor > + * level. In order to enable these cases it is necessary to provide a generic > + * way of tearing down a page cache. For this reason __page_frag_cache_drain > + * was implemented. It allows for freeing multiple references from a single > + * page via a single call. The advantage to doing this is that it allows for > + * cleaning up the multiple references that were added to a page in order to > + * avoid calling get_page per allocation. > */ > Again, not a huge fan of moving this. It would be better to just leave it where it was and add your documentation onto it. > #include <linux/export.h> > @@ -57,6 +86,10 @@ static bool __page_frag_cache_refill(struct page_frag_cache *nc, > return true; > } > > +/** > + * page_frag_cache_drain - Drain the current page from page_frag cache. > + * @nc: page_frag cache from which to drain > + */ > void page_frag_cache_drain(struct page_frag_cache *nc) > { > if (!nc->va)
On 2024/4/8 2:13, Alexander H Duyck wrote: > On Sun, 2024-04-07 at 21:08 +0800, Yunsheng Lin wrote: >> Update documentation about design, implementation and API usages >> for page_frag. >> >> Also update MAINTAINERS for page_frag. Alexander seems to be the >> orginal author for page_frag, we can add him to the MAINTAINERS >> later if we have an ack from him. >> >> CC: Alexander Duyck <alexander.duyck@gmail.com> >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> > > Again, this seems more like 2 different pathches at least. One for the > Documentation and MAINTAINERS changes, and one for the function > documentation. Sure. > >> --- >> Documentation/mm/page_frags.rst | 115 ++++++++++++++++++---------- >> MAINTAINERS | 10 +++ >> include/linux/page_frag_cache.h | 128 ++++++++++++++++++++++++++++++++ >> mm/page_frag_cache.c | 51 ++++++++++--- >> 4 files changed, 256 insertions(+), 48 deletions(-) >> >> diff --git a/Documentation/mm/page_frags.rst b/Documentation/mm/page_frags.rst >> index 503ca6cdb804..77256dfb58bf 100644 >> --- a/Documentation/mm/page_frags.rst >> +++ b/Documentation/mm/page_frags.rst >> @@ -1,43 +1,80 @@ >> +.. SPDX-License-Identifier: GPL-2.0 >> + >> ============== >> Page fragments >> ============== >> >> -A page fragment is an arbitrary-length arbitrary-offset area of memory >> -which resides within a 0 or higher order compound page. Multiple >> -fragments within that page are individually refcounted, in the page's >> -reference counter. >> - >> -The page_frag functions, page_frag_alloc and page_frag_free, provide a >> -simple allocation framework for page fragments. This is used by the >> -network stack and network device drivers to provide a backing region of >> -memory for use as either an sk_buff->head, or to be used in the "frags" >> -portion of skb_shared_info. >> - >> -In order to make use of the page fragment APIs a backing page fragment >> -cache is needed. This provides a central point for the fragment allocation >> -and tracks allows multiple calls to make use of a cached page. The >> -advantage to doing this is that multiple calls to get_page can be avoided >> -which can be expensive at allocation time. However due to the nature of >> -this caching it is required that any calls to the cache be protected by >> -either a per-cpu limitation, or a per-cpu limitation and forcing interrupts >> -to be disabled when executing the fragment allocation. >> - >> -The network stack uses two separate caches per CPU to handle fragment >> -allocation. The netdev_alloc_cache is used by callers making use of the >> -netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is >> -used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The >> -main difference between these two calls is the context in which they may be >> -called. The "netdev" prefixed functions are usable in any context as these >> -functions will disable interrupts, while the "napi" prefixed functions are >> -only usable within the softirq context. >> - >> -Many network device drivers use a similar methodology for allocating page >> -fragments, but the page fragments are cached at the ring or descriptor >> -level. In order to enable these cases it is necessary to provide a generic >> -way of tearing down a page cache. For this reason __page_frag_cache_drain >> -was implemented. It allows for freeing multiple references from a single >> -page via a single call. The advantage to doing this is that it allows for >> -cleaning up the multiple references that were added to a page in order to >> -avoid calling get_page per allocation. >> - >> -Alexander Duyck, Nov 29, 2016. > > What is the point of removing this just to add it to a C file further > down in the diff? Honestly I am not a fan of all the noise this is > adding to these diffs. Can we do a little less moving of lines for the > sake of moving them? All it does is pollute the git blame if you try to > figure out the origin of the lines. I was thinking about move the doc related code to file where code is related, so that author will remember to update the doc when changing the code. Maybe above does not matter that much? > >> +.. kernel-doc:: mm/page_frag_cache.c >> + :doc: page_frag allocator >> + >> +Architecture overview >> +===================== >> + >> +.. code-block:: none >> + >> + +----------------------+ >> + | page_frag API caller | >> + +----------------------+ >> + ^ >> + | >> + | >> + | >> + v >> + +----------------------------------------------+ >> + | request page fragment | >> + +----------------------------------------------+ >> + ^ ^ >> + | | >> + | Cache empty or not enough | >> + | | >> + v | >> + +--------------------------------+ | >> + | refill cache with order 3 page | | >> + +--------------------------------+ | >> + ^ ^ | >> + | | | >> + | | Refill failed | >> + | | | Cache is enough >> + | | | >> + | v | >> + | +----------------------------------+ | >> + | | refill cache with order 0 page | | >> + | +----------------------------------+ | >> + | ^ | >> + | Refill succeed | | >> + | | Refill succeed | >> + | | | >> + v v v >> + +----------------------------------------------+ >> + | allocate fragment from cache | >> + +----------------------------------------------+ >> + > > +1 for the simple visualization of how this works. > >> +API interface >> +============= >> +As the design and implementation of page_frag API, the allocation side does not >> +allow concurrent calling, it is assumed that the caller must ensure there is not >> +concurrent alloc calling to the same page_frag_cache instance by using it's own >> +lock or rely on some lockless guarantee like NAPI softirq. >> + >> +Depending on different use cases, callers expecting to deal with va, page or >> +both va and page for them may call page_frag_alloc_va(), page_frag_alloc_pg(), >> +or page_frag_alloc() accordingly. >> + > > So the new documentation is good up to here. > >> +There is also a use case that need minimum memory in order for forward >> +progressing, but can do better if there is more memory available. Introduce >> +page_frag_alloc_prepare() and page_frag_alloc_commit() related API, the caller >> +requests the minimum memory it need and the prepare API will return the maximum >> +size of the fragment returned, caller need to report back to the page_frag core >> +how much memory it actually use by calling commit API, or not calling the commit >> +API if deciding to not use any memory. >> + > > This part is as clear as mud to me. It sounds like kind of a convoluted > setup where you are having the caller have to know a fair bit about the > internal structure of the cache and it is essentially checking the > state and then performing a commit. Not a huge fan. I would almost > prefer to see something more like what we used to do with msix where > you just had a range you could request and if it can't give you at > least the minimum it fails.> > I assume the patch is somewhere here in the set. Will take a look at it > later. Yes, the API is introduced in patch 9 and used in patch 10. > >> +.. kernel-doc:: include/linux/page_frag_cache.h >> + :identifiers: page_frag_cache_init page_frag_cache_is_pfmemalloc >> + page_frag_alloc_va __page_frag_alloc_va_align >> + page_frag_alloc_va_align page_frag_alloc_va_prepare >> + page_frag_alloc_va_prepare_align page_frag_alloc_pg_prepare >> + page_frag_alloc_prepare page_frag_alloc_commit >> + page_frag_alloc_commit_noref page_frag_free_va >> + >> +.. kernel-doc:: mm/page_frag_cache.c >> + :identifiers: page_frag_cache_drain >> diff --git a/MAINTAINERS b/MAINTAINERS >> index 4745ea94d463..2f84aba59428 100644 >> --- a/MAINTAINERS >> +++ b/MAINTAINERS >> @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c >> F: mm/readahead.c >> F: mm/truncate.c >> >> +PAGE FRAG >> +M: Yunsheng Lin <linyunsheng@huawei.com> >> +L: linux-mm@kvack.org >> +L: netdev@vger.kernel.org >> +S: Supported >> +F: Documentation/mm/page_frags.rst >> +F: include/linux/page_frag_cache.h >> +F: mm/page_frag_cache.c >> +F: mm/page_frag_test.c >> + > > I would appreciate it if you could add me as I usually am having to > deal with issues people have with this anyway. You can probably just go > with: > Alexander Duyck <alexander.duyck@gmail.com> Sure, good to your ack here. > >> PAGE POOL >> M: Jesper Dangaard Brouer <hawk@kernel.org> >> M: Ilias Apalodimas <ilias.apalodimas@linaro.org> >> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h >> index 28185969cd2c..d8edbecdd179 100644
On Mon, Apr 8, 2024 at 6:39 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: > > On 2024/4/8 2:13, Alexander H Duyck wrote: > > On Sun, 2024-04-07 at 21:08 +0800, Yunsheng Lin wrote: > >> Update documentation about design, implementation and API usages > >> for page_frag. > >> > >> Also update MAINTAINERS for page_frag. Alexander seems to be the > >> orginal author for page_frag, we can add him to the MAINTAINERS > >> later if we have an ack from him. > >> > >> CC: Alexander Duyck <alexander.duyck@gmail.com> > >> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> > > > > Again, this seems more like 2 different pathches at least. One for the > > Documentation and MAINTAINERS changes, and one for the function > > documentation. > > Sure. > [...] > >> --- a/MAINTAINERS > >> +++ b/MAINTAINERS > >> @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c > >> F: mm/readahead.c > >> F: mm/truncate.c > >> > >> +PAGE FRAG > >> +M: Yunsheng Lin <linyunsheng@huawei.com> > >> +L: linux-mm@kvack.org > >> +L: netdev@vger.kernel.org > >> +S: Supported > >> +F: Documentation/mm/page_frags.rst > >> +F: include/linux/page_frag_cache.h > >> +F: mm/page_frag_cache.c > >> +F: mm/page_frag_test.c > >> + > > > > I would appreciate it if you could add me as I usually am having to > > deal with issues people have with this anyway. You can probably just go > > with: > > Alexander Duyck <alexander.duyck@gmail.com> > > Sure, good to your ack here. Just to be clear this isn't an Ack, but if you are going to list maintainers for this my name should be on the list so this is the preferred format. There are still some things to be cleaned up in this patch.
On 2024/4/9 0:13, Alexander Duyck wrote: > On Mon, Apr 8, 2024 at 6:39 AM Yunsheng Lin <linyunsheng@huawei.com> wrote: >> >> On 2024/4/8 2:13, Alexander H Duyck wrote: >>> On Sun, 2024-04-07 at 21:08 +0800, Yunsheng Lin wrote: >>>> Update documentation about design, implementation and API usages >>>> for page_frag. >>>> >>>> Also update MAINTAINERS for page_frag. Alexander seems to be the >>>> orginal author for page_frag, we can add him to the MAINTAINERS >>>> later if we have an ack from him. >>>> >>>> CC: Alexander Duyck <alexander.duyck@gmail.com> >>>> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> >>> >>> Again, this seems more like 2 different pathches at least. One for the >>> Documentation and MAINTAINERS changes, and one for the function >>> documentation. >> >> Sure. >> > > [...] > >>>> --- a/MAINTAINERS >>>> +++ b/MAINTAINERS >>>> @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c >>>> F: mm/readahead.c >>>> F: mm/truncate.c >>>> >>>> +PAGE FRAG >>>> +M: Yunsheng Lin <linyunsheng@huawei.com> >>>> +L: linux-mm@kvack.org >>>> +L: netdev@vger.kernel.org >>>> +S: Supported >>>> +F: Documentation/mm/page_frags.rst >>>> +F: include/linux/page_frag_cache.h >>>> +F: mm/page_frag_cache.c >>>> +F: mm/page_frag_test.c >>>> + >>> >>> I would appreciate it if you could add me as I usually am having to >>> deal with issues people have with this anyway. You can probably just go >>> with: >>> Alexander Duyck <alexander.duyck@gmail.com> >> >> Sure, good to your ack here. > > Just to be clear this isn't an Ack, but if you are going to list > maintainers for this my name should be on the list so this is the > preferred format. There are still some things to be cleaned up in this > patch. Sure, I was talking about "Alexander seems to be the orginal author for page_frag, we can add him to the MAINTAINERS later if we have an ack from him." in the commit log. > . >
On Tue, 9 Apr 2024 15:59:58 +0800 Yunsheng Lin wrote: > > Just to be clear this isn't an Ack, but if you are going to list > > maintainers for this my name should be on the list so this is the > > preferred format. There are still some things to be cleaned up in this > > patch. > > Sure, I was talking about "Alexander seems to be the orginal author for > page_frag, we can add him to the MAINTAINERS later if we have an ack from > him." in the commit log. Do we have to have a MAINTAINERS entry for every 1000 lines of code? It really feels forced :/
On Tue, Apr 9, 2024 at 6:25 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Tue, 9 Apr 2024 15:59:58 +0800 Yunsheng Lin wrote: > > > Just to be clear this isn't an Ack, but if you are going to list > > > maintainers for this my name should be on the list so this is the > > > preferred format. There are still some things to be cleaned up in this > > > patch. > > > > Sure, I was talking about "Alexander seems to be the orginal author for > > page_frag, we can add him to the MAINTAINERS later if we have an ack from > > him." in the commit log. > > Do we have to have a MAINTAINERS entry for every 1000 lines of code? > It really feels forced :/ I don't disagree. However, if nothing else I think it gets used as a part of get_maintainers.pl that tells you who to email about changes doesn't it? It might make sense in my case since I am still maintaining it using my gmail account, but I think the commits for that were mostly from my Intel account weren't they? So if nothing else it might be a way to provide a trail of breadcrumbs on how to find a maintainer who changed employers..
On 2024/4/9 23:11, Alexander Duyck wrote: > On Tue, Apr 9, 2024 at 6:25 AM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Tue, 9 Apr 2024 15:59:58 +0800 Yunsheng Lin wrote: >>>> Just to be clear this isn't an Ack, but if you are going to list >>>> maintainers for this my name should be on the list so this is the >>>> preferred format. There are still some things to be cleaned up in this >>>> patch. >>> >>> Sure, I was talking about "Alexander seems to be the orginal author for >>> page_frag, we can add him to the MAINTAINERS later if we have an ack from >>> him." in the commit log. >> >> Do we have to have a MAINTAINERS entry for every 1000 lines of code? Do we have something like rule or guidance against that? Looking at the entry in MAINTAINERS, it seems quite normal to me, I thouht it is generally encourage someone with willing and ability to be a maintainer/reviewer. Considering you have refused adding me as the reviewer of page_pool despite the support from two maintainers of page_pool, for the season of something like below: 'page pool is increasingly central to the whole networking stack. The bar for "ownership" is getting higher and higher..' I think I might need a second opinion here. >> It really feels forced :/ I am not a native english speaker here, I would rather not comment on the 'forced' part here and focus more on the technical disscusion. > > I don't disagree. However, if nothing else I think it gets used as a > part of get_maintainers.pl that tells you who to email about changes > doesn't it? It might make sense in my case since I am still > maintaining it using my gmail account, but I think the commits for > that were mostly from my Intel account weren't they? So if nothing > else it might be a way to provide a trail of breadcrumbs on how to > find a maintainer who changed employers.. +1. I generally pay more attention to the patch that is to'ed or cc'ed to my email when I am overloaded with other work. > . >
On 09.04.24 17:11, Alexander Duyck wrote: > On Tue, Apr 9, 2024 at 6:25 AM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Tue, 9 Apr 2024 15:59:58 +0800 Yunsheng Lin wrote: >>>> Just to be clear this isn't an Ack, but if you are going to list >>>> maintainers for this my name should be on the list so this is the >>>> preferred format. There are still some things to be cleaned up in this >>>> patch. >>> >>> Sure, I was talking about "Alexander seems to be the orginal author for >>> page_frag, we can add him to the MAINTAINERS later if we have an ack from >>> him." in the commit log. >> >> Do we have to have a MAINTAINERS entry for every 1000 lines of code? >> It really feels forced :/ > > I don't disagree. However, if nothing else I think it gets used as a > part of get_maintainers.pl that tells you who to email about changes > doesn't it? It might make sense in my case since I am still > maintaining it using my gmail account, but I think the commits for > that were mostly from my Intel account weren't they? So if nothing > else it might be a way to provide a trail of breadcrumbs on how to > find a maintainer who changed employers.. Would a .mailmap entry also help for your case, such that the mail address might get mapped to the new one? (note, I never edited .mailmap myself)
On Wed, Apr 10, 2024 at 9:06 AM David Hildenbrand <david@redhat.com> wrote: > > On 09.04.24 17:11, Alexander Duyck wrote: > > On Tue, Apr 9, 2024 at 6:25 AM Jakub Kicinski <kuba@kernel.org> wrote: > >> > >> On Tue, 9 Apr 2024 15:59:58 +0800 Yunsheng Lin wrote: > >>>> Just to be clear this isn't an Ack, but if you are going to list > >>>> maintainers for this my name should be on the list so this is the > >>>> preferred format. There are still some things to be cleaned up in this > >>>> patch. > >>> > >>> Sure, I was talking about "Alexander seems to be the orginal author for > >>> page_frag, we can add him to the MAINTAINERS later if we have an ack from > >>> him." in the commit log. > >> > >> Do we have to have a MAINTAINERS entry for every 1000 lines of code? > >> It really feels forced :/ > > > > I don't disagree. However, if nothing else I think it gets used as a > > part of get_maintainers.pl that tells you who to email about changes > > doesn't it? It might make sense in my case since I am still > > maintaining it using my gmail account, but I think the commits for > > that were mostly from my Intel account weren't they? So if nothing > > else it might be a way to provide a trail of breadcrumbs on how to > > find a maintainer who changed employers.. > > Would a .mailmap entry also help for your case, such that the mail > address might get mapped to the new one? (note, I never edited .mailmap > myself) Not sure. My concern is that it might undo the existing tracking for contributions by employer as I know they use the emails for the most basic setup for that. I suppose that is one downside of being a job hopper.. :-P I'd rather not make more work for someone like Jon Corbet or Jakub who I know maintain statistics based on the emails used and such.
On 10.04.24 20:19, Alexander Duyck wrote: > On Wed, Apr 10, 2024 at 9:06 AM David Hildenbrand <david@redhat.com> wrote: >> >> On 09.04.24 17:11, Alexander Duyck wrote: >>> On Tue, Apr 9, 2024 at 6:25 AM Jakub Kicinski <kuba@kernel.org> wrote: >>>> >>>> On Tue, 9 Apr 2024 15:59:58 +0800 Yunsheng Lin wrote: >>>>>> Just to be clear this isn't an Ack, but if you are going to list >>>>>> maintainers for this my name should be on the list so this is the >>>>>> preferred format. There are still some things to be cleaned up in this >>>>>> patch. >>>>> >>>>> Sure, I was talking about "Alexander seems to be the orginal author for >>>>> page_frag, we can add him to the MAINTAINERS later if we have an ack from >>>>> him." in the commit log. >>>> >>>> Do we have to have a MAINTAINERS entry for every 1000 lines of code? >>>> It really feels forced :/ >>> >>> I don't disagree. However, if nothing else I think it gets used as a >>> part of get_maintainers.pl that tells you who to email about changes >>> doesn't it? It might make sense in my case since I am still >>> maintaining it using my gmail account, but I think the commits for >>> that were mostly from my Intel account weren't they? So if nothing >>> else it might be a way to provide a trail of breadcrumbs on how to >>> find a maintainer who changed employers.. >> >> Would a .mailmap entry also help for your case, such that the mail >> address might get mapped to the new one? (note, I never edited .mailmap >> myself) > > Not sure. My concern is that it might undo the existing tracking for > contributions by employer as I know they use the emails for the most > basic setup for that. I suppose that is one downside of being a job > hopper.. :-P I wouldn't be concerned about undoing existing tracking. I can spot people in .mailmap with more than 5 entries / different employers, so it is quite common! > > I'd rather not make more work for someone like Jon Corbet or Jakub who > I know maintain statistics based on the emails used and such. From what I recall, they do have their own mapping of mail addresses to employers, for example for people that just don't use corporate mail addresses.
diff --git a/Documentation/mm/page_frags.rst b/Documentation/mm/page_frags.rst index 503ca6cdb804..77256dfb58bf 100644 --- a/Documentation/mm/page_frags.rst +++ b/Documentation/mm/page_frags.rst @@ -1,43 +1,80 @@ +.. SPDX-License-Identifier: GPL-2.0 + ============== Page fragments ============== -A page fragment is an arbitrary-length arbitrary-offset area of memory -which resides within a 0 or higher order compound page. Multiple -fragments within that page are individually refcounted, in the page's -reference counter. - -The page_frag functions, page_frag_alloc and page_frag_free, provide a -simple allocation framework for page fragments. This is used by the -network stack and network device drivers to provide a backing region of -memory for use as either an sk_buff->head, or to be used in the "frags" -portion of skb_shared_info. - -In order to make use of the page fragment APIs a backing page fragment -cache is needed. This provides a central point for the fragment allocation -and tracks allows multiple calls to make use of a cached page. The -advantage to doing this is that multiple calls to get_page can be avoided -which can be expensive at allocation time. However due to the nature of -this caching it is required that any calls to the cache be protected by -either a per-cpu limitation, or a per-cpu limitation and forcing interrupts -to be disabled when executing the fragment allocation. - -The network stack uses two separate caches per CPU to handle fragment -allocation. The netdev_alloc_cache is used by callers making use of the -netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is -used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The -main difference between these two calls is the context in which they may be -called. The "netdev" prefixed functions are usable in any context as these -functions will disable interrupts, while the "napi" prefixed functions are -only usable within the softirq context. - -Many network device drivers use a similar methodology for allocating page -fragments, but the page fragments are cached at the ring or descriptor -level. In order to enable these cases it is necessary to provide a generic -way of tearing down a page cache. For this reason __page_frag_cache_drain -was implemented. It allows for freeing multiple references from a single -page via a single call. The advantage to doing this is that it allows for -cleaning up the multiple references that were added to a page in order to -avoid calling get_page per allocation. - -Alexander Duyck, Nov 29, 2016. +.. kernel-doc:: mm/page_frag_cache.c + :doc: page_frag allocator + +Architecture overview +===================== + +.. code-block:: none + + +----------------------+ + | page_frag API caller | + +----------------------+ + ^ + | + | + | + v + +----------------------------------------------+ + | request page fragment | + +----------------------------------------------+ + ^ ^ + | | + | Cache empty or not enough | + | | + v | + +--------------------------------+ | + | refill cache with order 3 page | | + +--------------------------------+ | + ^ ^ | + | | | + | | Refill failed | + | | | Cache is enough + | | | + | v | + | +----------------------------------+ | + | | refill cache with order 0 page | | + | +----------------------------------+ | + | ^ | + | Refill succeed | | + | | Refill succeed | + | | | + v v v + +----------------------------------------------+ + | allocate fragment from cache | + +----------------------------------------------+ + +API interface +============= +As the design and implementation of page_frag API, the allocation side does not +allow concurrent calling, it is assumed that the caller must ensure there is not +concurrent alloc calling to the same page_frag_cache instance by using it's own +lock or rely on some lockless guarantee like NAPI softirq. + +Depending on different use cases, callers expecting to deal with va, page or +both va and page for them may call page_frag_alloc_va(), page_frag_alloc_pg(), +or page_frag_alloc() accordingly. + +There is also a use case that need minimum memory in order for forward +progressing, but can do better if there is more memory available. Introduce +page_frag_alloc_prepare() and page_frag_alloc_commit() related API, the caller +requests the minimum memory it need and the prepare API will return the maximum +size of the fragment returned, caller need to report back to the page_frag core +how much memory it actually use by calling commit API, or not calling the commit +API if deciding to not use any memory. + +.. kernel-doc:: include/linux/page_frag_cache.h + :identifiers: page_frag_cache_init page_frag_cache_is_pfmemalloc + page_frag_alloc_va __page_frag_alloc_va_align + page_frag_alloc_va_align page_frag_alloc_va_prepare + page_frag_alloc_va_prepare_align page_frag_alloc_pg_prepare + page_frag_alloc_prepare page_frag_alloc_commit + page_frag_alloc_commit_noref page_frag_free_va + +.. kernel-doc:: mm/page_frag_cache.c + :identifiers: page_frag_cache_drain diff --git a/MAINTAINERS b/MAINTAINERS index 4745ea94d463..2f84aba59428 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -16683,6 +16683,16 @@ F: mm/page-writeback.c F: mm/readahead.c F: mm/truncate.c +PAGE FRAG +M: Yunsheng Lin <linyunsheng@huawei.com> +L: linux-mm@kvack.org +L: netdev@vger.kernel.org +S: Supported +F: Documentation/mm/page_frags.rst +F: include/linux/page_frag_cache.h +F: mm/page_frag_cache.c +F: mm/page_frag_test.c + PAGE POOL M: Jesper Dangaard Brouer <hawk@kernel.org> M: Ilias Apalodimas <ilias.apalodimas@linaro.org> diff --git a/include/linux/page_frag_cache.h b/include/linux/page_frag_cache.h index 28185969cd2c..d8edbecdd179 100644 --- a/include/linux/page_frag_cache.h +++ b/include/linux/page_frag_cache.h @@ -31,11 +31,23 @@ struct page_frag_cache { #endif }; +/** + * page_frag_cache_init() - Init page_frag cache. + * @nc: page_frag cache from which to init + * + * Inline helper to init the page_frag cache. + */ static inline void page_frag_cache_init(struct page_frag_cache *nc) { nc->va = NULL; } +/** + * page_frag_cache_is_pfmemalloc() - Check for pfmemalloc. + * @nc: page_frag cache from which to check + * + * Used to check if the current page in page_frag cache is pfmemalloc'ed. + */ static inline bool page_frag_cache_is_pfmemalloc(struct page_frag_cache *nc) { return !!nc->pfmemalloc; @@ -46,6 +58,17 @@ void __page_frag_cache_drain(struct page *page, unsigned int count); void *page_frag_cache_refill(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask); +/** + * page_frag_alloc_va() - Alloc a page fragment. + * @nc: page_frag cache from which to allocate + * @fragsz: the requested fragment size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * + * Get a page fragment from page_frag cache. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask) { @@ -63,6 +86,19 @@ static inline void *page_frag_alloc_va(struct page_frag_cache *nc, return va + offset; } +/** + * __page_frag_alloc_va_align() - Alloc a page fragment with aligning + * requirement. + * @nc: page_frag cache from which to allocate + * @fragsz: the requested fragment size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * @align: the requested aligning requirement + * + * Get a page fragment from page_frag cache with aligning requirement. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, @@ -75,6 +111,19 @@ static inline void *__page_frag_alloc_va_align(struct page_frag_cache *nc, return page_frag_alloc_va(nc, fragsz, gfp_mask); } +/** + * page_frag_alloc_va_align() - Alloc a page fragment with aligning requirement. + * @nc: page_frag cache from which to allocate + * @fragsz: the requested fragment size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * @align: the requested aligning requirement + * + * WARN_ON_ONCE() checking for align and fragsz before getting a page fragment + * from page_frag cache with aligning requirement. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc, unsigned int fragsz, gfp_t gfp_mask, @@ -86,6 +135,19 @@ static inline void *page_frag_alloc_va_align(struct page_frag_cache *nc, return __page_frag_alloc_va_align(nc, fragsz, gfp_mask, align); } +/** + * page_frag_alloc_va_prepare() - Prepare allocing a page fragment. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to + * report the maximum size of the page fragment the caller can use. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc, unsigned int *offset, unsigned int *size, @@ -108,6 +170,21 @@ static inline void *page_frag_alloc_va_prepare(struct page_frag_cache *nc, return va + *offset; } +/** + * page_frag_alloc_va_prepare_align() - Prepare allocing a page fragment with + * aligning requirement. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @align: the requested aligning requirement + * @gfp_mask: the allocation gfp to use when cache need to be refilled + * + * Prepare an aligned page fragment with minimum size of ‘size’, 'size' is also + * used to report the maximum size of the page fragment the caller can use. + * + * Return: + * Return va of the page fragment, otherwise return NULL. + */ static inline void *page_frag_alloc_va_prepare_align(struct page_frag_cache *nc, unsigned int *offset, unsigned int *size, @@ -144,6 +221,19 @@ static inline void *__page_frag_alloc_pg_prepare(struct page_frag_cache *nc, return va; } +/** + * page_frag_alloc_pg_prepare - Prepare allocing a page fragment. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @gfp: the allocation gfp to use when cache need to be refilled + * + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to + * report the maximum size of the page fragment the caller can use. + * + * Return: + * Return the page fragment, otherwise return NULL. + */ #define page_frag_alloc_pg_prepare(nc, offset, size, gfp) \ ({ \ struct page *__page = NULL; \ @@ -179,6 +269,21 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc, return nc_va; } +/** + * page_frag_alloc_prepare - Prepare allocing a page fragment. + * @nc: page_frag cache from which to prepare + * @offset: out as the offset of the page fragment + * @size: in as the requested size, out as the available size + * @va: out as the va of the returned page fragment + * @gfp: the allocation gfp to use when cache need to be refilled + * + * Prepare a page fragment with minimum size of ‘size’, 'size' is also used to + * report the maximum size of the page fragment. Return both 'page' and 'va' of + * the fragment to the caller. + * + * Return: + * Return the page fragment, otherwise return NULL. + */ #define page_frag_alloc_prepare(nc, offset, size, va, gfp) \ ({ \ struct page *__page = NULL; \ @@ -191,6 +296,14 @@ static inline void *__page_frag_alloc_prepare(struct page_frag_cache *nc, __page; \ }) +/** + * page_frag_alloc_commit - Commit allocing a page fragment. + * @nc: page_frag cache from which to commit + * @offset: offset of the page fragment + * @size: size of the page fragment has been used + * + * Commit the alloc preparing by passing offset and the actual used size. + */ static inline void page_frag_alloc_commit(struct page_frag_cache *nc, unsigned int offset, unsigned int size) @@ -199,6 +312,17 @@ static inline void page_frag_alloc_commit(struct page_frag_cache *nc, nc->offset = offset + size; } +/** + * page_frag_alloc_commit_noref - Commit allocing a page fragment without taking + * page refcount. + * @nc: page_frag cache from which to commit + * @offset: offset of the page fragment + * @size: size of the page fragment has been used + * + * Commit the alloc preparing by passing offset and the actual used size, but + * not taking page refcount. Mostly used for fragmemt coaleasing case when the + * current fragmemt can share the same refcount with previous fragmemt. + */ static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc, unsigned int offset, unsigned int size) @@ -206,6 +330,10 @@ static inline void page_frag_alloc_commit_noref(struct page_frag_cache *nc, nc->offset = offset + size; } +/** + * page_frag_free_va - Free a page fragment by va. + * @addr: va of page fragment to be freed + */ void page_frag_free_va(void *addr); #endif diff --git a/mm/page_frag_cache.c b/mm/page_frag_cache.c index cbd0ed82a596..0c76ec006c22 100644 --- a/mm/page_frag_cache.c +++ b/mm/page_frag_cache.c @@ -1,15 +1,44 @@ // SPDX-License-Identifier: GPL-2.0-only -/* Page fragment allocator + +/** + * DOC: page_frag allocator + * + * A page fragment is an arbitrary-length arbitrary-offset area of memory which + * resides within a 0 or higher order compound page. Multiple fragments within + * that page are individually refcounted, in the page's reference counter. + * + * The page_frag functions, page_frag_alloc* and page_frag_free*, provide a + * simple allocation framework for page fragments. This is used by the network + * stack and network device drivers to provide a backing region of memory for + * use as either an sk_buff->head, or to be used in the "frags" portion of + * skb_shared_info. * - * Page Fragment: - * An arbitrary-length arbitrary-offset area of memory which resides within a - * 0 or higher order page. Multiple fragments within that page are - * individually refcounted, in the page's reference counter. + * In order to make use of the page fragment APIs a backing page fragment cache + * is needed. This provides a central point for the fragment allocation and + * tracks allows multiple calls to make use of a cached page. The advantage to + * doing this is that multiple calls to get_page can be avoided which can be + * expensive at allocation time. However due to the nature of this caching it + * is required that any calls to the cache be protected by either a per-cpu + * limitation, or a per-cpu limitation and forcing interrupts to be disabled + * when executing the fragment allocation. * - * The page_frag functions provide a simple allocation framework for page - * fragments. This is used by the network stack and network device drivers to - * provide a backing region of memory for use as either an sk_buff->head, or to - * be used in the "frags" portion of skb_shared_info. + * The network stack uses two separate caches per CPU to handle fragment + * allocation. The netdev_alloc_cache is used by callers making use of the + * netdev_alloc_frag and __netdev_alloc_skb calls. The napi_alloc_cache is + * used by callers of the __napi_alloc_frag and napi_alloc_skb calls. The + * main difference between these two calls is the context in which they may be + * called. The "netdev" prefixed functions are usable in any context as these + * functions will disable interrupts, while the "napi" prefixed functions are + * only usable within the softirq context. + * + * Many network device drivers use a similar methodology for allocating page + * fragments, but the page fragments are cached at the ring or descriptor + * level. In order to enable these cases it is necessary to provide a generic + * way of tearing down a page cache. For this reason __page_frag_cache_drain + * was implemented. It allows for freeing multiple references from a single + * page via a single call. The advantage to doing this is that it allows for + * cleaning up the multiple references that were added to a page in order to + * avoid calling get_page per allocation. */ #include <linux/export.h> @@ -57,6 +86,10 @@ static bool __page_frag_cache_refill(struct page_frag_cache *nc, return true; } +/** + * page_frag_cache_drain - Drain the current page from page_frag cache. + * @nc: page_frag cache from which to drain + */ void page_frag_cache_drain(struct page_frag_cache *nc) { if (!nc->va)
Update documentation about design, implementation and API usages for page_frag. Also update MAINTAINERS for page_frag. Alexander seems to be the orginal author for page_frag, we can add him to the MAINTAINERS later if we have an ack from him. CC: Alexander Duyck <alexander.duyck@gmail.com> Signed-off-by: Yunsheng Lin <linyunsheng@huawei.com> --- Documentation/mm/page_frags.rst | 115 ++++++++++++++++++---------- MAINTAINERS | 10 +++ include/linux/page_frag_cache.h | 128 ++++++++++++++++++++++++++++++++ mm/page_frag_cache.c | 51 ++++++++++--- 4 files changed, 256 insertions(+), 48 deletions(-)