Message ID | 20201115201029.11903-1-dongli.zhang@oracle.com (mailing list archive) |
---|---|
State | Accepted |
Delegated to: | Netdev Maintainers |
Series | [v3,1/1] page_frag: Recover from memory pressure |

Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |

On Sun, Nov 15, 2020 at 9:16 PM Dongli Zhang <dongli.zhang@oracle.com> wrote:
>
> The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
> This ends up to page_frag_alloc() to allocate skb->data from
> page_frag_cache->va.
>
> During the memory pressure, page_frag_cache->va may be allocated as
> pfmemalloc page. As a result, the skb->pfmemalloc is always true as
> skb->data is from page_frag_cache->va. The skb will be dropped if the
> sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
> under memory pressure.

...

> References: https://lore.kernel.org/lkml/20201103193239.1807-1-dongli.zhang@oracle.com/
> References: https://lore.kernel.org/linux-mm/20201105042140.5253-1-willy@infradead.org/
> Suggested-by: Matthew Wilcox (Oracle) <willy@infradead.org>
> Cc: Aruna Ramakrishna <aruna.ramakrishna@oracle.com>
> Cc: Bert Barbe <bert.barbe@oracle.com>
> Cc: Rama Nichanamatlu <rama.nichanamatlu@oracle.com>
> Cc: Venkat Venkatsubra <venkat.x.venkatsubra@oracle.com>
> Cc: Manjunath Patil <manjunath.b.patil@oracle.com>
> Cc: Joe Jin <joe.jin@oracle.com>
> Cc: SRINIVAS <srinivas.eeda@oracle.com>
> Cc: stable@vger.kernel.org
> Fixes: 79930f5892e ("net: do not deplete pfmemalloc reserve")
> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com>
> Acked-by: Vlastimil Babka <vbabka@suse.cz>
> ---
> Changed since v1:
>   - change author from Matthew to Dongli
>   - Add references to all prior discussions
>   - Add more details to commit message
> Changed since v2:
>   - add unlikely (suggested by Eric Dumazet)
>
>  mm/page_alloc.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 23f5066bd4a5..91129ce75ed4 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5103,6 +5103,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
>                 if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
>                         goto refill;
>
> +               if (unlikely(nc->pfmemalloc)) {
> +                       free_the_page(page, compound_order(page));
> +                       goto refill;
> +               }
> +

Reviewed-by: Eric Dumazet <edumazet@google.com>

Thanks !

On Sun, 15 Nov 2020 12:10:29 -0800 Dongli Zhang wrote:
> The ethernet driver may allocate skb (and skb->data) via napi_alloc_skb().
> This ends up to page_frag_alloc() to allocate skb->data from
> page_frag_cache->va.
>
> During the memory pressure, page_frag_cache->va may be allocated as
> pfmemalloc page. As a result, the skb->pfmemalloc is always true as
> skb->data is from page_frag_cache->va. The skb will be dropped if the
> sock (receiver) does not have SOCK_MEMALLOC. This is expected behaviour
> under memory pressure.
>
> However, once kernel is not under memory pressure any longer (suppose large
> amount of memory pages are just reclaimed), the page_frag_alloc() may still
> re-use the prior pfmemalloc page_frag_cache->va to allocate skb->data. As a
> result, the skb->pfmemalloc is always true unless page_frag_cache->va is
> re-allocated, even if the kernel is not under memory pressure any longer.
>
> Here is how kernel runs into issue.
>
> 1. The kernel is under memory pressure and allocation of
> PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
> the pfmemalloc page is allocated for page_frag_cache->va.
>
> 2: All skb->data from page_frag_cache->va (pfmemalloc) will have
> skb->pfmemalloc=true. The skb will always be dropped by sock without
> SOCK_MEMALLOC. This is an expected behaviour.
>
> 3. Suppose a large amount of pages are reclaimed and kernel is not under
> memory pressure any longer. We expect skb->pfmemalloc drop will not happen.
>
> 4. Unfortunately, page_frag_alloc() does not proactively re-allocate
> page_frag_alloc->va and will always re-use the prior pfmemalloc page. The
> skb->pfmemalloc is always true even kernel is not under memory pressure any
> longer.
>
> Fix this by freeing and re-allocating the page instead of recycling it.

Andrew, are you taking this via -mm or should I put it in net?
I'm sending a PR to Linus tomorrow.

On Wed, 18 Nov 2020 11:46:54 -0800 Jakub Kicinski <kuba@kernel.org> wrote:
> > 1. The kernel is under memory pressure and allocation of
> > PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
> > the pfmemalloc page is allocated for page_frag_cache->va.
> >
> > 2: All skb->data from page_frag_cache->va (pfmemalloc) will have
> > skb->pfmemalloc=true. The skb will always be dropped by sock without
> > SOCK_MEMALLOC. This is an expected behaviour.
> >
> > 3. Suppose a large amount of pages are reclaimed and kernel is not under
> > memory pressure any longer. We expect skb->pfmemalloc drop will not happen.
> >
> > 4. Unfortunately, page_frag_alloc() does not proactively re-allocate
> > page_frag_alloc->va and will always re-use the prior pfmemalloc page. The
> > skb->pfmemalloc is always true even kernel is not under memory pressure any
> > longer.
> >
> > Fix this by freeing and re-allocating the page instead of recycling it.
>
> Andrew, are you taking this via -mm or should I put it in net?
> I'm sending a PR to Linus tomorrow.

Please go ahead - if/when it appears in mainline or linux-next, I'll
drop the -mm copy.

On Wed, 18 Nov 2020 13:13:35 -0800 Andrew Morton wrote:
> On Wed, 18 Nov 2020 11:46:54 -0800 Jakub Kicinski <kuba@kernel.org> wrote:
>
> > > 1. The kernel is under memory pressure and allocation of
> > > PAGE_FRAG_CACHE_MAX_ORDER in __page_frag_cache_refill() will fail. Instead,
> > > the pfmemalloc page is allocated for page_frag_cache->va.
> > >
> > > 2: All skb->data from page_frag_cache->va (pfmemalloc) will have
> > > skb->pfmemalloc=true. The skb will always be dropped by sock without
> > > SOCK_MEMALLOC. This is an expected behaviour.
> > >
> > > 3. Suppose a large amount of pages are reclaimed and kernel is not under
> > > memory pressure any longer. We expect skb->pfmemalloc drop will not happen.
> > >
> > > 4. Unfortunately, page_frag_alloc() does not proactively re-allocate
> > > page_frag_alloc->va and will always re-use the prior pfmemalloc page. The
> > > skb->pfmemalloc is always true even kernel is not under memory pressure any
> > > longer.
> > >
> > > Fix this by freeing and re-allocating the page instead of recycling it.
> >
> > Andrew, are you taking this via -mm or should I put it in net?
> > I'm sending a PR to Linus tomorrow.
>
> Please go ahead - if/when it appears in mainline or linux-next, I'll
> drop the -mm copy.

Okay, applied, thank you!

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 23f5066bd4a5..91129ce75ed4 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5103,6 +5103,11 @@ void *page_frag_alloc(struct page_frag_cache *nc,
 		if (!page_ref_sub_and_test(page, nc->pagecnt_bias))
 			goto refill;
 
+		if (unlikely(nc->pfmemalloc)) {
+			free_the_page(page, compound_order(page));
+			goto refill;
+		}
+
 #if (PAGE_SIZE < PAGE_FRAG_CACHE_MAX_SIZE)
 		/* if size can vary use size else just use PAGE_SIZE */
 		size = nc->size;
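
The recycle path that this patch changes can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not kernel code: `frag_cache_model`, `model_refill()`, `model_frag_alloc()`, `model_drops_after_recovery()` and `FRAGS_PER_PAGE` are hypothetical names introduced here, and the model assumes that every outstanding reference has already been released when the page is exhausted (the case in which page_frag_alloc() would recycle the page rather than refill). It counts how many fragments are still flagged pfmemalloc after memory pressure has ended, with and without the added check.

```c
#include <assert.h>
#include <stdbool.h>

#define FRAGS_PER_PAGE 4	/* hypothetical: fragments carved from one page */

/* Hypothetical userspace stand-in for struct page_frag_cache. */
struct frag_cache_model {
	bool have_page;		/* models nc->va != NULL */
	bool pfmemalloc;	/* models nc->pfmemalloc */
	int frags_left;		/* fragments left before the page is exhausted */
};

/* Models a refill: a page obtained under pressure is flagged pfmemalloc. */
static void model_refill(struct frag_cache_model *nc, bool under_pressure)
{
	nc->have_page = true;
	nc->pfmemalloc = under_pressure;
	nc->frags_left = FRAGS_PER_PAGE;
}

/* Hands out one fragment and reports whether it is flagged pfmemalloc. */
static bool model_frag_alloc(struct frag_cache_model *nc, bool under_pressure,
			     bool with_fix)
{
	if (!nc->have_page) {
		model_refill(nc, under_pressure);
	} else if (nc->frags_left == 0) {
		/* Page exhausted; assumed here that all users released it. */
		if (with_fix && nc->pfmemalloc)
			model_refill(nc, under_pressure); /* free and re-allocate */
		else
			nc->frags_left = FRAGS_PER_PAGE;  /* recycle, flag kept */
	}

	assert(nc->frags_left > 0);
	nc->frags_left--;
	return nc->pfmemalloc;
}

/*
 * Use up one page under pressure, then allocate three pages' worth of
 * fragments after pressure has ended. Returns how many of the later
 * fragments are still flagged pfmemalloc, i.e. how many skbs a socket
 * without SOCK_MEMALLOC would drop in this model.
 */
static int model_drops_after_recovery(bool with_fix)
{
	struct frag_cache_model nc = { 0 };
	int drops = 0;
	int i;

	for (i = 0; i < FRAGS_PER_PAGE; i++)
		model_frag_alloc(&nc, true, with_fix);

	for (i = 0; i < 3 * FRAGS_PER_PAGE; i++)
		if (model_frag_alloc(&nc, false, with_fix))
			drops++;

	return drops;
}
```

Under these assumptions, `model_drops_after_recovery(false)` flags every fragment allocated after recovery, because the pfmemalloc page is recycled indefinitely, while `model_drops_after_recovery(true)` flags none, because the first exhausted pfmemalloc page is replaced instead of recycled.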