Message ID | 20210113051207.142711-1-eric.dumazet@gmail.com (mailing list archive)
---|---
State | Rejected
Delegated to | Netdev Maintainers
Series | [net] Revert "virtio_net: replace netdev_alloc_skb_ip_align() with napi_alloc_skb()"
Context | Check | Description
---|---|---
netdev/cover_letter | success |
netdev/fixes_present | fail | Series targets non-next tree, but doesn't contain any Fixes tags
netdev/patch_count | success |
netdev/tree_selection | success | Clearly marked for net
netdev/subject_prefix | success |
netdev/cc_maintainers | warning | 2 maintainers not CCed: jasowang@redhat.com virtualization@lists.linux-foundation.org
netdev/source_inline | success | Was 0 now: 0
netdev/verify_signedoff | success |
netdev/module_param | success | Was 0 now: 0
netdev/build_32bit | success | Errors and warnings before: 0 this patch: 0
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0
netdev/verify_fixes | success |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn | success | Errors and warnings before: 0 this patch: 0
netdev/header_inline | success |
netdev/stable | success | Stable not CCed
On Wed, Jan 13, 2021 at 6:12 AM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> From: Eric Dumazet <edumazet@google.com>
>
> This reverts commit c67f5db82027ba6d2ea4ac9176bc45996a03ae6a.
>
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> memory use underestimation.
>
> GOOD_COPY_LEN is 128 bytes. This means that we need a small amount
> of memory to hold the headers and struct skb_shared_info.
>
> Yet, napi_alloc_skb() might use a whole 32KB page (or 64KB on PowerPC)
> for long lived incoming TCP packets.
>
> We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
> but consuming far more memory for TCP buffers than instructed in tcp_mem[2].
>
> Even if we force napi_alloc_skb() to only use order-0 pages, the issue
> would still be there on arches with PAGE_SIZE >= 32768.
>
> Using alloc_skb() and thus standard kmalloc() for skb->head allocations
> will get the benefit of letting other objects in each page be independently
> used by other skbs, regardless of their lifetime.
>
> Note that a similar problem exists for skbs allocated from napi_get_frags();
> this is handled in a separate patch.
>
> I would like to thank Greg Thelen for his precious help on this matter;
> analysing crash dumps is always a time consuming task.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>
> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 508408fbe78fbd8658dc226834b5b1b334b8b011..5886504c1acacf3f6148127b5c1cc7f6a906b827 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -386,7 +386,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> 	p = page_address(page) + offset;
>
> 	/* copy small packet so we can reuse these pages for small data */
> -	skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
> +	skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
> 	if (unlikely(!skb))
> 		return NULL;
>
> --
> 2.30.0.284.gd98b1dd5eaa7-goog

Note that __netdev_alloc_skb() will also need to be changed. Fitting 32
skb->head in a page without SLAB/SLUB help is simply too dangerous.
On Tue, Jan 12, 2021 at 09:12:07PM -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> This reverts commit c67f5db82027ba6d2ea4ac9176bc45996a03ae6a.
>
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> memory use underestimation.
>
> GOOD_COPY_LEN is 128 bytes. This means that we need a small amount
> of memory to hold the headers and struct skb_shared_info.
>
> Yet, napi_alloc_skb() might use a whole 32KB page (or 64KB on PowerPC)
> for long lived incoming TCP packets.
>
> We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
> but consuming far more memory for TCP buffers than instructed in tcp_mem[2].

Are you using virtio on the host then? Is this with a hardware virtio
device? These do exist, guest is just more common, so I wanted to make
sure this is not a mistake.

> Even if we force napi_alloc_skb() to only use order-0 pages, the issue
> would still be there on arches with PAGE_SIZE >= 32768.
>
> Using alloc_skb() and thus standard kmalloc() for skb->head allocations
> will get the benefit of letting other objects in each page be independently
> used by other skbs, regardless of their lifetime.
>
> Note that a similar problem exists for skbs allocated from napi_get_frags();
> this is handled in a separate patch.
>
> I would like to thank Greg Thelen for his precious help on this matter;
> analysing crash dumps is always a time consuming task.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>

Just curious - is the way virtio used napi_alloc_skb() wrong somehow?
The idea was to benefit from better batching and less play with irq save ...

It would be helpful to improve the comments for napi_alloc_skb() to make
it clearer how to use it correctly.

Are other uses of napi_alloc_skb() ok?

> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 508408fbe78fbd8658dc226834b5b1b334b8b011..5886504c1acacf3f6148127b5c1cc7f6a906b827 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -386,7 +386,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> 	p = page_address(page) + offset;
>
> 	/* copy small packet so we can reuse these pages for small data */
> -	skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
> +	skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
> 	if (unlikely(!skb))
> 		return NULL;
>
> --
> 2.30.0.284.gd98b1dd5eaa7-goog
On Wed, Jan 13, 2021 at 10:53 AM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Tue, Jan 12, 2021 at 09:12:07PM -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > This reverts commit c67f5db82027ba6d2ea4ac9176bc45996a03ae6a.
> >
> > While using page fragments instead of a kmalloc backed skb->head might give
> > a small performance improvement in some cases, there is a huge risk of
> > memory use underestimation.
> >
> > GOOD_COPY_LEN is 128 bytes. This means that we need a small amount
> > of memory to hold the headers and struct skb_shared_info.
> >
> > Yet, napi_alloc_skb() might use a whole 32KB page (or 64KB on PowerPC)
> > for long lived incoming TCP packets.
> >
> > We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
> > but consuming far more memory for TCP buffers than instructed in tcp_mem[2].
>
> Are you using virtio on the host then? Is this with a hardware virtio
> device? These do exist, guest is just more common, so I wanted to
> make sure this is not a mistake.
>
> > Even if we force napi_alloc_skb() to only use order-0 pages, the issue
> > would still be there on arches with PAGE_SIZE >= 32768.
> >
> > Using alloc_skb() and thus standard kmalloc() for skb->head allocations
> > will get the benefit of letting other objects in each page be independently
> > used by other skbs, regardless of their lifetime.
> >
> > Note that a similar problem exists for skbs allocated from napi_get_frags();
> > this is handled in a separate patch.
> >
> > I would like to thank Greg Thelen for his precious help on this matter;
> > analysing crash dumps is always a time consuming task.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Greg Thelen <gthelen@google.com>
>
> Just curious - is the way virtio used napi_alloc_skb wrong somehow?
>
> The idea was to benefit from better batching and less play with irq save
> ...
>
> It would be helpful to improve the comments for napi_alloc_skb
> to make it clearer how to use it correctly.
>
> Are other uses of napi_alloc_skb ok?

Yeah, as I mentioned already, the real problem lies in the use of page
fragments for skb->head in general, especially for small buffers. It is
too dangerous to use 32KB pages to allocate ~1KB buffers.

I am working on it.
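To make the under-estimation concrete, a little arithmetic helps. The following standalone C sketch models the worst case Eric describes; it is not kernel code, and the 32 KB page size and the ~1 KB per-skb charge are assumed values for illustration only:

```c
#include <assert.h>

/* Illustrative model, not kernel code: with page-fragment backed
 * skb->head, one long-lived skb keeps its entire backing page alive,
 * while memory accounting only charges the small fragment it occupies.
 * MODEL_PAGE_SIZE and MODEL_FRAG_CHARGE are assumptions, not values
 * read from any kernel header.
 */
#define MODEL_PAGE_SIZE   32768UL /* e.g. PAGE_SIZE on some PowerPC configs */
#define MODEL_FRAG_CHARGE  1024UL /* assumed per-skb truesize charge */

/* Worst case: each page ends up holding exactly one long-lived
 * fragment, so every charged fragment pins a whole page. */
static unsigned long pinned_bytes(unsigned long nr_skbs)
{
    return nr_skbs * MODEL_PAGE_SIZE;
}

/* What the accounting layer (e.g. tcp_mem) actually sees charged. */
static unsigned long charged_bytes(unsigned long nr_skbs)
{
    return nr_skbs * MODEL_FRAG_CHARGE;
}
```

Under these assumptions the host can pin 32 times more memory than the accounting sees, which lines up with the "fitting 32 skb->head in a page" concern raised earlier in the thread.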
On Tue, 2021-01-12 at 21:12 -0800, Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> This reverts commit c67f5db82027ba6d2ea4ac9176bc45996a03ae6a.
>
> While using page fragments instead of a kmalloc backed skb->head might give
> a small performance improvement in some cases, there is a huge risk of
> memory use underestimation.
>
> GOOD_COPY_LEN is 128 bytes. This means that we need a small amount
> of memory to hold the headers and struct skb_shared_info.
>
> Yet, napi_alloc_skb() might use a whole 32KB page (or 64KB on PowerPC)
> for long lived incoming TCP packets.
>
> We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
> but consuming far more memory for TCP buffers than instructed in tcp_mem[2].
>
> Even if we force napi_alloc_skb() to only use order-0 pages, the issue
> would still be there on arches with PAGE_SIZE >= 32768.
>
> Using alloc_skb() and thus standard kmalloc() for skb->head allocations
> will get the benefit of letting other objects in each page be independently
> used by other skbs, regardless of their lifetime.
>
> Note that a similar problem exists for skbs allocated from napi_get_frags();
> this is handled in a separate patch.
>
> I would like to thank Greg Thelen for his precious help on this matter;
> analysing crash dumps is always a time consuming task.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Michael S. Tsirkin <mst@redhat.com>
> Cc: Greg Thelen <gthelen@google.com>
> ---
>  drivers/net/virtio_net.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> index 508408fbe78fbd8658dc226834b5b1b334b8b011..5886504c1acacf3f6148127b5c1cc7f6a906b827 100644
> --- a/drivers/net/virtio_net.c
> +++ b/drivers/net/virtio_net.c
> @@ -386,7 +386,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> 	p = page_address(page) + offset;
>
> 	/* copy small packet so we can reuse these pages for small data */
> -	skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
> +	skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
> 	if (unlikely(!skb))
> 		return NULL;

I'm ok with the revert. The gain given by the original change in my
tests was measurable, but small.

Acked-by: Paolo Abeni <pabeni@redhat.com>
On Wed, Jan 13, 2021 at 4:44 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2021-01-12 at 21:12 -0800, Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > This reverts commit c67f5db82027ba6d2ea4ac9176bc45996a03ae6a.
> >
> > While using page fragments instead of a kmalloc backed skb->head might give
> > a small performance improvement in some cases, there is a huge risk of
> > memory use underestimation.
> >
> > GOOD_COPY_LEN is 128 bytes. This means that we need a small amount
> > of memory to hold the headers and struct skb_shared_info.
> >
> > Yet, napi_alloc_skb() might use a whole 32KB page (or 64KB on PowerPC)
> > for long lived incoming TCP packets.
> >
> > We have been tracking OOM issues on GKE hosts hitting tcp_mem limits
> > but consuming far more memory for TCP buffers than instructed in tcp_mem[2].
> >
> > Even if we force napi_alloc_skb() to only use order-0 pages, the issue
> > would still be there on arches with PAGE_SIZE >= 32768.
> >
> > Using alloc_skb() and thus standard kmalloc() for skb->head allocations
> > will get the benefit of letting other objects in each page be independently
> > used by other skbs, regardless of their lifetime.
> >
> > Note that a similar problem exists for skbs allocated from napi_get_frags();
> > this is handled in a separate patch.
> >
> > I would like to thank Greg Thelen for his precious help on this matter;
> > analysing crash dumps is always a time consuming task.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Michael S. Tsirkin <mst@redhat.com>
> > Cc: Greg Thelen <gthelen@google.com>
> > ---
> >  drivers/net/virtio_net.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
> > index 508408fbe78fbd8658dc226834b5b1b334b8b011..5886504c1acacf3f6148127b5c1cc7f6a906b827 100644
> > --- a/drivers/net/virtio_net.c
> > +++ b/drivers/net/virtio_net.c
> > @@ -386,7 +386,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
> > 	p = page_address(page) + offset;
> >
> > 	/* copy small packet so we can reuse these pages for small data */
> > -	skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
> > +	skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
> > 	if (unlikely(!skb))
> > 		return NULL;
>
> I'm ok with the revert. The gain given by the original change in my
> tests was measurable, but small.
>
> Acked-by: Paolo Abeni <pabeni@redhat.com>

To be clear, I now think the revert is not needed.

I will post instead a patch that should take care of the problem both
for virtio and napi_get_frags() tiny skbs.

Something like:

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 7626a33cce590e530f36167bd096026916131897..3a8f55a43e6964344df464a27b9b1faa0eb804f3 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -501,13 +501,17 @@ EXPORT_SYMBOL(__netdev_alloc_skb);
 struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 				 gfp_t gfp_mask)
 {
-	struct napi_alloc_cache *nc = this_cpu_ptr(&napi_alloc_cache);
+	struct napi_alloc_cache *nc;
 	struct sk_buff *skb;
 	void *data;

 	len += NET_SKB_PAD + NET_IP_ALIGN;

-	if ((len > SKB_WITH_OVERHEAD(PAGE_SIZE)) ||
+	/* If requested length is either too small or too big,
+	 * we use kmalloc() for skb->head allocation.
+	 */
+	if (len <= SKB_WITH_OVERHEAD(1024) ||
+	    len > SKB_WITH_OVERHEAD(PAGE_SIZE) ||
 	    (gfp_mask & (__GFP_DIRECT_RECLAIM | GFP_DMA))) {
 		skb = __alloc_skb(len, gfp_mask, SKB_ALLOC_RX, NUMA_NO_NODE);
 		if (!skb)
@@ -515,6 +519,7 @@ struct sk_buff *__napi_alloc_skb(struct napi_struct *napi, unsigned int len,
 		goto skb_success;
 	}

+	nc = this_cpu_ptr(&napi_alloc_cache);
 	len += SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 	len = SKB_DATA_ALIGN(len);
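The path-selection logic in the proposed `__napi_alloc_skb()` change above can be sketched as a standalone function. The sketch below models `SKB_WITH_OVERHEAD` assuming a 320-byte `struct skb_shared_info` and a 4 KB page; both are illustrative values (roughly matching x86-64), not something the patch itself guarantees:

```c
#include <assert.h>

/* Sketch of the allocation-path choice in the proposed
 * __napi_alloc_skb() change. MODEL_PAGE_SIZE and MODEL_SHINFO_SIZE
 * are assumed illustrative values, not kernel constants.
 */
#define MODEL_PAGE_SIZE    4096u
#define MODEL_SHINFO_SIZE   320u /* assumed sizeof(struct skb_shared_info) */
#define MODEL_SKB_WITH_OVERHEAD(x) ((x) - MODEL_SHINFO_SIZE)

enum head_alloc { HEAD_KMALLOC, HEAD_PAGE_FRAG };

static enum head_alloc pick_head_alloc(unsigned int len)
{
    /* Too small: kmalloc lets many small heads share each slab page
     * independently, so one long-lived skb cannot pin a whole page.
     * Too big: the request would not fit the per-CPU page fragment. */
    if (len <= MODEL_SKB_WITH_OVERHEAD(1024u) ||
        len > MODEL_SKB_WITH_OVERHEAD(MODEL_PAGE_SIZE))
        return HEAD_KMALLOC;

    /* Mid-sized requests keep the page-fragment fast path. */
    return HEAD_PAGE_FRAG;
}
```

With these assumptions a 128-byte GOOD_COPY_LEN request falls below the 1024-byte threshold and takes the kmalloc path, which is exactly why the virtio_net revert becomes unnecessary.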
On Wed, Jan 13, 2021 at 4:51 PM Eric Dumazet <edumazet@google.com> wrote:
>
> To be clear, I now think the revert is not needed.
>
> I will post instead a patch that should take care of the problem both
> for virtio and napi_get_frags() tiny skbs

Then later, we might change things so that the sk_buff cache being
changed by Alexander Lobakin can also be used for these allocations.
On Wed, Jan 13, 2021 at 04:53:38PM +0100, Eric Dumazet wrote:
> On Wed, Jan 13, 2021 at 4:51 PM Eric Dumazet <edumazet@google.com> wrote:
> >
> > To be clear, I now think the revert is not needed.
> >
> > I will post instead a patch that should take care of the problem both
> > for virtio and napi_get_frags() tiny skbs
>
> Then later, we might change things so that the sk_buff cache being
> changed by Alexander Lobakin
> can also be used for these allocations.

That sounds like a great idea.
diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c
index 508408fbe78fbd8658dc226834b5b1b334b8b011..5886504c1acacf3f6148127b5c1cc7f6a906b827 100644
--- a/drivers/net/virtio_net.c
+++ b/drivers/net/virtio_net.c
@@ -386,7 +386,7 @@ static struct sk_buff *page_to_skb(struct virtnet_info *vi,
 	p = page_address(page) + offset;

 	/* copy small packet so we can reuse these pages for small data */
-	skb = napi_alloc_skb(&rq->napi, GOOD_COPY_LEN);
+	skb = netdev_alloc_skb_ip_align(vi->dev, GOOD_COPY_LEN);
 	if (unlikely(!skb))
 		return NULL;