[net-next,v3] skbuff: Introduce slab_build_skb()

Message ID: 20221208060256.give.994-kees@kernel.org
State: Accepted
Commit: ce098da1497c6dee9589fce2c61d1910f4fcf0e7
Delegated to: Netdev Maintainers

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for net-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 5337 this patch: 5337
netdev/cc_maintainers           success  CCed 22 of 22 maintainers
netdev/build_clang              success  Errors and warnings before: 1111 this patch: 1111
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             fail     Problems with Fixes tag: 1
netdev/build_allmodconfig_warn  success  Errors and warnings before: 5519 this patch: 5519
netdev/checkpatch               warning  WARNING: Unknown commit id '38931d8989b5', maybe rebased or not pulled?
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            fail     Was 0 now: 2

Commit Message

Kees Cook Dec. 8, 2022, 6:02 a.m. UTC
syzkaller reported:

  BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
  Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295

This was triggered by bpf_prog_test_run_skb(), which passes a kmalloc()ed
buffer to build_skb().

When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc(). In these cases, ksize() is used to find its actual size,
but the allocation may not have been made to that size. Actually
perform the krealloc() call, so that all the associated buffer size
checking is correctly notified of the new size, and use the "new"
pointer so that compiler hinting works correctly. Split this logic out
into a new interface, slab_build_skb(), but leave the original
frag_size == 0 check in place for now to catch any stragglers.

Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: pepsipu <soopthegoop@gmail.com>
Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: ast@kernel.org
Cc: bpf <bpf@vger.kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: jolsa@kernel.org
Cc: KP Singh <kpsingh@kernel.org>
Cc: martin.lau@linux.dev
Cc: Stanislav Fomichev <sdf@google.com>
Cc: song@kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
v3:
- make sure "resized" is passed back so compiler hints survive
- update kernel-doc (kuba)
v2: https://lore.kernel.org/lkml/20221208000209.gonna.368-kees@kernel.org
v1: https://lore.kernel.org/netdev/20221206231659.never.929-kees@kernel.org/
---
 drivers/net/ethernet/broadcom/bnx2.c      |  2 +-
 drivers/net/ethernet/qlogic/qed/qed_ll2.c |  2 +-
 include/linux/skbuff.h                    |  1 +
 net/bpf/test_run.c                        |  2 +-
 net/core/skbuff.c                         | 70 ++++++++++++++++++++---
 5 files changed, 66 insertions(+), 11 deletions(-)
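
As a rough userspace analogy for the ksize()/krealloc() step described in
the commit message, the sketch below uses glibc's malloc_usable_size() in
the role of ksize() and realloc() in the role of krealloc(). It is an
illustration under those assumptions, not kernel code; the helper names
claim_full_allocation() and demo_claim() are hypothetical.

```c
#include <assert.h>
#include <malloc.h>	/* malloc_usable_size() (glibc extension) */
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: grow an allocation to the full size the allocator
 * already reserved for it, so the extra bytes become part of the
 * requested allocation rather than untracked slack.
 *
 * Returns the (possibly moved) pointer, or NULL on failure, in which case
 * the original buffer is left untouched. *size receives the new size.
 */
static void *claim_full_allocation(void *data, size_t *size)
{
	/* Plays the role of ksize(): how much room is really there? */
	*size = malloc_usable_size(data);

	/* Plays the role of krealloc(): ask for all of it. The caller
	 * must continue with the returned pointer, never the old one.
	 */
	return realloc(data, *size);
}

/*
 * Hypothetical demo: allocate, claim the slack, then use all of it.
 * Returns the number of usable bytes, or 0 on failure or if requested is 0.
 */
static size_t demo_claim(size_t requested)
{
	size_t size = 0;
	char *buf, *full;

	if (requested == 0)
		return 0;

	buf = malloc(requested);
	if (!buf)
		return 0;

	full = claim_full_allocation(buf, &size);
	if (!full) {
		free(buf);	/* realloc() failed, buf is still valid */
		return 0;
	}

	assert(size >= requested);
	memset(full, 0, size);	/* in bounds: "size" bytes were requested */
	free(full);
	return size;
}
```

Calling demo_claim(100) returns at least 100. The point mirrored from the
patch is that the write covering the whole buffer happens only after the
allocator has been told about the larger size, and only through the
pointer it handed back.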

Comments

Vlastimil Babka Dec. 8, 2022, 8:13 a.m. UTC | #1
On 12/8/22 07:02, Kees Cook wrote:
> [...]
> +static inline void *__slab_build_skb(struct sk_buff *skb, void *data,
> +				     unsigned int *size)
> +{
> +	void *resized;
> +
> +	/* Must find the allocation size (and grow it to match). */
> +	*size = ksize(data);
> +	/* krealloc() will immediately return "data" when
> +	 * "ksize(data)" is requested: it is the existing upper
> +	 * bounds. As a result, GFP_ATOMIC will be ignored. Note
> +	 * that this "new" pointer needs to be passed back to the
> +	 * caller for use so the __alloc_size hinting will be
> +	 * tracked correctly.
> +	 */
> +	resized = krealloc(data, *size, GFP_ATOMIC);

Hmm, I just realized, this trick will probably break the new kmalloc size
tracking from Feng Tang (CC'd)? We need to make krealloc() update the stored
size, right? And even worse if slab_debug redzoning is enabled and after
commit 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated
kmalloc space than requested") where the lack of update will result in
redzone check failures.

> +	WARN_ON_ONCE(resized != data);
> +	return resized;
> +}
> [...]
Feng Tang Dec. 8, 2022, 10:19 a.m. UTC | #2
On Thu, Dec 08, 2022 at 09:13:41AM +0100, Vlastimil Babka wrote:
> On 12/8/22 07:02, Kees Cook wrote:
> > [...]
> > +	resized = krealloc(data, *size, GFP_ATOMIC);
> 
> Hmm, I just realized, this trick will probably break the new kmalloc size
> tracking from Feng Tang (CC'd)? We need to make krealloc() update the stored
> size, right? And even worse if slab_debug redzoning is enabled and after
> commit 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated
> kmalloc space than requested") where the lack of update will result in
> redzone check failures.

I think it's still safe, as we currently skip the kmalloc redzone check
by calling skip_orig_size_check() inside __ksize(). But since we plan
to remove skip_orig_size_check() once all ksize() usage has been
sanitized, we will need to cover this krealloc() case.

Thanks,
Feng
Vlastimil Babka Dec. 8, 2022, 11:08 a.m. UTC | #3
On 12/8/22 11:19, Feng Tang wrote:
> On Thu, Dec 08, 2022 at 09:13:41AM +0100, Vlastimil Babka wrote:
>> On 12/8/22 07:02, Kees Cook wrote:
>> > [...]
>> > +	resized = krealloc(data, *size, GFP_ATOMIC);
>> 
>> Hmm, I just realized, this trick will probably break the new kmalloc size
>> tracking from Feng Tang (CC'd)? We need to make krealloc() update the stored
>> size, right? And even worse if slab_debug redzoning is enabled and after
>> commit 946fa0dbf2d8 ("mm/slub: extend redzone check to extra allocated
>> kmalloc space than requested") where the lack of update will result in
>> redzone check failures.
> 
> I think it's still safe, as we currently skip the kmalloc redzone check
> by calling skip_orig_size_check() inside __ksize(). But since we plan

Ah, right, I forgot. So that's good.

> to remove skip_orig_size_check() once all ksize() usage has been
> sanitized, we will need to cover this krealloc() case.

Yeah, can be done as part of the removal then, thanks.

> Thanks,
> Feng
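
The concern raised in this exchange can be illustrated with a toy model.
The sketch below is a simplified userspace stand-in, not SLUB code: every
name in it (toy_object, toy_alloc(), toy_realloc_in_place(), and so on) is
hypothetical, and the bucket size and poison byte are arbitrary choices.
It tracks the originally requested size, poisons the slack behind it, and
shows that an in-place grow has to update the stored size or a later
write into the claimed space looks like an overflow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BUCKET_SIZE  64		/* toy "kmalloc bucket" capacity */
#define REDZONE_BYTE 0xcc	/* arbitrary poison for unrequested bytes */

struct toy_object {
	size_t orig_size;			/* size the caller asked for */
	unsigned char bytes[BUCKET_SIZE];	/* full bucket backing it */
};

/* Record the requested size and poison the slack beyond it. */
static void toy_alloc(struct toy_object *obj, size_t requested)
{
	assert(requested <= BUCKET_SIZE);
	obj->orig_size = requested;
	memset(obj->bytes, 0, requested);
	memset(obj->bytes + requested, REDZONE_BYTE, BUCKET_SIZE - requested);
}

/* Redzone check: every byte past orig_size must still hold the poison. */
static bool toy_redzone_intact(const struct toy_object *obj)
{
	for (size_t i = obj->orig_size; i < BUCKET_SIZE; i++)
		if (obj->bytes[i] != REDZONE_BYTE)
			return false;
	return true;
}

/* In-place grow: the storage stays put, only the tracked size changes. */
static void toy_realloc_in_place(struct toy_object *obj, size_t new_size)
{
	assert(new_size >= obj->orig_size && new_size <= BUCKET_SIZE);
	obj->orig_size = new_size;
}

/* Returns true if writing the whole bucket passes the redzone check. */
static bool demo_write_full_bucket(bool update_size)
{
	struct toy_object obj;

	toy_alloc(&obj, 40);
	if (update_size)
		toy_realloc_in_place(&obj, BUCKET_SIZE);
	memset(obj.bytes, 0xab, BUCKET_SIZE);	/* use the whole bucket */
	return toy_redzone_intact(&obj);
}
```

With the size updated, demo_write_full_bucket(true) passes the check;
without it, demo_write_full_bucket(false) fails, because bytes 40..63
were overwritten while still counted as redzone.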
patchwork-bot+netdevbpf@kernel.org Dec. 10, 2022, 4 a.m. UTC | #4
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Wed,  7 Dec 2022 22:02:59 -0800 you wrote:
> syzkaller reported:
> 
>   BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
>   Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295
> 
> For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
> build_skb().
> 
> [...]

Here is the summary with links:
  - [net-next,v3] skbuff: Introduce slab_build_skb()
    https://git.kernel.org/netdev/net-next/c/ce098da1497c

You are awesome, thank you!
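
The applied patch keeps the old frag_size == 0 path working but warns
once and routes it through the new helper. A minimal userspace sketch of
that "deprecate a sentinel argument" pattern follows. It assumes a
simplified stand-in for WARN_ONCE(); warn_once(), toy_build(), and
toy_slab_build() are hypothetical names, and slab_size stands in for the
value ksize() would report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Number of warnings actually printed (at most one in this sketch). */
static int warnings_printed;

/* Userspace stand-in for WARN_ONCE(): report the first time the
 * condition is hit, and always hand the condition back to the caller.
 */
static bool warn_once(bool cond, const char *msg)
{
	static bool warned;

	if (cond && !warned) {
		warned = true;
		warnings_printed++;
		fprintf(stderr, "WARNING: %s\n", msg);
	}
	return cond;
}

/* Hypothetical new entry point: the size is discovered, not passed in.
 * slab_size stands in for what ksize() would report.
 */
static size_t toy_slab_build(size_t slab_size)
{
	return slab_size;
}

/* Hypothetical legacy entry point: 0 used to mean "look the size up".
 * That still works, but it warns once and defers to the new helper.
 */
static size_t toy_build(size_t frag_size, size_t slab_size)
{
	if (warn_once(frag_size == 0, "Use toy_slab_build() instead"))
		return toy_slab_build(slab_size);
	return frag_size;
}
```

A non-zero frag_size is used as given and produces no warning; repeated
calls with 0 keep working but print the warning only the first time,
which is enough to flag stragglers without flooding the log.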

Patch

diff --git a/drivers/net/ethernet/broadcom/bnx2.c b/drivers/net/ethernet/broadcom/bnx2.c
index fec57f1982c8..b2230a4a2086 100644
--- a/drivers/net/ethernet/broadcom/bnx2.c
+++ b/drivers/net/ethernet/broadcom/bnx2.c
@@ -3045,7 +3045,7 @@  bnx2_rx_skb(struct bnx2 *bp, struct bnx2_rx_ring_info *rxr, u8 *data,
 
 	dma_unmap_single(&bp->pdev->dev, dma_addr, bp->rx_buf_use_size,
 			 DMA_FROM_DEVICE);
-	skb = build_skb(data, 0);
+	skb = slab_build_skb(data);
 	if (!skb) {
 		kfree(data);
 		goto error;
diff --git a/drivers/net/ethernet/qlogic/qed/qed_ll2.c b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
index ed274f033626..e5116a86cfbc 100644
--- a/drivers/net/ethernet/qlogic/qed/qed_ll2.c
+++ b/drivers/net/ethernet/qlogic/qed/qed_ll2.c
@@ -200,7 +200,7 @@  static void qed_ll2b_complete_rx_packet(void *cxt,
 	dma_unmap_single(&cdev->pdev->dev, buffer->phys_addr,
 			 cdev->ll2->rx_size, DMA_FROM_DEVICE);
 
-	skb = build_skb(buffer->data, 0);
+	skb = slab_build_skb(buffer->data);
 	if (!skb) {
 		DP_INFO(cdev, "Failed to build SKB\n");
 		kfree(buffer->data);
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 7be5bb4c94b6..0b391b635430 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -1253,6 +1253,7 @@  struct sk_buff *build_skb_around(struct sk_buff *skb,
 void skb_attempt_defer_free(struct sk_buff *skb);
 
 struct sk_buff *napi_build_skb(void *data, unsigned int frag_size);
+struct sk_buff *slab_build_skb(void *data);
 
 /**
  * alloc_skb - allocate a network buffer
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..611b1f4082cf 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -1130,7 +1130,7 @@  int bpf_prog_test_run_skb(struct bpf_prog *prog, const union bpf_attr *kattr,
 	}
 	sock_init_data(NULL, sk);
 
-	skb = build_skb(data, 0);
+	skb = slab_build_skb(data);
 	if (!skb) {
 		kfree(data);
 		kfree(ctx);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1d9719e72f9d..ae5a6f7db37b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -269,12 +269,10 @@  static struct sk_buff *napi_skb_cache_get(void)
 	return skb;
 }
 
-/* Caller must provide SKB that is memset cleared */
-static void __build_skb_around(struct sk_buff *skb, void *data,
-			       unsigned int frag_size)
+static inline void __finalize_skb_around(struct sk_buff *skb, void *data,
+					 unsigned int size)
 {
 	struct skb_shared_info *shinfo;
-	unsigned int size = frag_size ? : ksize(data);
 
 	size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
 
@@ -296,15 +294,71 @@  static void __build_skb_around(struct sk_buff *skb, void *data,
 	skb_set_kcov_handle(skb, kcov_common_handle());
 }
 
+static inline void *__slab_build_skb(struct sk_buff *skb, void *data,
+				     unsigned int *size)
+{
+	void *resized;
+
+	/* Must find the allocation size (and grow it to match). */
+	*size = ksize(data);
+	/* krealloc() will immediately return "data" when
+	 * "ksize(data)" is requested: it is the existing upper
+	 * bounds. As a result, GFP_ATOMIC will be ignored. Note
+	 * that this "new" pointer needs to be passed back to the
+	 * caller for use so the __alloc_size hinting will be
+	 * tracked correctly.
+	 */
+	resized = krealloc(data, *size, GFP_ATOMIC);
+	WARN_ON_ONCE(resized != data);
+	return resized;
+}
+
+/* build_skb() variant which can operate on slab buffers.
+ * Note that this should be used sparingly as slab buffers
+ * cannot be combined efficiently by GRO!
+ */
+struct sk_buff *slab_build_skb(void *data)
+{
+	struct sk_buff *skb;
+	unsigned int size;
+
+	skb = kmem_cache_alloc(skbuff_head_cache, GFP_ATOMIC);
+	if (unlikely(!skb))
+		return NULL;
+
+	memset(skb, 0, offsetof(struct sk_buff, tail));
+	data = __slab_build_skb(skb, data, &size);
+	__finalize_skb_around(skb, data, size);
+
+	return skb;
+}
+EXPORT_SYMBOL(slab_build_skb);
+
+/* Caller must provide SKB that is memset cleared */
+static void __build_skb_around(struct sk_buff *skb, void *data,
+			       unsigned int frag_size)
+{
+	unsigned int size = frag_size;
+
+	/* frag_size == 0 is considered deprecated now. Callers
+	 * using slab buffer should use slab_build_skb() instead.
+	 */
+	if (WARN_ONCE(size == 0, "Use slab_build_skb() instead"))
+		data = __slab_build_skb(skb, data, &size);
+
+	__finalize_skb_around(skb, data, size);
+}
+
 /**
  * __build_skb - build a network buffer
  * @data: data buffer provided by caller
- * @frag_size: size of data, or 0 if head was kmalloced
+ * @frag_size: size of data (must not be 0)
  *
  * Allocate a new &sk_buff. Caller provides space holding head and
- * skb_shared_info. @data must have been allocated by kmalloc() only if
- * @frag_size is 0, otherwise data should come from the page allocator
- *  or vmalloc()
+ * skb_shared_info. @data must have been allocated from the page
+ * allocator or vmalloc(). (A @frag_size of 0 to indicate a kmalloc()
+ * allocation is deprecated, and callers should use slab_build_skb()
+ * instead.)
  * The return is the new skb buffer.
  * On a failure the return is %NULL, and @data is not freed.
  * Notes :