Message ID | 20221206231659.never.929-kees@kernel.org (mailing list archive)
---|---
State | Changes Requested
Series | skbuff: Reallocate to ksize() in __build_skb_around()
On Tue, 6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> -        unsigned int size = frag_size ? : ksize(data);
> +        unsigned int size = frag_size;
> +
> +        /* When frag_size == 0, the buffer came from kmalloc, so we
> +         * must find its true allocation size (and grow it to match).
> +         */
> +        if (unlikely(size == 0)) {
> +                void *resized;
> +
> +                size = ksize(data);
> +                /* krealloc() will immediately return "data" when
> +                 * "ksize(data)" is requested: it is the existing upper
> +                 * bound. As a result, GFP_ATOMIC will be ignored.
> +                 */
> +                resized = krealloc(data, size, GFP_ATOMIC);
> +                if (WARN_ON(resized != data))
> +                        data = resized;
> +        }

Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of using kmalloc()'ed heads is large because GRO can't free the metadata. So we end up carrying per-MTU skbs across to the application and then freeing them one by one. With pages we just aggregate up to 64k of data in a single skb.

I can only grep out 3 cases of build_skb(.. 0), could we instead convert them into a new build_skb_slab(), and handle all the silliness in such a new helper? That'd be a win both for the memory safety and one fewer branch for the fast path.

I think it's worth doing, so LMK if you're okay to do this extra work, otherwise I can help (unless e.g. Eric tells me I'm wrong..).
On December 6, 2022 5:55:57 PM PST, Jakub Kicinski <kuba@kernel.org> wrote:
>On Tue, 6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
>> -        unsigned int size = frag_size ? : ksize(data);
>> +        unsigned int size = frag_size;
>> +
>> +        /* When frag_size == 0, the buffer came from kmalloc, so we
>> +         * must find its true allocation size (and grow it to match).
>> +         */
>> +        if (unlikely(size == 0)) {
>> +                void *resized;
>> +
>> +                size = ksize(data);
>> +                /* krealloc() will immediately return "data" when
>> +                 * "ksize(data)" is requested: it is the existing upper
>> +                 * bound. As a result, GFP_ATOMIC will be ignored.
>> +                 */
>> +                resized = krealloc(data, size, GFP_ATOMIC);
>> +                if (WARN_ON(resized != data))
>> +                        data = resized;
>> +        }
>
>Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of using kmalloc()'ed heads is large because GRO can't free the metadata. So we end up carrying per-MTU skbs across to the application and then freeing them one by one. With pages we just aggregate up to 64k of data in a single skb.

This isn't changed by this patch, though? The users of kmalloc+build_skb are pre-existing.

>I can only grep out 3 cases of build_skb(.. 0), could we instead convert them into a new build_skb_slab(), and handle all the silliness in such a new helper? That'd be a win both for the memory safety and one fewer branch for the fast path.

When I went through callers, it was many more than 3. Regardless, I don't see the point: my patch has no more branches than the original code (in fact, it may actually be faster because I made the initial assignment unconditional, and zero-test-after-assign is almost free, whereas before it tested before the assign). And now it's marked as unlikely to keep it out-of-line.

>I think it's worth doing, so LMK if you're okay to do this extra work, otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I had been changing callers to round up (e.g. bnx2), but it seemed like centralizing this makes more sense. I don't think a different helper will clean this up.

-Kees
On Tue, 06 Dec 2022 19:47:13 -0800 Kees Cook wrote:
> >Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of using kmalloc()'ed heads is large because GRO can't free the metadata. So we end up carrying per-MTU skbs across to the application and then freeing them one by one. With pages we just aggregate up to 64k of data in a single skb.
>
> This isn't changed by this patch, though? The users of kmalloc+build_skb are pre-existing.

Yes.

> >I can only grep out 3 cases of build_skb(.. 0), could we instead convert them into a new build_skb_slab(), and handle all the silliness in such a new helper? That'd be a win both for the memory safety and one fewer branch for the fast path.
>
> When I went through callers, it was many more than 3. Regardless, I don't see the point: my patch has no more branches than the original code (in fact, it may actually be faster because I made the initial assignment unconditional, and zero-test-after-assign is almost free, whereas before it tested before the assign). And now it's marked as unlikely to keep it out-of-line.

Maybe.

> >I think it's worth doing, so LMK if you're okay to do this extra work, otherwise I can help (unless e.g. Eric tells me I'm wrong..).
>
> I had been changing callers to round up (e.g. bnx2), but it seemed like centralizing this makes more sense. I don't think a different helper will clean this up.

It's a combination of the fact that I think "0 is magic" falls in the "garbage" category of APIs, and the fact that driver developers have many things to worry about, so they often don't know that using slab is a bad idea. So I want a helper out of the normal path, where I can put a kdoc warning that says "if you're doing this - GRO will suck, use page frags".
On 12/7/22 00:17, Kees Cook wrote:
> When build_skb() is passed a frag_size of 0, it means the buffer came
> from kmalloc. In these cases, ksize() is used to find its actual size,
> but since the allocation may not have been made to that size, actually
> perform the krealloc() call so that all the associated buffer size
> checking will be correctly notified. For example, syzkaller reported:
>
>   BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
>   Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295
>
> For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
> build_skb().

Weren't all such kmalloc() users converted to kmalloc_size_roundup() to prevent this?

> Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
> Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
> Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Pavel Begunkov <asml.silence@gmail.com>
> Cc: pepsipu <soopthegoop@gmail.com>
> Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: kasan-dev <kasan-dev@googlegroups.com>
> Cc: Andrii Nakryiko <andrii@kernel.org>
> Cc: ast@kernel.org
> Cc: bpf <bpf@vger.kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Hao Luo <haoluo@google.com>
> Cc: Jesper Dangaard Brouer <hawk@kernel.org>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: jolsa@kernel.org
> Cc: KP Singh <kpsingh@kernel.org>
> Cc: martin.lau@linux.dev
> Cc: Stanislav Fomichev <sdf@google.com>
> Cc: song@kernel.org
> Cc: Yonghong Song <yhs@fb.com>
> Cc: netdev@vger.kernel.org
> Cc: LKML <linux-kernel@vger.kernel.org>
> Signed-off-by: Kees Cook <keescook@chromium.org>
> ---
>  net/core/skbuff.c | 18 +++++++++++++++++-
>  1 file changed, 17 insertions(+), 1 deletion(-)
>
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 1d9719e72f9d..b55d061ed8b4 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -274,7 +274,23 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
>                                 unsigned int frag_size)
>  {
>          struct skb_shared_info *shinfo;
> -        unsigned int size = frag_size ? : ksize(data);
> +        unsigned int size = frag_size;
> +
> +        /* When frag_size == 0, the buffer came from kmalloc, so we
> +         * must find its true allocation size (and grow it to match).
> +         */
> +        if (unlikely(size == 0)) {
> +                void *resized;
> +
> +                size = ksize(data);
> +                /* krealloc() will immediately return "data" when
> +                 * "ksize(data)" is requested: it is the existing upper
> +                 * bound. As a result, GFP_ATOMIC will be ignored.
> +                 */
> +                resized = krealloc(data, size, GFP_ATOMIC);
> +                if (WARN_ON(resized != data))

WARN_ON_ONCE() could be sufficient as either this is impossible to hit by definition, or something went very wrong (a patch screwed ksize/krealloc?) and it can be hit many times?

> +                        data = resized;

In that "impossible" case, this could also end up as NULL due to GFP_ATOMIC allocation failure, but maybe it's really impractical to do anything about it...

> +        }
>
>          size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
On Wed, Dec 7, 2022 at 2:56 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 6 Dec 2022 15:17:14 -0800 Kees Cook wrote:
> > -        unsigned int size = frag_size ? : ksize(data);
> > +        unsigned int size = frag_size;
> > +
> > +        /* When frag_size == 0, the buffer came from kmalloc, so we
> > +         * must find its true allocation size (and grow it to match).
> > +         */
> > +        if (unlikely(size == 0)) {
> > +                void *resized;
> > +
> > +                size = ksize(data);
> > +                /* krealloc() will immediately return "data" when
> > +                 * "ksize(data)" is requested: it is the existing upper
> > +                 * bound. As a result, GFP_ATOMIC will be ignored.
> > +                 */
> > +                resized = krealloc(data, size, GFP_ATOMIC);
> > +                if (WARN_ON(resized != data))
> > +                        data = resized;
> > +        }
>
> Aammgh. build_skb(0) is plain silly, AFAIK. The performance hit of using kmalloc()'ed heads is large because GRO can't free the metadata. So we end up carrying per-MTU skbs across to the application and then freeing them one by one. With pages we just aggregate up to 64k of data in a single skb.
>
> I can only grep out 3 cases of build_skb(.. 0), could we instead convert them into a new build_skb_slab(), and handle all the silliness in such a new helper? That'd be a win both for the memory safety and one fewer branch for the fast path.
>
> I think it's worth doing, so LMK if you're okay to do this extra work, otherwise I can help (unless e.g. Eric tells me I'm wrong..).

I totally agree, I would indeed remove ksize() use completely, let callers give us the size, and the head_frag boolean, instead of inferring from size==0
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 1d9719e72f9d..b55d061ed8b4 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -274,7 +274,23 @@ static void __build_skb_around(struct sk_buff *skb, void *data,
                                unsigned int frag_size)
 {
         struct skb_shared_info *shinfo;
-        unsigned int size = frag_size ? : ksize(data);
+        unsigned int size = frag_size;
+
+        /* When frag_size == 0, the buffer came from kmalloc, so we
+         * must find its true allocation size (and grow it to match).
+         */
+        if (unlikely(size == 0)) {
+                void *resized;
+
+                size = ksize(data);
+                /* krealloc() will immediately return "data" when
+                 * "ksize(data)" is requested: it is the existing upper
+                 * bound. As a result, GFP_ATOMIC will be ignored.
+                 */
+                resized = krealloc(data, size, GFP_ATOMIC);
+                if (WARN_ON(resized != data))
+                        data = resized;
+        }

         size -= SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
When build_skb() is passed a frag_size of 0, it means the buffer came
from kmalloc. In these cases, ksize() is used to find its actual size,
but since the allocation may not have been made to that size, actually
perform the krealloc() call so that all the associated buffer size
checking will be correctly notified. For example, syzkaller reported:

  BUG: KASAN: slab-out-of-bounds in __build_skb_around+0x235/0x340 net/core/skbuff.c:294
  Write of size 32 at addr ffff88802aa172c0 by task syz-executor413/5295

For bpf_prog_test_run_skb(), which uses a kmalloc()ed buffer passed to
build_skb().

Reported-by: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Link: https://groups.google.com/g/syzkaller-bugs/c/UnIKxTtU5-0/m/-wbXinkgAQAJ
Fixes: 38931d8989b5 ("mm: Make ksize() a reporting-only function")
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Pavel Begunkov <asml.silence@gmail.com>
Cc: pepsipu <soopthegoop@gmail.com>
Cc: syzbot+fda18eaa8c12534ccb3b@syzkaller.appspotmail.com
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: kasan-dev <kasan-dev@googlegroups.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: ast@kernel.org
Cc: bpf <bpf@vger.kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Jesper Dangaard Brouer <hawk@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: jolsa@kernel.org
Cc: KP Singh <kpsingh@kernel.org>
Cc: martin.lau@linux.dev
Cc: Stanislav Fomichev <sdf@google.com>
Cc: song@kernel.org
Cc: Yonghong Song <yhs@fb.com>
Cc: netdev@vger.kernel.org
Cc: LKML <linux-kernel@vger.kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
---
 net/core/skbuff.c | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)