Message ID | 167361792462.531803.224198635706602340.stgit@firesoul (mailing list archive)
---|---
State | Accepted
Commit | eedade12f4cb7284555c4c0314485e9575c70ab7
Delegated to: | Netdev Maintainers
Series | net: use kmem_cache_free_bulk in kfree_skb_list
On Fri, Jan 13, 2023 at 2:52 PM Jesper Dangaard Brouer <brouer@redhat.com> wrote: > > The kfree_skb_list function walks SKB (via skb->next) and frees them > individually to the SLUB/SLAB allocator (kmem_cache). It is more > efficient to bulk free them via the kmem_cache_free_bulk API. > > This patches create a stack local array with SKBs to bulk free while > walking the list. Bulk array size is limited to 16 SKBs to trade off > stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache" > uses objsize 256 bytes usually in an order-1 page 8192 bytes that is > 32 objects per slab (can vary on archs and due to SLUB sharing). Thus, > for SLUB the optimal bulk free case is 32 objects belonging to same > slab, but runtime this isn't likely to occur. > > The expected gain from using kmem_cache bulk alloc and free API > have been assessed via a microbencmark kernel module[1]. > > The module 'slab_bulk_test01' results at bulk 16 element: > kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16) > kmem-bulk Per elem: 64 cycles(tsc) 17.905 ns (step:16) > > More detailed description of benchmarks avail in [2]. > > [1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm > [2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org > > V2: rename function to kfree_skb_add_bulk. > > Reviewed-by: Saeed Mahameed <saeed@kernel.org> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > --- According to syzbot, this patch causes kernel panics, in IP fragmentation logic. Can you double check if there is no obvious bug ?
On 18/01/2023 17.05, Eric Dumazet wrote: > On Fri, Jan 13, 2023 at 2:52 PM Jesper Dangaard Brouer > <brouer@redhat.com> wrote: >> >> The kfree_skb_list function walks SKB (via skb->next) and frees them >> individually to the SLUB/SLAB allocator (kmem_cache). It is more >> efficient to bulk free them via the kmem_cache_free_bulk API. >> >> This patches create a stack local array with SKBs to bulk free while >> walking the list. Bulk array size is limited to 16 SKBs to trade off >> stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache" >> uses objsize 256 bytes usually in an order-1 page 8192 bytes that is >> 32 objects per slab (can vary on archs and due to SLUB sharing). Thus, >> for SLUB the optimal bulk free case is 32 objects belonging to same >> slab, but runtime this isn't likely to occur. >> >> The expected gain from using kmem_cache bulk alloc and free API >> have been assessed via a microbencmark kernel module[1]. >> >> The module 'slab_bulk_test01' results at bulk 16 element: >> kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16) >> kmem-bulk Per elem: 64 cycles(tsc) 17.905 ns (step:16) >> >> More detailed description of benchmarks avail in [2]. >> >> [1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm >> [2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org >> >> V2: rename function to kfree_skb_add_bulk. >> >> Reviewed-by: Saeed Mahameed <saeed@kernel.org> >> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> >> --- > > According to syzbot, this patch causes kernel panics, in IP fragmentation logic. > > Can you double check if there is no obvious bug ? Do you have a link to the syzbot issue? --Jesper
On Wed, Jan 18, 2023 at 5:42 PM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > On 18/01/2023 17.05, Eric Dumazet wrote: > > On Fri, Jan 13, 2023 at 2:52 PM Jesper Dangaard Brouer > > <brouer@redhat.com> wrote: > >> > >> The kfree_skb_list function walks SKB (via skb->next) and frees them > >> individually to the SLUB/SLAB allocator (kmem_cache). It is more > >> efficient to bulk free them via the kmem_cache_free_bulk API. > >> > >> This patches create a stack local array with SKBs to bulk free while > >> walking the list. Bulk array size is limited to 16 SKBs to trade off > >> stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache" > >> uses objsize 256 bytes usually in an order-1 page 8192 bytes that is > >> 32 objects per slab (can vary on archs and due to SLUB sharing). Thus, > >> for SLUB the optimal bulk free case is 32 objects belonging to same > >> slab, but runtime this isn't likely to occur. > >> > >> The expected gain from using kmem_cache bulk alloc and free API > >> have been assessed via a microbencmark kernel module[1]. > >> > >> The module 'slab_bulk_test01' results at bulk 16 element: > >> kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16) > >> kmem-bulk Per elem: 64 cycles(tsc) 17.905 ns (step:16) > >> > >> More detailed description of benchmarks avail in [2]. > >> > >> [1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm > >> [2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org > >> > >> V2: rename function to kfree_skb_add_bulk. > >> > >> Reviewed-by: Saeed Mahameed <saeed@kernel.org> > >> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > >> --- > > > > According to syzbot, this patch causes kernel panics, in IP fragmentation logic. > > > > Can you double check if there is no obvious bug ? > > Do you have a link to the syzbot issue? Not yet, I will release it, with a repro.
(related to syzbot issue[1]) On 13/01/2023 14.52, Jesper Dangaard Brouer wrote: > The kfree_skb_list function walks SKB (via skb->next) and frees them > individually to the SLUB/SLAB allocator (kmem_cache). It is more > efficient to bulk free them via the kmem_cache_free_bulk API. > > This patches create a stack local array with SKBs to bulk free while > walking the list. Bulk array size is limited to 16 SKBs to trade off > stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache" > uses objsize 256 bytes usually in an order-1 page 8192 bytes that is > 32 objects per slab (can vary on archs and due to SLUB sharing). Thus, > for SLUB the optimal bulk free case is 32 objects belonging to same > slab, but runtime this isn't likely to occur. > > The expected gain from using kmem_cache bulk alloc and free API > have been assessed via a microbencmark kernel module[1]. > > The module 'slab_bulk_test01' results at bulk 16 element: > kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16) > kmem-bulk Per elem: 64 cycles(tsc) 17.905 ns (step:16) > > More detailed description of benchmarks avail in [2]. > > [1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm > [2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org > > V2: rename function to kfree_skb_add_bulk. > > Reviewed-by: Saeed Mahameed <saeed@kernel.org> > Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> > --- > net/core/skbuff.c | 40 +++++++++++++++++++++++++++++++++++++++- > 1 file changed, 39 insertions(+), 1 deletion(-) > > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 007a5fbe284b..79c9e795a964 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -964,16 +964,54 @@ kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) > } > EXPORT_SYMBOL(kfree_skb_reason); > > +#define KFREE_SKB_BULK_SIZE 16 > + > +struct skb_free_array { > + unsigned int skb_count; > + void *skb_array[KFREE_SKB_BULK_SIZE]; > +}; > + > +static void kfree_skb_add_bulk(struct sk_buff *skb, > + struct skb_free_array *sa, > + enum skb_drop_reason reason) > +{ > + /* if SKB is a clone, don't handle this case */ > + if (unlikely(skb->fclone != SKB_FCLONE_UNAVAILABLE)) { > + __kfree_skb(skb); > + return; > + } > + > + skb_release_all(skb, reason); > + sa->skb_array[sa->skb_count++] = skb; > + > + if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) { > + kmem_cache_free_bulk(skbuff_head_cache, KFREE_SKB_BULK_SIZE, > + sa->skb_array); > + sa->skb_count = 0; > + } > +} > + > void __fix_address > kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) > { > + struct skb_free_array sa; > + > + sa.skb_count = 0; > + > while (segs) { > struct sk_buff *next = segs->next; > > + skb_mark_not_on_list(segs); The syzbot[1] bug goes way if I remove this skb_mark_not_on_list(). I don't understand why I cannot clear skb->next here? [1] https://lore.kernel.org/all/000000000000d58eae05f28ca51f@google.com/ > if (__kfree_skb_reason(segs, reason)) > - __kfree_skb(segs); > + kfree_skb_add_bulk(segs, &sa, reason); > + > segs = next; > } > + > + if (sa.skb_count) > + kmem_cache_free_bulk(skbuff_head_cache, sa.skb_count, > + sa.skb_array); > } > EXPORT_SYMBOL(kfree_skb_list_reason); > > >
On Wed, 18 Jan 2023 22:37:47 +0100 Jesper Dangaard Brouer wrote:
> > +		skb_mark_not_on_list(segs);
>
> The syzbot[1] bug goes away if I remove this skb_mark_not_on_list().
>
> I don't understand why I cannot clear skb->next here?

Some of the skbs on the list are not private?
IOW we should only unlink them if skb_unref().
On 2023/1/19 5:37, Jesper Dangaard Brouer wrote: > (related to syzbot issue[1]) > > On 13/01/2023 14.52, Jesper Dangaard Brouer wrote: >> The kfree_skb_list function walks SKB (via skb->next) and frees them >> individually to the SLUB/SLAB allocator (kmem_cache). It is more >> efficient to bulk free them via the kmem_cache_free_bulk API. >> >> This patches create a stack local array with SKBs to bulk free while >> walking the list. Bulk array size is limited to 16 SKBs to trade off >> stack usage and efficiency. The SLUB kmem_cache "skbuff_head_cache" >> uses objsize 256 bytes usually in an order-1 page 8192 bytes that is >> 32 objects per slab (can vary on archs and due to SLUB sharing). Thus, >> for SLUB the optimal bulk free case is 32 objects belonging to same >> slab, but runtime this isn't likely to occur. >> >> The expected gain from using kmem_cache bulk alloc and free API >> have been assessed via a microbencmark kernel module[1]. >> >> The module 'slab_bulk_test01' results at bulk 16 element: >> kmem-in-loop Per elem: 109 cycles(tsc) 30.532 ns (step:16) >> kmem-bulk Per elem: 64 cycles(tsc) 17.905 ns (step:16) >> >> More detailed description of benchmarks avail in [2]. >> >> [1] https://github.com/netoptimizer/prototype-kernel/tree/master/kernel/mm >> [2] https://github.com/xdp-project/xdp-project/blob/master/areas/mem/kfree_skb_list01.org >> >> V2: rename function to kfree_skb_add_bulk. >> >> Reviewed-by: Saeed Mahameed <saeed@kernel.org> >> Signed-off-by: Jesper Dangaard Brouer <brouer@redhat.com> >> --- >> net/core/skbuff.c | 40 +++++++++++++++++++++++++++++++++++++++- >> 1 file changed, 39 insertions(+), 1 deletion(-) >> >> diff --git a/net/core/skbuff.c b/net/core/skbuff.c >> index 007a5fbe284b..79c9e795a964 100644 >> --- a/net/core/skbuff.c >> +++ b/net/core/skbuff.c >> @@ -964,16 +964,54 @@ kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason) >> } >> EXPORT_SYMBOL(kfree_skb_reason); >> +#define KFREE_SKB_BULK_SIZE 16 >> + >> +struct skb_free_array { >> + unsigned int skb_count; >> + void *skb_array[KFREE_SKB_BULK_SIZE]; >> +}; >> + >> +static void kfree_skb_add_bulk(struct sk_buff *skb, >> + struct skb_free_array *sa, >> + enum skb_drop_reason reason) >> +{ >> + /* if SKB is a clone, don't handle this case */ >> + if (unlikely(skb->fclone != SKB_FCLONE_UNAVAILABLE)) { >> + __kfree_skb(skb); >> + return; >> + } >> + >> + skb_release_all(skb, reason); >> + sa->skb_array[sa->skb_count++] = skb; >> + >> + if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) { >> + kmem_cache_free_bulk(skbuff_head_cache, KFREE_SKB_BULK_SIZE, >> + sa->skb_array); >> + sa->skb_count = 0; >> + } >> +} >> + >> void __fix_address >> kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason) >> { >> + struct skb_free_array sa; >> + >> + sa.skb_count = 0; >> + >> while (segs) { >> struct sk_buff *next = segs->next; >> + skb_mark_not_on_list(segs); > > The syzbot[1] bug goes way if I remove this skb_mark_not_on_list(). > > I don't understand why I cannot clear skb->next here? Clearing skb->next seems unrelated, it may just increase the problem recurrence probability. Because It seems kfree_skb_list_reason() is also used to release skb in shinfo->frag_list, which should go through the skb_unref() checking, and this patch seems to skip the skb_unref() checking for skb in shinfo->frag_list. 
> > [1] https://lore.kernel.org/all/000000000000d58eae05f28ca51f@google.com/ > >> if (__kfree_skb_reason(segs, reason)) >> - __kfree_skb(segs); >> + kfree_skb_add_bulk(segs, &sa, reason); >> + >> segs = next; >> } >> + >> + if (sa.skb_count) >> + kmem_cache_free_bulk(skbuff_head_cache, sa.skb_count, >> + sa.skb_array); >> } >> EXPORT_SYMBOL(kfree_skb_list_reason); >> >> > > . >
On 19/01/2023 03.26, Jakub Kicinski wrote:
> On Wed, 18 Jan 2023 22:37:47 +0100 Jesper Dangaard Brouer wrote:
>>> +		skb_mark_not_on_list(segs);
>>
>> The syzbot[1] bug goes away if I remove this skb_mark_not_on_list().
>>
>> I don't understand why I cannot clear skb->next here?
>
> Some of the skbs on the list are not private?
> IOW we should only unlink them if skb_unref().

Yes, you are right.

The skb_mark_not_on_list() should only be called if __kfree_skb_reason()
returns true, meaning the SKB is ready to be freed (as it calls/checks
skb_unref()).

I will send a proper fix patch shortly... after syzbot does a test on it.

--Jesper
On Thu, Jan 19, 2023 at 11:18 AM Jesper Dangaard Brouer
<jbrouer@redhat.com> wrote:
>
> On 19/01/2023 03.26, Jakub Kicinski wrote:
> > On Wed, 18 Jan 2023 22:37:47 +0100 Jesper Dangaard Brouer wrote:
> >>> +		skb_mark_not_on_list(segs);
> >>
> >> The syzbot[1] bug goes away if I remove this skb_mark_not_on_list().
> >>
> >> I don't understand why I cannot clear skb->next here?
> >
> > Some of the skbs on the list are not private?
> > IOW we should only unlink them if skb_unref().
>
> Yes, you are right.
>
> The skb_mark_not_on_list() should only be called if __kfree_skb_reason()
> returns true, meaning the SKB is ready to be freed (as it calls/checks
> skb_unref()).

This was the case already before your changes.

skb->next/prev can not be shared by multiple users.

One skb can be put on a single list by definition.

Whoever calls kfree_skb_list(list) owns all the skbs->next|prev found
in the list.

So you can mangle skb->next as you like, even if the unref() is
telling that someone else has a reference on the skb.

> I will send a proper fix patch shortly... after syzbot does a test on it.
>
> --Jesper
On 19/01/2023 11.28, Eric Dumazet wrote: > On Thu, Jan 19, 2023 at 11:18 AM Jesper Dangaard Brouer > <jbrouer@redhat.com> wrote: >> >> >> >> On 19/01/2023 03.26, Jakub Kicinski wrote: >>> On Wed, 18 Jan 2023 22:37:47 +0100 Jesper Dangaard Brouer wrote: >>>>> + skb_mark_not_on_list(segs); >>>> >>>> The syzbot[1] bug goes way if I remove this skb_mark_not_on_list(). >>>> >>>> I don't understand why I cannot clear skb->next here? >>> >>> Some of the skbs on the list are not private? >>> IOW we should only unlink them if skb_unref(). >> >> Yes, you are right. >> >> The skb_mark_not_on_list() should only be called if __kfree_skb_reason() >> returns true, meaning the SKB is ready to be free'ed (as it calls/check >> skb_unref()). > > > This was the case already before your changes. > > skb->next/prev can not be shared by multiple users. > > One skb can be put on a single list by definition. > > Whoever calls kfree_skb_list(list) owns all the skbs->next|prev found > in the list > > So you can mangle skb->next as you like, even if the unref() is > telling that someone > else has a reference on skb. Then why does the bug go way if I remove the skb_mark_not_on_list() call then? >> >> I will send a proper fix patch shortly... after syzbot do a test on it. >> I've send a patch for syzbot that only calls skb_mark_not_on_list() when unref() and __kfree_skb_reason() "permits" this. I tested it locally with reproducer and it also fixes/"removes" the bug. --Jesper
On Thu, Jan 19, 2023 at 12:22 PM Jesper Dangaard Brouer <jbrouer@redhat.com> wrote: > > > > On 19/01/2023 11.28, Eric Dumazet wrote: > > On Thu, Jan 19, 2023 at 11:18 AM Jesper Dangaard Brouer > > <jbrouer@redhat.com> wrote: > >> > >> > >> > >> On 19/01/2023 03.26, Jakub Kicinski wrote: > >>> On Wed, 18 Jan 2023 22:37:47 +0100 Jesper Dangaard Brouer wrote: > >>>>> + skb_mark_not_on_list(segs); > >>>> > >>>> The syzbot[1] bug goes way if I remove this skb_mark_not_on_list(). > >>>> > >>>> I don't understand why I cannot clear skb->next here? > >>> > >>> Some of the skbs on the list are not private? > >>> IOW we should only unlink them if skb_unref(). > >> > >> Yes, you are right. > >> > >> The skb_mark_not_on_list() should only be called if __kfree_skb_reason() > >> returns true, meaning the SKB is ready to be free'ed (as it calls/check > >> skb_unref()). > > > > > > This was the case already before your changes. > > > > skb->next/prev can not be shared by multiple users. > > > > One skb can be put on a single list by definition. > > > > Whoever calls kfree_skb_list(list) owns all the skbs->next|prev found > > in the list > > > > So you can mangle skb->next as you like, even if the unref() is > > telling that someone > > else has a reference on skb. > > Then why does the bug go way if I remove the skb_mark_not_on_list() call > then? > Some side effects. This _particular_ repro uses a specific pattern that might be defeated by a small change. (just working around another bug) Instead of setting skb->next to NULL, try to set it to skb->next = (struct sk_buff *)0x800; This might show a different pattern. > >> > >> I will send a proper fix patch shortly... after syzbot do a test on it. > >> > > I've send a patch for syzbot that only calls skb_mark_not_on_list() when > unref() and __kfree_skb_reason() "permits" this. > I tested it locally with reproducer and it also fixes/"removes" the bug. This does not mean we will accept a patch with no clear explanation other than "this removes a syzbot bug, so this must be good" Make sure to give precise details on _why_ this is needed or not. Again, the user of kfree_skb_list(list) _owns_ skb->next for sure. If you think this assertion is not true, we are in big trouble.
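In code form, the debugging trick suggested above amounts to a one-line change in the loop from the patch (a sketch for local testing only, not something to merge); 0x800 is just an arbitrary bogus low address, so any later walk of the supposedly-unlinked skb->next faults at the guilty site instead of quietly seeing a NULL-terminated list:

	/* Debugging sketch: poison skb->next instead of clearing it, so a
	 * use-after-unlink dereferences a bogus address and crashes at
	 * the offending code path rather than masking the bug.
	 */
	while (segs) {
		struct sk_buff *next = segs->next;

		segs->next = (struct sk_buff *)0x800;	/* was: skb_mark_not_on_list(segs) */

		if (__kfree_skb_reason(segs, reason))
			kfree_skb_add_bulk(segs, &sa, reason);

		segs = next;
	}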
On 19/01/2023 12.46, Eric Dumazet wrote: > On Thu, Jan 19, 2023 at 12:22 PM Jesper Dangaard Brouer > <jbrouer@redhat.com> wrote: >> >> >> >> On 19/01/2023 11.28, Eric Dumazet wrote: >>> On Thu, Jan 19, 2023 at 11:18 AM Jesper Dangaard Brouer >>> <jbrouer@redhat.com> wrote: >>>> >>>> >>>> >>>> On 19/01/2023 03.26, Jakub Kicinski wrote: >>>>> On Wed, 18 Jan 2023 22:37:47 +0100 Jesper Dangaard Brouer wrote: >>>>>>> + skb_mark_not_on_list(segs); >>>>>> >>>>>> The syzbot[1] bug goes way if I remove this skb_mark_not_on_list(). >>>>>> >>>>>> I don't understand why I cannot clear skb->next here? >>>>> >>>>> Some of the skbs on the list are not private? >>>>> IOW we should only unlink them if skb_unref(). >>>> >>>> Yes, you are right. >>>> >>>> The skb_mark_not_on_list() should only be called if __kfree_skb_reason() >>>> returns true, meaning the SKB is ready to be free'ed (as it calls/check >>>> skb_unref()). >>> >>> >>> This was the case already before your changes. >>> >>> skb->next/prev can not be shared by multiple users. >>> >>> One skb can be put on a single list by definition. >>> >>> Whoever calls kfree_skb_list(list) owns all the skbs->next|prev found >>> in the list >>> >>> So you can mangle skb->next as you like, even if the unref() is >>> telling that someone >>> else has a reference on skb. >> >> Then why does the bug go way if I remove the skb_mark_not_on_list() call >> then? >> > > Some side effects. > > This _particular_ repro uses a specific pattern that might be defeated > by a small change. > (just working around another bug) > > Instead of setting skb->next to NULL, try to set it to > > skb->next = (struct sk_buff *)0x800; > > This might show a different pattern. Nice trick, I'll use this next time. I modified to code and added a kfree_skb tracepoint with a known reason (PROTO_MEM) to capture the callstack. See end of email. Which shows multicast code path is involved. trace_kfree_skb(segs, __builtin_return_address(0), SKB_DROP_REASON_PROTO_MEM); >>>> >>>> I will send a proper fix patch shortly... after syzbot do a test on it. >>>> >> >> I've send a patch for syzbot that only calls skb_mark_not_on_list() when >> unref() and __kfree_skb_reason() "permits" this. >> I tested it locally with reproducer and it also fixes/"removes" the bug. > > This does not mean we will accept a patch with no clear explanation > other than "this removes a syzbot bug, so this must be good" > > Make sure to give precise details on _why_ this is needed or not. > > Again, the user of kfree_skb_list(list) _owns_ skb->next for sure. > If you think this assertion is not true, we are in big trouble. > I think I have found an explanation, why/when refcnt can be elevated on an SKB-list. The skb_shinfo(skb)->flag_list can increase refcnt. See code: static void skb_clone_fraglist(struct sk_buff *skb) { struct sk_buff *list; skb_walk_frags(skb, list) skb_get(list); } This walks the SKBs on the shinfo->frag_list and increase the refcnt (skb->users). Notice that kfree_skb_list is also called when freeing the SKBs "frag_list" in skb_release_data(). IMHO this explains why we can only remove the SKB from the list, when "permitted" by skb_unref(), e.g. if __kfree_skb_reason() returns true. 
--Jesper

Call-stack of case with elevated refcnt when walking SKB-list:
--------------------------------------------------------------

repro  3048 [003]  101.689670: skb:kfree_skb: skbaddr=0xffff888104086600 protocol=0 location=skb_release_data.cold+0x25 reason: PROTO_MEM
	ffffffff81bb6448 kfree_skb_list_reason.cold+0x3b (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81bb6448 kfree_skb_list_reason.cold+0x3b (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81bb6484 skb_release_data.cold+0x25 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff819137e1 kfree_skb_reason+0x41 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81a1e246 igmp_rcv+0xf6 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff819c5e85 ip_protocol_deliver_rcu+0x165 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff819c5f12 ip_local_deliver_finish+0x72 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81930b89 __netif_receive_skb_one_core+0x69 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81930e31 process_backlog+0x91 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff8193197b __napi_poll+0x2b (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81931e86 net_rx_action+0x276 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81bcf083 __do_softirq+0xd3 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81087243 do_softirq+0x63 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff810872e4 __local_bh_enable_ip+0x64 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff8192ba8f netif_rx+0xdf (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff8192bb27 dev_loopback_xmit+0x77 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff819c9195 ip_mc_finish_output+0x65 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff819cc597 ip_mc_output+0x137 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff819cd2b8 ip_push_pending_frames+0xa8 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81a04f67 raw_sendmsg+0x607 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff8190153b sock_sendmsg+0x8b (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff8190323b __sys_sendto+0xeb (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff819032b0 __x64_sys_sendto+0x20 (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81bb9ffa do_syscall_64+0x3a (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	ffffffff81c000aa entry_SYSCALL_64+0xaa (/boot/vmlinux-6.2.0-rc3-net-next-kfunc-xdp-hints+)
	          10af3d syscall+0x1d (/usr/lib64/libc.so.6)
	            46d3 do_sandbox_none+0x81 (/home/jbrouer/syzbot-kfree_skb_list/repro)
	            48ce main+0xa5 (/home/jbrouer/syzbot-kfree_skb_list/repro)
	           29550 __libc_start_call_main+0x80 (/usr/lib64/libc.so.6)

Resolving some symbols:

ip_mc_finish_output+0x65
------------------------
 [net-next]$ ./scripts/faddr2line net/ipv4/ip_output.o ip_mc_finish_output+0x65
 ip_mc_finish_output+0x65/0x190:
 ip_mc_finish_output at net/ipv4/ip_output.c:356

ip_mc_output+0x137
------------------
 [net-next]$ ./scripts/faddr2line vmlinux ip_mc_output+0x137
 ip_mc_output+0x137/0x2a0:
 skb_network_header at include/linux/skbuff.h:2829
 (inlined by) ip_hdr at include/linux/ip.h:21
 (inlined by) ip_mc_output at net/ipv4/ip_output.c:401

igmp_rcv+0xf6
-------------
 [net-next]$ ./scripts/faddr2line vmlinux igmp_rcv+0xf6
 igmp_rcv+0xf6/0x2e0:
 igmp_rcv at net/ipv4/igmp.c:1130
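The frag_list path described above lives in net/core/skbuff.c; paraphrased from memory below (helper placement may differ slightly between kernel versions), it shows how releasing an skb's data re-enters kfree_skb_list() for exactly the skbs that skb_clone_fraglist() took extra references on:

	/* Paraphrased from net/core/skbuff.c: freeing an skb's data also
	 * frees its shinfo->frag_list via kfree_skb_list(), i.e. the very
	 * list walk this patch modifies.
	 */
	static void skb_drop_list(struct sk_buff **listp)
	{
		kfree_skb_list(*listp);
		*listp = NULL;
	}

	static inline void skb_drop_fraglist(struct sk_buff *skb)
	{
		skb_drop_list(&skb_shinfo(skb)->frag_list);
	}

	/* skb_clone_fraglist() is what leaves skb->users > 1 for every skb
	 * on that list, so skb_unref() inside __kfree_skb_reason() rightly
	 * reports "still referenced" for them -- and their ->next must not
	 * be cleared by the list walker.
	 */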
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 007a5fbe284b..79c9e795a964 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -964,16 +964,54 @@ kfree_skb_reason(struct sk_buff *skb, enum skb_drop_reason reason)
 }
 EXPORT_SYMBOL(kfree_skb_reason);
 
+#define KFREE_SKB_BULK_SIZE	16
+
+struct skb_free_array {
+	unsigned int skb_count;
+	void *skb_array[KFREE_SKB_BULK_SIZE];
+};
+
+static void kfree_skb_add_bulk(struct sk_buff *skb,
+			       struct skb_free_array *sa,
+			       enum skb_drop_reason reason)
+{
+	/* if SKB is a clone, don't handle this case */
+	if (unlikely(skb->fclone != SKB_FCLONE_UNAVAILABLE)) {
+		__kfree_skb(skb);
+		return;
+	}
+
+	skb_release_all(skb, reason);
+	sa->skb_array[sa->skb_count++] = skb;
+
+	if (unlikely(sa->skb_count == KFREE_SKB_BULK_SIZE)) {
+		kmem_cache_free_bulk(skbuff_head_cache, KFREE_SKB_BULK_SIZE,
+				     sa->skb_array);
+		sa->skb_count = 0;
+	}
+}
+
 void __fix_address
 kfree_skb_list_reason(struct sk_buff *segs, enum skb_drop_reason reason)
 {
+	struct skb_free_array sa;
+
+	sa.skb_count = 0;
+
 	while (segs) {
 		struct sk_buff *next = segs->next;
 
+		skb_mark_not_on_list(segs);
+
 		if (__kfree_skb_reason(segs, reason))
-			__kfree_skb(segs);
+			kfree_skb_add_bulk(segs, &sa, reason);
+
 		segs = next;
 	}
+
+	if (sa.skb_count)
+		kmem_cache_free_bulk(skbuff_head_cache, sa.skb_count,
+				     sa.skb_array);
 }
 EXPORT_SYMBOL(kfree_skb_list_reason);
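For completeness, a minimal sketch of the fix direction agreed in the thread — unlinking an skb only once __kfree_skb_reason() (and therefore skb_unref()) confirms the caller holds the last reference — would look roughly like this; the actual follow-up patch is the one Jesper sent to syzbot for testing:

	/* Sketch of the corrected list walk: only unlink an skb whose last
	 * reference we hold.  SKBs still referenced via another skb's
	 * shinfo->frag_list keep their ->next pointer intact, so the
	 * surviving owner can still walk its frag_list.
	 */
	while (segs) {
		struct sk_buff *next = segs->next;

		if (__kfree_skb_reason(segs, reason)) {
			skb_mark_not_on_list(segs);
			kfree_skb_add_bulk(segs, &sa, reason);
		}

		segs = next;
	}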