diff mbox series

[net] net: tun: limit first seg size to avoid oversized linearization

Message ID 20220907015618.2140679-1-william.xuanziyang@huawei.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net] net: tun: limit first seg size to avoid oversized linearization | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers fail 1 blamed authors not CCed: maheshb@google,com; 1 maintainers not CCed: maheshb@google,com
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 21 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Ziyang Xuan (William) Sept. 7, 2022, 1:56 a.m. UTC
Recently, we found a syzkaller problem as following:

========================================================
WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
...
Call trace:
 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
 __alloc_pages_node include/linux/gfp.h:550 [inline]
 alloc_pages_node include/linux/gfp.h:564 [inline]
 kmalloc_large_node+0x94/0x350 mm/slub.c:4038
 __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
 __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
 pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
 __skb_grow include/linux/skbuff.h:2779 [inline]
 tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
 tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
 tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036

It is because the first seg size of the iov_iter from user space is
very big, it is 2147479538 which is bigger than the threshold value
for bail out early in __alloc_pages(). And skb->pfmemalloc is true,
__kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN
flag. Thus we got a warning.

I noticed that non-first segs size are required less than PAGE_SIZE in
tun_napi_alloc_frags(). The first seg should not be a special case, and
oversized linearization is also unreasonable. Limit the first seg size to
PAGE_SIZE to avoid oversized linearization.

Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
---
 drivers/net/tun.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

Comments

Petar Penkov Sept. 9, 2022, 2:18 p.m. UTC | #1
Thanks, this looks good to me!

On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan
<william.xuanziyang@huawei.com> wrote:
>
> Recently, we found a syzkaller problem as following:
>
> ========================================================
> WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> ...
> Call trace:
>  __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
>  __alloc_pages_node include/linux/gfp.h:550 [inline]
>  alloc_pages_node include/linux/gfp.h:564 [inline]
>  kmalloc_large_node+0x94/0x350 mm/slub.c:4038
>  __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
>  __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
>  pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
>  __skb_grow include/linux/skbuff.h:2779 [inline]
>  tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
>  tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
>  tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
>
> It is because the first seg size of the iov_iter from user space is
> very big, it is 2147479538 which is bigger than the threshold value
> for bail out early in __alloc_pages(). And skb->pfmemalloc is true,
> __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN
> flag. Thus we got a warning.
>
> I noticed that non-first segs size are required less than PAGE_SIZE in
> tun_napi_alloc_frags(). The first seg should not be a special case, and
> oversized linearization is also unreasonable. Limit the first seg size to
> PAGE_SIZE to avoid oversized linearization.
>
> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> ---
Acked-by: Petar Penkov <ppenkov@aviatrix.com>
Eric Dumazet Sept. 9, 2022, 4:36 p.m. UTC | #2
On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan
<william.xuanziyang@huawei.com> wrote:
>
> Recently, we found a syzkaller problem as following:
>
> ========================================================
> WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> ...
> Call trace:
>  __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
>  __alloc_pages_node include/linux/gfp.h:550 [inline]
>  alloc_pages_node include/linux/gfp.h:564 [inline]
>  kmalloc_large_node+0x94/0x350 mm/slub.c:4038
>  __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
>  __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
>  pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
>  __skb_grow include/linux/skbuff.h:2779 [inline]
>  tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
>  tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
>  tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
>
> It is because the first seg size of the iov_iter from user space is
> very big, it is 2147479538 which is bigger than the threshold value
> for bail out early in __alloc_pages(). And skb->pfmemalloc is true,
> __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN
> flag. Thus we got a warning.
>
> I noticed that non-first segs size are required less than PAGE_SIZE in
> tun_napi_alloc_frags(). The first seg should not be a special case, and
> oversized linearization is also unreasonable. Limit the first seg size to
> PAGE_SIZE to avoid oversized linearization.
>
> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> ---
>  drivers/net/tun.c | 5 ++---
>  1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> index 259b2b84b2b3..7db515f94667 100644
> --- a/drivers/net/tun.c
> +++ b/drivers/net/tun.c
> @@ -1454,12 +1454,12 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
>                                             size_t len,
>                                             const struct iov_iter *it)
>  {
> +       size_t linear = iov_iter_single_seg_count(it);
>         struct sk_buff *skb;
> -       size_t linear;
>         int err;
>         int i;
>
> -       if (it->nr_segs > MAX_SKB_FRAGS + 1)
> +       if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > PAGE_SIZE)
>                 return ERR_PTR(-EMSGSIZE);
>

This does not look good to me.

Some drivers allocate 9KB+ for 9000 MTU, in a single allocation,
because the hardware is not SG capable in RX.

>         local_bh_disable();
> @@ -1468,7 +1468,6 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
>         if (!skb)
>                 return ERR_PTR(-ENOMEM);
>
> -       linear = iov_iter_single_seg_count(it);
>         err = __skb_grow(skb, linear);
>         if (err)
>                 goto free;
> --
> 2.25.1
>
Ziyang Xuan (William) Sept. 13, 2022, 12:07 p.m. UTC | #3
> On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan
> <william.xuanziyang@huawei.com> wrote:
>>
>> Recently, we found a syzkaller problem as following:
>>
>> ========================================================
>> WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295 __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
>> ...
>> Call trace:
>>  __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
>>  __alloc_pages_node include/linux/gfp.h:550 [inline]
>>  alloc_pages_node include/linux/gfp.h:564 [inline]
>>  kmalloc_large_node+0x94/0x350 mm/slub.c:4038
>>  __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
>>  __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
>>  pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
>>  __skb_grow include/linux/skbuff.h:2779 [inline]
>>  tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
>>  tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
>>  tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
>>
>> It is because the first seg size of the iov_iter from user space is
>> very big, it is 2147479538 which is bigger than the threshold value
>> for bail out early in __alloc_pages(). And skb->pfmemalloc is true,
>> __kmalloc_reserve() would use pfmemalloc reserves without __GFP_NOWARN
>> flag. Thus we got a warning.
>>
>> I noticed that non-first segs size are required less than PAGE_SIZE in
>> tun_napi_alloc_frags(). The first seg should not be a special case, and
>> oversized linearization is also unreasonable. Limit the first seg size to
>> PAGE_SIZE to avoid oversized linearization.
>>
>> Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP driver")
>> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
>> ---
>>  drivers/net/tun.c | 5 ++---
>>  1 file changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/drivers/net/tun.c b/drivers/net/tun.c
>> index 259b2b84b2b3..7db515f94667 100644
>> --- a/drivers/net/tun.c
>> +++ b/drivers/net/tun.c
>> @@ -1454,12 +1454,12 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
>>                                             size_t len,
>>                                             const struct iov_iter *it)
>>  {
>> +       size_t linear = iov_iter_single_seg_count(it);
>>         struct sk_buff *skb;
>> -       size_t linear;
>>         int err;
>>         int i;
>>
>> -       if (it->nr_segs > MAX_SKB_FRAGS + 1)
>> +       if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > PAGE_SIZE)
>>                 return ERR_PTR(-EMSGSIZE);
>>
> 
> This does not look good to me.
> 
> Some drivers allocate 9KB+ for 9000 MTU, in a single allocation,
> because the hardware is not SG capable in RX.

So, do you mean that it does not matter and keep current status, or give a bigger size but PAGE_SIZE (usually 4KB size)?

Would like to hear your advice.

Thank you.

> 
>>         local_bh_disable();
>> @@ -1468,7 +1468,6 @@ static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
>>         if (!skb)
>>                 return ERR_PTR(-ENOMEM);
>>
>> -       linear = iov_iter_single_seg_count(it);
>>         err = __skb_grow(skb, linear);
>>         if (err)
>>                 goto free;
>> --
>> 2.25.1
>>
> .
>
Paolo Abeni Sept. 15, 2022, 10:31 a.m. UTC | #4
On Tue, 2022-09-13 at 20:07 +0800, Ziyang Xuan (William) wrote:
> > On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan
> > <william.xuanziyang@huawei.com> wrote:
> > > 
> > > Recently, we found a syzkaller problem as following:
> > > 
> > > ========================================================
> > > WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295
> > > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> > > ...
> > > Call trace:
> > >  __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> > >  __alloc_pages_node include/linux/gfp.h:550 [inline]
> > >  alloc_pages_node include/linux/gfp.h:564 [inline]
> > >  kmalloc_large_node+0x94/0x350 mm/slub.c:4038
> > >  __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
> > >  __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
> > >  pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
> > >  __skb_grow include/linux/skbuff.h:2779 [inline]
> > >  tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
> > >  tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
> > >  tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
> > > 
> > > It is because the first seg size of the iov_iter from user space
> > > is
> > > very big, it is 2147479538 which is bigger than the threshold
> > > value
> > > for bail out early in __alloc_pages(). And skb->pfmemalloc is
> > > true,
> > > __kmalloc_reserve() would use pfmemalloc reserves without
> > > __GFP_NOWARN
> > > flag. Thus we got a warning.
> > > 
> > > I noticed that non-first segs size are required less than
> > > PAGE_SIZE in
> > > tun_napi_alloc_frags(). The first seg should not be a special
> > > case, and
> > > oversized linearization is also unreasonable. Limit the first seg
> > > size to
> > > PAGE_SIZE to avoid oversized linearization.
> > > 
> > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP
> > > driver")
> > > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> > > ---
> > >  drivers/net/tun.c | 5 ++---
> > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > 
> > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > index 259b2b84b2b3..7db515f94667 100644
> > > --- a/drivers/net/tun.c
> > > +++ b/drivers/net/tun.c
> > > @@ -1454,12 +1454,12 @@ static struct sk_buff
> > > *tun_napi_alloc_frags(struct tun_file *tfile,
> > >                                             size_t len,
> > >                                             const struct iov_iter
> > > *it)
> > >  {
> > > +       size_t linear = iov_iter_single_seg_count(it);
> > >         struct sk_buff *skb;
> > > -       size_t linear;
> > >         int err;
> > >         int i;
> > > 
> > > -       if (it->nr_segs > MAX_SKB_FRAGS + 1)
> > > +       if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear >
> > > PAGE_SIZE)
> > >                 return ERR_PTR(-EMSGSIZE);
> > > 
> > 
> > This does not look good to me.
> > 
> > Some drivers allocate 9KB+ for 9000 MTU, in a single allocation,
> > because the hardware is not SG capable in RX.
> 
> So, do you mean that it does not matter and keep current status, or
> give a bigger size but PAGE_SIZE (usually 4KB size)?
> 
> Would like to hear your advice.

I'm guessing that what Eric is suggesting here is to use a bigger limit
for 'linear'. Possibly ETH_MAX_MTU could fit. @Eric, fell free to
correct me :)

Thanks!

Paolo
Eric Dumazet Oct. 27, 2022, 2:11 p.m. UTC | #5
On Thu, Sep 15, 2022 at 3:31 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2022-09-13 at 20:07 +0800, Ziyang Xuan (William) wrote:
> > > On Tue, Sep 6, 2022 at 6:56 PM Ziyang Xuan
> > > <william.xuanziyang@huawei.com> wrote:
> > > >
> > > > Recently, we found a syzkaller problem as following:
> > > >
> > > > ========================================================
> > > > WARNING: CPU: 1 PID: 17965 at mm/page_alloc.c:5295
> > > > __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> > > > ...
> > > > Call trace:
> > > >  __alloc_pages+0x1308/0x16c4 mm/page_alloc.c:5295
> > > >  __alloc_pages_node include/linux/gfp.h:550 [inline]
> > > >  alloc_pages_node include/linux/gfp.h:564 [inline]
> > > >  kmalloc_large_node+0x94/0x350 mm/slub.c:4038
> > > >  __kmalloc_node_track_caller+0x620/0x8e4 mm/slub.c:4545
> > > >  __kmalloc_reserve.constprop.0+0x1e4/0x2b0 net/core/skbuff.c:151
> > > >  pskb_expand_head+0x130/0x8b0 net/core/skbuff.c:1654
> > > >  __skb_grow include/linux/skbuff.h:2779 [inline]
> > > >  tun_napi_alloc_frags+0x144/0x610 drivers/net/tun.c:1477
> > > >  tun_get_user+0x31c/0x2010 drivers/net/tun.c:1835
> > > >  tun_chr_write_iter+0x98/0x100 drivers/net/tun.c:2036
> > > >
> > > > It is because the first seg size of the iov_iter from user space
> > > > is
> > > > very big, it is 2147479538 which is bigger than the threshold
> > > > value
> > > > for bail out early in __alloc_pages(). And skb->pfmemalloc is
> > > > true,
> > > > __kmalloc_reserve() would use pfmemalloc reserves without
> > > > __GFP_NOWARN
> > > > flag. Thus we got a warning.
> > > >
> > > > I noticed that non-first segs size are required less than
> > > > PAGE_SIZE in
> > > > tun_napi_alloc_frags(). The first seg should not be a special
> > > > case, and
> > > > oversized linearization is also unreasonable. Limit the first seg
> > > > size to
> > > > PAGE_SIZE to avoid oversized linearization.
> > > >
> > > > Fixes: 90e33d459407 ("tun: enable napi_gro_frags() for TUN/TAP
> > > > driver")
> > > > Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com>
> > > > ---
> > > >  drivers/net/tun.c | 5 ++---
> > > >  1 file changed, 2 insertions(+), 3 deletions(-)
> > > >
> > > > diff --git a/drivers/net/tun.c b/drivers/net/tun.c
> > > > index 259b2b84b2b3..7db515f94667 100644
> > > > --- a/drivers/net/tun.c
> > > > +++ b/drivers/net/tun.c
> > > > @@ -1454,12 +1454,12 @@ static struct sk_buff
> > > > *tun_napi_alloc_frags(struct tun_file *tfile,
> > > >                                             size_t len,
> > > >                                             const struct iov_iter
> > > > *it)
> > > >  {
> > > > +       size_t linear = iov_iter_single_seg_count(it);
> > > >         struct sk_buff *skb;
> > > > -       size_t linear;
> > > >         int err;
> > > >         int i;
> > > >
> > > > -       if (it->nr_segs > MAX_SKB_FRAGS + 1)
> > > > +       if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear >
> > > > PAGE_SIZE)
> > > >                 return ERR_PTR(-EMSGSIZE);
> > > >
> > >
> > > This does not look good to me.
> > >
> > > Some drivers allocate 9KB+ for 9000 MTU, in a single allocation,
> > > because the hardware is not SG capable in RX.
> >
> > So, do you mean that it does not matter and keep current status, or
> > give a bigger size but PAGE_SIZE (usually 4KB size)?
> >
> > Would like to hear your advice.
>
> I'm guessing that what Eric is suggesting here is to use a bigger limit
> for 'linear'. Possibly ETH_MAX_MTU could fit. @Eric, fell free to
> correct me :)
>

Something like that, yes. We need to be careful when approaching 64K limit,
because of possible u16 fields overflows.

We just got another patch in GRO layer, just because tun has not been fixed yet.



> Thanks!
>
> Paolo
>
diff mbox series

Patch

diff --git a/drivers/net/tun.c b/drivers/net/tun.c
index 259b2b84b2b3..7db515f94667 100644
--- a/drivers/net/tun.c
+++ b/drivers/net/tun.c
@@ -1454,12 +1454,12 @@  static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
 					    size_t len,
 					    const struct iov_iter *it)
 {
+	size_t linear = iov_iter_single_seg_count(it);
 	struct sk_buff *skb;
-	size_t linear;
 	int err;
 	int i;
 
-	if (it->nr_segs > MAX_SKB_FRAGS + 1)
+	if (it->nr_segs > MAX_SKB_FRAGS + 1 || linear > PAGE_SIZE)
 		return ERR_PTR(-EMSGSIZE);
 
 	local_bh_disable();
@@ -1468,7 +1468,6 @@  static struct sk_buff *tun_napi_alloc_frags(struct tun_file *tfile,
 	if (!skb)
 		return ERR_PTR(-ENOMEM);
 
-	linear = iov_iter_single_seg_count(it);
 	err = __skb_grow(skb, linear);
 	if (err)
 		goto free;