diff mbox series

[-next] mm: delete oversized WARN_ON() in kvmalloc() calls

Message ID 1638410784-48646-1-git-send-email-cuibixuan@linux.alibaba.com (mailing list archive)
State New
Headers show
Series [-next] mm: delete oversized WARN_ON() in kvmalloc() calls | expand

Commit Message

Bixuan Cui Dec. 2, 2021, 2:06 a.m. UTC
Delete the WARN_ON() and return NULL directly for oversized parameter
in kvmalloc() calls.
Also add unlikely().

Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
---
There are a lot of oversize warnings and patches about kvmalloc() calls
recently. Maybe these warnings are not very necessary.

https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal

The example of size check in __do_kmalloc_node():
__do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
{
        struct kmem_cache *cachep;
        void *ret;

        if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
                return NULL;
        cachep = kmalloc_slab(size, flags);

 mm/util.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tang Yizhou Dec. 2, 2021, 2:53 a.m. UTC | #1
On 2021/12/2 10:06, Bixuan Cui wrote:
> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
> 

The commit message should explain why we need to do this. Thanks.

> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
> ---
> There are a lot of oversize warnings and patches about kvmalloc() calls
> recently. Maybe these warnings are not very necessary.
> 
> https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal
> 
> The example of size check in __do_kmalloc_node():
> __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> {
>         struct kmem_cache *cachep;
>         void *ret;
> 
>         if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
>                 return NULL;
>         cachep = kmalloc_slab(size, flags);
> 
>  mm/util.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 7e433690..d26f19c 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  		return ret;
>  
>  	/* Don't even allow crazy sizes */
> -	if (WARN_ON_ONCE(size > INT_MAX))
> +	if (unlikely(size > INT_MAX))
>  		return NULL;
>  
>  	return __vmalloc_node(size, 1, flags, node,
> 

Tang
Andrew Morton Dec. 2, 2021, 3:26 a.m. UTC | #2
On Thu,  2 Dec 2021 10:06:24 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote:

> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
> 
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
> ---
> There are a lot of oversize warnings and patches about kvmalloc() calls
> recently. Maybe these warnings are not very necessary.

Or maybe they are.  Please let's take a look at these warnings, one at
a time.  If a large number of them are bogus then sure, let's disable
the runtime test.  But perhaps it's the case that calling code has
genuine issues and should be repaired.
Kees Cook Dec. 2, 2021, 3:46 a.m. UTC | #3
On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote:
> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
> 
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
> ---
> There are a lot of oversize warnings and patches about kvmalloc() calls
> recently. Maybe these warnings are not very necessary.

It seems these warnings are working, yes? i.e. we're finding the places
where giant values are coming in?

> 
> https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal
> 
> The example of size check in __do_kmalloc_node():
> __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> {
>         struct kmem_cache *cachep;
>         void *ret;
> 
>         if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
>                 return NULL;
>         cachep = kmalloc_slab(size, flags);
> 
>  mm/util.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/util.c b/mm/util.c
> index 7e433690..d26f19c 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>  		return ret;
>  
>  	/* Don't even allow crazy sizes */
> -	if (WARN_ON_ONCE(size > INT_MAX))
> +	if (unlikely(size > INT_MAX))
>  		return NULL;

If we're rejecting the value, then it's still a pathological size, so
shouldn't the check be happening in the caller? I think the WARN is
doing exactly what it was supposed to do: find the places where bad
sizes can reach vmalloc.

-Kees

>  
>  	return __vmalloc_node(size, 1, flags, node,
> -- 
> 1.8.3.1
>
Bixuan Cui Dec. 2, 2021, 4:05 a.m. UTC | #4
在 2021/12/2 上午11:26, Andrew Morton 写道:
>> Delete the WARN_ON() and return NULL directly for oversized parameter
>> in kvmalloc() calls.
>> Also add unlikely().
>>
>> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
>> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com>
>> ---
>> There are a lot of oversize warnings and patches about kvmalloc() calls
>> recently. Maybe these warnings are not very necessary.
> Or maybe they are.  Please let's take a look at these warnings, one at
> a time.  If a large number of them are bogus then sure, let's disable
> the runtime test.  But perhaps it's the case that calling code has
> genuine issues and should be repaired.
Such as:

https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e

https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d

https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7

https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00

https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83
Andrew Morton Dec. 2, 2021, 4:29 a.m. UTC | #5
On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote:

> 
> 在 2021/12/2 上午11:26, Andrew Morton 写道:
> >> Delete the WARN_ON() and return NULL directly for oversized parameter
> >> in kvmalloc() calls.
> >> Also add unlikely().
> >>
> >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com>
> >> ---
> >> There are a lot of oversize warnings and patches about kvmalloc() calls
> >> recently. Maybe these warnings are not very necessary.
> > Or maybe they are.  Please let's take a look at these warnings, one at
> > a time.  If a large number of them are bogus then sure, let's disable
> > the runtime test.  But perhaps it's the case that calling code has
> > genuine issues and should be repaired.
> Such as:

Thanks, that's helpful.

Let's bring all these to the attention of the relevant developers.

If the consensus is "the code's fine, the warning is bogus" then let's
consider retiring the warning.

If the consensus is otherwise then hopefully they will fix their stuff!



> https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e

(cc bpf@vger.kernel.org)

> https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d

(cc netdev@vger.kernel.org, maintainers)

> https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7

(cc kvm@vger.kernel.org)

> https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00

(cc netfilter-devel@vger.kernel.org)

> https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83

(bpf again).
Bixuan Cui Dec. 2, 2021, 4:44 a.m. UTC | #6
在 2021/12/2 上午11:46, Kees Cook 写道:
> On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote:
>> Delete the WARN_ON() and return NULL directly for oversized parameter
>> in kvmalloc() calls.
>> Also add unlikely().
>>
>> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
>> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com>
>> ---
>> There are a lot of oversize warnings and patches about kvmalloc() calls
>> recently. Maybe these warnings are not very necessary.
> It seems these warnings are working, yes? i.e. we're finding the places
> where giant values are coming in?
Yes,  It's working.
>
>> https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal
>>
>> The example of size check in __do_kmalloc_node():
>> __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
>> {
>>          struct kmem_cache *cachep;
>>          void *ret;
>>
>>          if (unlikely(size > KMALLOC_MAX_CACHE_SIZE))
>>                  return NULL;
>>          cachep = kmalloc_slab(size, flags);
>>
>>   mm/util.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/mm/util.c b/mm/util.c
>> index 7e433690..d26f19c 100644
>> --- a/mm/util.c
>> +++ b/mm/util.c
>> @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
>>   		return ret;
>>   
>>   	/* Don't even allow crazy sizes */
>> -	if (WARN_ON_ONCE(size > INT_MAX))
>> +	if (unlikely(size > INT_MAX))
>>   		return NULL;
> If we're rejecting the value, then it's still a pathological size, so
> shouldn't the check be happening in the caller? I think the WARN is
> doing exactly what it was supposed to do: find the places where bad
> sizes can reach vmalloc.
In this way, we must check whether the size from the user exceeds INT_MAX
before calling kvmalloc() calls.  Generally speaking, the oversize check 
is rarely
done before.

Thanks,
Bixuan Cui


>
> -Kees
>
>>   
>>   	return __vmalloc_node(size, 1, flags, node,
>> -- 
>> 1.8.3.1
>>
> -- Kees Cook
Jeremy Sowden Dec. 2, 2021, 10:38 a.m. UTC | #7
On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote:
> On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote:
> > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > >> Delete the WARN_ON() and return NULL directly for oversized
> > >> parameter in kvmalloc() calls.
> > >> Also add unlikely().
> > >>
> > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com>
> > >> ---
> > >> There are a lot of oversize warnings and patches about kvmalloc()
> > >> calls recently. Maybe these warnings are not very necessary.
> > >
> > > Or maybe they are.  Please let's take a look at these warnings,
> > > one at a time.  If a large number of them are bogus then sure,
> > > let's disable the runtime test.  But perhaps it's the case that
> > > calling code has genuine issues and should be repaired.
> >
> > Such as:
>
> Thanks, that's helpful.
>
> Let's bring all these to the attention of the relevant developers.
>
> If the consensus is "the code's fine, the warning is bogus" then let's
> consider retiring the warning.
>
> If the consensus is otherwise then hopefully they will fix their stuff!
>
> > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
>
> (cc bpf@vger.kernel.org)
>
> > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
>
> (cc netdev@vger.kernel.org, maintainers)
>
> > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
>
> (cc kvm@vger.kernel.org)
>
> > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00
>
> (cc netfilter-devel@vger.kernel.org)

The netfilter bug has since been fixed:

  https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382

> > https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83
>
> (bpf again).

J.
Bixuan Cui Dec. 2, 2021, 11:49 a.m. UTC | #8
在 2021/12/2 下午12:29, Andrew Morton 写道:
> Thanks, that's helpful.
>
> Let's bring all these to the attention of the relevant developers.
>
> If the consensus is "the code's fine, the warning is bogus" then let's
> consider retiring the warning.
>
> If the consensus is otherwise then hopefully they will fix their stuff!

Ok,thanks for your advice :-)


Thanks,

Bixuan Cui
Leon Romanovsky Dec. 2, 2021, 3:23 p.m. UTC | #9
On Wed, Dec 01, 2021 at 07:26:43PM -0800, Andrew Morton wrote:
> On Thu,  2 Dec 2021 10:06:24 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote:
> 
> > Delete the WARN_ON() and return NULL directly for oversized parameter
> > in kvmalloc() calls.
> > Also add unlikely().
> > 
> > Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>
> > ---
> > There are a lot of oversize warnings and patches about kvmalloc() calls
> > recently. Maybe these warnings are not very necessary.
> 
> Or maybe they are.  Please let's take a look at these warnings, one at
> a time.  If a large number of them are bogus then sure, let's disable
> the runtime test.  But perhaps it's the case that calling code has
> genuine issues and should be repaired.

Andrew,

The problem is that this WARN_ON() is triggered by the users.

At least in the RDMA world, users can provide huge sizes and they expect
to get plain -ENOMEM and not dump stack, because it happens indirectly
to them.

In our case, these two kvcalloc() generates WARN_ON().

		umem_odp->pfn_list = kvcalloc(
			npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
		if (!umem_odp->pfn_list)
			return -ENOMEM;

		umem_odp->dma_list = kvcalloc(
			ndmas, sizeof(*umem_odp->dma_list), GFP_KERNEL);
https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L80

It is not a kernel programmer error to allow "oversized kvmalloc call" .

Thanks
Matthew Wilcox Dec. 2, 2021, 3:29 p.m. UTC | #10
On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> The problem is that this WARN_ON() is triggered by the users.

... or the problem is that you don't do a sanity check between the user
and the MM system.  I mean, that's what this conversation is about --
is it a bug to be asking for this much memory in the first place?

> At least in the RDMA world, users can provide huge sizes and they expect
> to get plain -ENOMEM and not dump stack, because it happens indirectly
> to them.
> 
> In our case, these two kvcalloc() generates WARN_ON().
> 
> 		umem_odp->pfn_list = kvcalloc(
> 			npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);

Does it really make sense for the user to specify 2^31 PFNs in a single
call?  I mean, that's 8TB of memory.  Should RDMA put its own limit
in here, or should it rely on kvmalloc returning -ENOMEM?
Alexei Starovoitov Dec. 2, 2021, 3:34 p.m. UTC | #11
On Thu, Dec 2, 2021 at 2:38 AM Jeremy Sowden <jeremy@azazel.net> wrote:
>
> On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote:
> > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote:
> > > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > > >> Delete the WARN_ON() and return NULL directly for oversized
> > > >> parameter in kvmalloc() calls.
> > > >> Also add unlikely().
> > > >>
> > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com>
> > > >> ---
> > > >> There are a lot of oversize warnings and patches about kvmalloc()
> > > >> calls recently. Maybe these warnings are not very necessary.
> > > >
> > > > Or maybe they are.  Please let's take a look at these warnings,
> > > > one at a time.  If a large number of them are bogus then sure,
> > > > let's disable the runtime test.  But perhaps it's the case that
> > > > calling code has genuine issues and should be repaired.
> > >
> > > Such as:
> >
> > Thanks, that's helpful.
> >
> > Let's bring all these to the attention of the relevant developers.
> >
> > If the consensus is "the code's fine, the warning is bogus" then let's
> > consider retiring the warning.
> >
> > If the consensus is otherwise then hopefully they will fix their stuff!
> >
> > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
> >
> > (cc bpf@vger.kernel.org)
> >
> > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
> >
> > (cc netdev@vger.kernel.org, maintainers)
> >
> > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
> >
> > (cc kvm@vger.kernel.org)
> >
> > > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00
> >
> > (cc netfilter-devel@vger.kernel.org)
>
> The netfilter bug has since been fixed:
>
>   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382

How is this a "fix" ?
u32 was the limit and because of the new warn the limit
got reduced to s32.
Every subsystem is supposed to do this "fix" now?

> > > https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83
> >
> > (bpf again).
>
> J.
Leon Romanovsky Dec. 2, 2021, 4:08 p.m. UTC | #12
On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > The problem is that this WARN_ON() is triggered by the users.
> 
> ... or the problem is that you don't do a sanity check between the user
> and the MM system.  I mean, that's what this conversation is about --
> is it a bug to be asking for this much memory in the first place?

We do a lot of checks, and in this case, user provided valid input.
He asked size that doesn't cross his address space.
https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67

		start = ALIGN_DOWN(umem_odp->umem.address, page_size);
		if (check_add_overflow(umem_odp->umem.address,
				       (unsigned long)umem_odp->umem.length,
				       &end))
			return -EOVERFLOW;

There is a feature called ODP (on-demand-paging) which is supported
in some RDMA NICs. It allows to the user "export" their whole address
space to the other RDMA node without pinning the pages. And once the
other node sends data to not-pinned page, the RDMA NIC will prefetch
it.

> 
> > At least in the RDMA world, users can provide huge sizes and they expect
> > to get plain -ENOMEM and not dump stack, because it happens indirectly
> > to them.
> > 
> > In our case, these two kvcalloc() generates WARN_ON().
> > 
> > 		umem_odp->pfn_list = kvcalloc(
> > 			npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
> 
> Does it really make sense for the user to specify 2^31 PFNs in a single
> call?  I mean, that's 8TB of memory.  Should RDMA put its own limit
> in here, or should it rely on kvmalloc returning -ENOMEM?

I heard about such systems with so many available RAM. I don't know
about their usage pattern, most likely they will use hugepages, but
it is my guess.

The thing is that it is not RDMA-specific. You are asking to place same
if (size > KVMALLOC...) check in all subsystems.

Thanks
Jason Gunthorpe Dec. 2, 2021, 5 p.m. UTC | #13
On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > The problem is that this WARN_ON() is triggered by the users.
> 
> ... or the problem is that you don't do a sanity check between the user
> and the MM system.  I mean, that's what this conversation is about --
> is it a bug to be asking for this much memory in the first place?
>
> > At least in the RDMA world, users can provide huge sizes and they expect
> > to get plain -ENOMEM and not dump stack, because it happens indirectly
> > to them.
> > 
> > In our case, these two kvcalloc() generates WARN_ON().
> > 
> > 		umem_odp->pfn_list = kvcalloc(
> > 			npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL);
> 
> Does it really make sense for the user to specify 2^31 PFNs in a single
> call?  I mean, that's 8TB of memory.  Should RDMA put its own limit
> in here, or should it rely on kvmalloc returning -ENOMEM?

I wrote this - I don't think RDMA should put a limit here. What
limit would it use anyhow?

I'm pretty sure database people are already using low TB's here. It is
not absurd when you have DAX and the biggest user of ODP is with DAX.

If anything we might get to a point in a few years where the 2^31 is
too small and this has to be a better datastructure :\

Maybe an xarray and I should figure out how to use the multi-order
stuff to optimize huge pages?

I'd actually really like to get rid of it, just haven't figured out
how. The only purpose is to call set_page_dirty() and in many cases
the pfn should still be in the mm's page table. We also store another
copy of the PFN in the NIC's mapping. Surely one of these two could do
instead?

Jason
Jason Gunthorpe Dec. 2, 2021, 5:03 p.m. UTC | #14
On Wed, Dec 01, 2021 at 07:46:01PM -0800, Kees Cook wrote:

> If we're rejecting the value, then it's still a pathological size, so
> shouldn't the check be happening in the caller? I think the WARN is
> doing exactly what it was supposed to do: find the places where bad
> sizes can reach vmalloc.

I think it meshes very poorly with the overflow work:

  p = kzalloc(struct_size(p, regions, num_regions), GFP_KERNEL);

If num_regions is user controlled data why should the calling driver
hvae to somehow sanitize num_regions (without bugs!) instead of
relying on struct_size() and kzalloc() to contain all the sanitation?

What you are suggesting just pushes security sensitive coding into
drivers, which I think is the opposite of what we all want?

Jason
Kees Cook Dec. 2, 2021, 7:08 p.m. UTC | #15
On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote:
> On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > > The problem is that this WARN_ON() is triggered by the users.
> > 
> > ... or the problem is that you don't do a sanity check between the user
> > and the MM system.  I mean, that's what this conversation is about --
> > is it a bug to be asking for this much memory in the first place?
> 
> We do a lot of checks, and in this case, user provided valid input.
> He asked size that doesn't cross his address space.
> https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67
> 
> 		start = ALIGN_DOWN(umem_odp->umem.address, page_size);
> 		if (check_add_overflow(umem_odp->umem.address,
> 				       (unsigned long)umem_odp->umem.length,
> 				       &end))
> 			return -EOVERFLOW;
> 
> There is a feature called ODP (on-demand-paging) which is supported
> in some RDMA NICs. It allows to the user "export" their whole address
> space to the other RDMA node without pinning the pages. And once the
> other node sends data to not-pinned page, the RDMA NIC will prefetch
> it.

I think we have two cases:

- limiting kvmalloc allocations to INT_MAX
- issuing a WARN when that limit is exceeded

The argument for the having the WARN is "that amount should never be
allocated so we want to find the pathological callers".

But if the actual issue is that >INT_MAX is _acceptable_, then we have
to do away with the entire check, not just the WARN.
Leon Romanovsky Dec. 2, 2021, 7:24 p.m. UTC | #16
On Thu, Dec 02, 2021 at 11:08:34AM -0800, Kees Cook wrote:
> On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote:
> > On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> > > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > > > The problem is that this WARN_ON() is triggered by the users.
> > > 
> > > ... or the problem is that you don't do a sanity check between the user
> > > and the MM system.  I mean, that's what this conversation is about --
> > > is it a bug to be asking for this much memory in the first place?
> > 
> > We do a lot of checks, and in this case, user provided valid input.
> > He asked size that doesn't cross his address space.
> > https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67
> > 
> > 		start = ALIGN_DOWN(umem_odp->umem.address, page_size);
> > 		if (check_add_overflow(umem_odp->umem.address,
> > 				       (unsigned long)umem_odp->umem.length,
> > 				       &end))
> > 			return -EOVERFLOW;
> > 
> > There is a feature called ODP (on-demand-paging) which is supported
> > in some RDMA NICs. It allows to the user "export" their whole address
> > space to the other RDMA node without pinning the pages. And once the
> > other node sends data to not-pinned page, the RDMA NIC will prefetch
> > it.
> 
> I think we have two cases:
> 
> - limiting kvmalloc allocations to INT_MAX
> - issuing a WARN when that limit is exceeded
> 
> The argument for the having the WARN is "that amount should never be
> allocated so we want to find the pathological callers".
> 
> But if the actual issue is that >INT_MAX is _acceptable_, then we have
> to do away with the entire check, not just the WARN.

First we need to get rid from WARN_ON(), which is completely safe thing to do.

Removal of the check can be done in second step as it will require audit
of whole kvmalloc* path.

Thanks

> 
> -- 
> Kees Cook
Jeremy Sowden Dec. 2, 2021, 9:16 p.m. UTC | #17
On 2021-12-02, at 07:34:36 -0800, Alexei Starovoitov wrote:
> On Thu, Dec 2, 2021 at 2:38 AM Jeremy Sowden wrote:
> > On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote:
> > > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote:
> > > > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > > > >> Delete the WARN_ON() and return NULL directly for oversized
> > > > >> parameter in kvmalloc() calls.
> > > > >> Also add unlikely().
> > > > >>
> > > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > > > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com>
> > > > >> ---
> > > > >> There are a lot of oversize warnings and patches about kvmalloc()
> > > > >> calls recently. Maybe these warnings are not very necessary.
> > > > >
> > > > > Or maybe they are.  Please let's take a look at these warnings,
> > > > > one at a time.  If a large number of them are bogus then sure,
> > > > > let's disable the runtime test.  But perhaps it's the case that
> > > > > calling code has genuine issues and should be repaired.
> > > >
> > > > Such as:
> > >
> > > Thanks, that's helpful.
> > >
> > > Let's bring all these to the attention of the relevant developers.
> > >
> > > If the consensus is "the code's fine, the warning is bogus" then let's
> > > consider retiring the warning.
> > >
> > > If the consensus is otherwise then hopefully they will fix their stuff!
> > >
> > > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
> > >
> > > (cc bpf@vger.kernel.org)
> > >
> > > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
> > >
> > > (cc netdev@vger.kernel.org, maintainers)
> > >
> > > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
> > >
> > > (cc kvm@vger.kernel.org)
> > >
> > > > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00
> > >
> > > (cc netfilter-devel@vger.kernel.org)
> >
> > The netfilter bug has since been fixed:
> >
> >   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382
>
> How is this a "fix" ?
> u32 was the limit and because of the new warn the limit
> got reduced to s32.
> Every subsystem is supposed to do this "fix" now?

My intention was only to provide information about what had been done in
the ipset case.  In that case, there was already a check in place to
ensure that the requested hash-table size would not result in integer
overflow, and it was adjusted to reflect the limit imposed by the new
warning (one imagines that there is not much demand for hash-tables that
big).

I'm not familiar with the other cases, and so I would not presume to
make suggestions about whether those warnings were useful.

J.
Kees Cook Dec. 2, 2021, 9:23 p.m. UTC | #18
On Thu, Dec 02, 2021 at 09:24:08PM +0200, Leon Romanovsky wrote:
> On Thu, Dec 02, 2021 at 11:08:34AM -0800, Kees Cook wrote:
> > On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote:
> > > On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote:
> > > > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote:
> > > > > The problem is that this WARN_ON() is triggered by the users.
> > > > 
> > > > ... or the problem is that you don't do a sanity check between the user
> > > > and the MM system.  I mean, that's what this conversation is about --
> > > > is it a bug to be asking for this much memory in the first place?
> > > 
> > > We do a lot of checks, and in this case, user provided valid input.
> > > He asked size that doesn't cross his address space.
> > > https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67
> > > 
> > > 		start = ALIGN_DOWN(umem_odp->umem.address, page_size);
> > > 		if (check_add_overflow(umem_odp->umem.address,
> > > 				       (unsigned long)umem_odp->umem.length,
> > > 				       &end))
> > > 			return -EOVERFLOW;
> > > 
> > > There is a feature called ODP (on-demand-paging) which is supported
> > > in some RDMA NICs. It allows to the user "export" their whole address
> > > space to the other RDMA node without pinning the pages. And once the
> > > other node sends data to not-pinned page, the RDMA NIC will prefetch
> > > it.
> > 
> > I think we have two cases:
> > 
> > - limiting kvmalloc allocations to INT_MAX
> > - issuing a WARN when that limit is exceeded
> > 
> > The argument for the having the WARN is "that amount should never be
> > allocated so we want to find the pathological callers".
> > 
> > But if the actual issue is that >INT_MAX is _acceptable_, then we have
> > to do away with the entire check, not just the WARN.
> 
> First we need to get rid from WARN_ON(), which is completely safe thing to do.
> 
> Removal of the check can be done in second step as it will require audit
> of whole kvmalloc* path.

If those are legit sizes, I'm fine with dropping the WARN. (But I still
think if they're legit sizes, we must also drop the INT_MAX limit.)
Andrew Morton Dec. 2, 2021, 10:03 p.m. UTC | #19
On Thu, 2 Dec 2021 13:23:13 -0800 Kees Cook <keescook@chromium.org> wrote:

> > > I think we have two cases:
> > > 
> > > - limiting kvmalloc allocations to INT_MAX
> > > - issuing a WARN when that limit is exceeded
> > > 
> > > The argument for the having the WARN is "that amount should never be
> > > allocated so we want to find the pathological callers".
> > > 
> > > But if the actual issue is that >INT_MAX is _acceptable_, then we have
> > > to do away with the entire check, not just the WARN.
> > 
> > First we need to get rid from WARN_ON(), which is completely safe thing to do.
> > 
> > Removal of the check can be done in second step as it will require audit
> > of whole kvmalloc* path.
> 
> If those are legit sizes, I'm fine with dropping the WARN. (But I still
> think if they're legit sizes, we must also drop the INT_MAX limit.)

Can we suppress the WARN if the caller passed __GFP_NOWARN?
Matthew Wilcox Dec. 3, 2021, 4:39 a.m. UTC | #20
On Thu, Dec 02, 2021 at 02:03:43PM -0800, Andrew Morton wrote:
> On Thu, 2 Dec 2021 13:23:13 -0800 Kees Cook <keescook@chromium.org> wrote:
> 
> > > > I think we have two cases:
> > > > 
> > > > - limiting kvmalloc allocations to INT_MAX
> > > > - issuing a WARN when that limit is exceeded
> > > > 
> > > > The argument for the having the WARN is "that amount should never be
> > > > allocated so we want to find the pathological callers".
> > > > 
> > > > But if the actual issue is that >INT_MAX is _acceptable_, then we have
> > > > to do away with the entire check, not just the WARN.
> > > 
> > > First we need to get rid from WARN_ON(), which is completely safe thing to do.
> > > 
> > > Removal of the check can be done in second step as it will require audit
> > > of whole kvmalloc* path.
> > 
> > If those are legit sizes, I'm fine with dropping the WARN. (But I still
> > think if they're legit sizes, we must also drop the INT_MAX limit.)
> 
> Can we suppress the WARN if the caller passed __GFP_NOWARN?

I don't think that's a good idea.  NOWARN is for allocation failure
messages whereas this warning is more of a "You're doing something
wrong" -- ENOMEM vs EINVAL.

I'm still agnostic on whether this should be a check at all, or whether
we should let people kvmalloc(20GB).  But I don't like conditioning the
warning on GFP_NOWARN.
Sean Christopherson Dec. 3, 2021, 7:37 p.m. UTC | #21
+Paolo, I'm pretty sure he's still not subscribed to the KVM mailing list :-)

On Wed, Dec 01, 2021, Andrew Morton wrote:
> On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote:
> 
> > 
> > 在 2021/12/2 上午11:26, Andrew Morton 写道:
> > >> Delete the WARN_ON() and return NULL directly for oversized parameter
> > >> in kvmalloc() calls.
> > >> Also add unlikely().
> > >>
> > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com>
> > >> ---
> > >> There are a lot of oversize warnings and patches about kvmalloc() calls
> > >> recently. Maybe these warnings are not very necessary.
> > > Or maybe they are.  Please let's take a look at these warnings, one at
> > > a time.  If a large number of them are bogus then sure, let's disable
> > > the runtime test.  But perhaps it's the case that calling code has
> > > genuine issues and should be repaired.
> > Such as:
> 
> Thanks, that's helpful.
> 
> Let's bring all these to the attention of the relevant developers.
> 
> If the consensus is "the code's fine, the warning is bogus" then let's
> consider retiring the warning.
> 
> If the consensus is otherwise then hopefully they will fix their stuff!
> 
> 
> 
> > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e
> 
> (cc bpf@vger.kernel.org)
> 
> > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d
> 
> (cc netdev@vger.kernel.org, maintainers)
> 
> > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7
> 
> (cc kvm@vger.kernel.org)

Paolo posted patches to resolve the KVM issues, but I don't think they ever got
applied.

https://lore.kernel.org/all/20211016064302.165220-1-pbonzini@redhat.com/
https://lore.kernel.org/all/20211015165519.135670-1-pbonzini@redhat.com/
Leon Romanovsky Dec. 5, 2021, 11:59 a.m. UTC | #22
On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote:
> Delete the WARN_ON() and return NULL directly for oversized parameter
> in kvmalloc() calls.
> Also add unlikely().
> 
> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com>

How can we progress with this patch?

From this discussion, at least for RDMA, KVM and BPF, this huge size is legit
and WARN_ON() can be seen as false alarm.

Thanks
diff mbox series

Patch

diff --git a/mm/util.c b/mm/util.c
index 7e433690..d26f19c 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -587,7 +587,7 @@  void *kvmalloc_node(size_t size, gfp_t flags, int node)
 		return ret;
 
 	/* Don't even allow crazy sizes */
-	if (WARN_ON_ONCE(size > INT_MAX))
+	if (unlikely(size > INT_MAX))
 		return NULL;
 
 	return __vmalloc_node(size, 1, flags, node,