Message ID | 1638410784-48646-1-git-send-email-cuibixuan@linux.alibaba.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [-next] mm: delete oversized WARN_ON() in kvmalloc() calls | expand |
On 2021/12/2 10:06, Bixuan Cui wrote: > Delete the WARN_ON() and return NULL directly for oversized parameter > in kvmalloc() calls. > Also add unlikely(). > The commit message should explain why we need to do this. Thanks. > Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> > --- > There are a lot of oversize warnings and patches about kvmalloc() calls > recently. Maybe these warnings are not very necessary. > > https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal > > The example of size check in __do_kmalloc_node(): > __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller) > { > struct kmem_cache *cachep; > void *ret; > > if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) > return NULL; > cachep = kmalloc_slab(size, flags); > > mm/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/util.c b/mm/util.c > index 7e433690..d26f19c 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) > return ret; > > /* Don't even allow crazy sizes */ > - if (WARN_ON_ONCE(size > INT_MAX)) > + if (unlikely(size > INT_MAX)) > return NULL; > > return __vmalloc_node(size, 1, flags, node, > Tang
On Thu, 2 Dec 2021 10:06:24 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote: > Delete the WARN_ON() and return NULL directly for oversized parameter > in kvmalloc() calls. > Also add unlikely(). > > Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> > --- > There are a lot of oversize warnings and patches about kvmalloc() calls > recently. Maybe these warnings are not very necessary. Or maybe they are. Please let's take a look at these warnings, one at a time. If a large number of them are bogus then sure, let's disable the runtime test. But perhaps it's the case that calling code has genuine issues and should be repaired.
On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote: > Delete the WARN_ON() and return NULL directly for oversized parameter > in kvmalloc() calls. > Also add unlikely(). > > Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> > --- > There are a lot of oversize warnings and patches about kvmalloc() calls > recently. Maybe these warnings are not very necessary. It seems these warnings are working, yes? i.e. we're finding the places where giant values are coming in? > > https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal > > The example of size check in __do_kmalloc_node(): > __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller) > { > struct kmem_cache *cachep; > void *ret; > > if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) > return NULL; > cachep = kmalloc_slab(size, flags); > > mm/util.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/util.c b/mm/util.c > index 7e433690..d26f19c 100644 > --- a/mm/util.c > +++ b/mm/util.c > @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) > return ret; > > /* Don't even allow crazy sizes */ > - if (WARN_ON_ONCE(size > INT_MAX)) > + if (unlikely(size > INT_MAX)) > return NULL; If we're rejecting the value, then it's still a pathological size, so shouldn't the check be happening in the caller? I think the WARN is doing exactly what it was supposed to do: find the places where bad sizes can reach vmalloc. -Kees > > return __vmalloc_node(size, 1, flags, node, > -- > 1.8.3.1 >
在 2021/12/2 上午11:26, Andrew Morton 写道: >> Delete the WARN_ON() and return NULL directly for oversized parameter >> in kvmalloc() calls. >> Also add unlikely(). >> >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com> >> --- >> There are a lot of oversize warnings and patches about kvmalloc() calls >> recently. Maybe these warnings are not very necessary. > Or maybe they are. Please let's take a look at these warnings, one at > a time. If a large number of them are bogus then sure, let's disable > the runtime test. But perhaps it's the case that calling code has > genuine issues and should be repaired. Such as: https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7 https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00 https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83
On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote: > > 在 2021/12/2 上午11:26, Andrew Morton 写道: > >> Delete the WARN_ON() and return NULL directly for oversized parameter > >> in kvmalloc() calls. > >> Also add unlikely(). > >> > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com> > >> --- > >> There are a lot of oversize warnings and patches about kvmalloc() calls > >> recently. Maybe these warnings are not very necessary. > > Or maybe they are. Please let's take a look at these warnings, one at > > a time. If a large number of them are bogus then sure, let's disable > > the runtime test. But perhaps it's the case that calling code has > > genuine issues and should be repaired. > Such as: Thanks, that's helpful. Let's bring all these to the attention of the relevant developers. If the consensus is "the code's fine, the warning is bogus" then let's consider retiring the warning. If the consensus is otherwise then hopefully they will fix their stuff! > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e (cc bpf@vger.kernel.org) > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d (cc netdev@vger.kernel.org, maintainers) > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7 (cc kvm@vger.kernel.org) > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00 (cc netfilter-devel@vger.kernel.org) > https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83 (bpf again).
在 2021/12/2 上午11:46, Kees Cook 写道: > On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote: >> Delete the WARN_ON() and return NULL directly for oversized parameter >> in kvmalloc() calls. >> Also add unlikely(). >> >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com> >> --- >> There are a lot of oversize warnings and patches about kvmalloc() calls >> recently. Maybe these warnings are not very necessary. > It seems these warnings are working, yes? i.e. we're finding the places > where giant values are coming in? Yes, It's working. > >> https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal >> >> The example of size check in __do_kmalloc_node(): >> __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller) >> { >> struct kmem_cache *cachep; >> void *ret; >> >> if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) >> return NULL; >> cachep = kmalloc_slab(size, flags); >> >> mm/util.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/mm/util.c b/mm/util.c >> index 7e433690..d26f19c 100644 >> --- a/mm/util.c >> +++ b/mm/util.c >> @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) >> return ret; >> >> /* Don't even allow crazy sizes */ >> - if (WARN_ON_ONCE(size > INT_MAX)) >> + if (unlikely(size > INT_MAX)) >> return NULL; > If we're rejecting the value, then it's still a pathological size, so > shouldn't the check be happening in the caller? I think the WARN is > doing exactly what it was supposed to do: find the places where bad > sizes can reach vmalloc. In this way, we must check whether the size from the user exceeds INT_MAX before calling kvmalloc() calls. Generally speaking, the oversize check is rarely done before. Thanks, Bixuan Cui > > -Kees > >> >> return __vmalloc_node(size, 1, flags, node, >> -- >> 1.8.3.1 >> > -- Kees Cook
On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote: > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote: > > 在 2021/12/2 上午11:26, Andrew Morton 写道: > > >> Delete the WARN_ON() and return NULL directly for oversized > > >> parameter in kvmalloc() calls. > > >> Also add unlikely(). > > >> > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com> > > >> --- > > >> There are a lot of oversize warnings and patches about kvmalloc() > > >> calls recently. Maybe these warnings are not very necessary. > > > > > > Or maybe they are. Please let's take a look at these warnings, > > > one at a time. If a large number of them are bogus then sure, > > > let's disable the runtime test. But perhaps it's the case that > > > calling code has genuine issues and should be repaired. > > > > Such as: > > Thanks, that's helpful. > > Let's bring all these to the attention of the relevant developers. > > If the consensus is "the code's fine, the warning is bogus" then let's > consider retiring the warning. > > If the consensus is otherwise then hopefully they will fix their stuff! > > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e > > (cc bpf@vger.kernel.org) > > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d > > (cc netdev@vger.kernel.org, maintainers) > > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7 > > (cc kvm@vger.kernel.org) > > > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00 > > (cc netfilter-devel@vger.kernel.org) The netfilter bug has since been fixed: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382 > > https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83 > > (bpf again). J.
在 2021/12/2 下午12:29, Andrew Morton 写道: > Thanks, that's helpful. > > Let's bring all these to the attention of the relevant developers. > > If the consensus is "the code's fine, the warning is bogus" then let's > consider retiring the warning. > > If the consensus is otherwise then hopefully they will fix their stuff! Ok,thanks for your advice :-) Thanks, Bixuan Cui
On Wed, Dec 01, 2021 at 07:26:43PM -0800, Andrew Morton wrote: > On Thu, 2 Dec 2021 10:06:24 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote: > > > Delete the WARN_ON() and return NULL directly for oversized parameter > > in kvmalloc() calls. > > Also add unlikely(). > > > > Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > > Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> > > --- > > There are a lot of oversize warnings and patches about kvmalloc() calls > > recently. Maybe these warnings are not very necessary. > > Or maybe they are. Please let's take a look at these warnings, one at > a time. If a large number of them are bogus then sure, let's disable > the runtime test. But perhaps it's the case that calling code has > genuine issues and should be repaired. Andrew, The problem is that this WARN_ON() is triggered by the users. At least in the RDMA world, users can provide huge sizes and they expect to get plain -ENOMEM and not dump stack, because it happens indirectly to them. In our case, these two kvcalloc() generates WARN_ON(). umem_odp->pfn_list = kvcalloc( npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); if (!umem_odp->pfn_list) return -ENOMEM; umem_odp->dma_list = kvcalloc( ndmas, sizeof(*umem_odp->dma_list), GFP_KERNEL); https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L80 It is not a kernel programmer error to allow "oversized kvmalloc call" . Thanks
On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote: > The problem is that this WARN_ON() is triggered by the users. ... or the problem is that you don't do a sanity check between the user and the MM system. I mean, that's what this conversation is about -- is it a bug to be asking for this much memory in the first place? > At least in the RDMA world, users can provide huge sizes and they expect > to get plain -ENOMEM and not dump stack, because it happens indirectly > to them. > > In our case, these two kvcalloc() generates WARN_ON(). > > umem_odp->pfn_list = kvcalloc( > npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); Does it really make sense for the user to specify 2^31 PFNs in a single call? I mean, that's 8TB of memory. Should RDMA put its own limit in here, or should it rely on kvmalloc returning -ENOMEM?
On Thu, Dec 2, 2021 at 2:38 AM Jeremy Sowden <jeremy@azazel.net> wrote: > > On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote: > > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote: > > > 在 2021/12/2 上午11:26, Andrew Morton 写道: > > > >> Delete the WARN_ON() and return NULL directly for oversized > > > >> parameter in kvmalloc() calls. > > > >> Also add unlikely(). > > > >> > > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > > > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com> > > > >> --- > > > >> There are a lot of oversize warnings and patches about kvmalloc() > > > >> calls recently. Maybe these warnings are not very necessary. > > > > > > > > Or maybe they are. Please let's take a look at these warnings, > > > > one at a time. If a large number of them are bogus then sure, > > > > let's disable the runtime test. But perhaps it's the case that > > > > calling code has genuine issues and should be repaired. > > > > > > Such as: > > > > Thanks, that's helpful. > > > > Let's bring all these to the attention of the relevant developers. > > > > If the consensus is "the code's fine, the warning is bogus" then let's > > consider retiring the warning. > > > > If the consensus is otherwise then hopefully they will fix their stuff! > > > > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e > > > > (cc bpf@vger.kernel.org) > > > > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d > > > > (cc netdev@vger.kernel.org, maintainers) > > > > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7 > > > > (cc kvm@vger.kernel.org) > > > > > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00 > > > > (cc netfilter-devel@vger.kernel.org) > > The netfilter bug has since been fixed: > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382 How is this a "fix" ? u32 was the limit and because of the new warn the limit got reduced to s32. Every subsystem is supposed to do this "fix" now? > > > https://syzkaller.appspot.com/bug?id=4c9ab8c7d0f8b551950db06559dc9cde4119ac83 > > > > (bpf again). > > J.
On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote: > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote: > > The problem is that this WARN_ON() is triggered by the users. > > ... or the problem is that you don't do a sanity check between the user > and the MM system. I mean, that's what this conversation is about -- > is it a bug to be asking for this much memory in the first place? We do a lot of checks, and in this case, user provided valid input. He asked size that doesn't cross his address space. https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67 start = ALIGN_DOWN(umem_odp->umem.address, page_size); if (check_add_overflow(umem_odp->umem.address, (unsigned long)umem_odp->umem.length, &end)) return -EOVERFLOW; There is a feature called ODP (on-demand-paging) which is supported in some RDMA NICs. It allows to the user "export" their whole address space to the other RDMA node without pinning the pages. And once the other node sends data to not-pinned page, the RDMA NIC will prefetch it. > > > At least in the RDMA world, users can provide huge sizes and they expect > > to get plain -ENOMEM and not dump stack, because it happens indirectly > > to them. > > > > In our case, these two kvcalloc() generates WARN_ON(). > > > > umem_odp->pfn_list = kvcalloc( > > npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); > > Does it really make sense for the user to specify 2^31 PFNs in a single > call? I mean, that's 8TB of memory. Should RDMA put its own limit > in here, or should it rely on kvmalloc returning -ENOMEM? I heard about such systems with so many available RAM. I don't know about their usage pattern, most likely they will use hugepages, but it is my guess. The thing is that it is not RDMA-specific. You are asking to place same if (size > KVMALLOC...) check in all subsystems. Thanks
On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote: > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote: > > The problem is that this WARN_ON() is triggered by the users. > > ... or the problem is that you don't do a sanity check between the user > and the MM system. I mean, that's what this conversation is about -- > is it a bug to be asking for this much memory in the first place? > > > At least in the RDMA world, users can provide huge sizes and they expect > > to get plain -ENOMEM and not dump stack, because it happens indirectly > > to them. > > > > In our case, these two kvcalloc() generates WARN_ON(). > > > > umem_odp->pfn_list = kvcalloc( > > npfns, sizeof(*umem_odp->pfn_list), GFP_KERNEL); > > Does it really make sense for the user to specify 2^31 PFNs in a single > call? I mean, that's 8TB of memory. Should RDMA put its own limit > in here, or should it rely on kvmalloc returning -ENOMEM? I wrote this - I don't think RDMA should put a limit here. What limit would it use anyhow? I'm pretty sure database people are already using low TB's here. It is not absurd when you have DAX and the biggest user of ODP is with DAX. If anything we might get to a point in a few years where the 2^31 is too small and this has to be a better datastructure :\ Maybe an xarray and I should figure out how to use the multi-order stuff to optimize huge pages? I'd actually really like to get rid of it, just haven't figured out how. The only purpose is to call set_page_dirty() and in many cases the pfn should still be in the mm's page table. We also store another copy of the PFN in the NIC's mapping. Surely one of these two could do instead? Jason
On Wed, Dec 01, 2021 at 07:46:01PM -0800, Kees Cook wrote: > If we're rejecting the value, then it's still a pathological size, so > shouldn't the check be happening in the caller? I think the WARN is > doing exactly what it was supposed to do: find the places where bad > sizes can reach vmalloc. I think it meshes very poorly with the overflow work: p = kzalloc(struct_size(p, regions, num_regions), GFP_KERNEL); If num_regions is user controlled data why should the calling driver hvae to somehow sanitize num_regions (without bugs!) instead of relying on struct_size() and kzalloc() to contain all the sanitation? What you are suggesting just pushes security sensitive coding into drivers, which I think is the opposite of what we all want? Jason
On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote: > On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote: > > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote: > > > The problem is that this WARN_ON() is triggered by the users. > > > > ... or the problem is that you don't do a sanity check between the user > > and the MM system. I mean, that's what this conversation is about -- > > is it a bug to be asking for this much memory in the first place? > > We do a lot of checks, and in this case, user provided valid input. > He asked size that doesn't cross his address space. > https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67 > > start = ALIGN_DOWN(umem_odp->umem.address, page_size); > if (check_add_overflow(umem_odp->umem.address, > (unsigned long)umem_odp->umem.length, > &end)) > return -EOVERFLOW; > > There is a feature called ODP (on-demand-paging) which is supported > in some RDMA NICs. It allows to the user "export" their whole address > space to the other RDMA node without pinning the pages. And once the > other node sends data to not-pinned page, the RDMA NIC will prefetch > it. I think we have two cases: - limiting kvmalloc allocations to INT_MAX - issuing a WARN when that limit is exceeded The argument for the having the WARN is "that amount should never be allocated so we want to find the pathological callers". But if the actual issue is that >INT_MAX is _acceptable_, then we have to do away with the entire check, not just the WARN.
On Thu, Dec 02, 2021 at 11:08:34AM -0800, Kees Cook wrote: > On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote: > > On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote: > > > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote: > > > > The problem is that this WARN_ON() is triggered by the users. > > > > > > ... or the problem is that you don't do a sanity check between the user > > > and the MM system. I mean, that's what this conversation is about -- > > > is it a bug to be asking for this much memory in the first place? > > > > We do a lot of checks, and in this case, user provided valid input. > > He asked size that doesn't cross his address space. > > https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67 > > > > start = ALIGN_DOWN(umem_odp->umem.address, page_size); > > if (check_add_overflow(umem_odp->umem.address, > > (unsigned long)umem_odp->umem.length, > > &end)) > > return -EOVERFLOW; > > > > There is a feature called ODP (on-demand-paging) which is supported > > in some RDMA NICs. It allows to the user "export" their whole address > > space to the other RDMA node without pinning the pages. And once the > > other node sends data to not-pinned page, the RDMA NIC will prefetch > > it. > > I think we have two cases: > > - limiting kvmalloc allocations to INT_MAX > - issuing a WARN when that limit is exceeded > > The argument for the having the WARN is "that amount should never be > allocated so we want to find the pathological callers". > > But if the actual issue is that >INT_MAX is _acceptable_, then we have > to do away with the entire check, not just the WARN. First we need to get rid from WARN_ON(), which is completely safe thing to do. Removal of the check can be done in second step as it will require audit of whole kvmalloc* path. Thanks > > -- > Kees Cook
On 2021-12-02, at 07:34:36 -0800, Alexei Starovoitov wrote: > On Thu, Dec 2, 2021 at 2:38 AM Jeremy Sowden wrote: > > On 2021-12-01, at 20:29:05 -0800, Andrew Morton wrote: > > > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui wrote: > > > > 在 2021/12/2 上午11:26, Andrew Morton 写道: > > > > >> Delete the WARN_ON() and return NULL directly for oversized > > > > >> parameter in kvmalloc() calls. > > > > >> Also add unlikely(). > > > > >> > > > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > > > > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com> > > > > >> --- > > > > >> There are a lot of oversize warnings and patches about kvmalloc() > > > > >> calls recently. Maybe these warnings are not very necessary. > > > > > > > > > > Or maybe they are. Please let's take a look at these warnings, > > > > > one at a time. If a large number of them are bogus then sure, > > > > > let's disable the runtime test. But perhaps it's the case that > > > > > calling code has genuine issues and should be repaired. > > > > > > > > Such as: > > > > > > Thanks, that's helpful. > > > > > > Let's bring all these to the attention of the relevant developers. > > > > > > If the consensus is "the code's fine, the warning is bogus" then let's > > > consider retiring the warning. > > > > > > If the consensus is otherwise then hopefully they will fix their stuff! > > > > > > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e > > > > > > (cc bpf@vger.kernel.org) > > > > > > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d > > > > > > (cc netdev@vger.kernel.org, maintainers) > > > > > > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7 > > > > > > (cc kvm@vger.kernel.org) > > > > > > > https://syzkaller.appspot.com/bug?id=6f30adb592d476978777a1125d1f680edfc23e00 > > > > > > (cc netfilter-devel@vger.kernel.org) > > > > The netfilter bug has since been fixed: > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/log/?id=7bbc3d385bd813077acaf0e6fdb2a86a901f5382 > > How is this a "fix" ? > u32 was the limit and because of the new warn the limit > got reduced to s32. > Every subsystem is supposed to do this "fix" now? My intention was only to provide information about what had been done in the ipset case. In that case, there was already a check in place to ensure that the requested hash-table size would not result in integer overflow, and it was adjusted to reflect the limit imposed by the new warning (one imagines that there is not much demand for hash-tables that big). I'm not familiar with the other cases, and so I would not presume to make suggestions about whether those warnings were useful. J.
On Thu, Dec 02, 2021 at 09:24:08PM +0200, Leon Romanovsky wrote: > On Thu, Dec 02, 2021 at 11:08:34AM -0800, Kees Cook wrote: > > On Thu, Dec 02, 2021 at 06:08:40PM +0200, Leon Romanovsky wrote: > > > On Thu, Dec 02, 2021 at 03:29:47PM +0000, Matthew Wilcox wrote: > > > > On Thu, Dec 02, 2021 at 05:23:42PM +0200, Leon Romanovsky wrote: > > > > > The problem is that this WARN_ON() is triggered by the users. > > > > > > > > ... or the problem is that you don't do a sanity check between the user > > > > and the MM system. I mean, that's what this conversation is about -- > > > > is it a bug to be asking for this much memory in the first place? > > > > > > We do a lot of checks, and in this case, user provided valid input. > > > He asked size that doesn't cross his address space. > > > https://elixir.bootlin.com/linux/v5.16-rc3/source/drivers/infiniband/core/umem_odp.c#L67 > > > > > > start = ALIGN_DOWN(umem_odp->umem.address, page_size); > > > if (check_add_overflow(umem_odp->umem.address, > > > (unsigned long)umem_odp->umem.length, > > > &end)) > > > return -EOVERFLOW; > > > > > > There is a feature called ODP (on-demand-paging) which is supported > > > in some RDMA NICs. It allows to the user "export" their whole address > > > space to the other RDMA node without pinning the pages. And once the > > > other node sends data to not-pinned page, the RDMA NIC will prefetch > > > it. > > > > I think we have two cases: > > > > - limiting kvmalloc allocations to INT_MAX > > - issuing a WARN when that limit is exceeded > > > > The argument for the having the WARN is "that amount should never be > > allocated so we want to find the pathological callers". > > > > But if the actual issue is that >INT_MAX is _acceptable_, then we have > > to do away with the entire check, not just the WARN. > > First we need to get rid from WARN_ON(), which is completely safe thing to do. > > Removal of the check can be done in second step as it will require audit > of whole kvmalloc* path. If those are legit sizes, I'm fine with dropping the WARN. (But I still think if they're legit sizes, we must also drop the INT_MAX limit.)
On Thu, 2 Dec 2021 13:23:13 -0800 Kees Cook <keescook@chromium.org> wrote: > > > I think we have two cases: > > > > > > - limiting kvmalloc allocations to INT_MAX > > > - issuing a WARN when that limit is exceeded > > > > > > The argument for the having the WARN is "that amount should never be > > > allocated so we want to find the pathological callers". > > > > > > But if the actual issue is that >INT_MAX is _acceptable_, then we have > > > to do away with the entire check, not just the WARN. > > > > First we need to get rid from WARN_ON(), which is completely safe thing to do. > > > > Removal of the check can be done in second step as it will require audit > > of whole kvmalloc* path. > > If those are legit sizes, I'm fine with dropping the WARN. (But I still > think if they're legit sizes, we must also drop the INT_MAX limit.) Can we suppress the WARN if the caller passed __GFP_NOWARN?
On Thu, Dec 02, 2021 at 02:03:43PM -0800, Andrew Morton wrote: > On Thu, 2 Dec 2021 13:23:13 -0800 Kees Cook <keescook@chromium.org> wrote: > > > > > I think we have two cases: > > > > > > > > - limiting kvmalloc allocations to INT_MAX > > > > - issuing a WARN when that limit is exceeded > > > > > > > > The argument for the having the WARN is "that amount should never be > > > > allocated so we want to find the pathological callers". > > > > > > > > But if the actual issue is that >INT_MAX is _acceptable_, then we have > > > > to do away with the entire check, not just the WARN. > > > > > > First we need to get rid from WARN_ON(), which is completely safe thing to do. > > > > > > Removal of the check can be done in second step as it will require audit > > > of whole kvmalloc* path. > > > > If those are legit sizes, I'm fine with dropping the WARN. (But I still > > think if they're legit sizes, we must also drop the INT_MAX limit.) > > Can we suppress the WARN if the caller passed __GFP_NOWARN? I don't think that's a good idea. NOWARN is for allocation failure messages whereas this warning is more of a "You're doing something wrong" -- ENOMEM vs EINVAL. I'm still agnostic on whether this should be a check at all, or whether we should let people kvmalloc(20GB). But I don't like conditioning the warning on GFP_NOWARN.
+Paolo, I'm pretty sure he's still not subscribed to the KVM mailing list :-) On Wed, Dec 01, 2021, Andrew Morton wrote: > On Thu, 2 Dec 2021 12:05:15 +0800 Bixuan Cui <cuibixuan@linux.alibaba.com> wrote: > > > > > 在 2021/12/2 上午11:26, Andrew Morton 写道: > > >> Delete the WARN_ON() and return NULL directly for oversized parameter > > >> in kvmalloc() calls. > > >> Also add unlikely(). > > >> > > >> Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > > >> Signed-off-by: Bixuan Cui<cuibixuan@linux.alibaba.com> > > >> --- > > >> There are a lot of oversize warnings and patches about kvmalloc() calls > > >> recently. Maybe these warnings are not very necessary. > > > Or maybe they are. Please let's take a look at these warnings, one at > > > a time. If a large number of them are bogus then sure, let's disable > > > the runtime test. But perhaps it's the case that calling code has > > > genuine issues and should be repaired. > > Such as: > > Thanks, that's helpful. > > Let's bring all these to the attention of the relevant developers. > > If the consensus is "the code's fine, the warning is bogus" then let's > consider retiring the warning. > > If the consensus is otherwise then hopefully they will fix their stuff! > > > > > https://syzkaller.appspot.com/bug?id=24452f89446639c901ac07379ccc702808471e8e > > (cc bpf@vger.kernel.org) > > > https://syzkaller.appspot.com/bug?id=f7c5a86e747f9b7ce333e7295875cd4ede2c7a0d > > (cc netdev@vger.kernel.org, maintainers) > > > https://syzkaller.appspot.com/bug?id=8f306f3db150657a1f6bbe1927467084531602c7 > > (cc kvm@vger.kernel.org) Paolo posted patches to resolve the KVM issues, but I don't think they ever got applied. https://lore.kernel.org/all/20211016064302.165220-1-pbonzini@redhat.com/ https://lore.kernel.org/all/20211015165519.135670-1-pbonzini@redhat.com/
On Thu, Dec 02, 2021 at 10:06:24AM +0800, Bixuan Cui wrote: > Delete the WARN_ON() and return NULL directly for oversized parameter > in kvmalloc() calls. > Also add unlikely(). > > Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") > Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> How can we progress with this patch? From this discussion, at least for RDMA, KVM and BPF, this huge size is legit and WARN_ON() can be seen as false alarm. Thanks
diff --git a/mm/util.c b/mm/util.c index 7e433690..d26f19c 100644 --- a/mm/util.c +++ b/mm/util.c @@ -587,7 +587,7 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) return ret; /* Don't even allow crazy sizes */ - if (WARN_ON_ONCE(size > INT_MAX)) + if (unlikely(size > INT_MAX)) return NULL; return __vmalloc_node(size, 1, flags, node,
Delete the WARN_ON() and return NULL directly for oversized parameter in kvmalloc() calls. Also add unlikely(). Fixes: 7661809d493b ("mm: don't allow oversized kvmalloc() calls") Signed-off-by: Bixuan Cui <cuibixuan@linux.alibaba.com> --- There are a lot of oversize warnings and patches about kvmalloc() calls recently. Maybe these warnings are not very necessary. https://lore.kernel.org/all/YadOjJXMTjP85MQx@unreal The example of size check in __do_kmalloc_node(): __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller) { struct kmem_cache *cachep; void *ret; if (unlikely(size > KMALLOC_MAX_CACHE_SIZE)) return NULL; cachep = kmalloc_slab(size, flags); mm/util.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)