mm: allow huge kvmalloc() calls if they're accounted to memcg

Message ID 20211016065130.166128-1-pbonzini@redhat.com (mailing list archive)
State New, archived

Commit Message

Paolo Bonzini Oct. 16, 2021, 6:51 a.m. UTC
Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
restricted memory allocation with 'kvmalloc()' to sizes that fit
in an 'int', to protect against trivial integer conversion issues.
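
To illustrate the kind of conversion issue meant here, the following is a
minimal userspace sketch, not kernel code.  The helper names fits_in_int()
and length_as_int() are hypothetical and exist only for this example.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: true if 'size' can be stored in an 'int' unchanged. */
bool fits_in_int(size_t size)
{
	return size <= (size_t)INT_MAX;
}

/* Hypothetical caller that keeps a length in an 'int'. */
int length_as_int(size_t size)
{
	/* Without this check, the conversion below would not preserve the value. */
	assert(fits_in_int(size));
	return (int)size;
}
```

Any size above INT_MAX fails the check, which is the class of request the
WARN in kvmalloc_node() was added to flag.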

However, the WARN triggers with KVM, when it allocates ancillary page
data whose size essentially depends on whatever userspace has passed to
the KVM_SET_USER_MEMORY_REGION ioctl.  The warnings are easily raised by
syzkaller, but the largest allocation that KVM can do is 8 bytes per page
of guest memory; therefore, a 1 TiB memslot will cause a warning even
outside fuzzing, and those allocations are known to happen in the wild.
Google, for example, already has VMs that create 1.5 TB memslots (12 TB of
total guest memory spread across 8 virtual NUMA nodes).
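
The arithmetic behind the 1 TiB figure can be checked with a short userspace
sketch.  The 8 bytes per page come from the text above; the helper name
ancillary_bytes() and the 4 KiB page size used below are assumptions made
for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* From the commit message: at most 8 bytes of ancillary data per guest page. */
#define SKETCH_BYTES_PER_PAGE 8ULL

/* Hypothetical helper: worst-case ancillary allocation for one memslot. */
uint64_t ancillary_bytes(uint64_t memslot_bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return (memslot_bytes / page_size) * SKETCH_BYTES_PER_PAGE;
}
```

With 4 KiB pages, ancillary_bytes(1ULL << 40, 4096) is 2^28 pages times
8 bytes, that is 2^31 bytes (2 GiB), which is one byte more than a 32-bit
INT_MAX.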

Use memcg accounting as evidence that the crazy large allocations are
expected---in which case, it is indeed a good idea to have them
properly accounted---and exempt them from the warning.
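
The resulting rule can be modelled in userspace as a small predicate.  This
is only a sketch of the decision, not the kernel implementation:
SKETCH_GFP_ACCOUNT is a stand-in for __GFP_ACCOUNT with an arbitrary value,
and rejects_huge() is a hypothetical name.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the kernel's __GFP_ACCOUNT bit; the value here is arbitrary. */
#define SKETCH_GFP_ACCOUNT 0x1u

/* Hypothetical model: true if the vmalloc fallback would warn and fail. */
bool rejects_huge(size_t size, unsigned int flags)
{
	/* Accounted allocations are taken as expected, whatever their size. */
	if (flags & SKETCH_GFP_ACCOUNT)
		return false;
	return size > (size_t)INT_MAX;
}

/* Usage: the same oversized request is refused only when unaccounted. */
void rejects_huge_examples(void)
{
	size_t huge = (size_t)INT_MAX + 1;

	assert(rejects_huge(huge, 0));
	assert(!rejects_huge(huge, SKETCH_GFP_ACCOUNT));
}
```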

Cc: Willy Tarreau <w@1wt.eu>
Cc: Kees Cook <keescook@chromium.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Reported-by: syzbot+e0de2333cbf95ea473e8@syzkaller.appspotmail.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
	Linus, what do you think of this?  It is a bit of a hack,
	but the reasoning in the commit message does make at least
	some sense.

	The alternative would be to just use __vmalloc in KVM, and add
	__vcalloc too.  The two underscores would suggest that something
	"different" is going on, but I wonder what you prefer between
	this and having a __vcalloc with 2-3 uses in the whole source.

 mm/util.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

Comments

Kees Cook Oct. 18, 2021, 3:09 p.m. UTC | #1
On Sat, Oct 16, 2021 at 02:51:30AM -0400, Paolo Bonzini wrote:
> Commit 7661809d493b ("mm: don't allow oversized kvmalloc() calls")
> restricted memory allocation with 'kvmalloc()' to sizes that fit
> in an 'int', to protect against trivial integer conversion issues.
> 
> However, the WARN triggers with KVM, when it allocates ancillary page
> data whose size essentially depends on whatever userspace has passed to
> the KVM_SET_USER_MEMORY_REGION ioctl.  The warnings are easily raised by
> syzkaller, but the largest allocation that KVM can do is 8 bytes per page
> of guest memory; therefore, a 1 TiB memslot will cause a warning even
> outside fuzzing, and those allocations are known to happen in the wild.
> Google for example already has VMs that create 1.5tb memslots (12tb of
> total guest memory spread across 8 virtual NUMA nodes).
> 
> Use memcg accounting as evidence that the crazy large allocations are
> expected---in which case, it is indeed a good idea to have them
> properly accounted---and exempt them from the warning.

Will memcg always have a "sane" upper bound? If so, yeah, this seems a
better solution than dropping the WARN completely. :)

Reviewed-by: Kees Cook <keescook@chromium.org>

-Kees

Patch

diff --git a/mm/util.c b/mm/util.c
index 499b6b5767ed..31fca4a999c6 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -593,8 +593,12 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
-	/* Don't even allow crazy sizes */
-	if (WARN_ON_ONCE(size > INT_MAX))
+	/*
+	 * Don't even allow crazy sizes unless memcg accounting is
+	 * requested.  We take that as a sign that huge allocations
+	 * are indeed expected.
+	 */
+	if (likely(!(flags & __GFP_ACCOUNT)) && WARN_ON_ONCE(size > INT_MAX))
 		return NULL;
 
 	return __vmalloc_node(size, 1, flags, node,