diff mbox

[2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

Message ID 20180214182618.14627-3-willy@infradead.org (mailing list archive)
State New, archived
Headers show

Commit Message

Matthew Wilcox Feb. 14, 2018, 6:26 p.m. UTC
From: Matthew Wilcox <mawilcox@microsoft.com>

We have kvmalloc_array in order to safely allocate an array with a
number of elements specified by userspace (avoiding arithmetic overflow
leading to a buffer overrun).  But it's fairly common to have a header
in front of that array (eg specifying the length of the array), so we
need a helper function for that situation.

kvmalloc_ab_c() is the workhorse that does the calculation, but in spite
of our best efforts to name the arguments, it's really hard to remember
which order to put the arguments in.  kvzalloc_struct() eliminates that
effort; you tell it about the struct you're allocating, and it puts the
arguments in the right order for you (and checks that the arguments
you've given are at least plausible).

For comparison between the three schemes:

	sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
			GFP_KERNEL);
	sev = kvzalloc_ab_c(elems, sizeof(struct v4l2_kevent), sizeof(*sev),
			GFP_KERNEL);
	sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);

Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
---
 include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

Comments

Kees Cook Feb. 14, 2018, 7:22 p.m. UTC | #1
On Wed, Feb 14, 2018 at 10:26 AM, Matthew Wilcox <willy@infradead.org> wrote:
> From: Matthew Wilcox <mawilcox@microsoft.com>
>
> We have kvmalloc_array in order to safely allocate an array with a
> number of elements specified by userspace (avoiding arithmetic overflow
> leading to a buffer overrun).  But it's fairly common to have a header
> in front of that array (eg specifying the length of the array), so we
> need a helper function for that situation.
>
> kvmalloc_ab_c() is the workhorse that does the calculation, but in spite
> of our best efforts to name the arguments, it's really hard to remember
> which order to put the arguments in.  kvzalloc_struct() eliminates that
> effort; you tell it about the struct you're allocating, and it puts the
> arguments in the right order for you (and checks that the arguments
> you've given are at least plausible).
>
> For comparison between the three schemes:
>
>         sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
>                         GFP_KERNEL);
>         sev = kvzalloc_ab_c(elems, sizeof(struct v4l2_kevent), sizeof(*sev),
>                         GFP_KERNEL);
>         sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);
>
> Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> ---
>  include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 51 insertions(+)
>
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 81bd7f0be286..ddf929c5aaee 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -557,6 +557,57 @@ static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
>         return kvmalloc(n * size, flags);
>  }
>
> +/**
> + * kvmalloc_ab_c() - Allocate memory.

Longer description, maybe? "Allocate a *b + c bytes of memory"?

> + * @n: Number of elements.
> + * @size: Size of each element (should be constant).
> + * @c: Size of header (should be constant).

If these should be constant, should we mark them as "const"? Or WARN
if __builtin_constant_p() isn't true?

> + * @gfp: Memory allocation flags.
> + *
> + * Use this function to allocate @n * @size + @c bytes of memory.  This
> + * function is safe to use when @n is controlled from userspace; it will
> + * return %NULL if the required amount of memory cannot be allocated.
> + * Use kvfree() to free the allocated memory.
> + *
> + * The kvzalloc_hdr_arr() function is easier to use as it has typechecking

renaming typo? Should this be "kvzalloc_struct()"?

> + * and you do not need to remember which of the arguments should be constants.
> + *
> + * Context: Process context.  May sleep; the @gfp flags should be based on
> + *         %GFP_KERNEL.
> + * Return: A pointer to the allocated memory or %NULL.
> + */
> +static inline __must_check
> +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> +{
> +       if (size != 0 && n > (SIZE_MAX - c) / size)
> +               return NULL;
> +
> +       return kvmalloc(n * size + c, gfp);
> +}
> +#define kvzalloc_ab_c(a, b, c, gfp)    kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)

Nit: "(gfp) | __GFP_ZERO" just in case of insane usage.

> +
> +/**
> + * kvzalloc_struct() - Allocate and zero-fill a structure containing a
> + *                    variable length array.
> + * @p: Pointer to the structure.
> + * @member: Name of the array member.
> + * @n: Number of elements in the array.
> + * @gfp: Memory allocation flags.
> + *
> + * Allocate (and zero-fill) enough memory for a structure with an array
> + * of @n elements.  This function is safe to use when @n is specified by
> + * userspace as the arithmetic will not overflow.
> + * Use kvfree() to free the allocated memory.
> + *
> + * Context: Process context.  May sleep; the @gfp flags should be based on
> + *         %GFP_KERNEL.
> + * Return: Zero-filled memory or a NULL pointer.
> + */
> +#define kvzalloc_struct(p, member, n, gfp)                             \
> +       (typeof(p))kvzalloc_ab_c(n,                                     \
> +               sizeof(*(p)->member) + __must_be_array((p)->member),    \
> +               offsetof(typeof(*(p)), member), gfp)
> +
>  extern void kvfree(const void *addr);
>
>  static inline atomic_t *compound_mapcount_ptr(struct page *page)

It might be nice to include another patch that replaces some of the
existing/common uses of a*b+c with the new function...

Otherwise, yes, please. We could build a coccinelle rule for
additional replacements...

-Kees
Julia Lawall Feb. 14, 2018, 7:27 p.m. UTC | #2
On Wed, 14 Feb 2018, Kees Cook wrote:

> On Wed, Feb 14, 2018 at 10:26 AM, Matthew Wilcox <willy@infradead.org> wrote:
> > From: Matthew Wilcox <mawilcox@microsoft.com>
> >
> > We have kvmalloc_array in order to safely allocate an array with a
> > number of elements specified by userspace (avoiding arithmetic overflow
> > leading to a buffer overrun).  But it's fairly common to have a header
> > in front of that array (eg specifying the length of the array), so we
> > need a helper function for that situation.
> >
> > kvmalloc_ab_c() is the workhorse that does the calculation, but in spite
> > of our best efforts to name the arguments, it's really hard to remember
> > which order to put the arguments in.  kvzalloc_struct() eliminates that
> > effort; you tell it about the struct you're allocating, and it puts the
> > arguments in the right order for you (and checks that the arguments
> > you've given are at least plausible).
> >
> > For comparison between the three schemes:
> >
> >         sev = kvzalloc(sizeof(*sev) + sizeof(struct v4l2_kevent) * elems,
> >                         GFP_KERNEL);
> >         sev = kvzalloc_ab_c(elems, sizeof(struct v4l2_kevent), sizeof(*sev),
> >                         GFP_KERNEL);
> >         sev = kvzalloc_struct(sev, events, elems, GFP_KERNEL);
> >
> > Signed-off-by: Matthew Wilcox <mawilcox@microsoft.com>
> > ---
> >  include/linux/mm.h | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 51 insertions(+)
> >
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 81bd7f0be286..ddf929c5aaee 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -557,6 +557,57 @@ static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
> >         return kvmalloc(n * size, flags);
> >  }
> >
> > +/**
> > + * kvmalloc_ab_c() - Allocate memory.
>
> Longer description, maybe? "Allocate a *b + c bytes of memory"?
>
> > + * @n: Number of elements.
> > + * @size: Size of each element (should be constant).
> > + * @c: Size of header (should be constant).
>
> If these should be constant, should we mark them as "const"? Or WARN
> if __builtin_constant_p() isn't true?
>
> > + * @gfp: Memory allocation flags.
> > + *
> > + * Use this function to allocate @n * @size + @c bytes of memory.  This
> > + * function is safe to use when @n is controlled from userspace; it will
> > + * return %NULL if the required amount of memory cannot be allocated.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * The kvzalloc_hdr_arr() function is easier to use as it has typechecking
>
> renaming typo? Should this be "kvzalloc_struct()"?
>
> > + * and you do not need to remember which of the arguments should be constants.
> > + *
> > + * Context: Process context.  May sleep; the @gfp flags should be based on
> > + *         %GFP_KERNEL.
> > + * Return: A pointer to the allocated memory or %NULL.
> > + */
> > +static inline __must_check
> > +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> > +{
> > +       if (size != 0 && n > (SIZE_MAX - c) / size)
> > +               return NULL;
> > +
> > +       return kvmalloc(n * size + c, gfp);
> > +}
> > +#define kvzalloc_ab_c(a, b, c, gfp)    kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)
>
> Nit: "(gfp) | __GFP_ZERO" just in case of insane usage.
>
> > +
> > +/**
> > + * kvzalloc_struct() - Allocate and zero-fill a structure containing a
> > + *                    variable length array.
> > + * @p: Pointer to the structure.
> > + * @member: Name of the array member.
> > + * @n: Number of elements in the array.
> > + * @gfp: Memory allocation flags.
> > + *
> > + * Allocate (and zero-fill) enough memory for a structure with an array
> > + * of @n elements.  This function is safe to use when @n is specified by
> > + * userspace as the arithmetic will not overflow.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * Context: Process context.  May sleep; the @gfp flags should be based on
> > + *         %GFP_KERNEL.
> > + * Return: Zero-filled memory or a NULL pointer.
> > + */
> > +#define kvzalloc_struct(p, member, n, gfp)                             \
> > +       (typeof(p))kvzalloc_ab_c(n,                                     \
> > +               sizeof(*(p)->member) + __must_be_array((p)->member),    \
> > +               offsetof(typeof(*(p)), member), gfp)
> > +
> >  extern void kvfree(const void *addr);
> >
> >  static inline atomic_t *compound_mapcount_ptr(struct page *page)
>
> It might be nice to include another patch that replaces some of the
> existing/common uses of a*b+c with the new function...
>
> Otherwise, yes, please. We could build a coccinelle rule for
> additional replacements...

Thanks for the suggestion.  I will look into it.

julia
Matthew Wilcox Feb. 14, 2018, 7:35 p.m. UTC | #3
On Wed, Feb 14, 2018 at 11:22:38AM -0800, Kees Cook wrote:
> > +/**
> > + * kvmalloc_ab_c() - Allocate memory.
> 
> Longer description, maybe? "Allocate a *b + c bytes of memory"?

Done!

> > + * @n: Number of elements.
> > + * @size: Size of each element (should be constant).
> > + * @c: Size of header (should be constant).
> 
> If these should be constant, should we mark them as "const"? Or WARN
> if __builtin_constant_p() isn't true?

It's only less efficient if they're not const.  Theoretically they could be
variable ... and I've been bitten by __builtin_constant_p() recently
(gcc bug 83653 which I still don't really understand).

> > + * @gfp: Memory allocation flags.
> > + *
> > + * Use this function to allocate @n * @size + @c bytes of memory.  This
> > + * function is safe to use when @n is controlled from userspace; it will
> > + * return %NULL if the required amount of memory cannot be allocated.
> > + * Use kvfree() to free the allocated memory.
> > + *
> > + * The kvzalloc_hdr_arr() function is easier to use as it has typechecking
> 
> renaming typo? Should this be "kvzalloc_struct()"?

Urgh, yes.  I swear I searched for it ... must've typoed my search string.
Anyway, fixed, because kvzalloc_hdr_arr() wasn't a good name.

> > +#define kvzalloc_ab_c(a, b, c, gfp)    kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)
> 
> Nit: "(gfp) | __GFP_ZERO" just in case of insane usage.

Fixed!

> It might be nice to include another patch that replaces some of the
> existing/common uses of a*b+c with the new function...

Sure!  I have a few examples in my tree, I just didn't want to complicate
things by sending a patch that crossed dozens of maintainer trees.
Christoph Lameter (Ampere) Feb. 14, 2018, 7:55 p.m. UTC | #4
On Wed, 14 Feb 2018, Matthew Wilcox wrote:

> +#define kvzalloc_struct(p, member, n, gfp)				\
> +	(typeof(p))kvzalloc_ab_c(n,					\
> +		sizeof(*(p)->member) + __must_be_array((p)->member),	\
> +		offsetof(typeof(*(p)), member), gfp)
> +

Uppercase like the similar KMEM_CACHE related macros in
include/linux/slab.h?>
Matthew Wilcox Feb. 14, 2018, 8:14 p.m. UTC | #5
On Wed, Feb 14, 2018 at 01:55:59PM -0600, Christopher Lameter wrote:
> On Wed, 14 Feb 2018, Matthew Wilcox wrote:
> 
> > +#define kvzalloc_struct(p, member, n, gfp)				\
> > +	(typeof(p))kvzalloc_ab_c(n,					\
> > +		sizeof(*(p)->member) + __must_be_array((p)->member),	\
> > +		offsetof(typeof(*(p)), member), gfp)
> > +
> 
> Uppercase like the similar KMEM_CACHE related macros in
> include/linux/slab.h?>

Do you think that would look better in the users?  Compare:

@@ -1284,7 +1284,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
                return -EOPNOTSUPP;
        if (mem.nregions > max_mem_regions)
                return -E2BIG;
-       newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
+       newmem = kvzalloc_struct(newmem, regions, mem.nregions, GFP_KERNEL);
        if (!newmem)
                return -ENOMEM;

@@ -1284,7 +1284,7 @@ static long vhost_set_memory(struct vhost_dev *d, struct vhost_memory __user *m)
                return -EOPNOTSUPP;
        if (mem.nregions > max_mem_regions)
                return -E2BIG;
-       newmem = kvzalloc(size + mem.nregions * sizeof(*m->regions), GFP_KERNEL);
+       newmem = KVZALLOC_STRUCT(newmem, regions, mem.nregions, GFP_KERNEL);
        if (!newmem)
                return -ENOMEM;

Making it look like a function is more pleasing to my eye, but I'll
change it if that's the only thing keeping it from being merged.
Christoph Lameter (Ampere) Feb. 15, 2018, 3:55 p.m. UTC | #6
On Wed, 14 Feb 2018, Matthew Wilcox wrote:

> > Uppercase like the similar KMEM_CACHE related macros in
> > include/linux/slab.h?>
>
> Do you think that would look better in the users?  Compare:

Does looking matter? I thought we had the convention that macros are
uppercase. There are some tricks going on with the struct. Uppercase shows
that something special is going on.

> Making it look like a function is more pleasing to my eye, but I'll
> change it if that's the only thing keeping it from being merged.

This should be consistent throughout the source.
Matthew Wilcox Feb. 15, 2018, 4:23 p.m. UTC | #7
On Thu, Feb 15, 2018 at 09:55:11AM -0600, Christopher Lameter wrote:
> On Wed, 14 Feb 2018, Matthew Wilcox wrote:
> 
> > > Uppercase like the similar KMEM_CACHE related macros in
> > > include/linux/slab.h?>
> >
> > Do you think that would look better in the users?  Compare:
> 
> Does looking matter? I thought we had the convention that macros are
> uppercase. There are some tricks going on with the struct. Uppercase shows
> that something special is going on.

  12) Macros, Enums and RTL
  -------------------------

  Names of macros defining constants and labels in enums are capitalized.

  .. code-block:: c

          #define CONSTANT 0x12345

  Enums are preferred when defining several related constants.

  CAPITALIZED macro names are appreciated but macros resembling functions
  may be named in lower case.

I dunno.  Yes, there's macro trickery going on here, but it certainly
resembles a function.  It doesn't fail any of the rules laid out in that
chapter of coding-style about unacceptable uses of macros.
Christoph Lameter (Ampere) Feb. 15, 2018, 5:06 p.m. UTC | #8
On Thu, 15 Feb 2018, Matthew Wilcox wrote:

> I dunno.  Yes, there's macro trickery going on here, but it certainly
> resembles a function.  It doesn't fail any of the rules laid out in that
> chapter of coding-style about unacceptable uses of macros.

It sure looks like a function but does magic things with the struct
parameter. So its not working like a function and the capitalization makes
one aware of that.
Kees Cook Feb. 22, 2018, 1:28 a.m. UTC | #9
On Thu, Feb 15, 2018 at 9:06 AM, Christopher Lameter <cl@linux.com> wrote:
> On Thu, 15 Feb 2018, Matthew Wilcox wrote:
>
>> I dunno.  Yes, there's macro trickery going on here, but it certainly
>> resembles a function.  It doesn't fail any of the rules laid out in that
>> chapter of coding-style about unacceptable uses of macros.
>
> It sure looks like a function but does magic things with the struct
> parameter. So its not working like a function and the capitalization makes
> one aware of that.

I think readability trumps that -- nearly everything else in the
kernel that hides these kinds of details is lower case.

-Kees
Linus Torvalds May 4, 2018, 7:42 a.m. UTC | #10
On Wed, Feb 14, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> +static inline __must_check
> +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> +{
> +       if (size != 0 && n > (SIZE_MAX - c) / size)
> +               return NULL;
> +
> +       return kvmalloc(n * size + c, gfp);

Ok, so some more bikeshedding:

  - I really don't want to encourage people to use kvmalloc().

In fact, the conversion I saw was buggy. You can *not* convert a GFP_ATOMIC
user of kmalloc() to use kvmalloc.

  - that divide is really really expensive on many architectures.

Note that these issues kind of go hand in hand.

Normal kernel allocations are *limited*. It's simply not ok to allocate
megabytes (or gigabytes) of mmory in general. We have serious limits, and
we *should* have serious limits. If people worry about the multiply
overflowing because a user is controlling the size valus, then dammit, such
a user should not be able to do a huge gigabyte vmalloc() that exhausts
memory and then spends time clearing it all!

So the whole notion that "overflows are dangerous, let's get rid of them"
somehow fixes a bug is BULLSHIT. You literally introduced a *new* bug by
removing the normal kmalloc() size limit because you thought that pverflows
are the only problem.

So stop doing things like this. We should not do a stupid divide, because
we damn well know that it is NOT VALID to allocate arrays that have
hundreds of fthousands of elements,  or where the size of one element is
very big.

So get rid of the stupid divide, and make the limits be much stricter. Like
saying "the array element size had better be smaller than one page"
(because honestly, bigger elements are not valid in the kernel), and "the
size of the array cannot be more than "pick-some-number-out-of-your-ass".

So just make the divide go the hell away, a and check the size for validity.

Something like

      if (size > PAGE_SIZE)
           return NULL;
      if (elem > 65535)
           return NULL;
      if (offset > PAGE_SIZE)
           return NULL;
      return kzalloc(size*elem+offset);

and now you (a) guarantee it can't overflow and (b) don't make people use
crazy vmalloc() allocations when they damn well shouldn't.

And yeah, if  somebody has a page size bigger than 64k, then the above can
still overflow. I'm sorry, that architecture s broken shit.

Are there  cases where vmalloc() is ok? Yes. But they should be rare, and
they should have a good reason for them. And honestly, even then the above
limits really really sound quite reasonable. There is no excuse for
million-entry arrays in the kernel. You are doing something seriously wrong
if you do those.

             Linus
Matthew Wilcox May 4, 2018, 1:14 p.m. UTC | #11
On Fri, May 04, 2018 at 07:42:52AM +0000, Linus Torvalds wrote:
> On Wed, Feb 14, 2018 at 8:27 AM Matthew Wilcox <willy@infradead.org> wrote:
> > +static inline __must_check
> > +void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
> > +{
> > +       if (size != 0 && n > (SIZE_MAX - c) / size)
> > +               return NULL;
> > +
> > +       return kvmalloc(n * size + c, gfp);
> 
> Ok, so some more bikeshedding:

I'm putting up the bikeshed against your house ... the colour is your
choice!

>   - I really don't want to encourage people to use kvmalloc().
> 
> In fact, the conversion I saw was buggy. You can *not* convert a GFP_ATOMIC
> user of kmalloc() to use kvmalloc.

Not sure which conversion you're referring to; not one of mine, I hope?

>   - that divide is really really expensive on many architectures.

'c' and 'size' are _supposed_ to be constant and get evaluated at
compile-time.  ie you should get something like this on x86:

   0:   48 b8 fe ff ff ff ff    movabs $0x1ffffffffffffffe,%rax
   7:   ff ff 1f 
   a:   48 39 c7                cmp    %rax,%rdi
   d:   76 09                   jbe    18 <a+0x18>
   f:   48 c1 e7 03             shl    $0x3,%rdi
  13:   e9 00 00 00 00          jmpq   18 <a+0x18>
                        14: R_X86_64_PLT32      malloc-0x4
  18:   31 c0                   xor    %eax,%eax
  1a:   c3                      retq   

Now, if someone's an idiot, then you'll get the divide done at runtime,
and that'll be expensive.

> Normal kernel allocations are *limited*. It's simply not ok to allocate
> megabytes (or gigabytes) of mmory in general. We have serious limits, and
> we *should* have serious limits. If people worry about the multiply
> overflowing because a user is controlling the size valus, then dammit, such
> a user should not be able to do a huge gigabyte vmalloc() that exhausts
> memory and then spends time clearing it all!

I agree.

> So the whole notion that "overflows are dangerous, let's get rid of them"
> somehow fixes a bug is BULLSHIT. You literally introduced a *new* bug by
> removing the normal kmalloc() size limit because you thought that pverflows
> are the only problem.

Rather, I replaced one bug with another.  The removed bug was one
where we allocated 24 bytes and then indexed into the next slab object.
The added bug was that someone can now persuade the driver to allocate
gigabytes of memory.  It's a less severe bug, but I take your point.
We do have _some_ limits in vmalloc -- we fail if you're trying to
allocate more memory than is in the machine, and we fail if there's
insufficient contiguous space in the virtual address space.  But, yes,
this does allow people to allocate more memory than kmalloc would allow.

> So stop doing things like this. We should not do a stupid divide, because
> we damn well know that it is NOT VALID to allocate arrays that have
> hundreds of fthousands of elements,  or where the size of one element is
> very big.
> 
> So get rid of the stupid divide, and make the limits be much stricter. Like
> saying "the array element size had better be smaller than one page"
> (because honestly, bigger elements are not valid in the kernel), and "the
> size of the array cannot be more than "pick-some-number-out-of-your-ass".
> 
> So just make the divide go the hell away, a and check the size for validity.
> 
> Something like
> 
>       if (size > PAGE_SIZE)
>            return NULL;
>       if (elem > 65535)
>            return NULL;
>       if (offset > PAGE_SIZE)
>            return NULL;
>       return kzalloc(size*elem+offset);
> 
> and now you (a) guarantee it can't overflow and (b) don't make people use
> crazy vmalloc() allocations when they damn well shouldn't.

I find your faith in the size of structs in the kernel touching ;-)

struct cmp_data {
        /* size: 290904, cachelines: 4546, members: 11 */
struct dec_data {
        /* size: 274520, cachelines: 4290, members: 10 */
struct cpu_entry_area {
        /* size: 180224, cachelines: 2816, members: 7 */
struct saved_cmdlines_buffer {
        /* size: 131104, cachelines: 2049, members: 5 */
struct debug_store_buffers {
        /* size: 131072, cachelines: 2048, members: 2 */
struct bunzip_data {
        /* size: 42648, cachelines: 667, members: 23 */
struct inflate_workspace {
        /* size: 42312, cachelines: 662, members: 2 */
struct xz_dec_lzma2 {
        /* size: 28496, cachelines: 446, members: 5 */
struct lzma_dec {
        /* size: 28304, cachelines: 443, members: 21 */
struct rcu_state {
        /* size: 17344, cachelines: 271, members: 34 */
struct pglist_data {
        /* size: 18304, cachelines: 286, members: 34 */
struct tss_struct {
        /* size: 12288, cachelines: 192, members: 2 */
struct bts_ctx {
        /* size: 12288, cachelines: 192, members: 3 */

Those are just the ones above 10kB.  Sure, I can see some of them are
boot time use only, or we allocate one per node, or whatever.  But people
do create arrays of these things.  The biggest object we have in the
slab_cache today is 23488 bytes (kvm_vcpu) -- at least on my laptop.  Maybe
there's some insane driver out there that's creating larger things.

> And yeah, if  somebody has a page size bigger than 64k, then the above can
> still overflow. I'm sorry, that architecture s broken shit.
> 
> Are there  cases where vmalloc() is ok? Yes. But they should be rare, and
> they should have a good reason for them. And honestly, even then the above
> limits really really sound quite reasonable. There is no excuse for
> million-entry arrays in the kernel. You are doing something seriously wrong
> if you do those.

Normally, yes.  But then you get people like Google who want to have
a million file descriptors open in a single process.  So I'm leery of
putting hard limits on, like the ones you suggest above, because I'm not
going to be the one who Google come to when they want to exceed the limit.
If you want to draw that line in the sand, then I'm happy to respin the
patch in that direction.

We really have two reasons for using vmalloc -- one is "fragmentation
currently makes it impossible to allocate enough contiguous memory
to satisfy your needs" and the other is "this request is for too much
memory to satisfy through the buddy allocator".  kvmalloc is normally
(not always; see file descriptor example above) for the first kind of
problem, but I wonder if kvmalloc() shouldn't have the same limit as
kmalloc (2048 pages), then add a kvmalloc_large() which will not impose
that limit check.
Linus Torvalds May 4, 2018, 3:35 p.m. UTC | #12
On Fri, May 4, 2018 at 3:14 AM Matthew Wilcox <willy@infradead.org> wrote:

> > In fact, the conversion I saw was buggy. You can *not* convert a
GFP_ATOMIC
> > user of kmalloc() to use kvmalloc.

> Not sure which conversion you're referring to; not one of mine, I hope?

I was thinking of the coccinelle patch in this thread, but just realized
that that actually only did it for GFP_KERNEL, so I guess it would work,
apart from the "oops, now it doesn't enforce the kmalloc limits any more"
issue.

> >   - that divide is really really expensive on many architectures.

> 'c' and 'size' are _supposed_ to be constant and get evaluated at
> compile-time.  ie you should get something like this on x86:

I guess that willalways  be true of the 'kvzalloc_struct() version that
will always use a sizeof(). I was more thinking of any bare kvalloc_ab_c()
cases, but maybe we'd discourage that to ever be used as such?

Because we definitely have things like that, ie a quick grep finds

    f = kmalloc (sizeof (*f) + size*num, GFP_KERNEL);

where 'size' is not obviously a constant. There may be others, but I didn't
really try to grep any further.

Maybe they aren't common, and maybe the occasional divide doesn't matter,
but particularly if we use scripting to then catch and convert users, I
really hate the idea of "let's introduce something that is potentially much
more expensive than it needs to be".

(And the automated coccinelle scripting it's also something where we must
very much avoid then subtly lifting allocation size limits)

> > Something like
> >
> >       if (size > PAGE_SIZE)
> >            return NULL;
> >       if (elem > 65535)
> >            return NULL;
> >       if (offset > PAGE_SIZE)
> >            return NULL;
> >       return kzalloc(size*elem+offset);
> >
> > and now you (a) guarantee it can't overflow and (b) don't make people
use
> > crazy vmalloc() allocations when they damn well shouldn't.

> I find your faith in the size of structs in the kernel touching ;-)

I *really* hope some of those examples of yours aren't allocated with
kmalloc using this pattern..

But yes, I may be naive on the sizes.

> We really have two reasons for using vmalloc -- one is "fragmentation
> currently makes it impossible to allocate enough contiguous memory
> to satisfy your needs" and the other is "this request is for too much
> memory to satisfy through the buddy allocator".  kvmalloc is normally
> (not always; see file descriptor example above) for the first kind of
> problem, but I wonder if kvmalloc() shouldn't have the same limit as
> kmalloc (2048 pages), then add a kvmalloc_large() which will not impose
> that limit check.

That would definitely solve at least one worry.

We do potentially have users which use kmalloc optimistically and do not
want to fall back to vmalloc (they'll fall back to smaller allocations
entirely), but I guess if fwe make sure to not convert any
__GFP_NOWARN/NORETRY users, that should all be ok.

But honestly, I'd rather see just kmalloc users stay as kmalloc users.

If instread of "kzvmalloc_ab_c()" you introduce the "struct size
calculation" part as a "saturating non-overflow calculation", then it
should be fairly easy to just do

   #define kzalloc_struct(p, member, n, gfp) \
      kzalloc(struct_size(p, member, n, gfp)

and you basically *know* that the only thing you changed was purely the
overflow semantics.

That also would take care of my diide worry, because now there  wouldn't be
any ab_c() calculations that might take a non-constant size. The
"struct_size()" thing would always do the sizeof.

So you'd have something like

   /* 'b' and 'c' are always constants */
   static inline sizeof_t __ab_c_saturating(size_t a, size_t b, size_t c)
   {
     if (b && n > (SIZE_MAX -c) / size)
         return SIZE_MAX;
     return a*b+c;
   }
   #define struct_size(p, member, n) \
       __ab_c_saturating(n, \
             sizeof(*(p)->member) + __must_be_array((p)->member), \
             offsetof(typeof(*(p)), member))

and then that kzalloc_struct() wrapper define above should just work.

The above is entirely untested, but it *looks* sane, and avoids all
semantic changes outside of the overflow protection. And nobody would use
that __ab_c_saturating() by mistake, I hope.

No?

               Linus
Kees Cook May 4, 2018, 4:03 p.m. UTC | #13
On Fri, May 4, 2018 at 8:35 AM, Linus Torvalds
<torvalds@linux-foundation.org> wrote:
> On Fri, May 4, 2018 at 3:14 AM Matthew Wilcox <willy@infradead.org> wrote:
>
>> > In fact, the conversion I saw was buggy. You can *not* convert a
> GFP_ATOMIC
>> > user of kmalloc() to use kvmalloc.
>
>> Not sure which conversion you're referring to; not one of mine, I hope?
>
> I was thinking of the coccinelle patch in this thread, but just realized
> that that actually only did it for GFP_KERNEL, so I guess it would work,
> apart from the "oops, now it doesn't enforce the kmalloc limits any more"
> issue.

Just to be clear: the Coccinelle scripts I'm building aren't doing a
kmalloc -> kvmalloc conversion. I'm just removing all the 2-factor
multiplication and replacing it with the appropriate calls to the
allocator family's *calloc or *alloc_array(). This will get us to the
place where we can do all the sanity-checking in the allocator
functions (whatever that checking ends up being). As it turns out,
though, we have kind of a lot of allocator families. Some are
wrappers, like devm_*alloc(), etc.

All that said, the overwhelming majority of *alloc() multiplications
are just "count * sizeof()". It really feels like everything should
just be using a new *alloc_struct() which can do the type checking,
etc, etc, but we can get there. The remaining "count * size" are a
minority and could be dealt with some other way.

>> >   - that divide is really really expensive on many architectures.
>
>> 'c' and 'size' are _supposed_ to be constant and get evaluated at
>> compile-time.  ie you should get something like this on x86:
>
> I guess that willalways  be true of the 'kvzalloc_struct() version that
> will always use a sizeof(). I was more thinking of any bare kvalloc_ab_c()
> cases, but maybe we'd discourage that to ever be used as such?

Yeah, bare *alloc_ab_c() is not great. Perhaps a leading "__" can hint to that?

> Because we definitely have things like that, ie a quick grep finds
>
>     f = kmalloc (sizeof (*f) + size*num, GFP_KERNEL);
>
> where 'size' is not obviously a constant. There may be others, but I didn't
> really try to grep any further.
>
> Maybe they aren't common, and maybe the occasional divide doesn't matter,
> but particularly if we use scripting to then catch and convert users, I
> really hate the idea of "let's introduce something that is potentially much
> more expensive than it needs to be".

Yup: I'm not after that either. I just want to be able to get at the
multiplication factors before they're multiplied. :)

> (And the automated coccinelle scripting it's also something where we must
> very much avoid then subtly lifting allocation size limits)

Agreed. I think most cases are already getting lifted to size_t due to
the sizeof(). It's the "two variables" cases I want to double-check.
Another completely insane idea would be to have a macro that did type
size checking and would DTRT, but with all the alloc families, it
looks nasty. This is all RFC stage, as far as I'm concerned.

Fun example: devm_kzalloc(dev, sizeof(...) * num, gfp...)

$ git grep 'devm_kzalloc([^,]*, *sizeof([^,]*,' | egrep '\* *sizeof|\)
*\*' | wc -l
88

some are constants:
drivers/video/fbdev/au1100fb.c:         devm_kzalloc(&dev->dev,
sizeof(u32) * 16, GFP_KERNEL);

but many aren't:
sound/soc/generic/audio-graph-card.c:   dai_link  = devm_kzalloc(dev,
sizeof(*dai_link)  * num, GFP_KERNEL);

While type-checking on the non-sizeof factor would let us know if it
was safe, so would the division, and most of those could happen at
compile time. It's the size_t variables that we want to catch.

So, mainly I'm just trying to get the arguments reordered (for a
compile-time division) into the correct helpers so the existing logic
can do the right thing, and only for 2-factor products. After that,
then I'm hoping to tackle the multi-factor products, of which the
*alloc_struct() helper seems to cover the vast majority of the
remaining cases.

-Kees
diff mbox

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 81bd7f0be286..ddf929c5aaee 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -557,6 +557,57 @@  static inline void *kvmalloc_array(size_t n, size_t size, gfp_t flags)
 	return kvmalloc(n * size, flags);
 }
 
+/**
+ * kvmalloc_ab_c() - Allocate memory.
+ * @n: Number of elements.
+ * @size: Size of each element (should be constant).
+ * @c: Size of header (should be constant).
+ * @gfp: Memory allocation flags.
+ *
+ * Use this function to allocate @n * @size + @c bytes of memory.  This
+ * function is safe to use when @n is controlled from userspace; it will
+ * return %NULL if the required amount of memory cannot be allocated.
+ * Use kvfree() to free the allocated memory.
+ *
+ * The kvzalloc_hdr_arr() function is easier to use as it has typechecking
+ * and you do not need to remember which of the arguments should be constants.
+ *
+ * Context: Process context.  May sleep; the @gfp flags should be based on
+ *	    %GFP_KERNEL.
+ * Return: A pointer to the allocated memory or %NULL.
+ */
+static inline __must_check
+void *kvmalloc_ab_c(size_t n, size_t size, size_t c, gfp_t gfp)
+{
+	if (size != 0 && n > (SIZE_MAX - c) / size)
+		return NULL;
+
+	return kvmalloc(n * size + c, gfp);
+}
+#define kvzalloc_ab_c(a, b, c, gfp)	kvmalloc_ab_c(a, b, c, gfp | __GFP_ZERO)
+
+/**
+ * kvzalloc_struct() - Allocate and zero-fill a structure containing a
+ *		       variable length array.
+ * @p: Pointer to the structure.
+ * @member: Name of the array member.
+ * @n: Number of elements in the array.
+ * @gfp: Memory allocation flags.
+ *
+ * Allocate (and zero-fill) enough memory for a structure with an array
+ * of @n elements.  This function is safe to use when @n is specified by
+ * userspace as the arithmetic will not overflow.
+ * Use kvfree() to free the allocated memory.
+ *
+ * Context: Process context.  May sleep; the @gfp flags should be based on
+ *	    %GFP_KERNEL.
+ * Return: Zero-filled memory or a NULL pointer.
+ */
+#define kvzalloc_struct(p, member, n, gfp)				\
+	(typeof(p))kvzalloc_ab_c(n,					\
+		sizeof(*(p)->member) + __must_be_array((p)->member),	\
+		offsetof(typeof(*(p)), member), gfp)
+
 extern void kvfree(const void *addr);
 
 static inline atomic_t *compound_mapcount_ptr(struct page *page)