[9/9] mm/vmalloc: add __alloc_size attributes for better bounds checking

Message ID 20210910031046.G76dQvPhV%akpm@linux-foundation.org (mailing list archive)
State New
Series [1/9] mm: move kvmalloc-related functions to slab.h

Commit Message

Andrew Morton Sept. 10, 2021, 3:10 a.m. UTC
From: Kees Cook <keescook@chromium.org>
Subject: mm/vmalloc: add __alloc_size attributes for better bounds checking

As already done in GrapheneOS, add the __alloc_size attribute for
appropriate vmalloc allocator interfaces, to provide additional hinting
for better bounds checking, assisting CONFIG_FORTIFY_SOURCE and other
compiler optimizations.

Link: https://lkml.kernel.org/r/20210818214021.2476230-8-keescook@chromium.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Co-developed-by: Daniel Micay <danielmicay@gmail.com>
Signed-off-by: Daniel Micay <danielmicay@gmail.com>
Cc: Andy Whitcroft <apw@canonical.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: David Rientjes <rientjes@google.com>
Cc: Dennis Zhou <dennis@kernel.org>
Cc: Dwaipayan Ray <dwaipayanray1@gmail.com>
Cc: Joe Perches <joe@perches.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Tejun Heo <tj@kernel.org>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/vmalloc.h |   11 +++++++++++
 1 file changed, 11 insertions(+)
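
For readers unfamiliar with the attribute, here is a minimal userspace sketch of the kind of hint the commit message describes. It is not the patch itself: `my_alloc_size`, `demo_alloc` and `demo_known_size` are hypothetical names, and `malloc` stands in for the kernel allocator. The attribute is placed on its own line before the prototype, mirroring the patch under review.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's __alloc_size() macro. */
#define my_alloc_size(...) __attribute__((__alloc_size__(__VA_ARGS__)))

/* Argument 1 is the size, in bytes, of the returned buffer. */
my_alloc_size(1)
void *demo_alloc(size_t size);

void *demo_alloc(size_t size)
{
	return malloc(size);
}

/*
 * Returns what the compiler believes the size of the buffer to be:
 * 16 when the alloc_size hint could be used, or (size_t)-1 when the
 * size is unknown to the compiler (typically without optimization).
 */
size_t demo_known_size(void)
{
	char *p = demo_alloc(16);
	size_t known = __builtin_object_size(p, 0);

	assert(p != NULL);
	free(p);
	return known;
}
```

Whether the builtin resolves to 16 depends on the compiler and optimization level, so callers should treat `(size_t)-1` as "no information" rather than as an error.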

Comments

Linus Torvalds Sept. 10, 2021, 5:23 p.m. UTC | #1
On Thu, Sep 9, 2021 at 8:10 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> +__alloc_size(1)
>  extern void *vmalloc(unsigned long size);
[...]

All of these are added in the wrong place - inconsistent with the very
compiler documentation the patches add.

The function attributes are generally added _after_ the function,
although admittedly we've been quite confused here before.

But the very compiler documentation you point to in the patch that
adds these macros gives that as the examples both for gcc and clang:

+ *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute
+ * clang: https://clang.llvm.org/docs/AttributeReference.html#alloc-size

and honestly I think that is the preferred format because this is
about the *function*, not about the return type.

Do both placements work? Yes.

Have we been confused about this before? Yes. I note that our __printf
attributes in particular have been added in odd places. And our
existing __malloc annotations seem to be correct in <linux/slab.h> and
<linux/device.h> but then randomly applied in some other places.

I really think it's pointlessly stupid and hard to read/grep for to
make it be a separate line before the whole thing.

I also think it should have taken over the "__malloc" name that is
almost unused right now. Because why would you ever have
__alloc_size() without having __malloc().

So wouldn't this all have looked much more natural as

     void *vmalloc(unsigned long size) __malloc(1);

instead? Hmm?

              Linus
Kees Cook Sept. 10, 2021, 6:43 p.m. UTC | #2
On Fri, Sep 10, 2021 at 10:23:48AM -0700, Linus Torvalds wrote:
> On Thu, Sep 9, 2021 at 8:10 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > +__alloc_size(1)
> >  extern void *vmalloc(unsigned long size);
> [...]
> 
> All of these are added in the wrong place - inconsistent with the very
> compiler documentation the patches add.
> 
> The function attributes are generally added _after_ the function,
> although admittedly we've been quite confused here before.
> 
> But the very compiler documentation you point to in the patch that
> adds these macros gives that as the examples both for gcc and clang:
> 
> + *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute
> + * clang: https://clang.llvm.org/docs/AttributeReference.html#alloc-size
> 
> and honestly I think that is the preferred format because this is
> about the *function*, not about the return type.
> 
> Do both placements work? Yes.
> 
> Have we been confused about this before? Yes. I note that our __printf
> attributes in particular have been added in odd places. And our
> existing __malloc annotations seem to be correct in <linux/slab.h> and
> <linux/device.h> but then randomly applied in some other places.
> 
> I really think it's pointlessly stupid and hard to read/grep for to
> make it be a separate line before the whole thing.

This was bike-shed on the list, and this result seemed to be consensus,
but I kind of dislike all the options. Either things are on separate
lines or they're trailing attributes that get really long, etc. Ugh.

I'm happy to clean all of it up into whatever form can be agreed on for
the "correct" placement.

> I also think it should have taken over the "__malloc" name that is
> almost unused right now. Because why would you ever have
> __alloc_size() without having __malloc().

I had originally set out to do that, but the problem with merging with
__malloc is the bit in the docs about "and that the memory has undefined
content". So we can't do that for kmalloc() in the face of GFP_ZERO, as
well as a bunch of other helpers. I always get suspicious about "this
will improve optimization because we depend on claiming something is
'undefined'". :|

-Kees
Linus Torvalds Sept. 10, 2021, 7:17 p.m. UTC | #3
On Fri, Sep 10, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote:
>
> I had originally set out to do that, but the problem with merging with
> __malloc is the bit in the docs about "and that the memory has undefined
> content". So we can't do that for kmalloc() in the face of GFP_ZERO, as
> well as a bunch of other helpers. I always get suspicious about "this
> will improve optimization because we depend on claiming something is
> 'undefined'". :|

Oh, I had entirely missed that historical subtlety of __malloc.

Yeah, that would have been absolutely horrible. But it's not actually
really true.

It seems that the gcc people actually realized the problem, and fixed
the documentation:

  "Attribute malloc indicates that a function is malloc-like, i.e.,
that the pointer P returned by the function cannot alias any other
pointer valid when the function returns, and moreover no pointers to
valid objects occur in any storage addressed by P. In addition, the
GCC predicts that a function with the attribute returns non-null in
most cases"

IOW, it is purely about aliasing guarantees. Basically the guarantee
is that the memory that a "malloc" function returns can not alias
(directly or indirectly) any other allocations.

See

    https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Common-Function-Attributes.html#Common-Function-Attributes

So I think it's ok, and your reaction was entirely correct, but came
from looking at old documentation.

             Linus
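
As a small illustration of the aliasing-only reading of the attribute quoted above, here is a userspace sketch. `my_malloc`, `demo_zalloc` and `demo_zalloc_selftest` are hypothetical names, and `calloc` stands in for a zeroing kernel allocator. The point is only that a function returning zeroed memory can still carry the attribute, since the current GCC wording makes no claim about the content being undefined.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's __malloc macro. */
#define my_malloc __attribute__((__malloc__))

/*
 * The returned pointer aliases nothing else that is live, and the fresh
 * storage holds no pointers to valid objects. Zeroed content is fine.
 */
void *demo_zalloc(size_t size) my_malloc;

void *demo_zalloc(size_t size)
{
	return calloc(1, size);
}

/* Usage sketch: two live allocations are distinct and both start zeroed. */
void demo_zalloc_selftest(void)
{
	unsigned char *a = demo_zalloc(8);
	unsigned char *b = demo_zalloc(8);

	assert(a != NULL && b != NULL);
	assert(a != b);
	assert(a[0] == 0 && b[7] == 0);
	free(a);
	free(b);
}
```

A realloc-style function would not qualify, because the storage it returns may contain pointers to existing objects.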
Kees Cook Sept. 10, 2021, 7:32 p.m. UTC | #4
On Fri, Sep 10, 2021 at 12:17:40PM -0700, Linus Torvalds wrote:
> On Fri, Sep 10, 2021 at 11:43 AM Kees Cook <keescook@chromium.org> wrote:
> >
> > I had originally set out to do that, but the problem with merging with
> > __malloc is the bit in the docs about "and that the memory has undefined
> > content". So we can't do that for kmalloc() in the face of GFP_ZERO, as
> > well as a bunch of other helpers. I always get suspicious about "this
> > will improve optimization because we depend on claiming something is
> > 'undefined'". :|
> 
> Oh, I had entirely missed that historical subtlety of __malloc.
> 
> Yeah, that would have been absolutely horrible. But it's not actually
> really true.
> 
> It seems that the gcc people actually realized the problem, and fixed
> the documentation:
> 
>   "Attribute malloc indicates that a function is malloc-like, i.e.,
> that the pointer P returned by the function cannot alias any other
> pointer valid when the function returns, and moreover no pointers to
> valid objects occur in any storage addressed by P. In addition, the
> GCC predicts that a function with the attribute returns non-null in
> most cases"
> 
> IOW, it is purely about aliasing guarantees. Basically the guarantee
> is that the memory that a "malloc" function returns can not alias
> (directly or indirectly) any other allocations.
> 
> See
> 
>     https://gcc.gnu.org/onlinedocs/gcc-11.2.0/gcc/Common-Function-Attributes.html#Common-Function-Attributes
> 
> So I think it's ok, and your reaction was entirely correct, but came
> from looking at old documentation.

Okay, sounds good. The other reason for having them be separate is that
some of our allocators are implicitly sized (i.e. kmem_cache_alloc()),
so there isn't actually a "size" argument to give. I suppose some kind
of VARARGS macro magic could be used to make __malloc() be valid, but
I don't like that in the face of future changes where people just don't
include the argument by accident.

How about the other way around, where __malloc is included in
__alloc_size()? Then the implicitly-sized allocators are left unchanged
with __malloc.

For the mechanical part of this, I'm left needing an answer to "what's
the style guide for this?" in the face of these longer definitions,
especially in the face of potential future trailing attributes.

e.g. all on one line would be 119 characters, way past even the updated
100 character limit:

__must_check static inline void *krealloc_array(void *p, size_t new_n, size_t new_size, gfp_t flags) __alloc_size(2, 3)
{
	...
}

Maybe this? I find it weird still:

__must_check static inline void *krealloc_array(void *p, size_t new_n,
						size_t new_size, gfp_t flags)
						__alloc_size(2, 3)
{
	...
}
Nick Desaulniers Sept. 10, 2021, 7:49 p.m. UTC | #5
On Fri, Sep 10, 2021 at 10:24 AM Linus Torvalds
<torvalds@linux-foundation.org> wrote:
>
> On Thu, Sep 9, 2021 at 8:10 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > +__alloc_size(1)
> >  extern void *vmalloc(unsigned long size);
> [...]
>
> All of these are added in the wrong place

Huh?
$ grep -rn __always_inline
$ grep -rn noinline
$ grep -rn __cold
...

Not that we have explicit guidance here in any coding style guide,
perhaps we can make a clarification in
Documentation/process/coding-style.rst?

I'd say when using function attributes, we tend to have the linkage
(ie. explicitly static or implicitly extern), followed by function
attributes related to inlining (__always_inline, noinline), followed
by return type, followed by function level attributes, followed by
parameter list. But I don't think we're even internally consistent
here.
Linus Torvalds Sept. 10, 2021, 8:16 p.m. UTC | #6
On Fri, Sep 10, 2021 at 12:49 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> Huh?
> $ grep -rn __always_inline
> $ grep -rn noinline
> $ grep -rn __cold

Those are all examples of basically storage classes.

And yes, they go to the front - generally even before the return
value. Exactly like 'extern' and 'static'.

In contrast, the "function argument descriptions" have
traditionally gone at the end, although you can certainly find us
screwing that up too.

This, btw, tends to be how the compiler documentation also does it.

So to a close approximation

 - "storage class" goes first, so "static inline" etc.

 - return type next (including attributes directly related to the
returned value - like "__must_check")

 - then function name and argument declaration

 - and finally the "function argument type attributes" at the end.

Can you do it in different orders? Yes. And the compiler won't even
generally warn about it. So we've gotten it wrong many many times.

I mean, compilers won't complain even about clear garbage that is _so_
bad that we generally get it right:

  int static myfn(void);

will build perfectly fine. That most certainly doesn't make it right.

Arguably "__malloc" could be seen as being about the returned type,
rather than about the function declaration. But if it was about the
returned type, you'd call it a "restrict" pointer, so the very name
kind of implies that it's about the behavior of the _function_ more
than the type of the return value.

And the "enumerate the arguments" of __alloc_size() makes it 100%
clear that it has absolutely NOTHING to do with the return type, and
is all about the function itself and which arguments give the size.

So the attribute goes at the end, not the front.

Are these a bit arbitrary? Sure. And because it's not checked, it's
not consistent. But I can only repeat: this is literally how the
compiler docs themselves tend to order things, pointed to in the very
patches that are under discussion.

So the rules may be arbitrary, but they are at least _somewhat_
internally consistent. Yes, you can always argue about whether some
behavior is about the returned type or whether it is about the
semantics of the function.

         Linus
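
The ordering listed in this message can be sketched on a prototype as follows. This is an illustration under assumptions, not kernel code: `my_must_check`, `my_alloc_size`, `demo_alloc_array` and `demo_use` are hypothetical names, and `calloc` stands in for a kernel allocator. The trailing attribute is put on the prototype only, and the definition is left plain.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical local spellings of the kernel macros named in the thread. */
#define my_must_check      __attribute__((__warn_unused_result__))
#define my_alloc_size(...) __attribute__((__alloc_size__(__VA_ARGS__)))

/*
 * Storage class first, then the return type with its attribute, then the
 * name and arguments, and last the attribute describing the arguments.
 */
static my_must_check void *demo_alloc_array(size_t n, size_t size)
	my_alloc_size(1, 2);

static void *demo_alloc_array(size_t n, size_t size)
{
	return calloc(n, size);
}

/* Usage sketch: the result must be consumed because of my_must_check. */
static void demo_use(void)
{
	void *p = demo_alloc_array(4, 8);

	assert(p != NULL);
	free(p);
}
```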
Kees Cook Sept. 10, 2021, 8:47 p.m. UTC | #7
On Fri, Sep 10, 2021 at 01:16:00PM -0700, Linus Torvalds wrote:
> So to a close approximation
> 
>  - "storage class" goes first, so "static inline" etc.
> 
>  - return type next (including attributes directly related to the
> returned value - like "__must_check")
> 
>  - then function name and argument declaration
> 
>  - and finally the "function argument type attributes" at the end.

I'm going to eventually forget this thread, so I want to get it into
our coding style so I can find it again more easily. :) How does this
look?

diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
index 42969ab37b34..3c72f0232f02 100644
--- a/Documentation/process/coding-style.rst
+++ b/Documentation/process/coding-style.rst
@@ -487,6 +487,29 @@ because it is a simple way to add valuable information for the reader.
 Do not use the ``extern`` keyword with function prototypes as this makes
 lines longer and isn't strictly necessary.
 
+.. code-block:: c
+
+	static __always_inline __must_check void *action(enum magic value,
+							 size_t size, u8 count,
+							 char *buffer)
+							__alloc_size(2, 3)
+	{
+		...
+	}
+
+When writing a function prototype, keep the order of elements regular. The
+desired order is "storage class", "return type attributes", "return
+type", name, arguments (as described earlier), followed by "function
+argument attributes". In the ``action`` function example above, ``static
+__always_inline`` is the "storage class" (even though ``__always_inline``
+is an attribute, it is treated like ``inline``). ``__must_check`` is
+a "return type attribute" (describing ``void *``). ``void *`` is the
+"return type". ``action`` is the function name, followed by the function
+arguments. Finally ``__alloc_size(2, 3)`` is a "function argument attribute",
+describing things about the function arguments. Some attributes, like
+``__malloc``, describe the behavior of the function more than they
+describe the function return type, and are more appropriately included
+in the "function argument attributes".
 
 7) Centralized exiting of functions
 -----------------------------------
Nick Desaulniers Sept. 10, 2021, 8:58 p.m. UTC | #8
On Fri, Sep 10, 2021 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Fri, Sep 10, 2021 at 01:16:00PM -0700, Linus Torvalds wrote:
> > So to a close approximation
> >
> >  - "storage class" goes first, so "static inline" etc.
> >
> >  - return type next (including attributes directly related to the
> > returned value - like "__must_check")
> >
> >  - then function name and argument declaration
> >
> >  - and finally the "function argument type attributes" at the end.
>
> I'm going to eventually forget this thread, so I want to get it into
> our coding style so I can find it again more easily. :) How does this
> look?
>
> diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> index 42969ab37b34..3c72f0232f02 100644
> --- a/Documentation/process/coding-style.rst
> +++ b/Documentation/process/coding-style.rst
> @@ -487,6 +487,29 @@ because it is a simple way to add valuable information for the reader.
>  Do not use the ``extern`` keyword with function prototypes as this makes
>  lines longer and isn't strictly necessary.
>
> +.. code-block:: c
> +
> +       static __always_inline __must_check void *action(enum magic value,
> +                                                        size_t size, u8 count,
> +                                                        char *buffer)
> +                                                       __alloc_size(2, 3)
> +       {
> +               ...
> +       }
> +
> +When writing a function prototype, keep the order of elements regular. The
> +desired order is "storage class", "return type attributes", "return
> +type", name, arguments (as described earlier), followed by "function
> +argument attributes". In the ``action`` function example above, ``static
> +__always_inline`` is the "storage class" (even though ``__always_inline``
> +is an attribute, it is treated like ``inline``). ``__must_check`` is

eh...mentioning inline as though it was a storage class doesn't seem
precise, but I think this is a good start. Thanks Kees.

Acked-by: Nick Desaulniers <ndesaulniers@google.com>

Worst case, consider "inlining related attributes like __always_inline
and noinline should follow the storage class (static, extern)".

> +a "return type attribute" (describing ``void *``). ``void *`` is the
> +"return type". ``action`` is the function name, followed by the function
> +arguments. Finally ``__alloc_size(2, 3)`` is a "function argument attribute",
> +describing things about the function arguments. Some attributes, like
> +``__malloc``, describe the behavior of the function more than they
> +describe the function return type, and are more appropriately included
> +in the "function argument attributes".
>
>  7) Centralized exiting of functions
>  -----------------------------------
>
> --
> Kees Cook
Kees Cook Sept. 10, 2021, 9:07 p.m. UTC | #9
On Fri, Sep 10, 2021 at 01:58:17PM -0700, Nick Desaulniers wrote:
> On Fri, Sep 10, 2021 at 1:47 PM Kees Cook <keescook@chromium.org> wrote:
> >
> > On Fri, Sep 10, 2021 at 01:16:00PM -0700, Linus Torvalds wrote:
> > > So to a close approximation
> > >
> > >  - "storage class" goes first, so "static inline" etc.
> > >
> > >  - return type next (including attributes directly related to the
> > > returned value - like "__must_check")
> > >
> > >  - then function name and argument declaration
> > >
> > >  - and finally the "function argument type attributes" at the end.
> >
> > I'm going to eventually forget this thread, so I want to get it into
> > our coding style so I can find it again more easily. :) How does this
> > look?
> >
> > diff --git a/Documentation/process/coding-style.rst b/Documentation/process/coding-style.rst
> > index 42969ab37b34..3c72f0232f02 100644
> > --- a/Documentation/process/coding-style.rst
> > +++ b/Documentation/process/coding-style.rst
> > @@ -487,6 +487,29 @@ because it is a simple way to add valuable information for the reader.
> >  Do not use the ``extern`` keyword with function prototypes as this makes
> >  lines longer and isn't strictly necessary.
> >
> > +.. code-block:: c
> > +
> > +       static __always_inline __must_check void *action(enum magic value,
> > +                                                        size_t size, u8 count,
> > +                                                        char *buffer)
> > +                                                       __alloc_size(2, 3)
> > +       {
> > +               ...
> > +       }
> > +
> > +When writing a function prototype, keep the order of elements regular. The
> > +desired order is "storage class", "return type attributes", "return
> > +type", name, arguments (as described earlier), followed by "function
> > +argument attributes". In the ``action`` function example above, ``static
> > +__always_inline`` is the "storage class" (even though ``__always_inline``
> > +is an attribute, it is treated like ``inline``). ``__must_check`` is
> 
> eh...mentioning inline as though it was a storage class doesn't seem
> precise, but I think this is a good start. Thanks Kees.

Well, hm, it's kinda like that? "where does it go?" "*everywhere*" :P
In looking at this a little longer, I do wonder about section attributes,
though. __cold is a hint, but ends up being a section attribute. And
section attributes appear to be used in the storage class (i.e.
"noinstr"). We treat "how the function should behave" attributes as
storage classes too, though (e.g. "notrace"). Is that right?

> 
> Acked-by: Nick Desaulniers <ndesaulniers@google.com>
> 
> Worst case, consider "inlining related attributes like __always_inline
> and noinline should follow the storage class (static, extern)".
> 
> > +a "return type attribute" (describing ``void *``). ``void *`` is the
> > +"return type". ``action`` is the function name, followed by the function
> > +arguments. Finally ``__alloc_size(2, 3)`` is a "function argument attribute",
> > +describing things about the function arguments. Some attributes, like
> > +``__malloc``, describe the behavior of the function more than they
> > +describe the function return type, and are more appropriately included
> > +in the "function argument attributes".
> >
> >  7) Centralized exiting of functions
> >  -----------------------------------
> >
> > --
> > Kees Cook
> 
> 
> 
> -- 
> Thanks,
> ~Nick Desaulniers
Joe Perches Sept. 11, 2021, 5:29 a.m. UTC | #10
On Fri, 2021-09-10 at 10:23 -0700, Linus Torvalds wrote:
> On Thu, Sep 9, 2021 at 8:10 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > 
> > +__alloc_size(1)
> >  extern void *vmalloc(unsigned long size);
[]
> So wouldn't this all have looked much more natural as
> 
>      void *vmalloc(unsigned long size) __malloc(1);
> 
> instead? Hmm?

I think not as the __malloc attribute and in fact all the other function
attributes are effectively uninteresting when it comes to grepping for
function declarations.

Putting the attribute lists after the function arguments in many
cases would just be visual noise.

My preference would be for declarations to be mostly like:

[optional attribute list]
<return type> function(arguments);

Frequently there are multiline function declarations with many
arguments similar to either of

[optional attribute list]
<return type> function(type arg1,
		       type arg2,
		       etc...);

or

[optional attribute list]
<return type>
function(type arg1,
	 type arg2,
	 etc...);

which always makes grep rather difficult.

And given the expansion in naming lengths there are more and more
of these multiline argument lists.  It doesn't matter if the line
length is increased above 80 columns or not.
Kees Cook Sept. 21, 2021, 11:37 p.m. UTC | #11
On Fri, Sep 10, 2021 at 10:23:48AM -0700, Linus Torvalds wrote:
> On Thu, Sep 9, 2021 at 8:10 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> >
> > +__alloc_size(1)
> >  extern void *vmalloc(unsigned long size);
> [...]
> 
> All of these are added in the wrong place - inconsistent with the very
> compiler documentation the patches add.
> 
> The function attributes are generally added _after_ the function,
> although admittedly we've been quite confused here before.
> 
> But the very compiler documentation you point to in the patch that
> adds these macros gives that as the examples both for gcc and clang:
> 
> + *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute
> + * clang: https://clang.llvm.org/docs/AttributeReference.html#alloc-size
> 
> and honestly I think that is the preferred format because this is
> about the *function*, not about the return type.
> 
> Do both placements work? Yes.

I'm cleaning this up now, and have discovered that the reason for the
before-function placement is consistency with static inlines. If I do this:

static __always_inline void * kmalloc(size_t size, gfp_t flags) __alloc_size(1)
{
	...
}

GCC is very angry:

./include/linux/slab.h:519:1: error: attributes should be specified before the declarator in a function definition
  519 | static __always_inline void *kmalloc_large(size_t size, gfp_t flags) __alloc_size(1)
      | ^~~~~~

It's happy if I treat it as a "return type attribute" in the ordering,
though:

static __always_inline void * __alloc_size(1) kmalloc(size_t size, gfp_t flags)

I'll do that unless you have a preference for somewhere else...

-Kees
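
For reference, here is a sketch of two arrangements that avoid the error quoted in this message, under the assumption that only the trailing position on a function definition is rejected. `my_alloc_size`, `demo_alloc_a`, `demo_alloc_b` and `demo_both` are hypothetical names. The second variant places the attribute among the declaration specifiers, which is one of several positions ahead of the declarator and not necessarily the one chosen in the thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's __alloc_size() macro. */
#define my_alloc_size(...) __attribute__((__alloc_size__(__VA_ARGS__)))

/* Option 1: trailing attribute on a prototype, plain definition after it. */
static inline void *demo_alloc_a(size_t size) my_alloc_size(1);

static inline void *demo_alloc_a(size_t size)
{
	return malloc(size);
}

/* Option 2: attribute ahead of the declarator, directly on the definition. */
static inline my_alloc_size(1) void *demo_alloc_b(size_t size)
{
	return malloc(size);
}

/* Usage sketch: both variants behave identically at run time. */
static void demo_both(void)
{
	void *a = demo_alloc_a(8);
	void *b = demo_alloc_b(8);

	assert(a != NULL && b != NULL);
	free(a);
	free(b);
}
```

Option 1 keeps the trailing style for prototypes at the cost of a duplicated declaration for every static inline, which is the consistency problem raised above.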
Joe Perches Sept. 21, 2021, 11:45 p.m. UTC | #12
On Tue, 2021-09-21 at 16:37 -0700, Kees Cook wrote:
> On Fri, Sep 10, 2021 at 10:23:48AM -0700, Linus Torvalds wrote:
> > On Thu, Sep 9, 2021 at 8:10 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > 
> > > +__alloc_size(1)
> > >  extern void *vmalloc(unsigned long size);
> > [...]
> > 
> > All of these are added in the wrong place - inconsistent with the very
> > compiler documentation the patches add.
> > 
> > The function attributes are generally added _after_ the function,
> > although admittedly we've been quite confused here before.
> > 
> > But the very compiler documentation you point to in the patch that
> > adds these macros gives that as the examples both for gcc and clang:
> > 
> > + *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute
> > + * clang: https://clang.llvm.org/docs/AttributeReference.html#alloc-size
> > 
> > and honestly I think that is the preferred format because this is
> > about the *function*, not about the return type.
> > 
> > Do both placements work? Yes.
> 
> I'm cleaning this up now, and have discovered that the reason for the
> before-function placement is consistency with static inlines. If I do this:
> 
> static __always_inline void * kmalloc(size_t size, gfp_t flags) __alloc_size(1)
> {
> 	...
> }
> 
> GCC is very angry:
> 
> ./include/linux/slab.h:519:1: error: attributes should be specified before the declarator in a function definition
>   519 | static __always_inline void *kmalloc_large(size_t size, gfp_t flags) __alloc_size(1)
>       | ^~~~~~
> 
> It's happy if I treat it as a "return type attribute" in the ordering,
> though:
> 
> static __always_inline void * __alloc_size(1) kmalloc(size_t size, gfp_t flags)
> 
> I'll do that unless you have a preference for somewhere else...

_please_ put it before the return type on a separate line.

[__attributes]
[static inline const] <return type> function(<args...>)
Kees Cook Sept. 22, 2021, 2:25 a.m. UTC | #13
On Tue, Sep 21, 2021 at 04:45:44PM -0700, Joe Perches wrote:
> On Tue, 2021-09-21 at 16:37 -0700, Kees Cook wrote:
> > On Fri, Sep 10, 2021 at 10:23:48AM -0700, Linus Torvalds wrote:
> > > On Thu, Sep 9, 2021 at 8:10 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > > > 
> > > > +__alloc_size(1)
> > > >  extern void *vmalloc(unsigned long size);
> > > [...]
> > > 
> > > All of these are added in the wrong place - inconsistent with the very
> > > compiler documentation the patches add.
> > > 
> > > The function attributes are generally added _after_ the function,
> > > although admittedly we've been quite confused here before.
> > > 
> > > But the very compiler documentation you point to in the patch that
> > > adds these macros gives that as the examples both for gcc and clang:
> > > 
> > > + *   gcc: https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attributes.html#index-alloc_005fsize-function-attribute
> > > + * clang: https://clang.llvm.org/docs/AttributeReference.html#alloc-size
> > > 
> > > and honestly I think that is the preferred format because this is
> > > about the *function*, not about the return type.
> > > 
> > > Do both placements work? Yes.
> > 
> > I'm cleaning this up now, and have discovered that the reason for the
> > before-function placement is consistency with static inlines. If I do this:
> > 
> > static __always_inline void * kmalloc(size_t size, gfp_t flags) __alloc_size(1)
> > {
> > 	...
> > }
> > 
> > GCC is very angry:
> > 
> > ./include/linux/slab.h:519:1: error: attributes should be specified before the declarator in a function definition
> >   519 | static __always_inline void *kmalloc_large(size_t size, gfp_t flags) __alloc_size(1)
> >       | ^~~~~~
> > 
> > It's happy if I treat it as a "return type attribute" in the ordering,
> > though:
> > 
> > static __always_inline void * __alloc_size(1) kmalloc(size_t size, gfp_t flags)
> > 
> > I'll do that unless you have a preference for somewhere else...
> 
> _please_ put it before the return type on a separate line.
> 
> [__attributes]
> [static inline const] <return type> function(<args...>)

Somehow Linus wasn't in CC. :P

Linus, what do you want here? I keep getting conflicting (or
uncompilable) advice. I'm also trying to prepare a patch for
Documentation/process/coding-style.rst ...

Looking through what was written before[1] and through examples in the
source tree, I find the following categories:

1- storage class: static extern inline __always_inline
2- storage class attributes/hints/???: __init __cold
3- return type: void *
4- return type attributes: __must_check __noreturn __assume_aligned(n)
5- function attributes: __attribute_const__ __malloc
6- function argument attributes: __printf(n, m) __alloc_size(n)

Everyone seems to basically agree on:

[storage class] [return type] [return type attributes] [name]([arg1type] [arg1name], ...)

There is a lot of disagreement over where 5 and 6 should fit in above. And
there is a lot of confusion over 4 (mixed between before and after the
function name) and 2 (see below).

What's currently blocking me is that 6 cannot go after the function
(for definitions) because it angers GCC (see quoted bit above), but 5
can (e.g. __attribute_const__).

Another inconsistency seems to be 2 (mainly section markings like
__init). Sometimes it's after the storage class and sometimes after the
return type, but it certainly feels more like a storage class than a
return type attribute:

$ git grep 'static __init int' | wc -l
349
$ git grep 'static int __init' | wc -l
8402

But it's clearly positioned like a return type attribute in most of the
tree. What's correct?

Regardless, given the constraints above, it seems like what Linus may
want is (on "one line", though it will get wrapped in pathological cases
like kmem_cache_alloc_node_trace):

[storage class] [storage class attributes] [return type] [return type attributes] [function argument attributes] [name]([arg1type] [arg1name], ...) [function attributes]

Joe appears to want (on two lines):

[storage class attributes] [function attributes] [function argument attributes]
[storage class] [return type] [return type attributes] [name]([arg1type] [arg1name], ...)

I would just like to have an arrangement that won't get NAKed by
someone. ;) And I'm willing to document it. :)

-Kees

[1] https://lore.kernel.org/mm-commits/CAHk-=wiOCLRny5aifWNhr621kYrJwhfURsa0vFPeUEm8mF0ufg@mail.gmail.com/
Joe Perches Sept. 22, 2021, 4:24 a.m. UTC | #14
On Tue, 2021-09-21 at 19:25 -0700, Kees Cook wrote:
> [...]
> Looking through what was written before[1] and through examples in the
> source tree, I find the following categories:
> 
> 1- storage class: static extern inline __always_inline
> 2- storage class attributes/hints/???: __init __cold
> 3- return type: void *
> 4- return type attributes: __must_check __noreturn __assume_aligned(n)
> 5- function attributes: __attribute_const__ __malloc
> 6- function argument attributes: __printf(n, m) __alloc_size(n)
> 
> Everyone seems to basically agree on:
> 
> [storage class] [return type] [return type attributes] [name]([arg1type] [arg1name], ...)
> 
> There is a lot of disagreement over where 5 and 6 should fit in above. And
> there is a lot of confusion over 4 (mixed between before and after the
> function name) and 2 (see below).
> 
> What's currently blocking me is that 6 cannot go after the function
> (for definitions) because it angers GCC (see quoted bit above), but 5
> can (e.g. __attribute_const__).
> 
> Another inconsistency seems to be 2 (mainly section markings like
> __init). Sometimes it's after the storage class and sometimes after the
> return type, but it certainly feels more like a storage class than a
> return type attribute:
> 
> $ git grep 'static __init int' | wc -l
> 349
> $ git grep 'static int __init' | wc -l
> 8402
> 
> But it's clearly positioned like a return type attribute in most of the
> tree. What's correct?

Neither really.  'Correct' is such a difficult concept.
'Preferred' might be better.

btw: there are about another 100 uses with '__init' as the
initial attribute, mostly in trace.

And I still think that return type attributes like __init, which is
just a __section define, should go before the function storage class
and ideally on a separate line to simplify the parsing of the actual
function declaration.  Attributes like __section, __aligned, __cold,
etc... don't have much value when looking up a function definition.

> Regardless, given the constraints above, it seems like what Linus may
> want is (on "one line", though it will get wrapped in pathological cases
> like kmem_cache_alloc_node_trace):

Pathological is pretty common these days as the function name length
is rather longer now than earlier times.
 
> [storage class] [storage class attributes] [return type] [return type attributes] [function argument attributes] [name]([arg1type] [arg1name], ...) [function attributes]
> 
> Joe appears to want (on two lines):
> 
> [storage class attributes] [function attributes] [function argument attributes]
> [storage class] [return type] [return type attributes] [name]([arg1type] [arg1name], ...)

I would put [return type attributes] on the initial separate line
even though that's not the most common use today.

> I would just like to have an arrangement that won't get NAKed by
> someone. ;)

Bikeshed building dreamer...

btw:

Scouting through kernel code for frequency of use examples really
should have some age of code checking associated to the use.

Older code was far more freeform than more recently written code.

But IMO the desire here is to ask for a bit more uniformity, not
require it.
Alexey Dobriyan Sept. 22, 2021, 7:24 a.m. UTC | #15
On Tue, Sep 21, 2021 at 07:25:53PM -0700, Kees Cook wrote:
> [...]
> Regardless, given the constraints above, it seems like what Linus may
> want is (on "one line", though it will get wrapped in pathological cases
> like kmem_cache_alloc_node_trace):
> 
> [storage class] [storage class attributes] [return type] [return type attributes] [function argument attributes] [name]([arg1type] [arg1name], ...) [function attributes]
> 
> Joe appears to want (on two lines):
> 
> [storage class attributes] [function attributes] [function argument attributes]
> [storage class] [return type] [return type attributes] [name]([arg1type] [arg1name], ...)
> 
> I would just like to have an arrangement that won't get NAKed by
> someone. ;) And I'm willing to document it. :)

Attributes should be on their own line, they can be quite lengthy.

	__attribute__((...))
	[static] [inline] T f(A1 arg1, ...)
	{
		...
	}

There will be even more attributes in the future, both added by
compilers and developers (const, pure, WUR), so let's make "prototype lane"
for them.
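
A minimal sketch of that layout on a function definition, assuming GCC or
Clang. demo_alloc_size() and demo_alloc() are hypothetical stand-ins for
the kernel's __alloc_size() and an allocator; this is not kernel code:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for the kernel's __alloc_size() macro. */
#define demo_alloc_size(x)	__attribute__((__alloc_size__(x)))

/* Attribute on its own line, ahead of the storage class and return type. */
demo_alloc_size(1)
static inline void *demo_alloc(size_t size)
{
	return malloc(size);
}
```

Both compilers should accept this placement on a definition; whether it is
the preferred style is exactly what is being argued here.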

Same for structures:

	__attribute__((packed))
	struct S {
	};

The kernel practice of hiding attributes under defines (__ro_after_init)
breaks ctags, which parses the last identifier before the semicolon as the
object name. Naturally, it is a ctags bug, but placing attributes before
the declaration will automatically unbreak such cases.
Joe Perches Sept. 22, 2021, 8:51 a.m. UTC | #16
On Wed, 2021-09-22 at 10:24 +0300, Alexey Dobriyan wrote:

> Attributes should be on their own line, they can be quite lengthy.
> 
> 	__attribute__((...))
> 	[static] [inline] T f(A1 arg1, ...)
> 	{
> 		...
> 	}
> 
> There will be even more attributes in the future, both added by
> compilers and developers (const, pure, WUR), so let's make "prototype lane"
> for them.
> 
> Same for structures:
> 
> 	__attribute__((packed))
> 	struct S {
> 	};

Do you know if placing attributes like __packed/__aligned() before
definitions would work for all cases for structs/substructs/unions?
Alexey Dobriyan Sept. 22, 2021, 10:45 a.m. UTC | #17
On Wed, Sep 22, 2021 at 01:51:34AM -0700, Joe Perches wrote:
> On Wed, 2021-09-22 at 10:24 +0300, Alexey Dobriyan wrote:
> 
> > Attributes should be on their own line, they can be quite lengthy.
> > 
> > 	__attribute__((...))
> > 	[static] [inline] T f(A1 arg1, ...)
> > 	{
> > 		...
> > 	}
> > 
> > There will be even more attributes in the future, both added by
> > compilers and developers (const, pure, WUR), so let's make "prototype lane"
> > for them.
> > 
> > Same for structures:
> > 
> > 	__attribute__((packed))
> > 	struct S {
> > 	};
> 
> Do you know if placing attributes like __packed/__aligned() before
> definitions would work for all cases for structs/substructs/unions?

Somehow, it doesn't.

But it works for members:

	struct S {
		__attribute__((aligned(16)))
		int a;
	};
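
A small check of the member case, assuming GCC or Clang on a target where
int is 4 bytes. struct demo_member and demo_member_offset() are
hypothetical names used only for illustration:

```c
#include <stddef.h>

/* The attribute precedes the member it applies to. */
struct demo_member {
	char c;
	__attribute__((aligned(16)))
	int a;
};

/* Offset of the over-aligned member within the structure. */
static size_t demo_member_offset(void)
{
	return offsetof(struct demo_member, a);
}
```

With the attribute honoured, the member should land on a 16-byte boundary
and the structure size should be padded to a multiple of 16.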
Jani Nikula Sept. 22, 2021, 11:19 a.m. UTC | #18
On Wed, 22 Sep 2021, Alexey Dobriyan <adobriyan@gmail.com> wrote:
> [...]
>
> Attributes should be on their own line, they can be quite lengthy.
>
> 	__attribute__((...))
> 	[static] [inline] T f(A1 arg1, ...)
> 	{
> 		...
> 	}
>
> There will be even more attributes in the future, both added by
> compilers and developers (const, pure, WUR), so let's make "prototype lane"
> for them.
>
> Same for structures:
>
> 	__attribute__((packed))
> 	struct S {
> 	};
>
> The kernel practice of hiding attributes under defines (__ro_after_init)
> breaks ctags, which parses the last identifier before the semicolon as the
> object name. Naturally, it is a ctags bug, but placing attributes before
> the declaration will automatically unbreak such cases.

git grep seems to suggest __packed is preferred over
__attribute__((packed)), and at the end of the struct declaration
instead of at front:

	struct S {
		/* ... */
	} __packed;

And GNU Global handles this just fine. ;)
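
A quick sketch of the trailing form, assuming GCC or Clang and a 4-byte
int. demo_packed is a hypothetical stand-in for the kernel's __packed
macro, and both structure names are made up for illustration:

```c
#include <stddef.h>

/* Hypothetical stand-in for the kernel's __packed macro. */
#define demo_packed	__attribute__((__packed__))

/* Default layout: padding is inserted after 'c' so that 'a' is aligned. */
struct demo_plain {
	char c;
	int a;
};

/* Attribute after the closing brace, as in the example above. */
struct demo_tight {
	char c;
	int a;
} demo_packed;
```

The packed variant should have no padding between the members, so it
ends up smaller than the plain one.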


BR,
Jani.
Linus Torvalds Sept. 22, 2021, 9:15 p.m. UTC | #19
On Wed, Sep 22, 2021 at 12:24 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
>
> Attributes should be on their own line, they can be quite lengthy.

No, no no. They really shouldn't.

First off, no normal code should use that "__attribute__(())" syntax
anyway. It's ugly and big, and many of the attributes are
compiler-specific anyway.

So the "quite lengthy" argument is bogus, because the actual names you
should use are things like "__packed" or "__pure" or "__user" etc.

But the "on their own line" is complete garbage to begin with. That
will NEVER be a kernel rule. We should never have a rule that assumes
things are so long that they need to be on multiple lines.

We don't put function return types on their own lines either, even if
some other projects have that rule (just to get function names at the
beginning of lines or some other odd reason).

So no, attributes do not go on their own lines, and they also
generally don't go before the thing they describe.  Your examples are
wrong, and explicitly against kernel rules.

           Linus
Joe Perches Sept. 23, 2021, 5:10 a.m. UTC | #20
On Wed, 2021-09-22 at 14:15 -0700, Linus Torvalds wrote:
> On Wed, Sep 22, 2021 at 12:24 AM Alexey Dobriyan <adobriyan@gmail.com> wrote:
> > 
> > Attributes should be on their own line, they can be quite lengthy.
> 
> No, no no. They really shouldn't.
> 
> First off, no normal code should use that "__attribute__(())" syntax
> anyway. It's ugly and big, and many of the attributes are
> compiler-specific anyway.
> 
> So the "quite lengthy" argument is bogus, because the actual names you
> should use are things like "__packed" or "__pure" or "__user" etc.
> 
> But the "on their own line" is complete garbage to begin with. That
> will NEVER be a kernel rule. We should never have a rule that assumes
> things are so long that they need to be on multiple lines.

I think it's not so much that lines are long, it's more that the
information provided by these markings isn't particularly useful to
a caller/user of a function.

Under what circumstance is a marking like __pure/__cold or __section
useful to someone that just wants to call a particular function?

A secondary reason why these should be separate or at least put
at the beginning of a function declaration is compatibility with
existing tools like ctags.
Kees Cook Sept. 24, 2021, 7:43 p.m. UTC | #21
On Tue, Sep 21, 2021 at 09:24:04PM -0700, Joe Perches wrote:
> On Tue, 2021-09-21 at 19:25 -0700, Kees Cook wrote:
> > [...]
> > Looking through what was written before[1] and through examples in the
> > source tree, I find the following categories:
> > 
> > 1- storage class: static extern inline __always_inline
> > 2- storage class attributes/hints/???: __init __cold
> > 3- return type: void *
> > 4- return type attributes: __must_check __noreturn __assume_aligned(n)
> > 5- function attributes: __attribute_const__ __malloc
> > 6- function argument attributes: __printf(n, m) __alloc_size(n)
> > 
> > Everyone seems to basically agree on:
> > 
> > [storage class] [return type] [return type attributes] [name]([arg1type] [arg1name], ...)
> > 
> > There is a lot of disagreement over where 5 and 6 should fit in above. And
> > there is a lot of confusion over 4 (mixed between before and after the
> > function name) and 2 (see below).
> > 
> > What's currently blocking me is that 6 cannot go after the function
> > (for definitions) because it angers GCC (see quoted bit above), but 5
> > can (e.g. __attribute_const__).
> > 
> > Another inconsistency seems to be 2 (mainly section markings like
> > __init). Sometimes it's after the storage class and sometimes after the
> > return type, but it certainly feels more like a storage class than a
> > return type attribute:
> > 
> > $ git grep 'static __init int' | wc -l
> > 349
> > $ git grep 'static int __init' | wc -l
> > 8402
> > 
> > But it's clearly positioned like a return type attribute in most of the
> > tree. What's correct?
> 
> Neither really.  'Correct' is such a difficult concept.
> 'Preferred' might be better.

Right -- I expect it to be a guideline.

> btw: there are about another 100 other uses with '__init' as the
> initial attribute, mostly in trace.

Hah, yeah.

> And I still think that return type attributes like __init, which is
> just a __section define, should go before the function storage class
> and ideally on a separate line to simplify the parsing of the actual
> function declaration.  Attributes like __section, __aligned, __cold,
> etc... don't have much value when looking up a function definition.
> 
> > Regardless, given the constraints above, it seems like what Linus may
> > want is (on "one line", though it will get wrapped in pathological cases
> > like kmem_cache_alloc_node_trace):
> 
> Pathological is pretty common these days as the function name length
> is rather longer now than earlier times.

Agreed!

> > [storage class] [storage class attributes] [return type] [return type attributes] [function argument attributes] [name]([arg1type] [arg1name], ...) [function attributes]
> > 
> > Joe appears to want (on two lines):
> > 
> > [storage class attributes] [function attributes] [function argument attributes]
> > [storage class] [return type] [return type attributes] [name]([arg1type] [arg1name], ...)
> 
> I would put [return type attributes] on the initial separate line
> even though that's not the most common use today.

I found a few other people wanting separate lines too, so at the risk of
annoying Linus, I guess I'll attempt this (again).

> > I would just like to have an arrangement that won't get NAKed by
> > someone. ;)
> 
> Bikeshed building dreamer...

I just want to know the right place to put stuff. :P

> But IMO the desire here is to ask for a bit more uniformity, not
> require it.

Yeah.
David Laight Sept. 25, 2021, 7:40 p.m. UTC | #22
From: Linus Torvalds
> Sent: 22 September 2021 22:16
...
> We don't put function return types on their own lines either, even if
> some other projects have that rule (just to get function names at the
> beginning of lines or some other odd reason).

If the function name starts at the beginning of a line it is
much easier to grep for the definition.
Trying to find function definitions in the Linux kernel tree
is a PITA - unless they are exported when 'EXPORT.*(function_name)'
will tend to work.

Trying to compile:
static int x(int y) __attribute__((section("x"))) { return y;}
with gcc generates "error: attributes are not allowed on a function-definition".

Putting the attribute anywhere before the function name works fine.
gcc probably accepts:
__inline static __inline int __inline x(void) {return 0;} 

So any of those locations is plausible.
But after the arguments isn't allowed.
So an (extern) function declaration probably should not put them
there - if only for consistency.

I think I'd go for 'first' - optionally on their own line.
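
A sketch of that asymmetry, assuming GCC. demo_identity() is a
hypothetical function, and noinline merely stands in for any function
attribute:

```c
/* Declaration: GCC accepts the attribute after the argument list. */
int demo_identity(int y) __attribute__((__noinline__));

/* Definition: the attribute has to come before the declarator. */
__attribute__((__noinline__)) int demo_identity(int y)
{
	return y;
}
```

Moving the attribute in the definition to after the closing parenthesis
is the form GCC rejects.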

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Linus Torvalds Sept. 26, 2021, 9:03 p.m. UTC | #23
On Sat, Sep 25, 2021 at 12:40 PM David Laight <David.Laight@aculab.com> wrote:
>
> If the function name starts at the beginning of a line it is
> much easier to grep for the definition.

That has always been a completely bogus argument. I grep to look up
the type as often as I grep for the function definition, plus it's not
at all unlikely that the "function" is actually a macro wrapper, so
grepping for the beginning of line is just completely wrong.

It's completely wrong for another reason too: it assumes a style of
programming that has never actually been all that common. It's a very
specific pattern to very specific projects, and anybody who learnt
that pattern for their project is going to be completely lost anywhere
else. So don't do it. It's just a bad idea.

So a broken "easier to grep for" is not an excuse for "make the code
harder to read" particularly when it just makes another type of
grepping harder, and it's not actually nearly universal enough to
actually be a useful pattern in the first place.

It's not only never been the pattern in the kernel, but it's generally
not been the pattern anywhere else either. It's literally one of the
broken GNU coding standards - and the fact that almost every other
part of the GNU coding standards were wrong (indentation, placement of
braces, you name it) should give you a hint about how good _that_ one
was.

Here's an exercise for you: go search for C coding examples on the
web, and see how many of them do

    int main(int argc, char **argv)

vs how many of them do

    int
    main(int argc, char **argv)

and then realize that in order for the "grep for ^main" pattern to be
useful, the second version has to not just be more common, it has to
be practically *universal*.

Hint: it isn't even remotely more common, much less universal. In
Debian code search, I had to go to the third page to find any example
at all of people putting the "int" and the "main" on different lines,
and even that one didn't place the "main()" at the beginning of the
line - they had been separated because of other reasons and looked
like this:

int
#ifdef _WIN32
    __cdecl
#endif // _WIN32
    main(int argc, char** argv)

instead.

Maybe Debian code search isn't the place to go, but I think it proves
my case: the "function name at beginning of line" story is pure
make-believe, and has absolutely no relevance in the real world.

It's a bad straightjacket. Just get over it, and stop perpetuating the
idiotic myth.

If you care so much about grepping for function declarations, and you
use that old-fashioned GNU coding standard policy as an argument, just
be *properly* old-fashioned instead, and use etags or something.

Don't make the rest of us suffer.

Because I grep for functions all the time, and I'd rather have useful
output - which very much includes the type of the function. That's
often one reason _why_ I grep for things in the first place.

Other grep tricks for when the function really is used everywhere, and
you are having trouble finding the definition vs the use:

 - grep in the headers for the type, and actually use the type (either
of the function, or the first argument) as part of the pattern.

   If you really have no idea where it might be, you'll want to start
off with the header grep anyway, to find the macro case (or the inline
case)

   Yeah, splitting the declaration will screw the type information up.
So don't do that, then.

 - if it's so widely used that you find it all over, it's probably
exported. grep for 'EXPORT.*fnname' to see where it is defined.

   We used to (brokenly) export things separately from the definition.
If you find cases of that, let's fix them.

Of course, usually I know roughly where it is defined, so I just limit
the pathnames for 'git grep'.

But the 'add the type of the return value or first argument to the
pattern' is often my second go-to (particularly for the cases where
you might be looking for _multiple_ definitions because it's some
architecture-specific thing, or you have some partial pattern because
every filesystem does their own thing).

Other 'git grep' patterns that often work for kernel sources:

 - looking for a structure declaration? Use

      git grep 'struct name {'

   which mostly works, but obviously depends on coding style so it's
not guaranteed. Good first thing to try, though.

 - use

        git grep '\t\.name\>.*='

   to find things like particular inode operations.

That second case is because we have almost universally converted our
filesystem operation initializers to use that named format (and really
strive to have a policy of constant structures of function pointers
only), and it's really convenient if you are doing VFS changes and
need to find all the places that use a particular VFS interface (eg
".get_acl" or similar).

It used to be a nightmare to find those things back when most of our
initializers were using the traditional unnamed ordered structure
initializers, so this is one area where we've introduced coding style
policies to make it really easy to grep for things (but also much
easier to add new fields and not have to add pointless NULL
initializer elements, of course).
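
As a rough sketch of what such a named initializer looks like (the
structure and function names here are hypothetical, only loosely modelled
on inode operations, and not taken from the kernel):

```c
#include <stddef.h>

/* Hypothetical operations table with three optional hooks. */
struct demo_inode_ops {
	int (*lookup)(int id);
	int (*get_acl)(int id);
	int (*permission)(int id);
};

static int demo_get_acl(int id)
{
	return id + 1;
}

/*
 * Named initializers: only the implemented hook is listed and the
 * remaining pointers are implicitly NULL. The line starts with a tab
 * followed by ".get_acl", which is what the grep pattern keys on.
 */
static const struct demo_inode_ops demo_ops = {
	.get_acl	= demo_get_acl,
};
```

Adding a new field to the structure would not require touching this
initializer at all, which is the other benefit mentioned above.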

             Linus
David Laight Sept. 27, 2021, 8:21 a.m. UTC | #24
From: Linus Torvalds
> Sent: 26 September 2021 22:04
> 
> On Sat, Sep 25, 2021 at 12:40 PM David Laight <David.Laight@aculab.com> wrote:
> >
> > If the function name starts at the beginning of a line it is
> > much easier to grep for the definition.
> 
> That has always been a completely bogus argument. I grep to look up
> the type as often as I grep for the function definition, plus it's not
> at all unlikely that the "function" is actually a macro wrapper, so
> grepping for the beginning of line is just completely wrong.
> 
> It's completely wrong for another reason too: it assumes a style of
> programming that has never actually been all that common. It's a very
> specific pattern to very specific projects, and anybody who learnt
> that pattern for their project is going to be completely lost anywhere
> else. So don't do it. It's just a bad idea.
> 
> So a broken "easier to grep for" is not an excuse for "make the code
> harder to read" particularly when it just makes another type of
> grepping harder, and it's not actually nearly universal enough to
> actually be a useful pattern in the first place.
> 
> It's not only never been the pattern in the kernel, but it's generally
> not been the pattern anywhere else either. It's literally one of the
> broken GNU coding standards - and the fact that almost every other
> part of the GNU coding standards were wrong (indentation, placement of
> braces, you name it) should give you a hint about how good _that_ one
> was.
> 
> Here's an exercise for you: go search for C coding examples on the
> web, and see how many of them do
> 
>     int main(int argc, char **argv)
> 
> vs how many of them do
> 
>     int
>     main(int argc, char **argv)

It makes a bigger difference with:

struct frobulate *find_frobulate(args)
which is going to need a line break somewhere.
Especially with the (strange) rule about aligning the continued
arguments with the (.

But I didn't expect such a long response :-)

I'm sure the NetBSD tree (mostly) puts the function name in column 1.
But after that uses the K&R location for {} (as does Linux).

It is true that a lot of 'coding standards' are horrid.
Putting '} else {' on one line is important when reading code.
Especially if the '}' would be at the bottom of the screen,
or worse still turning the page on a fan-fold paper listing to find
a floating 'else', with no idea which 'if' it goes with.

The modern example of why { and } shouldn't be on their own lines is:
		...
	}
	while (...........................
	{
		...
Is that a loop bottom followed by a code block or
a conditional followed by a loop?
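
A minimal, compilable sketch of both readings, written in the
brace-on-its-own-line style. The function names and the counter are
made up for illustration; in each function a '}' line is followed by a
'while (...)' line and then a '{' line, so a screenful starting at the
'}' looks the same in both.

```c
/* Reading 1: the 'while' is the bottom of a do-loop, and the braces
 * that follow are an unrelated plain block that runs once. */
static int loop_bottom(int n)
{
	int count = 0;

	do
	{
		count++;
	}
	while (count < n);
	{
		count += 100;
	}
	return count;
}

/* Reading 2: the '}' closes a conditional, and the 'while' is the top
 * of a new loop whose body is the braces that follow. */
static int loop_top(int n)
{
	int count = 0;

	if (n > 0)
	{
		count++;
	}
	while (count < n)
	{
		count += 100;
	}
	return count;
}
```

The only visible difference at the 'while' line is the trailing ';',
which is easy to miss at the end of a long condition.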

But none of this is related to the location of attributes, unless
you need to split long lines and put the attribute before the
function name, where you may need:

static struct frobulate *
__inline ....
find_frobulate(....)

Especially if you need #if around the attributes.

	David

Willy Tarreau Sept. 27, 2021, 9:22 a.m. UTC | #25
On Mon, Sep 27, 2021 at 08:21:24AM +0000, David Laight wrote:
> Putting '} else {' on one line is important when reading code.

I used not to like that, because it makes "else if ()" less readable and
harder to spot, but the arguments you gave regarding the end of the
screen are valid. They are similar to my hatred of GNU's broken
"while ()" on its own line, especially after a "do { }" block, where it
immediately looks like an accidental infinite loop.

However:

> But none of this is related to the location of attributes unless
> you need to split long lines and put the attribute before the
> function name where you may need.
> 
> static struct frobulate *
> __inline ....
> find_frobulate(....)

This is exactly the case where I hate to dig into code looking like
that: you build, it fails to find symbol "find_frobulate()", you run
"git grep -w find_frobulate" to figure out what file provides it, or even
"grep ^find_frobulate" if you want. And you find it in frobulate.c. You
double-check, you find that frobulate.o was built and linked into your
executable. Despite this it fails to find the symbol. Finally you open
the file to discover this painful "static" two lines above, which made
you waste 3 minutes of your time digging in the wrong place.

*Just* for this reason I'm much more careful to always put the type and
name on the same line nowadays.
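
To make the difference concrete, here is a small sketch of a
hypothetical frobulate.c. struct frobulate, the table and both lookup
functions are made up for illustration. With the split style, a grep
hit on the function name shows only the name line; with the one-line
style, the same hit carries the "static" and the return type with it.

```c
#include <stddef.h>

struct frobulate {
	int id;
};

static struct frobulate table[] = { { 1 }, { 2 }, { 3 } };

/*
 * Split style: "grep ^find_frobulate" matches only the line with the
 * name, so the output hides both the return type and the "static".
 */
static struct frobulate *
find_frobulate(int id)
{
	for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (table[i].id == id)
			return &table[i];
	}
	return NULL;
}

/*
 * One-line style: any grep hit on the name shows the storage class and
 * the return type on the same line.
 */
static struct frobulate *find_frobulate_oneline(int id)
{
	return find_frobulate(id);
}
```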

> Especially if you need #if around the attributes.

This is the only exception I still have to the rule above. But #if
blocks by definition require multi-line processing anyway and they're
not welcome in the middle of control flow.
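
One common way to keep the #if away from the declaration itself is to
hide it behind a macro that expands to nothing when the compiler lacks
the attribute. A minimal sketch, assuming GCC or Clang; my_alloc_size
and my_vmalloc are made-up names for illustration and not the kernel's
definitions, and the trailing placement on the declaration is just one
option:

```c
#include <stdlib.h>

/* All the conditional logic lives here, once, instead of around every
 * declaration. The nested #if keeps compilers without __has_attribute
 * from choking on it. */
#if defined(__has_attribute)
# if __has_attribute(alloc_size)
#  define my_alloc_size(x) __attribute__((__alloc_size__(x)))
# endif
#endif
#ifndef my_alloc_size
# define my_alloc_size(x)	/* unsupported: expands to nothing */
#endif

/* Type and name stay on one line; the attribute trails the declaration. */
void *my_vmalloc(unsigned long size) my_alloc_size(1);

/* Hypothetical stand-in implementation backed by malloc(). */
void *my_vmalloc(unsigned long size)
{
	return malloc(size);
}
```

GCC only accepts the trailing attribute form on a declaration, not on
a function definition, which is why the prototype is written separately
here.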

Willy

Patch

--- a/include/linux/vmalloc.h~mm-vmalloc-add-__alloc_size-attributes-for-better-bounds-checking
+++ a/include/linux/vmalloc.h
@@ -136,20 +136,31 @@  static inline void vmalloc_init(void)
 static inline unsigned long vmalloc_nr_pages(void) { return 0; }
 #endif
 
+__alloc_size(1)
 extern void *vmalloc(unsigned long size);
+__alloc_size(1)
 extern void *vzalloc(unsigned long size);
+__alloc_size(1)
 extern void *vmalloc_user(unsigned long size);
+__alloc_size(1)
 extern void *vmalloc_node(unsigned long size, int node);
+__alloc_size(1)
 extern void *vzalloc_node(unsigned long size, int node);
+__alloc_size(1)
 extern void *vmalloc_32(unsigned long size);
+__alloc_size(1)
 extern void *vmalloc_32_user(unsigned long size);
+__alloc_size(1)
 extern void *__vmalloc(unsigned long size, gfp_t gfp_mask);
+__alloc_size(1)
 extern void *__vmalloc_node_range(unsigned long size, unsigned long align,
 			unsigned long start, unsigned long end, gfp_t gfp_mask,
 			pgprot_t prot, unsigned long vm_flags, int node,
 			const void *caller);
+__alloc_size(1)
 void *__vmalloc_node(unsigned long size, unsigned long align, gfp_t gfp_mask,
 		int node, const void *caller);
+__alloc_size(1)
 void *vmalloc_no_huge(unsigned long size);
 
 extern void vfree(const void *addr);