diff mbox series

lib/string: shrink lib/string.i via IWYU

Message ID 20231205-libstringheader-v1-1-7f9c573053a7@gmail.com (mailing list archive)
State Changes Requested
Headers show
Series lib/string: shrink lib/string.i via IWYU | expand

Commit Message

Tanzir Hasan Dec. 5, 2023, 8:58 p.m. UTC
This diff uses an open source tool include-what-you-use (IWYU) to modify
the include list changing indirect includes to direct includes.
IWYU is implemented using the IWYUScripts github repository which is a tool that is
currently undergoing development. These changes seek to improve build times.

This change to lib/string.c resulted in a preprocessed size of
lib/string.i from 26371 lines to 5232 lines (-80%).

If there are any problems with the output of the tool please raise an
issue on the github.

Link: https://github.com/ClangBuiltLinux/IWYUScripts

Signed-off-by: Tanzir Hasan <tanzirh@google.com>
---

---
 lib/string.c | 14 +++++++-------
 1 file changed, 7 insertions(+), 7 deletions(-)


---
base-commit: 33cc938e65a98f1d29d0a18403dbbee050dcad9a
change-id: 20231204-libstringheader-e238e2af5eec

Best regards,

Comments

Andrew Morton Dec. 5, 2023, 9:04 p.m. UTC | #1
On Tue, 05 Dec 2023 20:58:53 +0000 tanzirh@google.com wrote:

> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.
> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
> 
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5232 lines (-80%).
> 
> If there are any problems with the output of the tool please raise an
> issue on the github.
> 
> Link: https://github.com/ClangBuiltLinux/IWYUScripts

Issue:

> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -16,16 +16,16 @@
>  
>  #define __NO_FORTIFY
>  #include <linux/types.h>
> +#include <asm/bitsperlong.h>

The preferred way to import bit-fiddling stuff is to include
<linux/bits.h>.  Under the hood this may include asm/bitsperlong.h.  Or
it may not, depending on Kconfig settings (particularly architecture).
Nick Desaulniers Dec. 5, 2023, 9:14 p.m. UTC | #2
On Tue, Dec 5, 2023 at 1:04 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 05 Dec 2023 20:58:53 +0000 tanzirh@google.com wrote:
>
> > This diff uses an open source tool include-what-you-use (IWYU) to modify
> > the include list changing indirect includes to direct includes.
> > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > currently undergoing development. These changes seek to improve build times.
> >
> > This change to lib/string.c resulted in a preprocessed size of
> > lib/string.i from 26371 lines to 5232 lines (-80%).
> >
> > If there are any problems with the output of the tool please raise an
> > issue on the github.
> >
> > Link: https://github.com/ClangBuiltLinux/IWYUScripts
>
> Issue:
>
> > --- a/lib/string.c
> > +++ b/lib/string.c
> > @@ -16,16 +16,16 @@
> >
> >  #define __NO_FORTIFY
> >  #include <linux/types.h>
> > +#include <asm/bitsperlong.h>

Thanks for the quick feedback.  For additional context, Tanzir is my
intern working on getting IWYU working on the kernel.  Welcome Tanzir
to LKML!

>
> The preferred way to import bit-fiddling stuff is to include
> <linux/bits.h>.  Under the hood this may include asm/bitsperlong.h.  Or
> it may not, depending on Kconfig settings (particularly architecture).
>

Just triple checking my understanding; it looks like
include/linux/bits.h unconditionally includes asm/bitsperlong.h (which
is implemented per arch) most of which seem to include
asm-generic/bitsperlong.h.

include/linux/bits.h also defines a few macros (BIT_MASK, BIT_WORD,
BITS_PER_BYTE, GENMASK, etc).  If lib/string.c is not using any of
those, why can't we go straight to #including asm/bitsperlong.h?  That
should resolve to the arch specific impl which may include
asm-generic/bitsperlong.h?
Andrew Morton Dec. 5, 2023, 9:24 p.m. UTC | #3
On Tue, 5 Dec 2023 13:14:16 -0800 Nick Desaulniers <ndesaulniers@google.com> wrote:

> >
> > The preferred way to import bit-fiddling stuff is to include
> > <linux/bits.h>.  Under the hood this may include asm/bitsperlong.h.  Or
> > it may not, depending on Kconfig settings (particularly architecture).
> >
> 
> Just triple checking my understanding; it looks like
> include/linux/bits.h unconditionally includes asm/bitsperlong.h (which
> is implemented per arch) most of which seem to include
> asm-generic/bitsperlong.h.
> 
> include/linux/bits.h also defines a few macros (BIT_MASK, BIT_WORD,
> BITS_PER_BYTE, GENMASK, etc).  If lib/string.c is not using any of
> those, why can't we go straight to #including asm/bitsperlong.h?  That
> should resolve to the arch specific impl which may include
> asm-generic/bitsperlong.h?

It's just a general rule.  If the higher-level include is present, use
that.  Because of the above, plus I guess things might change in the
future.

We've been getting better about irregular asm/include files.

But bits.h is a poor example.  A better case to study is spinlock.h. 
If this tool recommended including asm/spinlock.h then that won't work
on any architecture which doesn't implement SMP (there is no
arch/nios2/include/asm/spinlock.h).
Al Viro Dec. 5, 2023, 9:38 p.m. UTC | #4
On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.

How does it account for arch- and config-dependent indirect includes?

In particular, on sh this patch breaks, since there word-at-a-time.h does not
contain an include of kernel.h, even though it uses REPEAT_BYTE.  This is
a very simple case (they really ought to include kernel.h, same as all other
instances of word-at-a-time.h), but I would expect arseloads of more subtle
breakage in anything less trivial.

And I'm not at all sure that there's no config-dependent breakage as well -
this had been caught by quick make allmodconfig + make lib/string.o on
a bunch of architectures; the graph of indirects includes (as well as the
set of symbols needed for given header) is very much config-dependent.

> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
> 
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5232 lines (-80%).

It also breeds includes of asm/*.h, by the look of the output, which is
not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
outside of linux/uaccess.h is a bad idea.

> If there are any problems with the output of the tool please raise an
> issue on the github.

Sorry, no.  Your tool, your workflow, of course, but I don't do github issues.
You are welcome to the contents of this reply, but I'm not using browser-based
UI without very strong reasons; this one isn't.
Nick Desaulniers Dec. 5, 2023, 9:39 p.m. UTC | #5
On Tue, Dec 5, 2023 at 1:24 PM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Tue, 5 Dec 2023 13:14:16 -0800 Nick Desaulniers <ndesaulniers@google.com> wrote:
>
> > >
> > > The preferred way to import bit-fiddling stuff is to include
> > > <linux/bits.h>.  Under the hood this may include asm/bitsperlong.h.  Or
> > > it may not, depending on Kconfig settings (particularly architecture).
> > >
> >
> > Just triple checking my understanding; it looks like
> > include/linux/bits.h unconditionally includes asm/bitsperlong.h (which
> > is implemented per arch) most of which seem to include
> > asm-generic/bitsperlong.h.
> >
> > include/linux/bits.h also defines a few macros (BIT_MASK, BIT_WORD,
> > BITS_PER_BYTE, GENMASK, etc).  If lib/string.c is not using any of
> > those, why can't we go straight to #including asm/bitsperlong.h?  That
> > should resolve to the arch specific impl which may include
> > asm-generic/bitsperlong.h?
>
> It's just a general rule.  If the higher-level include is present, use
> that.  Because of the above, plus I guess things might change in the
> future.

Hmm...how does one know that linux/bits.h is the higher-level include
of asm/bitsperlong.h?

Do we mention these conventions anywhere under Documentation?

>
> We've been getting better about irregular asm/include files.
>
> But bits.h is a poor example.  A better case to study is spinlock.h.
> If this tool recommended including asm/spinlock.h then that won't work
> on any architecture which doesn't implement SMP (there is no
> arch/nios2/include/asm/spinlock.h).

The tooling Tanzir is working on does wrap IWYU, and does support such
mapping (of 'low level' to 'high level' headers; more so, if it
recommends X you can override to suggest Y instead).

arch/nios/ also doesn't provide a bug.h, which this patch is
suggesting we include directly.  I guess the same goes for
asm/rwonce.h.
Al Viro Dec. 5, 2023, 9:43 p.m. UTC | #6
On Tue, Dec 05, 2023 at 01:39:47PM -0800, Nick Desaulniers wrote:

> The tooling Tanzir is working on does wrap IWYU, and does support such
> mapping (of 'low level' to 'high level' headers; more so, if it
> recommends X you can override to suggest Y instead).
> 
> arch/nios/ also doesn't provide a bug.h, which this patch is
> suggesting we include directly.  I guess the same goes for
> asm/rwonce.h.

See include/asm-generic/Kbuild:
mandatory-y += bug.h
...
mandatory-y += rwonce.h

IOW, sh will have asm/bug.h and as/rwonce.h copied from asm-generic.

Still, includes of asm/*.h had been a massive headache historically
and breeding more of those shouldn't be overdone.

More painful problem is arch- and config-dependent stuff, though...
Andy Shevchenko Dec. 5, 2023, 9:50 p.m. UTC | #7
On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.
> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
> 
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5232 lines (-80%).
> 
> If there are any problems with the output of the tool please raise an
> issue on the github.

> Link: https://github.com/ClangBuiltLinux/IWYUScripts
> 
> Signed-off-by: Tanzir Hasan <tanzirh@google.com>

Blank lines are not allowed in tag block. Unless Link: is not supposed to be
a tag (which looks quite strange in that case).
Nick Desaulniers Dec. 5, 2023, 9:51 p.m. UTC | #8
On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> > This diff uses an open source tool include-what-you-use (IWYU) to modify
> > the include list changing indirect includes to direct includes.
>
> How does it account for arch- and config-dependent indirect includes?
>
> In particular, on sh this patch breaks, since there word-at-a-time.h does not
> contain an include of kernel.h, even though it uses REPEAT_BYTE.  This is
> a very simple case (they really ought to include kernel.h, same as all other
> instances of word-at-a-time.h), but I would expect arseloads of more subtle
> breakage in anything less trivial.
>
> And I'm not at all sure that there's no config-dependent breakage as well -
> this had been caught by quick make allmodconfig + make lib/string.o on
> a bunch of architectures; the graph of indirects includes (as well as the
> set of symbols needed for given header) is very much config-dependent.

We're sending these to Kees to stage in branch flowing into linux-next
in order for the patches to get soak time in linux-next; it's not
possible to test every possible randconfig, but with enough soak time
and the bots chewing on linux-next, I think we can get to a certain
level of confidence.

We'll ramp up the amount of testing we're doing locally as well. (I
did quite a few randconfigs locally in a loop; didn't test all
architectures) We can probably fetch the kernel.org toolchains for
very extensive testing.
https://mirrors.edge.kernel.org/pub/tools/crosstool/

>
> > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > currently undergoing development. These changes seek to improve build times.
> >
> > This change to lib/string.c resulted in a preprocessed size of
> > lib/string.i from 26371 lines to 5232 lines (-80%).
>
> It also breeds includes of asm/*.h, by the look of the output, which is
> not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> outside of linux/uaccess.h is a bad idea.

It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
convention here that I'm missing?

>
> > If there are any problems with the output of the tool please raise an
> > issue on the github.
>
> Sorry, no.  Your tool, your workflow, of course, but I don't do github issues.
> You are welcome to the contents of this reply, but I'm not using browser-based
> UI without very strong reasons; this one isn't.

No problem; Tanzir, please drop that part of the commit message in
future patches.
Andy Shevchenko Dec. 5, 2023, 9:53 p.m. UTC | #9
On Tue, Dec 05, 2023 at 01:39:47PM -0800, Nick Desaulniers wrote:
> On Tue, Dec 5, 2023 at 1:24 PM Andrew Morton <akpm@linux-foundation.org> wrote:
> > On Tue, 5 Dec 2023 13:14:16 -0800 Nick Desaulniers <ndesaulniers@google.com> wrote:
> > > > The preferred way to import bit-fiddling stuff is to include
> > > > <linux/bits.h>.  Under the hood this may include asm/bitsperlong.h.  Or
> > > > it may not, depending on Kconfig settings (particularly architecture).
> > >
> > > Just triple checking my understanding; it looks like
> > > include/linux/bits.h unconditionally includes asm/bitsperlong.h (which
> > > is implemented per arch) most of which seem to include
> > > asm-generic/bitsperlong.h.
> > >
> > > include/linux/bits.h also defines a few macros (BIT_MASK, BIT_WORD,
> > > BITS_PER_BYTE, GENMASK, etc).  If lib/string.c is not using any of
> > > those, why can't we go straight to #including asm/bitsperlong.h?  That
> > > should resolve to the arch specific impl which may include
> > > asm-generic/bitsperlong.h?
> >
> > It's just a general rule.  If the higher-level include is present, use
> > that.  Because of the above, plus I guess things might change in the
> > future.
> 
> Hmm...how does one know that linux/bits.h is the higher-level include
> of asm/bitsperlong.h?
> 
> Do we mention these conventions anywhere under Documentation?

Unfortunately it comes with an experience. The problem here is the absence of
Documentation for the headers that guarantee inclusion of other headers. In
general linux/* are preferred over asm/* but sometimes things are complicated
(when some asm ones include linux ones via architectural code and circling
around).

> > We've been getting better about irregular asm/include files.
> >
> > But bits.h is a poor example.  A better case to study is spinlock.h.
> > If this tool recommended including asm/spinlock.h then that won't work
> > on any architecture which doesn't implement SMP (there is no
> > arch/nios2/include/asm/spinlock.h).
> 
> The tooling Tanzir is working on does wrap IWYU, and does support such
> mapping (of 'low level' to 'high level' headers; more so, if it
> recommends X you can override to suggest Y instead).
> 
> arch/nios/ also doesn't provide a bug.h, which this patch is
> suggesting we include directly.  I guess the same goes for
> asm/rwonce.h.

Have you checked Ingo Molnar's gigantic series (2k+ patches) for the header
hell clean up? Perhaps we need to apply that first.
Al Viro Dec. 5, 2023, 9:57 p.m. UTC | #10
On Tue, Dec 05, 2023 at 09:38:07PM +0000, Al Viro wrote:

> It also breeds includes of asm/*.h, by the look of the output, which is
> not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> outside of linux/uaccess.h is a bad idea.

Let me elaborate a bit:

Consider e.g. a situation when we have linux/foo.h pulling asm/foo.h;
among the things declared there is foo_bar(), which is identical in
18 instances of asm/foo.h (out of current 21).

At some point we notice that; the obvious approach would be to have
3 unusual architectures have their asm/foo.h define _ARCH_HAS_ODD_FOO_BAR
and put the common variant in linux/foo.h under ifndef _ARCH_HAS_ODD_FOO_BAR.

Except that any file that pulls asm/foo.h will get broken by that.
And something like your tool might convert the users of linux/foo.h to
asm/foo.h - all it takes is using only the primitives that currently
happen to be in asm/foo.h on all architectures.

This is not an artifical example, BTW - look at the history of uaccess.h and
checksum.h...
Nick Desaulniers Dec. 5, 2023, 9:57 p.m. UTC | #11
On Tue, Dec 5, 2023 at 1:44 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Dec 05, 2023 at 01:39:47PM -0800, Nick Desaulniers wrote:
>
> > The tooling Tanzir is working on does wrap IWYU, and does support such
> > mapping (of 'low level' to 'high level' headers; more so, if it
> > recommends X you can override to suggest Y instead).
> >
> > arch/nios/ also doesn't provide a bug.h, which this patch is
> > suggesting we include directly.  I guess the same goes for
> > asm/rwonce.h.
>
> See include/asm-generic/Kbuild:
> mandatory-y += bug.h
> ...
> mandatory-y += rwonce.h
>
> IOW, sh will have asm/bug.h and as/rwonce.h copied from asm-generic.

Ah, right, Ard had mentioned this file to me once.  And mandatory-y is
documented in Documentation/kbuild/makefiles.rst.

Ok, so we can use include/asm-generic/Kbuild to know that if the .h
files listed there are recommended to include as asm/*.h, then we
should replace that recommendation with linux/*.h.  Our tooling has
something where we can automate that.

>
> Still, includes of asm/*.h had been a massive headache historically
> and breeding more of those shouldn't be overdone.
>
> More painful problem is arch- and config-dependent stuff, though...
Greg KH Dec. 5, 2023, 9:59 p.m. UTC | #12
On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
> On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> >
> > On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> > > This diff uses an open source tool include-what-you-use (IWYU) to modify
> > > the include list changing indirect includes to direct includes.
> >
> > How does it account for arch- and config-dependent indirect includes?
> >
> > In particular, on sh this patch breaks, since there word-at-a-time.h does not
> > contain an include of kernel.h, even though it uses REPEAT_BYTE.  This is
> > a very simple case (they really ought to include kernel.h, same as all other
> > instances of word-at-a-time.h), but I would expect arseloads of more subtle
> > breakage in anything less trivial.
> >
> > And I'm not at all sure that there's no config-dependent breakage as well -
> > this had been caught by quick make allmodconfig + make lib/string.o on
> > a bunch of architectures; the graph of indirects includes (as well as the
> > set of symbols needed for given header) is very much config-dependent.
> 
> We're sending these to Kees to stage in branch flowing into linux-next
> in order for the patches to get soak time in linux-next; it's not
> possible to test every possible randconfig, but with enough soak time
> and the bots chewing on linux-next, I think we can get to a certain
> level of confidence.
> 
> We'll ramp up the amount of testing we're doing locally as well. (I
> did quite a few randconfigs locally in a loop; didn't test all
> architectures) We can probably fetch the kernel.org toolchains for
> very extensive testing.
> https://mirrors.edge.kernel.org/pub/tools/crosstool/
> 
> >
> > > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > > currently undergoing development. These changes seek to improve build times.
> > >
> > > This change to lib/string.c resulted in a preprocessed size of
> > > lib/string.i from 26371 lines to 5232 lines (-80%).
> >
> > It also breeds includes of asm/*.h, by the look of the output, which is
> > not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> > outside of linux/uaccess.h is a bad idea.
> 
> It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
> convention here that I'm missing?

General rule, NEVER include asm/*.h, there should be a include/*.h
instead that works.  So much so that checkpatch.pl should catch this,
right?

But of course, it doesn't always hold true, there are a few minor
exceptions, but they are rare.

thanks,

greg k-h
Andy Shevchenko Dec. 5, 2023, 10:01 p.m. UTC | #13
On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
> On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:

...

> > > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > > currently undergoing development. These changes seek to improve build times.
> > >
> > > This change to lib/string.c resulted in a preprocessed size of
> > > lib/string.i from 26371 lines to 5232 lines (-80%).
> >
> > It also breeds includes of asm/*.h, by the look of the output, which is
> > not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> > outside of linux/uaccess.h is a bad idea.
> 
> It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
> convention here that I'm missing?

The mandatory ones can be used, but not all of them.
In some cases you even must include asm and not linux
(unaligned.h, byteorder.h, maybe others...).

As I told, it comes with experience, we lack of the
respective documentation (or file which is good for
automation checks, like with IWYU).
Andy Shevchenko Dec. 5, 2023, 10:05 p.m. UTC | #14
+Cc: Jonathan, who tried IWYU with kernel a year or so ago.

On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> This diff uses an open source tool include-what-you-use (IWYU) to modify
> the include list changing indirect includes to direct includes.
> IWYU is implemented using the IWYUScripts github repository which is a tool that is
> currently undergoing development. These changes seek to improve build times.
> 
> This change to lib/string.c resulted in a preprocessed size of
> lib/string.i from 26371 lines to 5232 lines (-80%).
> 
> If there are any problems with the output of the tool please raise an
> issue on the github.
> 
> Link: https://github.com/ClangBuiltLinux/IWYUScripts
> 
> Signed-off-by: Tanzir Hasan <tanzirh@google.com>
> ---
> 
> ---
>  lib/string.c | 14 +++++++-------
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/lib/string.c b/lib/string.c
> index be26623953d2..aff066e9da9f 100644
> --- a/lib/string.c
> +++ b/lib/string.c
> @@ -16,16 +16,16 @@
>  
>  #define __NO_FORTIFY
>  #include <linux/types.h>
> +#include <asm/bitsperlong.h>
> +#include <asm/bug.h>
> +#include <asm/errno.h>
> +#include <asm/rwonce.h>
> +#include <linux/linkage.h>
> +#include <linux/stddef.h>
> +#include <vdso/limits.h>
>  #include <linux/string.h>
>  #include <linux/ctype.h>
> -#include <linux/kernel.h>
> -#include <linux/export.h>
> -#include <linux/bug.h>
> -#include <linux/errno.h>
> -#include <linux/slab.h>
> -
>  #include <asm/unaligned.h>
> -#include <asm/byteorder.h>
>  #include <asm/word-at-a-time.h>
>  #include <asm/page.h>
>
Nick Desaulniers Dec. 5, 2023, 10:05 p.m. UTC | #15
On Tue, Dec 5, 2023 at 1:53 PM Andy Shevchenko <andy@kernel.org> wrote:
>
> Have you checked Ingo Molnar's gigantic series (2k+ patches) for the header
> hell clean up? Perhaps we need to apply that first.
>

We missed you at plumbers.  Tanzir gave a talk on this, with the
raison d'etre of this work was to see if we could help automate Ingo's
work.
https://www.youtube.com/watch?v=eFq_oqLiXPM
Randy Dunlap Dec. 5, 2023, 10:10 p.m. UTC | #16
On 12/5/23 14:01, Andy Shevchenko wrote:
> On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
>> On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>>> On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> 
> ...
> 
>>>> IWYU is implemented using the IWYUScripts github repository which is a tool that is
>>>> currently undergoing development. These changes seek to improve build times.
>>>>
>>>> This change to lib/string.c resulted in a preprocessed size of
>>>> lib/string.i from 26371 lines to 5232 lines (-80%).
>>>
>>> It also breeds includes of asm/*.h, by the look of the output, which is
>>> not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
>>> outside of linux/uaccess.h is a bad idea.
>>
>> It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
>> convention here that I'm missing?
> 
> The mandatory ones can be used, but not all of them.
> In some cases you even must include asm and not linux
> (unaligned.h, byteorder.h, maybe others...).
> 
> As I told, it comes with experience, we lack of the
> respective documentation (or file which is good for
> automation checks, like with IWYU).

I have an unpublished Linux Best Known Practices txt file with
this "rule" and some other info that I have collected over a few
years. I wouldn't say that it's up to date. Anyway, it's copied below
FWIW.
Nick Desaulniers Dec. 5, 2023, 10:14 p.m. UTC | #17
On Tue, Dec 5, 2023 at 1:59 PM Greg KH <greg@kroah.com> wrote:
>
> On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
> > On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > >
> > > It also breeds includes of asm/*.h, by the look of the output, which is
> > > not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> > > outside of linux/uaccess.h is a bad idea.
> >
> > It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
> > convention here that I'm missing?
>
> General rule, NEVER include asm/*.h, there should be a include/*.h
> instead that works.  So much so that checkpatch.pl should catch this,
> right?

ah, shoot, I was showing Tanzir how to use `b4` for patch development,
and forgot to check this.  Indeed it does.

I can see how the check works (scripts/checkpatch.pl L5881).  Decoding
that will probably help us improve the tooling.

>
> But of course, it doesn't always hold true, there are a few minor
> exceptions, but they are rare.

$ grep -r \\#include lib | grep asm

shows quite a few exceptions, and just in lib/.

For example, lib/math/int_log.c includes asm/bug.h.  Is that a case
where lib/math/int_log.c should be #include 'ing linux/bug.h rather
than asm/bug.h?
Al Viro Dec. 5, 2023, 10:15 p.m. UTC | #18
On Wed, Dec 06, 2023 at 12:01:56AM +0200, Andy Shevchenko wrote:
> On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
> > On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> 
> ...
> 
> > > > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > > > currently undergoing development. These changes seek to improve build times.
> > > >
> > > > This change to lib/string.c resulted in a preprocessed size of
> > > > lib/string.i from 26371 lines to 5232 lines (-80%).
> > >
> > > It also breeds includes of asm/*.h, by the look of the output, which is
> > > not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> > > outside of linux/uaccess.h is a bad idea.
> > 
> > It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
> > convention here that I'm missing?
> 
> The mandatory ones can be used, but not all of them.
> In some cases you even must include asm and not linux
> (unaligned.h, byteorder.h, maybe others...).
> 
> As I told, it comes with experience, we lack of the
> respective documentation (or file which is good for
> automation checks, like with IWYU).

It would certainly be nice to have such information in the tree;
"where should I pick $SYMBOL from?" is something one needs to
find out often enough.  To a large extent it's covered by "where
in include/*.h do we have it defined?", but that's not all there
is to it.  E.g. "get_user() => use linux/uaccess.h".

There's also stuff like "$SYMBOL should not be used outside of arch/*
and include/*, better use $OTHER_SYMBOL", etc.
Nick Desaulniers Dec. 5, 2023, 10:20 p.m. UTC | #19
On Tue, Dec 5, 2023 at 2:15 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Wed, Dec 06, 2023 at 12:01:56AM +0200, Andy Shevchenko wrote:
> > On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
> > > On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:
> >
> > ...
> >
> > > > > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > > > > currently undergoing development. These changes seek to improve build times.
> > > > >
> > > > > This change to lib/string.c resulted in a preprocessed size of
> > > > > lib/string.i from 26371 lines to 5232 lines (-80%).
> > > >
> > > > It also breeds includes of asm/*.h, by the look of the output, which is
> > > > not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> > > > outside of linux/uaccess.h is a bad idea.
> > >
> > > It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
> > > convention here that I'm missing?
> >
> > The mandatory ones can be used, but not all of them.
> > In some cases you even must include asm and not linux
> > (unaligned.h, byteorder.h, maybe others...).
> >
> > As I told, it comes with experience, we lack of the
> > respective documentation (or file which is good for
> > automation checks, like with IWYU).
>
> It would certainly be nice to have such information in the tree;
> "where should I pick $SYMBOL from?" is something one needs to
> find out often enough.  To a large extent it's covered by "where
> in include/*.h do we have it defined?", but that's not all there
> is to it.  E.g. "get_user() => use linux/uaccess.h".
>
> There's also stuff like "$SYMBOL should not be used outside of arch/*
> and include/*, better use $OTHER_SYMBOL", etc.

That's basically one of the tables we maintain.
https://github.com/ClangBuiltLinux/IWYUScripts/blob/main/symbol.imp

Of course, such a table is a living document; whether it resides in
tree or out of tree eventually I don't particularly care either way.
There are outstanding questions like "is it even possible to
autogenerate such a table (vs manual curation)" and "are the ones
we've coded so far correct?"  I don't expect to have all of the
answers with the initial implementation, but instead to collect
feedback from kernel devs and iterate from there.  Very helpful
responses in this thread so far; we appreciate it.

I suspect that folks that have played with IWYU in the past
encountered great difficulty since these override tables are poorly
documented, and the out of the box defaults assume more of a userspace
layout for common definitions (which our script which wraps IWYU
disables).
Nick Desaulniers Dec. 5, 2023, 10:25 p.m. UTC | #20
On Tue, Dec 5, 2023 at 2:10 PM Randy Dunlap <rdunlap@infradead.org> wrote:
>
> I have an unpublished Linux Best Known Practices txt file with
> this "rule" and some other info that I have collected over a few
> years. I wouldn't say that it's up to date. Anyway, it's copied below
> FWIW.
> Generic drivers should not (do not) #include header files from
> <asm-generic>. Only arch code and arch-specific drivers do that.

Ok, that confirms my understanding.  We had picked up on that
convention, but had missed the one below.

>
> Don't use <asm/header.h> in generic driver code. Instead use
> some <linux/header.h> that then #includes <asm/header.h>.
> asm/ headers are implementation; linux/ headers are API.

Ah, yeah, that's what we missed.

---
You should publish you doc somewhere, it can be helpful to refer back
to them when you're on a different machine or to point someone else
to.

I keep notes like this myself for debugging LLVM:
https://github.com/ClangBuiltLinux/llvm-dev-conf-2020/issues/3 (and
recently gcc: https://github.com/ClangBuiltLinux/llvm-dev-conf-2020/issues/4)
Al Viro Dec. 5, 2023, 10:32 p.m. UTC | #21
On Tue, Dec 05, 2023 at 10:15:21PM +0000, Al Viro wrote:

> It would certainly be nice to have such information in the tree;
> "where should I pick $SYMBOL from?" is something one needs to
> find out often enough.  To a large extent it's covered by "where
> in include/*.h do we have it defined?", but that's not all there
> is to it.  E.g. "get_user() => use linux/uaccess.h".
> 
> There's also stuff like "$SYMBOL should not be used outside of arch/*
> and include/*, better use $OTHER_SYMBOL", etc.

Speaking of...

arch/arm64/include/asm/asm-prototypes.h:18:#include <asm/uaccess.h>
arch/powerpc/include/asm/inst.h:8:#include <asm/uaccess.h>
arch/powerpc/kvm/book3s_xive_native.c:16:#include <asm/uaccess.h>
arch/powerpc/mm/book3s64/radix_pgtable.c:31:#include <asm/uaccess.h>
arch/riscv/kernel/sys_riscv.c:15:#include <asm/uaccess.h>
arch/xtensa/include/asm/asm-prototypes.h:10:#include <asm/uaccess.h>
include/linux/uaccess.h:11:#include <asm/uaccess.h>
tools/testing/crypto/chacha20-s390/test-cipher.c:8:#include <asm/uaccess.h>

Only one of those is legitimate (linux/uaccess.h); asm-prototypes.h
ones are borderline, but probably should be pulling linux/uaccess.h,
everything in *.c is really wrong.  Powerpc asm/inst.h is the really
unpleasant one here - it's pulled by quite a few other places, including
several asm/*.h.  Hell knows; might be worth splitting the
{__,}{get_user_instr,copy_inst_from_kernel_nofault}() off to
a separate header (asm/inst-uaccess.h?), to be included only by the
files that use any of those (that would be
arch/powerpc/kernel/{align,hw_breakpoint_constraints,module_32,traps,vecemu}.c
and arch/powerpc/kernel/trace/{ftrace,ftrace_64_pg}.c - not a single header
among them).  That header would pull linux/uaccess.h and asm/inst.h wouldn't
need any uaccess at all; might spill into explicit includes of linux/uaccess.h
in some of the places in arch/powerpc that pull asm/inst.h, directly or
not...
Greg KH Dec. 5, 2023, 11:46 p.m. UTC | #22
On Tue, Dec 05, 2023 at 02:14:55PM -0800, Nick Desaulniers wrote:
> On Tue, Dec 5, 2023 at 1:59 PM Greg KH <greg@kroah.com> wrote:
> >
> > On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
> > > On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > >
> > > > It also breeds includes of asm/*.h, by the look of the output, which is
> > > > not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> > > > outside of linux/uaccess.h is a bad idea.
> > >
> > > It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
> > > convention here that I'm missing?
> >
> > General rule, NEVER include asm/*.h, there should be a include/*.h
> > instead that works.  So much so that checkpatch.pl should catch this,
> > right?
> 
> ah, shoot, I was showing Tanzir how to use `b4` for patch development,
> and forgot to check this.  Indeed it does.
> 
> I can see how the check works (scripts/checkpatch.pl L5881).  Decoding
> that will probably help us improve the tooling.
> 
> >
> > But of course, it doesn't always hold true, there are a few minor
> > exceptions, but they are rare.
> 
> $ grep -r \\#include lib | grep asm
> 
> shows quite a few exceptions, and just in lib/.
> 
> For example, lib/math/int_log.c includes asm/bug.h.  Is that a case
> where lib/math/int_log.c should be #include 'ing linux/bug.h rather
> than asm/bug.h?

Probably yes, but we don't normally go back and take coding style fixes
for old files like this as it doesn't make much sense to do so.

But, if you are cleaning up the headers for large portions with the goal
of faster builds, that's a good reason.

Good luck!

greg k-h
Al Viro Dec. 6, 2023, 12:55 a.m. UTC | #23
On Wed, Dec 06, 2023 at 08:46:50AM +0900, Greg KH wrote:
> > >
> > > But of course, it doesn't always hold true, there are a few minor
> > > exceptions, but they are rare.
> > 
> > $ grep -r \\#include lib | grep asm
> > 
> > shows quite a few exceptions, and just in lib/.
> > 
> > For example, lib/math/int_log.c includes asm/bug.h.  Is that a case
> > where lib/math/int_log.c should be #include 'ing linux/bug.h rather
> > than asm/bug.h?
> 
> Probably yes, but we don't normally go back and take coding style fixes
> for old files like this as it doesn't make much sense to do so.
> 
> But, if you are cleaning up the headers for large portions with the goal
> of faster builds, that's a good reason.

FWIW, the most common (by far - about 13% of such includes, over drivers/, fs/,
mm/, net/ and sound/) is asm/unaligned.h.

The next are asm/io.h (10%), asm/byteorder.h (6%), asm/irq.h (5%), asm/div64.h (3%),
asm/page.h and asm/dma.h (2% each).  The rest is below that (the next is a bit over
1%).

In fs/* unaligned.h is again the top one (37 out of 139), followed by
byteorder.h (30 out of 139), div64.h (12), page.h (10) and then comes random
shite - I do not believe that fs/coda/psdev.c might have a legitimate
reason to pull asm/io.h, for one thing...
Al Viro Dec. 6, 2023, 3 a.m. UTC | #24
On Wed, Dec 06, 2023 at 12:55:42AM +0000, Al Viro wrote:
> On Wed, Dec 06, 2023 at 08:46:50AM +0900, Greg KH wrote:
> > > >
> > > > But of course, it doesn't always hold true, there are a few minor
> > > > exceptions, but they are rare.
> > > 
> > > $ grep -r \\#include lib | grep asm
> > > 
> > > shows quite a few exceptions, and just in lib/.
> > > 
> > > For example, lib/math/int_log.c includes asm/bug.h.  Is that a case
> > > where lib/math/int_log.c should be #include 'ing linux/bug.h rather
> > > than asm/bug.h?
> > 
> > Probably yes, but we don't normally go back and take coding style fixes
> > for old files like this as it doesn't make much sense to do so.
> > 
> > But, if you are cleaning up the headers for large portions with the goal
> > of faster builds, that's a good reason.
> 
> FWIW, the most common (by far - about 13% of such includes, over drivers/, fs/,
> mm/, net/ and sound/) is asm/unaligned.h.

Why the hell is unaligned.h in asm/*, anyway?

We have 3 variants: arc, parisc and generic (== everything else).
Both arc and parisc instances have an explicit include of
asm-generic/unaligned.h (i.e. the generic variant).

On arc there's also misaligned_fixup() extern or stub, with exactly
one user (in arch/arc/kernel/traps.c).  On parisc there are
externs for handle_unaligned() and check_unaligned() (3 call sites,
all in arch/parisc/kernel/traps.c).

How about we take those into arch/{arc,parisc}/kernel/unaligned.h,
slap #include "unaligned.h" into their traps.c and unaligned.c
(callers and definitions resp.) and strip those from asm/unaligned.h?
At that point we can remove arch/{arc,parisc}/asm/unaligned.h - everything
will pick include/asm-generic/unaligned.h.

Then the next cycle we ask Linus to run the following:
for i in `git grep -l -w asm/unaligned.h`; do
	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
done
git mv include/asm-generic/unaligned.h include/linux/unaligned.h
sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
right before releasing -rc1 and asm/unaligned.h is gone...

Completely untested delta (for the non-automatic parts, that is) follows:
------------------
arc, parisc: get rid of private asm/unaligned.h

Declarations local to arch/*/kernel/*.c are better off *not* in a public
header - arch/{arc,parisc}/kernel/unaligned.h is just fine for those
bits.

With that done these asm/unaligned.h instances are reduced to include
of asm-generic/unaligned.h and can be removed - unaligned.h is in
mandatory-y in include/asm-generic/Kbuild.
    
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
---
diff --git a/arch/arc/include/asm/unaligned.h b/arch/arc/include/asm/unaligned.h
deleted file mode 100644
index cf5a02382e0e..000000000000
--- a/arch/arc/include/asm/unaligned.h
+++ /dev/null
@@ -1,27 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0-only */
-/*
- * Copyright (C) 2004, 2007-2010, 2011-2012 Synopsys, Inc. (www.synopsys.com)
- */
-
-#ifndef _ASM_ARC_UNALIGNED_H
-#define _ASM_ARC_UNALIGNED_H
-
-/* ARC700 can't handle unaligned Data accesses. */
-
-#include <asm-generic/unaligned.h>
-#include <asm/ptrace.h>
-
-#ifdef CONFIG_ARC_EMUL_UNALIGNED
-int misaligned_fixup(unsigned long address, struct pt_regs *regs,
-		     struct callee_regs *cregs);
-#else
-static inline int
-misaligned_fixup(unsigned long address, struct pt_regs *regs,
-		 struct callee_regs *cregs)
-{
-	/* Not fixed */
-	return 1;
-}
-#endif
-
-#endif /* _ASM_ARC_UNALIGNED_H */
diff --git a/arch/arc/kernel/traps.c b/arch/arc/kernel/traps.c
index 9b9570b79362..8e40f0881e02 100644
--- a/arch/arc/kernel/traps.c
+++ b/arch/arc/kernel/traps.c
@@ -20,6 +20,7 @@
 #include <asm/setup.h>
 #include <asm/unaligned.h>
 #include <asm/kprobes.h>
+#include "unaligned.h"
 
 void die(const char *str, struct pt_regs *regs, unsigned long address)
 {
diff --git a/arch/arc/kernel/unaligned.c b/arch/arc/kernel/unaligned.c
index 99a9b92ed98d..d2f5ceaaed1b 100644
--- a/arch/arc/kernel/unaligned.c
+++ b/arch/arc/kernel/unaligned.c
@@ -12,6 +12,7 @@
 #include <linux/ptrace.h>
 #include <linux/uaccess.h>
 #include <asm/disasm.h>
+#include "unaligned.h"
 
 #ifdef CONFIG_CPU_BIG_ENDIAN
 #define BE		1
diff --git a/arch/arc/kernel/unaligned.h b/arch/arc/kernel/unaligned.h
new file mode 100644
index 000000000000..5244453bb85f
--- /dev/null
+++ b/arch/arc/kernel/unaligned.h
@@ -0,0 +1,16 @@
+struct pt_regs;
+struct callee_regs;
+
+#ifdef CONFIG_ARC_EMUL_UNALIGNED
+int misaligned_fixup(unsigned long address, struct pt_regs *regs,
+		     struct callee_regs *cregs);
+#else
+static inline int
+misaligned_fixup(unsigned long address, struct pt_regs *regs,
+		 struct callee_regs *cregs)
+{
+	/* Not fixed */
+	return 1;
+}
+#endif
+
diff --git a/arch/parisc/include/asm/unaligned.h b/arch/parisc/include/asm/unaligned.h
deleted file mode 100644
index c0621295100d..000000000000
--- a/arch/parisc/include/asm/unaligned.h
+++ /dev/null
@@ -1,11 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 */
-#ifndef _ASM_PARISC_UNALIGNED_H
-#define _ASM_PARISC_UNALIGNED_H
-
-#include <asm-generic/unaligned.h>
-
-struct pt_regs;
-void handle_unaligned(struct pt_regs *regs);
-int check_unaligned(struct pt_regs *regs);
-
-#endif /* _ASM_PARISC_UNALIGNED_H */
diff --git a/arch/parisc/kernel/traps.c b/arch/parisc/kernel/traps.c
index 1107ca819ac8..7ab0d44ef698 100644
--- a/arch/parisc/kernel/traps.c
+++ b/arch/parisc/kernel/traps.c
@@ -47,6 +47,8 @@
 #include <linux/kgdb.h>
 #include <linux/kprobes.h>
 
+#include "unaligned.h"
+
 #if defined(CONFIG_LIGHTWEIGHT_SPINLOCK_CHECK)
 #include <asm/spinlock.h>
 #endif
diff --git a/arch/parisc/kernel/unaligned.c b/arch/parisc/kernel/unaligned.c
index ce25acfe4889..39cd222366bb 100644
--- a/arch/parisc/kernel/unaligned.c
+++ b/arch/parisc/kernel/unaligned.c
@@ -15,6 +15,7 @@
 #include <asm/unaligned.h>
 #include <asm/hardirq.h>
 #include <asm/traps.h>
+#include "unaligned.h"
 
 /* #define DEBUG_UNALIGNED 1 */
 
diff --git a/arch/parisc/kernel/unaligned.h b/arch/parisc/kernel/unaligned.h
new file mode 100644
index 000000000000..c1aa4b12e284
--- /dev/null
+++ b/arch/parisc/kernel/unaligned.h
@@ -0,0 +1,3 @@
+struct pt_regs;
+void handle_unaligned(struct pt_regs *regs);
+int check_unaligned(struct pt_regs *regs);
Greg KH Dec. 6, 2023, 3:09 a.m. UTC | #25
On Wed, Dec 06, 2023 at 03:00:47AM +0000, Al Viro wrote:
> On Wed, Dec 06, 2023 at 12:55:42AM +0000, Al Viro wrote:
> > On Wed, Dec 06, 2023 at 08:46:50AM +0900, Greg KH wrote:
> > > > >
> > > > > But of course, it doesn't always hold true, there are a few minor
> > > > > exceptions, but they are rare.
> > > > 
> > > > $ grep -r \\#include lib | grep asm
> > > > 
> > > > shows quite a few exceptions, and just in lib/.
> > > > 
> > > > For example, lib/math/int_log.c includes asm/bug.h.  Is that a case
> > > > where lib/math/int_log.c should be #include 'ing linux/bug.h rather
> > > > than asm/bug.h?
> > > 
> > > Probably yes, but we don't normally go back and take coding style fixes
> > > for old files like this as it doesn't make much sense to do so.
> > > 
> > > But, if you are cleaning up the headers for large portions with the goal
> > > of faster builds, that's a good reason.
> > 
> > FWIW, the most common (by far - about 13% of such includes, over drivers/, fs/,
> > mm/, net/ and sound/) is asm/unaligned.h.
> 
> Why the hell is unaligned.h in asm/*, anyway?
> 
> We have 3 variants: arc, parisc and generic (== everything else).
> Both arc and parisc instances have an explicit include of
> asm-generic/unaligned.h (i.e. the generic variant).
> 
> On arc there's also misaligned_fixup() extern or stub, with exactly
> one user (in arch/arc/kernel/traps.c).  On parisc there are
> externs for handle_unaligned() and check_unaligned() (3 call sites,
> all in arch/parisc/kernel/traps.c).
> 
> How about we take those into arch/{arc,parisc}/kernel/unaligned.h,
> slap #include "unaligned.h" into their traps.c and unaligned.c
> (callers and definitions resp.) and strip those from asm/unaligned.h?
> At that point we can remove arch/{arc,parisc}/asm/unaligned.h - everything
> will pick include/asm-generic/unaligned.h.
> 
> Then the next cycle we ask Linus to run the following:
> for i in `git grep -l -w asm/unaligned.h`; do
> 	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
> done
> git mv include/asm-generic/unaligned.h include/linux/unaligned.h
> sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
> right before releasing -rc1 and asm/unaligned.h is gone...

Please do, it's really annoying and would be great to fix up.

thanks,

greg k-h
kernel test robot Dec. 6, 2023, 7:10 a.m. UTC | #26
Hi,

kernel test robot noticed the following build errors:

[auto build test ERROR on 33cc938e65a98f1d29d0a18403dbbee050dcad9a]

url:    https://github.com/intel-lab-lkp/linux/commits/tanzirh-google-com/lib-string-shrink-lib-string-i-via-IWYU/20231206-050121
base:   33cc938e65a98f1d29d0a18403dbbee050dcad9a
patch link:    https://lore.kernel.org/r/20231205-libstringheader-v1-1-7f9c573053a7%40gmail.com
patch subject: [PATCH] lib/string: shrink lib/string.i via IWYU
config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20231206/202312061429.BJ7GrHnt-lkp@intel.com/config)
compiler: sh4-linux-gcc (GCC) 13.2.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231206/202312061429.BJ7GrHnt-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202312061429.BJ7GrHnt-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from lib/string.c:29:
   lib/string.c: In function 'strscpy':
>> arch/sh/include/asm/word-at-a-time.h:15:36: error: implicit declaration of function 'REPEAT_BYTE' [-Werror=implicit-function-declaration]
      15 | #define WORD_AT_A_TIME_CONSTANTS { REPEAT_BYTE(0x01), REPEAT_BYTE(0x80) }
         |                                    ^~~~~~~~~~~
   lib/string.c:124:49: note: in expansion of macro 'WORD_AT_A_TIME_CONSTANTS'
     124 |         const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;
         |                                                 ^~~~~~~~~~~~~~~~~~~~~~~~
   cc1: some warnings being treated as errors


vim +/REPEAT_BYTE +15 arch/sh/include/asm/word-at-a-time.h

cba8df4be3bdf1 Paul Mundt 2012-06-04  14  
cba8df4be3bdf1 Paul Mundt 2012-06-04 @15  #define WORD_AT_A_TIME_CONSTANTS { REPEAT_BYTE(0x01), REPEAT_BYTE(0x80) }
cba8df4be3bdf1 Paul Mundt 2012-06-04  16
Christoph Hellwig Dec. 7, 2023, 6:25 a.m. UTC | #27
On Tue, Dec 05, 2023 at 01:39:47PM -0800, Nick Desaulniers wrote:
> Hmm...how does one know that linux/bits.h is the higher-level include
> of asm/bitsperlong.h?

I think this the wrong way of thinking.  In general we should always
avoid including asm/ headers unless there is no other way.  No other
way is not formalized right now, but I think the answer is roughly if
no linux/ headers includes that asm header.  This will probably give
a few wrong results at the moment, but it's probably worth fixing
them up.
Andy Shevchenko Dec. 7, 2023, 12:50 p.m. UTC | #28
On Tue, Dec 05, 2023 at 02:14:55PM -0800, Nick Desaulniers wrote:
> On Tue, Dec 5, 2023 at 1:59 PM Greg KH <greg@kroah.com> wrote:

...

> For example, lib/math/int_log.c includes asm/bug.h.  Is that a case
> where lib/math/int_log.c should be #include 'ing linux/bug.h rather
> than asm/bug.h?

For the C-files it's fine to include linux/* versions of asm/* ones,
for the _headers_ I would prefer to be strictly minimalistic, because
of possible header dependency hell. If something is defined in asm/*
and some header wants to use that, it's better to include asm/*.
Andy Shevchenko Dec. 7, 2023, 12:52 p.m. UTC | #29
On Tue, Dec 05, 2023 at 10:15:21PM +0000, Al Viro wrote:
> On Wed, Dec 06, 2023 at 12:01:56AM +0200, Andy Shevchenko wrote:
> > On Tue, Dec 05, 2023 at 01:51:10PM -0800, Nick Desaulniers wrote:
> > > On Tue, Dec 5, 2023 at 1:38 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > > > On Tue, Dec 05, 2023 at 08:58:53PM +0000, tanzirh@google.com wrote:

...

> > > > > IWYU is implemented using the IWYUScripts github repository which is a tool that is
> > > > > currently undergoing development. These changes seek to improve build times.
> > > > >
> > > > > This change to lib/string.c resulted in a preprocessed size of
> > > > > lib/string.i from 26371 lines to 5232 lines (-80%).
> > > >
> > > > It also breeds includes of asm/*.h, by the look of the output, which is
> > > > not a good thing in general ;-/  E.g. #include <asm/uaccess.h> *anywhere*
> > > > outside of linux/uaccess.h is a bad idea.
> > > 
> > > It's not clear to me when it's ok to #include <asm/*.h>.  Is there a
> > > convention here that I'm missing?
> > 
> > The mandatory ones can be used, but not all of them.
> > In some cases you even must include asm and not linux
> > (unaligned.h, byteorder.h, maybe others...).
> > 
> > As I told, it comes with experience, we lack of the
> > respective documentation (or file which is good for
> > automation checks, like with IWYU).
> 
> It would certainly be nice to have such information in the tree;
> "where should I pick $SYMBOL from?" is something one needs to
> find out often enough.  To a large extent it's covered by "where
> in include/*.h do we have it defined?", but that's not all there
> is to it.  E.g. "get_user() => use linux/uaccess.h".
> 
> There's also stuff like "$SYMBOL should not be used outside of arch/*
> and include/*, better use $OTHER_SYMBOL", etc.

Precisely! That's what many developers will benefit from!
Andy Shevchenko Dec. 7, 2023, 12:55 p.m. UTC | #30
On Wed, Dec 06, 2023 at 03:10:46PM +0800, kernel test robot wrote:
> Hi,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on 33cc938e65a98f1d29d0a18403dbbee050dcad9a]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/tanzirh-google-com/lib-string-shrink-lib-string-i-via-IWYU/20231206-050121
> base:   33cc938e65a98f1d29d0a18403dbbee050dcad9a
> patch link:    https://lore.kernel.org/r/20231205-libstringheader-v1-1-7f9c573053a7%40gmail.com
> patch subject: [PATCH] lib/string: shrink lib/string.i via IWYU
> config: sh-allmodconfig (https://download.01.org/0day-ci/archive/20231206/202312061429.BJ7GrHnt-lkp@intel.com/config)
> compiler: sh4-linux-gcc (GCC) 13.2.0
> reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20231206/202312061429.BJ7GrHnt-lkp@intel.com/reproduce)

...

>    lib/string.c:124:49: note: in expansion of macro 'WORD_AT_A_TIME_CONSTANTS'
>      124 |         const struct word_at_a_time constants = WORD_AT_A_TIME_CONSTANTS;

> cba8df4be3bdf1 Paul Mundt 2012-06-04 @15  #define WORD_AT_A_TIME_CONSTANTS { REPEAT_BYTE(0x01), REPEAT_BYTE(0x80) }

REPEAT_BYTE() shouldn't be even in kernel.h to begin with!
Nick Desaulniers Dec. 11, 2023, 8:47 p.m. UTC | #31
On Tue, Dec 5, 2023 at 1:44 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
>
> On Tue, Dec 05, 2023 at 01:39:47PM -0800, Nick Desaulniers wrote:
>
> > The tooling Tanzir is working on does wrap IWYU, and does support such
> > mapping (of 'low level' to 'high level' headers; more so, if it
> > recommends X you can override to suggest Y instead).
> >
> > arch/nios/ also doesn't provide a bug.h, which this patch is
> > suggesting we include directly.  I guess the same goes for
> > asm/rwonce.h.
>
> See include/asm-generic/Kbuild:
> mandatory-y += bug.h
> ...
> mandatory-y += rwonce.h
>
> IOW, sh will have asm/bug.h and as/rwonce.h copied from asm-generic.
>
> Still, includes of asm/*.h had been a massive headache historically
> and breeding more of those shouldn't be overdone.
>
> More painful problem is arch- and config-dependent stuff, though...

Ah, it looks like include/uapi/asm-generic/Kbuild also makes use of
this pattern using mandatory-y.

So I think we can handle this as a two step translation. We can tell
the tooling to 'nevermind recommending X, always replace
recommendations for X with Y <for raisins>', so:
1. any recommendation to use asm-generic/foo.h should be replaced with
asm/foo.h (always, based on those 2 Kbuild files, could autogenerate)
2. some recommendations to use asm/foo.h should be replaced with
linux/foo.h (not necessarily codified anywhere; depends on if there is
a linux/foo.h, will manually curate this list for now)

Orthogonal but some places in tree can be cleaned up to include
linux/foo.h rather than asm/foo.h.

Does this sound like an improvement to my mental model of the
conventions used for kernel header inclusion within the linux kernel?

Tanzir nearly has the above done (for 1, 2 we will probably need to
iterate on more).  We've also beefed up our local testing to test more
architectures.  I expect Tanzir to send a v2 of this patch (as a
series, with the fix for ARCH=sh) later this week, if the above seems
correct.
Andy Shevchenko Dec. 11, 2023, 8:50 p.m. UTC | #32
On Mon, Dec 11, 2023 at 10:48 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
> On Tue, Dec 5, 2023 at 1:44 PM Al Viro <viro@zeniv.linux.org.uk> wrote:
> > On Tue, Dec 05, 2023 at 01:39:47PM -0800, Nick Desaulniers wrote:
> >
> > > The tooling Tanzir is working on does wrap IWYU, and does support such
> > > mapping (of 'low level' to 'high level' headers; more so, if it
> > > recommends X you can override to suggest Y instead).
> > >
> > > arch/nios/ also doesn't provide a bug.h, which this patch is
> > > suggesting we include directly.  I guess the same goes for
> > > asm/rwonce.h.
> >
> > See include/asm-generic/Kbuild:
> > mandatory-y += bug.h
> > ...
> > mandatory-y += rwonce.h
> >
> > IOW, sh will have asm/bug.h and as/rwonce.h copied from asm-generic.
> >
> > Still, includes of asm/*.h had been a massive headache historically
> > and breeding more of those shouldn't be overdone.
> >
> > More painful problem is arch- and config-dependent stuff, though...
>
> Ah, it looks like include/uapi/asm-generic/Kbuild also makes use of
> this pattern using mandatory-y.
>
> So I think we can handle this as a two step translation. We can tell
> the tooling to 'nevermind recommending X, always replace
> recommendations for X with Y <for raisins>', so:
> 1. any recommendation to use asm-generic/foo.h should be replaced with
> asm/foo.h (always, based on those 2 Kbuild files, could autogenerate)
> 2. some recommendations to use asm/foo.h should be replaced with
> linux/foo.h (not necessarily codified anywhere; depends on if there is
> a linux/foo.h, will manually curate this list for now)
>
> Orthogonal but some places in tree can be cleaned up to include
> linux/foo.h rather than asm/foo.h.
>
> Does this sound like an improvement to my mental model of the
> conventions used for kernel header inclusion within the linux kernel?
>
> Tanzir nearly has the above done (for 1, 2 we will probably need to
> iterate on more).  We've also beefed up our local testing to test more
> architectures.  I expect Tanzir to send a v2 of this patch (as a
> series, with the fix for ARCH=sh) later this week, if the above seems
> correct.

Whatever you choose, please make sure it gets documented, so we won't
go another round in the future for something similar.
Al Viro Dec. 14, 2023, 9:04 p.m. UTC | #33
On Wed, Dec 06, 2023 at 12:09:17PM +0900, Greg KH wrote:
> > slap #include "unaligned.h" into their traps.c and unaligned.c
> > (callers and definitions resp.) and strip those from asm/unaligned.h?
> > At that point we can remove arch/{arc,parisc}/asm/unaligned.h - everything
> > will pick include/asm-generic/unaligned.h.
> > 
> > Then the next cycle we ask Linus to run the following:
> > for i in `git grep -l -w asm/unaligned.h`; do
> > 	sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i
> > done
> > git mv include/asm-generic/unaligned.h include/linux/unaligned.h
> > sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild
> > right before releasing -rc1 and asm/unaligned.h is gone...
> 
> Please do, it's really annoying and would be great to fix up.

	FWIW, turns out that we have several places in drivers/* that pull
asm-generic/unaligned.h.  IMO that's completely wrong - not just in this
case (it's a matter of trivially adjusting the script), but I think that
as a matter of policy we should *NOT* have includes of asm-generic/*.h
anywhere in drivers/ fs/ io_uring/ kernel/ mm/ net/ sound/

	Current situation (outside of arch, include, scripts and tools,
with asm-generic/unaligned.h cases already taken out):

drivers/android/binderfs.c:32:#include <uapi/asm-generic/errno-base.h>
drivers/clk/microchip/clk-mpfs-ccc.c:7:#include "asm-generic/errno-base.h"
drivers/firmware/arm_scmi/shmem.c:13:#include <asm-generic/bug.h>
drivers/irqchip/irq-ti-sci-inta.c:24:#include <asm-generic/msi.h>
drivers/mtd/nand/raw/ingenic/ingenic_ecc.h:9:#include <uapi/asm-generic/errno-base.h>
drivers/platform/x86/hp/hp-bioscfg/passwdobj-attributes.c:10:#include <asm-generic/posix_types.h>
io_uring/uring_cmd.c:10:#include <uapi/asm-generic/ioctls.h>
lib/trace_readwrite.c:10:#include <asm-generic/io.h>
mm/damon/vaddr.c:10:#include <asm-generic/mman-common.h>
net/sunrpc/xprtrdma/verbs.c:58:#include <asm-generic/barrier.h>
rust/uapi/uapi_helper.h:9:#include <uapi/asm-generic/ioctl.h>

That's it.  asm-generic/errno-base.h cases - just use linux/errno.h and
be done with it.  Anyone who goes "I've got ENOENT and ENOMEM in my driver,
but those two are covered by errno-base.h and I can be clever and use
just that" is wrong.

asm-generic/mman-common.h - and linux/mman.h would not serve why, exactly?
uapi/linux/linux/mman.h, if one really feels like it...

asm-generic/ioctls.h in io_uring - definitely wrong; SIOCINQ is what
it's brought for, but that's in sockios.h.  It expands to FIONREAD, which
*is* in ioctls.h, but...

arch/alpha/include/uapi/asm/ioctls.h:11:#define FIONREAD        _IOR('f', 127, int)
arch/mips/include/uapi/asm/ioctls.h:64:#define FIONREAD 0x467f
arch/parisc/include/uapi/asm/ioctls.h:35:#define FIONREAD       0x541B
arch/powerpc/include/uapi/asm/ioctls.h:11:#define FIONREAD      _IOR('f', 127, int)
arch/sh/include/uapi/asm/ioctls.h:11:#define FIONREAD   _IOR('f', 127, int)
arch/sparc/include/uapi/asm/ioctls.h:101:#define FIONREAD       _IOR('f', 127, int)
arch/xtensa/include/uapi/asm/ioctls.h:23:#define FIONREAD       _IOR('f', 127, int)
include/uapi/asm-generic/ioctls.h:46:#define FIONREAD   0x541B

See the problem?  On mips the value is not 0x541B - it's 0x467F; on
alpha, powerpc and sparc it's 0x4004667F, on sh and xtensa - 0x8004667F
(different expansions of _IOR).

The only thing that has any business including asm-generic/ioctls.h is
arch/*/include/uapi/asm/ioctls.h - if it feels like doing so.  Anywhere
else it ought to be asm/ioctls.h.

asm-generic/msi.h - looks bogus; there's an include of linux/msi.h several lines
above, and that pulls asm/msi.h.  Which makes it either a no-op or a build bug
(the latter - if that driver can be built on x86 with CONFIG_GENERIC_MSI_IRQ
defined;
typedef struct irq_alloc_info msi_alloc_info_t;
in asm/msi.h vs.
typedef struct msi_alloc_info {
        struct msi_desc                 *desc;
        irq_hw_number_t                 hwirq;
        unsigned long                   flags;
        union {
                unsigned long           ul;
                void                    *ptr;
        } scratchpad[NUM_MSI_ALLOC_SCRATCHPAD_REGS];
} msi_alloc_info_t;
in asm-generic/msi.h).

Hell knows about the rest, but they all look very dubious...
Al Viro Dec. 15, 2023, 9:03 p.m. UTC | #34
On Thu, Dec 14, 2023 at 09:04:00PM +0000, Al Viro wrote:

> drivers/firmware/arm_scmi/shmem.c:13:#include <asm-generic/bug.h>

Should just use linux/bug.h and be done with that.

> drivers/platform/x86/hp/hp-bioscfg/passwdobj-attributes.c:10:#include <asm-generic/posix_types.h>

Completely pointless; not to mention that none of the types defined
there are used anywhere in that file, the previous line pulls their
private header, which explicitly pulls linux/types.h.

> lib/trace_readwrite.c:10:#include <asm-generic/io.h>

Yeccchhh...  This is just plain wrong.

What happens here is that hooks are added to writeb() et.al., to
allow ftrace to play with those.  By default these are empty
inlines; with CONFIG_TRACE_MMIO_ACCESS they are real function
calls, the functions living in lib/trace_readwrite.c

asm-generic/io.h is pulled by all asm/io.h instances, so that's
where the externs went.  That would've made sense, except that
asm-generic/io.h is used as a backstop for architectures that
had not bothered to define e.g. readl() of their own.  And *that*
is where the calls of those hooks had gone.  IOW, if architecture
has readl()/writeb()/whatnot of its own, it won't get those hooks
at all.

It smells like a conversion in progress, stalled after the first
(and only) architecture where that thing is selectable.  In effect,
it's arm64-specific.

> net/sunrpc/xprtrdma/verbs.c:58:#include <asm-generic/barrier.h>

Bogus, same as the include of asm/bitops.h immediately before that
line (the latter would've blown up if we hadn't already pulled
linux/bitops.h - which needs asm/barrier.h and pulls it, TYVM).

> rust/uapi/uapi_helper.h:9:#include <uapi/asm-generic/ioctl.h>

To the rust crowd, that...  It's wrong for e.g. powerpc -
uapi/asm/ioctl.h in there does pull asm-generic/ioctl.h, but
only after
#define _IOC_SIZEBITS   13
#define _IOC_DIRBITS    3

#define _IOC_NONE       1U
#define _IOC_READ       2U
#define _IOC_WRITE      4U

which means that trying to use asm-generic/ioctl.h directly
will yield the wrong numbers out of _IOC() et.al.


So...
in arch headers (..../asm/.../*.h and similar):
	OK, that's what asm-generic is for
in asm-generic headers themselves (..../asm-generic/.../*.h):
	OK
in linker scripts (*/*.lds{,.S}):
	asm-generic/vmlinux.lds.h is fine in those (and nowhere else)
in */*audit*.c:
	asm-generic/audit_*.h is OK there (ugly, but...)
in drivers,fs,init,io_uring,ipc,kernel,mm,net,sound,virt and probably rust:
	100% bollocks, not a single valid use
in lib/trace_readwrite.c:
	asm-generic/io.h is an exception; complicated story.

That leaves several instances in arch/, tools/ and include/linux, plus
some oddities in makefiles, scripts, etc.  And include/linux ones are
also not obviously correct - e.g. linux/bug.h pulls asm/bug.h, then
(under ifdef CONFIG_BUG) asm-generic/bug.h.  AFAICS on all architectures
we have asm/bug.h pulling asm-generic/bug.h (the ones that don't have
asm/bug.h of their own get it generated with just such an include in
it).  So do we need that include in linux/bug.h these days?
diff mbox series

Patch

diff --git a/lib/string.c b/lib/string.c
index be26623953d2..aff066e9da9f 100644
--- a/lib/string.c
+++ b/lib/string.c
@@ -16,16 +16,16 @@ 
 
 #define __NO_FORTIFY
 #include <linux/types.h>
+#include <asm/bitsperlong.h>
+#include <asm/bug.h>
+#include <asm/errno.h>
+#include <asm/rwonce.h>
+#include <linux/linkage.h>
+#include <linux/stddef.h>
+#include <vdso/limits.h>
 #include <linux/string.h>
 #include <linux/ctype.h>
-#include <linux/kernel.h>
-#include <linux/export.h>
-#include <linux/bug.h>
-#include <linux/errno.h>
-#include <linux/slab.h>
-
 #include <asm/unaligned.h>
-#include <asm/byteorder.h>
 #include <asm/word-at-a-time.h>
 #include <asm/page.h>