diff mbox series

[v2,2/3] lib/vsprintf: Split out sprintf() and friends

Message ID 20230805175027.50029-3-andriy.shevchenko@linux.intel.com (mailing list archive)
State New
Headers show
Series lib/vsprintf: Rework header inclusions | expand

Commit Message

Andy Shevchenko Aug. 5, 2023, 5:50 p.m. UTC
kernel.h is being used as a dump for all kinds of stuff for a long time.
sprintf() and friends are used in many drivers without need of the full
kernel.h dependency train with it.

Here is the attempt on cleaning it up by splitting out sprintf() and
friends.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
---
 include/linux/kernel.h  | 30 +-----------------------------
 include/linux/sprintf.h | 25 +++++++++++++++++++++++++
 lib/test_printf.c       |  1 +
 lib/vsprintf.c          |  1 +
 4 files changed, 28 insertions(+), 29 deletions(-)
 create mode 100644 include/linux/sprintf.h

Comments

Andrew Morton Aug. 5, 2023, 6:43 p.m. UTC | #1
On Sat,  5 Aug 2023 20:50:26 +0300 Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:

> kernel.h is being used as a dump for all kinds of stuff for a long time.
> sprintf() and friends are used in many drivers without need of the full
> kernel.h dependency train with it.

There seems little point in this unless someone signs up to convert
lots of code to include sprintf.h instead of kernel.h?

And such conversions will presumably cause all sorts of nasties
which require additional work?

So... what's the plan here?
Andy Shevchenko Aug. 5, 2023, 9:31 p.m. UTC | #2
On Sat, Aug 05, 2023 at 11:43:04AM -0700, Andrew Morton wrote:
> On Sat,  5 Aug 2023 20:50:26 +0300 Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> 
> > kernel.h is being used as a dump for all kinds of stuff for a long time.
> > sprintf() and friends are used in many drivers without need of the full
> > kernel.h dependency train with it.
> 
> There seems little point in this unless someone signs up to convert
> lots of code to include sprintf.h instead of kernel.h?
> 
> And such conversions will presumably cause all sorts of nasties
> which require additional work?
> 
> So... what's the plan here?

My main plan is to clean _headers_ from kernel.h.
The rest of the code may do that gradually.
Petr Mladek Aug. 7, 2023, 3:03 p.m. UTC | #3
On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:
> kernel.h is being used as a dump for all kinds of stuff for a long time.
> sprintf() and friends are used in many drivers without need of the full
> kernel.h dependency train with it.
> 
> Here is the attempt on cleaning it up by splitting out sprintf() and
> friends.
> 
> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
> ---
>  include/linux/kernel.h  | 30 +-----------------------------
>  include/linux/sprintf.h | 25 +++++++++++++++++++++++++
>  lib/test_printf.c       |  1 +
>  lib/vsprintf.c          |  1 +
>  4 files changed, 28 insertions(+), 29 deletions(-)
>  create mode 100644 include/linux/sprintf.h

I agree that kernel.h is not the right place. But are there any
numbers how much separate sprintf.h might safe?

Maybe, we should not reinvent the wheel and get inspired by
userspace.

sprintf() and friends are basic functions which most people know
from userspace. And it is pretty handy that the kernel variants
are are mostly compatible as well.

IMHO, it might be handful when they are also included similar way
as in userspace. From my POV printk.h is like stdio.h. And we already
have include/linux/stdarg.h where the v*print*() function might
fit nicely.

How does this sound, please?

Best Regards,
Petr
Andy Shevchenko Aug. 7, 2023, 3:09 p.m. UTC | #4
On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:
> > kernel.h is being used as a dump for all kinds of stuff for a long time.
> > sprintf() and friends are used in many drivers without need of the full
> > kernel.h dependency train with it.
> > 
> > Here is the attempt on cleaning it up by splitting out sprintf() and
> > friends.

...

> I agree that kernel.h is not the right place. But are there any
> numbers how much separate sprintf.h might safe?
> Maybe, we should not reinvent the wheel and get inspired by
> userspace.
> 
> sprintf() and friends are basic functions which most people know
> from userspace. And it is pretty handy that the kernel variants
> are are mostly compatible as well.
> 
> IMHO, it might be handful when they are also included similar way
> as in userspace. From my POV printk.h is like stdio.h. And we already
> have include/linux/stdarg.h where the v*print*() function might
> fit nicely.
> 
> How does this sound, please?

Not every user (especially _header_) wants to have printk.h included just for
sprintf.h that may have nothing to do with real output. So, same reasoning
from me as keeping that in kernel.h, i.e. printk.h no better.
Andy Shevchenko Aug. 7, 2023, 3:11 p.m. UTC | #5
On Mon, Aug 07, 2023 at 06:09:54PM +0300, Andy Shevchenko wrote:
> On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> > On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:

...

> > I agree that kernel.h is not the right place. But are there any
> > numbers how much separate sprintf.h might safe?
> > Maybe, we should not reinvent the wheel and get inspired by
> > userspace.
> > 
> > sprintf() and friends are basic functions which most people know
> > from userspace. And it is pretty handy that the kernel variants
> > are are mostly compatible as well.
> > 
> > IMHO, it might be handful when they are also included similar way
> > as in userspace. From my POV printk.h is like stdio.h. And we already
> > have include/linux/stdarg.h where the v*print*() function might
> > fit nicely.
> > 
> > How does this sound, please?
> 
> Not every user (especially _header_) wants to have printk.h included just for
> sprintf.h that may have nothing to do with real output. So, same reasoning
> from me as keeping that in kernel.h, i.e. printk.h no better.

(haven't check these, just to show how many _headers_ uses sprintf() call)

$ git grep -lw s.*printf -- include/linux/
include/linux/acpi.h
include/linux/audit.h
include/linux/btf.h
include/linux/dev_printk.h
include/linux/device-mapper.h
include/linux/efi.h
include/linux/fortify-string.h
include/linux/fs.h
include/linux/gameport.h
include/linux/kdb.h
include/linux/kdev_t.h
include/linux/kernel.h
include/linux/mmiotrace.h
include/linux/netlink.h
include/linux/pci-p2pdma.h
include/linux/perf_event.h
include/linux/printk.h
include/linux/seq_buf.h
include/linux/seq_file.h
include/linux/shrinker.h
include/linux/string.h
include/linux/sunrpc/svc_xprt.h
include/linux/tnum.h
include/linux/trace_seq.h
include/linux/usb.h
include/linux/usb/gadget_configfs.h
Andy Shevchenko Aug. 7, 2023, 3:13 p.m. UTC | #6
On Mon, Aug 07, 2023 at 06:11:24PM +0300, Andy Shevchenko wrote:
> On Mon, Aug 07, 2023 at 06:09:54PM +0300, Andy Shevchenko wrote:
> > On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> > > On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:

...

> > > How does this sound, please?
> > 
> > Not every user (especially _header_) wants to have printk.h included just for
> > sprintf.h that may have nothing to do with real output. So, same reasoning
> > from me as keeping that in kernel.h, i.e. printk.h no better.
> 
> (haven't check these, just to show how many _headers_ uses sprintf() call)
> 
> $ git grep -lw s.*printf -- include/linux/
> include/linux/acpi.h
> include/linux/audit.h
> include/linux/btf.h
> include/linux/dev_printk.h
> include/linux/device-mapper.h
> include/linux/efi.h
> include/linux/fortify-string.h
> include/linux/fs.h
> include/linux/gameport.h
> include/linux/kdb.h
> include/linux/kdev_t.h
> include/linux/kernel.h
> include/linux/mmiotrace.h
> include/linux/netlink.h
> include/linux/pci-p2pdma.h
> include/linux/perf_event.h
> include/linux/printk.h
> include/linux/seq_buf.h
> include/linux/seq_file.h
> include/linux/shrinker.h
> include/linux/string.h
> include/linux/sunrpc/svc_xprt.h
> include/linux/tnum.h
> include/linux/trace_seq.h
> include/linux/usb.h
> include/linux/usb/gadget_configfs.h

Okay, revised as my regexp was too lazy

$ git grep -lw s[^[:space:]_]*printf -- include/linux/
include/linux/btf.h
include/linux/device-mapper.h
include/linux/efi.h
include/linux/fortify-string.h
include/linux/kdev_t.h
include/linux/kernel.h
include/linux/netlink.h
include/linux/pci-p2pdma.h
include/linux/perf_event.h
include/linux/sunrpc/svc_xprt.h
include/linux/tnum.h
include/linux/usb.h
include/linux/usb/gadget_configfs.h
Rasmus Villemoes Aug. 7, 2023, 7:31 p.m. UTC | #7
On 07/08/2023 17.03, Petr Mladek wrote:

> I agree that kernel.h is not the right place. But are there any
> numbers how much separate sprintf.h might safe?
> 
> Maybe, we should not reinvent the wheel and get inspired by
> userspace.
> 
> sprintf() and friends are basic functions which most people know
> from userspace. And it is pretty handy that the kernel variants
> are are mostly compatible as well.
> 
> IMHO, it might be handful when they are also included similar way
> as in userspace. From my POV printk.h is like stdio.h. And we already
> have include/linux/stdarg.h where the v*print*() function might
> fit nicely.
> 
> How does this sound, please?

No, please. Let's have a separate header for the functions defined in
vsprintf.c. We really need to trim our headers down to something more
manageable, and stop including everything from everywhere just because
$this little macro needs $that little inline function.

I did https://wildmoose.dk/header-bloat/ many moons ago, I'm sure it
looks even worse today. I also did some sparse-hackery to let it tell me
which macros/functions/types were declared/defined in a given .h file,
and then if some .c file included that .h file but didn't use any of
those, the #include could go away.

Sure, individually, moving the sprintf family out of kernel.h won't save
much (and, of course, nothing at all initially when we're forced to add
an include of that new header from kernel.h). But this technical debt
has crept in over many years, it's not going away in one or two
releases. And use of the sprintf family is very easy to grep for, so
it's a good low-hanging fruit where we should be able to make everybody
who needs one of them include the proper header, and then drop the
include from kernel.h.

Rasmus
Steven Rostedt Aug. 8, 2023, 2:24 a.m. UTC | #8
On Mon, 7 Aug 2023 18:09:54 +0300
Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:

> On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> > On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:  
> > > kernel.h is being used as a dump for all kinds of stuff for a long time.
> > > sprintf() and friends are used in many drivers without need of the full
> > > kernel.h dependency train with it.
> > > 
> > > Here is the attempt on cleaning it up by splitting out sprintf() and
> > > friends.  
> 
> ...
> 
> > I agree that kernel.h is not the right place. But are there any
> > numbers how much separate sprintf.h might safe?
> > Maybe, we should not reinvent the wheel and get inspired by
> > userspace.
> > 
> > sprintf() and friends are basic functions which most people know
> > from userspace. And it is pretty handy that the kernel variants
> > are are mostly compatible as well.
> > 
> > IMHO, it might be handful when they are also included similar way
> > as in userspace. From my POV printk.h is like stdio.h. And we already
> > have include/linux/stdarg.h where the v*print*() function might
> > fit nicely.
> > 
> > How does this sound, please?  
> 
> Not every user (especially _header_) wants to have printk.h included just for
> sprintf.h that may have nothing to do with real output. So, same reasoning
> from me as keeping that in kernel.h, i.e. printk.h no better.
> 

If you separate out the sprintf() into its own header and still include
that in kernel.h, then for what you said in the other email:

> What to do with _headers_ that include kernel.h for no reason other than
> sprintf.h (as an example)? Your suggestion, please?

It can include sprintf.h (or printk.h or stdio.h, whatever) instead of kernel.h.

What's the issue?

-- Steve
Petr Mladek Aug. 8, 2023, 6:41 a.m. UTC | #9
On Mon 2023-08-07 18:13:57, Andy Shevchenko wrote:
> On Mon, Aug 07, 2023 at 06:11:24PM +0300, Andy Shevchenko wrote:
> > On Mon, Aug 07, 2023 at 06:09:54PM +0300, Andy Shevchenko wrote:
> > > On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> > > > On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:
> 
> ...
> 
> > > > How does this sound, please?
> > > 
> > > Not every user (especially _header_) wants to have printk.h included just for
> > > sprintf.h that may have nothing to do with real output. So, same reasoning
> > > from me as keeping that in kernel.h, i.e. printk.h no better.
> > 
> > (haven't check these, just to show how many _headers_ uses sprintf() call)
> > 
> > $ git grep -lw s.*printf -- include/linux/
> > include/linux/acpi.h
> > include/linux/audit.h
> > include/linux/btf.h
> > include/linux/dev_printk.h
> > include/linux/device-mapper.h
> > include/linux/efi.h
> > include/linux/fortify-string.h
> > include/linux/fs.h
> > include/linux/gameport.h
> > include/linux/kdb.h
> > include/linux/kdev_t.h
> > include/linux/kernel.h
> > include/linux/mmiotrace.h
> > include/linux/netlink.h
> > include/linux/pci-p2pdma.h
> > include/linux/perf_event.h
> > include/linux/printk.h
> > include/linux/seq_buf.h
> > include/linux/seq_file.h
> > include/linux/shrinker.h
> > include/linux/string.h
> > include/linux/sunrpc/svc_xprt.h
> > include/linux/tnum.h
> > include/linux/trace_seq.h
> > include/linux/usb.h
> > include/linux/usb/gadget_configfs.h
> 
> Okay, revised as my regexp was too lazy
> 
> $ git grep -lw s[^[:space:]_]*printf -- include/linux/
> include/linux/btf.h
> include/linux/device-mapper.h
> include/linux/efi.h
> include/linux/fortify-string.h
> include/linux/kdev_t.h
> include/linux/kernel.h
> include/linux/netlink.h
> include/linux/pci-p2pdma.h
> include/linux/perf_event.h
> include/linux/sunrpc/svc_xprt.h
> include/linux/tnum.h
> include/linux/usb.h
> include/linux/usb/gadget_configfs.h

This is only a tiny part of the picture.

$> git grep sc*n*printf | cut -d : -f1 | uniq | grep "\.c$" | wc -l
5254
$> find . -name  "*.c" | wc -l
32319

It means that the vsprintf() family is used in 1/6 of all kernel
source files. They would need to include one extra header.

If you split headers into so many small pieces then all
source files will start with 3 screens of includes. I do not see
how this helps with maintainability.

Best Regards,
Petr
David Laight Aug. 8, 2023, 11:17 a.m. UTC | #10
From: Rasmus Villemoes
> Sent: 07 August 2023 20:32
...
> No, please. Let's have a separate header for the functions defined in
> vsprintf.c. We really need to trim our headers down to something more
> manageable, and stop including everything from everywhere just because
> $this little macro needs $that little inline function.

The problem I see isn't things like kernel.h defining a few 'library'
functions, but deep nested includes that means that pretty much all
of the headers get pulled into all the compiles.

Some nested includes sequences can go through an "asm" header
that you might expect to be architecture specific stuff and then
include something like ioctl.h.

Add something like #define IO_WR @@@ to the top a C file
and then see where the compiler finds the duplicate definition.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Andy Shevchenko Aug. 8, 2023, 12:47 p.m. UTC | #11
On Tue, Aug 08, 2023 at 08:41:49AM +0200, Petr Mladek wrote:
> On Mon 2023-08-07 18:13:57, Andy Shevchenko wrote:
> > On Mon, Aug 07, 2023 at 06:11:24PM +0300, Andy Shevchenko wrote:
> > > On Mon, Aug 07, 2023 at 06:09:54PM +0300, Andy Shevchenko wrote:
> > > > On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> > > > > On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:

...

> > > > > How does this sound, please?
> > > > 
> > > > Not every user (especially _header_) wants to have printk.h included just for
> > > > sprintf.h that may have nothing to do with real output. So, same reasoning
> > > > from me as keeping that in kernel.h, i.e. printk.h no better.
> > > 
> > > (haven't check these, just to show how many _headers_ uses sprintf() call)
> > > 
> > > $ git grep -lw s.*printf -- include/linux/
> > > include/linux/acpi.h
> > > include/linux/audit.h
> > > include/linux/btf.h
> > > include/linux/dev_printk.h
> > > include/linux/device-mapper.h
> > > include/linux/efi.h
> > > include/linux/fortify-string.h
> > > include/linux/fs.h
> > > include/linux/gameport.h
> > > include/linux/kdb.h
> > > include/linux/kdev_t.h
> > > include/linux/kernel.h
> > > include/linux/mmiotrace.h
> > > include/linux/netlink.h
> > > include/linux/pci-p2pdma.h
> > > include/linux/perf_event.h
> > > include/linux/printk.h
> > > include/linux/seq_buf.h
> > > include/linux/seq_file.h
> > > include/linux/shrinker.h
> > > include/linux/string.h
> > > include/linux/sunrpc/svc_xprt.h
> > > include/linux/tnum.h
> > > include/linux/trace_seq.h
> > > include/linux/usb.h
> > > include/linux/usb/gadget_configfs.h
> > 
> > Okay, revised as my regexp was too lazy
> > 
> > $ git grep -lw s[^[:space:]_]*printf -- include/linux/
> > include/linux/btf.h
> > include/linux/device-mapper.h
> > include/linux/efi.h
> > include/linux/fortify-string.h
> > include/linux/kdev_t.h
> > include/linux/kernel.h
> > include/linux/netlink.h
> > include/linux/pci-p2pdma.h
> > include/linux/perf_event.h
> > include/linux/sunrpc/svc_xprt.h
> > include/linux/tnum.h
> > include/linux/usb.h
> > include/linux/usb/gadget_configfs.h
> 
> This is only a tiny part of the picture.
> 
> $> git grep sc*n*printf | cut -d : -f1 | uniq | grep "\.c$" | wc -l
> 5254
> $> find . -name  "*.c" | wc -l
> 32319
> 
> It means that the vsprintf() family is used in 1/6 of all kernel
> source files. They would need to include one extra header.

No, not only one. more, but the outcome of this is not using what is not used
and unwinding the header dependency hell.

But hey, I am not talking about C files right now, it's secondary, however
in IIO we want to get rid of kernel.h in the C files as well.

Also, please, go through all of them and tell, how many of them are using
stuff from kernel.h besides sprintf.h and ARRAY_SIZE() (which I plan
for a long time to split from kernel.h)?

> If you split headers into so many small pieces then all
> source files will start with 3 screens of includes. I do not see
> how this helps with maintainability.

It should be a compromise. But including kernel.h mess into a file
(**especially** into header) for let's say a single sprintf() call
or use ARRAY_SIZE() macro is a bad idea. _This_ is not maintainable
code and developers definitely haven't put their brains to what they
are doing with the header inclusion block in their code.
Andy Shevchenko Aug. 8, 2023, 12:49 p.m. UTC | #12
On Mon, Aug 07, 2023 at 10:24:55PM -0400, Steven Rostedt wrote:
> On Mon, 7 Aug 2023 18:09:54 +0300
> Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> > On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> > > On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:  
> > > > kernel.h is being used as a dump for all kinds of stuff for a long time.
> > > > sprintf() and friends are used in many drivers without need of the full
> > > > kernel.h dependency train with it.
> > > > 
> > > > Here is the attempt on cleaning it up by splitting out sprintf() and
> > > > friends.  

...

> > > I agree that kernel.h is not the right place. But are there any
> > > numbers how much separate sprintf.h might safe?
> > > Maybe, we should not reinvent the wheel and get inspired by
> > > userspace.
> > > 
> > > sprintf() and friends are basic functions which most people know
> > > from userspace. And it is pretty handy that the kernel variants
> > > are are mostly compatible as well.
> > > 
> > > IMHO, it might be handful when they are also included similar way
> > > as in userspace. From my POV printk.h is like stdio.h. And we already
> > > have include/linux/stdarg.h where the v*print*() function might
> > > fit nicely.
> > > 
> > > How does this sound, please?  
> > 
> > Not every user (especially _header_) wants to have printk.h included just for
> > sprintf.h that may have nothing to do with real output. So, same reasoning
> > from me as keeping that in kernel.h, i.e. printk.h no better.
> 
> If you separate out the sprintf() into its own header and still include
> that in kernel.h, then for what you said in the other email:
> 
> > What to do with _headers_ that include kernel.h for no reason other than
> > sprintf.h (as an example)? Your suggestion, please?
> 
> It can include sprintf.h (or printk.h or stdio.h, whatever) instead of kernel.h.
> 
> What's the issue?

The issue is the same, printk.h brings a lot more than just s*printf().
Why should I include it for a, let's say, single sprintf() call?
Andy Shevchenko Aug. 8, 2023, 12:55 p.m. UTC | #13
On Sat, Aug 05, 2023 at 11:43:04AM -0700, Andrew Morton wrote:
> On Sat,  5 Aug 2023 20:50:26 +0300 Andy Shevchenko <andriy.shevchenko@linux.intel.com> wrote:
> 
> > kernel.h is being used as a dump for all kinds of stuff for a long time.
> > sprintf() and friends are used in many drivers without need of the full
> > kernel.h dependency train with it.
> 
> There seems little point in this unless someone signs up to convert
> lots of code to include sprintf.h instead of kernel.h?

You can say it to any cleanup work that starts from the baby steps.

> And such conversions will presumably cause all sorts of nasties
> which require additional work?
> 
> So... what's the plan here?

My main goal is to get rid from kernel.h in the _headers_ first.
The secondary goal as discussed with Jonathan to have IIO subsystem
be cleaned up from kernel.h (meaning C files as well) at some point.

FWIW, I have started kernel.h cleanup due to impossibility to
make bitmap_*alloc() being static inline.
David Laight Aug. 9, 2023, 8:48 a.m. UTC | #14
...
> If you split headers into so many small pieces then all
> source files will start with 3 screens of includes. I do not see
> how this helps with maintainability.

You also slow down compilations.

A few extra definitions in a 'leaf' header (one without any
#includes) don't really matter.
If a header includes other 'leaf' headers that doesn't matter
much.

But the deep include chains caused by a low level header
including a main header are what causes pretty much every
header to get included in every compilation.

Breaking the deep chains is probably more useful than
adding leaf headers for things that are in a header pretty
much everything in going to include anyway.

The is probably scope for counting the depth of header
includes by looking at what each header includes.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Petr Mladek Aug. 10, 2023, 8:15 a.m. UTC | #15
On Tue 2023-08-08 15:47:59, Andy Shevchenko wrote:
> On Tue, Aug 08, 2023 at 08:41:49AM +0200, Petr Mladek wrote:
> > On Mon 2023-08-07 18:13:57, Andy Shevchenko wrote:
> > > On Mon, Aug 07, 2023 at 06:11:24PM +0300, Andy Shevchenko wrote:
> > > > On Mon, Aug 07, 2023 at 06:09:54PM +0300, Andy Shevchenko wrote:
> > > > > On Mon, Aug 07, 2023 at 05:03:19PM +0200, Petr Mladek wrote:
> > > > > > On Sat 2023-08-05 20:50:26, Andy Shevchenko wrote:
> 
> ...
> 
> > > > > > How does this sound, please?
> > > > > 
> > > > > Not every user (especially _header_) wants to have printk.h included just for
> > > > > sprintf.h that may have nothing to do with real output. So, same reasoning
> > > > > from me as keeping that in kernel.h, i.e. printk.h no better.
> > > > 
> > > > (haven't check these, just to show how many _headers_ uses sprintf() call)
> > > > 
> > > > $ git grep -lw s.*printf -- include/linux/
> > > > include/linux/acpi.h
> > > > include/linux/audit.h
> > > > include/linux/btf.h
> > > > include/linux/dev_printk.h
> > > > include/linux/device-mapper.h
> > > > include/linux/efi.h
> > > > include/linux/fortify-string.h
> > > > include/linux/fs.h
> > > > include/linux/gameport.h
> > > > include/linux/kdb.h
> > > > include/linux/kdev_t.h
> > > > include/linux/kernel.h
> > > > include/linux/mmiotrace.h
> > > > include/linux/netlink.h
> > > > include/linux/pci-p2pdma.h
> > > > include/linux/perf_event.h
> > > > include/linux/printk.h
> > > > include/linux/seq_buf.h
> > > > include/linux/seq_file.h
> > > > include/linux/shrinker.h
> > > > include/linux/string.h
> > > > include/linux/sunrpc/svc_xprt.h
> > > > include/linux/tnum.h
> > > > include/linux/trace_seq.h
> > > > include/linux/usb.h
> > > > include/linux/usb/gadget_configfs.h
> > > 
> > > Okay, revised as my regexp was too lazy
> > > 
> > > $ git grep -lw s[^[:space:]_]*printf -- include/linux/
> > > include/linux/btf.h
> > > include/linux/device-mapper.h
> > > include/linux/efi.h
> > > include/linux/fortify-string.h
> > > include/linux/kdev_t.h
> > > include/linux/kernel.h
> > > include/linux/netlink.h
> > > include/linux/pci-p2pdma.h
> > > include/linux/perf_event.h
> > > include/linux/sunrpc/svc_xprt.h
> > > include/linux/tnum.h
> > > include/linux/usb.h
> > > include/linux/usb/gadget_configfs.h
> > 
> > This is only a tiny part of the picture.
> > 
> > $> git grep sc*n*printf | cut -d : -f1 | uniq | grep "\.c$" | wc -l
> > 5254
> > $> find . -name  "*.c" | wc -l
> > 32319
> > 
> > It means that the vsprintf() family is used in 1/6 of all kernel
> > source files. They would need to include one extra header.
> 
> No, not only one. more, but the outcome of this is not using what is not used
> and unwinding the header dependency hell.
> 
> But hey, I am not talking about C files right now, it's secondary, however
> in IIO we want to get rid of kernel.h in the C files as well.

This sounds scary. Headers and C files are closely related. IMHO, it
does not makes sense to split header files without looking how
the functions are used.

Everyone agrees that kernel.h should be removed. But there are always
more possibilities where to move the definitions. For this, the use
in C files must be considered. Otherwise, it is just a try&hope approach.

> Also, please, go through all of them and tell, how many of them are using
> stuff from kernel.h besides sprintf.h and ARRAY_SIZE() (which I plan
> for a long time to split from kernel.h)?

I am all for removing vsprintf declarations from linux.h.

I provided the above numbers to support the idea of moving them
into printk.h.

The numbers show that the vsprintf function famility is used
quite frequently. IMHO, creating an extra tiny include file
will create more harm then good. By the harm I mean:

    + churn when updating 1/6 of source files

    + prolonging the list of #include lines in .c file. It will
      not help with maintainability which was one of the motivation
      in this patchset.

    + an extra work for people using vsprintf function family in
      new .c files. People are used to get them for free,
      together with printk().

Best Regards,
Petr
Rasmus Villemoes Aug. 10, 2023, 9:09 a.m. UTC | #16
On 10/08/2023 10.15, Petr Mladek wrote:

> Everyone agrees that kernel.h should be removed. But there are always
> more possibilities where to move the definitions. For this, the use
> in C files must be considered. Otherwise, it is just a try&hope approach.
> 
>> Also, please, go through all of them and tell, how many of them are using
>> stuff from kernel.h besides sprintf.h and ARRAY_SIZE() (which I plan
>> for a long time to split from kernel.h)?
> 
> I am all for removing vsprintf declarations from linux.h.
> 
> I provided the above numbers to support the idea of moving them
> into printk.h.
> 
> The numbers show that the vsprintf function famility is used
> quite frequently. IMHO, creating an extra tiny include file
> will create more harm then good. By the harm I mean:
> 
>     + churn when updating 1/6 of source files

Well, we probably shouldn't do 5000 single-line patches to add that
sprintf.h include, and another 10000 to add an array-macros.h include
(just as an example). Some tooling and reasonable batching would
probably be required. Churn it will be, but how many thousands of
patches were done to make i2c drivers' probe methods lose a parameter
(first converting them all to .probe_new, then another round to again
assign to .probe when that prototype was changed). That's just the cost
of any tree-wide change in a tree our size.

>     + prolonging the list of #include lines in .c file. It will
>       not help with maintainability which was one of the motivation
>       in this patchset.

We really have to stop pretending it's ok to rely on header a.h
automatically pulling in b.h, if a .c file actually uses something
declared in b.h. [Of course, the reality is more complicated; e.g. we
have many cases where one must include linux/foo.h, not asm/foo.h, but
the actual declarations are in the appropriate arch-specific file.
However, we should not rely on linux/bar.h pulling in linux/foo.h.]

>     + an extra work for people using vsprintf function family in
>       new .c files. People are used to get them for free,
>       together with printk().

This is flawed. Not every C source file does a printk, or uses anything
else from printk.h. E.g. a lot of drivers only do the dev_err() family,
some subsystems have their own wrappers, etc. So by moving the
declarations to printk.h you just replace the kernel.h with something
equally bad (essentially all existing headers are bad because they all
include each other recursively). Also, by not moving the declarations to
a separate header, you're ignoring the fact that your own numbers show
that 5/6 of the kernel's TUs would become _smaller_ by not having to
parse those declarations. And the 1/6 that do use sprintf() may become
smaller by thousands of lines once they can avoid kernel.h and all that
that includes recursively.

But those gains can never be achieved if we don't start somewhere, and
if every such baby step results in 20+ message threads.

Rasmus
Andy Shevchenko Aug. 10, 2023, 1:13 p.m. UTC | #17
On Wed, Aug 09, 2023 at 08:48:54AM +0000, David Laight wrote:
> ...
> > If you split headers into so many small pieces then all
> > source files will start with 3 screens of includes. I do not see
> > how this helps with maintainability.
> 
> You also slow down compilations.

Ingo's patches showed the opposite. Do you have actual try and numbers?

> A few extra definitions in a 'leaf' header (one without any
> #includes) don't really matter.
> If a header includes other 'leaf' headers that doesn't matter
> much.
> 
> But the deep include chains caused by a low level header
> including a main header are what causes pretty much every
> header to get included in every compilation.
> 
> Breaking the deep chains is probably more useful than
> adding leaf headers for things that are in a header pretty
> much everything in going to include anyway.
> 
> The is probably scope for counting the depth of header
> includes by looking at what each header includes.
Andy Shevchenko Aug. 10, 2023, 1:17 p.m. UTC | #18
On Thu, Aug 10, 2023 at 11:09:20AM +0200, Rasmus Villemoes wrote:
> On 10/08/2023 10.15, Petr Mladek wrote:

...

> >     + prolonging the list of #include lines in .c file. It will
> >       not help with maintainability which was one of the motivation
> >       in this patchset.
> 
> We really have to stop pretending it's ok to rely on header a.h
> automatically pulling in b.h, if a .c file actually uses something
> declared in b.h. [Of course, the reality is more complicated; e.g. we
> have many cases where one must include linux/foo.h, not asm/foo.h, but
> the actual declarations are in the appropriate arch-specific file.
> However, we should not rely on linux/bar.h pulling in linux/foo.h.]

Btw, it's easy to enforce IIUC, i.e. by dropping

  #ifndef _FOO_H
  #define _FOO_H
  #endif

mantra from the headers.
Rasmus Villemoes Aug. 10, 2023, 2:17 p.m. UTC | #19
On 10/08/2023 15.17, Andy Shevchenko wrote:
> On Thu, Aug 10, 2023 at 11:09:20AM +0200, Rasmus Villemoes wrote:
>> On 10/08/2023 10.15, Petr Mladek wrote:
> 
> ...
> 
>>>     + prolonging the list of #include lines in .c file. It will
>>>       not help with maintainability which was one of the motivation
>>>       in this patchset.
>>
>> We really have to stop pretending it's ok to rely on header a.h
>> automatically pulling in b.h, if a .c file actually uses something
>> declared in b.h. [Of course, the reality is more complicated; e.g. we
>> have many cases where one must include linux/foo.h, not asm/foo.h, but
>> the actual declarations are in the appropriate arch-specific file.
>> However, we should not rely on linux/bar.h pulling in linux/foo.h.]
> 
> Btw, it's easy to enforce IIUC, i.e. by dropping
> 
>   #ifndef _FOO_H
>   #define _FOO_H
>   #endif
> 
> mantra from the headers.
> 

No, you can't do that, because some headers legitimately include other
headers, often for type definitions. Say some struct definition where
one of the members is another struct (struct list_head being an obvious
example). Or a static inline function.

We _also_ don't want to force everybody who includes a.h to ensure that
they first include b.h because something in a.h needs stuff from b.h.

So include guards must be used. They are a so well-known idiom that gcc
even has special code for handling them: If everything in a foo.h file
except comments is inside an ifndef/define/endif, gcc remembers that
that foo.h file has such an include guard, so when gcc then encounters
some #include directive that would again resolve to that same foo.h, and
the include guard hasn't been #undef'ed, it doesn't even do the syscalls
to open/read/close the file again.

Rasmus
Steven Rostedt Aug. 11, 2023, 7:28 p.m. UTC | #20
On Thu, 10 Aug 2023 16:17:57 +0200
Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:

> > Btw, it's easy to enforce IIUC, i.e. by dropping
> > 
> >   #ifndef _FOO_H
> >   #define _FOO_H
> >   #endif
> > 
> > mantra from the headers.
> >   
> 
> No, you can't do that, because some headers legitimately include other
> headers, often for type definitions. Say some struct definition where
> one of the members is another struct (struct list_head being an obvious
> example). Or a static inline function.
> 
> We _also_ don't want to force everybody who includes a.h to ensure that
> they first include b.h because something in a.h needs stuff from b.h.
> 
> So include guards must be used. They are a so well-known idiom that gcc
> even has special code for handling them: If everything in a foo.h file
> except comments is inside an ifndef/define/endif, gcc remembers that
> that foo.h file has such an include guard, so when gcc then encounters
> some #include directive that would again resolve to that same foo.h, and
> the include guard hasn't been #undef'ed, it doesn't even do the syscalls
> to open/read/close the file again.

I hope Andy was just joking with that recommendation.

-- Steve
David Laight Aug. 14, 2023, 8:12 a.m. UTC | #21
From: Andy Shevchenko
> Sent: 10 August 2023 14:14
> 
> On Wed, Aug 09, 2023 at 08:48:54AM +0000, David Laight wrote:
> > ...
> > > If you split headers into so many small pieces then all
> > > source files will start with 3 screens of includes. I do not see
> > > how this helps with maintainability.
> >
> > You also slow down compilations.
> 
> Ingo's patches showed the opposite. Do you have actual try and numbers?

The compiler has to open the extra file on every compile.
If you include it from lots of different places it has to open
it for each one (to find the include guard).
Any attempted compiler optimisations have the same much the
same problem as #pragma once.

With a long -I list even finding the file can take a while.

Probably most obvious when using NFS mounted filesystems.
Especially the 'traditional' NFS protocol that required a
message 'round trip' for each element of the directory path.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Andy Shevchenko Aug. 14, 2023, 11:16 a.m. UTC | #22
On Fri, Aug 11, 2023 at 03:28:17PM -0400, Steven Rostedt wrote:
> On Thu, 10 Aug 2023 16:17:57 +0200
> Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
> 
> > > Btw, it's easy to enforce IIUC, i.e. by dropping
> > > 
> > >   #ifndef _FOO_H
> > >   #define _FOO_H
> > >   #endif
> > > 
> > > mantra from the headers.
> > 
> > No, you can't do that, because some headers legitimately include other
> > headers, often for type definitions. Say some struct definition where
> > one of the members is another struct (struct list_head being an obvious
> > example). Or a static inline function.
> > 
> > We _also_ don't want to force everybody who includes a.h to ensure that
> > they first include b.h because something in a.h needs stuff from b.h.
> > 
> > So include guards must be used. They are a so well-known idiom that gcc
> > even has special code for handling them: If everything in a foo.h file
> > except comments is inside an ifndef/define/endif, gcc remembers that
> > that foo.h file has such an include guard, so when gcc then encounters
> > some #include directive that would again resolve to that same foo.h, and
> > the include guard hasn't been #undef'ed, it doesn't even do the syscalls
> > to open/read/close the file again.
> 
> I hope Andy was just joking with that recommendation.

Too radical to be true to implement. But it's always good to have a rationale
(thanks Rasmus) behind existing approach.
Andy Shevchenko Aug. 14, 2023, 12:28 p.m. UTC | #23
On Mon, Aug 14, 2023 at 08:12:55AM +0000, David Laight wrote:
> From: Andy Shevchenko
> > Sent: 10 August 2023 14:14
> > On Wed, Aug 09, 2023 at 08:48:54AM +0000, David Laight wrote:

...

> > > > If you split headers into so many small pieces then all
> > > > source files will start with 3 screens of includes. I do not see
> > > > how this helps with maintainability.
> > >
> > > You also slow down compilations.
> > 
> > Ingo's patches showed the opposite. Do you have actual try and numbers?
> 
> The compiler has to open the extra file on every compile.
> If you include it from lots of different places it has to open
> it for each one (to find the include guard).
> Any attempted compiler optimisations have the same much the
> same problem as #pragma once.
> 
> With a long -I list even finding the file can take a while.
> 
> Probably most obvious when using NFS mounted filesystems.
> Especially the 'traditional' NFS protocol that required a
> message 'round trip' for each element of the directory path.

Right, as I said come up with numbers. Ingo did that, so can you.
His numbers shows _increase_ of build speed.
Petr Mladek Aug. 14, 2023, 3:16 p.m. UTC | #24
On Thu 2023-08-10 11:09:20, Rasmus Villemoes wrote:
> On 10/08/2023 10.15, Petr Mladek wrote:
> 
> > Everyone agrees that kernel.h should be removed. But there are always
> > more possibilities where to move the definitions. For this, the use
> > in C files must be considered. Otherwise, it is just a try&hope approach.
> > 
> >> Also, please, go through all of them and tell, how many of them are using
> >> stuff from kernel.h besides sprintf.h and ARRAY_SIZE() (which I plan
> >> for a long time to split from kernel.h)?
> > 
> > I am all for removing vsprintf declarations from linux.h.
> > 
> > I provided the above numbers to support the idea of moving them
> > into printk.h.
> > 
> > The numbers show that the vsprintf function famility is used
> > quite frequently. IMHO, creating an extra tiny include file
> > will create more harm then good. By the harm I mean:
> > 
> >     + churn when updating 1/6 of source files
> 
> Well, we probably shouldn't do 5000 single-line patches to add that
> sprintf.h include, and another 10000 to add an array-macros.h include
> (just as an example). Some tooling and reasonable batching would
> probably be required. Churn it will be, but how many thousands of
> patches were done to make i2c drivers' probe methods lose a parameter
> (first converting them all to .probe_new, then another round to again
> assign to .probe when that prototype was changed). That's just the cost
> of any tree-wide change in a tree our size.

OK.

> >     + prolonging the list of #include lines in .c file. It will
> >       not help with maintainability which was one of the motivation
> >       in this patchset.
> 
> We really have to stop pretending it's ok to rely on header a.h
> automatically pulling in b.h, if a .c file actually uses something
> declared in b.h.

Yes, we need to find some ballance.

> >     + an extra work for people using vsprintf function family in
> >       new .c files. People are used to get them for free,
> >       together with printk().
> 
> This is flawed. Not every C source file does a printk, or uses anything
> else from printk.h. E.g. a lot of drivers only do the dev_err() family,
> some subsystems have their own wrappers, etc. So by moving the
> declarations to printk.h you just replace the kernel.h with something
> equally bad (essentially all existing headers are bad because they all
> include each other recursively). Also, by not moving the declarations to
> a separate header, you're ignoring the fact that your own numbers show
> that 5/6 of the kernel's TUs would become _smaller_ by not having to
> parse those declarations. And the 1/6 that do use sprintf() may become
> smaller by thousands of lines once they can avoid kernel.h and all that
> that includes recursively.

OK, I did some grepping:

## total number of .c files
pmladek@alley:/prace/kernel/linux> find . -name *.c | wc -l
32319

# printk() usage:

## .c files with printk() calls:
$> git grep  "printk(\|pr_\(emerg\|alert\|crit\|err\|warn\|notice\|info\|cont\|debug\)(" | cut -d ":" -f 1 | uniq | grep "\.c$" | wc -l
8966

    => 28% .c files use printk() directly

## .h files with printk() calls:
$> git grep  "printk(\|pr_\(emerg\|alert\|crit\|err\|warn\|notice\|info\|cont\|debug\)(" | cut -d ":" -f 1 | uniq | grep "\.h$" | wc -l
1006

   => the number is probably much higher because it is also used
      in 1000+ header files.


# vprintf() usage:

## .c files where printk() functions are use without vprintf() functions
$> grep -f printf.list -v  printk.list | wc -l
6725

  => 21% .c files use vprintf() functions directly


# unique usage:

## .c files where vprintf() family functions are used directly
$> git grep sc*n*printf | cut -d : -f1 | uniq | grep "\.c$" | wc -l
5254

  => 75% .c of files using printk() are not using vprintf()

## .c files where vprintf() functions are use without printk() functions
$> grep -f printk.list -v  printf.list | wc -l
3045

  => 45% .c of files using vprintf() are not using printk()


My view:

The overlap will likely be bigger because vprintk() family is often
used directly in .c files but printk() is quite frequently used
indirectly via .h files.

But still, there seems to be non-trivial number of .c files which use
vprintf() and not printk().

=> The split might help after all.

In each case, I do not want to discuss this to the death. And will
not block this patch.

Best Regards,
Petr
David Laight Aug. 15, 2023, 9:58 a.m. UTC | #25
From: Rasmus Villemoes
> Sent: 10 August 2023 10:09
...
> We really have to stop pretending it's ok to rely on header a.h
> automatically pulling in b.h, if a .c file actually uses something
> declared in b.h. [Of course, the reality is more complicated; e.g. we
> have many cases where one must include linux/foo.h, not asm/foo.h, but
> the actual declarations are in the appropriate arch-specific file.
> However, we should not rely on linux/bar.h pulling in linux/foo.h.]

IMHO (for what it matters) it would be better to focus on why
#include <cdev.h> pulls in around 350 other headers (look at
a .d file) that worry about moving a few files into a new
'leaf' header from somewhere that pretty much everything has
to include anyway.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
diff mbox series

Patch

diff --git a/include/linux/kernel.h b/include/linux/kernel.h
index b9e76f717a7e..cee8fe87e9f4 100644
--- a/include/linux/kernel.h
+++ b/include/linux/kernel.h
@@ -29,6 +29,7 @@ 
 #include <linux/panic.h>
 #include <linux/printk.h>
 #include <linux/build_bug.h>
+#include <linux/sprintf.h>
 #include <linux/static_call_types.h>
 #include <linux/instruction_pointer.h>
 #include <asm/byteorder.h>
@@ -203,35 +204,6 @@  static inline void might_fault(void) { }
 
 void do_exit(long error_code) __noreturn;
 
-extern int num_to_str(char *buf, int size,
-		      unsigned long long num, unsigned int width);
-
-/* lib/printf utilities */
-
-extern __printf(2, 3) int sprintf(char *buf, const char * fmt, ...);
-extern __printf(2, 0) int vsprintf(char *buf, const char *, va_list);
-extern __printf(3, 4)
-int snprintf(char *buf, size_t size, const char *fmt, ...);
-extern __printf(3, 0)
-int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
-extern __printf(3, 4)
-int scnprintf(char *buf, size_t size, const char *fmt, ...);
-extern __printf(3, 0)
-int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
-extern __printf(2, 3) __malloc
-char *kasprintf(gfp_t gfp, const char *fmt, ...);
-extern __printf(2, 0) __malloc
-char *kvasprintf(gfp_t gfp, const char *fmt, va_list args);
-extern __printf(2, 0)
-const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args);
-
-extern __scanf(2, 3)
-int sscanf(const char *, const char *, ...);
-extern __scanf(2, 0)
-int vsscanf(const char *, const char *, va_list);
-
-extern int no_hash_pointers_enable(char *str);
-
 extern int get_option(char **str, int *pint);
 extern char *get_options(const char *str, int nints, int *ints);
 extern unsigned long long memparse(const char *ptr, char **retptr);
diff --git a/include/linux/sprintf.h b/include/linux/sprintf.h
new file mode 100644
index 000000000000..9ca23bcf9f42
--- /dev/null
+++ b/include/linux/sprintf.h
@@ -0,0 +1,25 @@ 
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_KERNEL_SPRINTF_H_
+#define _LINUX_KERNEL_SPRINTF_H_
+
+#include <linux/compiler_attributes.h>
+#include <linux/types.h>
+
+int num_to_str(char *buf, int size, unsigned long long num, unsigned int width);
+
+__printf(2, 3) int sprintf(char *buf, const char * fmt, ...);
+__printf(2, 0) int vsprintf(char *buf, const char *, va_list);
+__printf(3, 4) int snprintf(char *buf, size_t size, const char *fmt, ...);
+__printf(3, 0) int vsnprintf(char *buf, size_t size, const char *fmt, va_list args);
+__printf(3, 4) int scnprintf(char *buf, size_t size, const char *fmt, ...);
+__printf(3, 0) int vscnprintf(char *buf, size_t size, const char *fmt, va_list args);
+__printf(2, 3) __malloc char *kasprintf(gfp_t gfp, const char *fmt, ...);
+__printf(2, 0) __malloc char *kvasprintf(gfp_t gfp, const char *fmt, va_list args);
+__printf(2, 0) const char *kvasprintf_const(gfp_t gfp, const char *fmt, va_list args);
+
+__scanf(2, 3) int sscanf(const char *, const char *, ...);
+__scanf(2, 0) int vsscanf(const char *, const char *, va_list);
+
+int no_hash_pointers_enable(char *str);
+
+#endif	/* _LINUX_KERNEL_SPRINTF_H */
diff --git a/lib/test_printf.c b/lib/test_printf.c
index 2ab09a0dc841..5adca19d34e2 100644
--- a/lib/test_printf.c
+++ b/lib/test_printf.c
@@ -19,6 +19,7 @@ 
 #include <linux/rtc.h>
 #include <linux/slab.h>
 #include <linux/socket.h>
+#include <linux/sprintf.h>
 #include <linux/string.h>
 
 #include "../tools/testing/selftests/kselftest_module.h"
diff --git a/lib/vsprintf.c b/lib/vsprintf.c
index b17e0744a7bc..c89719586d0c 100644
--- a/lib/vsprintf.c
+++ b/lib/vsprintf.c
@@ -35,6 +35,7 @@ 
 #include <linux/property.h>
 #include <linux/rtc.h>
 #include <linux/siphash.h>
+#include <linux/sprintf.h>
 #include <linux/stdarg.h>
 #include <linux/string.h>
 #include <linux/string_helpers.h>