diff mbox series

[128/147] init: move usermodehelper_enable() to populate_rootfs()

Message ID 20210908030003.WFvhfWzKJ%akpm@linux-foundation.org (mailing list archive)
State New
Headers show
Series [001/147] mm, slub: don't call flush_all() from slab_debug_trace_open() | expand

Commit Message

Andrew Morton Sept. 8, 2021, 3 a.m. UTC
From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Subject: init: move usermodehelper_enable() to populate_rootfs()

Currently, usermodehelper is enabled right before PID1 starts going
through the initcalls. However, any call of a usermodehelper from a
pure_, core_, postcore_, arch_, subsys_ or fs_ initcall is futile, as
there is no filesystem contents yet.

Up until commit e7cb072eb988 ("init/initramfs.c: do unpacking
asynchronously"), such calls, whether via some request_module(), a
legacy uevent "/sbin/hotplug" notification or something else, would
just fail silently with (presumably) -ENOENT from
kernel_execve(). However, that commit introduced the
wait_for_initramfs() synchronization hook which must be called from
the usermodehelper exec path right before the kernel_execve, in order
that request_module() et al done from *after* rootfs_initcall()
time (i.e. device_ and late_ initcalls) would continue to find a
populated initramfs as they used to.

Any call of wait_for_initramfs() done before the unpacking has been
scheduled (i.e. before rootfs_initcall time) must just return
immediately [and let the caller find an empty file system] in order
not to deadlock the machine. I mistakenly thought, and my limited
testing confirmed, that there were no such calls, so I added a
pr_warn_once() in wait_for_initramfs(). It turns out that one can
indeed hit request_module() as well as kobject_uevent_env() during
those early init calls, leading to a user-visible warning in the
kernel log emitted consistently for certain configurations.

We could just remove the pr_warn_once(), but I think it's better to
postpone enabling the usermodehelper framework until there is at least
some chance of finding the executable. That is also a little more
efficient in that a lot of work done in umh.c will be elided. However,
it does change the error seen by those early callers from -ENOENT to
-EBUSY, so there is a risk of a regression if any caller care about
the exact error value.

Link: https://lkml.kernel.org/r/20210728134638.329060-1-linux@rasmusvillemoes.dk
Fixes: e7cb072eb988 ("init/initramfs.c: do unpacking asynchronously")
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
Cc: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 init/initramfs.c   |    2 ++
 init/main.c        |    1 -
 init/noinitramfs.c |    2 ++
 3 files changed, 4 insertions(+), 1 deletion(-)

Comments

Luis Chamberlain Sept. 8, 2021, 3:44 p.m. UTC | #1
On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote:
> From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Subject: init: move usermodehelper_enable() to populate_rootfs()
> 
> Currently, usermodehelper is enabled right before PID1 starts going
> through the initcalls. However, any call of a usermodehelper from a
> pure_, core_, postcore_, arch_, subsys_ or fs_ initcall is futile, as
> there is no filesystem contents yet.
> 
> Up until commit e7cb072eb988 ("init/initramfs.c: do unpacking
> asynchronously"), such calls, whether via some request_module(), a
> legacy uevent "/sbin/hotplug" notification or something else, would
> just fail silently with (presumably) -ENOENT from
> kernel_execve(). However, that commit introduced the
> wait_for_initramfs() synchronization hook which must be called from
> the usermodehelper exec path right before the kernel_execve, in order
> that request_module() et al done from *after* rootfs_initcall()
> time (i.e. device_ and late_ initcalls) would continue to find a
> populated initramfs as they used to.
> 
> Any call of wait_for_initramfs() done before the unpacking has been
> scheduled (i.e. before rootfs_initcall time) must just return
> immediately [and let the caller find an empty file system] in order
> not to deadlock the machine. I mistakenly thought, and my limited
> testing confirmed, that there were no such calls, so I added a
> pr_warn_once() in wait_for_initramfs(). It turns out that one can
> indeed hit request_module() as well as kobject_uevent_env() during
> those early init calls, leading to a user-visible warning in the
> kernel log emitted consistently for certain configurations.

Further proof that the semantics for init is still loose. Formalizing
dependencies on init is something we should strive to. Eventualy with a
DAG.  The linker-tables work I had done years ago strived to get us
there which allows us to get a simple explicit DAG through the linker.
Unfortunately that patch set fell through because folks were
more interested in questioning the alternative side benefits of
linker-tables, but the use-case for helping with init is still valid.

If we *do* want to resurrect this folks should let me know.

Since the kobject_uevent_env() interest here is for /sbin/hotplug and
that crap is deprecated, in practice the relevant calls we'd care about
are the request_module() calls.

> We could just remove the pr_warn_once(), but I think it's better to
> postpone enabling the usermodehelper framework until there is at least
> some chance of finding the executable. That is also a little more
> efficient in that a lot of work done in umh.c will be elided.

I *don't* think we were aware that such request_module() calls were
happening before the fs was even ready and failing silently with
-ENOENT. As such, although moving the usermodehelper_enable()
to right after scheduling populating the rootfs is the right thing,
we do loose on the opportunity to learn who were those odd callers
before. We could not care... but this is also a missed opportunity
in finding those. How important that is, is not clear to me as
this was silently failing before...

If we wanted to keep a print for the above purpose though, we'd likely
want the full stack trace to see who the hell made the call.

> However,
> it does change the error seen by those early callers from -ENOENT to
> -EBUSY, so there is a risk of a regression if any caller care about
> the exact error value.

I'd see this as a welcomed evolution as it tells us more: we're saying
"it's coming, try again" or whatever.

A debug option to allow us to get a full warning trace in the -EBUSY
case on early init would be nice to have.

Otherwise:

Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>

  Luis

> Link: https://lkml.kernel.org/r/20210728134638.329060-1-linux@rasmusvillemoes.dk
> Fixes: e7cb072eb988 ("init/initramfs.c: do unpacking asynchronously")
> Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com>
> Reported-by: Bruno Goncalves <bgoncalv@redhat.com>
> Reported-by: Heiner Kallweit <hkallweit1@gmail.com>
> Cc: Luis Chamberlain <mcgrof@kernel.org>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  init/initramfs.c   |    2 ++
>  init/main.c        |    1 -
>  init/noinitramfs.c |    2 ++
>  3 files changed, 4 insertions(+), 1 deletion(-)
> 
> --- a/init/initramfs.c~init-move-usermodehelper_enable-to-populate_rootfs
> +++ a/init/initramfs.c
> @@ -15,6 +15,7 @@
>  #include <linux/mm.h>
>  #include <linux/namei.h>
>  #include <linux/init_syscalls.h>
> +#include <linux/umh.h>
>  
>  static ssize_t __init xwrite(struct file *file, const char *p, size_t count,
>  		loff_t *pos)
> @@ -727,6 +728,7 @@ static int __init populate_rootfs(void)
>  {
>  	initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL,
>  						 &initramfs_domain);
> +	usermodehelper_enable();
>  	if (!initramfs_async)
>  		wait_for_initramfs();
>  	return 0;
> --- a/init/main.c~init-move-usermodehelper_enable-to-populate_rootfs
> +++ a/init/main.c
> @@ -1392,7 +1392,6 @@ static void __init do_basic_setup(void)
>  	driver_init();
>  	init_irq_proc();
>  	do_ctors();
> -	usermodehelper_enable();
>  	do_initcalls();
>  }
>  
> --- a/init/noinitramfs.c~init-move-usermodehelper_enable-to-populate_rootfs
> +++ a/init/noinitramfs.c
> @@ -10,6 +10,7 @@
>  #include <linux/kdev_t.h>
>  #include <linux/syscalls.h>
>  #include <linux/init_syscalls.h>
> +#include <linux/umh.h>
>  
>  /*
>   * Create a simple rootfs that is similar to the default initramfs
> @@ -18,6 +19,7 @@ static int __init default_rootfs(void)
>  {
>  	int err;
>  
> +	usermodehelper_enable();
>  	err = init_mkdir("/dev", 0755);
>  	if (err < 0)
>  		goto out;
> _
Rasmus Villemoes Sept. 10, 2021, 8:12 a.m. UTC | #2
On 08/09/2021 17.44, Luis Chamberlain wrote:
> On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote:
>> From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
>> Any call of wait_for_initramfs() done before the unpacking has been
>> scheduled (i.e. before rootfs_initcall time) must just return
>> immediately [and let the caller find an empty file system] in order
>> not to deadlock the machine. I mistakenly thought, and my limited
>> testing confirmed, that there were no such calls, so I added a
>> pr_warn_once() in wait_for_initramfs(). It turns out that one can
>> indeed hit request_module() as well as kobject_uevent_env() during
>> those early init calls, leading to a user-visible warning in the
>> kernel log emitted consistently for certain configurations.
> 
> Further proof that the semantics for init is still loose. Formalizing
> dependencies on init is something we should strive to. Eventualy with a
> DAG.  The linker-tables work I had done years ago strived to get us
> there which allows us to get a simple explicit DAG through the linker.
> Unfortunately that patch set fell through because folks were
> more interested in questioning the alternative side benefits of
> linker-tables, but the use-case for helping with init is still valid.
> 
> If we *do* want to resurrect this folks should let me know.

Heh, a while back I actually had some completely unrelated thing where
I'd want to make use of the linker tables infrastructure - I remembered
reading about it on LWN, and was quite surprised when I learnt that that
work had never made it in. I don't quite remember the use case (I think
it was for some test module infrastructure). But if you do have time to
resurrect those patches, I'd certainly be interested.

> Since the kobject_uevent_env() interest here is for /sbin/hotplug and
> that crap is deprecated, in practice the relevant calls we'd care about
> are the request_module() calls.

Yes - the first report I got about that pr_warn_once was indeed fixed by
the reporter simply disabling CONFIG_UEVENT_HELPER
(https://lore.kernel.org/lkml/9849be80-cfe5-b33e-8224-590a4c451415@gmail.com/).

>> We could just remove the pr_warn_once(), but I think it's better to
>> postpone enabling the usermodehelper framework until there is at least
>> some chance of finding the executable. That is also a little more
>> efficient in that a lot of work done in umh.c will be elided.
> 
> I *don't* think we were aware that such request_module() calls were
> happening before the fs was even ready and failing silently with
> -ENOENT. 

Probably not, no, otherwise somebody would have noticed.

As such, although moving the usermodehelper_enable()
> to right after scheduling populating the rootfs is the right thing,
> we do loose on the opportunity to learn who were those odd callers
> before. We could not care... but this is also a missed opportunity
> in finding those. How important that is, is not clear to me as
> this was silently failing before...
> 
> If we wanted to keep a print for the above purpose though, we'd likely
> want the full stack trace to see who the hell made the call.

Well, yes, I have myself fallen into that trap not just once, but at
least twice. The first time when I discovered this behaviour on one of
the ppc targets I did this work for in the first place (before I came up
with the CONFIG_MODPROBE_PATH patch). The second when I asked a reporter
to replace the pr_warn_once by WARN_ONCE:

https://lore.kernel.org/lkml/4434f245-db3b-c02a-36c4-0111a0dfb78d@rasmusvillemoes.dk/


The problem is that request_module() just fires off some worker thread
and then the original calling thread sits back and waits for that worker
to return a result.


>> However,
>> it does change the error seen by those early callers from -ENOENT to
>> -EBUSY, so there is a risk of a regression if any caller care about
>> the exact error value.
> 
> I'd see this as a welcomed evolution as it tells us more: we're saying
> "it's coming, try again" or whatever.

Indeed, and I don't think it's the end of the world if somebody notices
some change due to that, because we'd learn more about where those early
request_module() calls come from.

> A debug option to allow us to get a full warning trace in the -EBUSY
> case on early init would be nice to have.

As noted above, that's difficult. We'd need a way to know which other
task is waiting for us, then print the trace of that guy.

I don't think anybody is gonna hear this tree falling, so let's not try
to solve a problem before we know there is one.

Rasmus
H. Peter Anvin Sept. 10, 2021, 5:47 p.m. UTC | #3
I feel there is a general problem with the way infrastructure improvements are dealt with in the kernel – basically it feels like the submitter of new infrastructure is expected to convert existing users or "what we have now works." This is a classic way of building tech debt.

Maybe this might be a topic to discuss?

On September 10, 2021 1:12:01 AM PDT, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote:
>On 08/09/2021 17.44, Luis Chamberlain wrote:
>> On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote:
>>> From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
>>> Any call of wait_for_initramfs() done before the unpacking has been
>>> scheduled (i.e. before rootfs_initcall time) must just return
>>> immediately [and let the caller find an empty file system] in order
>>> not to deadlock the machine. I mistakenly thought, and my limited
>>> testing confirmed, that there were no such calls, so I added a
>>> pr_warn_once() in wait_for_initramfs(). It turns out that one can
>>> indeed hit request_module() as well as kobject_uevent_env() during
>>> those early init calls, leading to a user-visible warning in the
>>> kernel log emitted consistently for certain configurations.
>> 
>> Further proof that the semantics for init is still loose. Formalizing
>> dependencies on init is something we should strive to. Eventualy with a
>> DAG.  The linker-tables work I had done years ago strived to get us
>> there which allows us to get a simple explicit DAG through the linker.
>> Unfortunately that patch set fell through because folks were
>> more interested in questioning the alternative side benefits of
>> linker-tables, but the use-case for helping with init is still valid.
>> 
>> If we *do* want to resurrect this folks should let me know.
>
>Heh, a while back I actually had some completely unrelated thing where
>I'd want to make use of the linker tables infrastructure - I remembered
>reading about it on LWN, and was quite surprised when I learnt that that
>work had never made it in. I don't quite remember the use case (I think
>it was for some test module infrastructure). But if you do have time to
>resurrect those patches, I'd certainly be interested.
>
>> Since the kobject_uevent_env() interest here is for /sbin/hotplug and
>> that crap is deprecated, in practice the relevant calls we'd care about
>> are the request_module() calls.
>
>Yes - the first report I got about that pr_warn_once was indeed fixed by
>the reporter simply disabling CONFIG_UEVENT_HELPER
>(https://lore.kernel.org/lkml/9849be80-cfe5-b33e-8224-590a4c451415@gmail.com/).
>
>>> We could just remove the pr_warn_once(), but I think it's better to
>>> postpone enabling the usermodehelper framework until there is at least
>>> some chance of finding the executable. That is also a little more
>>> efficient in that a lot of work done in umh.c will be elided.
>> 
>> I *don't* think we were aware that such request_module() calls were
>> happening before the fs was even ready and failing silently with
>> -ENOENT. 
>
>Probably not, no, otherwise somebody would have noticed.
>
>As such, although moving the usermodehelper_enable()
>> to right after scheduling populating the rootfs is the right thing,
>> we do loose on the opportunity to learn who were those odd callers
>> before. We could not care... but this is also a missed opportunity
>> in finding those. How important that is, is not clear to me as
>> this was silently failing before...
>> 
>> If we wanted to keep a print for the above purpose though, we'd likely
>> want the full stack trace to see who the hell made the call.
>
>Well, yes, I have myself fallen into that trap not just once, but at
>least twice. The first time when I discovered this behaviour on one of
>the ppc targets I did this work for in the first place (before I came up
>with the CONFIG_MODPROBE_PATH patch). The second when I asked a reporter
>to replace the pr_warn_once by WARN_ONCE:
>
>https://lore.kernel.org/lkml/4434f245-db3b-c02a-36c4-0111a0dfb78d@rasmusvillemoes.dk/
>
>
>The problem is that request_module() just fires off some worker thread
>and then the original calling thread sits back and waits for that worker
>to return a result.
>
>
>>> However,
>>> it does change the error seen by those early callers from -ENOENT to
>>> -EBUSY, so there is a risk of a regression if any caller care about
>>> the exact error value.
>> 
>> I'd see this as a welcomed evolution as it tells us more: we're saying
>> "it's coming, try again" or whatever.
>
>Indeed, and I don't think it's the end of the world if somebody notices
>some change due to that, because we'd learn more about where those early
>request_module() calls come from.
>
>> A debug option to allow us to get a full warning trace in the -EBUSY
>> case on early init would be nice to have.
>
>As noted above, that's difficult. We'd need a way to know which other
>task is waiting for us, then print the trace of that guy.
>
>I don't think anybody is gonna hear this tree falling, so let's not try
>to solve a problem before we know there is one.
>
>Rasmus
Luis Chamberlain Sept. 10, 2021, 5:51 p.m. UTC | #4
On Fri, Sep 10, 2021 at 10:12:01AM +0200, Rasmus Villemoes wrote:
> On 08/09/2021 17.44, Luis Chamberlain wrote:
> > On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote:
> >> From: Rasmus Villemoes <linux@rasmusvillemoes.dk>
> >> Any call of wait_for_initramfs() done before the unpacking has been
> >> scheduled (i.e. before rootfs_initcall time) must just return
> >> immediately [and let the caller find an empty file system] in order
> >> not to deadlock the machine. I mistakenly thought, and my limited
> >> testing confirmed, that there were no such calls, so I added a
> >> pr_warn_once() in wait_for_initramfs(). It turns out that one can
> >> indeed hit request_module() as well as kobject_uevent_env() during
> >> those early init calls, leading to a user-visible warning in the
> >> kernel log emitted consistently for certain configurations.
> > 
> > Further proof that the semantics for init is still loose. Formalizing
> > dependencies on init is something we should strive to. Eventualy with a
> > DAG.  The linker-tables work I had done years ago strived to get us
> > there which allows us to get a simple explicit DAG through the linker.
> > Unfortunately that patch set fell through because folks were
> > more interested in questioning the alternative side benefits of
> > linker-tables, but the use-case for helping with init is still valid.
> > 
> > If we *do* want to resurrect this folks should let me know.
> 
> Heh, a while back I actually had some completely unrelated thing where
> I'd want to make use of the linker tables infrastructure - I remembered
> reading about it on LWN, and was quite surprised when I learnt that that
> work had never made it in. I don't quite remember the use case (I think
> it was for some test module infrastructure). But if you do have time to
> resurrect those patches, I'd certainly be interested.

OK I might.

> > Since the kobject_uevent_env() interest here is for /sbin/hotplug and
> > that crap is deprecated, in practice the relevant calls we'd care about
> > are the request_module() calls.
> 
> Yes - the first report I got about that pr_warn_once was indeed fixed by
> the reporter simply disabling CONFIG_UEVENT_HELPER
> (https://lore.kernel.org/lkml/9849be80-cfe5-b33e-8224-590a4c451415@gmail.com/).

Ah I see.

> >> We could just remove the pr_warn_once(), but I think it's better to
> >> postpone enabling the usermodehelper framework until there is at least
> >> some chance of finding the executable. That is also a little more
> >> efficient in that a lot of work done in umh.c will be elided.
> > 
> > I *don't* think we were aware that such request_module() calls were
> > happening before the fs was even ready and failing silently with
> > -ENOENT. 
> 
> Probably not, no, otherwise somebody would have noticed.

OK your commit log was not clear on this, it seemed to suggest this
as a possibility or that such a case existed. That also means the
impact of your change is less.

> >> However,
> >> it does change the error seen by those early callers from -ENOENT to
> >> -EBUSY, so there is a risk of a regression if any caller care about
> >> the exact error value.
> > 
> > I'd see this as a welcomed evolution as it tells us more: we're saying
> > "it's coming, try again" or whatever.
> 
> Indeed, and I don't think it's the end of the world if somebody notices
> some change due to that, because we'd learn more about where those early
> request_module() calls come from.

But since it would seem none have been reported yet, it is an even
better situation.

> > A debug option to allow us to get a full warning trace in the -EBUSY
> > case on early init would be nice to have.
> 
> As noted above, that's difficult. We'd need a way to know which other
> task is waiting for us, then print the trace of that guy.
> 
> I don't think anybody is gonna hear this tree falling, so let's not try
> to solve a problem before we know there is one.

That's fair. But let's also recall neither of us expected the above
situation either. But I agree the possible collateral at this point
seems to be small, if any.

 Luis
diff mbox series

Patch

--- a/init/initramfs.c~init-move-usermodehelper_enable-to-populate_rootfs
+++ a/init/initramfs.c
@@ -15,6 +15,7 @@ 
 #include <linux/mm.h>
 #include <linux/namei.h>
 #include <linux/init_syscalls.h>
+#include <linux/umh.h>
 
 static ssize_t __init xwrite(struct file *file, const char *p, size_t count,
 		loff_t *pos)
@@ -727,6 +728,7 @@  static int __init populate_rootfs(void)
 {
 	initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL,
 						 &initramfs_domain);
+	usermodehelper_enable();
 	if (!initramfs_async)
 		wait_for_initramfs();
 	return 0;
--- a/init/main.c~init-move-usermodehelper_enable-to-populate_rootfs
+++ a/init/main.c
@@ -1392,7 +1392,6 @@  static void __init do_basic_setup(void)
 	driver_init();
 	init_irq_proc();
 	do_ctors();
-	usermodehelper_enable();
 	do_initcalls();
 }
 
--- a/init/noinitramfs.c~init-move-usermodehelper_enable-to-populate_rootfs
+++ a/init/noinitramfs.c
@@ -10,6 +10,7 @@ 
 #include <linux/kdev_t.h>
 #include <linux/syscalls.h>
 #include <linux/init_syscalls.h>
+#include <linux/umh.h>
 
 /*
  * Create a simple rootfs that is similar to the default initramfs
@@ -18,6 +19,7 @@  static int __init default_rootfs(void)
 {
 	int err;
 
+	usermodehelper_enable();
 	err = init_mkdir("/dev", 0755);
 	if (err < 0)
 		goto out;