Message ID | 20210908030003.WFvhfWzKJ%akpm@linux-foundation.org (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [001/147] mm, slub: don't call flush_all() from slab_debug_trace_open() | expand |
On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote: > From: Rasmus Villemoes <linux@rasmusvillemoes.dk> > Subject: init: move usermodehelper_enable() to populate_rootfs() > > Currently, usermodehelper is enabled right before PID1 starts going > through the initcalls. However, any call of a usermodehelper from a > pure_, core_, postcore_, arch_, subsys_ or fs_ initcall is futile, as > there is no filesystem contents yet. > > Up until commit e7cb072eb988 ("init/initramfs.c: do unpacking > asynchronously"), such calls, whether via some request_module(), a > legacy uevent "/sbin/hotplug" notification or something else, would > just fail silently with (presumably) -ENOENT from > kernel_execve(). However, that commit introduced the > wait_for_initramfs() synchronization hook which must be called from > the usermodehelper exec path right before the kernel_execve, in order > that request_module() et al done from *after* rootfs_initcall() > time (i.e. device_ and late_ initcalls) would continue to find a > populated initramfs as they used to. > > Any call of wait_for_initramfs() done before the unpacking has been > scheduled (i.e. before rootfs_initcall time) must just return > immediately [and let the caller find an empty file system] in order > not to deadlock the machine. I mistakenly thought, and my limited > testing confirmed, that there were no such calls, so I added a > pr_warn_once() in wait_for_initramfs(). It turns out that one can > indeed hit request_module() as well as kobject_uevent_env() during > those early init calls, leading to a user-visible warning in the > kernel log emitted consistently for certain configurations. Further proof that the semantics for init is still loose. Formalizing dependencies on init is something we should strive to. Eventualy with a DAG. The linker-tables work I had done years ago strived to get us there which allows us to get a simple explicit DAG through the linker. Unfortunately that patch set fell through because folks were more interested in questioning the alternative side benefits of linker-tables, but the use-case for helping with init is still valid. If we *do* want to resurrect this folks should let me know. Since the kobject_uevent_env() interest here is for /sbin/hotplug and that crap is deprecated, in practice the relevant calls we'd care about are the request_module() calls. > We could just remove the pr_warn_once(), but I think it's better to > postpone enabling the usermodehelper framework until there is at least > some chance of finding the executable. That is also a little more > efficient in that a lot of work done in umh.c will be elided. I *don't* think we were aware that such request_module() calls were happening before the fs was even ready and failing silently with -ENOENT. As such, although moving the usermodehelper_enable() to right after scheduling populating the rootfs is the right thing, we do loose on the opportunity to learn who were those odd callers before. We could not care... but this is also a missed opportunity in finding those. How important that is, is not clear to me as this was silently failing before... If we wanted to keep a print for the above purpose though, we'd likely want the full stack trace to see who the hell made the call. > However, > it does change the error seen by those early callers from -ENOENT to > -EBUSY, so there is a risk of a regression if any caller care about > the exact error value. I'd see this as a welcomed evolution as it tells us more: we're saying "it's coming, try again" or whatever. A debug option to allow us to get a full warning trace in the -EBUSY case on early init would be nice to have. Otherwise: Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Luis > Link: https://lkml.kernel.org/r/20210728134638.329060-1-linux@rasmusvillemoes.dk > Fixes: e7cb072eb988 ("init/initramfs.c: do unpacking asynchronously") > Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk> > Reported-by: Alexander Egorenkov <egorenar@linux.ibm.com> > Reported-by: Bruno Goncalves <bgoncalv@redhat.com> > Reported-by: Heiner Kallweit <hkallweit1@gmail.com> > Cc: Luis Chamberlain <mcgrof@kernel.org> > Signed-off-by: Andrew Morton <akpm@linux-foundation.org> > --- > > init/initramfs.c | 2 ++ > init/main.c | 1 - > init/noinitramfs.c | 2 ++ > 3 files changed, 4 insertions(+), 1 deletion(-) > > --- a/init/initramfs.c~init-move-usermodehelper_enable-to-populate_rootfs > +++ a/init/initramfs.c > @@ -15,6 +15,7 @@ > #include <linux/mm.h> > #include <linux/namei.h> > #include <linux/init_syscalls.h> > +#include <linux/umh.h> > > static ssize_t __init xwrite(struct file *file, const char *p, size_t count, > loff_t *pos) > @@ -727,6 +728,7 @@ static int __init populate_rootfs(void) > { > initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL, > &initramfs_domain); > + usermodehelper_enable(); > if (!initramfs_async) > wait_for_initramfs(); > return 0; > --- a/init/main.c~init-move-usermodehelper_enable-to-populate_rootfs > +++ a/init/main.c > @@ -1392,7 +1392,6 @@ static void __init do_basic_setup(void) > driver_init(); > init_irq_proc(); > do_ctors(); > - usermodehelper_enable(); > do_initcalls(); > } > > --- a/init/noinitramfs.c~init-move-usermodehelper_enable-to-populate_rootfs > +++ a/init/noinitramfs.c > @@ -10,6 +10,7 @@ > #include <linux/kdev_t.h> > #include <linux/syscalls.h> > #include <linux/init_syscalls.h> > +#include <linux/umh.h> > > /* > * Create a simple rootfs that is similar to the default initramfs > @@ -18,6 +19,7 @@ static int __init default_rootfs(void) > { > int err; > > + usermodehelper_enable(); > err = init_mkdir("/dev", 0755); > if (err < 0) > goto out; > _
On 08/09/2021 17.44, Luis Chamberlain wrote: > On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote: >> From: Rasmus Villemoes <linux@rasmusvillemoes.dk> >> Any call of wait_for_initramfs() done before the unpacking has been >> scheduled (i.e. before rootfs_initcall time) must just return >> immediately [and let the caller find an empty file system] in order >> not to deadlock the machine. I mistakenly thought, and my limited >> testing confirmed, that there were no such calls, so I added a >> pr_warn_once() in wait_for_initramfs(). It turns out that one can >> indeed hit request_module() as well as kobject_uevent_env() during >> those early init calls, leading to a user-visible warning in the >> kernel log emitted consistently for certain configurations. > > Further proof that the semantics for init is still loose. Formalizing > dependencies on init is something we should strive to. Eventualy with a > DAG. The linker-tables work I had done years ago strived to get us > there which allows us to get a simple explicit DAG through the linker. > Unfortunately that patch set fell through because folks were > more interested in questioning the alternative side benefits of > linker-tables, but the use-case for helping with init is still valid. > > If we *do* want to resurrect this folks should let me know. Heh, a while back I actually had some completely unrelated thing where I'd want to make use of the linker tables infrastructure - I remembered reading about it on LWN, and was quite surprised when I learnt that that work had never made it in. I don't quite remember the use case (I think it was for some test module infrastructure). But if you do have time to resurrect those patches, I'd certainly be interested. > Since the kobject_uevent_env() interest here is for /sbin/hotplug and > that crap is deprecated, in practice the relevant calls we'd care about > are the request_module() calls. Yes - the first report I got about that pr_warn_once was indeed fixed by the reporter simply disabling CONFIG_UEVENT_HELPER (https://lore.kernel.org/lkml/9849be80-cfe5-b33e-8224-590a4c451415@gmail.com/). >> We could just remove the pr_warn_once(), but I think it's better to >> postpone enabling the usermodehelper framework until there is at least >> some chance of finding the executable. That is also a little more >> efficient in that a lot of work done in umh.c will be elided. > > I *don't* think we were aware that such request_module() calls were > happening before the fs was even ready and failing silently with > -ENOENT. Probably not, no, otherwise somebody would have noticed. As such, although moving the usermodehelper_enable() > to right after scheduling populating the rootfs is the right thing, > we do loose on the opportunity to learn who were those odd callers > before. We could not care... but this is also a missed opportunity > in finding those. How important that is, is not clear to me as > this was silently failing before... > > If we wanted to keep a print for the above purpose though, we'd likely > want the full stack trace to see who the hell made the call. Well, yes, I have myself fallen into that trap not just once, but at least twice. The first time when I discovered this behaviour on one of the ppc targets I did this work for in the first place (before I came up with the CONFIG_MODPROBE_PATH patch). The second when I asked a reporter to replace the pr_warn_once by WARN_ONCE: https://lore.kernel.org/lkml/4434f245-db3b-c02a-36c4-0111a0dfb78d@rasmusvillemoes.dk/ The problem is that request_module() just fires off some worker thread and then the original calling thread sits back and waits for that worker to return a result. >> However, >> it does change the error seen by those early callers from -ENOENT to >> -EBUSY, so there is a risk of a regression if any caller care about >> the exact error value. > > I'd see this as a welcomed evolution as it tells us more: we're saying > "it's coming, try again" or whatever. Indeed, and I don't think it's the end of the world if somebody notices some change due to that, because we'd learn more about where those early request_module() calls come from. > A debug option to allow us to get a full warning trace in the -EBUSY > case on early init would be nice to have. As noted above, that's difficult. We'd need a way to know which other task is waiting for us, then print the trace of that guy. I don't think anybody is gonna hear this tree falling, so let's not try to solve a problem before we know there is one. Rasmus
I feel there is a general problem with the way infrastructure improvements are dealt with in the kernel – basically it feels like the submitter of new infrastructure is expected to convert existing users or "what we have now works." This is a classic way of building tech debt. Maybe this might be a topic to discuss? On September 10, 2021 1:12:01 AM PDT, Rasmus Villemoes <linux@rasmusvillemoes.dk> wrote: >On 08/09/2021 17.44, Luis Chamberlain wrote: >> On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote: >>> From: Rasmus Villemoes <linux@rasmusvillemoes.dk> >>> Any call of wait_for_initramfs() done before the unpacking has been >>> scheduled (i.e. before rootfs_initcall time) must just return >>> immediately [and let the caller find an empty file system] in order >>> not to deadlock the machine. I mistakenly thought, and my limited >>> testing confirmed, that there were no such calls, so I added a >>> pr_warn_once() in wait_for_initramfs(). It turns out that one can >>> indeed hit request_module() as well as kobject_uevent_env() during >>> those early init calls, leading to a user-visible warning in the >>> kernel log emitted consistently for certain configurations. >> >> Further proof that the semantics for init is still loose. Formalizing >> dependencies on init is something we should strive to. Eventualy with a >> DAG. The linker-tables work I had done years ago strived to get us >> there which allows us to get a simple explicit DAG through the linker. >> Unfortunately that patch set fell through because folks were >> more interested in questioning the alternative side benefits of >> linker-tables, but the use-case for helping with init is still valid. >> >> If we *do* want to resurrect this folks should let me know. > >Heh, a while back I actually had some completely unrelated thing where >I'd want to make use of the linker tables infrastructure - I remembered >reading about it on LWN, and was quite surprised when I learnt that that >work had never made it in. I don't quite remember the use case (I think >it was for some test module infrastructure). But if you do have time to >resurrect those patches, I'd certainly be interested. > >> Since the kobject_uevent_env() interest here is for /sbin/hotplug and >> that crap is deprecated, in practice the relevant calls we'd care about >> are the request_module() calls. > >Yes - the first report I got about that pr_warn_once was indeed fixed by >the reporter simply disabling CONFIG_UEVENT_HELPER >(https://lore.kernel.org/lkml/9849be80-cfe5-b33e-8224-590a4c451415@gmail.com/). > >>> We could just remove the pr_warn_once(), but I think it's better to >>> postpone enabling the usermodehelper framework until there is at least >>> some chance of finding the executable. That is also a little more >>> efficient in that a lot of work done in umh.c will be elided. >> >> I *don't* think we were aware that such request_module() calls were >> happening before the fs was even ready and failing silently with >> -ENOENT. > >Probably not, no, otherwise somebody would have noticed. > >As such, although moving the usermodehelper_enable() >> to right after scheduling populating the rootfs is the right thing, >> we do loose on the opportunity to learn who were those odd callers >> before. We could not care... but this is also a missed opportunity >> in finding those. How important that is, is not clear to me as >> this was silently failing before... >> >> If we wanted to keep a print for the above purpose though, we'd likely >> want the full stack trace to see who the hell made the call. > >Well, yes, I have myself fallen into that trap not just once, but at >least twice. The first time when I discovered this behaviour on one of >the ppc targets I did this work for in the first place (before I came up >with the CONFIG_MODPROBE_PATH patch). The second when I asked a reporter >to replace the pr_warn_once by WARN_ONCE: > >https://lore.kernel.org/lkml/4434f245-db3b-c02a-36c4-0111a0dfb78d@rasmusvillemoes.dk/ > > >The problem is that request_module() just fires off some worker thread >and then the original calling thread sits back and waits for that worker >to return a result. > > >>> However, >>> it does change the error seen by those early callers from -ENOENT to >>> -EBUSY, so there is a risk of a regression if any caller care about >>> the exact error value. >> >> I'd see this as a welcomed evolution as it tells us more: we're saying >> "it's coming, try again" or whatever. > >Indeed, and I don't think it's the end of the world if somebody notices >some change due to that, because we'd learn more about where those early >request_module() calls come from. > >> A debug option to allow us to get a full warning trace in the -EBUSY >> case on early init would be nice to have. > >As noted above, that's difficult. We'd need a way to know which other >task is waiting for us, then print the trace of that guy. > >I don't think anybody is gonna hear this tree falling, so let's not try >to solve a problem before we know there is one. > >Rasmus
On Fri, Sep 10, 2021 at 10:12:01AM +0200, Rasmus Villemoes wrote: > On 08/09/2021 17.44, Luis Chamberlain wrote: > > On Tue, Sep 07, 2021 at 08:00:03PM -0700, Andrew Morton wrote: > >> From: Rasmus Villemoes <linux@rasmusvillemoes.dk> > >> Any call of wait_for_initramfs() done before the unpacking has been > >> scheduled (i.e. before rootfs_initcall time) must just return > >> immediately [and let the caller find an empty file system] in order > >> not to deadlock the machine. I mistakenly thought, and my limited > >> testing confirmed, that there were no such calls, so I added a > >> pr_warn_once() in wait_for_initramfs(). It turns out that one can > >> indeed hit request_module() as well as kobject_uevent_env() during > >> those early init calls, leading to a user-visible warning in the > >> kernel log emitted consistently for certain configurations. > > > > Further proof that the semantics for init is still loose. Formalizing > > dependencies on init is something we should strive to. Eventualy with a > > DAG. The linker-tables work I had done years ago strived to get us > > there which allows us to get a simple explicit DAG through the linker. > > Unfortunately that patch set fell through because folks were > > more interested in questioning the alternative side benefits of > > linker-tables, but the use-case for helping with init is still valid. > > > > If we *do* want to resurrect this folks should let me know. > > Heh, a while back I actually had some completely unrelated thing where > I'd want to make use of the linker tables infrastructure - I remembered > reading about it on LWN, and was quite surprised when I learnt that that > work had never made it in. I don't quite remember the use case (I think > it was for some test module infrastructure). But if you do have time to > resurrect those patches, I'd certainly be interested. OK I might. > > Since the kobject_uevent_env() interest here is for /sbin/hotplug and > > that crap is deprecated, in practice the relevant calls we'd care about > > are the request_module() calls. > > Yes - the first report I got about that pr_warn_once was indeed fixed by > the reporter simply disabling CONFIG_UEVENT_HELPER > (https://lore.kernel.org/lkml/9849be80-cfe5-b33e-8224-590a4c451415@gmail.com/). Ah I see. > >> We could just remove the pr_warn_once(), but I think it's better to > >> postpone enabling the usermodehelper framework until there is at least > >> some chance of finding the executable. That is also a little more > >> efficient in that a lot of work done in umh.c will be elided. > > > > I *don't* think we were aware that such request_module() calls were > > happening before the fs was even ready and failing silently with > > -ENOENT. > > Probably not, no, otherwise somebody would have noticed. OK your commit log was not clear on this, it seemed to suggest this as a possibility or that such a case existed. That also means the impact of your change is less. > >> However, > >> it does change the error seen by those early callers from -ENOENT to > >> -EBUSY, so there is a risk of a regression if any caller care about > >> the exact error value. > > > > I'd see this as a welcomed evolution as it tells us more: we're saying > > "it's coming, try again" or whatever. > > Indeed, and I don't think it's the end of the world if somebody notices > some change due to that, because we'd learn more about where those early > request_module() calls come from. But since it would seem none have been reported yet, it is an even better situation. > > A debug option to allow us to get a full warning trace in the -EBUSY > > case on early init would be nice to have. > > As noted above, that's difficult. We'd need a way to know which other > task is waiting for us, then print the trace of that guy. > > I don't think anybody is gonna hear this tree falling, so let's not try > to solve a problem before we know there is one. That's fair. But let's also recall neither of us expected the above situation either. But I agree the possible collateral at this point seems to be small, if any. Luis
--- a/init/initramfs.c~init-move-usermodehelper_enable-to-populate_rootfs +++ a/init/initramfs.c @@ -15,6 +15,7 @@ #include <linux/mm.h> #include <linux/namei.h> #include <linux/init_syscalls.h> +#include <linux/umh.h> static ssize_t __init xwrite(struct file *file, const char *p, size_t count, loff_t *pos) @@ -727,6 +728,7 @@ static int __init populate_rootfs(void) { initramfs_cookie = async_schedule_domain(do_populate_rootfs, NULL, &initramfs_domain); + usermodehelper_enable(); if (!initramfs_async) wait_for_initramfs(); return 0; --- a/init/main.c~init-move-usermodehelper_enable-to-populate_rootfs +++ a/init/main.c @@ -1392,7 +1392,6 @@ static void __init do_basic_setup(void) driver_init(); init_irq_proc(); do_ctors(); - usermodehelper_enable(); do_initcalls(); } --- a/init/noinitramfs.c~init-move-usermodehelper_enable-to-populate_rootfs +++ a/init/noinitramfs.c @@ -10,6 +10,7 @@ #include <linux/kdev_t.h> #include <linux/syscalls.h> #include <linux/init_syscalls.h> +#include <linux/umh.h> /* * Create a simple rootfs that is similar to the default initramfs @@ -18,6 +19,7 @@ static int __init default_rootfs(void) { int err; + usermodehelper_enable(); err = init_mkdir("/dev", 0755); if (err < 0) goto out;