diff mbox

[RFC] exec: Avoid recursive modprobe for binary format handlers

Message ID 20170802232331.GO18884@wotan.suse.de (mailing list archive)
State New, archived
Headers show

Commit Message

Luis Chamberlain Aug. 2, 2017, 11:23 p.m. UTC
On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> >> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> >> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> >> no longer caught, it just ends up waiting indefinitely for the kmod_wq
> >> wait queue. Hence the kernel appears to hang silently when starting
> >> userspace.
> >
> > Indeed, the recursive issue were no longer expected to exist.
> 
> Errr, yeah, recursive binfmt loads can still happen.
> 
> > The *old* implementation would also prevent a set of binaries to daisy chain
> > a set of 50 different binaries which require different binfmt loaders. The
> > current implementation enables this and we'd just wait. There's a bound to
> > the number of binfmd loaders though, so this would be bounded. If however
> > a 2nd loader loaded the first binary we'd run into the same issue I think.
> >
> > If we can't think of a good way to resolve this we'll just have to revert
> > 6d7964a722af for now.
> 
> The weird but "normal" recursive case is usually a script calling a
> script calling a misc format. Getting a chain of modprobes running,
> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
> it'd be better for the request_module() to fail.

In that case how about we just have each waiter only wait max X seconds,
if the number of concurrent ongoing modprobe calls hasn't reduced by
a single digit in X seconds we give up on request_module() for the
module and clearly indicate what happened.

Matt, can you test?

Note I've used wait_event_killable_timeout() to only accept SIGKILL
for now. I've seen issues wit SIGCHILD and at modprobe this could
even be a bigger issue, so this would restrict the signals received
*only* to SIGKILL.

It would be good to come up with a simple test case for this in
tools/testing/selftests/kmod/kmod.sh

  Luis

Comments

Kees Cook Aug. 4, 2017, 12:02 a.m. UTC | #1
On Wed, Aug 2, 2017 at 4:23 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
>> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>> > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
>> >> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
>> >> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
>> >> no longer caught, it just ends up waiting indefinitely for the kmod_wq
>> >> wait queue. Hence the kernel appears to hang silently when starting
>> >> userspace.
>> >
>> > Indeed, the recursive issue were no longer expected to exist.
>>
>> Errr, yeah, recursive binfmt loads can still happen.
>>
>> > The *old* implementation would also prevent a set of binaries to daisy chain
>> > a set of 50 different binaries which require different binfmt loaders. The
>> > current implementation enables this and we'd just wait. There's a bound to
>> > the number of binfmd loaders though, so this would be bounded. If however
>> > a 2nd loader loaded the first binary we'd run into the same issue I think.
>> >
>> > If we can't think of a good way to resolve this we'll just have to revert
>> > 6d7964a722af for now.
>>
>> The weird but "normal" recursive case is usually a script calling a
>> script calling a misc format. Getting a chain of modprobes running,
>> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
>> it'd be better for the request_module() to fail.
>
> In that case how about we just have each waiter only wait max X seconds,
> if the number of concurrent ongoing modprobe calls hasn't reduced by
> a single digit in X seconds we give up on request_module() for the
> module and clearly indicate what happened.
>
> Matt, can you test?
>
> Note I've used wait_event_killable_timeout() to only accept SIGKILL
> for now. I've seen issues wit SIGCHILD and at modprobe this could
> even be a bigger issue, so this would restrict the signals received
> *only* to SIGKILL.
>
> It would be good to come up with a simple test case for this in
> tools/testing/selftests/kmod/kmod.sh
>
>   Luis
>
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index 5b74e36c0ca8..dc19880c02f5 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
>         __ret;                                                                  \
>  })
>
> +#define __wait_event_killable_timeout(wq_head, condition, timeout)             \
> +       ___wait_event(wq_head, ___wait_cond_timeout(condition),                 \
> +                     TASK_KILLABLE, 0, timeout,                                \
> +                     __ret = schedule_timeout(__ret))
> +
> +/**
> + * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
> + * @wq_head: the waitqueue to wait on
> + * @condition: a C expression for the event to wait for
> + * @timeout: timeout, in jiffies
> + *
> + * The process is put to sleep (TASK_KILLABLE) until the
> + * @condition evaluates to true or a kill signal is received.
> + * The @condition is checked each time the waitqueue @wq_head is woken up.
> + *
> + * wake_up() has to be called after changing any variable that could
> + * change the result of the wait condition.
> + *
> + * Returns:
> + * 0 if the @condition evaluated to %false after the @timeout elapsed,
> + * 1 if the @condition evaluated to %true after the @timeout elapsed,
> + * the remaining jiffies (at least 1) if the @condition evaluated
> + * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
> + * interrupted by a kill signal.
> + *
> + * Only kill signals interrupt this process.
> + */
> +#define wait_event_killable_timeout(wq_head, condition, timeout)               \
> +({                                                                             \
> +       long __ret = timeout;                                                   \
> +       might_sleep();                                                          \
> +       if (!___wait_cond_timeout(condition))                                   \
> +               __ret = __wait_event_killable_timeout(wq_head,                  \
> +                                               condition, timeout);            \
> +       __ret;                                                                  \
> +})
> +
>
>  #define __wait_event_lock_irq(wq_head, condition, lock, cmd)                   \
>         (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0,     \
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 6d016c5d97c8..1b5f7bada8d2 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
>  static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
>
>  /*
> + * If modprobe can't be called after this time we assume its very likely
> + * your userspace has created a recursive dependency, and we'll have no
> + * option but to fail.
> + */
> +#define MAX_KMOD_TIMEOUT 5

Would this mean slow (swappy) systems could start failing modprobe
just due to access times?

-Kees

> +
> +/*
>         modprobe_path is set via /proc/sys.
>  */
>  char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
> @@ -167,8 +174,18 @@ int __request_module(bool wait, const char *fmt, ...)
>                 pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
>                                     atomic_read(&kmod_concurrent_max),
>                                     MAX_KMOD_CONCURRENT, module_name);
> -               wait_event_interruptible(kmod_wq,
> -                                        atomic_dec_if_positive(&kmod_concurrent_max) >= 0);
> +               ret = wait_event_killable_timeout(kmod_wq,
> +                                                 atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
> +                                                 MAX_KMOD_TIMEOUT * HZ);
> +               if (!ret) {
> +                       pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
> +                                           module_name, atomic_read(&kmod_concurrent_max), MAX_KMOD_TIMEOUT);
> +                       pr_warn_ratelimited("request_module: recursive modprobe call very likely!");
> +                       return -ETIME;
> +               } else if (ret == -ERESTARTSYS) {
> +                       pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
> +                       return ret;
> +               }
>         }
>
>         trace_module_request(module_name, wait, _RET_IP_);
Luis Chamberlain Aug. 4, 2017, 12:10 a.m. UTC | #2
On Thu, Aug 03, 2017 at 05:02:40PM -0700, Kees Cook wrote:
> On Wed, Aug 2, 2017 at 4:23 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
> >> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> >> > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> >> >> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> >> >> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> >> >> no longer caught, it just ends up waiting indefinitely for the kmod_wq
> >> >> wait queue. Hence the kernel appears to hang silently when starting
> >> >> userspace.
> >> >
> >> > Indeed, the recursive issue were no longer expected to exist.
> >>
> >> Errr, yeah, recursive binfmt loads can still happen.
> >>
> >> > The *old* implementation would also prevent a set of binaries to daisy chain
> >> > a set of 50 different binaries which require different binfmt loaders. The
> >> > current implementation enables this and we'd just wait. There's a bound to
> >> > the number of binfmd loaders though, so this would be bounded. If however
> >> > a 2nd loader loaded the first binary we'd run into the same issue I think.
> >> >
> >> > If we can't think of a good way to resolve this we'll just have to revert
> >> > 6d7964a722af for now.
> >>
> >> The weird but "normal" recursive case is usually a script calling a
> >> script calling a misc format. Getting a chain of modprobes running,
> >> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
> >> it'd be better for the request_module() to fail.
> >
> > In that case how about we just have each waiter only wait max X seconds,
> > if the number of concurrent ongoing modprobe calls hasn't reduced by
> > a single digit in X seconds we give up on request_module() for the
> > module and clearly indicate what happened.
> >
> > Matt, can you test?
> >
> > Note I've used wait_event_killable_timeout() to only accept SIGKILL
> > for now. I've seen issues wit SIGCHILD and at modprobe this could
> > even be a bigger issue, so this would restrict the signals received
> > *only* to SIGKILL.
> >
> > It would be good to come up with a simple test case for this in
> > tools/testing/selftests/kmod/kmod.sh
> >
> >   Luis
> >
> > diff --git a/include/linux/wait.h b/include/linux/wait.h
> > index 5b74e36c0ca8..dc19880c02f5 100644
> > --- a/include/linux/wait.h
> > +++ b/include/linux/wait.h
> > @@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
> >         __ret;                                                                  \
> >  })
> >
> > +#define __wait_event_killable_timeout(wq_head, condition, timeout)             \
> > +       ___wait_event(wq_head, ___wait_cond_timeout(condition),                 \
> > +                     TASK_KILLABLE, 0, timeout,                                \
> > +                     __ret = schedule_timeout(__ret))
> > +
> > +/**
> > + * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
> > + * @wq_head: the waitqueue to wait on
> > + * @condition: a C expression for the event to wait for
> > + * @timeout: timeout, in jiffies
> > + *
> > + * The process is put to sleep (TASK_KILLABLE) until the
> > + * @condition evaluates to true or a kill signal is received.
> > + * The @condition is checked each time the waitqueue @wq_head is woken up.
> > + *
> > + * wake_up() has to be called after changing any variable that could
> > + * change the result of the wait condition.
> > + *
> > + * Returns:
> > + * 0 if the @condition evaluated to %false after the @timeout elapsed,
> > + * 1 if the @condition evaluated to %true after the @timeout elapsed,
> > + * the remaining jiffies (at least 1) if the @condition evaluated
> > + * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
> > + * interrupted by a kill signal.
> > + *
> > + * Only kill signals interrupt this process.
> > + */
> > +#define wait_event_killable_timeout(wq_head, condition, timeout)               \
> > +({                                                                             \
> > +       long __ret = timeout;                                                   \
> > +       might_sleep();                                                          \
> > +       if (!___wait_cond_timeout(condition))                                   \
> > +               __ret = __wait_event_killable_timeout(wq_head,                  \
> > +                                               condition, timeout);            \
> > +       __ret;                                                                  \
> > +})
> > +
> >
> >  #define __wait_event_lock_irq(wq_head, condition, lock, cmd)                   \
> >         (void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0,     \
> > diff --git a/kernel/kmod.c b/kernel/kmod.c
> > index 6d016c5d97c8..1b5f7bada8d2 100644
> > --- a/kernel/kmod.c
> > +++ b/kernel/kmod.c
> > @@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
> >  static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
> >
> >  /*
> > + * If modprobe can't be called after this time we assume its very likely
> > + * your userspace has created a recursive dependency, and we'll have no
> > + * option but to fail.
> > + */
> > +#define MAX_KMOD_TIMEOUT 5
> 
> Would this mean slow (swappy) systems could start failing modprobe
> just due to access times?

No, this is pre-launch and depends on *all* running kmod threads.
The wait would *only* fail if we already hit the limit of 50 concurrent
kmod threads running at the same time and they *all* don't finish for 5 seconds
straight. If at any point in time any modprobe call finishes that would clear
this and the waiting modprobe waiting would chug on. So this would only happen
if we were maxed out busy without any return for X seconds straight with all
kmod threads busy.

The name probably should reflect that better then, MAX_KMOD_ALL_BUSY_TIMEOUT
maybe?

  Luis
Matt Redfearn Aug. 7, 2017, 10:26 a.m. UTC | #3
Hi Luis,


On 03/08/17 00:23, Luis R. Rodriguez wrote:
> On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
>> On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
>>> On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
>>>> Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
>>>> merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
>>>> no longer caught, it just ends up waiting indefinitely for the kmod_wq
>>>> wait queue. Hence the kernel appears to hang silently when starting
>>>> userspace.
>>> Indeed, the recursive issue were no longer expected to exist.
>> Errr, yeah, recursive binfmt loads can still happen.
>>
>>> The *old* implementation would also prevent a set of binaries to daisy chain
>>> a set of 50 different binaries which require different binfmt loaders. The
>>> current implementation enables this and we'd just wait. There's a bound to
>>> the number of binfmd loaders though, so this would be bounded. If however
>>> a 2nd loader loaded the first binary we'd run into the same issue I think.
>>>
>>> If we can't think of a good way to resolve this we'll just have to revert
>>> 6d7964a722af for now.
>> The weird but "normal" recursive case is usually a script calling a
>> script calling a misc format. Getting a chain of modprobes running,
>> though, seems unlikely. I *think* Matt's patch is okay, but I agree,
>> it'd be better for the request_module() to fail.
> In that case how about we just have each waiter only wait max X seconds,
> if the number of concurrent ongoing modprobe calls hasn't reduced by
> a single digit in X seconds we give up on request_module() for the
> module and clearly indicate what happened.
>
> Matt, can you test?

Sure - I've tested patch this on Cavium Octeon under the same conditions 
as before (64 bit kernel with 32bit userspace & no binfmt handler builtin).

The failing modprobe is now caught and rather than silence we get the 
expected kernel panic, albeit all the warnings look quite noisy.

VFS: Mounted root (ext3 filesystem) readonly on device 8:5.
devtmpfs: mounted
Freeing unused kernel memory: 372K
This architecture does not have kernel memory protection.
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), 
for module binfmt-4c46, throttling...
request_module: modprobe binfmt-4c46 cannot be processed, kmod busy with 
0 threads for more than 5 seconds now
request_module: recursive modprobe call very likely!
Starting init: /sbin/init exists but couldn't execute it (error -8)
request_module: kmod_concurrent_max (0) close to 0 (max_modprobes: 50), 
for module binfmt-4c46, throttling...
request_module: modprobe binfmt-4c46 cannot be processed, kmod busy with 
0 threads for more than 5 seconds now
request_module: recursive modprobe call very likely!
Starting init: /bin/sh exists but couldn't execute it (error -8)
Kernel panic - not syncing: No working init found.  Try passing init= 
option to kernel. See Linux Documentation/admin-guide/init.rst for guidance.
---[ end Kernel panic - not syncing: No working init found.  Try passing 
init= option to kernel. See Linux Documentation/admin-guide/init.rst for 
guidance.


In any case, this is better than the 4.13-rc1 behavior, so

Tested-by: Matt Redfearn <matt.redfearn@imgetc.com>

Thanks,
Matt

>
> Note I've used wait_event_killable_timeout() to only accept SIGKILL
> for now. I've seen issues wit SIGCHILD and at modprobe this could
> even be a bigger issue, so this would restrict the signals received
> *only* to SIGKILL.
>
> It would be good to come up with a simple test case for this in
> tools/testing/selftests/kmod/kmod.sh
>
>    Luis
>
> diff --git a/include/linux/wait.h b/include/linux/wait.h
> index 5b74e36c0ca8..dc19880c02f5 100644
> --- a/include/linux/wait.h
> +++ b/include/linux/wait.h
> @@ -757,6 +757,43 @@ extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
>   	__ret;									\
>   })
>   
> +#define __wait_event_killable_timeout(wq_head, condition, timeout)		\
> +	___wait_event(wq_head, ___wait_cond_timeout(condition),			\
> +		      TASK_KILLABLE, 0, timeout,				\
> +		      __ret = schedule_timeout(__ret))
> +
> +/**
> + * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
> + * @wq_head: the waitqueue to wait on
> + * @condition: a C expression for the event to wait for
> + * @timeout: timeout, in jiffies
> + *
> + * The process is put to sleep (TASK_KILLABLE) until the
> + * @condition evaluates to true or a kill signal is received.
> + * The @condition is checked each time the waitqueue @wq_head is woken up.
> + *
> + * wake_up() has to be called after changing any variable that could
> + * change the result of the wait condition.
> + *
> + * Returns:
> + * 0 if the @condition evaluated to %false after the @timeout elapsed,
> + * 1 if the @condition evaluated to %true after the @timeout elapsed,
> + * the remaining jiffies (at least 1) if the @condition evaluated
> + * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
> + * interrupted by a kill signal.
> + *
> + * Only kill signals interrupt this process.
> + */
> +#define wait_event_killable_timeout(wq_head, condition, timeout)		\
> +({										\
> +	long __ret = timeout;							\
> +	might_sleep();								\
> +	if (!___wait_cond_timeout(condition))					\
> +		__ret = __wait_event_killable_timeout(wq_head,			\
> +						condition, timeout);		\
> +	__ret;									\
> +})
> +
>   
>   #define __wait_event_lock_irq(wq_head, condition, lock, cmd)			\
>   	(void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0,	\
> diff --git a/kernel/kmod.c b/kernel/kmod.c
> index 6d016c5d97c8..1b5f7bada8d2 100644
> --- a/kernel/kmod.c
> +++ b/kernel/kmod.c
> @@ -71,6 +71,13 @@ static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
>   static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
>   
>   /*
> + * If modprobe can't be called after this time we assume its very likely
> + * your userspace has created a recursive dependency, and we'll have no
> + * option but to fail.
> + */
> +#define MAX_KMOD_TIMEOUT 5
> +
> +/*
>   	modprobe_path is set via /proc/sys.
>   */
>   char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
> @@ -167,8 +174,18 @@ int __request_module(bool wait, const char *fmt, ...)
>   		pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
>   				    atomic_read(&kmod_concurrent_max),
>   				    MAX_KMOD_CONCURRENT, module_name);
> -		wait_event_interruptible(kmod_wq,
> -					 atomic_dec_if_positive(&kmod_concurrent_max) >= 0);
> +		ret = wait_event_killable_timeout(kmod_wq,
> +						  atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
> +						  MAX_KMOD_TIMEOUT * HZ);
> +		if (!ret) {
> +			pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
> +					    module_name, atomic_read(&kmod_concurrent_max), MAX_KMOD_TIMEOUT);
> +			pr_warn_ratelimited("request_module: recursive modprobe call very likely!");
> +			return -ETIME;
> +		} else if (ret == -ERESTARTSYS) {
> +			pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
> +			return ret;
> +		}
>   	}
>   
>   	trace_module_request(module_name, wait, _RET_IP_);
Luis Chamberlain Aug. 8, 2017, 7:23 p.m. UTC | #4
On Mon, Aug 07, 2017 at 11:26:09AM +0100, Matt Redfearn wrote:
> Hi Luis,
> On 03/08/17 00:23, Luis R. Rodriguez wrote:
> > On Tue, Aug 01, 2017 at 07:28:20PM -0700, Kees Cook wrote:
> > > On Tue, Aug 1, 2017 at 5:12 PM, Luis R. Rodriguez <mcgrof@kernel.org> wrote:
> > > > On Fri, Jul 21, 2017 at 03:05:20PM +0100, Matt Redfearn wrote:
> > > > > Commit 6d7964a722af ("kmod: throttle kmod thread limit") which was
> > > > > merged in v4.13-rc1 broke this behaviour since the recursive modprobe is
> > > > > no longer caught, it just ends up waiting indefinitely for the kmod_wq
> > > > > wait queue. Hence the kernel appears to hang silently when starting
> > > > > userspace.
> > > > Indeed, the recursive issue were no longer expected to exist.
> > > Errr, yeah, recursive binfmt loads can still happen.
> > > 
> > > > The *old* implementation would also prevent a set of binaries to daisy chain
> > > > a set of 50 different binaries which require different binfmt loaders. The
> > > > current implementation enables this and we'd just wait. There's a bound to
> > > > the number of binfmd loaders though, so this would be bounded. If however
> > > > a 2nd loader loaded the first binary we'd run into the same issue I think.
> > > > 
> > > > If we can't think of a good way to resolve this we'll just have to revert
> > > > 6d7964a722af for now.
> > > The weird but "normal" recursive case is usually a script calling a
> > > script calling a misc format. Getting a chain of modprobes running,
> > > though, seems unlikely. I *think* Matt's patch is okay, but I agree,
> > > it'd be better for the request_module() to fail.
> > In that case how about we just have each waiter only wait max X seconds,
> > if the number of concurrent ongoing modprobe calls hasn't reduced by
> > a single digit in X seconds we give up on request_module() for the
> > module and clearly indicate what happened.
> > 
> > Matt, can you test?
> 
> Sure - I've tested patch this on Cavium Octeon under the same conditions as
> before (64 bit kernel with 32bit userspace & no binfmt handler builtin).
> 
> The failing modprobe is now caught and rather than silence we get the
> expected kernel panic, albeit all the warnings look quite noisy.

Thanks for testing! I agree its all too verbose, I'll clean that up and
resubmit with a cleaner log.

I tried to devise a test case for this but for the life of me I could not. If
you happen to come up with something please feel free to submit one for
lib/test_kmod.c!

> In any case, this is better than the 4.13-rc1 behavior, so
> 
> Tested-by: Matt Redfearn <matt.redfearn@imgetc.com>

 Luis
diff mbox

Patch

diff --git a/include/linux/wait.h b/include/linux/wait.h
index 5b74e36c0ca8..dc19880c02f5 100644
--- a/include/linux/wait.h
+++ b/include/linux/wait.h
@@ -757,6 +757,43 @@  extern int do_wait_intr_irq(wait_queue_head_t *, wait_queue_entry_t *);
 	__ret;									\
 })
 
+#define __wait_event_killable_timeout(wq_head, condition, timeout)		\
+	___wait_event(wq_head, ___wait_cond_timeout(condition),			\
+		      TASK_KILLABLE, 0, timeout,				\
+		      __ret = schedule_timeout(__ret))
+
+/**
+ * wait_event_killable_timeout - sleep until a condition gets true or a timeout elapses
+ * @wq_head: the waitqueue to wait on
+ * @condition: a C expression for the event to wait for
+ * @timeout: timeout, in jiffies
+ *
+ * The process is put to sleep (TASK_KILLABLE) until the
+ * @condition evaluates to true or a kill signal is received.
+ * The @condition is checked each time the waitqueue @wq_head is woken up.
+ *
+ * wake_up() has to be called after changing any variable that could
+ * change the result of the wait condition.
+ *
+ * Returns:
+ * 0 if the @condition evaluated to %false after the @timeout elapsed,
+ * 1 if the @condition evaluated to %true after the @timeout elapsed,
+ * the remaining jiffies (at least 1) if the @condition evaluated
+ * to %true before the @timeout elapsed, or -%ERESTARTSYS if it was
+ * interrupted by a kill signal.
+ *
+ * Only kill signals interrupt this process.
+ */
+#define wait_event_killable_timeout(wq_head, condition, timeout)		\
+({										\
+	long __ret = timeout;							\
+	might_sleep();								\
+	if (!___wait_cond_timeout(condition))					\
+		__ret = __wait_event_killable_timeout(wq_head,			\
+						condition, timeout);		\
+	__ret;									\
+})
+
 
 #define __wait_event_lock_irq(wq_head, condition, lock, cmd)			\
 	(void)___wait_event(wq_head, condition, TASK_UNINTERRUPTIBLE, 0, 0,	\
diff --git a/kernel/kmod.c b/kernel/kmod.c
index 6d016c5d97c8..1b5f7bada8d2 100644
--- a/kernel/kmod.c
+++ b/kernel/kmod.c
@@ -71,6 +71,13 @@  static atomic_t kmod_concurrent_max = ATOMIC_INIT(MAX_KMOD_CONCURRENT);
 static DECLARE_WAIT_QUEUE_HEAD(kmod_wq);
 
 /*
+ * If modprobe can't be called after this time we assume its very likely
+ * your userspace has created a recursive dependency, and we'll have no
+ * option but to fail.
+ */
+#define MAX_KMOD_TIMEOUT 5
+
+/*
 	modprobe_path is set via /proc/sys.
 */
 char modprobe_path[KMOD_PATH_LEN] = "/sbin/modprobe";
@@ -167,8 +174,18 @@  int __request_module(bool wait, const char *fmt, ...)
 		pr_warn_ratelimited("request_module: kmod_concurrent_max (%u) close to 0 (max_modprobes: %u), for module %s, throttling...",
 				    atomic_read(&kmod_concurrent_max),
 				    MAX_KMOD_CONCURRENT, module_name);
-		wait_event_interruptible(kmod_wq,
-					 atomic_dec_if_positive(&kmod_concurrent_max) >= 0);
+		ret = wait_event_killable_timeout(kmod_wq,
+						  atomic_dec_if_positive(&kmod_concurrent_max) >= 0,
+						  MAX_KMOD_TIMEOUT * HZ);
+		if (!ret) {
+			pr_warn_ratelimited("request_module: modprobe %s cannot be processed, kmod busy with %d threads for more than %d seconds now",
+					    module_name, atomic_read(&kmod_concurrent_max), MAX_KMOD_TIMEOUT);
+			pr_warn_ratelimited("request_module: recursive modprobe call very likely!");
+			return -ETIME;
+		} else if (ret == -ERESTARTSYS) {
+			pr_warn_ratelimited("request_module: sigkill sent for modprobe %s, giving up", module_name);
+			return ret;
+		}
 	}
 
 	trace_module_request(module_name, wait, _RET_IP_);