diff mbox

signals: Generate warning when flush_signals() is called from non-kthread context

Message ID 20150501193813.GA2812@gmail.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ingo Molnar May 1, 2015, 7:38 p.m. UTC
* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
> <alex.williamson@redhat.com> wrote:
> >
> > - Flush signals on interrupted wait to retain polling interval (Alex Williamson)
> 
> This cannot *possibly* be right. If I read this patch right, you're
> randomly just getting rid of signals. No way in hell is that correct.
> 
> "flush_signals()" is only for kernel threads, where it's a hacky
> alternative to actually handling them (since kernel threads never
> rreturn to user space and cannot really "handle" a signal). But you're
> doing it in the ->remove handler for the device, which can be called
> by arbitrary system processes. This is not a kernel thread thing, as
> far as I can see.
> 
> If you cannot handle signals, you damn well shouldn't be using
> "wait_event_interruptible_timeout()" to begin with. Get rid of the
> "interruptible", since it apparently *isn't* interruptible.
> 
> So I'm not pulling this.
> 
> Now I'm worried that other drivers do insane things like this. I
> wonder if we should add some sanity test to flush_signals() to make
> sure that it can only ever get called from a kernel thread.
> 
> Oleg?

So there are these uses:

  triton:~/tip> git grep -lw flush_signals
  arch/arm/common/bL_switcher.c

Looks safe: used within the bL_switcher_thread() kthread.

  drivers/block/drbd/drbd_main.c
  drivers/block/drbd/drbd_nl.c
  drivers/block/drbd/drbd_receiver.c
  drivers/block/drbd/drbd_worker.c

Couldn't convince myself it's safe, but it appears to be. (Call chains 
are obfuscated in various ways that makes it hard to tell where a 
given function execute.)

  drivers/md/md.c
  drivers/md/raid1.c
  drivers/md/raid5.c

Hm, so I'm not super sure about the flush_signals() in 
raid1.c:make_request() AFAICS we can do direct RAID1 writes in 
raid1_unplug(). That looks unsafe ... I've Cc:-ed Neil.

raid5.c seems safe: raid5_unplug() doesn't create requests directly, 
leaves it all for the mddev kthread.

  drivers/scsi/bnx2fc/bnx2fc_fcoe.c
  drivers/scsi/bnx2fc/bnx2fc_tgt.c
  drivers/scsi/bnx2i/bnx2i_iscsi.c
  drivers/scsi/libiscsi.c
  drivers/target/iscsi/iscsi_target_login.c
  drivers/target/iscsi/iscsi_target_nego.c

Couldn't fully check it due to excessive complexity, but seemed safe.

  drivers/staging/rtl8188eu/core/rtw_cmd.c
  drivers/staging/rtl8712/osdep_service.h

Looks safe: done in RTW_CMD_THREAD and 'padapter' kthreads.

  drivers/w1/w1_family.c
  drivers/w1/w1_int.c

Looks unsafe: called from various module exit handlers in:

  drivers/w1/slaves/w1_bq27000.c
  drivers/w1/slaves/w1_ds2406.c
  drivers/w1/slaves/w1_ds2408.c
  drivers/w1/slaves/w1_ds2413.c
  drivers/w1/slaves/w1_ds2423.c
  drivers/w1/slaves/w1_ds2431.c
  drivers/w1/slaves/w1_ds2433.c
  drivers/w1/slaves/w1_ds2760.c
  drivers/w1/slaves/w1_ds2780.c
  drivers/w1/slaves/w1_ds2781.c
  drivers/w1/slaves/w1_ds28e04.c
  drivers/w1/slaves/w1_smem.c
  drivers/w1/slaves/w1_therm.c

which would be executed in rmmod context, losing signals.
Cc:-ed Evgeniy.

  fs/lockd/svc.c
  fs/nfs/callback.c
  fs/nfsd/nfssvc.c

Looks safe: lockd, nfsd plus nfsv4.%u-svc kthreads.

  kernel/locking/rtmutex-tester.c

Looks safe: used within a kthread.

  include/linux/sched.h
  kernel/signal.c

Both safe ;-)

I also found a __flush_signals() use in:

  security/selinux/hooks.c

Now that's selinux_bprm_committed_creds(), apparently executed on 
exec(). Also does stuff like:

                memset(&itimer, 0, sizeof itimer);
                for (i = 0; i < 3; i++)
                        do_setitimer(i, &itimer, NULL);

and unblocks signals as well:

                        sigemptyset(&current->blocked);

but this appears to be kind of legit: the task failed to get the 
required permissions, and guns go off.

In any case, it seems to me that the patch below would be justified? 
Totally untested and so. __flush_signals() not affected.

Thanks,

	Ingo

---
 kernel/signal.c | 4 ++++
 1 file changed, 4 insertions(+)

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

NeilBrown May 2, 2015, 8:30 a.m. UTC | #1
On Fri, 1 May 2015 21:38:13 +0200 Ingo Molnar <mingo@kernel.org> wrote:


>   drivers/md/md.c
>   drivers/md/raid1.c
>   drivers/md/raid5.c
> 
> Hm, so I'm not super sure about the flush_signals() in 
> raid1.c:make_request() AFAICS we can do direct RAID1 writes in 
> raid1_unplug(). That looks unsafe ... I've Cc:-ed Neil.
> 
> raid5.c seems safe: raid5_unplug() doesn't create requests directly, 
> leaves it all for the mddev kthread.

Both raid1.c and raid5.c call flush_signals() in the make_request function
(in unusual circumstances).
I wanted a uninterruptible wait which didn't add to load-average.  That
approach works in kernel threads...

All the calls in md.c are in a kernel thread so safe, but I'd rather have an
explicit "uninterruptible, but no load-average" wait....

I should  probably change the make_request code to queue the request
somewhere rather than wait for it to be serviceable.

I'll look into that...



> In any case, it seems to me that the patch below would be justified? 
> Totally untested and so. __flush_signals() not affected.

Fine by me - does seem justified.

Thanks,
NeilBrown


> 
> Thanks,
> 
> 	Ingo
> 
> ---
>  kernel/signal.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/kernel/signal.c b/kernel/signal.c
> index d51c5ddd855c..100e30afe5d2 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
>  {
>  	unsigned long flags;
>  
> +	/* Only kthreads are allowed to destroy signals: */
> +	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
> +		return;
> +
>  	spin_lock_irqsave(&t->sighand->siglock, flags);
>  	__flush_signals(t);
>  	spin_unlock_irqrestore(&t->sighand->siglock, flags);
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
Evgeniy Polyakov May 2, 2015, 11:56 a.m. UTC | #2
Hi Ingo

01.05.2015, 22:38, "Ingo Molnar" <mingo@kernel.org>:
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>>  On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
>>  <alex.williamson@redhat.com> wrote:
>>>  - Flush signals on interrupted wait to retain polling interval (Alex Williamson)
>>  This cannot *possibly* be right. If I read this patch right, you're
>>  randomly just getting rid of signals. No way in hell is that correct.
>>
>>  "flush_signals()" is only for kernel threads, where it's a hacky
>>  alternative to actually handling them (since kernel threads never
>>  rreturn to user space and cannot really "handle" a signal). But you're
>>  doing it in the ->remove handler for the device, which can be called
>>  by arbitrary system processes. This is not a kernel thread thing, as
>>  far as I can see.
>>
>>  If you cannot handle signals, you damn well shouldn't be using
>>  "wait_event_interruptible_timeout()" to begin with. Get rid of the
>>  "interruptible", since it apparently *isn't* interruptible.
>>
>>  So I'm not pulling this.
>>
>>  Now I'm worried that other drivers do insane things like this. I
>>  wonder if we should add some sanity test to flush_signals() to make
>>  sure that it can only ever get called from a kernel thread.

> Looks unsafe: called from various module exit handlers in:
>
>   drivers/w1/slaves/w1_bq27000.c
>   drivers/w1/slaves/w1_ds2406.c
>   drivers/w1/slaves/w1_ds2408.c
>   drivers/w1/slaves/w1_ds2413.c
>   drivers/w1/slaves/w1_ds2423.c
>   drivers/w1/slaves/w1_ds2431.c
>   drivers/w1/slaves/w1_ds2433.c
>   drivers/w1/slaves/w1_ds2760.c
>   drivers/w1/slaves/w1_ds2780.c
>   drivers/w1/slaves/w1_ds2781.c
>   drivers/w1/slaves/w1_ds28e04.c
>   drivers/w1/slaves/w1_smem.c
>   drivers/w1/slaves/w1_therm.c
>
> which would be executed in rmmod context, losing signals.
> Cc:-ed Evgeniy.

w1 uses a little bit strange refcnt logic, basically every w1 module waits
for all its users to release their w1 resources and doesn't exit until its safe.

To wait for resources to be freed at module exit time it checks its refcnt to drop to zero periodically
and sleeps between the checks for a second. w1 flushes signals between these
checks for no particular reason, it can be safely removed from w1_unregister_family()
and interruptible sleep replaced with the normal one.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Linus Torvalds May 2, 2015, 4:27 p.m. UTC | #3
On Sat, May 2, 2015 at 1:30 AM, NeilBrown <neilb@suse.de> wrote:
>
> All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> explicit "uninterruptible, but no load-average" wait....

Hmm. Our task state is a bitmask anyway, so we could probably just add a

    #define __TASK_NOLOAD              16

(and move the EXIT_xyz defines *away* from the list that is actually
the task state), and teach our load average thing to not count those
kinds of waits. Then you could just use

    TASK_UNINTERRUPTIBLE | __TASK_NOLOAD

to make processes not count towards the load.

Or - probably preferably - we could really clean things up, and make
things much more like the bitmask it *should* be, and have explicit
bits for

 - SLEEPING/STOPPED/EXITING ("why not running?")
 - LOADAVG (accounted towards load)
 - WAKESIG (ie "interruptible")
 - WAKEKILL (this we already have)

and just make the rule be that we use "__TASK_xyz" for the actual
individual bits, and "TASK_xyz" for the helper combinations. So then
we'd have

   #define TASK_UNINTERRUPTIBLE (__TASK_SLEEPING | __TASK_LOADAVG)
   #define TASK_INTERRUPTIBLE (__TASK_SLEEPING | __TASK_WAKESIG)
   #define TASK_KILLABLE (__TASK_SLEEPING | __TASK_WAKEKILL)
   #define TASK_NOLOADAVG (__TASK_SLEEPING)

which is almost certainly how this *should* have been done, but isn't,
because of historical use.

Cleaning up like that *should* be fairly simple, but I'd be a bit
nervous about getting all the state comparisons right (we have an
unholy mix of "check this bit" and "check this whole state", and we'd
need to make sure we get those cases all right).

Ingo, what do you think? This is mostly a scheduler interface issue..

                       Linus
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Richard Weinberger May 2, 2015, 4:33 p.m. UTC | #4
On Fri, May 1, 2015 at 9:38 PM, Ingo Molnar <mingo@kernel.org> wrote:
> diff --git a/kernel/signal.c b/kernel/signal.c
> index d51c5ddd855c..100e30afe5d2 100644
> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
>  {
>         unsigned long flags;
>
> +       /* Only kthreads are allowed to destroy signals: */
> +       if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))

Shouldn't this be t->flags?
Oleg Nesterov May 3, 2015, 5:34 p.m. UTC | #5
On 05/01, Ingo Molnar wrote:
>
> * Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
> > On Fri, May 1, 2015 at 10:40 AM, Alex Williamson
> > <alex.williamson@redhat.com> wrote:
> > >
> > > - Flush signals on interrupted wait to retain polling interval (Alex Williamson)
> >
> > This cannot *possibly* be right. If I read this patch right, you're
> > randomly just getting rid of signals. No way in hell is that correct.
> >
> > "flush_signals()" is only for kernel threads, where it's a hacky
> > alternative to actually handling them (since kernel threads never
> > rreturn to user space and cannot really "handle" a signal). But you're
> > doing it in the ->remove handler for the device, which can be called
> > by arbitrary system processes. This is not a kernel thread thing, as
> > far as I can see.
> >
> > If you cannot handle signals, you damn well shouldn't be using
> > "wait_event_interruptible_timeout()" to begin with. Get rid of the
> > "interruptible", since it apparently *isn't* interruptible.
> >
> > So I'm not pulling this.
> >
> > Now I'm worried that other drivers do insane things like this. I
> > wonder if we should add some sanity test to flush_signals() to make
> > sure that it can only ever get called from a kernel thread.
> >
> > Oleg?
>
> So there are these uses:
>
>   triton:~/tip> git grep -lw flush_signals
>   arch/arm/common/bL_switcher.c
>
>
> Looks safe: used within the bL_switcher_thread() kthread.

safe but wrong at first glance... I mean, it is pointless. Looks like,
bL_switcher_thread() wrongly thinks that wait_event_interruptible() can
lead to signal_pending().

>   drivers/block/drbd/drbd_main.c
>   drivers/block/drbd/drbd_nl.c
>   drivers/block/drbd/drbd_receiver.c
>   drivers/block/drbd/drbd_worker.c


Oh, I didn't know this helper is abused so much. I'll try to recheck
tomorrow, but it seems that it should die...


>   drivers/md/md.c

I can't understand this code... The comment says:

	/* We need to wait INTERRUPTIBLE so that
	 * we don't add to the load-average.
	 * That means we need to be sure no signals are
	 * pending
	 */
	if (signal_pending(current))
		flush_signals(current);

and this is wrong. However, signal_pending() can be true because of
allow_signal(SIGKILL) above. But why it does allow_signal() ?


>   fs/lockd/svc.c
>   fs/nfs/callback.c
>   fs/nfsd/nfssvc.c
>
> Looks safe: lockd, nfsd plus nfsv4.%u-svc kthreads.

Yes, this case looks fine. But perhaps it makes sense to add another
helper... Or not, I'll try to think more.

> I also found a __flush_signals() use in:
>
>   security/selinux/hooks.c
>
> Now that's selinux_bprm_committed_creds(), apparently executed on
> exec(). Also does stuff like:
>
>                 memset(&itimer, 0, sizeof itimer);
>                 for (i = 0; i < 3; i++)
>                         do_setitimer(i, &itimer, NULL);
>
> and unblocks signals as well:
>
>                         sigemptyset(&current->blocked);
>
> but this appears to be kind of legit: the task failed to get the
> required permissions, and guns go off.

and I simply can't understand this code... but I feel that it can't
be correct ;) Will try to re-read tomorrow.


> In any case, it seems to me that the patch below would be justified?
> Totally untested and so. __flush_signals() not affected.

Yes, agreed, it can't be right if the caller is not kthread.


> --- a/kernel/signal.c
> +++ b/kernel/signal.c
> @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
>  {
>  	unsigned long flags;
>
> +	/* Only kthreads are allowed to destroy signals: */
> +	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
> +		return;
> +

But I am not sure this can't make some buggy driver even more buggy.
Just suppose it does something

	do {
		if (signal_pending())
			flush_signals();
	} while (wait_event_interruptible(...));

and this change will turn this into busy-wait loop.

So perhaps another change which just adds WARN_ON_RATELIMIT() without
"return" will be safer?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov May 4, 2015, 4:45 p.m. UTC | #6
On 05/03, Oleg Nesterov wrote:
>
> On 05/01, Ingo Molnar wrote:
> >
>
> > I also found a __flush_signals() use in:
> >
> >   security/selinux/hooks.c
> >
> and I simply can't understand this code... but I feel that it can't
> be correct ;) Will try to re-read tomorrow.

Yes, this doesn't look right. Lets kill __flush_signals() first, there
are no other users.

And I am not sure it is fine to flush SIGSTOP... do we really want this?

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Oleg Nesterov May 4, 2015, 5:35 p.m. UTC | #7
On 05/02, NeilBrown wrote:
>
> All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> explicit "uninterruptible, but no load-average" wait....

Could you please explain why md_thread() does allow_signal(SIGKILL) ?

I am just curious. It looks as if we want to allow user-space to "call"
thread->run(), and this looks strange.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ingo Molnar May 6, 2015, 10:19 a.m. UTC | #8
* Oleg Nesterov <oleg@redhat.com> wrote:

> > --- a/kernel/signal.c
> > +++ b/kernel/signal.c
> > @@ -427,6 +427,10 @@ void flush_signals(struct task_struct *t)
> >  {
> >  	unsigned long flags;
> >
> > +	/* Only kthreads are allowed to destroy signals: */
> > +	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
> > +		return;
> > +
> 
> But I am not sure this can't make some buggy driver even more buggy.
> Just suppose it does something
> 
> 	do {
> 		if (signal_pending())
> 			flush_signals();
> 	} while (wait_event_interruptible(...));
> 
> and this change will turn this into busy-wait loop.
> 
> So perhaps another change which just adds WARN_ON_RATELIMIT() 
> without "return" will be safer?

Yeah, absolutely.

Thanks,

	Ingo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Peter Zijlstra May 7, 2015, 12:54 p.m. UTC | #9
On Sat, May 02, 2015 at 09:27:49AM -0700, Linus Torvalds wrote:
> On Sat, May 2, 2015 at 1:30 AM, NeilBrown <neilb@suse.de> wrote:
> >
> > All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> > explicit "uninterruptible, but no load-average" wait....
> 
> Hmm. Our task state is a bitmask anyway, so we could probably just add a
> 
>     #define __TASK_NOLOAD              16
> 
> (and move the EXIT_xyz defines *away* from the list that is actually
> the task state), and teach our load average thing to not count those
> kinds of waits. Then you could just use
> 
>     TASK_UNINTERRUPTIBLE | __TASK_NOLOAD
> 
> to make processes not count towards the load.
> 
> Or - probably preferably - we could really clean things up, and make
> things much more like the bitmask it *should* be, and have explicit
> bits for
> 
>  - SLEEPING/STOPPED/EXITING ("why not running?")
>  - LOADAVG (accounted towards load)
>  - WAKESIG (ie "interruptible")
>  - WAKEKILL (this we already have)
> 
> and just make the rule be that we use "__TASK_xyz" for the actual
> individual bits, and "TASK_xyz" for the helper combinations. So then
> we'd have
> 
>    #define TASK_UNINTERRUPTIBLE (__TASK_SLEEPING | __TASK_LOADAVG)
>    #define TASK_INTERRUPTIBLE (__TASK_SLEEPING | __TASK_WAKESIG)
>    #define TASK_KILLABLE (__TASK_SLEEPING | __TASK_WAKEKILL)
>    #define TASK_NOLOADAVG (__TASK_SLEEPING)
> 
> which is almost certainly how this *should* have been done, but isn't,
> because of historical use.
> 
> Cleaning up like that *should* be fairly simple, but I'd be a bit
> nervous about getting all the state comparisons right (we have an
> unholy mix of "check this bit" and "check this whole state", and we'd
> need to make sure we get those cases all right).
> 
> Ingo, what do you think? This is mostly a scheduler interface issue..

Hehe, a little something like this:

  https://lkml.org/lkml/2013/11/12/710

Lemme go clean that up and finish it :-)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jiri Kosina May 7, 2015, 1:33 p.m. UTC | #10
On Mon, 4 May 2015, Oleg Nesterov wrote:

> > All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> > explicit "uninterruptible, but no load-average" wait....
> 
> Could you please explain why md_thread() does allow_signal(SIGKILL) ?
> 
> I am just curious. It looks as if we want to allow user-space to "call"
> thread->run(), and this looks strange.

One would think that this is because md wants to be notified when system 
is going to be halted/rebooted, and userspace init (whatever that is) 
decides to do 'kill -9 -1' to perform the final shutdown of md (the 
question is why it really should be needed, becasue all filesystems should 
be R/O by that time anyway).
NeilBrown May 7, 2015, 10:37 p.m. UTC | #11
On Thu, 7 May 2015 15:33:53 +0200 (CEST) Jiri Kosina <jkosina@suse.cz> wrote:

> On Mon, 4 May 2015, Oleg Nesterov wrote:
> 
> > > All the calls in md.c are in a kernel thread so safe, but I'd rather have an
> > > explicit "uninterruptible, but no load-average" wait....
> > 
> > Could you please explain why md_thread() does allow_signal(SIGKILL) ?
> > 
> > I am just curious. It looks as if we want to allow user-space to "call"
> > thread->run(), and this looks strange.
> 
> One would think that this is because md wants to be notified when system 
> is going to be halted/rebooted, and userspace init (whatever that is) 
> decides to do 'kill -9 -1' to perform the final shutdown of md (the 
> question is why it really should be needed, becasue all filesystems should 
> be R/O by that time anyway).
> 

Something like that.  It is historical strangeness that might have seemed
like a good idea at the time, but is hard to justify.

There is an alt-sysrq that will remount filesystems read-only, but no
alt-sysrq to switch RAID arrays to "idle/clean".  So I leverages alt-sysrq-K,
which does "kill -9 -1" (I think).

Since md gained the ability to switch to idle/clean after a short timeout,
and also the ability to use a write-intent-bitmap so only little bits of the
array are ever non idle/clean, this all became much less interesting.

NeilBrown
diff mbox

Patch

diff --git a/kernel/signal.c b/kernel/signal.c
index d51c5ddd855c..100e30afe5d2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -427,6 +427,10 @@  void flush_signals(struct task_struct *t)
 {
 	unsigned long flags;
 
+	/* Only kthreads are allowed to destroy signals: */
+	if (WARN_ON_ONCE(!(current->flags & PF_KTHREAD)))
+		return;
+
 	spin_lock_irqsave(&t->sighand->siglock, flags);
 	__flush_signals(t);
 	spin_unlock_irqrestore(&t->sighand->siglock, flags);