Patchwork v2.6.31-rc6: BUG: unable to handle kernel NULL pointer dereference at 0000000000000008

login
register
mail settings
Submitter Linus Torvalds
Date Aug. 25, 2009, 12:09 a.m.
Message ID <alpine.LFD.2.01.0908241655370.3824@localhost.localdomain>
Download mbox | patch
Permalink /patch/43679/
State New, archived
Headers show

Comments

Linus Torvalds - Aug. 25, 2009, 12:09 a.m.
On Mon, 24 Aug 2009, Linus Torvalds wrote:
> 
> But I wanted to let people know that the patch is clearly not the "last 
> word" on this. It's a useful thing to try, but we need something better.

This may be better (this is a replacement for the previous patch).

Instead of using 'cancel_delayed_work_sync()', it makes tty_ldisc_hangup() 
do a 'flush_scheduled_work()' afterwards, like the other callers already 
do.

And like 'tty_ldisc_release()' already does, it does this all before even 
getting the ldisc_mutex, avoiding the deadlock.

I'm not 100% happy with this patch either, but my remaining unhappiness is 
more with the tty locking in general that causes this all. I suspect this 
patch in itself is not any worse than the other hacks we have.

Oh, and in case you didn't guess - this is _STILL_ totally untested. It 
compiles for me, but that's all I'm going to guarantee. I'm just looking 
at the code (and getting pretty fed up with it ;)

And as already mentioned: I doubt the deadlock on tty->ldisc_mutex is 
anything that would be hit in practice. And even if it can be triggered, 
the previous patch I sent out is still interesting in a "does it make the 
problem go away" sense. Because if it doesn't (with or without a new 
deadlock), then I'm looking at all the wrong places.

		Linus

---
 drivers/char/tty_ldisc.c |   10 +++++++---
 1 files changed, 7 insertions(+), 3 deletions(-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Eric W. Biederman - Aug. 25, 2009, 1:41 a.m.
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Mon, 24 Aug 2009, Linus Torvalds wrote:
>> 
>> But I wanted to let people know that the patch is clearly not the "last 
>> word" on this. It's a useful thing to try, but we need something better.
>
> This may be better (this is a replacement for the previous patch).
>
> Instead of using 'cancel_delayed_work_sync()', it makes tty_ldisc_hangup() 
> do a 'flush_scheduled_work()' afterwards, like the other callers already 
> do.
>
> And like 'tty_ldisc_release()' already does, it does this all before even 
> getting the ldisc_mutex, avoiding the deadlock.
>
> I'm not 100% happy with this patch either, but my remaining unhappiness is 
> more with the tty locking in general that causes this all. I suspect this 
> patch in itself is not any worse than the other hacks we have.
>
> Oh, and in case you didn't guess - this is _STILL_ totally untested. It 
> compiles for me, but that's all I'm going to guarantee. I'm just looking 
> at the code (and getting pretty fed up with it ;)
>
> And as already mentioned: I doubt the deadlock on tty->ldisc_mutex is 
> anything that would be hit in practice. And even if it can be triggered, 
> the previous patch I sent out is still interesting in a "does it make the 
> problem go away" sense. Because if it doesn't (with or without a new 
> deadlock), then I'm looking at all the wrong places.

Assuming no one beats me to it I should be able to test this tomorrow.

Eric
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Dave Young - Aug. 25, 2009, 2:48 a.m.
On Mon, Aug 24, 2009 at 05:09:08PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 24 Aug 2009, Linus Torvalds wrote:
> > 
> > But I wanted to let people know that the patch is clearly not the "last 
> > word" on this. It's a useful thing to try, but we need something better.
> 
> This may be better (this is a replacement for the previous patch).
> 
> Instead of using 'cancel_delayed_work_sync()', it makes tty_ldisc_hangup() 
> do a 'flush_scheduled_work()' afterwards, like the other callers already 
> do.
> 
> And like 'tty_ldisc_release()' already does, it does this all before even 
> getting the ldisc_mutex, avoiding the deadlock.
> 
> I'm not 100% happy with this patch either, but my remaining unhappiness is 
> more with the tty locking in general that causes this all. I suspect this 
> patch in itself is not any worse than the other hacks we have.
> 
> Oh, and in case you didn't guess - this is _STILL_ totally untested. It 
> compiles for me, but that's all I'm going to guarantee. I'm just looking 
> at the code (and getting pretty fed up with it ;)
> 
> And as already mentioned: I doubt the deadlock on tty->ldisc_mutex is 
> anything that would be hit in practice. And even if it can be triggered, 
> the previous patch I sent out is still interesting in a "does it make the 
> problem go away" sense. Because if it doesn't (with or without a new 
> deadlock), then I'm looking at all the wrong places.

Tested for half an hour, seems it fixed the problem.

> 
> 		Linus
> 
> ---
>  drivers/char/tty_ldisc.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
> index 1733d34..f893d18 100644
> --- a/drivers/char/tty_ldisc.c
> +++ b/drivers/char/tty_ldisc.c
> @@ -508,8 +508,9 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
>   *	be obtained while the delayed work queue halt ensures that no more
>   *	data is fed to the ldisc.
>   *
> - *	In order to wait for any existing references to complete see
> - *	tty_ldisc_wait_idle.
> + *	You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex
> + *	in order to make sure any currently executing ldisc work is also
> + *	flushed.
>   */
>  
>  static int tty_ldisc_halt(struct tty_struct *tty)
> @@ -753,11 +754,14 @@ void tty_ldisc_hangup(struct tty_struct *tty)
>  	 * N_TTY.
>  	 */
>  	if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
> +		/* Make sure the old ldisc is quiescent */
> +		tty_ldisc_halt(tty);
> +		flush_scheduled_work();
> +
>  		/* Avoid racing set_ldisc or tty_ldisc_release */
>  		mutex_lock(&tty->ldisc_mutex);
>  		if (tty->ldisc) {	/* Not yet closed */
>  			/* Switch back to N_TTY */
> -			tty_ldisc_halt(tty);
>  			tty_ldisc_reinit(tty);
>  			/* At this point we have a closed ldisc and we want to
>  			   reopen it. We could defer this to the next open but
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Xiaotian Feng - Aug. 25, 2009, 3:08 a.m.
On Tue, Aug 25, 2009 at 8:09 AM, Linus
Torvalds<torvalds@linux-foundation.org> wrote:
>
>
> On Mon, 24 Aug 2009, Linus Torvalds wrote:
>>
>> But I wanted to let people know that the patch is clearly not the "last
>> word" on this. It's a useful thing to try, but we need something better.
>
> This may be better (this is a replacement for the previous patch).
>
> Instead of using 'cancel_delayed_work_sync()', it makes tty_ldisc_hangup()
> do a 'flush_scheduled_work()' afterwards, like the other callers already
> do.
>
> And like 'tty_ldisc_release()' already does, it does this all before even
> getting the ldisc_mutex, avoiding the deadlock.
>
> I'm not 100% happy with this patch either, but my remaining unhappiness is
> more with the tty locking in general that causes this all. I suspect this
> patch in itself is not any worse than the other hacks we have.
>
> Oh, and in case you didn't guess - this is _STILL_ totally untested. It
> compiles for me, but that's all I'm going to guarantee. I'm just looking
> at the code (and getting pretty fed up with it ;)
>
> And as already mentioned: I doubt the deadlock on tty->ldisc_mutex is
> anything that would be hit in practice. And even if it can be triggered,
> the previous patch I sent out is still interesting in a "does it make the
> problem go away" sense. Because if it doesn't (with or without a new
> deadlock), then I'm looking at all the wrong places.

I have run the test case for about 2 hours on my x86_64 machine, no
panic happens.

>
>                Linus
>
> ---
>  drivers/char/tty_ldisc.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
> index 1733d34..f893d18 100644
> --- a/drivers/char/tty_ldisc.c
> +++ b/drivers/char/tty_ldisc.c
> @@ -508,8 +508,9 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
>  *     be obtained while the delayed work queue halt ensures that no more
>  *     data is fed to the ldisc.
>  *
> - *     In order to wait for any existing references to complete see
> - *     tty_ldisc_wait_idle.
> + *     You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex
> + *     in order to make sure any currently executing ldisc work is also
> + *     flushed.
>  */
>
>  static int tty_ldisc_halt(struct tty_struct *tty)
> @@ -753,11 +754,14 @@ void tty_ldisc_hangup(struct tty_struct *tty)
>         * N_TTY.
>         */
>        if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
> +               /* Make sure the old ldisc is quiescent */
> +               tty_ldisc_halt(tty);
> +               flush_scheduled_work();
> +
>                /* Avoid racing set_ldisc or tty_ldisc_release */
>                mutex_lock(&tty->ldisc_mutex);
>                if (tty->ldisc) {       /* Not yet closed */
>                        /* Switch back to N_TTY */
> -                       tty_ldisc_halt(tty);
>                        tty_ldisc_reinit(tty);
>                        /* At this point we have a closed ldisc and we want to
>                           reopen it. We could defer this to the next open but
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Frederic Weisbecker - Aug. 25, 2009, 3:39 a.m.
On Mon, Aug 24, 2009 at 05:09:08PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 24 Aug 2009, Linus Torvalds wrote:
> > 
> > But I wanted to let people know that the patch is clearly not the "last 
> > word" on this. It's a useful thing to try, but we need something better.
> 
> This may be better (this is a replacement for the previous patch).
> 
> Instead of using 'cancel_delayed_work_sync()', it makes tty_ldisc_hangup() 
> do a 'flush_scheduled_work()' afterwards, like the other callers already 
> do.
> 
> And like 'tty_ldisc_release()' already does, it does this all before even 
> getting the ldisc_mutex, avoiding the deadlock.
> 
> I'm not 100% happy with this patch either, but my remaining unhappiness is 
> more with the tty locking in general that causes this all. I suspect this 
> patch in itself is not any worse than the other hacks we have.
> 
> Oh, and in case you didn't guess - this is _STILL_ totally untested. It 
> compiles for me, but that's all I'm going to guarantee. I'm just looking 
> at the code (and getting pretty fed up with it ;)
> 
> And as already mentioned: I doubt the deadlock on tty->ldisc_mutex is 
> anything that would be hit in practice. And even if it can be triggered, 
> the previous patch I sent out is still interesting in a "does it make the 
> problem go away" sense. Because if it doesn't (with or without a new 
> deadlock), then I'm looking at all the wrong places.
> 
> 		Linus
> 
> ---
>  drivers/char/tty_ldisc.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
> index 1733d34..f893d18 100644
> --- a/drivers/char/tty_ldisc.c
> +++ b/drivers/char/tty_ldisc.c
> @@ -508,8 +508,9 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
>   *	be obtained while the delayed work queue halt ensures that no more
>   *	data is fed to the ldisc.
>   *
> - *	In order to wait for any existing references to complete see
> - *	tty_ldisc_wait_idle.
> + *	You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex
> + *	in order to make sure any currently executing ldisc work is also
> + *	flushed.
>   */
>  
>  static int tty_ldisc_halt(struct tty_struct *tty)
> @@ -753,11 +754,14 @@ void tty_ldisc_hangup(struct tty_struct *tty)
>  	 * N_TTY.
>  	 */
>  	if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
> +		/* Make sure the old ldisc is quiescent */
> +		tty_ldisc_halt(tty);



Now that also makes the TTY_LDISC flag clearing unprotected by
tty->ldisc_mutex.

tty_set_ldisc() can play concurrently with these flags right?

tty_ldisc_halt() could remain protected by the mutex, so that the
flag is safely toggled. Once it is cleared, we can ensure no more
user can ref it and the lock can be relaxed while the pending
work is flushed.


> +		flush_scheduled_work();
> +
>  		/* Avoid racing set_ldisc or tty_ldisc_release */
>  		mutex_lock(&tty->ldisc_mutex);
>  		if (tty->ldisc) {	/* Not yet closed */
>  			/* Switch back to N_TTY */
> -			tty_ldisc_halt(tty);
>  			tty_ldisc_reinit(tty);
>  			/* At this point we have a closed ldisc and we want to
>  			   reopen it. We could defer this to the next open but
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Linus Torvalds - Aug. 25, 2009, 4:10 a.m.
On Tue, 25 Aug 2009, Frederic Weisbecker wrote:
> 
> Now that also makes the TTY_LDISC flag clearing unprotected by
> tty->ldisc_mutex.

Yes.

> tty_set_ldisc() can play concurrently with these flags right?

.. but that shouldn't matter.

The actual bit-setting is "atomic" already - and any other atomicity is 
pretty much unattainable, because all the routines in question drop the 
lock they need to hold in order to make it really be reliably atomic.

> tty_ldisc_halt() could remain protected by the mutex, so that the
> flag is safely toggled. Once it is cleared, we can ensure no more
> user can ref it and the lock can be relaxed while the pending
> work is flushed.

That would make no difference at all. tty_set_ldisc() won't care about the 
flag (in fact, it will do its own tty_ldisc_halt()), and will be happy to 
replace the ldisc we just flushed with a new one regardless of whether it 
was halted before or not. And it will do tty_ldisc_enable() regardless of 
whether it was enabled or not before.

In fact, because tty_set_ldisc() itself had to release the ldisc_mutex 
(for the same reason), you have this issue regardless of whether you hold 
the lock in tty_hangup() or not: the two will always be able to get "mixed 
up", because they - by design - have to release that silly lock.

That's why I said I was unhappy about the tty layer locking - it really 
isn't very sane. Things like tty_set_ldisc() will drop the lock in the 
middle because of that crazy workqueue deadlock - exactly for the same 
reasons that tty_ldisc_hangup() will need to do that "wait for things to 
flush" without the lock held.

So I could have taken the ldisc_mutex, and then just dropped it 
temporarily while waiting for any workqueue entries, but as far as I can 
tell, it doesn't actually solve anything.

I considered using the TTY_LDISC_CHANGING bit(*) there to protect against 
tty_set_ldisc(), and it may even be the right solution. But there's no way 
I'll do that kind of changes this late in the -rc series.

We also have the "TTY_HUPPED" bit that disables tty_set_ldisc(), but that 
is set too late by do_tty_hangup(), and so doesn't fix the problem either. 
Again, moving it earlier may be a solution, but again, it's not 
appropriate for this late in the -rc.

Finally, the solution that is most likely the _real_ solution would be to 
just fix the locking. The whole "ldisc_mutex" seems dubious. It's not even 
a real lock - exactly because it's dropped - and we already really use 
that TTY_LDISC_CHANGING bit to do the _real_ locking. I don't think it 
needs to be a mutex at all. The locking is just very dubious. 

And that, least of all, is anything I'm willing to really do in -rc. 

Anyway, I'll happily be shown wrong. I think the (second) patch I sent out 
is an acceptable hack in the presense of the current locking, but as I 
said, I'm not exactly happy about it, because I do think the locking is 
broken.

		Linus

(*) We already have that hacky open-coded "lock" using TTY_LDISC_CHANGING, 
which protects two different tty_set_ldisc()'s from screwing up each other 
when they drop the semaphore. It could be just separated out into a 
function of its own, and then the hangup code would/could/should be taught 
to use that logic.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Linus Torvalds - Aug. 25, 2009, 4:30 a.m.
On Mon, 24 Aug 2009, Linus Torvalds wrote:
> 
> Anyway, I'll happily be shown wrong. I think the (second) patch I sent out 
> is an acceptable hack in the presense of the current locking, but as I 
> said, I'm not exactly happy about it, because I do think the locking is 
> broken.

Btw, another solution to all this would be to just not have that 
ldisc_mutex deadlock due to do_tty_hangup -> tty_ldisc_hangup at all.

The actual _flushing_ doesn't need the mutex - it's just that both 
flushing and hangup is done with workqueues.

If we can avoid the deadlock by not having the (artificial) workqueue 
dependency, it would allow everybody to just hold on to the mutex over the 
whole sequence - and would obviate the need for that hacky 
TTY_LDISC_CHANGING bit thing in tty_set_ldisc.

In other words, the whole problem really comes in from the fact that 
do_tty_hangup() is called from "hangup_work", and the workqueues can get 
hung to the point where you can't then do the (totally _unrelated_) queue 
flushing.

Because flush_to_ldisc() itself - which is what we want to do - doesn't 
need that mutex or the workqueue at all. It could run from any context, 
afaik.

So if we were to turn it into just a timer (rather than a "delayed work"), 
then we'd not need to do that "flush_scheduled_work()" thing at all, and 
we wouldn't have that interaction with do_tty_hangup(). At which point we 
could again hold on to locks, because we wouldn't need to worry about the 
workqueues getting stuck on the mutex (that isn't even needed for the 
actual flushing part that we want to do!).

So don't get me wrong - there are _multiple_ ways to solve this. But they 
are all pretty major surgery, changing "big" semantics. We could fix the 
locking, we could change how we flush, we could do all of those things. 
And I'd love to. But I think the almost-oneliner is the safest approach 
right now. It's certainly not perfect, but it's fairly minimal impact.

		Linus
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Zhang, Yanmin - Aug. 25, 2009, 6:16 a.m.
On Tue, 2009-08-25 at 11:08 +0800, Xiaotian Feng wrote:
> On Tue, Aug 25, 2009 at 8:09 AM, Linus
> Torvalds<torvalds@linux-foundation.org> wrote:
> >
> >
> > On Mon, 24 Aug 2009, Linus Torvalds wrote:
> >>
> >> But I wanted to let people know that the patch is clearly not the "last
> >> word" on this. It's a useful thing to try, but we need something better.
> >
> > This may be better (this is a replacement for the previous patch).
> >
> > Instead of using 'cancel_delayed_work_sync()', it makes tty_ldisc_hangup()
> > do a 'flush_scheduled_work()' afterwards, like the other callers already
> > do.
> >
> > And like 'tty_ldisc_release()' already does, it does this all before even
> > getting the ldisc_mutex, avoiding the deadlock.
> >
> > I'm not 100% happy with this patch either, but my remaining unhappiness is
> > more with the tty locking in general that causes this all. I suspect this
> > patch in itself is not any worse than the other hacks we have.
> >
> > Oh, and in case you didn't guess - this is _STILL_ totally untested. It
> > compiles for me, but that's all I'm going to guarantee. I'm just looking
> > at the code (and getting pretty fed up with it ;)
> >
> > And as already mentioned: I doubt the deadlock on tty->ldisc_mutex is
> > anything that would be hit in practice. And even if it can be triggered,
> > the previous patch I sent out is still interesting in a "does it make the
> > problem go away" sense. Because if it doesn't (with or without a new
> > deadlock), then I'm looking at all the wrong places.
> 
> I have run the test case for about 2 hours on my x86_64 machine, no
> panic happens.
I ran the test case and didn't hit the panic again. It seems the patch does work.
But I'm still curious. On my stoakley machine, most panic happens at
fn(data) in function run_timer_softirq=>__run_timers because the register which saves
fn is equal to NULL. I added a fn==NULL checking just before fn(data), and found
it's never equal to NULL which is quite different from the panic info.

> 
> >
> >                Linus
> >
> > ---
> >  drivers/char/tty_ldisc.c |   10 +++++++---
> >  1 files changed, 7 insertions(+), 3 deletions(-)
> >
> > diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
> > index 1733d34..f893d18 100644
> > --- a/drivers/char/tty_ldisc.c
> > +++ b/drivers/char/tty_ldisc.c
> > @@ -508,8 +508,9 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
> >  *     be obtained while the delayed work queue halt ensures that no more
> >  *     data is fed to the ldisc.
> >  *
> > - *     In order to wait for any existing references to complete see
> > - *     tty_ldisc_wait_idle.
> > + *     You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex
> > + *     in order to make sure any currently executing ldisc work is also
> > + *     flushed.
> >  */
> >
> >  static int tty_ldisc_halt(struct tty_struct *tty)
> > @@ -753,11 +754,14 @@ void tty_ldisc_hangup(struct tty_struct *tty)
> >         * N_TTY.
> >         */
> >        if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
> > +               /* Make sure the old ldisc is quiescent */
> > +               tty_ldisc_halt(tty);
> > +               flush_scheduled_work();
> > +
> >                /* Avoid racing set_ldisc or tty_ldisc_release */
> >                mutex_lock(&tty->ldisc_mutex);
> >                if (tty->ldisc) {       /* Not yet closed */
> >                        /* Switch back to N_TTY */
> > -                       tty_ldisc_halt(tty);
> >                        tty_ldisc_reinit(tty);
> >                        /* At this point we have a closed ldisc and we want to
> >                           reopen it. We could defer this to the next open but
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Frederic Weisbecker - Aug. 25, 2009, 2:24 p.m.
On Mon, Aug 24, 2009 at 09:10:38PM -0700, Linus Torvalds wrote:
> 
> 
> On Tue, 25 Aug 2009, Frederic Weisbecker wrote:
> > 
> > Now that also makes the TTY_LDISC flag clearing unprotected by
> > tty->ldisc_mutex.
> 
> Yes.
> 
> > tty_set_ldisc() can play concurrently with these flags right?
> 
> .. but that shouldn't matter.
> 
> The actual bit-setting is "atomic" already - and any other atomicity is 
> pretty much unattainable, because all the routines in question drop the 
> lock they need to hold in order to make it really be reliably atomic.
> 
> > tty_ldisc_halt() could remain protected by the mutex, so that the
> > flag is safely toggled. Once it is cleared, we can ensure no more
> > user can ref it and the lock can be relaxed while the pending
> > work is flushed.
> 
> That would make no difference at all. tty_set_ldisc() won't care about the 
> flag (in fact, it will do its own tty_ldisc_halt()), and will be happy to 
> replace the ldisc we just flushed with a new one regardless of whether it 
> was halted before or not. And it will do tty_ldisc_enable() regardless of 
> whether it was enabled or not before.
> 
> In fact, because tty_set_ldisc() itself had to release the ldisc_mutex 
> (for the same reason), you have this issue regardless of whether you hold 
> the lock in tty_hangup() or not: the two will always be able to get "mixed 
> up", because they - by design - have to release that silly lock.



Hmm, that's why I had a headache while trying to imagine every races in this
place...



> That's why I said I was unhappy about the tty layer locking - it really 
> isn't very sane. Things like tty_set_ldisc() will drop the lock in the 
> middle because of that crazy workqueue deadlock - exactly for the same 
> reasons that tty_ldisc_hangup() will need to do that "wait for things to 
> flush" without the lock held.
> 
> So I could have taken the ldisc_mutex, and then just dropped it 
> temporarily while waiting for any workqueue entries, but as far as I can 
> tell, it doesn't actually solve anything.


Yeah, indeed.



> I considered using the TTY_LDISC_CHANGING bit(*) there to protect against 
> tty_set_ldisc(), and it may even be the right solution. But there's no way 
> I'll do that kind of changes this late in the -rc series.
> 
> We also have the "TTY_HUPPED" bit that disables tty_set_ldisc(), but that 
> is set too late by do_tty_hangup(), and so doesn't fix the problem either. 
> Again, moving it earlier may be a solution, but again, it's not 
> appropriate for this late in the -rc.


Ok.

 
> Finally, the solution that is most likely the _real_ solution would be to 
> just fix the locking. The whole "ldisc_mutex" seems dubious. It's not even 
> a real lock - exactly because it's dropped - and we already really use 
> that TTY_LDISC_CHANGING bit to do the _real_ locking. I don't think it 
> needs to be a mutex at all. The locking is just very dubious. 
> 
> And that, least of all, is anything I'm willing to really do in -rc. 
> 
> Anyway, I'll happily be shown wrong. I think the (second) patch I sent out 
> is an acceptable hack in the presense of the current locking, but as I 
> said, I'm not exactly happy about it, because I do think the locking is 
> broken.
> 
> 		Linus
> 
> (*) We already have that hacky open-coded "lock" using TTY_LDISC_CHANGING, 
> which protects two different tty_set_ldisc()'s from screwing up each other 
> when they drop the semaphore. It could be just separated out into a 
> function of its own, and then the hangup code would/could/should be taught 
> to use that logic.


Ok, thanks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Frederic Weisbecker - Aug. 25, 2009, 3:05 p.m.
On Mon, Aug 24, 2009 at 09:30:16PM -0700, Linus Torvalds wrote:
> 
> 
> On Mon, 24 Aug 2009, Linus Torvalds wrote:
> > 
> > Anyway, I'll happily be shown wrong. I think the (second) patch I sent out 
> > is an acceptable hack in the presense of the current locking, but as I 
> > said, I'm not exactly happy about it, because I do think the locking is 
> > broken.
> 
> Btw, another solution to all this would be to just not have that 
> ldisc_mutex deadlock due to do_tty_hangup -> tty_ldisc_hangup at all.
> 
> The actual _flushing_ doesn't need the mutex - it's just that both 
> flushing and hangup is done with workqueues.



Yeah, it would be sad, but having the flushing done in a dedicated workqueue
would solve the need of relaxing the lock, because we would only wait
for the pending flush works, not the hangup works.

But it's sad to create a thread only for that.

 
> If we can avoid the deadlock by not having the (artificial) workqueue 
> dependency, it would allow everybody to just hold on to the mutex over the 
> whole sequence - and would obviate the need for that hacky 
> TTY_LDISC_CHANGING bit thing in tty_set_ldisc.
> 
> In other words, the whole problem really comes in from the fact that 
> do_tty_hangup() is called from "hangup_work", and the workqueues can get 
> hung to the point where you can't then do the (totally _unrelated_) queue 
> flushing.
> 
> Because flush_to_ldisc() itself - which is what we want to do - doesn't 
> need that mutex or the workqueue at all. It could run from any context, 
> afaik.
> 
> So if we were to turn it into just a timer (rather than a "delayed work"), 
> then we'd not need to do that "flush_scheduled_work()" thing at all, and 
> we wouldn't have that interaction with do_tty_hangup(). At which point we 
> could again hold on to locks, because we wouldn't need to worry about the 
> workqueues getting stuck on the mutex (that isn't even needed for the 
> actual flushing part that we want to do!).



Yeah, a simple timer would be better than a dedicated workqueue in that
we don't need a whole thread for such small job.


> 
> So don't get me wrong - there are _multiple_ ways to solve this. But they 
> are all pretty major surgery, changing "big" semantics. We could fix the 
> locking, we could change how we flush, we could do all of those things. 
> And I'd love to. But I think the almost-oneliner is the safest approach 
> right now. It's certainly not perfect, but it's fairly minimal impact.
> 
> 		Linus


Yep.
I hope the progressive work Jens Axboe is doing on workqueues will drop
their serialized nature which leads to such perpetual deadlocks.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Zhang, Yanmin - Aug. 27, 2009, 9:15 a.m.
On Mon, 2009-08-24 at 17:09 -0700, Linus Torvalds wrote:
> 
> On Mon, 24 Aug 2009, Linus Torvalds wrote:
> > 
> > But I wanted to let people know that the patch is clearly not the "last 
> > word" on this. It's a useful thing to try, but we need something better.
> 
> This may be better (this is a replacement for the previous patch).
> 
> Instead of using 'cancel_delayed_work_sync()', it makes tty_ldisc_hangup() 
> do a 'flush_scheduled_work()' afterwards, like the other callers already 
> do.
> 
> And like 'tty_ldisc_release()' already does, it does this all before even 
> getting the ldisc_mutex, avoiding the deadlock.
> 
> I'm not 100% happy with this patch either, but my remaining unhappiness is 
> more with the tty locking in general that causes this all. I suspect this 
> patch in itself is not any worse than the other hacks we have.
> 
> Oh, and in case you didn't guess - this is _STILL_ totally untested. It 
> compiles for me, but that's all I'm going to guarantee. I'm just looking 
> at the code (and getting pretty fed up with it ;)
> 
> And as already mentioned: I doubt the deadlock on tty->ldisc_mutex is 
> anything that would be hit in practice. And even if it can be triggered, 
> the previous patch I sent out is still interesting in a "does it make the 
> problem go away" sense. Because if it doesn't (with or without a new 
> deadlock), then I'm looking at all the wrong places.
> 
> 		Linus
> 
I traced kernel and the root cause is really that the process doesn't wait for
the work to be complete. So a new process picks up the released tty/work to
use it. After the new process deletes the timer but before releasing the tty,
the old scheduled work is started to add timer. Then, the new process releases
the tty struct, including work/timer. So later on when the timer rearmed by the old
scheduled work expires, kernel panic.

> ---
>  drivers/char/tty_ldisc.c |   10 +++++++---
>  1 files changed, 7 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
> index 1733d34..f893d18 100644
> --- a/drivers/char/tty_ldisc.c
> +++ b/drivers/char/tty_ldisc.c
> @@ -508,8 +508,9 @@ static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
>   *	be obtained while the delayed work queue halt ensures that no more
>   *	data is fed to the ldisc.
>   *
> - *	In order to wait for any existing references to complete see
> - *	tty_ldisc_wait_idle.
> + *	You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex
> + *	in order to make sure any currently executing ldisc work is also
> + *	flushed.
>   */
>  
>  static int tty_ldisc_halt(struct tty_struct *tty)
> @@ -753,11 +754,14 @@ void tty_ldisc_hangup(struct tty_struct *tty)
>  	 * N_TTY.
>  	 */
>  	if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
> +		/* Make sure the old ldisc is quiescent */
> +		tty_ldisc_halt(tty);
> +		flush_scheduled_work();
Perhaps it's better with below:
	int ret = tty_ldisc_halt(tty);
	if (!ret)
		flush_scheduled_work();


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Patch

diff --git a/drivers/char/tty_ldisc.c b/drivers/char/tty_ldisc.c
index 1733d34..f893d18 100644
--- a/drivers/char/tty_ldisc.c
+++ b/drivers/char/tty_ldisc.c
@@ -508,8 +508,9 @@  static void tty_ldisc_restore(struct tty_struct *tty, struct tty_ldisc *old)
  *	be obtained while the delayed work queue halt ensures that no more
  *	data is fed to the ldisc.
  *
- *	In order to wait for any existing references to complete see
- *	tty_ldisc_wait_idle.
+ *	You need to do a 'flush_scheduled_work()' (outside the ldisc_mutex
+ *	in order to make sure any currently executing ldisc work is also
+ *	flushed.
  */
 
 static int tty_ldisc_halt(struct tty_struct *tty)
@@ -753,11 +754,14 @@  void tty_ldisc_hangup(struct tty_struct *tty)
 	 * N_TTY.
 	 */
 	if (tty->driver->flags & TTY_DRIVER_RESET_TERMIOS) {
+		/* Make sure the old ldisc is quiescent */
+		tty_ldisc_halt(tty);
+		flush_scheduled_work();
+
 		/* Avoid racing set_ldisc or tty_ldisc_release */
 		mutex_lock(&tty->ldisc_mutex);
 		if (tty->ldisc) {	/* Not yet closed */
 			/* Switch back to N_TTY */
-			tty_ldisc_halt(tty);
 			tty_ldisc_reinit(tty);
 			/* At this point we have a closed ldisc and we want to
 			   reopen it. We could defer this to the next open but