diff mbox

[2/3,v2] fb: udlfb: fix hang at disconnect

Message ID 5108329E.2050802@ahsoftware.de (mailing list archive)
State New, archived
Headers show

Commit Message

Alexander Holler Jan. 29, 2013, 8:35 p.m. UTC
Am 29.01.2013 16:51, schrieb Alexander Holler:
> Am 29.01.2013 12:11, schrieb Alexander Holler:
>
>>
>> To explain the problem on shutdown a bit further, I think the following
>> happens (usb and driver are statically linked and started by the kernel):
>>
>> shutdown -> kill signal -> usb stack shuts down -> udlfb waits (forever)
>> for a kill or an urb which it doesn't get.
>
> Having a second look at what I've written above, I'm not even sure if
> the kernel sends one or more fatal signals on shutdown at all. I've just
> assumed it because otherwise down_interruptible() wouldn't have worked
> before (it would have stalled on shutdown too (if an urb got missed),
> not only on disconnect).
>
> Sounds like an interesting question I should read about (if/when fatal
> signals are issued by the kernel). ;)
>
>> Maybe the sequence is different if the usb-stack and udlfb are used as a
>> module and/or udlfb is used only for X/fb. I'm not sure what actually
>> does shut down the usb-stack in such a case, but maybe more than one
>> kill signal might be thrown around.

If anyone still follows my monologue: The question was interesting
enough that I couldn't resist for long. ;)

(all pasted => format broken)

In drivers/tty/sysrq.c there is

------
static void send_sig_all(int sig)
{
         struct task_struct *p;

         read_lock(&tasklist_lock);
         for_each_process(p) {
                 if (p->flags & PF_KTHREAD)
                         continue;
                 if (is_global_init(p))
                         continue;

                 do_send_sig_info(sig, SEND_SIG_FORCED, p, true);
         }
         read_unlock(&tasklist_lock);
}

static void sysrq_handle_term(int key)
{
         send_sig_all(SIGTERM);
         console_loglevel = 8;
}

(...)

static void sysrq_handle_kill(int key)
{
         send_sig_all(SIGKILL);
         console_loglevel = 8;
}
------

Now I've done some learning by doing (kernel 3.7.5 + some patches):

------

                 spin_lock_irqsave(&dev->urbs.lock, flags);
------

Now I've disconnected the display. And, as send_sig_all() already 
suggests, the result was (besides discovering an oops in 
call_timer_fn.isra (once)):

------
[  120.963010] udlfb: AHO: I'm a kernel thread
[  122.957024] udlfb: AHO: ret -62
------
(-62 is -ETIME)

So, if the above down_timeout_killable() is only down_interruptible(), 
as in kernel 3.7.5, the  box would not shutdown afterwards, because on 
shutdown no signal would be send to that kernel-thread which called 
dlfb_free_urb_list().

A last note: dlfb_usb_disconnect() (thus dlfb_free_urb_list()) isn't 
called on shutdown if the device would still be connected. So the 
problem only might happen, if the screen will be disconnected before 
shutdown (and an urb gets missed). So the subject of my patch is correct. ;)

</monologue>

Regards,

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Alexander Holler Jan. 29, 2013, 8:56 p.m. UTC | #1
Am 29.01.2013 21:35, schrieb Alexander Holler:
>
> So, if the above down_timeout_killable() is only down_interruptible(),
> as in kernel 3.7.5, the  box would not shutdown afterwards, because on
> shutdown no signal would be send to that kernel-thread which called
> dlfb_free_urb_list().
>
> A last note: dlfb_usb_disconnect() (thus dlfb_free_urb_list()) isn't
> called on shutdown if the device would still be connected. So the
> problem only might happen, if the screen will be disconnected before
> shutdown (and an urb gets missed). So the subject of my patch is
> correct. ;)
>
> </monologue>

Hmm, wrong, sorry, I still have something to add: As no signal arrives 
at all, v1 of that patch is enough and the implementation of 
down_timeout_killable() isn't necessary at all.

If there is a chance that the patch would be Acked-by by someone, I 
would made a v3, replacing

+		ret = down_timeout(&dev->urbs.limit_sem, GET_URB_TIMEOUT);

in v1 with

+		ret = down_timeout(&dev->urbs.limit_sem, FREE_URB_TIMEOUT);

as this seems to be what it should be.

Regards,

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Feb. 4, 2013, 1:14 a.m. UTC | #2
On Tue, Jan 29, 2013 at 09:35:42PM +0100, Alexander Holler wrote:
> Am 29.01.2013 16:51, schrieb Alexander Holler:
> >Am 29.01.2013 12:11, schrieb Alexander Holler:
> >
> >>
> >>To explain the problem on shutdown a bit further, I think the following
> >>happens (usb and driver are statically linked and started by the kernel):
> >>
> >>shutdown -> kill signal -> usb stack shuts down -> udlfb waits (forever)
> >>for a kill or an urb which it doesn't get.
> >
> >Having a second look at what I've written above, I'm not even sure if
> >the kernel sends one or more fatal signals on shutdown at all. I've just
> >assumed it because otherwise down_interruptible() wouldn't have worked
> >before (it would have stalled on shutdown too (if an urb got missed),
> >not only on disconnect).
> >
> >Sounds like an interesting question I should read about (if/when fatal
> >signals are issued by the kernel). ;)
> >
> >>Maybe the sequence is different if the usb-stack and udlfb are used as a
> >>module and/or udlfb is used only for X/fb. I'm not sure what actually
> >>does shut down the usb-stack in such a case, but maybe more than one
> >>kill signal might be thrown around.
> 
> If anyone still follows my monologue: The question was interesting
> enough that I couldn't resist for long. ;)
> 
> (all pasted => format broken)
> 
> In drivers/tty/sysrq.c there is
> 
> ------
> static void send_sig_all(int sig)
> {
>         struct task_struct *p;
> 
>         read_lock(&tasklist_lock);
>         for_each_process(p) {
>                 if (p->flags & PF_KTHREAD)
>                         continue;
>                 if (is_global_init(p))
>                         continue;
> 
>                 do_send_sig_info(sig, SEND_SIG_FORCED, p, true);
>         }
>         read_unlock(&tasklist_lock);
> }
> 
> static void sysrq_handle_term(int key)
> {
>         send_sig_all(SIGTERM);
>         console_loglevel = 8;
> }
> 
> (...)
> 
> static void sysrq_handle_kill(int key)
> {
>         send_sig_all(SIGKILL);
>         console_loglevel = 8;
> }
> ------
> 
> Now I've done some learning by doing (kernel 3.7.5 + some patches):
> 
> ------
> diff --git a/drivers/video/udlfb.c b/drivers/video/udlfb.c
> index df249f3..db8a86c 100644
> --- a/drivers/video/udlfb.c
> +++ b/drivers/video/udlfb.c
> @@ -1876,14 +1876,18 @@ static void dlfb_free_urb_list(struct dlfb_data
> *dev)
>         unsigned long flags;
> 
>         pr_notice("Freeing all render urbs\n");
> +       if (current->flags & PF_KTHREAD)
> +               pr_info("AHO: I'm a kernel thread\n");
> 
>         /* keep waiting and freeing, until we've got 'em all */
>         while (count--) {
> 
>                 /* Timeout likely occurs at disconnect (resulting in a
> leak) */
>                 ret = down_timeout_killable(&dev->urbs.limit_sem,
> FREE_URB_TIMEOUT);
> -               if (ret)
> +               if (ret) {
> +                       pr_info("AHO: ret %d\n", ret);
>                         break;
> +               }
> 
>                 spin_lock_irqsave(&dev->urbs.lock, flags);
> ------
> 
> Now I've disconnected the display. And, as send_sig_all() already
> suggests, the result was (besides discovering an oops in
> call_timer_fn.isra (once)):
> 
> ------
> [  120.963010] udlfb: AHO: I'm a kernel thread
> [  122.957024] udlfb: AHO: ret -62
> ------
> (-62 is -ETIME)
> 
> So, if the above down_timeout_killable() is only
> down_interruptible(), as in kernel 3.7.5, the  box would not
> shutdown afterwards, because on shutdown no signal would be send to
> that kernel-thread which called dlfb_free_urb_list().
> 
> A last note: dlfb_usb_disconnect() (thus dlfb_free_urb_list()) isn't
> called on shutdown if the device would still be connected. So the
> problem only might happen, if the screen will be disconnected before
> shutdown (and an urb gets missed). So the subject of my patch is
> correct. ;)

Yes, we don't disconnect all devices from the USB bus on shutdown
because, I think, we didn't tear down all of the PCI devices originally,
so your USB bus never knew it was going to be shutdown.

This is how things have always worked, and shutting down PCI devices in
the past have been known to cause problems.  I think.  I vaguely
remember some issues when I tried to do this 10+ years or so ago in the
2.5 kernel days, but I could be totally wrong given that I can't
remember what I was working on last month at times...

So you are right in that your driver will wait for forever for a
disconnect() to happen, as it will never be called.  I don't understand
the problem that this is causing when it happens.  What's wrong with
udlfb that having the cpu suddently reset as the powerdown happened
without it knowing about it?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Feb. 4, 2013, 12:05 p.m. UTC | #3
Am 04.02.2013 02:14, schrieb Greg KH:

> So you are right in that your driver will wait for forever for a
> disconnect() to happen, as it will never be called.  I don't understand
> the problem that this is causing when it happens.  What's wrong with
> udlfb that having the cpu suddently reset as the powerdown happened
> without it knowing about it?

There is nothing wrong with that. I've just explained why a problem 
doesn't occur on shutdown but on disconnect (of the device).

Regards,

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Feb. 4, 2013, 7:17 p.m. UTC | #4
Am 04.02.2013 13:05, schrieb Alexander Holler:
> Am 04.02.2013 02:14, schrieb Greg KH:
>
>> So you are right in that your driver will wait for forever for a
>> disconnect() to happen, as it will never be called.  I don't understand
>> the problem that this is causing when it happens.  What's wrong with
>> udlfb that having the cpu suddently reset as the powerdown happened
>> without it knowing about it?
>
> There is nothing wrong with that. I've just explained why a problem
> doesn't occur on shutdown but on disconnect (of the device).

Maybe my explanation before was just to long and I try to explain it a 
bit shorter:

If a device gets disconnected, the disconnect in udlfb might wait 
forever in down_interruptible() (because it waits for an urb it never 
receives). This even prevents a shutdown afterwards, because that 
down_interruptible() never receives a signal (at shutdown, because 
kernel threads don't get such).

So the change from down_timeout() to down_interruptible() in 
dlfb_free_urb_list() with commit 
33077b8d3042e01da61924973e372abe589ba297 only results in that the 
following code (thus the break there) will never be reached if an urb 
got missed (because of a disconnect).

And the accompanying comment (... at shutdown) is misleading, because on 
shutdown, the kernel thread which calls dlfb_free_urb_list() never gets 
a signal, so the interruption just never happens.

As I've experienced the "missing urb on disconnect" problem quiet often, 
I've changed that down_interruptible() to down_timeout() (in v1 and in 
v2 to down_timeout_interruptible, because I wasn't aware that no signal 
arrives on shutdown).

Hmm, ok, that explanation isn't much shorter. ;)

Regards,

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Feb. 4, 2013, 7:25 p.m. UTC | #5
On Mon, Feb 04, 2013 at 08:17:04PM +0100, Alexander Holler wrote:
> Am 04.02.2013 13:05, schrieb Alexander Holler:
> >Am 04.02.2013 02:14, schrieb Greg KH:
> >
> >>So you are right in that your driver will wait for forever for a
> >>disconnect() to happen, as it will never be called.  I don't understand
> >>the problem that this is causing when it happens.  What's wrong with
> >>udlfb that having the cpu suddently reset as the powerdown happened
> >>without it knowing about it?
> >
> >There is nothing wrong with that. I've just explained why a problem
> >doesn't occur on shutdown but on disconnect (of the device).
> 
> Maybe my explanation before was just to long and I try to explain it
> a bit shorter:
> 
> If a device gets disconnected, the disconnect in udlfb might wait
> forever in down_interruptible() (because it waits for an urb it
> never receives). This even prevents a shutdown afterwards, because
> that down_interruptible() never receives a signal (at shutdown,
> because kernel threads don't get such).

Where was that urb when the disconnect happened?  The USB core should
call your urb callback for any outstanding urbs at that point in time,
with the proper error flag being set, are you handling that properly?

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Feb. 5, 2013, 7:08 a.m. UTC | #6
Am 04.02.2013 20:25, schrieb Greg KH:
> On Mon, Feb 04, 2013 at 08:17:04PM +0100, Alexander Holler wrote:
>> Am 04.02.2013 13:05, schrieb Alexander Holler:
>>> Am 04.02.2013 02:14, schrieb Greg KH:
>>>
>>>> So you are right in that your driver will wait for forever for a
>>>> disconnect() to happen, as it will never be called.  I don't understand
>>>> the problem that this is causing when it happens.  What's wrong with
>>>> udlfb that having the cpu suddently reset as the powerdown happened
>>>> without it knowing about it?
>>>
>>> There is nothing wrong with that. I've just explained why a problem
>>> doesn't occur on shutdown but on disconnect (of the device).
>>
>> Maybe my explanation before was just to long and I try to explain it
>> a bit shorter:
>>
>> If a device gets disconnected, the disconnect in udlfb might wait
>> forever in down_interruptible() (because it waits for an urb it
>> never receives). This even prevents a shutdown afterwards, because
>> that down_interruptible() never receives a signal (at shutdown,
>> because kernel threads don't get such).
> 
> Where was that urb when the disconnect happened?  The USB core should
> call your urb callback for any outstanding urbs at that point in time,
> with the proper error flag being set, are you handling that properly?

I don't know where that urb is as I don't handle it. I just know that
_interruptible doesn't make any sense and _timeout is necessary here.
But as nobody else seems to have a problem, nobody else see seems to see
the problem there and I seem to be unable to explain it, just ignore
that patch.

Regards,

Alexander

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Greg Kroah-Hartman Feb. 5, 2013, 5:22 p.m. UTC | #7
On Tue, Feb 05, 2013 at 08:08:28AM +0100, Alexander Holler wrote:
> Am 04.02.2013 20:25, schrieb Greg KH:
> > On Mon, Feb 04, 2013 at 08:17:04PM +0100, Alexander Holler wrote:
> >> Am 04.02.2013 13:05, schrieb Alexander Holler:
> >>> Am 04.02.2013 02:14, schrieb Greg KH:
> >>>
> >>>> So you are right in that your driver will wait for forever for a
> >>>> disconnect() to happen, as it will never be called.  I don't understand
> >>>> the problem that this is causing when it happens.  What's wrong with
> >>>> udlfb that having the cpu suddently reset as the powerdown happened
> >>>> without it knowing about it?
> >>>
> >>> There is nothing wrong with that. I've just explained why a problem
> >>> doesn't occur on shutdown but on disconnect (of the device).
> >>
> >> Maybe my explanation before was just to long and I try to explain it
> >> a bit shorter:
> >>
> >> If a device gets disconnected, the disconnect in udlfb might wait
> >> forever in down_interruptible() (because it waits for an urb it
> >> never receives). This even prevents a shutdown afterwards, because
> >> that down_interruptible() never receives a signal (at shutdown,
> >> because kernel threads don't get such).
> > 
> > Where was that urb when the disconnect happened?  The USB core should
> > call your urb callback for any outstanding urbs at that point in time,
> > with the proper error flag being set, are you handling that properly?
> 
> I don't know where that urb is as I don't handle it.

What do you mean by that?  The urb is being sent back to your driver,
right?  If not, that's a bug, but please be sure that your urb callback
isn't really being called.

thanks,

greg k-h
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Feb. 5, 2013, 5:36 p.m. UTC | #8
Am 05.02.2013 18:22, schrieb Greg KH:
> On Tue, Feb 05, 2013 at 08:08:28AM +0100, Alexander Holler wrote:
>> Am 04.02.2013 20:25, schrieb Greg KH:

>>> Where was that urb when the disconnect happened?  The USB core should
>>> call your urb callback for any outstanding urbs at that point in time,
>>> with the proper error flag being set, are you handling that properly?
>>
>> I don't know where that urb is as I don't handle it.
>
> What do you mean by that?  The urb is being sent back to your driver,
> right?  If not, that's a bug, but please be sure that your urb callback
> isn't really being called.

I meant it isn't _my_ driver. ;)

I'm just trying to add some würgarounds without having the need to 
rewrite the whole driver.

In regard to that "urb missing problem", I think I've just named it 
wrong and the actual problem is a race-condition between the semaphore 
handling (which is used to keep track of the urbs) and the urb handling 
inside the driver.

But I've just switched to udl (instead of udlfb) and will see if I can 
fix the bugs there to make it usable as a console. udl is a rewrite of 
udlfb with some additional features (e.g. drm), so hopefully fixing the 
remaining problems there will require less work.

Regards,

Alexander

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Dave Airlie Feb. 8, 2013, 4:07 a.m. UTC | #9
>
> But I've just switched to udl (instead of udlfb) and will see if I can fix
> the bugs there to make it usable as a console. udl is a rewrite of udlfb
> with some additional features (e.g. drm), so hopefully fixing the remaining
> problems there will require less work.

I may have fixed the major udl problem with being a console, patch was
posted to dri-devel yesterday, and I've pushed it into -next.

Dave.
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Feb. 8, 2013, 9:53 a.m. UTC | #10
Am 08.02.2013 05:07, schrieb Dave Airlie:
>>
>> But I've just switched to udl (instead of udlfb) and will see if I can fix
>> the bugs there to make it usable as a console. udl is a rewrite of udlfb
>> with some additional features (e.g. drm), so hopefully fixing the remaining
>> problems there will require less work.
>
> I may have fixed the major udl problem with being a console, patch was
> posted to dri-devel yesterday, and I've pushed it into -next.

Thanks a lot. I will check it.

Regards,

Alexander

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/video/udlfb.c b/drivers/video/udlfb.c
index df249f3..db8a86c 100644
--- a/drivers/video/udlfb.c
+++ b/drivers/video/udlfb.c
@@ -1876,14 +1876,18 @@  static void dlfb_free_urb_list(struct dlfb_data
*dev)
         unsigned long flags;

         pr_notice("Freeing all render urbs\n");
+       if (current->flags & PF_KTHREAD)
+               pr_info("AHO: I'm a kernel thread\n");

         /* keep waiting and freeing, until we've got 'em all */
         while (count--) {

                 /* Timeout likely occurs at disconnect (resulting in a
leak) */
                 ret = down_timeout_killable(&dev->urbs.limit_sem,
FREE_URB_TIMEOUT);
-               if (ret)
+               if (ret) {
+                       pr_info("AHO: ret %d\n", ret);
                         break;
+               }