diff mbox

[2/3,v2] fb: udlfb: fix hang at disconnect

Message ID 1359139768-32294-2-git-send-email-holler@ahsoftware.de (mailing list archive)
State New, archived
Headers show

Commit Message

Alexander Holler Jan. 25, 2013, 6:49 p.m. UTC
When a device was disconnected the driver may hang at waiting for urbs it never
will get. Fix this by using a timeout while waiting for the used semaphore.

There is still a memory leak if a timeout happens, but at least the driver
now continues his disconnect routine.

Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Holler <holler@ahsoftware.de>
---
 drivers/video/udlfb.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Andrew Morton Jan. 29, 2013, 12:22 a.m. UTC | #1
On Fri, 25 Jan 2013 19:49:27 +0100
Alexander Holler <holler@ahsoftware.de> wrote:

> When a device was disconnected the driver may hang at waiting for urbs it never
> will get. Fix this by using a timeout while waiting for the used semaphore.
> 
> There is still a memory leak if a timeout happens, but at least the driver
> now continues his disconnect routine.
> 
> ...
>
> --- a/drivers/video/udlfb.c
> +++ b/drivers/video/udlfb.c
> @@ -1832,8 +1832,9 @@ static void dlfb_free_urb_list(struct dlfb_data *dev)
>  	/* keep waiting and freeing, until we've got 'em all */
>  	while (count--) {
>  
> -		/* Getting interrupted means a leak, but ok at disconnect */
> -		ret = down_interruptible(&dev->urbs.limit_sem);
> +		/* Timeout likely occurs at disconnect (resulting in a leak) */
> +		ret = down_timeout_killable(&dev->urbs.limit_sem,
> +						FREE_URB_TIMEOUT);
>  		if (ret)
>  			break;

This is rather a hack.  Do you have an understanding of the underlying
bug?  Why is the driver waiting for things which will never happen?

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Jan. 29, 2013, 12:56 a.m. UTC | #2
Am 29.01.2013 01:22, schrieb Andrew Morton:
> On Fri, 25 Jan 2013 19:49:27 +0100
> Alexander Holler <holler@ahsoftware.de> wrote:
> 
>> When a device was disconnected the driver may hang at waiting for urbs it never
>> will get. Fix this by using a timeout while waiting for the used semaphore.
>>
>> There is still a memory leak if a timeout happens, but at least the driver
>> now continues his disconnect routine.
>>
>> ...
>>
>> --- a/drivers/video/udlfb.c
>> +++ b/drivers/video/udlfb.c
>> @@ -1832,8 +1832,9 @@ static void dlfb_free_urb_list(struct dlfb_data *dev)
>>  	/* keep waiting and freeing, until we've got 'em all */
>>  	while (count--) {
>>  
>> -		/* Getting interrupted means a leak, but ok at disconnect */
>> -		ret = down_interruptible(&dev->urbs.limit_sem);
>> +		/* Timeout likely occurs at disconnect (resulting in a leak) */
>> +		ret = down_timeout_killable(&dev->urbs.limit_sem,
>> +						FREE_URB_TIMEOUT);
>>  		if (ret)
>>  			break;
> 
> This is rather a hack.  Do you have an understanding of the underlying
> bug?  Why is the driver waiting for things which will never happen?

It is a hack, but I don't want to rewrite the whole driver. There is
already an attempt to do so, udl. The hack is a quick solution which
doesn't make something worse, just better. The only drawback might be
the few additional bytes for the implementation of
down_timeout_killable(), but I thought such should be already available,
and wondered it wasn't.

Fixing the underlying problem (including removing the leaks) would just
end up in another rewrite and I prefer to have a look at udl, maybe
fixing the current problems to use such a device as console there.

I've only posted this small patch series, because some (longterm) stable
kernels don't have udl, and smscufx, which is in large parts identical
to udlfb might want that patch too.

Just in case: I don't know anything about smscufx and have discovered
the similarities between those drivers only when I did a git grep to
search for something I fixed with a previous patch.

Regards,

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Jan. 29, 2013, 10:35 a.m. UTC | #3
Am 29.01.2013 01:56, schrieb Alexander Holler:
> Am 29.01.2013 01:22, schrieb Andrew Morton:
>> On Fri, 25 Jan 2013 19:49:27 +0100
>> Alexander Holler <holler@ahsoftware.de> wrote:
>>
>>> When a device was disconnected the driver may hang at waiting for urbs it never
>>> will get. Fix this by using a timeout while waiting for the used semaphore.
>>>
>>> There is still a memory leak if a timeout happens, but at least the driver
>>> now continues his disconnect routine.
>>>
>>> ...
>>>
>>> --- a/drivers/video/udlfb.c
>>> +++ b/drivers/video/udlfb.c
>>> @@ -1832,8 +1832,9 @@ static void dlfb_free_urb_list(struct dlfb_data *dev)
>>>   	/* keep waiting and freeing, until we've got 'em all */
>>>   	while (count--) {
>>>
>>> -		/* Getting interrupted means a leak, but ok at disconnect */
>>> -		ret = down_interruptible(&dev->urbs.limit_sem);
>>> +		/* Timeout likely occurs at disconnect (resulting in a leak) */
>>> +		ret = down_timeout_killable(&dev->urbs.limit_sem,
>>> +						FREE_URB_TIMEOUT);
>>>   		if (ret)
>>>   			break;
>>
>> This is rather a hack.  Do you have an understanding of the underlying
>> bug?  Why is the driver waiting for things which will never happen?

To add a bit more explanation:

I've experienced that bug after moving the fb-damage-handling into a 
workqueue (to make the driver usable as console). This likely has 
increased the possibility that an urb gets missed when the usb-stack 
calls the (usb-)disconnect function of the driver. But I don't know as I 
couldn't use the driver before (as fbcon) so I don't really have a 
comparison.

What currently happens here is something like that:

fb -> damage -> workload which sends urb and waits for answer
device disconnect -> dlfb_usb_disconnect() -> stall (no answer to the 
above urb)

I don't know why the disconnect waits for all urbs. The code looks like 
it does that just to free the allocated memory. As I'm not very familiar 
with the usb-stack, I would have to read up about the urb-handling to 
find out how to free the memory otherwise.

As the previous comment in the code suggests that urbs already got 
missed (on shutdown) before, I assume that even without my patch, which 
moved the damage into a workqueue, the problem could occur which then 
prevents a shutdown as there is no timeout. As I've experienced that 
problem not only on disconnect, but on shutdown too (no shutdown was 
possible), I have to assume, that the previous used down_interruptible() 
didn't get a signal on shutdown (if the driver is used as fbcon), 
therefor I consider the timeout as necessary.

Regards,

Alexander

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Jan. 29, 2013, 11:11 a.m. UTC | #4
Am 29.01.2013 11:35, schrieb Alexander Holler:
> Am 29.01.2013 01:56, schrieb Alexander Holler:
>> Am 29.01.2013 01:22, schrieb Andrew Morton:
>>> On Fri, 25 Jan 2013 19:49:27 +0100
>>> Alexander Holler <holler@ahsoftware.de> wrote:
>>>
>>>> When a device was disconnected the driver may hang at waiting for
>>>> urbs it never
>>>> will get. Fix this by using a timeout while waiting for the used
>>>> semaphore.
>>>>
>>>> There is still a memory leak if a timeout happens, but at least the
>>>> driver
>>>> now continues his disconnect routine.
>>>>
>>>> ...
>>>>
>>>> --- a/drivers/video/udlfb.c
>>>> +++ b/drivers/video/udlfb.c
>>>> @@ -1832,8 +1832,9 @@ static void dlfb_free_urb_list(struct
>>>> dlfb_data *dev)
>>>>       /* keep waiting and freeing, until we've got 'em all */
>>>>       while (count--) {
>>>>
>>>> -        /* Getting interrupted means a leak, but ok at disconnect */
>>>> -        ret = down_interruptible(&dev->urbs.limit_sem);
>>>> +        /* Timeout likely occurs at disconnect (resulting in a
>>>> leak) */
>>>> +        ret = down_timeout_killable(&dev->urbs.limit_sem,
>>>> +                        FREE_URB_TIMEOUT);
>>>>           if (ret)
>>>>               break;
>>>
>>> This is rather a hack.  Do you have an understanding of the underlying
>>> bug?  Why is the driver waiting for things which will never happen?
>
> To add a bit more explanation:
>
> I've experienced that bug after moving the fb-damage-handling into a
> workqueue (to make the driver usable as console). This likely has
> increased the possibility that an urb gets missed when the usb-stack
> calls the (usb-)disconnect function of the driver. But I don't know as I
> couldn't use the driver before (as fbcon) so I don't really have a
> comparison.
>
> What currently happens here is something like that:
>
> fb -> damage -> workload which sends urb and waits for answer
> device disconnect -> dlfb_usb_disconnect() -> stall (no answer to the
> above urb)
>
> I don't know why the disconnect waits for all urbs. The code looks like
> it does that just to free the allocated memory. As I'm not very familiar
> with the usb-stack, I would have to read up about the urb-handling to
> find out how to free the memory otherwise.
>
> As the previous comment in the code suggests that urbs already got
> missed (on shutdown) before, I assume that even without my patch, which
> moved the damage into a workqueue, the problem could occur which then
> prevents a shutdown as there is no timeout. As I've experienced that
> problem not only on disconnect, but on shutdown too (no shutdown was
> possible), I have to assume, that the previous used down_interruptible()
> didn't get a signal on shutdown (if the driver is used as fbcon),
> therefor I consider the timeout as necessary.

To explain the problem on shutdown a bit further, I think the following 
happens (usb and driver are statically linked and started by the kernel):

shutdown -> kill signal -> usb stack shuts down -> udlfb waits (forever) 
for a kill or an urb which it doesn't get.

Maybe the sequence is different if the usb-stack and udlfb are used as a 
module and/or udlfb is used only for X/fb. I'm not sure what actually 
does shut down the usb-stack in such a case, but maybe more than one 
kill signal might be thrown around.

Regards,

Alexander

--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Jan. 29, 2013, 3:51 p.m. UTC | #5
Am 29.01.2013 12:11, schrieb Alexander Holler:

>
> To explain the problem on shutdown a bit further, I think the following
> happens (usb and driver are statically linked and started by the kernel):
>
> shutdown -> kill signal -> usb stack shuts down -> udlfb waits (forever)
> for a kill or an urb which it doesn't get.

Having a second look at what I've written above, I'm not even sure if 
the kernel sends one or more fatal signals on shutdown at all. I've just 
assumed it because otherwise down_interruptible() wouldn't have worked 
before (it would have stalled on shutdown too (if an urb got missed), 
not only on disconnect).

Sounds like an interesting question I should read about (if/when fatal 
signals are issued by the kernel). ;)

> Maybe the sequence is different if the usb-stack and udlfb are used as a
> module and/or udlfb is used only for X/fb. I'm not sure what actually
> does shut down the usb-stack in such a case, but maybe more than one
> kill signal might be thrown around.

Regards,

Alexander
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/drivers/video/udlfb.c b/drivers/video/udlfb.c
index 86d449e..db6cc66 100644
--- a/drivers/video/udlfb.c
+++ b/drivers/video/udlfb.c
@@ -1832,8 +1832,9 @@  static void dlfb_free_urb_list(struct dlfb_data *dev)
 	/* keep waiting and freeing, until we've got 'em all */
 	while (count--) {
 
-		/* Getting interrupted means a leak, but ok at disconnect */
-		ret = down_interruptible(&dev->urbs.limit_sem);
+		/* Timeout likely occurs at disconnect (resulting in a leak) */
+		ret = down_timeout_killable(&dev->urbs.limit_sem,
+						FREE_URB_TIMEOUT);
 		if (ret)
 			break;