
fb: udlfb: fix hang at disconnect

Message ID 1357996822-13072-1-git-send-email-holler@ahsoftware.de (mailing list archive)
State New, archived

Commit Message

Alexander Holler Jan. 12, 2013, 1:20 p.m. UTC
When a device is disconnected, the driver may hang waiting for urbs it will
never get. Fix this by using a timeout while waiting on the semaphore.

There is still a memory leak if a timeout happens, but at least the driver
now continues its disconnect routine.

Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Holler <holler@ahsoftware.de>
---
 drivers/video/udlfb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

bernie@plugable.com Jan. 12, 2013, 10:22 p.m. UTC | #1
Hi Alexander,

On Sat, Jan 12, 2013 at 5:20 AM, Alexander Holler <holler@ahsoftware.de> wrote:
> When a device is disconnected, the driver may hang waiting for urbs it will
> never get. Fix this by using a timeout while waiting on the semaphore.

The code used to be this way, but it used to cause nasty shutdown hangs:
http://git.plugable.com/gitphp/index.php?p=udlfb&a=commitdiff&h=1dd39a65001deb5a84088dfabb788d3274fbb6b6

Which is why the code is the way it is today.

Can you say under what situations you're hitting hangs on device
disconnect?  Have you tested extensively to confirm no shutdown hangs
with your patch?

Stepping back, there was another recent patch from the community to
udlfb to work around issues of sleeping in the wrong context. The fix
involved introducing another scheduled workitem. This slows everything
down when it's in the main path, and isn't really desirable if we can
avoid it.

Another option to eliminate all these problems -- long considered but
never implemented -- is to get rid of all semaphores and potential
sleeps in udlfb entirely.  That would require a strategy to throttle
rendering in some way other than by waiting in kernel (without some
throttling strategy, the USB bus can be a bottleneck which can flood
the system with rendered but untransmitted pixels).

Options might be:

1) When transfer buffers are full, keep track of dirty rectangles for
the rest and pick up where we left off the next time we're entered
(avoiding flooding by potentially having pixels in the dirty regions
be written over multiple times before we get to rendering them once)

2) If we "bet" on page-fault-based defio dirty pixel detection, we
could allocate buffers dynamically but increase the scheduling time to
transfer as our outstanding buffer count grows, and reduce the latency
only when the buffer count goes down (again, pixels will be
potentially rendered many times before being transferred once, avoiding
flooding).
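
To make the two options concrete, here is a minimal userspace sketch in C.
It is not udlfb code: struct dirty_rect, dirty_rect_add() and
transfer_delay_ms() are hypothetical names, and the linear delay scaling is
only one assumed policy among many. The first function accumulates damage
into a single bounding rectangle while buffers are full (option 1); the
second grows the scheduling delay with the number of outstanding buffers
(option 2).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type: pending damage kept as one bounding rectangle. */
struct dirty_rect {
	int x1, y1, x2, y2;	/* half-open: [x1, x2) x [y1, y2) */
	bool valid;		/* false while nothing is pending */
};

/* Option 1: grow the pending rectangle so it also covers new damage. */
void dirty_rect_add(struct dirty_rect *d, int x, int y, int w, int h)
{
	assert(w > 0 && h > 0);

	if (!d->valid) {
		d->x1 = x;
		d->y1 = y;
		d->x2 = x + w;
		d->y2 = y + h;
		d->valid = true;
		return;
	}
	if (x < d->x1)
		d->x1 = x;
	if (y < d->y1)
		d->y1 = y;
	if (x + w > d->x2)
		d->x2 = x + w;
	if (y + h > d->y2)
		d->y2 = y + h;
}

/*
 * Option 2: the delay before the next transfer grows linearly with the
 * number of outstanding buffers and is capped at max_ms. Assumes small
 * inputs, so the multiplication does not overflow.
 */
unsigned int transfer_delay_ms(unsigned int outstanding,
			       unsigned int base_ms, unsigned int max_ms)
{
	unsigned int delay = base_ms * (outstanding + 1);

	return delay > max_ms ? max_ms : delay;
}
```

A single bounding rectangle is the simplest form of tracking; it can cover
far more pixels than were actually touched, so a real implementation might
prefer a small list of rectangles instead.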

Any other ideas on the specific or general case are welcome.  Also
note that udlfb is being largely superseded by the udl DRM driver -
so any decisions here should also be considered in that codebase.

In any case, thanks for giving the DisplayLink USB 2.0 graphics
drivers attention - it's much appreciated!

Bernie Thompson
http://plugable.com/
--
To unsubscribe from this list: send the line "unsubscribe linux-fbdev" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alexander Holler Jan. 13, 2013, 12:05 p.m. UTC | #2
On 12.01.2013 23:22, Bernie Thompson wrote:
> Hi Alexander,
>
> On Sat, Jan 12, 2013 at 5:20 AM, Alexander Holler <holler@ahsoftware.de> wrote:
>> When a device is disconnected, the driver may hang waiting for urbs it will
>> never get. Fix this by using a timeout while waiting on the semaphore.
>
> The code used to be this way, but it used to cause nasty shutdown hangs:
> http://git.plugable.com/gitphp/index.php?p=udlfb&a=commitdiff&h=1dd39a65001deb5a84088dfabb788d3274fbb6b6
>
> Which is why the code is the way it is today.
>
> Can you say under what situations you're hitting hangs on device
> disconnect?  Have you tested extensively to confirm no shutdown hangs
> with your patch?
>

The driver hangs here about two times out of three when the device gets
disconnected. It is easy to see when the device gets attached again,
because nothing happens if the driver is already hanging (and, in
addition, a shutdown isn't possible).

I didn't test it extensively, but without the patch the driver isn't
usable here. Maybe my previous patch, which moves damages to a workqueue,
makes it more likely that urbs go missing, but the problem already
existed because an urb might get missed on disconnect. I don't know what
problems existed before; maybe people just had a problem with the
BUG_ON(ret). If the interruptible wait is really needed, it could make
sense to implement a down_timeout_interruptible() for semaphores.
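
As a userspace analogy only (POSIX semaphores, not the kernel semaphore
API), the sketch below shows a wait that is both bounded by a timeout and
interruptible by signals, which is roughly what a
down_timeout_interruptible() would offer. reclaim_with_timeout() is a
hypothetical name and the loop only mirrors the shape of
dlfb_free_urb_list(); it assumes a Linux/glibc system where sem_timedwait()
is available.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <semaphore.h>
#include <time.h>

/*
 * Try to reclaim 'count' buffers, each guarded by one semaphore unit.
 * Returns how many were reclaimed. Stops early on timeout or signal,
 * which leaves the remaining buffers unreclaimed (a leak).
 */
int reclaim_with_timeout(sem_t *limit_sem, int count, long timeout_ms)
{
	int reclaimed = 0;

	assert(count >= 0 && timeout_ms >= 0);

	while (count--) {
		struct timespec deadline;

		/* sem_timedwait() takes an absolute CLOCK_REALTIME deadline. */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_sec += timeout_ms / 1000;
		deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}

		/* Returns -1 on timeout (ETIMEDOUT) or on a signal (EINTR). */
		if (sem_timedwait(limit_sem, &deadline) != 0)
			break;
		reclaimed++;
	}
	return reclaimed;
}
```

With a semaphore that holds two units and a request for four buffers, the
function reclaims two and gives up on the third after the timeout instead
of blocking forever.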

> Stepping back, there was another recent patch from the community to
> udlfb to work around issues of sleeping in the wrong context. The fix
> involved introducing another scheduled workitem. This slows everything
> down when it's in the main path, and isn't really desirable if we can
> avoid it.

Do you mean the one I've recently posted? It is needed, at least for 3.7
(I don't know since which version those "schedule while atomic" messages
have appeared). It might slow down refreshes, but it is needed, at least
until someone gets rid of those semaphores or removes the spinlocks in
upper layers (as Alan Cox suggested with the "I am crap" helper for printk).

Maybe using WQ_HIGHPRI for the workqueue with the damages will speed
things up.

More optimizations might be doable too (e.g. combining multiple queued 
damages).

> Another option to eliminate all these problems -- long considered but
> never implemented -- is to get rid of all semaphores and potential
> sleeps in udlfb entirely.  That would require a strategy to throttle
> rendering in some way other than by waiting in kernel (without some
> throttling strategy, the USB bus can be a bottleneck which can flood
> the system with rendered but untransmitted pixels).
>
> Options might be:
>
> 1) When transfer buffers are full, keep track of dirty rectangles for
> the rest and pick up where we left off the next time we're entered
> (avoiding flooding by potentially having pixels in the dirty regions
> be written over multiple times before we get to rendering them once)
>
> 2) If we "bet" on page-fault-based defio dirty pixel detection, we
> could allocate buffers dynamically but increase the scheduling time to
> transfer as our outstanding buffer count grows, and reduce the latency
> only when the buffer count goes down (again, pixels will be
> potentially rendered many times before being transferred once, avoiding
> flooding).
>
> Any other ideas on the specific or general case are welcome.  Also
> note that udlfb is being largely superseded by the udl DRM driver -
> so any decisions here should also be considered in that codebase.
>
> In any case, thanks for giving the DisplayLink USB 2.0 graphics
> drivers attention - it's much appreciated!

Thanks for the suggestions, but I don't feel the need to spend a lot of
time here. I just wanted to use the console with the device and a 3.7.x
kernel, and neither udlfb nor udl currently worked (and I'm pretty sure
I've used one of them some time before, likely udlfb).

Btw, to see the console again after a disconnect and connect, I'm 
currently using the following (necessary) quick&dirty hack:

---------
         /* if clients still have us open, will be freed on last close */
-       if (dev->fb_count == 0)
+//     if (dev->fb_count == 0)
                 schedule_delayed_work(&dev->free_framebuffer_work, 0);
---------

Without that, the framebuffer will never get unregistered (because just
unlinking it doesn't remove the fb-console, which counts as one client),
with the result that the new one (after connecting the device again)
will not get the console.

Regards,

Alexander
Alexander Holler Jan. 13, 2013, 12:24 p.m. UTC | #3
On 13.01.2013 13:05, Alexander Holler wrote:
> On 12.01.2013 23:22, Bernie Thompson wrote:

> I didn't test it extensively, but without the patch the driver isn't
> usable here. Maybe my previous patch which moves damages to a workqueue

To add some more explanation: I'm currently only testing it with a
statically linked udlfb (for fbcon), as that is what I'm mainly using the
device for (with otherwise headless boxes). When udlfb is a module, I
don't see those "schedule while atomic" messages (I don't know why), but
having a console only after the modules have been loaded isn't always an
option.

Regards,

Alexander

Patch

diff --git a/drivers/video/udlfb.c b/drivers/video/udlfb.c
index 86d449e..cc4a8d1 100644
--- a/drivers/video/udlfb.c
+++ b/drivers/video/udlfb.c
@@ -1832,8 +1832,8 @@  static void dlfb_free_urb_list(struct dlfb_data *dev)
 	/* keep waiting and freeing, until we've got 'em all */
 	while (count--) {
 
-		/* Getting interrupted means a leak, but ok at disconnect */
-		ret = down_interruptible(&dev->urbs.limit_sem);
+		/* Timeout likely occurs at disconnect (resulting in a leak) */
+		ret = down_timeout(&dev->urbs.limit_sem, GET_URB_TIMEOUT);
 		if (ret)
 			break;