Message ID | 1409745310-19092-7-git-send-email-tomi.valkeinen@ti.com (mailing list archive)
---|---
State | New, archived
On Wed, Sep 03, 2014 at 02:55:08PM +0300, Tomi Valkeinen wrote:
> omap_crtc_flush() is used to wait for scheduled work to be done for the
> given crtc. However, it's not quite right at the moment.
>
> omap_crtc_flush() does wait for work that is run via the vsync irq to be
> done. However, work is also queued to the driver's priv->wq workqueue,
> which is not handled by omap_crtc_flush().
>
> Improve omap_crtc_flush() to also flush the workqueue, so that the work
> there gets run.
>
> This fixes a race on module unload, where an unpin work may be on the
> work queue but does not get run before the drm core starts tearing the
> driver down, leading to a WARN.
>
> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>

I didn't really dig into the details, but isn't that the same workqueue as
used by the async modeset code? So the same deadlocks might happen ...
lockdep won't complain though since you essentially open-code a
workqueue_flush, and lockdep also doesn't complain about all possible
deadlocks (due to some design issues with lockdep).
-Daniel

> ---
>  drivers/gpu/drm/omapdrm/omap_crtc.c | 31 ++++++++++++++++++++++++++-----
>  1 file changed, 26 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/omapdrm/omap_crtc.c b/drivers/gpu/drm/omapdrm/omap_crtc.c
> index 3261fbf94957..506cf8bc804f 100644
> --- a/drivers/gpu/drm/omapdrm/omap_crtc.c
> +++ b/drivers/gpu/drm/omapdrm/omap_crtc.c
> @@ -655,21 +655,42 @@ static void omap_crtc_post_apply(struct omap_drm_apply *apply)
>  	/* nothing needed for post-apply */
>  }
>
> +static bool omap_crtc_work_pending(struct omap_crtc *omap_crtc)
> +{
> +	return !list_empty(&omap_crtc->pending_applies) ||
> +		!list_empty(&omap_crtc->queued_applies) ||
> +		omap_crtc->event || omap_crtc->old_fb;
> +}
> +
> +/*
> + * Wait for any work on the workqueue to be finished, and any work which will
> + * be run via vsync irq to be done.
> + *
> + * Note that work on workqueue could schedule new vsync work, and vice versa.
> + */
>  void omap_crtc_flush(struct drm_crtc *crtc)
>  {
>  	struct omap_crtc *omap_crtc = to_omap_crtc(crtc);
> +	struct omap_drm_private *priv = crtc->dev->dev_private;
>  	int loops = 0;
>
> -	while (!list_empty(&omap_crtc->pending_applies) ||
> -			!list_empty(&omap_crtc->queued_applies) ||
> -			omap_crtc->event || omap_crtc->old_fb) {
> +	while (true) {
> +		/* first flush the wq, so that any scheduled work is done */
> +		flush_workqueue(priv->wq);
> +
> +		/* if we have nothing queued for this crtc, we're done */
> +		if (!omap_crtc_work_pending(omap_crtc))
> +			break;
>
>  		if (++loops > 10) {
> -			dev_err(crtc->dev->dev,
> -				"omap_crtc_flush() timeout\n");
> +			dev_err(crtc->dev->dev, "omap_crtc_flush() timeout\n");
>  			break;
>  		}
>
> +		/*
> +		 * wait for a bit so that a vsync has (probably) happened, and
> +		 * that the crtc work is (probably) done.
> +		 */
>  		schedule_timeout_uninterruptible(msecs_to_jiffies(20));
>  	}
>  }
> --
> 1.9.1
On 03/09/14 17:27, Daniel Vetter wrote:
> On Wed, Sep 03, 2014 at 02:55:08PM +0300, Tomi Valkeinen wrote:
>> omap_crtc_flush() is used to wait for scheduled work to be done for the
>> given crtc. However, it's not quite right at the moment.
>>
>> omap_crtc_flush() does wait for work that is run via the vsync irq to be
>> done. However, work is also queued to the driver's priv->wq workqueue,
>> which is not handled by omap_crtc_flush().
>>
>> Improve omap_crtc_flush() to also flush the workqueue, so that the work
>> there gets run.
>>
>> This fixes a race on module unload, where an unpin work may be on the
>> work queue but does not get run before the drm core starts tearing the
>> driver down, leading to a WARN.
>>
>> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
>
> I didn't really dig into the details, but isn't that the same workqueue as
> used by the async modeset code? So the same deadlocks might happen ...

Yes, we have just one workqueue in the driver.

Hmm, deadlocks with what lock? The mode_config mutex or crtc->mutex? I
don't think either is held at any place where omap_crtc_flush() is called.

> lockdep won't complain though since you essentially open-code a
> workqueue_flush, and lockdep also doesn't complain about all possible
> deadlocks (due to some design issues with lockdep).

What do you mean by "open-code a workqueue_flush"? I use flush_workqueue()
there. We have two things to wait for: work on the workqueue, and work
which is triggered by the vsync irq. So we loop and test for both of
those until there's no more work.

Tomi
On Mon, Sep 08, 2014 at 04:03:18PM +0300, Tomi Valkeinen wrote:
> On 03/09/14 17:27, Daniel Vetter wrote:
>> On Wed, Sep 03, 2014 at 02:55:08PM +0300, Tomi Valkeinen wrote:
>>> omap_crtc_flush() is used to wait for scheduled work to be done for the
>>> given crtc. However, it's not quite right at the moment.
>>>
>>> omap_crtc_flush() does wait for work that is run via the vsync irq to be
>>> done. However, work is also queued to the driver's priv->wq workqueue,
>>> which is not handled by omap_crtc_flush().
>>>
>>> Improve omap_crtc_flush() to also flush the workqueue, so that the work
>>> there gets run.
>>>
>>> This fixes a race on module unload, where an unpin work may be on the
>>> work queue but does not get run before the drm core starts tearing the
>>> driver down, leading to a WARN.
>>>
>>> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
>>
>> I didn't really dig into the details, but isn't that the same workqueue as
>> used by the async modeset code? So the same deadlocks might happen ...
>
> Yes, we have just one workqueue in the driver.
>
> Hmm, deadlocks with what lock? The mode_config mutex or crtc->mutex? I
> don't think either is held at any place where omap_crtc_flush() is called.

Oh, I presumed you were using _flush in the relevant modeset functions - we
do that in i915 to make sure that all the pageflips and other stuff have
completed before we do another modeset. But omap only calls this at driver
unload, so no direct problem.

>> lockdep won't complain though since you essentially open-code a
>> workqueue_flush, and lockdep also doesn't complain about all possible
>> deadlocks (due to some design issues with lockdep).
>
> What do you mean by "open-code a workqueue_flush"? I use flush_workqueue()
> there. We have two things to wait for: work on the workqueue, and work
> which is triggered by the vsync irq. So we loop and test for both of
> those until there's no more work.

Oops, missed that. The ordering looks wrong though: if the irq can queue
work on the workqueue, you need to wait for the irqs to happen first,
before flushing. And obviously the irq handler must queue the work before
signalling the completion of the interrupt. But since this seems to lack
locking anyway and is only used for unload, it doesn't really matter.
-Daniel
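To make the ordering constraint concrete, here is a minimal sketch; the names (unpin_work, flip_done) are hypothetical, not the actual omapdrm code. If the irq handler signalled completion before queueing its follow-up work, a flusher woken by that completion could flush the workqueue before the work was queued, and miss it:

```c
#include <linux/workqueue.h>
#include <linux/completion.h>

/*
 * Hypothetical sketch of the ordering Daniel describes; 'unpin_work' and
 * 'flip_done' are made-up members of struct omap_crtc, for illustration.
 */
static void vsync_irq_handler(struct omap_crtc *omap_crtc)
{
	struct omap_drm_private *priv = omap_crtc->base.dev->dev_private;

	/* 1. Queue any follow-up work first ... */
	queue_work(priv->wq, &omap_crtc->unpin_work);

	/*
	 * 2. ... and only then signal completion. A flusher that waits
	 * for flip_done and then calls flush_workqueue(priv->wq) is now
	 * guaranteed to also catch the work queued in step 1.
	 */
	complete(&omap_crtc->flip_done);
}
```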
On 08/09/14 16:31, Daniel Vetter wrote:
> On Mon, Sep 08, 2014 at 04:03:18PM +0300, Tomi Valkeinen wrote:
>> On 03/09/14 17:27, Daniel Vetter wrote:
>>> On Wed, Sep 03, 2014 at 02:55:08PM +0300, Tomi Valkeinen wrote:
>>>> omap_crtc_flush() is used to wait for scheduled work to be done for the
>>>> given crtc. However, it's not quite right at the moment.
>>>>
>>>> omap_crtc_flush() does wait for work that is run via the vsync irq to
>>>> be done. However, work is also queued to the driver's priv->wq
>>>> workqueue, which is not handled by omap_crtc_flush().
>>>>
>>>> Improve omap_crtc_flush() to also flush the workqueue, so that the
>>>> work there gets run.
>>>>
>>>> This fixes a race on module unload, where an unpin work may be on the
>>>> work queue but does not get run before the drm core starts tearing
>>>> the driver down, leading to a WARN.
>>>>
>>>> Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
>>>
>>> I didn't really dig into the details, but isn't that the same workqueue
>>> as used by the async modeset code? So the same deadlocks might happen ...
>>
>> Yes, we have just one workqueue in the driver.
>>
>> Hmm, deadlocks with what lock? The mode_config mutex or crtc->mutex? I
>> don't think either is held at any place where omap_crtc_flush() is called.
>
> Oh, I presumed you were using _flush in the relevant modeset functions - we

No. That's the locking issue again. We can't flush while holding the crtc
mutex, as the works on the workqueue also try to grab it...

> do that in i915 to make sure that all the pageflips and other stuff have
> completed before we do another modeset. But omap only calls this at driver
> unload, so no direct problem.

At the moment, yes, but in this series I add the same omap_crtc_flush()
call to two new places: dev_preclose and omap_crtc_commit. Of these,
omap_crtc_commit is the problematic one, discussed in the mail thread for
patch 4.

>>> lockdep won't complain though since you essentially open-code a
>>> workqueue_flush, and lockdep also doesn't complain about all possible
>>> deadlocks (due to some design issues with lockdep).
>>
>> What do you mean by "open-code a workqueue_flush"? I use flush_workqueue()
>> there. We have two things to wait for: work on the workqueue, and work
>> which is triggered by the vsync irq. So we loop and test for both of
>> those until there's no more work.
>
> Oops, missed that. The ordering looks wrong though: if the irq can queue
> work on the workqueue, you need to wait for the irqs to happen first,
> before flushing. And obviously the irq handler must queue the work before
> signalling the completion of the interrupt. But since this seems to lack
> locking anyway and is only used for unload, it doesn't really matter.

Yeah, well, the workqueue can create work for the irq too. I don't know if
it does currently, but I think it's safer to presume that both the
workqueue and the irq can create work for the other.

That's why I have a loop there. We flush, then check whether there is work
pending for the irq. If yes, we sleep a bit and go back to the start. So
if the irq work created new work for the wq, we flush that; and if that
work created new work for the irq, we check for that again. Etc.

Tomi
On Tue, Sep 9, 2014 at 3:07 AM, Tomi Valkeinen <tomi.valkeinen@ti.com> wrote:
>
>>>> lockdep won't complain though since you essentially open-code a
>>>> workqueue_flush, and lockdep also doesn't complain about all possible
>>>> deadlocks (due to some design issues with lockdep).
>>>
>>> What do you mean by "open-code a workqueue_flush"? I use flush_workqueue()
>>> there. We have two things to wait for: work on the workqueue, and work
>>> which is triggered by the vsync irq. So we loop and test for both of
>>> those until there's no more work.
>>
>> Oops, missed that. The ordering looks wrong though: if the irq can queue
>> work on the workqueue, you need to wait for the irqs to happen first,
>> before flushing. And obviously the irq handler must queue the work before
>> signalling the completion of the interrupt. But since this seems to lack
>> locking anyway and is only used for unload, it doesn't really matter.

From a quick look at omap_crtc_flush(), you probably also race with the
apply worker's list manipulation (like moving entries from queued_applies
to pending_applies).

> Yeah, well, the workqueue can create work for the irq too. I don't know if
> it does currently, but I think it's safer to presume that both the
> workqueue and the irq can create work for the other.
>
> That's why I have a loop there. We flush, then check whether there is work
> pending for the irq. If yes, we sleep a bit and go back to the start. So
> if the irq work created new work for the wq, we flush that; and if that
> work created new work for the irq, we check for that again. Etc.

I think adding an internal per-crtc apply_lock would solve a lot of
problems. I was originally shying away from recommending that, mostly
because I didn't want to think about it and I thought there was an easier
solution. But with an apply_lock we could at least add some locking to
_flush() and make it not racy as well.

Although, I wonder if some waitqueue scheme would be a bit more sane:
i.e., at the end of the apply worker, if there is nothing more queued up,
signal the event that _flush() is waiting on.

BR,
-R
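For reference, a minimal sketch of that waitqueue scheme might look like the following. The apply_done field, its init site, and the hook at the end of the apply/irq work are assumptions, not existing omapdrm code; omap_crtc_work_pending() is the helper from the patch above:

```c
#include <linux/wait.h>
#include <linux/jiffies.h>
#include <linux/workqueue.h>

/*
 * Hypothetical sketch: 'apply_done' is a made-up wait_queue_head_t in
 * struct omap_crtc, initialised with init_waitqueue_head() in
 * omap_crtc_init().
 */

/* Called at the end of the apply worker and of the vsync irq work: */
static void omap_crtc_maybe_signal_idle(struct omap_crtc *omap_crtc)
{
	if (!omap_crtc_work_pending(omap_crtc))
		wake_up_all(&omap_crtc->apply_done);
}

/* _flush() then becomes a bounded sleep instead of a poll loop: */
void omap_crtc_flush(struct drm_crtc *crtc)
{
	struct omap_crtc *omap_crtc = to_omap_crtc(crtc);
	struct omap_drm_private *priv = crtc->dev->dev_private;

	/* run anything already scheduled, which may arm new vsync work */
	flush_workqueue(priv->wq);

	/* then sleep until the apply machinery reports itself idle */
	if (!wait_event_timeout(omap_crtc->apply_done,
				!omap_crtc_work_pending(omap_crtc),
				msecs_to_jiffies(250)))
		dev_err(crtc->dev->dev, "omap_crtc_flush() timeout\n");
}
```

This still relies on the ordering point made earlier in the thread: the irq side has to queue any follow-up work before waking the waitqueue, or the flusher can wake up, see no pending work, and return while an unpin work is still in flight.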
On 09/09/14 13:43, Rob Clark wrote:
> Although, I wonder if some waitqueue scheme would be a bit more sane:
> i.e., at the end of the apply worker, if there is nothing more queued up,
> signal the event that _flush() is waiting on.

Maybe, but we would still need either the separate apply_lock, or to
unlock crtc->mutex to allow the workqueue to run.

Tomi
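A rough sketch of what that separate apply_lock could look like, again with hypothetical names and placement; the point is that the lock guards only the apply state, is never held across crtc->mutex, and so can be taken by _flush() without deadlocking the works that grab crtc->mutex:

```c
#include <linux/mutex.h>
#include <linux/workqueue.h>
#include <linux/jiffies.h>

/*
 * Hypothetical sketch of a per-crtc apply_lock (a new 'struct mutex
 * apply_lock' in struct omap_crtc). It serialises the apply list
 * manipulation against _flush(), nothing more.
 */
static void omap_crtc_apply_worker(struct work_struct *work)
{
	struct omap_crtc *omap_crtc =
		container_of(work, struct omap_crtc, apply_work);

	mutex_lock(&omap_crtc->apply_lock);
	/*
	 * Move entries from queued_applies to pending_applies, arm the
	 * vsync irq, etc. No longer racing with the checks in _flush().
	 */
	mutex_unlock(&omap_crtc->apply_lock);
}

void omap_crtc_flush(struct drm_crtc *crtc)
{
	struct omap_crtc *omap_crtc = to_omap_crtc(crtc);
	struct omap_drm_private *priv = crtc->dev->dev_private;
	int loops = 0;
	bool pending;

	do {
		/* flush with no locks held, so queued works can run */
		flush_workqueue(priv->wq);

		/* sample the apply state under the lock */
		mutex_lock(&omap_crtc->apply_lock);
		pending = omap_crtc_work_pending(omap_crtc);
		mutex_unlock(&omap_crtc->apply_lock);

		if (pending)
			schedule_timeout_uninterruptible(
					msecs_to_jiffies(20));
	} while (pending && ++loops <= 10);
}
```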
diff --git a/drivers/gpu/drm/omapdrm/omap_crtc.c b/drivers/gpu/drm/omapdrm/omap_crtc.c
index 3261fbf94957..506cf8bc804f 100644
--- a/drivers/gpu/drm/omapdrm/omap_crtc.c
+++ b/drivers/gpu/drm/omapdrm/omap_crtc.c
@@ -655,21 +655,42 @@ static void omap_crtc_post_apply(struct omap_drm_apply *apply)
 	/* nothing needed for post-apply */
 }
 
+static bool omap_crtc_work_pending(struct omap_crtc *omap_crtc)
+{
+	return !list_empty(&omap_crtc->pending_applies) ||
+		!list_empty(&omap_crtc->queued_applies) ||
+		omap_crtc->event || omap_crtc->old_fb;
+}
+
+/*
+ * Wait for any work on the workqueue to be finished, and any work which will
+ * be run via vsync irq to be done.
+ *
+ * Note that work on workqueue could schedule new vsync work, and vice versa.
+ */
 void omap_crtc_flush(struct drm_crtc *crtc)
 {
 	struct omap_crtc *omap_crtc = to_omap_crtc(crtc);
+	struct omap_drm_private *priv = crtc->dev->dev_private;
 	int loops = 0;
 
-	while (!list_empty(&omap_crtc->pending_applies) ||
-			!list_empty(&omap_crtc->queued_applies) ||
-			omap_crtc->event || omap_crtc->old_fb) {
+	while (true) {
+		/* first flush the wq, so that any scheduled work is done */
+		flush_workqueue(priv->wq);
+
+		/* if we have nothing queued for this crtc, we're done */
+		if (!omap_crtc_work_pending(omap_crtc))
+			break;
 
 		if (++loops > 10) {
-			dev_err(crtc->dev->dev,
-				"omap_crtc_flush() timeout\n");
+			dev_err(crtc->dev->dev, "omap_crtc_flush() timeout\n");
 			break;
 		}
 
+		/*
+		 * wait for a bit so that a vsync has (probably) happened, and
+		 * that the crtc work is (probably) done.
+		 */
 		schedule_timeout_uninterruptible(msecs_to_jiffies(20));
 	}
 }
omap_crtc_flush() is used to wait for scheduled work to be done for the
given crtc. However, it's not quite right at the moment.

omap_crtc_flush() does wait for work that is run via the vsync irq to be
done. However, work is also queued to the driver's priv->wq workqueue,
which is not handled by omap_crtc_flush().

Improve omap_crtc_flush() to also flush the workqueue, so that the work
there gets run.

This fixes a race on module unload, where an unpin work may be on the
work queue but does not get run before the drm core starts tearing the
driver down, leading to a WARN.

Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ti.com>
---
 drivers/gpu/drm/omapdrm/omap_crtc.c | 31 ++++++++++++++++++++++++++-----
 1 file changed, 26 insertions(+), 5 deletions(-)