Message ID | 1402999924-2403-1-git-send-email-michel@daenzer.net (mailing list archive) |
---|---|
State | New, archived |
On 17.06.2014 12:12, Michel Dänzer wrote:
> From: Michel Dänzer <michel.daenzer@amd.com>
>
> This reverts commit 75f36d861957cb05b7889af24c8cd4a789398304.
>
> drm_vblank_get() is necessary to ensure the DRM vblank counter value is
> up to date in drm_send_vblank_event().
>
> Seems to fix weston hangs waiting for page flips to complete.
>
> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>

Both patches are: Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>  drivers/gpu/drm/radeon/radeon_display.c | 17 +++++++++++++++++
>  1 file changed, 17 insertions(+)
>
> diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
> index 2a8b9f1..97d7a80 100644
> --- a/drivers/gpu/drm/radeon/radeon_display.c
> +++ b/drivers/gpu/drm/radeon/radeon_display.c
> @@ -357,6 +357,7 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
>
>  	spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
>
> +	drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id);
>  	radeon_fence_unref(&work->fence);
>  	radeon_irq_kms_pflip_irq_get(rdev, work->crtc_id);
>  	queue_work(radeon_crtc->flip_queue, &work->unpin_work);
> @@ -459,6 +460,12 @@ static void radeon_flip_work_func(struct work_struct *__work)
>  		base &= ~7;
>  	}
>
> +	r = drm_vblank_get(crtc->dev, radeon_crtc->crtc_id);
> +	if (r) {
> +		DRM_ERROR("failed to get vblank before flip\n");
> +		goto pflip_cleanup;
> +	}
> +
>  	/* We borrow the event spin lock for protecting flip_work */
>  	spin_lock_irqsave(&crtc->dev->event_lock, flags);
>
> @@ -473,6 +480,16 @@ static void radeon_flip_work_func(struct work_struct *__work)
>
>  	return;
>
> +pflip_cleanup:
> +	if (unlikely(radeon_bo_reserve(work->new_rbo, false) != 0)) {
> +		DRM_ERROR("failed to reserve new rbo in error path\n");
> +		goto cleanup;
> +	}
> +	if (unlikely(radeon_bo_unpin(work->new_rbo) != 0)) {
> +		DRM_ERROR("failed to unpin new rbo in error path\n");
> +	}
> +	radeon_bo_unreserve(work->new_rbo);
> +
>  cleanup:
>  	drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
>  	radeon_fence_unref(&work->fence);
On Tue, Jun 17, 2014 at 7:41 AM, Christian König <deathsimple@vodafone.de> wrote:
> On 17.06.2014 12:12, Michel Dänzer wrote:
>
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> This reverts commit 75f36d861957cb05b7889af24c8cd4a789398304.
>>
>> drm_vblank_get() is necessary to ensure the DRM vblank counter value is
>> up to date in drm_send_vblank_event().
>>
>> Seems to fix weston hangs waiting for page flips to complete.
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>
>
> Both patches are: Reviewed-by: Christian König <christian.koenig@amd.com>

Both applied to my fixes tree.

Alex

>
>
>> ---
>>  drivers/gpu/drm/radeon/radeon_display.c | 17 +++++++++++++++++
>>  1 file changed, 17 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/radeon/radeon_display.c
>> b/drivers/gpu/drm/radeon/radeon_display.c
>> index 2a8b9f1..97d7a80 100644
>> --- a/drivers/gpu/drm/radeon/radeon_display.c
>> +++ b/drivers/gpu/drm/radeon/radeon_display.c
>> @@ -357,6 +357,7 @@ void radeon_crtc_handle_flip(struct radeon_device
>> *rdev, int crtc_id)
>>  	spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
>> +	drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id);
>>  	radeon_fence_unref(&work->fence);
>>  	radeon_irq_kms_pflip_irq_get(rdev, work->crtc_id);
>>  	queue_work(radeon_crtc->flip_queue, &work->unpin_work);
>> @@ -459,6 +460,12 @@ static void radeon_flip_work_func(struct work_struct
>> *__work)
>>  		base &= ~7;
>>  	}
>> +	r = drm_vblank_get(crtc->dev, radeon_crtc->crtc_id);
>> +	if (r) {
>> +		DRM_ERROR("failed to get vblank before flip\n");
>> +		goto pflip_cleanup;
>> +	}
>> +
>>  	/* We borrow the event spin lock for protecting flip_work */
>>  	spin_lock_irqsave(&crtc->dev->event_lock, flags);
>> @@ -473,6 +480,16 @@ static void radeon_flip_work_func(struct
>> work_struct *__work)
>>  	return;
>> +pflip_cleanup:
>> +	if (unlikely(radeon_bo_reserve(work->new_rbo, false) != 0)) {
>> +		DRM_ERROR("failed to reserve new rbo in error path\n");
>> +		goto cleanup;
>> +	}
>> +	if (unlikely(radeon_bo_unpin(work->new_rbo) != 0)) {
>> +		DRM_ERROR("failed to unpin new rbo in error path\n");
>> +	}
>> +	radeon_bo_unreserve(work->new_rbo);
>> +
>>  cleanup:
>>  	drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
>>  	radeon_fence_unref(&work->fence);
>
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/dri-devel
On 17.06.2014 20:41, Christian König wrote:
> On 17.06.2014 12:12, Michel Dänzer wrote:
>> From: Michel Dänzer <michel.daenzer@amd.com>
>>
>> This reverts commit 75f36d861957cb05b7889af24c8cd4a789398304.
>>
>> drm_vblank_get() is necessary to ensure the DRM vblank counter value is
>> up to date in drm_send_vblank_event().
>>
>> Seems to fix weston hangs waiting for page flips to complete.
>>
>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>
> Both patches are: Reviewed-by: Christian König <christian.koenig@amd.com>

Thank you.


Looking into these issues has got me thinking about the use of the page
flip interrupt: If the page flip interrupt arrives before the corresponding
vertical blank interrupt, the DRM vblank counter will be lower than
expected by 1 in drm_send_vblank_event(). I suspect this is the cause of

(WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc [x-1] < target_msc [x]

messages in the X log file which have been popping up in bug reports lately.
This also results in 0s being returned to the client for the MSC and
timestamp of the swap completion, which could cause all kinds of bad
behaviour.

The easy way to avoid that would be to stop using the page flip interrupt
for this again. Could there be another solution for the issues you
addressed by using it?

If not, another issue I encountered in 3.15 is that
radeon_crtc_handle_flip() is called unconditionally when a page flip
interrupt arrives. If the flip was already handled (presumably from the
vertical blank interrupt), the BUG_ON() in drm_vblank_put() triggers a
panic. This happened to me with weston.

This is presumably not an issue in 3.16 because radeon_crtc_handle_flip()
now bails early if radeon_crtc->flip_work == NULL.
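The interrupt-ordering hypothesis above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `toy_counter`, `toy_vblank_irq()`, `toy_pflip_irq()`, `toy_run_flip()` and `toy_msc_is_impossible()` are hypothetical names that exist neither in the kernel nor in the X driver. The model assumes that the software counter is advanced only by the vblank interrupt and that the flip completion reports whatever value the counter holds at that moment.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the per-CRTC software vblank counter. */
struct toy_counter {
    uint32_t vblank_count;  /* advanced only by the vblank interrupt */
};

/* Models the vblank interrupt: advance the software counter. */
static void toy_vblank_irq(struct toy_counter *c)
{
    c->vblank_count++;
}

/* Models the page flip interrupt: the completion event carries the
 * counter value that is current when the interrupt is handled. */
static uint32_t toy_pflip_irq(const struct toy_counter *c)
{
    return c->vblank_count;
}

/* Models the userspace sanity check behind the warning message. */
static bool toy_msc_is_impossible(uint32_t msc, uint32_t target_msc)
{
    return msc < target_msc;
}

/* Runs one flip and returns the reported MSC; pflip_first selects the
 * order in which the two interrupts are handled. */
static uint32_t toy_run_flip(struct toy_counter *c, bool pflip_first)
{
    uint32_t msc;

    assert(c != 0);
    if (pflip_first) {
        msc = toy_pflip_irq(c);  /* counter not yet advanced */
        toy_vblank_irq(c);
    } else {
        toy_vblank_irq(c);
        msc = toy_pflip_irq(c);
    }
    return msc;
}
```

With a counter at x-1 and a target of x, the vblank-first order reports x and passes the check, while the pflip-first order reports x-1 and trips it, which matches the shape of the warning quoted above.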
On 18.06.2014 07:53, Michel Dänzer wrote:
> On 17.06.2014 20:41, Christian König wrote:
>> On 17.06.2014 12:12, Michel Dänzer wrote:
>>> From: Michel Dänzer <michel.daenzer@amd.com>
>>>
>>> This reverts commit 75f36d861957cb05b7889af24c8cd4a789398304.
>>>
>>> drm_vblank_get() is necessary to ensure the DRM vblank counter value is
>>> up to date in drm_send_vblank_event().
>>>
>>> Seems to fix weston hangs waiting for page flips to complete.
>>>
>>> Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
>> Both patches are: Reviewed-by: Christian König <christian.koenig@amd.com>
> Thank you.
>
>
> Looking into these issues has got me thinking about the use of the page
> flip interrupt: If the page flip interrupt arrives before the corresponding
> vertical blank interrupt, the DRM vblank counter will be lower than
> expected by 1 in drm_send_vblank_event(). I suspect this is the cause of
>
> (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc [x-1] < target_msc [x]
>
> messages in the X log file which have been popping up in bug reports lately.
> This also results in 0s being returned to the client for the MSC and
> timestamp of the swap completion, which could cause all kinds of bad
> behaviour.

First of all thanks for looking into it. Are you getting this on 3.16 or
3.15?

I don't think that the pflip irq is thrown earlier than the vblank, but
on 3.16 it might actually be that we program the flip so fast into the
hardware that we do it one frame earlier than planned.

> The easy way to avoid that would be to stop using the page flip interrupt
> for this again. Could there be another solution for the issues you
> addressed by using it?

The original problem was that programming the flip in the vblank event
actually doesn't work reliably because of the underlying hardware double
buffering. We just can't tell if the flip will complete in this frame or
if the vblank interrupt was processed so late that it will happen in the
next frame.

We could just busy loop until either the pending bit or the bit for the
update period becomes null, but even busy waiting for the pending bit to
go up in an interrupt handler like we did before is quite questionable.

In addition, using the pflip interrupt enables us to sync to the hblank
as well, or not at all, by changing just a few register bits. And it's
also a prerequisite for switching to a non-constant sync rate.

So I would like to keep it and try to fix the issues we are seeing
instead.

> If not, another issue I encountered in 3.15 is that
> radeon_crtc_handle_flip() is called unconditionally when a page flip
> interrupt arrives. If the flip was already handled (presumably from the
> vertical blank interrupt), the BUG_ON() in drm_vblank_put() triggers a
> panic. This happened to me with weston.

Calling radeon_crtc_handle_flip multiple times shouldn't be a problem;
that can happen with the old code as well. Setting unpin_work to NULL
under a spin lock protects us from that case.

But take a look at the 3.15 version of radeon_crtc_page_flip instead!!!

We first set "unpin_work", release the spin lock and *then* reserve and
pin the BO. If I'm not completely wrong there is a race condition here:
when the vblank interrupt happens before the rest of the function, all
kinds of bad things can happen. The only thing preventing that is that
the vblank interrupt is turned on only at the end of the function, but
the vblank interrupt can be turned on earlier for other reasons as well.

> This is presumably not an issue in 3.16 because radeon_crtc_handle_flip()
> now bails early if radeon_crtc->flip_work == NULL.

Thanks,
Christian.
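The "set the pending-work pointer to NULL under a lock, and bail out if it is already NULL" idea discussed above can be sketched in userspace C. This is a toy model, not the driver code: all `toy_` names are hypothetical, a C11 `atomic_flag` spin lock stands in for the kernel's event spin lock, and the `completions` counter stands in for work that must run exactly once per flip (such as the single `drm_vblank_put()` call).

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

struct toy_flip_work {
    int id;
};

struct toy_crtc {
    atomic_flag lock;                 /* toy spin lock (assumption) */
    struct toy_flip_work *flip_work;  /* pending flip, NULL when none */
    atomic_int completions;           /* once-per-flip work performed */
};

static void toy_crtc_init(struct toy_crtc *c, struct toy_flip_work *pending)
{
    assert(c != NULL);
    atomic_flag_clear(&c->lock);
    c->flip_work = pending;
    atomic_init(&c->completions, 0);
}

static void toy_lock(struct toy_crtc *c)
{
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire))
        ;  /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_crtc *c)
{
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Returns the claimed work, or NULL if the flip was already handled. */
static struct toy_flip_work *toy_handle_flip(struct toy_crtc *c)
{
    struct toy_flip_work *work;

    toy_lock(c);
    work = c->flip_work;
    c->flip_work = NULL;  /* claim it: later callers will see NULL */
    toy_unlock(c);

    if (work == NULL)
        return NULL;      /* second caller bails out early */

    /* Only the caller that claimed the work reaches this point. */
    atomic_fetch_add(&c->completions, 1);
    return work;
}
```

Calling `toy_handle_flip()` twice for the same pending flip performs the once-per-flip work a single time. The model does not cover the separate race described for the 3.15 version of radeon_crtc_page_flip, where the pointer is published before the BO is pinned.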
On 18.06.2014 18:14, Christian König wrote:
> On 18.06.2014 07:53, Michel Dänzer wrote:
>>
>> Looking into these issues has got me thinking about the use of the page
>> flip interrupt: If the page flip interrupt arrives before the
>> corresponding
>> vertical blank interrupt, the DRM vblank counter will be lower than
>> expected by 1 in drm_send_vblank_event(). I suspect this is the cause of
>>
>> (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion
>> event has impossible msc [x-1] < target_msc [x]
>>
>> messages in the X log file which have been popping up in bug reports
>> lately.
>> This also results in 0s being returned to the client for the MSC and
>> timestamp of the swap completion, which could cause all kinds of bad
>> behaviour.
> First of all thanks for looking into it. Are you getting this on 3.16 or
> 3.15?

I haven't actually run into this myself yet. I thought I'd seen it in
several bug reports, but right now I can only find
https://bugs.freedesktop.org/show_bug.cgi?id=80029#c17 , which seems to
include the page flipping changes from 3.16.

> I don't think that the pflip irq is thrown earlier than the vblank, but
> on 3.16 it might actually be that we program the flip so fast into the
> hardware that we do it one frame earlier than planned.

So userspace is notified of the previous vertical blank period and calls
the page flip ioctl in response, which then manages to program the
scanout address update into the hardware before the scanout address
update is latched during the previous vertical blank period?

To avoid that scenario, one possibility might be to check if we're in
vertical blank before calling radeon_page_flip(), and if so sleep for
1ms or so before trying again? That might unnecessarily delay flips on
other CRTCs though...
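The "check for vertical blank, sleep about 1ms, try again" suggestion above could look roughly like the following retry loop. This is a hedged userspace sketch, not a proposed patch: `toy_scanout`, `toy_in_vblank()`, `toy_sleep_ms()` and `toy_wait_outside_vblank()` are hypothetical, the retry limit is an added assumption, and a fake clock replaces real sleeping so the behaviour can be checked deterministically.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Fake scanout state: vblank lasts until vblank_end_ms on a fake clock. */
struct toy_scanout {
    unsigned now_ms;
    unsigned vblank_end_ms;
};

/* Stand-in for querying the hardware scanout position. */
static bool toy_in_vblank(const struct toy_scanout *s)
{
    return s->now_ms < s->vblank_end_ms;
}

/* Stand-in for a real sleep: only advances the fake clock. */
static void toy_sleep_ms(struct toy_scanout *s, unsigned ms)
{
    s->now_ms += ms;
}

/* Waits until scanout has left vblank, sleeping 1ms between checks.
 * Returns the number of sleeps taken, or -1 if max_tries checks all
 * found the CRTC still in vblank. */
static int toy_wait_outside_vblank(struct toy_scanout *s, int max_tries)
{
    assert(s != NULL);
    for (int tries = 0; tries < max_tries; tries++) {
        if (!toy_in_vblank(s))
            return tries;  /* safe to program the flip now */
        toy_sleep_ms(s, 1);
    }
    return -1;
}
```

The flip would be programmed only after the function returns a non-negative value. What to do when the retry limit is hit is left open here, as the thread does not discuss it.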
On 23.06.2014 11:34, Michel Dänzer wrote:
> On 18.06.2014 18:14, Christian König wrote:
>> On 18.06.2014 07:53, Michel Dänzer wrote:
>>> Looking into these issues has got me thinking about the use of the page
>>> flip interrupt: If the page flip interrupt arrives before the
>>> corresponding
>>> vertical blank interrupt, the DRM vblank counter will be lower than
>>> expected by 1 in drm_send_vblank_event(). I suspect this is the cause of
>>>
>>> (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion
>>> event has impossible msc [x-1] < target_msc [x]
>>>
>>> messages in the X log file which have been popping up in bug reports
>>> lately.
>>> This also results in 0s being returned to the client for the MSC and
>>> timestamp of the swap completion, which could cause all kinds of bad
>>> behaviour.
>> First of all thanks for looking into it. Are you getting this on 3.16 or
>> 3.15?
> I haven't actually run into this myself yet. I thought I'd seen it in
> several bug reports, but right now I can only find
> https://bugs.freedesktop.org/show_bug.cgi?id=80029#c17 , which seems to
> include the page flipping changes from 3.16.
>
>
>> I don't think that the pflip irq is thrown earlier than the vblank, but
>> on 3.16 it might actually be that we program the flip so fast into the
>> hardware that we do it one frame earlier than planned.
> So userspace is notified of the previous vertical blank period and calls
> the page flip ioctl in response, which then manages to program the
> scanout address update into the hardware before the scanout address
> update is latched during the previous vertical blank period?

Yes, correct. That at least sounds like the most likely explanation to me.

> To avoid that scenario, one possibility might be to check if we're in
> vertical blank before calling radeon_page_flip(), and if so sleep for
> 1ms or so before trying again? That might unnecessarily delay flips on
> other CRTCs though...

It won't delay the other CRTCs because each CRTC has its own kernel
thread, but it won't be optimal either.

Going to try to reproduce the bug with 3.16,
Christian.
On 23.06.2014 11:34, Michel Dänzer wrote:
> On 18.06.2014 18:14, Christian König wrote:
>> On 18.06.2014 07:53, Michel Dänzer wrote:
>>>
>>> Looking into these issues has got me thinking about the use of the
>>> page
>>> flip interrupt: If the page flip interrupt arrives before the
>>> corresponding
>>> vertical blank interrupt, the DRM vblank counter will be lower than
>>> expected by 1 in drm_send_vblank_event(). I suspect this is the cause
>>> of
>>>
>>> (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion
>>> event has impossible msc [x-1] < target_msc [x]
>>>
>>> messages in the X log file which have been popping up in bug reports
>>> lately.
>>> This also results in 0s being returned to the client for the MSC and
>>> timestamp of the swap completion, which could cause all kinds of bad
>>> behaviour.
>> First of all thanks for looking into it. Are you getting this on 3.16
>> or
>> 3.15?
>
> I haven't actually run into this myself yet. I thought I'd seen it in
> several bug reports, but right now I can only find
> https://bugs.freedesktop.org/show_bug.cgi?id=80029#c17 , which seems to
> include the page flipping changes from 3.16.

With 3.16-rc2 I get it now on my RV730 AGP as in the above bug report.
But only the lines in Xorg.0.log.
NO signs of any damage/error in use.

Since 3.15 and 3.16 (rc2 only) my system is rock solid.

I've tried 3.15-rc7 + Christian's pflip rework (did some little
handwork), too.
It was solid but I saw the reported flip/black distortion in the below
part during Kwin 4.13 cube screen effect (rotation). Your fix for
3.16-rc1 fixed that.

Before 3.15/3.16-rcX I got some hangs from time to time during system
boot.
Nothing in the logs but SSD RAID1 rebuild. Maybe it was MD related and
NOT r600/DRM.

3.16-rcX (3.15-rc7+pflip patches) seems to be more responsive than
3.15, for me.

First and latest attachments from bug #80141
https://bugs.freedesktop.org/attachment.cgi?id=101605
show the same.

Where should I add/send my Xorg.0.log?

Cheers,
Dieter

>> I don't think that the pflip irq is thrown earlier than the vblank,
>> but
>> on 3.16 it might actually be that we program the flip so fast into the
>> hardware that we do it one frame earlier than planned.
>
> So userspace is notified of the previous vertical blank period and
> calls
> the page flip ioctl in response, which then manages to program the
> scanout address update into the hardware before the scanout address
> update is latched during the previous vertical blank period?
>
> To avoid that scenario, one possibility might be to check if we're in
> vertical blank before calling radeon_page_flip(), and if so sleep for
> 1ms or so before trying again? That might unnecessarily delay flips on
> other CRTCs though...
On 23.06.2014 21:46, Dieter Nützel wrote:
> On 23.06.2014 11:34, Michel Dänzer wrote:
>> On 18.06.2014 18:14, Christian König wrote:
>>> On 18.06.2014 07:53, Michel Dänzer wrote:
>>>>
>>>> Looking into these issues has got me thinking about the use of the
>>>> page
>>>> flip interrupt: If the page flip interrupt arrives before the
>>>> corresponding
>>>> vertical blank interrupt, the DRM vblank counter will be lower than
>>>> expected by 1 in drm_send_vblank_event(). I suspect this is the
>>>> cause of
>>>>
>>>> (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip
>>>> completion
>>>> event has impossible msc [x-1] < target_msc [x]
>>>>
>>>> messages in the X log file which have been popping up in bug reports
>>>> lately.
>>>> This also results in 0s being returned to the client for the MSC and
>>>> timestamp of the swap completion, which could cause all kinds of bad
>>>> behaviour.
>>> First of all thanks for looking into it. Are you getting this on 3.16
>>> or
>>> 3.15?
>>
>> I haven't actually run into this myself yet. I thought I'd seen it in
>> several bug reports, but right now I can only find
>> https://bugs.freedesktop.org/show_bug.cgi?id=80029#c17 , which seems
>> to
>> include the page flipping changes from 3.16.
>
> With 3.16-rc2 I get it now on my RV730 AGP as in the above bug report.
> But only the lines in Xorg.0.log.
> NO signs of any damage/error in use.
>
> Since 3.15 and 3.16 (rc2 only) my system is rock solid.
>
> I've tried 3.15-rc7 + Christian's pflip rework (did some little
> handwork), too.
> It was solid but I saw the reported flip/black distortion in the below
> part during Kwin 4.13 cube screen effect (rotation). Your fix for
> 3.16-rc1 fixed that.
>
> Before 3.15/3.16-rcX I got some hangs from time to time during system
> boot.
> Nothing in the logs but SSD RAID1 rebuild. Maybe it was MD related and
> NOT r600/DRM.
>
> 3.16-rcX (3.15-rc7+pflip patches) seems to be more responsive than
> 3.15, for me.
>
> First and latest attachments from bug #80141
> https://bugs.freedesktop.org/attachment.cgi?id=101605
> show the same.
>
> Where should I add/send my Xorg.0.log?
>
> Cheers,
> Dieter

Addendum:

I can reliably generate such lines in Xorg.0.log with the KWin cube
desktop effect. Rotate screens with mouse wheel or screen switcher =>
new entry in Xorg.0.log. If it happens I notice ('see') a flip delay.

[ 9893.183] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 594382 < target_msc 594383
[ 10859.753] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 652497 < target_msc 652498
[ 10915.719] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 655863 < target_msc 655864
[ 10916.817] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 655929 < target_msc 655930
[ 10925.843] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 656472 < target_msc 656473
[ 10926.774] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 656528 < target_msc 656529
[ 10965.519] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 658859 < target_msc 658860
[ 11081.878] (WW) RADEON(0): radeon_dri2_flip_event_handler: Pageflip completion event has impossible msc 665846 < target_msc 665847

>>> I don't think that the pflip irq is thrown earlier than the vblank,
>>> but
>>> on 3.16 it might actually be that we program the flip so fast into
>>> the
>>> hardware that we do it one frame earlier than planned.
>>
>> So userspace is notified of the previous vertical blank period and
>> calls
>> the page flip ioctl in response, which then manages to program the
>> scanout address update into the hardware before the scanout address
>> update is latched during the previous vertical blank period?
>>
>> To avoid that scenario, one possibility might be to check if we're in
>> vertical blank before calling radeon_page_flip(), and if so sleep for
>> 1ms or so before trying again? That might unnecessarily delay flips on
>> other CRTCs though...
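Every warning in the log excerpt above reports an MSC exactly one below the target. A small helper can confirm that for a whole Xorg.0.log. The sketch below is a hypothetical userspace utility, not part of any driver: `toy_msc_gap()` is an invented name, and it assumes the message text matches the lines quoted in this thread.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Extracts target_msc - msc from one "impossible msc" warning line.
 * Returns -1 if the line does not contain the expected text or if the
 * reported msc is not below the target. */
static long toy_msc_gap(const char *line)
{
    unsigned long msc, target;
    const char *p;

    assert(line != NULL);

    /* Skip the timestamp and driver prefix, whatever their width. */
    p = strstr(line, "impossible msc ");
    if (p == NULL)
        return -1;

    if (sscanf(p, "impossible msc %lu < target_msc %lu", &msc, &target) != 2)
        return -1;
    if (target <= msc)
        return -1;

    return (long)(target - msc);
}
```

Feeding each log line to `toy_msc_gap()` and counting results other than 1 would show whether any report deviates from the off-by-one pattern.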
diff --git a/drivers/gpu/drm/radeon/radeon_display.c b/drivers/gpu/drm/radeon/radeon_display.c
index 2a8b9f1..97d7a80 100644
--- a/drivers/gpu/drm/radeon/radeon_display.c
+++ b/drivers/gpu/drm/radeon/radeon_display.c
@@ -357,6 +357,7 @@ void radeon_crtc_handle_flip(struct radeon_device *rdev, int crtc_id)
 
 	spin_unlock_irqrestore(&rdev->ddev->event_lock, flags);
 
+	drm_vblank_put(rdev->ddev, radeon_crtc->crtc_id);
 	radeon_fence_unref(&work->fence);
 	radeon_irq_kms_pflip_irq_get(rdev, work->crtc_id);
 	queue_work(radeon_crtc->flip_queue, &work->unpin_work);
@@ -459,6 +460,12 @@ static void radeon_flip_work_func(struct work_struct *__work)
 		base &= ~7;
 	}
 
+	r = drm_vblank_get(crtc->dev, radeon_crtc->crtc_id);
+	if (r) {
+		DRM_ERROR("failed to get vblank before flip\n");
+		goto pflip_cleanup;
+	}
+
 	/* We borrow the event spin lock for protecting flip_work */
 	spin_lock_irqsave(&crtc->dev->event_lock, flags);
 
@@ -473,6 +480,16 @@ static void radeon_flip_work_func(struct work_struct *__work)
 
 	return;
 
+pflip_cleanup:
+	if (unlikely(radeon_bo_reserve(work->new_rbo, false) != 0)) {
+		DRM_ERROR("failed to reserve new rbo in error path\n");
+		goto cleanup;
+	}
+	if (unlikely(radeon_bo_unpin(work->new_rbo) != 0)) {
+		DRM_ERROR("failed to unpin new rbo in error path\n");
+	}
+	radeon_bo_unreserve(work->new_rbo);
+
 cleanup:
 	drm_gem_object_unreference_unlocked(&work->old_rbo->gem_base);
 	radeon_fence_unref(&work->fence);