diff mbox

[6/6] drm/i915: Really wait for pending flips in intel_pipe_set_base()

Message ID 1359476018-31274-7-git-send-email-ville.syrjala@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Ville Syrjälä Jan. 29, 2013, 4:13 p.m. UTC
From: Ville Syrjälä <ville.syrjala@linux.intel.com>

Since obj->pending_flips was never set, intel_pipe_set_base() never
actually waited for pending page flips to complete.

We really do want to wait for the pending flips, because otherwise the
mmio surface base address update could overtake the flip, and you
could end up with an old frame on the screen once the flip really
completes.

Just call intel_crtc_wait_pending_flips_locked() instead of
intel_finish_fb() from intel_pipe_set_base() to achieve the
desired result.

Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Lespiau, Damien Feb. 13, 2013, 10:40 a.m. UTC | #1
On Tue, Jan 29, 2013 at 06:13:38PM +0200, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Since obj->pending_flips was never set, intel_pipe_set_base() never
> actually waited for pending page flips to complete.
> 
> We really do want to wait for the pending flips, because otherwise the
> mmio surface base address update could overtake the flip, and you
> could end up with an old frame on the screen once the flip really
> completes.
> 
> Just call intel_crtc_wait_pending_flips_locked() instead of
> intel_finish_fb() from intel_pipe_set_base() to achieve the
> desired result.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

Reviewed-by: Damien Lespiau <damien.lespiau@intel.com>
Daniel Vetter Feb. 13, 2013, 3:49 p.m. UTC | #2
On Tue, Jan 29, 2013 at 06:13:38PM +0200, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> Since obj->pending_flips was never set, intel_pipe_set_base() never
> actually waited for pending page flips to complete.
> 
> We really do want to wait for the pending flips, because otherwise the
> mmio surface base address update could overtake the flip, and you
> could end up with an old frame on the screen once the flip really
> completes.
> 
> Just call intel_crtc_wait_pending_flips_locked() instead of
> intel_finish_fb() from intel_pipe_set_base() to achieve the
> desired result.
> 
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index a2e04f7..7e047c1 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2330,8 +2330,7 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
>  		return ret;
>  	}
>  
> -	if (crtc->fb)
> -		intel_finish_fb(crtc->fb);
> +	intel_crtc_wait_for_pending_flips_locked(crtc);

Ah, I see now why you grab dev->struct_mutex and need to kick waiters from
the pending flip queue. I think grabbing the mutex twice isn't a major
offense, since both the crtc disable code and set_base are slowpaths used
rarely. So what about simply calling wait_for_pending_flips before
grabbing the mutex in intel_pipe_set_base? We could then also inline
finish_fb into it's only callsite ...
-Daniel

>  
>  	ret = dev_priv->display.update_plane(crtc, fb, x, y);
>  	if (ret) {
> -- 
> 1.7.12.4
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Ville Syrjälä Feb. 13, 2013, 5:06 p.m. UTC | #3
On Wed, Feb 13, 2013 at 04:49:35PM +0100, Daniel Vetter wrote:
> On Tue, Jan 29, 2013 at 06:13:38PM +0200, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > Since obj->pending_flips was never set, intel_pipe_set_base() never
> > actually waited for pending page flips to complete.
> > 
> > We really do want to wait for the pending flips, because otherwise the
> > mmio surface base address update could overtake the flip, and you
> > could end up with an old frame on the screen once the flip really
> > completes.
> > 
> > Just call intel_crtc_wait_pending_flips_locked() instead of
> > intel_finish_fb() from intel_pipe_set_base() to achieve the
> > desired result.
> > 
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index a2e04f7..7e047c1 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -2330,8 +2330,7 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
> >  		return ret;
> >  	}
> >  
> > -	if (crtc->fb)
> > -		intel_finish_fb(crtc->fb);
> > +	intel_crtc_wait_for_pending_flips_locked(crtc);
> 
> Ah, I see now why you grab dev->struct_mutex and need to kick waiters from
> the pending flip queue. I think grabbing the mutex twice isn't a major
> offense, since both the crtc disable code and set_base are slowpaths used
> rarely. So what about simply calling wait_for_pending_flips before
> grabbing the mutex in intel_pipe_set_base? We could then also inline
> finish_fb into it's only callsite ...

I didn't want to slow down intel_pipe_set_base() too much. If we wait
for pending flips before pinning the new fb, we can never achieve any
parallelism there. But if no-one cares about that, we can reorder.
Daniel Vetter Feb. 13, 2013, 5:11 p.m. UTC | #4
On Wed, Feb 13, 2013 at 6:06 PM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
>> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>> > index a2e04f7..7e047c1 100644
>> > --- a/drivers/gpu/drm/i915/intel_display.c
>> > +++ b/drivers/gpu/drm/i915/intel_display.c
>> > @@ -2330,8 +2330,7 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
>> >             return ret;
>> >     }
>> >
>> > -   if (crtc->fb)
>> > -           intel_finish_fb(crtc->fb);
>> > +   intel_crtc_wait_for_pending_flips_locked(crtc);
>>
>> Ah, I see now why you grab dev->struct_mutex and need to kick waiters from
>> the pending flip queue. I think grabbing the mutex twice isn't a major
>> offense, since both the crtc disable code and set_base are slowpaths used
>> rarely. So what about simply calling wait_for_pending_flips before
>> grabbing the mutex in intel_pipe_set_base? We could then also inline
>> finish_fb into it's only callsite ...
>
> I didn't want to slow down intel_pipe_set_base() too much. If we wait
> for pending flips before pinning the new fb, we can never achieve any
> parallelism there. But if no-one cares about that, we can reorder.

I'm confused here - where can we extract parallelism in set_base
between waiting for pending flips and the pinning? And imo set_base
isn't really critical: It's officially a synchronous thing (we have a
vblank wait in there), and if we want to fix that imo the nuclear
pageflip should be the answer.
-Daniel
Ville Syrjälä Feb. 13, 2013, 5:26 p.m. UTC | #5
On Wed, Feb 13, 2013 at 06:11:00PM +0100, Daniel Vetter wrote:
> On Wed, Feb 13, 2013 at 6:06 PM, Ville Syrjälä
> <ville.syrjala@linux.intel.com> wrote:
> >> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> >> > index a2e04f7..7e047c1 100644
> >> > --- a/drivers/gpu/drm/i915/intel_display.c
> >> > +++ b/drivers/gpu/drm/i915/intel_display.c
> >> > @@ -2330,8 +2330,7 @@ intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
> >> >             return ret;
> >> >     }
> >> >
> >> > -   if (crtc->fb)
> >> > -           intel_finish_fb(crtc->fb);
> >> > +   intel_crtc_wait_for_pending_flips_locked(crtc);
> >>
> >> Ah, I see now why you grab dev->struct_mutex and need to kick waiters from
> >> the pending flip queue. I think grabbing the mutex twice isn't a major
> >> offense, since both the crtc disable code and set_base are slowpaths used
> >> rarely. So what about simply calling wait_for_pending_flips before
> >> grabbing the mutex in intel_pipe_set_base? We could then also inline
> >> finish_fb into it's only callsite ...
> >
> > I didn't want to slow down intel_pipe_set_base() too much. If we wait
> > for pending flips before pinning the new fb, we can never achieve any
> > parallelism there. But if no-one cares about that, we can reorder.
> 
> I'm confused here - where can we extract parallelism in set_base
> between waiting for pending flips and the pinning? And imo set_base
> isn't really critical: It's officially a synchronous thing (we have a
> vblank wait in there), and if we want to fix that imo the nuclear
> pageflip should be the answer.

The flip is making progress on the GPU side, and at the same time the
CPU side can make some progress with the pin operation. At least that
was my theory.
Daniel Vetter Feb. 13, 2013, 5:31 p.m. UTC | #6
On Wed, Feb 13, 2013 at 6:26 PM, Ville Syrjälä
<ville.syrjala@linux.intel.com> wrote:
> The flip is making progress on the GPU side, and at the same time the
> CPU side can make some progress with the pin operation. At least that
> was my theory.

The pin should be a no-op when moving a given framebuffer around. It's
only really expensive when we need to clflush (and lesser so when we
need to rewrite ptes), which should only happen on fresh framebuffers.
As long as userspace does it's job and caches scanout buffers we
should be fine. The other thing is waiting for outstanding rendering
(to avoid hanging on an MI_WAIT_SCANLINE cmd), which just blocks. So
doesn't really matter imo.
-Daniel
Chris Wilson Feb. 13, 2013, 5:39 p.m. UTC | #7
On Wed, Feb 13, 2013 at 06:11:00PM +0100, Daniel Vetter wrote:
> I'm confused here - where can we extract parallelism in set_base
> between waiting for pending flips and the pinning? And imo set_base
> isn't really critical: It's officially a synchronous thing (we have a
> vblank wait in there), and if we want to fix that imo the nuclear
> pageflip should be the answer.

Speaking of which I've had a patch to remove that extra synchronous
wait for over a year...
-Chris
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index a2e04f7..7e047c1 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2330,8 +2330,7 @@  intel_pipe_set_base(struct drm_crtc *crtc, int x, int y,
 		return ret;
 	}
 
-	if (crtc->fb)
-		intel_finish_fb(crtc->fb);
+	intel_crtc_wait_for_pending_flips_locked(crtc);
 
 	ret = dev_priv->display.update_plane(crtc, fb, x, y);
 	if (ret) {