diff mbox

drm/i915: workaround bad DSL readout v2

Message ID 1441921133-3381-1-git-send-email-jbarnes@virtuousgeek.org (mailing list archive)
State New, archived
Headers show

Commit Message

Jesse Barnes Sept. 10, 2015, 9:38 p.m. UTC
On HSW at least (still testing other platforms, but should be harmless
elsewhere), the DSL reg reads back as 0 when read around vblank start
time.  This ends up confusing the atomic start/end checking code, since
it causes the update to appear as if it crossed a frame count boundary.
Avoid the problem by making sure we don't return scanline_offset from
the get_crtc_scanline function.  In moving the code there, I add to add
an additional delay since it could be called and have a legitimate 0
result for some time (depending on the pixel clock).

v2: move hsw dsl read hack to get_crtc_scanline (Ville)

References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

Comments

Ville Syrjälä Sept. 10, 2015, 9:53 p.m. UTC | #1
On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> On HSW at least (still testing other platforms, but should be harmless
> elsewhere), the DSL reg reads back as 0 when read around vblank start
> time.  This ends up confusing the atomic start/end checking code, since
> it causes the update to appear as if it crossed a frame count boundary.
> Avoid the problem by making sure we don't return scanline_offset from
> the get_crtc_scanline function.  In moving the code there, I add to add
> an additional delay since it could be called and have a legitimate 0
> result for some time (depending on the pixel clock).
> 
> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 90bc6c2..97e5d52 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
>  
>  	/*
> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> +	 * read it right around the start of vblank.  So try it again
> +	 * so we don't accidentally end up spanning a vblank frame
> +	 * increment, causing the pipe_update_end() code to squak at us.
> +	 */
> +	if (IS_HASWELL(dev) && !position) {
> +		int i, temp;
> +
> +		for (i = 0; i < 100; i++) {
> +			udelay(1);
> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> +				DSL_LINEMASK_GEN3;
> +			if (temp != position) {
> +				position = temp;
> +				goto out;
> +			}
> +		}
> +	}

Hmm. Another idea. If it always happens at start of vbl, maybe have a
look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
return vblank_start. That's assming ISR would in fact show that it's in
vblank when this happens.

> +
> +out:
> +	/*
>  	 * See update_scanline_offset() for the details on the
>  	 * scanline_offset adjustment.
>  	 */
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Jesse Barnes Sept. 10, 2015, 9:57 p.m. UTC | #2
On 09/10/2015 02:53 PM, Ville Syrjälä wrote:
> On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
>> On HSW at least (still testing other platforms, but should be harmless
>> elsewhere), the DSL reg reads back as 0 when read around vblank start
>> time.  This ends up confusing the atomic start/end checking code, since
>> it causes the update to appear as if it crossed a frame count boundary.
>> Avoid the problem by making sure we don't return scanline_offset from
>> the get_crtc_scanline function.  In moving the code there, I add to add
>> an additional delay since it could be called and have a legitimate 0
>> result for some time (depending on the pixel clock).
>>
>> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
>>
>> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
>> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>> ---
>>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
>>  1 file changed, 21 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index 90bc6c2..97e5d52 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
>>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
>>  
>>  	/*
>> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
>> +	 * read it right around the start of vblank.  So try it again
>> +	 * so we don't accidentally end up spanning a vblank frame
>> +	 * increment, causing the pipe_update_end() code to squak at us.
>> +	 */
>> +	if (IS_HASWELL(dev) && !position) {
>> +		int i, temp;
>> +
>> +		for (i = 0; i < 100; i++) {
>> +			udelay(1);
>> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
>> +				DSL_LINEMASK_GEN3;
>> +			if (temp != position) {
>> +				position = temp;
>> +				goto out;
>> +			}
>> +		}
>> +	}
> 
> Hmm. Another idea. If it always happens at start of vbl, maybe have a
> look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
> return vblank_start. That's assming ISR would in fact show that it's in
> vblank when this happens.

Sounds a bit racy, though I guess it wouldn't hurt.  I'm actually a
little dubious about this change anyway since it will mean longer delays
when we really are at the top of the field/frame.  You sure you don't
like the first one better?

Thanks,
Jesse
Ville Syrjälä Sept. 10, 2015, 10:33 p.m. UTC | #3
On Thu, Sep 10, 2015 at 02:57:32PM -0700, Jesse Barnes wrote:
> On 09/10/2015 02:53 PM, Ville Syrjälä wrote:
> > On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> >> On HSW at least (still testing other platforms, but should be harmless
> >> elsewhere), the DSL reg reads back as 0 when read around vblank start
> >> time.  This ends up confusing the atomic start/end checking code, since
> >> it causes the update to appear as if it crossed a frame count boundary.
> >> Avoid the problem by making sure we don't return scanline_offset from
> >> the get_crtc_scanline function.  In moving the code there, I add to add
> >> an additional delay since it could be called and have a legitimate 0
> >> result for some time (depending on the pixel clock).
> >>
> >> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> >>
> >> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> >> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> >> ---
> >>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
> >>  1 file changed, 21 insertions(+)
> >>
> >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> >> index 90bc6c2..97e5d52 100644
> >> --- a/drivers/gpu/drm/i915/i915_irq.c
> >> +++ b/drivers/gpu/drm/i915/i915_irq.c
> >> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
> >>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
> >>  
> >>  	/*
> >> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> >> +	 * read it right around the start of vblank.  So try it again
> >> +	 * so we don't accidentally end up spanning a vblank frame
> >> +	 * increment, causing the pipe_update_end() code to squak at us.
> >> +	 */
> >> +	if (IS_HASWELL(dev) && !position) {
> >> +		int i, temp;
> >> +
> >> +		for (i = 0; i < 100; i++) {
> >> +			udelay(1);
> >> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> >> +				DSL_LINEMASK_GEN3;
> >> +			if (temp != position) {
> >> +				position = temp;
> >> +				goto out;
> >> +			}
> >> +		}
> >> +	}
> > 
> > Hmm. Another idea. If it always happens at start of vbl, maybe have a
> > look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
> > return vblank_start. That's assming ISR would in fact show that it's in
> > vblank when this happens.
> 
> Sounds a bit racy,

Not too much I think. We're under spinlock_irqsave here, and if the
scanline should magically start to tick just after we've read it we
would still give almost the right answer, even if we got magically
delayed by one or two scanlines. That's assuming that it has already
latched the double buffered registers when it goes wonky. If the
register latching happens only when the counter starts working again,
well then we're in deeper doodoo. But then I wouldn't expect ISR to
indicate that we're in vblank then either. So I think you should
definitely check what the ISR actually tells you while things are
in the bad state.

In case the badness happens just before start of vblank, well, then
I have no good idea how to handle it. This sort of retry loop is
the best that comes to mind right now :(

> though I guess it wouldn't hurt.  I'm actually a
> little dubious about this change anyway since it will mean longer delays
> when we really are at the top of the field/frame.  You sure you don't
> like the first one better?

The first one can't help vblank timestamps. Although I suppose that if
the badness always cures itself before the interrupt gets raised, we
would not hit this normally when we update the timestamp from the irq
handler. We could hit it when updating the timestamp outside the irq
handler though (ie. during vblank_get/put etc.).
Daniel Vetter Sept. 14, 2015, 9:10 a.m. UTC | #4
On Fri, Sep 11, 2015 at 01:33:08AM +0300, Ville Syrjälä wrote:
> On Thu, Sep 10, 2015 at 02:57:32PM -0700, Jesse Barnes wrote:
> > On 09/10/2015 02:53 PM, Ville Syrjälä wrote:
> > > On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> > >> On HSW at least (still testing other platforms, but should be harmless
> > >> elsewhere), the DSL reg reads back as 0 when read around vblank start
> > >> time.  This ends up confusing the atomic start/end checking code, since
> > >> it causes the update to appear as if it crossed a frame count boundary.
> > >> Avoid the problem by making sure we don't return scanline_offset from
> > >> the get_crtc_scanline function.  In moving the code there, I add to add
> > >> an additional delay since it could be called and have a legitimate 0
> > >> result for some time (depending on the pixel clock).
> > >>
> > >> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> > >>
> > >> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> > >> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> > >> ---
> > >>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
> > >>  1 file changed, 21 insertions(+)
> > >>
> > >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > >> index 90bc6c2..97e5d52 100644
> > >> --- a/drivers/gpu/drm/i915/i915_irq.c
> > >> +++ b/drivers/gpu/drm/i915/i915_irq.c
> > >> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
> > >>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
> > >>  
> > >>  	/*
> > >> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> > >> +	 * read it right around the start of vblank.  So try it again
> > >> +	 * so we don't accidentally end up spanning a vblank frame
> > >> +	 * increment, causing the pipe_update_end() code to squak at us.
> > >> +	 */
> > >> +	if (IS_HASWELL(dev) && !position) {
> > >> +		int i, temp;
> > >> +
> > >> +		for (i = 0; i < 100; i++) {
> > >> +			udelay(1);
> > >> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> > >> +				DSL_LINEMASK_GEN3;
> > >> +			if (temp != position) {
> > >> +				position = temp;
> > >> +				goto out;
> > >> +			}
> > >> +		}
> > >> +	}
> > > 
> > > Hmm. Another idea. If it always happens at start of vbl, maybe have a
> > > look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
> > > return vblank_start. That's assming ISR would in fact show that it's in
> > > vblank when this happens.
> > 
> > Sounds a bit racy,
> 
> Not too much I think. We're under spinlock_irqsave here, and if the
> scanline should magically start to tick just after we've read it we
> would still give almost the right answer, even if we got magically
> delayed by one or two scanlines. That's assuming that it has already
> latched the double buffered registers when it goes wonky. If the
> register latching happens only when the counter starts working again,
> well then we're in deeper doodoo. But then I wouldn't expect ISR to
> indicate that we're in vblank then either. So I think you should
> definitely check what the ISR actually tells you while things are
> in the bad state.

There's still a race
1. we read bogus vblank counter
2. vblank processing completes in hw
3. ISR indicates where out of vblank.

So you'd need at least a loop to make sure you're 0 vblank counter is
stable across the ISR read.
-Daniel

> In case the badness happens just before start of vblank, well, then
> I have no good idea how to handle it. This sort of retry loop is
> the best that comes to mind right now :(
> 
> > though I guess it wouldn't hurt.  I'm actually a
> > little dubious about this change anyway since it will mean longer delays
> > when we really are at the top of the field/frame.  You sure you don't
> > like the first one better?
> 
> The first one can't help vblank timestamps. Although I suppose that if
> the badness always cures itself before the interrupt gets raised, we
> would not hit this normally when we update the timestamp from the irq
> handler. We could hit it when updating the timestamp outside the irq
> handler though (ie. during vblank_get/put etc.).

On top of that there's patches floating from Chris to get a vblank
timestamp without grabbing the interrupts temporarily or any spinlock.
That helps when we immediately disable the interrupt and then a bit later
the compositors asks for the current vblank to estimate the next frame.

So if possible I'd definitely vote to share this "fix up bonghits vblank"
logic if possible ...
-Daniel
Ville Syrjälä Sept. 14, 2015, 1:02 p.m. UTC | #5
On Mon, Sep 14, 2015 at 11:10:04AM +0200, Daniel Vetter wrote:
> On Fri, Sep 11, 2015 at 01:33:08AM +0300, Ville Syrjälä wrote:
> > On Thu, Sep 10, 2015 at 02:57:32PM -0700, Jesse Barnes wrote:
> > > On 09/10/2015 02:53 PM, Ville Syrjälä wrote:
> > > > On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> > > >> On HSW at least (still testing other platforms, but should be harmless
> > > >> elsewhere), the DSL reg reads back as 0 when read around vblank start
> > > >> time.  This ends up confusing the atomic start/end checking code, since
> > > >> it causes the update to appear as if it crossed a frame count boundary.
> > > >> Avoid the problem by making sure we don't return scanline_offset from
> > > >> the get_crtc_scanline function.  In moving the code there, I add to add
> > > >> an additional delay since it could be called and have a legitimate 0
> > > >> result for some time (depending on the pixel clock).
> > > >>
> > > >> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> > > >>
> > > >> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> > > >> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > >> ---
> > > >>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
> > > >>  1 file changed, 21 insertions(+)
> > > >>
> > > >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > >> index 90bc6c2..97e5d52 100644
> > > >> --- a/drivers/gpu/drm/i915/i915_irq.c
> > > >> +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > >> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
> > > >>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
> > > >>  
> > > >>  	/*
> > > >> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> > > >> +	 * read it right around the start of vblank.  So try it again
> > > >> +	 * so we don't accidentally end up spanning a vblank frame
> > > >> +	 * increment, causing the pipe_update_end() code to squak at us.
> > > >> +	 */
> > > >> +	if (IS_HASWELL(dev) && !position) {
> > > >> +		int i, temp;
> > > >> +
> > > >> +		for (i = 0; i < 100; i++) {
> > > >> +			udelay(1);
> > > >> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> > > >> +				DSL_LINEMASK_GEN3;
> > > >> +			if (temp != position) {
> > > >> +				position = temp;
> > > >> +				goto out;
> > > >> +			}
> > > >> +		}
> > > >> +	}
> > > > 
> > > > Hmm. Another idea. If it always happens at start of vbl, maybe have a
> > > > look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
> > > > return vblank_start. That's assming ISR would in fact show that it's in
> > > > vblank when this happens.
> > > 
> > > Sounds a bit racy,
> > 
> > Not too much I think. We're under spinlock_irqsave here, and if the
> > scanline should magically start to tick just after we've read it we
> > would still give almost the right answer, even if we got magically
> > delayed by one or two scanlines. That's assuming that it has already
> > latched the double buffered registers when it goes wonky. If the
> > register latching happens only when the counter starts working again,
> > well then we're in deeper doodoo. But then I wouldn't expect ISR to
> > indicate that we're in vblank then either. So I think you should
> > definitely check what the ISR actually tells you while things are
> > in the bad state.
> 
> There's still a race
> 1. we read bogus vblank counter
> 2. vblank processing completes in hw
> 3. ISR indicates where out of vblank.
> 
> So you'd need at least a loop to make sure you're 0 vblank counter is
> stable across the ISR read.

Seems rather unlikely. That would mean something blocked us for the entire
duration of the vblank.

> -Daniel
> 
> > In case the badness happens just before start of vblank, well, then
> > I have no good idea how to handle it. This sort of retry loop is
> > the best that comes to mind right now :(
> > 
> > > though I guess it wouldn't hurt.  I'm actually a
> > > little dubious about this change anyway since it will mean longer delays
> > > when we really are at the top of the field/frame.  You sure you don't
> > > like the first one better?
> > 
> > The first one can't help vblank timestamps. Although I suppose that if
> > the badness always cures itself before the interrupt gets raised, we
> > would not hit this normally when we update the timestamp from the irq
> > handler. We could hit it when updating the timestamp outside the irq
> > handler though (ie. during vblank_get/put etc.).
> 
> On top of that there's patches floating from Chris to get a vblank
> timestamp without grabbing the interrupts temporarily or any spinlock.
> That helps when we immediately disable the interrupt and then a bit later
> the compositors asks for the current vblank to estimate the next frame.

I think we all already agreed that it's not going to fly. If we don't
ask the hw how many vblanks we lost we can't give userspace any kind
of accurate information.

> So if possible I'd definitely vote to share this "fix up bonghits vblank"
> logic if possible ...

I have no idea what you're looking to share and with whom. I do not want
workarounds for hardware specific bugs in the shared vblank code. We
already had that due to radeon, and that stuff has been doing nothing
for us except hinder attempts at making the code work sanely for sane
Intel hardware.

Which reminds me, I need to post a pile of vblank patches...
Daniel Vetter Sept. 14, 2015, 2:29 p.m. UTC | #6
On Mon, Sep 14, 2015 at 04:02:44PM +0300, Ville Syrjälä wrote:
> On Mon, Sep 14, 2015 at 11:10:04AM +0200, Daniel Vetter wrote:
> > On Fri, Sep 11, 2015 at 01:33:08AM +0300, Ville Syrjälä wrote:
> > > On Thu, Sep 10, 2015 at 02:57:32PM -0700, Jesse Barnes wrote:
> > > > On 09/10/2015 02:53 PM, Ville Syrjälä wrote:
> > > > > On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> > > > >> On HSW at least (still testing other platforms, but should be harmless
> > > > >> elsewhere), the DSL reg reads back as 0 when read around vblank start
> > > > >> time.  This ends up confusing the atomic start/end checking code, since
> > > > >> it causes the update to appear as if it crossed a frame count boundary.
> > > > >> Avoid the problem by making sure we don't return scanline_offset from
> > > > >> the get_crtc_scanline function.  In moving the code there, I add to add
> > > > >> an additional delay since it could be called and have a legitimate 0
> > > > >> result for some time (depending on the pixel clock).
> > > > >>
> > > > >> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> > > > >>
> > > > >> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> > > > >> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> > > > >> ---
> > > > >>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
> > > > >>  1 file changed, 21 insertions(+)
> > > > >>
> > > > >> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> > > > >> index 90bc6c2..97e5d52 100644
> > > > >> --- a/drivers/gpu/drm/i915/i915_irq.c
> > > > >> +++ b/drivers/gpu/drm/i915/i915_irq.c
> > > > >> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
> > > > >>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
> > > > >>  
> > > > >>  	/*
> > > > >> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> > > > >> +	 * read it right around the start of vblank.  So try it again
> > > > >> +	 * so we don't accidentally end up spanning a vblank frame
> > > > >> +	 * increment, causing the pipe_update_end() code to squak at us.
> > > > >> +	 */
> > > > >> +	if (IS_HASWELL(dev) && !position) {
> > > > >> +		int i, temp;
> > > > >> +
> > > > >> +		for (i = 0; i < 100; i++) {
> > > > >> +			udelay(1);
> > > > >> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> > > > >> +				DSL_LINEMASK_GEN3;
> > > > >> +			if (temp != position) {
> > > > >> +				position = temp;
> > > > >> +				goto out;
> > > > >> +			}
> > > > >> +		}
> > > > >> +	}
> > > > > 
> > > > > Hmm. Another idea. If it always happens at start of vbl, maybe have a
> > > > > look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
> > > > > return vblank_start. That's assming ISR would in fact show that it's in
> > > > > vblank when this happens.
> > > > 
> > > > Sounds a bit racy,
> > > 
> > > Not too much I think. We're under spinlock_irqsave here, and if the
> > > scanline should magically start to tick just after we've read it we
> > > would still give almost the right answer, even if we got magically
> > > delayed by one or two scanlines. That's assuming that it has already
> > > latched the double buffered registers when it goes wonky. If the
> > > register latching happens only when the counter starts working again,
> > > well then we're in deeper doodoo. But then I wouldn't expect ISR to
> > > indicate that we're in vblank then either. So I think you should
> > > definitely check what the ISR actually tells you while things are
> > > in the bad state.
> > 
> > There's still a race
> > 1. we read bogus vblank counter
> > 2. vblank processing completes in hw
> > 3. ISR indicates where out of vblank.
> > 
> > So you'd need at least a loop to make sure you're 0 vblank counter is
> > stable across the ISR read.
> 
> Seems rather unlikely. That would mean something blocked us for the entire
> duration of the vblank.
> 
> > -Daniel
> > 
> > > In case the badness happens just before start of vblank, well, then
> > > I have no good idea how to handle it. This sort of retry loop is
> > > the best that comes to mind right now :(
> > > 
> > > > though I guess it wouldn't hurt.  I'm actually a
> > > > little dubious about this change anyway since it will mean longer delays
> > > > when we really are at the top of the field/frame.  You sure you don't
> > > > like the first one better?
> > > 
> > > The first one can't help vblank timestamps. Although I suppose that if
> > > the badness always cures itself before the interrupt gets raised, we
> > > would not hit this normally when we update the timestamp from the irq
> > > handler. We could hit it when updating the timestamp outside the irq
> > > handler though (ie. during vblank_get/put etc.).
> > 
> > On top of that there's patches floating from Chris to get a vblank
> > timestamp without grabbing the interrupts temporarily or any spinlock.
> > That helps when we immediately disable the interrupt and then a bit later
> > the compositors asks for the current vblank to estimate the next frame.
> 
> I think we all already agreed that it's not going to fly. If we don't
> ask the hw how many vblanks we lost we can't give userspace any kind
> of accurate information.
> 
> > So if possible I'd definitely vote to share this "fix up bonghits vblank"
> > logic if possible ...
> 
> I have no idea what you're looking to share and with whom. I do not want
> workarounds for hardware specific bugs in the shared vblank code. We
> already had that due to radeon, and that stuff has been doing nothing
> for us except hinder attempts at making the code work sanely for sane
> Intel hardware.
> 
> Which reminds me, I need to post a pile of vblank patches...

s/vblank/scanline/ in all of the above since I was confused again. And I
just thought if we have some special hacks to make the get_scanline
function cope with hw oddity that should be used by the vblank time stamp
code too. But that's the case already (if we put the hack into
__intel_get_crtc_scanline).
-Daniel
Ville Syrjälä Sept. 14, 2015, 6:47 p.m. UTC | #7
On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> On HSW at least (still testing other platforms, but should be harmless
> elsewhere), the DSL reg reads back as 0 when read around vblank start
> time.  This ends up confusing the atomic start/end checking code, since
> it causes the update to appear as if it crossed a frame count boundary.
> Avoid the problem by making sure we don't return scanline_offset from
> the get_crtc_scanline function.  In moving the code there, I add to add
> an additional delay since it could be called and have a legitimate 0
> result for some time (depending on the pixel clock).
> 
> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 90bc6c2..97e5d52 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
>  
>  	/*
> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> +	 * read it right around the start of vblank.  So try it again
> +	 * so we don't accidentally end up spanning a vblank frame
> +	 * increment, causing the pipe_update_end() code to squak at us.
> +	 */
> +	if (IS_HASWELL(dev) && !position) {
> +		int i, temp;
> +
> +		for (i = 0; i < 100; i++) {
> +			udelay(1);
> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> +				DSL_LINEMASK_GEN3;
> +			if (temp != position) {
> +				position = temp;
> +				goto out;
> +			}
> +		}
> +	}

I mentioned this on irc, but I'll repeat here so we have it on record
somewhere; There's another scanline register on hsw+ in the transcoder.
We should definitely check if that suffers from the same fault.
Jesse Barnes Sept. 15, 2015, 9 p.m. UTC | #8
On 09/10/2015 03:33 PM, Ville Syrjälä wrote:
> On Thu, Sep 10, 2015 at 02:57:32PM -0700, Jesse Barnes wrote:
>> On 09/10/2015 02:53 PM, Ville Syrjälä wrote:
>>> On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
>>>> On HSW at least (still testing other platforms, but should be harmless
>>>> elsewhere), the DSL reg reads back as 0 when read around vblank start
>>>> time.  This ends up confusing the atomic start/end checking code, since
>>>> it causes the update to appear as if it crossed a frame count boundary.
>>>> Avoid the problem by making sure we don't return scanline_offset from
>>>> the get_crtc_scanline function.  In moving the code there, I add to add
>>>> an additional delay since it could be called and have a legitimate 0
>>>> result for some time (depending on the pixel clock).
>>>>
>>>> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
>>>>
>>>> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
>>>> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>>>> ---
>>>>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
>>>>  1 file changed, 21 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>>> index 90bc6c2..97e5d52 100644
>>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>>> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
>>>>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
>>>>  
>>>>  	/*
>>>> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
>>>> +	 * read it right around the start of vblank.  So try it again
>>>> +	 * so we don't accidentally end up spanning a vblank frame
>>>> +	 * increment, causing the pipe_update_end() code to squak at us.
>>>> +	 */
>>>> +	if (IS_HASWELL(dev) && !position) {
>>>> +		int i, temp;
>>>> +
>>>> +		for (i = 0; i < 100; i++) {
>>>> +			udelay(1);
>>>> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
>>>> +				DSL_LINEMASK_GEN3;
>>>> +			if (temp != position) {
>>>> +				position = temp;
>>>> +				goto out;
>>>> +			}
>>>> +		}
>>>> +	}
>>>
>>> Hmm. Another idea. If it always happens at start of vbl, maybe have a
>>> look at the ISR. If scanline reads 0, but ISR says we're in vblank, just
>>> return vblank_start. That's assming ISR would in fact show that it's in
>>> vblank when this happens.
>>
>> Sounds a bit racy,
> 
> Not too much I think. We're under spinlock_irqsave here, and if the
> scanline should magically start to tick just after we've read it we
> would still give almost the right answer, even if we got magically
> delayed by one or two scanlines. That's assuming that it has already
> latched the double buffered registers when it goes wonky. If the
> register latching happens only when the counter starts working again,
> well then we're in deeper doodoo. But then I wouldn't expect ISR to
> indicate that we're in vblank then either. So I think you should
> definitely check what the ISR actually tells you while things are
> in the bad state.
> 
> In case the badness happens just before start of vblank, well, then
> I have no good idea how to handle it. This sort of retry loop is
> the best that comes to mind right now :(
> 
>> though I guess it wouldn't hurt.  I'm actually a
>> little dubious about this change anyway since it will mean longer delays
>> when we really are at the top of the field/frame.  You sure you don't
>> like the first one better?
> 
> The first one can't help vblank timestamps. Although I suppose that if
> the badness always cures itself before the interrupt gets raised, we
> would not hit this normally when we update the timestamp from the irq
> handler. We could hit it when updating the timestamp outside the irq
> handler though (ie. during vblank_get/put etc.).
> 

Ok so just for the record, I've tried a few things:
  - using the ISR to check the vblank and vsync status when this occurs (both are clear)
  - using the transcoder debug reg for reading the scanline (also returns 0)

so I think we're stuck with one of the two approaches already posted.  As for affecting the vblank timestamps, I think you're right that this bug shouldn't affect that.  The issue (a 0 return from the PIPEDSL read) seems to manifest just as the frame counter should be getting incremented.  By the time we actually raise an interrupt or move to the next scanline, the correct value is returned.  So even if we stuff the workaround into the pipe update start function (which should be more tolerant of delays in general) I think we'll be safe.

What do you prefer?  Do you want any changes on top of either of the previous patches given the above?

Thanks,
Jesse
Ville Syrjälä Sept. 22, 2015, 4:41 p.m. UTC | #9
On Thu, Sep 10, 2015 at 02:38:53PM -0700, Jesse Barnes wrote:
> On HSW at least (still testing other platforms, but should be harmless
> elsewhere), the DSL reg reads back as 0 when read around vblank start
> time.  This ends up confusing the atomic start/end checking code, since
> it causes the update to appear as if it crossed a frame count boundary.
> Avoid the problem by making sure we don't return scanline_offset from
> the get_crtc_scanline function.  In moving the code there, I add to add
> an additional delay since it could be called and have a legitimate 0
> result for some time (depending on the pixel clock).
> 
> v2: move hsw dsl read hack to get_crtc_scanline (Ville)
> 
> References: https://bugs.freedesktop.org/show_bug.cgi?id=91579
> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
> ---
>  drivers/gpu/drm/i915/i915_irq.c | 21 +++++++++++++++++++++
>  1 file changed, 21 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 90bc6c2..97e5d52 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -697,6 +697,27 @@ static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
>  		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
>  
>  	/*
> +	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
> +	 * read it right around the start of vblank. So try it again

Make that "just before the start of vblank". Otherwise I'm sure I'll
forget the details and start to think again about using ISR tricks.

> +	 * so we don't accidentally end up spanning a vblank frame
> +	 * increment, causing the pipe_update_end() code to squak at us.
> +	 */
> +	if (IS_HASWELL(dev) && !position) {
> +		int i, temp;
> +
> +		for (i = 0; i < 100; i++) {
> +			udelay(1);
> +			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
> +				DSL_LINEMASK_GEN3;
> +			if (temp != position) {
> +				position = temp;
> +				goto out;

break is enough

A bit nasty this loop with irqs off, but it's the only way we can avoid
corrupted vblank timestamps from happening when the hardware is on the
fritz.

With the comment updatead and goto killed:
Reviewed-by: Ville Syrjälä <ville.syrjala@linux.intel.com>

> +			}
> +		}
> +	}
> +
> +out:
> +	/*
>  	 * See update_scanline_offset() for the details on the
>  	 * scanline_offset adjustment.
>  	 */
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 90bc6c2..97e5d52 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -697,6 +697,27 @@  static int __intel_get_crtc_scanline(struct intel_crtc *crtc)
 		position = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) & DSL_LINEMASK_GEN3;
 
 	/*
+	 * On HSW, the DSL reg (0x70000) appears to return 0 if we
+	 * read it right around the start of vblank.  So try it again
+	 * so we don't accidentally end up spanning a vblank frame
+	 * increment, causing the pipe_update_end() code to squak at us.
+	 */
+	if (IS_HASWELL(dev) && !position) {
+		int i, temp;
+
+		for (i = 0; i < 100; i++) {
+			udelay(1);
+			temp = __raw_i915_read32(dev_priv, PIPEDSL(pipe)) &
+				DSL_LINEMASK_GEN3;
+			if (temp != position) {
+				position = temp;
+				goto out;
+			}
+		}
+	}
+
+out:
+	/*
 	 * See update_scanline_offset() for the details on the
 	 * scanline_offset adjustment.
 	 */