diff mbox series

drm/amd/display: Use vrr friendly pageflip throttling in DC.

Message ID 20190209065255.9480-1-mario.kleiner.de@gmail.com (mailing list archive)
State New, archived
Headers show
Series drm/amd/display: Use vrr friendly pageflip throttling in DC. | expand

Commit Message

Mario Kleiner Feb. 9, 2019, 6:52 a.m. UTC
In VRR mode, keep track of the vblank count of the last
completed pageflip in amdgpu_crtc->last_flip_vblank, as
recorded in the pageflip completion handler after each
completed flip.

Use that count to prevent mmio programming a new pageflip
within the same vblank in which the last pageflip completed,
iow. to throttle pageflips to at most one flip per video
frame, while at the same time allowing to request a flip
not only before start of vblank, but also anywhere within
vblank.

The old logic did the same, and made sense for regular fixed
refresh rate flipping, but in vrr mode it prevents requesting
a flip anywhere inside the possibly huge vblank, thereby
reducing framerate in vrr mode instead of improving it, by
delaying a slightly delayed flip requests up to a maximum
vblank duration + 1 scanout duration. This would limit VRR
usefulness to only help applications with a very high GPU
demand, which can submit the flip request before start of
vblank, but then have to wait long for fences to complete.

With this method a flip can be both requested and - after
fences have completed - executed, ie. it doesn't matter if
the request (amdgpu_dm_do_flip()) gets delayed until deep
into the extended vblank due to cpu execution delays. This
also allows clients which want to regulate framerate within
the vrr range a much more fine-grained control of flip timing,
a feature that might be useful for video playback, and is
very useful for neuroscience/vision research applications.

In regular non-VRR mode, retain the old flip submission
behavior. This to keep flip scheduling for fullscreen X11/GLX
OpenGL clients intact, if they use the GLX_OML_sync_control
extensions glXSwapBufferMscOML(, ..., target_msc,...) function
with a specific target_msc target vblank count.

glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
not flip at the proper target_msc for a non-zero target_msc
if VRR mode is active with this patch. They'd often flip one
frame too early. However, this limitation should not matter
much in VRR mode, as scheduling based on vblank counts is
pretty futile/unusable under variable refresh duration
anyway, so no real extra harm is done.

According to some testing already done with this patch by
Nicholas on top of my tests, IGT tests didn't report any
problems. If fixes stuttering and flickering when flipping
at rates below the minimum vrr refresh rate.

Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
properties")
Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: <stable@vger.kernel.org>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Michel Dänzer <michel@daenzer.net>
---
 drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h      |  1 +
 .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 33 ++++++++++++++++---
 2 files changed, 30 insertions(+), 4 deletions(-)

Comments

Kazlauskas, Nicholas Feb. 11, 2019, 2:38 p.m. UTC | #1
On 2/9/19 1:52 AM, Mario Kleiner wrote:
> In VRR mode, keep track of the vblank count of the last
> completed pageflip in amdgpu_crtc->last_flip_vblank, as
> recorded in the pageflip completion handler after each
> completed flip.
> 
> Use that count to prevent mmio programming a new pageflip
> within the same vblank in which the last pageflip completed,
> iow. to throttle pageflips to at most one flip per video
> frame, while at the same time allowing to request a flip
> not only before start of vblank, but also anywhere within
> vblank.
> 
> The old logic did the same, and made sense for regular fixed
> refresh rate flipping, but in vrr mode it prevents requesting
> a flip anywhere inside the possibly huge vblank, thereby
> reducing framerate in vrr mode instead of improving it, by
> delaying a slightly delayed flip requests up to a maximum
> vblank duration + 1 scanout duration. This would limit VRR
> usefulness to only help applications with a very high GPU
> demand, which can submit the flip request before start of
> vblank, but then have to wait long for fences to complete.
> 
> With this method a flip can be both requested and - after
> fences have completed - executed, ie. it doesn't matter if
> the request (amdgpu_dm_do_flip()) gets delayed until deep
> into the extended vblank due to cpu execution delays. This
> also allows clients which want to regulate framerate within
> the vrr range a much more fine-grained control of flip timing,
> a feature that might be useful for video playback, and is
> very useful for neuroscience/vision research applications.
> 
> In regular non-VRR mode, retain the old flip submission
> behavior. This to keep flip scheduling for fullscreen X11/GLX
> OpenGL clients intact, if they use the GLX_OML_sync_control
> extensions glXSwapBufferMscOML(, ..., target_msc,...) function
> with a specific target_msc target vblank count.
> 
> glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
> not flip at the proper target_msc for a non-zero target_msc
> if VRR mode is active with this patch. They'd often flip one
> frame too early. However, this limitation should not matter
> much in VRR mode, as scheduling based on vblank counts is
> pretty futile/unusable under variable refresh duration
> anyway, so no real extra harm is done.
> 
> According to some testing already done with this patch by
> Nicholas on top of my tests, IGT tests didn't report any
> problems. If fixes stuttering and flickering when flipping
> at rates below the minimum vrr refresh rate.
> 
> Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
> properties")
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org>
> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Michel Dänzer <michel@daenzer.net>

Reviewed-by: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>

I can confirm I've tested this a bit as well and it looks reasonable to 
me. This patch really helps avoid large or frequent changes in timeout 
duration in the case where we take too long from the last and flip and 
we're already in the next vblank interval by the time we get into commit 
tail.

It's unfortunate that this changes the behavior of how waiting for the 
next vblank interval works when VRR is active but I can't really think 
of a case where we would want anything else but this.

For applications where timing and scheduling like this really matters I 
think we'd need to look into adding the target presentation timestamp to 
have full control over this kind of behavior from userspace in the first 
place. Along with some sort of control over async page-flipping in the 
atomic commit IOCTL as well.

Nicholas Kazlauskas

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h      |  1 +
>   .../gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c | 33 ++++++++++++++++---
>   2 files changed, 30 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
> index bfa394ffd6d2..87ca5746f861 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
> @@ -411,6 +411,7 @@ struct amdgpu_crtc {
>   	struct amdgpu_flip_work *pflip_works;
>   	enum amdgpu_flip_status pflip_status;
>   	int deferred_flip_completion;
> +	u64 last_flip_vblank;
>   	/* pll sharing */
>   	struct amdgpu_atom_ss ss;
>   	bool ss_enabled;
> diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> index d59bafc84475..d4da331aa349 100644
> --- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> +++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
> @@ -303,12 +303,11 @@ static void dm_pflip_high_irq(void *interrupt_params)
>   		return;
>   	}
>   
> +	/* Update to correct count(s) if racing with vblank irq */
> +	amdgpu_crtc->last_flip_vblank = drm_crtc_accurate_vblank_count(&amdgpu_crtc->base);
>   
>   	/* wake up userspace */
>   	if (amdgpu_crtc->event) {
> -		/* Update to correct count(s) if racing with vblank irq */
> -		drm_crtc_accurate_vblank_count(&amdgpu_crtc->base);
> -
>   		drm_crtc_send_vblank_event(&amdgpu_crtc->base, amdgpu_crtc->event);
>   
>   		/* page flip completed. clean up */
> @@ -4736,6 +4735,8 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>   	struct amdgpu_bo *abo;
>   	uint64_t tiling_flags, dcc_address;
>   	uint32_t target, target_vblank;
> +	uint64_t last_flip_vblank;
> +	bool vrr_active = acrtc_state->freesync_config.state == VRR_STATE_ACTIVE_VARIABLE;
>   
>   	struct {
>   		struct dc_surface_update surface_updates[MAX_SURFACES];
> @@ -4889,7 +4890,31 @@ static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
>   	 * hopefully eliminating dc_*_update structs in their entirety.
>   	 */
>   	if (flip_count) {
> -		target = (uint32_t)drm_crtc_vblank_count(pcrtc) + *wait_for_vblank;
> +		if (!vrr_active) {
> +			/* Use old throttling in non-vrr fixed refresh rate mode
> +			 * to keep flip scheduling based on target vblank counts
> +			 * working in a backwards compatible way, e.g., for
> +			 * clients using the GLX_OML_sync_control extension or
> +			 * DRI3/Present extension with defined target_msc.
> +			 */
> +			last_flip_vblank = drm_crtc_vblank_count(pcrtc);
> +		}
> +		else {
> +			/* For variable refresh rate mode only:
> +			 * Get vblank of last completed flip to avoid > 1 vrr
> +			 * flips per video frame by use of throttling, but allow
> +			 * flip programming anywhere in the possibly large
> +			 * variable vrr vblank interval for fine-grained flip
> +			 * timing control and more opportunity to avoid stutter
> +			 * on late submission of flips.
> +			 */
> +			spin_lock_irqsave(&pcrtc->dev->event_lock, flags);
> +			last_flip_vblank = acrtc_attach->last_flip_vblank;
> +			spin_unlock_irqrestore(&pcrtc->dev->event_lock, flags);
> +		}
> +
> +		target = (uint32_t)last_flip_vblank + *wait_for_vblank;
> +
>   		/* Prepare wait for target vblank early - before the fence-waits */
>   		target_vblank = target - (uint32_t)drm_crtc_vblank_count(pcrtc) +
>   				amdgpu_get_vblank_counter_kms(pcrtc->dev, acrtc_attach->crtc_id);
>
Michel Dänzer Feb. 11, 2019, 3:01 p.m. UTC | #2
On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
> In VRR mode, keep track of the vblank count of the last
> completed pageflip in amdgpu_crtc->last_flip_vblank, as
> recorded in the pageflip completion handler after each
> completed flip.
> 
> Use that count to prevent mmio programming a new pageflip
> within the same vblank in which the last pageflip completed,
> iow. to throttle pageflips to at most one flip per video
> frame, while at the same time allowing to request a flip
> not only before start of vblank, but also anywhere within
> vblank.
> 
> The old logic did the same, and made sense for regular fixed
> refresh rate flipping, but in vrr mode it prevents requesting
> a flip anywhere inside the possibly huge vblank, thereby
> reducing framerate in vrr mode instead of improving it, by
> delaying a slightly delayed flip requests up to a maximum
> vblank duration + 1 scanout duration. This would limit VRR
> usefulness to only help applications with a very high GPU
> demand, which can submit the flip request before start of
> vblank, but then have to wait long for fences to complete.
> 
> With this method a flip can be both requested and - after
> fences have completed - executed, ie. it doesn't matter if
> the request (amdgpu_dm_do_flip()) gets delayed until deep
> into the extended vblank due to cpu execution delays. This
> also allows clients which want to regulate framerate within
> the vrr range a much more fine-grained control of flip timing,
> a feature that might be useful for video playback, and is
> very useful for neuroscience/vision research applications.
> 
> In regular non-VRR mode, retain the old flip submission
> behavior. This to keep flip scheduling for fullscreen X11/GLX
> OpenGL clients intact, if they use the GLX_OML_sync_control
> extensions glXSwapBufferMscOML(, ..., target_msc,...) function
> with a specific target_msc target vblank count.
> 
> glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
> not flip at the proper target_msc for a non-zero target_msc
> if VRR mode is active with this patch. They'd often flip one
> frame too early. However, this limitation should not matter
> much in VRR mode, as scheduling based on vblank counts is
> pretty futile/unusable under variable refresh duration
> anyway, so no real extra harm is done.
> 
> According to some testing already done with this patch by
> Nicholas on top of my tests, IGT tests didn't report any
> problems. If fixes stuttering and flickering when flipping
> at rates below the minimum vrr refresh rate.
> 
> Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
> properties")
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: <stable@vger.kernel.org>
> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Michel Dänzer <michel@daenzer.net>

I wonder if this couldn't be solved in a simpler / cleaner way by making
use of the target MSC passed to the page_flip_target hook.
Kazlauskas, Nicholas Feb. 11, 2019, 6:44 p.m. UTC | #3
On 2/11/19 10:01 AM, Michel Dänzer wrote:
> On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
>> In VRR mode, keep track of the vblank count of the last
>> completed pageflip in amdgpu_crtc->last_flip_vblank, as
>> recorded in the pageflip completion handler after each
>> completed flip.
>>
>> Use that count to prevent mmio programming a new pageflip
>> within the same vblank in which the last pageflip completed,
>> iow. to throttle pageflips to at most one flip per video
>> frame, while at the same time allowing to request a flip
>> not only before start of vblank, but also anywhere within
>> vblank.
>>
>> The old logic did the same, and made sense for regular fixed
>> refresh rate flipping, but in vrr mode it prevents requesting
>> a flip anywhere inside the possibly huge vblank, thereby
>> reducing framerate in vrr mode instead of improving it, by
>> delaying a slightly delayed flip requests up to a maximum
>> vblank duration + 1 scanout duration. This would limit VRR
>> usefulness to only help applications with a very high GPU
>> demand, which can submit the flip request before start of
>> vblank, but then have to wait long for fences to complete.
>>
>> With this method a flip can be both requested and - after
>> fences have completed - executed, ie. it doesn't matter if
>> the request (amdgpu_dm_do_flip()) gets delayed until deep
>> into the extended vblank due to cpu execution delays. This
>> also allows clients which want to regulate framerate within
>> the vrr range a much more fine-grained control of flip timing,
>> a feature that might be useful for video playback, and is
>> very useful for neuroscience/vision research applications.
>>
>> In regular non-VRR mode, retain the old flip submission
>> behavior. This to keep flip scheduling for fullscreen X11/GLX
>> OpenGL clients intact, if they use the GLX_OML_sync_control
>> extensions glXSwapBufferMscOML(, ..., target_msc,...) function
>> with a specific target_msc target vblank count.
>>
>> glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
>> not flip at the proper target_msc for a non-zero target_msc
>> if VRR mode is active with this patch. They'd often flip one
>> frame too early. However, this limitation should not matter
>> much in VRR mode, as scheduling based on vblank counts is
>> pretty futile/unusable under variable refresh duration
>> anyway, so no real extra harm is done.
>>
>> According to some testing already done with this patch by
>> Nicholas on top of my tests, IGT tests didn't report any
>> problems. If fixes stuttering and flickering when flipping
>> at rates below the minimum vrr refresh rate.
>>
>> Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
>> properties")
>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>> Cc: <stable@vger.kernel.org>
>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Michel Dänzer <michel@daenzer.net>
> 
> I wonder if this couldn't be solved in a simpler / cleaner way by making
> use of the target MSC passed to the page_flip_target hook.
> 
> 

I'm not sure if this has a good userspace solution.

I think the problem with all of this is the difference between the 
hardware vblank and the DRM vblank.

The hardware vblank counter wraps around at the front porch, DRM wraps 
around at the back porch. The behavior in amdgpu_get_vblank_counter_kms 
is to return the hardware count if vpos < 0 and return the 
hardware_count + 1 if vpos >= 0.

The value for vpos being used here is from 
amdgpu_display_vblank_get_counter which returns a value < 0 if we're in 
vblank and >= 0 if we're in scanout. The value returned in the extended 
front porch will always be considered -vbl_end so it'll never be 
considered within scanout.

The target vblank for the "wait for vblank" logic is 
amdgpu_get_vblank_counter_kms(...) + 1.

The "wait for vblank" logic waits while we're within vblank and if the 
target - amdgpu_get_vblank_counter_kms(...) > 0.

There are 3 cases that can happen here:
(1) The wait starts in the same vblank interval as the last flip
(2) The wait starts in scanout
(3) The wait starts in the next vblank interval as the last flip

In case (1) the wait breaks as soon as we enter scanout.
In case (2) the wait breaks immediately.
In case (3) we wait for the next scanout.

By DRM logic the programming should be happening at the start of the 
back porch, but it doesn't really matter since it's both targeting the 
right scanout for cases (1) and (2).

However, for case (3) we have an issue. If the wait starts within vblank 
front porch then it's not going to be programmed for the upcoming 
scanout, but the one after that.

This is the case that's trying to be addressed with the patch - with a 
caveat.

If the last_flip_vblank != drm_crtc_vblank_count then the target_vblank 
should be < amdgpu_get_vblank_counter_kms which would cause the wait to 
exit immediately. If we're in the back porch of the vblank in this 
instance then we should be waiting until scanout to begin programming 
(and target the scanout after next).

Nicholas Kazlauskas
kernel test robot via dri-devel Feb. 12, 2019, 8:39 a.m. UTC | #4
Oops, dropped the mailing ist from my reply, so again...

On Mon, Feb 11, 2019 at 4:01 PM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
> > In VRR mode, keep track of the vblank count of the last
> > completed pageflip in amdgpu_crtc->last_flip_vblank, as
> > recorded in the pageflip completion handler after each
> > completed flip.
> >
> > Use that count to prevent mmio programming a new pageflip
> > within the same vblank in which the last pageflip completed,
> > iow. to throttle pageflips to at most one flip per video
> > frame, while at the same time allowing to request a flip
> > not only before start of vblank, but also anywhere within
> > vblank.
> >
> > The old logic did the same, and made sense for regular fixed
> > refresh rate flipping, but in vrr mode it prevents requesting
> > a flip anywhere inside the possibly huge vblank, thereby
> > reducing framerate in vrr mode instead of improving it, by
> > delaying a slightly delayed flip requests up to a maximum
> > vblank duration + 1 scanout duration. This would limit VRR
> > usefulness to only help applications with a very high GPU
> > demand, which can submit the flip request before start of
> > vblank, but then have to wait long for fences to complete.
> >
> > With this method a flip can be both requested and - after
> > fences have completed - executed, ie. it doesn't matter if
> > the request (amdgpu_dm_do_flip()) gets delayed until deep
> > into the extended vblank due to cpu execution delays. This
> > also allows clients which want to regulate framerate within
> > the vrr range a much more fine-grained control of flip timing,
> > a feature that might be useful for video playback, and is
> > very useful for neuroscience/vision research applications.
> >
> > In regular non-VRR mode, retain the old flip submission
> > behavior. This to keep flip scheduling for fullscreen X11/GLX
> > OpenGL clients intact, if they use the GLX_OML_sync_control
> > extensions glXSwapBufferMscOML(, ..., target_msc,...) function
> > with a specific target_msc target vblank count.
> >
> > glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
> > not flip at the proper target_msc for a non-zero target_msc
> > if VRR mode is active with this patch. They'd often flip one
> > frame too early. However, this limitation should not matter
> > much in VRR mode, as scheduling based on vblank counts is
> > pretty futile/unusable under variable refresh duration
> > anyway, so no real extra harm is done.
> >
> > According to some testing already done with this patch by
> > Nicholas on top of my tests, IGT tests didn't report any
> > problems. If fixes stuttering and flickering when flipping
> > at rates below the minimum vrr refresh rate.
> >
> > Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
> > properties")
> > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > Cc: <stable@vger.kernel.org>
> > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > Cc: Harry Wentland <harry.wentland@amd.com>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Michel Dänzer <michel@daenzer.net>
>
> I wonder if this couldn't be solved in a simpler / cleaner way by making
> use of the target MSC passed to the page_flip_target hook.
>
If DisplayCore would implement the page_flip_target hook, one could do
the same implementation in userspace, ie. tracking msc of last
completed flip and setting target msc accordingly to throttle. But i
don't think we'd be better of with that. Same solution, but now we'd
have to let userspace know if the crtc is currently in VRR active mode
or not. And implement page_flip_target hook in DC. And implement the
tracking logic in every userspace driver that wants good performance
from VRR, e.g., also the modesetting ddx and all wayland compositors
in addition to amdgpu ddx.

What i'd like to have in the long term would be a way to pass a target
presentation time for VRR instead of a target msc.

>
> --
> Earthling Michel Dänzer               |               http://www.amd.com
> Libre software enthusiast             |             Mesa and X developer
kernel test robot via dri-devel Feb. 12, 2019, 9:03 a.m. UTC | #5
On Mon, Feb 11, 2019 at 7:44 PM Kazlauskas, Nicholas
<Nicholas.Kazlauskas@amd.com> wrote:
>
> On 2/11/19 10:01 AM, Michel Dänzer wrote:
> > On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
> >> In VRR mode, keep track of the vblank count of the last
> >> completed pageflip in amdgpu_crtc->last_flip_vblank, as
> >> recorded in the pageflip completion handler after each
> >> completed flip.
...
> >
> > I wonder if this couldn't be solved in a simpler / cleaner way by making
> > use of the target MSC passed to the page_flip_target hook.
> >
> >
>
> I'm not sure if this has a good userspace solution.
>
> I think the problem with all of this is the difference between the
> hardware vblank and the DRM vblank.
>
> The hardware vblank counter wraps around at the front porch, DRM wraps
> around at the back porch. The behavior in amdgpu_get_vblank_counter_kms
> is to return the hardware count if vpos < 0 and return the
> hardware_count + 1 if vpos >= 0.
>

Small correction: It's actually the other way around. DRM wraps at
start of vblank ~ start of front porch, whereas the hw counter wraps
at start of back porch. I think that removes the caveat you mention
below, if i understand what you meant.

One way i tested if this patch does what i expect it to do was do
flips (glXSwapbuffers calls) with different durations between calls,
and continuously change that wait time in between calls to see if the
delta between consecutive flip timestamps follows those wait times,
ie. without any big discontinous "jumps" inbetween.

> The value for vpos being used here is from
> amdgpu_display_vblank_get_counter which returns a value < 0 if we're in
> vblank and >= 0 if we're in scanout. The value returned in the extended
> front porch will always be considered -vbl_end so it'll never be
> considered within scanout.
>

Another small correction: Afaik we don't actually use the sign of vpos
for "in vblank" detection anywhere, but instead the returned status
flag. Which is good, given that patch 1/3 of that other vrr fixes
series slightly change the meaning of the sign in vrr mode.

-mario

> The target vblank for the "wait for vblank" logic is
> amdgpu_get_vblank_counter_kms(...) + 1.
>
> The "wait for vblank" logic waits while we're within vblank and if the
> target - amdgpu_get_vblank_counter_kms(...) > 0.
>
> There are 3 cases that can happen here:
> (1) The wait starts in the same vblank interval as the last flip
> (2) The wait starts in scanout
> (3) The wait starts in the next vblank interval as the last flip
>
> In case (1) the wait breaks as soon as we enter scanout.
> In case (2) the wait breaks immediately.
> In case (3) we wait for the next scanout.
>
> By DRM logic the programming should be happening at the start of the
> back porch, but it doesn't really matter since it's both targeting the
> right scanout for cases (1) and (2).
>
> However, for case (3) we have an issue. If the wait starts within vblank
> front porch then it's not going to be programmed for the upcoming
> scanout, but the one after that.
>
> This is the case that's trying to be addressed with the patch - with a
> caveat.
>
> If the last_flip_vblank != drm_crtc_vblank_count then the target_vblank
> should be < amdgpu_get_vblank_counter_kms which would cause the wait to
> exit immediately. If we're in the back porch of the vblank in this
> instance then we should be waiting until scanout to begin programming
> (and target the scanout after next).
>
> Nicholas Kazlauskas
Michel Dänzer Feb. 12, 2019, 9:24 a.m. UTC | #6
On 2019-02-12 9:39 a.m., Mario Kleiner via dri-devel wrote:
> On Mon, Feb 11, 2019 at 4:01 PM Michel Dänzer <michel@daenzer.net> wrote:
>>
>> On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
>>> In VRR mode, keep track of the vblank count of the last
>>> completed pageflip in amdgpu_crtc->last_flip_vblank, as
>>> recorded in the pageflip completion handler after each
>>> completed flip.
>>>
>>> Use that count to prevent mmio programming a new pageflip
>>> within the same vblank in which the last pageflip completed,
>>> iow. to throttle pageflips to at most one flip per video
>>> frame, while at the same time allowing to request a flip
>>> not only before start of vblank, but also anywhere within
>>> vblank.
>>>
>>> The old logic did the same, and made sense for regular fixed
>>> refresh rate flipping, but in vrr mode it prevents requesting
>>> a flip anywhere inside the possibly huge vblank, thereby
>>> reducing framerate in vrr mode instead of improving it, by
>>> delaying a slightly delayed flip requests up to a maximum
>>> vblank duration + 1 scanout duration. This would limit VRR
>>> usefulness to only help applications with a very high GPU
>>> demand, which can submit the flip request before start of
>>> vblank, but then have to wait long for fences to complete.
>>>
>>> With this method a flip can be both requested and - after
>>> fences have completed - executed, ie. it doesn't matter if
>>> the request (amdgpu_dm_do_flip()) gets delayed until deep
>>> into the extended vblank due to cpu execution delays. This
>>> also allows clients which want to regulate framerate within
>>> the vrr range a much more fine-grained control of flip timing,
>>> a feature that might be useful for video playback, and is
>>> very useful for neuroscience/vision research applications.
>>>
>>> In regular non-VRR mode, retain the old flip submission
>>> behavior. This to keep flip scheduling for fullscreen X11/GLX
>>> OpenGL clients intact, if they use the GLX_OML_sync_control
>>> extensions glXSwapBufferMscOML(, ..., target_msc,...) function
>>> with a specific target_msc target vblank count.
>>>
>>> glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
>>> not flip at the proper target_msc for a non-zero target_msc
>>> if VRR mode is active with this patch. They'd often flip one
>>> frame too early. However, this limitation should not matter
>>> much in VRR mode, as scheduling based on vblank counts is
>>> pretty futile/unusable under variable refresh duration
>>> anyway, so no real extra harm is done.
>>>
>>> According to some testing already done with this patch by
>>> Nicholas on top of my tests, IGT tests didn't report any
>>> problems. If fixes stuttering and flickering when flipping
>>> at rates below the minimum vrr refresh rate.
>>>
>>> Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
>>> properties")
>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>> Cc: <stable@vger.kernel.org>
>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Michel Dänzer <michel@daenzer.net>
>>
>> I wonder if this couldn't be solved in a simpler / cleaner way by making
>> use of the target MSC passed to the page_flip_target hook.
>>
> If DisplayCore would implement the page_flip_target hook, one could do
> the same implementation in userspace, ie. tracking msc of last
> completed flip and setting target msc accordingly to throttle. But i
> don't think we'd be better of with that. Same solution, but now we'd
> have to let userspace know if the crtc is currently in VRR active mode
> or not.

I don't think so.

xf86-video-amdgpu is already telling the kernel which MSC is being
targeted by a flip, using DRM_MODE_PAGE_FLIP_TARGET_ABSOLUTE/RELATIVE.
This information is used by the non-DC code to decide whether the flip
can be submitted during a vertical blank period or not.

This should work with VRR as well, as the target MSC should always be <=
the current one.


> And implement page_flip_target hook in DC.

Well, DC should really make use of the target MSC information anyway. :)


> And implement the tracking logic in every userspace driver that wants
> good performance from VRR, e.g., also the modesetting ddx and all
> wayland compositors in addition to amdgpu ddx.

They don't have any VRR support yet, not sure this would make adding it
significantly harder.

That said, it would require adding target MSC support to the atomic KMS
UAPI as well. And I agree that functionally, there would be no
significant difference in the VRR case at the end of the day. I do think
it would be cleaner / less "hackish" though, making use of the same
logic with or without VRR.
Daniel Vetter Feb. 13, 2019, 9:53 a.m. UTC | #7
On Mon, Feb 11, 2019 at 04:01:12PM +0100, Michel Dänzer wrote:
> On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
> > In VRR mode, keep track of the vblank count of the last
> > completed pageflip in amdgpu_crtc->last_flip_vblank, as
> > recorded in the pageflip completion handler after each
> > completed flip.
> > 
> > Use that count to prevent mmio programming a new pageflip
> > within the same vblank in which the last pageflip completed,
> > iow. to throttle pageflips to at most one flip per video
> > frame, while at the same time allowing to request a flip
> > not only before start of vblank, but also anywhere within
> > vblank.
> > 
> > The old logic did the same, and made sense for regular fixed
> > refresh rate flipping, but in vrr mode it prevents requesting
> > a flip anywhere inside the possibly huge vblank, thereby
> > reducing framerate in vrr mode instead of improving it, by
> > delaying a slightly delayed flip requests up to a maximum
> > vblank duration + 1 scanout duration. This would limit VRR
> > usefulness to only help applications with a very high GPU
> > demand, which can submit the flip request before start of
> > vblank, but then have to wait long for fences to complete.
> > 
> > With this method a flip can be both requested and - after
> > fences have completed - executed, ie. it doesn't matter if
> > the request (amdgpu_dm_do_flip()) gets delayed until deep
> > into the extended vblank due to cpu execution delays. This
> > also allows clients which want to regulate framerate within
> > the vrr range a much more fine-grained control of flip timing,
> > a feature that might be useful for video playback, and is
> > very useful for neuroscience/vision research applications.
> > 
> > In regular non-VRR mode, retain the old flip submission
> > behavior. This to keep flip scheduling for fullscreen X11/GLX
> > OpenGL clients intact, if they use the GLX_OML_sync_control
> > extensions glXSwapBufferMscOML(, ..., target_msc,...) function
> > with a specific target_msc target vblank count.
> > 
> > glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
> > not flip at the proper target_msc for a non-zero target_msc
> > if VRR mode is active with this patch. They'd often flip one
> > frame too early. However, this limitation should not matter
> > much in VRR mode, as scheduling based on vblank counts is
> > pretty futile/unusable under variable refresh duration
> > anyway, so no real extra harm is done.
> > 
> > According to some testing already done with this patch by
> > Nicholas on top of my tests, IGT tests didn't report any
> > problems. If fixes stuttering and flickering when flipping
> > at rates below the minimum vrr refresh rate.
> > 
> > Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
> > properties")
> > Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > Cc: <stable@vger.kernel.org>
> > Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > Cc: Harry Wentland <harry.wentland@amd.com>
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Michel Dänzer <michel@daenzer.net>
> 
> I wonder if this couldn't be solved in a simpler / cleaner way by making
> use of the target MSC passed to the page_flip_target hook.

Requiring that all compositors that use VRR also have to use page_flip
target (which is not yet exposed on the atomic side yet at all) just
because amdgpu doesn't sound like a great idea. I think better to handle
this in the amdgpu kernel driver, for similar reasons we've originally
added this.
-Daniel
Michel Dänzer Feb. 13, 2019, 10:54 a.m. UTC | #8
On 2019-02-13 10:53 a.m., Daniel Vetter wrote:
> On Mon, Feb 11, 2019 at 04:01:12PM +0100, Michel Dänzer wrote:
>> On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
>>> In VRR mode, keep track of the vblank count of the last
>>> completed pageflip in amdgpu_crtc->last_flip_vblank, as
>>> recorded in the pageflip completion handler after each
>>> completed flip.
>>>
>>> Use that count to prevent mmio programming a new pageflip
>>> within the same vblank in which the last pageflip completed,
>>> iow. to throttle pageflips to at most one flip per video
>>> frame, while at the same time allowing to request a flip
>>> not only before start of vblank, but also anywhere within
>>> vblank.
>>>
>>> The old logic did the same, and made sense for regular fixed
>>> refresh rate flipping, but in vrr mode it prevents requesting
>>> a flip anywhere inside the possibly huge vblank, thereby
>>> reducing framerate in vrr mode instead of improving it, by
>>> delaying a slightly delayed flip requests up to a maximum
>>> vblank duration + 1 scanout duration. This would limit VRR
>>> usefulness to only help applications with a very high GPU
>>> demand, which can submit the flip request before start of
>>> vblank, but then have to wait long for fences to complete.
>>>
>>> With this method a flip can be both requested and - after
>>> fences have completed - executed, ie. it doesn't matter if
>>> the request (amdgpu_dm_do_flip()) gets delayed until deep
>>> into the extended vblank due to cpu execution delays. This
>>> also allows clients which want to regulate framerate within
>>> the vrr range a much more fine-grained control of flip timing,
>>> a feature that might be useful for video playback, and is
>>> very useful for neuroscience/vision research applications.
>>>
>>> In regular non-VRR mode, retain the old flip submission
>>> behavior. This to keep flip scheduling for fullscreen X11/GLX
>>> OpenGL clients intact, if they use the GLX_OML_sync_control
>>> extensions glXSwapBufferMscOML(, ..., target_msc,...) function
>>> with a specific target_msc target vblank count.
>>>
>>> glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
>>> not flip at the proper target_msc for a non-zero target_msc
>>> if VRR mode is active with this patch. They'd often flip one
>>> frame too early. However, this limitation should not matter
>>> much in VRR mode, as scheduling based on vblank counts is
>>> pretty futile/unusable under variable refresh duration
>>> anyway, so no real extra harm is done.
>>>
>>> According to some testing already done with this patch by
>>> Nicholas on top of my tests, IGT tests didn't report any
>>> problems. If fixes stuttering and flickering when flipping
>>> at rates below the minimum vrr refresh rate.
>>>
>>> Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
>>> properties")
>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>> Cc: <stable@vger.kernel.org>
>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>> Cc: Michel Dänzer <michel@daenzer.net>
>>
>> I wonder if this couldn't be solved in a simpler / cleaner way by making
>> use of the target MSC passed to the page_flip_target hook.
> 
> Requiring that all compositors that use VRR also have to use page_flip
> target (which is not yet exposed on the atomic side yet at all) just
> because amdgpu doesn't sound like a great idea. I think better to handle
> this in the amdgpu kernel driver, for similar reasons we've originally
> added this.

We've originally added what?

Also not sure what you mean by "because amdgpu". Other drivers which
want to support VRR might also have to deal with this issue one way or
another, as it's due to page flips submitted during a vertical blank
period only being expected to take effect during the following vertical
blank period normally.
Daniel Vetter Feb. 13, 2019, 12:54 p.m. UTC | #9
On Wed, Feb 13, 2019 at 11:54 AM Michel Dänzer <michel@daenzer.net> wrote:
>
> On 2019-02-13 10:53 a.m., Daniel Vetter wrote:
> > On Mon, Feb 11, 2019 at 04:01:12PM +0100, Michel Dänzer wrote:
> >> On 2019-02-09 7:52 a.m., Mario Kleiner wrote:
> >>> In VRR mode, keep track of the vblank count of the last
> >>> completed pageflip in amdgpu_crtc->last_flip_vblank, as
> >>> recorded in the pageflip completion handler after each
> >>> completed flip.
> >>>
> >>> Use that count to prevent mmio programming a new pageflip
> >>> within the same vblank in which the last pageflip completed,
> >>> iow. to throttle pageflips to at most one flip per video
> >>> frame, while at the same time allowing to request a flip
> >>> not only before start of vblank, but also anywhere within
> >>> vblank.
> >>>
> >>> The old logic did the same, and made sense for regular fixed
> >>> refresh rate flipping, but in vrr mode it prevents requesting
> >>> a flip anywhere inside the possibly huge vblank, thereby
> >>> reducing framerate in vrr mode instead of improving it, by
> >>> delaying a slightly delayed flip requests up to a maximum
> >>> vblank duration + 1 scanout duration. This would limit VRR
> >>> usefulness to only help applications with a very high GPU
> >>> demand, which can submit the flip request before start of
> >>> vblank, but then have to wait long for fences to complete.
> >>>
> >>> With this method a flip can be both requested and - after
> >>> fences have completed - executed, ie. it doesn't matter if
> >>> the request (amdgpu_dm_do_flip()) gets delayed until deep
> >>> into the extended vblank due to cpu execution delays. This
> >>> also allows clients which want to regulate framerate within
> >>> the vrr range a much more fine-grained control of flip timing,
> >>> a feature that might be useful for video playback, and is
> >>> very useful for neuroscience/vision research applications.
> >>>
> >>> In regular non-VRR mode, retain the old flip submission
> >>> behavior. This to keep flip scheduling for fullscreen X11/GLX
> >>> OpenGL clients intact, if they use the GLX_OML_sync_control
> >>> extensions glXSwapBufferMscOML(, ..., target_msc,...) function
> >>> with a specific target_msc target vblank count.
> >>>
> >>> glXSwapBuffersMscOML() or DRI3/Present PresentPixmap() will
> >>> not flip at the proper target_msc for a non-zero target_msc
> >>> if VRR mode is active with this patch. They'd often flip one
> >>> frame too early. However, this limitation should not matter
> >>> much in VRR mode, as scheduling based on vblank counts is
> >>> pretty futile/unusable under variable refresh duration
> >>> anyway, so no real extra harm is done.
> >>>
> >>> According to some testing already done with this patch by
> >>> Nicholas on top of my tests, IGT tests didn't report any
> >>> problems. If fixes stuttering and flickering when flipping
> >>> at rates below the minimum vrr refresh rate.
> >>>
> >>> Fixes: bb47de736661 ("drm/amdgpu: Set FreeSync state using drm VRR
> >>> properties")
> >>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >>> Cc: <stable@vger.kernel.org>
> >>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> >>> Cc: Harry Wentland <harry.wentland@amd.com>
> >>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>> Cc: Michel Dänzer <michel@daenzer.net>
> >>
> >> I wonder if this couldn't be solved in a simpler / cleaner way by making
> >> use of the target MSC passed to the page_flip_target hook.
> >
> > Requiring that all compositors that use VRR also have to use page_flip
> > target (which is not yet exposed on the atomic side yet at all) just
> > because amdgpu doesn't sound like a great idea. I think better to handle
> > this in the amdgpu kernel driver, for similar reasons we've originally
> > added this.
>
> We've originally added what?

The pageflip delay right after you receive a vblank. amdgpu did accept
pageflips for the current frame even after the vblank for the same was
sent out already, breaking non-amdgpu specific compositors. This is
why the page_flip target was added, so that amdgpu could still do
this, while delaying the page_flip for everyone else.

Side effect of that code is that the refresh rate goes slow if you
page_flip without a target and happen to issue it right in the vblank
(somewhat unlikely, given that scanout takes longer than blank
period). "Fixing" that by requiring everyone to specificy the target
(which isn't enabled even for atomic) doesn't sound like a good
solution to me. And page_flip target is an amdgpu-only feature.

> Also not sure what you mean by "because amdgpu". Other drivers which
> want to support VRR might also have to deal with this issue one way or
> another, as it's due to page flips submitted during a vertical blank
> period only being expected to take effect during the following vertical
> blank period normally.

Yeah I expect we'll need to do something similar for intel vrr. That's
why I'm in this discussion. Making the vblank and page_flip timestamps
agree, plus issueing page_flips as soon as possible (without violating
the ordering rules generic userspace expects) sounds like the best
option here.
-Daniel
diff mbox series

Patch

diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
index bfa394ffd6d2..87ca5746f861 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_mode.h
@@ -411,6 +411,7 @@  struct amdgpu_crtc {
 	struct amdgpu_flip_work *pflip_works;
 	enum amdgpu_flip_status pflip_status;
 	int deferred_flip_completion;
+	u64 last_flip_vblank;
 	/* pll sharing */
 	struct amdgpu_atom_ss ss;
 	bool ss_enabled;
diff --git a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
index d59bafc84475..d4da331aa349 100644
--- a/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
+++ b/drivers/gpu/drm/amd/display/amdgpu_dm/amdgpu_dm.c
@@ -303,12 +303,11 @@  static void dm_pflip_high_irq(void *interrupt_params)
 		return;
 	}
 
+	/* Update to correct count(s) if racing with vblank irq */
+	amdgpu_crtc->last_flip_vblank = drm_crtc_accurate_vblank_count(&amdgpu_crtc->base);
 
 	/* wake up userspace */
 	if (amdgpu_crtc->event) {
-		/* Update to correct count(s) if racing with vblank irq */
-		drm_crtc_accurate_vblank_count(&amdgpu_crtc->base);
-
 		drm_crtc_send_vblank_event(&amdgpu_crtc->base, amdgpu_crtc->event);
 
 		/* page flip completed. clean up */
@@ -4736,6 +4735,8 @@  static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 	struct amdgpu_bo *abo;
 	uint64_t tiling_flags, dcc_address;
 	uint32_t target, target_vblank;
+	uint64_t last_flip_vblank;
+	bool vrr_active = acrtc_state->freesync_config.state == VRR_STATE_ACTIVE_VARIABLE;
 
 	struct {
 		struct dc_surface_update surface_updates[MAX_SURFACES];
@@ -4889,7 +4890,31 @@  static void amdgpu_dm_commit_planes(struct drm_atomic_state *state,
 	 * hopefully eliminating dc_*_update structs in their entirety.
 	 */
 	if (flip_count) {
-		target = (uint32_t)drm_crtc_vblank_count(pcrtc) + *wait_for_vblank;
+		if (!vrr_active) {
+			/* Use old throttling in non-vrr fixed refresh rate mode
+			 * to keep flip scheduling based on target vblank counts
+			 * working in a backwards compatible way, e.g., for
+			 * clients using the GLX_OML_sync_control extension or
+			 * DRI3/Present extension with defined target_msc.
+			 */
+			last_flip_vblank = drm_crtc_vblank_count(pcrtc);
+		}
+		else {
+			/* For variable refresh rate mode only:
+			 * Get vblank of last completed flip to avoid > 1 vrr
+			 * flips per video frame by use of throttling, but allow
+			 * flip programming anywhere in the possibly large
+			 * variable vrr vblank interval for fine-grained flip
+			 * timing control and more opportunity to avoid stutter
+			 * on late submission of flips.
+			 */
+			spin_lock_irqsave(&pcrtc->dev->event_lock, flags);
+			last_flip_vblank = acrtc_attach->last_flip_vblank;
+			spin_unlock_irqrestore(&pcrtc->dev->event_lock, flags);
+		}
+
+		target = (uint32_t)last_flip_vblank + *wait_for_vblank;
+
 		/* Prepare wait for target vblank early - before the fence-waits */
 		target_vblank = target - (uint32_t)drm_crtc_vblank_count(pcrtc) +
 				amdgpu_get_vblank_counter_kms(pcrtc->dev, acrtc_attach->crtc_id);