diff mbox series

[2/3] drm: Add basic helper to allow precise pageflip timestamps in vrr.

Message ID 20190211032225.9488-3-mario.kleiner.de@gmail.com (mailing list archive)
State New, archived
Headers show
Series [1/3] drm/amdgpu: Fix get_crtc_scanoutpos behavior in vrr when vpos >= vtotal. | expand

Commit Message

Mario Kleiner Feb. 11, 2019, 3:22 a.m. UTC
The pageflip completion timestamps transmitted to userspace
via pageflip completion events are supposed to describe the
time at which the first pixel of the new post-pageflip scanout
buffer leaves the video output of the gpu. This time is
identical to end of vblank, when active scanout starts.

For a crtc in standard fixed refresh rate, the end of vblank
is identical to the vblank timestamps calculated by
drm_update_vblank_count() at each vblank interrupt, or each
vblank dis-/enable. Therefore pageflip events just carry
that vblank timestamp as their pageflip timestamp.

For a crtc switched to variable refresh rate mode (vrr), the
pageflip completion timestamps are identical to the vblank
timestamps iff the pageflip was executed early in vblank,
before the minimum vblank duration elapsed. In this case
the time of display onset is identical to when the crtc
is running in fixed refresh rate.

However, if a pageflip completes later in the vblank, inside
the "extended front porch" in vrr mode, then the vblank will
terminate at a fixed (back porch) duration after flip, so
the display onset time is delayed correspondingly. In this
case the vblank timestamp computed at vblank irq time would
be too early, and we need a way to calculate an estimated
pageflip timestamp that will be later than the vblank timestamp.

How a driver determines such a "late flip" timestamp is hw
and driver specific, but this patch adds a new helper function
that allows the driver to propose such an alternate "late flip"
timestamp for use in pageflip events:

drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);

When sending out pageflip events, we now compare that proposed
flip_timestamp against the vblank timestamp of the current
vblank of flip completion and choose to send out the greater/
later timestamp as flip completion timestamp.

The most simple way for a kms driver to supply a suitable
flip_timestamp in vrr mode would be to simply take a timestamp
at start of the pageflip completion handler, e.g., pageflip
irq handler: flip_timestamp = ktime_get(); and then set that
as proposed "late" alternative timestamp via ...
drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);

More clever approaches could try to add some corrective offset
for fixed back porch duration, or ideally use hardware features
like hw timestamps to calculate the exact end time of vblank.

Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
Cc: Harry Wentland <harry.wentland@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
---
 drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
 include/drm/drm_vblank.h     |  8 ++++++
 2 files changed, 56 insertions(+), 1 deletion(-)

Comments

Daniel Vetter Feb. 11, 2019, 8:35 a.m. UTC | #1
On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> The pageflip completion timestamps transmitted to userspace
> via pageflip completion events are supposed to describe the
> time at which the first pixel of the new post-pageflip scanout
> buffer leaves the video output of the gpu. This time is
> identical to end of vblank, when active scanout starts.
> 
> For a crtc in standard fixed refresh rate, the end of vblank
> is identical to the vblank timestamps calculated by
> drm_update_vblank_count() at each vblank interrupt, or each
> vblank dis-/enable. Therefore pageflip events just carry
> that vblank timestamp as their pageflip timestamp.
> 
> For a crtc switched to variable refresh rate mode (vrr), the
> pageflip completion timestamps are identical to the vblank
> timestamps iff the pageflip was executed early in vblank,
> before the minimum vblank duration elapsed. In this case
> the time of display onset is identical to when the crtc
> is running in fixed refresh rate.
> 
> However, if a pageflip completes later in the vblank, inside
> the "extended front porch" in vrr mode, then the vblank will
> terminate at a fixed (back porch) duration after flip, so
> the display onset time is delayed correspondingly. In this
> case the vblank timestamp computed at vblank irq time would
> be too early, and we need a way to calculate an estimated
> pageflip timestamp that will be later than the vblank timestamp.
> 
> How a driver determines such a "late flip" timestamp is hw
> and driver specific, but this patch adds a new helper function
> that allows the driver to propose such an alternate "late flip"
> timestamp for use in pageflip events:
> 
> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> 
> When sending out pageflip events, we now compare that proposed
> flip_timestamp against the vblank timestamp of the current
> vblank of flip completion and choose to send out the greater/
> later timestamp as flip completion timestamp.
> 
> The most simple way for a kms driver to supply a suitable
> flip_timestamp in vrr mode would be to simply take a timestamp
> at start of the pageflip completion handler, e.g., pageflip
> irq handler: flip_timestamp = ktime_get(); and then set that
> as proposed "late" alternative timestamp via ...
> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> 
> More clever approaches could try to add some corrective offset
> for fixed back porch duration, or ideally use hardware features
> like hw timestamps to calculate the exact end time of vblank.
> 
> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> Cc: Harry Wentland <harry.wentland@amd.com>
> Cc: Alex Deucher <alexander.deucher@amd.com>

Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
the right timestampe, once? With this I guess if you do a vblank query in
between the wrong and the right vblank you'll get the bogus value. Not
really great for userspace.
-Daniel

> ---
>  drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
>  include/drm/drm_vblank.h     |  8 ++++++
>  2 files changed, 56 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> index 98e091175921..4b3a4c38fabe 100644
> --- a/drivers/gpu/drm/drm_vblank.c
> +++ b/drivers/gpu/drm/drm_vblank.c
> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
>  		u64 seq, ktime_t now)
>  {
>  	struct timespec64 tv;
> +	ktime_t alt_flip_time;
>  
>  	switch (e->event.base.type) {
> -	case DRM_EVENT_VBLANK:
>  	case DRM_EVENT_FLIP_COMPLETE:
> +		/*
> +		 * For flip completion events, override "now" time
> +		 * with alt_flip_time provided by the driver via
> +		 * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> +		 * if that time is later than given "now" vblank time.
> +		 */
> +		alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> +		if (alt_flip_time > now)
> +			now = alt_flip_time;
> +		/* Fallthrough */
> +	case DRM_EVENT_VBLANK:
>  		tv = ktime_to_timespec64(now);
>  		e->event.vbl.sequence = seq;
>  		/*
> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>  
>  		now = ktime_get();
>  	}
> +
>  	e->pipe = pipe;
>  	send_vblank_event(dev, e, seq, now);
>  }
>  EXPORT_SYMBOL(drm_crtc_send_vblank_event);
>  
> +/**
> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> + * @crtc: the source CRTC of the pageflip completion event
> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> + *
> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> + * by the pageflip event can never be earlier than the vblank timestamp of the
> + * vblank of flip completion, as that vblank timestamp defines the end of the
> + * shortest possible vblank duration. In case of a delayed flip completion
> + * inside the extended VRR front porch however, the end of vblank can be much
> + * later, so the driver must assign an estimated timestamp of that later end of
> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> + * set an alternate flip completion timestamp in case of late flip completions
> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> + * completion routine, with some constant duration of the back porch interval
> + * added, although more precise estimates may be possible on some hardware if
> + * the hardware provides some means of timestamping the true end of vblank.
> + *
> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> + * will use either the standard vblank timestamp, calculated for a minimum
> + * duration vblank, or the provided @flip_time if that time is later than the
> + * vblank timestamp, to get the best possible estimate of start of display of
> + * the new post-pageflip scanout buffer.
> + */
> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> +					 ktime_t flip_time)
> +{
> +	struct drm_device *dev = crtc->dev;
> +	struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> +
> +	vblank->alt_flip_time = flip_time;
> +}
> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> +
>  static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
>  {
>  	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> index 6ad9630d4f48..aacf44694ab6 100644
> --- a/include/drm/drm_vblank.h
> +++ b/include/drm/drm_vblank.h
> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
>  	 * @time: Vblank timestamp corresponding to @count.
>  	 */
>  	ktime_t time;
> +	/**
> +	 * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> +	 * a late pageflip completion in variable refresh rate mode. Pageflip
> +	 * events will carry the later one of @time and @alt_flip_time.
> +	 */
> +	ktime_t alt_flip_time;
>  
>  	/**
>  	 * @refcount: Number of users/waiters of the vblank interrupt. Only when
> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
>  u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
>  u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
>  				   ktime_t *vblanktime);
> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> +					 ktime_t flip_time);
>  void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>  			       struct drm_pending_vblank_event *e);
>  void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> -- 
> 2.17.1
> 
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Kazlauskas, Nicholas Feb. 11, 2019, 3:01 p.m. UTC | #2
On 2/11/19 3:35 AM, Daniel Vetter wrote:
> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
>> The pageflip completion timestamps transmitted to userspace
>> via pageflip completion events are supposed to describe the
>> time at which the first pixel of the new post-pageflip scanout
>> buffer leaves the video output of the gpu. This time is
>> identical to end of vblank, when active scanout starts.
>>
>> For a crtc in standard fixed refresh rate, the end of vblank
>> is identical to the vblank timestamps calculated by
>> drm_update_vblank_count() at each vblank interrupt, or each
>> vblank dis-/enable. Therefore pageflip events just carry
>> that vblank timestamp as their pageflip timestamp.
>>
>> For a crtc switched to variable refresh rate mode (vrr), the
>> pageflip completion timestamps are identical to the vblank
>> timestamps iff the pageflip was executed early in vblank,
>> before the minimum vblank duration elapsed. In this case
>> the time of display onset is identical to when the crtc
>> is running in fixed refresh rate.
>>
>> However, if a pageflip completes later in the vblank, inside
>> the "extended front porch" in vrr mode, then the vblank will
>> terminate at a fixed (back porch) duration after flip, so
>> the display onset time is delayed correspondingly. In this
>> case the vblank timestamp computed at vblank irq time would
>> be too early, and we need a way to calculate an estimated
>> pageflip timestamp that will be later than the vblank timestamp.
>>
>> How a driver determines such a "late flip" timestamp is hw
>> and driver specific, but this patch adds a new helper function
>> that allows the driver to propose such an alternate "late flip"
>> timestamp for use in pageflip events:
>>
>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>
>> When sending out pageflip events, we now compare that proposed
>> flip_timestamp against the vblank timestamp of the current
>> vblank of flip completion and choose to send out the greater/
>> later timestamp as flip completion timestamp.
>>
>> The most simple way for a kms driver to supply a suitable
>> flip_timestamp in vrr mode would be to simply take a timestamp
>> at start of the pageflip completion handler, e.g., pageflip
>> irq handler: flip_timestamp = ktime_get(); and then set that
>> as proposed "late" alternative timestamp via ...
>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>
>> More clever approaches could try to add some corrective offset
>> for fixed back porch duration, or ideally use hardware features
>> like hw timestamps to calculate the exact end time of vblank.
>>
>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>> Cc: Harry Wentland <harry.wentland@amd.com>
>> Cc: Alex Deucher <alexander.deucher@amd.com>
> 
> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> the right timestampe, once? With this I guess if you do a vblank query in
> between the wrong and the right vblank you'll get the bogus value. Not
> really great for userspace.
> -Daniel

I think we calculate the timestamp and send the vblank event both within 
the pageflip IRQ handler so calculating the right pageflip timestamp 
once could probably be done. I'm not sure if it's easier than proposing 
a later flip time with an API like this though.

The actual scanout time should be known from the page-flip handler so 
the semantics for VRR on/off remain the same. This is because the 
page-flip triggers entering the back porch if we're in the extended 
front porch.

But scanout time from vblank events for something like 
DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only 
treated as estimates. If we're in the regular front porch then the 
timing to scanout is based on the fixed duration front porch for the 
current mode. If we're in the extended back porch then it's technically 
driver defined but the most reasonable guess is to assume that the front 
porch is going to end at any moment, so just return the length of the 
back porch for getting the scanout time.

Proposing the late timestamp shouldn't affect vblank event in the 
DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip 
event case. I'm not sure if that's what's guaranteed to happen with this 
patch though. There doesn't seem to be any locking on either 
dev->vblank_time_lock or the vblank->seqlock so while it's likely to get 
the same vblank event back as the one just stored I don't think it's 
guaranteed.

Nicholas Kazlauskas

> 
>> ---
>>   drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
>>   include/drm/drm_vblank.h     |  8 ++++++
>>   2 files changed, 56 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
>> index 98e091175921..4b3a4c38fabe 100644
>> --- a/drivers/gpu/drm/drm_vblank.c
>> +++ b/drivers/gpu/drm/drm_vblank.c
>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
>>   		u64 seq, ktime_t now)
>>   {
>>   	struct timespec64 tv;
>> +	ktime_t alt_flip_time;
>>   
>>   	switch (e->event.base.type) {
>> -	case DRM_EVENT_VBLANK:
>>   	case DRM_EVENT_FLIP_COMPLETE:
>> +		/*
>> +		 * For flip completion events, override "now" time
>> +		 * with alt_flip_time provided by the driver via
>> +		 * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
>> +		 * if that time is later than given "now" vblank time.
>> +		 */
>> +		alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
>> +		if (alt_flip_time > now)
>> +			now = alt_flip_time;
>> +		/* Fallthrough */
>> +	case DRM_EVENT_VBLANK:
>>   		tv = ktime_to_timespec64(now);
>>   		e->event.vbl.sequence = seq;
>>   		/*
>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>   
>>   		now = ktime_get();
>>   	}
>> +
>>   	e->pipe = pipe;
>>   	send_vblank_event(dev, e, seq, now);
>>   }
>>   EXPORT_SYMBOL(drm_crtc_send_vblank_event);
>>   
>> +/**
>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
>> + * @crtc: the source CRTC of the pageflip completion event
>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
>> + *
>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
>> + * by the pageflip event can never be earlier than the vblank timestamp of the
>> + * vblank of flip completion, as that vblank timestamp defines the end of the
>> + * shortest possible vblank duration. In case of a delayed flip completion
>> + * inside the extended VRR front porch however, the end of vblank can be much
>> + * later, so the driver must assign an estimated timestamp of that later end of
>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
>> + * set an alternate flip completion timestamp in case of late flip completions
>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
>> + * completion routine, with some constant duration of the back porch interval
>> + * added, although more precise estimates may be possible on some hardware if
>> + * the hardware provides some means of timestamping the true end of vblank.
>> + *
>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
>> + * will use either the standard vblank timestamp, calculated for a minimum
>> + * duration vblank, or the provided @flip_time if that time is later than the
>> + * vblank timestamp, to get the best possible estimate of start of display of
>> + * the new post-pageflip scanout buffer.
>> + */
>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>> +					 ktime_t flip_time)
>> +{
>> +	struct drm_device *dev = crtc->dev;
>> +	struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
>> +
>> +	vblank->alt_flip_time = flip_time;
>> +}
>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
>> +
>>   static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
>>   {
>>   	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
>> index 6ad9630d4f48..aacf44694ab6 100644
>> --- a/include/drm/drm_vblank.h
>> +++ b/include/drm/drm_vblank.h
>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
>>   	 * @time: Vblank timestamp corresponding to @count.
>>   	 */
>>   	ktime_t time;
>> +	/**
>> +	 * @alt_flip_time: Vblank timestamp for end of extended vblank due to
>> +	 * a late pageflip completion in variable refresh rate mode. Pageflip
>> +	 * events will carry the later one of @time and @alt_flip_time.
>> +	 */
>> +	ktime_t alt_flip_time;
>>   
>>   	/**
>>   	 * @refcount: Number of users/waiters of the vblank interrupt. Only when
>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
>>   u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
>>   u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
>>   				   ktime_t *vblanktime);
>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>> +					 ktime_t flip_time);
>>   void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>   			       struct drm_pending_vblank_event *e);
>>   void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
>> -- 
>> 2.17.1
>>
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Daniel Vetter Feb. 11, 2019, 5:04 p.m. UTC | #3
On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
<Nicholas.Kazlauskas@amd.com> wrote:
>
> On 2/11/19 3:35 AM, Daniel Vetter wrote:
> > On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> >> The pageflip completion timestamps transmitted to userspace
> >> via pageflip completion events are supposed to describe the
> >> time at which the first pixel of the new post-pageflip scanout
> >> buffer leaves the video output of the gpu. This time is
> >> identical to end of vblank, when active scanout starts.
> >>
> >> For a crtc in standard fixed refresh rate, the end of vblank
> >> is identical to the vblank timestamps calculated by
> >> drm_update_vblank_count() at each vblank interrupt, or each
> >> vblank dis-/enable. Therefore pageflip events just carry
> >> that vblank timestamp as their pageflip timestamp.
> >>
> >> For a crtc switched to variable refresh rate mode (vrr), the
> >> pageflip completion timestamps are identical to the vblank
> >> timestamps iff the pageflip was executed early in vblank,
> >> before the minimum vblank duration elapsed. In this case
> >> the time of display onset is identical to when the crtc
> >> is running in fixed refresh rate.
> >>
> >> However, if a pageflip completes later in the vblank, inside
> >> the "extended front porch" in vrr mode, then the vblank will
> >> terminate at a fixed (back porch) duration after flip, so
> >> the display onset time is delayed correspondingly. In this
> >> case the vblank timestamp computed at vblank irq time would
> >> be too early, and we need a way to calculate an estimated
> >> pageflip timestamp that will be later than the vblank timestamp.
> >>
> >> How a driver determines such a "late flip" timestamp is hw
> >> and driver specific, but this patch adds a new helper function
> >> that allows the driver to propose such an alternate "late flip"
> >> timestamp for use in pageflip events:
> >>
> >> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>
> >> When sending out pageflip events, we now compare that proposed
> >> flip_timestamp against the vblank timestamp of the current
> >> vblank of flip completion and choose to send out the greater/
> >> later timestamp as flip completion timestamp.
> >>
> >> The most simple way for a kms driver to supply a suitable
> >> flip_timestamp in vrr mode would be to simply take a timestamp
> >> at start of the pageflip completion handler, e.g., pageflip
> >> irq handler: flip_timestamp = ktime_get(); and then set that
> >> as proposed "late" alternative timestamp via ...
> >> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>
> >> More clever approaches could try to add some corrective offset
> >> for fixed back porch duration, or ideally use hardware features
> >> like hw timestamps to calculate the exact end time of vblank.
> >>
> >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> >> Cc: Harry Wentland <harry.wentland@amd.com>
> >> Cc: Alex Deucher <alexander.deucher@amd.com>
> >
> > Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> > the right timestampe, once? With this I guess if you do a vblank query in
> > between the wrong and the right vblank you'll get the bogus value. Not
> > really great for userspace.
> > -Daniel
>
> I think we calculate the timestamp and send the vblank event both within
> the pageflip IRQ handler so calculating the right pageflip timestamp
> once could probably be done. I'm not sure if it's easier than proposing
> a later flip time with an API like this though.
>
> The actual scanout time should be known from the page-flip handler so
> the semantics for VRR on/off remain the same. This is because the
> page-flip triggers entering the back porch if we're in the extended
> front porch.
>
> But scanout time from vblank events for something like
> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> treated as estimates. If we're in the regular front porch then the
> timing to scanout is based on the fixed duration front porch for the
> current mode. If we're in the extended back porch then it's technically
> driver defined but the most reasonable guess is to assume that the front
> porch is going to end at any moment, so just return the length of the
> back porch for getting the scanout time.
>
> Proposing the late timestamp shouldn't affect vblank event in the
> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> event case. I'm not sure if that's what's guaranteed to happen with this
> patch though. There doesn't seem to be any locking on either
> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> the same vblank event back as the one just stored I don't think it's
> guaranteed.

That's the inconsistency I mean to highlight - the timestamp for the
same frame as observed through flip complete and through the
wait_vblank ioctl can differ. Which they really shouldn't.

Now added complication is that amdgpu sends out vblank events really
early, which is used by userspace to do frontbuffer rendering in the
vblank time. But I don't think anyone wants to do both VRR and
frontbuffer rendering, hence I think it should be possible to create
correct vblank timestamps for the VRR case, while leaving the current
logic in place for everything else. But that means moving the entire
vblank processing to where you know the next frame's start time for
VRR, both for page flips and for all other vblank events.
-Daniel

>
> Nicholas Kazlauskas
>
> >
> >> ---
> >>   drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> >>   include/drm/drm_vblank.h     |  8 ++++++
> >>   2 files changed, 56 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> >> index 98e091175921..4b3a4c38fabe 100644
> >> --- a/drivers/gpu/drm/drm_vblank.c
> >> +++ b/drivers/gpu/drm/drm_vblank.c
> >> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> >>              u64 seq, ktime_t now)
> >>   {
> >>      struct timespec64 tv;
> >> +    ktime_t alt_flip_time;
> >>
> >>      switch (e->event.base.type) {
> >> -    case DRM_EVENT_VBLANK:
> >>      case DRM_EVENT_FLIP_COMPLETE:
> >> +            /*
> >> +             * For flip completion events, override "now" time
> >> +             * with alt_flip_time provided by the driver via
> >> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> >> +             * if that time is later than given "now" vblank time.
> >> +             */
> >> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> >> +            if (alt_flip_time > now)
> >> +                    now = alt_flip_time;
> >> +            /* Fallthrough */
> >> +    case DRM_EVENT_VBLANK:
> >>              tv = ktime_to_timespec64(now);
> >>              e->event.vbl.sequence = seq;
> >>              /*
> >> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>
> >>              now = ktime_get();
> >>      }
> >> +
> >>      e->pipe = pipe;
> >>      send_vblank_event(dev, e, seq, now);
> >>   }
> >>   EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> >>
> >> +/**
> >> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> >> + * @crtc: the source CRTC of the pageflip completion event
> >> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> >> + *
> >> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> >> + * by the pageflip event can never be earlier than the vblank timestamp of the
> >> + * vblank of flip completion, as that vblank timestamp defines the end of the
> >> + * shortest possible vblank duration. In case of a delayed flip completion
> >> + * inside the extended VRR front porch however, the end of vblank can be much
> >> + * later, so the driver must assign an estimated timestamp of that later end of
> >> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> >> + * set an alternate flip completion timestamp in case of late flip completions
> >> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> >> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> >> + * completion routine, with some constant duration of the back porch interval
> >> + * added, although more precise estimates may be possible on some hardware if
> >> + * the hardware provides some means of timestamping the true end of vblank.
> >> + *
> >> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> >> + * will use either the standard vblank timestamp, calculated for a minimum
> >> + * duration vblank, or the provided @flip_time if that time is later than the
> >> + * vblank timestamp, to get the best possible estimate of start of display of
> >> + * the new post-pageflip scanout buffer.
> >> + */
> >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >> +                                     ktime_t flip_time)
> >> +{
> >> +    struct drm_device *dev = crtc->dev;
> >> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> >> +
> >> +    vblank->alt_flip_time = flip_time;
> >> +}
> >> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> >> +
> >>   static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> >>   {
> >>      if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> >> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> >> index 6ad9630d4f48..aacf44694ab6 100644
> >> --- a/include/drm/drm_vblank.h
> >> +++ b/include/drm/drm_vblank.h
> >> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> >>       * @time: Vblank timestamp corresponding to @count.
> >>       */
> >>      ktime_t time;
> >> +    /**
> >> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> >> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> >> +     * events will carry the later one of @time and @alt_flip_time.
> >> +     */
> >> +    ktime_t alt_flip_time;
> >>
> >>      /**
> >>       * @refcount: Number of users/waiters of the vblank interrupt. Only when
> >> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> >>   u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> >>   u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> >>                                 ktime_t *vblanktime);
> >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >> +                                     ktime_t flip_time);
> >>   void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>                             struct drm_pending_vblank_event *e);
> >>   void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> >> --
> >> 2.17.1
> >>
> >> _______________________________________________
> >> dri-devel mailing list
> >> dri-devel@lists.freedesktop.org
> >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
>
kernel test robot via dri-devel Feb. 12, 2019, 9:32 p.m. UTC | #4
On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> <Nicholas.Kazlauskas@amd.com> wrote:
> >
> > On 2/11/19 3:35 AM, Daniel Vetter wrote:
> > > On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> > >> The pageflip completion timestamps transmitted to userspace
> > >> via pageflip completion events are supposed to describe the
> > >> time at which the first pixel of the new post-pageflip scanout
> > >> buffer leaves the video output of the gpu. This time is
> > >> identical to end of vblank, when active scanout starts.
> > >>
> > >> For a crtc in standard fixed refresh rate, the end of vblank
> > >> is identical to the vblank timestamps calculated by
> > >> drm_update_vblank_count() at each vblank interrupt, or each
> > >> vblank dis-/enable. Therefore pageflip events just carry
> > >> that vblank timestamp as their pageflip timestamp.
> > >>
> > >> For a crtc switched to variable refresh rate mode (vrr), the
> > >> pageflip completion timestamps are identical to the vblank
> > >> timestamps iff the pageflip was executed early in vblank,
> > >> before the minimum vblank duration elapsed. In this case
> > >> the time of display onset is identical to when the crtc
> > >> is running in fixed refresh rate.
> > >>
> > >> However, if a pageflip completes later in the vblank, inside
> > >> the "extended front porch" in vrr mode, then the vblank will
> > >> terminate at a fixed (back porch) duration after flip, so
> > >> the display onset time is delayed correspondingly. In this
> > >> case the vblank timestamp computed at vblank irq time would
> > >> be too early, and we need a way to calculate an estimated
> > >> pageflip timestamp that will be later than the vblank timestamp.
> > >>
> > >> How a driver determines such a "late flip" timestamp is hw
> > >> and driver specific, but this patch adds a new helper function
> > >> that allows the driver to propose such an alternate "late flip"
> > >> timestamp for use in pageflip events:
> > >>
> > >> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > >>
> > >> When sending out pageflip events, we now compare that proposed
> > >> flip_timestamp against the vblank timestamp of the current
> > >> vblank of flip completion and choose to send out the greater/
> > >> later timestamp as flip completion timestamp.
> > >>
> > >> The most simple way for a kms driver to supply a suitable
> > >> flip_timestamp in vrr mode would be to simply take a timestamp
> > >> at start of the pageflip completion handler, e.g., pageflip
> > >> irq handler: flip_timestamp = ktime_get(); and then set that
> > >> as proposed "late" alternative timestamp via ...
> > >> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > >>
> > >> More clever approaches could try to add some corrective offset
> > >> for fixed back porch duration, or ideally use hardware features
> > >> like hw timestamps to calculate the exact end time of vblank.
> > >>
> > >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > >> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > >> Cc: Harry Wentland <harry.wentland@amd.com>
> > >> Cc: Alex Deucher <alexander.deucher@amd.com>
> > >
> > > Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> > > the right timestampe, once? With this I guess if you do a vblank query in
> > > between the wrong and the right vblank you'll get the bogus value. Not
> > > really great for userspace.
> > > -Daniel
> >
> > I think we calculate the timestamp and send the vblank event both within
> > the pageflip IRQ handler so calculating the right pageflip timestamp
> > once could probably be done. I'm not sure if it's easier than proposing
> > a later flip time with an API like this though.
> >
> > The actual scanout time should be known from the page-flip handler so
> > the semantics for VRR on/off remain the same. This is because the
> > page-flip triggers entering the back porch if we're in the extended
> > front porch.
> >
> > But scanout time from vblank events for something like
> > DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> > treated as estimates. If we're in the regular front porch then the
> > timing to scanout is based on the fixed duration front porch for the
> > current mode. If we're in the extended back porch then it's technically
> > driver defined but the most reasonable guess is to assume that the front
> > porch is going to end at any moment, so just return the length of the
> > back porch for getting the scanout time.
> >
> > Proposing the late timestamp shouldn't affect vblank event in the
> > DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> > event case. I'm not sure if that's what's guaranteed to happen with this
> > patch though. There doesn't seem to be any locking on either
> > dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> > the same vblank event back as the one just stored I don't think it's
> > guaranteed.
>
> That's the inconsistency I mean to highlight - the timestamp for the
> same frame as observed through flip complete and through the
> wait_vblank ioctl can differ. Which they really shouldn't.
>

Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
that vblank and pageflip timestamps should always match. But then the
kernel docs for "Variable refresh properties" in drm_connector.c for
vblank timestamps were changed for the VRR implementation in Linux
5.0-rc to redefine them when in VRR mode. They are defined, but
probably rather useless for any practical purpose, like this:

"The semantics for the vertical blank timestamp differ when
variable refresh rate is active. The vertical blank timestamp
is defined to be an estimate using the current mode's fixed
refresh rate timings. The semantics for the page-flip event
timestamp remain the same."

So our docs contradict each other as of Linux 5.0-rc. Certainly having
useful vblank timetamps would be useful.

> Now added complication is that amdgpu sends out vblank events really
> early, which is used by userspace to do frontbuffer rendering in the
> vblank time. But I don't think anyone wants to do both VRR and

I think all kms drivers try to call drm_crtc_handle_vblank() at start
of vblank to give Mesa the most time for frontbuffer rendering for
classic X. But vblank events are also used for scheduling bufferswaps
or other stuff for redirected windowed rendering, or via api's like
OML_sync_controls glXWaitForMscOML, so there might be other things
affected by a more delayed vblank handling.

> frontbuffer rendering, hence I think it should be possible to create
> correct vblank timestamps for the VRR case, while leaving the current
> logic in place for everything else. But that means moving the entire
> vblank processing to where you know the next frame's start time for
> VRR, both for page flips and for all other vblank events.
> -Daniel
>

I think for amdgpu it would be doable by calling drm_handle_vblank
from the pageflip irq handler in VRR mode, and then additionally from
the vblank interrupt handler like now. Vblank irq's are triggered via
vline interrupts with programmable vertical trigger line position, so
we could program the trigger line to be start of back-porch instead of
start of front-porch. In the pageflip case we do vblank handling +
timestamping + pflip completion from pflip irq, and have the regular
back-porch vblank handling for refresh cycles in which no
pageflip/pflip irq happened.

Not sure if/how that would mess with below-vrr-refresh-rate-range
support for stutter reduction at low fps though? Or with performance,
given vblank event dispatch could be delayed a lot in VRR compared to
normal mode? Or with all the vblank accounting during modesets or crtc
en/disable?

Also, could other drivers like Intel easily do this delayed vblank
processing in the non-pageflip case?

From my user perspective as a developer of a neuroscience research
application that has a couple of exciting/important use cases for VRR
mode, i absolutely do need trustworthy and precise pageflip event
timestamps that represent reality, otherwise VRR mode would be less
than useless for me. I can do without meaningful vblank timestamps in
VRR mode though, and expected to do without them, as any scheduling
has to happen time-based instead of based on vblank counts anyway,
given that vblank counts are quite meaningless if every frame can have
a different duration. I'd expect that situation to be similar for
other apps that would want to use timestamps in VRR mode like video
games, movie players or VR/AR applications.

-mario

> >
> > Nicholas Kazlauskas
> >
> > >
> > >> ---
> > >>   drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> > >>   include/drm/drm_vblank.h     |  8 ++++++
> > >>   2 files changed, 56 insertions(+), 1 deletion(-)
> > >>
> > >> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> > >> index 98e091175921..4b3a4c38fabe 100644
> > >> --- a/drivers/gpu/drm/drm_vblank.c
> > >> +++ b/drivers/gpu/drm/drm_vblank.c
> > >> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> > >>              u64 seq, ktime_t now)
> > >>   {
> > >>      struct timespec64 tv;
> > >> +    ktime_t alt_flip_time;
> > >>
> > >>      switch (e->event.base.type) {
> > >> -    case DRM_EVENT_VBLANK:
> > >>      case DRM_EVENT_FLIP_COMPLETE:
> > >> +            /*
> > >> +             * For flip completion events, override "now" time
> > >> +             * with alt_flip_time provided by the driver via
> > >> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> > >> +             * if that time is later than given "now" vblank time.
> > >> +             */
> > >> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> > >> +            if (alt_flip_time > now)
> > >> +                    now = alt_flip_time;
> > >> +            /* Fallthrough */
> > >> +    case DRM_EVENT_VBLANK:
> > >>              tv = ktime_to_timespec64(now);
> > >>              e->event.vbl.sequence = seq;
> > >>              /*
> > >> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > >>
> > >>              now = ktime_get();
> > >>      }
> > >> +
> > >>      e->pipe = pipe;
> > >>      send_vblank_event(dev, e, seq, now);
> > >>   }
> > >>   EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> > >>
> > >> +/**
> > >> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> > >> + * @crtc: the source CRTC of the pageflip completion event
> > >> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> > >> + *
> > >> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> > >> + * by the pageflip event can never be earlier than the vblank timestamp of the
> > >> + * vblank of flip completion, as that vblank timestamp defines the end of the
> > >> + * shortest possible vblank duration. In case of a delayed flip completion
> > >> + * inside the extended VRR front porch however, the end of vblank can be much
> > >> + * later, so the driver must assign an estimated timestamp of that later end of
> > >> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> > >> + * set an alternate flip completion timestamp in case of late flip completions
> > >> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> > >> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> > >> + * completion routine, with some constant duration of the back porch interval
> > >> + * added, although more precise estimates may be possible on some hardware if
> > >> + * the hardware provides some means of timestamping the true end of vblank.
> > >> + *
> > >> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> > >> + * will use either the standard vblank timestamp, calculated for a minimum
> > >> + * duration vblank, or the provided @flip_time if that time is later than the
> > >> + * vblank timestamp, to get the best possible estimate of start of display of
> > >> + * the new post-pageflip scanout buffer.
> > >> + */
> > >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > >> +                                     ktime_t flip_time)
> > >> +{
> > >> +    struct drm_device *dev = crtc->dev;
> > >> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> > >> +
> > >> +    vblank->alt_flip_time = flip_time;
> > >> +}
> > >> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> > >> +
> > >>   static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> > >>   {
> > >>      if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> > >> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> > >> index 6ad9630d4f48..aacf44694ab6 100644
> > >> --- a/include/drm/drm_vblank.h
> > >> +++ b/include/drm/drm_vblank.h
> > >> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> > >>       * @time: Vblank timestamp corresponding to @count.
> > >>       */
> > >>      ktime_t time;
> > >> +    /**
> > >> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> > >> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> > >> +     * events will carry the later one of @time and @alt_flip_time.
> > >> +     */
> > >> +    ktime_t alt_flip_time;
> > >>
> > >>      /**
> > >>       * @refcount: Number of users/waiters of the vblank interrupt. Only when
> > >> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> > >>   u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> > >>   u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> > >>                                 ktime_t *vblanktime);
> > >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > >> +                                     ktime_t flip_time);
> > >>   void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > >>                             struct drm_pending_vblank_event *e);
> > >>   void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> > >> --
> > >> 2.17.1
> > >>
> > >> _______________________________________________
> > >> dri-devel mailing list
> > >> dri-devel@lists.freedesktop.org
> > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Feb. 13, 2019, 9:50 a.m. UTC | #5
On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> > <Nicholas.Kazlauskas@amd.com> wrote:
> > >
> > > On 2/11/19 3:35 AM, Daniel Vetter wrote:
> > > > On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> > > >> The pageflip completion timestamps transmitted to userspace
> > > >> via pageflip completion events are supposed to describe the
> > > >> time at which the first pixel of the new post-pageflip scanout
> > > >> buffer leaves the video output of the gpu. This time is
> > > >> identical to end of vblank, when active scanout starts.
> > > >>
> > > >> For a crtc in standard fixed refresh rate, the end of vblank
> > > >> is identical to the vblank timestamps calculated by
> > > >> drm_update_vblank_count() at each vblank interrupt, or each
> > > >> vblank dis-/enable. Therefore pageflip events just carry
> > > >> that vblank timestamp as their pageflip timestamp.
> > > >>
> > > >> For a crtc switched to variable refresh rate mode (vrr), the
> > > >> pageflip completion timestamps are identical to the vblank
> > > >> timestamps iff the pageflip was executed early in vblank,
> > > >> before the minimum vblank duration elapsed. In this case
> > > >> the time of display onset is identical to when the crtc
> > > >> is running in fixed refresh rate.
> > > >>
> > > >> However, if a pageflip completes later in the vblank, inside
> > > >> the "extended front porch" in vrr mode, then the vblank will
> > > >> terminate at a fixed (back porch) duration after flip, so
> > > >> the display onset time is delayed correspondingly. In this
> > > >> case the vblank timestamp computed at vblank irq time would
> > > >> be too early, and we need a way to calculate an estimated
> > > >> pageflip timestamp that will be later than the vblank timestamp.
> > > >>
> > > >> How a driver determines such a "late flip" timestamp is hw
> > > >> and driver specific, but this patch adds a new helper function
> > > >> that allows the driver to propose such an alternate "late flip"
> > > >> timestamp for use in pageflip events:
> > > >>
> > > >> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > > >>
> > > >> When sending out pageflip events, we now compare that proposed
> > > >> flip_timestamp against the vblank timestamp of the current
> > > >> vblank of flip completion and choose to send out the greater/
> > > >> later timestamp as flip completion timestamp.
> > > >>
> > > >> The most simple way for a kms driver to supply a suitable
> > > >> flip_timestamp in vrr mode would be to simply take a timestamp
> > > >> at start of the pageflip completion handler, e.g., pageflip
> > > >> irq handler: flip_timestamp = ktime_get(); and then set that
> > > >> as proposed "late" alternative timestamp via ...
> > > >> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > > >>
> > > >> More clever approaches could try to add some corrective offset
> > > >> for fixed back porch duration, or ideally use hardware features
> > > >> like hw timestamps to calculate the exact end time of vblank.
> > > >>
> > > >> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > > >> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > >> Cc: Harry Wentland <harry.wentland@amd.com>
> > > >> Cc: Alex Deucher <alexander.deucher@amd.com>
> > > >
> > > > Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> > > > the right timestampe, once? With this I guess if you do a vblank query in
> > > > between the wrong and the right vblank you'll get the bogus value. Not
> > > > really great for userspace.
> > > > -Daniel
> > >
> > > I think we calculate the timestamp and send the vblank event both within
> > > the pageflip IRQ handler so calculating the right pageflip timestamp
> > > once could probably be done. I'm not sure if it's easier than proposing
> > > a later flip time with an API like this though.
> > >
> > > The actual scanout time should be known from the page-flip handler so
> > > the semantics for VRR on/off remain the same. This is because the
> > > page-flip triggers entering the back porch if we're in the extended
> > > front porch.
> > >
> > > But scanout time from vblank events for something like
> > > DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> > > treated as estimates. If we're in the regular front porch then the
> > > timing to scanout is based on the fixed duration front porch for the
> > > current mode. If we're in the extended back porch then it's technically
> > > driver defined but the most reasonable guess is to assume that the front
> > > porch is going to end at any moment, so just return the length of the
> > > back porch for getting the scanout time.
> > >
> > > Proposing the late timestamp shouldn't affect vblank event in the
> > > DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> > > event case. I'm not sure if that's what's guaranteed to happen with this
> > > patch though. There doesn't seem to be any locking on either
> > > dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> > > the same vblank event back as the one just stored I don't think it's
> > > guaranteed.
> >
> > That's the inconsistency I mean to highlight - the timestamp for the
> > same frame as observed through flip complete and through the
> > wait_vblank ioctl can differ. Which they really shouldn't.
> >
> 
> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
> that vblank and pageflip timestamps should always match. But then the
> kernel docs for "Variable refresh properties" in drm_connector.c for
> vblank timestamps were changed for the VRR implementation in Linux
> 5.0-rc to redefine them when in VRR mode. They are defined, but
> probably rather useless for any practical purpose, like this:
> 
> "The semantics for the vertical blank timestamp differ when
> variable refresh rate is active. The vertical blank timestamp
> is defined to be an estimate using the current mode's fixed
> refresh rate timings. The semantics for the page-flip event
> timestamp remain the same."

Uh I missed that. That sounds like nonsense tbh.

> So our docs contradict each other as of Linux 5.0-rc. Certainly having
> useful vblank timetamps would be useful.

Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
hilarity will ensue.

> > Now added complication is that amdgpu sends out vblank events really
> > early, which is used by userspace to do frontbuffer rendering in the
> > vblank time. But I don't think anyone wants to do both VRR and
> 
> I think all kms drivers try to call drm_crtc_handle_vblank() at start
> of vblank to give Mesa the most time for frontbuffer rendering for
> classic X. But vblank events are also used for scheduling bufferswaps
> or other stuff for redirected windowed rendering, or via api's like
> OML_sync_controls glXWaitForMscOML, so there might be other things
> affected by a more delayed vblank handling.

The frontbuffer rendering is very much X driver specific, and I think
-amdgpu/radeon is the only one that requires this. No i915 driver ever
used the vblank interrupt to schedule frontbuffer blits, we use some
CS-side stalls.

Wrt scheduling pageflips: The rule is that right after you've received the
vblank for frame X, then an immediately schedule pageflip should hit X+1,
but not X. amdgpu had this broken, but it's fixed since a while.

> > frontbuffer rendering, hence I think it should be possible to create
> > correct vblank timestamps for the VRR case, while leaving the current
> > logic in place for everything else. But that means moving the entire
> > vblank processing to where you know the next frame's start time for
> > VRR, both for page flips and for all other vblank events.
> > -Daniel
> >
> 
> I think for amdgpu it would be doable by calling drm_handle_vblank
> from the pageflip irq handler in VRR mode, and then additionally from
> the vblank interrupt handler like now. Vblank irq's are triggered via
> vline interrupts with programmable vertical trigger line position, so
> we could program the trigger line to be start of back-porch instead of
> start of front-porch. In the pageflip case we do vblank handling +
> timestamping + pflip completion from pflip irq, and have the regular
> back-porch vblank handling for refresh cycles in which no
> pageflip/pflip irq happened.

Hm, can't we simply program the vblank to occur in the back porch region
for VRR? The vblank timestamping should automatically correct the
timestamp in that case. Would require a lot less code changes.

Plus update the documentation ofc.

> Not sure if/how that would mess with below-vrr-refresh-rate-range
> support for stutter reduction at low fps though? Or with performance,
> given vblank event dispatch could be delayed a lot in VRR compared to
> normal mode? Or with all the vblank accounting during modesets or crtc
> en/disable?

Currently VRR and explicit frame scheduling don't mix. We've discussed
this, and consens seems to be that we need a new property for page_flip,
similar to the target frame, but instead of frames a timestamp. Then the
driver can hit a specific time for the next frame.

And since compositors use the vblank ioctl just to do frame scheduling, I
don't think we have to worry hugely about interactions there. Much better
to aim for clean semantics (i.e. vblank and flip complete timestamps
better match).

> Also, could other drivers like Intel easily do this delayed vblank
> processing in the non-pageflip case?

Intel's problem :-) But yeah we have page flip timestamp registers, so
worst case all we need to do is shuffle code around a bit. And we have
interrupts for both start of vblank and and of vblank. It might be a bit
more work because we use the vblank interrupt to avoid races in atomic
updates, but that's not too terrible to fix.

> From my user perspective as a developer of a neuroscience research
> application that has a couple of exciting/important use cases for VRR
> mode, i absolutely do need trustworthy and precise pageflip event
> timestamps that represent reality, otherwise VRR mode would be less
> than useless for me. I can do without meaningful vblank timestamps in
> VRR mode though, and expected to do without them, as any scheduling
> has to happen time-based instead of based on vblank counts anyway,
> given that vblank counts are quite meaningless if every frame can have
> a different duration. I'd expect that situation to be similar for
> other apps that would want to use timestamps in VRR mode like video
> games, movie players or VR/AR applications.

Yeah for that you want to schedule your frames with a target frame number.
Would still be nice if vblank timestamps wouldn't be an outright lie (and
the timestamp is still useful information, even if the frame counter
isn't - many displays have min/max frame rates for VRR, which is
information we could perhaps expose).
-Daniel

> 
> -mario
> 
> > >
> > > Nicholas Kazlauskas
> > >
> > > >
> > > >> ---
> > > >>   drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> > > >>   include/drm/drm_vblank.h     |  8 ++++++
> > > >>   2 files changed, 56 insertions(+), 1 deletion(-)
> > > >>
> > > >> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> > > >> index 98e091175921..4b3a4c38fabe 100644
> > > >> --- a/drivers/gpu/drm/drm_vblank.c
> > > >> +++ b/drivers/gpu/drm/drm_vblank.c
> > > >> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> > > >>              u64 seq, ktime_t now)
> > > >>   {
> > > >>      struct timespec64 tv;
> > > >> +    ktime_t alt_flip_time;
> > > >>
> > > >>      switch (e->event.base.type) {
> > > >> -    case DRM_EVENT_VBLANK:
> > > >>      case DRM_EVENT_FLIP_COMPLETE:
> > > >> +            /*
> > > >> +             * For flip completion events, override "now" time
> > > >> +             * with alt_flip_time provided by the driver via
> > > >> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> > > >> +             * if that time is later than given "now" vblank time.
> > > >> +             */
> > > >> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> > > >> +            if (alt_flip_time > now)
> > > >> +                    now = alt_flip_time;
> > > >> +            /* Fallthrough */
> > > >> +    case DRM_EVENT_VBLANK:
> > > >>              tv = ktime_to_timespec64(now);
> > > >>              e->event.vbl.sequence = seq;
> > > >>              /*
> > > >> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > > >>
> > > >>              now = ktime_get();
> > > >>      }
> > > >> +
> > > >>      e->pipe = pipe;
> > > >>      send_vblank_event(dev, e, seq, now);
> > > >>   }
> > > >>   EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> > > >>
> > > >> +/**
> > > >> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> > > >> + * @crtc: the source CRTC of the pageflip completion event
> > > >> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> > > >> + *
> > > >> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> > > >> + * by the pageflip event can never be earlier than the vblank timestamp of the
> > > >> + * vblank of flip completion, as that vblank timestamp defines the end of the
> > > >> + * shortest possible vblank duration. In case of a delayed flip completion
> > > >> + * inside the extended VRR front porch however, the end of vblank can be much
> > > >> + * later, so the driver must assign an estimated timestamp of that later end of
> > > >> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> > > >> + * set an alternate flip completion timestamp in case of late flip completions
> > > >> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> > > >> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> > > >> + * completion routine, with some constant duration of the back porch interval
> > > >> + * added, although more precise estimates may be possible on some hardware if
> > > >> + * the hardware provides some means of timestamping the true end of vblank.
> > > >> + *
> > > >> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> > > >> + * will use either the standard vblank timestamp, calculated for a minimum
> > > >> + * duration vblank, or the provided @flip_time if that time is later than the
> > > >> + * vblank timestamp, to get the best possible estimate of start of display of
> > > >> + * the new post-pageflip scanout buffer.
> > > >> + */
> > > >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > > >> +                                     ktime_t flip_time)
> > > >> +{
> > > >> +    struct drm_device *dev = crtc->dev;
> > > >> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> > > >> +
> > > >> +    vblank->alt_flip_time = flip_time;
> > > >> +}
> > > >> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> > > >> +
> > > >>   static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> > > >>   {
> > > >>      if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> > > >> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> > > >> index 6ad9630d4f48..aacf44694ab6 100644
> > > >> --- a/include/drm/drm_vblank.h
> > > >> +++ b/include/drm/drm_vblank.h
> > > >> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> > > >>       * @time: Vblank timestamp corresponding to @count.
> > > >>       */
> > > >>      ktime_t time;
> > > >> +    /**
> > > >> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> > > >> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> > > >> +     * events will carry the later one of @time and @alt_flip_time.
> > > >> +     */
> > > >> +    ktime_t alt_flip_time;
> > > >>
> > > >>      /**
> > > >>       * @refcount: Number of users/waiters of the vblank interrupt. Only when
> > > >> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> > > >>   u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> > > >>   u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> > > >>                                 ktime_t *vblanktime);
> > > >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > > >> +                                     ktime_t flip_time);
> > > >>   void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > > >>                             struct drm_pending_vblank_event *e);
> > > >>   void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> > > >> --
> > > >> 2.17.1
> > > >>
> > > >> _______________________________________________
> > > >> dri-devel mailing list
> > > >> dri-devel@lists.freedesktop.org
> > > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > >
> > >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Chris Wilson Feb. 13, 2019, 9:55 a.m. UTC | #6
Quoting Daniel Vetter (2019-02-13 09:50:55)
> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> > I think all kms drivers try to call drm_crtc_handle_vblank() at start
> > of vblank to give Mesa the most time for frontbuffer rendering for
> > classic X. But vblank events are also used for scheduling bufferswaps
> > or other stuff for redirected windowed rendering, or via api's like
> > OML_sync_controls glXWaitForMscOML, so there might be other things
> > affected by a more delayed vblank handling.
> 
> The frontbuffer rendering is very much X driver specific, and I think
> -amdgpu/radeon is the only one that requires this. No i915 driver ever
> used the vblank interrupt to schedule frontbuffer blits, we use some
> CS-side stalls.

Fwiw, the Present midlayer does use vblank scheduling for inplace copy
updates. Not that I wish to encourage anyone to use frontbuffer
rendering.
-Chris
kernel test robot via dri-devel Feb. 13, 2019, 11:05 a.m. UTC | #7
On Wed, Feb 13, 2019 at 10:56 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Daniel Vetter (2019-02-13 09:50:55)
> > On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> > > I think all kms drivers try to call drm_crtc_handle_vblank() at start
> > > of vblank to give Mesa the most time for frontbuffer rendering for
> > > classic X. But vblank events are also used for scheduling bufferswaps
> > > or other stuff for redirected windowed rendering, or via api's like
> > > OML_sync_controls glXWaitForMscOML, so there might be other things
> > > affected by a more delayed vblank handling.
> >
> > The frontbuffer rendering is very much X driver specific, and I think
> > -amdgpu/radeon is the only one that requires this. No i915 driver ever
> > used the vblank interrupt to schedule frontbuffer blits, we use some
> > CS-side stalls.
>
> Fwiw, the Present midlayer does use vblank scheduling for inplace copy
> updates. Not that I wish to encourage anyone to use frontbuffer
> rendering.
> -Chris

Yes, that's what i meant. Under DRI2 at least AMD, Intel and nouveau
have throttling based on CS stalls to avoid tearing and do throttling.
DRI3/Present last time i checked just waited for a vblank event and
then triggered the blit - something that causes tearing even when
triggered at start of vblank.

-mario
kernel test robot via dri-devel Feb. 13, 2019, 11:22 a.m. UTC | #8
On Wed, Feb 13, 2019 at 10:50 AM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> > On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> > > <Nicholas.Kazlauskas@amd.com> wrote:
> > > >
> > > > On 2/11/19 3:35 AM, Daniel Vetter wrote:
> > > > > On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> > > > >> The pageflip completion timestamps transmitted to userspace
...
> > > > >
> > > > > Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> > > > > the right timestampe, once? With this I guess if you do a vblank query in
> > > > > between the wrong and the right vblank you'll get the bogus value. Not
> > > > > really great for userspace.
> > > > > -Daniel
> > > >
> > > > I think we calculate the timestamp and send the vblank event both within
> > > > the pageflip IRQ handler so calculating the right pageflip timestamp
> > > > once could probably be done. I'm not sure if it's easier than proposing
> > > > a later flip time with an API like this though.
> > > >
> > > > The actual scanout time should be known from the page-flip handler so
> > > > the semantics for VRR on/off remain the same. This is because the
> > > > page-flip triggers entering the back porch if we're in the extended
> > > > front porch.
> > > >
> > > > But scanout time from vblank events for something like
> > > > DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> > > > treated as estimates. If we're in the regular front porch then the
> > > > timing to scanout is based on the fixed duration front porch for the
> > > > current mode. If we're in the extended back porch then it's technically
> > > > driver defined but the most reasonable guess is to assume that the front
> > > > porch is going to end at any moment, so just return the length of the
> > > > back porch for getting the scanout time.
> > > >
> > > > Proposing the late timestamp shouldn't affect vblank event in the
> > > > DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> > > > event case. I'm not sure if that's what's guaranteed to happen with this
> > > > patch though. There doesn't seem to be any locking on either
> > > > dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> > > > the same vblank event back as the one just stored I don't think it's
> > > > guaranteed.
> > >
> > > That's the inconsistency I mean to highlight - the timestamp for the
> > > same frame as observed through flip complete and through the
> > > wait_vblank ioctl can differ. Which they really shouldn't.
> > >
> >
> > Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
> > that vblank and pageflip timestamps should always match. But then the
> > kernel docs for "Variable refresh properties" in drm_connector.c for
> > vblank timestamps were changed for the VRR implementation in Linux
> > 5.0-rc to redefine them when in VRR mode. They are defined, but
> > probably rather useless for any practical purpose, like this:
> >
> > "The semantics for the vertical blank timestamp differ when
> > variable refresh rate is active. The vertical blank timestamp
> > is defined to be an estimate using the current mode's fixed
> > refresh rate timings. The semantics for the page-flip event
> > timestamp remain the same."
>
> Uh I missed that. That sounds like nonsense tbh.
>
> > So our docs contradict each other as of Linux 5.0-rc. Certainly having
> > useful vblank timetamps would be useful.
>
> Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
> hilarity will ensue.
>
> > > Now added complication is that amdgpu sends out vblank events really
> > > early, which is used by userspace to do frontbuffer rendering in the
> > > vblank time. But I don't think anyone wants to do both VRR and
> >
> > I think all kms drivers try to call drm_crtc_handle_vblank() at start
> > of vblank to give Mesa the most time for frontbuffer rendering for
> > classic X. But vblank events are also used for scheduling bufferswaps
> > or other stuff for redirected windowed rendering, or via api's like
> > OML_sync_controls glXWaitForMscOML, so there might be other things
> > affected by a more delayed vblank handling.
>
> The frontbuffer rendering is very much X driver specific, and I think
> -amdgpu/radeon is the only one that requires this. No i915 driver ever
> used the vblank interrupt to schedule frontbuffer blits, we use some
> CS-side stalls.
>
> Wrt scheduling pageflips: The rule is that right after you've received the
> vblank for frame X, then an immediately schedule pageflip should hit X+1,
> but not X. amdgpu had this broken, but it's fixed since a while.
>
> > > frontbuffer rendering, hence I think it should be possible to create
> > > correct vblank timestamps for the VRR case, while leaving the current
> > > logic in place for everything else. But that means moving the entire
> > > vblank processing to where you know the next frame's start time for
> > > VRR, both for page flips and for all other vblank events.
> > > -Daniel
> > >
> >
> > I think for amdgpu it would be doable by calling drm_handle_vblank
> > from the pageflip irq handler in VRR mode, and then additionally from
> > the vblank interrupt handler like now. Vblank irq's are triggered via
> > vline interrupts with programmable vertical trigger line position, so
> > we could program the trigger line to be start of back-porch instead of
> > start of front-porch. In the pageflip case we do vblank handling +
> > timestamping + pflip completion from pflip irq, and have the regular
> > back-porch vblank handling for refresh cycles in which no
> > pageflip/pflip irq happened.
>
> Hm, can't we simply program the vblank to occur in the back porch region
> for VRR? The vblank timestamping should automatically correct the
> timestamp in that case. Would require a lot less code changes.
>
> Plus update the documentation ofc.
>

Thanks to an unexpected bus driver strike i got stuck in the lab
instead of my bed for the last 6 extra hours, so i had some time to
think about this and hack up a proof-of-concept. Yes, moving vblank
irq ~ drm_handle_vblank() to the start of back-porch seems to do the
trick in VRR mode pretty well, at least as far as 30 minutes of
testing with my own application Psychtoolbox show. This may work -
More sleep and more testing required :)

As a side-effect of the shifted point of drm_update_vblank_counter()
update in back-porch or flip-irq though, the flip throttling in VRR
needed some work, which makes it even more hacky/messy - Michel will
not be pleased ;). Might need a timestamp-based throttling in VRR mode
to make it less messy.

-mario

> > Not sure if/how that would mess with below-vrr-refresh-rate-range
> > support for stutter reduction at low fps though? Or with performance,
> > given vblank event dispatch could be delayed a lot in VRR compared to
> > normal mode? Or with all the vblank accounting during modesets or crtc
> > en/disable?
>
> Currently VRR and explicit frame scheduling don't mix. We've discussed
> this, and consens seems to be that we need a new property for page_flip,
> similar to the target frame, but instead of frames a timestamp. Then the
> driver can hit a specific time for the next frame.
>
> And since compositors use the vblank ioctl just to do frame scheduling, I
> don't think we have to worry hugely about interactions there. Much better
> to aim for clean semantics (i.e. vblank and flip complete timestamps
> better match).
>
> > Also, could other drivers like Intel easily do this delayed vblank
> > processing in the non-pageflip case?
>
> Intel's problem :-) But yeah we have page flip timestamp registers, so
> worst case all we need to do is shuffle code around a bit. And we have
> interrupts for both start of vblank and and of vblank. It might be a bit
> more work because we use the vblank interrupt to avoid races in atomic
> updates, but that's not too terrible to fix.
>
> > From my user perspective as a developer of a neuroscience research
> > application that has a couple of exciting/important use cases for VRR
> > mode, i absolutely do need trustworthy and precise pageflip event
> > timestamps that represent reality, otherwise VRR mode would be less
> > than useless for me. I can do without meaningful vblank timestamps in
> > VRR mode though, and expected to do without them, as any scheduling
> > has to happen time-based instead of based on vblank counts anyway,
> > given that vblank counts are quite meaningless if every frame can have
> > a different duration. I'd expect that situation to be similar for
> > other apps that would want to use timestamps in VRR mode like video
> > games, movie players or VR/AR applications.
>
> Yeah for that you want to schedule your frames with a target frame number.
> Would still be nice if vblank timestamps wouldn't be an outright lie (and
> the timestamp is still useful information, even if the frame counter
> isn't - many displays have min/max frame rates for VRR, which is
> information we could perhaps expose).
> -Daniel
>
> >
> > -mario
> >
> > > >
> > > > Nicholas Kazlauskas
> > > >
> > > > >
> > > > >> ---
> > > > >>   drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> > > > >>   include/drm/drm_vblank.h     |  8 ++++++
> > > > >>   2 files changed, 56 insertions(+), 1 deletion(-)
> > > > >>
> > > > >> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> > > > >> index 98e091175921..4b3a4c38fabe 100644
> > > > >> --- a/drivers/gpu/drm/drm_vblank.c
> > > > >> +++ b/drivers/gpu/drm/drm_vblank.c
> > > > >> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> > > > >>              u64 seq, ktime_t now)
> > > > >>   {
> > > > >>      struct timespec64 tv;
> > > > >> +    ktime_t alt_flip_time;
> > > > >>
> > > > >>      switch (e->event.base.type) {
> > > > >> -    case DRM_EVENT_VBLANK:
> > > > >>      case DRM_EVENT_FLIP_COMPLETE:
> > > > >> +            /*
> > > > >> +             * For flip completion events, override "now" time
> > > > >> +             * with alt_flip_time provided by the driver via
> > > > >> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> > > > >> +             * if that time is later than given "now" vblank time.
> > > > >> +             */
> > > > >> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> > > > >> +            if (alt_flip_time > now)
> > > > >> +                    now = alt_flip_time;
> > > > >> +            /* Fallthrough */
> > > > >> +    case DRM_EVENT_VBLANK:
> > > > >>              tv = ktime_to_timespec64(now);
> > > > >>              e->event.vbl.sequence = seq;
> > > > >>              /*
> > > > >> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > > > >>
> > > > >>              now = ktime_get();
> > > > >>      }
> > > > >> +
> > > > >>      e->pipe = pipe;
> > > > >>      send_vblank_event(dev, e, seq, now);
> > > > >>   }
> > > > >>   EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> > > > >>
> > > > >> +/**
> > > > >> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> > > > >> + * @crtc: the source CRTC of the pageflip completion event
> > > > >> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> > > > >> + *
> > > > >> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> > > > >> + * by the pageflip event can never be earlier than the vblank timestamp of the
> > > > >> + * vblank of flip completion, as that vblank timestamp defines the end of the
> > > > >> + * shortest possible vblank duration. In case of a delayed flip completion
> > > > >> + * inside the extended VRR front porch however, the end of vblank can be much
> > > > >> + * later, so the driver must assign an estimated timestamp of that later end of
> > > > >> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> > > > >> + * set an alternate flip completion timestamp in case of late flip completions
> > > > >> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> > > > >> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> > > > >> + * completion routine, with some constant duration of the back porch interval
> > > > >> + * added, although more precise estimates may be possible on some hardware if
> > > > >> + * the hardware provides some means of timestamping the true end of vblank.
> > > > >> + *
> > > > >> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> > > > >> + * will use either the standard vblank timestamp, calculated for a minimum
> > > > >> + * duration vblank, or the provided @flip_time if that time is later than the
> > > > >> + * vblank timestamp, to get the best possible estimate of start of display of
> > > > >> + * the new post-pageflip scanout buffer.
> > > > >> + */
> > > > >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > > > >> +                                     ktime_t flip_time)
> > > > >> +{
> > > > >> +    struct drm_device *dev = crtc->dev;
> > > > >> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> > > > >> +
> > > > >> +    vblank->alt_flip_time = flip_time;
> > > > >> +}
> > > > >> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> > > > >> +
> > > > >>   static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> > > > >>   {
> > > > >>      if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> > > > >> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> > > > >> index 6ad9630d4f48..aacf44694ab6 100644
> > > > >> --- a/include/drm/drm_vblank.h
> > > > >> +++ b/include/drm/drm_vblank.h
> > > > >> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> > > > >>       * @time: Vblank timestamp corresponding to @count.
> > > > >>       */
> > > > >>      ktime_t time;
> > > > >> +    /**
> > > > >> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> > > > >> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> > > > >> +     * events will carry the later one of @time and @alt_flip_time.
> > > > >> +     */
> > > > >> +    ktime_t alt_flip_time;
> > > > >>
> > > > >>      /**
> > > > >>       * @refcount: Number of users/waiters of the vblank interrupt. Only when
> > > > >> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> > > > >>   u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> > > > >>   u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> > > > >>                                 ktime_t *vblanktime);
> > > > >> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > > > >> +                                     ktime_t flip_time);
> > > > >>   void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > > > >>                             struct drm_pending_vblank_event *e);
> > > > >>   void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> > > > >> --
> > > > >> 2.17.1
> > > > >>
> > > > >> _______________________________________________
> > > > >> dri-devel mailing list
> > > > >> dri-devel@lists.freedesktop.org
> > > > >> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > > >
> > > >
> > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter Feb. 13, 2019, 12:46 p.m. UTC | #9
On Wed, Feb 13, 2019 at 12:05 PM Mario Kleiner
<mario.kleiner.de@gmail.com> wrote:
>
> On Wed, Feb 13, 2019 at 10:56 AM Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Quoting Daniel Vetter (2019-02-13 09:50:55)
> > > On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> > > > I think all kms drivers try to call drm_crtc_handle_vblank() at start
> > > > of vblank to give Mesa the most time for frontbuffer rendering for
> > > > classic X. But vblank events are also used for scheduling bufferswaps
> > > > or other stuff for redirected windowed rendering, or via api's like
> > > > OML_sync_controls glXWaitForMscOML, so there might be other things
> > > > affected by a more delayed vblank handling.
> > >
> > > The frontbuffer rendering is very much X driver specific, and I think
> > > -amdgpu/radeon is the only one that requires this. No i915 driver ever
> > > used the vblank interrupt to schedule frontbuffer blits, we use some
> > > CS-side stalls.
> >
> > Fwiw, the Present midlayer does use vblank scheduling for inplace copy
> > updates. Not that I wish to encourage anyone to use frontbuffer
> > rendering.
> > -Chris
>
> Yes, that's what i meant. Under DRI2 at least AMD, Intel and nouveau
> have throttling based on CS stalls to avoid tearing and do throttling.
> DRI3/Present last time i checked just waited for a vblank event and
> then triggered the blit - something that causes tearing even when
> triggered at start of vblank.

Hm, might be good to document all that stuff somewhere. But in the end
I'd say the answer is "don't do frontbuffer rendering". I'm not sure
we managed to document all the rules for existing vblank vs. flip
interactions anywhere. At least I didn't find anything with a look at
our docs.
-Daniel
Kazlauskas, Nicholas Feb. 13, 2019, 2:33 p.m. UTC | #10
On 2/13/19 4:50 AM, Daniel Vetter wrote:
> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
>> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>
>>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
>>> <Nicholas.Kazlauskas@amd.com> wrote:
>>>>
>>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
>>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
>>>>>> The pageflip completion timestamps transmitted to userspace
>>>>>> via pageflip completion events are supposed to describe the
>>>>>> time at which the first pixel of the new post-pageflip scanout
>>>>>> buffer leaves the video output of the gpu. This time is
>>>>>> identical to end of vblank, when active scanout starts.
>>>>>>
>>>>>> For a crtc in standard fixed refresh rate, the end of vblank
>>>>>> is identical to the vblank timestamps calculated by
>>>>>> drm_update_vblank_count() at each vblank interrupt, or each
>>>>>> vblank dis-/enable. Therefore pageflip events just carry
>>>>>> that vblank timestamp as their pageflip timestamp.
>>>>>>
>>>>>> For a crtc switched to variable refresh rate mode (vrr), the
>>>>>> pageflip completion timestamps are identical to the vblank
>>>>>> timestamps iff the pageflip was executed early in vblank,
>>>>>> before the minimum vblank duration elapsed. In this case
>>>>>> the time of display onset is identical to when the crtc
>>>>>> is running in fixed refresh rate.
>>>>>>
>>>>>> However, if a pageflip completes later in the vblank, inside
>>>>>> the "extended front porch" in vrr mode, then the vblank will
>>>>>> terminate at a fixed (back porch) duration after flip, so
>>>>>> the display onset time is delayed correspondingly. In this
>>>>>> case the vblank timestamp computed at vblank irq time would
>>>>>> be too early, and we need a way to calculate an estimated
>>>>>> pageflip timestamp that will be later than the vblank timestamp.
>>>>>>
>>>>>> How a driver determines such a "late flip" timestamp is hw
>>>>>> and driver specific, but this patch adds a new helper function
>>>>>> that allows the driver to propose such an alternate "late flip"
>>>>>> timestamp for use in pageflip events:
>>>>>>
>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>>>>>
>>>>>> When sending out pageflip events, we now compare that proposed
>>>>>> flip_timestamp against the vblank timestamp of the current
>>>>>> vblank of flip completion and choose to send out the greater/
>>>>>> later timestamp as flip completion timestamp.
>>>>>>
>>>>>> The most simple way for a kms driver to supply a suitable
>>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
>>>>>> at start of the pageflip completion handler, e.g., pageflip
>>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
>>>>>> as proposed "late" alternative timestamp via ...
>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>>>>>
>>>>>> More clever approaches could try to add some corrective offset
>>>>>> for fixed back porch duration, or ideally use hardware features
>>>>>> like hw timestamps to calculate the exact end time of vblank.
>>>>>>
>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>
>>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
>>>>> the right timestampe, once? With this I guess if you do a vblank query in
>>>>> between the wrong and the right vblank you'll get the bogus value. Not
>>>>> really great for userspace.
>>>>> -Daniel
>>>>
>>>> I think we calculate the timestamp and send the vblank event both within
>>>> the pageflip IRQ handler so calculating the right pageflip timestamp
>>>> once could probably be done. I'm not sure if it's easier than proposing
>>>> a later flip time with an API like this though.
>>>>
>>>> The actual scanout time should be known from the page-flip handler so
>>>> the semantics for VRR on/off remain the same. This is because the
>>>> page-flip triggers entering the back porch if we're in the extended
>>>> front porch.
>>>>
>>>> But scanout time from vblank events for something like
>>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
>>>> treated as estimates. If we're in the regular front porch then the
>>>> timing to scanout is based on the fixed duration front porch for the
>>>> current mode. If we're in the extended back porch then it's technically
>>>> driver defined but the most reasonable guess is to assume that the front
>>>> porch is going to end at any moment, so just return the length of the
>>>> back porch for getting the scanout time.
>>>>
>>>> Proposing the late timestamp shouldn't affect vblank event in the
>>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
>>>> event case. I'm not sure if that's what's guaranteed to happen with this
>>>> patch though. There doesn't seem to be any locking on either
>>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
>>>> the same vblank event back as the one just stored I don't think it's
>>>> guaranteed.
>>>
>>> That's the inconsistency I mean to highlight - the timestamp for the
>>> same frame as observed through flip complete and through the
>>> wait_vblank ioctl can differ. Which they really shouldn't.
>>>
>>
>> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
>> that vblank and pageflip timestamps should always match. But then the
>> kernel docs for "Variable refresh properties" in drm_connector.c for
>> vblank timestamps were changed for the VRR implementation in Linux
>> 5.0-rc to redefine them when in VRR mode. They are defined, but
>> probably rather useless for any practical purpose, like this:
>>
>> "The semantics for the vertical blank timestamp differ when
>> variable refresh rate is active. The vertical blank timestamp
>> is defined to be an estimate using the current mode's fixed
>> refresh rate timings. The semantics for the page-flip event
>> timestamp remain the same."
> 
> Uh I missed that. That sounds like nonsense tbh.
> 
>> So our docs contradict each other as of Linux 5.0-rc. Certainly having
>> useful vblank timetamps would be useful.
> 
> Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
> hilarity will ensue.

I would imagine you would see more breakage by changing userspace 
expectations for when the event is actually received.

The IOCTL is "wait for vblank", but if we start sending the event near 
the end of vblank it becomes more "wait for pageflip" (near scanout) or 
"wait 15ms while the VRR front porch times out and causes flickering".

I don't speak for all userspace but it seems more useful to me to have 
the event sent at the start of vblank.

VRR in its current state is useful for applications that aren't 
completely timing sensitive (like most games). This is because the 
driver has free reign in restricting the min/max bounds of the range - 
allowing for enhancements like low framerate compensation / BTR.

For a concrete example, imagine you're not doing frontbuffer rendering 
but you're still trying to target a specific scanout time. So even if we 
were to send the vblank event at vpos 0 you'd get an accurate scanout 
timestamp for the next scanout after the vback porch ends. You're not 
going to be doing any rendering or preparation in this time because it's 
simply too short. So you'd target the scanout period after this one, 
which you'd need to calculate somehow in userspace. Taking the deltas 
between two vblank events is completely useless here because you'll get 
back the vrr min refresh delta. So you can take your source content rate 
or some fixed rate you'd like to target within the range but even then 
you're still not guaranteed to have that flip end up at that scanout time.

The current solution is to just return the scanout pos at vrr max, or 
the current mode's fixed refresh rate. If you're flip before the fixed 
duration front porch ends then that scanout time will likely be accurate 
since you can't flip faster than vrr max.

Is there actually userspace that cares that these timestamps need to 
match? Like said below, it seems that wait for vblank is mostly about 
scheduling and for that I think having the event sent back at the start 
of vblank is more important here.


> 
>>> Now added complication is that amdgpu sends out vblank events really
>>> early, which is used by userspace to do frontbuffer rendering in the
>>> vblank time. But I don't think anyone wants to do both VRR and
>>
>> I think all kms drivers try to call drm_crtc_handle_vblank() at start
>> of vblank to give Mesa the most time for frontbuffer rendering for
>> classic X. But vblank events are also used for scheduling bufferswaps
>> or other stuff for redirected windowed rendering, or via api's like
>> OML_sync_controls glXWaitForMscOML, so there might be other things
>> affected by a more delayed vblank handling.
> 
> The frontbuffer rendering is very much X driver specific, and I think
> -amdgpu/radeon is the only one that requires this. No i915 driver ever
> used the vblank interrupt to schedule frontbuffer blits, we use some
> CS-side stalls.
> 
> Wrt scheduling pageflips: The rule is that right after you've received the
> vblank for frame X, then an immediately schedule pageflip should hit X+1,
> but not X. amdgpu had this broken, but it's fixed since a while.
> 
>>> frontbuffer rendering, hence I think it should be possible to create
>>> correct vblank timestamps for the VRR case, while leaving the current
>>> logic in place for everything else. But that means moving the entire
>>> vblank processing to where you know the next frame's start time for
>>> VRR, both for page flips and for all other vblank events.
>>> -Daniel
>>>
>>
>> I think for amdgpu it would be doable by calling drm_handle_vblank
>> from the pageflip irq handler in VRR mode, and then additionally from
>> the vblank interrupt handler like now. Vblank irq's are triggered via
>> vline interrupts with programmable vertical trigger line position, so
>> we could program the trigger line to be start of back-porch instead of
>> start of front-porch. In the pageflip case we do vblank handling +
>> timestamping + pflip completion from pflip irq, and have the regular
>> back-porch vblank handling for refresh cycles in which no
>> pageflip/pflip irq happened.
> 
> Hm, can't we simply program the vblank to occur in the back porch region
> for VRR? The vblank timestamping should automatically correct the
> timestamp in that case. Would require a lot less code changes.
> 
> Plus update the documentation ofc.
> 
>> Not sure if/how that would mess with below-vrr-refresh-rate-range
>> support for stutter reduction at low fps though? Or with performance,
>> given vblank event dispatch could be delayed a lot in VRR compared to
>> normal mode? Or with all the vblank accounting during modesets or crtc
>> en/disable?

I think we have extra VLINEs that can be programmed at vpos 0 that could 
resolve this but I'm still not really sold on the idea.

> 
> Currently VRR and explicit frame scheduling don't mix. We've discussed
> this, and consens seems to be that we need a new property for page_flip,
> similar to the target frame, but instead of frames a timestamp. Then the
> driver can hit a specific time for the next frame.
> 
> And since compositors use the vblank ioctl just to do frame scheduling, I
> don't think we have to worry hugely about interactions there. Much better
> to aim for clean semantics (i.e. vblank and flip complete timestamps
> better match).
> 
>> Also, could other drivers like Intel easily do this delayed vblank
>> processing in the non-pageflip case?
> 
> Intel's problem :-) But yeah we have page flip timestamp registers, so
> worst case all we need to do is shuffle code around a bit. And we have
> interrupts for both start of vblank and and of vblank. It might be a bit
> more work because we use the vblank interrupt to avoid races in atomic
> updates, but that's not too terrible to fix.
> 
>>  From my user perspective as a developer of a neuroscience research
>> application that has a couple of exciting/important use cases for VRR
>> mode, i absolutely do need trustworthy and precise pageflip event
>> timestamps that represent reality, otherwise VRR mode would be less
>> than useless for me. I can do without meaningful vblank timestamps in
>> VRR mode though, and expected to do without them, as any scheduling
>> has to happen time-based instead of based on vblank counts anyway,
>> given that vblank counts are quite meaningless if every frame can have
>> a different duration. I'd expect that situation to be similar for
>> other apps that would want to use timestamps in VRR mode like video
>> games, movie players or VR/AR applications.
> 
> Yeah for that you want to schedule your frames with a target frame number.
> Would still be nice if vblank timestamps wouldn't be an outright lie (and
> the timestamp is still useful information, even if the frame counter
> isn't - many displays have min/max frame rates for VRR, which is
> information we could perhaps expose).
> -Daniel >
>>
>> -mario
>>
>>>>
>>>> Nicholas Kazlauskas
>>>>
>>>>>
>>>>>> ---
>>>>>>    drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
>>>>>>    include/drm/drm_vblank.h     |  8 ++++++
>>>>>>    2 files changed, 56 insertions(+), 1 deletion(-)
>>>>>>
>>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
>>>>>> index 98e091175921..4b3a4c38fabe 100644
>>>>>> --- a/drivers/gpu/drm/drm_vblank.c
>>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
>>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
>>>>>>               u64 seq, ktime_t now)
>>>>>>    {
>>>>>>       struct timespec64 tv;
>>>>>> +    ktime_t alt_flip_time;
>>>>>>
>>>>>>       switch (e->event.base.type) {
>>>>>> -    case DRM_EVENT_VBLANK:
>>>>>>       case DRM_EVENT_FLIP_COMPLETE:
>>>>>> +            /*
>>>>>> +             * For flip completion events, override "now" time
>>>>>> +             * with alt_flip_time provided by the driver via
>>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
>>>>>> +             * if that time is later than given "now" vblank time.
>>>>>> +             */
>>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
>>>>>> +            if (alt_flip_time > now)
>>>>>> +                    now = alt_flip_time;
>>>>>> +            /* Fallthrough */
>>>>>> +    case DRM_EVENT_VBLANK:
>>>>>>               tv = ktime_to_timespec64(now);
>>>>>>               e->event.vbl.sequence = seq;
>>>>>>               /*
>>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>>>>>
>>>>>>               now = ktime_get();
>>>>>>       }
>>>>>> +
>>>>>>       e->pipe = pipe;
>>>>>>       send_vblank_event(dev, e, seq, now);
>>>>>>    }
>>>>>>    EXPORT_SYMBOL(drm_crtc_send_vblank_event);
>>>>>>
>>>>>> +/**
>>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
>>>>>> + * @crtc: the source CRTC of the pageflip completion event
>>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
>>>>>> + *
>>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
>>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
>>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
>>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
>>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
>>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
>>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
>>>>>> + * set an alternate flip completion timestamp in case of late flip completions
>>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
>>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
>>>>>> + * completion routine, with some constant duration of the back porch interval
>>>>>> + * added, although more precise estimates may be possible on some hardware if
>>>>>> + * the hardware provides some means of timestamping the true end of vblank.
>>>>>> + *
>>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
>>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
>>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
>>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
>>>>>> + * the new post-pageflip scanout buffer.
>>>>>> + */
>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>>>>>> +                                     ktime_t flip_time)
>>>>>> +{
>>>>>> +    struct drm_device *dev = crtc->dev;
>>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
>>>>>> +
>>>>>> +    vblank->alt_flip_time = flip_time;
>>>>>> +}
>>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
>>>>>> +
>>>>>>    static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
>>>>>>    {
>>>>>>       if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
>>>>>> index 6ad9630d4f48..aacf44694ab6 100644
>>>>>> --- a/include/drm/drm_vblank.h
>>>>>> +++ b/include/drm/drm_vblank.h
>>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
>>>>>>        * @time: Vblank timestamp corresponding to @count.
>>>>>>        */
>>>>>>       ktime_t time;
>>>>>> +    /**
>>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
>>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
>>>>>> +     * events will carry the later one of @time and @alt_flip_time.
>>>>>> +     */
>>>>>> +    ktime_t alt_flip_time;
>>>>>>
>>>>>>       /**
>>>>>>        * @refcount: Number of users/waiters of the vblank interrupt. Only when
>>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
>>>>>>    u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
>>>>>>    u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
>>>>>>                                  ktime_t *vblanktime);
>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>>>>>> +                                     ktime_t flip_time);
>>>>>>    void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>>>>>                              struct drm_pending_vblank_event *e);
>>>>>>    void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
>>>>>> --
>>>>>> 2.17.1
>>>>>>
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>
>>>>
>>>
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> 

Nicholas Kazlauskas
Daniel Vetter Feb. 13, 2019, 3:14 p.m. UTC | #11
On Wed, Feb 13, 2019 at 3:33 PM Kazlauskas, Nicholas
<Nicholas.Kazlauskas@amd.com> wrote:
>
> On 2/13/19 4:50 AM, Daniel Vetter wrote:
> > On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> >> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>
> >>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> >>> <Nicholas.Kazlauskas@amd.com> wrote:
> >>>>
> >>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
> >>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> >>>>>> The pageflip completion timestamps transmitted to userspace
> >>>>>> via pageflip completion events are supposed to describe the
> >>>>>> time at which the first pixel of the new post-pageflip scanout
> >>>>>> buffer leaves the video output of the gpu. This time is
> >>>>>> identical to end of vblank, when active scanout starts.
> >>>>>>
> >>>>>> For a crtc in standard fixed refresh rate, the end of vblank
> >>>>>> is identical to the vblank timestamps calculated by
> >>>>>> drm_update_vblank_count() at each vblank interrupt, or each
> >>>>>> vblank dis-/enable. Therefore pageflip events just carry
> >>>>>> that vblank timestamp as their pageflip timestamp.
> >>>>>>
> >>>>>> For a crtc switched to variable refresh rate mode (vrr), the
> >>>>>> pageflip completion timestamps are identical to the vblank
> >>>>>> timestamps iff the pageflip was executed early in vblank,
> >>>>>> before the minimum vblank duration elapsed. In this case
> >>>>>> the time of display onset is identical to when the crtc
> >>>>>> is running in fixed refresh rate.
> >>>>>>
> >>>>>> However, if a pageflip completes later in the vblank, inside
> >>>>>> the "extended front porch" in vrr mode, then the vblank will
> >>>>>> terminate at a fixed (back porch) duration after flip, so
> >>>>>> the display onset time is delayed correspondingly. In this
> >>>>>> case the vblank timestamp computed at vblank irq time would
> >>>>>> be too early, and we need a way to calculate an estimated
> >>>>>> pageflip timestamp that will be later than the vblank timestamp.
> >>>>>>
> >>>>>> How a driver determines such a "late flip" timestamp is hw
> >>>>>> and driver specific, but this patch adds a new helper function
> >>>>>> that allows the driver to propose such an alternate "late flip"
> >>>>>> timestamp for use in pageflip events:
> >>>>>>
> >>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>>>>>
> >>>>>> When sending out pageflip events, we now compare that proposed
> >>>>>> flip_timestamp against the vblank timestamp of the current
> >>>>>> vblank of flip completion and choose to send out the greater/
> >>>>>> later timestamp as flip completion timestamp.
> >>>>>>
> >>>>>> The most simple way for a kms driver to supply a suitable
> >>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
> >>>>>> at start of the pageflip completion handler, e.g., pageflip
> >>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
> >>>>>> as proposed "late" alternative timestamp via ...
> >>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>>>>>
> >>>>>> More clever approaches could try to add some corrective offset
> >>>>>> for fixed back porch duration, or ideally use hardware features
> >>>>>> like hw timestamps to calculate the exact end time of vblank.
> >>>>>>
> >>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> >>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
> >>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>
> >>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> >>>>> the right timestampe, once? With this I guess if you do a vblank query in
> >>>>> between the wrong and the right vblank you'll get the bogus value. Not
> >>>>> really great for userspace.
> >>>>> -Daniel
> >>>>
> >>>> I think we calculate the timestamp and send the vblank event both within
> >>>> the pageflip IRQ handler so calculating the right pageflip timestamp
> >>>> once could probably be done. I'm not sure if it's easier than proposing
> >>>> a later flip time with an API like this though.
> >>>>
> >>>> The actual scanout time should be known from the page-flip handler so
> >>>> the semantics for VRR on/off remain the same. This is because the
> >>>> page-flip triggers entering the back porch if we're in the extended
> >>>> front porch.
> >>>>
> >>>> But scanout time from vblank events for something like
> >>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> >>>> treated as estimates. If we're in the regular front porch then the
> >>>> timing to scanout is based on the fixed duration front porch for the
> >>>> current mode. If we're in the extended back porch then it's technically
> >>>> driver defined but the most reasonable guess is to assume that the front
> >>>> porch is going to end at any moment, so just return the length of the
> >>>> back porch for getting the scanout time.
> >>>>
> >>>> Proposing the late timestamp shouldn't affect vblank event in the
> >>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> >>>> event case. I'm not sure if that's what's guaranteed to happen with this
> >>>> patch though. There doesn't seem to be any locking on either
> >>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> >>>> the same vblank event back as the one just stored I don't think it's
> >>>> guaranteed.
> >>>
> >>> That's the inconsistency I mean to highlight - the timestamp for the
> >>> same frame as observed through flip complete and through the
> >>> wait_vblank ioctl can differ. Which they really shouldn't.
> >>>
> >>
> >> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
> >> that vblank and pageflip timestamps should always match. But then the
> >> kernel docs for "Variable refresh properties" in drm_connector.c for
> >> vblank timestamps were changed for the VRR implementation in Linux
> >> 5.0-rc to redefine them when in VRR mode. They are defined, but
> >> probably rather useless for any practical purpose, like this:
> >>
> >> "The semantics for the vertical blank timestamp differ when
> >> variable refresh rate is active. The vertical blank timestamp
> >> is defined to be an estimate using the current mode's fixed
> >> refresh rate timings. The semantics for the page-flip event
> >> timestamp remain the same."
> >
> > Uh I missed that. That sounds like nonsense tbh.
> >
> >> So our docs contradict each other as of Linux 5.0-rc. Certainly having
> >> useful vblank timetamps would be useful.
> >
> > Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
> > hilarity will ensue.
>
> I would imagine you would see more breakage by changing userspace
> expectations for when the event is actually received.
>
> The IOCTL is "wait for vblank", but if we start sending the event near
> the end of vblank it becomes more "wait for pageflip" (near scanout) or
> "wait 15ms while the VRR front porch times out and causes flickering".
>
> I don't speak for all userspace but it seems more useful to me to have
> the event sent at the start of vblank.

Maybe we need to things. The problem is that the timestampe is defined
to be correct for start-of-frame. And right now the amdgpu
implementation of VRR doesn't even give an accurate timestampe for the
flip, so most likely no one yet thought about this properly.
Definitely no testcases (or it would have been caught), whereas we
have tons of testcases for fixed refresh.

> VRR in its current state is useful for applications that aren't
> completely timing sensitive (like most games). This is because the
> driver has free reign in restricting the min/max bounds of the range -
> allowing for enhancements like low framerate compensation / BTR.
>
> For a concrete example, imagine you're not doing frontbuffer rendering
> but you're still trying to target a specific scanout time. So even if we
> were to send the vblank event at vpos 0 you'd get an accurate scanout
> timestamp for the next scanout after the vback porch ends. You're not
> going to be doing any rendering or preparation in this time because it's
> simply too short. So you'd target the scanout period after this one,
> which you'd need to calculate somehow in userspace. Taking the deltas
> between two vblank events is completely useless here because you'll get
> back the vrr min refresh delta. So you can take your source content rate
> or some fixed rate you'd like to target within the range but even then
> you're still not guaranteed to have that flip end up at that scanout time.
>
> The current solution is to just return the scanout pos at vrr max, or
> the current mode's fixed refresh rate. If you're flip before the fixed
> duration front porch ends then that scanout time will likely be accurate
> since you can't flip faster than vrr max.
>
> Is there actually userspace that cares that these timestamps need to
> match? Like said below, it seems that wait for vblank is mostly about
> scheduling and for that I think having the event sent back at the start
> of vblank is more important here.

There's 3 cases:
- DRI3: Waits for vblank for frontbuffer blits. If you combine that
with VRR then imo you deserve to keep all the pieces.
- generic compositors. Use VRR for scheduling, expect that a page_flip
queued immediately after a page_flip will hit the next frame, and not
the current frame. They also expect that this will not give you the
slowest possible refresh rate, but I guess VRR plus frame scheduling
is currently out of scope. It's kinda just for as-fast-as-possible
games.
- flip timestamp. Currently the entire codebase assumes that vblank ts
= flip ts. I don't really see a point in breaking that.
- better scheduling of VRR: what compositors actually do isn't wait
for the vblank, that only latches the page_flip. They set up a
hrtimer, with a fixed/auto-tuned head-start, from which they start
their rendering. The same could be done for VRR, if we expose the
earliest/latest vblank end times (and how much they can shift). vblank
ioctl is only used for syncing that up and making sure we're hitting
the right frame. This is all for pageflipping compositors. This is
also what they'll probably keep doing when we give them a desired flip
time (instead of a target frame), they still need an hrtimer to fire a
little bit ahead to get the frame rendered.

So taking all together, we have 2 use-cases I think:
- frontbuffer rendering. Kinda wants vblank event to fire at the
beginning of vblank. Won't ever do VRR
- pageflipping compositors. Don't care about the vblank event, either
they have an hrtimer timer (which gets synced with the vblank ioctl),
or just flip as fast as possible. They don't care one bit when the
vblank event happens, as long as a pageflip scheduled immediately
after the event shows up only hits the next frame (which is the delay
code that currently stretches vblank unecessarily with VRR if you're
already a bit late).

Making sure that vblank and page flip timestamp match the real
userspace seems like the cleanest semantics, and the least surprising
to userspace. E.g. what happens if you implement VRR in a compositor,
but then tune your hrtimer within the vblank, when the vblank
timestamp is totally not reflecting your real pageflip? I just don't
see what's the upside of having a vblank timestamp that's not actually
reflecting the real flip.

Also, if your app wants the start of vblank, that's very easy to
compute: Just add the scanout time to the last vblank timestamp.
Figuring out the start of frame otoh isn't doable if you can't query,
and constantly pageflipping just to have an accurate start-of-frame
(for tuning your hrtimer) seems rather wasteful - it outright defeats
some of the use-cases for VRR.

So what benefit do you see in having vblank ts and pageflip ts not
agree? What's the use-case this solves - aside from "less typing" :-)
If the only use-case is VRR for frontbuffer rendering, then that
doesn't even work right now (aside from that I think it's something we
shouldn't support and definitely not encourage).
-Daniel

> >>> Now added complication is that amdgpu sends out vblank events really
> >>> early, which is used by userspace to do frontbuffer rendering in the
> >>> vblank time. But I don't think anyone wants to do both VRR and
> >>
> >> I think all kms drivers try to call drm_crtc_handle_vblank() at start
> >> of vblank to give Mesa the most time for frontbuffer rendering for
> >> classic X. But vblank events are also used for scheduling bufferswaps
> >> or other stuff for redirected windowed rendering, or via api's like
> >> OML_sync_controls glXWaitForMscOML, so there might be other things
> >> affected by a more delayed vblank handling.
> >
> > The frontbuffer rendering is very much X driver specific, and I think
> > -amdgpu/radeon is the only one that requires this. No i915 driver ever
> > used the vblank interrupt to schedule frontbuffer blits, we use some
> > CS-side stalls.
> >
> > Wrt scheduling pageflips: The rule is that right after you've received the
> > vblank for frame X, then an immediately schedule pageflip should hit X+1,
> > but not X. amdgpu had this broken, but it's fixed since a while.
> >
> >>> frontbuffer rendering, hence I think it should be possible to create
> >>> correct vblank timestamps for the VRR case, while leaving the current
> >>> logic in place for everything else. But that means moving the entire
> >>> vblank processing to where you know the next frame's start time for
> >>> VRR, both for page flips and for all other vblank events.
> >>> -Daniel
> >>>
> >>
> >> I think for amdgpu it would be doable by calling drm_handle_vblank
> >> from the pageflip irq handler in VRR mode, and then additionally from
> >> the vblank interrupt handler like now. Vblank irq's are triggered via
> >> vline interrupts with programmable vertical trigger line position, so
> >> we could program the trigger line to be start of back-porch instead of
> >> start of front-porch. In the pageflip case we do vblank handling +
> >> timestamping + pflip completion from pflip irq, and have the regular
> >> back-porch vblank handling for refresh cycles in which no
> >> pageflip/pflip irq happened.
> >
> > Hm, can't we simply program the vblank to occur in the back porch region
> > for VRR? The vblank timestamping should automatically correct the
> > timestamp in that case. Would require a lot less code changes.
> >
> > Plus update the documentation ofc.
> >
> >> Not sure if/how that would mess with below-vrr-refresh-rate-range
> >> support for stutter reduction at low fps though? Or with performance,
> >> given vblank event dispatch could be delayed a lot in VRR compared to
> >> normal mode? Or with all the vblank accounting during modesets or crtc
> >> en/disable?
>
> I think we have extra VLINEs that can be programmed at vpos 0 that could
> resolve this but I'm still not really sold on the idea.
>
> >
> > Currently VRR and explicit frame scheduling don't mix. We've discussed
> > this, and consens seems to be that we need a new property for page_flip,
> > similar to the target frame, but instead of frames a timestamp. Then the
> > driver can hit a specific time for the next frame.
> >
> > And since compositors use the vblank ioctl just to do frame scheduling, I
> > don't think we have to worry hugely about interactions there. Much better
> > to aim for clean semantics (i.e. vblank and flip complete timestamps
> > better match).
> >
> >> Also, could other drivers like Intel easily do this delayed vblank
> >> processing in the non-pageflip case?
> >
> > Intel's problem :-) But yeah we have page flip timestamp registers, so
> > worst case all we need to do is shuffle code around a bit. And we have
> > interrupts for both start of vblank and and of vblank. It might be a bit
> > more work because we use the vblank interrupt to avoid races in atomic
> > updates, but that's not too terrible to fix.
> >
> >>  From my user perspective as a developer of a neuroscience research
> >> application that has a couple of exciting/important use cases for VRR
> >> mode, i absolutely do need trustworthy and precise pageflip event
> >> timestamps that represent reality, otherwise VRR mode would be less
> >> than useless for me. I can do without meaningful vblank timestamps in
> >> VRR mode though, and expected to do without them, as any scheduling
> >> has to happen time-based instead of based on vblank counts anyway,
> >> given that vblank counts are quite meaningless if every frame can have
> >> a different duration. I'd expect that situation to be similar for
> >> other apps that would want to use timestamps in VRR mode like video
> >> games, movie players or VR/AR applications.
> >
> > Yeah for that you want to schedule your frames with a target frame number.
> > Would still be nice if vblank timestamps wouldn't be an outright lie (and
> > the timestamp is still useful information, even if the frame counter
> > isn't - many displays have min/max frame rates for VRR, which is
> > information we could perhaps expose).
> > -Daniel >
> >>
> >> -mario
> >>
> >>>>
> >>>> Nicholas Kazlauskas
> >>>>
> >>>>>
> >>>>>> ---
> >>>>>>    drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> >>>>>>    include/drm/drm_vblank.h     |  8 ++++++
> >>>>>>    2 files changed, 56 insertions(+), 1 deletion(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> >>>>>> index 98e091175921..4b3a4c38fabe 100644
> >>>>>> --- a/drivers/gpu/drm/drm_vblank.c
> >>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
> >>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> >>>>>>               u64 seq, ktime_t now)
> >>>>>>    {
> >>>>>>       struct timespec64 tv;
> >>>>>> +    ktime_t alt_flip_time;
> >>>>>>
> >>>>>>       switch (e->event.base.type) {
> >>>>>> -    case DRM_EVENT_VBLANK:
> >>>>>>       case DRM_EVENT_FLIP_COMPLETE:
> >>>>>> +            /*
> >>>>>> +             * For flip completion events, override "now" time
> >>>>>> +             * with alt_flip_time provided by the driver via
> >>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> >>>>>> +             * if that time is later than given "now" vblank time.
> >>>>>> +             */
> >>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> >>>>>> +            if (alt_flip_time > now)
> >>>>>> +                    now = alt_flip_time;
> >>>>>> +            /* Fallthrough */
> >>>>>> +    case DRM_EVENT_VBLANK:
> >>>>>>               tv = ktime_to_timespec64(now);
> >>>>>>               e->event.vbl.sequence = seq;
> >>>>>>               /*
> >>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>>>>>
> >>>>>>               now = ktime_get();
> >>>>>>       }
> >>>>>> +
> >>>>>>       e->pipe = pipe;
> >>>>>>       send_vblank_event(dev, e, seq, now);
> >>>>>>    }
> >>>>>>    EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> >>>>>>
> >>>>>> +/**
> >>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> >>>>>> + * @crtc: the source CRTC of the pageflip completion event
> >>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> >>>>>> + *
> >>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> >>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
> >>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
> >>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
> >>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
> >>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
> >>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> >>>>>> + * set an alternate flip completion timestamp in case of late flip completions
> >>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> >>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> >>>>>> + * completion routine, with some constant duration of the back porch interval
> >>>>>> + * added, although more precise estimates may be possible on some hardware if
> >>>>>> + * the hardware provides some means of timestamping the true end of vblank.
> >>>>>> + *
> >>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> >>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
> >>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
> >>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
> >>>>>> + * the new post-pageflip scanout buffer.
> >>>>>> + */
> >>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >>>>>> +                                     ktime_t flip_time)
> >>>>>> +{
> >>>>>> +    struct drm_device *dev = crtc->dev;
> >>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> >>>>>> +
> >>>>>> +    vblank->alt_flip_time = flip_time;
> >>>>>> +}
> >>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> >>>>>> +
> >>>>>>    static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> >>>>>>    {
> >>>>>>       if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> >>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> >>>>>> index 6ad9630d4f48..aacf44694ab6 100644
> >>>>>> --- a/include/drm/drm_vblank.h
> >>>>>> +++ b/include/drm/drm_vblank.h
> >>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> >>>>>>        * @time: Vblank timestamp corresponding to @count.
> >>>>>>        */
> >>>>>>       ktime_t time;
> >>>>>> +    /**
> >>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> >>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> >>>>>> +     * events will carry the later one of @time and @alt_flip_time.
> >>>>>> +     */
> >>>>>> +    ktime_t alt_flip_time;
> >>>>>>
> >>>>>>       /**
> >>>>>>        * @refcount: Number of users/waiters of the vblank interrupt. Only when
> >>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> >>>>>>    u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> >>>>>>    u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> >>>>>>                                  ktime_t *vblanktime);
> >>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >>>>>> +                                     ktime_t flip_time);
> >>>>>>    void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>>>>>                              struct drm_pending_vblank_event *e);
> >>>>>>    void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> >>>>>> --
> >>>>>> 2.17.1
> >>>>>>
> >>>>>> _______________________________________________
> >>>>>> dri-devel mailing list
> >>>>>> dri-devel@lists.freedesktop.org
> >>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>>>
> >>>>
> >>>
> >>>
> >>> --
> >>> Daniel Vetter
> >>> Software Engineer, Intel Corporation
> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >
>
> Nicholas Kazlauskas
>
Kazlauskas, Nicholas Feb. 13, 2019, 3:41 p.m. UTC | #12
On 2/13/19 10:14 AM, Daniel Vetter wrote:
> On Wed, Feb 13, 2019 at 3:33 PM Kazlauskas, Nicholas
> <Nicholas.Kazlauskas@amd.com> wrote:
>>
>> On 2/13/19 4:50 AM, Daniel Vetter wrote:
>>> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
>>>> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>
>>>>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
>>>>> <Nicholas.Kazlauskas@amd.com> wrote:
>>>>>>
>>>>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
>>>>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
>>>>>>>> The pageflip completion timestamps transmitted to userspace
>>>>>>>> via pageflip completion events are supposed to describe the
>>>>>>>> time at which the first pixel of the new post-pageflip scanout
>>>>>>>> buffer leaves the video output of the gpu. This time is
>>>>>>>> identical to end of vblank, when active scanout starts.
>>>>>>>>
>>>>>>>> For a crtc in standard fixed refresh rate, the end of vblank
>>>>>>>> is identical to the vblank timestamps calculated by
>>>>>>>> drm_update_vblank_count() at each vblank interrupt, or each
>>>>>>>> vblank dis-/enable. Therefore pageflip events just carry
>>>>>>>> that vblank timestamp as their pageflip timestamp.
>>>>>>>>
>>>>>>>> For a crtc switched to variable refresh rate mode (vrr), the
>>>>>>>> pageflip completion timestamps are identical to the vblank
>>>>>>>> timestamps iff the pageflip was executed early in vblank,
>>>>>>>> before the minimum vblank duration elapsed. In this case
>>>>>>>> the time of display onset is identical to when the crtc
>>>>>>>> is running in fixed refresh rate.
>>>>>>>>
>>>>>>>> However, if a pageflip completes later in the vblank, inside
>>>>>>>> the "extended front porch" in vrr mode, then the vblank will
>>>>>>>> terminate at a fixed (back porch) duration after flip, so
>>>>>>>> the display onset time is delayed correspondingly. In this
>>>>>>>> case the vblank timestamp computed at vblank irq time would
>>>>>>>> be too early, and we need a way to calculate an estimated
>>>>>>>> pageflip timestamp that will be later than the vblank timestamp.
>>>>>>>>
>>>>>>>> How a driver determines such a "late flip" timestamp is hw
>>>>>>>> and driver specific, but this patch adds a new helper function
>>>>>>>> that allows the driver to propose such an alternate "late flip"
>>>>>>>> timestamp for use in pageflip events:
>>>>>>>>
>>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>>>>>>>
>>>>>>>> When sending out pageflip events, we now compare that proposed
>>>>>>>> flip_timestamp against the vblank timestamp of the current
>>>>>>>> vblank of flip completion and choose to send out the greater/
>>>>>>>> later timestamp as flip completion timestamp.
>>>>>>>>
>>>>>>>> The most simple way for a kms driver to supply a suitable
>>>>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
>>>>>>>> at start of the pageflip completion handler, e.g., pageflip
>>>>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
>>>>>>>> as proposed "late" alternative timestamp via ...
>>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>>>>>>>
>>>>>>>> More clever approaches could try to add some corrective offset
>>>>>>>> for fixed back porch duration, or ideally use hardware features
>>>>>>>> like hw timestamps to calculate the exact end time of vblank.
>>>>>>>>
>>>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>>>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>
>>>>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
>>>>>>> the right timestampe, once? With this I guess if you do a vblank query in
>>>>>>> between the wrong and the right vblank you'll get the bogus value. Not
>>>>>>> really great for userspace.
>>>>>>> -Daniel
>>>>>>
>>>>>> I think we calculate the timestamp and send the vblank event both within
>>>>>> the pageflip IRQ handler so calculating the right pageflip timestamp
>>>>>> once could probably be done. I'm not sure if it's easier than proposing
>>>>>> a later flip time with an API like this though.
>>>>>>
>>>>>> The actual scanout time should be known from the page-flip handler so
>>>>>> the semantics for VRR on/off remain the same. This is because the
>>>>>> page-flip triggers entering the back porch if we're in the extended
>>>>>> front porch.
>>>>>>
>>>>>> But scanout time from vblank events for something like
>>>>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
>>>>>> treated as estimates. If we're in the regular front porch then the
>>>>>> timing to scanout is based on the fixed duration front porch for the
>>>>>> current mode. If we're in the extended back porch then it's technically
>>>>>> driver defined but the most reasonable guess is to assume that the front
>>>>>> porch is going to end at any moment, so just return the length of the
>>>>>> back porch for getting the scanout time.
>>>>>>
>>>>>> Proposing the late timestamp shouldn't affect vblank event in the
>>>>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
>>>>>> event case. I'm not sure if that's what's guaranteed to happen with this
>>>>>> patch though. There doesn't seem to be any locking on either
>>>>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
>>>>>> the same vblank event back as the one just stored I don't think it's
>>>>>> guaranteed.
>>>>>
>>>>> That's the inconsistency I mean to highlight - the timestamp for the
>>>>> same frame as observed through flip complete and through the
>>>>> wait_vblank ioctl can differ. Which they really shouldn't.
>>>>>
>>>>
>>>> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
>>>> that vblank and pageflip timestamps should always match. But then the
>>>> kernel docs for "Variable refresh properties" in drm_connector.c for
>>>> vblank timestamps were changed for the VRR implementation in Linux
>>>> 5.0-rc to redefine them when in VRR mode. They are defined, but
>>>> probably rather useless for any practical purpose, like this:
>>>>
>>>> "The semantics for the vertical blank timestamp differ when
>>>> variable refresh rate is active. The vertical blank timestamp
>>>> is defined to be an estimate using the current mode's fixed
>>>> refresh rate timings. The semantics for the page-flip event
>>>> timestamp remain the same."
>>>
>>> Uh I missed that. That sounds like nonsense tbh.
>>>
>>>> So our docs contradict each other as of Linux 5.0-rc. Certainly having
>>>> useful vblank timetamps would be useful.
>>>
>>> Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
>>> hilarity will ensue.
>>
>> I would imagine you would see more breakage by changing userspace
>> expectations for when the event is actually received.
>>
>> The IOCTL is "wait for vblank", but if we start sending the event near
>> the end of vblank it becomes more "wait for pageflip" (near scanout) or
>> "wait 15ms while the VRR front porch times out and causes flickering".
>>
>> I don't speak for all userspace but it seems more useful to me to have
>> the event sent at the start of vblank.
> 
> Maybe we need to things. The problem is that the timestampe is defined
> to be correct for start-of-frame. And right now the amdgpu
> implementation of VRR doesn't even give an accurate timestampe for the
> flip, so most likely no one yet thought about this properly.
> Definitely no testcases (or it would have been caught), whereas we
> have tons of testcases for fixed refresh.

I agree that it would be good to have documentation for what waiting for 
vblank is supposed to mean here. There's a big difference between 
getting the event at the start of vsync vs the start of the back porch 
near the end of it.

> 
>> VRR in its current state is useful for applications that aren't
>> completely timing sensitive (like most games). This is because the
>> driver has free reign in restricting the min/max bounds of the range -
>> allowing for enhancements like low framerate compensation / BTR.
>>
>> For a concrete example, imagine you're not doing frontbuffer rendering
>> but you're still trying to target a specific scanout time. So even if we
>> were to send the vblank event at vpos 0 you'd get an accurate scanout
>> timestamp for the next scanout after the vback porch ends. You're not
>> going to be doing any rendering or preparation in this time because it's
>> simply too short. So you'd target the scanout period after this one,
>> which you'd need to calculate somehow in userspace. Taking the deltas
>> between two vblank events is completely useless here because you'll get
>> back the vrr min refresh delta. So you can take your source content rate
>> or some fixed rate you'd like to target within the range but even then
>> you're still not guaranteed to have that flip end up at that scanout time.
>>
>> The current solution is to just return the scanout pos at vrr max, or
>> the current mode's fixed refresh rate. If you're flip before the fixed
>> duration front porch ends then that scanout time will likely be accurate
>> since you can't flip faster than vrr max.
>>
>> Is there actually userspace that cares that these timestamps need to
>> match? Like said below, it seems that wait for vblank is mostly about
>> scheduling and for that I think having the event sent back at the start
>> of vblank is more important here.
> 
> There's 3 cases:
> - DRI3: Waits for vblank for frontbuffer blits. If you combine that
> with VRR then imo you deserve to keep all the pieces.
> - generic compositors. Use VRR for scheduling, expect that a page_flip
> queued immediately after a page_flip will hit the next frame, and not
> the current frame. They also expect that this will not give you the
> slowest possible refresh rate, but I guess VRR plus frame scheduling
> is currently out of scope. It's kinda just for as-fast-as-possible
> games.
> - flip timestamp. Currently the entire codebase assumes that vblank ts
> = flip ts. I don't really see a point in breaking that.
> - better scheduling of VRR: what compositors actually do isn't wait
> for the vblank, that only latches the page_flip. They set up a
> hrtimer, with a fixed/auto-tuned head-start, from which they start
> their rendering. The same could be done for VRR, if we expose the
> earliest/latest vblank end times (and how much they can shift). vblank
> ioctl is only used for syncing that up and making sure we're hitting
> the right frame. This is all for pageflipping compositors. This is
> also what they'll probably keep doing when we give them a desired flip
> time (instead of a target frame), they still need an hrtimer to fire a
> little bit ahead to get the frame rendered.
> 
> So taking all together, we have 2 use-cases I think:
> - frontbuffer rendering. Kinda wants vblank event to fire at the
> beginning of vblank. Won't ever do VRR
> - pageflipping compositors. Don't care about the vblank event, either
> they have an hrtimer timer (which gets synced with the vblank ioctl),
> or just flip as fast as possible. They don't care one bit when the
> vblank event happens, as long as a pageflip scheduled immediately
> after the event shows up only hits the next frame (which is the delay
> code that currently stretches vblank unecessarily with VRR if you're
> already a bit late).
> 
> Making sure that vblank and page flip timestamp match the real
> userspace seems like the cleanest semantics, and the least surprising
> to userspace. E.g. what happens if you implement VRR in a compositor,
> but then tune your hrtimer within the vblank, when the vblank
> timestamp is totally not reflecting your real pageflip? I just don't
> see what's the upside of having a vblank timestamp that's not actually
> reflecting the real flip.
> 
> Also, if your app wants the start of vblank, that's very easy to
> compute: Just add the scanout time to the last vblank timestamp.
> Figuring out the start of frame otoh isn't doable if you can't query,
> and constantly pageflipping just to have an accurate start-of-frame
> (for tuning your hrtimer) seems rather wasteful - it outright defeats
> some of the use-cases for VRR.
> 
> So what benefit do you see in having vblank ts and pageflip ts not
> agree? What's the use-case this solves - aside from "less typing" :-)
> If the only use-case is VRR for frontbuffer rendering, then that
> doesn't even work right now (aside from that I think it's something we
> shouldn't support and definitely not encourage).
> -Daniel

I'm actually fine with moving the vblank event back to the back porch 
but I'd just be worried about userspace breakages that relied on it 
happening the other way.

It'd be especially confusing if the VLINE was different for fixed vs 
variable refresh. I wouldn't want to move it to vpos 0 unless the event 
is also getting moved for fixed refresh as well.

But like you said, the only thing I can think of that should need the 
event right away is low latency / framebuffer rendering.

I can think at least one potential use case where you would want to do 
front buffer rendering for low latency with VRR - emulation of old 
console hardware. There's not really any modern display that has fixed 
refresh modes that match the render frequency of some consoles. But this 
use case isn't out there in the wild right now at least.

Regarding compositors: I don't really recommend VRR for this use case in 
general since you'll want a fixed refresh rate frequency for input to 
feel consistent and to avoid any flickering / luminance changes. The 
fastest refresh rate will be the most responsive to the user and you can 
just used the fixed refresh rate mode in this case. Lowering the refresh 
rate to save power sounds nice in theory but I'd imagine it'd feel 
terrible in practice.

Nicholas Kazlauskas

> 
>>>>> Now added complication is that amdgpu sends out vblank events really
>>>>> early, which is used by userspace to do frontbuffer rendering in the
>>>>> vblank time. But I don't think anyone wants to do both VRR and
>>>>
>>>> I think all kms drivers try to call drm_crtc_handle_vblank() at start
>>>> of vblank to give Mesa the most time for frontbuffer rendering for
>>>> classic X. But vblank events are also used for scheduling bufferswaps
>>>> or other stuff for redirected windowed rendering, or via api's like
>>>> OML_sync_controls glXWaitForMscOML, so there might be other things
>>>> affected by a more delayed vblank handling.
>>>
>>> The frontbuffer rendering is very much X driver specific, and I think
>>> -amdgpu/radeon is the only one that requires this. No i915 driver ever
>>> used the vblank interrupt to schedule frontbuffer blits, we use some
>>> CS-side stalls.
>>>
>>> Wrt scheduling pageflips: The rule is that right after you've received the
>>> vblank for frame X, then an immediately schedule pageflip should hit X+1,
>>> but not X. amdgpu had this broken, but it's fixed since a while.
>>>
>>>>> frontbuffer rendering, hence I think it should be possible to create
>>>>> correct vblank timestamps for the VRR case, while leaving the current
>>>>> logic in place for everything else. But that means moving the entire
>>>>> vblank processing to where you know the next frame's start time for
>>>>> VRR, both for page flips and for all other vblank events.
>>>>> -Daniel
>>>>>
>>>>
>>>> I think for amdgpu it would be doable by calling drm_handle_vblank
>>>> from the pageflip irq handler in VRR mode, and then additionally from
>>>> the vblank interrupt handler like now. Vblank irq's are triggered via
>>>> vline interrupts with programmable vertical trigger line position, so
>>>> we could program the trigger line to be start of back-porch instead of
>>>> start of front-porch. In the pageflip case we do vblank handling +
>>>> timestamping + pflip completion from pflip irq, and have the regular
>>>> back-porch vblank handling for refresh cycles in which no
>>>> pageflip/pflip irq happened.
>>>
>>> Hm, can't we simply program the vblank to occur in the back porch region
>>> for VRR? The vblank timestamping should automatically correct the
>>> timestamp in that case. Would require a lot less code changes.
>>>
>>> Plus update the documentation ofc.
>>>
>>>> Not sure if/how that would mess with below-vrr-refresh-rate-range
>>>> support for stutter reduction at low fps though? Or with performance,
>>>> given vblank event dispatch could be delayed a lot in VRR compared to
>>>> normal mode? Or with all the vblank accounting during modesets or crtc
>>>> en/disable?
>>
>> I think we have extra VLINEs that can be programmed at vpos 0 that could
>> resolve this but I'm still not really sold on the idea.
>>
>>>
>>> Currently VRR and explicit frame scheduling don't mix. We've discussed
>>> this, and consens seems to be that we need a new property for page_flip,
>>> similar to the target frame, but instead of frames a timestamp. Then the
>>> driver can hit a specific time for the next frame.
>>>
>>> And since compositors use the vblank ioctl just to do frame scheduling, I
>>> don't think we have to worry hugely about interactions there. Much better
>>> to aim for clean semantics (i.e. vblank and flip complete timestamps
>>> better match).
>>>
>>>> Also, could other drivers like Intel easily do this delayed vblank
>>>> processing in the non-pageflip case?
>>>
>>> Intel's problem :-) But yeah we have page flip timestamp registers, so
>>> worst case all we need to do is shuffle code around a bit. And we have
>>> interrupts for both start of vblank and and of vblank. It might be a bit
>>> more work because we use the vblank interrupt to avoid races in atomic
>>> updates, but that's not too terrible to fix.
>>>
>>>>   From my user perspective as a developer of a neuroscience research
>>>> application that has a couple of exciting/important use cases for VRR
>>>> mode, i absolutely do need trustworthy and precise pageflip event
>>>> timestamps that represent reality, otherwise VRR mode would be less
>>>> than useless for me. I can do without meaningful vblank timestamps in
>>>> VRR mode though, and expected to do without them, as any scheduling
>>>> has to happen time-based instead of based on vblank counts anyway,
>>>> given that vblank counts are quite meaningless if every frame can have
>>>> a different duration. I'd expect that situation to be similar for
>>>> other apps that would want to use timestamps in VRR mode like video
>>>> games, movie players or VR/AR applications.
>>>
>>> Yeah for that you want to schedule your frames with a target frame number.
>>> Would still be nice if vblank timestamps wouldn't be an outright lie (and
>>> the timestamp is still useful information, even if the frame counter
>>> isn't - many displays have min/max frame rates for VRR, which is
>>> information we could perhaps expose).
>>> -Daniel >
>>>>
>>>> -mario
>>>>
>>>>>>
>>>>>> Nicholas Kazlauskas
>>>>>>
>>>>>>>
>>>>>>>> ---
>>>>>>>>     drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
>>>>>>>>     include/drm/drm_vblank.h     |  8 ++++++
>>>>>>>>     2 files changed, 56 insertions(+), 1 deletion(-)
>>>>>>>>
>>>>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
>>>>>>>> index 98e091175921..4b3a4c38fabe 100644
>>>>>>>> --- a/drivers/gpu/drm/drm_vblank.c
>>>>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
>>>>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
>>>>>>>>                u64 seq, ktime_t now)
>>>>>>>>     {
>>>>>>>>        struct timespec64 tv;
>>>>>>>> +    ktime_t alt_flip_time;
>>>>>>>>
>>>>>>>>        switch (e->event.base.type) {
>>>>>>>> -    case DRM_EVENT_VBLANK:
>>>>>>>>        case DRM_EVENT_FLIP_COMPLETE:
>>>>>>>> +            /*
>>>>>>>> +             * For flip completion events, override "now" time
>>>>>>>> +             * with alt_flip_time provided by the driver via
>>>>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
>>>>>>>> +             * if that time is later than given "now" vblank time.
>>>>>>>> +             */
>>>>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
>>>>>>>> +            if (alt_flip_time > now)
>>>>>>>> +                    now = alt_flip_time;
>>>>>>>> +            /* Fallthrough */
>>>>>>>> +    case DRM_EVENT_VBLANK:
>>>>>>>>                tv = ktime_to_timespec64(now);
>>>>>>>>                e->event.vbl.sequence = seq;
>>>>>>>>                /*
>>>>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>>>>>>>
>>>>>>>>                now = ktime_get();
>>>>>>>>        }
>>>>>>>> +
>>>>>>>>        e->pipe = pipe;
>>>>>>>>        send_vblank_event(dev, e, seq, now);
>>>>>>>>     }
>>>>>>>>     EXPORT_SYMBOL(drm_crtc_send_vblank_event);
>>>>>>>>
>>>>>>>> +/**
>>>>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
>>>>>>>> + * @crtc: the source CRTC of the pageflip completion event
>>>>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
>>>>>>>> + *
>>>>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
>>>>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
>>>>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
>>>>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
>>>>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
>>>>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
>>>>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
>>>>>>>> + * set an alternate flip completion timestamp in case of late flip completions
>>>>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
>>>>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
>>>>>>>> + * completion routine, with some constant duration of the back porch interval
>>>>>>>> + * added, although more precise estimates may be possible on some hardware if
>>>>>>>> + * the hardware provides some means of timestamping the true end of vblank.
>>>>>>>> + *
>>>>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
>>>>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
>>>>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
>>>>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
>>>>>>>> + * the new post-pageflip scanout buffer.
>>>>>>>> + */
>>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>>>>>>>> +                                     ktime_t flip_time)
>>>>>>>> +{
>>>>>>>> +    struct drm_device *dev = crtc->dev;
>>>>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
>>>>>>>> +
>>>>>>>> +    vblank->alt_flip_time = flip_time;
>>>>>>>> +}
>>>>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
>>>>>>>> +
>>>>>>>>     static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
>>>>>>>>     {
>>>>>>>>        if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>>>>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
>>>>>>>> index 6ad9630d4f48..aacf44694ab6 100644
>>>>>>>> --- a/include/drm/drm_vblank.h
>>>>>>>> +++ b/include/drm/drm_vblank.h
>>>>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
>>>>>>>>         * @time: Vblank timestamp corresponding to @count.
>>>>>>>>         */
>>>>>>>>        ktime_t time;
>>>>>>>> +    /**
>>>>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
>>>>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
>>>>>>>> +     * events will carry the later one of @time and @alt_flip_time.
>>>>>>>> +     */
>>>>>>>> +    ktime_t alt_flip_time;
>>>>>>>>
>>>>>>>>        /**
>>>>>>>>         * @refcount: Number of users/waiters of the vblank interrupt. Only when
>>>>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
>>>>>>>>     u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
>>>>>>>>     u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
>>>>>>>>                                   ktime_t *vblanktime);
>>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>>>>>>>> +                                     ktime_t flip_time);
>>>>>>>>     void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>>>>>>>                               struct drm_pending_vblank_event *e);
>>>>>>>>     void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
>>>>>>>> --
>>>>>>>> 2.17.1
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> dri-devel mailing list
>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>>>
>>
>> Nicholas Kazlauskas
>>
> 
>
Daniel Vetter Feb. 13, 2019, 4:03 p.m. UTC | #13
On Wed, Feb 13, 2019 at 4:46 PM Kazlauskas, Nicholas
<Nicholas.Kazlauskas@amd.com> wrote:
>
> On 2/13/19 10:14 AM, Daniel Vetter wrote:
> > On Wed, Feb 13, 2019 at 3:33 PM Kazlauskas, Nicholas
> > <Nicholas.Kazlauskas@amd.com> wrote:
> >>
> >> On 2/13/19 4:50 AM, Daniel Vetter wrote:
> >>> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> >>>> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>>
> >>>>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> >>>>> <Nicholas.Kazlauskas@amd.com> wrote:
> >>>>>>
> >>>>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
> >>>>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> >>>>>>>> The pageflip completion timestamps transmitted to userspace
> >>>>>>>> via pageflip completion events are supposed to describe the
> >>>>>>>> time at which the first pixel of the new post-pageflip scanout
> >>>>>>>> buffer leaves the video output of the gpu. This time is
> >>>>>>>> identical to end of vblank, when active scanout starts.
> >>>>>>>>
> >>>>>>>> For a crtc in standard fixed refresh rate, the end of vblank
> >>>>>>>> is identical to the vblank timestamps calculated by
> >>>>>>>> drm_update_vblank_count() at each vblank interrupt, or each
> >>>>>>>> vblank dis-/enable. Therefore pageflip events just carry
> >>>>>>>> that vblank timestamp as their pageflip timestamp.
> >>>>>>>>
> >>>>>>>> For a crtc switched to variable refresh rate mode (vrr), the
> >>>>>>>> pageflip completion timestamps are identical to the vblank
> >>>>>>>> timestamps iff the pageflip was executed early in vblank,
> >>>>>>>> before the minimum vblank duration elapsed. In this case
> >>>>>>>> the time of display onset is identical to when the crtc
> >>>>>>>> is running in fixed refresh rate.
> >>>>>>>>
> >>>>>>>> However, if a pageflip completes later in the vblank, inside
> >>>>>>>> the "extended front porch" in vrr mode, then the vblank will
> >>>>>>>> terminate at a fixed (back porch) duration after flip, so
> >>>>>>>> the display onset time is delayed correspondingly. In this
> >>>>>>>> case the vblank timestamp computed at vblank irq time would
> >>>>>>>> be too early, and we need a way to calculate an estimated
> >>>>>>>> pageflip timestamp that will be later than the vblank timestamp.
> >>>>>>>>
> >>>>>>>> How a driver determines such a "late flip" timestamp is hw
> >>>>>>>> and driver specific, but this patch adds a new helper function
> >>>>>>>> that allows the driver to propose such an alternate "late flip"
> >>>>>>>> timestamp for use in pageflip events:
> >>>>>>>>
> >>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>>>>>>>
> >>>>>>>> When sending out pageflip events, we now compare that proposed
> >>>>>>>> flip_timestamp against the vblank timestamp of the current
> >>>>>>>> vblank of flip completion and choose to send out the greater/
> >>>>>>>> later timestamp as flip completion timestamp.
> >>>>>>>>
> >>>>>>>> The most simple way for a kms driver to supply a suitable
> >>>>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
> >>>>>>>> at start of the pageflip completion handler, e.g., pageflip
> >>>>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
> >>>>>>>> as proposed "late" alternative timestamp via ...
> >>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> >>>>>>>>
> >>>>>>>> More clever approaches could try to add some corrective offset
> >>>>>>>> for fixed back porch duration, or ideally use hardware features
> >>>>>>>> like hw timestamps to calculate the exact end time of vblank.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> >>>>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> >>>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
> >>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> >>>>>>>
> >>>>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> >>>>>>> the right timestampe, once? With this I guess if you do a vblank query in
> >>>>>>> between the wrong and the right vblank you'll get the bogus value. Not
> >>>>>>> really great for userspace.
> >>>>>>> -Daniel
> >>>>>>
> >>>>>> I think we calculate the timestamp and send the vblank event both within
> >>>>>> the pageflip IRQ handler so calculating the right pageflip timestamp
> >>>>>> once could probably be done. I'm not sure if it's easier than proposing
> >>>>>> a later flip time with an API like this though.
> >>>>>>
> >>>>>> The actual scanout time should be known from the page-flip handler so
> >>>>>> the semantics for VRR on/off remain the same. This is because the
> >>>>>> page-flip triggers entering the back porch if we're in the extended
> >>>>>> front porch.
> >>>>>>
> >>>>>> But scanout time from vblank events for something like
> >>>>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> >>>>>> treated as estimates. If we're in the regular front porch then the
> >>>>>> timing to scanout is based on the fixed duration front porch for the
> >>>>>> current mode. If we're in the extended back porch then it's technically
> >>>>>> driver defined but the most reasonable guess is to assume that the front
> >>>>>> porch is going to end at any moment, so just return the length of the
> >>>>>> back porch for getting the scanout time.
> >>>>>>
> >>>>>> Proposing the late timestamp shouldn't affect vblank event in the
> >>>>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> >>>>>> event case. I'm not sure if that's what's guaranteed to happen with this
> >>>>>> patch though. There doesn't seem to be any locking on either
> >>>>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> >>>>>> the same vblank event back as the one just stored I don't think it's
> >>>>>> guaranteed.
> >>>>>
> >>>>> That's the inconsistency I mean to highlight - the timestamp for the
> >>>>> same frame as observed through flip complete and through the
> >>>>> wait_vblank ioctl can differ. Which they really shouldn't.
> >>>>>
> >>>>
> >>>> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
> >>>> that vblank and pageflip timestamps should always match. But then the
> >>>> kernel docs for "Variable refresh properties" in drm_connector.c for
> >>>> vblank timestamps were changed for the VRR implementation in Linux
> >>>> 5.0-rc to redefine them when in VRR mode. They are defined, but
> >>>> probably rather useless for any practical purpose, like this:
> >>>>
> >>>> "The semantics for the vertical blank timestamp differ when
> >>>> variable refresh rate is active. The vertical blank timestamp
> >>>> is defined to be an estimate using the current mode's fixed
> >>>> refresh rate timings. The semantics for the page-flip event
> >>>> timestamp remain the same."
> >>>
> >>> Uh I missed that. That sounds like nonsense tbh.
> >>>
> >>>> So our docs contradict each other as of Linux 5.0-rc. Certainly having
> >>>> useful vblank timetamps would be useful.
> >>>
> >>> Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
> >>> hilarity will ensue.
> >>
> >> I would imagine you would see more breakage by changing userspace
> >> expectations for when the event is actually received.
> >>
> >> The IOCTL is "wait for vblank", but if we start sending the event near
> >> the end of vblank it becomes more "wait for pageflip" (near scanout) or
> >> "wait 15ms while the VRR front porch times out and causes flickering".
> >>
> >> I don't speak for all userspace but it seems more useful to me to have
> >> the event sent at the start of vblank.
> >
> > Maybe we need to things. The problem is that the timestampe is defined
> > to be correct for start-of-frame. And right now the amdgpu
> > implementation of VRR doesn't even give an accurate timestampe for the
> > flip, so most likely no one yet thought about this properly.
> > Definitely no testcases (or it would have been caught), whereas we
> > have tons of testcases for fixed refresh.
>
> I agree that it would be good to have documentation for what waiting for
> vblank is supposed to mean here. There's a big difference between
> getting the event at the start of vsync vs the start of the back porch
> near the end of it.
>
> >
> >> VRR in its current state is useful for applications that aren't
> >> completely timing sensitive (like most games). This is because the
> >> driver has free reign in restricting the min/max bounds of the range -
> >> allowing for enhancements like low framerate compensation / BTR.
> >>
> >> For a concrete example, imagine you're not doing frontbuffer rendering
> >> but you're still trying to target a specific scanout time. So even if we
> >> were to send the vblank event at vpos 0 you'd get an accurate scanout
> >> timestamp for the next scanout after the vback porch ends. You're not
> >> going to be doing any rendering or preparation in this time because it's
> >> simply too short. So you'd target the scanout period after this one,
> >> which you'd need to calculate somehow in userspace. Taking the deltas
> >> between two vblank events is completely useless here because you'll get
> >> back the vrr min refresh delta. So you can take your source content rate
> >> or some fixed rate you'd like to target within the range but even then
> >> you're still not guaranteed to have that flip end up at that scanout time.
> >>
> >> The current solution is to just return the scanout pos at vrr max, or
> >> the current mode's fixed refresh rate. If you're flip before the fixed
> >> duration front porch ends then that scanout time will likely be accurate
> >> since you can't flip faster than vrr max.
> >>
> >> Is there actually userspace that cares that these timestamps need to
> >> match? Like said below, it seems that wait for vblank is mostly about
> >> scheduling and for that I think having the event sent back at the start
> >> of vblank is more important here.
> >
> > There's 3 cases:
> > - DRI3: Waits for vblank for frontbuffer blits. If you combine that
> > with VRR then imo you deserve to keep all the pieces.
> > - generic compositors. Use VRR for scheduling, expect that a page_flip
> > queued immediately after a page_flip will hit the next frame, and not
> > the current frame. They also expect that this will not give you the
> > slowest possible refresh rate, but I guess VRR plus frame scheduling
> > is currently out of scope. It's kinda just for as-fast-as-possible
> > games.
> > - flip timestamp. Currently the entire codebase assumes that vblank ts
> > = flip ts. I don't really see a point in breaking that.
> > - better scheduling of VRR: what compositors actually do isn't wait
> > for the vblank, that only latches the page_flip. They set up a
> > hrtimer, with a fixed/auto-tuned head-start, from which they start
> > their rendering. The same could be done for VRR, if we expose the
> > earliest/latest vblank end times (and how much they can shift). vblank
> > ioctl is only used for syncing that up and making sure we're hitting
> > the right frame. This is all for pageflipping compositors. This is
> > also what they'll probably keep doing when we give them a desired flip
> > time (instead of a target frame), they still need an hrtimer to fire a
> > little bit ahead to get the frame rendered.
> >
> > So taking all together, we have 2 use-cases I think:
> > - frontbuffer rendering. Kinda wants vblank event to fire at the
> > beginning of vblank. Won't ever do VRR
> > - pageflipping compositors. Don't care about the vblank event, either
> > they have an hrtimer timer (which gets synced with the vblank ioctl),
> > or just flip as fast as possible. They don't care one bit when the
> > vblank event happens, as long as a pageflip scheduled immediately
> > after the event shows up only hits the next frame (which is the delay
> > code that currently stretches vblank unecessarily with VRR if you're
> > already a bit late).
> >
> > Making sure that vblank and page flip timestamp match the real
> > userspace seems like the cleanest semantics, and the least surprising
> > to userspace. E.g. what happens if you implement VRR in a compositor,
> > but then tune your hrtimer within the vblank, when the vblank
> > timestamp is totally not reflecting your real pageflip? I just don't
> > see what's the upside of having a vblank timestamp that's not actually
> > reflecting the real flip.
> >
> > Also, if your app wants the start of vblank, that's very easy to
> > compute: Just add the scanout time to the last vblank timestamp.
> > Figuring out the start of frame otoh isn't doable if you can't query,
> > and constantly pageflipping just to have an accurate start-of-frame
> > (for tuning your hrtimer) seems rather wasteful - it outright defeats
> > some of the use-cases for VRR.
> >
> > So what benefit do you see in having vblank ts and pageflip ts not
> > agree? What's the use-case this solves - aside from "less typing" :-)
> > If the only use-case is VRR for frontbuffer rendering, then that
> > doesn't even work right now (aside from that I think it's something we
> > shouldn't support and definitely not encourage).
> > -Daniel
>
> I'm actually fine with moving the vblank event back to the back porch
> but I'd just be worried about userspace breakages that relied on it
> happening the other way.
>
> It'd be especially confusing if the VLINE was different for fixed vs
> variable refresh. I wouldn't want to move it to vpos 0 unless the event
> is also getting moved for fixed refresh as well.

That would break DRI3. So we can pick between two options:
- always send out the vblank early, accept that vblank timestamps change.
- for VRR only, send out the vblank only when we know the correct
timestamp. For non-VRR we have to send out the vblank early, because
that's somewhat become api (but not all drivers do it that way, so
it's about as hopeless as frontbuffer rendering in general).

I still don't see the benefit of the first option, or the exact use case.

> But like you said, the only thing I can think of that should need the
> event right away is low latency / framebuffer rendering.
>
> I can think at least one potential use case where you would want to do
> front buffer rendering for low latency with VRR - emulation of old
> console hardware. There's not really any modern display that has fixed
> refresh modes that match the render frequency of some consoles. But this
> use case isn't out there in the wild right now at least.

Even on an emulator I'd expect it'd just do regular flips to the
frontbuffer. Especially on mobile there's really no such thing as
frontbuffer rendering, you always do a flip. The DIRTYFB ioctl (which
you need to call instead of doing a page flip) simply does a pageflip
behind your back for you. Which I think is worse than actually
flipping.

And for emulating exact refresh rates your then again go back to
hrtimer + issuing the flip approach, but using your desired refresh
rate to tune the hrtimer instead of the fixed one. But that needs a
correct vblank timestamp for the current frame to work correctly (or
you force compositor has to always flip, which defeats the power
saving aspects of VRR allowing to stretch refresh rates when nothing
changes).

> Regarding compositors: I don't really recommend VRR for this use case in
> general since you'll want a fixed refresh rate frequency for input to
> feel consistent and to avoid any flickering / luminance changes. The
> fastest refresh rate will be the most responsive to the user and you can
> just used the fixed refresh rate mode in this case. Lowering the refresh
> rate to save power sounds nice in theory but I'd imagine it'd feel
> terrible in practice.

Why is a compositor any different from a game? You aim to render at
60fps, or whatever the refresh rate is. If you're one frame late it's
better to stretch the vblank slightly than a full on 1-frame freeze.
And on mobile it's unfortunately fairly common that the system fails
to clock up quickly enough when starting from an idle state, VRR
sounds like really useful here.

But like in a game if you don't at least aim to render at a consistent
fps, and handle your input at a consistent fps, then the end result
will indeed be terrible.
-Daniel

> Nicholas Kazlauskas
>
> >
> >>>>> Now added complication is that amdgpu sends out vblank events really
> >>>>> early, which is used by userspace to do frontbuffer rendering in the
> >>>>> vblank time. But I don't think anyone wants to do both VRR and
> >>>>
> >>>> I think all kms drivers try to call drm_crtc_handle_vblank() at start
> >>>> of vblank to give Mesa the most time for frontbuffer rendering for
> >>>> classic X. But vblank events are also used for scheduling bufferswaps
> >>>> or other stuff for redirected windowed rendering, or via api's like
> >>>> OML_sync_controls glXWaitForMscOML, so there might be other things
> >>>> affected by a more delayed vblank handling.
> >>>
> >>> The frontbuffer rendering is very much X driver specific, and I think
> >>> -amdgpu/radeon is the only one that requires this. No i915 driver ever
> >>> used the vblank interrupt to schedule frontbuffer blits, we use some
> >>> CS-side stalls.
> >>>
> >>> Wrt scheduling pageflips: The rule is that right after you've received the
> >>> vblank for frame X, then an immediately schedule pageflip should hit X+1,
> >>> but not X. amdgpu had this broken, but it's fixed since a while.
> >>>
> >>>>> frontbuffer rendering, hence I think it should be possible to create
> >>>>> correct vblank timestamps for the VRR case, while leaving the current
> >>>>> logic in place for everything else. But that means moving the entire
> >>>>> vblank processing to where you know the next frame's start time for
> >>>>> VRR, both for page flips and for all other vblank events.
> >>>>> -Daniel
> >>>>>
> >>>>
> >>>> I think for amdgpu it would be doable by calling drm_handle_vblank
> >>>> from the pageflip irq handler in VRR mode, and then additionally from
> >>>> the vblank interrupt handler like now. Vblank irq's are triggered via
> >>>> vline interrupts with programmable vertical trigger line position, so
> >>>> we could program the trigger line to be start of back-porch instead of
> >>>> start of front-porch. In the pageflip case we do vblank handling +
> >>>> timestamping + pflip completion from pflip irq, and have the regular
> >>>> back-porch vblank handling for refresh cycles in which no
> >>>> pageflip/pflip irq happened.
> >>>
> >>> Hm, can't we simply program the vblank to occur in the back porch region
> >>> for VRR? The vblank timestamping should automatically correct the
> >>> timestamp in that case. Would require a lot less code changes.
> >>>
> >>> Plus update the documentation ofc.
> >>>
> >>>> Not sure if/how that would mess with below-vrr-refresh-rate-range
> >>>> support for stutter reduction at low fps though? Or with performance,
> >>>> given vblank event dispatch could be delayed a lot in VRR compared to
> >>>> normal mode? Or with all the vblank accounting during modesets or crtc
> >>>> en/disable?
> >>
> >> I think we have extra VLINEs that can be programmed at vpos 0 that could
> >> resolve this but I'm still not really sold on the idea.
> >>
> >>>
> >>> Currently VRR and explicit frame scheduling don't mix. We've discussed
> >>> this, and consens seems to be that we need a new property for page_flip,
> >>> similar to the target frame, but instead of frames a timestamp. Then the
> >>> driver can hit a specific time for the next frame.
> >>>
> >>> And since compositors use the vblank ioctl just to do frame scheduling, I
> >>> don't think we have to worry hugely about interactions there. Much better
> >>> to aim for clean semantics (i.e. vblank and flip complete timestamps
> >>> better match).
> >>>
> >>>> Also, could other drivers like Intel easily do this delayed vblank
> >>>> processing in the non-pageflip case?
> >>>
> >>> Intel's problem :-) But yeah we have page flip timestamp registers, so
> >>> worst case all we need to do is shuffle code around a bit. And we have
> >>> interrupts for both start of vblank and and of vblank. It might be a bit
> >>> more work because we use the vblank interrupt to avoid races in atomic
> >>> updates, but that's not too terrible to fix.
> >>>
> >>>>   From my user perspective as a developer of a neuroscience research
> >>>> application that has a couple of exciting/important use cases for VRR
> >>>> mode, i absolutely do need trustworthy and precise pageflip event
> >>>> timestamps that represent reality, otherwise VRR mode would be less
> >>>> than useless for me. I can do without meaningful vblank timestamps in
> >>>> VRR mode though, and expected to do without them, as any scheduling
> >>>> has to happen time-based instead of based on vblank counts anyway,
> >>>> given that vblank counts are quite meaningless if every frame can have
> >>>> a different duration. I'd expect that situation to be similar for
> >>>> other apps that would want to use timestamps in VRR mode like video
> >>>> games, movie players or VR/AR applications.
> >>>
> >>> Yeah for that you want to schedule your frames with a target frame number.
> >>> Would still be nice if vblank timestamps wouldn't be an outright lie (and
> >>> the timestamp is still useful information, even if the frame counter
> >>> isn't - many displays have min/max frame rates for VRR, which is
> >>> information we could perhaps expose).
> >>> -Daniel >
> >>>>
> >>>> -mario
> >>>>
> >>>>>>
> >>>>>> Nicholas Kazlauskas
> >>>>>>
> >>>>>>>
> >>>>>>>> ---
> >>>>>>>>     drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> >>>>>>>>     include/drm/drm_vblank.h     |  8 ++++++
> >>>>>>>>     2 files changed, 56 insertions(+), 1 deletion(-)
> >>>>>>>>
> >>>>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> >>>>>>>> index 98e091175921..4b3a4c38fabe 100644
> >>>>>>>> --- a/drivers/gpu/drm/drm_vblank.c
> >>>>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
> >>>>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> >>>>>>>>                u64 seq, ktime_t now)
> >>>>>>>>     {
> >>>>>>>>        struct timespec64 tv;
> >>>>>>>> +    ktime_t alt_flip_time;
> >>>>>>>>
> >>>>>>>>        switch (e->event.base.type) {
> >>>>>>>> -    case DRM_EVENT_VBLANK:
> >>>>>>>>        case DRM_EVENT_FLIP_COMPLETE:
> >>>>>>>> +            /*
> >>>>>>>> +             * For flip completion events, override "now" time
> >>>>>>>> +             * with alt_flip_time provided by the driver via
> >>>>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> >>>>>>>> +             * if that time is later than given "now" vblank time.
> >>>>>>>> +             */
> >>>>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> >>>>>>>> +            if (alt_flip_time > now)
> >>>>>>>> +                    now = alt_flip_time;
> >>>>>>>> +            /* Fallthrough */
> >>>>>>>> +    case DRM_EVENT_VBLANK:
> >>>>>>>>                tv = ktime_to_timespec64(now);
> >>>>>>>>                e->event.vbl.sequence = seq;
> >>>>>>>>                /*
> >>>>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>>>>>>>
> >>>>>>>>                now = ktime_get();
> >>>>>>>>        }
> >>>>>>>> +
> >>>>>>>>        e->pipe = pipe;
> >>>>>>>>        send_vblank_event(dev, e, seq, now);
> >>>>>>>>     }
> >>>>>>>>     EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> >>>>>>>>
> >>>>>>>> +/**
> >>>>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> >>>>>>>> + * @crtc: the source CRTC of the pageflip completion event
> >>>>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> >>>>>>>> + *
> >>>>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> >>>>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
> >>>>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
> >>>>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
> >>>>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
> >>>>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
> >>>>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> >>>>>>>> + * set an alternate flip completion timestamp in case of late flip completions
> >>>>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> >>>>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> >>>>>>>> + * completion routine, with some constant duration of the back porch interval
> >>>>>>>> + * added, although more precise estimates may be possible on some hardware if
> >>>>>>>> + * the hardware provides some means of timestamping the true end of vblank.
> >>>>>>>> + *
> >>>>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> >>>>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
> >>>>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
> >>>>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
> >>>>>>>> + * the new post-pageflip scanout buffer.
> >>>>>>>> + */
> >>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >>>>>>>> +                                     ktime_t flip_time)
> >>>>>>>> +{
> >>>>>>>> +    struct drm_device *dev = crtc->dev;
> >>>>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> >>>>>>>> +
> >>>>>>>> +    vblank->alt_flip_time = flip_time;
> >>>>>>>> +}
> >>>>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> >>>>>>>> +
> >>>>>>>>     static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> >>>>>>>>     {
> >>>>>>>>        if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> >>>>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> >>>>>>>> index 6ad9630d4f48..aacf44694ab6 100644
> >>>>>>>> --- a/include/drm/drm_vblank.h
> >>>>>>>> +++ b/include/drm/drm_vblank.h
> >>>>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> >>>>>>>>         * @time: Vblank timestamp corresponding to @count.
> >>>>>>>>         */
> >>>>>>>>        ktime_t time;
> >>>>>>>> +    /**
> >>>>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> >>>>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> >>>>>>>> +     * events will carry the later one of @time and @alt_flip_time.
> >>>>>>>> +     */
> >>>>>>>> +    ktime_t alt_flip_time;
> >>>>>>>>
> >>>>>>>>        /**
> >>>>>>>>         * @refcount: Number of users/waiters of the vblank interrupt. Only when
> >>>>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> >>>>>>>>     u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> >>>>>>>>     u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> >>>>>>>>                                   ktime_t *vblanktime);
> >>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> >>>>>>>> +                                     ktime_t flip_time);
> >>>>>>>>     void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> >>>>>>>>                               struct drm_pending_vblank_event *e);
> >>>>>>>>     void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> >>>>>>>> --
> >>>>>>>> 2.17.1
> >>>>>>>>
> >>>>>>>> _______________________________________________
> >>>>>>>> dri-devel mailing list
> >>>>>>>> dri-devel@lists.freedesktop.org
> >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>>>>>
> >>>>>>
> >>>>>
> >>>>>
> >>>>> --
> >>>>> Daniel Vetter
> >>>>> Software Engineer, Intel Corporation
> >>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >>>
> >>
> >> Nicholas Kazlauskas
> >>
> >
> >
>
kernel test robot via dri-devel Feb. 13, 2019, 6:10 p.m. UTC | #14
On Wed, Feb 13, 2019 at 5:03 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Wed, Feb 13, 2019 at 4:46 PM Kazlauskas, Nicholas
> <Nicholas.Kazlauskas@amd.com> wrote:
> >
> > On 2/13/19 10:14 AM, Daniel Vetter wrote:
> > > On Wed, Feb 13, 2019 at 3:33 PM Kazlauskas, Nicholas
> > > <Nicholas.Kazlauskas@amd.com> wrote:
> > >>
> > >> On 2/13/19 4:50 AM, Daniel Vetter wrote:
> > >>> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> > >>>> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>>>>
> > >>>>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> > >>>>> <Nicholas.Kazlauskas@amd.com> wrote:
> > >>>>>>
> > >>>>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
> > >>>>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> > >>>>>>>> The pageflip completion timestamps transmitted to userspace
> > >>>>>>>> via pageflip completion events are supposed to describe the
> > >>>>>>>> time at which the first pixel of the new post-pageflip scanout
> > >>>>>>>> buffer leaves the video output of the gpu. This time is
> > >>>>>>>> identical to end of vblank, when active scanout starts.
> > >>>>>>>>
> > >>>>>>>> For a crtc in standard fixed refresh rate, the end of vblank
> > >>>>>>>> is identical to the vblank timestamps calculated by
> > >>>>>>>> drm_update_vblank_count() at each vblank interrupt, or each
> > >>>>>>>> vblank dis-/enable. Therefore pageflip events just carry
> > >>>>>>>> that vblank timestamp as their pageflip timestamp.
> > >>>>>>>>
> > >>>>>>>> For a crtc switched to variable refresh rate mode (vrr), the
> > >>>>>>>> pageflip completion timestamps are identical to the vblank
> > >>>>>>>> timestamps iff the pageflip was executed early in vblank,
> > >>>>>>>> before the minimum vblank duration elapsed. In this case
> > >>>>>>>> the time of display onset is identical to when the crtc
> > >>>>>>>> is running in fixed refresh rate.
> > >>>>>>>>
> > >>>>>>>> However, if a pageflip completes later in the vblank, inside
> > >>>>>>>> the "extended front porch" in vrr mode, then the vblank will
> > >>>>>>>> terminate at a fixed (back porch) duration after flip, so
> > >>>>>>>> the display onset time is delayed correspondingly. In this
> > >>>>>>>> case the vblank timestamp computed at vblank irq time would
> > >>>>>>>> be too early, and we need a way to calculate an estimated
> > >>>>>>>> pageflip timestamp that will be later than the vblank timestamp.
> > >>>>>>>>
> > >>>>>>>> How a driver determines such a "late flip" timestamp is hw
> > >>>>>>>> and driver specific, but this patch adds a new helper function
> > >>>>>>>> that allows the driver to propose such an alternate "late flip"
> > >>>>>>>> timestamp for use in pageflip events:
> > >>>>>>>>
> > >>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > >>>>>>>>
> > >>>>>>>> When sending out pageflip events, we now compare that proposed
> > >>>>>>>> flip_timestamp against the vblank timestamp of the current
> > >>>>>>>> vblank of flip completion and choose to send out the greater/
> > >>>>>>>> later timestamp as flip completion timestamp.
> > >>>>>>>>
> > >>>>>>>> The most simple way for a kms driver to supply a suitable
> > >>>>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
> > >>>>>>>> at start of the pageflip completion handler, e.g., pageflip
> > >>>>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
> > >>>>>>>> as proposed "late" alternative timestamp via ...
> > >>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > >>>>>>>>
> > >>>>>>>> More clever approaches could try to add some corrective offset
> > >>>>>>>> for fixed back porch duration, or ideally use hardware features
> > >>>>>>>> like hw timestamps to calculate the exact end time of vblank.
> > >>>>>>>>
> > >>>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > >>>>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > >>>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
> > >>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> > >>>>>>>
> > >>>>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> > >>>>>>> the right timestampe, once? With this I guess if you do a vblank query in
> > >>>>>>> between the wrong and the right vblank you'll get the bogus value. Not
> > >>>>>>> really great for userspace.
> > >>>>>>> -Daniel
> > >>>>>>
> > >>>>>> I think we calculate the timestamp and send the vblank event both within
> > >>>>>> the pageflip IRQ handler so calculating the right pageflip timestamp
> > >>>>>> once could probably be done. I'm not sure if it's easier than proposing
> > >>>>>> a later flip time with an API like this though.
> > >>>>>>
> > >>>>>> The actual scanout time should be known from the page-flip handler so
> > >>>>>> the semantics for VRR on/off remain the same. This is because the
> > >>>>>> page-flip triggers entering the back porch if we're in the extended
> > >>>>>> front porch.
> > >>>>>>
> > >>>>>> But scanout time from vblank events for something like
> > >>>>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> > >>>>>> treated as estimates. If we're in the regular front porch then the
> > >>>>>> timing to scanout is based on the fixed duration front porch for the
> > >>>>>> current mode. If we're in the extended back porch then it's technically
> > >>>>>> driver defined but the most reasonable guess is to assume that the front
> > >>>>>> porch is going to end at any moment, so just return the length of the
> > >>>>>> back porch for getting the scanout time.
> > >>>>>>
> > >>>>>> Proposing the late timestamp shouldn't affect vblank event in the
> > >>>>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> > >>>>>> event case. I'm not sure if that's what's guaranteed to happen with this
> > >>>>>> patch though. There doesn't seem to be any locking on either
> > >>>>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> > >>>>>> the same vblank event back as the one just stored I don't think it's
> > >>>>>> guaranteed.
> > >>>>>
> > >>>>> That's the inconsistency I mean to highlight - the timestamp for the
> > >>>>> same frame as observed through flip complete and through the
> > >>>>> wait_vblank ioctl can differ. Which they really shouldn't.
> > >>>>>
> > >>>>
> > >>>> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
> > >>>> that vblank and pageflip timestamps should always match. But then the
> > >>>> kernel docs for "Variable refresh properties" in drm_connector.c for
> > >>>> vblank timestamps were changed for the VRR implementation in Linux
> > >>>> 5.0-rc to redefine them when in VRR mode. They are defined, but
> > >>>> probably rather useless for any practical purpose, like this:
> > >>>>
> > >>>> "The semantics for the vertical blank timestamp differ when
> > >>>> variable refresh rate is active. The vertical blank timestamp
> > >>>> is defined to be an estimate using the current mode's fixed
> > >>>> refresh rate timings. The semantics for the page-flip event
> > >>>> timestamp remain the same."
> > >>>
> > >>> Uh I missed that. That sounds like nonsense tbh.
> > >>>
> > >>>> So our docs contradict each other as of Linux 5.0-rc. Certainly having
> > >>>> useful vblank timetamps would be useful.
> > >>>
> > >>> Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
> > >>> hilarity will ensue.
> > >>
> > >> I would imagine you would see more breakage by changing userspace
> > >> expectations for when the event is actually received.
> > >>
> > >> The IOCTL is "wait for vblank", but if we start sending the event near
> > >> the end of vblank it becomes more "wait for pageflip" (near scanout) or
> > >> "wait 15ms while the VRR front porch times out and causes flickering".
> > >>
> > >> I don't speak for all userspace but it seems more useful to me to have
> > >> the event sent at the start of vblank.
> > >
> > > Maybe we need to things. The problem is that the timestampe is defined
> > > to be correct for start-of-frame. And right now the amdgpu
> > > implementation of VRR doesn't even give an accurate timestampe for the
> > > flip, so most likely no one yet thought about this properly.
> > > Definitely no testcases (or it would have been caught), whereas we
> > > have tons of testcases for fixed refresh.
> >
> > I agree that it would be good to have documentation for what waiting for
> > vblank is supposed to mean here. There's a big difference between
> > getting the event at the start of vsync vs the start of the back porch
> > near the end of it.
> >
> > >
> > >> VRR in its current state is useful for applications that aren't
> > >> completely timing sensitive (like most games). This is because the
> > >> driver has free reign in restricting the min/max bounds of the range -
> > >> allowing for enhancements like low framerate compensation / BTR.
> > >>
> > >> For a concrete example, imagine you're not doing frontbuffer rendering
> > >> but you're still trying to target a specific scanout time. So even if we
> > >> were to send the vblank event at vpos 0 you'd get an accurate scanout
> > >> timestamp for the next scanout after the vback porch ends. You're not
> > >> going to be doing any rendering or preparation in this time because it's
> > >> simply too short. So you'd target the scanout period after this one,
> > >> which you'd need to calculate somehow in userspace. Taking the deltas
> > >> between two vblank events is completely useless here because you'll get
> > >> back the vrr min refresh delta. So you can take your source content rate
> > >> or some fixed rate you'd like to target within the range but even then
> > >> you're still not guaranteed to have that flip end up at that scanout time.
> > >>
> > >> The current solution is to just return the scanout pos at vrr max, or
> > >> the current mode's fixed refresh rate. If you're flip before the fixed
> > >> duration front porch ends then that scanout time will likely be accurate
> > >> since you can't flip faster than vrr max.
> > >>
> > >> Is there actually userspace that cares that these timestamps need to
> > >> match? Like said below, it seems that wait for vblank is mostly about
> > >> scheduling and for that I think having the event sent back at the start
> > >> of vblank is more important here.
> > >
> > > There's 3 cases:
> > > - DRI3: Waits for vblank for frontbuffer blits. If you combine that
> > > with VRR then imo you deserve to keep all the pieces.
> > > - generic compositors. Use VRR for scheduling, expect that a page_flip
> > > queued immediately after a page_flip will hit the next frame, and not
> > > the current frame. They also expect that this will not give you the
> > > slowest possible refresh rate, but I guess VRR plus frame scheduling
> > > is currently out of scope. It's kinda just for as-fast-as-possible
> > > games.
> > > - flip timestamp. Currently the entire codebase assumes that vblank ts
> > > = flip ts. I don't really see a point in breaking that.
> > > - better scheduling of VRR: what compositors actually do isn't wait
> > > for the vblank, that only latches the page_flip. They set up a
> > > hrtimer, with a fixed/auto-tuned head-start, from which they start
> > > their rendering. The same could be done for VRR, if we expose the
> > > earliest/latest vblank end times (and how much they can shift). vblank
> > > ioctl is only used for syncing that up and making sure we're hitting
> > > the right frame. This is all for pageflipping compositors. This is
> > > also what they'll probably keep doing when we give them a desired flip
> > > time (instead of a target frame), they still need an hrtimer to fire a
> > > little bit ahead to get the frame rendered.
> > >
> > > So taking all together, we have 2 use-cases I think:
> > > - frontbuffer rendering. Kinda wants vblank event to fire at the
> > > beginning of vblank. Won't ever do VRR
> > > - pageflipping compositors. Don't care about the vblank event, either
> > > they have an hrtimer timer (which gets synced with the vblank ioctl),
> > > or just flip as fast as possible. They don't care one bit when the
> > > vblank event happens, as long as a pageflip scheduled immediately
> > > after the event shows up only hits the next frame (which is the delay
> > > code that currently stretches vblank unecessarily with VRR if you're
> > > already a bit late).
> > >
> > > Making sure that vblank and page flip timestamp match the real
> > > userspace seems like the cleanest semantics, and the least surprising
> > > to userspace. E.g. what happens if you implement VRR in a compositor,
> > > but then tune your hrtimer within the vblank, when the vblank
> > > timestamp is totally not reflecting your real pageflip? I just don't
> > > see what's the upside of having a vblank timestamp that's not actually
> > > reflecting the real flip.
> > >
> > > Also, if your app wants the start of vblank, that's very easy to
> > > compute: Just add the scanout time to the last vblank timestamp.
> > > Figuring out the start of frame otoh isn't doable if you can't query,
> > > and constantly pageflipping just to have an accurate start-of-frame
> > > (for tuning your hrtimer) seems rather wasteful - it outright defeats
> > > some of the use-cases for VRR.
> > >
> > > So what benefit do you see in having vblank ts and pageflip ts not
> > > agree? What's the use-case this solves - aside from "less typing" :-)
> > > If the only use-case is VRR for frontbuffer rendering, then that
> > > doesn't even work right now (aside from that I think it's something we
> > > shouldn't support and definitely not encourage).
> > > -Daniel
> >
> > I'm actually fine with moving the vblank event back to the back porch
> > but I'd just be worried about userspace breakages that relied on it
> > happening the other way.
> >
> > It'd be especially confusing if the VLINE was different for fixed vs
> > variable refresh. I wouldn't want to move it to vpos 0 unless the event
> > is also getting moved for fixed refresh as well.
>
> That would break DRI3. So we can pick between two options:
> - always send out the vblank early, accept that vblank timestamps change.
> - for VRR only, send out the vblank only when we know the correct
> timestamp. For non-VRR we have to send out the vblank early, because
> that's somewhat become api (but not all drivers do it that way, so
> it's about as hopeless as frontbuffer rendering in general).
>
> I still don't see the benefit of the first option, or the exact use case.
>

Still haven't slept so i don't know if i'm adding anything to the
discussion that Daniel hasn't stated already, but anyway:

I think i agree with Daniel on this. Having a vblank ts that truly
reflects the end of last vblank/start of scanout should make
time-based scheduling easier. Then we know when scanout of the active
frame starts and calculate from known fixed scanout duration when next
vblank starts and when it ends the latest. On X11 you can
glXGetSyncValuesOML() query that start time to sync up a compositors
update cycle or start of an animation or timed presentation sequence
with vblank. E.g., Waylands Weston does this when switching from idle
desktop to redraw loop. A movie player or my own software could use
this info in the same way, whereas i don't see the current vblank
timestams being useful. If you get the current vblank event with
timestamp at start of vblank ie. a bit into the short fixed duration
front porch, you'd have to render and flip within what small amount of
time is left unti the end of the fixed front-porch, because once you
get into variable extended front-porch your vblank timestamp is
already meaningless.

I also think if we want to implement something like the ability to
flip for a specific target scanout time -- which i'd really like to
have -- having the true end of last vblank gives us a baseline
timestamp that is easier to reason about inside the kernel, similar to
how a compositor can reason/adjust to when to start composing the next
frame or submitting a flip to hit some consistent framerate, or in
turn to better obey a presentation request of a client for a specific
target time.

At the same time we need to keep current behaviour for non-VRR mode,
with early vblanks, so all the target_msc vblank count based
scheduling keeps working. I don't think clients that are written to
use VRR should expect exactly the same timing behaviour wrt. when
vblank events get sent or timestamps updated. I certainly expect to
have separate code-paths in my app for optimal VRR vs. non-VRR
presentation scheduling, a compositor or movie player might adapt in
the same way, and fullscreen apps can opt-out of VRR if they want.

Clients just need to know what the behavior is in VRR mode. I wonder
what we should do for the kernel docs for Linux 5.0-rc? I assume we
won't have the time to fix this all before release, would it make
sense to have some kernel-docs updates via drm-fixes that declare the
vblank/pageflip timestamps as "undefined"/"not to be relied
on"/"subject to future change" for this release - essentially declare
vblank timestamps as unstable api for the moment, so nobody relies on
the current quite broken timestamp behaviour?

I do now have hacked up proof-of-concept patches here that after
torturing them for a few hours with my own tests now seem to be able
to calculate correct pageflip timestamps regardless which of the two
options we choose, but they need refinement/cleanup/rebasing onto
somethign else than Linux 5.0-rc5, and help from Nicholas, because i
tend to get lost in all the indirections and file hierarchy inside the
DisplayCore code etc., and probably also a few kinks worked out. And
see if shifting the vblank irq handling doesn't cause other weird
interactions? I haven't looked how Nicholas below-the-range/framerate
compensation works in detail and if it matter that it happens at start
of front-porch. I can only say that it doesn't seem to interfere with
correct timestamping if i run with framerates intentionally below
vrr-min, as far as my limited testing goes. Another thing that may be
problematic is disabling/reenabling vblank during idle times, given
that we can only reliably timestamp from within pageflip irq handler
or vblank irq ~ back-porch/active scanout. Maybe we need to
drm_crtc_vblank_get() hold a reference while VRR is active to simplify
things for the moment.

One thing one could merge is that other VRR friendly throttling patch
i sent, given that that works and is independent of the timestamp
stuff. I also have a backport for Linux 5.0-rc if that makes sense?
That VRR throttling also needs to get reworked if/once we wanted to
emit matching vblank/pflip timestamps, as that doesn't like the
shifted point inside vblank when vblank counter gets updated.

Basically i don't know when the deadlines for merging this kind of
work are/were wrt. Linux 5.0, 5.1, 5.2?

I'll look more into cleaning/testing those patches tomorrow after lots
of sleep and then see what Nicholas thinks of them.

One problem with the VRR stuff is that the hardware testing equipment,
that i'd usually use to verify if timestamps are really correct and
accurate, doesn't work with DisplayPort or HDMI signals, so i can only
test for self-consistency of timestamps atm., and look if plots of
timestamps look how i would expect them to look if stuff is mostly
correct.

-mario

> > But like you said, the only thing I can think of that should need the
> > event right away is low latency / framebuffer rendering.
> >
> > I can think at least one potential use case where you would want to do
> > front buffer rendering for low latency with VRR - emulation of old
> > console hardware. There's not really any modern display that has fixed
> > refresh modes that match the render frequency of some consoles. But this
> > use case isn't out there in the wild right now at least.
>
> Even on an emulator I'd expect it'd just do regular flips to the
> frontbuffer. Especially on mobile there's really no such thing as
> frontbuffer rendering, you always do a flip. The DIRTYFB ioctl (which
> you need to call instead of doing a page flip) simply does a pageflip
> behind your back for you. Which I think is worse than actually
> flipping.
>
> And for emulating exact refresh rates your then again go back to
> hrtimer + issuing the flip approach, but using your desired refresh
> rate to tune the hrtimer instead of the fixed one. But that needs a
> correct vblank timestamp for the current frame to work correctly (or
> you force compositor has to always flip, which defeats the power
> saving aspects of VRR allowing to stretch refresh rates when nothing
> changes).
>
> > Regarding compositors: I don't really recommend VRR for this use case in
> > general since you'll want a fixed refresh rate frequency for input to
> > feel consistent and to avoid any flickering / luminance changes. The
> > fastest refresh rate will be the most responsive to the user and you can
> > just used the fixed refresh rate mode in this case. Lowering the refresh
> > rate to save power sounds nice in theory but I'd imagine it'd feel
> > terrible in practice.
>
> Why is a compositor any different from a game? You aim to render at
> 60fps, or whatever the refresh rate is. If you're one frame late it's
> better to stretch the vblank slightly than a full on 1-frame freeze.
> And on mobile it's unfortunately fairly common that the system fails
> to clock up quickly enough when starting from an idle state, VRR
> sounds like really useful here.
>
> But like in a game if you don't at least aim to render at a consistent
> fps, and handle your input at a consistent fps, then the end result
> will indeed be terrible.
> -Daniel
>
> > Nicholas Kazlauskas
> >
> > >
> > >>>>> Now added complication is that amdgpu sends out vblank events really
> > >>>>> early, which is used by userspace to do frontbuffer rendering in the
> > >>>>> vblank time. But I don't think anyone wants to do both VRR and
> > >>>>
> > >>>> I think all kms drivers try to call drm_crtc_handle_vblank() at start
> > >>>> of vblank to give Mesa the most time for frontbuffer rendering for
> > >>>> classic X. But vblank events are also used for scheduling bufferswaps
> > >>>> or other stuff for redirected windowed rendering, or via api's like
> > >>>> OML_sync_controls glXWaitForMscOML, so there might be other things
> > >>>> affected by a more delayed vblank handling.
> > >>>
> > >>> The frontbuffer rendering is very much X driver specific, and I think
> > >>> -amdgpu/radeon is the only one that requires this. No i915 driver ever
> > >>> used the vblank interrupt to schedule frontbuffer blits, we use some
> > >>> CS-side stalls.
> > >>>
> > >>> Wrt scheduling pageflips: The rule is that right after you've received the
> > >>> vblank for frame X, then an immediately schedule pageflip should hit X+1,
> > >>> but not X. amdgpu had this broken, but it's fixed since a while.
> > >>>
> > >>>>> frontbuffer rendering, hence I think it should be possible to create
> > >>>>> correct vblank timestamps for the VRR case, while leaving the current
> > >>>>> logic in place for everything else. But that means moving the entire
> > >>>>> vblank processing to where you know the next frame's start time for
> > >>>>> VRR, both for page flips and for all other vblank events.
> > >>>>> -Daniel
> > >>>>>
> > >>>>
> > >>>> I think for amdgpu it would be doable by calling drm_handle_vblank
> > >>>> from the pageflip irq handler in VRR mode, and then additionally from
> > >>>> the vblank interrupt handler like now. Vblank irq's are triggered via
> > >>>> vline interrupts with programmable vertical trigger line position, so
> > >>>> we could program the trigger line to be start of back-porch instead of
> > >>>> start of front-porch. In the pageflip case we do vblank handling +
> > >>>> timestamping + pflip completion from pflip irq, and have the regular
> > >>>> back-porch vblank handling for refresh cycles in which no
> > >>>> pageflip/pflip irq happened.
> > >>>
> > >>> Hm, can't we simply program the vblank to occur in the back porch region
> > >>> for VRR? The vblank timestamping should automatically correct the
> > >>> timestamp in that case. Would require a lot less code changes.
> > >>>
> > >>> Plus update the documentation ofc.
> > >>>
> > >>>> Not sure if/how that would mess with below-vrr-refresh-rate-range
> > >>>> support for stutter reduction at low fps though? Or with performance,
> > >>>> given vblank event dispatch could be delayed a lot in VRR compared to
> > >>>> normal mode? Or with all the vblank accounting during modesets or crtc
> > >>>> en/disable?
> > >>
> > >> I think we have extra VLINEs that can be programmed at vpos 0 that could
> > >> resolve this but I'm still not really sold on the idea.
> > >>
> > >>>
> > >>> Currently VRR and explicit frame scheduling don't mix. We've discussed
> > >>> this, and consens seems to be that we need a new property for page_flip,
> > >>> similar to the target frame, but instead of frames a timestamp. Then the
> > >>> driver can hit a specific time for the next frame.
> > >>>
> > >>> And since compositors use the vblank ioctl just to do frame scheduling, I
> > >>> don't think we have to worry hugely about interactions there. Much better
> > >>> to aim for clean semantics (i.e. vblank and flip complete timestamps
> > >>> better match).
> > >>>
> > >>>> Also, could other drivers like Intel easily do this delayed vblank
> > >>>> processing in the non-pageflip case?
> > >>>
> > >>> Intel's problem :-) But yeah we have page flip timestamp registers, so
> > >>> worst case all we need to do is shuffle code around a bit. And we have
> > >>> interrupts for both start of vblank and and of vblank. It might be a bit
> > >>> more work because we use the vblank interrupt to avoid races in atomic
> > >>> updates, but that's not too terrible to fix.
> > >>>
> > >>>>   From my user perspective as a developer of a neuroscience research
> > >>>> application that has a couple of exciting/important use cases for VRR
> > >>>> mode, i absolutely do need trustworthy and precise pageflip event
> > >>>> timestamps that represent reality, otherwise VRR mode would be less
> > >>>> than useless for me. I can do without meaningful vblank timestamps in
> > >>>> VRR mode though, and expected to do without them, as any scheduling
> > >>>> has to happen time-based instead of based on vblank counts anyway,
> > >>>> given that vblank counts are quite meaningless if every frame can have
> > >>>> a different duration. I'd expect that situation to be similar for
> > >>>> other apps that would want to use timestamps in VRR mode like video
> > >>>> games, movie players or VR/AR applications.
> > >>>
> > >>> Yeah for that you want to schedule your frames with a target frame number.
> > >>> Would still be nice if vblank timestamps wouldn't be an outright lie (and
> > >>> the timestamp is still useful information, even if the frame counter
> > >>> isn't - many displays have min/max frame rates for VRR, which is
> > >>> information we could perhaps expose).
> > >>> -Daniel >
> > >>>>
> > >>>> -mario
> > >>>>
> > >>>>>>
> > >>>>>> Nicholas Kazlauskas
> > >>>>>>
> > >>>>>>>
> > >>>>>>>> ---
> > >>>>>>>>     drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> > >>>>>>>>     include/drm/drm_vblank.h     |  8 ++++++
> > >>>>>>>>     2 files changed, 56 insertions(+), 1 deletion(-)
> > >>>>>>>>
> > >>>>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> > >>>>>>>> index 98e091175921..4b3a4c38fabe 100644
> > >>>>>>>> --- a/drivers/gpu/drm/drm_vblank.c
> > >>>>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
> > >>>>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> > >>>>>>>>                u64 seq, ktime_t now)
> > >>>>>>>>     {
> > >>>>>>>>        struct timespec64 tv;
> > >>>>>>>> +    ktime_t alt_flip_time;
> > >>>>>>>>
> > >>>>>>>>        switch (e->event.base.type) {
> > >>>>>>>> -    case DRM_EVENT_VBLANK:
> > >>>>>>>>        case DRM_EVENT_FLIP_COMPLETE:
> > >>>>>>>> +            /*
> > >>>>>>>> +             * For flip completion events, override "now" time
> > >>>>>>>> +             * with alt_flip_time provided by the driver via
> > >>>>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> > >>>>>>>> +             * if that time is later than given "now" vblank time.
> > >>>>>>>> +             */
> > >>>>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> > >>>>>>>> +            if (alt_flip_time > now)
> > >>>>>>>> +                    now = alt_flip_time;
> > >>>>>>>> +            /* Fallthrough */
> > >>>>>>>> +    case DRM_EVENT_VBLANK:
> > >>>>>>>>                tv = ktime_to_timespec64(now);
> > >>>>>>>>                e->event.vbl.sequence = seq;
> > >>>>>>>>                /*
> > >>>>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > >>>>>>>>
> > >>>>>>>>                now = ktime_get();
> > >>>>>>>>        }
> > >>>>>>>> +
> > >>>>>>>>        e->pipe = pipe;
> > >>>>>>>>        send_vblank_event(dev, e, seq, now);
> > >>>>>>>>     }
> > >>>>>>>>     EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> > >>>>>>>>
> > >>>>>>>> +/**
> > >>>>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> > >>>>>>>> + * @crtc: the source CRTC of the pageflip completion event
> > >>>>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> > >>>>>>>> + *
> > >>>>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> > >>>>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
> > >>>>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
> > >>>>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
> > >>>>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
> > >>>>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
> > >>>>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> > >>>>>>>> + * set an alternate flip completion timestamp in case of late flip completions
> > >>>>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> > >>>>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> > >>>>>>>> + * completion routine, with some constant duration of the back porch interval
> > >>>>>>>> + * added, although more precise estimates may be possible on some hardware if
> > >>>>>>>> + * the hardware provides some means of timestamping the true end of vblank.
> > >>>>>>>> + *
> > >>>>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> > >>>>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
> > >>>>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
> > >>>>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
> > >>>>>>>> + * the new post-pageflip scanout buffer.
> > >>>>>>>> + */
> > >>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > >>>>>>>> +                                     ktime_t flip_time)
> > >>>>>>>> +{
> > >>>>>>>> +    struct drm_device *dev = crtc->dev;
> > >>>>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> > >>>>>>>> +
> > >>>>>>>> +    vblank->alt_flip_time = flip_time;
> > >>>>>>>> +}
> > >>>>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> > >>>>>>>> +
> > >>>>>>>>     static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> > >>>>>>>>     {
> > >>>>>>>>        if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> > >>>>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> > >>>>>>>> index 6ad9630d4f48..aacf44694ab6 100644
> > >>>>>>>> --- a/include/drm/drm_vblank.h
> > >>>>>>>> +++ b/include/drm/drm_vblank.h
> > >>>>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> > >>>>>>>>         * @time: Vblank timestamp corresponding to @count.
> > >>>>>>>>         */
> > >>>>>>>>        ktime_t time;
> > >>>>>>>> +    /**
> > >>>>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> > >>>>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> > >>>>>>>> +     * events will carry the later one of @time and @alt_flip_time.
> > >>>>>>>> +     */
> > >>>>>>>> +    ktime_t alt_flip_time;
> > >>>>>>>>
> > >>>>>>>>        /**
> > >>>>>>>>         * @refcount: Number of users/waiters of the vblank interrupt. Only when
> > >>>>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> > >>>>>>>>     u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> > >>>>>>>>     u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> > >>>>>>>>                                   ktime_t *vblanktime);
> > >>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > >>>>>>>> +                                     ktime_t flip_time);
> > >>>>>>>>     void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > >>>>>>>>                               struct drm_pending_vblank_event *e);
> > >>>>>>>>     void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> > >>>>>>>> --
> > >>>>>>>> 2.17.1
> > >>>>>>>>
> > >>>>>>>> _______________________________________________
> > >>>>>>>> dri-devel mailing listjj
> > >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >>>>>>>
> > >>>>>>
> > >>>>>
> > >>>>>
> > >>>>> --
> > >>>>> Daniel Vetter
> > >>>>> Software Engineer, Intel Corporation
> > >>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > >>>
> > >>
> > >> Nicholas Kazlauskas
> > >>
> > >
> > >
> >
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Kazlauskas, Nicholas Feb. 13, 2019, 6:34 p.m. UTC | #15
On 2/13/19 1:10 PM, Mario Kleiner wrote:
> On Wed, Feb 13, 2019 at 5:03 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> On Wed, Feb 13, 2019 at 4:46 PM Kazlauskas, Nicholas
>> <Nicholas.Kazlauskas@amd.com> wrote:
>>>
>>> On 2/13/19 10:14 AM, Daniel Vetter wrote:
>>>> On Wed, Feb 13, 2019 at 3:33 PM Kazlauskas, Nicholas
>>>> <Nicholas.Kazlauskas@amd.com> wrote:
>>>>>
>>>>> On 2/13/19 4:50 AM, Daniel Vetter wrote:
>>>>>> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
>>>>>>> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>>>
>>>>>>>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
>>>>>>>> <Nicholas.Kazlauskas@amd.com> wrote:
>>>>>>>>>
>>>>>>>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
>>>>>>>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
>>>>>>>>>>> The pageflip completion timestamps transmitted to userspace
>>>>>>>>>>> via pageflip completion events are supposed to describe the
>>>>>>>>>>> time at which the first pixel of the new post-pageflip scanout
>>>>>>>>>>> buffer leaves the video output of the gpu. This time is
>>>>>>>>>>> identical to end of vblank, when active scanout starts.
>>>>>>>>>>>
>>>>>>>>>>> For a crtc in standard fixed refresh rate, the end of vblank
>>>>>>>>>>> is identical to the vblank timestamps calculated by
>>>>>>>>>>> drm_update_vblank_count() at each vblank interrupt, or each
>>>>>>>>>>> vblank dis-/enable. Therefore pageflip events just carry
>>>>>>>>>>> that vblank timestamp as their pageflip timestamp.
>>>>>>>>>>>
>>>>>>>>>>> For a crtc switched to variable refresh rate mode (vrr), the
>>>>>>>>>>> pageflip completion timestamps are identical to the vblank
>>>>>>>>>>> timestamps iff the pageflip was executed early in vblank,
>>>>>>>>>>> before the minimum vblank duration elapsed. In this case
>>>>>>>>>>> the time of display onset is identical to when the crtc
>>>>>>>>>>> is running in fixed refresh rate.
>>>>>>>>>>>
>>>>>>>>>>> However, if a pageflip completes later in the vblank, inside
>>>>>>>>>>> the "extended front porch" in vrr mode, then the vblank will
>>>>>>>>>>> terminate at a fixed (back porch) duration after flip, so
>>>>>>>>>>> the display onset time is delayed correspondingly. In this
>>>>>>>>>>> case the vblank timestamp computed at vblank irq time would
>>>>>>>>>>> be too early, and we need a way to calculate an estimated
>>>>>>>>>>> pageflip timestamp that will be later than the vblank timestamp.
>>>>>>>>>>>
>>>>>>>>>>> How a driver determines such a "late flip" timestamp is hw
>>>>>>>>>>> and driver specific, but this patch adds a new helper function
>>>>>>>>>>> that allows the driver to propose such an alternate "late flip"
>>>>>>>>>>> timestamp for use in pageflip events:
>>>>>>>>>>>
>>>>>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>>>>>>>>>>
>>>>>>>>>>> When sending out pageflip events, we now compare that proposed
>>>>>>>>>>> flip_timestamp against the vblank timestamp of the current
>>>>>>>>>>> vblank of flip completion and choose to send out the greater/
>>>>>>>>>>> later timestamp as flip completion timestamp.
>>>>>>>>>>>
>>>>>>>>>>> The most simple way for a kms driver to supply a suitable
>>>>>>>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
>>>>>>>>>>> at start of the pageflip completion handler, e.g., pageflip
>>>>>>>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
>>>>>>>>>>> as proposed "late" alternative timestamp via ...
>>>>>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
>>>>>>>>>>>
>>>>>>>>>>> More clever approaches could try to add some corrective offset
>>>>>>>>>>> for fixed back porch duration, or ideally use hardware features
>>>>>>>>>>> like hw timestamps to calculate the exact end time of vblank.
>>>>>>>>>>>
>>>>>>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
>>>>>>>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
>>>>>>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
>>>>>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
>>>>>>>>>>
>>>>>>>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
>>>>>>>>>> the right timestampe, once? With this I guess if you do a vblank query in
>>>>>>>>>> between the wrong and the right vblank you'll get the bogus value. Not
>>>>>>>>>> really great for userspace.
>>>>>>>>>> -Daniel
>>>>>>>>>
>>>>>>>>> I think we calculate the timestamp and send the vblank event both within
>>>>>>>>> the pageflip IRQ handler so calculating the right pageflip timestamp
>>>>>>>>> once could probably be done. I'm not sure if it's easier than proposing
>>>>>>>>> a later flip time with an API like this though.
>>>>>>>>>
>>>>>>>>> The actual scanout time should be known from the page-flip handler so
>>>>>>>>> the semantics for VRR on/off remain the same. This is because the
>>>>>>>>> page-flip triggers entering the back porch if we're in the extended
>>>>>>>>> front porch.
>>>>>>>>>
>>>>>>>>> But scanout time from vblank events for something like
>>>>>>>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
>>>>>>>>> treated as estimates. If we're in the regular front porch then the
>>>>>>>>> timing to scanout is based on the fixed duration front porch for the
>>>>>>>>> current mode. If we're in the extended back porch then it's technically
>>>>>>>>> driver defined but the most reasonable guess is to assume that the front
>>>>>>>>> porch is going to end at any moment, so just return the length of the
>>>>>>>>> back porch for getting the scanout time.
>>>>>>>>>
>>>>>>>>> Proposing the late timestamp shouldn't affect vblank event in the
>>>>>>>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
>>>>>>>>> event case. I'm not sure if that's what's guaranteed to happen with this
>>>>>>>>> patch though. There doesn't seem to be any locking on either
>>>>>>>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
>>>>>>>>> the same vblank event back as the one just stored I don't think it's
>>>>>>>>> guaranteed.
>>>>>>>>
>>>>>>>> That's the inconsistency I mean to highlight - the timestamp for the
>>>>>>>> same frame as observed through flip complete and through the
>>>>>>>> wait_vblank ioctl can differ. Which they really shouldn't.
>>>>>>>>
>>>>>>>
>>>>>>> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
>>>>>>> that vblank and pageflip timestamps should always match. But then the
>>>>>>> kernel docs for "Variable refresh properties" in drm_connector.c for
>>>>>>> vblank timestamps were changed for the VRR implementation in Linux
>>>>>>> 5.0-rc to redefine them when in VRR mode. They are defined, but
>>>>>>> probably rather useless for any practical purpose, like this:
>>>>>>>
>>>>>>> "The semantics for the vertical blank timestamp differ when
>>>>>>> variable refresh rate is active. The vertical blank timestamp
>>>>>>> is defined to be an estimate using the current mode's fixed
>>>>>>> refresh rate timings. The semantics for the page-flip event
>>>>>>> timestamp remain the same."
>>>>>>
>>>>>> Uh I missed that. That sounds like nonsense tbh.
>>>>>>
>>>>>>> So our docs contradict each other as of Linux 5.0-rc. Certainly having
>>>>>>> useful vblank timetamps would be useful.
>>>>>>
>>>>>> Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
>>>>>> hilarity will ensue.
>>>>>
>>>>> I would imagine you would see more breakage by changing userspace
>>>>> expectations for when the event is actually received.
>>>>>
>>>>> The IOCTL is "wait for vblank", but if we start sending the event near
>>>>> the end of vblank it becomes more "wait for pageflip" (near scanout) or
>>>>> "wait 15ms while the VRR front porch times out and causes flickering".
>>>>>
>>>>> I don't speak for all userspace but it seems more useful to me to have
>>>>> the event sent at the start of vblank.
>>>>
>>>> Maybe we need to things. The problem is that the timestampe is defined
>>>> to be correct for start-of-frame. And right now the amdgpu
>>>> implementation of VRR doesn't even give an accurate timestampe for the
>>>> flip, so most likely no one yet thought about this properly.
>>>> Definitely no testcases (or it would have been caught), whereas we
>>>> have tons of testcases for fixed refresh.
>>>
>>> I agree that it would be good to have documentation for what waiting for
>>> vblank is supposed to mean here. There's a big difference between
>>> getting the event at the start of vsync vs the start of the back porch
>>> near the end of it.
>>>
>>>>
>>>>> VRR in its current state is useful for applications that aren't
>>>>> completely timing sensitive (like most games). This is because the
>>>>> driver has free reign in restricting the min/max bounds of the range -
>>>>> allowing for enhancements like low framerate compensation / BTR.
>>>>>
>>>>> For a concrete example, imagine you're not doing frontbuffer rendering
>>>>> but you're still trying to target a specific scanout time. So even if we
>>>>> were to send the vblank event at vpos 0 you'd get an accurate scanout
>>>>> timestamp for the next scanout after the vback porch ends. You're not
>>>>> going to be doing any rendering or preparation in this time because it's
>>>>> simply too short. So you'd target the scanout period after this one,
>>>>> which you'd need to calculate somehow in userspace. Taking the deltas
>>>>> between two vblank events is completely useless here because you'll get
>>>>> back the vrr min refresh delta. So you can take your source content rate
>>>>> or some fixed rate you'd like to target within the range but even then
>>>>> you're still not guaranteed to have that flip end up at that scanout time.
>>>>>
>>>>> The current solution is to just return the scanout pos at vrr max, or
>>>>> the current mode's fixed refresh rate. If you're flip before the fixed
>>>>> duration front porch ends then that scanout time will likely be accurate
>>>>> since you can't flip faster than vrr max.
>>>>>
>>>>> Is there actually userspace that cares that these timestamps need to
>>>>> match? Like said below, it seems that wait for vblank is mostly about
>>>>> scheduling and for that I think having the event sent back at the start
>>>>> of vblank is more important here.
>>>>
>>>> There's 3 cases:
>>>> - DRI3: Waits for vblank for frontbuffer blits. If you combine that
>>>> with VRR then imo you deserve to keep all the pieces.
>>>> - generic compositors. Use VRR for scheduling, expect that a page_flip
>>>> queued immediately after a page_flip will hit the next frame, and not
>>>> the current frame. They also expect that this will not give you the
>>>> slowest possible refresh rate, but I guess VRR plus frame scheduling
>>>> is currently out of scope. It's kinda just for as-fast-as-possible
>>>> games.
>>>> - flip timestamp. Currently the entire codebase assumes that vblank ts
>>>> = flip ts. I don't really see a point in breaking that.
>>>> - better scheduling of VRR: what compositors actually do isn't wait
>>>> for the vblank, that only latches the page_flip. They set up a
>>>> hrtimer, with a fixed/auto-tuned head-start, from which they start
>>>> their rendering. The same could be done for VRR, if we expose the
>>>> earliest/latest vblank end times (and how much they can shift). vblank
>>>> ioctl is only used for syncing that up and making sure we're hitting
>>>> the right frame. This is all for pageflipping compositors. This is
>>>> also what they'll probably keep doing when we give them a desired flip
>>>> time (instead of a target frame), they still need an hrtimer to fire a
>>>> little bit ahead to get the frame rendered.
>>>>
>>>> So taking all together, we have 2 use-cases I think:
>>>> - frontbuffer rendering. Kinda wants vblank event to fire at the
>>>> beginning of vblank. Won't ever do VRR
>>>> - pageflipping compositors. Don't care about the vblank event, either
>>>> they have an hrtimer timer (which gets synced with the vblank ioctl),
>>>> or just flip as fast as possible. They don't care one bit when the
>>>> vblank event happens, as long as a pageflip scheduled immediately
>>>> after the event shows up only hits the next frame (which is the delay
>>>> code that currently stretches vblank unecessarily with VRR if you're
>>>> already a bit late).
>>>>
>>>> Making sure that vblank and page flip timestamp match the real
>>>> userspace seems like the cleanest semantics, and the least surprising
>>>> to userspace. E.g. what happens if you implement VRR in a compositor,
>>>> but then tune your hrtimer within the vblank, when the vblank
>>>> timestamp is totally not reflecting your real pageflip? I just don't
>>>> see what's the upside of having a vblank timestamp that's not actually
>>>> reflecting the real flip.
>>>>
>>>> Also, if your app wants the start of vblank, that's very easy to
>>>> compute: Just add the scanout time to the last vblank timestamp.
>>>> Figuring out the start of frame otoh isn't doable if you can't query,
>>>> and constantly pageflipping just to have an accurate start-of-frame
>>>> (for tuning your hrtimer) seems rather wasteful - it outright defeats
>>>> some of the use-cases for VRR.
>>>>
>>>> So what benefit do you see in having vblank ts and pageflip ts not
>>>> agree? What's the use-case this solves - aside from "less typing" :-)
>>>> If the only use-case is VRR for frontbuffer rendering, then that
>>>> doesn't even work right now (aside from that I think it's something we
>>>> shouldn't support and definitely not encourage).
>>>> -Daniel
>>>
>>> I'm actually fine with moving the vblank event back to the back porch
>>> but I'd just be worried about userspace breakages that relied on it
>>> happening the other way.
>>>
>>> It'd be especially confusing if the VLINE was different for fixed vs
>>> variable refresh. I wouldn't want to move it to vpos 0 unless the event
>>> is also getting moved for fixed refresh as well.
>>
>> That would break DRI3. So we can pick between two options:
>> - always send out the vblank early, accept that vblank timestamps change.
>> - for VRR only, send out the vblank only when we know the correct
>> timestamp. For non-VRR we have to send out the vblank early, because
>> that's somewhat become api (but not all drivers do it that way, so
>> it's about as hopeless as frontbuffer rendering in general).
>>
>> I still don't see the benefit of the first option, or the exact use case.
>>
> 
> Still haven't slept so i don't know if i'm adding anything to the
> discussion that Daniel hasn't stated already, but anyway:
> 
> I think i agree with Daniel on this. Having a vblank ts that truly
> reflects the end of last vblank/start of scanout should make
> time-based scheduling easier. Then we know when scanout of the active
> frame starts and calculate from known fixed scanout duration when next
> vblank starts and when it ends the latest. On X11 you can
> glXGetSyncValuesOML() query that start time to sync up a compositors
> update cycle or start of an animation or timed presentation sequence
> with vblank. E.g., Waylands Weston does this when switching from idle
> desktop to redraw loop. A movie player or my own software could use
> this info in the same way, whereas i don't see the current vblank
> timestams being useful. If you get the current vblank event with
> timestamp at start of vblank ie. a bit into the short fixed duration
> front porch, you'd have to render and flip within what small amount of
> time is left unti the end of the fixed front-porch, because once you
> get into variable extended front-porch your vblank timestamp is
> already meaningless.

Yeah, the timestamp is more useful for preparing animation/content/etc 
for the scanout after next. I agree that it should be the same, but 
there are the challenges in not breaking userspace or the driver.

Wait for vblank is really poorly defined as is. Defining it explicitly 
for VRR as the start of back porch is fine as long as it is documented 
somewhere. Implementation wise it's going to be a pain since we'd need 
separate interrupts for vpos=vfront and vpos=0 for proper BTR support, 
but I think all the ASICs are capable of this.

This is simple enough to enforce in IGT testing as well, wait for 
vblank, grab the timestamp, then do the flip targeting something within 
the vrr range and grab the timestamp and see if the values are within 
some sane threshold.

> 
> I also think if we want to implement something like the ability to
> flip for a specific target scanout time -- which i'd really like to
> have -- having the true end of last vblank gives us a baseline
> timestamp that is easier to reason about inside the kernel, similar to
> how a compositor can reason/adjust to when to start composing the next
> frame or submitting a flip to hit some consistent framerate, or in
> turn to better obey a presentation request of a client for a specific
> target time.
Yes, the target presentation timestamp is ideal here. It probably should 
have been part of the original proposal but there wasn't anything in 
userspace that would have been using it. It would have been nice for IGT 
testing though.

> 
> At the same time we need to keep current behaviour for non-VRR mode,
> with early vblanks, so all the target_msc vblank count based
> scheduling keeps working. I don't think clients that are written to
> use VRR should expect exactly the same timing behaviour wrt. when
> vblank events get sent or timestamps updated. I certainly expect to
> have separate code-paths in my app for optimal VRR vs. non-VRR
> presentation scheduling, a compositor or movie player might adapt in
> the same way, and fullscreen apps can opt-out of VRR if they want.

FWIW most compositors/movie players/web browsers are blocked from using 
VRR right not in a blacklist in mesa. So maybe we don't have to worry 
about this yet at least.

> 
> Clients just need to know what the behavior is in VRR mode. I wonder
> what we should do for the kernel docs for Linux 5.0-rc? I assume we
> won't have the time to fix this all before release, would it make
> sense to have some kernel-docs updates via drm-fixes that declare the
> vblank/pageflip timestamps as "undefined"/"not to be relied
> on"/"subject to future change" for this release - essentially declare
> vblank timestamps as unstable api for the moment, so nobody relies on
> the current quite broken timestamp behaviour?
> 
> I do now have hacked up proof-of-concept patches here that after
> torturing them for a few hours with my own tests now seem to be able
> to calculate correct pageflip timestamps regardless which of the two
> options we choose, but they need refinement/cleanup/rebasing onto
> somethign else than Linux 5.0-rc5, and help from Nicholas, because i
> tend to get lost in all the indirections and file hierarchy inside the
> DisplayCore code etc., and probably also a few kinks worked out. And
> see if shifting the vblank irq handling doesn't cause other weird
> interactions? I haven't looked how Nicholas below-the-range/framerate
> compensation works in detail and if it matter that it happens at start
> of front-porch. I can only say that it doesn't seem to interfere with
> correct timestamping if i run with framerates intentionally below
> vrr-min, as far as my limited testing goes. Another thing that may be
> problematic is disabling/reenabling vblank during idle times, given
> that we can only reliably timestamp from within pageflip irq handler
> or vblank irq ~ back-porch/active scanout. Maybe we need to
> drm_crtc_vblank_get() hold a reference while VRR is active to simplify
> things for the moment.

BTR support has the bit in the vblank IRQ handler that adjusts the 
min/max duration for the display and if we're doing that in the back 
porch it's already too late.

Another vline interrupt could be installed here and only enabled when 
VRR goes on. If VRR is on then that should be the source for updating 
the vblank count. This would probably have some sort of messy 
interaction with the amdgpu kms counter bit I mentioned in another email 
though, but I haven't really looked into this in depth yet.

I think just calling the wait for vblank IOCTL will grab the reference 
to the vblank and it should enable the interrupts. I think holding an 
extra reference could be a simple solution if that's a problem though.

> 
> One thing one could merge is that other VRR friendly throttling patch
> i sent, given that that works and is independent of the timestamp
> stuff. I also have a backport for Linux 5.0-rc if that makes sense?
> That VRR throttling also needs to get reworked if/once we wanted to
> emit matching vblank/pflip timestamps, as that doesn't like the
> shifted point inside vblank when vblank counter gets updated.
> 
> Basically i don't know when the deadlines for merging this kind of
> work are/were wrt. Linux 5.0, 5.1, 5.2?
> 
> I'll look more into cleaning/testing those patches tomorrow after lots
> of sleep and then see what Nicholas thinks of them.
> 
> One problem with the VRR stuff is that the hardware testing equipment,
> that i'd usually use to verify if timestamps are really correct and
> accurate, doesn't work with DisplayPort or HDMI signals, so i can only
> test for self-consistency of timestamps atm., and look if plots of
> timestamps look how i would expect them to look if stuff is mostly
> correct.
> 
> -mario

The throttling patch is probably fine to merge. There are a few emails 
on that chain that I need to read through, however.

Thanks for looking into this.

Nicholas Kazlauskas

> 
>>> But like you said, the only thing I can think of that should need the
>>> event right away is low latency / framebuffer rendering.
>>>
>>> I can think at least one potential use case where you would want to do
>>> front buffer rendering for low latency with VRR - emulation of old
>>> console hardware. There's not really any modern display that has fixed
>>> refresh modes that match the render frequency of some consoles. But this
>>> use case isn't out there in the wild right now at least.
>>
>> Even on an emulator I'd expect it'd just do regular flips to the
>> frontbuffer. Especially on mobile there's really no such thing as
>> frontbuffer rendering, you always do a flip. The DIRTYFB ioctl (which
>> you need to call instead of doing a page flip) simply does a pageflip
>> behind your back for you. Which I think is worse than actually
>> flipping.
>>
>> And for emulating exact refresh rates your then again go back to
>> hrtimer + issuing the flip approach, but using your desired refresh
>> rate to tune the hrtimer instead of the fixed one. But that needs a
>> correct vblank timestamp for the current frame to work correctly (or
>> you force compositor has to always flip, which defeats the power
>> saving aspects of VRR allowing to stretch refresh rates when nothing
>> changes).
>>
>>> Regarding compositors: I don't really recommend VRR for this use case in
>>> general since you'll want a fixed refresh rate frequency for input to
>>> feel consistent and to avoid any flickering / luminance changes. The
>>> fastest refresh rate will be the most responsive to the user and you can
>>> just used the fixed refresh rate mode in this case. Lowering the refresh
>>> rate to save power sounds nice in theory but I'd imagine it'd feel
>>> terrible in practice.
>>
>> Why is a compositor any different from a game? You aim to render at
>> 60fps, or whatever the refresh rate is. If you're one frame late it's
>> better to stretch the vblank slightly than a full on 1-frame freeze.
>> And on mobile it's unfortunately fairly common that the system fails
>> to clock up quickly enough when starting from an idle state, VRR
>> sounds like really useful here.
>>
>> But like in a game if you don't at least aim to render at a consistent
>> fps, and handle your input at a consistent fps, then the end result
>> will indeed be terrible.
>> -Daniel
>>
>>> Nicholas Kazlauskas
>>>
>>>>
>>>>>>>> Now added complication is that amdgpu sends out vblank events really
>>>>>>>> early, which is used by userspace to do frontbuffer rendering in the
>>>>>>>> vblank time. But I don't think anyone wants to do both VRR and
>>>>>>>
>>>>>>> I think all kms drivers try to call drm_crtc_handle_vblank() at start
>>>>>>> of vblank to give Mesa the most time for frontbuffer rendering for
>>>>>>> classic X. But vblank events are also used for scheduling bufferswaps
>>>>>>> or other stuff for redirected windowed rendering, or via api's like
>>>>>>> OML_sync_controls glXWaitForMscOML, so there might be other things
>>>>>>> affected by a more delayed vblank handling.
>>>>>>
>>>>>> The frontbuffer rendering is very much X driver specific, and I think
>>>>>> -amdgpu/radeon is the only one that requires this. No i915 driver ever
>>>>>> used the vblank interrupt to schedule frontbuffer blits, we use some
>>>>>> CS-side stalls.
>>>>>>
>>>>>> Wrt scheduling pageflips: The rule is that right after you've received the
>>>>>> vblank for frame X, then an immediately schedule pageflip should hit X+1,
>>>>>> but not X. amdgpu had this broken, but it's fixed since a while.
>>>>>>
>>>>>>>> frontbuffer rendering, hence I think it should be possible to create
>>>>>>>> correct vblank timestamps for the VRR case, while leaving the current
>>>>>>>> logic in place for everything else. But that means moving the entire
>>>>>>>> vblank processing to where you know the next frame's start time for
>>>>>>>> VRR, both for page flips and for all other vblank events.
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>
>>>>>>> I think for amdgpu it would be doable by calling drm_handle_vblank
>>>>>>> from the pageflip irq handler in VRR mode, and then additionally from
>>>>>>> the vblank interrupt handler like now. Vblank irq's are triggered via
>>>>>>> vline interrupts with programmable vertical trigger line position, so
>>>>>>> we could program the trigger line to be start of back-porch instead of
>>>>>>> start of front-porch. In the pageflip case we do vblank handling +
>>>>>>> timestamping + pflip completion from pflip irq, and have the regular
>>>>>>> back-porch vblank handling for refresh cycles in which no
>>>>>>> pageflip/pflip irq happened.
>>>>>>
>>>>>> Hm, can't we simply program the vblank to occur in the back porch region
>>>>>> for VRR? The vblank timestamping should automatically correct the
>>>>>> timestamp in that case. Would require a lot less code changes.
>>>>>>
>>>>>> Plus update the documentation ofc.
>>>>>>
>>>>>>> Not sure if/how that would mess with below-vrr-refresh-rate-range
>>>>>>> support for stutter reduction at low fps though? Or with performance,
>>>>>>> given vblank event dispatch could be delayed a lot in VRR compared to
>>>>>>> normal mode? Or with all the vblank accounting during modesets or crtc
>>>>>>> en/disable?
>>>>>
>>>>> I think we have extra VLINEs that can be programmed at vpos 0 that could
>>>>> resolve this but I'm still not really sold on the idea.
>>>>>
>>>>>>
>>>>>> Currently VRR and explicit frame scheduling don't mix. We've discussed
>>>>>> this, and consens seems to be that we need a new property for page_flip,
>>>>>> similar to the target frame, but instead of frames a timestamp. Then the
>>>>>> driver can hit a specific time for the next frame.
>>>>>>
>>>>>> And since compositors use the vblank ioctl just to do frame scheduling, I
>>>>>> don't think we have to worry hugely about interactions there. Much better
>>>>>> to aim for clean semantics (i.e. vblank and flip complete timestamps
>>>>>> better match).
>>>>>>
>>>>>>> Also, could other drivers like Intel easily do this delayed vblank
>>>>>>> processing in the non-pageflip case?
>>>>>>
>>>>>> Intel's problem :-) But yeah we have page flip timestamp registers, so
>>>>>> worst case all we need to do is shuffle code around a bit. And we have
>>>>>> interrupts for both start of vblank and and of vblank. It might be a bit
>>>>>> more work because we use the vblank interrupt to avoid races in atomic
>>>>>> updates, but that's not too terrible to fix.
>>>>>>
>>>>>>>    From my user perspective as a developer of a neuroscience research
>>>>>>> application that has a couple of exciting/important use cases for VRR
>>>>>>> mode, i absolutely do need trustworthy and precise pageflip event
>>>>>>> timestamps that represent reality, otherwise VRR mode would be less
>>>>>>> than useless for me. I can do without meaningful vblank timestamps in
>>>>>>> VRR mode though, and expected to do without them, as any scheduling
>>>>>>> has to happen time-based instead of based on vblank counts anyway,
>>>>>>> given that vblank counts are quite meaningless if every frame can have
>>>>>>> a different duration. I'd expect that situation to be similar for
>>>>>>> other apps that would want to use timestamps in VRR mode like video
>>>>>>> games, movie players or VR/AR applications.
>>>>>>
>>>>>> Yeah for that you want to schedule your frames with a target frame number.
>>>>>> Would still be nice if vblank timestamps wouldn't be an outright lie (and
>>>>>> the timestamp is still useful information, even if the frame counter
>>>>>> isn't - many displays have min/max frame rates for VRR, which is
>>>>>> information we could perhaps expose).
>>>>>> -Daniel >
>>>>>>>
>>>>>>> -mario
>>>>>>>
>>>>>>>>>
>>>>>>>>> Nicholas Kazlauskas
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> ---
>>>>>>>>>>>      drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
>>>>>>>>>>>      include/drm/drm_vblank.h     |  8 ++++++
>>>>>>>>>>>      2 files changed, 56 insertions(+), 1 deletion(-)
>>>>>>>>>>>
>>>>>>>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
>>>>>>>>>>> index 98e091175921..4b3a4c38fabe 100644
>>>>>>>>>>> --- a/drivers/gpu/drm/drm_vblank.c
>>>>>>>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
>>>>>>>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
>>>>>>>>>>>                 u64 seq, ktime_t now)
>>>>>>>>>>>      {
>>>>>>>>>>>         struct timespec64 tv;
>>>>>>>>>>> +    ktime_t alt_flip_time;
>>>>>>>>>>>
>>>>>>>>>>>         switch (e->event.base.type) {
>>>>>>>>>>> -    case DRM_EVENT_VBLANK:
>>>>>>>>>>>         case DRM_EVENT_FLIP_COMPLETE:
>>>>>>>>>>> +            /*
>>>>>>>>>>> +             * For flip completion events, override "now" time
>>>>>>>>>>> +             * with alt_flip_time provided by the driver via
>>>>>>>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
>>>>>>>>>>> +             * if that time is later than given "now" vblank time.
>>>>>>>>>>> +             */
>>>>>>>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
>>>>>>>>>>> +            if (alt_flip_time > now)
>>>>>>>>>>> +                    now = alt_flip_time;
>>>>>>>>>>> +            /* Fallthrough */
>>>>>>>>>>> +    case DRM_EVENT_VBLANK:
>>>>>>>>>>>                 tv = ktime_to_timespec64(now);
>>>>>>>>>>>                 e->event.vbl.sequence = seq;
>>>>>>>>>>>                 /*
>>>>>>>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>>>>>>>>>>
>>>>>>>>>>>                 now = ktime_get();
>>>>>>>>>>>         }
>>>>>>>>>>> +
>>>>>>>>>>>         e->pipe = pipe;
>>>>>>>>>>>         send_vblank_event(dev, e, seq, now);
>>>>>>>>>>>      }
>>>>>>>>>>>      EXPORT_SYMBOL(drm_crtc_send_vblank_event);
>>>>>>>>>>>
>>>>>>>>>>> +/**
>>>>>>>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
>>>>>>>>>>> + * @crtc: the source CRTC of the pageflip completion event
>>>>>>>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
>>>>>>>>>>> + *
>>>>>>>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
>>>>>>>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
>>>>>>>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
>>>>>>>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
>>>>>>>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
>>>>>>>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
>>>>>>>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
>>>>>>>>>>> + * set an alternate flip completion timestamp in case of late flip completions
>>>>>>>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
>>>>>>>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
>>>>>>>>>>> + * completion routine, with some constant duration of the back porch interval
>>>>>>>>>>> + * added, although more precise estimates may be possible on some hardware if
>>>>>>>>>>> + * the hardware provides some means of timestamping the true end of vblank.
>>>>>>>>>>> + *
>>>>>>>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
>>>>>>>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
>>>>>>>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
>>>>>>>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
>>>>>>>>>>> + * the new post-pageflip scanout buffer.
>>>>>>>>>>> + */
>>>>>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>>>>>>>>>>> +                                     ktime_t flip_time)
>>>>>>>>>>> +{
>>>>>>>>>>> +    struct drm_device *dev = crtc->dev;
>>>>>>>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
>>>>>>>>>>> +
>>>>>>>>>>> +    vblank->alt_flip_time = flip_time;
>>>>>>>>>>> +}
>>>>>>>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
>>>>>>>>>>> +
>>>>>>>>>>>      static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
>>>>>>>>>>>      {
>>>>>>>>>>>         if (drm_core_check_feature(dev, DRIVER_MODESET)) {
>>>>>>>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
>>>>>>>>>>> index 6ad9630d4f48..aacf44694ab6 100644
>>>>>>>>>>> --- a/include/drm/drm_vblank.h
>>>>>>>>>>> +++ b/include/drm/drm_vblank.h
>>>>>>>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
>>>>>>>>>>>          * @time: Vblank timestamp corresponding to @count.
>>>>>>>>>>>          */
>>>>>>>>>>>         ktime_t time;
>>>>>>>>>>> +    /**
>>>>>>>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
>>>>>>>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
>>>>>>>>>>> +     * events will carry the later one of @time and @alt_flip_time.
>>>>>>>>>>> +     */
>>>>>>>>>>> +    ktime_t alt_flip_time;
>>>>>>>>>>>
>>>>>>>>>>>         /**
>>>>>>>>>>>          * @refcount: Number of users/waiters of the vblank interrupt. Only when
>>>>>>>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
>>>>>>>>>>>      u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
>>>>>>>>>>>      u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
>>>>>>>>>>>                                    ktime_t *vblanktime);
>>>>>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
>>>>>>>>>>> +                                     ktime_t flip_time);
>>>>>>>>>>>      void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
>>>>>>>>>>>                                struct drm_pending_vblank_event *e);
>>>>>>>>>>>      void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
>>>>>>>>>>> --
>>>>>>>>>>> 2.17.1
>>>>>>>>>>>
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> dri-devel mailing listjj
>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Daniel Vetter
>>>>>>>> Software Engineer, Intel Corporation
>>>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>>>>>>
>>>>>
>>>>> Nicholas Kazlauskas
>>>>>
>>>>
>>>>
>>>
>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Feb. 13, 2019, 8:53 p.m. UTC | #16
On Wed, Feb 13, 2019 at 07:10:00PM +0100, Mario Kleiner wrote:
> On Wed, Feb 13, 2019 at 5:03 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Wed, Feb 13, 2019 at 4:46 PM Kazlauskas, Nicholas
> > <Nicholas.Kazlauskas@amd.com> wrote:
> > >
> > > On 2/13/19 10:14 AM, Daniel Vetter wrote:
> > > > On Wed, Feb 13, 2019 at 3:33 PM Kazlauskas, Nicholas
> > > > <Nicholas.Kazlauskas@amd.com> wrote:
> > > >>
> > > >> On 2/13/19 4:50 AM, Daniel Vetter wrote:
> > > >>> On Tue, Feb 12, 2019 at 10:32:31PM +0100, Mario Kleiner wrote:
> > > >>>> On Mon, Feb 11, 2019 at 6:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > > >>>>>
> > > >>>>> On Mon, Feb 11, 2019 at 4:01 PM Kazlauskas, Nicholas
> > > >>>>> <Nicholas.Kazlauskas@amd.com> wrote:
> > > >>>>>>
> > > >>>>>> On 2/11/19 3:35 AM, Daniel Vetter wrote:
> > > >>>>>>> On Mon, Feb 11, 2019 at 04:22:24AM +0100, Mario Kleiner wrote:
> > > >>>>>>>> The pageflip completion timestamps transmitted to userspace
> > > >>>>>>>> via pageflip completion events are supposed to describe the
> > > >>>>>>>> time at which the first pixel of the new post-pageflip scanout
> > > >>>>>>>> buffer leaves the video output of the gpu. This time is
> > > >>>>>>>> identical to end of vblank, when active scanout starts.
> > > >>>>>>>>
> > > >>>>>>>> For a crtc in standard fixed refresh rate, the end of vblank
> > > >>>>>>>> is identical to the vblank timestamps calculated by
> > > >>>>>>>> drm_update_vblank_count() at each vblank interrupt, or each
> > > >>>>>>>> vblank dis-/enable. Therefore pageflip events just carry
> > > >>>>>>>> that vblank timestamp as their pageflip timestamp.
> > > >>>>>>>>
> > > >>>>>>>> For a crtc switched to variable refresh rate mode (vrr), the
> > > >>>>>>>> pageflip completion timestamps are identical to the vblank
> > > >>>>>>>> timestamps iff the pageflip was executed early in vblank,
> > > >>>>>>>> before the minimum vblank duration elapsed. In this case
> > > >>>>>>>> the time of display onset is identical to when the crtc
> > > >>>>>>>> is running in fixed refresh rate.
> > > >>>>>>>>
> > > >>>>>>>> However, if a pageflip completes later in the vblank, inside
> > > >>>>>>>> the "extended front porch" in vrr mode, then the vblank will
> > > >>>>>>>> terminate at a fixed (back porch) duration after flip, so
> > > >>>>>>>> the display onset time is delayed correspondingly. In this
> > > >>>>>>>> case the vblank timestamp computed at vblank irq time would
> > > >>>>>>>> be too early, and we need a way to calculate an estimated
> > > >>>>>>>> pageflip timestamp that will be later than the vblank timestamp.
> > > >>>>>>>>
> > > >>>>>>>> How a driver determines such a "late flip" timestamp is hw
> > > >>>>>>>> and driver specific, but this patch adds a new helper function
> > > >>>>>>>> that allows the driver to propose such an alternate "late flip"
> > > >>>>>>>> timestamp for use in pageflip events:
> > > >>>>>>>>
> > > >>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > > >>>>>>>>
> > > >>>>>>>> When sending out pageflip events, we now compare that proposed
> > > >>>>>>>> flip_timestamp against the vblank timestamp of the current
> > > >>>>>>>> vblank of flip completion and choose to send out the greater/
> > > >>>>>>>> later timestamp as flip completion timestamp.
> > > >>>>>>>>
> > > >>>>>>>> The most simple way for a kms driver to supply a suitable
> > > >>>>>>>> flip_timestamp in vrr mode would be to simply take a timestamp
> > > >>>>>>>> at start of the pageflip completion handler, e.g., pageflip
> > > >>>>>>>> irq handler: flip_timestamp = ktime_get(); and then set that
> > > >>>>>>>> as proposed "late" alternative timestamp via ...
> > > >>>>>>>> drm_crtc_set_vrr_pageflip_timestamp(crtc, flip_timestamp);
> > > >>>>>>>>
> > > >>>>>>>> More clever approaches could try to add some corrective offset
> > > >>>>>>>> for fixed back porch duration, or ideally use hardware features
> > > >>>>>>>> like hw timestamps to calculate the exact end time of vblank.
> > > >>>>>>>>
> > > >>>>>>>> Signed-off-by: Mario Kleiner <mario.kleiner.de@gmail.com>
> > > >>>>>>>> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com>
> > > >>>>>>>> Cc: Harry Wentland <harry.wentland@amd.com>
> > > >>>>>>>> Cc: Alex Deucher <alexander.deucher@amd.com>
> > > >>>>>>>
> > > >>>>>>> Uh, this looks like a pretty bad hack. Can't we fix amdgpu to only give us
> > > >>>>>>> the right timestampe, once? With this I guess if you do a vblank query in
> > > >>>>>>> between the wrong and the right vblank you'll get the bogus value. Not
> > > >>>>>>> really great for userspace.
> > > >>>>>>> -Daniel
> > > >>>>>>
> > > >>>>>> I think we calculate the timestamp and send the vblank event both within
> > > >>>>>> the pageflip IRQ handler so calculating the right pageflip timestamp
> > > >>>>>> once could probably be done. I'm not sure if it's easier than proposing
> > > >>>>>> a later flip time with an API like this though.
> > > >>>>>>
> > > >>>>>> The actual scanout time should be known from the page-flip handler so
> > > >>>>>> the semantics for VRR on/off remain the same. This is because the
> > > >>>>>> page-flip triggers entering the back porch if we're in the extended
> > > >>>>>> front porch.
> > > >>>>>>
> > > >>>>>> But scanout time from vblank events for something like
> > > >>>>>> DRM_IOCTL_WAIT_VBLANK are going to be wrong in most cases and are only
> > > >>>>>> treated as estimates. If we're in the regular front porch then the
> > > >>>>>> timing to scanout is based on the fixed duration front porch for the
> > > >>>>>> current mode. If we're in the extended back porch then it's technically
> > > >>>>>> driver defined but the most reasonable guess is to assume that the front
> > > >>>>>> porch is going to end at any moment, so just return the length of the
> > > >>>>>> back porch for getting the scanout time.
> > > >>>>>>
> > > >>>>>> Proposing the late timestamp shouldn't affect vblank event in the
> > > >>>>>> DRM_IOCTL_WAIT_VBLANK case and should only be used in the page-flip
> > > >>>>>> event case. I'm not sure if that's what's guaranteed to happen with this
> > > >>>>>> patch though. There doesn't seem to be any locking on either
> > > >>>>>> dev->vblank_time_lock or the vblank->seqlock so while it's likely to get
> > > >>>>>> the same vblank event back as the one just stored I don't think it's
> > > >>>>>> guaranteed.
> > > >>>>>
> > > >>>>> That's the inconsistency I mean to highlight - the timestamp for the
> > > >>>>> same frame as observed through flip complete and through the
> > > >>>>> wait_vblank ioctl can differ. Which they really shouldn't.
> > > >>>>>
> > > >>>>
> > > >>>> Ideally they shouldn't differ. The kernel docs for drm_crtc_state say
> > > >>>> that vblank and pageflip timestamps should always match. But then the
> > > >>>> kernel docs for "Variable refresh properties" in drm_connector.c for
> > > >>>> vblank timestamps were changed for the VRR implementation in Linux
> > > >>>> 5.0-rc to redefine them when in VRR mode. They are defined, but
> > > >>>> probably rather useless for any practical purpose, like this:
> > > >>>>
> > > >>>> "The semantics for the vertical blank timestamp differ when
> > > >>>> variable refresh rate is active. The vertical blank timestamp
> > > >>>> is defined to be an estimate using the current mode's fixed
> > > >>>> refresh rate timings. The semantics for the page-flip event
> > > >>>> timestamp remain the same."
> > > >>>
> > > >>> Uh I missed that. That sounds like nonsense tbh.
> > > >>>
> > > >>>> So our docs contradict each other as of Linux 5.0-rc. Certainly having
> > > >>>> useful vblank timetamps would be useful.
> > > >>>
> > > >>> Yup, imo vblank should still match page_flip. Otherwise I expect a lot of
> > > >>> hilarity will ensue.
> > > >>
> > > >> I would imagine you would see more breakage by changing userspace
> > > >> expectations for when the event is actually received.
> > > >>
> > > >> The IOCTL is "wait for vblank", but if we start sending the event near
> > > >> the end of vblank it becomes more "wait for pageflip" (near scanout) or
> > > >> "wait 15ms while the VRR front porch times out and causes flickering".
> > > >>
> > > >> I don't speak for all userspace but it seems more useful to me to have
> > > >> the event sent at the start of vblank.
> > > >
> > > > Maybe we need to things. The problem is that the timestampe is defined
> > > > to be correct for start-of-frame. And right now the amdgpu
> > > > implementation of VRR doesn't even give an accurate timestampe for the
> > > > flip, so most likely no one yet thought about this properly.
> > > > Definitely no testcases (or it would have been caught), whereas we
> > > > have tons of testcases for fixed refresh.
> > >
> > > I agree that it would be good to have documentation for what waiting for
> > > vblank is supposed to mean here. There's a big difference between
> > > getting the event at the start of vsync vs the start of the back porch
> > > near the end of it.
> > >
> > > >
> > > >> VRR in its current state is useful for applications that aren't
> > > >> completely timing sensitive (like most games). This is because the
> > > >> driver has free reign in restricting the min/max bounds of the range -
> > > >> allowing for enhancements like low framerate compensation / BTR.
> > > >>
> > > >> For a concrete example, imagine you're not doing frontbuffer rendering
> > > >> but you're still trying to target a specific scanout time. So even if we
> > > >> were to send the vblank event at vpos 0 you'd get an accurate scanout
> > > >> timestamp for the next scanout after the vback porch ends. You're not
> > > >> going to be doing any rendering or preparation in this time because it's
> > > >> simply too short. So you'd target the scanout period after this one,
> > > >> which you'd need to calculate somehow in userspace. Taking the deltas
> > > >> between two vblank events is completely useless here because you'll get
> > > >> back the vrr min refresh delta. So you can take your source content rate
> > > >> or some fixed rate you'd like to target within the range but even then
> > > >> you're still not guaranteed to have that flip end up at that scanout time.
> > > >>
> > > >> The current solution is to just return the scanout pos at vrr max, or
> > > >> the current mode's fixed refresh rate. If you're flip before the fixed
> > > >> duration front porch ends then that scanout time will likely be accurate
> > > >> since you can't flip faster than vrr max.
> > > >>
> > > >> Is there actually userspace that cares that these timestamps need to
> > > >> match? Like said below, it seems that wait for vblank is mostly about
> > > >> scheduling and for that I think having the event sent back at the start
> > > >> of vblank is more important here.
> > > >
> > > > There's 3 cases:
> > > > - DRI3: Waits for vblank for frontbuffer blits. If you combine that
> > > > with VRR then imo you deserve to keep all the pieces.
> > > > - generic compositors. Use VRR for scheduling, expect that a page_flip
> > > > queued immediately after a page_flip will hit the next frame, and not
> > > > the current frame. They also expect that this will not give you the
> > > > slowest possible refresh rate, but I guess VRR plus frame scheduling
> > > > is currently out of scope. It's kinda just for as-fast-as-possible
> > > > games.
> > > > - flip timestamp. Currently the entire codebase assumes that vblank ts
> > > > = flip ts. I don't really see a point in breaking that.
> > > > - better scheduling of VRR: what compositors actually do isn't wait
> > > > for the vblank, that only latches the page_flip. They set up a
> > > > hrtimer, with a fixed/auto-tuned head-start, from which they start
> > > > their rendering. The same could be done for VRR, if we expose the
> > > > earliest/latest vblank end times (and how much they can shift). vblank
> > > > ioctl is only used for syncing that up and making sure we're hitting
> > > > the right frame. This is all for pageflipping compositors. This is
> > > > also what they'll probably keep doing when we give them a desired flip
> > > > time (instead of a target frame), they still need an hrtimer to fire a
> > > > little bit ahead to get the frame rendered.
> > > >
> > > > So taking all together, we have 2 use-cases I think:
> > > > - frontbuffer rendering. Kinda wants vblank event to fire at the
> > > > beginning of vblank. Won't ever do VRR
> > > > - pageflipping compositors. Don't care about the vblank event, either
> > > > they have an hrtimer timer (which gets synced with the vblank ioctl),
> > > > or just flip as fast as possible. They don't care one bit when the
> > > > vblank event happens, as long as a pageflip scheduled immediately
> > > > after the event shows up only hits the next frame (which is the delay
> > > > code that currently stretches vblank unecessarily with VRR if you're
> > > > already a bit late).
> > > >
> > > > Making sure that vblank and page flip timestamp match the real
> > > > userspace seems like the cleanest semantics, and the least surprising
> > > > to userspace. E.g. what happens if you implement VRR in a compositor,
> > > > but then tune your hrtimer within the vblank, when the vblank
> > > > timestamp is totally not reflecting your real pageflip? I just don't
> > > > see what's the upside of having a vblank timestamp that's not actually
> > > > reflecting the real flip.
> > > >
> > > > Also, if your app wants the start of vblank, that's very easy to
> > > > compute: Just add the scanout time to the last vblank timestamp.
> > > > Figuring out the start of frame otoh isn't doable if you can't query,
> > > > and constantly pageflipping just to have an accurate start-of-frame
> > > > (for tuning your hrtimer) seems rather wasteful - it outright defeats
> > > > some of the use-cases for VRR.
> > > >
> > > > So what benefit do you see in having vblank ts and pageflip ts not
> > > > agree? What's the use-case this solves - aside from "less typing" :-)
> > > > If the only use-case is VRR for frontbuffer rendering, then that
> > > > doesn't even work right now (aside from that I think it's something we
> > > > shouldn't support and definitely not encourage).
> > > > -Daniel
> > >
> > > I'm actually fine with moving the vblank event back to the back porch
> > > but I'd just be worried about userspace breakages that relied on it
> > > happening the other way.
> > >
> > > It'd be especially confusing if the VLINE was different for fixed vs
> > > variable refresh. I wouldn't want to move it to vpos 0 unless the event
> > > is also getting moved for fixed refresh as well.
> >
> > That would break DRI3. So we can pick between two options:
> > - always send out the vblank early, accept that vblank timestamps change.
> > - for VRR only, send out the vblank only when we know the correct
> > timestamp. For non-VRR we have to send out the vblank early, because
> > that's somewhat become api (but not all drivers do it that way, so
> > it's about as hopeless as frontbuffer rendering in general).
> >
> > I still don't see the benefit of the first option, or the exact use case.
> >
> 
> Still haven't slept so i don't know if i'm adding anything to the
> discussion that Daniel hasn't stated already, but anyway:
> 
> I think i agree with Daniel on this. Having a vblank ts that truly
> reflects the end of last vblank/start of scanout should make
> time-based scheduling easier. Then we know when scanout of the active
> frame starts and calculate from known fixed scanout duration when next
> vblank starts and when it ends the latest. On X11 you can
> glXGetSyncValuesOML() query that start time to sync up a compositors
> update cycle or start of an animation or timed presentation sequence
> with vblank. E.g., Waylands Weston does this when switching from idle
> desktop to redraw loop. A movie player or my own software could use
> this info in the same way, whereas i don't see the current vblank
> timestams being useful. If you get the current vblank event with
> timestamp at start of vblank ie. a bit into the short fixed duration
> front porch, you'd have to render and flip within what small amount of
> time is left unti the end of the fixed front-porch, because once you
> get into variable extended front-porch your vblank timestamp is
> already meaningless.
> 
> I also think if we want to implement something like the ability to
> flip for a specific target scanout time -- which i'd really like to
> have -- having the true end of last vblank gives us a baseline
> timestamp that is easier to reason about inside the kernel, similar to
> how a compositor can reason/adjust to when to start composing the next
> frame or submitting a flip to hit some consistent framerate, or in
> turn to better obey a presentation request of a client for a specific
> target time.
> 
> At the same time we need to keep current behaviour for non-VRR mode,
> with early vblanks, so all the target_msc vblank count based
> scheduling keeps working. I don't think clients that are written to
> use VRR should expect exactly the same timing behaviour wrt. when
> vblank events get sent or timestamps updated. I certainly expect to
> have separate code-paths in my app for optimal VRR vs. non-VRR
> presentation scheduling, a compositor or movie player might adapt in
> the same way, and fullscreen apps can opt-out of VRR if they want.
> 
> Clients just need to know what the behavior is in VRR mode. I wonder
> what we should do for the kernel docs for Linux 5.0-rc? I assume we
> won't have the time to fix this all before release, would it make
> sense to have some kernel-docs updates via drm-fixes that declare the
> vblank/pageflip timestamps as "undefined"/"not to be relied
> on"/"subject to future change" for this release - essentially declare
> vblank timestamps as unstable api for the moment, so nobody relies on
> the current quite broken timestamp behaviour?
> 
> I do now have hacked up proof-of-concept patches here that after
> torturing them for a few hours with my own tests now seem to be able
> to calculate correct pageflip timestamps regardless which of the two
> options we choose, but they need refinement/cleanup/rebasing onto
> somethign else than Linux 5.0-rc5, and help from Nicholas, because i
> tend to get lost in all the indirections and file hierarchy inside the
> DisplayCore code etc., and probably also a few kinks worked out. And
> see if shifting the vblank irq handling doesn't cause other weird
> interactions? I haven't looked how Nicholas below-the-range/framerate
> compensation works in detail and if it matter that it happens at start
> of front-porch. I can only say that it doesn't seem to interfere with
> correct timestamping if i run with framerates intentionally below
> vrr-min, as far as my limited testing goes. Another thing that may be
> problematic is disabling/reenabling vblank during idle times, given
> that we can only reliably timestamp from within pageflip irq handler
> or vblank irq ~ back-porch/active scanout. Maybe we need to
> drm_crtc_vblank_get() hold a reference while VRR is active to simplify
> things for the moment.

I think a solid testcase would be really good. Something that uses an
hrtimer to sweept across the entire vblank, from 0 until it notices that
the kernel doesn't increase the vblank any more (or 1 second or something,
we want to stop eventually) should hit all the corner cases, and then
making sure all the timestamps we get back on the sw side are at least
self-consistent. For non-VRR we have this in igt with kms_flip, and it
tends to find lots of bugs.

I think it'd be great if we could have something like this for VRR too,
once we've agreed on the exact semantics. Plus docs patch, bugfixes and
everything ofc.

> One thing one could merge is that other VRR friendly throttling patch
> i sent, given that that works and is independent of the timestamp
> stuff. I also have a backport for Linux 5.0-rc if that makes sense?
> That VRR throttling also needs to get reworked if/once we wanted to
> emit matching vblank/pflip timestamps, as that doesn't like the
> shifted point inside vblank when vblank counter gets updated.
> 
> Basically i don't know when the deadlines for merging this kind of
> work are/were wrt. Linux 5.0, 5.1, 5.2?

Given how much we're still discussing, and how much needs to be done, I'd
say 5.2. And we shrug 5.0 and 5.1 simply off as "welp, broken kernel for
VRR if you care about timestamps" if backporting the final fix is too
much.

> I'll look more into cleaning/testing those patches tomorrow after lots
> of sleep and then see what Nicholas thinks of them.
> 
> One problem with the VRR stuff is that the hardware testing equipment,
> that i'd usually use to verify if timestamps are really correct and
> accurate, doesn't work with DisplayPort or HDMI signals, so i can only
> test for self-consistency of timestamps atm., and look if plots of
> timestamps look how i would expect them to look if stuff is mostly
> correct.

Yeah I think self-consistency is already really useful. As mentioned
above, kms_flip (which only tests self-consistency) catches tons of stuff.
-Daniel

> 
> -mario
> 
> > > But like you said, the only thing I can think of that should need the
> > > event right away is low latency / framebuffer rendering.
> > >
> > > I can think at least one potential use case where you would want to do
> > > front buffer rendering for low latency with VRR - emulation of old
> > > console hardware. There's not really any modern display that has fixed
> > > refresh modes that match the render frequency of some consoles. But this
> > > use case isn't out there in the wild right now at least.
> >
> > Even on an emulator I'd expect it'd just do regular flips to the
> > frontbuffer. Especially on mobile there's really no such thing as
> > frontbuffer rendering, you always do a flip. The DIRTYFB ioctl (which
> > you need to call instead of doing a page flip) simply does a pageflip
> > behind your back for you. Which I think is worse than actually
> > flipping.
> >
> > And for emulating exact refresh rates your then again go back to
> > hrtimer + issuing the flip approach, but using your desired refresh
> > rate to tune the hrtimer instead of the fixed one. But that needs a
> > correct vblank timestamp for the current frame to work correctly (or
> > you force compositor has to always flip, which defeats the power
> > saving aspects of VRR allowing to stretch refresh rates when nothing
> > changes).
> >
> > > Regarding compositors: I don't really recommend VRR for this use case in
> > > general since you'll want a fixed refresh rate frequency for input to
> > > feel consistent and to avoid any flickering / luminance changes. The
> > > fastest refresh rate will be the most responsive to the user and you can
> > > just used the fixed refresh rate mode in this case. Lowering the refresh
> > > rate to save power sounds nice in theory but I'd imagine it'd feel
> > > terrible in practice.
> >
> > Why is a compositor any different from a game? You aim to render at
> > 60fps, or whatever the refresh rate is. If you're one frame late it's
> > better to stretch the vblank slightly than a full on 1-frame freeze.
> > And on mobile it's unfortunately fairly common that the system fails
> > to clock up quickly enough when starting from an idle state, VRR
> > sounds like really useful here.
> >
> > But like in a game if you don't at least aim to render at a consistent
> > fps, and handle your input at a consistent fps, then the end result
> > will indeed be terrible.
> > -Daniel
> >
> > > Nicholas Kazlauskas
> > >
> > > >
> > > >>>>> Now added complication is that amdgpu sends out vblank events really
> > > >>>>> early, which is used by userspace to do frontbuffer rendering in the
> > > >>>>> vblank time. But I don't think anyone wants to do both VRR and
> > > >>>>
> > > >>>> I think all kms drivers try to call drm_crtc_handle_vblank() at start
> > > >>>> of vblank to give Mesa the most time for frontbuffer rendering for
> > > >>>> classic X. But vblank events are also used for scheduling bufferswaps
> > > >>>> or other stuff for redirected windowed rendering, or via api's like
> > > >>>> OML_sync_controls glXWaitForMscOML, so there might be other things
> > > >>>> affected by a more delayed vblank handling.
> > > >>>
> > > >>> The frontbuffer rendering is very much X driver specific, and I think
> > > >>> -amdgpu/radeon is the only one that requires this. No i915 driver ever
> > > >>> used the vblank interrupt to schedule frontbuffer blits, we use some
> > > >>> CS-side stalls.
> > > >>>
> > > >>> Wrt scheduling pageflips: The rule is that right after you've received the
> > > >>> vblank for frame X, then an immediately schedule pageflip should hit X+1,
> > > >>> but not X. amdgpu had this broken, but it's fixed since a while.
> > > >>>
> > > >>>>> frontbuffer rendering, hence I think it should be possible to create
> > > >>>>> correct vblank timestamps for the VRR case, while leaving the current
> > > >>>>> logic in place for everything else. But that means moving the entire
> > > >>>>> vblank processing to where you know the next frame's start time for
> > > >>>>> VRR, both for page flips and for all other vblank events.
> > > >>>>> -Daniel
> > > >>>>>
> > > >>>>
> > > >>>> I think for amdgpu it would be doable by calling drm_handle_vblank
> > > >>>> from the pageflip irq handler in VRR mode, and then additionally from
> > > >>>> the vblank interrupt handler like now. Vblank irq's are triggered via
> > > >>>> vline interrupts with programmable vertical trigger line position, so
> > > >>>> we could program the trigger line to be start of back-porch instead of
> > > >>>> start of front-porch. In the pageflip case we do vblank handling +
> > > >>>> timestamping + pflip completion from pflip irq, and have the regular
> > > >>>> back-porch vblank handling for refresh cycles in which no
> > > >>>> pageflip/pflip irq happened.
> > > >>>
> > > >>> Hm, can't we simply program the vblank to occur in the back porch region
> > > >>> for VRR? The vblank timestamping should automatically correct the
> > > >>> timestamp in that case. Would require a lot less code changes.
> > > >>>
> > > >>> Plus update the documentation ofc.
> > > >>>
> > > >>>> Not sure if/how that would mess with below-vrr-refresh-rate-range
> > > >>>> support for stutter reduction at low fps though? Or with performance,
> > > >>>> given vblank event dispatch could be delayed a lot in VRR compared to
> > > >>>> normal mode? Or with all the vblank accounting during modesets or crtc
> > > >>>> en/disable?
> > > >>
> > > >> I think we have extra VLINEs that can be programmed at vpos 0 that could
> > > >> resolve this but I'm still not really sold on the idea.
> > > >>
> > > >>>
> > > >>> Currently VRR and explicit frame scheduling don't mix. We've discussed
> > > >>> this, and consens seems to be that we need a new property for page_flip,
> > > >>> similar to the target frame, but instead of frames a timestamp. Then the
> > > >>> driver can hit a specific time for the next frame.
> > > >>>
> > > >>> And since compositors use the vblank ioctl just to do frame scheduling, I
> > > >>> don't think we have to worry hugely about interactions there. Much better
> > > >>> to aim for clean semantics (i.e. vblank and flip complete timestamps
> > > >>> better match).
> > > >>>
> > > >>>> Also, could other drivers like Intel easily do this delayed vblank
> > > >>>> processing in the non-pageflip case?
> > > >>>
> > > >>> Intel's problem :-) But yeah we have page flip timestamp registers, so
> > > >>> worst case all we need to do is shuffle code around a bit. And we have
> > > >>> interrupts for both start of vblank and and of vblank. It might be a bit
> > > >>> more work because we use the vblank interrupt to avoid races in atomic
> > > >>> updates, but that's not too terrible to fix.
> > > >>>
> > > >>>>   From my user perspective as a developer of a neuroscience research
> > > >>>> application that has a couple of exciting/important use cases for VRR
> > > >>>> mode, i absolutely do need trustworthy and precise pageflip event
> > > >>>> timestamps that represent reality, otherwise VRR mode would be less
> > > >>>> than useless for me. I can do without meaningful vblank timestamps in
> > > >>>> VRR mode though, and expected to do without them, as any scheduling
> > > >>>> has to happen time-based instead of based on vblank counts anyway,
> > > >>>> given that vblank counts are quite meaningless if every frame can have
> > > >>>> a different duration. I'd expect that situation to be similar for
> > > >>>> other apps that would want to use timestamps in VRR mode like video
> > > >>>> games, movie players or VR/AR applications.
> > > >>>
> > > >>> Yeah for that you want to schedule your frames with a target frame number.
> > > >>> Would still be nice if vblank timestamps wouldn't be an outright lie (and
> > > >>> the timestamp is still useful information, even if the frame counter
> > > >>> isn't - many displays have min/max frame rates for VRR, which is
> > > >>> information we could perhaps expose).
> > > >>> -Daniel >
> > > >>>>
> > > >>>> -mario
> > > >>>>
> > > >>>>>>
> > > >>>>>> Nicholas Kazlauskas
> > > >>>>>>
> > > >>>>>>>
> > > >>>>>>>> ---
> > > >>>>>>>>     drivers/gpu/drm/drm_vblank.c | 49 +++++++++++++++++++++++++++++++++++-
> > > >>>>>>>>     include/drm/drm_vblank.h     |  8 ++++++
> > > >>>>>>>>     2 files changed, 56 insertions(+), 1 deletion(-)
> > > >>>>>>>>
> > > >>>>>>>> diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
> > > >>>>>>>> index 98e091175921..4b3a4c38fabe 100644
> > > >>>>>>>> --- a/drivers/gpu/drm/drm_vblank.c
> > > >>>>>>>> +++ b/drivers/gpu/drm/drm_vblank.c
> > > >>>>>>>> @@ -814,10 +814,21 @@ static void send_vblank_event(struct drm_device *dev,
> > > >>>>>>>>                u64 seq, ktime_t now)
> > > >>>>>>>>     {
> > > >>>>>>>>        struct timespec64 tv;
> > > >>>>>>>> +    ktime_t alt_flip_time;
> > > >>>>>>>>
> > > >>>>>>>>        switch (e->event.base.type) {
> > > >>>>>>>> -    case DRM_EVENT_VBLANK:
> > > >>>>>>>>        case DRM_EVENT_FLIP_COMPLETE:
> > > >>>>>>>> +            /*
> > > >>>>>>>> +             * For flip completion events, override "now" time
> > > >>>>>>>> +             * with alt_flip_time provided by the driver via
> > > >>>>>>>> +             * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
> > > >>>>>>>> +             * if that time is later than given "now" vblank time.
> > > >>>>>>>> +             */
> > > >>>>>>>> +            alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
> > > >>>>>>>> +            if (alt_flip_time > now)
> > > >>>>>>>> +                    now = alt_flip_time;
> > > >>>>>>>> +            /* Fallthrough */
> > > >>>>>>>> +    case DRM_EVENT_VBLANK:
> > > >>>>>>>>                tv = ktime_to_timespec64(now);
> > > >>>>>>>>                e->event.vbl.sequence = seq;
> > > >>>>>>>>                /*
> > > >>>>>>>> @@ -916,11 +927,47 @@ void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > > >>>>>>>>
> > > >>>>>>>>                now = ktime_get();
> > > >>>>>>>>        }
> > > >>>>>>>> +
> > > >>>>>>>>        e->pipe = pipe;
> > > >>>>>>>>        send_vblank_event(dev, e, seq, now);
> > > >>>>>>>>     }
> > > >>>>>>>>     EXPORT_SYMBOL(drm_crtc_send_vblank_event);
> > > >>>>>>>>
> > > >>>>>>>> +/**
> > > >>>>>>>> + * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
> > > >>>>>>>> + * @crtc: the source CRTC of the pageflip completion event
> > > >>>>>>>> + * @flip_time: The alternate pageflip completion timestamp in VRR mode
> > > >>>>>>>> + *
> > > >>>>>>>> + * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
> > > >>>>>>>> + * by the pageflip event can never be earlier than the vblank timestamp of the
> > > >>>>>>>> + * vblank of flip completion, as that vblank timestamp defines the end of the
> > > >>>>>>>> + * shortest possible vblank duration. In case of a delayed flip completion
> > > >>>>>>>> + * inside the extended VRR front porch however, the end of vblank can be much
> > > >>>>>>>> + * later, so the driver must assign an estimated timestamp of that later end of
> > > >>>>>>>> + * vblank. For a CRTC in VRR mode, the driver should use this helper function to
> > > >>>>>>>> + * set an alternate flip completion timestamp in case of late flip completions
> > > >>>>>>>> + * in extended vblank. In the most simple case, this @flip_time timestamp could
> > > >>>>>>>> + * simply be a ktime_get() timestamp taken at the start of the pageflip
> > > >>>>>>>> + * completion routine, with some constant duration of the back porch interval
> > > >>>>>>>> + * added, although more precise estimates may be possible on some hardware if
> > > >>>>>>>> + * the hardware provides some means of timestamping the true end of vblank.
> > > >>>>>>>> + *
> > > >>>>>>>> + * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
> > > >>>>>>>> + * will use either the standard vblank timestamp, calculated for a minimum
> > > >>>>>>>> + * duration vblank, or the provided @flip_time if that time is later than the
> > > >>>>>>>> + * vblank timestamp, to get the best possible estimate of start of display of
> > > >>>>>>>> + * the new post-pageflip scanout buffer.
> > > >>>>>>>> + */
> > > >>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > > >>>>>>>> +                                     ktime_t flip_time)
> > > >>>>>>>> +{
> > > >>>>>>>> +    struct drm_device *dev = crtc->dev;
> > > >>>>>>>> +    struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
> > > >>>>>>>> +
> > > >>>>>>>> +    vblank->alt_flip_time = flip_time;
> > > >>>>>>>> +}
> > > >>>>>>>> +EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
> > > >>>>>>>> +
> > > >>>>>>>>     static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
> > > >>>>>>>>     {
> > > >>>>>>>>        if (drm_core_check_feature(dev, DRIVER_MODESET)) {
> > > >>>>>>>> diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
> > > >>>>>>>> index 6ad9630d4f48..aacf44694ab6 100644
> > > >>>>>>>> --- a/include/drm/drm_vblank.h
> > > >>>>>>>> +++ b/include/drm/drm_vblank.h
> > > >>>>>>>> @@ -117,6 +117,12 @@ struct drm_vblank_crtc {
> > > >>>>>>>>         * @time: Vblank timestamp corresponding to @count.
> > > >>>>>>>>         */
> > > >>>>>>>>        ktime_t time;
> > > >>>>>>>> +    /**
> > > >>>>>>>> +     * @alt_flip_time: Vblank timestamp for end of extended vblank due to
> > > >>>>>>>> +     * a late pageflip completion in variable refresh rate mode. Pageflip
> > > >>>>>>>> +     * events will carry the later one of @time and @alt_flip_time.
> > > >>>>>>>> +     */
> > > >>>>>>>> +    ktime_t alt_flip_time;
> > > >>>>>>>>
> > > >>>>>>>>        /**
> > > >>>>>>>>         * @refcount: Number of users/waiters of the vblank interrupt. Only when
> > > >>>>>>>> @@ -179,6 +185,8 @@ int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
> > > >>>>>>>>     u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
> > > >>>>>>>>     u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
> > > >>>>>>>>                                   ktime_t *vblanktime);
> > > >>>>>>>> +void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
> > > >>>>>>>> +                                     ktime_t flip_time);
> > > >>>>>>>>     void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
> > > >>>>>>>>                               struct drm_pending_vblank_event *e);
> > > >>>>>>>>     void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,
> > > >>>>>>>> --
> > > >>>>>>>> 2.17.1
> > > >>>>>>>>
> > > >>>>>>>> _______________________________________________
> > > >>>>>>>> dri-devel mailing listjj
> > > >>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > >>>>>>>
> > > >>>>>>
> > > >>>>>
> > > >>>>>
> > > >>>>> --
> > > >>>>> Daniel Vetter
> > > >>>>> Software Engineer, Intel Corporation
> > > >>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > > >>>
> > > >>
> > > >> Nicholas Kazlauskas
> > > >>
> > > >
> > > >
> > >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
diff mbox series

Patch

diff --git a/drivers/gpu/drm/drm_vblank.c b/drivers/gpu/drm/drm_vblank.c
index 98e091175921..4b3a4c38fabe 100644
--- a/drivers/gpu/drm/drm_vblank.c
+++ b/drivers/gpu/drm/drm_vblank.c
@@ -814,10 +814,21 @@  static void send_vblank_event(struct drm_device *dev,
 		u64 seq, ktime_t now)
 {
 	struct timespec64 tv;
+	ktime_t alt_flip_time;
 
 	switch (e->event.base.type) {
-	case DRM_EVENT_VBLANK:
 	case DRM_EVENT_FLIP_COMPLETE:
+		/*
+		 * For flip completion events, override "now" time
+		 * with alt_flip_time provided by the driver via
+		 * drm_crtc_set_vrr_pageflip_timestamp() in VRR mode
+		 * if that time is later than given "now" vblank time.
+		 */
+		alt_flip_time = dev->vblank[e->pipe].alt_flip_time;
+		if (alt_flip_time > now)
+			now = alt_flip_time;
+		/* Fallthrough */
+	case DRM_EVENT_VBLANK:
 		tv = ktime_to_timespec64(now);
 		e->event.vbl.sequence = seq;
 		/*
@@ -916,11 +927,47 @@  void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
 
 		now = ktime_get();
 	}
+
 	e->pipe = pipe;
 	send_vblank_event(dev, e, seq, now);
 }
 EXPORT_SYMBOL(drm_crtc_send_vblank_event);
 
+/**
+ * drm_crtc_set_vrr_pageflip_timestamp - helper to set alternate pageflip time
+ * @crtc: the source CRTC of the pageflip completion event
+ * @flip_time: The alternate pageflip completion timestamp in VRR mode
+ *
+ * In variable refresh rate mode (VRR), a pageflip completion timestamp carried
+ * by the pageflip event can never be earlier than the vblank timestamp of the
+ * vblank of flip completion, as that vblank timestamp defines the end of the
+ * shortest possible vblank duration. In case of a delayed flip completion
+ * inside the extended VRR front porch however, the end of vblank can be much
+ * later, so the driver must assign an estimated timestamp of that later end of
+ * vblank. For a CRTC in VRR mode, the driver should use this helper function to
+ * set an alternate flip completion timestamp in case of late flip completions
+ * in extended vblank. In the most simple case, this @flip_time timestamp could
+ * simply be a ktime_get() timestamp taken at the start of the pageflip
+ * completion routine, with some constant duration of the back porch interval
+ * added, although more precise estimates may be possible on some hardware if
+ * the hardware provides some means of timestamping the true end of vblank.
+ *
+ * When sending out pageflip events, e.g., via drm_crtc_send_vblank_event(), it
+ * will use either the standard vblank timestamp, calculated for a minimum
+ * duration vblank, or the provided @flip_time if that time is later than the
+ * vblank timestamp, to get the best possible estimate of start of display of
+ * the new post-pageflip scanout buffer.
+ */
+void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
+					 ktime_t flip_time)
+{
+	struct drm_device *dev = crtc->dev;
+	struct drm_vblank_crtc *vblank = &dev->vblank[drm_crtc_index(crtc)];
+
+	vblank->alt_flip_time = flip_time;
+}
+EXPORT_SYMBOL(drm_crtc_set_vrr_pageflip_timestamp);
+
 static int __enable_vblank(struct drm_device *dev, unsigned int pipe)
 {
 	if (drm_core_check_feature(dev, DRIVER_MODESET)) {
diff --git a/include/drm/drm_vblank.h b/include/drm/drm_vblank.h
index 6ad9630d4f48..aacf44694ab6 100644
--- a/include/drm/drm_vblank.h
+++ b/include/drm/drm_vblank.h
@@ -117,6 +117,12 @@  struct drm_vblank_crtc {
 	 * @time: Vblank timestamp corresponding to @count.
 	 */
 	ktime_t time;
+	/**
+	 * @alt_flip_time: Vblank timestamp for end of extended vblank due to
+	 * a late pageflip completion in variable refresh rate mode. Pageflip
+	 * events will carry the later one of @time and @alt_flip_time.
+	 */
+	ktime_t alt_flip_time;
 
 	/**
 	 * @refcount: Number of users/waiters of the vblank interrupt. Only when
@@ -179,6 +185,8 @@  int drm_vblank_init(struct drm_device *dev, unsigned int num_crtcs);
 u64 drm_crtc_vblank_count(struct drm_crtc *crtc);
 u64 drm_crtc_vblank_count_and_time(struct drm_crtc *crtc,
 				   ktime_t *vblanktime);
+void drm_crtc_set_vrr_pageflip_timestamp(struct drm_crtc *crtc,
+					 ktime_t flip_time);
 void drm_crtc_send_vblank_event(struct drm_crtc *crtc,
 			       struct drm_pending_vblank_event *e);
 void drm_crtc_arm_vblank_event(struct drm_crtc *crtc,