drm: Return current vblank value for drmWaitVBlank queries

Message ID 1426607071-20515-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived

Commit Message

Chris Wilson March 17, 2015, 3:44 p.m. UTC
When userspace queries the current vblank for the CRTC, we can reply
with the cached value (using atomic reads to serialise with the vblank
interrupt as necessary) without having to touch registers. In the
instant disable case, this saves us from enabling/disabling the vblank
around every query, greatly reducing the number of registers read and
written.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Imre Deak <imre.deak@intel.com>
Cc: Daniel Vetter <daniel.vetter@intel.com>
Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
Cc: Dave Airlie <airlied@redhat.com>
---
 drivers/gpu/drm/drm_irq.c | 15 ++++++++++++++-
 1 file changed, 14 insertions(+), 1 deletion(-)

Comments

Michel Dänzer March 18, 2015, 2:53 a.m. UTC | #1
On 18.03.2015 00:44, Chris Wilson wrote:
> When userspace queries the current vblank for the CRTC, we can reply
> with the cached value (using atomic reads to serialise with the vblank
> interrupt as necessary) without having to touch registers. In the
> instant disable case, this saves us from enabling/disabling the vblank
> around every query, greatly reducing the number of registers read and
> written.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Imre Deak <imre.deak@intel.com>
> Cc: Daniel Vetter <daniel.vetter@intel.com>
> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
> Cc: Dave Airlie <airlied@redhat.com>
> ---
>  drivers/gpu/drm/drm_irq.c | 15 ++++++++++++++-
>  1 file changed, 14 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
> index c8a34476570a..6c4570082b65 100644
> --- a/drivers/gpu/drm/drm_irq.c
> +++ b/drivers/gpu/drm/drm_irq.c
> @@ -1585,7 +1585,18 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
>  	if (crtc >= dev->num_crtcs)
>  		return -EINVAL;
>  
> -	vblank = &dev->vblank[crtc];
> +	/* Fast-path the query for the current value (without an event)
> +	 * to avoid having to enable/disable the vblank interrupts.
> +	 */
> +	if ((vblwait->request.type & (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK)) == _DRM_VBLANK_RELATIVE &&
> +	    vblwait->request.sequence == 0) {
> +		struct timeval now;
> +
> +		vblwait->reply.sequence = drm_vblank_count_and_time(dev, crtc, &now);
> +		vblwait->reply.tval_sec = now.tv_sec;
> +		vblwait->reply.tval_usec = now.tv_usec;
> +		return 0;
> +	}

drm_vblank_count_and_time() doesn't return the correct sequence number
while the vblank interrupt is disabled, does it? It returns the sequence
number from the last time vblank_disable_and_save() was called (when the
vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
Michel Dänzer March 18, 2015, 3:13 a.m. UTC | #2
On 18.03.2015 11:53, Michel Dänzer wrote:
> On 18.03.2015 00:44, Chris Wilson wrote:
>> When userspace queries the current vblank for the CRTC, we can reply
>> with the cached value (using atomic reads to serialise with the vblank
>> interrupt as necessary) without having to touch registers. In the
>> instant disable case, this saves us from enabling/disabling the vblank
>> around every query, greatly reducing the number of registers read and
>> written.
>>
>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Imre Deak <imre.deak@intel.com>
>> Cc: Daniel Vetter <daniel.vetter@intel.com>
>> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
>> Cc: Dave Airlie <airlied@redhat.com>
>> ---
>>  drivers/gpu/drm/drm_irq.c | 15 ++++++++++++++-
>>  1 file changed, 14 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
>> index c8a34476570a..6c4570082b65 100644
>> --- a/drivers/gpu/drm/drm_irq.c
>> +++ b/drivers/gpu/drm/drm_irq.c
>> @@ -1585,7 +1585,18 @@ int drm_wait_vblank(struct drm_device *dev, void *data,
>>  	if (crtc >= dev->num_crtcs)
>>  		return -EINVAL;
>>  
>> -	vblank = &dev->vblank[crtc];
>> +	/* Fast-path the query for the current value (without an event)
>> +	 * to avoid having to enable/disable the vblank interrupts.
>> +	 */
>> +	if ((vblwait->request.type & (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK)) == _DRM_VBLANK_RELATIVE &&
>> +	    vblwait->request.sequence == 0) {
>> +		struct timeval now;
>> +
>> +		vblwait->reply.sequence = drm_vblank_count_and_time(dev, crtc, &now);
>> +		vblwait->reply.tval_sec = now.tv_sec;
>> +		vblwait->reply.tval_usec = now.tv_usec;
>> +		return 0;
>> +	}
> 
> drm_vblank_count_and_time() doesn't return the correct sequence number
> while the vblank interrupt is disabled, does it? It returns the sequence
> number from the last time vblank_disable_and_save() was called (when the
> vblank interrupt was disabled). That's why drm_vblank_get() is needed here.

It might be possible to avoid drm_vblank_get() and use
drm_update_vblank_count() before (or in) drm_vblank_count_and_time()
instead when the vblank interrupt is disabled, but then you'd first need
to make it safe to call drm_update_vblank_count() several times while
the vblank interrupt is disabled, by making it update vblank->last at least.
Chris Wilson March 18, 2015, 9:30 a.m. UTC | #3
On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
> drm_vblank_count_and_time() doesn't return the correct sequence number
> while the vblank interrupt is disabled, does it? It returns the sequence
> number from the last time vblank_disable_and_save() was called (when the
> vblank interrupt was disabled). That's why drm_vblank_get() is needed here.

Ville enlightened me as well. I thought the value was cooked so that
time did not pass whilst the IRQ was disabled. Hopefully, I can impress
upon the Intel folks, at least, that enabling/disabling the interrupts
just to read the current hw counter is interesting to say the least and
sits at the top of the profiles when benchmarking Present.
-Chris
Mario Kleiner March 18, 2015, 2:52 p.m. UTC | #4
On 03/18/2015 10:30 AM, Chris Wilson wrote:
> On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
>> drm_vblank_count_and_time() doesn't return the correct sequence number
>> while the vblank interrupt is disabled, does it? It returns the sequence
>> number from the last time vblank_disable_and_save() was called (when the
>> vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
>
> Ville enlightened me as well. I thought the value was cooked so that
> time did not pass whilst the IRQ was disabled. Hopefully, I can impress
> upon the Intel folks, at least, that enabling/disabling the interrupts
> just to read the current hw counter is interesting to say the least and
> sits at the top of the profiles when benchmarking Present.
> -Chris
>

drm_wait_vblank() not only gets the counter but also the corresponding 
vblank timestamp. Counters are recalculated in vblank_disable_and_save() 
for irq off, then in the vblank irq on path, and every refresh in 
drm_handle_vblank at vblank irq time.

The timestamps can be recalculated at any time iff the driver supports 
high precision timestamping, which currently intel kms, radeon kms, and 
nouveau kms do. But for other parts, like most SoC's, afaik you only get 
a valid timestamp by sampling system time in the vblank irq handler, so 
there you'd have a problem.

There are also some races around the enable/disable path which require a 
lot of care and exact knowledge of when each hardware fires its vblanks, 
updates its hardware counters etc. to get rid of them. Ville did that - 
successfully as far as my tests go - for Intel kms, but other drivers 
would be less forgiving.

Our current method is to:

a) Only disable vblank irqs after a default idle period of 5 seconds, so 
we don't get races frequent/likely enough to cause problems for clients. 
And we save the overhead for all the vblank irq on/off.

b) On drivers which have high precision timestamping and have been 
carefully checked to be race free (== intel kms only atm.) we have 
instant disable, so things like blinking cursors don't keep vblank irq 
on forever.

If b) causes so much overhead, maybe we could change the "instant 
disable" into a "disable after a very short time", e.g., lowering the 
timeout from 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That 
would still disable vblank irqs for power saving if the desktop is 
really idle, but avoid on/off storms for the various drm_wait_vblank's 
that happen when preparing a swap.
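For experimenting with this, the idle timeout is already runtime-tunable via the drm core's vblankoffdelay module parameter (on kernels of this vintage; exact semantics of 0 and negative values vary by version, so check the MODULE_PARM_DESC in drm_irq.c):

```shell
# Current idle timeout before vblank irqs are auto-disabled (ms).
cat /sys/module/drm/parameters/vblankoffdelay

# Mario's suggestion: roughly 2-3 refresh durations instead of instant off.
echo 50 | sudo tee /sys/module/drm/parameters/vblankoffdelay
```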

-mario
Daniel Vetter March 19, 2015, 2:33 p.m. UTC | #5
On Wed, Mar 18, 2015 at 03:52:56PM +0100, Mario Kleiner wrote:
> On 03/18/2015 10:30 AM, Chris Wilson wrote:
> >On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
> >>drm_vblank_count_and_time() doesn't return the correct sequence number
> >>while the vblank interrupt is disabled, does it? It returns the sequence
> >>number from the last time vblank_disable_and_save() was called (when the
> >>vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
> >
> >Ville enlightened me as well. I thought the value was cooked so that
> >time did not pass whilst the IRQ was disabled. Hopefully, I can impress
> >upon the Intel folks, at least, that enabling/disabling the interrupts
> >just to read the current hw counter is interesting to say the least and
> >sits at the top of the profiles when benchmarking Present.
> >-Chris
> >
> 
> drm_wait_vblank() not only gets the counter but also the corresponding
> vblank timestamp. Counters are recalculated in vblank_disable_and_save() for
> irq off, then in the vblank irq on path, and every refresh in
> drm_handle_vblank at vblank irq time.
> 
> The timestamps can be recalculated at any time iff the driver supports high
> precision timestamping, which currently intel kms, radeon kms, and nouveau
> kms do. But for other parts, like most SoC's, afaik you only get a valid
> timestamp by sampling system time in the vblank irq handler, so there you'd
> have a problem.
> 
> There are also some races around the enable/disable path which require a lot
> of care and exact knowledge of when each hardware fires its vblanks, updates
> its hardware counters etc. to get rid of them. Ville did that - successfully
> as far as my tests go - for Intel kms, but other drivers would be less
> forgiving.
> 
> Our current method is to:
> 
> a) Only disable vblank irqs after a default idle period of 5 seconds, so we
> don't get races frequent/likely enough to cause problems for clients. And we
> save the overhead for all the vblank irq on/off.
> 
> b) On drivers which have high precision timestamping and have been carefully
> checked to be race free (== intel kms only atm.) we have instant disable, so
> things like blinking cursors don't keep vblank irq on forever.
> 
> If b) causes so much overhead, maybe we could change the "instant disable"
> into a "disable after a very short time", e.g., lowering the timeout from
> 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still
> disable vblank irqs for power saving if the desktop is really idle, but
> avoid on/off storms for the various drm_wait_vblank's that happen when
> preparing a swap.

Yeah I think we could add code which only gets run for drivers which
support instant disable (i915 doesn't do that on gen2 because the hw is
lacking). There we should be able to update the vblank counter/timestamp
correctly without enabling interrupts temporarily. Ofc we need to make
sure we have enough nasty igt testcase to ensure there's not going to be
jumps and missed frame numbers in that case.
-Daniel
Ville Syrjala March 19, 2015, 3:04 p.m. UTC | #6
On Thu, Mar 19, 2015 at 03:33:11PM +0100, Daniel Vetter wrote:
> On Wed, Mar 18, 2015 at 03:52:56PM +0100, Mario Kleiner wrote:
> > On 03/18/2015 10:30 AM, Chris Wilson wrote:
> > >On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
> > >>drm_vblank_count_and_time() doesn't return the correct sequence number
> > >>while the vblank interrupt is disabled, does it? It returns the sequence
> > >>number from the last time vblank_disable_and_save() was called (when the
> > >>vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
> > >
> > >Ville enlightened me as well. I thought the value was cooked so that
> > >time did not pass whilst the IRQ was disabled. Hopefully, I can impress
> > >upon the Intel folks, at least, that enabling/disabling the interrupts
> > >just to read the current hw counter is interesting to say the least and
> > >sits at the top of the profiles when benchmarking Present.
> > >-Chris
> > >
> > 
> > drm_wait_vblank() not only gets the counter but also the corresponding
> > vblank timestamp. Counters are recalculated in vblank_disable_and_save() for
> > irq off, then in the vblank irq on path, and every refresh in
> > drm_handle_vblank at vblank irq time.
> > 
> > The timestamps can be recalculated at any time iff the driver supports high
> > precision timestamping, which currently intel kms, radeon kms, and nouveau
> > kms do. But for other parts, like most SoC's, afaik you only get a valid
> > timestamp by sampling system time in the vblank irq handler, so there you'd
> > have a problem.
> > 
> > There are also some races around the enable/disable path which require a lot
> > of care and exact knowledge of when each hardware fires its vblanks, updates
> > its hardware counters etc. to get rid of them. Ville did that - successfully
> > as far as my tests go - for Intel kms, but other drivers would be less
> > forgiving.
> > 
> > Our current method is to:
> > 
> > a) Only disable vblank irqs after a default idle period of 5 seconds, so we
> > don't get races frequent/likely enough to cause problems for clients. And we
> > save the overhead for all the vblank irq on/off.
> > 
> > b) On drivers which have high precision timestamping and have been carefully
> > checked to be race free (== intel kms only atm.) we have instant disable, so
> > things like blinking cursors don't keep vblank irq on forever.
> > 
> > If b) causes so much overhead, maybe we could change the "instant disable"
> > into a "disable after a very short time", e.g., lowering the timeout from
> > 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still
> > disable vblank irqs for power saving if the desktop is really idle, but
> > avoid on/off storms for the various drm_wait_vblank's that happen when
> > preparing a swap.
> 
> Yeah I think we could add code which only gets run for drivers which
> support instant disable (i915 doesn't do that on gen2 because the hw is
> lacking). There we should be able to update the vblank counter/timestamp
> correctly without enabling interrupts temporarily. Ofc we need to make
> sure we have enough nasty igt testcase to ensure there's not going to be
> jumps and missed frame numbers in that case.

Is enabling the interrupts the expensive part, or is it the actual
double timestamp read + scanout pos read? Or is it due to the several
spinlocks we have in this code?

Also why is userspace reading the vblank counter in the first place? Due
to the crazy OML_whatever stuff perhaps? In the simple swap interval case
you shouldn't really need to read it. And if we actually made the page
flip/atomic ioctl take a target vblank count and let the kernel deal
with it we wouldn't need to call the vblank ioctl at all.

As far as gen2 goes, I have some code somewhere that improved things a
bit. Essentially I just looked at the monotonic timestamps to figure out
if we missed any vblanks while interrupts were off. IIRC my current code
only added one to the cooked count in that case, but should be easy to
make it come up with a more realistic delta.
Chris Wilson March 19, 2015, 3:13 p.m. UTC | #7
On Thu, Mar 19, 2015 at 05:04:19PM +0200, Ville Syrjälä wrote:
> Is enabling the interrupts the expensive part, or is it the actual
> double timestamp read + scanout pos read? Or is it due to the several
> spinlocks we have in this code?

Chiefly it was the read during disable, then the spinlocked irq
enabling.
 
> Also why is userspace reading the vblank counter in the first place? Due
> to the crazy OML_whatever stuff perhaps? In the simple swap interval case
> you shouldn't really need to read it. And if we actually made the page
> flip/atomic ioctl take a target vblank count and let the kernel deal
> with it we wouldn't need to call the vblank ioctl at all.

More or less due to OML, yes. Client asks for random MSC, often 0, we
compute the next MSC for it. This happens 10,000+ times a second when
benchmarking and drmWaitVBlank is the most expensive step in that chain
for the server...

Pretty much an igt that compared the speed of just querying the hw
counter vs querying with a regular vblank interrupt would be ideal for
measuring the impact here.
-Chris
Chris Wilson March 19, 2015, 3:36 p.m. UTC | #8
On Thu, Mar 19, 2015 at 03:13:15PM +0000, Chris Wilson wrote:
> Pretty much an igt that compared the speed of just querying the hw
> counter vs querying with a regular vblank interrupt would be ideal for
> measuring the impact here.

ickle@crystalwell:/usr/src/intel-gpu-tools$ sudo ./tests/kms_vblank 
IGT-Version: 1.10-gc15fb9e (x86_64) (Linux: 4.0.0-rc4+ x86_64)
Time to query current counter (idle):		  2.217µs
Subtest query-idle: SUCCESS (1.006s)
Time to query current counter (busy):		  0.244µs
Subtest query-busy: SUCCESS (1.201s)

-  28.04%     kms_vblank  [kernel.kallsyms]   [k] gen6_read32                  
   - gen6_read32                                                               
      + 65.27% gm45_get_vblank_counter                                         
      + 17.50% ironlake_disable_display_irq                                    
      + 16.53% ironlake_enable_display_irq                                     
+  21.57%     kms_vblank  [kernel.kallsyms]   [k] hsw_unclaimed_reg_detect.isra
+  10.69%     kms_vblank  [kernel.kallsyms]   [k] _raw_spin_lock_irqsave       
+   9.91%     kms_vblank  [kernel.kallsyms]   [k] __intel_get_crtc_scanline    
+   6.66%     kms_vblank  [kernel.kallsyms]   [k] _raw_spin_unlock_irqrestore 
+   1.34%     kms_vblank  [kernel.kallsyms]   [k] i915_get_crtc_scanoutpos     
+   1.29%     kms_vblank  [kernel.kallsyms]   [k] drm_wait_vblank              
+   1.27%     kms_vblank  [kernel.kallsyms]   [k] hsw_unclaimed_reg_debug.isra.
+   1.17%     kms_vblank  [kernel.kallsyms]   [k] copy_user_enhanced_fast_strin
+   1.12%     kms_vblank  [kernel.kallsyms]   [k] drm_ioctl                    
+   1.05%     kms_vblank  [kernel.kallsyms]   [k] vblank_disable_and_save
Mario Kleiner March 19, 2015, 4:43 p.m. UTC | #9
On 03/19/2015 04:04 PM, Ville Syrjälä wrote:
> On Thu, Mar 19, 2015 at 03:33:11PM +0100, Daniel Vetter wrote:
>> On Wed, Mar 18, 2015 at 03:52:56PM +0100, Mario Kleiner wrote:
>>> On 03/18/2015 10:30 AM, Chris Wilson wrote:
>>>> On Wed, Mar 18, 2015 at 11:53:16AM +0900, Michel Dänzer wrote:
>>>>> drm_vblank_count_and_time() doesn't return the correct sequence number
>>>>> while the vblank interrupt is disabled, does it? It returns the sequence
>>>>> number from the last time vblank_disable_and_save() was called (when the
>>>>> vblank interrupt was disabled). That's why drm_vblank_get() is needed here.
>>>>
>>>> Ville enlightened me as well. I thought the value was cooked so that
>>>> time did not pass whilst the IRQ was disabled. Hopefully, I can impress
>>>> upon the Intel folks, at least, that enabling/disabling the interrupts
>>>> just to read the current hw counter is interesting to say the least and
>>>> sits at the top of the profiles when benchmarking Present.
>>>> -Chris
>>>>
>>>
>>> drm_wait_vblank() not only gets the counter but also the corresponding
>>> vblank timestamp. Counters are recalculated in vblank_disable_and_save() for
>>> irq off, then in the vblank irq on path, and every refresh in
>>> drm_handle_vblank at vblank irq time.
>>>
>>> The timestamps can be recalculated at any time iff the driver supports high
>>> precision timestamping, which currently intel kms, radeon kms, and nouveau
>>> kms do. But for other parts, like most SoC's, afaik you only get a valid
>>> timestamp by sampling system time in the vblank irq handler, so there you'd
>>> have a problem.
>>>
>>> There are also some races around the enable/disable path which require a lot
>>> of care and exact knowledge of when each hardware fires its vblanks, updates
>>> its hardware counters etc. to get rid of them. Ville did that - successfully
>>> as far as my tests go - for Intel kms, but other drivers would be less
>>> forgiving.
>>>
>>> Our current method is to:
>>>
>>> a) Only disable vblank irqs after a default idle period of 5 seconds, so we
>>> don't get races frequent/likely enough to cause problems for clients. And we
>>> save the overhead for all the vblank irq on/off.
>>>
>>> b) On drivers which have high precision timestamping and have been carefully
>>> checked to be race free (== intel kms only atm.) we have instant disable, so
>>> things like blinking cursors don't keep vblank irq on forever.
>>>
>>> If b) causes so much overhead, maybe we could change the "instant disable"
>>> into a "disable after a very short time", e.g., lowering the timeout from
>>> 5000 msecs to 2-3 video refresh durations ~ 50 msecs? That would still
>>> disable vblank irqs for power saving if the desktop is really idle, but
>>> avoid on/off storms for the various drm_wait_vblank's that happen when
>>> preparing a swap.
>>
>> Yeah I think we could add code which only gets run for drivers which
>> support instant disable (i915 doesn't do that on gen2 because the hw is
>> lacking). There we should be able to update the vblank counter/timestamp
>> correctly without enabling interrupts temporarily. Ofc we need to make
>> sure we have enough nasty igt testcase to ensure there's not going to be
>> jumps and missed frame numbers in that case.

I'd rather go for the very simple "fast disable with short timeout" 
method. That would only be a tiny almost one-liner patch that reuses the 
existing timer for the default slow case, and we'd know already that it 
will work reliably on "instant off" capable drivers - no extra tests 
required. Those drm_vblank_get/put calls usually come in short bursts 
which should be covered by a timeout of maybe 1 to max. 3 refresh durations.

When we query the hw timestamps, we always have a little bit of 
unavoidable noise, even if it's often only +/- 1 usec on modern hw, so 
clients querying the timestamp for the same vblank would get slightly 
different results on repeated queries. On hw which only allows scanline 
granularity for queries, we can get variability up to 1 scanline 
duration. If the caller does things like delta calculations on those 
results (dT = currentts - lastts) it can get confusing results like time 
going backwards by a few microseconds. That's why the current code 
caches the last vblank ts, to save overhead and to make sure that 
repeated queries of the same vblank give identical results.

>
> Is enabling the interrupts the expensive part, or is it the actual
> double timestamp read + scanout pos read? Or is it due to the several
> spinlocks we have in this code?
>

The timestamp/scanout pos read itself is not that expensive iirc, 
usually 1-3 usecs depending on hw, from some testing i did a year ago. 
The machinery for irq on/off + all the reinitializing of vblank counts 
and matching timestamps etc. is probably not that cheap.

> Also why is userspace reading the vblank counter in the first place? Due
> to the crazy OML_whatever stuff perhaps? In the simple swap interval case
> you shouldn't really need to read it. And if we actually made the page
> flip/atomic ioctl take a target vblank count and let the kernel deal
> with it we wouldn't need to call the vblank ioctl at all.

I object to the "crazy", extensions have feelings too ;-). Support for 
OML_sync_control is one of the most lovely features Linux has atm. as a 
big advantage over other OS's for research, vr and medical applications 
requiring precise visual timing. But yes, glXGetSyncValuesOML(), 
glXGetVideoSyncSGI() for clients. Mine and similar applications use it 
to synchronize code execution, or things like audio and other i/o to 
vblank, for correctness checks, and for remapping user provided target 
times into target vblank counts. X11 compositors use it, Weston uses it 
for scheduling and for synchronizing its repaint cycle to vblank. The 
ddx use it a lot under dri2 and dri3/present.

Not that it couldn't be done better, e.g., having the kernel directly 
deal with a target vblank count, or even with a target system time after 
which a flip should happen, could be useful additions. But then you'd 
have extra complexity for dealing with things like client windows going 
away or getting (un)redirected after requesting a page flip far into the 
future, and you'd still have to deal with windowed X11 clients which 
can't use page flipping directly for their swaps.

-mario

>
> As far as gen2 goes, I have some code somewhere that improved things a
> bit. Essentially I just looked at the monotonic timestamps to figure out
> if we missed any vblanks while interrupts were off. IIRC my current code
> only added one to the cooked count in that case, but should be easy to
> make it come up with a more realistic delta.
>
Patch

diff --git a/drivers/gpu/drm/drm_irq.c b/drivers/gpu/drm/drm_irq.c
index c8a34476570a..6c4570082b65 100644
--- a/drivers/gpu/drm/drm_irq.c
+++ b/drivers/gpu/drm/drm_irq.c
@@ -1585,7 +1585,18 @@  int drm_wait_vblank(struct drm_device *dev, void *data,
 	if (crtc >= dev->num_crtcs)
 		return -EINVAL;
 
-	vblank = &dev->vblank[crtc];
+	/* Fast-path the query for the current value (without an event)
+	 * to avoid having to enable/disable the vblank interrupts.
+	 */
+	if ((vblwait->request.type & (_DRM_VBLANK_TYPES_MASK | _DRM_VBLANK_FLAGS_MASK)) == _DRM_VBLANK_RELATIVE &&
+	    vblwait->request.sequence == 0) {
+		struct timeval now;
+
+		vblwait->reply.sequence = drm_vblank_count_and_time(dev, crtc, &now);
+		vblwait->reply.tval_sec = now.tv_sec;
+		vblwait->reply.tval_usec = now.tv_usec;
+		return 0;
+	}
 
 	ret = drm_vblank_get(dev, crtc);
 	if (ret) {
@@ -1619,6 +1630,8 @@  int drm_wait_vblank(struct drm_device *dev, void *data,
 
 	DRM_DEBUG("waiting on vblank count %d, crtc %d\n",
 		  vblwait->request.sequence, crtc);
+
+	vblank = &dev->vblank[crtc];
 	vblank->last_wait = vblwait->request.sequence;
 	DRM_WAIT_ON(ret, vblank->queue, 3 * HZ,
 		    (((drm_vblank_count(dev, crtc) -