diff mbox

drm/i915: Asynchronously perform the set-base for a simple modeset

Message ID 1376057602-15233-1-git-send-email-chris@chris-wilson.co.uk (mailing list archive)
State New, archived
Headers show

Commit Message

Chris Wilson Aug. 9, 2013, 2:13 p.m. UTC
A simple modeset, where we only wish to switch over to a new framebuffer
such as the transition from fbcon to X, takes around 30-60ms. This is
due to three factors:

1. We need to make sure the fb->obj is in the display domain, which
incurs a cache flush to ensure no dirt is left on the scanout.

2. We need to flush any pending rendering before performing the mmio
so that the frame is complete before it is shown.

3. We currently wait for the vblank after the mmio to be sure that the
old fb is no longer being shown before releasing it.

(1) can only be eliminated by userspace preparing the fb->obj in advance
to already be in the display domain. This can be done through use of the
create2 ioctl, or by reusing an existing fb->obj.

However, (2) and (3) are already solved by the existing page flip
mechanism, and it is surprisingly trivial to wire them up for use in the
set-base fast path. Though it can be argued that this represents a
subtle ABI break in that the set_config ioctl now returns before the old
framebuffer is unpinned. The danger is that userspace will start to
modify it before it is no longer being shown, however we should be able
to prevent that through proper domain tracking.

By combining all of the above, we can achieve an instaneous set_config:

[     6.601] (II) intel(0): switch to mode 2560x1440@60.0 on pipe 0 using DP2, position (0, 0), rotation normal
[     6.601] (II) intel(0): Setting screen physical size to 677 x 381

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_display.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

Comments

Daniel Vetter Aug. 9, 2013, 7:17 p.m. UTC | #1
On Fri, Aug 09, 2013 at 03:13:22PM +0100, Chris Wilson wrote:
> A simple modeset, where we only wish to switch over to a new framebuffer
> such as the transition from fbcon to X, takes around 30-60ms. This is
> due to three factors:
> 
> 1. We need to make sure the fb->obj is in the display domain, which
> incurs a cache flush to ensure no dirt is left on the scanout.
> 
> 2. We need to flush any pending rendering before performing the mmio
> so that the frame is complete before it is shown.
> 
> 3. We currently wait for the vblank after the mmio to be sure that the
> old fb is no longer being shown before releasing it.
> 
> (1) can only be eliminated by userspace preparing the fb->obj in advance
> to already be in the display domain. This can be done through use of the
> create2 ioctl, or by reusing an existing fb->obj.
> 
> However, (2) and (3) are already solved by the existing page flip
> mechanism, and it is surprisingly trivial to wire them up for use in the
> set-base fast path. Though it can be argued that this represents a
> subtle ABI break in that the set_config ioctl now returns before the old
> framebuffer is unpinned. The danger is that userspace will start to
> modify it before it is no longer being shown, however we should be able
> to prevent that through proper domain tracking.

Hm, right now we don't prevent anyone from rendering into a to-be-flipped
out buffer. There was once code in it, using MI_WAIT_EVENT but we've
ripped it out. I guess we could just throw in a synchronous stall on the
flip queue though, that should work always.

Testing would be easy if we have the crtc CRC stuff, but that's atm stuck
due to lack of volunteers ...

Overall I really like the idea and I think doing most of the plane
enabling (including psr, fbc, ips, and all that stuff which potentially
blows through a wblank wait) should be done in async work queues. That
should then also help resume time a lot.

Cheers, Daniel

> By combining all of the above, we can achieve an instaneous set_config:
> 
> [     6.601] (II) intel(0): switch to mode 2560x1440@60.0 on pipe 0 using DP2, position (0, 0), rotation normal
> [     6.601] (II) intel(0): Setting screen physical size to 677 x 381
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 11 +++++++----
>  1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 809b968..c6eea51 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -9077,10 +9077,13 @@ static int intel_crtc_set_config(struct drm_mode_set *set)
>  		ret = intel_set_mode(set->crtc, set->mode,
>  				     set->x, set->y, set->fb);
>  	} else if (config->fb_changed) {
> -		intel_crtc_wait_for_pending_flips(set->crtc);
> -
> -		ret = intel_pipe_set_base(set->crtc,
> -					  set->x, set->y, set->fb);
> +		if (to_intel_framebuffer(set->fb)->obj->ring == NULL ||
> +		    save_set.x != set->x || save_set.y != set->y ||
> +		    intel_crtc_page_flip(set->crtc, set->fb, NULL)) {
> +			intel_crtc_wait_for_pending_flips(set->crtc);
> +			ret = intel_pipe_set_base(set->crtc,
> +						  set->x, set->y, set->fb);
> +		}
>  	}
>  
>  	if (ret) {
> -- 
> 1.8.1.2
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Aug. 9, 2013, 8:06 p.m. UTC | #2
On Fri, Aug 09, 2013 at 09:17:11PM +0200, Daniel Vetter wrote:
> On Fri, Aug 09, 2013 at 03:13:22PM +0100, Chris Wilson wrote:
> > A simple modeset, where we only wish to switch over to a new framebuffer
> > such as the transition from fbcon to X, takes around 30-60ms. This is
> > due to three factors:
> > 
> > 1. We need to make sure the fb->obj is in the display domain, which
> > incurs a cache flush to ensure no dirt is left on the scanout.
> > 
> > 2. We need to flush any pending rendering before performing the mmio
> > so that the frame is complete before it is shown.
> > 
> > 3. We currently wait for the vblank after the mmio to be sure that the
> > old fb is no longer being shown before releasing it.
> > 
> > (1) can only be eliminated by userspace preparing the fb->obj in advance
> > to already be in the display domain. This can be done through use of the
> > create2 ioctl, or by reusing an existing fb->obj.
> > 
> > However, (2) and (3) are already solved by the existing page flip
> > mechanism, and it is surprisingly trivial to wire them up for use in the
> > set-base fast path. Though it can be argued that this represents a
> > subtle ABI break in that the set_config ioctl now returns before the old
> > framebuffer is unpinned. The danger is that userspace will start to
> > modify it before it is no longer being shown, however we should be able
> > to prevent that through proper domain tracking.
> 
> Hm, right now we don't prevent anyone from rendering into a to-be-flipped
> out buffer. There was once code in it, using MI_WAIT_EVENT but we've
> ripped it out. I guess we could just throw in a synchronous stall on the
> flip queue though, that should work always.

I'm glad we did. I'd rather put that into userspace rather than have the
kernel impose that policy on everybody, as for X that is exactly the
behaviour we want (i.e. not blocking rendering on the next scanout).

> Testing would be easy if we have the crtc CRC stuff, but that's atm stuck
> due to lack of volunteers ...
> 
> Overall I really like the idea and I think doing most of the plane
> enabling (including psr, fbc, ips, and all that stuff which potentially
> blows through a wblank wait) should be done in async work queues. That
> should then also help resume time a lot.

I'd also like to hear Ville's opinion since with his atomic modesetting
I hope we will be able to achieve something very similar.
-Chris
Ville Syrjälä Aug. 12, 2013, 8:03 a.m. UTC | #3
On Fri, Aug 09, 2013 at 09:06:36PM +0100, Chris Wilson wrote:
> On Fri, Aug 09, 2013 at 09:17:11PM +0200, Daniel Vetter wrote:
> > On Fri, Aug 09, 2013 at 03:13:22PM +0100, Chris Wilson wrote:
> > > A simple modeset, where we only wish to switch over to a new framebuffer
> > > such as the transition from fbcon to X, takes around 30-60ms. This is
> > > due to three factors:
> > > 
> > > 1. We need to make sure the fb->obj is in the display domain, which
> > > incurs a cache flush to ensure no dirt is left on the scanout.
> > > 
> > > 2. We need to flush any pending rendering before performing the mmio
> > > so that the frame is complete before it is shown.
> > > 
> > > 3. We currently wait for the vblank after the mmio to be sure that the
> > > old fb is no longer being shown before releasing it.
> > > 
> > > (1) can only be eliminated by userspace preparing the fb->obj in advance
> > > to already be in the display domain. This can be done through use of the
> > > create2 ioctl, or by reusing an existing fb->obj.
> > > 
> > > However, (2) and (3) are already solved by the existing page flip
> > > mechanism, and it is surprisingly trivial to wire them up for use in the
> > > set-base fast path. Though it can be argued that this represents a
> > > subtle ABI break in that the set_config ioctl now returns before the old
> > > framebuffer is unpinned. The danger is that userspace will start to
> > > modify it before it is no longer being shown, however we should be able
> > > to prevent that through proper domain tracking.
> > 
> > Hm, right now we don't prevent anyone from rendering into a to-be-flipped
> > out buffer. There was once code in it, using MI_WAIT_EVENT but we've
> > ripped it out. I guess we could just throw in a synchronous stall on the
> > flip queue though, that should work always.
> 
> I'm glad we did. I'd rather put that into userspace rather than have the
> kernel impose that policy on everybody, as for X that is exactly the
> behaviour we want (i.e. not blocking rendering on the next scanout).
> 
> > Testing would be easy if we have the crtc CRC stuff, but that's atm stuck
> > due to lack of volunteers ...
> > 
> > Overall I really like the idea and I think doing most of the plane
> > enabling (including psr, fbc, ips, and all that stuff which potentially
> > blows through a wblank wait) should be done in async work queues. That
> > should then also help resume time a lot.
> 
> I'd also like to hear Ville's opinion since with his atomic modesetting
> I hope we will be able to achieve something very similar.

Async is nice. Like everyone else I suppose, my only concern has to do
with writes to the old scanout buffer. One option would be to add an
event also to setcrtc, and maybe a new flag to request the
event+nonblocking operation. That's what I have in my atomic page flip
code (actually I think I had separate flags for each). Unfortunately
we don't seem to have flags in setcrtc, unless we commandeer some bits
from mode_valid.
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 809b968..c6eea51 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9077,10 +9077,13 @@  static int intel_crtc_set_config(struct drm_mode_set *set)
 		ret = intel_set_mode(set->crtc, set->mode,
 				     set->x, set->y, set->fb);
 	} else if (config->fb_changed) {
-		intel_crtc_wait_for_pending_flips(set->crtc);
-
-		ret = intel_pipe_set_base(set->crtc,
-					  set->x, set->y, set->fb);
+		if (to_intel_framebuffer(set->fb)->obj->ring == NULL ||
+		    save_set.x != set->x || save_set.y != set->y ||
+		    intel_crtc_page_flip(set->crtc, set->fb, NULL)) {
+			intel_crtc_wait_for_pending_flips(set->crtc);
+			ret = intel_pipe_set_base(set->crtc,
+						  set->x, set->y, set->fb);
+		}
 	}
 
 	if (ret) {