glx/dri3: Use four buffers until X driver supports async flips

Message ID 1404336487-7627-1-git-send-email-keithp@keithp.com (mailing list archive)
State New, archived

Commit Message

Keith Packard July 2, 2014, 9:28 p.m. UTC
A driver which doesn't have async flip support will queue up flips without any
way to replace them afterwards. This means we've got a scanout buffer pinned
as soon as we schedule a flip and so we need another buffer to keep from
stalling.

When vblank_mode=0, if there are only three buffers we do:

        current scanout buffer = 0 at MSC 0

        Render frame 1 to buffer 1
        PresentPixmap for buffer 1 at MSC 1

                This is sitting down in the kernel waiting for vblank to
                become the next scanout buffer

        Render frame 2 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                This cannot be displayed at MSC 1 because the
                kernel doesn't have any way to replace buffer 1 as the pending
                scanout buffer. So, best case this will get displayed at MSC 2.

Now we block after this, waiting for one of the three buffers to become idle.
We can't use buffer 0 because it is the scanout buffer. We can't use buffer 1
because it's sitting in the kernel waiting to become the next scanout buffer,
and we can't use buffer 2 because that's the most recent frame, which will
become the next scanout buffer if the application doesn't manage to generate
another complete frame by MSC 2.

With four buffers, we get:

        current scanout buffer = 0 at MSC 0

        Render frame 1 to buffer 1
        PresentPixmap for buffer 1 at MSC 1

                This is sitting down in the kernel waiting for vblank to
                become the next scanout buffer

        Render frame 2 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                This cannot be displayed at MSC 1 because the
                kernel doesn't have any way to replace buffer 1 as the pending
                scanout buffer. So, best case this will get displayed at MSC
                2. The X server will queue this swap until buffer 1 becomes
                the scanout buffer.

        Render frame 3 to buffer 3
        PresentPixmap for buffer 3 at MSC 1

                As soon as the X server sees this, it will replace the pending
                buffer 2 swap with this swap and release buffer 2 back to the
                application

        Render frame 4 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                Now we're in a steady state, flipping between buffers 2 and 3
                while waiting for one of them to be queued to the kernel.

        ...

        current scanout buffer = 1 at MSC 1

                Now buffer 0 is free and (e.g.) buffer 2 is queued in
                the kernel to be the scanout buffer at MSC 2

        Render frames, flipping between buffer 0 and 3
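
The difference between the three-buffer and four-buffer cases above can be
checked with a toy model. This is only a sketch under stated assumptions, not
code from Mesa or the X server: the state names and frames_before_stall() are
hypothetical. It assumes no vblank occurs while the application renders, that
the kernel holds at most one queued flip which cannot be replaced, and that
the X server holds at most one pending swap which it can replace.

```c
#include <assert.h>

/* Hypothetical buffer states for this model only. */
enum buf_state { IDLE, SCANOUT, KERNEL_PENDING, SERVER_PENDING };

#define MAX_BUFFERS 8

/* Count how many frames the application can present before it has to block,
 * assuming no vblank happens in the meantime. Capped at max_frames so the
 * non-stalling case terminates. */
int frames_before_stall(int nbuf, int max_frames)
{
    enum buf_state state[MAX_BUFFERS];
    int frames = 0;

    assert(nbuf >= 1 && nbuf <= MAX_BUFFERS);

    state[0] = SCANOUT;
    for (int i = 1; i < nbuf; i++)
        state[i] = IDLE;

    while (frames < max_frames) {
        int target = -1;
        int kernel_busy = 0;

        for (int i = 0; i < nbuf; i++) {
            if (state[i] == IDLE && target < 0)
                target = i;
            if (state[i] == KERNEL_PENDING)
                kernel_busy = 1;
        }
        if (target < 0)
            break;              /* no idle buffer: the application stalls */

        if (!kernel_busy) {
            /* First flip goes straight to the kernel and is pinned there. */
            state[target] = KERNEL_PENDING;
        } else {
            /* The server replaces its own pending swap and releases the
             * buffer that was pending before. */
            for (int i = 0; i < nbuf; i++)
                if (state[i] == SERVER_PENDING)
                    state[i] = IDLE;
            state[target] = SERVER_PENDING;
        }
        frames++;
    }
    return frames;
}
```

In this model, three buffers stall after two frames, while four buffers keep
alternating between two buffers until the frame cap is reached.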

When the system can replace a queued buffer, and we update Present to take
advantage of that, we can use three buffers and get:

        current scanout buffer = 0 at MSC 0

        Render frame 1 to buffer 1
        PresentPixmap for buffer 1 at MSC 1

                This is sitting waiting for vblank to become the next scanout
                buffer

        Render frame 2 to buffer 2
        PresentPixmap for buffer 2 at MSC 1

                Queue this for display at MSC 1. There are three possible
                results:

                  1) We're still before MSC 1. Buffer 1 is released,
                     buffer 2 is queued waiting for MSC 1.

                  2) We're now after MSC 1. Buffer 0 was released at MSC 1.
                     Buffer 1 is the current scanout buffer.

                     a) If the user asked for a tearing update, we swap
                        scanout from buffer 1 to buffer 2 and release buffer
                        1.

                     b) If the user asked for a non-tearing update, we
                        queue buffer 2 for MSC 2.

                In all three cases, we have a buffer released (call it 'n'),
                ready to receive the next frame.

        Render frame 3 to buffer n
        PresentPixmap for buffer n

                If we're still before MSC 1, then we'll ask to present at MSC
                1. Otherwise, we'll ask to present at MSC 2.
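
The three outcomes listed above can be written down as a small state machine.
This is a sketch under assumptions, not the Present implementation: struct
flip_state, model_vblank() and model_present() are hypothetical names, and
the model tracks only one scanout buffer and one replaceable queued buffer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a display pipeline with a replaceable queued flip. */
struct flip_state {
    int scanout;    /* buffer currently being scanned out */
    int pending;    /* buffer queued for the next vblank, or -1 if none */
};

/* Cross one vblank. Returns the buffer released by the flip, or -1 if no
 * flip was queued. */
int model_vblank(struct flip_state *s)
{
    int released;

    if (s->pending < 0)
        return -1;
    released = s->scanout;
    s->scanout = s->pending;
    s->pending = -1;
    return released;
}

/* Present buf. Returns the buffer released immediately by this request, or
 * -1 if nothing is released until a later vblank. */
int model_present(struct flip_state *s, int buf, bool tearing)
{
    int released;

    assert(buf >= 0);

    if (s->pending >= 0) {
        /* Case 1: still before the target MSC, replace the queued flip. */
        released = s->pending;
        s->pending = buf;
        return released;
    }
    if (tearing) {
        /* Case 2a: tearing update, swap scanout right away. */
        released = s->scanout;
        s->scanout = buf;
        return released;
    }
    /* Case 2b: non-tearing update, queue for the next vblank. */
    s->pending = buf;
    return -1;
}
```

In case 2b nothing is released by the request itself. The free buffer 'n' is
the one model_vblank() already returned when MSC 1 passed.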

Present already does this if the driver offers async flips; however, it does
this by waiting for the right vblank event and sending an async flip right at
that point.

I've hacked the intel driver to offer this, but I get tearing at the top of
the screen. I think this is because flips are always done from within the
ring, and so the latency between the vblank event and the async flip happening
can cause tearing at the top of the screen.

That's why I'm keying the need for the extra buffer on the lack of 2D
driver support for async flips.
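
As a standalone illustration of that keying, the buffer count can be computed
as below. This mirrors the logic the patch adds to dri3_update_num_back(), but
it is a sketch: num_back_buffers() is a hypothetical helper, and CAP_ASYNC and
MAX_BACK are local stand-ins for XCB_PRESENT_CAPABILITY_ASYNC and
DRI3_MAX_BACK so that the example builds without the XCB headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CAP_ASYNC 1u    /* stand-in for XCB_PRESENT_CAPABILITY_ASYNC */
#define MAX_BACK  4     /* stand-in for DRI3_MAX_BACK after this patch */

/* Hypothetical helper: how many back buffers a drawable should use. */
int num_back_buffers(bool flipping, bool is_pixmap,
                     uint32_t present_capabilities, int swap_interval)
{
    int num_back = 1;

    if (flipping) {
        /* Without async flips a queued flip pins its buffer, so a window
         * needs one more buffer to avoid stalling. */
        if (!is_pixmap && !(present_capabilities & CAP_ASYNC))
            num_back++;
        num_back++;
    }
    if (swap_interval == 0)
        num_back++;

    assert(num_back <= MAX_BACK);
    return num_back;
}
```

A flipping window with swap interval 0 gets four buffers when the server does
not report async flip support, and three when it does.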

Signed-off-by: Keith Packard <keithp@keithp.com>
---
 src/glx/dri3_glx.c  | 20 +++++++++++++++++++-
 src/glx/dri3_priv.h |  6 +++++-
 2 files changed, 24 insertions(+), 2 deletions(-)

Comments

Matt Turner Sept. 29, 2014, 7:25 p.m. UTC | #1
Cc'ing people who might be able to review.

Jason Ekstrand Sept. 29, 2014, 7:36 p.m. UTC | #2
I can't really verify the X bits of this patch.  However, I do understand
the problem and I can verify that using quad-buffering is a totally sane
solution.  We had this issue about a year ago with Wayland apps trying to
do eglSwapInterval(0) and mesa quad-buffers in that case too.

Acked-by: Jason Ekstrand <jason.ekstrand@intel.com>

On Mon, Sep 29, 2014 at 12:25 PM, Matt Turner <mattst88@gmail.com> wrote:

> Cc'ing people who might be able to review.
>

Dylan Baker Sept. 29, 2014, 7:40 p.m. UTC | #3
Tested-by: Dylan Baker <baker.dylan.c@gmail.com>


Patch

diff --git a/src/glx/dri3_glx.c b/src/glx/dri3_glx.c
index e3fc4de..753b8d8 100644
--- a/src/glx/dri3_glx.c
+++ b/src/glx/dri3_glx.c
@@ -271,8 +271,11 @@  static void
 dri3_update_num_back(struct dri3_drawable *priv)
 {
    priv->num_back = 1;
-   if (priv->flipping)
+   if (priv->flipping) {
+      if (!priv->is_pixmap && !(priv->present_capabilities & XCB_PRESENT_CAPABILITY_ASYNC))
+         priv->num_back++;
       priv->num_back++;
+   }
    if (priv->swap_interval == 0)
       priv->num_back++;
 }
@@ -976,6 +979,9 @@  dri3_update_drawable(__DRIdrawable *driDrawable, void *loaderPrivate)
       xcb_get_geometry_reply_t                  *geom_reply;
       xcb_void_cookie_t                         cookie;
       xcb_generic_error_t                       *error;
+      xcb_present_query_capabilities_cookie_t   present_capabilities_cookie;
+      xcb_present_query_capabilities_reply_t    *present_capabilities_reply;
+
 
       /* Try to select for input on the window.
        *
@@ -994,6 +1000,8 @@  dri3_update_drawable(__DRIdrawable *driDrawable, void *loaderPrivate)
                                                 XCB_PRESENT_EVENT_MASK_COMPLETE_NOTIFY|
                                                 XCB_PRESENT_EVENT_MASK_IDLE_NOTIFY);
 
+      present_capabilities_cookie = xcb_present_query_capabilities(c, priv->base.xDrawable);
+
       /* Create an XCB event queue to hold present events outside of the usual
        * application event queue
        */
@@ -1023,6 +1031,16 @@  dri3_update_drawable(__DRIdrawable *driDrawable, void *loaderPrivate)
 
       error = xcb_request_check(c, cookie);
 
+      present_capabilities_reply = xcb_present_query_capabilities_reply(c,
+                                                                        present_capabilities_cookie,
+                                                                        NULL);
+
+      if (present_capabilities_reply) {
+         priv->present_capabilities = present_capabilities_reply->capabilities;
+         free(present_capabilities_reply);
+      } else
+         priv->present_capabilities = 0;
+
       if (error) {
          if (error->error_code != BadWindow) {
             free(error);
diff --git a/src/glx/dri3_priv.h b/src/glx/dri3_priv.h
index c0e35ee..742db60 100644
--- a/src/glx/dri3_priv.h
+++ b/src/glx/dri3_priv.h
@@ -147,7 +147,7 @@  struct dri3_context
    __DRIcontext *driContext;
 };
 
-#define DRI3_MAX_BACK   3
+#define DRI3_MAX_BACK   4
 #define DRI3_BACK_ID(i) (i)
 #define DRI3_FRONT_ID   (DRI3_MAX_BACK)
 
@@ -172,6 +172,10 @@  struct dri3_drawable {
    uint8_t is_pixmap;
    uint8_t flipping;
 
+   /* Present extension capabilities
+    */
+   uint32_t present_capabilities;
+
    /* SBC numbers are tracked by using the serial numbers
     * in the present request and complete events
     */