diff mbox

intel: Fix stencil buffer to be W tiled

Message ID 1310975703-20269-3-git-send-email-chad@chad-versace.us (mailing list archive)
State New, archived
Headers show

Commit Message

Chad Versace July 18, 2011, 7:55 a.m. UTC
Until now, the stencil buffer was allocated as a Y tiled buffer, because
in several locations the PRM states that it is. However, it is actually
W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
4.5.2.1 W-Major Format:
    W-Major Tile Format is used for separate stencil.

The GTT is incapable of W fencing, so we allocate the stencil buffer with
I915_TILING_NONE and decode the tile's layout in software.

This fix touches the following portions of code:
    - In intel_allocate_renderbuffer_storage(), allocate the stencil
      buffer with I915_TILING_NONE.
    - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
      not tiled.
    - In the stencil buffer's span functions, the tile's layout must be
      decoded in software.

This commit mutually depends on the xf86-video-intel commit
    dri: Do not tile stencil buffer
    Author: Chad Versace <chad@chad-versace.us>
    Date:   Mon Jul 18 00:38:00 2011 -0700

On Gen6 with separate stencil enabled, fixes the following Piglit tests:
    bugs/fdo23670-drawpix_stencil
    general/stencil-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
    spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
    spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
    spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
    spec/EXT_packed_depth_stencil/readpixels-24_8

Note: This is a candidate for the 7.11 branch.

CC: Eric Anholt <eric@anholt.net>
CC: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Chad Versace <chad@chad-versace.us>
---
 src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
 src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
 src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
 src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
 src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
 5 files changed, 89 insertions(+), 33 deletions(-)

Comments

Paul Menzel July 18, 2011, 8:20 a.m. UTC | #1
Am Montag, den 18.07.2011, 00:55 -0700 schrieb Chad Versace:

[…]

> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
> index 153803f..d306432 100644
> --- a/src/mesa/drivers/dri/intel/intel_span.c
> +++ b/src/mesa/drivers/dri/intel/intel_span.c
> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>     int miny = 0;							\
>     int maxx = rb->Width;						\
>     int maxy = rb->Height;						\
> -   int stride = rb->RowStride;						\
> -   uint8_t *buf = rb->Data;						\
> +									\
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.								\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;				\

There are alignment/white space issues above.

> +   unsigned stride = irb->region->pitch;				\
> +   unsigned height = 2 * irb->region->height;				\
> +   bool flip = rb->Name == 0;						\

[…]


Thanks,

Paul
Eric Anholt July 18, 2011, 3:57 p.m. UTC | #2
On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
> index 1669af2..507cc33 100644
> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>  
>     if (irb->Base.Format == MESA_FORMAT_S8) {
>        /*
> +       * The stencil buffer is W tiled. However, we request from the kernel a
> +       * non-tiled buffer because the GTT is incapable of W fencing.
> +       *
>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>         *    The pitch must be set to 2x the value computed based on width, as
> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>         * To accomplish this, we resort to the nasty hack of doubling the drm
>         * region's cpp and halving its height.
>         *
> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
> -       * maps the memory incorrectly.
> +       * If we neglect to double the pitch, then render corruption occurs.
>         */
>        irb->region = intel_region_alloc(intel->intelScreen,
> -				       I915_TILING_Y,
> +				       I915_TILING_NONE,
>  				       cpp * 2,
> -				       width,
> -				       height / 2,
> +				       ALIGN(width, 64),
> +				       /* XXX: Maybe align to 128? */
> +				       ALIGN(height / 2, 64),
>  				       GL_TRUE);
>        if (!irb->region)
>  	return false;

This looks like it would fail on a buffer with height = 129.  Doesn't
seem like 128 pitch requirement would be needed -- has it been tested?

> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
> index 153803f..d306432 100644
> --- a/src/mesa/drivers/dri/intel/intel_span.c
> +++ b/src/mesa/drivers/dri/intel/intel_span.c
> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>     int miny = 0;							\
>     int maxx = rb->Width;						\
>     int maxy = rb->Height;						\
> -   int stride = rb->RowStride;						\
> -   uint8_t *buf = rb->Data;						\
> +									\
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.								\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;				\
> +   unsigned stride = irb->region->pitch;				\
> +   unsigned height = 2 * irb->region->height;				\
> +   bool flip = rb->Name == 0;						\
>  
> -/* Don't flip y. */
>  #undef Y_FLIP
> -#define Y_FLIP(y) y
> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))

The flip is usually handled by a scale and bias variable, so that Y_FLIP
is ((y) * scale + bias).  I think it'll produce less code, since Y_FLIP
is used a lot.
Ian Romanick July 18, 2011, 6:49 p.m. UTC | #3
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 12:55 AM, Chad Versace wrote:
> Until now, the stencil buffer was allocated as a Y tiled buffer, because
> in several locations the PRM states that it is. However, it is actually
> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
> 4.5.2.1 W-Major Format:
>     W-Major Tile Format is used for separate stencil.
> 
> The GTT is incapable of W fencing, so we allocate the stencil buffer with
> I915_TILING_NONE and decode the tile's layout in software.
> 
> This fix touches the following portions of code:
>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>       buffer with I915_TILING_NONE.
>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>       not tiled.
>     - In the stencil buffer's span functions, the tile's layout must be
>       decoded in software.
> 
> This commit mutually depends on the xf86-video-intel commit
>     dri: Do not tile stencil buffer
>     Author: Chad Versace <chad@chad-versace.us>
>     Date:   Mon Jul 18 00:38:00 2011 -0700
> 
> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>     bugs/fdo23670-drawpix_stencil
>     general/stencil-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>     spec/EXT_packed_depth_stencil/readpixels-24_8
> 
> Note: This is a candidate for the 7.11 branch.
> 
> CC: Eric Anholt <eric@anholt.net>
> CC: Kenneth Graunke <kenneth@whitecape.org>
> Signed-off-by: Chad Versace <chad@chad-versace.us>
> ---
>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>  5 files changed, 89 insertions(+), 33 deletions(-)
> 
> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
> index dfca03c..5ab9873 100644
> --- a/src/mesa/drivers/dri/intel/intel_clear.c
> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>  	     */
>              tri_mask |= BUFFER_BIT_STENCIL;
>           }
> +	 else if (intel->has_separate_stencil &&
> +	       stencilRegion->tiling == I915_TILING_NONE) {
> +	    /* The stencil buffer is actually W tiled, which the hardware
> +	     * cannot blit to. */
> +	    tri_mask |= BUFFER_BIT_STENCIL;
> +	 }
>           else {
>              /* clearing all stencil bits, use blitting */
>              blit_mask |= BUFFER_BIT_STENCIL;
> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
> index 2ba1363..fe8be08 100644
> --- a/src/mesa/drivers/dri/intel/intel_context.c
> +++ b/src/mesa/drivers/dri/intel/intel_context.c
> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>  
> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
> +	 /*
> +	  * The stencil buffer is actually W tiled. The region's tiling is
> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
> +	  * fencing.
> +	  */
>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>  	 return;
>        } else {
> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>         * Presently, however, no verification or clean up is necessary, and
>         * execution should not reach here. If the framebuffer still has a hiz
>         * region, then we have already set dri2_has_hiz to true after
> -       * confirming above that the stencil buffer is Y tiled.
> +       * confirming above that the stencil buffer is W tiled.
>         */
>        assert(0);
>     }
> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
> index 1669af2..507cc33 100644
> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>  
>     if (irb->Base.Format == MESA_FORMAT_S8) {
>        /*
> +       * The stencil buffer is W tiled. However, we request from the kernel a
> +       * non-tiled buffer because the GTT is incapable of W fencing.
> +       *
>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>         *    The pitch must be set to 2x the value computed based on width, as
> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>         * To accomplish this, we resort to the nasty hack of doubling the drm
>         * region's cpp and halving its height.
>         *
> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
> -       * maps the memory incorrectly.
> +       * If we neglect to double the pitch, then render corruption occurs.
>         */
>        irb->region = intel_region_alloc(intel->intelScreen,
> -				       I915_TILING_Y,
> +				       I915_TILING_NONE,
>  				       cpp * 2,
> -				       width,
> -				       height / 2,
> +				       ALIGN(width, 64),
> +				       /* XXX: Maybe align to 128? */
> +				       ALIGN(height / 2, 64),
>  				       GL_TRUE);
>        if (!irb->region)
>  	return false;
> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
> index b2013af..9dd6a52 100644
> --- a/src/mesa/drivers/dri/intel/intel_screen.h
> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
> @@ -63,9 +63,12 @@
>   * x8_z24 and s8).
>   *
>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
> - * nothing happend.
> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
> + * true and continue as if nothing happend.
> + *
> + * [1] The stencil buffer is actually W tiled. However, we request from the
> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>   *
>   * If the buffers are X tiled, however, the handshake has failed and we must
>   * clean up.
> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
> index 153803f..d306432 100644
> --- a/src/mesa/drivers/dri/intel/intel_span.c
> +++ b/src/mesa/drivers/dri/intel/intel_span.c
> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>     int miny = 0;							\
>     int maxx = rb->Width;						\
>     int maxy = rb->Height;						\
> -   int stride = rb->RowStride;						\
> -   uint8_t *buf = rb->Data;						\
> +									\
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.								\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;				\
> +   unsigned stride = irb->region->pitch;				\
> +   unsigned height = 2 * irb->region->height;				\
> +   bool flip = rb->Name == 0;						\
>  
> -/* Don't flip y. */
>  #undef Y_FLIP
> -#define Y_FLIP(y) y
> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>  
>  /**
>   * \brief Get pointer offset into stencil buffer.
>   *
> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
> - * with positive stride (s=stride).
> - *
> - *     x    | y     | byte offset
> - *     --------------------------
> - *     0    | 0     | 0
> - *     0    | 1     | 1
> - *     1    | 0     | 2
> - *     1    | 1     | 3
> - *     ...  | ...   | ...
> - *     0    | 2     | s
> - *     0    | 3     | s + 1
> - *     1    | 2     | s + 2
> - *     1    | 3     | s + 3
> - *
> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
> + * must decode the tile's layout in software.
>   *
> + * See
> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
> + *     Format.
> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>   */
> -static inline intptr_t
> -intel_offset_S8(int stride, GLint x, GLint y)
> +static inline uintptr_t

Why the type change?

> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>  {
> -   return 2 * ((y / 2) * stride + x) + y % 2;
> +   uint32_t tile_size = 4096;
> +   uint32_t tile_width = 64;
> +   uint32_t tile_height = 64;
> +   uint32_t row_size = 64 * stride;
> +
> +   uint32_t tile_x = x / tile_width;
> +   uint32_t tile_y = y / tile_height;
> +
> +   /* The byte's address relative to the tile's base addres. */
> +   uint32_t byte_x = x % tile_width;
> +   uint32_t byte_y = y % tile_height;
> +
> +   uintptr_t u = tile_y * row_size
> +               + tile_x * tile_size
> +               + 512 * (byte_x / 8)
> +               +  64 * (byte_y / 8)
> +               +  32 * ((byte_y / 4) % 2)
> +               +  16 * ((byte_x / 4) % 2)
> +               +   8 * ((byte_y / 2) % 2)
> +               +   4 * ((byte_x / 2) % 2)
> +               +   2 * (byte_y % 2)
> +               +   1 * (byte_x % 2);
> +
> +   /*
> +    * Errata for Gen5:
> +    *
> +    * An additional offset is needed which is not documented in the PRM.
> +    *
> +    * if ((byte_x / 8) % 2 == 1) {
> +    *    if ((byte_y / 8) % 2) == 0) {
> +    *       u += 64;
> +    *    } else {
> +    *       u -= 64;
> +    *    }
> +    * }
> +    *
> +    * The offset is expressed more tersely as
> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
> +    */
> +
> +   return u;
>  }
>  
>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4kgDUACgkQX1gOwKyEAw+9WgCffPOqrYW9zIZ0114HTmwcXfCP
t7sAnA6cpwuNkL+o6/yc20DI27xzSopm
=66S9
-----END PGP SIGNATURE-----
Chad Versace July 18, 2011, 8:54 p.m. UTC | #4
On 07/18/2011 11:49 AM, Ian Romanick wrote:
> On 07/18/2011 12:55 AM, Chad Versace wrote:
>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>> in several locations the PRM states that it is. However, it is actually
>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>> 4.5.2.1 W-Major Format:
>>     W-Major Tile Format is used for separate stencil.
> 
>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>> I915_TILING_NONE and decode the tile's layout in software.
> 
>> This fix touches the following portions of code:
>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>       buffer with I915_TILING_NONE.
>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>       not tiled.
>>     - In the stencil buffer's span functions, the tile's layout must be
>>       decoded in software.
> 
>> This commit mutually depends on the xf86-video-intel commit
>>     dri: Do not tile stencil buffer
>>     Author: Chad Versace <chad@chad-versace.us>
>>     Date:   Mon Jul 18 00:38:00 2011 -0700
> 
>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>     bugs/fdo23670-drawpix_stencil
>>     general/stencil-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>     spec/EXT_packed_depth_stencil/readpixels-24_8
> 
>> Note: This is a candidate for the 7.11 branch.
> 
>> CC: Eric Anholt <eric@anholt.net>
>> CC: Kenneth Graunke <kenneth@whitecape.org>
>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>> ---
>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>  5 files changed, 89 insertions(+), 33 deletions(-)
> 
>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>> index dfca03c..5ab9873 100644
>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>  	     */
>>              tri_mask |= BUFFER_BIT_STENCIL;
>>           }
>> +	 else if (intel->has_separate_stencil &&
>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>> +	    /* The stencil buffer is actually W tiled, which the hardware
>> +	     * cannot blit to. */
>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>> +	 }
>>           else {
>>              /* clearing all stencil bits, use blitting */
>>              blit_mask |= BUFFER_BIT_STENCIL;
>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>> index 2ba1363..fe8be08 100644
>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
> 
>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>> +	 /*
>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>> +	  * fencing.
>> +	  */
>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>  	 return;
>>        } else {
>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>         * Presently, however, no verification or clean up is necessary, and
>>         * execution should not reach here. If the framebuffer still has a hiz
>>         * region, then we have already set dri2_has_hiz to true after
>> -       * confirming above that the stencil buffer is Y tiled.
>> +       * confirming above that the stencil buffer is W tiled.
>>         */
>>        assert(0);
>>     }
>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>> index 1669af2..507cc33 100644
>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
> 
>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>        /*
>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>> +       *
>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>         *    The pitch must be set to 2x the value computed based on width, as
>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>         * region's cpp and halving its height.
>>         *
>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>> -       * maps the memory incorrectly.
>> +       * If we neglect to double the pitch, then render corruption occurs.
>>         */
>>        irb->region = intel_region_alloc(intel->intelScreen,
>> -				       I915_TILING_Y,
>> +				       I915_TILING_NONE,
>>  				       cpp * 2,
>> -				       width,
>> -				       height / 2,
>> +				       ALIGN(width, 64),
>> +				       /* XXX: Maybe align to 128? */
>> +				       ALIGN(height / 2, 64),
>>  				       GL_TRUE);
>>        if (!irb->region)
>>  	return false;
>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>> index b2013af..9dd6a52 100644
>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>> @@ -63,9 +63,12 @@
>>   * x8_z24 and s8).
>>   *
>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>> - * nothing happend.
>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>> + * true and continue as if nothing happend.
>> + *
>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>   *
>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>   * clean up.
>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>> index 153803f..d306432 100644
>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>     int miny = 0;							\
>>     int maxx = rb->Width;						\
>>     int maxy = rb->Height;						\
>> -   int stride = rb->RowStride;						\
>> -   uint8_t *buf = rb->Data;						\
>> +									\
>> +   /*									\
>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>> +    * manually, the region's *real* base address and stride is		\
>> +    * required.								\
>> +    */									\
>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>> +   unsigned stride = irb->region->pitch;				\
>> +   unsigned height = 2 * irb->region->height;				\
>> +   bool flip = rb->Name == 0;						\
> 
>> -/* Don't flip y. */
>>  #undef Y_FLIP
>> -#define Y_FLIP(y) y
>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
> 
>>  /**
>>   * \brief Get pointer offset into stencil buffer.
>>   *
>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>> - * with positive stride (s=stride).
>> - *
>> - *     x    | y     | byte offset
>> - *     --------------------------
>> - *     0    | 0     | 0
>> - *     0    | 1     | 1
>> - *     1    | 0     | 2
>> - *     1    | 1     | 3
>> - *     ...  | ...   | ...
>> - *     0    | 2     | s
>> - *     0    | 3     | s + 1
>> - *     1    | 2     | s + 2
>> - *     1    | 3     | s + 3 
>> - *
>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>> + * must decode the tile's layout in software.
>>   *
>> + * See
>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>> + *     Format.
>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>   */
>> -static inline intptr_t
>> -intel_offset_S8(int stride, GLint x, GLint y)
>> +static inline uintptr_t
> 
> Why the type change?

Two reasons.

1. I redeclared the parameters as unsigned so to generate better code.
Since x, y, and stride are unsigned, the division and modulo operators
generate shift and bit-logic instructions rather than the slower arithmetic
instructions.

Below is a comparison of x % 64, signed versus unsigned, compiled with gcc -O3.
In f, the % generates 5 instructions. In g, only one instruction.

int f(int x) { return x % 64; }

f:
	movl	%edi, %edx
	sarl	$31, %edx
	shrl	$26, %edx
	leal	(%rdi,%rdx), %eax
	andl	$63, %eax
	subl	%edx, %eax
	ret

unsigned g(unsigned x) { return x % 64);

g:
	movl	%edi, %eax
	andl	$63, %eax
	ret

2. I redeclared the return type as unsigned to make it explicit that it does
not return a negative offset.

>> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>>  {
>> -   return 2 * ((y / 2) * stride + x) + y % 2;
>> +   uint32_t tile_size = 4096;
>> +   uint32_t tile_width = 64;
>> +   uint32_t tile_height = 64;
>> +   uint32_t row_size = 64 * stride;
>> +
>> +   uint32_t tile_x = x / tile_width;
>> +   uint32_t tile_y = y / tile_height;
>> +
>> +   /* The byte's address relative to the tile's base addres. */
>> +   uint32_t byte_x = x % tile_width;
>> +   uint32_t byte_y = y % tile_height;
>> +
>> +   uintptr_t u = tile_y * row_size
>> +               + tile_x * tile_size
>> +               + 512 * (byte_x / 8)
>> +               +  64 * (byte_y / 8)
>> +               +  32 * ((byte_y / 4) % 2)
>> +               +  16 * ((byte_x / 4) % 2)
>> +               +   8 * ((byte_y / 2) % 2)
>> +               +   4 * ((byte_x / 2) % 2)
>> +               +   2 * (byte_y % 2)
>> +               +   1 * (byte_x % 2);
>> +
>> +   /*
>> +    * Errata for Gen5:
>> +    *
>> +    * An additional offset is needed which is not documented in the PRM.
>> +    *
>> +    * if ((byte_x / 8) % 2 == 1) {
>> +    *    if ((byte_y / 8) % 2) == 0) {
>> +    *       u += 64;
>> +    *    } else {
>> +    *       u -= 64;
>> +    *    }
>> +    * }
>> +    *
>> +    * The offset is expressed more tersely as
>> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
>> +    */
>> +
>> +   return u;
>>  }
> 
>>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;
Ian Romanick July 18, 2011, 9:02 p.m. UTC | #5
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 01:54 PM, Chad Versace wrote:
> On 07/18/2011 11:49 AM, Ian Romanick wrote:
>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>> in several locations the PRM states that it is. However, it is actually
>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>> 4.5.2.1 W-Major Format:
>>>     W-Major Tile Format is used for separate stencil.
>>
>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>> I915_TILING_NONE and decode the tile's layout in software.
>>
>>> This fix touches the following portions of code:
>>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>>       buffer with I915_TILING_NONE.
>>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>>       not tiled.
>>>     - In the stencil buffer's span functions, the tile's layout must be
>>>       decoded in software.
>>
>>> This commit mutually depends on the xf86-video-intel commit
>>>     dri: Do not tile stencil buffer
>>>     Author: Chad Versace <chad@chad-versace.us>
>>>     Date:   Mon Jul 18 00:38:00 2011 -0700
>>
>>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>>     bugs/fdo23670-drawpix_stencil
>>>     general/stencil-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>>     spec/EXT_packed_depth_stencil/readpixels-24_8
>>
>>> Note: This is a candidate for the 7.11 branch.
>>
>>> CC: Eric Anholt <eric@anholt.net>
>>> CC: Kenneth Graunke <kenneth@whitecape.org>
>>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>>> ---
>>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>>  5 files changed, 89 insertions(+), 33 deletions(-)
>>
>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>>> index dfca03c..5ab9873 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>>  	     */
>>>              tri_mask |= BUFFER_BIT_STENCIL;
>>>           }
>>> +	 else if (intel->has_separate_stencil &&
>>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>>> +	    /* The stencil buffer is actually W tiled, which the hardware
>>> +	     * cannot blit to. */
>>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>>> +	 }
>>>           else {
>>>              /* clearing all stencil bits, use blitting */
>>>              blit_mask |= BUFFER_BIT_STENCIL;
>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>>> index 2ba1363..fe8be08 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>>
>>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>>> +	 /*
>>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>>> +	  * fencing.
>>> +	  */
>>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>>  	 return;
>>>        } else {
>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>         * Presently, however, no verification or clean up is necessary, and
>>>         * execution should not reach here. If the framebuffer still has a hiz
>>>         * region, then we have already set dri2_has_hiz to true after
>>> -       * confirming above that the stencil buffer is Y tiled.
>>> +       * confirming above that the stencil buffer is W tiled.
>>>         */
>>>        assert(0);
>>>     }
>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>>> index 1669af2..507cc33 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>
>>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>>        /*
>>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>>> +       *
>>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>>         *    The pitch must be set to 2x the value computed based on width, as
>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>>         * region's cpp and halving its height.
>>>         *
>>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>>> -       * maps the memory incorrectly.
>>> +       * If we neglect to double the pitch, then render corruption occurs.
>>>         */
>>>        irb->region = intel_region_alloc(intel->intelScreen,
>>> -				       I915_TILING_Y,
>>> +				       I915_TILING_NONE,
>>>  				       cpp * 2,
>>> -				       width,
>>> -				       height / 2,
>>> +				       ALIGN(width, 64),
>>> +				       /* XXX: Maybe align to 128? */
>>> +				       ALIGN(height / 2, 64),
>>>  				       GL_TRUE);
>>>        if (!irb->region)
>>>  	return false;
>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>>> index b2013af..9dd6a52 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>>> @@ -63,9 +63,12 @@
>>>   * x8_z24 and s8).
>>>   *
>>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>>> - * nothing happend.
>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>>> + * true and continue as if nothing happend.
>>> + *
>>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>>   *
>>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>>   * clean up.
>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>>> index 153803f..d306432 100644
>>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>>     int miny = 0;							\
>>>     int maxx = rb->Width;						\
>>>     int maxy = rb->Height;						\
>>> -   int stride = rb->RowStride;						\
>>> -   uint8_t *buf = rb->Data;						\
>>> +									\
>>> +   /*									\
>>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>>> +    * manually, the region's *real* base address and stride is		\
>>> +    * required.								\
>>> +    */									\
>>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>>> +   unsigned stride = irb->region->pitch;				\
>>> +   unsigned height = 2 * irb->region->height;				\
>>> +   bool flip = rb->Name == 0;						\
>>
>>> -/* Don't flip y. */
>>>  #undef Y_FLIP
>>> -#define Y_FLIP(y) y
>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>>
>>>  /**
>>>   * \brief Get pointer offset into stencil buffer.
>>>   *
>>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>>> - * with positive stride (s=stride).
>>> - *
>>> - *     x    | y     | byte offset
>>> - *     --------------------------
>>> - *     0    | 0     | 0
>>> - *     0    | 1     | 1
>>> - *     1    | 0     | 2
>>> - *     1    | 1     | 3
>>> - *     ...  | ...   | ...
>>> - *     0    | 2     | s
>>> - *     0    | 3     | s + 1
>>> - *     1    | 2     | s + 2
>>> - *     1    | 3     | s + 3 
>>> - *
>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>>> + * must decode the tile's layout in software.
>>>   *
>>> + * See
>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>>> + *     Format.
>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>>   */
>>> -static inline intptr_t
>>> -intel_offset_S8(int stride, GLint x, GLint y)
>>> +static inline uintptr_t
>>
>> Why the type change?
> 
> Two reasons.
> 
> 1. I redeclared the parameters as unsigned so to generate better code.
> Since x, y, and stride are unsigned, the division and modulo operators
> generate shift and bit-logic instructions rather than the slower arithmetic
> instructions.

Is stride always unsigned now?  Will it always be in the future?  This
is the problem that we encountered with bug #37351 (fixed by e8b1c6d).

> Below is a comparison of x % 64, signed versus unsigned, compiled with gcc -O3.
> In f, the % generates 5 instructions. In g, only one instruction.
> 
> int f(int x) { return x % 64; }
> 
> f:
> 	movl	%edi, %edx
> 	sarl	$31, %edx
> 	shrl	$26, %edx
> 	leal	(%rdi,%rdx), %eax
> 	andl	$63, %eax
> 	subl	%edx, %eax
> 	ret
> 
> unsigned g(unsigned x) { return x % 64);
> 
> g:
> 	movl	%edi, %eax
> 	andl	$63, %eax
> 	ret
> 
> 2. I redeclared the return type as unsigned to make it explicit that it does
> not return a negative offset.
> 
>>> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>>>  {
>>> -   return 2 * ((y / 2) * stride + x) + y % 2;
>>> +   uint32_t tile_size = 4096;
>>> +   uint32_t tile_width = 64;
>>> +   uint32_t tile_height = 64;
>>> +   uint32_t row_size = 64 * stride;
>>> +
>>> +   uint32_t tile_x = x / tile_width;
>>> +   uint32_t tile_y = y / tile_height;
>>> +
>>> +   /* The byte's address relative to the tile's base addres. */
>>> +   uint32_t byte_x = x % tile_width;
>>> +   uint32_t byte_y = y % tile_height;
>>> +
>>> +   uintptr_t u = tile_y * row_size
>>> +               + tile_x * tile_size
>>> +               + 512 * (byte_x / 8)
>>> +               +  64 * (byte_y / 8)
>>> +               +  32 * ((byte_y / 4) % 2)
>>> +               +  16 * ((byte_x / 4) % 2)
>>> +               +   8 * ((byte_y / 2) % 2)
>>> +               +   4 * ((byte_x / 2) % 2)
>>> +               +   2 * (byte_y % 2)
>>> +               +   1 * (byte_x % 2);
>>> +
>>> +   /*
>>> +    * Errata for Gen5:
>>> +    *
>>> +    * An additional offset is needed which is not documented in the PRM.
>>> +    *
>>> +    * if ((byte_x / 8) % 2 == 1) {
>>> +    *    if ((byte_y / 8) % 2) == 0) {
>>> +    *       u += 64;
>>> +    *    } else {
>>> +    *       u -= 64;
>>> +    *    }
>>> +    * }
>>> +    *
>>> +    * The offset is expressed more tersely as
>>> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
>>> +    */
>>> +
>>> +   return u;
>>>  }
>>
>>>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4kn08ACgkQX1gOwKyEAw/jIQCfYBpmMYlhRAWQa6lKs+z68HJo
aXQAn2DGTu/NU92jUfR4wP19PWU+zdOL
=AQiH
-----END PGP SIGNATURE-----
Chad Versace July 18, 2011, 9:05 p.m. UTC | #6
On 07/18/2011 08:57 AM, Eric Anholt wrote:
> On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>> index 1669af2..507cc33 100644
>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>  
>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>        /*
>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>> +       *
>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>         *    The pitch must be set to 2x the value computed based on width, as
>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>         * region's cpp and halving its height.
>>         *
>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>> -       * maps the memory incorrectly.
>> +       * If we neglect to double the pitch, then render corruption occurs.
>>         */
>>        irb->region = intel_region_alloc(intel->intelScreen,
>> -				       I915_TILING_Y,
>> +				       I915_TILING_NONE,
>>  				       cpp * 2,
>> -				       width,
>> -				       height / 2,
>> +				       ALIGN(width, 64),
>> +				       /* XXX: Maybe align to 128? */
>> +				       ALIGN(height / 2, 64),
>>  				       GL_TRUE);
>>        if (!irb->region)
>>  	return false;
> 
> This looks like it would fail on a buffer with height = 129.  Doesn't
> seem like 128 pitch requirement would be needed -- has it been tested?
> 
>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>> index 153803f..d306432 100644
>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>     int miny = 0;							\
>>     int maxx = rb->Width;						\
>>     int maxy = rb->Height;						\
>> -   int stride = rb->RowStride;						\
>> -   uint8_t *buf = rb->Data;						\
>> +									\
>> +   /*									\
>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>> +    * manually, the region's *real* base address and stride is		\
>> +    * required.								\
>> +    */									\
>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>> +   unsigned stride = irb->region->pitch;				\
>> +   unsigned height = 2 * irb->region->height;				\
>> +   bool flip = rb->Name == 0;						\
>>  
>> -/* Don't flip y. */
>>  #undef Y_FLIP
>> -#define Y_FLIP(y) y
>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
> 
> The flip is usually handled by a scale and bias variable, so that Y_FLIP
> is ((y) * scale + bias).  I think it'll produce less code, since Y_FLIP
> is used a lot.

Good call. Does changing the hunk like this look good to you?

+   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
+   uint8_t *buf = irb->region->buffer->virtual;			\
+   unsigned stride = irb->region->pitch;				\
+   unsigned height = 2 * irb->region->height;				\
+   bool flip = rb->Name == 0;						\
+   int y_scale = flip ? -1 : 1;					\
+   int y_bias = flip ? (height - 1) : 0;				\

-/* Don't flip y. */
 #undef Y_FLIP
-#define Y_FLIP(y) y
+#define Y_FLIP(y) (y_scale * (y) + y_bias)
Chad Versace July 18, 2011, 9:24 p.m. UTC | #7
On 07/18/2011 02:02 PM, Ian Romanick wrote:
> On 07/18/2011 01:54 PM, Chad Versace wrote:
>> On 07/18/2011 11:49 AM, Ian Romanick wrote:
>>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>>> in several locations the PRM states that it is. However, it is actually
>>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>>> 4.5.2.1 W-Major Format:
>>>>     W-Major Tile Format is used for separate stencil.
>>>
>>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>>> I915_TILING_NONE and decode the tile's layout in software.
>>>
>>>> This fix touches the following portions of code:
>>>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>>>       buffer with I915_TILING_NONE.
>>>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>>>       not tiled.
>>>>     - In the stencil buffer's span functions, the tile's layout must be
>>>>       decoded in software.
>>>
>>>> This commit mutually depends on the xf86-video-intel commit
>>>>     dri: Do not tile stencil buffer
>>>>     Author: Chad Versace <chad@chad-versace.us>
>>>>     Date:   Mon Jul 18 00:38:00 2011 -0700
>>>
>>>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>>>     bugs/fdo23670-drawpix_stencil
>>>>     general/stencil-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>>>     spec/EXT_packed_depth_stencil/readpixels-24_8
>>>
>>>> Note: This is a candidate for the 7.11 branch.
>>>
>>>> CC: Eric Anholt <eric@anholt.net>
>>>> CC: Kenneth Graunke <kenneth@whitecape.org>
>>>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>>>> ---
>>>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>>>  5 files changed, 89 insertions(+), 33 deletions(-)
>>>
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>>>> index dfca03c..5ab9873 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>>>  	     */
>>>>              tri_mask |= BUFFER_BIT_STENCIL;
>>>>           }
>>>> +	 else if (intel->has_separate_stencil &&
>>>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>>>> +	    /* The stencil buffer is actually W tiled, which the hardware
>>>> +	     * cannot blit to. */
>>>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>>>> +	 }
>>>>           else {
>>>>              /* clearing all stencil bits, use blitting */
>>>>              blit_mask |= BUFFER_BIT_STENCIL;
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>>>> index 2ba1363..fe8be08 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>>>
>>>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>>>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>>>> +	 /*
>>>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>>>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>>>> +	  * fencing.
>>>> +	  */
>>>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>>>  	 return;
>>>>        } else {
>>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>         * Presently, however, no verification or clean up is necessary, and
>>>>         * execution should not reach here. If the framebuffer still has a hiz
>>>>         * region, then we have already set dri2_has_hiz to true after
>>>> -       * confirming above that the stencil buffer is Y tiled.
>>>> +       * confirming above that the stencil buffer is W tiled.
>>>>         */
>>>>        assert(0);
>>>>     }
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>> index 1669af2..507cc33 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>
>>>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>>>        /*
>>>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>>>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>>>> +       *
>>>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>>>         *    The pitch must be set to 2x the value computed based on width, as
>>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>>>         * region's cpp and halving its height.
>>>>         *
>>>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>>>> -       * maps the memory incorrectly.
>>>> +       * If we neglect to double the pitch, then render corruption occurs.
>>>>         */
>>>>        irb->region = intel_region_alloc(intel->intelScreen,
>>>> -				       I915_TILING_Y,
>>>> +				       I915_TILING_NONE,
>>>>  				       cpp * 2,
>>>> -				       width,
>>>> -				       height / 2,
>>>> +				       ALIGN(width, 64),
>>>> +				       /* XXX: Maybe align to 128? */
>>>> +				       ALIGN(height / 2, 64),
>>>>  				       GL_TRUE);
>>>>        if (!irb->region)
>>>>  	return false;
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>>>> index b2013af..9dd6a52 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>>>> @@ -63,9 +63,12 @@
>>>>   * x8_z24 and s8).
>>>>   *
>>>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>>>> - * nothing happend.
>>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>>>> + * true and continue as if nothing happend.
>>>> + *
>>>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>>>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>>>   *
>>>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>>>   * clean up.
>>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>>>> index 153803f..d306432 100644
>>>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>>>     int miny = 0;							\
>>>>     int maxx = rb->Width;						\
>>>>     int maxy = rb->Height;						\
>>>> -   int stride = rb->RowStride;						\
>>>> -   uint8_t *buf = rb->Data;						\
>>>> +									\
>>>> +   /*									\
>>>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>>>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>>>> +    * manually, the region's *real* base address and stride is		\
>>>> +    * required.								\
>>>> +    */									\
>>>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>>>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>>>> +   unsigned stride = irb->region->pitch;				\
>>>> +   unsigned height = 2 * irb->region->height;				\
>>>> +   bool flip = rb->Name == 0;						\
>>>
>>>> -/* Don't flip y. */
>>>>  #undef Y_FLIP
>>>> -#define Y_FLIP(y) y
>>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>>>
>>>>  /**
>>>>   * \brief Get pointer offset into stencil buffer.
>>>>   *
>>>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>>>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>>>> - * with positive stride (s=stride).
>>>> - *
>>>> - *     x    | y     | byte offset
>>>> - *     --------------------------
>>>> - *     0    | 0     | 0
>>>> - *     0    | 1     | 1
>>>> - *     1    | 0     | 2
>>>> - *     1    | 1     | 3
>>>> - *     ...  | ...   | ...
>>>> - *     0    | 2     | s
>>>> - *     0    | 3     | s + 1
>>>> - *     1    | 2     | s + 2
>>>> - *     1    | 3     | s + 3 
>>>> - *
>>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>>>> + * must decode the tile's layout in software.
>>>>   *
>>>> + * See
>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>>>> + *     Format.
>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>>>   */
>>>> -static inline intptr_t
>>>> -intel_offset_S8(int stride, GLint x, GLint y)
>>>> +static inline uintptr_t
>>>
>>> Why the type change?
> 
>> Two reasons.
> 
>> 1. I redeclared the parameters as unsigned so to generate better code.
>> Since x, y, and stride are unsigned, the division and modulo operators
>> generate shift and bit-logic instructions rather than the slower arithmetic
>> instructions.
> 
> Is stride always unsigned now? 

intelSpanRenderStart still sets rb->RowStride to be negative for system
stencil buffers. But we ignore that and use a positive stride instead.
To quote the hunk above:

+   /*									\
+    * Here we ignore rb->Data and rb->RowStride as set by		\
+    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
+    * manually, the region's *real* base address and stride is		\
+    * required.							\
+    */									\
+   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
+   uint8_t *buf = irb->region->buffer->virtual;			\
+   unsigned stride = irb->region->pitch;

By setting the buf and stride to the *real* base address and stride,
intel_offset_S8 can implement the exact W-tiling algorithm as described
in the bspec. (PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling
Algorithm).

If intel_offset_S8 instead used rb->Data and rb->RowStride as the buffer's
base address and stride, then intel_offset_S8 would essentially
need two implementations, like this:
    intptr_t
    intel_offset_S8(...)
    {
        if (stride > 0) {
            // do normal stuff;
        } else {
            // do upside down stuff;
        }
    }

I'd like to avoid such a bifurcated function.

> Will it always be in the future? 

Yes.

> This is the problem that we encountered with bug #37351 (fixed by e8b1c6d).

Ah... Thanks for recalling that bug. I believe setting the return type of
intel_offset_S8 to be intptr_t will eliminate reoccurrence of this bug. Do you agree?

>> Below is a comparison of x % 64, signed versus unsigned, compiled with gcc -O3.
>> In f, the % generates 5 instructions. In g, only one instruction.
> 
>> int f(int x) { return x % 64; }
> 
>> f:
>> 	movl	%edi, %edx
>> 	sarl	$31, %edx
>> 	shrl	$26, %edx
>> 	leal	(%rdi,%rdx), %eax
>> 	andl	$63, %eax
>> 	subl	%edx, %eax
>> 	ret
> 
>> unsigned g(unsigned x) { return x % 64);
> 
>> g:
>> 	movl	%edi, %eax
>> 	andl	$63, %eax
>> 	ret
> 
>> 2. I redeclared the return type as unsigned to make it explicit that it does
>> not return a negative offset.
> 
>>>> +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
>>>>  {
>>>> -   return 2 * ((y / 2) * stride + x) + y % 2;
>>>> +   uint32_t tile_size = 4096;
>>>> +   uint32_t tile_width = 64;
>>>> +   uint32_t tile_height = 64;
>>>> +   uint32_t row_size = 64 * stride;
>>>> +
>>>> +   uint32_t tile_x = x / tile_width;
>>>> +   uint32_t tile_y = y / tile_height;
>>>> +
>>>> +   /* The byte's address relative to the tile's base addres. */
>>>> +   uint32_t byte_x = x % tile_width;
>>>> +   uint32_t byte_y = y % tile_height;
>>>> +
>>>> +   uintptr_t u = tile_y * row_size
>>>> +               + tile_x * tile_size
>>>> +               + 512 * (byte_x / 8)
>>>> +               +  64 * (byte_y / 8)
>>>> +               +  32 * ((byte_y / 4) % 2)
>>>> +               +  16 * ((byte_x / 4) % 2)
>>>> +               +   8 * ((byte_y / 2) % 2)
>>>> +               +   4 * ((byte_x / 2) % 2)
>>>> +               +   2 * (byte_y % 2)
>>>> +               +   1 * (byte_x % 2);
>>>> +
>>>> +   /*
>>>> +    * Errata for Gen5:
>>>> +    *
>>>> +    * An additional offset is needed which is not documented in the PRM.
>>>> +    *
>>>> +    * if ((byte_x / 8) % 2 == 1) {
>>>> +    *    if ((byte_y / 8) % 2) == 0) {
>>>> +    *       u += 64;
>>>> +    *    } else {
>>>> +    *       u -= 64;
>>>> +    *    }
>>>> +    * }
>>>> +    *
>>>> +    * The offset is expressed more tersely as
>>>> +    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
>>>> +    */
>>>> +
>>>> +   return u;
>>>>  }
>>>
>>>>  #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;
Ian Romanick July 18, 2011, 10:34 p.m. UTC | #8
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On 07/18/2011 02:24 PM, Chad Versace wrote:
> On 07/18/2011 02:02 PM, Ian Romanick wrote:
>> On 07/18/2011 01:54 PM, Chad Versace wrote:
>>> On 07/18/2011 11:49 AM, Ian Romanick wrote:
>>>> On 07/18/2011 12:55 AM, Chad Versace wrote:
>>>>> Until now, the stencil buffer was allocated as a Y tiled buffer, because
>>>>> in several locations the PRM states that it is. However, it is actually
>>>>> W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section
>>>>> 4.5.2.1 W-Major Format:
>>>>>     W-Major Tile Format is used for separate stencil.
>>>>
>>>>> The GTT is incapable of W fencing, so we allocate the stencil buffer with
>>>>> I915_TILING_NONE and decode the tile's layout in software.
>>>>
>>>>> This fix touches the following portions of code:
>>>>>     - In intel_allocate_renderbuffer_storage(), allocate the stencil
>>>>>       buffer with I915_TILING_NONE.
>>>>>     - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is
>>>>>       not tiled.
>>>>>     - In the stencil buffer's span functions, the tile's layout must be
>>>>>       decoded in software.
>>>>
>>>>> This commit mutually depends on the xf86-video-intel commit
>>>>>     dri: Do not tile stencil buffer
>>>>>     Author: Chad Versace <chad@chad-versace.us>
>>>>>     Date:   Mon Jul 18 00:38:00 2011 -0700
>>>>
>>>>> On Gen6 with separate stencil enabled, fixes the following Piglit tests:
>>>>>     bugs/fdo23670-drawpix_stencil
>>>>>     general/stencil-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels
>>>>>     spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels
>>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels
>>>>>     spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels
>>>>>     spec/EXT_packed_depth_stencil/readpixels-24_8
>>>>
>>>>> Note: This is a candidate for the 7.11 branch.
>>>>
>>>>> CC: Eric Anholt <eric@anholt.net>
>>>>> CC: Kenneth Graunke <kenneth@whitecape.org>
>>>>> Signed-off-by: Chad Versace <chad@chad-versace.us>
>>>>> ---
>>>>>  src/mesa/drivers/dri/intel/intel_clear.c   |    6 ++
>>>>>  src/mesa/drivers/dri/intel/intel_context.c |    9 ++-
>>>>>  src/mesa/drivers/dri/intel/intel_fbo.c     |   13 +++--
>>>>>  src/mesa/drivers/dri/intel/intel_screen.h  |    9 ++-
>>>>>  src/mesa/drivers/dri/intel/intel_span.c    |   85 ++++++++++++++++++++--------
>>>>>  5 files changed, 89 insertions(+), 33 deletions(-)
>>>>
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> index dfca03c..5ab9873 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_clear.c
>>>>> @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask)
>>>>>  	     */
>>>>>              tri_mask |= BUFFER_BIT_STENCIL;
>>>>>           }
>>>>> +	 else if (intel->has_separate_stencil &&
>>>>> +	       stencilRegion->tiling == I915_TILING_NONE) {
>>>>> +	    /* The stencil buffer is actually W tiled, which the hardware
>>>>> +	     * cannot blit to. */
>>>>> +	    tri_mask |= BUFFER_BIT_STENCIL;
>>>>> +	 }
>>>>>           else {
>>>>>              /* clearing all stencil bits, use blitting */
>>>>>              blit_mask |= BUFFER_BIT_STENCIL;
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
>>>>> index 2ba1363..fe8be08 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_context.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_context.c
>>>>> @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>>        assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
>>>>>        assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
>>>>
>>>>> -      if (stencil_rb->region->tiling == I915_TILING_Y) {
>>>>> +      if (stencil_rb->region->tiling == I915_TILING_NONE) {
>>>>> +	 /*
>>>>> +	  * The stencil buffer is actually W tiled. The region's tiling is
>>>>> +	  * I915_TILING_NONE, however, because the GTT is incapable of W
>>>>> +	  * fencing.
>>>>> +	  */
>>>>>  	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
>>>>>  	 return;
>>>>>        } else {
>>>>> @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel,
>>>>>         * Presently, however, no verification or clean up is necessary, and
>>>>>         * execution should not reach here. If the framebuffer still has a hiz
>>>>>         * region, then we have already set dri2_has_hiz to true after
>>>>> -       * confirming above that the stencil buffer is Y tiled.
>>>>> +       * confirming above that the stencil buffer is W tiled.
>>>>>         */
>>>>>        assert(0);
>>>>>     }
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> index 1669af2..507cc33 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>>>>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>
>>>>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>>>>        /*
>>>>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>>>>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>>>>> +       *
>>>>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>>>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>>>>         *    The pitch must be set to 2x the value computed based on width, as
>>>>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>>>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>>>>         * region's cpp and halving its height.
>>>>>         *
>>>>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>>>>> -       * maps the memory incorrectly.
>>>>> +       * If we neglect to double the pitch, then render corruption occurs.
>>>>>         */
>>>>>        irb->region = intel_region_alloc(intel->intelScreen,
>>>>> -				       I915_TILING_Y,
>>>>> +				       I915_TILING_NONE,
>>>>>  				       cpp * 2,
>>>>> -				       width,
>>>>> -				       height / 2,
>>>>> +				       ALIGN(width, 64),
>>>>> +				       /* XXX: Maybe align to 128? */
>>>>> +				       ALIGN(height / 2, 64),
>>>>>  				       GL_TRUE);
>>>>>        if (!irb->region)
>>>>>  	return false;
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> index b2013af..9dd6a52 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_screen.h
>>>>> @@ -63,9 +63,12 @@
>>>>>   * x8_z24 and s8).
>>>>>   *
>>>>>   * Eventually, intel_update_renderbuffers() makes a DRI2 request for
>>>>> - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
>>>>> - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
>>>>> - * nothing happend.
>>>>> + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
>>>>> + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
>>>>> + * true and continue as if nothing happend.
>>>>> + *
>>>>> + * [1] The stencil buffer is actually W tiled. However, we request from the
>>>>> + *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
>>>>>   *
>>>>>   * If the buffers are X tiled, however, the handshake has failed and we must
>>>>>   * clean up.
>>>>> diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
>>>>> index 153803f..d306432 100644
>>>>> --- a/src/mesa/drivers/dri/intel/intel_span.c
>>>>> +++ b/src/mesa/drivers/dri/intel/intel_span.c
>>>>> @@ -131,38 +131,77 @@ intel_set_span_functions(struct intel_context *intel,
>>>>>     int miny = 0;							\
>>>>>     int maxx = rb->Width;						\
>>>>>     int maxy = rb->Height;						\
>>>>> -   int stride = rb->RowStride;						\
>>>>> -   uint8_t *buf = rb->Data;						\
>>>>> +									\
>>>>> +   /*									\
>>>>> +    * Here we ignore rb->Data and rb->RowStride as set by		\
>>>>> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
>>>>> +    * manually, the region's *real* base address and stride is		\
>>>>> +    * required.								\
>>>>> +    */									\
>>>>> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
>>>>> +   uint8_t *buf = irb->region->buffer->virtual;				\
>>>>> +   unsigned stride = irb->region->pitch;				\
>>>>> +   unsigned height = 2 * irb->region->height;				\
>>>>> +   bool flip = rb->Name == 0;						\
>>>>
>>>>> -/* Don't flip y. */
>>>>>  #undef Y_FLIP
>>>>> -#define Y_FLIP(y) y
>>>>> +#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
>>>>
>>>>>  /**
>>>>>   * \brief Get pointer offset into stencil buffer.
>>>>>   *
>>>>> - * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
>>>>> - * The table below demonstrates how the pointer arithmetic behaves for a buffer
>>>>> - * with positive stride (s=stride).
>>>>> - *
>>>>> - *     x    | y     | byte offset
>>>>> - *     --------------------------
>>>>> - *     0    | 0     | 0
>>>>> - *     0    | 1     | 1
>>>>> - *     1    | 0     | 2
>>>>> - *     1    | 1     | 3
>>>>> - *     ...  | ...   | ...
>>>>> - *     0    | 2     | s
>>>>> - *     0    | 3     | s + 1
>>>>> - *     1    | 2     | s + 2
>>>>> - *     1    | 3     | s + 3 
>>>>> - *
>>>>> + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
>>>>> + * must decode the tile's layout in software.
>>>>>   *
>>>>> + * See
>>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
>>>>> + *     Format.
>>>>> + *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
>>>>>   */
>>>>> -static inline intptr_t
>>>>> -intel_offset_S8(int stride, GLint x, GLint y)
>>>>> +static inline uintptr_t
>>>>
>>>> Why the type change?
>>
>>> Two reasons.
>>
>>> 1. I redeclared the parameters as unsigned so to generate better code.
>>> Since x, y, and stride are unsigned, the division and modulo operators
>>> generate shift and bit-logic instructions rather than the slower arithmetic
>>> instructions.
>>
>> Is stride always unsigned now? 
> 
> intelSpanRenderStart still sets rb->RowStride to be negative for system
> stencil buffers. But we ignore that and use a positive stride instead.
> To quote the hunk above:
> 
> +   /*									\
> +    * Here we ignore rb->Data and rb->RowStride as set by		\
> +    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
> +    * manually, the region's *real* base address and stride is		\
> +    * required.							\
> +    */									\
> +   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
> +   uint8_t *buf = irb->region->buffer->virtual;			\
> +   unsigned stride = irb->region->pitch;
> 
> By setting the buf and stride to the *real* base address and stride,
> intel_offset_S8 can implement the exact W-tiling algorithm as described
> in the bspec. (PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling
> Algorithm).
> 
> If intel_offset_S8 instead used rb->Data and rb->RowStride as the buffer's
> base address and stride, then intel_offset_S8 would essentially
> need two implementations, like this:
>     intptr_t
>     intel_offset_S8(...)
>     {
>         if (stride > 0) {
>             // do normal stuff;
>         } else {
>             // do upside down stuff;
>         }
>     }
> 
> I'd like to avoid such a bifurcated function.

That makes perfect sense.  Thanks for clearing that all up.

>> Will it always be in the future? 
> 
> Yes.
> 
>> This is the problem that we encountered with bug #37351 (fixed by e8b1c6d).
> 
> Ah... Thanks for recalling that bug. I believe setting the return type of
> intel_offset_S8 to be intptr_t will eliminate reoccurrence of this bug. Do you agree?

It should.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.11 (GNU/Linux)
Comment: Using GnuPG with Fedora - http://enigmail.mozdev.org/

iEYEARECAAYFAk4ktN4ACgkQX1gOwKyEAw/fxQCeMr45By87f0YxBNU+Fp62C1LV
asMAnitL5MjvaIKXgXUIGFDof8hNHRPL
=5kb4
-----END PGP SIGNATURE-----
Chad Versace July 19, 2011, midnight UTC | #9
On 07/18/2011 08:57 AM, Eric Anholt wrote:
> On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
>> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
>> index 1669af2..507cc33 100644
>> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
>> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
>> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>  
>>     if (irb->Base.Format == MESA_FORMAT_S8) {
>>        /*
>> +       * The stencil buffer is W tiled. However, we request from the kernel a
>> +       * non-tiled buffer because the GTT is incapable of W fencing.
>> +       *
>>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
>>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
>>         *    The pitch must be set to 2x the value computed based on width, as
>> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
>>         * To accomplish this, we resort to the nasty hack of doubling the drm
>>         * region's cpp and halving its height.
>>         *
>> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
>> -       * maps the memory incorrectly.
>> +       * If we neglect to double the pitch, then render corruption occurs.
>>         */
>>        irb->region = intel_region_alloc(intel->intelScreen,
>> -				       I915_TILING_Y,
>> +				       I915_TILING_NONE,
>>  				       cpp * 2,
>> -				       width,
>> -				       height / 2,
>> +				       ALIGN(width, 64),
>> +				       /* XXX: Maybe align to 128? */
>> +				       ALIGN(height / 2, 64),
>>  				       GL_TRUE);
>>        if (!irb->region)
>>  	return false;
> 
> This looks like it would fail on a buffer with height = 129.

Oops. It does fail for height = 129. Accordingly, I've changed this hunk to:

       irb->region = intel_region_alloc(intel->intelScreen,
-				       I915_TILING_Y,
+				       I915_TILING_NONE,
 				       cpp * 2,
-				       width,
-				       height / 2,
+				       ALIGN(width, 64),
+				       ALIGN((height + 1)/ 2, 64),
 				       GL_TRUE);

And changed the same line in xf86-video-intel too.

> Doesn't
> seem like 128 pitch requirement would be needed -- has it been tested?

The XXX was left in the patch accidentally. 128-alignment has been tested, and was
found unnecessary.
Eric Anholt July 19, 2011, 3:34 p.m. UTC | #10
On Mon, 18 Jul 2011 17:00:54 -0700, Chad Versace <chad@chad-versace.us> wrote:
> On 07/18/2011 08:57 AM, Eric Anholt wrote:
> > On Mon, 18 Jul 2011 00:55:03 -0700, Chad Versace <chad@chad-versace.us> wrote:
> >> diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
> >> index 1669af2..507cc33 100644
> >> --- a/src/mesa/drivers/dri/intel/intel_fbo.c
> >> +++ b/src/mesa/drivers/dri/intel/intel_fbo.c
> >> @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
> >>  
> >>     if (irb->Base.Format == MESA_FORMAT_S8) {
> >>        /*
> >> +       * The stencil buffer is W tiled. However, we request from the kernel a
> >> +       * non-tiled buffer because the GTT is incapable of W fencing.
> >> +       *
> >>         * The stencil buffer has quirky pitch requirements.  From Vol 2a,
> >>         * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
> >>         *    The pitch must be set to 2x the value computed based on width, as
> >> @@ -180,14 +183,14 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
> >>         * To accomplish this, we resort to the nasty hack of doubling the drm
> >>         * region's cpp and halving its height.
> >>         *
> >> -       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
> >> -       * maps the memory incorrectly.
> >> +       * If we neglect to double the pitch, then render corruption occurs.
> >>         */
> >>        irb->region = intel_region_alloc(intel->intelScreen,
> >> -				       I915_TILING_Y,
> >> +				       I915_TILING_NONE,
> >>  				       cpp * 2,
> >> -				       width,
> >> -				       height / 2,
> >> +				       ALIGN(width, 64),
> >> +				       /* XXX: Maybe align to 128? */
> >> +				       ALIGN(height / 2, 64),
> >>  				       GL_TRUE);
> >>        if (!irb->region)
> >>  	return false;
> > 
> > This looks like it would fail on a buffer with height = 129.
> 
> Oops. It does fail for height = 129. Accordingly, I've changed this hunk to:
> 
>        irb->region = intel_region_alloc(intel->intelScreen,
> -				       I915_TILING_Y,
> +				       I915_TILING_NONE,
>  				       cpp * 2,
> -				       width,
> -				       height / 2,
> +				       ALIGN(width, 64),
> +				       ALIGN((height + 1)/ 2, 64),
>  				       GL_TRUE);
> 
> And changed the same line in xf86-video-intel too.
> 
> > Doesn't
> > seem like 128 pitch requirement would be needed -- has it been tested?
> 
> The XXX was left in the patch accidentally. 128-alignment has been tested, and was
> found unnecessary.

Looks good to me, gets the Reviewed-by: Eric Anholt <eric@anholt.net>
Chad Versace July 19, 2011, 8:03 p.m. UTC | #11
On 07/18/2011 01:20 AM, Paul Menzel wrote:
> Am Montag, den 18.07.2011, 00:55 -0700 schrieb Chad Versace:
> There are alignment/white space issues above.
> 
>> +   unsigned stride = irb->region->pitch;				\
>> +   unsigned height = 2 * irb->region->height;				\
>> +   bool flip = rb->Name == 0;						\
> 
> […]
> 
> 
> Thanks,
> 
> Paul
> 

Thanks. Fix whitespace in both patches.
diff mbox

Patch

diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c
index dfca03c..5ab9873 100644
--- a/src/mesa/drivers/dri/intel/intel_clear.c
+++ b/src/mesa/drivers/dri/intel/intel_clear.c
@@ -143,6 +143,12 @@  intelClear(struct gl_context *ctx, GLbitfield mask)
 	     */
             tri_mask |= BUFFER_BIT_STENCIL;
          }
+	 else if (intel->has_separate_stencil &&
+	       stencilRegion->tiling == I915_TILING_NONE) {
+	    /* The stencil buffer is actually W tiled, which the hardware
+	     * cannot blit to. */
+	    tri_mask |= BUFFER_BIT_STENCIL;
+	 }
          else {
             /* clearing all stencil bits, use blitting */
             blit_mask |= BUFFER_BIT_STENCIL;
diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c
index 2ba1363..fe8be08 100644
--- a/src/mesa/drivers/dri/intel/intel_context.c
+++ b/src/mesa/drivers/dri/intel/intel_context.c
@@ -1439,7 +1439,12 @@  intel_verify_dri2_has_hiz(struct intel_context *intel,
       assert(stencil_rb->Base.Format == MESA_FORMAT_S8);
       assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24);
 
-      if (stencil_rb->region->tiling == I915_TILING_Y) {
+      if (stencil_rb->region->tiling == I915_TILING_NONE) {
+	 /*
+	  * The stencil buffer is actually W tiled. The region's tiling is
+	  * I915_TILING_NONE, however, because the GTT is incapable of W
+	  * fencing.
+	  */
 	 intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE;
 	 return;
       } else {
@@ -1527,7 +1532,7 @@  intel_verify_dri2_has_hiz(struct intel_context *intel,
        * Presently, however, no verification or clean up is necessary, and
        * execution should not reach here. If the framebuffer still has a hiz
        * region, then we have already set dri2_has_hiz to true after
-       * confirming above that the stencil buffer is Y tiled.
+       * confirming above that the stencil buffer is W tiled.
        */
       assert(0);
    }
diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c
index 1669af2..507cc33 100644
--- a/src/mesa/drivers/dri/intel/intel_fbo.c
+++ b/src/mesa/drivers/dri/intel/intel_fbo.c
@@ -173,6 +173,9 @@  intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
 
    if (irb->Base.Format == MESA_FORMAT_S8) {
       /*
+       * The stencil buffer is W tiled. However, we request from the kernel a
+       * non-tiled buffer because the GTT is incapable of W fencing.
+       *
        * The stencil buffer has quirky pitch requirements.  From Vol 2a,
        * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch":
        *    The pitch must be set to 2x the value computed based on width, as
@@ -180,14 +183,14 @@  intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer
        * To accomplish this, we resort to the nasty hack of doubling the drm
        * region's cpp and halving its height.
        *
-       * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt()
-       * maps the memory incorrectly.
+       * If we neglect to double the pitch, then render corruption occurs.
        */
       irb->region = intel_region_alloc(intel->intelScreen,
-				       I915_TILING_Y,
+				       I915_TILING_NONE,
 				       cpp * 2,
-				       width,
-				       height / 2,
+				       ALIGN(width, 64),
+				       /* XXX: Maybe align to 128? */
+				       ALIGN(height / 2, 64),
 				       GL_TRUE);
       if (!irb->region)
 	return false;
diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h
index b2013af..9dd6a52 100644
--- a/src/mesa/drivers/dri/intel/intel_screen.h
+++ b/src/mesa/drivers/dri/intel/intel_screen.h
@@ -63,9 +63,12 @@ 
  * x8_z24 and s8).
  *
  * Eventually, intel_update_renderbuffers() makes a DRI2 request for
- * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled,
- * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if
- * nothing happend.
+ * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is
+ * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to
+ * true and continue as if nothing happend.
+ *
+ * [1] The stencil buffer is actually W tiled. However, we request from the
+ *     kernel a non-tiled buffer because the GTT is incapable of W fencing.
  *
  * If the buffers are X tiled, however, the handshake has failed and we must
  * clean up.
diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c
index 153803f..d306432 100644
--- a/src/mesa/drivers/dri/intel/intel_span.c
+++ b/src/mesa/drivers/dri/intel/intel_span.c
@@ -131,38 +131,77 @@  intel_set_span_functions(struct intel_context *intel,
    int miny = 0;							\
    int maxx = rb->Width;						\
    int maxy = rb->Height;						\
-   int stride = rb->RowStride;						\
-   uint8_t *buf = rb->Data;						\
+									\
+   /*									\
+    * Here we ignore rb->Data and rb->RowStride as set by		\
+    * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile	\
+    * manually, the region's *real* base address and stride is		\
+    * required.								\
+    */									\
+   struct intel_renderbuffer *irb = intel_renderbuffer(rb);		\
+   uint8_t *buf = irb->region->buffer->virtual;				\
+   unsigned stride = irb->region->pitch;				\
+   unsigned height = 2 * irb->region->height;				\
+   bool flip = rb->Name == 0;						\
 
-/* Don't flip y. */
 #undef Y_FLIP
-#define Y_FLIP(y) y
+#define Y_FLIP(y) ((1 - flip) * y + flip * (height - 1 - y))
 
 /**
  * \brief Get pointer offset into stencil buffer.
  *
- * The stencil buffer interleaves two rows into one. Yay for crazy hardware.
- * The table below demonstrates how the pointer arithmetic behaves for a buffer
- * with positive stride (s=stride).
- *
- *     x    | y     | byte offset
- *     --------------------------
- *     0    | 0     | 0
- *     0    | 1     | 1
- *     1    | 0     | 2
- *     1    | 1     | 3
- *     ...  | ...   | ...
- *     0    | 2     | s
- *     0    | 3     | s + 1
- *     1    | 2     | s + 2
- *     1    | 3     | s + 3
- *
+ * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we
+ * must decode the tile's layout in software.
  *
+ * See
+ *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile
+ *     Format.
+ *   - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm
  */
-static inline intptr_t
-intel_offset_S8(int stride, GLint x, GLint y)
+static inline uintptr_t
+intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y)
 {
-   return 2 * ((y / 2) * stride + x) + y % 2;
+   uint32_t tile_size = 4096;
+   uint32_t tile_width = 64;
+   uint32_t tile_height = 64;
+   uint32_t row_size = 64 * stride;
+
+   uint32_t tile_x = x / tile_width;
+   uint32_t tile_y = y / tile_height;
+
+   /* The byte's address relative to the tile's base addres. */
+   uint32_t byte_x = x % tile_width;
+   uint32_t byte_y = y % tile_height;
+
+   uintptr_t u = tile_y * row_size
+               + tile_x * tile_size
+               + 512 * (byte_x / 8)
+               +  64 * (byte_y / 8)
+               +  32 * ((byte_y / 4) % 2)
+               +  16 * ((byte_x / 4) % 2)
+               +   8 * ((byte_y / 2) % 2)
+               +   4 * ((byte_x / 2) % 2)
+               +   2 * (byte_y % 2)
+               +   1 * (byte_x % 2);
+
+   /*
+    * Errata for Gen5:
+    *
+    * An additional offset is needed which is not documented in the PRM.
+    *
+    * if ((byte_x / 8) % 2 == 1) {
+    *    if ((byte_y / 8) % 2) == 0) {
+    *       u += 64;
+    *    } else {
+    *       u -= 64;
+    *    }
+    * }
+    *
+    * The offset is expressed more tersely as
+    * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1));
+    */
+
+   return u;
 }
 
 #define WRITE_STENCIL(x, y, src)  buf[intel_offset_S8(stride, x, y)] = src;