From patchwork Tue Jul 19 00:08:35 2011 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Chad Versace X-Patchwork-Id: 987952 Received: from gabe.freedesktop.org (gabe.freedesktop.org [131.252.210.177]) by demeter1.kernel.org (8.14.4/8.14.4) with ESMTP id p6J0BUVQ000422 for ; Tue, 19 Jul 2011 00:11:50 GMT Received: from gabe.freedesktop.org (localhost [127.0.0.1]) by gabe.freedesktop.org (Postfix) with ESMTP id 0006E9E864 for ; Mon, 18 Jul 2011 17:11:29 -0700 (PDT) X-Original-To: intel-gfx@lists.freedesktop.org Delivered-To: intel-gfx@lists.freedesktop.org Received: from cloud01.chad-versace.us (184-106-247-128.static.cloud-ips.com [184.106.247.128]) by gabe.freedesktop.org (Postfix) with ESMTP id 0948A9E8D8; Mon, 18 Jul 2011 17:08:51 -0700 (PDT) Received: from localhost (localhost [127.0.0.1]) by cloud01.chad-versace.us (Postfix) with ESMTP id BA9E21D424F; Tue, 19 Jul 2011 00:11:21 +0000 (UTC) X-Virus-Scanned: amavisd-new at static.cloud-ips.com X-Spam-Flag: NO X-Spam-Score: -1 X-Spam-Level: X-Spam-Status: No, score=-1 tagged_above=-100 required=5 tests=[ALL_TRUSTED=-1] autolearn=ham Received: from cloud01.chad-versace.us ([127.0.0.1]) by localhost (cloud01.static.cloud-ips.com [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id aLTYyHorji7X; Tue, 19 Jul 2011 00:11:14 +0000 (UTC) Received: from localhost (unknown [64.122.13.55]) by cloud01.chad-versace.us (Postfix) with ESMTPSA id 578A91D4263; Tue, 19 Jul 2011 00:11:13 +0000 (UTC) From: Chad Versace To: intel-gfx@lists.freedesktop.org, mesa-dev@lists.freedesktop.org Date: Mon, 18 Jul 2011 17:08:35 -0700 Message-Id: <1311034115-4191-3-git-send-email-chad@chad-versace.us> X-Mailer: git-send-email 1.7.6 In-Reply-To: <1311034115-4191-1-git-send-email-chad@chad-versace.us> References: <1311034115-4191-1-git-send-email-chad@chad-versace.us> Cc: Chad Versace , Ian Romancik Subject: [Intel-gfx] [PATCH] intel: Fix stencil buffer to be W tiled X-BeenThere: intel-gfx@lists.freedesktop.org X-Mailman-Version: 2.1.11 Precedence: list List-Id: Intel graphics driver community testing & development List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , MIME-Version: 1.0 Sender: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org Errors-To: intel-gfx-bounces+patchwork-intel-gfx=patchwork.kernel.org@lists.freedesktop.org X-Greylist: IP, sender and recipient auto-whitelisted, not delayed by milter-greylist-4.2.6 (demeter1.kernel.org [140.211.167.41]); Tue, 19 Jul 2011 00:12:02 +0000 (UTC) Until now, the stencil buffer was allocated as a Y tiled buffer, because in several locations the PRM states that it is. However, it is actually W tiled. From the PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Format: W-Major Tile Format is used for separate stencil. The GTT is incapable of W fencing, so we allocate the stencil buffer with I915_TILING_NONE and decode the tile's layout in software. This fix touches the following portions of code: - In intel_allocate_renderbuffer_storage(), allocate the stencil buffer with I915_TILING_NONE. - In intel_verify_dri2_has_hiz(), verify that the stencil buffer is not tiled. - In the stencil buffer's span functions, the tile's layout must be decoded in software. This commit mutually depends on the xf86-video-intel commit dri: Do not tile stencil buffer Author: Chad Versace Date: Mon Jul 18 00:38:00 2011 -0700 On Gen6 with separate stencil enabled, fixes the following Piglit tests: bugs/fdo23670-drawpix_stencil general/stencil-drawpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-copypixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-drawpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX16-readpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-copypixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-drawpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX1-readpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-copypixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-drawpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX4-readpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-copypixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-drawpixels spec/EXT_framebuffer_object/fbo-stencil-GL_STENCIL_INDEX8-readpixels spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-copypixels spec/EXT_packed_depth_stencil/fbo-stencil-GL_DEPTH24_STENCIL8-readpixels spec/EXT_packed_depth_stencil/readpixels-24_8 Note: This is a candidate for the 7.11 branch. CC: Eric Anholt CC: Kenneth Graunke CC: Ian Romancik Signed-off-by: Chad Versace --- src/mesa/drivers/dri/intel/intel_clear.c | 6 ++ src/mesa/drivers/dri/intel/intel_context.c | 9 ++- src/mesa/drivers/dri/intel/intel_fbo.c | 12 ++-- src/mesa/drivers/dri/intel/intel_screen.h | 9 ++- src/mesa/drivers/dri/intel/intel_span.c | 88 +++++++++++++++++++++------- 5 files changed, 93 insertions(+), 31 deletions(-) diff --git a/src/mesa/drivers/dri/intel/intel_clear.c b/src/mesa/drivers/dri/intel/intel_clear.c index dfca03c..5ab9873 100644 --- a/src/mesa/drivers/dri/intel/intel_clear.c +++ b/src/mesa/drivers/dri/intel/intel_clear.c @@ -143,6 +143,12 @@ intelClear(struct gl_context *ctx, GLbitfield mask) */ tri_mask |= BUFFER_BIT_STENCIL; } + else if (intel->has_separate_stencil && + stencilRegion->tiling == I915_TILING_NONE) { + /* The stencil buffer is actually W tiled, which the hardware + * cannot blit to. */ + tri_mask |= BUFFER_BIT_STENCIL; + } else { /* clearing all stencil bits, use blitting */ blit_mask |= BUFFER_BIT_STENCIL; diff --git a/src/mesa/drivers/dri/intel/intel_context.c b/src/mesa/drivers/dri/intel/intel_context.c index 2ba1363..fe8be08 100644 --- a/src/mesa/drivers/dri/intel/intel_context.c +++ b/src/mesa/drivers/dri/intel/intel_context.c @@ -1439,7 +1439,12 @@ intel_verify_dri2_has_hiz(struct intel_context *intel, assert(stencil_rb->Base.Format == MESA_FORMAT_S8); assert(depth_rb && depth_rb->Base.Format == MESA_FORMAT_X8_Z24); - if (stencil_rb->region->tiling == I915_TILING_Y) { + if (stencil_rb->region->tiling == I915_TILING_NONE) { + /* + * The stencil buffer is actually W tiled. The region's tiling is + * I915_TILING_NONE, however, because the GTT is incapable of W + * fencing. + */ intel->intelScreen->dri2_has_hiz = INTEL_DRI2_HAS_HIZ_TRUE; return; } else { @@ -1527,7 +1532,7 @@ intel_verify_dri2_has_hiz(struct intel_context *intel, * Presently, however, no verification or clean up is necessary, and * execution should not reach here. If the framebuffer still has a hiz * region, then we have already set dri2_has_hiz to true after - * confirming above that the stencil buffer is Y tiled. + * confirming above that the stencil buffer is W tiled. */ assert(0); } diff --git a/src/mesa/drivers/dri/intel/intel_fbo.c b/src/mesa/drivers/dri/intel/intel_fbo.c index 1669af2..2206391 100644 --- a/src/mesa/drivers/dri/intel/intel_fbo.c +++ b/src/mesa/drivers/dri/intel/intel_fbo.c @@ -173,6 +173,9 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer if (irb->Base.Format == MESA_FORMAT_S8) { /* + * The stencil buffer is W tiled. However, we request from the kernel a + * non-tiled buffer because the GTT is incapable of W fencing. + * * The stencil buffer has quirky pitch requirements. From Vol 2a, * 11.5.6.2.1 3DSTATE_STENCIL_BUFFER, field "Surface Pitch": * The pitch must be set to 2x the value computed based on width, as @@ -180,14 +183,13 @@ intel_alloc_renderbuffer_storage(struct gl_context * ctx, struct gl_renderbuffer * To accomplish this, we resort to the nasty hack of doubling the drm * region's cpp and halving its height. * - * If we neglect to double the pitch, then drm_intel_gem_bo_map_gtt() - * maps the memory incorrectly. + * If we neglect to double the pitch, then render corruption occurs. */ irb->region = intel_region_alloc(intel->intelScreen, - I915_TILING_Y, + I915_TILING_NONE, cpp * 2, - width, - height / 2, + ALIGN(width, 64), + ALIGN((height + 1) / 2, 64), GL_TRUE); if (!irb->region) return false; diff --git a/src/mesa/drivers/dri/intel/intel_screen.h b/src/mesa/drivers/dri/intel/intel_screen.h index b2013af..9dd6a52 100644 --- a/src/mesa/drivers/dri/intel/intel_screen.h +++ b/src/mesa/drivers/dri/intel/intel_screen.h @@ -63,9 +63,12 @@ * x8_z24 and s8). * * Eventually, intel_update_renderbuffers() makes a DRI2 request for - * DRI2BufferStencil and DRI2BufferHiz. If the returned buffers are Y tiled, - * then we joyfully set intel_screen.dri2_has_hiz to true and continue as if - * nothing happend. + * DRI2BufferStencil and DRI2BufferHiz. If the stencil buffer's tiling is + * I915_TILING_NONE [1], then we joyfully set intel_screen.dri2_has_hiz to + * true and continue as if nothing happend. + * + * [1] The stencil buffer is actually W tiled. However, we request from the + * kernel a non-tiled buffer because the GTT is incapable of W fencing. * * If the buffers are X tiled, however, the handshake has failed and we must * clean up. diff --git a/src/mesa/drivers/dri/intel/intel_span.c b/src/mesa/drivers/dri/intel/intel_span.c index 153803f..2e1c80c 100644 --- a/src/mesa/drivers/dri/intel/intel_span.c +++ b/src/mesa/drivers/dri/intel/intel_span.c @@ -131,38 +131,84 @@ intel_set_span_functions(struct intel_context *intel, int miny = 0; \ int maxx = rb->Width; \ int maxy = rb->Height; \ - int stride = rb->RowStride; \ - uint8_t *buf = rb->Data; \ + \ + /* \ + * Here we ignore rb->Data and rb->RowStride as set by \ + * intelSpanRenderStart. Since intel_offset_S8 decodes the W tile \ + * manually, the region's *real* base address and stride is \ + * required. \ + */ \ + struct intel_renderbuffer *irb = intel_renderbuffer(rb); \ + uint8_t *buf = irb->region->buffer->virtual; \ + unsigned stride = irb->region->pitch; \ + unsigned height = 2 * irb->region->height; \ + bool flip = rb->Name == 0; \ + int y_scale = flip ? -1 : 1; \ + int y_bias = flip ? (height - 1) : 0; \ -/* Don't flip y. */ #undef Y_FLIP -#define Y_FLIP(y) y +#define Y_FLIP(y) (y_scale * (y) + y_bias) /** * \brief Get pointer offset into stencil buffer. * - * The stencil buffer interleaves two rows into one. Yay for crazy hardware. - * The table below demonstrates how the pointer arithmetic behaves for a buffer - * with positive stride (s=stride). - * - * x | y | byte offset - * -------------------------- - * 0 | 0 | 0 - * 0 | 1 | 1 - * 1 | 0 | 2 - * 1 | 1 | 3 - * ... | ... | ... - * 0 | 2 | s - * 0 | 3 | s + 1 - * 1 | 2 | s + 2 - * 1 | 3 | s + 3 + * The stencil buffer is W tiled. Since the GTT is incapable of W fencing, we + * must decode the tile's layout in software. * + * See + * - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.2.1 W-Major Tile + * Format. + * - PRM, 2011 Sandy Bridge, Volume 1, Part 2, Section 4.5.3 Tiling Algorithm * + * Even though the returned offset is always positive, the return type is + * signed due to + * commit e8b1c6d6f55f5be3bef25084fdd8b6127517e137 + * mesa: Fix return type of _mesa_get_format_bytes() (#37351) */ static inline intptr_t -intel_offset_S8(int stride, GLint x, GLint y) +intel_offset_S8(uint32_t stride, uint32_t x, uint32_t y) { - return 2 * ((y / 2) * stride + x) + y % 2; + uint32_t tile_size = 4096; + uint32_t tile_width = 64; + uint32_t tile_height = 64; + uint32_t row_size = 64 * stride; + + uint32_t tile_x = x / tile_width; + uint32_t tile_y = y / tile_height; + + /* The byte's address relative to the tile's base addres. */ + uint32_t byte_x = x % tile_width; + uint32_t byte_y = y % tile_height; + + uintptr_t u = tile_y * row_size + + tile_x * tile_size + + 512 * (byte_x / 8) + + 64 * (byte_y / 8) + + 32 * ((byte_y / 4) % 2) + + 16 * ((byte_x / 4) % 2) + + 8 * ((byte_y / 2) % 2) + + 4 * ((byte_x / 2) % 2) + + 2 * (byte_y % 2) + + 1 * (byte_x % 2); + + /* + * Errata for Gen5: + * + * An additional offset is needed which is not documented in the PRM. + * + * if ((byte_x / 8) % 2 == 1) { + * if ((byte_y / 8) % 2) == 0) { + * u += 64; + * } else { + * u -= 64; + * } + * } + * + * The offset is expressed more tersely as + * u += ((int) x & 0x8) * (8 - (((int) y & 0x8) << 1)); + */ + + return u; } #define WRITE_STENCIL(x, y, src) buf[intel_offset_S8(stride, x, y)] = src;