[8/9] drm/i915: Implement .get_format_info() hook for CCS

Message ID	20170104184232.23048-9-ville.syrjala@linux.intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <dri-devel-bounces@lists.freedesktop.org> From: ville.syrjala@linux.intel.com To: intel-gfx@lists.freedesktop.org Subject: [PATCH 8/9] drm/i915: Implement .get_format_info() hook for CCS Date: Wed, 4 Jan 2017 20:42:31 +0200 Message-Id: <20170104184232.23048-9-ville.syrjala@linux.intel.com> In-Reply-To: <20170104184232.23048-1-ville.syrjala@linux.intel.com> References: <20170104184232.23048-1-ville.syrjala@linux.intel.com> MIME-Version: 1.0 Cc: Jason Ekstrand <jason@jlekstrand.net>, Ben Widawsky <ben@bwidawsk.net>, Vandana Kannan <vandana.kannan@intel.com>, dri-devel@lists.freedesktop.org Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: dri-devel-bounces@lists.freedesktop.org Sender: "dri-devel" <dri-devel-bounces@lists.freedesktop.org>

Ville Syrjälä Jan. 4, 2017, 6:42 p.m. UTC

From: Ville Syrjälä <ville.syrjala@linux.intel.com>

SKL+ display engine can scan out certain kinds of compressed surfaces
produced by the render engine. This involved telling the display engine
the location of the color control surfae (CCS) which describes which
parts of the main surface are compressed and which are not. The location
of CCS is provided by userspace as just another plane with its own offset.

By providing our own format information for the CCS formats, we should
be able to make framebuffer_check() do the right thing for the CCS
surface as well.

Note that we'll return the same format info for both Y and Yf tiled
format as that's what happens with the non-CCS Y vs. Yf as well. If
desired, we could potentially return a unique pointer for each
pixel_format+tiling+ccs combination, in which case we immediately be
able to tell if any of that stuff changed by just comparing the
pointers. But that does sound a bit wasteful space wise.

v2: Drop the 'dev' argument from the hook
v3: Include the description of the CCS surface layout

Cc: Vandana Kannan <vandana.kannan@intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Ben Widawsky <ben@bwidawsk.net>
Cc: Jason Ekstrand <jason@jlekstrand.net>
Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
 include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
 2 files changed, 85 insertions(+)

Tvrtko Ursulin Jan. 5, 2017, 4:24 p.m. UTC | #1

On 04/01/2017 18:42, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>
> SKL+ display engine can scan out certain kinds of compressed surfaces
> produced by the render engine. This involved telling the display engine
> the location of the color control surfae (CCS) which describes which
> parts of the main surface are compressed and which are not. The location
> of CCS is provided by userspace as just another plane with its own offset.
>
> By providing our own format information for the CCS formats, we should
> be able to make framebuffer_check() do the right thing for the CCS
> surface as well.
>
> Note that we'll return the same format info for both Y and Yf tiled
> format as that's what happens with the non-CCS Y vs. Yf as well. If
> desired, we could potentially return a unique pointer for each
> pixel_format+tiling+ccs combination, in which case we immediately be
> able to tell if any of that stuff changed by just comparing the
> pointers. But that does sound a bit wasteful space wise.
>
> v2: Drop the 'dev' argument from the hook
> v3: Include the description of the CCS surface layout
>
> Cc: Vandana Kannan <vandana.kannan@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
>  include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index c4662b2e9613..38de9df0ec60 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -2478,6 +2478,41 @@ static unsigned int intel_fb_modifier_to_tiling(uint64_t fb_modifier)
>  	}
>  }
>
> +static const struct drm_format_info ccs_formats[] = {
> +	{ .format = DRM_FORMAT_XRGB8888, .depth = 24, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
> +	{ .format = DRM_FORMAT_XBGR8888, .depth = 24, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
> +	{ .format = DRM_FORMAT_ARGB8888, .depth = 32, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
> +	{ .format = DRM_FORMAT_ABGR8888, .depth = 32, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
> +};
> +
> +static const struct drm_format_info *
> +lookup_format_info(const struct drm_format_info formats[],
> +		   int num_formats, u32 format)
> +{
> +	int i;
> +
> +	for (i = 0; i < num_formats; i++) {
> +		if (formats[i].format == format)
> +			return &formats[i];
> +	}
> +
> +	return NULL;
> +}
> +
> +static const struct drm_format_info *
> +intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
> +{
> +	switch (cmd->modifier[0]) {
> +	case I915_FORMAT_MOD_Y_TILED_CCS:
> +	case I915_FORMAT_MOD_Yf_TILED_CCS:
> +		return lookup_format_info(ccs_formats,
> +					  ARRAY_SIZE(ccs_formats),
> +					  cmd->pixel_format);
> +	default:
> +		return NULL;
> +	}
> +}
> +
>  static int
>  intel_fill_fb_info(struct drm_i915_private *dev_priv,
>  		   struct drm_framebuffer *fb)
> @@ -16083,6 +16118,7 @@ static void intel_atomic_state_free(struct drm_atomic_state *state)
>
>  static const struct drm_mode_config_funcs intel_mode_funcs = {
>  	.fb_create = intel_user_framebuffer_create,
> +	.get_format_info = intel_get_format_info,
>  	.output_poll_changed = intel_fbdev_output_poll_changed,
>  	.atomic_check = intel_atomic_check,
>  	.atomic_commit = intel_atomic_commit,
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 9e1bb7fabcde..4581e3d41e5c 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -230,6 +230,55 @@ extern "C" {
>  #define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
>
>  /*
> + * Intel color control surface (CCS) for render compression
> + *
> + * The framebuffer format must be one of the 8:8:8:8 RGB formats,
> + * the main surface will be plane index 0 and will be Y/Yf-tiled,
> + * the CCS will be plane index 1.
> + *
> + * Each byte of CCS contains 4 pairs of bits, with each pair of bits
> + * matching an area of 8x4 pixels of the main surface. Which would seem
> + * to match 2 cachelines containing 4x4 pixels each. The pairs bits within
> + * the byte form a 2x2 grid, which thus matches a 16x8 pixel area of the
> + * main surface. This is the 2x2 pattern the bits form (0=lsb, 7=msb):
> + * -----------
> + * | 01 | 23 |
> + *  ----------
> + * | 45 | 67 |
> + * -----------
> + *
> + * Actually only the lower bit of the pair seems to have any effect.
> + * No idea why. 0 in the lower bit would seem to mean not compressed,
> + * and 1 is compressed.  The interpreation of the main surface data
> + * when the block is marked compressed is unknown as of now.
> + *
> + * CCS tile is laid out in 8 byte horizontal strips each strip thus corresponds
> + * to a 128x8 pixel are of the main surface. So each 8x8 bytes of the CCS
> + * (1 cacheline) will match an area of 4x2 tiles on the main surface.
> + *
> + * Here is the layout of a full CCS tile, with the 8 byte strips numbered 0-511:
> + * ------------------------
> + * |  0 |  64 | ... | 448 |
> + * |  1 |  65 |     | 449 |
> + * |  2 |  66 |     | 450 |
> + * |  . |   . |     |   . |
> + * |  . |   . |     |   . |
> + * |  . |   . |     |   . |
> + * | 63 | 127 |     | 511 |
> + * ------------------------
> + *
> + * This will match an area of 1024x512 pixels on the main surface.
> + *
> + * The CCS plane pitch must be a multiple of the CCS tile width (64 bytes),
> + * and for the purposes of the CCS plane offset we assume cpp==1. As each
> + * byte matches a 16x8 area of the main surface, the dimensions of the CCS
> + * plane will thus appear to be width/16 x height/8. Both planes must be
> + * contained within the same gem object.
> + */
> +#define I915_FORMAT_MOD_Y_TILED_CCS	fourcc_mod_code(INTEL, 4)
> +#define I915_FORMAT_MOD_Yf_TILED_CCS	fourcc_mod_code(INTEL, 5)

Are we sure this is better than reserving some bits for tiling mode and 
having CCS as separate bit flag? IMHO code is already a bit unsightly 
with this scheme and it would take just one more orthogonal but 
simultaneously usable set of modifiers to drown us in permutations. We 
got plenty of bits available.

Regards,

Tvrtko

Ben Widawsky Jan. 5, 2017, 5:40 p.m. UTC | #2

On 17-01-05 16:24:56, Tvrtko Ursulin wrote:
>
>On 04/01/2017 18:42, ville.syrjala@linux.intel.com wrote:
>>From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>
>>SKL+ display engine can scan out certain kinds of compressed surfaces
>>produced by the render engine. This involved telling the display engine
>>the location of the color control surfae (CCS) which describes which
>>parts of the main surface are compressed and which are not. The location
>>of CCS is provided by userspace as just another plane with its own offset.
>>
>>By providing our own format information for the CCS formats, we should
>>be able to make framebuffer_check() do the right thing for the CCS
>>surface as well.
>>
>>Note that we'll return the same format info for both Y and Yf tiled
>>format as that's what happens with the non-CCS Y vs. Yf as well. If
>>desired, we could potentially return a unique pointer for each
>>pixel_format+tiling+ccs combination, in which case we immediately be
>>able to tell if any of that stuff changed by just comparing the
>>pointers. But that does sound a bit wasteful space wise.
>>
>>v2: Drop the 'dev' argument from the hook
>>v3: Include the description of the CCS surface layout
>>
>>Cc: Vandana Kannan <vandana.kannan@intel.com>
>>Cc: Daniel Vetter <daniel@ffwll.ch>
>>Cc: Ben Widawsky <ben@bwidawsk.net>
>>Cc: Jason Ekstrand <jason@jlekstrand.net>
>>Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
>>Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>>---
>> drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
>> include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
>> 2 files changed, 85 insertions(+)
>>
>>diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
>>index c4662b2e9613..38de9df0ec60 100644
>>--- a/drivers/gpu/drm/i915/intel_display.c
>>+++ b/drivers/gpu/drm/i915/intel_display.c
>>@@ -2478,6 +2478,41 @@ static unsigned int intel_fb_modifier_to_tiling(uint64_t fb_modifier)
>> 	}
>> }
>>
>>+static const struct drm_format_info ccs_formats[] = {
>>+	{ .format = DRM_FORMAT_XRGB8888, .depth = 24, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
>>+	{ .format = DRM_FORMAT_XBGR8888, .depth = 24, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
>>+	{ .format = DRM_FORMAT_ARGB8888, .depth = 32, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
>>+	{ .format = DRM_FORMAT_ABGR8888, .depth = 32, .num_planes = 2, .cpp = { 4, 1, }, .hsub = 16, .vsub = 8, },
>>+};
>>+
>>+static const struct drm_format_info *
>>+lookup_format_info(const struct drm_format_info formats[],
>>+		   int num_formats, u32 format)
>>+{
>>+	int i;
>>+
>>+	for (i = 0; i < num_formats; i++) {
>>+		if (formats[i].format == format)
>>+			return &formats[i];
>>+	}
>>+
>>+	return NULL;
>>+}
>>+
>>+static const struct drm_format_info *
>>+intel_get_format_info(const struct drm_mode_fb_cmd2 *cmd)
>>+{
>>+	switch (cmd->modifier[0]) {
>>+	case I915_FORMAT_MOD_Y_TILED_CCS:
>>+	case I915_FORMAT_MOD_Yf_TILED_CCS:
>>+		return lookup_format_info(ccs_formats,
>>+					  ARRAY_SIZE(ccs_formats),
>>+					  cmd->pixel_format);
>>+	default:
>>+		return NULL;
>>+	}
>>+}
>>+
>> static int
>> intel_fill_fb_info(struct drm_i915_private *dev_priv,
>> 		   struct drm_framebuffer *fb)
>>@@ -16083,6 +16118,7 @@ static void intel_atomic_state_free(struct drm_atomic_state *state)
>>
>> static const struct drm_mode_config_funcs intel_mode_funcs = {
>> 	.fb_create = intel_user_framebuffer_create,
>>+	.get_format_info = intel_get_format_info,
>> 	.output_poll_changed = intel_fbdev_output_poll_changed,
>> 	.atomic_check = intel_atomic_check,
>> 	.atomic_commit = intel_atomic_commit,
>>diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>>index 9e1bb7fabcde..4581e3d41e5c 100644
>>--- a/include/uapi/drm/drm_fourcc.h
>>+++ b/include/uapi/drm/drm_fourcc.h
>>@@ -230,6 +230,55 @@ extern "C" {
>> #define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
>>
>> /*
>>+ * Intel color control surface (CCS) for render compression
>>+ *
>>+ * The framebuffer format must be one of the 8:8:8:8 RGB formats,
>>+ * the main surface will be plane index 0 and will be Y/Yf-tiled,
>>+ * the CCS will be plane index 1.
>>+ *
>>+ * Each byte of CCS contains 4 pairs of bits, with each pair of bits
>>+ * matching an area of 8x4 pixels of the main surface. Which would seem
>>+ * to match 2 cachelines containing 4x4 pixels each. The pairs bits within
>>+ * the byte form a 2x2 grid, which thus matches a 16x8 pixel area of the
>>+ * main surface. This is the 2x2 pattern the bits form (0=lsb, 7=msb):
>>+ * -----------
>>+ * | 01 | 23 |
>>+ *  ----------
>>+ * | 45 | 67 |
>>+ * -----------
>>+ *
>>+ * Actually only the lower bit of the pair seems to have any effect.
>>+ * No idea why. 0 in the lower bit would seem to mean not compressed,
>>+ * and 1 is compressed.  The interpreation of the main surface data
>>+ * when the block is marked compressed is unknown as of now.
>>+ *
>>+ * CCS tile is laid out in 8 byte horizontal strips each strip thus corresponds
>>+ * to a 128x8 pixel are of the main surface. So each 8x8 bytes of the CCS
>>+ * (1 cacheline) will match an area of 4x2 tiles on the main surface.
>>+ *
>>+ * Here is the layout of a full CCS tile, with the 8 byte strips numbered 0-511:
>>+ * ------------------------
>>+ * |  0 |  64 | ... | 448 |
>>+ * |  1 |  65 |     | 449 |
>>+ * |  2 |  66 |     | 450 |
>>+ * |  . |   . |     |   . |
>>+ * |  . |   . |     |   . |
>>+ * |  . |   . |     |   . |
>>+ * | 63 | 127 |     | 511 |
>>+ * ------------------------
>>+ *
>>+ * This will match an area of 1024x512 pixels on the main surface.
>>+ *
>>+ * The CCS plane pitch must be a multiple of the CCS tile width (64 bytes),
>>+ * and for the purposes of the CCS plane offset we assume cpp==1. As each
>>+ * byte matches a 16x8 area of the main surface, the dimensions of the CCS
>>+ * plane will thus appear to be width/16 x height/8. Both planes must be
>>+ * contained within the same gem object.
>>+ */
>>+#define I915_FORMAT_MOD_Y_TILED_CCS	fourcc_mod_code(INTEL, 4)
>>+#define I915_FORMAT_MOD_Yf_TILED_CCS	fourcc_mod_code(INTEL, 5)
>
>Are we sure this is better than reserving some bits for tiling mode 
>and having CCS as separate bit flag? IMHO code is already a bit 
>unsightly with this scheme and it would take just one more orthogonal 
>but simultaneously usable set of modifiers to drown us in 
>permutations. We got plenty of bits available.
>
>Regards,
>
>Tvrtko

Adding Kristian...

Obviously it's a matter of opinion but I certainly don't think it's unsightly
now. I agree that if we have 1 more orthogonal modifier makes it bad, and 2
pretty much make it terrible which was actually what I liked about having the
per plane modifiers. Anyway, it seems everyone has mostly agreed on this
already and I'd propose we move forward and say that if this scheme doesn't
work, we bail for addfb3.

I'm not entirely convinced we need a Yf even now....

Chad Versace Feb. 26, 2017, 10:41 p.m. UTC | #3

On Wed 04 Jan 2017, ville.syrjala@linux.intel.com wrote:
> From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> 
> SKL+ display engine can scan out certain kinds of compressed surfaces
> produced by the render engine. This involved telling the display engine
> the location of the color control surfae (CCS) which describes which
> parts of the main surface are compressed and which are not. The location
> of CCS is provided by userspace as just another plane with its own offset.
> 
> By providing our own format information for the CCS formats, we should
> be able to make framebuffer_check() do the right thing for the CCS
> surface as well.
> 
> Note that we'll return the same format info for both Y and Yf tiled
> format as that's what happens with the non-CCS Y vs. Yf as well. If
> desired, we could potentially return a unique pointer for each
> pixel_format+tiling+ccs combination, in which case we immediately be
> able to tell if any of that stuff changed by just comparing the
> pointers. But that does sound a bit wasteful space wise.
> 
> v2: Drop the 'dev' argument from the hook
> v3: Include the description of the CCS surface layout
> 
> Cc: Vandana Kannan <vandana.kannan@intel.com>
> Cc: Daniel Vetter <daniel@ffwll.ch>
> Cc: Ben Widawsky <ben@bwidawsk.net>
> Cc: Jason Ekstrand <jason@jlekstrand.net>
> Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
>  include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
>  2 files changed, 85 insertions(+)


> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 9e1bb7fabcde..4581e3d41e5c 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -230,6 +230,55 @@ extern "C" {
>  #define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
>  
>  /*
> + * Intel color control surface (CCS) for render compression
> + *
> + * The framebuffer format must be one of the 8:8:8:8 RGB formats,
> + * the main surface will be plane index 0 and will be Y/Yf-tiled,
> + * the CCS will be plane index 1.
> + *

The paragraph below is...

> + * Each byte of CCS contains 4 pairs of bits, with each pair of bits
> + * matching an area of 8x4 pixels of the main surface. Which would seem
> + * to match 2 cachelines containing 4x4 pixels each. The pairs bits within
> + * the byte form a 2x2 grid, which thus matches a 16x8 pixel area of the
> + * main surface. This is the 2x2 pattern the bits form (0=lsb, 7=msb):
> + * -----------
> + * | 01 | 23 |
> + *  ----------
> + * | 45 | 67 |
> + * -----------

...almost correct. The hw docs state that each bit-pair of the CCS maps
cacheline-pair, horizontally adjacent in the Y tile, of the primary surface. As
a consequence, the remainder of the above paragraph is correct for 32-bit
formats, but not others.

This is not a nitpick, because Vulkan and EGL users may want to share
dma_bufs with a fat format like R32G32B32A32_UNORM, and want to have CCS
enabled when possible. As long as the users use the dma_buf only the 3D
engine, and don't submit it to KMS, it's all safe.

For those users, we should document the bit-pair/cacheline-pair relationship
correctly. And then preface the following detailed explanation and nice ascii
diagrams by saying "this applies only to the 32-bit formats".

Here's the relevant PRM quote:

     The Color Control Surface (CCS) contains the compression status
     of the cache-line pairs. The compression state of the cache-line
     pair is specified by 2 bits in the CCS.  Each CCS cache-line
     represents an area on the main surface of 16x16 sets of 128 byte
     Y-tiled cache-line-pairs. CCS is always Y tiled.

See this Mesa comment for more details:
https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/isl/isl.c?h=17.0#n211

> + * Actually only the lower bit of the pair seems to have any effect.
> + * No idea why. 0 in the lower bit would seem to mean not compressed,
> + * and 1 is compressed.  The interpreation of the main surface data
> + * when the block is marked compressed is unknown as of now.

If I recall correctly (it's been several months since I inspected the
bits), the high bit is actually significant. Bit pattern 11 means that
the data in primary surface's cacheline-pair is invalid; the 3D engine
interprets the color of all pixels in that cacheline-pair to be the
clear color stored in RENDER_SURFACE_STATE. Before the display engine
can consume the surface, userspace is required to do a partial resolve,
flushing the clear color into the primary surface. So it makes sense
that the kernel would never observe that bit pattern in an incoming ccs
surface, as long as userspace is doing its job correctly. And it makes
sense that the display engine would ignore the high bit, because there is
no way to provide the clear color to the display engine (at least
according my reading of the docs).

Jason, does this match your understanding of the high bit?

> + *
> + * CCS tile is laid out in 8 byte horizontal strips each strip thus corresponds
> + * to a 128x8 pixel are of the main surface. So each 8x8 bytes of the CCS
> + * (1 cacheline) will match an area of 4x2 tiles on the main surface.
> + *
> + * Here is the layout of a full CCS tile, with the 8 byte strips numbered 0-511:
> + * ------------------------
> + * |  0 |  64 | ... | 448 |
> + * |  1 |  65 |     | 449 |
> + * |  2 |  66 |     | 450 |
> + * |  . |   . |     |   . |
> + * |  . |   . |     |   . |
> + * |  . |   . |     |   . |
> + * | 63 | 127 |     | 511 |
> + * ------------------------
> + *
> + * This will match an area of 1024x512 pixels on the main surface.
> + *
> + * The CCS plane pitch must be a multiple of the CCS tile width (64 bytes),
> + * and for the purposes of the CCS plane offset we assume cpp==1. As each
> + * byte matches a 16x8 area of the main surface, the dimensions of the CCS
> + * plane will thus appear to be width/16 x height/8. Both planes must be
> + * contained within the same gem object.

Again, the above paragraphs should clarify that they apply only to 32-bit formats.

> + */
> +#define I915_FORMAT_MOD_Y_TILED_CCS	fourcc_mod_code(INTEL, 4)
> +#define I915_FORMAT_MOD_Yf_TILED_CCS	fourcc_mod_code(INTEL, 5)

Ville Syrjälä Feb. 27, 2017, 3:13 p.m. UTC | #4

On Sun, Feb 26, 2017 at 02:41:50PM -0800, Chad Versace wrote:
> On Wed 04 Jan 2017, ville.syrjala@linux.intel.com wrote:
> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > 
> > SKL+ display engine can scan out certain kinds of compressed surfaces
> > produced by the render engine. This involved telling the display engine
> > the location of the color control surfae (CCS) which describes which
> > parts of the main surface are compressed and which are not. The location
> > of CCS is provided by userspace as just another plane with its own offset.
> > 
> > By providing our own format information for the CCS formats, we should
> > be able to make framebuffer_check() do the right thing for the CCS
> > surface as well.
> > 
> > Note that we'll return the same format info for both Y and Yf tiled
> > format as that's what happens with the non-CCS Y vs. Yf as well. If
> > desired, we could potentially return a unique pointer for each
> > pixel_format+tiling+ccs combination, in which case we immediately be
> > able to tell if any of that stuff changed by just comparing the
> > pointers. But that does sound a bit wasteful space wise.
> > 
> > v2: Drop the 'dev' argument from the hook
> > v3: Include the description of the CCS surface layout
> > 
> > Cc: Vandana Kannan <vandana.kannan@intel.com>
> > Cc: Daniel Vetter <daniel@ffwll.ch>
> > Cc: Ben Widawsky <ben@bwidawsk.net>
> > Cc: Jason Ekstrand <jason@jlekstrand.net>
> > Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
> > ---
> >  drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
> >  include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
> >  2 files changed, 85 insertions(+)
> 
> 
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 9e1bb7fabcde..4581e3d41e5c 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -230,6 +230,55 @@ extern "C" {
> >  #define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
> >  
> >  /*
> > + * Intel color control surface (CCS) for render compression
> > + *
> > + * The framebuffer format must be one of the 8:8:8:8 RGB formats,
> > + * the main surface will be plane index 0 and will be Y/Yf-tiled,
> > + * the CCS will be plane index 1.
> > + *
> 
> The paragraph below is...
> 
> > + * Each byte of CCS contains 4 pairs of bits, with each pair of bits
> > + * matching an area of 8x4 pixels of the main surface. Which would seem
> > + * to match 2 cachelines containing 4x4 pixels each. The pairs bits within
> > + * the byte form a 2x2 grid, which thus matches a 16x8 pixel area of the
> > + * main surface. This is the 2x2 pattern the bits form (0=lsb, 7=msb):
> > + * -----------
> > + * | 01 | 23 |
> > + *  ----------
> > + * | 45 | 67 |
> > + * -----------
> 
> ...almost correct. The hw docs state that each bit-pair of the CCS maps
> cacheline-pair, horizontally adjacent in the Y tile, of the primary surface. As
> a consequence, the remainder of the above paragraph is correct for 32-bit
> formats, but not others.

Which is why the comment says at the very top that the fb needs to
use a 8:8:8:8 format. IIRC that's the only thing the hardware supports.

> 
> This is not a nitpick, because Vulkan and EGL users may want to share
> dma_bufs with a fat format like R32G32B32A32_UNORM, and want to have CCS
> enabled when possible. As long as the users use the dma_buf only the 3D
> engine, and don't submit it to KMS, it's all safe.
> 
> For those users, we should document the bit-pair/cacheline-pair relationship
> correctly. And then preface the following detailed explanation and nice ascii
> diagrams by saying "this applies only to the 32-bit formats".
> 
> Here's the relevant PRM quote:
> 
>      The Color Control Surface (CCS) contains the compression status
>      of the cache-line pairs. The compression state of the cache-line
>      pair is specified by 2 bits in the CCS.  Each CCS cache-line
>      represents an area on the main surface of 16x16 sets of 128 byte
>      Y-tiled cache-line-pairs. CCS is always Y tiled.
> 
> See this Mesa comment for more details:
> https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/isl/isl.c?h=17.0#n211
> 
> > + * Actually only the lower bit of the pair seems to have any effect.
> > + * No idea why. 0 in the lower bit would seem to mean not compressed,
> > + * and 1 is compressed.  The interpreation of the main surface data
> > + * when the block is marked compressed is unknown as of now.
> 
> If I recall correctly (it's been several months since I inspected the
> bits), the high bit is actually significant. Bit pattern 11 means that
> the data in primary surface's cacheline-pair is invalid; the 3D engine
> interprets the color of all pixels in that cacheline-pair to be the
> clear color stored in RENDER_SURFACE_STATE. Before the display engine
> can consume the surface, userspace is required to do a partial resolve,
> flushing the clear color into the primary surface. So it makes sense
> that the kernel would never observe that bit pattern in an incoming ccs
> surface, as long as userspace is doing its job correctly. And it makes
> sense that the display engine would ignore the high bit, because there is
> no way to provide the clear color to the display engine (at least
> according my reading of the docs).
> 
> Jason, does this match your understanding of the high bit?
> 
> > + *
> > + * CCS tile is laid out in 8 byte horizontal strips each strip thus corresponds
> > + * to a 128x8 pixel are of the main surface. So each 8x8 bytes of the CCS
> > + * (1 cacheline) will match an area of 4x2 tiles on the main surface.
> > + *
> > + * Here is the layout of a full CCS tile, with the 8 byte strips numbered 0-511:
> > + * ------------------------
> > + * |  0 |  64 | ... | 448 |
> > + * |  1 |  65 |     | 449 |
> > + * |  2 |  66 |     | 450 |
> > + * |  . |   . |     |   . |
> > + * |  . |   . |     |   . |
> > + * |  . |   . |     |   . |
> > + * | 63 | 127 |     | 511 |
> > + * ------------------------
> > + *
> > + * This will match an area of 1024x512 pixels on the main surface.
> > + *
> > + * The CCS plane pitch must be a multiple of the CCS tile width (64 bytes),
> > + * and for the purposes of the CCS plane offset we assume cpp==1. As each
> > + * byte matches a 16x8 area of the main surface, the dimensions of the CCS
> > + * plane will thus appear to be width/16 x height/8. Both planes must be
> > + * contained within the same gem object.
> 
> Again, the above paragraphs should clarify that they apply only to 32-bit formats.
> 
> > + */
> > +#define I915_FORMAT_MOD_Y_TILED_CCS	fourcc_mod_code(INTEL, 4)
> > +#define I915_FORMAT_MOD_Yf_TILED_CCS	fourcc_mod_code(INTEL, 5)

Ben Widawsky Feb. 28, 2017, 5:36 a.m. UTC | #5

On 17-02-27 17:13:41, Ville Syrjälä wrote:
>On Sun, Feb 26, 2017 at 02:41:50PM -0800, Chad Versace wrote:
>> On Wed 04 Jan 2017, ville.syrjala@linux.intel.com wrote:
>> > From: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> >
>> > SKL+ display engine can scan out certain kinds of compressed surfaces
>> > produced by the render engine. This involved telling the display engine
>> > the location of the color control surfae (CCS) which describes which
>> > parts of the main surface are compressed and which are not. The location
>> > of CCS is provided by userspace as just another plane with its own offset.
>> >
>> > By providing our own format information for the CCS formats, we should
>> > be able to make framebuffer_check() do the right thing for the CCS
>> > surface as well.
>> >
>> > Note that we'll return the same format info for both Y and Yf tiled
>> > format as that's what happens with the non-CCS Y vs. Yf as well. If
>> > desired, we could potentially return a unique pointer for each
>> > pixel_format+tiling+ccs combination, in which case we immediately be
>> > able to tell if any of that stuff changed by just comparing the
>> > pointers. But that does sound a bit wasteful space wise.
>> >
>> > v2: Drop the 'dev' argument from the hook
>> > v3: Include the description of the CCS surface layout
>> >
>> > Cc: Vandana Kannan <vandana.kannan@intel.com>
>> > Cc: Daniel Vetter <daniel@ffwll.ch>
>> > Cc: Ben Widawsky <ben@bwidawsk.net>
>> > Cc: Jason Ekstrand <jason@jlekstrand.net>
>> > Reviewed-by: Ben Widawsky <ben@bwidawsk.net>
>> > Signed-off-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
>> > ---
>> >  drivers/gpu/drm/i915/intel_display.c | 36 ++++++++++++++++++++++++++
>> >  include/uapi/drm/drm_fourcc.h        | 49 ++++++++++++++++++++++++++++++++++++
>> >  2 files changed, 85 insertions(+)
>>
>>
>> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
>> > index 9e1bb7fabcde..4581e3d41e5c 100644
>> > --- a/include/uapi/drm/drm_fourcc.h
>> > +++ b/include/uapi/drm/drm_fourcc.h
>> > @@ -230,6 +230,55 @@ extern "C" {
>> >  #define I915_FORMAT_MOD_Yf_TILED fourcc_mod_code(INTEL, 3)
>> >
>> >  /*
>> > + * Intel color control surface (CCS) for render compression
>> > + *
>> > + * The framebuffer format must be one of the 8:8:8:8 RGB formats,
>> > + * the main surface will be plane index 0 and will be Y/Yf-tiled,
>> > + * the CCS will be plane index 1.
>> > + *
>>
>> The paragraph below is...
>>
>> > + * Each byte of CCS contains 4 pairs of bits, with each pair of bits
>> > + * matching an area of 8x4 pixels of the main surface. Which would seem
>> > + * to match 2 cachelines containing 4x4 pixels each. The pairs bits within
>> > + * the byte form a 2x2 grid, which thus matches a 16x8 pixel area of the
>> > + * main surface. This is the 2x2 pattern the bits form (0=lsb, 7=msb):
>> > + * -----------
>> > + * | 01 | 23 |
>> > + *  ----------
>> > + * | 45 | 67 |
>> > + * -----------
>>
>> ...almost correct. The hw docs state that each bit-pair of the CCS maps
>> cacheline-pair, horizontally adjacent in the Y tile, of the primary surface. As
>> a consequence, the remainder of the above paragraph is correct for 32-bit
>> formats, but not others.
>
>Which is why the comment says at the very top that the fb needs to
>use a 8:8:8:8 format. IIRC that's the only thing the hardware supports.
>

It is, and it is for the foreseeable future too. Chad, granted you say this
isn't a nitpick below, but it is because Ville's patch is for the KMS case, it's
not entirely relevant here.


>>
>> This is not a nitpick, because Vulkan and EGL users may want to share
>> dma_bufs with a fat format like R32G32B32A32_UNORM, and want to have CCS
>> enabled when possible. As long as the users use the dma_buf only the 3D
>> engine, and don't submit it to KMS, it's all safe.
>>
>> For those users, we should document the bit-pair/cacheline-pair relationship
>> correctly. And then preface the following detailed explanation and nice ascii
>> diagrams by saying "this applies only to the 32-bit formats".
>>
>> Here's the relevant PRM quote:
>>
>>      The Color Control Surface (CCS) contains the compression status
>>      of the cache-line pairs. The compression state of the cache-line
>>      pair is specified by 2 bits in the CCS.  Each CCS cache-line
>>      represents an area on the main surface of 16x16 sets of 128 byte
>>      Y-tiled cache-line-pairs. CCS is always Y tiled.
>>
>> See this Mesa comment for more details:
>> https://cgit.freedesktop.org/mesa/mesa/tree/src/intel/isl/isl.c?h=17.0#n211
>>
>> > + * Actually only the lower bit of the pair seems to have any effect.
>> > + * No idea why. 0 in the lower bit would seem to mean not compressed,
>> > + * and 1 is compressed.  The interpreation of the main surface data
>> > + * when the block is marked compressed is unknown as of now.
>>
>> If I recall correctly (it's been several months since I inspected the
>> bits), the high bit is actually significant. Bit pattern 11 means that
>> the data in primary surface's cacheline-pair is invalid; the 3D engine
>> interprets the color of all pixels in that cacheline-pair to be the
>> clear color stored in RENDER_SURFACE_STATE. Before the display engine
>> can consume the surface, userspace is required to do a partial resolve,
>> flushing the clear color into the primary surface. So it makes sense
>> that the kernel would never observe that bit pattern in an incoming ccs
>> surface, as long as userspace is doing its job correctly. And it makes
>> sense that the display engine would ignore the high bit, because there is
>> no way to provide the clear color to the display engine (at least
>> according my reading of the docs).
>>
>> Jason, does this match your understanding of the high bit?
>>
>> > + *
>> > + * CCS tile is laid out in 8 byte horizontal strips each strip thus corresponds
>> > + * to a 128x8 pixel are of the main surface. So each 8x8 bytes of the CCS
>> > + * (1 cacheline) will match an area of 4x2 tiles on the main surface.
>> > + *
>> > + * Here is the layout of a full CCS tile, with the 8 byte strips numbered 0-511:
>> > + * ------------------------
>> > + * |  0 |  64 | ... | 448 |
>> > + * |  1 |  65 |     | 449 |
>> > + * |  2 |  66 |     | 450 |
>> > + * |  . |   . |     |   . |
>> > + * |  . |   . |     |   . |
>> > + * |  . |   . |     |   . |
>> > + * | 63 | 127 |     | 511 |
>> > + * ------------------------
>> > + *
>> > + * This will match an area of 1024x512 pixels on the main surface.
>> > + *
>> > + * The CCS plane pitch must be a multiple of the CCS tile width (64 bytes),
>> > + * and for the purposes of the CCS plane offset we assume cpp==1. As each
>> > + * byte matches a 16x8 area of the main surface, the dimensions of the CCS
>> > + * plane will thus appear to be width/16 x height/8. Both planes must be
>> > + * contained within the same gem object.
>>
>> Again, the above paragraphs should clarify that they apply only to 32-bit formats.
>>
>> > + */
>> > +#define I915_FORMAT_MOD_Y_TILED_CCS	fourcc_mod_code(INTEL, 4)
>> > +#define I915_FORMAT_MOD_Yf_TILED_CCS	fourcc_mod_code(INTEL, 5)
>
>-- 
>Ville Syrjälä
>Intel OTC

[8/9] drm/i915: Implement .get_format_info() hook for CCS

Commit Message

Comments

Patch