diff mbox series

[RFC] drm/amdgpu: Add macros and documentation for format modifiers.

Message ID 20180904010033.67611-1-bas@basnieuwenhuizen.nl (mailing list archive)
State New, archived
Headers show
Series [RFC] drm/amdgpu: Add macros and documentation for format modifiers. | expand

Commit Message

Bas Nieuwenhuizen Sept. 4, 2018, 1 a.m. UTC
This is an initial proposal for format modifiers for AMD hardware.

It uses 48 bits including a chip generation, leaving 8 bits for
a format version number.

On gfx6-gfx8 we have all the fields influencing sample locations
in memory.

Tile split bytes are optional for single sample buffers as no
hardware reaches the split size with 1 sample and hence the actual
size does not matter.

The macrotile fields are duplicated for images with multiple planes.
If the planes have different bitdepth they need different macro
tile fields and different tile split bytes if multisample.

I could not fit multiple copies in for tile split bytes, but
multisample & multiplane images are very rare. Overall, I think
we should punt on multisample for a later format version since
they are generally not shared on any modifier aware windowing
system, and we have more issues like fmask & cmask.

We need these copies because the drm modifier of all planes in an
image needs to be equal, so we need to fit these together.

This adds fields for compression support, using metadata that is
compatible with AMDVLK and for which radv and radeonsi can
reasonably be extended.

The big open question for compression is between which generations
the format changed to see if we can share more.

This explicitly does not try to solve the linear stride alignment
issue, thoguh we could internally just use the tiling modes for
the linear modes to indicate linear images with the stride for the
given chip.

Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
CC: Chad Versace <chadversary@chromium.org>
CC: Dave Airlie <airlied@redhat.com>
CC: Marek Olšák <marek.olsak@amd.com>
CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
CC: Alex Deucher <alexander.deucher@amd.com>
CC: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++
 1 file changed, 130 insertions(+)

Comments

Christian König Sept. 4, 2018, 7:10 a.m. UTC | #1
Am 04.09.2018 um 03:00 schrieb Bas Nieuwenhuizen:
> This is an initial proposal for format modifiers for AMD hardware.
>
> It uses 48 bits including a chip generation, leaving 8 bits for
> a format version number.

I'm absolutely not an expert on this, but as far as I know the major 
problem with this approach was that we couldn't fit all necessary 
parameter into a 64bit identifier.

The last thing I've heard was something like we need at least 128bit, 
better 256. But Marek and Alex certainly know more about that stuff.

Christian.

>
> On gfx6-gfx8 we have all the fields influencing sample locations
> in memory.
>
> Tile split bytes are optional for single sample buffers as no
> hardware reaches the split size with 1 sample and hence the actual
> size does not matter.
>
> The macrotile fields are duplicated for images with multiple planes.
> If the planes have different bitdepth they need different macro
> tile fields and different tile split bytes if multisample.
>
> I could not fit multiple copies in for tile split bytes, but
> multisample & multiplane images are very rare. Overall, I think
> we should punt on multisample for a later format version since
> they are generally not shared on any modifier aware windowing
> system, and we have more issues like fmask & cmask.
>
> We need these copies because the drm modifier of all planes in an
> image needs to be equal, so we need to fit these together.
>
> This adds fields for compression support, using metadata that is
> compatible with AMDVLK and for which radv and radeonsi can
> reasonably be extended.
>
> The big open question for compression is between which generations
> the format changed to see if we can share more.
>
> This explicitly does not try to solve the linear stride alignment
> issue, thoguh we could internally just use the tiling modes for
> the linear modes to indicate linear images with the stride for the
> given chip.
>
> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> CC: Chad Versace <chadversary@chromium.org>
> CC: Dave Airlie <airlied@redhat.com>
> CC: Marek Olšák <marek.olsak@amd.com>
> CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
> CC: Alex Deucher <alexander.deucher@amd.com>
> CC: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>   include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++
>   1 file changed, 130 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 94444eeba55b..4e1452161dbf 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table {
>   #define AMDGPU_FAMILY_AI			141 /* Vega10 */
>   #define AMDGPU_FAMILY_RV			142 /* Raven */
>   
> +#define AMDGPU_CHIP_TAHITI	0
> +#define AMDGPU_CHIP_PITCAIRN	1
> +#define AMDGPU_CHIP_VERDE	2
> +#define AMDGPU_CHIP_OLAND	3
> +#define AMDGPU_CHIP_HAINAN	4
> +#define AMDGPU_CHIP_BONAIRE	5
> +#define AMDGPU_CHIP_KAVERI	6
> +#define AMDGPU_CHIP_KABINI	7
> +#define AMDGPU_CHIP_HAWAII	8
> +#define AMDGPU_CHIP_MULLINS	9
> +#define AMDGPU_CHIP_TOPAZ	10
> +#define AMDGPU_CHIP_TONGA	11
> +#define AMDGPU_CHIP_FIJI	12
> +#define AMDGPU_CHIP_CARRIZO	13
> +#define AMDGPU_CHIP_STONEY	14
> +#define AMDGPU_CHIP_POLARIS10	15
> +#define AMDGPU_CHIP_POLARIS11	16
> +#define AMDGPU_CHIP_POLARIS12	17
> +#define AMDGPU_CHIP_VEGAM	18
> +#define AMDGPU_CHIP_VEGA10	19
> +#define AMDGPU_CHIP_VEGA12	20
> +#define AMDGPU_CHIP_VEGA20	21
> +#define AMDGPU_CHIP_RAVEN	22
> +
> +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same
> + * as AMDGPU_TILING_*. However, the the rules as to when to set them are
> + * different.
> + *
> + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR
> + * instead.
> + *
> + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be
> + * set.
> + *
> + * For other ARRAY_MODEs:
> + *  - Only set TILE_SPLIT if the image is multisample.
> + *
> + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
> + * different value there. The values are
> + *   - depth           : 0
> + *   - displayable     : 1
> + *   - thin            : 2
> + *   - thick (GFX6)    : 3
> + *   - rotated (GFX7+) : 4
> + *
> + * TODO: What to do with multisample multi plane images? More tile split
> + * fields don't fit if we want to keep a few bits for a format version.
> + */
> +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT		0
> +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK		0xf
> +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT		4
> +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK		0x1f
> +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT		9
> +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK		0x7
> +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT	12
> +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK	0x7
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT		15
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK		0x3
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT		17
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK		0x3
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT	19
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK	0x3
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT		21
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK		0x3
> +
> +/* Macrotile parameters for a second plane if existing */
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT		23
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK		0x3
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT	25
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK		0x3
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT	27
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK	0x3
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT		29
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK		0x3
> +
> +/* Macrotile parameters for a third plane if existing */
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT		31
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK		0x3
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT	33
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK		0x3
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT	35
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK	0x3
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT		37
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK		0x3
> +
> +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT		0
> +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK		0x1f
> +
> +/* Whether to enable DCC compression.
> + *
> + * If enabled, exporting the surface results in three
> + * planes:
> + *   - color data
> + *   - DCC data
> + *   - a 64-byte block with
> + *     - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed.
> + *     - a 16-byte 0/1 bool as to whether the surface has fastclear data
> + *     - a 8-byte chunk with the current fastclear colors
> + *
> + * To ensure we do not keep compressing and decompressing the surface, once it
> + * has been decompressed no party may recompress again.
> + *
> + * Applications should not hand over images with fastclear data as not
> + * all users can support it, however, to help both Vulkan implementations
> + * with the allocation we keep it in the 64-byte block.
> + *
> + * TODO: Can scanout really not support fastclear data?
> + * TODO: What to do with multiplane images?
> + */
> +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT		39
> +#define AMDGPU_MODIFIER_COMPRESSION_MASK		0x1
> +
> +/* The chip this is compatible with.
> + *
> + * If compression is disabled, use
> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> + *
> + * With compression enabled please use the exact chip.
> + *
> + * TODO: Do some generations share DCC format?
> + */
> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT			40
> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK			0xff
> +
> +#define AMDGPU_MODIFIER_SET(field, value) \
> +	(((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT)
> +#define AMDGPU_MODIFIER_GET(value, field) \
> +	(((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK)
> +
>   /*
>    * Definition of free sync enter and exit signals
>    * We may have more options in the future
Daniel Vetter Sept. 4, 2018, 10:04 a.m. UTC | #2
On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
> This is an initial proposal for format modifiers for AMD hardware.
>
> It uses 48 bits including a chip generation, leaving 8 bits for
> a format version number.
>
> On gfx6-gfx8 we have all the fields influencing sample locations
> in memory.
>
> Tile split bytes are optional for single sample buffers as no
> hardware reaches the split size with 1 sample and hence the actual
> size does not matter.
>
> The macrotile fields are duplicated for images with multiple planes.
> If the planes have different bitdepth they need different macro
> tile fields and different tile split bytes if multisample.
>
> I could not fit multiple copies in for tile split bytes, but
> multisample & multiplane images are very rare. Overall, I think
> we should punt on multisample for a later format version since
> they are generally not shared on any modifier aware windowing
> system, and we have more issues like fmask & cmask.
>
> We need these copies because the drm modifier of all planes in an
> image needs to be equal, so we need to fit these together.
>
> This adds fields for compression support, using metadata that is
> compatible with AMDVLK and for which radv and radeonsi can
> reasonably be extended.
>
> The big open question for compression is between which generations
> the format changed to see if we can share more.
>
> This explicitly does not try to solve the linear stride alignment
> issue, thoguh we could internally just use the tiling modes for
> the linear modes to indicate linear images with the stride for the
> given chip.
>
> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> CC: Chad Versace <chadversary@chromium.org>
> CC: Dave Airlie <airlied@redhat.com>
> CC: Marek Olšák <marek.olsak@amd.com>
> CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
> CC: Alex Deucher <alexander.deucher@amd.com>
> CC: Daniel Vetter <daniel.vetter@ffwll.ch>
> ---
>  include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++
>  1 file changed, 130 insertions(+)
>
> diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> index 94444eeba55b..4e1452161dbf 100644
> --- a/include/uapi/drm/amdgpu_drm.h
> +++ b/include/uapi/drm/amdgpu_drm.h
> @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table {
>  #define AMDGPU_FAMILY_AI                       141 /* Vega10 */
>  #define AMDGPU_FAMILY_RV                       142 /* Raven */
>
> +#define AMDGPU_CHIP_TAHITI     0
> +#define AMDGPU_CHIP_PITCAIRN   1
> +#define AMDGPU_CHIP_VERDE      2
> +#define AMDGPU_CHIP_OLAND      3
> +#define AMDGPU_CHIP_HAINAN     4
> +#define AMDGPU_CHIP_BONAIRE    5
> +#define AMDGPU_CHIP_KAVERI     6
> +#define AMDGPU_CHIP_KABINI     7
> +#define AMDGPU_CHIP_HAWAII     8
> +#define AMDGPU_CHIP_MULLINS    9
> +#define AMDGPU_CHIP_TOPAZ      10
> +#define AMDGPU_CHIP_TONGA      11
> +#define AMDGPU_CHIP_FIJI       12
> +#define AMDGPU_CHIP_CARRIZO    13
> +#define AMDGPU_CHIP_STONEY     14
> +#define AMDGPU_CHIP_POLARIS10  15
> +#define AMDGPU_CHIP_POLARIS11  16
> +#define AMDGPU_CHIP_POLARIS12  17
> +#define AMDGPU_CHIP_VEGAM      18
> +#define AMDGPU_CHIP_VEGA10     19
> +#define AMDGPU_CHIP_VEGA12     20
> +#define AMDGPU_CHIP_VEGA20     21
> +#define AMDGPU_CHIP_RAVEN      22
> +
> +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same
> + * as AMDGPU_TILING_*. However, the the rules as to when to set them are
> + * different.
> + *
> + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR
> + * instead.
> + *
> + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be
> + * set.
> + *
> + * For other ARRAY_MODEs:
> + *  - Only set TILE_SPLIT if the image is multisample.
> + *
> + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
> + * different value there. The values are
> + *   - depth           : 0
> + *   - displayable     : 1
> + *   - thin            : 2
> + *   - thick (GFX6)    : 3
> + *   - rotated (GFX7+) : 4
> + *
> + * TODO: What to do with multisample multi plane images? More tile split
> + * fields don't fit if we want to keep a few bits for a format version.
> + */
> +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT          0
> +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK           0xf
> +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT         4
> +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK          0x1f
> +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT          9
> +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK           0x7
> +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT     12
> +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK      0x7
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT          15
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK           0x3
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT         17
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK          0x3
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT   19
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK    0x3
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT           21
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK            0x3
> +
> +/* Macrotile parameters for a second plane if existing */
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT                23
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK         0x3
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT       25
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK                0x3
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK  0x3
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT         29
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK          0x3
> +
> +/* Macrotile parameters for a third plane if existing */
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT                31
> +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK         0x3
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT       33
> +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK                0x3
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35
> +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK  0x3
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT         37
> +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK          0x3
> +
> +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT                0
> +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK         0x1f
> +
> +/* Whether to enable DCC compression.
> + *
> + * If enabled, exporting the surface results in three
> + * planes:
> + *   - color data
> + *   - DCC data
> + *   - a 64-byte block with
> + *     - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed.
> + *     - a 16-byte 0/1 bool as to whether the surface has fastclear data
> + *     - a 8-byte chunk with the current fastclear colors
> + *
> + * To ensure we do not keep compressing and decompressing the surface, once it
> + * has been decompressed no party may recompress again.
> + *
> + * Applications should not hand over images with fastclear data as not
> + * all users can support it, however, to help both Vulkan implementations
> + * with the allocation we keep it in the 64-byte block.
> + *
> + * TODO: Can scanout really not support fastclear data?
> + * TODO: What to do with multiplane images?
> + */
> +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT              39
> +#define AMDGPU_MODIFIER_COMPRESSION_MASK               0x1

Summary of the irc discussion with my 2 recommendations:

- Make it clear in the spec here that all consumers must follow the 16
byte boolean to control actual DCC usage (using a conditional register
write in the command buffer). On intel we've put that into the
modifier itself, along the lines of "I don't like compressed anymore,
pls give me something else". Not that runtime renogiation works well
(or well, at all).

- If you have consumers which can't handle fastclear, make it a
separate modifier.

> +
> +/* The chip this is compatible with.
> + *
> + * If compression is disabled, use
> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> + *
> + * With compression enabled please use the exact chip.
> + *
> + * TODO: Do some generations share DCC format?
> + */
> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff

Do you really need all the combinations here of DCC + gpu gen + tiling
details? When we had the entire discussion with nvidia folks they
eventually agreed that they don't need the massive pile with every
possible combination. Do you really plan to share all these different
things?

Note that e.g. on i915 we spec some of the tiling depending upon
buffer size and buffer format (because that's how the hw works), not
using explicit modifier flags for everything.
-Daniel

> +
> +#define AMDGPU_MODIFIER_SET(field, value) \
> +       (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT)
> +#define AMDGPU_MODIFIER_GET(value, field) \
> +       (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK)
> +
>  /*
>   * Definition of free sync enter and exit signals
>   * We may have more options in the future
> --
> 2.18.0
>
Daniel Stone Sept. 4, 2018, 10:15 a.m. UTC | #3
Hi,

On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> > +/* The chip this is compatible with.
> > + *
> > + * If compression is disabled, use
> > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > + *
> > + * With compression enabled please use the exact chip.
> > + *
> > + * TODO: Do some generations share DCC format?
> > + */
> > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>
> Do you really need all the combinations here of DCC + gpu gen + tiling
> details? When we had the entire discussion with nvidia folks they
> eventually agreed that they don't need the massive pile with every
> possible combination. Do you really plan to share all these different
> things?
>
> Note that e.g. on i915 we spec some of the tiling depending upon
> buffer size and buffer format (because that's how the hw works), not
> using explicit modifier flags for everything.

Right. The conclusion, after people went through and started sorting
out the kinds of formats for which they would _actually_ export real
colour buffers for, that most vendors definitely have fewer than
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
possible formats to represent, very likely fewer than
340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
fewer than 72,057,594,037,927,936 formats, and even still generally
fewer than 281,474,976,710,656 if you want to be generous and leave 8
bits of the 56 available.

If you do use 256 bits in order to represent
115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
modifiers per format, userspace would start hitting OOM pretty quickly
as it attempted to enumerate and negotiate acceptable modifiers.
Either that or we need to replace the fixed 64-bit modifier tokens
with some kind of eBPF script.

Cheers,
Daniel
Christian König Sept. 4, 2018, 10:44 a.m. UTC | #4
Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> Hi,
>
> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>>> +/* The chip this is compatible with.
>>> + *
>>> + * If compression is disabled, use
>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>> + *
>>> + * With compression enabled please use the exact chip.
>>> + *
>>> + * TODO: Do some generations share DCC format?
>>> + */
>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>> Do you really need all the combinations here of DCC + gpu gen + tiling
>> details? When we had the entire discussion with nvidia folks they
>> eventually agreed that they don't need the massive pile with every
>> possible combination. Do you really plan to share all these different
>> things?
>>
>> Note that e.g. on i915 we spec some of the tiling depending upon
>> buffer size and buffer format (because that's how the hw works), not
>> using explicit modifier flags for everything.
> Right. The conclusion, after people went through and started sorting
> out the kinds of formats for which they would _actually_ export real
> colour buffers for, that most vendors definitely have fewer than
> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> possible formats to represent, very likely fewer than
> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> fewer than 72,057,594,037,927,936 formats, and even still generally
> fewer than 281,474,976,710,656 if you want to be generous and leave 8
> bits of the 56 available.

The problem here is that at least for some parameters we actually don't 
know which formats are actually used.

The following are not real world examples, but just to explain the 
general problem.

The memory configuration for example can be not ASIC specific, but 
rather determined by whoever took the ASIC and glued it together with 
VRAM on a board. It is not likely that somebody puts all the VRAM chips 
on one channel, but it is still perfectly possible.

Same is true for things like harvesting, e.g. of 16 channels halve of 
them could be bad and we need to know which to actually use.

Regards,
Christian.

>
> If you do use 256 bits in order to represent
> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> modifiers per format, userspace would start hitting OOM pretty quickly
> as it attempted to enumerate and negotiate acceptable modifiers.
> Either that or we need to replace the fixed 64-bit modifier tokens
> with some kind of eBPF script.
>
> Cheers,
> Daniel
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Bas Nieuwenhuizen Sept. 4, 2018, 12:21 p.m. UTC | #5
On Tue, Sep 4, 2018 at 12:04 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>
> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
> > This is an initial proposal for format modifiers for AMD hardware.
> >
> > It uses 48 bits including a chip generation, leaving 8 bits for
> > a format version number.
> >
> > On gfx6-gfx8 we have all the fields influencing sample locations
> > in memory.
> >
> > Tile split bytes are optional for single sample buffers as no
> > hardware reaches the split size with 1 sample and hence the actual
> > size does not matter.
> >
> > The macrotile fields are duplicated for images with multiple planes.
> > If the planes have different bitdepth they need different macro
> > tile fields and different tile split bytes if multisample.
> >
> > I could not fit multiple copies in for tile split bytes, but
> > multisample & multiplane images are very rare. Overall, I think
> > we should punt on multisample for a later format version since
> > they are generally not shared on any modifier aware windowing
> > system, and we have more issues like fmask & cmask.
> >
> > We need these copies because the drm modifier of all planes in an
> > image needs to be equal, so we need to fit these together.
> >
> > This adds fields for compression support, using metadata that is
> > compatible with AMDVLK and for which radv and radeonsi can
> > reasonably be extended.
> >
> > The big open question for compression is between which generations
> > the format changed to see if we can share more.
> >
> > This explicitly does not try to solve the linear stride alignment
> > issue, thoguh we could internally just use the tiling modes for
> > the linear modes to indicate linear images with the stride for the
> > given chip.
> >
> > Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl>
> > CC: Chad Versace <chadversary@chromium.org>
> > CC: Dave Airlie <airlied@redhat.com>
> > CC: Marek Olšák <marek.olsak@amd.com>
> > CC: Nicolai Hähnle <nicolai.haehnle@amd.com>
> > CC: Alex Deucher <alexander.deucher@amd.com>
> > CC: Daniel Vetter <daniel.vetter@ffwll.ch>
> > ---
> >  include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++
> >  1 file changed, 130 insertions(+)
> >
> > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
> > index 94444eeba55b..4e1452161dbf 100644
> > --- a/include/uapi/drm/amdgpu_drm.h
> > +++ b/include/uapi/drm/amdgpu_drm.h
> > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table {
> >  #define AMDGPU_FAMILY_AI                       141 /* Vega10 */
> >  #define AMDGPU_FAMILY_RV                       142 /* Raven */
> >
> > +#define AMDGPU_CHIP_TAHITI     0
> > +#define AMDGPU_CHIP_PITCAIRN   1
> > +#define AMDGPU_CHIP_VERDE      2
> > +#define AMDGPU_CHIP_OLAND      3
> > +#define AMDGPU_CHIP_HAINAN     4
> > +#define AMDGPU_CHIP_BONAIRE    5
> > +#define AMDGPU_CHIP_KAVERI     6
> > +#define AMDGPU_CHIP_KABINI     7
> > +#define AMDGPU_CHIP_HAWAII     8
> > +#define AMDGPU_CHIP_MULLINS    9
> > +#define AMDGPU_CHIP_TOPAZ      10
> > +#define AMDGPU_CHIP_TONGA      11
> > +#define AMDGPU_CHIP_FIJI       12
> > +#define AMDGPU_CHIP_CARRIZO    13
> > +#define AMDGPU_CHIP_STONEY     14
> > +#define AMDGPU_CHIP_POLARIS10  15
> > +#define AMDGPU_CHIP_POLARIS11  16
> > +#define AMDGPU_CHIP_POLARIS12  17
> > +#define AMDGPU_CHIP_VEGAM      18
> > +#define AMDGPU_CHIP_VEGA10     19
> > +#define AMDGPU_CHIP_VEGA12     20
> > +#define AMDGPU_CHIP_VEGA20     21
> > +#define AMDGPU_CHIP_RAVEN      22
> > +
> > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same
> > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are
> > + * different.
> > + *
> > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR
> > + * instead.
> > + *
> > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be
> > + * set.
> > + *
> > + * For other ARRAY_MODEs:
> > + *  - Only set TILE_SPLIT if the image is multisample.
> > + *
> > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
> > + * different value there. The values are
> > + *   - depth           : 0
> > + *   - displayable     : 1
> > + *   - thin            : 2
> > + *   - thick (GFX6)    : 3
> > + *   - rotated (GFX7+) : 4
> > + *
> > + * TODO: What to do with multisample multi plane images? More tile split
> > + * fields don't fit if we want to keep a few bits for a format version.
> > + */
> > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT          0
> > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK           0xf
> > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT         4
> > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK          0x1f
> > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT          9
> > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK           0x7
> > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT     12
> > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK      0x7
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT          15
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK           0x3
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT         17
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK          0x3
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT   19
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK    0x3
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT           21
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK            0x3
> > +
> > +/* Macrotile parameters for a second plane if existing */
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT                23
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK         0x3
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT       25
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK                0x3
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK  0x3
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT         29
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK          0x3
> > +
> > +/* Macrotile parameters for a third plane if existing */
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT                31
> > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK         0x3
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT       33
> > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK                0x3
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35
> > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK  0x3
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT         37
> > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK          0x3
> > +
> > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT                0
> > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK         0x1f
> > +
> > +/* Whether to enable DCC compression.
> > + *
> > + * If enabled, exporting the surface results in three
> > + * planes:
> > + *   - color data
> > + *   - DCC data
> > + *   - a 64-byte block with
> > + *     - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed.
> > + *     - a 16-byte 0/1 bool as to whether the surface has fastclear data
> > + *     - a 8-byte chunk with the current fastclear colors
> > + *
> > + * To ensure we do not keep compressing and decompressing the surface, once it
> > + * has been decompressed no party may recompress again.
> > + *
> > + * Applications should not hand over images with fastclear data as not
> > + * all users can support it, however, to help both Vulkan implementations
> > + * with the allocation we keep it in the 64-byte block.
> > + *
> > + * TODO: Can scanout really not support fastclear data?
> > + * TODO: What to do with multiplane images?
> > + */
> > +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT              39
> > +#define AMDGPU_MODIFIER_COMPRESSION_MASK               0x1
>
> Summary of the irc discussion with my 2 recommendations:
>
> - Make it clear in the spec here that all consumers must follow the 16
> byte boolean to control actual DCC usage (using a conditional register
> write in the command buffer). On intel we've put that into the
> modifier itself, along the lines of "I don't like compressed anymore,
> pls give me something else". Not that runtime renogiation works well
> (or well, at all).
>
> - If you have consumers which can't handle fastclear, make it a
> separate modifier.
>
> > +
> > +/* The chip this is compatible with.
> > + *
> > + * If compression is disabled, use
> > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > + *
> > + * With compression enabled please use the exact chip.
> > + *
> > + * TODO: Do some generations share DCC format?
> > + */
> > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>
> Do you really need all the combinations here of DCC + gpu gen + tiling
> details? When we had the entire discussion with nvidia folks they
> eventually agreed that they don't need the massive pile with every
> possible combination. Do you really plan to share all these different
> things?

So GPU gen is based on the fact that GFX6-GFX8 and GFX9 had a
significant change in tiling modes that are not compatible.
Furthermore, DCC formats have seen changes between chips in the same
generation and without more documentation on when those are compatible
I'd like to be conservative and for DCC put every chip in its own
class.

For whether to enable DCC: there are basically a bunch of conditions
in image creation parameters on when we can use it, so I'd like to
keep non-DCC versions available even for HW that supports DCC.

Tling on GFX9+ is only 5 bits so that is easy. For GFX6-GFX8 it is complicated.

Each chip has a configurable system-wide 32-entry table of tiling
modes (GFX7-GFX8 have the components split in two tables, but in
practice given a format there is only one choice for the second
table). The kernel has different settings for each chip to put in
there.

So if we do not want to enable sharing between GPUs we can just do a
precise chip + tiling mode index and be done with it. However, since
the entries are different between chips we have to either:
 1) make some kind of mapping from (chip, tiling index) to a canonical
value based on compatiblity. Which means for backwards compatibility
reasons we're never going to be able to change the table config in the
kernel again.
 2) include the value of the table entry in the modifier.

I chose solution 2 here. This means that while there is a lot of data
in here in practice the tiling mode only has ~32 options per chip per
format/bitdepth.

There are some blocks not bound by this table (mostly scanout), but in
practice the GL/vulkan driver just allocates from the tiling table
anyway so kernel driver is just going to have to expose the modifiers
corresponding to the tiling table to keep the amount limited.

So as far as fields go, we have
ARRAY_MODE = is this linear, micro-tiling, macro-tiling or thick-macro
tiling (for 3d textures, various thicknesses), sparse texture.
  linear is not relevant here. micro-tiling is relevant for cross-GPU
sharing if macro-tiling is not compatible. macro-tiling is optimal for
2d, sparse-texture could probably be aliased/out of scope.

MICRO_TILE = normal, display, depth, rotated display
  I think these are obviously worth keeping

PIPE_CONFIG = pretty much something constant over a chip but important
for everything that is not linear
TILE_SPLIT = relevant for multisample images. I think in practice
constant for pretty much everything except GFX6, where it is format
specific
BANK_WIDTH,BANK_HEIGHT,MACRO_TILE_ASPEC,NUM_BANKS = for macro tiles,
pretty much specific to chip + format, mostly independent of
everything else.
   because of the bitdepth dependence we have to have multiple for
multiplane images as some are e.g. one R plane + one GB plane.

>
> Note that e.g. on i915 we spec some of the tiling depending upon
> buffer size and buffer format (because that's how the hw works), not
> using explicit modifier flags for everything.
> -Daniel
>
> > +
> > +#define AMDGPU_MODIFIER_SET(field, value) \
> > +       (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT)
> > +#define AMDGPU_MODIFIER_GET(value, field) \
> > +       (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK)
> > +
> >  /*
> >   * Definition of free sync enter and exit signals
> >   * We may have more options in the future
> > --
> > 2.18.0
> >
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Stone Sept. 4, 2018, 12:22 p.m. UTC | #6
Hi,

On Tue, 4 Sep 2018 at 11:44, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > Right. The conclusion, after people went through and started sorting
> > out the kinds of formats for which they would _actually_ export real
> > colour buffers for, that most vendors definitely have fewer than
> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > possible formats to represent, very likely fewer than
> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > fewer than 72,057,594,037,927,936 formats, and even still generally
> > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > bits of the 56 available.
>
> The problem here is that at least for some parameters we actually don't
> know which formats are actually used.
>
> The following are not real world examples, but just to explain the
> general problem.
>
> The memory configuration for example can be not ASIC specific, but
> rather determined by whoever took the ASIC and glued it together with
> VRAM on a board. It is not likely that somebody puts all the VRAM chips
> on one channel, but it is still perfectly possible.
>
> Same is true for things like harvesting, e.g. of 16 channels halve of
> them could be bad and we need to know which to actually use.

Sure, that's fine, but I'm not sure why, for instance, 'half of the
memory channels are bad and must be avoided' would be attached to a
format in the same way that macrotile size/depth would be? Similarly,
what happens if you encode a channel or lane blacklist mask into a
modifier, and then you import a buffer with that modifier into a
client API on another device. Does that also apply to the other device
or only the source device?

How then do you handle the capability query between them: is it
relevant to device B attempting to import and sample from an image
that device A only has a 128-bit memory bus because of SKU
configuration? How does generic userspace look at a token which
encodes all of this information and know that the buffer is
interchangeable between devices?

Cheers,
Daniel
Daniel Vetter Sept. 4, 2018, 12:26 p.m. UTC | #7
On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > Hi,
> > 
> > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> > > > +/* The chip this is compatible with.
> > > > + *
> > > > + * If compression is disabled, use
> > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > + *
> > > > + * With compression enabled please use the exact chip.
> > > > + *
> > > > + * TODO: Do some generations share DCC format?
> > > > + */
> > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > details? When we had the entire discussion with nvidia folks they
> > > eventually agreed that they don't need the massive pile with every
> > > possible combination. Do you really plan to share all these different
> > > things?
> > > 
> > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > buffer size and buffer format (because that's how the hw works), not
> > > using explicit modifier flags for everything.
> > Right. The conclusion, after people went through and started sorting
> > out the kinds of formats for which they would _actually_ export real
> > colour buffers for, that most vendors definitely have fewer than
> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > possible formats to represent, very likely fewer than
> > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > fewer than 72,057,594,037,927,936 formats, and even still generally
> > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > bits of the 56 available.
> 
> The problem here is that at least for some parameters we actually don't know
> which formats are actually used.
> 
> The following are not real world examples, but just to explain the general
> problem.
> 
> The memory configuration for example can be not ASIC specific, but rather
> determined by whoever took the ASIC and glued it together with VRAM on a
> board. It is not likely that somebody puts all the VRAM chips on one
> channel, but it is still perfectly possible.
> 
> Same is true for things like harvesting, e.g. of 16 channels halve of them
> could be bad and we need to know which to actually use.

For my understanding: This leaks outside the chip when sharing buffers?
All the information you only need locally to a given amdgpu instance
don't really need to be encoded in modifiers.

Pointers to code where this is all decided (kernel and radeonsi would be
good starters I guess) would be really good here.
-Daniel

> 
> Regards,
> Christian.
> 
> > 
> > If you do use 256 bits in order to represent
> > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > modifiers per format, userspace would start hitting OOM pretty quickly
> > as it attempted to enumerate and negotiate acceptable modifiers.
> > Either that or we need to replace the fixed 64-bit modifier tokens
> > with some kind of eBPF script.
> > 
> > Cheers,
> > Daniel
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
Bas Nieuwenhuizen Sept. 4, 2018, 12:33 p.m. UTC | #8
On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > Hi,
> > >
> > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> > > > > +/* The chip this is compatible with.
> > > > > + *
> > > > > + * If compression is disabled, use
> > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > > + *
> > > > > + * With compression enabled please use the exact chip.
> > > > > + *
> > > > > + * TODO: Do some generations share DCC format?
> > > > > + */
> > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> > > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > > details? When we had the entire discussion with nvidia folks they
> > > > eventually agreed that they don't need the massive pile with every
> > > > possible combination. Do you really plan to share all these different
> > > > things?
> > > >
> > > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > > buffer size and buffer format (because that's how the hw works), not
> > > > using explicit modifier flags for everything.
> > > Right. The conclusion, after people went through and started sorting
> > > out the kinds of formats for which they would _actually_ export real
> > > colour buffers for, that most vendors definitely have fewer than
> > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > possible formats to represent, very likely fewer than
> > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > bits of the 56 available.
> >
> > The problem here is that at least for some parameters we actually don't know
> > which formats are actually used.
> >
> > The following are not real world examples, but just to explain the general
> > problem.
> >
> > The memory configuration for example can be not ASIC specific, but rather
> > determined by whoever took the ASIC and glued it together with VRAM on a
> > board. It is not likely that somebody puts all the VRAM chips on one
> > channel, but it is still perfectly possible.
> >
> > Same is true for things like harvesting, e.g. of 16 channels halve of them
> > could be bad and we need to know which to actually use.
>
> For my understanding: This leaks outside the chip when sharing buffers?
> All the information you only need locally to a given amdgpu instance
> don't really need to be encoded in modifiers.
>
> Pointers to code where this is all decided (kernel and radeonsi would be
> good starters I guess) would be really good here.

I extracted the information on which bits are relevant mostly from the
AddrFromCoord functions in addrlib in mesa:

for macro-tiles:
https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587

for micro-tiles (or the micro-tiles in macro-tiles):

https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

> -Daniel
>
> >
> > Regards,
> > Christian.
> >
> > >
> > > If you do use 256 bits in order to represent
> > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > modifiers per format, userspace would start hitting OOM pretty quickly
> > > as it attempted to enumerate and negotiate acceptable modifiers.
> > > Either that or we need to replace the fixed 64-bit modifier tokens
> > > with some kind of eBPF script.
> > >
> > > Cheers,
> > > Daniel
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Bas Nieuwenhuizen Sept. 4, 2018, 12:40 p.m. UTC | #9
+everyone again
On Tue, Sep 4, 2018 at 2:39 PM Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
>
> On Tue, Sep 4, 2018 at 2:22 PM Daniel Stone <daniel@fooishbar.org> wrote:
> >
> > Hi,
> >
> > On Tue, 4 Sep 2018 at 11:44, Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > > Right. The conclusion, after people went through and started sorting
> > > > out the kinds of formats for which they would _actually_ export real
> > > > colour buffers for, that most vendors definitely have fewer than
> > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > possible formats to represent, very likely fewer than
> > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > > bits of the 56 available.
> > >
> > > The problem here is that at least for some parameters we actually don't
> > > know which formats are actually used.
> > >
> > > The following are not real world examples, but just to explain the
> > > general problem.
> > >
> > > The memory configuration for example can be not ASIC specific, but
> > > rather determined by whoever took the ASIC and glued it together with
> > > VRAM on a board. It is not likely that somebody puts all the VRAM chips
> > > on one channel, but it is still perfectly possible.
> > >
> > > Same is true for things like harvesting, e.g. of 16 channels halve of
> > > them could be bad and we need to know which to actually use.
> >
> > Sure, that's fine, but I'm not sure why, for instance, 'half of the
> > memory channels are bad and must be avoided' would be attached to a
> > format in the same way that macrotile size/depth would be? Similarly,
> > what happens if you encode a channel or lane blacklist mask into a
> > modifier, and then you import a buffer with that modifier into a
> > client API on another device. Does that also apply to the other device
> > or only the source device?
> >
> > How then do you handle the capability query between them: is it
> > relevant to device B attempting to import and sample from an image
> > that device A only has a 128-bit memory bus because of SKU
> > configuration? How does generic userspace look at a token which
> > encodes all of this information and know that the buffer is
> > interchangeable between devices?
>
> So my idea here is to encode all the bits that determine format in a
> way that as much as possible if it
> results in the same layout it should get the same modifier(i.e. remove
> useless info):
>
> https://lists.freedesktop.org/archives/dri-devel/2018-September/188418.html
>
> An example of the per-chip things is the PIPE_CONFIG, which is used
> for address calculation in the macro-tiled mode:
>
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/siaddrlib.cpp#L666
>
> two chips with different PIPE_CONFIG will have different texture
> layouts, as the address swizzles are different.
>
> So it gets included in the modifier (the PIPE_CONFIG is pretty much
> constant per chip but multiple chips might have the same PIPE_CONFIG).
>
> >
> > Cheers,
> > Daniel
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Christian König Sept. 4, 2018, 12:59 p.m. UTC | #10
Am 04.09.2018 um 14:22 schrieb Daniel Stone:
> Hi,
>
> On Tue, 4 Sep 2018 at 11:44, Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>> Right. The conclusion, after people went through and started sorting
>>> out the kinds of formats for which they would _actually_ export real
>>> colour buffers for, that most vendors definitely have fewer than
>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>> possible formats to represent, very likely fewer than
>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>> bits of the 56 available.
>> The problem here is that at least for some parameters we actually don't
>> know which formats are actually used.
>>
>> The following are not real world examples, but just to explain the
>> general problem.
>>
>> The memory configuration for example can be not ASIC specific, but
>> rather determined by whoever took the ASIC and glued it together with
>> VRAM on a board. It is not likely that somebody puts all the VRAM chips
>> on one channel, but it is still perfectly possible.
>>
>> Same is true for things like harvesting, e.g. of 16 channels halve of
>> them could be bad and we need to know which to actually use.
> Sure, that's fine, but I'm not sure why, for instance, 'half of the
> memory channels are bad and must be avoided' would be attached to a
> format in the same way that macrotile size/depth would be?

Because it results in a different memory layout.

> Similarly,
> what happens if you encode a channel or lane blacklist mask into a
> modifier, and then you import a buffer with that modifier into a
> client API on another device. Does that also apply to the other device
> or only the source device?

If the other device doesn't have the same hardware config it probably 
can't access that buffer.

> How then do you handle the capability query between them: is it
> relevant to device B attempting to import and sample from an image
> that device A only has a 128-bit memory bus because of SKU
> configuration? How does generic userspace look at a token which
> encodes all of this information and know that the buffer is
> interchangeable between devices?

Well that is exactly the point where my understanding stops. No idea how 
the user space driver is handling this.

Marek came up with the interop interface, he needs to explain the details.

Regards,
Christian.

>
> Cheers,
> Daniel
Christian König Sept. 4, 2018, 1:03 p.m. UTC | #11
Am 04.09.2018 um 14:26 schrieb Daniel Vetter:
> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>> Hi,
>>>
>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>>>>> +/* The chip this is compatible with.
>>>>> + *
>>>>> + * If compression is disabled, use
>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>>>> + *
>>>>> + * With compression enabled please use the exact chip.
>>>>> + *
>>>>> + * TODO: Do some generations share DCC format?
>>>>> + */
>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>>>> details? When we had the entire discussion with nvidia folks they
>>>> eventually agreed that they don't need the massive pile with every
>>>> possible combination. Do you really plan to share all these different
>>>> things?
>>>>
>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>>> buffer size and buffer format (because that's how the hw works), not
>>>> using explicit modifier flags for everything.
>>> Right. The conclusion, after people went through and started sorting
>>> out the kinds of formats for which they would _actually_ export real
>>> colour buffers for, that most vendors definitely have fewer than
>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>> possible formats to represent, very likely fewer than
>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>> bits of the 56 available.
>> The problem here is that at least for some parameters we actually don't know
>> which formats are actually used.
>>
>> The following are not real world examples, but just to explain the general
>> problem.
>>
>> The memory configuration for example can be not ASIC specific, but rather
>> determined by whoever took the ASIC and glued it together with VRAM on a
>> board. It is not likely that somebody puts all the VRAM chips on one
>> channel, but it is still perfectly possible.
>>
>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>> could be bad and we need to know which to actually use.
> For my understanding: This leaks outside the chip when sharing buffers?
> All the information you only need locally to a given amdgpu instance
> don't really need to be encoded in modifiers.

Yeah, that is certainly a possibility. The problem is that the 
identifiers are then not universal applicable.

But if we can somehow say this format only applies to buffers from 
devices with hardware config X, then that would certainly simplify 
things a lot.

> Pointers to code where this is all decided (kernel and radeonsi would be
> good starters I guess) would be really good here.

Well that is the problem, a good part of that stuff is rather deeply 
embedded in the firmware.

The kernel driver just reads the hardware config from some registers and 
hands it of to userspace.

Regards,
Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>>> If you do use 256 bits in order to represent
>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>> modifiers per format, userspace would start hitting OOM pretty quickly
>>> as it attempted to enumerate and negotiate acceptable modifiers.
>>> Either that or we need to replace the fixed 64-bit modifier tokens
>>> with some kind of eBPF script.
>>>
>>> Cheers,
>>> Daniel
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter Sept. 4, 2018, 1:03 p.m. UTC | #12
On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > > Hi,
> > > >
> > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> > > > > > +/* The chip this is compatible with.
> > > > > > + *
> > > > > > + * If compression is disabled, use
> > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > > > + *
> > > > > > + * With compression enabled please use the exact chip.
> > > > > > + *
> > > > > > + * TODO: Do some generations share DCC format?
> > > > > > + */
> > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > > > details? When we had the entire discussion with nvidia folks they
> > > > > eventually agreed that they don't need the massive pile with every
> > > > > possible combination. Do you really plan to share all these different
> > > > > things?
> > > > >
> > > > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > > > buffer size and buffer format (because that's how the hw works), not
> > > > > using explicit modifier flags for everything.
> > > > Right. The conclusion, after people went through and started sorting
> > > > out the kinds of formats for which they would _actually_ export real
> > > > colour buffers for, that most vendors definitely have fewer than
> > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > possible formats to represent, very likely fewer than
> > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > > bits of the 56 available.
> > >
> > > The problem here is that at least for some parameters we actually don't know
> > > which formats are actually used.
> > >
> > > The following are not real world examples, but just to explain the general
> > > problem.
> > >
> > > The memory configuration for example can be not ASIC specific, but rather
> > > determined by whoever took the ASIC and glued it together with VRAM on a
> > > board. It is not likely that somebody puts all the VRAM chips on one
> > > channel, but it is still perfectly possible.
> > >
> > > Same is true for things like harvesting, e.g. of 16 channels halve of them
> > > could be bad and we need to know which to actually use.
> >
> > For my understanding: This leaks outside the chip when sharing buffers?
> > All the information you only need locally to a given amdgpu instance
> > don't really need to be encoded in modifiers.
> >
> > Pointers to code where this is all decided (kernel and radeonsi would be
> > good starters I guess) would be really good here.
> 
> I extracted the information on which bits are relevant mostly from the
> AddrFromCoord functions in addrlib in mesa:
> 
> for macro-tiles:
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> 
> for micro-tiles (or the micro-tiles in macro-tiles):
> 
> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016

So this is the decoding thing. How many of these actually exist, even when
taking all the other information into account?

E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
+ height + width, how much of all these bits do you actually still freely
pick?

It might be that all the things you need to know from the memory config
don't encode smaller than the macro/micro/whatever else stuff. But that's
kinda the angle that we looked at this for everyone else.

E.g. for multi-plane stuff, if everyone picks the same config for the
2nd/3rd plane, then you don't actually need to encode that. It just
becomes part of the implied stuff in the modifier.
-Daniel

> 
> > -Daniel
> >
> > >
> > > Regards,
> > > Christian.
> > >
> > > >
> > > > If you do use 256 bits in order to represent
> > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > modifiers per format, userspace would start hitting OOM pretty quickly
> > > > as it attempted to enumerate and negotiate acceptable modifiers.
> > > > Either that or we need to replace the fixed 64-bit modifier tokens
> > > > with some kind of eBPF script.
> > > >
> > > > Cheers,
> > > > Daniel
> > > > _______________________________________________
> > > > dri-devel mailing list
> > > > dri-devel@lists.freedesktop.org
> > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > http://blog.ffwll.ch
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Christian König Sept. 4, 2018, 1:12 p.m. UTC | #13
Am 04.09.2018 um 15:03 schrieb Daniel Vetter:
> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>>>> Hi,
>>>>>
>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>>>>>>> +/* The chip this is compatible with.
>>>>>>> + *
>>>>>>> + * If compression is disabled, use
>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>>>>>> + *
>>>>>>> + * With compression enabled please use the exact chip.
>>>>>>> + *
>>>>>>> + * TODO: Do some generations share DCC format?
>>>>>>> + */
>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>>>>>> details? When we had the entire discussion with nvidia folks they
>>>>>> eventually agreed that they don't need the massive pile with every
>>>>>> possible combination. Do you really plan to share all these different
>>>>>> things?
>>>>>>
>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>>>>> buffer size and buffer format (because that's how the hw works), not
>>>>>> using explicit modifier flags for everything.
>>>>> Right. The conclusion, after people went through and started sorting
>>>>> out the kinds of formats for which they would _actually_ export real
>>>>> colour buffers for, that most vendors definitely have fewer than
>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>> possible formats to represent, very likely fewer than
>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>>>> bits of the 56 available.
>>>> The problem here is that at least for some parameters we actually don't know
>>>> which formats are actually used.
>>>>
>>>> The following are not real world examples, but just to explain the general
>>>> problem.
>>>>
>>>> The memory configuration for example can be not ASIC specific, but rather
>>>> determined by whoever took the ASIC and glued it together with VRAM on a
>>>> board. It is not likely that somebody puts all the VRAM chips on one
>>>> channel, but it is still perfectly possible.
>>>>
>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>>>> could be bad and we need to know which to actually use.
>>> For my understanding: This leaks outside the chip when sharing buffers?
>>> All the information you only need locally to a given amdgpu instance
>>> don't really need to be encoded in modifiers.
>>>
>>> Pointers to code where this is all decided (kernel and radeonsi would be
>>> good starters I guess) would be really good here.
>> I extracted the information on which bits are relevant mostly from the
>> AddrFromCoord functions in addrlib in mesa:
>>
>> for macro-tiles:
>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>>
>> for micro-tiles (or the micro-tiles in macro-tiles):
>>
>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> So this is the decoding thing. How many of these actually exist, even when
> taking all the other information into account?
>
> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> + height + width, how much of all these bits do you actually still freely
> pick?
>
> It might be that all the things you need to know from the memory config
> don't encode smaller than the macro/micro/whatever else stuff. But that's
> kinda the angle that we looked at this for everyone else.
>
> E.g. for multi-plane stuff, if everyone picks the same config for the
> 2nd/3rd plane, then you don't actually need to encode that. It just
> becomes part of the implied stuff in the modifier.

The pipe/bank configuration for the 2nd/3rd plane is the same, but the 
tile configuration is potentially different.

We can make an educated guess, but I don't think that this is universal 
applicable as well.

Regards,
Christian.

> -Daniel
>
>>> -Daniel
>>>
>>>> Regards,
>>>> Christian.
>>>>
>>>>> If you do use 256 bits in order to represent
>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>>>>> with some kind of eBPF script.
>>>>>
>>>>> Cheers,
>>>>> Daniel
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> http://blog.ffwll.ch
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Daniel Vetter Sept. 4, 2018, 1:17 p.m. UTC | #14
On Tue, Sep 4, 2018 at 3:12 PM, Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
> Am 04.09.2018 um 15:03 schrieb Daniel Vetter:
>>
>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>>>
>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>
>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>>>>>
>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>>> wrote:
>>>>>>>
>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
>>>>>>> <bas@basnieuwenhuizen.nl> wrote:
>>>>>>>>
>>>>>>>> +/* The chip this is compatible with.
>>>>>>>> + *
>>>>>>>> + * If compression is disabled, use
>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>>>>>>> + *
>>>>>>>> + * With compression enabled please use the exact chip.
>>>>>>>> + *
>>>>>>>> + * TODO: Do some generations share DCC format?
>>>>>>>> + */
>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>>>>>>>
>>>>>>> Do you really need all the combinations here of DCC + gpu gen +
>>>>>>> tiling
>>>>>>> details? When we had the entire discussion with nvidia folks they
>>>>>>> eventually agreed that they don't need the massive pile with every
>>>>>>> possible combination. Do you really plan to share all these different
>>>>>>> things?
>>>>>>>
>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>>>>>> buffer size and buffer format (because that's how the hw works), not
>>>>>>> using explicit modifier flags for everything.
>>>>>>
>>>>>> Right. The conclusion, after people went through and started sorting
>>>>>> out the kinds of formats for which they would _actually_ export real
>>>>>> colour buffers for, that most vendors definitely have fewer than
>>>>>>
>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>> possible formats to represent, very likely fewer than
>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>>>>> bits of the 56 available.
>>>>>
>>>>> The problem here is that at least for some parameters we actually don't
>>>>> know
>>>>> which formats are actually used.
>>>>>
>>>>> The following are not real world examples, but just to explain the
>>>>> general
>>>>> problem.
>>>>>
>>>>> The memory configuration for example can be not ASIC specific, but
>>>>> rather
>>>>> determined by whoever took the ASIC and glued it together with VRAM on
>>>>> a
>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>>>>> channel, but it is still perfectly possible.
>>>>>
>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of
>>>>> them
>>>>> could be bad and we need to know which to actually use.
>>>>
>>>> For my understanding: This leaks outside the chip when sharing buffers?
>>>> All the information you only need locally to a given amdgpu instance
>>>> don't really need to be encoded in modifiers.
>>>>
>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>>>> good starters I guess) would be really good here.
>>>
>>> I extracted the information on which bits are relevant mostly from the
>>> AddrFromCoord functions in addrlib in mesa:
>>>
>>> for macro-tiles:
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>>>
>>> for micro-tiles (or the micro-tiles in macro-tiles):
>>>
>>>
>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>
>> So this is the decoding thing. How many of these actually exist, even when
>> taking all the other information into account?
>>
>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> + height + width, how much of all these bits do you actually still freely
>> pick?
>>
>> It might be that all the things you need to know from the memory config
>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> kinda the angle that we looked at this for everyone else.
>>
>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> becomes part of the implied stuff in the modifier.
>
>
> The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile
> configuration is potentially different.
>
> We can make an educated guess, but I don't think that this is universal
> applicable as well.

Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open.
Should be possible to figure out what's actually used (on buffers you
want to share), and then figuring out how to best encode the
information you need to reconstruct all the information. You might
need to occasionally respin that if best practices for how to lay out
buffers changes after you started upstreaming for a given generation.
But ime hw people have pretty strong opinions on how to best tile
things. You even know what's coming down the internal pipeline.

Or do you mean something else with "educated guess"?
-Daniel

>
> Regards,
> Christian.
>
>
>> -Daniel
>>
>>>> -Daniel
>>>>
>>>>> Regards,
>>>>> Christian.
>>>>>
>>>>>> If you do use 256 bits in order to represent
>>>>>>
>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>>>>>> with some kind of eBPF script.
>>>>>>
>>>>>> Cheers,
>>>>>> Daniel
>>>>>> _______________________________________________
>>>>>> dri-devel mailing list
>>>>>> dri-devel@lists.freedesktop.org
>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>
>>>> --
>>>> Daniel Vetter
>>>> Software Engineer, Intel Corporation
>>>> http://blog.ffwll.ch
>>>> _______________________________________________
>>>> dri-devel mailing list
>>>> dri-devel@lists.freedesktop.org
>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
>
Christian König Sept. 4, 2018, 1:23 p.m. UTC | #15
Am 04.09.2018 um 15:17 schrieb Daniel Vetter:
> On Tue, Sep 4, 2018 at 3:12 PM, Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>> Am 04.09.2018 um 15:03 schrieb Daniel Vetter:
>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>>>>>> Hi,
>>>>>>>
>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch>
>>>>>>> wrote:
>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen
>>>>>>>> <bas@basnieuwenhuizen.nl> wrote:
>>>>>>>>> +/* The chip this is compatible with.
>>>>>>>>> + *
>>>>>>>>> + * If compression is disabled, use
>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>>>>>>>> + *
>>>>>>>>> + * With compression enabled please use the exact chip.
>>>>>>>>> + *
>>>>>>>>> + * TODO: Do some generations share DCC format?
>>>>>>>>> + */
>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>>>>>>>> Do you really need all the combinations here of DCC + gpu gen +
>>>>>>>> tiling
>>>>>>>> details? When we had the entire discussion with nvidia folks they
>>>>>>>> eventually agreed that they don't need the massive pile with every
>>>>>>>> possible combination. Do you really plan to share all these different
>>>>>>>> things?
>>>>>>>>
>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>>>>>>> buffer size and buffer format (because that's how the hw works), not
>>>>>>>> using explicit modifier flags for everything.
>>>>>>> Right. The conclusion, after people went through and started sorting
>>>>>>> out the kinds of formats for which they would _actually_ export real
>>>>>>> colour buffers for, that most vendors definitely have fewer than
>>>>>>>
>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>> possible formats to represent, very likely fewer than
>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>>>>>> bits of the 56 available.
>>>>>> The problem here is that at least for some parameters we actually don't
>>>>>> know
>>>>>> which formats are actually used.
>>>>>>
>>>>>> The following are not real world examples, but just to explain the
>>>>>> general
>>>>>> problem.
>>>>>>
>>>>>> The memory configuration for example can be not ASIC specific, but
>>>>>> rather
>>>>>> determined by whoever took the ASIC and glued it together with VRAM on
>>>>>> a
>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>>>>>> channel, but it is still perfectly possible.
>>>>>>
>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of
>>>>>> them
>>>>>> could be bad and we need to know which to actually use.
>>>>> For my understanding: This leaks outside the chip when sharing buffers?
>>>>> All the information you only need locally to a given amdgpu instance
>>>>> don't really need to be encoded in modifiers.
>>>>>
>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>>>>> good starters I guess) would be really good here.
>>>> I extracted the information on which bits are relevant mostly from the
>>>> AddrFromCoord functions in addrlib in mesa:
>>>>
>>>> for macro-tiles:
>>>>
>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>>>>
>>>> for micro-tiles (or the micro-tiles in macro-tiles):
>>>>
>>>>
>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>> So this is the decoding thing. How many of these actually exist, even when
>>> taking all the other information into account?
>>>
>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>>> + height + width, how much of all these bits do you actually still freely
>>> pick?
>>>
>>> It might be that all the things you need to know from the memory config
>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>>> kinda the angle that we looked at this for everyone else.
>>>
>>> E.g. for multi-plane stuff, if everyone picks the same config for the
>>> 2nd/3rd plane, then you don't actually need to encode that. It just
>>> becomes part of the implied stuff in the modifier.
>>
>> The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile
>> configuration is potentially different.
>>
>> We can make an educated guess, but I don't think that this is universal
>> applicable as well.
> Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open.
> Should be possible to figure out what's actually used (on buffers you
> want to share), and then figuring out how to best encode the
> information you need to reconstruct all the information. You might
> need to occasionally respin that if best practices for how to lay out
> buffers changes after you started upstreaming for a given generation.
> But ime hw people have pretty strong opinions on how to best tile
> things. You even know what's coming down the internal pipeline.
>
> Or do you mean something else with "educated guess"?

I mean something else. It can depend on the use case what a buffer 
object ends up with.

In other words when you want to transcode a video you could use a 
different layout as when you want to display the resulting image.

So essentially we would need to encode the use case and not the format 
of the buffer.

That might work, but my gut feeling strongly tells me that this is the 
wrong approach.

Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>>
>>> -Daniel
>>>
>>>>> -Daniel
>>>>>
>>>>>> Regards,
>>>>>> Christian.
>>>>>>
>>>>>>> If you do use 256 bits in order to represent
>>>>>>>
>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>>>>>>> with some kind of eBPF script.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Daniel
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>
>
Bas Nieuwenhuizen Sept. 4, 2018, 1:33 p.m. UTC | #16
On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >
> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > > > > Hi,
> > > > >
> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> > > > > > > +/* The chip this is compatible with.
> > > > > > > + *
> > > > > > > + * If compression is disabled, use
> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > > > > > > + *
> > > > > > > + * With compression enabled please use the exact chip.
> > > > > > > + *
> > > > > > > + * TODO: Do some generations share DCC format?
> > > > > > > + */
> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
> > > > > > details? When we had the entire discussion with nvidia folks they
> > > > > > eventually agreed that they don't need the massive pile with every
> > > > > > possible combination. Do you really plan to share all these different
> > > > > > things?
> > > > > >
> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
> > > > > > buffer size and buffer format (because that's how the hw works), not
> > > > > > using explicit modifier flags for everything.
> > > > > Right. The conclusion, after people went through and started sorting
> > > > > out the kinds of formats for which they would _actually_ export real
> > > > > colour buffers for, that most vendors definitely have fewer than
> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > > possible formats to represent, very likely fewer than
> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > > > > bits of the 56 available.
> > > >
> > > > The problem here is that at least for some parameters we actually don't know
> > > > which formats are actually used.
> > > >
> > > > The following are not real world examples, but just to explain the general
> > > > problem.
> > > >
> > > > The memory configuration for example can be not ASIC specific, but rather
> > > > determined by whoever took the ASIC and glued it together with VRAM on a
> > > > board. It is not likely that somebody puts all the VRAM chips on one
> > > > channel, but it is still perfectly possible.
> > > >
> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them
> > > > could be bad and we need to know which to actually use.
> > >
> > > For my understanding: This leaks outside the chip when sharing buffers?
> > > All the information you only need locally to a given amdgpu instance
> > > don't really need to be encoded in modifiers.
> > >
> > > Pointers to code where this is all decided (kernel and radeonsi would be
> > > good starters I guess) would be really good here.
> >
> > I extracted the information on which bits are relevant mostly from the
> > AddrFromCoord functions in addrlib in mesa:
> >
> > for macro-tiles:
> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >
> > for micro-tiles (or the micro-tiles in macro-tiles):
> >
> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>
> So this is the decoding thing. How many of these actually exist, even when
> taking all the other information into account?
>
> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> + height + width, how much of all these bits do you actually still freely
> pick?

Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
depth, display-rotated) + whether to use compression, everything else
is fixed given those option, the properties of the chip and the
format.


>
> It might be that all the things you need to know from the memory config
> don't encode smaller than the macro/micro/whatever else stuff. But that's
> kinda the angle that we looked at this for everyone else.
>
> E.g. for multi-plane stuff, if everyone picks the same config for the
> 2nd/3rd plane, then you don't actually need to encode that. It just
> becomes part of the implied stuff in the modifier.

The problem is some GPUs are compatible for say 8-bpp images, but not
for 32-bpp surfaces. e.g. lets look at the following table showing the
current configuration for all GFX6-GFX8 GPU:

format: (bank width, bank height, macro tile aspect, num banks) for
8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG

verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
ADDR_SURF_P8_32x32_8x16
bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2

We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
compatible for 8-bpp.

So we can't just assume that if the first plane properties match that
they do so for the second plane, because we don't know what GPU it is
coming from.

And we can make a canonical table from this but then if we change the
above tables in the kernel that runs into compatibility issues.

> -Daniel
>
> >
> > > -Daniel
> > >
> > > >
> > > > Regards,
> > > > Christian.
> > > >
> > > > >
> > > > > If you do use 256 bits in order to represent
> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > > > > modifiers per format, userspace would start hitting OOM pretty quickly
> > > > > as it attempted to enumerate and negotiate acceptable modifiers.
> > > > > Either that or we need to replace the fixed 64-bit modifier tokens
> > > > > with some kind of eBPF script.
> > > > >
> > > > > Cheers,
> > > > > Daniel
> > > > > _______________________________________________
> > > > > dri-devel mailing list
> > > > > dri-devel@lists.freedesktop.org
> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > > >
> > >
> > > --
> > > Daniel Vetter
> > > Software Engineer, Intel Corporation
> > > http://blog.ffwll.ch
> > > _______________________________________________
> > > dri-devel mailing list
> > > dri-devel@lists.freedesktop.org
> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch
Daniel Vetter Sept. 4, 2018, 2:43 p.m. UTC | #17
On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> > >
>> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> > > > > Hi,
>> > > > >
>> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>> > > > > > > +/* The chip this is compatible with.
>> > > > > > > + *
>> > > > > > > + * If compression is disabled, use
>> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> > > > > > > + *
>> > > > > > > + * With compression enabled please use the exact chip.
>> > > > > > > + *
>> > > > > > > + * TODO: Do some generations share DCC format?
>> > > > > > > + */
>> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
>> > > > > > details? When we had the entire discussion with nvidia folks they
>> > > > > > eventually agreed that they don't need the massive pile with every
>> > > > > > possible combination. Do you really plan to share all these different
>> > > > > > things?
>> > > > > >
>> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
>> > > > > > buffer size and buffer format (because that's how the hw works), not
>> > > > > > using explicit modifier flags for everything.
>> > > > > Right. The conclusion, after people went through and started sorting
>> > > > > out the kinds of formats for which they would _actually_ export real
>> > > > > colour buffers for, that most vendors definitely have fewer than
>> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> > > > > possible formats to represent, very likely fewer than
>> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
>> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> > > > > bits of the 56 available.
>> > > >
>> > > > The problem here is that at least for some parameters we actually don't know
>> > > > which formats are actually used.
>> > > >
>> > > > The following are not real world examples, but just to explain the general
>> > > > problem.
>> > > >
>> > > > The memory configuration for example can be not ASIC specific, but rather
>> > > > determined by whoever took the ASIC and glued it together with VRAM on a
>> > > > board. It is not likely that somebody puts all the VRAM chips on one
>> > > > channel, but it is still perfectly possible.
>> > > >
>> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them
>> > > > could be bad and we need to know which to actually use.
>> > >
>> > > For my understanding: This leaks outside the chip when sharing buffers?
>> > > All the information you only need locally to a given amdgpu instance
>> > > don't really need to be encoded in modifiers.
>> > >
>> > > Pointers to code where this is all decided (kernel and radeonsi would be
>> > > good starters I guess) would be really good here.
>> >
>> > I extracted the information on which bits are relevant mostly from the
>> > AddrFromCoord functions in addrlib in mesa:
>> >
>> > for macro-tiles:
>> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >
>> > for micro-tiles (or the micro-tiles in macro-tiles):
>> >
>> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>
>> So this is the decoding thing. How many of these actually exist, even when
>> taking all the other information into account?
>>
>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> + height + width, how much of all these bits do you actually still freely
>> pick?
>
> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> depth, display-rotated) + whether to use compression, everything else
> is fixed given those option, the properties of the chip and the
> format.
>
>
>>
>> It might be that all the things you need to know from the memory config
>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> kinda the angle that we looked at this for everyone else.
>>
>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> becomes part of the implied stuff in the modifier.
>
> The problem is some GPUs are compatible for say 8-bpp images, but not
> for 32-bpp surfaces. e.g. lets look at the following table showing the
> current configuration for all GFX6-GFX8 GPU:
>
> format: (bank width, bank height, macro tile aspect, num banks) for
> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>
> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> ADDR_SURF_P8_32x32_8x16
> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>
> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> compatible for 8-bpp.
>
> So we can't just assume that if the first plane properties match that
> they do so for the second plane, because we don't know what GPU it is
> coming from.
>
> And we can make a canonical table from this but then if we change the
> above tables in the kernel that runs into compatibility issues.


I think that's roughly what nvidia ended up doing (not sure they
published anything, haven't seen any patches on dri-devel). Two parts:
- Bunch of flags/bits for the major mode, like {display, non-display,
DCC} and so on.
- For each of the major modes a list of enumerated modes actually used
in reality. For DCC this would be the generation, for more plain
formats it would be the above table. Table entries would be per-bpp
(in case that matters, or whatever matters for amd gpus).

Imo would be totally ok to put the relevant lookup tables into
drm_fourcc.h directly. Or any other suitable place.

From a quick look about 16 bits total. Gives you 40bits of second
chances and "oops totally sorry".
-Daniel

>
>> -Daniel
>>
>> >
>> > > -Daniel
>> > >
>> > > >
>> > > > Regards,
>> > > > Christian.
>> > > >
>> > > > >
>> > > > > If you do use 256 bits in order to represent
>> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> > > > > modifiers per format, userspace would start hitting OOM pretty quickly
>> > > > > as it attempted to enumerate and negotiate acceptable modifiers.
>> > > > > Either that or we need to replace the fixed 64-bit modifier tokens
>> > > > > with some kind of eBPF script.
>> > > > >
>> > > > > Cheers,
>> > > > > Daniel
>> > > > > _______________________________________________
>> > > > > dri-devel mailing list
>> > > > > dri-devel@lists.freedesktop.org
>> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> > > >
>> > >
>> > > --
>> > > Daniel Vetter
>> > > Software Engineer, Intel Corporation
>> > > http://blog.ffwll.ch
>> > > _______________________________________________
>> > > dri-devel mailing list
>> > > dri-devel@lists.freedesktop.org
>> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> http://blog.ffwll.ch
Bas Nieuwenhuizen Sept. 4, 2018, 3:52 p.m. UTC | #18
On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
> > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>
> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> > >
> >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> >> > > > > Hi,
> >> > > > >
> >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> >> > > > > > > +/* The chip this is compatible with.
> >> > > > > > > + *
> >> > > > > > > + * If compression is disabled, use
> >> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> >> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> >> > > > > > > + *
> >> > > > > > > + * With compression enabled please use the exact chip.
> >> > > > > > > + *
> >> > > > > > > + * TODO: Do some generations share DCC format?
> >> > > > > > > + */
> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> >> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
> >> > > > > > details? When we had the entire discussion with nvidia folks they
> >> > > > > > eventually agreed that they don't need the massive pile with every
> >> > > > > > possible combination. Do you really plan to share all these different
> >> > > > > > things?
> >> > > > > >
> >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
> >> > > > > > buffer size and buffer format (because that's how the hw works), not
> >> > > > > > using explicit modifier flags for everything.
> >> > > > > Right. The conclusion, after people went through and started sorting
> >> > > > > out the kinds of formats for which they would _actually_ export real
> >> > > > > colour buffers for, that most vendors definitely have fewer than
> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> > > > > possible formats to represent, very likely fewer than
> >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
> >> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
> >> > > > > bits of the 56 available.
> >> > > >
> >> > > > The problem here is that at least for some parameters we actually don't know
> >> > > > which formats are actually used.
> >> > > >
> >> > > > The following are not real world examples, but just to explain the general
> >> > > > problem.
> >> > > >
> >> > > > The memory configuration for example can be not ASIC specific, but rather
> >> > > > determined by whoever took the ASIC and glued it together with VRAM on a
> >> > > > board. It is not likely that somebody puts all the VRAM chips on one
> >> > > > channel, but it is still perfectly possible.
> >> > > >
> >> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them
> >> > > > could be bad and we need to know which to actually use.
> >> > >
> >> > > For my understanding: This leaks outside the chip when sharing buffers?
> >> > > All the information you only need locally to a given amdgpu instance
> >> > > don't really need to be encoded in modifiers.
> >> > >
> >> > > Pointers to code where this is all decided (kernel and radeonsi would be
> >> > > good starters I guess) would be really good here.
> >> >
> >> > I extracted the information on which bits are relevant mostly from the
> >> > AddrFromCoord functions in addrlib in mesa:
> >> >
> >> > for macro-tiles:
> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >> >
> >> > for micro-tiles (or the micro-tiles in macro-tiles):
> >> >
> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> >>
> >> So this is the decoding thing. How many of these actually exist, even when
> >> taking all the other information into account?
> >>
> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> >> + height + width, how much of all these bits do you actually still freely
> >> pick?
> >
> > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> > depth, display-rotated) + whether to use compression, everything else
> > is fixed given those option, the properties of the chip and the
> > format.
> >
> >
> >>
> >> It might be that all the things you need to know from the memory config
> >> don't encode smaller than the macro/micro/whatever else stuff. But that's
> >> kinda the angle that we looked at this for everyone else.
> >>
> >> E.g. for multi-plane stuff, if everyone picks the same config for the
> >> 2nd/3rd plane, then you don't actually need to encode that. It just
> >> becomes part of the implied stuff in the modifier.
> >
> > The problem is some GPUs are compatible for say 8-bpp images, but not
> > for 32-bpp surfaces. e.g. lets look at the following table showing the
> > current configuration for all GFX6-GFX8 GPU:
> >
> > format: (bank width, bank height, macro tile aspect, num banks) for
> > 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
> >
> > verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> > oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> > hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> > tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> > ADDR_SURF_P8_32x32_8x16
> > bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> > hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> > CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> > topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> > fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> > tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> > polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> > polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> > stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
> >
> > We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> > and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> > compatible for 8-bpp.
> >
> > So we can't just assume that if the first plane properties match that
> > they do so for the second plane, because we don't know what GPU it is
> > coming from.
> >
> > And we can make a canonical table from this but then if we change the
> > above tables in the kernel that runs into compatibility issues.
>
>
> I think that's roughly what nvidia ended up doing (not sure they
> published anything, haven't seen any patches on dri-devel). Two parts:
> - Bunch of flags/bits for the major mode, like {display, non-display,
> DCC} and so on.
> - For each of the major modes a list of enumerated modes actually used
> in reality. For DCC this would be the generation, for more plain
> formats it would be the above table. Table entries would be per-bpp
> (in case that matters, or whatever matters for amd gpus).
>
> Imo would be totally ok to put the relevant lookup tables into
> drm_fourcc.h directly. Or any other suitable place.

Right, the issue would be that the tables are configured int he kernel

https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417

and having a shared hardcoded table means that if the kernel tables
change we break userspace. That would be a new commitment on the side
of the kernel driver.

(If I can get an okay for that commitment from Alex/Christian I'd be
totally happy to switch it over)

>
> From a quick look about 16 bits total. Gives you 40bits of second
> chances and "oops totally sorry".

I purposefully left 8 bits as a version field for those oopses.

> -Daniel
>
> >
> >> -Daniel
> >>
> >> >
> >> > > -Daniel
> >> > >
> >> > > >
> >> > > > Regards,
> >> > > > Christian.
> >> > > >
> >> > > > >
> >> > > > > If you do use 256 bits in order to represent
> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> > > > > modifiers per format, userspace would start hitting OOM pretty quickly
> >> > > > > as it attempted to enumerate and negotiate acceptable modifiers.
> >> > > > > Either that or we need to replace the fixed 64-bit modifier tokens
> >> > > > > with some kind of eBPF script.
> >> > > > >
> >> > > > > Cheers,
> >> > > > > Daniel
> >> > > > > _______________________________________________
> >> > > > > dri-devel mailing list
> >> > > > > dri-devel@lists.freedesktop.org
> >> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >> > > >
> >> > >
> >> > > --
> >> > > Daniel Vetter
> >> > > Software Engineer, Intel Corporation
> >> > > http://blog.ffwll.ch
> >> > > _______________________________________________
> >> > > dri-devel mailing list
> >> > > dri-devel@lists.freedesktop.org
> >> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Sept. 4, 2018, 4:37 p.m. UTC | #19
On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>> <bas@basnieuwenhuizen.nl> wrote:
>> > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >>
>> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> > >
>> >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> >> > > > > Hi,
>> >> > > > >
>> >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>> >> > > > > > > +/* The chip this is compatible with.
>> >> > > > > > > + *
>> >> > > > > > > + * If compression is disabled, use
>> >> > > > > > > + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> >> > > > > > > + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> >> > > > > > > + *
>> >> > > > > > > + * With compression enabled please use the exact chip.
>> >> > > > > > > + *
>> >> > > > > > > + * TODO: Do some generations share DCC format?
>> >> > > > > > > + */
>> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>> >> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling
>> >> > > > > > details? When we had the entire discussion with nvidia folks they
>> >> > > > > > eventually agreed that they don't need the massive pile with every
>> >> > > > > > possible combination. Do you really plan to share all these different
>> >> > > > > > things?
>> >> > > > > >
>> >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon
>> >> > > > > > buffer size and buffer format (because that's how the hw works), not
>> >> > > > > > using explicit modifier flags for everything.
>> >> > > > > Right. The conclusion, after people went through and started sorting
>> >> > > > > out the kinds of formats for which they would _actually_ export real
>> >> > > > > colour buffers for, that most vendors definitely have fewer than
>> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >> > > > > possible formats to represent, very likely fewer than
>> >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally
>> >> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> >> > > > > bits of the 56 available.
>> >> > > >
>> >> > > > The problem here is that at least for some parameters we actually don't know
>> >> > > > which formats are actually used.
>> >> > > >
>> >> > > > The following are not real world examples, but just to explain the general
>> >> > > > problem.
>> >> > > >
>> >> > > > The memory configuration for example can be not ASIC specific, but rather
>> >> > > > determined by whoever took the ASIC and glued it together with VRAM on a
>> >> > > > board. It is not likely that somebody puts all the VRAM chips on one
>> >> > > > channel, but it is still perfectly possible.
>> >> > > >
>> >> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them
>> >> > > > could be bad and we need to know which to actually use.
>> >> > >
>> >> > > For my understanding: This leaks outside the chip when sharing buffers?
>> >> > > All the information you only need locally to a given amdgpu instance
>> >> > > don't really need to be encoded in modifiers.
>> >> > >
>> >> > > Pointers to code where this is all decided (kernel and radeonsi would be
>> >> > > good starters I guess) would be really good here.
>> >> >
>> >> > I extracted the information on which bits are relevant mostly from the
>> >> > AddrFromCoord functions in addrlib in mesa:
>> >> >
>> >> > for macro-tiles:
>> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >> >
>> >> > for micro-tiles (or the micro-tiles in macro-tiles):
>> >> >
>> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>> >>
>> >> So this is the decoding thing. How many of these actually exist, even when
>> >> taking all the other information into account?
>> >>
>> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> >> + height + width, how much of all these bits do you actually still freely
>> >> pick?
>> >
>> > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>> > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>> > depth, display-rotated) + whether to use compression, everything else
>> > is fixed given those option, the properties of the chip and the
>> > format.
>> >
>> >
>> >>
>> >> It might be that all the things you need to know from the memory config
>> >> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> >> kinda the angle that we looked at this for everyone else.
>> >>
>> >> E.g. for multi-plane stuff, if everyone picks the same config for the
>> >> 2nd/3rd plane, then you don't actually need to encode that. It just
>> >> becomes part of the implied stuff in the modifier.
>> >
>> > The problem is some GPUs are compatible for say 8-bpp images, but not
>> > for 32-bpp surfaces. e.g. lets look at the following table showing the
>> > current configuration for all GFX6-GFX8 GPU:
>> >
>> > format: (bank width, bank height, macro tile aspect, num banks) for
>> > 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>> >
>> > verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> > oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> > hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
>> > tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
>> > ADDR_SURF_P8_32x32_8x16
>> > bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
>> > hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
>> > CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
>> > topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
>> > fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
>> > tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> > polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
>> > polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> > stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>> >
>> > We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
>> > and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
>> > compatible for 8-bpp.
>> >
>> > So we can't just assume that if the first plane properties match that
>> > they do so for the second plane, because we don't know what GPU it is
>> > coming from.
>> >
>> > And we can make a canonical table from this but then if we change the
>> > above tables in the kernel that runs into compatibility issues.
>>
>>
>> I think that's roughly what nvidia ended up doing (not sure they
>> published anything, haven't seen any patches on dri-devel). Two parts:
>> - Bunch of flags/bits for the major mode, like {display, non-display,
>> DCC} and so on.
>> - For each of the major modes a list of enumerated modes actually used
>> in reality. For DCC this would be the generation, for more plain
>> formats it would be the above table. Table entries would be per-bpp
>> (in case that matters, or whatever matters for amd gpus).
>>
>> Imo would be totally ok to put the relevant lookup tables into
>> drm_fourcc.h directly. Or any other suitable place.
>
> Right, the issue would be that the tables are configured int he kernel
>
> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
>
> and having a shared hardcoded table means that if the kernel tables
> change we break userspace. That would be a new commitment on the side
> of the kernel driver.
>
> (If I can get an okay for that commitment from Alex/Christian I'd be
> totally happy to switch it over)

Yeah it'd make that table uapi - well the compressed from with
duplicates removed, if you really want to make it tiny. But you can
just add new entries if the heuristics for a given platform changes,
as long as you keep the old entries working (if you shipped them ever
ofc, otherwise totally ok to change as you see fit).
-Daniel

>> From a quick look about 16 bits total. Gives you 40bits of second
>> chances and "oops totally sorry".
>
> I purposefully left 8 bits as a version field for those oopses.
>
>> -Daniel
>>
>> >
>> >> -Daniel
>> >>
>> >> >
>> >> > > -Daniel
>> >> > >
>> >> > > >
>> >> > > > Regards,
>> >> > > > Christian.
>> >> > > >
>> >> > > > >
>> >> > > > > If you do use 256 bits in order to represent
>> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >> > > > > modifiers per format, userspace would start hitting OOM pretty quickly
>> >> > > > > as it attempted to enumerate and negotiate acceptable modifiers.
>> >> > > > > Either that or we need to replace the fixed 64-bit modifier tokens
>> >> > > > > with some kind of eBPF script.
>> >> > > > >
>> >> > > > > Cheers,
>> >> > > > > Daniel
>> >> > > > > _______________________________________________
>> >> > > > > dri-devel mailing list
>> >> > > > > dri-devel@lists.freedesktop.org
>> >> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >> > > >
>> >> > >
>> >> > > --
>> >> > > Daniel Vetter
>> >> > > Software Engineer, Intel Corporation
>> >> > > http://blog.ffwll.ch
>> >> > > _______________________________________________
>> >> > > dri-devel mailing list
>> >> > > dri-devel@lists.freedesktop.org
>> >> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >>
>> >> --
>> >> Daniel Vetter
>> >> Software Engineer, Intel Corporation
>> >> http://blog.ffwll.ch
>>
>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Christian König Sept. 4, 2018, 5:48 p.m. UTC | #20
Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
>> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>>> <bas@basnieuwenhuizen.nl> wrote:
>>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>>>>>>>>>>> +/* The chip this is compatible with.
>>>>>>>>>>> + *
>>>>>>>>>>> + * If compression is disabled, use
>>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>>>>>>>>>> + *
>>>>>>>>>>> + * With compression enabled please use the exact chip.
>>>>>>>>>>> + *
>>>>>>>>>>> + * TODO: Do some generations share DCC format?
>>>>>>>>>>> + */
>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>>>>>>>>>> details? When we had the entire discussion with nvidia folks they
>>>>>>>>>> eventually agreed that they don't need the massive pile with every
>>>>>>>>>> possible combination. Do you really plan to share all these different
>>>>>>>>>> things?
>>>>>>>>>>
>>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
>>>>>>>>>> using explicit modifier flags for everything.
>>>>>>>>> Right. The conclusion, after people went through and started sorting
>>>>>>>>> out the kinds of formats for which they would _actually_ export real
>>>>>>>>> colour buffers for, that most vendors definitely have fewer than
>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>>>> possible formats to represent, very likely fewer than
>>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>>>>>>>> bits of the 56 available.
>>>>>>>> The problem here is that at least for some parameters we actually don't know
>>>>>>>> which formats are actually used.
>>>>>>>>
>>>>>>>> The following are not real world examples, but just to explain the general
>>>>>>>> problem.
>>>>>>>>
>>>>>>>> The memory configuration for example can be not ASIC specific, but rather
>>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
>>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>>>>>>>> channel, but it is still perfectly possible.
>>>>>>>>
>>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>>>>>>>> could be bad and we need to know which to actually use.
>>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
>>>>>>> All the information you only need locally to a given amdgpu instance
>>>>>>> don't really need to be encoded in modifiers.
>>>>>>>
>>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>>>>>>> good starters I guess) would be really good here.
>>>>>> I extracted the information on which bits are relevant mostly from the
>>>>>> AddrFromCoord functions in addrlib in mesa:
>>>>>>
>>>>>> for macro-tiles:
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>>>>>>
>>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
>>>>>>
>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>>>> So this is the decoding thing. How many of these actually exist, even when
>>>>> taking all the other information into account?
>>>>>
>>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>>>>> + height + width, how much of all these bits do you actually still freely
>>>>> pick?
>>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>>>> depth, display-rotated) + whether to use compression, everything else
>>>> is fixed given those option, the properties of the chip and the
>>>> format.
>>>>
>>>>
>>>>> It might be that all the things you need to know from the memory config
>>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>>>>> kinda the angle that we looked at this for everyone else.
>>>>>
>>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
>>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
>>>>> becomes part of the implied stuff in the modifier.
>>>> The problem is some GPUs are compatible for say 8-bpp images, but not
>>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
>>>> current configuration for all GFX6-GFX8 GPU:
>>>>
>>>> format: (bank width, bank height, macro tile aspect, num banks) for
>>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>>>>
>>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
>>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
>>>> ADDR_SURF_P8_32x32_8x16
>>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
>>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
>>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
>>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
>>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
>>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
>>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>>>>
>>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
>>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
>>>> compatible for 8-bpp.
>>>>
>>>> So we can't just assume that if the first plane properties match that
>>>> they do so for the second plane, because we don't know what GPU it is
>>>> coming from.
>>>>
>>>> And we can make a canonical table from this but then if we change the
>>>> above tables in the kernel that runs into compatibility issues.
>>>
>>> I think that's roughly what nvidia ended up doing (not sure they
>>> published anything, haven't seen any patches on dri-devel). Two parts:
>>> - Bunch of flags/bits for the major mode, like {display, non-display,
>>> DCC} and so on.
>>> - For each of the major modes a list of enumerated modes actually used
>>> in reality. For DCC this would be the generation, for more plain
>>> formats it would be the above table. Table entries would be per-bpp
>>> (in case that matters, or whatever matters for amd gpus).
>>>
>>> Imo would be totally ok to put the relevant lookup tables into
>>> drm_fourcc.h directly. Or any other suitable place.
>> Right, the issue would be that the tables are configured int he kernel
>>
>> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
>>
>> and having a shared hardcoded table means that if the kernel tables
>> change we break userspace. That would be a new commitment on the side
>> of the kernel driver.
>>
>> (If I can get an okay for that commitment from Alex/Christian I'd be
>> totally happy to switch it over)
> Yeah it'd make that table uapi

That won't work. Those lists are not necessary stable.

What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index 
plus a 8bit version number.

Shouldn't be more than 32 bits in total.

Regards,
Christian.

>   - well the compressed from with
> duplicates removed, if you really want to make it tiny. But you can
> just add new entries if the heuristics for a given platform changes,
> as long as you keep the old entries working (if you shipped them ever
> ofc, otherwise totally ok to change as you see fit).
> -Daniel
>
>>>  From a quick look about 16 bits total. Gives you 40bits of second
>>> chances and "oops totally sorry".
>> I purposefully left 8 bits as a version field for those oopses.
>>
>>> -Daniel
>>>
>>>>> -Daniel
>>>>>
>>>>>>> -Daniel
>>>>>>>
>>>>>>>> Regards,
>>>>>>>> Christian.
>>>>>>>>
>>>>>>>>> If you do use 256 bits in order to represent
>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>>>>>>>>> with some kind of eBPF script.
>>>>>>>>>
>>>>>>>>> Cheers,
>>>>>>>>> Daniel
>>>>>>>>> _______________________________________________
>>>>>>>>> dri-devel mailing list
>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>> --
>>>>>>> Daniel Vetter
>>>>>>> Software Engineer, Intel Corporation
>>>>>>> http://blog.ffwll.ch
>>>>>>> _______________________________________________
>>>>>>> dri-devel mailing list
>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>> --
>>>>> Daniel Vetter
>>>>> Software Engineer, Intel Corporation
>>>>> http://blog.ffwll.ch
>>>
>>>
>>> --
>>> Daniel Vetter
>>> Software Engineer, Intel Corporation
>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
>
Bas Nieuwenhuizen Sept. 4, 2018, 5:57 p.m. UTC | #21
On Tue, Sep 4, 2018 at 7:48 PM Christian König
<ckoenig.leichtzumerken@gmail.com> wrote:
>
> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> > <bas@basnieuwenhuizen.nl> wrote:
> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> >>> <bas@basnieuwenhuizen.nl> wrote:
> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> >>>>>>>>> Hi,
> >>>>>>>>>
> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> >>>>>>>>>>> +/* The chip this is compatible with.
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * If compression is disabled, use
> >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * With compression enabled please use the exact chip.
> >>>>>>>>>>> + *
> >>>>>>>>>>> + * TODO: Do some generations share DCC format?
> >>>>>>>>>>> + */
> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
> >>>>>>>>>> eventually agreed that they don't need the massive pile with every
> >>>>>>>>>> possible combination. Do you really plan to share all these different
> >>>>>>>>>> things?
> >>>>>>>>>>
> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
> >>>>>>>>>> using explicit modifier flags for everything.
> >>>>>>>>> Right. The conclusion, after people went through and started sorting
> >>>>>>>>> out the kinds of formats for which they would _actually_ export real
> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >>>>>>>>> possible formats to represent, very likely fewer than
> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
> >>>>>>>>> bits of the 56 available.
> >>>>>>>> The problem here is that at least for some parameters we actually don't know
> >>>>>>>> which formats are actually used.
> >>>>>>>>
> >>>>>>>> The following are not real world examples, but just to explain the general
> >>>>>>>> problem.
> >>>>>>>>
> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
> >>>>>>>> channel, but it is still perfectly possible.
> >>>>>>>>
> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
> >>>>>>>> could be bad and we need to know which to actually use.
> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
> >>>>>>> All the information you only need locally to a given amdgpu instance
> >>>>>>> don't really need to be encoded in modifiers.
> >>>>>>>
> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
> >>>>>>> good starters I guess) would be really good here.
> >>>>>> I extracted the information on which bits are relevant mostly from the
> >>>>>> AddrFromCoord functions in addrlib in mesa:
> >>>>>>
> >>>>>> for macro-tiles:
> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >>>>>>
> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
> >>>>>>
> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> >>>>> So this is the decoding thing. How many of these actually exist, even when
> >>>>> taking all the other information into account?
> >>>>>
> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> >>>>> + height + width, how much of all these bits do you actually still freely
> >>>>> pick?
> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> >>>> depth, display-rotated) + whether to use compression, everything else
> >>>> is fixed given those option, the properties of the chip and the
> >>>> format.
> >>>>
> >>>>
> >>>>> It might be that all the things you need to know from the memory config
> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
> >>>>> kinda the angle that we looked at this for everyone else.
> >>>>>
> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
> >>>>> becomes part of the implied stuff in the modifier.
> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
> >>>> current configuration for all GFX6-GFX8 GPU:
> >>>>
> >>>> format: (bank width, bank height, macro tile aspect, num banks) for
> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
> >>>>
> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> >>>> ADDR_SURF_P8_32x32_8x16
> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
> >>>>
> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> >>>> compatible for 8-bpp.
> >>>>
> >>>> So we can't just assume that if the first plane properties match that
> >>>> they do so for the second plane, because we don't know what GPU it is
> >>>> coming from.
> >>>>
> >>>> And we can make a canonical table from this but then if we change the
> >>>> above tables in the kernel that runs into compatibility issues.
> >>>
> >>> I think that's roughly what nvidia ended up doing (not sure they
> >>> published anything, haven't seen any patches on dri-devel). Two parts:
> >>> - Bunch of flags/bits for the major mode, like {display, non-display,
> >>> DCC} and so on.
> >>> - For each of the major modes a list of enumerated modes actually used
> >>> in reality. For DCC this would be the generation, for more plain
> >>> formats it would be the above table. Table entries would be per-bpp
> >>> (in case that matters, or whatever matters for amd gpus).
> >>>
> >>> Imo would be totally ok to put the relevant lookup tables into
> >>> drm_fourcc.h directly. Or any other suitable place.
> >> Right, the issue would be that the tables are configured int he kernel
> >>
> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
> >>
> >> and having a shared hardcoded table means that if the kernel tables
> >> change we break userspace. That would be a new commitment on the side
> >> of the kernel driver.
> >>
> >> (If I can get an okay for that commitment from Alex/Christian I'd be
> >> totally happy to switch it over)
> > Yeah it'd make that table uapi
>
> That won't work. Those lists are not necessary stable.
>
> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
> plus a 8bit version number.

But if we use PCI-ID, then we cannot really share between different
models of GPU ...

So we have seen 3 options on design level:
1) Use PCI-ID/chip.
  Advantage:
    -simple
    - Few bits
  Disadvantage:
    - No sharing between different GPU models
2) Use a fixed lookup/canonicalization table.
  Advantage:
    -Few bits
    - Sharing between models
  Disadvantage:
    - Cannot change the kernel tiling tables.
3) Encode tiling mode properties in modifier.
  Advantage:
    - Can share between models.
    - Can change the kernel tiling tables.
  Disadvantage:
    - Use a lot of bits.

Overall the number of exposed modifiers should be pretty much the
same, so encoding size that we have to deal with all possible bit
patterns.

I've implemented option 3 in the patch here because I think it is the
best tradeoff. As far as I can see it fits, and especially if we do no
multisample support and remove the tile split bytes, and remove the
3rd set of plane macro-tile options because no format has 3 different
bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
for future extensions and a version number. It is not great but pretty
doable?

>
> Shouldn't be more than 32 bits in total.
>
> Regards,
> Christian.
>
> >   - well the compressed from with
> > duplicates removed, if you really want to make it tiny. But you can
> > just add new entries if the heuristics for a given platform changes,
> > as long as you keep the old entries working (if you shipped them ever
> > ofc, otherwise totally ok to change as you see fit).
> > -Daniel
> >
> >>>  From a quick look about 16 bits total. Gives you 40bits of second
> >>> chances and "oops totally sorry".
> >> I purposefully left 8 bits as a version field for those oopses.
> >>
> >>> -Daniel
> >>>
> >>>>> -Daniel
> >>>>>
> >>>>>>> -Daniel
> >>>>>>>
> >>>>>>>> Regards,
> >>>>>>>> Christian.
> >>>>>>>>
> >>>>>>>>> If you do use 256 bits in order to represent
> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
> >>>>>>>>> with some kind of eBPF script.
> >>>>>>>>>
> >>>>>>>>> Cheers,
> >>>>>>>>> Daniel
> >>>>>>>>> _______________________________________________
> >>>>>>>>> dri-devel mailing list
> >>>>>>>>> dri-devel@lists.freedesktop.org
> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>>>>> --
> >>>>>>> Daniel Vetter
> >>>>>>> Software Engineer, Intel Corporation
> >>>>>>> http://blog.ffwll.ch
> >>>>>>> _______________________________________________
> >>>>>>> dri-devel mailing list
> >>>>>>> dri-devel@lists.freedesktop.org
> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >>>>> --
> >>>>> Daniel Vetter
> >>>>> Software Engineer, Intel Corporation
> >>>>> http://blog.ffwll.ch
> >>>
> >>>
> >>> --
> >>> Daniel Vetter
> >>> Software Engineer, Intel Corporation
> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >
> >
>
Bas Nieuwenhuizen Sept. 4, 2018, 6 p.m. UTC | #22
On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
>
> On Tue, Sep 4, 2018 at 7:48 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
> >
> > Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> > > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> > > <bas@basnieuwenhuizen.nl> wrote:
> > >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> > >>> <bas@basnieuwenhuizen.nl> wrote:
> > >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> > >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > >>>>>>>>> Hi,
> > >>>>>>>>>
> > >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> > >>>>>>>>>>> +/* The chip this is compatible with.
> > >>>>>>>>>>> + *
> > >>>>>>>>>>> + * If compression is disabled, use
> > >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > >>>>>>>>>>> + *
> > >>>>>>>>>>> + * With compression enabled please use the exact chip.
> > >>>>>>>>>>> + *
> > >>>>>>>>>>> + * TODO: Do some generations share DCC format?
> > >>>>>>>>>>> + */
> > >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> > >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
> > >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
> > >>>>>>>>>> eventually agreed that they don't need the massive pile with every
> > >>>>>>>>>> possible combination. Do you really plan to share all these different
> > >>>>>>>>>> things?
> > >>>>>>>>>>
> > >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
> > >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
> > >>>>>>>>>> using explicit modifier flags for everything.
> > >>>>>>>>> Right. The conclusion, after people went through and started sorting
> > >>>>>>>>> out the kinds of formats for which they would _actually_ export real
> > >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
> > >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > >>>>>>>>> possible formats to represent, very likely fewer than
> > >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
> > >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > >>>>>>>>> bits of the 56 available.
> > >>>>>>>> The problem here is that at least for some parameters we actually don't know
> > >>>>>>>> which formats are actually used.
> > >>>>>>>>
> > >>>>>>>> The following are not real world examples, but just to explain the general
> > >>>>>>>> problem.
> > >>>>>>>>
> > >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
> > >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
> > >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
> > >>>>>>>> channel, but it is still perfectly possible.
> > >>>>>>>>
> > >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
> > >>>>>>>> could be bad and we need to know which to actually use.
> > >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
> > >>>>>>> All the information you only need locally to a given amdgpu instance
> > >>>>>>> don't really need to be encoded in modifiers.
> > >>>>>>>
> > >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
> > >>>>>>> good starters I guess) would be really good here.
> > >>>>>> I extracted the information on which bits are relevant mostly from the
> > >>>>>> AddrFromCoord functions in addrlib in mesa:
> > >>>>>>
> > >>>>>> for macro-tiles:
> > >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> > >>>>>>
> > >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
> > >>>>>>
> > >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> > >>>>> So this is the decoding thing. How many of these actually exist, even when
> > >>>>> taking all the other information into account?
> > >>>>>
> > >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> > >>>>> + height + width, how much of all these bits do you actually still freely
> > >>>>> pick?
> > >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> > >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> > >>>> depth, display-rotated) + whether to use compression, everything else
> > >>>> is fixed given those option, the properties of the chip and the
> > >>>> format.
> > >>>>
> > >>>>
> > >>>>> It might be that all the things you need to know from the memory config
> > >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
> > >>>>> kinda the angle that we looked at this for everyone else.
> > >>>>>
> > >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
> > >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
> > >>>>> becomes part of the implied stuff in the modifier.
> > >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
> > >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
> > >>>> current configuration for all GFX6-GFX8 GPU:
> > >>>>
> > >>>> format: (bank width, bank height, macro tile aspect, num banks) for
> > >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
> > >>>>
> > >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> > >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> > >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> > >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> > >>>> ADDR_SURF_P8_32x32_8x16
> > >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> > >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> > >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> > >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> > >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> > >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> > >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> > >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> > >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
> > >>>>
> > >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> > >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> > >>>> compatible for 8-bpp.
> > >>>>
> > >>>> So we can't just assume that if the first plane properties match that
> > >>>> they do so for the second plane, because we don't know what GPU it is
> > >>>> coming from.
> > >>>>
> > >>>> And we can make a canonical table from this but then if we change the
> > >>>> above tables in the kernel that runs into compatibility issues.
> > >>>
> > >>> I think that's roughly what nvidia ended up doing (not sure they
> > >>> published anything, haven't seen any patches on dri-devel). Two parts:
> > >>> - Bunch of flags/bits for the major mode, like {display, non-display,
> > >>> DCC} and so on.
> > >>> - For each of the major modes a list of enumerated modes actually used
> > >>> in reality. For DCC this would be the generation, for more plain
> > >>> formats it would be the above table. Table entries would be per-bpp
> > >>> (in case that matters, or whatever matters for amd gpus).
> > >>>
> > >>> Imo would be totally ok to put the relevant lookup tables into
> > >>> drm_fourcc.h directly. Or any other suitable place.
> > >> Right, the issue would be that the tables are configured int he kernel
> > >>
> > >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
> > >>
> > >> and having a shared hardcoded table means that if the kernel tables
> > >> change we break userspace. That would be a new commitment on the side
> > >> of the kernel driver.
> > >>
> > >> (If I can get an okay for that commitment from Alex/Christian I'd be
> > >> totally happy to switch it over)
> > > Yeah it'd make that table uapi
> >
> > That won't work. Those lists are not necessary stable.
> >
> > What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
> > plus a 8bit version number.
>
> But if we use PCI-ID, then we cannot really share between different
> models of GPU ...
>
> So we have seen 3 options on design level:
> 1) Use PCI-ID/chip.
>   Advantage:
>     -simple
>     - Few bits
>   Disadvantage:
>     - No sharing between different GPU models
> 2) Use a fixed lookup/canonicalization table.
>   Advantage:
>     -Few bits
>     - Sharing between models
>   Disadvantage:
>     - Cannot change the kernel tiling tables.

A fourth option would actually be to put this fixed table in the
kernel and have an ioctl so userspace can ask for it. That way we
avoid the kernel update issue.

> 3) Encode tiling mode properties in modifier.
>   Advantage:
>     - Can share between models.
>     - Can change the kernel tiling tables.
>   Disadvantage:
>     - Use a lot of bits.
>
> Overall the number of exposed modifiers should be pretty much the
> same, so encoding size that we have to deal with all possible bit
> patterns.
>
> I've implemented option 3 in the patch here because I think it is the
> best tradeoff. As far as I can see it fits, and especially if we do no
> multisample support and remove the tile split bytes, and remove the
> 3rd set of plane macro-tile options because no format has 3 different
> bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
> for future extensions and a version number. It is not great but pretty
> doable?
>
> >
> > Shouldn't be more than 32 bits in total.
> >
> > Regards,
> > Christian.
> >
> > >   - well the compressed from with
> > > duplicates removed, if you really want to make it tiny. But you can
> > > just add new entries if the heuristics for a given platform changes,
> > > as long as you keep the old entries working (if you shipped them ever
> > > ofc, otherwise totally ok to change as you see fit).
> > > -Daniel
> > >
> > >>>  From a quick look about 16 bits total. Gives you 40bits of second
> > >>> chances and "oops totally sorry".
> > >> I purposefully left 8 bits as a version field for those oopses.
> > >>
> > >>> -Daniel
> > >>>
> > >>>>> -Daniel
> > >>>>>
> > >>>>>>> -Daniel
> > >>>>>>>
> > >>>>>>>> Regards,
> > >>>>>>>> Christian.
> > >>>>>>>>
> > >>>>>>>>> If you do use 256 bits in order to represent
> > >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
> > >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
> > >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
> > >>>>>>>>> with some kind of eBPF script.
> > >>>>>>>>>
> > >>>>>>>>> Cheers,
> > >>>>>>>>> Daniel
> > >>>>>>>>> _______________________________________________
> > >>>>>>>>> dri-devel mailing list
> > >>>>>>>>> dri-devel@lists.freedesktop.org
> > >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >>>>>>> --
> > >>>>>>> Daniel Vetter
> > >>>>>>> Software Engineer, Intel Corporation
> > >>>>>>> http://blog.ffwll.ch
> > >>>>>>> _______________________________________________
> > >>>>>>> dri-devel mailing list
> > >>>>>>> dri-devel@lists.freedesktop.org
> > >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >>>>> --
> > >>>>> Daniel Vetter
> > >>>>> Software Engineer, Intel Corporation
> > >>>>> http://blog.ffwll.ch
> > >>>
> > >>>
> > >>> --
> > >>> Daniel Vetter
> > >>> Software Engineer, Intel Corporation
> > >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > >
> > >
> >
Christian König Sept. 4, 2018, 6:06 p.m. UTC | #23
Am 04.09.2018 um 20:00 schrieb Bas Nieuwenhuizen:
> On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
>> On Tue, Sep 4, 2018 at 7:48 PM Christian König
>> <ckoenig.leichtzumerken@gmail.com> wrote:
>>> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
>>>> On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
>>>> <bas@basnieuwenhuizen.nl> wrote:
>>>>> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>>>>>> <bas@basnieuwenhuizen.nl> wrote:
>>>>>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>>>>>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>>>>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>>>>>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>>>>>>>>>>>> Hi,
>>>>>>>>>>>>
>>>>>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>>>>>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>>>>>>>>>>>>>> +/* The chip this is compatible with.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * If compression is disabled, use
>>>>>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>>>>>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * With compression enabled please use the exact chip.
>>>>>>>>>>>>>> + *
>>>>>>>>>>>>>> + * TODO: Do some generations share DCC format?
>>>>>>>>>>>>>> + */
>>>>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>>>>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>>>>>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>>>>>>>>>>>>> details? When we had the entire discussion with nvidia folks they
>>>>>>>>>>>>> eventually agreed that they don't need the massive pile with every
>>>>>>>>>>>>> possible combination. Do you really plan to share all these different
>>>>>>>>>>>>> things?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>>>>>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
>>>>>>>>>>>>> using explicit modifier flags for everything.
>>>>>>>>>>>> Right. The conclusion, after people went through and started sorting
>>>>>>>>>>>> out the kinds of formats for which they would _actually_ export real
>>>>>>>>>>>> colour buffers for, that most vendors definitely have fewer than
>>>>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>>>>>>> possible formats to represent, very likely fewer than
>>>>>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>>>>>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>>>>>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>>>>>>>>>>>> bits of the 56 available.
>>>>>>>>>>> The problem here is that at least for some parameters we actually don't know
>>>>>>>>>>> which formats are actually used.
>>>>>>>>>>>
>>>>>>>>>>> The following are not real world examples, but just to explain the general
>>>>>>>>>>> problem.
>>>>>>>>>>>
>>>>>>>>>>> The memory configuration for example can be not ASIC specific, but rather
>>>>>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
>>>>>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>>>>>>>>>>> channel, but it is still perfectly possible.
>>>>>>>>>>>
>>>>>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>>>>>>>>>>> could be bad and we need to know which to actually use.
>>>>>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
>>>>>>>>>> All the information you only need locally to a given amdgpu instance
>>>>>>>>>> don't really need to be encoded in modifiers.
>>>>>>>>>>
>>>>>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>>>>>>>>>> good starters I guess) would be really good here.
>>>>>>>>> I extracted the information on which bits are relevant mostly from the
>>>>>>>>> AddrFromCoord functions in addrlib in mesa:
>>>>>>>>>
>>>>>>>>> for macro-tiles:
>>>>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>>>>>>>>>
>>>>>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
>>>>>>>>>
>>>>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>>>>>>>> So this is the decoding thing. How many of these actually exist, even when
>>>>>>>> taking all the other information into account?
>>>>>>>>
>>>>>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>>>>>>>> + height + width, how much of all these bits do you actually still freely
>>>>>>>> pick?
>>>>>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>>>>>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>>>>>>> depth, display-rotated) + whether to use compression, everything else
>>>>>>> is fixed given those option, the properties of the chip and the
>>>>>>> format.
>>>>>>>
>>>>>>>
>>>>>>>> It might be that all the things you need to know from the memory config
>>>>>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>>>>>>>> kinda the angle that we looked at this for everyone else.
>>>>>>>>
>>>>>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
>>>>>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
>>>>>>>> becomes part of the implied stuff in the modifier.
>>>>>>> The problem is some GPUs are compatible for say 8-bpp images, but not
>>>>>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
>>>>>>> current configuration for all GFX6-GFX8 GPU:
>>>>>>>
>>>>>>> format: (bank width, bank height, macro tile aspect, num banks) for
>>>>>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>>>>>>>
>>>>>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>>>>>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>>>>>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
>>>>>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
>>>>>>> ADDR_SURF_P8_32x32_8x16
>>>>>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
>>>>>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
>>>>>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
>>>>>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
>>>>>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
>>>>>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>>>>>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
>>>>>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>>>>>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>>>>>>>
>>>>>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
>>>>>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
>>>>>>> compatible for 8-bpp.
>>>>>>>
>>>>>>> So we can't just assume that if the first plane properties match that
>>>>>>> they do so for the second plane, because we don't know what GPU it is
>>>>>>> coming from.
>>>>>>>
>>>>>>> And we can make a canonical table from this but then if we change the
>>>>>>> above tables in the kernel that runs into compatibility issues.
>>>>>> I think that's roughly what nvidia ended up doing (not sure they
>>>>>> published anything, haven't seen any patches on dri-devel). Two parts:
>>>>>> - Bunch of flags/bits for the major mode, like {display, non-display,
>>>>>> DCC} and so on.
>>>>>> - For each of the major modes a list of enumerated modes actually used
>>>>>> in reality. For DCC this would be the generation, for more plain
>>>>>> formats it would be the above table. Table entries would be per-bpp
>>>>>> (in case that matters, or whatever matters for amd gpus).
>>>>>>
>>>>>> Imo would be totally ok to put the relevant lookup tables into
>>>>>> drm_fourcc.h directly. Or any other suitable place.
>>>>> Right, the issue would be that the tables are configured int he kernel
>>>>>
>>>>> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
>>>>>
>>>>> and having a shared hardcoded table means that if the kernel tables
>>>>> change we break userspace. That would be a new commitment on the side
>>>>> of the kernel driver.
>>>>>
>>>>> (If I can get an okay for that commitment from Alex/Christian I'd be
>>>>> totally happy to switch it over)
>>>> Yeah it'd make that table uapi
>>> That won't work. Those lists are not necessary stable.
>>>
>>> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
>>> plus a 8bit version number.
>> But if we use PCI-ID, then we cannot really share between different
>> models of GPU ...
>>
>> So we have seen 3 options on design level:
>> 1) Use PCI-ID/chip.
>>    Advantage:
>>      -simple
>>      - Few bits
>>    Disadvantage:
>>      - No sharing between different GPU models
>> 2) Use a fixed lookup/canonicalization table.
>>    Advantage:
>>      -Few bits
>>      - Sharing between models
>>    Disadvantage:
>>      - Cannot change the kernel tiling tables.
> A fourth option would actually be to put this fixed table in the
> kernel and have an ioctl so userspace can ask for it. That way we
> avoid the kernel update issue.

That is already present. The table is actually only used for GFX.

MM, SDMA, DCE have separate ways of configuring that and so need the 
config for each entry.

>> 3) Encode tiling mode properties in modifier.
>>    Advantage:
>>      - Can share between models.
>>      - Can change the kernel tiling tables.
>>    Disadvantage:
>>      - Use a lot of bits.
>>
>> Overall the number of exposed modifiers should be pretty much the
>> same, so encoding size that we have to deal with all possible bit
>> patterns.
>>
>> I've implemented option 3 in the patch here because I think it is the
>> best tradeoff. As far as I can see it fits, and especially if we do no
>> multisample support and remove the tile split bytes, and remove the
>> 3rd set of plane macro-tile options because no format has 3 different
>> bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
>> for future extensions and a version number. It is not great but pretty
>> doable?

Yeah, the issue is you seem to look only at the public available bits of 
addrlib.

That might not be enough to get a complete picture, but I think for 
sharing between processes it should be enough.

Regards,
Christian.

>>
>>> Shouldn't be more than 32 bits in total.
>>>
>>> Regards,
>>> Christian.
>>>
>>>>    - well the compressed from with
>>>> duplicates removed, if you really want to make it tiny. But you can
>>>> just add new entries if the heuristics for a given platform changes,
>>>> as long as you keep the old entries working (if you shipped them ever
>>>> ofc, otherwise totally ok to change as you see fit).
>>>> -Daniel
>>>>
>>>>>>   From a quick look about 16 bits total. Gives you 40bits of second
>>>>>> chances and "oops totally sorry".
>>>>> I purposefully left 8 bits as a version field for those oopses.
>>>>>
>>>>>> -Daniel
>>>>>>
>>>>>>>> -Daniel
>>>>>>>>
>>>>>>>>>> -Daniel
>>>>>>>>>>
>>>>>>>>>>> Regards,
>>>>>>>>>>> Christian.
>>>>>>>>>>>
>>>>>>>>>>>> If you do use 256 bits in order to represent
>>>>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>>>>>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>>>>>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>>>>>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>>>>>>>>>>>> with some kind of eBPF script.
>>>>>>>>>>>>
>>>>>>>>>>>> Cheers,
>>>>>>>>>>>> Daniel
>>>>>>>>>>>> _______________________________________________
>>>>>>>>>>>> dri-devel mailing list
>>>>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>>>> --
>>>>>>>>>> Daniel Vetter
>>>>>>>>>> Software Engineer, Intel Corporation
>>>>>>>>>> http://blog.ffwll.ch
>>>>>>>>>> _______________________________________________
>>>>>>>>>> dri-devel mailing list
>>>>>>>>>> dri-devel@lists.freedesktop.org
>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>>>>>> --
>>>>>>>> Daniel Vetter
>>>>>>>> Software Engineer, Intel Corporation
>>>>>>>> http://blog.ffwll.ch
>>>>>>
>>>>>> --
>>>>>> Daniel Vetter
>>>>>> Software Engineer, Intel Corporation
>>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>>>>
Daniel Vetter Sept. 4, 2018, 6:27 p.m. UTC | #24
On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
> On Tue, Sep 4, 2018 at 7:48 PM Christian König
> <ckoenig.leichtzumerken@gmail.com> wrote:
>>
>> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
>> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
>> > <bas@basnieuwenhuizen.nl> wrote:
>> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>> >>> <bas@basnieuwenhuizen.nl> wrote:
>> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> >>>>>>>>> Hi,
>> >>>>>>>>>
>> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>> >>>>>>>>>>> +/* The chip this is compatible with.
>> >>>>>>>>>>> + *
>> >>>>>>>>>>> + * If compression is disabled, use
>> >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> >>>>>>>>>>> + *
>> >>>>>>>>>>> + * With compression enabled please use the exact chip.
>> >>>>>>>>>>> + *
>> >>>>>>>>>>> + * TODO: Do some generations share DCC format?
>> >>>>>>>>>>> + */
>> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
>> >>>>>>>>>> eventually agreed that they don't need the massive pile with every
>> >>>>>>>>>> possible combination. Do you really plan to share all these different
>> >>>>>>>>>> things?
>> >>>>>>>>>>
>> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
>> >>>>>>>>>> using explicit modifier flags for everything.
>> >>>>>>>>> Right. The conclusion, after people went through and started sorting
>> >>>>>>>>> out the kinds of formats for which they would _actually_ export real
>> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
>> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >>>>>>>>> possible formats to represent, very likely fewer than
>> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> >>>>>>>>> bits of the 56 available.
>> >>>>>>>> The problem here is that at least for some parameters we actually don't know
>> >>>>>>>> which formats are actually used.
>> >>>>>>>>
>> >>>>>>>> The following are not real world examples, but just to explain the general
>> >>>>>>>> problem.
>> >>>>>>>>
>> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
>> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
>> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>> >>>>>>>> channel, but it is still perfectly possible.
>> >>>>>>>>
>> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>> >>>>>>>> could be bad and we need to know which to actually use.
>> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
>> >>>>>>> All the information you only need locally to a given amdgpu instance
>> >>>>>>> don't really need to be encoded in modifiers.
>> >>>>>>>
>> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>> >>>>>>> good starters I guess) would be really good here.
>> >>>>>> I extracted the information on which bits are relevant mostly from the
>> >>>>>> AddrFromCoord functions in addrlib in mesa:
>> >>>>>>
>> >>>>>> for macro-tiles:
>> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >>>>>>
>> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
>> >>>>>>
>> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>> >>>>> So this is the decoding thing. How many of these actually exist, even when
>> >>>>> taking all the other information into account?
>> >>>>>
>> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> >>>>> + height + width, how much of all these bits do you actually still freely
>> >>>>> pick?
>> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>> >>>> depth, display-rotated) + whether to use compression, everything else
>> >>>> is fixed given those option, the properties of the chip and the
>> >>>> format.
>> >>>>
>> >>>>
>> >>>>> It might be that all the things you need to know from the memory config
>> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> >>>>> kinda the angle that we looked at this for everyone else.
>> >>>>>
>> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> >>>>> becomes part of the implied stuff in the modifier.
>> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
>> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
>> >>>> current configuration for all GFX6-GFX8 GPU:
>> >>>>
>> >>>> format: (bank width, bank height, macro tile aspect, num banks) for
>> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>> >>>>
>> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
>> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
>> >>>> ADDR_SURF_P8_32x32_8x16
>> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
>> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
>> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
>> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
>> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
>> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
>> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>> >>>>
>> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
>> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
>> >>>> compatible for 8-bpp.
>> >>>>
>> >>>> So we can't just assume that if the first plane properties match that
>> >>>> they do so for the second plane, because we don't know what GPU it is
>> >>>> coming from.
>> >>>>
>> >>>> And we can make a canonical table from this but then if we change the
>> >>>> above tables in the kernel that runs into compatibility issues.
>> >>>
>> >>> I think that's roughly what nvidia ended up doing (not sure they
>> >>> published anything, haven't seen any patches on dri-devel). Two parts:
>> >>> - Bunch of flags/bits for the major mode, like {display, non-display,
>> >>> DCC} and so on.
>> >>> - For each of the major modes a list of enumerated modes actually used
>> >>> in reality. For DCC this would be the generation, for more plain
>> >>> formats it would be the above table. Table entries would be per-bpp
>> >>> (in case that matters, or whatever matters for amd gpus).
>> >>>
>> >>> Imo would be totally ok to put the relevant lookup tables into
>> >>> drm_fourcc.h directly. Or any other suitable place.
>> >> Right, the issue would be that the tables are configured int he kernel
>> >>
>> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
>> >>
>> >> and having a shared hardcoded table means that if the kernel tables
>> >> change we break userspace. That would be a new commitment on the side
>> >> of the kernel driver.
>> >>
>> >> (If I can get an okay for that commitment from Alex/Christian I'd be
>> >> totally happy to switch it over)
>> > Yeah it'd make that table uapi
>>
>> That won't work. Those lists are not necessary stable.
>>
>> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
>> plus a 8bit version number.
>
> But if we use PCI-ID, then we cannot really share between different
> models of GPU ...
>
> So we have seen 3 options on design level:
> 1) Use PCI-ID/chip.
>   Advantage:
>     -simple
>     - Few bits
>   Disadvantage:
>     - No sharing between different GPU models
> 2) Use a fixed lookup/canonicalization table.
>   Advantage:
>     -Few bits
>     - Sharing between models
>   Disadvantage:
>     - Cannot change the kernel tiling tables.

Why is changing the kernel table impossible? Just because it's uapi
doesn't mean we can never extend it. Just add a new entry at the
bottom if you need more combinations ...

You don't even need to require the updated kernel amdgpu.ko, just the
updated drm_fourcc.h header (if it's one of these things you can
configure through cs packets entirely from userspace).

Or why does this not work?
-Daniel

> 3) Encode tiling mode properties in modifier.
>   Advantage:
>     - Can share between models.
>     - Can change the kernel tiling tables.
>   Disadvantage:
>     - Use a lot of bits.
>
> Overall the number of exposed modifiers should be pretty much the
> same, so encoding size that we have to deal with all possible bit
> patterns.
>
> I've implemented option 3 in the patch here because I think it is the
> best tradeoff. As far as I can see it fits, and especially if we do no
> multisample support and remove the tile split bytes, and remove the
> 3rd set of plane macro-tile options because no format has 3 different
> bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
> for future extensions and a version number. It is not great but pretty
> doable?
>
>>
>> Shouldn't be more than 32 bits in total.
>>
>> Regards,
>> Christian.
>>
>> >   - well the compressed from with
>> > duplicates removed, if you really want to make it tiny. But you can
>> > just add new entries if the heuristics for a given platform changes,
>> > as long as you keep the old entries working (if you shipped them ever
>> > ofc, otherwise totally ok to change as you see fit).
>> > -Daniel
>> >
>> >>>  From a quick look about 16 bits total. Gives you 40bits of second
>> >>> chances and "oops totally sorry".
>> >> I purposefully left 8 bits as a version field for those oopses.
>> >>
>> >>> -Daniel
>> >>>
>> >>>>> -Daniel
>> >>>>>
>> >>>>>>> -Daniel
>> >>>>>>>
>> >>>>>>>> Regards,
>> >>>>>>>> Christian.
>> >>>>>>>>
>> >>>>>>>>> If you do use 256 bits in order to represent
>> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>> >>>>>>>>> with some kind of eBPF script.
>> >>>>>>>>>
>> >>>>>>>>> Cheers,
>> >>>>>>>>> Daniel
>> >>>>>>>>> _______________________________________________
>> >>>>>>>>> dri-devel mailing list
>> >>>>>>>>> dri-devel@lists.freedesktop.org
>> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >>>>>>> --
>> >>>>>>> Daniel Vetter
>> >>>>>>> Software Engineer, Intel Corporation
>> >>>>>>> http://blog.ffwll.ch
>> >>>>>>> _______________________________________________
>> >>>>>>> dri-devel mailing list
>> >>>>>>> dri-devel@lists.freedesktop.org
>> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >>>>> --
>> >>>>> Daniel Vetter
>> >>>>> Software Engineer, Intel Corporation
>> >>>>> http://blog.ffwll.ch
>> >>>
>> >>>
>> >>> --
>> >>> Daniel Vetter
>> >>> Software Engineer, Intel Corporation
>> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>> >
>> >
>>
Bas Nieuwenhuizen Sept. 4, 2018, 6:31 p.m. UTC | #25
On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
> > On Tue, Sep 4, 2018 at 7:48 PM Christian König
> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >>
> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> >> > <bas@basnieuwenhuizen.nl> wrote:
> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> >> >>> <bas@basnieuwenhuizen.nl> wrote:
> >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> >> >>>>>>>>> Hi,
> >> >>>>>>>>>
> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> >> >>>>>>>>>>> +/* The chip this is compatible with.
> >> >>>>>>>>>>> + *
> >> >>>>>>>>>>> + * If compression is disabled, use
> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> >> >>>>>>>>>>> + *
> >> >>>>>>>>>>> + * With compression enabled please use the exact chip.
> >> >>>>>>>>>>> + *
> >> >>>>>>>>>>> + * TODO: Do some generations share DCC format?
> >> >>>>>>>>>>> + */
> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
> >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
> >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every
> >> >>>>>>>>>> possible combination. Do you really plan to share all these different
> >> >>>>>>>>>> things?
> >> >>>>>>>>>>
> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
> >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
> >> >>>>>>>>>> using explicit modifier flags for everything.
> >> >>>>>>>>> Right. The conclusion, after people went through and started sorting
> >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real
> >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> >>>>>>>>> possible formats to represent, very likely fewer than
> >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
> >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
> >> >>>>>>>>> bits of the 56 available.
> >> >>>>>>>> The problem here is that at least for some parameters we actually don't know
> >> >>>>>>>> which formats are actually used.
> >> >>>>>>>>
> >> >>>>>>>> The following are not real world examples, but just to explain the general
> >> >>>>>>>> problem.
> >> >>>>>>>>
> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
> >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
> >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
> >> >>>>>>>> channel, but it is still perfectly possible.
> >> >>>>>>>>
> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
> >> >>>>>>>> could be bad and we need to know which to actually use.
> >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
> >> >>>>>>> All the information you only need locally to a given amdgpu instance
> >> >>>>>>> don't really need to be encoded in modifiers.
> >> >>>>>>>
> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
> >> >>>>>>> good starters I guess) would be really good here.
> >> >>>>>> I extracted the information on which bits are relevant mostly from the
> >> >>>>>> AddrFromCoord functions in addrlib in mesa:
> >> >>>>>>
> >> >>>>>> for macro-tiles:
> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >> >>>>>>
> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
> >> >>>>>>
> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> >> >>>>> So this is the decoding thing. How many of these actually exist, even when
> >> >>>>> taking all the other information into account?
> >> >>>>>
> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> >> >>>>> + height + width, how much of all these bits do you actually still freely
> >> >>>>> pick?
> >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> >> >>>> depth, display-rotated) + whether to use compression, everything else
> >> >>>> is fixed given those option, the properties of the chip and the
> >> >>>> format.
> >> >>>>
> >> >>>>
> >> >>>>> It might be that all the things you need to know from the memory config
> >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
> >> >>>>> kinda the angle that we looked at this for everyone else.
> >> >>>>>
> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
> >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
> >> >>>>> becomes part of the implied stuff in the modifier.
> >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
> >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
> >> >>>> current configuration for all GFX6-GFX8 GPU:
> >> >>>>
> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for
> >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
> >> >>>>
> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> >> >>>> ADDR_SURF_P8_32x32_8x16
> >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
> >> >>>>
> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> >> >>>> compatible for 8-bpp.
> >> >>>>
> >> >>>> So we can't just assume that if the first plane properties match that
> >> >>>> they do so for the second plane, because we don't know what GPU it is
> >> >>>> coming from.
> >> >>>>
> >> >>>> And we can make a canonical table from this but then if we change the
> >> >>>> above tables in the kernel that runs into compatibility issues.
> >> >>>
> >> >>> I think that's roughly what nvidia ended up doing (not sure they
> >> >>> published anything, haven't seen any patches on dri-devel). Two parts:
> >> >>> - Bunch of flags/bits for the major mode, like {display, non-display,
> >> >>> DCC} and so on.
> >> >>> - For each of the major modes a list of enumerated modes actually used
> >> >>> in reality. For DCC this would be the generation, for more plain
> >> >>> formats it would be the above table. Table entries would be per-bpp
> >> >>> (in case that matters, or whatever matters for amd gpus).
> >> >>>
> >> >>> Imo would be totally ok to put the relevant lookup tables into
> >> >>> drm_fourcc.h directly. Or any other suitable place.
> >> >> Right, the issue would be that the tables are configured int he kernel
> >> >>
> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
> >> >>
> >> >> and having a shared hardcoded table means that if the kernel tables
> >> >> change we break userspace. That would be a new commitment on the side
> >> >> of the kernel driver.
> >> >>
> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be
> >> >> totally happy to switch it over)
> >> > Yeah it'd make that table uapi
> >>
> >> That won't work. Those lists are not necessary stable.
> >>
> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
> >> plus a 8bit version number.
> >
> > But if we use PCI-ID, then we cannot really share between different
> > models of GPU ...
> >
> > So we have seen 3 options on design level:
> > 1) Use PCI-ID/chip.
> >   Advantage:
> >     -simple
> >     - Few bits
> >   Disadvantage:
> >     - No sharing between different GPU models
> > 2) Use a fixed lookup/canonicalization table.
> >   Advantage:
> >     -Few bits
> >     - Sharing between models
> >   Disadvantage:
> >     - Cannot change the kernel tiling tables.
>
> Why is changing the kernel table impossible? Just because it's uapi
> doesn't mean we can never extend it. Just add a new entry at the
> bottom if you need more combinations ...
>
> You don't even need to require the updated kernel amdgpu.ko, just the
> updated drm_fourcc.h header (if it's one of these things you can
> configure through cs packets entirely from userspace).

So we have two tables we are talking about:
 a) The kernel initialized system wide table of tiling configurations
that already exists
 b)The uapi table

Currently mesa selects by index, i.e. if we want a 32-bit surface with
macro-tiling that is not displayable we should select index 14 in the
system wide tiling table. Now if we change that table entry 14 that
means that we change which things it is compatible with, so we need a
new entry in the uapi table. However, an old mesa does not know about
the new entry in the uapi table, so loses the ability to select a
modifier for the preferred tiling mode, e.g. we pretty much regress
old userspace.

>
> Or why does this not work?
> -Daniel
>
> > 3) Encode tiling mode properties in modifier.
> >   Advantage:
> >     - Can share between models.
> >     - Can change the kernel tiling tables.
> >   Disadvantage:
> >     - Use a lot of bits.
> >
> > Overall the number of exposed modifiers should be pretty much the
> > same, so encoding size that we have to deal with all possible bit
> > patterns.
> >
> > I've implemented option 3 in the patch here because I think it is the
> > best tradeoff. As far as I can see it fits, and especially if we do no
> > multisample support and remove the tile split bytes, and remove the
> > 3rd set of plane macro-tile options because no format has 3 different
> > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
> > for future extensions and a version number. It is not great but pretty
> > doable?
> >
> >>
> >> Shouldn't be more than 32 bits in total.
> >>
> >> Regards,
> >> Christian.
> >>
> >> >   - well the compressed from with
> >> > duplicates removed, if you really want to make it tiny. But you can
> >> > just add new entries if the heuristics for a given platform changes,
> >> > as long as you keep the old entries working (if you shipped them ever
> >> > ofc, otherwise totally ok to change as you see fit).
> >> > -Daniel
> >> >
> >> >>>  From a quick look about 16 bits total. Gives you 40bits of second
> >> >>> chances and "oops totally sorry".
> >> >> I purposefully left 8 bits as a version field for those oopses.
> >> >>
> >> >>> -Daniel
> >> >>>
> >> >>>>> -Daniel
> >> >>>>>
> >> >>>>>>> -Daniel
> >> >>>>>>>
> >> >>>>>>>> Regards,
> >> >>>>>>>> Christian.
> >> >>>>>>>>
> >> >>>>>>>>> If you do use 256 bits in order to represent
> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
> >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
> >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
> >> >>>>>>>>> with some kind of eBPF script.
> >> >>>>>>>>>
> >> >>>>>>>>> Cheers,
> >> >>>>>>>>> Daniel
> >> >>>>>>>>> _______________________________________________
> >> >>>>>>>>> dri-devel mailing list
> >> >>>>>>>>> dri-devel@lists.freedesktop.org
> >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >> >>>>>>> --
> >> >>>>>>> Daniel Vetter
> >> >>>>>>> Software Engineer, Intel Corporation
> >> >>>>>>> http://blog.ffwll.ch
> >> >>>>>>> _______________________________________________
> >> >>>>>>> dri-devel mailing list
> >> >>>>>>> dri-devel@lists.freedesktop.org
> >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >> >>>>> --
> >> >>>>> Daniel Vetter
> >> >>>>> Software Engineer, Intel Corporation
> >> >>>>> http://blog.ffwll.ch
> >> >>>
> >> >>>
> >> >>> --
> >> >>> Daniel Vetter
> >> >>> Software Engineer, Intel Corporation
> >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >> >
> >> >
> >>
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Sept. 4, 2018, 7:28 p.m. UTC | #26
On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
> On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>>
>> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
>> <bas@basnieuwenhuizen.nl> wrote:
>> > On Tue, Sep 4, 2018 at 7:48 PM Christian König
>> > <ckoenig.leichtzumerken@gmail.com> wrote:
>> >>
>> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
>> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
>> >> > <bas@basnieuwenhuizen.nl> wrote:
>> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
>> >> >>> <bas@basnieuwenhuizen.nl> wrote:
>> >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
>> >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>> >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
>> >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
>> >> >>>>>>>>> Hi,
>> >> >>>>>>>>>
>> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
>> >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
>> >> >>>>>>>>>>> +/* The chip this is compatible with.
>> >> >>>>>>>>>>> + *
>> >> >>>>>>>>>>> + * If compression is disabled, use
>> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
>> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
>> >> >>>>>>>>>>> + *
>> >> >>>>>>>>>>> + * With compression enabled please use the exact chip.
>> >> >>>>>>>>>>> + *
>> >> >>>>>>>>>>> + * TODO: Do some generations share DCC format?
>> >> >>>>>>>>>>> + */
>> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
>> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
>> >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
>> >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
>> >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every
>> >> >>>>>>>>>> possible combination. Do you really plan to share all these different
>> >> >>>>>>>>>> things?
>> >> >>>>>>>>>>
>> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
>> >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
>> >> >>>>>>>>>> using explicit modifier flags for everything.
>> >> >>>>>>>>> Right. The conclusion, after people went through and started sorting
>> >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real
>> >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
>> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >> >>>>>>>>> possible formats to represent, very likely fewer than
>> >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
>> >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
>> >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
>> >> >>>>>>>>> bits of the 56 available.
>> >> >>>>>>>> The problem here is that at least for some parameters we actually don't know
>> >> >>>>>>>> which formats are actually used.
>> >> >>>>>>>>
>> >> >>>>>>>> The following are not real world examples, but just to explain the general
>> >> >>>>>>>> problem.
>> >> >>>>>>>>
>> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
>> >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
>> >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
>> >> >>>>>>>> channel, but it is still perfectly possible.
>> >> >>>>>>>>
>> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
>> >> >>>>>>>> could be bad and we need to know which to actually use.
>> >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
>> >> >>>>>>> All the information you only need locally to a given amdgpu instance
>> >> >>>>>>> don't really need to be encoded in modifiers.
>> >> >>>>>>>
>> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
>> >> >>>>>>> good starters I guess) would be really good here.
>> >> >>>>>> I extracted the information on which bits are relevant mostly from the
>> >> >>>>>> AddrFromCoord functions in addrlib in mesa:
>> >> >>>>>>
>> >> >>>>>> for macro-tiles:
>> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
>> >> >>>>>>
>> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
>> >> >>>>>>
>> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
>> >> >>>>> So this is the decoding thing. How many of these actually exist, even when
>> >> >>>>> taking all the other information into account?
>> >> >>>>>
>> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
>> >> >>>>> + height + width, how much of all these bits do you actually still freely
>> >> >>>>> pick?
>> >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
>> >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
>> >> >>>> depth, display-rotated) + whether to use compression, everything else
>> >> >>>> is fixed given those option, the properties of the chip and the
>> >> >>>> format.
>> >> >>>>
>> >> >>>>
>> >> >>>>> It might be that all the things you need to know from the memory config
>> >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
>> >> >>>>> kinda the angle that we looked at this for everyone else.
>> >> >>>>>
>> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
>> >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
>> >> >>>>> becomes part of the implied stuff in the modifier.
>> >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
>> >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
>> >> >>>> current configuration for all GFX6-GFX8 GPU:
>> >> >>>>
>> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for
>> >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
>> >> >>>>
>> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
>> >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
>> >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
>> >> >>>> ADDR_SURF_P8_32x32_8x16
>> >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
>> >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
>> >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
>> >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
>> >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
>> >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
>> >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
>> >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
>> >> >>>>
>> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
>> >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
>> >> >>>> compatible for 8-bpp.
>> >> >>>>
>> >> >>>> So we can't just assume that if the first plane properties match that
>> >> >>>> they do so for the second plane, because we don't know what GPU it is
>> >> >>>> coming from.
>> >> >>>>
>> >> >>>> And we can make a canonical table from this but then if we change the
>> >> >>>> above tables in the kernel that runs into compatibility issues.
>> >> >>>
>> >> >>> I think that's roughly what nvidia ended up doing (not sure they
>> >> >>> published anything, haven't seen any patches on dri-devel). Two parts:
>> >> >>> - Bunch of flags/bits for the major mode, like {display, non-display,
>> >> >>> DCC} and so on.
>> >> >>> - For each of the major modes a list of enumerated modes actually used
>> >> >>> in reality. For DCC this would be the generation, for more plain
>> >> >>> formats it would be the above table. Table entries would be per-bpp
>> >> >>> (in case that matters, or whatever matters for amd gpus).
>> >> >>>
>> >> >>> Imo would be totally ok to put the relevant lookup tables into
>> >> >>> drm_fourcc.h directly. Or any other suitable place.
>> >> >> Right, the issue would be that the tables are configured int he kernel
>> >> >>
>> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
>> >> >>
>> >> >> and having a shared hardcoded table means that if the kernel tables
>> >> >> change we break userspace. That would be a new commitment on the side
>> >> >> of the kernel driver.
>> >> >>
>> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be
>> >> >> totally happy to switch it over)
>> >> > Yeah it'd make that table uapi
>> >>
>> >> That won't work. Those lists are not necessary stable.
>> >>
>> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
>> >> plus a 8bit version number.
>> >
>> > But if we use PCI-ID, then we cannot really share between different
>> > models of GPU ...
>> >
>> > So we have seen 3 options on design level:
>> > 1) Use PCI-ID/chip.
>> >   Advantage:
>> >     -simple
>> >     - Few bits
>> >   Disadvantage:
>> >     - No sharing between different GPU models
>> > 2) Use a fixed lookup/canonicalization table.
>> >   Advantage:
>> >     -Few bits
>> >     - Sharing between models
>> >   Disadvantage:
>> >     - Cannot change the kernel tiling tables.
>>
>> Why is changing the kernel table impossible? Just because it's uapi
>> doesn't mean we can never extend it. Just add a new entry at the
>> bottom if you need more combinations ...
>>
>> You don't even need to require the updated kernel amdgpu.ko, just the
>> updated drm_fourcc.h header (if it's one of these things you can
>> configure through cs packets entirely from userspace).
>
> So we have two tables we are talking about:
>  a) The kernel initialized system wide table of tiling configurations
> that already exists
>  b)The uapi table
>
> Currently mesa selects by index, i.e. if we want a 32-bit surface with
> macro-tiling that is not displayable we should select index 14 in the
> system wide tiling table. Now if we change that table entry 14 that
> means that we change which things it is compatible with, so we need a
> new entry in the uapi table. However, an old mesa does not know about
> the new entry in the uapi table, so loses the ability to select a
> modifier for the preferred tiling mode, e.g. we pretty much regress
> old userspace.

That sounds like this table already _is_ uapi. So you can't change it
... why would you want to change it then?

Note also that the fourcc table doesn't really have to match this
by-index tiling mode uapi table, you can freely add more at the end or
have any kind of fancy lookup table (one for everyone, or
per-generation, or per-whatever, doesn't matter). Just a tradeoff
between compressing the set of actually used layouts into as few
modifier bits as possible against code maintenance headaches of having
to deal with too many lookup tables for all the different things.
-Daniel

>
>>
>> Or why does this not work?
>> -Daniel
>>
>> > 3) Encode tiling mode properties in modifier.
>> >   Advantage:
>> >     - Can share between models.
>> >     - Can change the kernel tiling tables.
>> >   Disadvantage:
>> >     - Use a lot of bits.
>> >
>> > Overall the number of exposed modifiers should be pretty much the
>> > same, so encoding size that we have to deal with all possible bit
>> > patterns.
>> >
>> > I've implemented option 3 in the patch here because I think it is the
>> > best tradeoff. As far as I can see it fits, and especially if we do no
>> > multisample support and remove the tile split bytes, and remove the
>> > 3rd set of plane macro-tile options because no format has 3 different
>> > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
>> > for future extensions and a version number. It is not great but pretty
>> > doable?
>> >
>> >>
>> >> Shouldn't be more than 32 bits in total.
>> >>
>> >> Regards,
>> >> Christian.
>> >>
>> >> >   - well the compressed from with
>> >> > duplicates removed, if you really want to make it tiny. But you can
>> >> > just add new entries if the heuristics for a given platform changes,
>> >> > as long as you keep the old entries working (if you shipped them ever
>> >> > ofc, otherwise totally ok to change as you see fit).
>> >> > -Daniel
>> >> >
>> >> >>>  From a quick look about 16 bits total. Gives you 40bits of second
>> >> >>> chances and "oops totally sorry".
>> >> >> I purposefully left 8 bits as a version field for those oopses.
>> >> >>
>> >> >>> -Daniel
>> >> >>>
>> >> >>>>> -Daniel
>> >> >>>>>
>> >> >>>>>>> -Daniel
>> >> >>>>>>>
>> >> >>>>>>>> Regards,
>> >> >>>>>>>> Christian.
>> >> >>>>>>>>
>> >> >>>>>>>>> If you do use 256 bits in order to represent
>> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
>> >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
>> >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
>> >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
>> >> >>>>>>>>> with some kind of eBPF script.
>> >> >>>>>>>>>
>> >> >>>>>>>>> Cheers,
>> >> >>>>>>>>> Daniel
>> >> >>>>>>>>> _______________________________________________
>> >> >>>>>>>>> dri-devel mailing list
>> >> >>>>>>>>> dri-devel@lists.freedesktop.org
>> >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >> >>>>>>> --
>> >> >>>>>>> Daniel Vetter
>> >> >>>>>>> Software Engineer, Intel Corporation
>> >> >>>>>>> http://blog.ffwll.ch
>> >> >>>>>>> _______________________________________________
>> >> >>>>>>> dri-devel mailing list
>> >> >>>>>>> dri-devel@lists.freedesktop.org
>> >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>> >> >>>>> --
>> >> >>>>> Daniel Vetter
>> >> >>>>> Software Engineer, Intel Corporation
>> >> >>>>> http://blog.ffwll.ch
>> >> >>>
>> >> >>>
>> >> >>> --
>> >> >>> Daniel Vetter
>> >> >>> Software Engineer, Intel Corporation
>> >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>> >> >
>> >> >
>> >>
>>
>>
>>
>> --
>> Daniel Vetter
>> Software Engineer, Intel Corporation
>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Bas Nieuwenhuizen Sept. 4, 2018, 7:36 p.m. UTC | #27
On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter <daniel@ffwll.ch> wrote:
>
> On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen
> <bas@basnieuwenhuizen.nl> wrote:
> > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >>
> >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
> >> <bas@basnieuwenhuizen.nl> wrote:
> >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König
> >> > <ckoenig.leichtzumerken@gmail.com> wrote:
> >> >>
> >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> >> >> > <bas@basnieuwenhuizen.nl> wrote:
> >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> >> >> >>> <bas@basnieuwenhuizen.nl> wrote:
> >> >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> >> >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >> >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> >> >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> >> >> >>>>>>>>> Hi,
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> >> >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> >> >> >>>>>>>>>>> +/* The chip this is compatible with.
> >> >> >>>>>>>>>>> + *
> >> >> >>>>>>>>>>> + * If compression is disabled, use
> >> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> >> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> >> >> >>>>>>>>>>> + *
> >> >> >>>>>>>>>>> + * With compression enabled please use the exact chip.
> >> >> >>>>>>>>>>> + *
> >> >> >>>>>>>>>>> + * TODO: Do some generations share DCC format?
> >> >> >>>>>>>>>>> + */
> >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> >> >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
> >> >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
> >> >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every
> >> >> >>>>>>>>>> possible combination. Do you really plan to share all these different
> >> >> >>>>>>>>>> things?
> >> >> >>>>>>>>>>
> >> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
> >> >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
> >> >> >>>>>>>>>> using explicit modifier flags for everything.
> >> >> >>>>>>>>> Right. The conclusion, after people went through and started sorting
> >> >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real
> >> >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
> >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> >> >>>>>>>>> possible formats to represent, very likely fewer than
> >> >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> >> >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
> >> >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
> >> >> >>>>>>>>> bits of the 56 available.
> >> >> >>>>>>>> The problem here is that at least for some parameters we actually don't know
> >> >> >>>>>>>> which formats are actually used.
> >> >> >>>>>>>>
> >> >> >>>>>>>> The following are not real world examples, but just to explain the general
> >> >> >>>>>>>> problem.
> >> >> >>>>>>>>
> >> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
> >> >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
> >> >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
> >> >> >>>>>>>> channel, but it is still perfectly possible.
> >> >> >>>>>>>>
> >> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
> >> >> >>>>>>>> could be bad and we need to know which to actually use.
> >> >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
> >> >> >>>>>>> All the information you only need locally to a given amdgpu instance
> >> >> >>>>>>> don't really need to be encoded in modifiers.
> >> >> >>>>>>>
> >> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
> >> >> >>>>>>> good starters I guess) would be really good here.
> >> >> >>>>>> I extracted the information on which bits are relevant mostly from the
> >> >> >>>>>> AddrFromCoord functions in addrlib in mesa:
> >> >> >>>>>>
> >> >> >>>>>> for macro-tiles:
> >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> >> >> >>>>>>
> >> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
> >> >> >>>>>>
> >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> >> >> >>>>> So this is the decoding thing. How many of these actually exist, even when
> >> >> >>>>> taking all the other information into account?
> >> >> >>>>>
> >> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> >> >> >>>>> + height + width, how much of all these bits do you actually still freely
> >> >> >>>>> pick?
> >> >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> >> >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> >> >> >>>> depth, display-rotated) + whether to use compression, everything else
> >> >> >>>> is fixed given those option, the properties of the chip and the
> >> >> >>>> format.
> >> >> >>>>
> >> >> >>>>
> >> >> >>>>> It might be that all the things you need to know from the memory config
> >> >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
> >> >> >>>>> kinda the angle that we looked at this for everyone else.
> >> >> >>>>>
> >> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
> >> >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
> >> >> >>>>> becomes part of the implied stuff in the modifier.
> >> >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
> >> >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
> >> >> >>>> current configuration for all GFX6-GFX8 GPU:
> >> >> >>>>
> >> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for
> >> >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
> >> >> >>>>
> >> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> >> >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> >> >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> >> >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> >> >> >>>> ADDR_SURF_P8_32x32_8x16
> >> >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> >> >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> >> >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> >> >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> >> >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> >> >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> >> >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> >> >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> >> >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
> >> >> >>>>
> >> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> >> >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> >> >> >>>> compatible for 8-bpp.
> >> >> >>>>
> >> >> >>>> So we can't just assume that if the first plane properties match that
> >> >> >>>> they do so for the second plane, because we don't know what GPU it is
> >> >> >>>> coming from.
> >> >> >>>>
> >> >> >>>> And we can make a canonical table from this but then if we change the
> >> >> >>>> above tables in the kernel that runs into compatibility issues.
> >> >> >>>
> >> >> >>> I think that's roughly what nvidia ended up doing (not sure they
> >> >> >>> published anything, haven't seen any patches on dri-devel). Two parts:
> >> >> >>> - Bunch of flags/bits for the major mode, like {display, non-display,
> >> >> >>> DCC} and so on.
> >> >> >>> - For each of the major modes a list of enumerated modes actually used
> >> >> >>> in reality. For DCC this would be the generation, for more plain
> >> >> >>> formats it would be the above table. Table entries would be per-bpp
> >> >> >>> (in case that matters, or whatever matters for amd gpus).
> >> >> >>>
> >> >> >>> Imo would be totally ok to put the relevant lookup tables into
> >> >> >>> drm_fourcc.h directly. Or any other suitable place.
> >> >> >> Right, the issue would be that the tables are configured int he kernel
> >> >> >>
> >> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
> >> >> >>
> >> >> >> and having a shared hardcoded table means that if the kernel tables
> >> >> >> change we break userspace. That would be a new commitment on the side
> >> >> >> of the kernel driver.
> >> >> >>
> >> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be
> >> >> >> totally happy to switch it over)
> >> >> > Yeah it'd make that table uapi
> >> >>
> >> >> That won't work. Those lists are not necessary stable.
> >> >>
> >> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
> >> >> plus a 8bit version number.
> >> >
> >> > But if we use PCI-ID, then we cannot really share between different
> >> > models of GPU ...
> >> >
> >> > So we have seen 3 options on design level:
> >> > 1) Use PCI-ID/chip.
> >> >   Advantage:
> >> >     -simple
> >> >     - Few bits
> >> >   Disadvantage:
> >> >     - No sharing between different GPU models
> >> > 2) Use a fixed lookup/canonicalization table.
> >> >   Advantage:
> >> >     -Few bits
> >> >     - Sharing between models
> >> >   Disadvantage:
> >> >     - Cannot change the kernel tiling tables.
> >>
> >> Why is changing the kernel table impossible? Just because it's uapi
> >> doesn't mean we can never extend it. Just add a new entry at the
> >> bottom if you need more combinations ...
> >>
> >> You don't even need to require the updated kernel amdgpu.ko, just the
> >> updated drm_fourcc.h header (if it's one of these things you can
> >> configure through cs packets entirely from userspace).
> >
> > So we have two tables we are talking about:
> >  a) The kernel initialized system wide table of tiling configurations
> > that already exists
> >  b)The uapi table
> >
> > Currently mesa selects by index, i.e. if we want a 32-bit surface with
> > macro-tiling that is not displayable we should select index 14 in the
> > system wide tiling table. Now if we change that table entry 14 that
> > means that we change which things it is compatible with, so we need a
> > new entry in the uapi table. However, an old mesa does not know about
> > the new entry in the uapi table, so loses the ability to select a
> > modifier for the preferred tiling mode, e.g. we pretty much regress
> > old userspace.
>
> That sounds like this table already _is_ uapi. So you can't change it
> ... why would you want to change it then?
You can change it somewhat. An entry still has to be valid for the use
case mesa selects it for, but the actual values are still tunable as
long as you have that constraint in mind.

Also there effectively is an ioctl to get that table from the kernel
and userspace already uses it.

>
> Note also that the fourcc table doesn't really have to match this
> by-index tiling mode uapi table, you can freely add more at the end or
> have any kind of fancy lookup table (one for everyone, or
> per-generation, or per-whatever, doesn't matter). Just a tradeoff
> between compressing the set of actually used layouts into as few
> modifier bits as possible against code maintenance headaches of having
> to deal with too many lookup tables for all the different things.
> -Daniel
>
> >
> >>
> >> Or why does this not work?
> >> -Daniel
> >>
> >> > 3) Encode tiling mode properties in modifier.
> >> >   Advantage:
> >> >     - Can share between models.
> >> >     - Can change the kernel tiling tables.
> >> >   Disadvantage:
> >> >     - Use a lot of bits.
> >> >
> >> > Overall the number of exposed modifiers should be pretty much the
> >> > same, so encoding size that we have to deal with all possible bit
> >> > patterns.
> >> >
> >> > I've implemented option 3 in the patch here because I think it is the
> >> > best tradeoff. As far as I can see it fits, and especially if we do no
> >> > multisample support and remove the tile split bytes, and remove the
> >> > 3rd set of plane macro-tile options because no format has 3 different
> >> > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
> >> > for future extensions and a version number. It is not great but pretty
> >> > doable?
> >> >
> >> >>
> >> >> Shouldn't be more than 32 bits in total.
> >> >>
> >> >> Regards,
> >> >> Christian.
> >> >>
> >> >> >   - well the compressed from with
> >> >> > duplicates removed, if you really want to make it tiny. But you can
> >> >> > just add new entries if the heuristics for a given platform changes,
> >> >> > as long as you keep the old entries working (if you shipped them ever
> >> >> > ofc, otherwise totally ok to change as you see fit).
> >> >> > -Daniel
> >> >> >
> >> >> >>>  From a quick look about 16 bits total. Gives you 40bits of second
> >> >> >>> chances and "oops totally sorry".
> >> >> >> I purposefully left 8 bits as a version field for those oopses.
> >> >> >>
> >> >> >>> -Daniel
> >> >> >>>
> >> >> >>>>> -Daniel
> >> >> >>>>>
> >> >> >>>>>>> -Daniel
> >> >> >>>>>>>
> >> >> >>>>>>>> Regards,
> >> >> >>>>>>>> Christian.
> >> >> >>>>>>>>
> >> >> >>>>>>>>> If you do use 256 bits in order to represent
> >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> >> >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
> >> >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
> >> >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
> >> >> >>>>>>>>> with some kind of eBPF script.
> >> >> >>>>>>>>>
> >> >> >>>>>>>>> Cheers,
> >> >> >>>>>>>>> Daniel
> >> >> >>>>>>>>> _______________________________________________
> >> >> >>>>>>>>> dri-devel mailing list
> >> >> >>>>>>>>> dri-devel@lists.freedesktop.org
> >> >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >> >> >>>>>>> --
> >> >> >>>>>>> Daniel Vetter
> >> >> >>>>>>> Software Engineer, Intel Corporation
> >> >> >>>>>>> http://blog.ffwll.ch
> >> >> >>>>>>> _______________________________________________
> >> >> >>>>>>> dri-devel mailing list
> >> >> >>>>>>> dri-devel@lists.freedesktop.org
> >> >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >> >> >>>>> --
> >> >> >>>>> Daniel Vetter
> >> >> >>>>> Software Engineer, Intel Corporation
> >> >> >>>>> http://blog.ffwll.ch
> >> >> >>>
> >> >> >>>
> >> >> >>> --
> >> >> >>> Daniel Vetter
> >> >> >>> Software Engineer, Intel Corporation
> >> >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >> >> >
> >> >> >
> >> >>
> >>
> >>
> >>
> >> --
> >> Daniel Vetter
> >> Software Engineer, Intel Corporation
> >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Daniel Vetter Sept. 4, 2018, 9:39 p.m. UTC | #28
On Tue, Sep 04, 2018 at 09:36:01PM +0200, Bas Nieuwenhuizen wrote:
> On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> >
> > On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen
> > <bas@basnieuwenhuizen.nl> wrote:
> > > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >>
> > >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen
> > >> <bas@basnieuwenhuizen.nl> wrote:
> > >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König
> > >> > <ckoenig.leichtzumerken@gmail.com> wrote:
> > >> >>
> > >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter:
> > >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen
> > >> >> > <bas@basnieuwenhuizen.nl> wrote:
> > >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen
> > >> >> >>> <bas@basnieuwenhuizen.nl> wrote:
> > >> >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >> >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote:
> > >> >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote:
> > >> >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote:
> > >> >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone:
> > >> >> >>>>>>>>> Hi,
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote:
> > >> >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote:
> > >> >> >>>>>>>>>>> +/* The chip this is compatible with.
> > >> >> >>>>>>>>>>> + *
> > >> >> >>>>>>>>>>> + * If compression is disabled, use
> > >> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
> > >> >> >>>>>>>>>>> + *   - AMDGPU_CHIP_VEGA10 for GFX9+
> > >> >> >>>>>>>>>>> + *
> > >> >> >>>>>>>>>>> + * With compression enabled please use the exact chip.
> > >> >> >>>>>>>>>>> + *
> > >> >> >>>>>>>>>>> + * TODO: Do some generations share DCC format?
> > >> >> >>>>>>>>>>> + */
> > >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT                 40
> > >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK                  0xff
> > >> >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling
> > >> >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they
> > >> >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every
> > >> >> >>>>>>>>>> possible combination. Do you really plan to share all these different
> > >> >> >>>>>>>>>> things?
> > >> >> >>>>>>>>>>
> > >> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon
> > >> >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not
> > >> >> >>>>>>>>>> using explicit modifier flags for everything.
> > >> >> >>>>>>>>> Right. The conclusion, after people went through and started sorting
> > >> >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real
> > >> >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than
> > >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > >> >> >>>>>>>>> possible formats to represent, very likely fewer than
> > >> >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably
> > >> >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally
> > >> >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8
> > >> >> >>>>>>>>> bits of the 56 available.
> > >> >> >>>>>>>> The problem here is that at least for some parameters we actually don't know
> > >> >> >>>>>>>> which formats are actually used.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> The following are not real world examples, but just to explain the general
> > >> >> >>>>>>>> problem.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather
> > >> >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a
> > >> >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one
> > >> >> >>>>>>>> channel, but it is still perfectly possible.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them
> > >> >> >>>>>>>> could be bad and we need to know which to actually use.
> > >> >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers?
> > >> >> >>>>>>> All the information you only need locally to a given amdgpu instance
> > >> >> >>>>>>> don't really need to be encoded in modifiers.
> > >> >> >>>>>>>
> > >> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be
> > >> >> >>>>>>> good starters I guess) would be really good here.
> > >> >> >>>>>> I extracted the information on which bits are relevant mostly from the
> > >> >> >>>>>> AddrFromCoord functions in addrlib in mesa:
> > >> >> >>>>>>
> > >> >> >>>>>> for macro-tiles:
> > >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587
> > >> >> >>>>>>
> > >> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles):
> > >> >> >>>>>>
> > >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016
> > >> >> >>>>> So this is the decoding thing. How many of these actually exist, even when
> > >> >> >>>>> taking all the other information into account?
> > >> >> >>>>>
> > >> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride
> > >> >> >>>>> + height + width, how much of all these bits do you actually still freely
> > >> >> >>>>> pick?
> > >> >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse,
> > >> >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display,
> > >> >> >>>> depth, display-rotated) + whether to use compression, everything else
> > >> >> >>>> is fixed given those option, the properties of the chip and the
> > >> >> >>>> format.
> > >> >> >>>>
> > >> >> >>>>
> > >> >> >>>>> It might be that all the things you need to know from the memory config
> > >> >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's
> > >> >> >>>>> kinda the angle that we looked at this for everyone else.
> > >> >> >>>>>
> > >> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the
> > >> >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just
> > >> >> >>>>> becomes part of the implied stuff in the modifier.
> > >> >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not
> > >> >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the
> > >> >> >>>> current configuration for all GFX6-GFX8 GPU:
> > >> >> >>>>
> > >> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for
> > >> >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG
> > >> >> >>>>
> > >> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> > >> >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16
> > >> >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2
> > >> >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16)
> > >> >> >>>> ADDR_SURF_P8_32x32_8x16
> > >> >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16
> > >> >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16
> > >> >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8),  (1, 2, 2, 8) ADDR_SURF_P2
> > >> >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2
> > >> >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16
> > >> >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> > >> >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16
> > >> >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16
> > >> >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2
> > >> >> >>>>
> > >> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp
> > >> >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only
> > >> >> >>>> compatible for 8-bpp.
> > >> >> >>>>
> > >> >> >>>> So we can't just assume that if the first plane properties match that
> > >> >> >>>> they do so for the second plane, because we don't know what GPU it is
> > >> >> >>>> coming from.
> > >> >> >>>>
> > >> >> >>>> And we can make a canonical table from this but then if we change the
> > >> >> >>>> above tables in the kernel that runs into compatibility issues.
> > >> >> >>>
> > >> >> >>> I think that's roughly what nvidia ended up doing (not sure they
> > >> >> >>> published anything, haven't seen any patches on dri-devel). Two parts:
> > >> >> >>> - Bunch of flags/bits for the major mode, like {display, non-display,
> > >> >> >>> DCC} and so on.
> > >> >> >>> - For each of the major modes a list of enumerated modes actually used
> > >> >> >>> in reality. For DCC this would be the generation, for more plain
> > >> >> >>> formats it would be the above table. Table entries would be per-bpp
> > >> >> >>> (in case that matters, or whatever matters for amd gpus).
> > >> >> >>>
> > >> >> >>> Imo would be totally ok to put the relevant lookup tables into
> > >> >> >>> drm_fourcc.h directly. Or any other suitable place.
> > >> >> >> Right, the issue would be that the tables are configured int he kernel
> > >> >> >>
> > >> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417
> > >> >> >>
> > >> >> >> and having a shared hardcoded table means that if the kernel tables
> > >> >> >> change we break userspace. That would be a new commitment on the side
> > >> >> >> of the kernel driver.
> > >> >> >>
> > >> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be
> > >> >> >> totally happy to switch it over)
> > >> >> > Yeah it'd make that table uapi
> > >> >>
> > >> >> That won't work. Those lists are not necessary stable.
> > >> >>
> > >> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index
> > >> >> plus a 8bit version number.
> > >> >
> > >> > But if we use PCI-ID, then we cannot really share between different
> > >> > models of GPU ...
> > >> >
> > >> > So we have seen 3 options on design level:
> > >> > 1) Use PCI-ID/chip.
> > >> >   Advantage:
> > >> >     -simple
> > >> >     - Few bits
> > >> >   Disadvantage:
> > >> >     - No sharing between different GPU models
> > >> > 2) Use a fixed lookup/canonicalization table.
> > >> >   Advantage:
> > >> >     -Few bits
> > >> >     - Sharing between models
> > >> >   Disadvantage:
> > >> >     - Cannot change the kernel tiling tables.
> > >>
> > >> Why is changing the kernel table impossible? Just because it's uapi
> > >> doesn't mean we can never extend it. Just add a new entry at the
> > >> bottom if you need more combinations ...
> > >>
> > >> You don't even need to require the updated kernel amdgpu.ko, just the
> > >> updated drm_fourcc.h header (if it's one of these things you can
> > >> configure through cs packets entirely from userspace).
> > >
> > > So we have two tables we are talking about:
> > >  a) The kernel initialized system wide table of tiling configurations
> > > that already exists
> > >  b)The uapi table
> > >
> > > Currently mesa selects by index, i.e. if we want a 32-bit surface with
> > > macro-tiling that is not displayable we should select index 14 in the
> > > system wide tiling table. Now if we change that table entry 14 that
> > > means that we change which things it is compatible with, so we need a
> > > new entry in the uapi table. However, an old mesa does not know about
> > > the new entry in the uapi table, so loses the ability to select a
> > > modifier for the preferred tiling mode, e.g. we pretty much regress
> > > old userspace.
> >
> > That sounds like this table already _is_ uapi. So you can't change it
> > ... why would you want to change it then?
> You can change it somewhat. An entry still has to be valid for the use
> case mesa selects it for, but the actual values are still tunable as
> long as you have that constraint in mind.
> 
> Also there effectively is an ioctl to get that table from the kernel
> and userspace already uses it.

Hm, if that ioctl already exists, could you extend it to include the
corresponding fourcc code? Then things would Just Work. Assuming no one
else in the amdgpu ecosystem can allocate modifier combinations because
they must go through this table.
-Daniel

> 
> >
> > Note also that the fourcc table doesn't really have to match this
> > by-index tiling mode uapi table, you can freely add more at the end or
> > have any kind of fancy lookup table (one for everyone, or
> > per-generation, or per-whatever, doesn't matter). Just a tradeoff
> > between compressing the set of actually used layouts into as few
> > modifier bits as possible against code maintenance headaches of having
> > to deal with too many lookup tables for all the different things.
> > -Daniel
> >
> > >
> > >>
> > >> Or why does this not work?
> > >> -Daniel
> > >>
> > >> > 3) Encode tiling mode properties in modifier.
> > >> >   Advantage:
> > >> >     - Can share between models.
> > >> >     - Can change the kernel tiling tables.
> > >> >   Disadvantage:
> > >> >     - Use a lot of bits.
> > >> >
> > >> > Overall the number of exposed modifiers should be pretty much the
> > >> > same, so encoding size that we have to deal with all possible bit
> > >> > patterns.
> > >> >
> > >> > I've implemented option 3 in the patch here because I think it is the
> > >> > best tradeoff. As far as I can see it fits, and especially if we do no
> > >> > multisample support and remove the tile split bytes, and remove the
> > >> > 3rd set of plane macro-tile options because no format has 3 different
> > >> > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19
> > >> > for future extensions and a version number. It is not great but pretty
> > >> > doable?
> > >> >
> > >> >>
> > >> >> Shouldn't be more than 32 bits in total.
> > >> >>
> > >> >> Regards,
> > >> >> Christian.
> > >> >>
> > >> >> >   - well the compressed from with
> > >> >> > duplicates removed, if you really want to make it tiny. But you can
> > >> >> > just add new entries if the heuristics for a given platform changes,
> > >> >> > as long as you keep the old entries working (if you shipped them ever
> > >> >> > ofc, otherwise totally ok to change as you see fit).
> > >> >> > -Daniel
> > >> >> >
> > >> >> >>>  From a quick look about 16 bits total. Gives you 40bits of second
> > >> >> >>> chances and "oops totally sorry".
> > >> >> >> I purposefully left 8 bits as a version field for those oopses.
> > >> >> >>
> > >> >> >>> -Daniel
> > >> >> >>>
> > >> >> >>>>> -Daniel
> > >> >> >>>>>
> > >> >> >>>>>>> -Daniel
> > >> >> >>>>>>>
> > >> >> >>>>>>>> Regards,
> > >> >> >>>>>>>> Christian.
> > >> >> >>>>>>>>
> > >> >> >>>>>>>>> If you do use 256 bits in order to represent
> > >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936
> > >> >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly
> > >> >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers.
> > >> >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens
> > >> >> >>>>>>>>> with some kind of eBPF script.
> > >> >> >>>>>>>>>
> > >> >> >>>>>>>>> Cheers,
> > >> >> >>>>>>>>> Daniel
> > >> >> >>>>>>>>> _______________________________________________
> > >> >> >>>>>>>>> dri-devel mailing list
> > >> >> >>>>>>>>> dri-devel@lists.freedesktop.org
> > >> >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >> >> >>>>>>> --
> > >> >> >>>>>>> Daniel Vetter
> > >> >> >>>>>>> Software Engineer, Intel Corporation
> > >> >> >>>>>>> http://blog.ffwll.ch
> > >> >> >>>>>>> _______________________________________________
> > >> >> >>>>>>> dri-devel mailing list
> > >> >> >>>>>>> dri-devel@lists.freedesktop.org
> > >> >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> > >> >> >>>>> --
> > >> >> >>>>> Daniel Vetter
> > >> >> >>>>> Software Engineer, Intel Corporation
> > >> >> >>>>> http://blog.ffwll.ch
> > >> >> >>>
> > >> >> >>>
> > >> >> >>> --
> > >> >> >>> Daniel Vetter
> > >> >> >>> Software Engineer, Intel Corporation
> > >> >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> > >> >> >
> > >> >> >
> > >> >>
> > >>
> > >>
> > >>
> > >> --
> > >> Daniel Vetter
> > >> Software Engineer, Intel Corporation
> > >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
> >
> >
> >
> > --
> > Daniel Vetter
> > Software Engineer, Intel Corporation
> > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Marek Olšák Sept. 7, 2018, 4:50 a.m. UTC | #29
Hopefully this answers some questions.

Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all
chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip.

Some 32bpp 1D tiling layouts are compatible across all chips (1D
display tiling is the same as SW_256B_D if Bpp == 4).

On GFX9, swizzle modes <= 11 are the same on all GFX9 chips. The
remaining modes depend on GB_ADDR_CONFIG and are also more efficient.
Bpp, number of samples, and resource type (2D/3D) affect the layout
too, e.g. 3D textures silently use thick tiling on GFX9.

Harvesting doesn't affect tiling layouts.

The layout changes between layers/slices a little. Always use the base
address of the whole image when programming the hardware. Don't assume
that the 2nd layer has the same layout.

> + * TODO: Can scanout really not support fastclear data?

It can, but only those encoded in the DCC buffer (0/1). There is no
DAL support for DCC though.


> + * TODO: Do some generations share DCC format?

DCC mirrors the tiling layout, so the same tiling mode means the same
DCC. Take the absolute pixel address, shift it to the right, and
you'll get the DCC element address.

I would generally consider DCC as non-shareable because of different
meanings of TILING_INDEX between chips except maybe for common GFX9
layouts.


> [comments about number of bits]

We could certainly represent all formats as a list of enums, but then
we would need to convert the enums to the full description in drivers.
GFX6-8 can use TILING_INDEX (except for stencil, let's ignore
stencil). The tiling tables shouldn't change anymore because they are
optimized for the hardware, and later hw doesn't have any tables.

Marek
Bas Nieuwenhuizen Sept. 7, 2018, 9:55 a.m. UTC | #30
On Fri, Sep 7, 2018 at 6:51 AM Marek Olšák <maraeo@gmail.com> wrote:
>
> Hopefully this answers some questions.
>
> Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all
> chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip.

For GFX6-GFX8:
From GB_ADDR_CONFIG addrlib only uses the pipe interleave bytes which
are 0 (=256 bytes) for all AMDGPU HW (and on GFX9 addrlib even asserts
on that).  From MC_ARB_RAMCFG addrlib reads the number of banks and
ranks, calculates the number of logical banks from it, but then does
not use it. (Presumably because it is the same number as the number of
banks in the tiling table entry?) Some bits gets used by the kernel
(memory row size), but those get encoded in the tile split of the
tiling table, i.e. we do not need the separate bits.

for GFX9, only the DCC meta surface seems to depend on GB_ADDR_CONFIG
(except the aforementioned pipe interleave bytes) which are constant.

I'll do some checks to see if we can trust addrlib here.

>
> Some 32bpp 1D tiling layouts are compatible across all chips (1D
> display tiling is the same as SW_256B_D if Bpp == 4).

Cool hadn't seen that they were equivalent in some situations.

>
> On GFX9, swizzle modes <= 11 are the same on all GFX9 chips. The
> remaining modes depend on GB_ADDR_CONFIG and are also more efficient.
> Bpp, number of samples, and resource type (2D/3D) affect the layout
> too, e.g. 3D textures silently use thick tiling on GFX9.
>
> Harvesting doesn't affect tiling layouts.
>
> The layout changes between layers/slices a little. Always use the base
> address of the whole image when programming the hardware. Don't assume
> that the 2nd layer has the same layout.
>
> > + * TODO: Can scanout really not support fastclear data?
>
> It can, but only those encoded in the DCC buffer (0/1). There is no
> DAL support for DCC though.
>
>
> > + * TODO: Do some generations share DCC format?
>
> DCC mirrors the tiling layout, so the same tiling mode means the same
> DCC. Take the absolute pixel address, shift it to the right, and
> you'll get the DCC element address.
>
> I would generally consider DCC as non-shareable because of different
> meanings of TILING_INDEX between chips except maybe for common GFX9
> layouts.

I was also thinking about the compression changes on Polaris that
improved compression. I'm not sure if that added e.g. new encodings in
how we store the data of a DCC tile in the main data surface.
>
>
> > [comments about number of bits]
>
> We could certainly represent all formats as a list of enums, but then
> we would need to convert the enums to the full description in drivers.
> GFX6-8 can use TILING_INDEX (except for stencil, let's ignore
> stencil). The tiling tables shouldn't change anymore because they are
> optimized for the hardware, and later hw doesn't have any tables.
>
> Marek
Marek Olšák Sept. 7, 2018, 10:33 p.m. UTC | #31
On Fri, Sep 7, 2018 at 5:55 AM, Bas Nieuwenhuizen
<bas@basnieuwenhuizen.nl> wrote:
> On Fri, Sep 7, 2018 at 6:51 AM Marek Olšák <maraeo@gmail.com> wrote:
>>
>> Hopefully this answers some questions.
>>
>> Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all
>> chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip.
>
> For GFX6-GFX8:
> From GB_ADDR_CONFIG addrlib only uses the pipe interleave bytes which
> are 0 (=256 bytes) for all AMDGPU HW (and on GFX9 addrlib even asserts
> on that).  From MC_ARB_RAMCFG addrlib reads the number of banks and
> ranks, calculates the number of logical banks from it, but then does
> not use it. (Presumably because it is the same number as the number of
> banks in the tiling table entry?) Some bits gets used by the kernel
> (memory row size), but those get encoded in the tile split of the
> tiling table, i.e. we do not need the separate bits.
>
> for GFX9, only the DCC meta surface seems to depend on GB_ADDR_CONFIG
> (except the aforementioned pipe interleave bytes) which are constant.

On GFX9, addrlib in Mesa uses most fields from GB_ADDR_CONFIG.
GB_ADDR_CONFIG defines the tiling formats.

On older chips, addrlib reads some fields from GB_ADDR_CONFIG and uses
the chip identification for others like the number of pipes, even
though GB_ADDR_CONFIG has the information too.

Marek
diff mbox series

Patch

diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h
index 94444eeba55b..4e1452161dbf 100644
--- a/include/uapi/drm/amdgpu_drm.h
+++ b/include/uapi/drm/amdgpu_drm.h
@@ -990,6 +990,136 @@  struct drm_amdgpu_info_vce_clock_table {
 #define AMDGPU_FAMILY_AI			141 /* Vega10 */
 #define AMDGPU_FAMILY_RV			142 /* Raven */
 
+#define AMDGPU_CHIP_TAHITI	0
+#define AMDGPU_CHIP_PITCAIRN	1
+#define AMDGPU_CHIP_VERDE	2
+#define AMDGPU_CHIP_OLAND	3
+#define AMDGPU_CHIP_HAINAN	4
+#define AMDGPU_CHIP_BONAIRE	5
+#define AMDGPU_CHIP_KAVERI	6
+#define AMDGPU_CHIP_KABINI	7
+#define AMDGPU_CHIP_HAWAII	8
+#define AMDGPU_CHIP_MULLINS	9
+#define AMDGPU_CHIP_TOPAZ	10
+#define AMDGPU_CHIP_TONGA	11
+#define AMDGPU_CHIP_FIJI	12
+#define AMDGPU_CHIP_CARRIZO	13
+#define AMDGPU_CHIP_STONEY	14
+#define AMDGPU_CHIP_POLARIS10	15
+#define AMDGPU_CHIP_POLARIS11	16
+#define AMDGPU_CHIP_POLARIS12	17
+#define AMDGPU_CHIP_VEGAM	18
+#define AMDGPU_CHIP_VEGA10	19
+#define AMDGPU_CHIP_VEGA12	20
+#define AMDGPU_CHIP_VEGA20	21
+#define AMDGPU_CHIP_RAVEN	22
+
+/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same
+ * as AMDGPU_TILING_*. However, the the rules as to when to set them are
+ * different.
+ *
+ * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR
+ * instead.
+ *
+ * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be
+ * set.
+ *
+ * For other ARRAY_MODEs:
+ *  - Only set TILE_SPLIT if the image is multisample.
+ *
+ * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1
+ * different value there. The values are
+ *   - depth           : 0
+ *   - displayable     : 1
+ *   - thin            : 2
+ *   - thick (GFX6)    : 3
+ *   - rotated (GFX7+) : 4
+ *
+ * TODO: What to do with multisample multi plane images? More tile split
+ * fields don't fit if we want to keep a few bits for a format version.
+ */
+#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT		0
+#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK		0xf
+#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT		4
+#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK		0x1f
+#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT		9
+#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK		0x7
+#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT	12
+#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK	0x7
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT		15
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK		0x3
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT		17
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK		0x3
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT	19
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK	0x3
+#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT		21
+#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK		0x3
+
+/* Macrotile parameters for a second plane if existing */
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT		23
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK		0x3
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT	25
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK		0x3
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT	27
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK	0x3
+#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT		29
+#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK		0x3
+
+/* Macrotile parameters for a third plane if existing */
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT		31
+#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK		0x3
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT	33
+#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK		0x3
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT	35
+#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK	0x3
+#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT		37
+#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK		0x3
+
+#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT		0
+#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK		0x1f
+
+/* Whether to enable DCC compression.
+ *
+ * If enabled, exporting the surface results in three
+ * planes:
+ *   - color data
+ *   - DCC data
+ *   - a 64-byte block with
+ *     - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed.
+ *     - a 16-byte 0/1 bool as to whether the surface has fastclear data
+ *     - a 8-byte chunk with the current fastclear colors
+ *
+ * To ensure we do not keep compressing and decompressing the surface, once it
+ * has been decompressed no party may recompress again.
+ *
+ * Applications should not hand over images with fastclear data as not
+ * all users can support it, however, to help both Vulkan implementations
+ * with the allocation we keep it in the 64-byte block.
+ *
+ * TODO: Can scanout really not support fastclear data?
+ * TODO: What to do with multiplane images?
+ */
+#define AMDGPU_MODIFIER_COMPRESSION_SHIFT		39
+#define AMDGPU_MODIFIER_COMPRESSION_MASK		0x1
+
+/* The chip this is compatible with.
+ *
+ * If compression is disabled, use
+ *   - AMDGPU_CHIP_TAHITI for GFX6-GFX8
+ *   - AMDGPU_CHIP_VEGA10 for GFX9+
+ *
+ * With compression enabled please use the exact chip.
+ *
+ * TODO: Do some generations share DCC format?
+ */
+#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT			40
+#define AMDGPU_MODIFIER_CHIP_GEN_MASK			0xff
+
+#define AMDGPU_MODIFIER_SET(field, value) \
+	(((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT)
+#define AMDGPU_MODIFIER_GET(value, field) \
+	(((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK)
+
 /*
  * Definition of free sync enter and exit signals
  * We may have more options in the future