Message ID | 20180904010033.67611-1-bas@basnieuwenhuizen.nl (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] drm/amdgpu: Add macros and documentation for format modifiers. | expand |
Am 04.09.2018 um 03:00 schrieb Bas Nieuwenhuizen: > This is an initial proposal for format modifiers for AMD hardware. > > It uses 48 bits including a chip generation, leaving 8 bits for > a format version number. I'm absolutely not an expert on this, but as far as I know the major problem with this approach was that we couldn't fit all necessary parameter into a 64bit identifier. The last thing I've heard was something like we need at least 128bit, better 256. But Marek and Alex certainly know more about that stuff. Christian. > > On gfx6-gfx8 we have all the fields influencing sample locations > in memory. > > Tile split bytes are optional for single sample buffers as no > hardware reaches the split size with 1 sample and hence the actual > size does not matter. > > The macrotile fields are duplicated for images with multiple planes. > If the planes have different bitdepth they need different macro > tile fields and different tile split bytes if multisample. > > I could not fit multiple copies in for tile split bytes, but > multisample & multiplane images are very rare. Overall, I think > we should punt on multisample for a later format version since > they are generally not shared on any modifier aware windowing > system, and we have more issues like fmask & cmask. > > We need these copies because the drm modifier of all planes in an > image needs to be equal, so we need to fit these together. > > This adds fields for compression support, using metadata that is > compatible with AMDVLK and for which radv and radeonsi can > reasonably be extended. > > The big open question for compression is between which generations > the format changed to see if we can share more. > > This explicitly does not try to solve the linear stride alignment > issue, thoguh we could internally just use the tiling modes for > the linear modes to indicate linear images with the stride for the > given chip. > > Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > CC: Chad Versace <chadversary@chromium.org> > CC: Dave Airlie <airlied@redhat.com> > CC: Marek Olšák <marek.olsak@amd.com> > CC: Nicolai Hähnle <nicolai.haehnle@amd.com> > CC: Alex Deucher <alexander.deucher@amd.com> > CC: Daniel Vetter <daniel.vetter@ffwll.ch> > --- > include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++ > 1 file changed, 130 insertions(+) > > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > index 94444eeba55b..4e1452161dbf 100644 > --- a/include/uapi/drm/amdgpu_drm.h > +++ b/include/uapi/drm/amdgpu_drm.h > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { > #define AMDGPU_FAMILY_AI 141 /* Vega10 */ > #define AMDGPU_FAMILY_RV 142 /* Raven */ > > +#define AMDGPU_CHIP_TAHITI 0 > +#define AMDGPU_CHIP_PITCAIRN 1 > +#define AMDGPU_CHIP_VERDE 2 > +#define AMDGPU_CHIP_OLAND 3 > +#define AMDGPU_CHIP_HAINAN 4 > +#define AMDGPU_CHIP_BONAIRE 5 > +#define AMDGPU_CHIP_KAVERI 6 > +#define AMDGPU_CHIP_KABINI 7 > +#define AMDGPU_CHIP_HAWAII 8 > +#define AMDGPU_CHIP_MULLINS 9 > +#define AMDGPU_CHIP_TOPAZ 10 > +#define AMDGPU_CHIP_TONGA 11 > +#define AMDGPU_CHIP_FIJI 12 > +#define AMDGPU_CHIP_CARRIZO 13 > +#define AMDGPU_CHIP_STONEY 14 > +#define AMDGPU_CHIP_POLARIS10 15 > +#define AMDGPU_CHIP_POLARIS11 16 > +#define AMDGPU_CHIP_POLARIS12 17 > +#define AMDGPU_CHIP_VEGAM 18 > +#define AMDGPU_CHIP_VEGA10 19 > +#define AMDGPU_CHIP_VEGA12 20 > +#define AMDGPU_CHIP_VEGA20 21 > +#define AMDGPU_CHIP_RAVEN 22 > + > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are > + * different. > + * > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR > + * instead. > + * > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be > + * set. > + * > + * For other ARRAY_MODEs: > + * - Only set TILE_SPLIT if the image is multisample. > + * > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 > + * different value there. The values are > + * - depth : 0 > + * - displayable : 1 > + * - thin : 2 > + * - thick (GFX6) : 3 > + * - rotated (GFX7+) : 4 > + * > + * TODO: What to do with multisample multi plane images? More tile split > + * fields don't fit if we want to keep a few bits for a format version. > + */ > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT 19 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT 21 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK 0x3 > + > +/* Macrotile parameters for a second plane if existing */ > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT 23 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT 25 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT 29 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK 0x3 > + > +/* Macrotile parameters for a third plane if existing */ > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT 31 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT 33 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT 37 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK 0x3 > + > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT 0 > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK 0x1f > + > +/* Whether to enable DCC compression. > + * > + * If enabled, exporting the surface results in three > + * planes: > + * - color data > + * - DCC data > + * - a 64-byte block with > + * - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed. > + * - a 16-byte 0/1 bool as to whether the surface has fastclear data > + * - a 8-byte chunk with the current fastclear colors > + * > + * To ensure we do not keep compressing and decompressing the surface, once it > + * has been decompressed no party may recompress again. > + * > + * Applications should not hand over images with fastclear data as not > + * all users can support it, however, to help both Vulkan implementations > + * with the allocation we keep it in the 64-byte block. > + * > + * TODO: Can scanout really not support fastclear data? > + * TODO: What to do with multiplane images? > + */ > +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT 39 > +#define AMDGPU_MODIFIER_COMPRESSION_MASK 0x1 > + > +/* The chip this is compatible with. > + * > + * If compression is disabled, use > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > + * > + * With compression enabled please use the exact chip. > + * > + * TODO: Do some generations share DCC format? > + */ > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > + > +#define AMDGPU_MODIFIER_SET(field, value) \ > + (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT) > +#define AMDGPU_MODIFIER_GET(value, field) \ > + (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK) > + > /* > * Definition of free sync enter and exit signals > * We may have more options in the future
On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > This is an initial proposal for format modifiers for AMD hardware. > > It uses 48 bits including a chip generation, leaving 8 bits for > a format version number. > > On gfx6-gfx8 we have all the fields influencing sample locations > in memory. > > Tile split bytes are optional for single sample buffers as no > hardware reaches the split size with 1 sample and hence the actual > size does not matter. > > The macrotile fields are duplicated for images with multiple planes. > If the planes have different bitdepth they need different macro > tile fields and different tile split bytes if multisample. > > I could not fit multiple copies in for tile split bytes, but > multisample & multiplane images are very rare. Overall, I think > we should punt on multisample for a later format version since > they are generally not shared on any modifier aware windowing > system, and we have more issues like fmask & cmask. > > We need these copies because the drm modifier of all planes in an > image needs to be equal, so we need to fit these together. > > This adds fields for compression support, using metadata that is > compatible with AMDVLK and for which radv and radeonsi can > reasonably be extended. > > The big open question for compression is between which generations > the format changed to see if we can share more. > > This explicitly does not try to solve the linear stride alignment > issue, thoguh we could internally just use the tiling modes for > the linear modes to indicate linear images with the stride for the > given chip. > > Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > CC: Chad Versace <chadversary@chromium.org> > CC: Dave Airlie <airlied@redhat.com> > CC: Marek Olšák <marek.olsak@amd.com> > CC: Nicolai Hähnle <nicolai.haehnle@amd.com> > CC: Alex Deucher <alexander.deucher@amd.com> > CC: Daniel Vetter <daniel.vetter@ffwll.ch> > --- > include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++ > 1 file changed, 130 insertions(+) > > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > index 94444eeba55b..4e1452161dbf 100644 > --- a/include/uapi/drm/amdgpu_drm.h > +++ b/include/uapi/drm/amdgpu_drm.h > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { > #define AMDGPU_FAMILY_AI 141 /* Vega10 */ > #define AMDGPU_FAMILY_RV 142 /* Raven */ > > +#define AMDGPU_CHIP_TAHITI 0 > +#define AMDGPU_CHIP_PITCAIRN 1 > +#define AMDGPU_CHIP_VERDE 2 > +#define AMDGPU_CHIP_OLAND 3 > +#define AMDGPU_CHIP_HAINAN 4 > +#define AMDGPU_CHIP_BONAIRE 5 > +#define AMDGPU_CHIP_KAVERI 6 > +#define AMDGPU_CHIP_KABINI 7 > +#define AMDGPU_CHIP_HAWAII 8 > +#define AMDGPU_CHIP_MULLINS 9 > +#define AMDGPU_CHIP_TOPAZ 10 > +#define AMDGPU_CHIP_TONGA 11 > +#define AMDGPU_CHIP_FIJI 12 > +#define AMDGPU_CHIP_CARRIZO 13 > +#define AMDGPU_CHIP_STONEY 14 > +#define AMDGPU_CHIP_POLARIS10 15 > +#define AMDGPU_CHIP_POLARIS11 16 > +#define AMDGPU_CHIP_POLARIS12 17 > +#define AMDGPU_CHIP_VEGAM 18 > +#define AMDGPU_CHIP_VEGA10 19 > +#define AMDGPU_CHIP_VEGA12 20 > +#define AMDGPU_CHIP_VEGA20 21 > +#define AMDGPU_CHIP_RAVEN 22 > + > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are > + * different. > + * > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR > + * instead. > + * > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be > + * set. > + * > + * For other ARRAY_MODEs: > + * - Only set TILE_SPLIT if the image is multisample. > + * > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 > + * different value there. The values are > + * - depth : 0 > + * - displayable : 1 > + * - thin : 2 > + * - thick (GFX6) : 3 > + * - rotated (GFX7+) : 4 > + * > + * TODO: What to do with multisample multi plane images? More tile split > + * fields don't fit if we want to keep a few bits for a format version. > + */ > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT 19 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT 21 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK 0x3 > + > +/* Macrotile parameters for a second plane if existing */ > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT 23 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT 25 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT 29 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK 0x3 > + > +/* Macrotile parameters for a third plane if existing */ > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT 31 > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT 33 > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35 > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK 0x3 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT 37 > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK 0x3 > + > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT 0 > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK 0x1f > + > +/* Whether to enable DCC compression. > + * > + * If enabled, exporting the surface results in three > + * planes: > + * - color data > + * - DCC data > + * - a 64-byte block with > + * - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed. > + * - a 16-byte 0/1 bool as to whether the surface has fastclear data > + * - a 8-byte chunk with the current fastclear colors > + * > + * To ensure we do not keep compressing and decompressing the surface, once it > + * has been decompressed no party may recompress again. > + * > + * Applications should not hand over images with fastclear data as not > + * all users can support it, however, to help both Vulkan implementations > + * with the allocation we keep it in the 64-byte block. > + * > + * TODO: Can scanout really not support fastclear data? > + * TODO: What to do with multiplane images? > + */ > +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT 39 > +#define AMDGPU_MODIFIER_COMPRESSION_MASK 0x1 Summary of the irc discussion with my 2 recommendations: - Make it clear in the spec here that all consumers must follow the 16 byte boolean to control actual DCC usage (using a conditional register write in the command buffer). On intel we've put that into the modifier itself, along the lines of "I don't like compressed anymore, pls give me something else". Not that runtime renogiation works well (or well, at all). - If you have consumers which can't handle fastclear, make it a separate modifier. > + > +/* The chip this is compatible with. > + * > + * If compression is disabled, use > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > + * > + * With compression enabled please use the exact chip. > + * > + * TODO: Do some generations share DCC format? > + */ > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff Do you really need all the combinations here of DCC + gpu gen + tiling details? When we had the entire discussion with nvidia folks they eventually agreed that they don't need the massive pile with every possible combination. Do you really plan to share all these different things? Note that e.g. on i915 we spec some of the tiling depending upon buffer size and buffer format (because that's how the hw works), not using explicit modifier flags for everything. -Daniel > + > +#define AMDGPU_MODIFIER_SET(field, value) \ > + (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT) > +#define AMDGPU_MODIFIER_GET(value, field) \ > + (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK) > + > /* > * Definition of free sync enter and exit signals > * We may have more options in the future > -- > 2.18.0 >
Hi, On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > +/* The chip this is compatible with. > > + * > > + * If compression is disabled, use > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > + * > > + * With compression enabled please use the exact chip. > > + * > > + * TODO: Do some generations share DCC format? > > + */ > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > Do you really need all the combinations here of DCC + gpu gen + tiling > details? When we had the entire discussion with nvidia folks they > eventually agreed that they don't need the massive pile with every > possible combination. Do you really plan to share all these different > things? > > Note that e.g. on i915 we spec some of the tiling depending upon > buffer size and buffer format (because that's how the hw works), not > using explicit modifier flags for everything. Right. The conclusion, after people went through and started sorting out the kinds of formats for which they would _actually_ export real colour buffers for, that most vendors definitely have fewer than 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 possible formats to represent, very likely fewer than 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably fewer than 72,057,594,037,927,936 formats, and even still generally fewer than 281,474,976,710,656 if you want to be generous and leave 8 bits of the 56 available. If you do use 256 bits in order to represent 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 modifiers per format, userspace would start hitting OOM pretty quickly as it attempted to enumerate and negotiate acceptable modifiers. Either that or we need to replace the fixed 64-bit modifier tokens with some kind of eBPF script. Cheers, Daniel
Am 04.09.2018 um 12:15 schrieb Daniel Stone: > Hi, > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >>> +/* The chip this is compatible with. >>> + * >>> + * If compression is disabled, use >>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >>> + * >>> + * With compression enabled please use the exact chip. >>> + * >>> + * TODO: Do some generations share DCC format? >>> + */ >>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> Do you really need all the combinations here of DCC + gpu gen + tiling >> details? When we had the entire discussion with nvidia folks they >> eventually agreed that they don't need the massive pile with every >> possible combination. Do you really plan to share all these different >> things? >> >> Note that e.g. on i915 we spec some of the tiling depending upon >> buffer size and buffer format (because that's how the hw works), not >> using explicit modifier flags for everything. > Right. The conclusion, after people went through and started sorting > out the kinds of formats for which they would _actually_ export real > colour buffers for, that most vendors definitely have fewer than > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > possible formats to represent, very likely fewer than > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > fewer than 72,057,594,037,927,936 formats, and even still generally > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > bits of the 56 available. The problem here is that at least for some parameters we actually don't know which formats are actually used. The following are not real world examples, but just to explain the general problem. The memory configuration for example can be not ASIC specific, but rather determined by whoever took the ASIC and glued it together with VRAM on a board. It is not likely that somebody puts all the VRAM chips on one channel, but it is still perfectly possible. Same is true for things like harvesting, e.g. of 16 channels halve of them could be bad and we need to know which to actually use. Regards, Christian. > > If you do use 256 bits in order to represent > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > modifiers per format, userspace would start hitting OOM pretty quickly > as it attempted to enumerate and negotiate acceptable modifiers. > Either that or we need to replace the fixed 64-bit modifier tokens > with some kind of eBPF script. > > Cheers, > Daniel > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Sep 4, 2018 at 12:04 PM Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen > <bas@basnieuwenhuizen.nl> wrote: > > This is an initial proposal for format modifiers for AMD hardware. > > > > It uses 48 bits including a chip generation, leaving 8 bits for > > a format version number. > > > > On gfx6-gfx8 we have all the fields influencing sample locations > > in memory. > > > > Tile split bytes are optional for single sample buffers as no > > hardware reaches the split size with 1 sample and hence the actual > > size does not matter. > > > > The macrotile fields are duplicated for images with multiple planes. > > If the planes have different bitdepth they need different macro > > tile fields and different tile split bytes if multisample. > > > > I could not fit multiple copies in for tile split bytes, but > > multisample & multiplane images are very rare. Overall, I think > > we should punt on multisample for a later format version since > > they are generally not shared on any modifier aware windowing > > system, and we have more issues like fmask & cmask. > > > > We need these copies because the drm modifier of all planes in an > > image needs to be equal, so we need to fit these together. > > > > This adds fields for compression support, using metadata that is > > compatible with AMDVLK and for which radv and radeonsi can > > reasonably be extended. > > > > The big open question for compression is between which generations > > the format changed to see if we can share more. > > > > This explicitly does not try to solve the linear stride alignment > > issue, thoguh we could internally just use the tiling modes for > > the linear modes to indicate linear images with the stride for the > > given chip. > > > > Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> > > CC: Chad Versace <chadversary@chromium.org> > > CC: Dave Airlie <airlied@redhat.com> > > CC: Marek Olšák <marek.olsak@amd.com> > > CC: Nicolai Hähnle <nicolai.haehnle@amd.com> > > CC: Alex Deucher <alexander.deucher@amd.com> > > CC: Daniel Vetter <daniel.vetter@ffwll.ch> > > --- > > include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++ > > 1 file changed, 130 insertions(+) > > > > diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h > > index 94444eeba55b..4e1452161dbf 100644 > > --- a/include/uapi/drm/amdgpu_drm.h > > +++ b/include/uapi/drm/amdgpu_drm.h > > @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { > > #define AMDGPU_FAMILY_AI 141 /* Vega10 */ > > #define AMDGPU_FAMILY_RV 142 /* Raven */ > > > > +#define AMDGPU_CHIP_TAHITI 0 > > +#define AMDGPU_CHIP_PITCAIRN 1 > > +#define AMDGPU_CHIP_VERDE 2 > > +#define AMDGPU_CHIP_OLAND 3 > > +#define AMDGPU_CHIP_HAINAN 4 > > +#define AMDGPU_CHIP_BONAIRE 5 > > +#define AMDGPU_CHIP_KAVERI 6 > > +#define AMDGPU_CHIP_KABINI 7 > > +#define AMDGPU_CHIP_HAWAII 8 > > +#define AMDGPU_CHIP_MULLINS 9 > > +#define AMDGPU_CHIP_TOPAZ 10 > > +#define AMDGPU_CHIP_TONGA 11 > > +#define AMDGPU_CHIP_FIJI 12 > > +#define AMDGPU_CHIP_CARRIZO 13 > > +#define AMDGPU_CHIP_STONEY 14 > > +#define AMDGPU_CHIP_POLARIS10 15 > > +#define AMDGPU_CHIP_POLARIS11 16 > > +#define AMDGPU_CHIP_POLARIS12 17 > > +#define AMDGPU_CHIP_VEGAM 18 > > +#define AMDGPU_CHIP_VEGA10 19 > > +#define AMDGPU_CHIP_VEGA12 20 > > +#define AMDGPU_CHIP_VEGA20 21 > > +#define AMDGPU_CHIP_RAVEN 22 > > + > > +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same > > + * as AMDGPU_TILING_*. However, the the rules as to when to set them are > > + * different. > > + * > > + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR > > + * instead. > > + * > > + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be > > + * set. > > + * > > + * For other ARRAY_MODEs: > > + * - Only set TILE_SPLIT if the image is multisample. > > + * > > + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 > > + * different value there. The values are > > + * - depth : 0 > > + * - displayable : 1 > > + * - thin : 2 > > + * - thick (GFX6) : 3 > > + * - rotated (GFX7+) : 4 > > + * > > + * TODO: What to do with multisample multi plane images? More tile split > > + * fields don't fit if we want to keep a few bits for a format version. > > + */ > > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 > > +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf > > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 > > +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f > > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 > > +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 > > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 > > +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT 19 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT 21 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK 0x3 > > + > > +/* Macrotile parameters for a second plane if existing */ > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT 23 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT 25 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT 29 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK 0x3 > > + > > +/* Macrotile parameters for a third plane if existing */ > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT 31 > > +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT 33 > > +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35 > > +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK 0x3 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT 37 > > +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK 0x3 > > + > > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT 0 > > +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK 0x1f > > + > > +/* Whether to enable DCC compression. > > + * > > + * If enabled, exporting the surface results in three > > + * planes: > > + * - color data > > + * - DCC data > > + * - a 64-byte block with > > + * - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed. > > + * - a 16-byte 0/1 bool as to whether the surface has fastclear data > > + * - a 8-byte chunk with the current fastclear colors > > + * > > + * To ensure we do not keep compressing and decompressing the surface, once it > > + * has been decompressed no party may recompress again. > > + * > > + * Applications should not hand over images with fastclear data as not > > + * all users can support it, however, to help both Vulkan implementations > > + * with the allocation we keep it in the 64-byte block. > > + * > > + * TODO: Can scanout really not support fastclear data? > > + * TODO: What to do with multiplane images? > > + */ > > +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT 39 > > +#define AMDGPU_MODIFIER_COMPRESSION_MASK 0x1 > > Summary of the irc discussion with my 2 recommendations: > > - Make it clear in the spec here that all consumers must follow the 16 > byte boolean to control actual DCC usage (using a conditional register > write in the command buffer). On intel we've put that into the > modifier itself, along the lines of "I don't like compressed anymore, > pls give me something else". Not that runtime renogiation works well > (or well, at all). > > - If you have consumers which can't handle fastclear, make it a > separate modifier. > > > + > > +/* The chip this is compatible with. > > + * > > + * If compression is disabled, use > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > + * > > + * With compression enabled please use the exact chip. > > + * > > + * TODO: Do some generations share DCC format? > > + */ > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > Do you really need all the combinations here of DCC + gpu gen + tiling > details? When we had the entire discussion with nvidia folks they > eventually agreed that they don't need the massive pile with every > possible combination. Do you really plan to share all these different > things? So GPU gen is based on the fact that GFX6-GFX8 and GFX9 had a significant change in tiling modes that are not compatible. Furthermore, DCC formats have seen changes between chips in the same generation and without more documentation on when those are compatible I'd like to be conservative and for DCC put every chip in its own class. For whether to enable DCC: there are basically a bunch of conditions in image creation parameters on when we can use it, so I'd like to keep non-DCC versions available even for HW that supports DCC. Tling on GFX9+ is only 5 bits so that is easy. For GFX6-GFX8 it is complicated. Each chip has a configurable system-wide 32-entry table of tiling modes (GFX7-GFX8 have the components split in two tables, but in practice given a format there is only one choice for the second table). The kernel has different settings for each chip to put in there. So if we do not want to enable sharing between GPUs we can just do a precise chip + tiling mode index and be done with it. However, since the entries are different between chips we have to either: 1) make some kind of mapping from (chip, tiling index) to a canonical value based on compatiblity. Which means for backwards compatibility reasons we're never going to be able to change the table config in the kernel again. 2) include the value of the table entry in the modifier. I chose solution 2 here. This means that while there is a lot of data in here in practice the tiling mode only has ~32 options per chip per format/bitdepth. There are some blocks not bound by this table (mostly scanout), but in practice the GL/vulkan driver just allocates from the tiling table anyway so kernel driver is just going to have to expose the modifiers corresponding to the tiling table to keep the amount limited. So as far as fields go, we have ARRAY_MODE = is this linear, micro-tiling, macro-tiling or thick-macro tiling (for 3d textures, various thicknesses), sparse texture. linear is not relevant here. micro-tiling is relevant for cross-GPU sharing if macro-tiling is not compatible. macro-tiling is optimal for 2d, sparse-texture could probably be aliased/out of scope. MICRO_TILE = normal, display, depth, rotated display I think these are obviously worth keeping PIPE_CONFIG = pretty much something constant over a chip but important for everything that is not linear TILE_SPLIT = relevant for multisample images. I think in practice constant for pretty much everything except GFX6, where it is format specific BANK_WIDTH,BANK_HEIGHT,MACRO_TILE_ASPEC,NUM_BANKS = for macro tiles, pretty much specific to chip + format, mostly independent of everything else. because of the bitdepth dependence we have to have multiple for multiplane images as some are e.g. one R plane + one GB plane. > > Note that e.g. on i915 we spec some of the tiling depending upon > buffer size and buffer format (because that's how the hw works), not > using explicit modifier flags for everything. > -Daniel > > > + > > +#define AMDGPU_MODIFIER_SET(field, value) \ > > + (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT) > > +#define AMDGPU_MODIFIER_GET(value, field) \ > > + (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK) > > + > > /* > > * Definition of free sync enter and exit signals > > * We may have more options in the future > > -- > > 2.18.0 > > > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Hi, On Tue, 4 Sep 2018 at 11:44, Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > Right. The conclusion, after people went through and started sorting > > out the kinds of formats for which they would _actually_ export real > > colour buffers for, that most vendors definitely have fewer than > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > possible formats to represent, very likely fewer than > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > fewer than 72,057,594,037,927,936 formats, and even still generally > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > bits of the 56 available. > > The problem here is that at least for some parameters we actually don't > know which formats are actually used. > > The following are not real world examples, but just to explain the > general problem. > > The memory configuration for example can be not ASIC specific, but > rather determined by whoever took the ASIC and glued it together with > VRAM on a board. It is not likely that somebody puts all the VRAM chips > on one channel, but it is still perfectly possible. > > Same is true for things like harvesting, e.g. of 16 channels halve of > them could be bad and we need to know which to actually use. Sure, that's fine, but I'm not sure why, for instance, 'half of the memory channels are bad and must be avoided' would be attached to a format in the same way that macrotile size/depth would be? Similarly, what happens if you encode a channel or lane blacklist mask into a modifier, and then you import a buffer with that modifier into a client API on another device. Does that also apply to the other device or only the source device? How then do you handle the capability query between them: is it relevant to device B attempting to import and sample from an image that device A only has a 128-bit memory bus because of SKU configuration? How does generic userspace look at a token which encodes all of this information and know that the buffer is interchangeable between devices? Cheers, Daniel
On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > Hi, > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > > > +/* The chip this is compatible with. > > > > + * > > > > + * If compression is disabled, use > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > + * > > > > + * With compression enabled please use the exact chip. > > > > + * > > > > + * TODO: Do some generations share DCC format? > > > > + */ > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > Do you really need all the combinations here of DCC + gpu gen + tiling > > > details? When we had the entire discussion with nvidia folks they > > > eventually agreed that they don't need the massive pile with every > > > possible combination. Do you really plan to share all these different > > > things? > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > buffer size and buffer format (because that's how the hw works), not > > > using explicit modifier flags for everything. > > Right. The conclusion, after people went through and started sorting > > out the kinds of formats for which they would _actually_ export real > > colour buffers for, that most vendors definitely have fewer than > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > possible formats to represent, very likely fewer than > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > fewer than 72,057,594,037,927,936 formats, and even still generally > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > bits of the 56 available. > > The problem here is that at least for some parameters we actually don't know > which formats are actually used. > > The following are not real world examples, but just to explain the general > problem. > > The memory configuration for example can be not ASIC specific, but rather > determined by whoever took the ASIC and glued it together with VRAM on a > board. It is not likely that somebody puts all the VRAM chips on one > channel, but it is still perfectly possible. > > Same is true for things like harvesting, e.g. of 16 channels halve of them > could be bad and we need to know which to actually use. For my understanding: This leaks outside the chip when sharing buffers? All the information you only need locally to a given amdgpu instance don't really need to be encoded in modifiers. Pointers to code where this is all decided (kernel and radeonsi would be good starters I guess) would be really good here. -Daniel > > Regards, > Christian. > > > > > If you do use 256 bits in order to represent > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > modifiers per format, userspace would start hitting OOM pretty quickly > > as it attempted to enumerate and negotiate acceptable modifiers. > > Either that or we need to replace the fixed 64-bit modifier tokens > > with some kind of eBPF script. > > > > Cheers, > > Daniel > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel >
On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > Hi, > > > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > > > > +/* The chip this is compatible with. > > > > > + * > > > > > + * If compression is disabled, use > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > > + * > > > > > + * With compression enabled please use the exact chip. > > > > > + * > > > > > + * TODO: Do some generations share DCC format? > > > > > + */ > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > > Do you really need all the combinations here of DCC + gpu gen + tiling > > > > details? When we had the entire discussion with nvidia folks they > > > > eventually agreed that they don't need the massive pile with every > > > > possible combination. Do you really plan to share all these different > > > > things? > > > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > > buffer size and buffer format (because that's how the hw works), not > > > > using explicit modifier flags for everything. > > > Right. The conclusion, after people went through and started sorting > > > out the kinds of formats for which they would _actually_ export real > > > colour buffers for, that most vendors definitely have fewer than > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > possible formats to represent, very likely fewer than > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > bits of the 56 available. > > > > The problem here is that at least for some parameters we actually don't know > > which formats are actually used. > > > > The following are not real world examples, but just to explain the general > > problem. > > > > The memory configuration for example can be not ASIC specific, but rather > > determined by whoever took the ASIC and glued it together with VRAM on a > > board. It is not likely that somebody puts all the VRAM chips on one > > channel, but it is still perfectly possible. > > > > Same is true for things like harvesting, e.g. of 16 channels halve of them > > could be bad and we need to know which to actually use. > > For my understanding: This leaks outside the chip when sharing buffers? > All the information you only need locally to a given amdgpu instance > don't really need to be encoded in modifiers. > > Pointers to code where this is all decided (kernel and radeonsi would be > good starters I guess) would be really good here. I extracted the information on which bits are relevant mostly from the AddrFromCoord functions in addrlib in mesa: for macro-tiles: https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 for micro-tiles (or the micro-tiles in macro-tiles): https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > -Daniel > > > > > Regards, > > Christian. > > > > > > > > If you do use 256 bits in order to represent > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > modifiers per format, userspace would start hitting OOM pretty quickly > > > as it attempted to enumerate and negotiate acceptable modifiers. > > > Either that or we need to replace the fixed 64-bit modifier tokens > > > with some kind of eBPF script. > > > > > > Cheers, > > > Daniel > > > _______________________________________________ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch > _______________________________________________ > dri-devel mailing list > dri-devel@lists.freedesktop.org > https://lists.freedesktop.org/mailman/listinfo/dri-devel
+everyone again On Tue, Sep 4, 2018 at 2:39 PM Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > On Tue, Sep 4, 2018 at 2:22 PM Daniel Stone <daniel@fooishbar.org> wrote: > > > > Hi, > > > > On Tue, 4 Sep 2018 at 11:44, Christian König > > <ckoenig.leichtzumerken@gmail.com> wrote: > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > > Right. The conclusion, after people went through and started sorting > > > > out the kinds of formats for which they would _actually_ export real > > > > colour buffers for, that most vendors definitely have fewer than > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > possible formats to represent, very likely fewer than > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > > bits of the 56 available. > > > > > > The problem here is that at least for some parameters we actually don't > > > know which formats are actually used. > > > > > > The following are not real world examples, but just to explain the > > > general problem. > > > > > > The memory configuration for example can be not ASIC specific, but > > > rather determined by whoever took the ASIC and glued it together with > > > VRAM on a board. It is not likely that somebody puts all the VRAM chips > > > on one channel, but it is still perfectly possible. > > > > > > Same is true for things like harvesting, e.g. of 16 channels halve of > > > them could be bad and we need to know which to actually use. > > > > Sure, that's fine, but I'm not sure why, for instance, 'half of the > > memory channels are bad and must be avoided' would be attached to a > > format in the same way that macrotile size/depth would be? Similarly, > > what happens if you encode a channel or lane blacklist mask into a > > modifier, and then you import a buffer with that modifier into a > > client API on another device. Does that also apply to the other device > > or only the source device? > > > > How then do you handle the capability query between them: is it > > relevant to device B attempting to import and sample from an image > > that device A only has a 128-bit memory bus because of SKU > > configuration? How does generic userspace look at a token which > > encodes all of this information and know that the buffer is > > interchangeable between devices? > > So my idea here is to encode all the bits that determine format in a > way that as much as possible if it > results in the same layout it should get the same modifier(i.e. remove > useless info): > > https://lists.freedesktop.org/archives/dri-devel/2018-September/188418.html > > An example of the per-chip things is the PIPE_CONFIG, which is used > for address calculation in the macro-tiled mode: > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/siaddrlib.cpp#L666 > > two chips with different PIPE_CONFIG will have different texture > layouts, as the address swizzles are different. > > So it gets included in the modifier (the PIPE_CONFIG is pretty much > constant per chip but multiple chips might have the same PIPE_CONFIG). > > > > > Cheers, > > Daniel > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Am 04.09.2018 um 14:22 schrieb Daniel Stone: > Hi, > > On Tue, 4 Sep 2018 at 11:44, Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: >> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >>> Right. The conclusion, after people went through and started sorting >>> out the kinds of formats for which they would _actually_ export real >>> colour buffers for, that most vendors definitely have fewer than >>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>> possible formats to represent, very likely fewer than >>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >>> fewer than 72,057,594,037,927,936 formats, and even still generally >>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >>> bits of the 56 available. >> The problem here is that at least for some parameters we actually don't >> know which formats are actually used. >> >> The following are not real world examples, but just to explain the >> general problem. >> >> The memory configuration for example can be not ASIC specific, but >> rather determined by whoever took the ASIC and glued it together with >> VRAM on a board. It is not likely that somebody puts all the VRAM chips >> on one channel, but it is still perfectly possible. >> >> Same is true for things like harvesting, e.g. of 16 channels halve of >> them could be bad and we need to know which to actually use. > Sure, that's fine, but I'm not sure why, for instance, 'half of the > memory channels are bad and must be avoided' would be attached to a > format in the same way that macrotile size/depth would be? Because it results in a different memory layout. > Similarly, > what happens if you encode a channel or lane blacklist mask into a > modifier, and then you import a buffer with that modifier into a > client API on another device. Does that also apply to the other device > or only the source device? If the other device doesn't have the same hardware config it probably can't access that buffer. > How then do you handle the capability query between them: is it > relevant to device B attempting to import and sample from an image > that device A only has a 128-bit memory bus because of SKU > configuration? How does generic userspace look at a token which > encodes all of this information and know that the buffer is > interchangeable between devices? Well that is exactly the point where my understanding stops. No idea how the user space driver is handling this. Marek came up with the interop interface, he needs to explain the details. Regards, Christian. > > Cheers, > Daniel
Am 04.09.2018 um 14:26 schrieb Daniel Vetter: > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >>> Hi, >>> >>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >>>>> +/* The chip this is compatible with. >>>>> + * >>>>> + * If compression is disabled, use >>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >>>>> + * >>>>> + * With compression enabled please use the exact chip. >>>>> + * >>>>> + * TODO: Do some generations share DCC format? >>>>> + */ >>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >>>> Do you really need all the combinations here of DCC + gpu gen + tiling >>>> details? When we had the entire discussion with nvidia folks they >>>> eventually agreed that they don't need the massive pile with every >>>> possible combination. Do you really plan to share all these different >>>> things? >>>> >>>> Note that e.g. on i915 we spec some of the tiling depending upon >>>> buffer size and buffer format (because that's how the hw works), not >>>> using explicit modifier flags for everything. >>> Right. The conclusion, after people went through and started sorting >>> out the kinds of formats for which they would _actually_ export real >>> colour buffers for, that most vendors definitely have fewer than >>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>> possible formats to represent, very likely fewer than >>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >>> fewer than 72,057,594,037,927,936 formats, and even still generally >>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >>> bits of the 56 available. >> The problem here is that at least for some parameters we actually don't know >> which formats are actually used. >> >> The following are not real world examples, but just to explain the general >> problem. >> >> The memory configuration for example can be not ASIC specific, but rather >> determined by whoever took the ASIC and glued it together with VRAM on a >> board. It is not likely that somebody puts all the VRAM chips on one >> channel, but it is still perfectly possible. >> >> Same is true for things like harvesting, e.g. of 16 channels halve of them >> could be bad and we need to know which to actually use. > For my understanding: This leaks outside the chip when sharing buffers? > All the information you only need locally to a given amdgpu instance > don't really need to be encoded in modifiers. Yeah, that is certainly a possibility. The problem is that the identifiers are then not universal applicable. But if we can somehow say this format only applies to buffers from devices with hardware config X, then that would certainly simplify things a lot. > Pointers to code where this is all decided (kernel and radeonsi would be > good starters I guess) would be really good here. Well that is the problem, a good part of that stuff is rather deeply embedded in the firmware. The kernel driver just reads the hardware config from some registers and hands it of to userspace. Regards, Christian. > -Daniel > >> Regards, >> Christian. >> >>> If you do use 256 bits in order to represent >>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>> modifiers per format, userspace would start hitting OOM pretty quickly >>> as it attempted to enumerate and negotiate acceptable modifiers. >>> Either that or we need to replace the fixed 64-bit modifier tokens >>> with some kind of eBPF script. >>> >>> Cheers, >>> Daniel >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > > Hi, > > > > > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > > > > > +/* The chip this is compatible with. > > > > > > + * > > > > > > + * If compression is disabled, use > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > > > + * > > > > > > + * With compression enabled please use the exact chip. > > > > > > + * > > > > > > + * TODO: Do some generations share DCC format? > > > > > > + */ > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling > > > > > details? When we had the entire discussion with nvidia folks they > > > > > eventually agreed that they don't need the massive pile with every > > > > > possible combination. Do you really plan to share all these different > > > > > things? > > > > > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > > > buffer size and buffer format (because that's how the hw works), not > > > > > using explicit modifier flags for everything. > > > > Right. The conclusion, after people went through and started sorting > > > > out the kinds of formats for which they would _actually_ export real > > > > colour buffers for, that most vendors definitely have fewer than > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > possible formats to represent, very likely fewer than > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > > bits of the 56 available. > > > > > > The problem here is that at least for some parameters we actually don't know > > > which formats are actually used. > > > > > > The following are not real world examples, but just to explain the general > > > problem. > > > > > > The memory configuration for example can be not ASIC specific, but rather > > > determined by whoever took the ASIC and glued it together with VRAM on a > > > board. It is not likely that somebody puts all the VRAM chips on one > > > channel, but it is still perfectly possible. > > > > > > Same is true for things like harvesting, e.g. of 16 channels halve of them > > > could be bad and we need to know which to actually use. > > > > For my understanding: This leaks outside the chip when sharing buffers? > > All the information you only need locally to a given amdgpu instance > > don't really need to be encoded in modifiers. > > > > Pointers to code where this is all decided (kernel and radeonsi would be > > good starters I guess) would be really good here. > > I extracted the information on which bits are relevant mostly from the > AddrFromCoord functions in addrlib in mesa: > > for macro-tiles: > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > > for micro-tiles (or the micro-tiles in macro-tiles): > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 So this is the decoding thing. How many of these actually exist, even when taking all the other information into account? E.g. given a platform + memory config (seems needed) + drm_fourcc + stride + height + width, how much of all these bits do you actually still freely pick? It might be that all the things you need to know from the memory config don't encode smaller than the macro/micro/whatever else stuff. But that's kinda the angle that we looked at this for everyone else. E.g. for multi-plane stuff, if everyone picks the same config for the 2nd/3rd plane, then you don't actually need to encode that. It just becomes part of the implied stuff in the modifier. -Daniel > > > -Daniel > > > > > > > > Regards, > > > Christian. > > > > > > > > > > > If you do use 256 bits in order to represent > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > modifiers per format, userspace would start hitting OOM pretty quickly > > > > as it attempted to enumerate and negotiate acceptable modifiers. > > > > Either that or we need to replace the fixed 64-bit modifier tokens > > > > with some kind of eBPF script. > > > > > > > > Cheers, > > > > Daniel > > > > _______________________________________________ > > > > dri-devel mailing list > > > > dri-devel@lists.freedesktop.org > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > http://blog.ffwll.ch > > _______________________________________________ > > dri-devel mailing list > > dri-devel@lists.freedesktop.org > > https://lists.freedesktop.org/mailman/listinfo/dri-devel
Am 04.09.2018 um 15:03 schrieb Daniel Vetter: > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >>>>> Hi, >>>>> >>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >>>>>>> +/* The chip this is compatible with. >>>>>>> + * >>>>>>> + * If compression is disabled, use >>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >>>>>>> + * >>>>>>> + * With compression enabled please use the exact chip. >>>>>>> + * >>>>>>> + * TODO: Do some generations share DCC format? >>>>>>> + */ >>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling >>>>>> details? When we had the entire discussion with nvidia folks they >>>>>> eventually agreed that they don't need the massive pile with every >>>>>> possible combination. Do you really plan to share all these different >>>>>> things? >>>>>> >>>>>> Note that e.g. on i915 we spec some of the tiling depending upon >>>>>> buffer size and buffer format (because that's how the hw works), not >>>>>> using explicit modifier flags for everything. >>>>> Right. The conclusion, after people went through and started sorting >>>>> out the kinds of formats for which they would _actually_ export real >>>>> colour buffers for, that most vendors definitely have fewer than >>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>> possible formats to represent, very likely fewer than >>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >>>>> fewer than 72,057,594,037,927,936 formats, and even still generally >>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >>>>> bits of the 56 available. >>>> The problem here is that at least for some parameters we actually don't know >>>> which formats are actually used. >>>> >>>> The following are not real world examples, but just to explain the general >>>> problem. >>>> >>>> The memory configuration for example can be not ASIC specific, but rather >>>> determined by whoever took the ASIC and glued it together with VRAM on a >>>> board. It is not likely that somebody puts all the VRAM chips on one >>>> channel, but it is still perfectly possible. >>>> >>>> Same is true for things like harvesting, e.g. of 16 channels halve of them >>>> could be bad and we need to know which to actually use. >>> For my understanding: This leaks outside the chip when sharing buffers? >>> All the information you only need locally to a given amdgpu instance >>> don't really need to be encoded in modifiers. >>> >>> Pointers to code where this is all decided (kernel and radeonsi would be >>> good starters I guess) would be really good here. >> I extracted the information on which bits are relevant mostly from the >> AddrFromCoord functions in addrlib in mesa: >> >> for macro-tiles: >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> >> for micro-tiles (or the micro-tiles in macro-tiles): >> >> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > So this is the decoding thing. How many of these actually exist, even when > taking all the other information into account? > > E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > + height + width, how much of all these bits do you actually still freely > pick? > > It might be that all the things you need to know from the memory config > don't encode smaller than the macro/micro/whatever else stuff. But that's > kinda the angle that we looked at this for everyone else. > > E.g. for multi-plane stuff, if everyone picks the same config for the > 2nd/3rd plane, then you don't actually need to encode that. It just > becomes part of the implied stuff in the modifier. The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile configuration is potentially different. We can make an educated guess, but I don't think that this is universal applicable as well. Regards, Christian. > -Daniel > >>> -Daniel >>> >>>> Regards, >>>> Christian. >>>> >>>>> If you do use 256 bits in order to represent >>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>> modifiers per format, userspace would start hitting OOM pretty quickly >>>>> as it attempted to enumerate and negotiate acceptable modifiers. >>>>> Either that or we need to replace the fixed 64-bit modifier tokens >>>>> with some kind of eBPF script. >>>>> >>>>> Cheers, >>>>> Daniel >>>>> _______________________________________________ >>>>> dri-devel mailing list >>>>> dri-devel@lists.freedesktop.org >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>> -- >>> Daniel Vetter >>> Software Engineer, Intel Corporation >>> http://blog.ffwll.ch >>> _______________________________________________ >>> dri-devel mailing list >>> dri-devel@lists.freedesktop.org >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
On Tue, Sep 4, 2018 at 3:12 PM, Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > Am 04.09.2018 um 15:03 schrieb Daniel Vetter: >> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >>> >>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>>> >>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >>>>> >>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >>>>>> >>>>>> Hi, >>>>>> >>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> >>>>>> wrote: >>>>>>> >>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen >>>>>>> <bas@basnieuwenhuizen.nl> wrote: >>>>>>>> >>>>>>>> +/* The chip this is compatible with. >>>>>>>> + * >>>>>>>> + * If compression is disabled, use >>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >>>>>>>> + * >>>>>>>> + * With compression enabled please use the exact chip. >>>>>>>> + * >>>>>>>> + * TODO: Do some generations share DCC format? >>>>>>>> + */ >>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >>>>>>> >>>>>>> Do you really need all the combinations here of DCC + gpu gen + >>>>>>> tiling >>>>>>> details? When we had the entire discussion with nvidia folks they >>>>>>> eventually agreed that they don't need the massive pile with every >>>>>>> possible combination. Do you really plan to share all these different >>>>>>> things? >>>>>>> >>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon >>>>>>> buffer size and buffer format (because that's how the hw works), not >>>>>>> using explicit modifier flags for everything. >>>>>> >>>>>> Right. The conclusion, after people went through and started sorting >>>>>> out the kinds of formats for which they would _actually_ export real >>>>>> colour buffers for, that most vendors definitely have fewer than >>>>>> >>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>> possible formats to represent, very likely fewer than >>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally >>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >>>>>> bits of the 56 available. >>>>> >>>>> The problem here is that at least for some parameters we actually don't >>>>> know >>>>> which formats are actually used. >>>>> >>>>> The following are not real world examples, but just to explain the >>>>> general >>>>> problem. >>>>> >>>>> The memory configuration for example can be not ASIC specific, but >>>>> rather >>>>> determined by whoever took the ASIC and glued it together with VRAM on >>>>> a >>>>> board. It is not likely that somebody puts all the VRAM chips on one >>>>> channel, but it is still perfectly possible. >>>>> >>>>> Same is true for things like harvesting, e.g. of 16 channels halve of >>>>> them >>>>> could be bad and we need to know which to actually use. >>>> >>>> For my understanding: This leaks outside the chip when sharing buffers? >>>> All the information you only need locally to a given amdgpu instance >>>> don't really need to be encoded in modifiers. >>>> >>>> Pointers to code where this is all decided (kernel and radeonsi would be >>>> good starters I guess) would be really good here. >>> >>> I extracted the information on which bits are relevant mostly from the >>> AddrFromCoord functions in addrlib in mesa: >>> >>> for macro-tiles: >>> >>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >>> >>> for micro-tiles (or the micro-tiles in macro-tiles): >>> >>> >>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >> So this is the decoding thing. How many of these actually exist, even when >> taking all the other information into account? >> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> + height + width, how much of all these bits do you actually still freely >> pick? >> >> It might be that all the things you need to know from the memory config >> don't encode smaller than the macro/micro/whatever else stuff. But that's >> kinda the angle that we looked at this for everyone else. >> >> E.g. for multi-plane stuff, if everyone picks the same config for the >> 2nd/3rd plane, then you don't actually need to encode that. It just >> becomes part of the implied stuff in the modifier. > > > The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile > configuration is potentially different. > > We can make an educated guess, but I don't think that this is universal > applicable as well. Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open. Should be possible to figure out what's actually used (on buffers you want to share), and then figuring out how to best encode the information you need to reconstruct all the information. You might need to occasionally respin that if best practices for how to lay out buffers changes after you started upstreaming for a given generation. But ime hw people have pretty strong opinions on how to best tile things. You even know what's coming down the internal pipeline. Or do you mean something else with "educated guess"? -Daniel > > Regards, > Christian. > > >> -Daniel >> >>>> -Daniel >>>> >>>>> Regards, >>>>> Christian. >>>>> >>>>>> If you do use 256 bits in order to represent >>>>>> >>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>> modifiers per format, userspace would start hitting OOM pretty quickly >>>>>> as it attempted to enumerate and negotiate acceptable modifiers. >>>>>> Either that or we need to replace the fixed 64-bit modifier tokens >>>>>> with some kind of eBPF script. >>>>>> >>>>>> Cheers, >>>>>> Daniel >>>>>> _______________________________________________ >>>>>> dri-devel mailing list >>>>>> dri-devel@lists.freedesktop.org >>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>> >>>> -- >>>> Daniel Vetter >>>> Software Engineer, Intel Corporation >>>> http://blog.ffwll.ch >>>> _______________________________________________ >>>> dri-devel mailing list >>>> dri-devel@lists.freedesktop.org >>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >
Am 04.09.2018 um 15:17 schrieb Daniel Vetter: > On Tue, Sep 4, 2018 at 3:12 PM, Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: >> Am 04.09.2018 um 15:03 schrieb Daniel Vetter: >>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >>>>>>> Hi, >>>>>>> >>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> >>>>>>> wrote: >>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen >>>>>>>> <bas@basnieuwenhuizen.nl> wrote: >>>>>>>>> +/* The chip this is compatible with. >>>>>>>>> + * >>>>>>>>> + * If compression is disabled, use >>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >>>>>>>>> + * >>>>>>>>> + * With compression enabled please use the exact chip. >>>>>>>>> + * >>>>>>>>> + * TODO: Do some generations share DCC format? >>>>>>>>> + */ >>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >>>>>>>> Do you really need all the combinations here of DCC + gpu gen + >>>>>>>> tiling >>>>>>>> details? When we had the entire discussion with nvidia folks they >>>>>>>> eventually agreed that they don't need the massive pile with every >>>>>>>> possible combination. Do you really plan to share all these different >>>>>>>> things? >>>>>>>> >>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon >>>>>>>> buffer size and buffer format (because that's how the hw works), not >>>>>>>> using explicit modifier flags for everything. >>>>>>> Right. The conclusion, after people went through and started sorting >>>>>>> out the kinds of formats for which they would _actually_ export real >>>>>>> colour buffers for, that most vendors definitely have fewer than >>>>>>> >>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>>> possible formats to represent, very likely fewer than >>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally >>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >>>>>>> bits of the 56 available. >>>>>> The problem here is that at least for some parameters we actually don't >>>>>> know >>>>>> which formats are actually used. >>>>>> >>>>>> The following are not real world examples, but just to explain the >>>>>> general >>>>>> problem. >>>>>> >>>>>> The memory configuration for example can be not ASIC specific, but >>>>>> rather >>>>>> determined by whoever took the ASIC and glued it together with VRAM on >>>>>> a >>>>>> board. It is not likely that somebody puts all the VRAM chips on one >>>>>> channel, but it is still perfectly possible. >>>>>> >>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of >>>>>> them >>>>>> could be bad and we need to know which to actually use. >>>>> For my understanding: This leaks outside the chip when sharing buffers? >>>>> All the information you only need locally to a given amdgpu instance >>>>> don't really need to be encoded in modifiers. >>>>> >>>>> Pointers to code where this is all decided (kernel and radeonsi would be >>>>> good starters I guess) would be really good here. >>>> I extracted the information on which bits are relevant mostly from the >>>> AddrFromCoord functions in addrlib in mesa: >>>> >>>> for macro-tiles: >>>> >>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >>>> >>>> for micro-tiles (or the micro-tiles in macro-tiles): >>>> >>>> >>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >>> So this is the decoding thing. How many of these actually exist, even when >>> taking all the other information into account? >>> >>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >>> + height + width, how much of all these bits do you actually still freely >>> pick? >>> >>> It might be that all the things you need to know from the memory config >>> don't encode smaller than the macro/micro/whatever else stuff. But that's >>> kinda the angle that we looked at this for everyone else. >>> >>> E.g. for multi-plane stuff, if everyone picks the same config for the >>> 2nd/3rd plane, then you don't actually need to encode that. It just >>> becomes part of the implied stuff in the modifier. >> >> The pipe/bank configuration for the 2nd/3rd plane is the same, but the tile >> configuration is potentially different. >> >> We can make an educated guess, but I don't think that this is universal >> applicable as well. > Why? The source code for amdgpu.ko, mesa, radv, amdvk is all open. > Should be possible to figure out what's actually used (on buffers you > want to share), and then figuring out how to best encode the > information you need to reconstruct all the information. You might > need to occasionally respin that if best practices for how to lay out > buffers changes after you started upstreaming for a given generation. > But ime hw people have pretty strong opinions on how to best tile > things. You even know what's coming down the internal pipeline. > > Or do you mean something else with "educated guess"? I mean something else. It can depend on the use case what a buffer object ends up with. In other words when you want to transcode a video you could use a different layout as when you want to display the resulting image. So essentially we would need to encode the use case and not the format of the buffer. That might work, but my gut feeling strongly tells me that this is the wrong approach. Christian. > -Daniel > >> Regards, >> Christian. >> >> >>> -Daniel >>> >>>>> -Daniel >>>>> >>>>>> Regards, >>>>>> Christian. >>>>>> >>>>>>> If you do use 256 bits in order to represent >>>>>>> >>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly >>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. >>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens >>>>>>> with some kind of eBPF script. >>>>>>> >>>>>>> Cheers, >>>>>>> Daniel >>>>>>> _______________________________________________ >>>>>>> dri-devel mailing list >>>>>>> dri-devel@lists.freedesktop.org >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>> -- >>>>> Daniel Vetter >>>>> Software Engineer, Intel Corporation >>>>> http://blog.ffwll.ch >>>>> _______________________________________________ >>>>> dri-devel mailing list >>>>> dri-devel@lists.freedesktop.org >>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> > >
On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > > > > Hi, > > > > > > > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > > > > > > +/* The chip this is compatible with. > > > > > > > + * > > > > > > > + * If compression is disabled, use > > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > > > > > > + * > > > > > > > + * With compression enabled please use the exact chip. > > > > > > > + * > > > > > > > + * TODO: Do some generations share DCC format? > > > > > > > + */ > > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling > > > > > > details? When we had the entire discussion with nvidia folks they > > > > > > eventually agreed that they don't need the massive pile with every > > > > > > possible combination. Do you really plan to share all these different > > > > > > things? > > > > > > > > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > > > > > > buffer size and buffer format (because that's how the hw works), not > > > > > > using explicit modifier flags for everything. > > > > > Right. The conclusion, after people went through and started sorting > > > > > out the kinds of formats for which they would _actually_ export real > > > > > colour buffers for, that most vendors definitely have fewer than > > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > > possible formats to represent, very likely fewer than > > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > > > > bits of the 56 available. > > > > > > > > The problem here is that at least for some parameters we actually don't know > > > > which formats are actually used. > > > > > > > > The following are not real world examples, but just to explain the general > > > > problem. > > > > > > > > The memory configuration for example can be not ASIC specific, but rather > > > > determined by whoever took the ASIC and glued it together with VRAM on a > > > > board. It is not likely that somebody puts all the VRAM chips on one > > > > channel, but it is still perfectly possible. > > > > > > > > Same is true for things like harvesting, e.g. of 16 channels halve of them > > > > could be bad and we need to know which to actually use. > > > > > > For my understanding: This leaks outside the chip when sharing buffers? > > > All the information you only need locally to a given amdgpu instance > > > don't really need to be encoded in modifiers. > > > > > > Pointers to code where this is all decided (kernel and radeonsi would be > > > good starters I guess) would be really good here. > > > > I extracted the information on which bits are relevant mostly from the > > AddrFromCoord functions in addrlib in mesa: > > > > for macro-tiles: > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > > > > for micro-tiles (or the micro-tiles in macro-tiles): > > > > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > > So this is the decoding thing. How many of these actually exist, even when > taking all the other information into account? > > E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > + height + width, how much of all these bits do you actually still freely > pick? Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, depth, display-rotated) + whether to use compression, everything else is fixed given those option, the properties of the chip and the format. > > It might be that all the things you need to know from the memory config > don't encode smaller than the macro/micro/whatever else stuff. But that's > kinda the angle that we looked at this for everyone else. > > E.g. for multi-plane stuff, if everyone picks the same config for the > 2nd/3rd plane, then you don't actually need to encode that. It just > becomes part of the implied stuff in the modifier. The problem is some GPUs are compatible for say 8-bpp images, but not for 32-bpp surfaces. e.g. lets look at the following table showing the current configuration for all GFX6-GFX8 GPU: format: (bank width, bank height, macro tile aspect, num banks) for 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) ADDR_SURF_P8_32x32_8x16 bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only compatible for 8-bpp. So we can't just assume that if the first plane properties match that they do so for the second plane, because we don't know what GPU it is coming from. And we can make a canonical table from this but then if we change the above tables in the kernel that runs into compatibility issues. > -Daniel > > > > > > -Daniel > > > > > > > > > > > Regards, > > > > Christian. > > > > > > > > > > > > > > If you do use 256 bits in order to represent > > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > > > > modifiers per format, userspace would start hitting OOM pretty quickly > > > > > as it attempted to enumerate and negotiate acceptable modifiers. > > > > > Either that or we need to replace the fixed 64-bit modifier tokens > > > > > with some kind of eBPF script. > > > > > > > > > > Cheers, > > > > > Daniel > > > > > _______________________________________________ > > > > > dri-devel mailing list > > > > > dri-devel@lists.freedesktop.org > > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > > > > > > > > > -- > > > Daniel Vetter > > > Software Engineer, Intel Corporation > > > http://blog.ffwll.ch > > > _______________________________________________ > > > dri-devel mailing list > > > dri-devel@lists.freedesktop.org > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > > -- > Daniel Vetter > Software Engineer, Intel Corporation > http://blog.ffwll.ch
On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> > > >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> > > > > Hi, >> > > > > >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >> > > > > > > +/* The chip this is compatible with. >> > > > > > > + * >> > > > > > > + * If compression is disabled, use >> > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> > > > > > > + * >> > > > > > > + * With compression enabled please use the exact chip. >> > > > > > > + * >> > > > > > > + * TODO: Do some generations share DCC format? >> > > > > > > + */ >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling >> > > > > > details? When we had the entire discussion with nvidia folks they >> > > > > > eventually agreed that they don't need the massive pile with every >> > > > > > possible combination. Do you really plan to share all these different >> > > > > > things? >> > > > > > >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon >> > > > > > buffer size and buffer format (because that's how the hw works), not >> > > > > > using explicit modifier flags for everything. >> > > > > Right. The conclusion, after people went through and started sorting >> > > > > out the kinds of formats for which they would _actually_ export real >> > > > > colour buffers for, that most vendors definitely have fewer than >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> > > > > possible formats to represent, very likely fewer than >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally >> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 >> > > > > bits of the 56 available. >> > > > >> > > > The problem here is that at least for some parameters we actually don't know >> > > > which formats are actually used. >> > > > >> > > > The following are not real world examples, but just to explain the general >> > > > problem. >> > > > >> > > > The memory configuration for example can be not ASIC specific, but rather >> > > > determined by whoever took the ASIC and glued it together with VRAM on a >> > > > board. It is not likely that somebody puts all the VRAM chips on one >> > > > channel, but it is still perfectly possible. >> > > > >> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them >> > > > could be bad and we need to know which to actually use. >> > > >> > > For my understanding: This leaks outside the chip when sharing buffers? >> > > All the information you only need locally to a given amdgpu instance >> > > don't really need to be encoded in modifiers. >> > > >> > > Pointers to code where this is all decided (kernel and radeonsi would be >> > > good starters I guess) would be really good here. >> > >> > I extracted the information on which bits are relevant mostly from the >> > AddrFromCoord functions in addrlib in mesa: >> > >> > for macro-tiles: >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> > >> > for micro-tiles (or the micro-tiles in macro-tiles): >> > >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >> So this is the decoding thing. How many of these actually exist, even when >> taking all the other information into account? >> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> + height + width, how much of all these bits do you actually still freely >> pick? > > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > depth, display-rotated) + whether to use compression, everything else > is fixed given those option, the properties of the chip and the > format. > > >> >> It might be that all the things you need to know from the memory config >> don't encode smaller than the macro/micro/whatever else stuff. But that's >> kinda the angle that we looked at this for everyone else. >> >> E.g. for multi-plane stuff, if everyone picks the same config for the >> 2nd/3rd plane, then you don't actually need to encode that. It just >> becomes part of the implied stuff in the modifier. > > The problem is some GPUs are compatible for say 8-bpp images, but not > for 32-bpp surfaces. e.g. lets look at the following table showing the > current configuration for all GFX6-GFX8 GPU: > > format: (bank width, bank height, macro tile aspect, num banks) for > 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG > > verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 > tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) > ADDR_SURF_P8_32x32_8x16 > bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 > hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 > CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 > topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 > fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 > tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 > polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 > > We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp > and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only > compatible for 8-bpp. > > So we can't just assume that if the first plane properties match that > they do so for the second plane, because we don't know what GPU it is > coming from. > > And we can make a canonical table from this but then if we change the > above tables in the kernel that runs into compatibility issues. I think that's roughly what nvidia ended up doing (not sure they published anything, haven't seen any patches on dri-devel). Two parts: - Bunch of flags/bits for the major mode, like {display, non-display, DCC} and so on. - For each of the major modes a list of enumerated modes actually used in reality. For DCC this would be the generation, for more plain formats it would be the above table. Table entries would be per-bpp (in case that matters, or whatever matters for amd gpus). Imo would be totally ok to put the relevant lookup tables into drm_fourcc.h directly. Or any other suitable place. From a quick look about 16 bits total. Gives you 40bits of second chances and "oops totally sorry". -Daniel > >> -Daniel >> >> > >> > > -Daniel >> > > >> > > > >> > > > Regards, >> > > > Christian. >> > > > >> > > > > >> > > > > If you do use 256 bits in order to represent >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> > > > > modifiers per format, userspace would start hitting OOM pretty quickly >> > > > > as it attempted to enumerate and negotiate acceptable modifiers. >> > > > > Either that or we need to replace the fixed 64-bit modifier tokens >> > > > > with some kind of eBPF script. >> > > > > >> > > > > Cheers, >> > > > > Daniel >> > > > > _______________________________________________ >> > > > > dri-devel mailing list >> > > > > dri-devel@lists.freedesktop.org >> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel >> > > > >> > > >> > > -- >> > > Daniel Vetter >> > > Software Engineer, Intel Corporation >> > > http://blog.ffwll.ch >> > > _______________________________________________ >> > > dri-devel mailing list >> > > dri-devel@lists.freedesktop.org >> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> -- >> Daniel Vetter >> Software Engineer, Intel Corporation >> http://blog.ffwll.ch
On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > <bas@basnieuwenhuizen.nl> wrote: > > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> > >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> > > > >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: > >> > > > > Hi, > >> > > > > > >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > >> > > > > > > +/* The chip this is compatible with. > >> > > > > > > + * > >> > > > > > > + * If compression is disabled, use > >> > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > >> > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ > >> > > > > > > + * > >> > > > > > > + * With compression enabled please use the exact chip. > >> > > > > > > + * > >> > > > > > > + * TODO: Do some generations share DCC format? > >> > > > > > > + */ > >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > >> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling > >> > > > > > details? When we had the entire discussion with nvidia folks they > >> > > > > > eventually agreed that they don't need the massive pile with every > >> > > > > > possible combination. Do you really plan to share all these different > >> > > > > > things? > >> > > > > > > >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon > >> > > > > > buffer size and buffer format (because that's how the hw works), not > >> > > > > > using explicit modifier flags for everything. > >> > > > > Right. The conclusion, after people went through and started sorting > >> > > > > out the kinds of formats for which they would _actually_ export real > >> > > > > colour buffers for, that most vendors definitely have fewer than > >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> > > > > possible formats to represent, very likely fewer than > >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally > >> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 > >> > > > > bits of the 56 available. > >> > > > > >> > > > The problem here is that at least for some parameters we actually don't know > >> > > > which formats are actually used. > >> > > > > >> > > > The following are not real world examples, but just to explain the general > >> > > > problem. > >> > > > > >> > > > The memory configuration for example can be not ASIC specific, but rather > >> > > > determined by whoever took the ASIC and glued it together with VRAM on a > >> > > > board. It is not likely that somebody puts all the VRAM chips on one > >> > > > channel, but it is still perfectly possible. > >> > > > > >> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them > >> > > > could be bad and we need to know which to actually use. > >> > > > >> > > For my understanding: This leaks outside the chip when sharing buffers? > >> > > All the information you only need locally to a given amdgpu instance > >> > > don't really need to be encoded in modifiers. > >> > > > >> > > Pointers to code where this is all decided (kernel and radeonsi would be > >> > > good starters I guess) would be really good here. > >> > > >> > I extracted the information on which bits are relevant mostly from the > >> > AddrFromCoord functions in addrlib in mesa: > >> > > >> > for macro-tiles: > >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > >> > > >> > for micro-tiles (or the micro-tiles in macro-tiles): > >> > > >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > >> > >> So this is the decoding thing. How many of these actually exist, even when > >> taking all the other information into account? > >> > >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > >> + height + width, how much of all these bits do you actually still freely > >> pick? > > > > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > > depth, display-rotated) + whether to use compression, everything else > > is fixed given those option, the properties of the chip and the > > format. > > > > > >> > >> It might be that all the things you need to know from the memory config > >> don't encode smaller than the macro/micro/whatever else stuff. But that's > >> kinda the angle that we looked at this for everyone else. > >> > >> E.g. for multi-plane stuff, if everyone picks the same config for the > >> 2nd/3rd plane, then you don't actually need to encode that. It just > >> becomes part of the implied stuff in the modifier. > > > > The problem is some GPUs are compatible for say 8-bpp images, but not > > for 32-bpp surfaces. e.g. lets look at the following table showing the > > current configuration for all GFX6-GFX8 GPU: > > > > format: (bank width, bank height, macro tile aspect, num banks) for > > 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG > > > > verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > > oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > > hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 > > tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) > > ADDR_SURF_P8_32x32_8x16 > > bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 > > hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 > > CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 > > topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 > > fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 > > tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > > polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 > > polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > > stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 > > > > We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp > > and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only > > compatible for 8-bpp. > > > > So we can't just assume that if the first plane properties match that > > they do so for the second plane, because we don't know what GPU it is > > coming from. > > > > And we can make a canonical table from this but then if we change the > > above tables in the kernel that runs into compatibility issues. > > > I think that's roughly what nvidia ended up doing (not sure they > published anything, haven't seen any patches on dri-devel). Two parts: > - Bunch of flags/bits for the major mode, like {display, non-display, > DCC} and so on. > - For each of the major modes a list of enumerated modes actually used > in reality. For DCC this would be the generation, for more plain > formats it would be the above table. Table entries would be per-bpp > (in case that matters, or whatever matters for amd gpus). > > Imo would be totally ok to put the relevant lookup tables into > drm_fourcc.h directly. Or any other suitable place. Right, the issue would be that the tables are configured int he kernel https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 and having a shared hardcoded table means that if the kernel tables change we break userspace. That would be a new commitment on the side of the kernel driver. (If I can get an okay for that commitment from Alex/Christian I'd be totally happy to switch it over) > > From a quick look about 16 bits total. Gives you 40bits of second > chances and "oops totally sorry". I purposefully left 8 bits as a version field for those oopses. > -Daniel > > > > >> -Daniel > >> > >> > > >> > > -Daniel > >> > > > >> > > > > >> > > > Regards, > >> > > > Christian. > >> > > > > >> > > > > > >> > > > > If you do use 256 bits in order to represent > >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> > > > > modifiers per format, userspace would start hitting OOM pretty quickly > >> > > > > as it attempted to enumerate and negotiate acceptable modifiers. > >> > > > > Either that or we need to replace the fixed 64-bit modifier tokens > >> > > > > with some kind of eBPF script. > >> > > > > > >> > > > > Cheers, > >> > > > > Daniel > >> > > > > _______________________________________________ > >> > > > > dri-devel mailing list > >> > > > > dri-devel@lists.freedesktop.org > >> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> > > > > >> > > > >> > > -- > >> > > Daniel Vetter > >> > > Software Engineer, Intel Corporation > >> > > http://blog.ffwll.ch > >> > > _______________________________________________ > >> > > dri-devel mailing list > >> > > dri-devel@lists.freedesktop.org > >> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> > >> -- > >> Daniel Vetter > >> Software Engineer, Intel Corporation > >> http://blog.ffwll.ch > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >> <bas@basnieuwenhuizen.nl> wrote: >> > On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> >> >> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> >> > On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> > > >> >> > > On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> >> > > > Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> >> > > > > Hi, >> >> > > > > >> >> > > > > On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >> >> > > > > > On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >> >> > > > > > > +/* The chip this is compatible with. >> >> > > > > > > + * >> >> > > > > > > + * If compression is disabled, use >> >> > > > > > > + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> >> > > > > > > + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> >> > > > > > > + * >> >> > > > > > > + * With compression enabled please use the exact chip. >> >> > > > > > > + * >> >> > > > > > > + * TODO: Do some generations share DCC format? >> >> > > > > > > + */ >> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> >> > > > > > > +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> >> > > > > > Do you really need all the combinations here of DCC + gpu gen + tiling >> >> > > > > > details? When we had the entire discussion with nvidia folks they >> >> > > > > > eventually agreed that they don't need the massive pile with every >> >> > > > > > possible combination. Do you really plan to share all these different >> >> > > > > > things? >> >> > > > > > >> >> > > > > > Note that e.g. on i915 we spec some of the tiling depending upon >> >> > > > > > buffer size and buffer format (because that's how the hw works), not >> >> > > > > > using explicit modifier flags for everything. >> >> > > > > Right. The conclusion, after people went through and started sorting >> >> > > > > out the kinds of formats for which they would _actually_ export real >> >> > > > > colour buffers for, that most vendors definitely have fewer than >> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >> > > > > possible formats to represent, very likely fewer than >> >> > > > > 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >> >> > > > > fewer than 72,057,594,037,927,936 formats, and even still generally >> >> > > > > fewer than 281,474,976,710,656 if you want to be generous and leave 8 >> >> > > > > bits of the 56 available. >> >> > > > >> >> > > > The problem here is that at least for some parameters we actually don't know >> >> > > > which formats are actually used. >> >> > > > >> >> > > > The following are not real world examples, but just to explain the general >> >> > > > problem. >> >> > > > >> >> > > > The memory configuration for example can be not ASIC specific, but rather >> >> > > > determined by whoever took the ASIC and glued it together with VRAM on a >> >> > > > board. It is not likely that somebody puts all the VRAM chips on one >> >> > > > channel, but it is still perfectly possible. >> >> > > > >> >> > > > Same is true for things like harvesting, e.g. of 16 channels halve of them >> >> > > > could be bad and we need to know which to actually use. >> >> > > >> >> > > For my understanding: This leaks outside the chip when sharing buffers? >> >> > > All the information you only need locally to a given amdgpu instance >> >> > > don't really need to be encoded in modifiers. >> >> > > >> >> > > Pointers to code where this is all decided (kernel and radeonsi would be >> >> > > good starters I guess) would be really good here. >> >> > >> >> > I extracted the information on which bits are relevant mostly from the >> >> > AddrFromCoord functions in addrlib in mesa: >> >> > >> >> > for macro-tiles: >> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> >> > >> >> > for micro-tiles (or the micro-tiles in macro-tiles): >> >> > >> >> > https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >> >> >> So this is the decoding thing. How many of these actually exist, even when >> >> taking all the other information into account? >> >> >> >> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> >> + height + width, how much of all these bits do you actually still freely >> >> pick? >> > >> > Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, >> > thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, >> > depth, display-rotated) + whether to use compression, everything else >> > is fixed given those option, the properties of the chip and the >> > format. >> > >> > >> >> >> >> It might be that all the things you need to know from the memory config >> >> don't encode smaller than the macro/micro/whatever else stuff. But that's >> >> kinda the angle that we looked at this for everyone else. >> >> >> >> E.g. for multi-plane stuff, if everyone picks the same config for the >> >> 2nd/3rd plane, then you don't actually need to encode that. It just >> >> becomes part of the implied stuff in the modifier. >> > >> > The problem is some GPUs are compatible for say 8-bpp images, but not >> > for 32-bpp surfaces. e.g. lets look at the following table showing the >> > current configuration for all GFX6-GFX8 GPU: >> > >> > format: (bank width, bank height, macro tile aspect, num banks) for >> > 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG >> > >> > verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >> > oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >> > hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 >> > tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) >> > ADDR_SURF_P8_32x32_8x16 >> > bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 >> > hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 >> > CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 >> > topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 >> > fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 >> > tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >> > polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 >> > polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >> > stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 >> > >> > We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp >> > and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only >> > compatible for 8-bpp. >> > >> > So we can't just assume that if the first plane properties match that >> > they do so for the second plane, because we don't know what GPU it is >> > coming from. >> > >> > And we can make a canonical table from this but then if we change the >> > above tables in the kernel that runs into compatibility issues. >> >> >> I think that's roughly what nvidia ended up doing (not sure they >> published anything, haven't seen any patches on dri-devel). Two parts: >> - Bunch of flags/bits for the major mode, like {display, non-display, >> DCC} and so on. >> - For each of the major modes a list of enumerated modes actually used >> in reality. For DCC this would be the generation, for more plain >> formats it would be the above table. Table entries would be per-bpp >> (in case that matters, or whatever matters for amd gpus). >> >> Imo would be totally ok to put the relevant lookup tables into >> drm_fourcc.h directly. Or any other suitable place. > > Right, the issue would be that the tables are configured int he kernel > > https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 > > and having a shared hardcoded table means that if the kernel tables > change we break userspace. That would be a new commitment on the side > of the kernel driver. > > (If I can get an okay for that commitment from Alex/Christian I'd be > totally happy to switch it over) Yeah it'd make that table uapi - well the compressed from with duplicates removed, if you really want to make it tiny. But you can just add new entries if the heuristics for a given platform changes, as long as you keep the old entries working (if you shipped them ever ofc, otherwise totally ok to change as you see fit). -Daniel >> From a quick look about 16 bits total. Gives you 40bits of second >> chances and "oops totally sorry". > > I purposefully left 8 bits as a version field for those oopses. > >> -Daniel >> >> > >> >> -Daniel >> >> >> >> > >> >> > > -Daniel >> >> > > >> >> > > > >> >> > > > Regards, >> >> > > > Christian. >> >> > > > >> >> > > > > >> >> > > > > If you do use 256 bits in order to represent >> >> > > > > 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >> > > > > modifiers per format, userspace would start hitting OOM pretty quickly >> >> > > > > as it attempted to enumerate and negotiate acceptable modifiers. >> >> > > > > Either that or we need to replace the fixed 64-bit modifier tokens >> >> > > > > with some kind of eBPF script. >> >> > > > > >> >> > > > > Cheers, >> >> > > > > Daniel >> >> > > > > _______________________________________________ >> >> > > > > dri-devel mailing list >> >> > > > > dri-devel@lists.freedesktop.org >> >> > > > > https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> > > > >> >> > > >> >> > > -- >> >> > > Daniel Vetter >> >> > > Software Engineer, Intel Corporation >> >> > > http://blog.ffwll.ch >> >> > > _______________________________________________ >> >> > > dri-devel mailing list >> >> > > dri-devel@lists.freedesktop.org >> >> > > https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> >> >> -- >> >> Daniel Vetter >> >> Software Engineer, Intel Corporation >> >> http://blog.ffwll.ch >> >> >> >> -- >> Daniel Vetter >> Software Engineer, Intel Corporation >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > <bas@basnieuwenhuizen.nl> wrote: >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >>> <bas@basnieuwenhuizen.nl> wrote: >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >>>>>>>>> Hi, >>>>>>>>> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >>>>>>>>>>> +/* The chip this is compatible with. >>>>>>>>>>> + * >>>>>>>>>>> + * If compression is disabled, use >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >>>>>>>>>>> + * >>>>>>>>>>> + * With compression enabled please use the exact chip. >>>>>>>>>>> + * >>>>>>>>>>> + * TODO: Do some generations share DCC format? >>>>>>>>>>> + */ >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling >>>>>>>>>> details? When we had the entire discussion with nvidia folks they >>>>>>>>>> eventually agreed that they don't need the massive pile with every >>>>>>>>>> possible combination. Do you really plan to share all these different >>>>>>>>>> things? >>>>>>>>>> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not >>>>>>>>>> using explicit modifier flags for everything. >>>>>>>>> Right. The conclusion, after people went through and started sorting >>>>>>>>> out the kinds of formats for which they would _actually_ export real >>>>>>>>> colour buffers for, that most vendors definitely have fewer than >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>>>>> possible formats to represent, very likely fewer than >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >>>>>>>>> bits of the 56 available. >>>>>>>> The problem here is that at least for some parameters we actually don't know >>>>>>>> which formats are actually used. >>>>>>>> >>>>>>>> The following are not real world examples, but just to explain the general >>>>>>>> problem. >>>>>>>> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one >>>>>>>> channel, but it is still perfectly possible. >>>>>>>> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them >>>>>>>> could be bad and we need to know which to actually use. >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? >>>>>>> All the information you only need locally to a given amdgpu instance >>>>>>> don't really need to be encoded in modifiers. >>>>>>> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be >>>>>>> good starters I guess) would be really good here. >>>>>> I extracted the information on which bits are relevant mostly from the >>>>>> AddrFromCoord functions in addrlib in mesa: >>>>>> >>>>>> for macro-tiles: >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >>>>>> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): >>>>>> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >>>>> So this is the decoding thing. How many of these actually exist, even when >>>>> taking all the other information into account? >>>>> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >>>>> + height + width, how much of all these bits do you actually still freely >>>>> pick? >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, >>>> depth, display-rotated) + whether to use compression, everything else >>>> is fixed given those option, the properties of the chip and the >>>> format. >>>> >>>> >>>>> It might be that all the things you need to know from the memory config >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's >>>>> kinda the angle that we looked at this for everyone else. >>>>> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just >>>>> becomes part of the implied stuff in the modifier. >>>> The problem is some GPUs are compatible for say 8-bpp images, but not >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the >>>> current configuration for all GFX6-GFX8 GPU: >>>> >>>> format: (bank width, bank height, macro tile aspect, num banks) for >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG >>>> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) >>>> ADDR_SURF_P8_32x32_8x16 >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 >>>> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only >>>> compatible for 8-bpp. >>>> >>>> So we can't just assume that if the first plane properties match that >>>> they do so for the second plane, because we don't know what GPU it is >>>> coming from. >>>> >>>> And we can make a canonical table from this but then if we change the >>>> above tables in the kernel that runs into compatibility issues. >>> >>> I think that's roughly what nvidia ended up doing (not sure they >>> published anything, haven't seen any patches on dri-devel). Two parts: >>> - Bunch of flags/bits for the major mode, like {display, non-display, >>> DCC} and so on. >>> - For each of the major modes a list of enumerated modes actually used >>> in reality. For DCC this would be the generation, for more plain >>> formats it would be the above table. Table entries would be per-bpp >>> (in case that matters, or whatever matters for amd gpus). >>> >>> Imo would be totally ok to put the relevant lookup tables into >>> drm_fourcc.h directly. Or any other suitable place. >> Right, the issue would be that the tables are configured int he kernel >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 >> >> and having a shared hardcoded table means that if the kernel tables >> change we break userspace. That would be a new commitment on the side >> of the kernel driver. >> >> (If I can get an okay for that commitment from Alex/Christian I'd be >> totally happy to switch it over) > Yeah it'd make that table uapi That won't work. Those lists are not necessary stable. What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index plus a 8bit version number. Shouldn't be more than 32 bits in total. Regards, Christian. > - well the compressed from with > duplicates removed, if you really want to make it tiny. But you can > just add new entries if the heuristics for a given platform changes, > as long as you keep the old entries working (if you shipped them ever > ofc, otherwise totally ok to change as you see fit). > -Daniel > >>> From a quick look about 16 bits total. Gives you 40bits of second >>> chances and "oops totally sorry". >> I purposefully left 8 bits as a version field for those oopses. >> >>> -Daniel >>> >>>>> -Daniel >>>>> >>>>>>> -Daniel >>>>>>> >>>>>>>> Regards, >>>>>>>> Christian. >>>>>>>> >>>>>>>>> If you do use 256 bits in order to represent >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens >>>>>>>>> with some kind of eBPF script. >>>>>>>>> >>>>>>>>> Cheers, >>>>>>>>> Daniel >>>>>>>>> _______________________________________________ >>>>>>>>> dri-devel mailing list >>>>>>>>> dri-devel@lists.freedesktop.org >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>>>> -- >>>>>>> Daniel Vetter >>>>>>> Software Engineer, Intel Corporation >>>>>>> http://blog.ffwll.ch >>>>>>> _______________________________________________ >>>>>>> dri-devel mailing list >>>>>>> dri-devel@lists.freedesktop.org >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>> -- >>>>> Daniel Vetter >>>>> Software Engineer, Intel Corporation >>>>> http://blog.ffwll.ch >>> >>> >>> -- >>> Daniel Vetter >>> Software Engineer, Intel Corporation >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > >
On Tue, Sep 4, 2018 at 7:48 PM Christian König <ckoenig.leichtzumerken@gmail.com> wrote: > > Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > > <bas@basnieuwenhuizen.nl> wrote: > >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > >>> <bas@basnieuwenhuizen.nl> wrote: > >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: > >>>>>>>>> Hi, > >>>>>>>>> > >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > >>>>>>>>>>> +/* The chip this is compatible with. > >>>>>>>>>>> + * > >>>>>>>>>>> + * If compression is disabled, use > >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > >>>>>>>>>>> + * > >>>>>>>>>>> + * With compression enabled please use the exact chip. > >>>>>>>>>>> + * > >>>>>>>>>>> + * TODO: Do some generations share DCC format? > >>>>>>>>>>> + */ > >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling > >>>>>>>>>> details? When we had the entire discussion with nvidia folks they > >>>>>>>>>> eventually agreed that they don't need the massive pile with every > >>>>>>>>>> possible combination. Do you really plan to share all these different > >>>>>>>>>> things? > >>>>>>>>>> > >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon > >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not > >>>>>>>>>> using explicit modifier flags for everything. > >>>>>>>>> Right. The conclusion, after people went through and started sorting > >>>>>>>>> out the kinds of formats for which they would _actually_ export real > >>>>>>>>> colour buffers for, that most vendors definitely have fewer than > >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >>>>>>>>> possible formats to represent, very likely fewer than > >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally > >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 > >>>>>>>>> bits of the 56 available. > >>>>>>>> The problem here is that at least for some parameters we actually don't know > >>>>>>>> which formats are actually used. > >>>>>>>> > >>>>>>>> The following are not real world examples, but just to explain the general > >>>>>>>> problem. > >>>>>>>> > >>>>>>>> The memory configuration for example can be not ASIC specific, but rather > >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a > >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one > >>>>>>>> channel, but it is still perfectly possible. > >>>>>>>> > >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them > >>>>>>>> could be bad and we need to know which to actually use. > >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? > >>>>>>> All the information you only need locally to a given amdgpu instance > >>>>>>> don't really need to be encoded in modifiers. > >>>>>>> > >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be > >>>>>>> good starters I guess) would be really good here. > >>>>>> I extracted the information on which bits are relevant mostly from the > >>>>>> AddrFromCoord functions in addrlib in mesa: > >>>>>> > >>>>>> for macro-tiles: > >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > >>>>>> > >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): > >>>>>> > >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > >>>>> So this is the decoding thing. How many of these actually exist, even when > >>>>> taking all the other information into account? > >>>>> > >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > >>>>> + height + width, how much of all these bits do you actually still freely > >>>>> pick? > >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > >>>> depth, display-rotated) + whether to use compression, everything else > >>>> is fixed given those option, the properties of the chip and the > >>>> format. > >>>> > >>>> > >>>>> It might be that all the things you need to know from the memory config > >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's > >>>>> kinda the angle that we looked at this for everyone else. > >>>>> > >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the > >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just > >>>>> becomes part of the implied stuff in the modifier. > >>>> The problem is some GPUs are compatible for say 8-bpp images, but not > >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the > >>>> current configuration for all GFX6-GFX8 GPU: > >>>> > >>>> format: (bank width, bank height, macro tile aspect, num banks) for > >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG > >>>> > >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 > >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) > >>>> ADDR_SURF_P8_32x32_8x16 > >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 > >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 > >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 > >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 > >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 > >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 > >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 > >>>> > >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp > >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only > >>>> compatible for 8-bpp. > >>>> > >>>> So we can't just assume that if the first plane properties match that > >>>> they do so for the second plane, because we don't know what GPU it is > >>>> coming from. > >>>> > >>>> And we can make a canonical table from this but then if we change the > >>>> above tables in the kernel that runs into compatibility issues. > >>> > >>> I think that's roughly what nvidia ended up doing (not sure they > >>> published anything, haven't seen any patches on dri-devel). Two parts: > >>> - Bunch of flags/bits for the major mode, like {display, non-display, > >>> DCC} and so on. > >>> - For each of the major modes a list of enumerated modes actually used > >>> in reality. For DCC this would be the generation, for more plain > >>> formats it would be the above table. Table entries would be per-bpp > >>> (in case that matters, or whatever matters for amd gpus). > >>> > >>> Imo would be totally ok to put the relevant lookup tables into > >>> drm_fourcc.h directly. Or any other suitable place. > >> Right, the issue would be that the tables are configured int he kernel > >> > >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 > >> > >> and having a shared hardcoded table means that if the kernel tables > >> change we break userspace. That would be a new commitment on the side > >> of the kernel driver. > >> > >> (If I can get an okay for that commitment from Alex/Christian I'd be > >> totally happy to switch it over) > > Yeah it'd make that table uapi > > That won't work. Those lists are not necessary stable. > > What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index > plus a 8bit version number. But if we use PCI-ID, then we cannot really share between different models of GPU ... So we have seen 3 options on design level: 1) Use PCI-ID/chip. Advantage: -simple - Few bits Disadvantage: - No sharing between different GPU models 2) Use a fixed lookup/canonicalization table. Advantage: -Few bits - Sharing between models Disadvantage: - Cannot change the kernel tiling tables. 3) Encode tiling mode properties in modifier. Advantage: - Can share between models. - Can change the kernel tiling tables. Disadvantage: - Use a lot of bits. Overall the number of exposed modifiers should be pretty much the same, so encoding size that we have to deal with all possible bit patterns. I've implemented option 3 in the patch here because I think it is the best tradeoff. As far as I can see it fits, and especially if we do no multisample support and remove the tile split bytes, and remove the 3rd set of plane macro-tile options because no format has 3 different bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 for future extensions and a version number. It is not great but pretty doable? > > Shouldn't be more than 32 bits in total. > > Regards, > Christian. > > > - well the compressed from with > > duplicates removed, if you really want to make it tiny. But you can > > just add new entries if the heuristics for a given platform changes, > > as long as you keep the old entries working (if you shipped them ever > > ofc, otherwise totally ok to change as you see fit). > > -Daniel > > > >>> From a quick look about 16 bits total. Gives you 40bits of second > >>> chances and "oops totally sorry". > >> I purposefully left 8 bits as a version field for those oopses. > >> > >>> -Daniel > >>> > >>>>> -Daniel > >>>>> > >>>>>>> -Daniel > >>>>>>> > >>>>>>>> Regards, > >>>>>>>> Christian. > >>>>>>>> > >>>>>>>>> If you do use 256 bits in order to represent > >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly > >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. > >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens > >>>>>>>>> with some kind of eBPF script. > >>>>>>>>> > >>>>>>>>> Cheers, > >>>>>>>>> Daniel > >>>>>>>>> _______________________________________________ > >>>>>>>>> dri-devel mailing list > >>>>>>>>> dri-devel@lists.freedesktop.org > >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >>>>>>> -- > >>>>>>> Daniel Vetter > >>>>>>> Software Engineer, Intel Corporation > >>>>>>> http://blog.ffwll.ch > >>>>>>> _______________________________________________ > >>>>>>> dri-devel mailing list > >>>>>>> dri-devel@lists.freedesktop.org > >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >>>>> -- > >>>>> Daniel Vetter > >>>>> Software Engineer, Intel Corporation > >>>>> http://blog.ffwll.ch > >>> > >>> > >>> -- > >>> Daniel Vetter > >>> Software Engineer, Intel Corporation > >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > > > > >
On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > On Tue, Sep 4, 2018 at 7:48 PM Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: > > > > Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > > > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > > > <bas@basnieuwenhuizen.nl> wrote: > > >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > > >>> <bas@basnieuwenhuizen.nl> wrote: > > >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > > >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > >>>>>>>>> Hi, > > >>>>>>>>> > > >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > >>>>>>>>>>> +/* The chip this is compatible with. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * If compression is disabled, use > > >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * With compression enabled please use the exact chip. > > >>>>>>>>>>> + * > > >>>>>>>>>>> + * TODO: Do some generations share DCC format? > > >>>>>>>>>>> + */ > > >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling > > >>>>>>>>>> details? When we had the entire discussion with nvidia folks they > > >>>>>>>>>> eventually agreed that they don't need the massive pile with every > > >>>>>>>>>> possible combination. Do you really plan to share all these different > > >>>>>>>>>> things? > > >>>>>>>>>> > > >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon > > >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not > > >>>>>>>>>> using explicit modifier flags for everything. > > >>>>>>>>> Right. The conclusion, after people went through and started sorting > > >>>>>>>>> out the kinds of formats for which they would _actually_ export real > > >>>>>>>>> colour buffers for, that most vendors definitely have fewer than > > >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > >>>>>>>>> possible formats to represent, very likely fewer than > > >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally > > >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > >>>>>>>>> bits of the 56 available. > > >>>>>>>> The problem here is that at least for some parameters we actually don't know > > >>>>>>>> which formats are actually used. > > >>>>>>>> > > >>>>>>>> The following are not real world examples, but just to explain the general > > >>>>>>>> problem. > > >>>>>>>> > > >>>>>>>> The memory configuration for example can be not ASIC specific, but rather > > >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a > > >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one > > >>>>>>>> channel, but it is still perfectly possible. > > >>>>>>>> > > >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them > > >>>>>>>> could be bad and we need to know which to actually use. > > >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? > > >>>>>>> All the information you only need locally to a given amdgpu instance > > >>>>>>> don't really need to be encoded in modifiers. > > >>>>>>> > > >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be > > >>>>>>> good starters I guess) would be really good here. > > >>>>>> I extracted the information on which bits are relevant mostly from the > > >>>>>> AddrFromCoord functions in addrlib in mesa: > > >>>>>> > > >>>>>> for macro-tiles: > > >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > > >>>>>> > > >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): > > >>>>>> > > >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > > >>>>> So this is the decoding thing. How many of these actually exist, even when > > >>>>> taking all the other information into account? > > >>>>> > > >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > > >>>>> + height + width, how much of all these bits do you actually still freely > > >>>>> pick? > > >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > > >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > > >>>> depth, display-rotated) + whether to use compression, everything else > > >>>> is fixed given those option, the properties of the chip and the > > >>>> format. > > >>>> > > >>>> > > >>>>> It might be that all the things you need to know from the memory config > > >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's > > >>>>> kinda the angle that we looked at this for everyone else. > > >>>>> > > >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the > > >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just > > >>>>> becomes part of the implied stuff in the modifier. > > >>>> The problem is some GPUs are compatible for say 8-bpp images, but not > > >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the > > >>>> current configuration for all GFX6-GFX8 GPU: > > >>>> > > >>>> format: (bank width, bank height, macro tile aspect, num banks) for > > >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG > > >>>> > > >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > > >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > > >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 > > >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) > > >>>> ADDR_SURF_P8_32x32_8x16 > > >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 > > >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 > > >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 > > >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 > > >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 > > >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > > >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 > > >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > > >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 > > >>>> > > >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp > > >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only > > >>>> compatible for 8-bpp. > > >>>> > > >>>> So we can't just assume that if the first plane properties match that > > >>>> they do so for the second plane, because we don't know what GPU it is > > >>>> coming from. > > >>>> > > >>>> And we can make a canonical table from this but then if we change the > > >>>> above tables in the kernel that runs into compatibility issues. > > >>> > > >>> I think that's roughly what nvidia ended up doing (not sure they > > >>> published anything, haven't seen any patches on dri-devel). Two parts: > > >>> - Bunch of flags/bits for the major mode, like {display, non-display, > > >>> DCC} and so on. > > >>> - For each of the major modes a list of enumerated modes actually used > > >>> in reality. For DCC this would be the generation, for more plain > > >>> formats it would be the above table. Table entries would be per-bpp > > >>> (in case that matters, or whatever matters for amd gpus). > > >>> > > >>> Imo would be totally ok to put the relevant lookup tables into > > >>> drm_fourcc.h directly. Or any other suitable place. > > >> Right, the issue would be that the tables are configured int he kernel > > >> > > >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 > > >> > > >> and having a shared hardcoded table means that if the kernel tables > > >> change we break userspace. That would be a new commitment on the side > > >> of the kernel driver. > > >> > > >> (If I can get an okay for that commitment from Alex/Christian I'd be > > >> totally happy to switch it over) > > > Yeah it'd make that table uapi > > > > That won't work. Those lists are not necessary stable. > > > > What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index > > plus a 8bit version number. > > But if we use PCI-ID, then we cannot really share between different > models of GPU ... > > So we have seen 3 options on design level: > 1) Use PCI-ID/chip. > Advantage: > -simple > - Few bits > Disadvantage: > - No sharing between different GPU models > 2) Use a fixed lookup/canonicalization table. > Advantage: > -Few bits > - Sharing between models > Disadvantage: > - Cannot change the kernel tiling tables. A fourth option would actually be to put this fixed table in the kernel and have an ioctl so userspace can ask for it. That way we avoid the kernel update issue. > 3) Encode tiling mode properties in modifier. > Advantage: > - Can share between models. > - Can change the kernel tiling tables. > Disadvantage: > - Use a lot of bits. > > Overall the number of exposed modifiers should be pretty much the > same, so encoding size that we have to deal with all possible bit > patterns. > > I've implemented option 3 in the patch here because I think it is the > best tradeoff. As far as I can see it fits, and especially if we do no > multisample support and remove the tile split bytes, and remove the > 3rd set of plane macro-tile options because no format has 3 different > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 > for future extensions and a version number. It is not great but pretty > doable? > > > > > Shouldn't be more than 32 bits in total. > > > > Regards, > > Christian. > > > > > - well the compressed from with > > > duplicates removed, if you really want to make it tiny. But you can > > > just add new entries if the heuristics for a given platform changes, > > > as long as you keep the old entries working (if you shipped them ever > > > ofc, otherwise totally ok to change as you see fit). > > > -Daniel > > > > > >>> From a quick look about 16 bits total. Gives you 40bits of second > > >>> chances and "oops totally sorry". > > >> I purposefully left 8 bits as a version field for those oopses. > > >> > > >>> -Daniel > > >>> > > >>>>> -Daniel > > >>>>> > > >>>>>>> -Daniel > > >>>>>>> > > >>>>>>>> Regards, > > >>>>>>>> Christian. > > >>>>>>>> > > >>>>>>>>> If you do use 256 bits in order to represent > > >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly > > >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. > > >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens > > >>>>>>>>> with some kind of eBPF script. > > >>>>>>>>> > > >>>>>>>>> Cheers, > > >>>>>>>>> Daniel > > >>>>>>>>> _______________________________________________ > > >>>>>>>>> dri-devel mailing list > > >>>>>>>>> dri-devel@lists.freedesktop.org > > >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >>>>>>> -- > > >>>>>>> Daniel Vetter > > >>>>>>> Software Engineer, Intel Corporation > > >>>>>>> http://blog.ffwll.ch > > >>>>>>> _______________________________________________ > > >>>>>>> dri-devel mailing list > > >>>>>>> dri-devel@lists.freedesktop.org > > >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >>>>> -- > > >>>>> Daniel Vetter > > >>>>> Software Engineer, Intel Corporation > > >>>>> http://blog.ffwll.ch > > >>> > > >>> > > >>> -- > > >>> Daniel Vetter > > >>> Software Engineer, Intel Corporation > > >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > > > > > > > >
Am 04.09.2018 um 20:00 schrieb Bas Nieuwenhuizen: > On Tue, Sep 4, 2018 at 7:57 PM Bas Nieuwenhuizen > <bas@basnieuwenhuizen.nl> wrote: >> On Tue, Sep 4, 2018 at 7:48 PM Christian König >> <ckoenig.leichtzumerken@gmail.com> wrote: >>> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: >>>> On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen >>>> <bas@basnieuwenhuizen.nl> wrote: >>>>> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>>>>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >>>>>> <bas@basnieuwenhuizen.nl> wrote: >>>>>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>>>>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >>>>>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >>>>>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >>>>>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >>>>>>>>>>>> Hi, >>>>>>>>>>>> >>>>>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >>>>>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >>>>>>>>>>>>>> +/* The chip this is compatible with. >>>>>>>>>>>>>> + * >>>>>>>>>>>>>> + * If compression is disabled, use >>>>>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >>>>>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >>>>>>>>>>>>>> + * >>>>>>>>>>>>>> + * With compression enabled please use the exact chip. >>>>>>>>>>>>>> + * >>>>>>>>>>>>>> + * TODO: Do some generations share DCC format? >>>>>>>>>>>>>> + */ >>>>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >>>>>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >>>>>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling >>>>>>>>>>>>> details? When we had the entire discussion with nvidia folks they >>>>>>>>>>>>> eventually agreed that they don't need the massive pile with every >>>>>>>>>>>>> possible combination. Do you really plan to share all these different >>>>>>>>>>>>> things? >>>>>>>>>>>>> >>>>>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon >>>>>>>>>>>>> buffer size and buffer format (because that's how the hw works), not >>>>>>>>>>>>> using explicit modifier flags for everything. >>>>>>>>>>>> Right. The conclusion, after people went through and started sorting >>>>>>>>>>>> out the kinds of formats for which they would _actually_ export real >>>>>>>>>>>> colour buffers for, that most vendors definitely have fewer than >>>>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>>>>>>>> possible formats to represent, very likely fewer than >>>>>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >>>>>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally >>>>>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >>>>>>>>>>>> bits of the 56 available. >>>>>>>>>>> The problem here is that at least for some parameters we actually don't know >>>>>>>>>>> which formats are actually used. >>>>>>>>>>> >>>>>>>>>>> The following are not real world examples, but just to explain the general >>>>>>>>>>> problem. >>>>>>>>>>> >>>>>>>>>>> The memory configuration for example can be not ASIC specific, but rather >>>>>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a >>>>>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one >>>>>>>>>>> channel, but it is still perfectly possible. >>>>>>>>>>> >>>>>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them >>>>>>>>>>> could be bad and we need to know which to actually use. >>>>>>>>>> For my understanding: This leaks outside the chip when sharing buffers? >>>>>>>>>> All the information you only need locally to a given amdgpu instance >>>>>>>>>> don't really need to be encoded in modifiers. >>>>>>>>>> >>>>>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be >>>>>>>>>> good starters I guess) would be really good here. >>>>>>>>> I extracted the information on which bits are relevant mostly from the >>>>>>>>> AddrFromCoord functions in addrlib in mesa: >>>>>>>>> >>>>>>>>> for macro-tiles: >>>>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >>>>>>>>> >>>>>>>>> for micro-tiles (or the micro-tiles in macro-tiles): >>>>>>>>> >>>>>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >>>>>>>> So this is the decoding thing. How many of these actually exist, even when >>>>>>>> taking all the other information into account? >>>>>>>> >>>>>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >>>>>>>> + height + width, how much of all these bits do you actually still freely >>>>>>>> pick? >>>>>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, >>>>>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, >>>>>>> depth, display-rotated) + whether to use compression, everything else >>>>>>> is fixed given those option, the properties of the chip and the >>>>>>> format. >>>>>>> >>>>>>> >>>>>>>> It might be that all the things you need to know from the memory config >>>>>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's >>>>>>>> kinda the angle that we looked at this for everyone else. >>>>>>>> >>>>>>>> E.g. for multi-plane stuff, if everyone picks the same config for the >>>>>>>> 2nd/3rd plane, then you don't actually need to encode that. It just >>>>>>>> becomes part of the implied stuff in the modifier. >>>>>>> The problem is some GPUs are compatible for say 8-bpp images, but not >>>>>>> for 32-bpp surfaces. e.g. lets look at the following table showing the >>>>>>> current configuration for all GFX6-GFX8 GPU: >>>>>>> >>>>>>> format: (bank width, bank height, macro tile aspect, num banks) for >>>>>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG >>>>>>> >>>>>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >>>>>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >>>>>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 >>>>>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) >>>>>>> ADDR_SURF_P8_32x32_8x16 >>>>>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 >>>>>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 >>>>>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 >>>>>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 >>>>>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 >>>>>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >>>>>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 >>>>>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >>>>>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 >>>>>>> >>>>>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp >>>>>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only >>>>>>> compatible for 8-bpp. >>>>>>> >>>>>>> So we can't just assume that if the first plane properties match that >>>>>>> they do so for the second plane, because we don't know what GPU it is >>>>>>> coming from. >>>>>>> >>>>>>> And we can make a canonical table from this but then if we change the >>>>>>> above tables in the kernel that runs into compatibility issues. >>>>>> I think that's roughly what nvidia ended up doing (not sure they >>>>>> published anything, haven't seen any patches on dri-devel). Two parts: >>>>>> - Bunch of flags/bits for the major mode, like {display, non-display, >>>>>> DCC} and so on. >>>>>> - For each of the major modes a list of enumerated modes actually used >>>>>> in reality. For DCC this would be the generation, for more plain >>>>>> formats it would be the above table. Table entries would be per-bpp >>>>>> (in case that matters, or whatever matters for amd gpus). >>>>>> >>>>>> Imo would be totally ok to put the relevant lookup tables into >>>>>> drm_fourcc.h directly. Or any other suitable place. >>>>> Right, the issue would be that the tables are configured int he kernel >>>>> >>>>> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 >>>>> >>>>> and having a shared hardcoded table means that if the kernel tables >>>>> change we break userspace. That would be a new commitment on the side >>>>> of the kernel driver. >>>>> >>>>> (If I can get an okay for that commitment from Alex/Christian I'd be >>>>> totally happy to switch it over) >>>> Yeah it'd make that table uapi >>> That won't work. Those lists are not necessary stable. >>> >>> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index >>> plus a 8bit version number. >> But if we use PCI-ID, then we cannot really share between different >> models of GPU ... >> >> So we have seen 3 options on design level: >> 1) Use PCI-ID/chip. >> Advantage: >> -simple >> - Few bits >> Disadvantage: >> - No sharing between different GPU models >> 2) Use a fixed lookup/canonicalization table. >> Advantage: >> -Few bits >> - Sharing between models >> Disadvantage: >> - Cannot change the kernel tiling tables. > A fourth option would actually be to put this fixed table in the > kernel and have an ioctl so userspace can ask for it. That way we > avoid the kernel update issue. That is already present. The table is actually only used for GFX. MM, SDMA, DCE have separate ways of configuring that and so need the config for each entry. >> 3) Encode tiling mode properties in modifier. >> Advantage: >> - Can share between models. >> - Can change the kernel tiling tables. >> Disadvantage: >> - Use a lot of bits. >> >> Overall the number of exposed modifiers should be pretty much the >> same, so encoding size that we have to deal with all possible bit >> patterns. >> >> I've implemented option 3 in the patch here because I think it is the >> best tradeoff. As far as I can see it fits, and especially if we do no >> multisample support and remove the tile split bytes, and remove the >> 3rd set of plane macro-tile options because no format has 3 different >> bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 >> for future extensions and a version number. It is not great but pretty >> doable? Yeah, the issue is you seem to look only at the public available bits of addrlib. That might not be enough to get a complete picture, but I think for sharing between processes it should be enough. Regards, Christian. >> >>> Shouldn't be more than 32 bits in total. >>> >>> Regards, >>> Christian. >>> >>>> - well the compressed from with >>>> duplicates removed, if you really want to make it tiny. But you can >>>> just add new entries if the heuristics for a given platform changes, >>>> as long as you keep the old entries working (if you shipped them ever >>>> ofc, otherwise totally ok to change as you see fit). >>>> -Daniel >>>> >>>>>> From a quick look about 16 bits total. Gives you 40bits of second >>>>>> chances and "oops totally sorry". >>>>> I purposefully left 8 bits as a version field for those oopses. >>>>> >>>>>> -Daniel >>>>>> >>>>>>>> -Daniel >>>>>>>> >>>>>>>>>> -Daniel >>>>>>>>>> >>>>>>>>>>> Regards, >>>>>>>>>>> Christian. >>>>>>>>>>> >>>>>>>>>>>> If you do use 256 bits in order to represent >>>>>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >>>>>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly >>>>>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. >>>>>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens >>>>>>>>>>>> with some kind of eBPF script. >>>>>>>>>>>> >>>>>>>>>>>> Cheers, >>>>>>>>>>>> Daniel >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> dri-devel mailing list >>>>>>>>>>>> dri-devel@lists.freedesktop.org >>>>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>>>>>>> -- >>>>>>>>>> Daniel Vetter >>>>>>>>>> Software Engineer, Intel Corporation >>>>>>>>>> http://blog.ffwll.ch >>>>>>>>>> _______________________________________________ >>>>>>>>>> dri-devel mailing list >>>>>>>>>> dri-devel@lists.freedesktop.org >>>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >>>>>>>> -- >>>>>>>> Daniel Vetter >>>>>>>> Software Engineer, Intel Corporation >>>>>>>> http://blog.ffwll.ch >>>>>> >>>>>> -- >>>>>> Daniel Vetter >>>>>> Software Engineer, Intel Corporation >>>>>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch >>>>
On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > On Tue, Sep 4, 2018 at 7:48 PM Christian König > <ckoenig.leichtzumerken@gmail.com> wrote: >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen >> > <bas@basnieuwenhuizen.nl> wrote: >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >> >>> <bas@basnieuwenhuizen.nl> wrote: >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> >>>>>>>>> Hi, >> >>>>>>>>> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >> >>>>>>>>>>> +/* The chip this is compatible with. >> >>>>>>>>>>> + * >> >>>>>>>>>>> + * If compression is disabled, use >> >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> >>>>>>>>>>> + * >> >>>>>>>>>>> + * With compression enabled please use the exact chip. >> >>>>>>>>>>> + * >> >>>>>>>>>>> + * TODO: Do some generations share DCC format? >> >>>>>>>>>>> + */ >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every >> >>>>>>>>>> possible combination. Do you really plan to share all these different >> >>>>>>>>>> things? >> >>>>>>>>>> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not >> >>>>>>>>>> using explicit modifier flags for everything. >> >>>>>>>>> Right. The conclusion, after people went through and started sorting >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >>>>>>>>> possible formats to represent, very likely fewer than >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >> >>>>>>>>> bits of the 56 available. >> >>>>>>>> The problem here is that at least for some parameters we actually don't know >> >>>>>>>> which formats are actually used. >> >>>>>>>> >> >>>>>>>> The following are not real world examples, but just to explain the general >> >>>>>>>> problem. >> >>>>>>>> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one >> >>>>>>>> channel, but it is still perfectly possible. >> >>>>>>>> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them >> >>>>>>>> could be bad and we need to know which to actually use. >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? >> >>>>>>> All the information you only need locally to a given amdgpu instance >> >>>>>>> don't really need to be encoded in modifiers. >> >>>>>>> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be >> >>>>>>> good starters I guess) would be really good here. >> >>>>>> I extracted the information on which bits are relevant mostly from the >> >>>>>> AddrFromCoord functions in addrlib in mesa: >> >>>>>> >> >>>>>> for macro-tiles: >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> >>>>>> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): >> >>>>>> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >>>>> So this is the decoding thing. How many of these actually exist, even when >> >>>>> taking all the other information into account? >> >>>>> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> >>>>> + height + width, how much of all these bits do you actually still freely >> >>>>> pick? >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, >> >>>> depth, display-rotated) + whether to use compression, everything else >> >>>> is fixed given those option, the properties of the chip and the >> >>>> format. >> >>>> >> >>>> >> >>>>> It might be that all the things you need to know from the memory config >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's >> >>>>> kinda the angle that we looked at this for everyone else. >> >>>>> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just >> >>>>> becomes part of the implied stuff in the modifier. >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the >> >>>> current configuration for all GFX6-GFX8 GPU: >> >>>> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG >> >>>> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) >> >>>> ADDR_SURF_P8_32x32_8x16 >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 >> >>>> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only >> >>>> compatible for 8-bpp. >> >>>> >> >>>> So we can't just assume that if the first plane properties match that >> >>>> they do so for the second plane, because we don't know what GPU it is >> >>>> coming from. >> >>>> >> >>>> And we can make a canonical table from this but then if we change the >> >>>> above tables in the kernel that runs into compatibility issues. >> >>> >> >>> I think that's roughly what nvidia ended up doing (not sure they >> >>> published anything, haven't seen any patches on dri-devel). Two parts: >> >>> - Bunch of flags/bits for the major mode, like {display, non-display, >> >>> DCC} and so on. >> >>> - For each of the major modes a list of enumerated modes actually used >> >>> in reality. For DCC this would be the generation, for more plain >> >>> formats it would be the above table. Table entries would be per-bpp >> >>> (in case that matters, or whatever matters for amd gpus). >> >>> >> >>> Imo would be totally ok to put the relevant lookup tables into >> >>> drm_fourcc.h directly. Or any other suitable place. >> >> Right, the issue would be that the tables are configured int he kernel >> >> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 >> >> >> >> and having a shared hardcoded table means that if the kernel tables >> >> change we break userspace. That would be a new commitment on the side >> >> of the kernel driver. >> >> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be >> >> totally happy to switch it over) >> > Yeah it'd make that table uapi >> >> That won't work. Those lists are not necessary stable. >> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index >> plus a 8bit version number. > > But if we use PCI-ID, then we cannot really share between different > models of GPU ... > > So we have seen 3 options on design level: > 1) Use PCI-ID/chip. > Advantage: > -simple > - Few bits > Disadvantage: > - No sharing between different GPU models > 2) Use a fixed lookup/canonicalization table. > Advantage: > -Few bits > - Sharing between models > Disadvantage: > - Cannot change the kernel tiling tables. Why is changing the kernel table impossible? Just because it's uapi doesn't mean we can never extend it. Just add a new entry at the bottom if you need more combinations ... You don't even need to require the updated kernel amdgpu.ko, just the updated drm_fourcc.h header (if it's one of these things you can configure through cs packets entirely from userspace). Or why does this not work? -Daniel > 3) Encode tiling mode properties in modifier. > Advantage: > - Can share between models. > - Can change the kernel tiling tables. > Disadvantage: > - Use a lot of bits. > > Overall the number of exposed modifiers should be pretty much the > same, so encoding size that we have to deal with all possible bit > patterns. > > I've implemented option 3 in the patch here because I think it is the > best tradeoff. As far as I can see it fits, and especially if we do no > multisample support and remove the tile split bytes, and remove the > 3rd set of plane macro-tile options because no format has 3 different > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 > for future extensions and a version number. It is not great but pretty > doable? > >> >> Shouldn't be more than 32 bits in total. >> >> Regards, >> Christian. >> >> > - well the compressed from with >> > duplicates removed, if you really want to make it tiny. But you can >> > just add new entries if the heuristics for a given platform changes, >> > as long as you keep the old entries working (if you shipped them ever >> > ofc, otherwise totally ok to change as you see fit). >> > -Daniel >> > >> >>> From a quick look about 16 bits total. Gives you 40bits of second >> >>> chances and "oops totally sorry". >> >> I purposefully left 8 bits as a version field for those oopses. >> >> >> >>> -Daniel >> >>> >> >>>>> -Daniel >> >>>>> >> >>>>>>> -Daniel >> >>>>>>> >> >>>>>>>> Regards, >> >>>>>>>> Christian. >> >>>>>>>> >> >>>>>>>>> If you do use 256 bits in order to represent >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens >> >>>>>>>>> with some kind of eBPF script. >> >>>>>>>>> >> >>>>>>>>> Cheers, >> >>>>>>>>> Daniel >> >>>>>>>>> _______________________________________________ >> >>>>>>>>> dri-devel mailing list >> >>>>>>>>> dri-devel@lists.freedesktop.org >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >>>>>>> -- >> >>>>>>> Daniel Vetter >> >>>>>>> Software Engineer, Intel Corporation >> >>>>>>> http://blog.ffwll.ch >> >>>>>>> _______________________________________________ >> >>>>>>> dri-devel mailing list >> >>>>>>> dri-devel@lists.freedesktop.org >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >>>>> -- >> >>>>> Daniel Vetter >> >>>>> Software Engineer, Intel Corporation >> >>>>> http://blog.ffwll.ch >> >>> >> >>> >> >>> -- >> >>> Daniel Vetter >> >>> Software Engineer, Intel Corporation >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch >> > >> > >>
On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen > <bas@basnieuwenhuizen.nl> wrote: > > On Tue, Sep 4, 2018 at 7:48 PM Christian König > > <ckoenig.leichtzumerken@gmail.com> wrote: > >> > >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > >> > <bas@basnieuwenhuizen.nl> wrote: > >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > >> >>> <bas@basnieuwenhuizen.nl> wrote: > >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: > >> >>>>>>>>> Hi, > >> >>>>>>>>> > >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > >> >>>>>>>>>>> +/* The chip this is compatible with. > >> >>>>>>>>>>> + * > >> >>>>>>>>>>> + * If compression is disabled, use > >> >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > >> >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > >> >>>>>>>>>>> + * > >> >>>>>>>>>>> + * With compression enabled please use the exact chip. > >> >>>>>>>>>>> + * > >> >>>>>>>>>>> + * TODO: Do some generations share DCC format? > >> >>>>>>>>>>> + */ > >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling > >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they > >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every > >> >>>>>>>>>> possible combination. Do you really plan to share all these different > >> >>>>>>>>>> things? > >> >>>>>>>>>> > >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon > >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not > >> >>>>>>>>>> using explicit modifier flags for everything. > >> >>>>>>>>> Right. The conclusion, after people went through and started sorting > >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real > >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than > >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> >>>>>>>>> possible formats to represent, very likely fewer than > >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally > >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 > >> >>>>>>>>> bits of the 56 available. > >> >>>>>>>> The problem here is that at least for some parameters we actually don't know > >> >>>>>>>> which formats are actually used. > >> >>>>>>>> > >> >>>>>>>> The following are not real world examples, but just to explain the general > >> >>>>>>>> problem. > >> >>>>>>>> > >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather > >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a > >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one > >> >>>>>>>> channel, but it is still perfectly possible. > >> >>>>>>>> > >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them > >> >>>>>>>> could be bad and we need to know which to actually use. > >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? > >> >>>>>>> All the information you only need locally to a given amdgpu instance > >> >>>>>>> don't really need to be encoded in modifiers. > >> >>>>>>> > >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be > >> >>>>>>> good starters I guess) would be really good here. > >> >>>>>> I extracted the information on which bits are relevant mostly from the > >> >>>>>> AddrFromCoord functions in addrlib in mesa: > >> >>>>>> > >> >>>>>> for macro-tiles: > >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > >> >>>>>> > >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): > >> >>>>>> > >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > >> >>>>> So this is the decoding thing. How many of these actually exist, even when > >> >>>>> taking all the other information into account? > >> >>>>> > >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > >> >>>>> + height + width, how much of all these bits do you actually still freely > >> >>>>> pick? > >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > >> >>>> depth, display-rotated) + whether to use compression, everything else > >> >>>> is fixed given those option, the properties of the chip and the > >> >>>> format. > >> >>>> > >> >>>> > >> >>>>> It might be that all the things you need to know from the memory config > >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's > >> >>>>> kinda the angle that we looked at this for everyone else. > >> >>>>> > >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the > >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just > >> >>>>> becomes part of the implied stuff in the modifier. > >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not > >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the > >> >>>> current configuration for all GFX6-GFX8 GPU: > >> >>>> > >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for > >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG > >> >>>> > >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 > >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) > >> >>>> ADDR_SURF_P8_32x32_8x16 > >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 > >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 > >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 > >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 > >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 > >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 > >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 > >> >>>> > >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp > >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only > >> >>>> compatible for 8-bpp. > >> >>>> > >> >>>> So we can't just assume that if the first plane properties match that > >> >>>> they do so for the second plane, because we don't know what GPU it is > >> >>>> coming from. > >> >>>> > >> >>>> And we can make a canonical table from this but then if we change the > >> >>>> above tables in the kernel that runs into compatibility issues. > >> >>> > >> >>> I think that's roughly what nvidia ended up doing (not sure they > >> >>> published anything, haven't seen any patches on dri-devel). Two parts: > >> >>> - Bunch of flags/bits for the major mode, like {display, non-display, > >> >>> DCC} and so on. > >> >>> - For each of the major modes a list of enumerated modes actually used > >> >>> in reality. For DCC this would be the generation, for more plain > >> >>> formats it would be the above table. Table entries would be per-bpp > >> >>> (in case that matters, or whatever matters for amd gpus). > >> >>> > >> >>> Imo would be totally ok to put the relevant lookup tables into > >> >>> drm_fourcc.h directly. Or any other suitable place. > >> >> Right, the issue would be that the tables are configured int he kernel > >> >> > >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 > >> >> > >> >> and having a shared hardcoded table means that if the kernel tables > >> >> change we break userspace. That would be a new commitment on the side > >> >> of the kernel driver. > >> >> > >> >> (If I can get an okay for that commitment from Alex/Christian I'd be > >> >> totally happy to switch it over) > >> > Yeah it'd make that table uapi > >> > >> That won't work. Those lists are not necessary stable. > >> > >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index > >> plus a 8bit version number. > > > > But if we use PCI-ID, then we cannot really share between different > > models of GPU ... > > > > So we have seen 3 options on design level: > > 1) Use PCI-ID/chip. > > Advantage: > > -simple > > - Few bits > > Disadvantage: > > - No sharing between different GPU models > > 2) Use a fixed lookup/canonicalization table. > > Advantage: > > -Few bits > > - Sharing between models > > Disadvantage: > > - Cannot change the kernel tiling tables. > > Why is changing the kernel table impossible? Just because it's uapi > doesn't mean we can never extend it. Just add a new entry at the > bottom if you need more combinations ... > > You don't even need to require the updated kernel amdgpu.ko, just the > updated drm_fourcc.h header (if it's one of these things you can > configure through cs packets entirely from userspace). So we have two tables we are talking about: a) The kernel initialized system wide table of tiling configurations that already exists b)The uapi table Currently mesa selects by index, i.e. if we want a 32-bit surface with macro-tiling that is not displayable we should select index 14 in the system wide tiling table. Now if we change that table entry 14 that means that we change which things it is compatible with, so we need a new entry in the uapi table. However, an old mesa does not know about the new entry in the uapi table, so loses the ability to select a modifier for the preferred tiling mode, e.g. we pretty much regress old userspace. > > Or why does this not work? > -Daniel > > > 3) Encode tiling mode properties in modifier. > > Advantage: > > - Can share between models. > > - Can change the kernel tiling tables. > > Disadvantage: > > - Use a lot of bits. > > > > Overall the number of exposed modifiers should be pretty much the > > same, so encoding size that we have to deal with all possible bit > > patterns. > > > > I've implemented option 3 in the patch here because I think it is the > > best tradeoff. As far as I can see it fits, and especially if we do no > > multisample support and remove the tile split bytes, and remove the > > 3rd set of plane macro-tile options because no format has 3 different > > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 > > for future extensions and a version number. It is not great but pretty > > doable? > > > >> > >> Shouldn't be more than 32 bits in total. > >> > >> Regards, > >> Christian. > >> > >> > - well the compressed from with > >> > duplicates removed, if you really want to make it tiny. But you can > >> > just add new entries if the heuristics for a given platform changes, > >> > as long as you keep the old entries working (if you shipped them ever > >> > ofc, otherwise totally ok to change as you see fit). > >> > -Daniel > >> > > >> >>> From a quick look about 16 bits total. Gives you 40bits of second > >> >>> chances and "oops totally sorry". > >> >> I purposefully left 8 bits as a version field for those oopses. > >> >> > >> >>> -Daniel > >> >>> > >> >>>>> -Daniel > >> >>>>> > >> >>>>>>> -Daniel > >> >>>>>>> > >> >>>>>>>> Regards, > >> >>>>>>>> Christian. > >> >>>>>>>> > >> >>>>>>>>> If you do use 256 bits in order to represent > >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly > >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. > >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens > >> >>>>>>>>> with some kind of eBPF script. > >> >>>>>>>>> > >> >>>>>>>>> Cheers, > >> >>>>>>>>> Daniel > >> >>>>>>>>> _______________________________________________ > >> >>>>>>>>> dri-devel mailing list > >> >>>>>>>>> dri-devel@lists.freedesktop.org > >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> >>>>>>> -- > >> >>>>>>> Daniel Vetter > >> >>>>>>> Software Engineer, Intel Corporation > >> >>>>>>> http://blog.ffwll.ch > >> >>>>>>> _______________________________________________ > >> >>>>>>> dri-devel mailing list > >> >>>>>>> dri-devel@lists.freedesktop.org > >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> >>>>> -- > >> >>>>> Daniel Vetter > >> >>>>> Software Engineer, Intel Corporation > >> >>>>> http://blog.ffwll.ch > >> >>> > >> >>> > >> >>> -- > >> >>> Daniel Vetter > >> >>> Software Engineer, Intel Corporation > >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > >> > > >> > > >> > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen >> <bas@basnieuwenhuizen.nl> wrote: >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König >> > <ckoenig.leichtzumerken@gmail.com> wrote: >> >> >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen >> >> > <bas@basnieuwenhuizen.nl> wrote: >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen >> >> >>> <bas@basnieuwenhuizen.nl> wrote: >> >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: >> >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: >> >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: >> >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: >> >> >>>>>>>>> Hi, >> >> >>>>>>>>> >> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: >> >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: >> >> >>>>>>>>>>> +/* The chip this is compatible with. >> >> >>>>>>>>>>> + * >> >> >>>>>>>>>>> + * If compression is disabled, use >> >> >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 >> >> >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ >> >> >>>>>>>>>>> + * >> >> >>>>>>>>>>> + * With compression enabled please use the exact chip. >> >> >>>>>>>>>>> + * >> >> >>>>>>>>>>> + * TODO: Do some generations share DCC format? >> >> >>>>>>>>>>> + */ >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff >> >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling >> >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they >> >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every >> >> >>>>>>>>>> possible combination. Do you really plan to share all these different >> >> >>>>>>>>>> things? >> >> >>>>>>>>>> >> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon >> >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not >> >> >>>>>>>>>> using explicit modifier flags for everything. >> >> >>>>>>>>> Right. The conclusion, after people went through and started sorting >> >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real >> >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >> >>>>>>>>> possible formats to represent, very likely fewer than >> >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably >> >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally >> >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 >> >> >>>>>>>>> bits of the 56 available. >> >> >>>>>>>> The problem here is that at least for some parameters we actually don't know >> >> >>>>>>>> which formats are actually used. >> >> >>>>>>>> >> >> >>>>>>>> The following are not real world examples, but just to explain the general >> >> >>>>>>>> problem. >> >> >>>>>>>> >> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather >> >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a >> >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one >> >> >>>>>>>> channel, but it is still perfectly possible. >> >> >>>>>>>> >> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them >> >> >>>>>>>> could be bad and we need to know which to actually use. >> >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? >> >> >>>>>>> All the information you only need locally to a given amdgpu instance >> >> >>>>>>> don't really need to be encoded in modifiers. >> >> >>>>>>> >> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be >> >> >>>>>>> good starters I guess) would be really good here. >> >> >>>>>> I extracted the information on which bits are relevant mostly from the >> >> >>>>>> AddrFromCoord functions in addrlib in mesa: >> >> >>>>>> >> >> >>>>>> for macro-tiles: >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 >> >> >>>>>> >> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): >> >> >>>>>> >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 >> >> >>>>> So this is the decoding thing. How many of these actually exist, even when >> >> >>>>> taking all the other information into account? >> >> >>>>> >> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride >> >> >>>>> + height + width, how much of all these bits do you actually still freely >> >> >>>>> pick? >> >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, >> >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, >> >> >>>> depth, display-rotated) + whether to use compression, everything else >> >> >>>> is fixed given those option, the properties of the chip and the >> >> >>>> format. >> >> >>>> >> >> >>>> >> >> >>>>> It might be that all the things you need to know from the memory config >> >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's >> >> >>>>> kinda the angle that we looked at this for everyone else. >> >> >>>>> >> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the >> >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just >> >> >>>>> becomes part of the implied stuff in the modifier. >> >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not >> >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the >> >> >>>> current configuration for all GFX6-GFX8 GPU: >> >> >>>> >> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for >> >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG >> >> >>>> >> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >> >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 >> >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 >> >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) >> >> >>>> ADDR_SURF_P8_32x32_8x16 >> >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 >> >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 >> >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 >> >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 >> >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 >> >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >> >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 >> >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 >> >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 >> >> >>>> >> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp >> >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only >> >> >>>> compatible for 8-bpp. >> >> >>>> >> >> >>>> So we can't just assume that if the first plane properties match that >> >> >>>> they do so for the second plane, because we don't know what GPU it is >> >> >>>> coming from. >> >> >>>> >> >> >>>> And we can make a canonical table from this but then if we change the >> >> >>>> above tables in the kernel that runs into compatibility issues. >> >> >>> >> >> >>> I think that's roughly what nvidia ended up doing (not sure they >> >> >>> published anything, haven't seen any patches on dri-devel). Two parts: >> >> >>> - Bunch of flags/bits for the major mode, like {display, non-display, >> >> >>> DCC} and so on. >> >> >>> - For each of the major modes a list of enumerated modes actually used >> >> >>> in reality. For DCC this would be the generation, for more plain >> >> >>> formats it would be the above table. Table entries would be per-bpp >> >> >>> (in case that matters, or whatever matters for amd gpus). >> >> >>> >> >> >>> Imo would be totally ok to put the relevant lookup tables into >> >> >>> drm_fourcc.h directly. Or any other suitable place. >> >> >> Right, the issue would be that the tables are configured int he kernel >> >> >> >> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 >> >> >> >> >> >> and having a shared hardcoded table means that if the kernel tables >> >> >> change we break userspace. That would be a new commitment on the side >> >> >> of the kernel driver. >> >> >> >> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be >> >> >> totally happy to switch it over) >> >> > Yeah it'd make that table uapi >> >> >> >> That won't work. Those lists are not necessary stable. >> >> >> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index >> >> plus a 8bit version number. >> > >> > But if we use PCI-ID, then we cannot really share between different >> > models of GPU ... >> > >> > So we have seen 3 options on design level: >> > 1) Use PCI-ID/chip. >> > Advantage: >> > -simple >> > - Few bits >> > Disadvantage: >> > - No sharing between different GPU models >> > 2) Use a fixed lookup/canonicalization table. >> > Advantage: >> > -Few bits >> > - Sharing between models >> > Disadvantage: >> > - Cannot change the kernel tiling tables. >> >> Why is changing the kernel table impossible? Just because it's uapi >> doesn't mean we can never extend it. Just add a new entry at the >> bottom if you need more combinations ... >> >> You don't even need to require the updated kernel amdgpu.ko, just the >> updated drm_fourcc.h header (if it's one of these things you can >> configure through cs packets entirely from userspace). > > So we have two tables we are talking about: > a) The kernel initialized system wide table of tiling configurations > that already exists > b)The uapi table > > Currently mesa selects by index, i.e. if we want a 32-bit surface with > macro-tiling that is not displayable we should select index 14 in the > system wide tiling table. Now if we change that table entry 14 that > means that we change which things it is compatible with, so we need a > new entry in the uapi table. However, an old mesa does not know about > the new entry in the uapi table, so loses the ability to select a > modifier for the preferred tiling mode, e.g. we pretty much regress > old userspace. That sounds like this table already _is_ uapi. So you can't change it ... why would you want to change it then? Note also that the fourcc table doesn't really have to match this by-index tiling mode uapi table, you can freely add more at the end or have any kind of fancy lookup table (one for everyone, or per-generation, or per-whatever, doesn't matter). Just a tradeoff between compressing the set of actually used layouts into as few modifier bits as possible against code maintenance headaches of having to deal with too many lookup tables for all the different things. -Daniel > >> >> Or why does this not work? >> -Daniel >> >> > 3) Encode tiling mode properties in modifier. >> > Advantage: >> > - Can share between models. >> > - Can change the kernel tiling tables. >> > Disadvantage: >> > - Use a lot of bits. >> > >> > Overall the number of exposed modifiers should be pretty much the >> > same, so encoding size that we have to deal with all possible bit >> > patterns. >> > >> > I've implemented option 3 in the patch here because I think it is the >> > best tradeoff. As far as I can see it fits, and especially if we do no >> > multisample support and remove the tile split bytes, and remove the >> > 3rd set of plane macro-tile options because no format has 3 different >> > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 >> > for future extensions and a version number. It is not great but pretty >> > doable? >> > >> >> >> >> Shouldn't be more than 32 bits in total. >> >> >> >> Regards, >> >> Christian. >> >> >> >> > - well the compressed from with >> >> > duplicates removed, if you really want to make it tiny. But you can >> >> > just add new entries if the heuristics for a given platform changes, >> >> > as long as you keep the old entries working (if you shipped them ever >> >> > ofc, otherwise totally ok to change as you see fit). >> >> > -Daniel >> >> > >> >> >>> From a quick look about 16 bits total. Gives you 40bits of second >> >> >>> chances and "oops totally sorry". >> >> >> I purposefully left 8 bits as a version field for those oopses. >> >> >> >> >> >>> -Daniel >> >> >>> >> >> >>>>> -Daniel >> >> >>>>> >> >> >>>>>>> -Daniel >> >> >>>>>>> >> >> >>>>>>>> Regards, >> >> >>>>>>>> Christian. >> >> >>>>>>>> >> >> >>>>>>>>> If you do use 256 bits in order to represent >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 >> >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly >> >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. >> >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens >> >> >>>>>>>>> with some kind of eBPF script. >> >> >>>>>>>>> >> >> >>>>>>>>> Cheers, >> >> >>>>>>>>> Daniel >> >> >>>>>>>>> _______________________________________________ >> >> >>>>>>>>> dri-devel mailing list >> >> >>>>>>>>> dri-devel@lists.freedesktop.org >> >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> >>>>>>> -- >> >> >>>>>>> Daniel Vetter >> >> >>>>>>> Software Engineer, Intel Corporation >> >> >>>>>>> http://blog.ffwll.ch >> >> >>>>>>> _______________________________________________ >> >> >>>>>>> dri-devel mailing list >> >> >>>>>>> dri-devel@lists.freedesktop.org >> >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel >> >> >>>>> -- >> >> >>>>> Daniel Vetter >> >> >>>>> Software Engineer, Intel Corporation >> >> >>>>> http://blog.ffwll.ch >> >> >>> >> >> >>> >> >> >>> -- >> >> >>> Daniel Vetter >> >> >>> Software Engineer, Intel Corporation >> >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch >> >> > >> >> > >> >> >> >> >> >> -- >> Daniel Vetter >> Software Engineer, Intel Corporation >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen > <bas@basnieuwenhuizen.nl> wrote: > > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> > >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen > >> <bas@basnieuwenhuizen.nl> wrote: > >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König > >> > <ckoenig.leichtzumerken@gmail.com> wrote: > >> >> > >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > >> >> > <bas@basnieuwenhuizen.nl> wrote: > >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > >> >> >>> <bas@basnieuwenhuizen.nl> wrote: > >> >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > >> >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > >> >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > >> >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: > >> >> >>>>>>>>> Hi, > >> >> >>>>>>>>> > >> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > >> >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > >> >> >>>>>>>>>>> +/* The chip this is compatible with. > >> >> >>>>>>>>>>> + * > >> >> >>>>>>>>>>> + * If compression is disabled, use > >> >> >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > >> >> >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > >> >> >>>>>>>>>>> + * > >> >> >>>>>>>>>>> + * With compression enabled please use the exact chip. > >> >> >>>>>>>>>>> + * > >> >> >>>>>>>>>>> + * TODO: Do some generations share DCC format? > >> >> >>>>>>>>>>> + */ > >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > >> >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling > >> >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they > >> >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every > >> >> >>>>>>>>>> possible combination. Do you really plan to share all these different > >> >> >>>>>>>>>> things? > >> >> >>>>>>>>>> > >> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon > >> >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not > >> >> >>>>>>>>>> using explicit modifier flags for everything. > >> >> >>>>>>>>> Right. The conclusion, after people went through and started sorting > >> >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real > >> >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than > >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> >> >>>>>>>>> possible formats to represent, very likely fewer than > >> >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > >> >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally > >> >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 > >> >> >>>>>>>>> bits of the 56 available. > >> >> >>>>>>>> The problem here is that at least for some parameters we actually don't know > >> >> >>>>>>>> which formats are actually used. > >> >> >>>>>>>> > >> >> >>>>>>>> The following are not real world examples, but just to explain the general > >> >> >>>>>>>> problem. > >> >> >>>>>>>> > >> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather > >> >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a > >> >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one > >> >> >>>>>>>> channel, but it is still perfectly possible. > >> >> >>>>>>>> > >> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them > >> >> >>>>>>>> could be bad and we need to know which to actually use. > >> >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? > >> >> >>>>>>> All the information you only need locally to a given amdgpu instance > >> >> >>>>>>> don't really need to be encoded in modifiers. > >> >> >>>>>>> > >> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be > >> >> >>>>>>> good starters I guess) would be really good here. > >> >> >>>>>> I extracted the information on which bits are relevant mostly from the > >> >> >>>>>> AddrFromCoord functions in addrlib in mesa: > >> >> >>>>>> > >> >> >>>>>> for macro-tiles: > >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > >> >> >>>>>> > >> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): > >> >> >>>>>> > >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > >> >> >>>>> So this is the decoding thing. How many of these actually exist, even when > >> >> >>>>> taking all the other information into account? > >> >> >>>>> > >> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > >> >> >>>>> + height + width, how much of all these bits do you actually still freely > >> >> >>>>> pick? > >> >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > >> >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > >> >> >>>> depth, display-rotated) + whether to use compression, everything else > >> >> >>>> is fixed given those option, the properties of the chip and the > >> >> >>>> format. > >> >> >>>> > >> >> >>>> > >> >> >>>>> It might be that all the things you need to know from the memory config > >> >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's > >> >> >>>>> kinda the angle that we looked at this for everyone else. > >> >> >>>>> > >> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the > >> >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just > >> >> >>>>> becomes part of the implied stuff in the modifier. > >> >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not > >> >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the > >> >> >>>> current configuration for all GFX6-GFX8 GPU: > >> >> >>>> > >> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for > >> >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG > >> >> >>>> > >> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > >> >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > >> >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 > >> >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) > >> >> >>>> ADDR_SURF_P8_32x32_8x16 > >> >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 > >> >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 > >> >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 > >> >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 > >> >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 > >> >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > >> >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 > >> >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > >> >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 > >> >> >>>> > >> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp > >> >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only > >> >> >>>> compatible for 8-bpp. > >> >> >>>> > >> >> >>>> So we can't just assume that if the first plane properties match that > >> >> >>>> they do so for the second plane, because we don't know what GPU it is > >> >> >>>> coming from. > >> >> >>>> > >> >> >>>> And we can make a canonical table from this but then if we change the > >> >> >>>> above tables in the kernel that runs into compatibility issues. > >> >> >>> > >> >> >>> I think that's roughly what nvidia ended up doing (not sure they > >> >> >>> published anything, haven't seen any patches on dri-devel). Two parts: > >> >> >>> - Bunch of flags/bits for the major mode, like {display, non-display, > >> >> >>> DCC} and so on. > >> >> >>> - For each of the major modes a list of enumerated modes actually used > >> >> >>> in reality. For DCC this would be the generation, for more plain > >> >> >>> formats it would be the above table. Table entries would be per-bpp > >> >> >>> (in case that matters, or whatever matters for amd gpus). > >> >> >>> > >> >> >>> Imo would be totally ok to put the relevant lookup tables into > >> >> >>> drm_fourcc.h directly. Or any other suitable place. > >> >> >> Right, the issue would be that the tables are configured int he kernel > >> >> >> > >> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 > >> >> >> > >> >> >> and having a shared hardcoded table means that if the kernel tables > >> >> >> change we break userspace. That would be a new commitment on the side > >> >> >> of the kernel driver. > >> >> >> > >> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be > >> >> >> totally happy to switch it over) > >> >> > Yeah it'd make that table uapi > >> >> > >> >> That won't work. Those lists are not necessary stable. > >> >> > >> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index > >> >> plus a 8bit version number. > >> > > >> > But if we use PCI-ID, then we cannot really share between different > >> > models of GPU ... > >> > > >> > So we have seen 3 options on design level: > >> > 1) Use PCI-ID/chip. > >> > Advantage: > >> > -simple > >> > - Few bits > >> > Disadvantage: > >> > - No sharing between different GPU models > >> > 2) Use a fixed lookup/canonicalization table. > >> > Advantage: > >> > -Few bits > >> > - Sharing between models > >> > Disadvantage: > >> > - Cannot change the kernel tiling tables. > >> > >> Why is changing the kernel table impossible? Just because it's uapi > >> doesn't mean we can never extend it. Just add a new entry at the > >> bottom if you need more combinations ... > >> > >> You don't even need to require the updated kernel amdgpu.ko, just the > >> updated drm_fourcc.h header (if it's one of these things you can > >> configure through cs packets entirely from userspace). > > > > So we have two tables we are talking about: > > a) The kernel initialized system wide table of tiling configurations > > that already exists > > b)The uapi table > > > > Currently mesa selects by index, i.e. if we want a 32-bit surface with > > macro-tiling that is not displayable we should select index 14 in the > > system wide tiling table. Now if we change that table entry 14 that > > means that we change which things it is compatible with, so we need a > > new entry in the uapi table. However, an old mesa does not know about > > the new entry in the uapi table, so loses the ability to select a > > modifier for the preferred tiling mode, e.g. we pretty much regress > > old userspace. > > That sounds like this table already _is_ uapi. So you can't change it > ... why would you want to change it then? You can change it somewhat. An entry still has to be valid for the use case mesa selects it for, but the actual values are still tunable as long as you have that constraint in mind. Also there effectively is an ioctl to get that table from the kernel and userspace already uses it. > > Note also that the fourcc table doesn't really have to match this > by-index tiling mode uapi table, you can freely add more at the end or > have any kind of fancy lookup table (one for everyone, or > per-generation, or per-whatever, doesn't matter). Just a tradeoff > between compressing the set of actually used layouts into as few > modifier bits as possible against code maintenance headaches of having > to deal with too many lookup tables for all the different things. > -Daniel > > > > >> > >> Or why does this not work? > >> -Daniel > >> > >> > 3) Encode tiling mode properties in modifier. > >> > Advantage: > >> > - Can share between models. > >> > - Can change the kernel tiling tables. > >> > Disadvantage: > >> > - Use a lot of bits. > >> > > >> > Overall the number of exposed modifiers should be pretty much the > >> > same, so encoding size that we have to deal with all possible bit > >> > patterns. > >> > > >> > I've implemented option 3 in the patch here because I think it is the > >> > best tradeoff. As far as I can see it fits, and especially if we do no > >> > multisample support and remove the tile split bytes, and remove the > >> > 3rd set of plane macro-tile options because no format has 3 different > >> > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 > >> > for future extensions and a version number. It is not great but pretty > >> > doable? > >> > > >> >> > >> >> Shouldn't be more than 32 bits in total. > >> >> > >> >> Regards, > >> >> Christian. > >> >> > >> >> > - well the compressed from with > >> >> > duplicates removed, if you really want to make it tiny. But you can > >> >> > just add new entries if the heuristics for a given platform changes, > >> >> > as long as you keep the old entries working (if you shipped them ever > >> >> > ofc, otherwise totally ok to change as you see fit). > >> >> > -Daniel > >> >> > > >> >> >>> From a quick look about 16 bits total. Gives you 40bits of second > >> >> >>> chances and "oops totally sorry". > >> >> >> I purposefully left 8 bits as a version field for those oopses. > >> >> >> > >> >> >>> -Daniel > >> >> >>> > >> >> >>>>> -Daniel > >> >> >>>>> > >> >> >>>>>>> -Daniel > >> >> >>>>>>> > >> >> >>>>>>>> Regards, > >> >> >>>>>>>> Christian. > >> >> >>>>>>>> > >> >> >>>>>>>>> If you do use 256 bits in order to represent > >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > >> >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly > >> >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. > >> >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens > >> >> >>>>>>>>> with some kind of eBPF script. > >> >> >>>>>>>>> > >> >> >>>>>>>>> Cheers, > >> >> >>>>>>>>> Daniel > >> >> >>>>>>>>> _______________________________________________ > >> >> >>>>>>>>> dri-devel mailing list > >> >> >>>>>>>>> dri-devel@lists.freedesktop.org > >> >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> >> >>>>>>> -- > >> >> >>>>>>> Daniel Vetter > >> >> >>>>>>> Software Engineer, Intel Corporation > >> >> >>>>>>> http://blog.ffwll.ch > >> >> >>>>>>> _______________________________________________ > >> >> >>>>>>> dri-devel mailing list > >> >> >>>>>>> dri-devel@lists.freedesktop.org > >> >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > >> >> >>>>> -- > >> >> >>>>> Daniel Vetter > >> >> >>>>> Software Engineer, Intel Corporation > >> >> >>>>> http://blog.ffwll.ch > >> >> >>> > >> >> >>> > >> >> >>> -- > >> >> >>> Daniel Vetter > >> >> >>> Software Engineer, Intel Corporation > >> >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > >> >> > > >> >> > > >> >> > >> > >> > >> > >> -- > >> Daniel Vetter > >> Software Engineer, Intel Corporation > >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > > > > -- > Daniel Vetter > Software Engineer, Intel Corporation > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
On Tue, Sep 04, 2018 at 09:36:01PM +0200, Bas Nieuwenhuizen wrote: > On Tue, Sep 4, 2018 at 9:28 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > > > On Tue, Sep 4, 2018 at 8:31 PM, Bas Nieuwenhuizen > > <bas@basnieuwenhuizen.nl> wrote: > > > On Tue, Sep 4, 2018 at 8:27 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > >> > > >> On Tue, Sep 4, 2018 at 7:57 PM, Bas Nieuwenhuizen > > >> <bas@basnieuwenhuizen.nl> wrote: > > >> > On Tue, Sep 4, 2018 at 7:48 PM Christian König > > >> > <ckoenig.leichtzumerken@gmail.com> wrote: > > >> >> > > >> >> Am 04.09.2018 um 18:37 schrieb Daniel Vetter: > > >> >> > On Tue, Sep 4, 2018 at 5:52 PM, Bas Nieuwenhuizen > > >> >> > <bas@basnieuwenhuizen.nl> wrote: > > >> >> >> On Tue, Sep 4, 2018 at 4:43 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > >> >> >>> On Tue, Sep 4, 2018 at 3:33 PM, Bas Nieuwenhuizen > > >> >> >>> <bas@basnieuwenhuizen.nl> wrote: > > >> >> >>>> On Tue, Sep 4, 2018 at 3:04 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > >> >> >>>>> On Tue, Sep 04, 2018 at 02:33:02PM +0200, Bas Nieuwenhuizen wrote: > > >> >> >>>>>> On Tue, Sep 4, 2018 at 2:26 PM Daniel Vetter <daniel@ffwll.ch> wrote: > > >> >> >>>>>>> On Tue, Sep 04, 2018 at 12:44:19PM +0200, Christian König wrote: > > >> >> >>>>>>>> Am 04.09.2018 um 12:15 schrieb Daniel Stone: > > >> >> >>>>>>>>> Hi, > > >> >> >>>>>>>>> > > >> >> >>>>>>>>> On Tue, 4 Sep 2018 at 11:05, Daniel Vetter <daniel.vetter@ffwll.ch> wrote: > > >> >> >>>>>>>>>> On Tue, Sep 4, 2018 at 3:00 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > > >> >> >>>>>>>>>>> +/* The chip this is compatible with. > > >> >> >>>>>>>>>>> + * > > >> >> >>>>>>>>>>> + * If compression is disabled, use > > >> >> >>>>>>>>>>> + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 > > >> >> >>>>>>>>>>> + * - AMDGPU_CHIP_VEGA10 for GFX9+ > > >> >> >>>>>>>>>>> + * > > >> >> >>>>>>>>>>> + * With compression enabled please use the exact chip. > > >> >> >>>>>>>>>>> + * > > >> >> >>>>>>>>>>> + * TODO: Do some generations share DCC format? > > >> >> >>>>>>>>>>> + */ > > >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 > > >> >> >>>>>>>>>>> +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff > > >> >> >>>>>>>>>> Do you really need all the combinations here of DCC + gpu gen + tiling > > >> >> >>>>>>>>>> details? When we had the entire discussion with nvidia folks they > > >> >> >>>>>>>>>> eventually agreed that they don't need the massive pile with every > > >> >> >>>>>>>>>> possible combination. Do you really plan to share all these different > > >> >> >>>>>>>>>> things? > > >> >> >>>>>>>>>> > > >> >> >>>>>>>>>> Note that e.g. on i915 we spec some of the tiling depending upon > > >> >> >>>>>>>>>> buffer size and buffer format (because that's how the hw works), not > > >> >> >>>>>>>>>> using explicit modifier flags for everything. > > >> >> >>>>>>>>> Right. The conclusion, after people went through and started sorting > > >> >> >>>>>>>>> out the kinds of formats for which they would _actually_ export real > > >> >> >>>>>>>>> colour buffers for, that most vendors definitely have fewer than > > >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > >> >> >>>>>>>>> possible formats to represent, very likely fewer than > > >> >> >>>>>>>>> 340,282,366,920,938,463,463,374,607,431,768,211,456 formats, probably > > >> >> >>>>>>>>> fewer than 72,057,594,037,927,936 formats, and even still generally > > >> >> >>>>>>>>> fewer than 281,474,976,710,656 if you want to be generous and leave 8 > > >> >> >>>>>>>>> bits of the 56 available. > > >> >> >>>>>>>> The problem here is that at least for some parameters we actually don't know > > >> >> >>>>>>>> which formats are actually used. > > >> >> >>>>>>>> > > >> >> >>>>>>>> The following are not real world examples, but just to explain the general > > >> >> >>>>>>>> problem. > > >> >> >>>>>>>> > > >> >> >>>>>>>> The memory configuration for example can be not ASIC specific, but rather > > >> >> >>>>>>>> determined by whoever took the ASIC and glued it together with VRAM on a > > >> >> >>>>>>>> board. It is not likely that somebody puts all the VRAM chips on one > > >> >> >>>>>>>> channel, but it is still perfectly possible. > > >> >> >>>>>>>> > > >> >> >>>>>>>> Same is true for things like harvesting, e.g. of 16 channels halve of them > > >> >> >>>>>>>> could be bad and we need to know which to actually use. > > >> >> >>>>>>> For my understanding: This leaks outside the chip when sharing buffers? > > >> >> >>>>>>> All the information you only need locally to a given amdgpu instance > > >> >> >>>>>>> don't really need to be encoded in modifiers. > > >> >> >>>>>>> > > >> >> >>>>>>> Pointers to code where this is all decided (kernel and radeonsi would be > > >> >> >>>>>>> good starters I guess) would be really good here. > > >> >> >>>>>> I extracted the information on which bits are relevant mostly from the > > >> >> >>>>>> AddrFromCoord functions in addrlib in mesa: > > >> >> >>>>>> > > >> >> >>>>>> for macro-tiles: > > >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/r800/egbaddrlib.cpp#L1587 > > >> >> >>>>>> > > >> >> >>>>>> for micro-tiles (or the micro-tiles in macro-tiles): > > >> >> >>>>>> > > >> >> >>>>>> https://gitlab.freedesktop.org/mesa/mesa/blob/master/src/amd/addrlib/core/addrlib1.cpp#L3016 > > >> >> >>>>> So this is the decoding thing. How many of these actually exist, even when > > >> >> >>>>> taking all the other information into account? > > >> >> >>>>> > > >> >> >>>>> E.g. given a platform + memory config (seems needed) + drm_fourcc + stride > > >> >> >>>>> + height + width, how much of all these bits do you actually still freely > > >> >> >>>>> pick? > > >> >> >>>> Basically you pick ARRAY_MODE (linear, micro-tile, macro-tile, sparse, > > >> >> >>>> thick variants of macro-tile), MICRO_TILE_MODE(display, non-display, > > >> >> >>>> depth, display-rotated) + whether to use compression, everything else > > >> >> >>>> is fixed given those option, the properties of the chip and the > > >> >> >>>> format. > > >> >> >>>> > > >> >> >>>> > > >> >> >>>>> It might be that all the things you need to know from the memory config > > >> >> >>>>> don't encode smaller than the macro/micro/whatever else stuff. But that's > > >> >> >>>>> kinda the angle that we looked at this for everyone else. > > >> >> >>>>> > > >> >> >>>>> E.g. for multi-plane stuff, if everyone picks the same config for the > > >> >> >>>>> 2nd/3rd plane, then you don't actually need to encode that. It just > > >> >> >>>>> becomes part of the implied stuff in the modifier. > > >> >> >>>> The problem is some GPUs are compatible for say 8-bpp images, but not > > >> >> >>>> for 32-bpp surfaces. e.g. lets look at the following table showing the > > >> >> >>>> current configuration for all GFX6-GFX8 GPU: > > >> >> >>>> > > >> >> >>>> format: (bank width, bank height, macro tile aspect, num banks) for > > >> >> >>>> 8-bpp, 16-bpp and 32 bpp single-sample followed by the PIPE_CONFIG > > >> >> >>>> > > >> >> >>>> verde: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > > >> >> >>>> oland: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P4_8x16 > > >> >> >>>> hainan: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 2, 16) ADDR_SURF_P2 > > >> >> >>>> tahiti/pitcairn: (1, 4, 1, 16) (1, 2, 1, 16) (1, 1, 1, 16) > > >> >> >>>> ADDR_SURF_P8_32x32_8x16 > > >> >> >>>> bonaire: (1,4,4,16) (1, 2, 4, 16) (1, 1, 2, 16) ADDR_SURF_P4_16x16 > > >> >> >>>> hawaii: (1, 4, 2, 16) (1, 2, 2, 16) (1, 1, 1, 16) ADDR_SURF_P16_32x32_16x16 > > >> >> >>>> CIK APUs: (1,4,4, 8), (1,2,4,8), (1, 2, 2, 8) ADDR_SURF_P2 > > >> >> >>>> topaz: (4, 4, 2, 8) (4, 4, 2, 8) (2, 4, 2, 8) ADDR_SURF_P2 > > >> >> >>>> fiji: (1, 4, 2, 8) (1, 4, 2, 8) (1, 4, 2, 8) ADDR_SURF_P16_32x32_16x16 > > >> >> >>>> tonga:: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > > >> >> >>>> polaris11/12: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P4_16x16 > > >> >> >>>> polaris10: (1, 4, 4, 16), (1, 4, 4, 16) (1, 4, 4, 16) ADDR_SURF_P8_32x32_16x16 > > >> >> >>>> stoney,carrizo: (1, 4, 4, 8) (1, 2, 4, 8), (1, 1, 2, 8) ADDR_SURF_P2 > > >> >> >>>> > > >> >> >>>> We see here that e.g. Stoney and the CIK APUs are compatible for 8-bpp > > >> >> >>>> and 16-bpp, but not 32-bpp. or bonaire and polaris11/12 are only > > >> >> >>>> compatible for 8-bpp. > > >> >> >>>> > > >> >> >>>> So we can't just assume that if the first plane properties match that > > >> >> >>>> they do so for the second plane, because we don't know what GPU it is > > >> >> >>>> coming from. > > >> >> >>>> > > >> >> >>>> And we can make a canonical table from this but then if we change the > > >> >> >>>> above tables in the kernel that runs into compatibility issues. > > >> >> >>> > > >> >> >>> I think that's roughly what nvidia ended up doing (not sure they > > >> >> >>> published anything, haven't seen any patches on dri-devel). Two parts: > > >> >> >>> - Bunch of flags/bits for the major mode, like {display, non-display, > > >> >> >>> DCC} and so on. > > >> >> >>> - For each of the major modes a list of enumerated modes actually used > > >> >> >>> in reality. For DCC this would be the generation, for more plain > > >> >> >>> formats it would be the above table. Table entries would be per-bpp > > >> >> >>> (in case that matters, or whatever matters for amd gpus). > > >> >> >>> > > >> >> >>> Imo would be totally ok to put the relevant lookup tables into > > >> >> >>> drm_fourcc.h directly. Or any other suitable place. > > >> >> >> Right, the issue would be that the tables are configured int he kernel > > >> >> >> > > >> >> >> https://cgit.freedesktop.org/~agd5f/linux/tree/drivers/gpu/drm/amd/amdgpu/gfx_v6_0.c?h=amd-staging-drm-next#n417 > > >> >> >> > > >> >> >> and having a shared hardcoded table means that if the kernel tables > > >> >> >> change we break userspace. That would be a new commitment on the side > > >> >> >> of the kernel driver. > > >> >> >> > > >> >> >> (If I can get an okay for that commitment from Alex/Christian I'd be > > >> >> >> totally happy to switch it over) > > >> >> > Yeah it'd make that table uapi > > >> >> > > >> >> That won't work. Those lists are not necessary stable. > > >> >> > > >> >> What we could do is to use PCI-ID plus tile_mode/macro_tile_mode index > > >> >> plus a 8bit version number. > > >> > > > >> > But if we use PCI-ID, then we cannot really share between different > > >> > models of GPU ... > > >> > > > >> > So we have seen 3 options on design level: > > >> > 1) Use PCI-ID/chip. > > >> > Advantage: > > >> > -simple > > >> > - Few bits > > >> > Disadvantage: > > >> > - No sharing between different GPU models > > >> > 2) Use a fixed lookup/canonicalization table. > > >> > Advantage: > > >> > -Few bits > > >> > - Sharing between models > > >> > Disadvantage: > > >> > - Cannot change the kernel tiling tables. > > >> > > >> Why is changing the kernel table impossible? Just because it's uapi > > >> doesn't mean we can never extend it. Just add a new entry at the > > >> bottom if you need more combinations ... > > >> > > >> You don't even need to require the updated kernel amdgpu.ko, just the > > >> updated drm_fourcc.h header (if it's one of these things you can > > >> configure through cs packets entirely from userspace). > > > > > > So we have two tables we are talking about: > > > a) The kernel initialized system wide table of tiling configurations > > > that already exists > > > b)The uapi table > > > > > > Currently mesa selects by index, i.e. if we want a 32-bit surface with > > > macro-tiling that is not displayable we should select index 14 in the > > > system wide tiling table. Now if we change that table entry 14 that > > > means that we change which things it is compatible with, so we need a > > > new entry in the uapi table. However, an old mesa does not know about > > > the new entry in the uapi table, so loses the ability to select a > > > modifier for the preferred tiling mode, e.g. we pretty much regress > > > old userspace. > > > > That sounds like this table already _is_ uapi. So you can't change it > > ... why would you want to change it then? > You can change it somewhat. An entry still has to be valid for the use > case mesa selects it for, but the actual values are still tunable as > long as you have that constraint in mind. > > Also there effectively is an ioctl to get that table from the kernel > and userspace already uses it. Hm, if that ioctl already exists, could you extend it to include the corresponding fourcc code? Then things would Just Work. Assuming no one else in the amdgpu ecosystem can allocate modifier combinations because they must go through this table. -Daniel > > > > > Note also that the fourcc table doesn't really have to match this > > by-index tiling mode uapi table, you can freely add more at the end or > > have any kind of fancy lookup table (one for everyone, or > > per-generation, or per-whatever, doesn't matter). Just a tradeoff > > between compressing the set of actually used layouts into as few > > modifier bits as possible against code maintenance headaches of having > > to deal with too many lookup tables for all the different things. > > -Daniel > > > > > > > >> > > >> Or why does this not work? > > >> -Daniel > > >> > > >> > 3) Encode tiling mode properties in modifier. > > >> > Advantage: > > >> > - Can share between models. > > >> > - Can change the kernel tiling tables. > > >> > Disadvantage: > > >> > - Use a lot of bits. > > >> > > > >> > Overall the number of exposed modifiers should be pretty much the > > >> > same, so encoding size that we have to deal with all possible bit > > >> > patterns. > > >> > > > >> > I've implemented option 3 in the patch here because I think it is the > > >> > best tradeoff. As far as I can see it fits, and especially if we do no > > >> > multisample support and remove the tile split bytes, and remove the > > >> > 3rd set of plane macro-tile options because no format has 3 different > > >> > bitdepths for the 3 planes, we are left with using 37 bits, leaving 19 > > >> > for future extensions and a version number. It is not great but pretty > > >> > doable? > > >> > > > >> >> > > >> >> Shouldn't be more than 32 bits in total. > > >> >> > > >> >> Regards, > > >> >> Christian. > > >> >> > > >> >> > - well the compressed from with > > >> >> > duplicates removed, if you really want to make it tiny. But you can > > >> >> > just add new entries if the heuristics for a given platform changes, > > >> >> > as long as you keep the old entries working (if you shipped them ever > > >> >> > ofc, otherwise totally ok to change as you see fit). > > >> >> > -Daniel > > >> >> > > > >> >> >>> From a quick look about 16 bits total. Gives you 40bits of second > > >> >> >>> chances and "oops totally sorry". > > >> >> >> I purposefully left 8 bits as a version field for those oopses. > > >> >> >> > > >> >> >>> -Daniel > > >> >> >>> > > >> >> >>>>> -Daniel > > >> >> >>>>> > > >> >> >>>>>>> -Daniel > > >> >> >>>>>>> > > >> >> >>>>>>>> Regards, > > >> >> >>>>>>>> Christian. > > >> >> >>>>>>>> > > >> >> >>>>>>>>> If you do use 256 bits in order to represent > > >> >> >>>>>>>>> 115,792,089,237,316,195,423,570,985,008,687,907,853,269,984,665,640,564,039,457,584,007,913,129,639,936 > > >> >> >>>>>>>>> modifiers per format, userspace would start hitting OOM pretty quickly > > >> >> >>>>>>>>> as it attempted to enumerate and negotiate acceptable modifiers. > > >> >> >>>>>>>>> Either that or we need to replace the fixed 64-bit modifier tokens > > >> >> >>>>>>>>> with some kind of eBPF script. > > >> >> >>>>>>>>> > > >> >> >>>>>>>>> Cheers, > > >> >> >>>>>>>>> Daniel > > >> >> >>>>>>>>> _______________________________________________ > > >> >> >>>>>>>>> dri-devel mailing list > > >> >> >>>>>>>>> dri-devel@lists.freedesktop.org > > >> >> >>>>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >> >> >>>>>>> -- > > >> >> >>>>>>> Daniel Vetter > > >> >> >>>>>>> Software Engineer, Intel Corporation > > >> >> >>>>>>> http://blog.ffwll.ch > > >> >> >>>>>>> _______________________________________________ > > >> >> >>>>>>> dri-devel mailing list > > >> >> >>>>>>> dri-devel@lists.freedesktop.org > > >> >> >>>>>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel > > >> >> >>>>> -- > > >> >> >>>>> Daniel Vetter > > >> >> >>>>> Software Engineer, Intel Corporation > > >> >> >>>>> http://blog.ffwll.ch > > >> >> >>> > > >> >> >>> > > >> >> >>> -- > > >> >> >>> Daniel Vetter > > >> >> >>> Software Engineer, Intel Corporation > > >> >> >>> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > > >> >> > > > >> >> > > > >> >> > > >> > > >> > > >> > > >> -- > > >> Daniel Vetter > > >> Software Engineer, Intel Corporation > > >> +41 (0) 79 365 57 48 - http://blog.ffwll.ch > > > > > > > > -- > > Daniel Vetter > > Software Engineer, Intel Corporation > > +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Hopefully this answers some questions. Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip. Some 32bpp 1D tiling layouts are compatible across all chips (1D display tiling is the same as SW_256B_D if Bpp == 4). On GFX9, swizzle modes <= 11 are the same on all GFX9 chips. The remaining modes depend on GB_ADDR_CONFIG and are also more efficient. Bpp, number of samples, and resource type (2D/3D) affect the layout too, e.g. 3D textures silently use thick tiling on GFX9. Harvesting doesn't affect tiling layouts. The layout changes between layers/slices a little. Always use the base address of the whole image when programming the hardware. Don't assume that the 2nd layer has the same layout. > + * TODO: Can scanout really not support fastclear data? It can, but only those encoded in the DCC buffer (0/1). There is no DAL support for DCC though. > + * TODO: Do some generations share DCC format? DCC mirrors the tiling layout, so the same tiling mode means the same DCC. Take the absolute pixel address, shift it to the right, and you'll get the DCC element address. I would generally consider DCC as non-shareable because of different meanings of TILING_INDEX between chips except maybe for common GFX9 layouts. > [comments about number of bits] We could certainly represent all formats as a list of enums, but then we would need to convert the enums to the full description in drivers. GFX6-8 can use TILING_INDEX (except for stencil, let's ignore stencil). The tiling tables shouldn't change anymore because they are optimized for the hardware, and later hw doesn't have any tables. Marek
On Fri, Sep 7, 2018 at 6:51 AM Marek Olšák <maraeo@gmail.com> wrote: > > Hopefully this answers some questions. > > Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all > chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip. For GFX6-GFX8: From GB_ADDR_CONFIG addrlib only uses the pipe interleave bytes which are 0 (=256 bytes) for all AMDGPU HW (and on GFX9 addrlib even asserts on that). From MC_ARB_RAMCFG addrlib reads the number of banks and ranks, calculates the number of logical banks from it, but then does not use it. (Presumably because it is the same number as the number of banks in the tiling table entry?) Some bits gets used by the kernel (memory row size), but those get encoded in the tile split of the tiling table, i.e. we do not need the separate bits. for GFX9, only the DCC meta surface seems to depend on GB_ADDR_CONFIG (except the aforementioned pipe interleave bytes) which are constant. I'll do some checks to see if we can trust addrlib here. > > Some 32bpp 1D tiling layouts are compatible across all chips (1D > display tiling is the same as SW_256B_D if Bpp == 4). Cool hadn't seen that they were equivalent in some situations. > > On GFX9, swizzle modes <= 11 are the same on all GFX9 chips. The > remaining modes depend on GB_ADDR_CONFIG and are also more efficient. > Bpp, number of samples, and resource type (2D/3D) affect the layout > too, e.g. 3D textures silently use thick tiling on GFX9. > > Harvesting doesn't affect tiling layouts. > > The layout changes between layers/slices a little. Always use the base > address of the whole image when programming the hardware. Don't assume > that the 2nd layer has the same layout. > > > + * TODO: Can scanout really not support fastclear data? > > It can, but only those encoded in the DCC buffer (0/1). There is no > DAL support for DCC though. > > > > + * TODO: Do some generations share DCC format? > > DCC mirrors the tiling layout, so the same tiling mode means the same > DCC. Take the absolute pixel address, shift it to the right, and > you'll get the DCC element address. > > I would generally consider DCC as non-shareable because of different > meanings of TILING_INDEX between chips except maybe for common GFX9 > layouts. I was also thinking about the compression changes on Polaris that improved compression. I'm not sure if that added e.g. new encodings in how we store the data of a DCC tile in the main data surface. > > > > [comments about number of bits] > > We could certainly represent all formats as a list of enums, but then > we would need to convert the enums to the full description in drivers. > GFX6-8 can use TILING_INDEX (except for stencil, let's ignore > stencil). The tiling tables shouldn't change anymore because they are > optimized for the hardware, and later hw doesn't have any tables. > > Marek
On Fri, Sep 7, 2018 at 5:55 AM, Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> wrote: > On Fri, Sep 7, 2018 at 6:51 AM Marek Olšák <maraeo@gmail.com> wrote: >> >> Hopefully this answers some questions. >> >> Other parameters that affect tiling layouts are GB_ADDR_CONFIG (all >> chips) and MC_ARB_RAMCFG (GFX6-8 only), and those vary with each chip. > > For GFX6-GFX8: > From GB_ADDR_CONFIG addrlib only uses the pipe interleave bytes which > are 0 (=256 bytes) for all AMDGPU HW (and on GFX9 addrlib even asserts > on that). From MC_ARB_RAMCFG addrlib reads the number of banks and > ranks, calculates the number of logical banks from it, but then does > not use it. (Presumably because it is the same number as the number of > banks in the tiling table entry?) Some bits gets used by the kernel > (memory row size), but those get encoded in the tile split of the > tiling table, i.e. we do not need the separate bits. > > for GFX9, only the DCC meta surface seems to depend on GB_ADDR_CONFIG > (except the aforementioned pipe interleave bytes) which are constant. On GFX9, addrlib in Mesa uses most fields from GB_ADDR_CONFIG. GB_ADDR_CONFIG defines the tiling formats. On older chips, addrlib reads some fields from GB_ADDR_CONFIG and uses the chip identification for others like the number of pipes, even though GB_ADDR_CONFIG has the information too. Marek
diff --git a/include/uapi/drm/amdgpu_drm.h b/include/uapi/drm/amdgpu_drm.h index 94444eeba55b..4e1452161dbf 100644 --- a/include/uapi/drm/amdgpu_drm.h +++ b/include/uapi/drm/amdgpu_drm.h @@ -990,6 +990,136 @@ struct drm_amdgpu_info_vce_clock_table { #define AMDGPU_FAMILY_AI 141 /* Vega10 */ #define AMDGPU_FAMILY_RV 142 /* Raven */ +#define AMDGPU_CHIP_TAHITI 0 +#define AMDGPU_CHIP_PITCAIRN 1 +#define AMDGPU_CHIP_VERDE 2 +#define AMDGPU_CHIP_OLAND 3 +#define AMDGPU_CHIP_HAINAN 4 +#define AMDGPU_CHIP_BONAIRE 5 +#define AMDGPU_CHIP_KAVERI 6 +#define AMDGPU_CHIP_KABINI 7 +#define AMDGPU_CHIP_HAWAII 8 +#define AMDGPU_CHIP_MULLINS 9 +#define AMDGPU_CHIP_TOPAZ 10 +#define AMDGPU_CHIP_TONGA 11 +#define AMDGPU_CHIP_FIJI 12 +#define AMDGPU_CHIP_CARRIZO 13 +#define AMDGPU_CHIP_STONEY 14 +#define AMDGPU_CHIP_POLARIS10 15 +#define AMDGPU_CHIP_POLARIS11 16 +#define AMDGPU_CHIP_POLARIS12 17 +#define AMDGPU_CHIP_VEGAM 18 +#define AMDGPU_CHIP_VEGA10 19 +#define AMDGPU_CHIP_VEGA12 20 +#define AMDGPU_CHIP_VEGA20 21 +#define AMDGPU_CHIP_RAVEN 22 + +/* Format for GFX6-GFX8 DRM format modifiers. These are intentionally the same + * as AMDGPU_TILING_*. However, the the rules as to when to set them are + * different. + * + * Do not use linear ARRAY_MODEs or SWIZZLE_MODEs. Use DRM_FORMAT_MOD_LINEAR + * instead. + * + * If ARRAY_MODE is 1D, only the micro tile mode and the pipe config should be + * set. + * + * For other ARRAY_MODEs: + * - Only set TILE_SPLIT if the image is multisample. + * + * We have 1 extra bit for the micro tile mode, as GFX6 and GFX7+ have 1 + * different value there. The values are + * - depth : 0 + * - displayable : 1 + * - thin : 2 + * - thick (GFX6) : 3 + * - rotated (GFX7+) : 4 + * + * TODO: What to do with multisample multi plane images? More tile split + * fields don't fit if we want to keep a few bits for a format version. + */ +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_SHIFT 0 +#define AMDGPU_MODIFIER_GFX8_ARRAY_MODE_MASK 0xf +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_SHIFT 4 +#define AMDGPU_MODIFIER_GFX8_PIPE_CONFIG_MASK 0x1f +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_SHIFT 9 +#define AMDGPU_MODIFIER_GFX8_TILE_SPLIT_MASK 0x7 +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_SHIFT 12 +#define AMDGPU_MODIFIER_GFX8_MICRO_TILE_MODE_MASK 0x7 +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_SHIFT 15 +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_SHIFT 17 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_SHIFT 19 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_SHIFT 21 +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_MASK 0x3 + +/* Macrotile parameters for a second plane if existing */ +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_SHIFT 23 +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_1_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_SHIFT 25 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_1_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_SHIFT 27 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_1_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_SHIFT 29 +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_1_MASK 0x3 + +/* Macrotile parameters for a third plane if existing */ +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_SHIFT 31 +#define AMDGPU_MODIFIER_GFX8_BANK_WIDTH_2_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_SHIFT 33 +#define AMDGPU_MODIFIER_GFX8_BANK_HEIGHT_2_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_SHIFT 35 +#define AMDGPU_MODIFIER_GFX8_MACRO_TILE_ASPECT_2_MASK 0x3 +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_SHIFT 37 +#define AMDGPU_MODIFIER_GFX8_NUM_BANKS_2_MASK 0x3 + +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_SHIFT 0 +#define AMDGPU_MODIFIER_GFX9_SWIZZLE_MODE_MASK 0x1f + +/* Whether to enable DCC compression. + * + * If enabled, exporting the surface results in three + * planes: + * - color data + * - DCC data + * - a 64-byte block with + * - a 16 byte 0/1 bool as to whether the surface is currently DCC compressed. + * - a 16-byte 0/1 bool as to whether the surface has fastclear data + * - a 8-byte chunk with the current fastclear colors + * + * To ensure we do not keep compressing and decompressing the surface, once it + * has been decompressed no party may recompress again. + * + * Applications should not hand over images with fastclear data as not + * all users can support it, however, to help both Vulkan implementations + * with the allocation we keep it in the 64-byte block. + * + * TODO: Can scanout really not support fastclear data? + * TODO: What to do with multiplane images? + */ +#define AMDGPU_MODIFIER_COMPRESSION_SHIFT 39 +#define AMDGPU_MODIFIER_COMPRESSION_MASK 0x1 + +/* The chip this is compatible with. + * + * If compression is disabled, use + * - AMDGPU_CHIP_TAHITI for GFX6-GFX8 + * - AMDGPU_CHIP_VEGA10 for GFX9+ + * + * With compression enabled please use the exact chip. + * + * TODO: Do some generations share DCC format? + */ +#define AMDGPU_MODIFIER_CHIP_GEN_SHIFT 40 +#define AMDGPU_MODIFIER_CHIP_GEN_MASK 0xff + +#define AMDGPU_MODIFIER_SET(field, value) \ + (((__u64)(value) & AMDGPU_MODIFIER_##field##_MASK) << AMDGPU_MODIFIER_##field##_SHIFT) +#define AMDGPU_MODIFIER_GET(value, field) \ + (((__u64)(value) >> AMDGPU_MODIFIER_##field##_SHIFT) & AMDGPU_MODIFIER_##field##_MASK) + /* * Definition of free sync enter and exit signals * We may have more options in the future
This is an initial proposal for format modifiers for AMD hardware. It uses 48 bits including a chip generation, leaving 8 bits for a format version number. On gfx6-gfx8 we have all the fields influencing sample locations in memory. Tile split bytes are optional for single sample buffers as no hardware reaches the split size with 1 sample and hence the actual size does not matter. The macrotile fields are duplicated for images with multiple planes. If the planes have different bitdepth they need different macro tile fields and different tile split bytes if multisample. I could not fit multiple copies in for tile split bytes, but multisample & multiplane images are very rare. Overall, I think we should punt on multisample for a later format version since they are generally not shared on any modifier aware windowing system, and we have more issues like fmask & cmask. We need these copies because the drm modifier of all planes in an image needs to be equal, so we need to fit these together. This adds fields for compression support, using metadata that is compatible with AMDVLK and for which radv and radeonsi can reasonably be extended. The big open question for compression is between which generations the format changed to see if we can share more. This explicitly does not try to solve the linear stride alignment issue, thoguh we could internally just use the tiling modes for the linear modes to indicate linear images with the stride for the given chip. Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> CC: Chad Versace <chadversary@chromium.org> CC: Dave Airlie <airlied@redhat.com> CC: Marek Olšák <marek.olsak@amd.com> CC: Nicolai Hähnle <nicolai.haehnle@amd.com> CC: Alex Deucher <alexander.deucher@amd.com> CC: Daniel Vetter <daniel.vetter@ffwll.ch> --- include/uapi/drm/amdgpu_drm.h | 130 ++++++++++++++++++++++++++++++++++ 1 file changed, 130 insertions(+)