mbox series

[RFC,0/3] Introducing I915_FORMAT_MOD_4_TILED_XE2_CCS Modifier for Xe2

Message ID 20240506185238.364539-1-juhapekka.heikkila@gmail.com (mailing list archive)
Headers show
Series Introducing I915_FORMAT_MOD_4_TILED_XE2_CCS Modifier for Xe2 | expand

Message

Juha-Pekka Heikkila May 6, 2024, 6:52 p.m. UTC
These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS modifier, which,
from the kernel's perspective, behaves similarly to `I915_FORMAT_MOD_4_TILED`.
This new modifier is primarily intended for user space to effectively monitor
compression status, especially when dealing with a mix of compressed and
uncompressed buffers.

The addition of this modifier facilitates user space in managing compression
status, particularly when utilizing both compressed and uncompressed buffers
concurrently. To leverage compression for these buffers, user space
applications must configure the appropriate Page Attribute Table (PAT) index.
Display engine will treat all Tile4 as if it were compressed under all
circumstances on Xe2 architecture.

Notably, this patch series omits support for X-tiled CCS and linear CCS
for Xe2, neither of which is supported by display engine. X-tiled CCS
offers stateless compression making it less likely to be extensively
utilized. Linear CCS does possess state, but currently lacks expected users.

These patches aim to enhance the flexibility and efficiency of handling
compressed and uncompressed buffers in Xe driver, particularly
catering to the specific requirements of the Xe2 architecture.

Juha-Pekka Heikkila (3):
  drm/fourcc: define Intel Xe2 related tile4 ccs modifier
  drm/xe/display: allow creation of case I915_FORMAT_MOD_4_TILED_XE2_CCS
    type framebuffer
  drm/i915/display: allow creation of case
    I915_FORMAT_MOD_4_TILED_XE2_CCS type framebuffer

 drivers/gpu/drm/i915/display/intel_display.c       |  1 +
 drivers/gpu/drm/i915/display/intel_fb.c            | 10 ++++++++++
 drivers/gpu/drm/i915/display/skl_universal_plane.c |  4 +++-
 drivers/gpu/drm/xe/display/xe_plane_initial.c      |  1 +
 include/uapi/drm/drm_fourcc.h                      | 12 ++++++++++++
 5 files changed, 27 insertions(+), 1 deletion(-)

Comments

Matt Roper May 7, 2024, 10:56 p.m. UTC | #1
On Mon, May 06, 2024 at 09:52:35PM +0300, Juha-Pekka Heikkila wrote:
> These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS modifier, which,
> from the kernel's perspective, behaves similarly to `I915_FORMAT_MOD_4_TILED`.
> This new modifier is primarily intended for user space to effectively monitor
> compression status, especially when dealing with a mix of compressed and
> uncompressed buffers.
> 
> The addition of this modifier facilitates user space in managing compression
> status, particularly when utilizing both compressed and uncompressed buffers
> concurrently. To leverage compression for these buffers, user space
> applications must configure the appropriate Page Attribute Table (PAT) index.
> Display engine will treat all Tile4 as if it were compressed under all
> circumstances on Xe2 architecture.

I may have missed some discussion about this, but I thought the previous
consensus was that we didn't want/need new modifiers for compression on
Xe2?  If a userspace client (or the display hardware) receives a buffer
of unknown origin and unknown compression status, it's always fine to
select a compressed PAT when binding the buffer to read since even for
uncompressed buffers the CCS metadata will accurately reflect the
compression status.  Unlike Xe1, where generating content without
compression enabled would leave random garbage in the FlatCCS area, Xe2
will set the corresponding FlatCCS to '0x0' for each block, indicating
uncompressed data.

Can you explain more what the benefit of handling these modifiers
explicitly is?


Matt

> 
> Notably, this patch series omits support for X-tiled CCS and linear CCS
> for Xe2, neither of which is supported by display engine. X-tiled CCS
> offers stateless compression making it less likely to be extensively
> utilized. Linear CCS does possess state, but currently lacks expected users.
> 
> These patches aim to enhance the flexibility and efficiency of handling
> compressed and uncompressed buffers in Xe driver, particularly
> catering to the specific requirements of the Xe2 architecture.
> 
> Juha-Pekka Heikkila (3):
>   drm/fourcc: define Intel Xe2 related tile4 ccs modifier
>   drm/xe/display: allow creation of case I915_FORMAT_MOD_4_TILED_XE2_CCS
>     type framebuffer
>   drm/i915/display: allow creation of case
>     I915_FORMAT_MOD_4_TILED_XE2_CCS type framebuffer
> 
>  drivers/gpu/drm/i915/display/intel_display.c       |  1 +
>  drivers/gpu/drm/i915/display/intel_fb.c            | 10 ++++++++++
>  drivers/gpu/drm/i915/display/skl_universal_plane.c |  4 +++-
>  drivers/gpu/drm/xe/display/xe_plane_initial.c      |  1 +
>  include/uapi/drm/drm_fourcc.h                      | 12 ++++++++++++
>  5 files changed, 27 insertions(+), 1 deletion(-)
> 
> -- 
> 2.43.2
>
Juha-Pekka Heikkila May 8, 2024, 1:08 p.m. UTC | #2
On 8.5.2024 1.56, Matt Roper wrote:
> On Mon, May 06, 2024 at 09:52:35PM +0300, Juha-Pekka Heikkila wrote:
>> These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS modifier, which,
>> from the kernel's perspective, behaves similarly to `I915_FORMAT_MOD_4_TILED`.
>> This new modifier is primarily intended for user space to effectively monitor
>> compression status, especially when dealing with a mix of compressed and
>> uncompressed buffers.
>>
>> The addition of this modifier facilitates user space in managing compression
>> status, particularly when utilizing both compressed and uncompressed buffers
>> concurrently. To leverage compression for these buffers, user space
>> applications must configure the appropriate Page Attribute Table (PAT) index.
>> Display engine will treat all Tile4 as if it were compressed under all
>> circumstances on Xe2 architecture.
> 
> I may have missed some discussion about this, but I thought the previous
> consensus was that we didn't want/need new modifiers for compression on
> Xe2?  If a userspace client (or the display hardware) receives a buffer
> of unknown origin and unknown compression status, it's always fine to
> select a compressed PAT when binding the buffer to read since even for
> uncompressed buffers the CCS metadata will accurately reflect the
> compression status.  Unlike Xe1, where generating content without
> compression enabled would leave random garbage in the FlatCCS area, Xe2
> will set the corresponding FlatCCS to '0x0' for each block, indicating
> uncompressed data.

UMDs have been wishing solution for situation where they receive shared 
bo with unknown compressions status. Problem would arise when 
uncompressed buffer is shared and it's mapped as compressed while 
process which shared bo still would use it as uncompressed. One who 
mapped it last will have correct content on surface while other will 
have have semi-random garbage.

 From what I understand of the compression specifications, all buffers 
should indeed be treated as compressed unless they fall into one of the 
specially listed categories, which are primarily display-related. I 
can't say if this patch set is what is needed to solve UMD issues, I 
don't know why wouldn't user space treat buffers as compressed by default.

> 
> Can you explain more what the benefit of handling these modifiers
> explicitly is?
> 
> 
> Matt
> 
>>
>> Notably, this patch series omits support for X-tiled CCS and linear CCS
>> for Xe2, neither of which is supported by display engine. X-tiled CCS
>> offers stateless compression making it less likely to be extensively
>> utilized. Linear CCS does possess state, but currently lacks expected users.
>>
>> These patches aim to enhance the flexibility and efficiency of handling
>> compressed and uncompressed buffers in Xe driver, particularly
>> catering to the specific requirements of the Xe2 architecture.
>>
>> Juha-Pekka Heikkila (3):
>>    drm/fourcc: define Intel Xe2 related tile4 ccs modifier
>>    drm/xe/display: allow creation of case I915_FORMAT_MOD_4_TILED_XE2_CCS
>>      type framebuffer
>>    drm/i915/display: allow creation of case
>>      I915_FORMAT_MOD_4_TILED_XE2_CCS type framebuffer
>>
>>   drivers/gpu/drm/i915/display/intel_display.c       |  1 +
>>   drivers/gpu/drm/i915/display/intel_fb.c            | 10 ++++++++++
>>   drivers/gpu/drm/i915/display/skl_universal_plane.c |  4 +++-
>>   drivers/gpu/drm/xe/display/xe_plane_initial.c      |  1 +
>>   include/uapi/drm/drm_fourcc.h                      | 12 ++++++++++++
>>   5 files changed, 27 insertions(+), 1 deletion(-)
>>
>> -- 
>> 2.43.2
>>
>
Kenneth Graunke May 11, 2024, 12:58 a.m. UTC | #3
On Tuesday, May 7, 2024 3:56:57 PM PDT Matt Roper wrote:
> On Mon, May 06, 2024 at 09:52:35PM +0300, Juha-Pekka Heikkila wrote:
> > These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS modifier, which,
> > from the kernel's perspective, behaves similarly to 
`I915_FORMAT_MOD_4_TILED`.
> > This new modifier is primarily intended for user space to effectively 
monitor
> > compression status, especially when dealing with a mix of compressed and
> > uncompressed buffers.
> > 
> > The addition of this modifier facilitates user space in managing 
compression
> > status, particularly when utilizing both compressed and uncompressed 
buffers
> > concurrently. To leverage compression for these buffers, user space
> > applications must configure the appropriate Page Attribute Table (PAT) 
index.
> > Display engine will treat all Tile4 as if it were compressed under all
> > circumstances on Xe2 architecture.
> 
> I may have missed some discussion about this, but I thought the previous
> consensus was that we didn't want/need new modifiers for compression on
> Xe2?  If a userspace client (or the display hardware) receives a buffer
> of unknown origin and unknown compression status, it's always fine to
> select a compressed PAT when binding the buffer to read since even for
> uncompressed buffers the CCS metadata will accurately reflect the
> compression status.  Unlike Xe1, where generating content without
> compression enabled would leave random garbage in the FlatCCS area, Xe2
> will set the corresponding FlatCCS to '0x0' for each block, indicating
> uncompressed data.
> 
> Can you explain more what the benefit of handling these modifiers
> explicitly is?
> 
> 
> Matt

Thanks, Matt!  I'm a bit late in getting up to speed with the Xe2 compression 
changes; this is really good information.

As I understand it...all blocks on the GPU behave in the way you mentioned, 
where generating uncompressed data via the GPU will set FlatCCS = 0, so you 
can assume a compressed PAT entry and everything works.

One snag is...I've heard that CPU access doesn't work that way.  So, if you 
mmap a buffer on the CPU, and write data with the CPU, then I think we're back 
to the "FlatCCS contains uninitialized garbage" case, where it's unsafe to 
assume a compressed PAT.  And... we don't really know when sharing buffers 
whether the other side is going to want to do CPU access.

It would be really nice to assume compression by default, though, which got me 
thinking: if we mmap a buffer via DRM_XE_GEM_MMAP_OFFSET, could xe.ko disable 
compression for us?  So, resolve any outstanding CCS data, and then switch any 
PAT entries to uncompressed.  Mapping would block until that resolve is done.  
It could leave compression off forever (once you CPU map a buffer, it's never 
compressed again).  Or it could turn CCS back on when map count reaches 0 (but 
frankly I'm not sure that's terribly important, and sounds more complex).

As I understand it, at least on discrete GPUs, the kernel already has to do 
something similar for eviction, when migrating BOs to system memory (which 
doesn't support compression).  So this would be doing basically the same 
"resolve and disable CCS" step the kernel can presumably already do, but now 
on mmap as well.

What do you think?  Viable?  Crazy?  Have I missed something?

--Ken
Joonas Lahtinen May 14, 2024, 9:25 a.m. UTC | #4
Quoting Kenneth Graunke (2024-05-11 03:58:34)
> On Tuesday, May 7, 2024 3:56:57 PM PDT Matt Roper wrote:
> > On Mon, May 06, 2024 at 09:52:35PM +0300, Juha-Pekka Heikkila wrote:
> > > These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS modifier, which,
> > > from the kernel's perspective, behaves similarly to 
> `I915_FORMAT_MOD_4_TILED`.
> > > This new modifier is primarily intended for user space to effectively 
> monitor
> > > compression status, especially when dealing with a mix of compressed and
> > > uncompressed buffers.
> > > 
> > > The addition of this modifier facilitates user space in managing 
> compression
> > > status, particularly when utilizing both compressed and uncompressed 
> buffers
> > > concurrently. To leverage compression for these buffers, user space
> > > applications must configure the appropriate Page Attribute Table (PAT) 
> index.
> > > Display engine will treat all Tile4 as if it were compressed under all
> > > circumstances on Xe2 architecture.
> > 
> > I may have missed some discussion about this, but I thought the previous
> > consensus was that we didn't want/need new modifiers for compression on
> > Xe2?  If a userspace client (or the display hardware) receives a buffer
> > of unknown origin and unknown compression status, it's always fine to
> > select a compressed PAT when binding the buffer to read since even for
> > uncompressed buffers the CCS metadata will accurately reflect the
> > compression status.  Unlike Xe1, where generating content without
> > compression enabled would leave random garbage in the FlatCCS area, Xe2
> > will set the corresponding FlatCCS to '0x0' for each block, indicating
> > uncompressed data.
> > 
> > Can you explain more what the benefit of handling these modifiers
> > explicitly is?
> > 
> > 
> > Matt
> 
> Thanks, Matt!  I'm a bit late in getting up to speed with the Xe2 compression 
> changes; this is really good information.
> 
> As I understand it...all blocks on the GPU behave in the way you mentioned, 
> where generating uncompressed data via the GPU will set FlatCCS = 0, so you 
> can assume a compressed PAT entry and everything works.
> 
> One snag is...I've heard that CPU access doesn't work that way.  So, if you 
> mmap a buffer on the CPU, and write data with the CPU, then I think we're back 
> to the "FlatCCS contains uninitialized garbage" case, where it's unsafe to 
> assume a compressed PAT.  And... we don't really know when sharing buffers 
> whether the other side is going to want to do CPU access.

I think the previous discussion has specifically happened in the context of
dma-buf, so not only CPU but other GPUs/accelerators/decoders/devices in the
system are also relevant.

It boils down to the fact that when exporting a dma-buf, one can't know it will
be consumed only by the same GPU (or other device for that matter) unless there
is an explicit negotiation between exporter and importers.

> It would be really nice to assume compression by default, though, which got me 
> thinking: if we mmap a buffer via DRM_XE_GEM_MMAP_OFFSET, could xe.ko disable 
> compression for us?  So, resolve any outstanding CCS data, and then switch any 
> PAT entries to uncompressed.  Mapping would block until that resolve is done.  
> It could leave compression off forever (once you CPU map a buffer, it's never 
> compressed again).  Or it could turn CCS back on when map count reaches 0 (but 
> frankly I'm not sure that's terribly important, and sounds more complex).

This would only really work for a single device but the dma-buf is
specifically targeting more generic sharing. There's no built-in mechanism
to limit the sharing to subset of devices without explicit negotiation
between the exporter and importers.

So I think the "by default" mode needs to be interoperable, and the
explicit negotiation can then use less compatible formats given those FD
are never passed to importers that were not part of the negotiation.

> As I understand it, at least on discrete GPUs, the kernel already has to do 
> something similar for eviction, when migrating BOs to system memory (which 
> doesn't support compression).  So this would be doing basically the same 
> "resolve and disable CCS" step the kernel can presumably already do, but now 
> on mmap as well.

So far the eviction strategy has been to copy both the backing store and
compression bits in raw form. With Xe2 it would indeed be possible to do
an implicit resolve IFF the buffer has not been shared to someone who doesn't
understand compression and might have left garbage in the CCS bits.

When evicting in raw form, kernel doesn't have to know if the CCS bits
are garbage or not on any given moment.

Regards, Joonas

> 
> What do you think?  Viable?  Crazy?  Have I missed something?
> 
> --Ken
Thomas Hellstrom May 14, 2024, 4:51 p.m. UTC | #5
On Tue, 2024-05-14 at 12:25 +0300, Joonas Lahtinen wrote:
> Quoting Kenneth Graunke (2024-05-11 03:58:34)
> > On Tuesday, May 7, 2024 3:56:57 PM PDT Matt Roper wrote:
> > > On Mon, May 06, 2024 at 09:52:35PM +0300, Juha-Pekka Heikkila
> > > wrote:
> > > > These patches introduce I915_FORMAT_MOD_4_TILED_XE2_CCS
> > > > modifier, which,
> > > > from the kernel's perspective, behaves similarly to 
> > `I915_FORMAT_MOD_4_TILED`.
> > > > This new modifier is primarily intended for user space to
> > > > effectively 
> > monitor
> > > > compression status, especially when dealing with a mix of
> > > > compressed and
> > > > uncompressed buffers.
> > > > 
> > > > The addition of this modifier facilitates user space in
> > > > managing 
> > compression
> > > > status, particularly when utilizing both compressed and
> > > > uncompressed 
> > buffers
> > > > concurrently. To leverage compression for these buffers, user
> > > > space
> > > > applications must configure the appropriate Page Attribute
> > > > Table (PAT) 
> > index.
> > > > Display engine will treat all Tile4 as if it were compressed
> > > > under all
> > > > circumstances on Xe2 architecture.
> > > 
> > > I may have missed some discussion about this, but I thought the
> > > previous
> > > consensus was that we didn't want/need new modifiers for
> > > compression on
> > > Xe2?  If a userspace client (or the display hardware) receives a
> > > buffer
> > > of unknown origin and unknown compression status, it's always
> > > fine to
> > > select a compressed PAT when binding the buffer to read since
> > > even for
> > > uncompressed buffers the CCS metadata will accurately reflect the
> > > compression status.  Unlike Xe1, where generating content without
> > > compression enabled would leave random garbage in the FlatCCS
> > > area, Xe2
> > > will set the corresponding FlatCCS to '0x0' for each block,
> > > indicating
> > > uncompressed data.
> > > 
> > > Can you explain more what the benefit of handling these modifiers
> > > explicitly is?
> > > 
> > > 
> > > Matt
> > 
> > Thanks, Matt!  I'm a bit late in getting up to speed with the Xe2
> > compression 
> > changes; this is really good information.
> > 
> > As I understand it...all blocks on the GPU behave in the way you
> > mentioned, 
> > where generating uncompressed data via the GPU will set FlatCCS =
> > 0, so you 
> > can assume a compressed PAT entry and everything works.
> > 
> > One snag is...I've heard that CPU access doesn't work that way. 
> > So, if you 
> > mmap a buffer on the CPU, and write data with the CPU, then I think
> > we're back 
> > to the "FlatCCS contains uninitialized garbage" case, where it's
> > unsafe to 
> > assume a compressed PAT.  And... we don't really know when sharing
> > buffers 
> > whether the other side is going to want to do CPU access.
> 
> I think the previous discussion has specifically happened in the
> context of
> dma-buf, so not only CPU but other GPUs/accelerators/decoders/devices
> in the
> system are also relevant.
> 
> It boils down to the fact that when exporting a dma-buf, one can't
> know it will
> be consumed only by the same GPU (or other device for that matter)
> unless there
> is an explicit negotiation between exporter and importers.
> 
> > It would be really nice to assume compression by default, though,
> > which got me 
> > thinking: if we mmap a buffer via DRM_XE_GEM_MMAP_OFFSET, could
> > xe.ko disable 
> > compression for us?  So, resolve any outstanding CCS data, and then
> > switch any 
> > PAT entries to uncompressed.  Mapping would block until that
> > resolve is done.  
> > It could leave compression off forever (once you CPU map a buffer,
> > it's never 
> > compressed again).  Or it could turn CCS back on when map count
> > reaches 0 (but 
> > frankly I'm not sure that's terribly important, and sounds more
> > complex).
> 
> This would only really work for a single device but the dma-buf is
> specifically targeting more generic sharing. There's no built-in
> mechanism
> to limit the sharing to subset of devices without explicit
> negotiation
> between the exporter and importers.
> 
> So I think the "by default" mode needs to be interoperable, and the
> explicit negotiation can then use less compatible formats given those
> FD
> are never passed to importers that were not part of the negotiation.
> 
> > As I understand it, at least on discrete GPUs, the kernel already
> > has to do 
> > something similar for eviction, when migrating BOs to system memory
> > (which 
> > doesn't support compression).  So this would be doing basically the
> > same 
> > "resolve and disable CCS" step the kernel can presumably already
> > do, but now 
> > on mmap as well.
> 
> So far the eviction strategy has been to copy both the backing store
> and
> compression bits in raw form. With Xe2 it would indeed be possible to
> do
> an implicit resolve IFF the buffer has not been shared to someone who
> doesn't
> understand compression and might have left garbage in the CCS bits.
> 
> When evicting in raw form, kernel doesn't have to know if the CCS
> bits
> are garbage or not on any given moment.
> 
> Regards, Joonas

Just a follow-up comment (TBC), IF we're going for the everything-is-
compressed approach, I think there are some considerations to be made:

dma-buf exports to foreign devices need to resolve at map_attachment
time. Foreign devices are all devices that can't interpret the
compressed content.
dma-buf mmaps need to resolve. IIRC Could be implemented in the
DMA_BUF_IOCTL_SYNC callbacks that wrap cpu access.
dma-buf imports from foreign devices need to never use compressed PAT
for writing. Should KMD enforce this? Implicitly? Explicitly? I don't
see how UMD could know whether the imported dma-buf is from a foreign
device.

For mmaps of buffer objects of local device a resolve is needed. Having
KMD do that on a pagefault-basis is definitely possible, but will most
likely be terribly inefficient. Better to leave that to UMD?

/Thomas

> 
> > 
> > What do you think?  Viable?  Crazy?  Have I missed something?
> > 
> > --Ken