diff mbox series

[1/4] drm/fourcc: Add modifier definitions for describing Amlogic Video Framebuffer Compression

Message ID 20200221090845.7397-2-narmstrong@baylibre.com (mailing list archive)
State New, archived
Headers show
Series drm/meson: add support for Amlogic Video FBC | expand

Commit Message

Neil Armstrong Feb. 21, 2020, 9:08 a.m. UTC
Amlogic uses a proprietary lossless image compression protocol and format
for their hardware video codec accelerators, either video decoders or
video input encoders.

It considerably reduces memory bandwidth while writing and reading
frames in memory.

The underlying storage is considered to be 3 components, 8bit or 10-bit
per component, YCbCr 420, single plane :
- DRM_FORMAT_YUV420_8BIT
- DRM_FORMAT_YUV420_10BIT

This modifier will be notably added to DMA-BUF frames imported from the V4L2
Amlogic VDEC decoder.

At least two options are supported :
- Scatter mode: the buffer is filled with a IOMMU scatter table referring
  to the encoder current memory layout. This mode if more efficient in terms
  of memory allocation but frames are not dumpable and only valid during until
  the buffer is freed and back in control of the encoder
- Memory saving: when the pixel bpp is 8b, the size of the superblock can
  be reduced, thus saving memory.

Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
---
 include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
 1 file changed, 56 insertions(+)

Comments

Maxime Jourdan March 2, 2020, 4:28 p.m. UTC | #1
On Fri, Feb 21, 2020 at 10:09 AM Neil Armstrong <narmstrong@baylibre.com> wrote:
>
> Amlogic uses a proprietary lossless image compression protocol and format
> for their hardware video codec accelerators, either video decoders or
> video input encoders.
>
> It considerably reduces memory bandwidth while writing and reading
> frames in memory.
>
> The underlying storage is considered to be 3 components, 8bit or 10-bit
> per component, YCbCr 420, single plane :
> - DRM_FORMAT_YUV420_8BIT
> - DRM_FORMAT_YUV420_10BIT
>
> This modifier will be notably added to DMA-BUF frames imported from the V4L2
> Amlogic VDEC decoder.
>
> At least two options are supported :
> - Scatter mode: the buffer is filled with a IOMMU scatter table referring
>   to the encoder current memory layout. This mode if more efficient in terms
>   of memory allocation but frames are not dumpable and only valid during until
>   the buffer is freed and back in control of the encoder
> - Memory saving: when the pixel bpp is 8b, the size of the superblock can
>   be reduced, thus saving memory.
>
> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> ---
>  include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
>
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 8bc0b31597d8..8a6e87bacadb 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -309,6 +309,7 @@ extern "C" {
>  #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
>  #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
>  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> +#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
>
>  /* add more to the end as needed */
>
> @@ -804,6 +805,61 @@ extern "C" {
>   */
>  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
>
> +/*
> + * Amlogic Video Framebuffer Compression modifiers
> + *
> + * Amlogic uses a proprietary lossless image compression protocol and format
> + * for their hardware video codec accelerators, either video decoders or
> + * video input encoders.
> + *
> + * It considerably reduces memory bandwidth while writing and reading
> + * frames in memory.
> + * Implementation details may be platform and SoC specific, and shared
> + * between the producer and the decoder on the same platform.
> + *
> + * The underlying storage is considered to be 3 components, 8bit or 10-bit
> + * per component YCbCr 420, single plane :
> + * - DRM_FORMAT_YUV420_8BIT
> + * - DRM_FORMAT_YUV420_10BIT
> + *
> + * The classic memory storage is composed of:
> + * - a body content organized in 64x32 superblocks with 4096 bytes per
> + *   superblock in default mode.
> + * - a 32 bytes per 128x64 header block
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT fourcc_mod_code(AMLOGIC, 0)
> +
> +/*
> + * Amlogic Video Framebuffer Compression Options
> + *
> + * Two optional features are available which may not supported/used on every
> + * SoCs and Compressed Framebuffer producers.
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC(__modes) fourcc_mod_code(AMLOGIC, __modes)
> +
> +/*
> + * Amlogic FBC Scatter Memory layout
> + *
> + * Indicates the header contains IOMMU references to the compressed
> + * frames content to optimize memory access and layout.
> + * In this mode, only the header memory address is needed, thus the
> + * content memory organization is tied to the current producer
> + * execution and cannot be saved/dumped.
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER     (1ULL << 0)
> +
> +/*
> + * Amlogic FBC Memory Saving mode
> + *
> + * Indicates the storage is packed when pixel size is multiple of word
> + * boudaries, i.e. 8bit should be stored in this mode to save allocation
> + * memory.
> + *
> + * This mode reduces body layout to 3072 bytes per 64x32 superblock and
> + * 3200 bytes per 64x32 superblock combined with scatter mode.
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING  (1ULL << 1)
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> --
> 2.22.0
>
>

I'm the main developer of the V4L2 video decoder (H264, VP9..) on
amlogic platforms, which is a producer of such compressed frames.

Those modifiers suit well the combinations of options that can be
applied to the frames when created. I also helped testing the
following scenarios of decode+display on various SoCs:

- SM1: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER (10-bit & 8-bit video)
- SM1: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER +
DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (8-bit video)
- G12A: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER (10-bit & 8-bit video)
- G12A: DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER +
DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (8-bit video)
- GXL: DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT (10-bit & 8-bit video)
- GXL: DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING (8-bit video)

Reviewed-by: Maxime Jourdan <mjourdan@baylibre.com>
Pekka Paalanen March 3, 2020, 10:10 a.m. UTC | #2
On Fri, 21 Feb 2020 10:08:42 +0100
Neil Armstrong <narmstrong@baylibre.com> wrote:

> Amlogic uses a proprietary lossless image compression protocol and format
> for their hardware video codec accelerators, either video decoders or
> video input encoders.
> 
> It considerably reduces memory bandwidth while writing and reading
> frames in memory.
> 
> The underlying storage is considered to be 3 components, 8bit or 10-bit
> per component, YCbCr 420, single plane :
> - DRM_FORMAT_YUV420_8BIT
> - DRM_FORMAT_YUV420_10BIT
> 
> This modifier will be notably added to DMA-BUF frames imported from the V4L2
> Amlogic VDEC decoder.
> 
> At least two options are supported :
> - Scatter mode: the buffer is filled with a IOMMU scatter table referring
>   to the encoder current memory layout. This mode if more efficient in terms
>   of memory allocation but frames are not dumpable and only valid during until
>   the buffer is freed and back in control of the encoder
> - Memory saving: when the pixel bpp is 8b, the size of the superblock can
>   be reduced, thus saving memory.
> 
> Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> ---
>  include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
>  1 file changed, 56 insertions(+)
> 
> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> index 8bc0b31597d8..8a6e87bacadb 100644
> --- a/include/uapi/drm/drm_fourcc.h
> +++ b/include/uapi/drm/drm_fourcc.h
> @@ -309,6 +309,7 @@ extern "C" {
>  #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
>  #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
>  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> +#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
>  
>  /* add more to the end as needed */
>  
> @@ -804,6 +805,61 @@ extern "C" {
>   */
>  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
>  
> +/*
> + * Amlogic Video Framebuffer Compression modifiers
> + *
> + * Amlogic uses a proprietary lossless image compression protocol and format
> + * for their hardware video codec accelerators, either video decoders or
> + * video input encoders.
> + *
> + * It considerably reduces memory bandwidth while writing and reading
> + * frames in memory.
> + * Implementation details may be platform and SoC specific, and shared
> + * between the producer and the decoder on the same platform.

Hi,

after a lengthy IRC discussion on #dri-devel, this "may be platform and
SoC specific" is a problem.

It can be an issue in two ways:

- If something in the data acts like a sub-modifier, then advertising
  support for one modifier does not really tell if the data layout is
  supported or not.

- If you need to know the platform and/or SoC to be able to interpret
  the data, it means the modifier is ill-defined and cannot be used in
  inter-machine communication (e.g. Pipewire).

Neil mentioned the data contains a "header" that further specifies
things, but there is no specification about the header itself.
Therefore I don't think we can even know if the header contains
something that acts like a sub-modifier or not.

All this sounds like the modifier definitions here are not enough to
fully interpret the data. At the very least I would expect a reference
to a document explaining the "header", or even better, a kernel ReST
doc.

I wonder if this is at all suitable as a DRM format modifier as is. I
have been assuming that a modifier together with all the usual FB
parameters should be enough to interpret the stored data, but in this
case I have doubt it actually is.

I have no problem with proprietary data layouts as long as they are
fully specified.

I do feel like I would not be able to write a software decoder for this
set of modifiers given the details below.


Thanks,
pq

> + *
> + * The underlying storage is considered to be 3 components, 8bit or 10-bit
> + * per component YCbCr 420, single plane :
> + * - DRM_FORMAT_YUV420_8BIT
> + * - DRM_FORMAT_YUV420_10BIT
> + *
> + * The classic memory storage is composed of:
> + * - a body content organized in 64x32 superblocks with 4096 bytes per
> + *   superblock in default mode.
> + * - a 32 bytes per 128x64 header block
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT fourcc_mod_code(AMLOGIC, 0)
> +
> +/*
> + * Amlogic Video Framebuffer Compression Options
> + *
> + * Two optional features are available which may not supported/used on every
> + * SoCs and Compressed Framebuffer producers.
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC(__modes) fourcc_mod_code(AMLOGIC, __modes)
> +
> +/*
> + * Amlogic FBC Scatter Memory layout
> + *
> + * Indicates the header contains IOMMU references to the compressed
> + * frames content to optimize memory access and layout.
> + * In this mode, only the header memory address is needed, thus the
> + * content memory organization is tied to the current producer
> + * execution and cannot be saved/dumped.
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER	(1ULL << 0)
> +
> +/*
> + * Amlogic FBC Memory Saving mode
> + *
> + * Indicates the storage is packed when pixel size is multiple of word
> + * boudaries, i.e. 8bit should be stored in this mode to save allocation
> + * memory.
> + *
> + * This mode reduces body layout to 3072 bytes per 64x32 superblock and
> + * 3200 bytes per 64x32 superblock combined with scatter mode.
> + */
> +#define DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING	(1ULL << 1)
> +
>  #if defined(__cplusplus)
>  }
>  #endif
Brian Starkey March 3, 2020, 10:53 a.m. UTC | #3
Hi,

On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
> On Fri, 21 Feb 2020 10:08:42 +0100
> Neil Armstrong <narmstrong@baylibre.com> wrote:
> 
> > Amlogic uses a proprietary lossless image compression protocol and format
> > for their hardware video codec accelerators, either video decoders or
> > video input encoders.
> > 
> > It considerably reduces memory bandwidth while writing and reading
> > frames in memory.
> > 
> > The underlying storage is considered to be 3 components, 8bit or 10-bit
> > per component, YCbCr 420, single plane :
> > - DRM_FORMAT_YUV420_8BIT
> > - DRM_FORMAT_YUV420_10BIT
> > 
> > This modifier will be notably added to DMA-BUF frames imported from the V4L2
> > Amlogic VDEC decoder.
> > 
> > At least two options are supported :
> > - Scatter mode: the buffer is filled with a IOMMU scatter table referring
> >   to the encoder current memory layout. This mode if more efficient in terms
> >   of memory allocation but frames are not dumpable and only valid during until
> >   the buffer is freed and back in control of the encoder
> > - Memory saving: when the pixel bpp is 8b, the size of the superblock can
> >   be reduced, thus saving memory.
> > 
> > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> > ---
> >  include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
> >  1 file changed, 56 insertions(+)
> > 
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > index 8bc0b31597d8..8a6e87bacadb 100644
> > --- a/include/uapi/drm/drm_fourcc.h
> > +++ b/include/uapi/drm/drm_fourcc.h
> > @@ -309,6 +309,7 @@ extern "C" {
> >  #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
> >  #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
> >  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> > +#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> >  
> >  /* add more to the end as needed */
> >  
> > @@ -804,6 +805,61 @@ extern "C" {
> >   */
> >  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
> >  
> > +/*
> > + * Amlogic Video Framebuffer Compression modifiers
> > + *
> > + * Amlogic uses a proprietary lossless image compression protocol and format
> > + * for their hardware video codec accelerators, either video decoders or
> > + * video input encoders.
> > + *
> > + * It considerably reduces memory bandwidth while writing and reading
> > + * frames in memory.
> > + * Implementation details may be platform and SoC specific, and shared
> > + * between the producer and the decoder on the same platform.
> 
> Hi,
> 
> after a lengthy IRC discussion on #dri-devel, this "may be platform and
> SoC specific" is a problem.
> 
> It can be an issue in two ways:
> 
> - If something in the data acts like a sub-modifier, then advertising
>   support for one modifier does not really tell if the data layout is
>   supported or not.
> 
> - If you need to know the platform and/or SoC to be able to interpret
>   the data, it means the modifier is ill-defined and cannot be used in
>   inter-machine communication (e.g. Pipewire).
> 

Playing devil's advocate, the comment sounds similar to
I915_FORMAT_MOD_{X,Y}_TILED:

 * This format is highly platforms specific and not useful for cross-driver
 * sharing. It exists since on a given platform it does uniquely identify the
 * layout in a simple way for i915-specific userspace.

Isn't the statement that this for sharing between producer and decoder
_on the same platform_ a similar clause with the same effect?

What advantage is there to exposing the gory details? For Arm AFBC
it's necessary because IP on the SoC can be (likely to be) from
different vendors with different capabilities.

If this is only for talking between Amlogic IP on the same SoC, and
those devices support all the same "flavours", I don't see what is
gained by making userspace care about internals.

Thanks,
-Brian

> Neil mentioned the data contains a "header" that further specifies
> things, but there is no specification about the header itself.
> Therefore I don't think we can even know if the header contains
> something that acts like a sub-modifier or not.
> 
> All this sounds like the modifier definitions here are not enough to
> fully interpret the data. At the very least I would expect a reference
> to a document explaining the "header", or even better, a kernel ReST
> doc.
> 
> I wonder if this is at all suitable as a DRM format modifier as is. I
> have been assuming that a modifier together with all the usual FB
> parameters should be enough to interpret the stored data, but in this
> case I have doubt it actually is.
> 
> I have no problem with proprietary data layouts as long as they are
> fully specified.
> 
> I do feel like I would not be able to write a software decoder for this
> set of modifiers given the details below.
> 
> 
> Thanks,
> pq
>
Daniel Vetter March 3, 2020, 11:37 a.m. UTC | #4
On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey <brian.starkey@arm.com> wrote:
>
> Hi,
>
> On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
> > On Fri, 21 Feb 2020 10:08:42 +0100
> > Neil Armstrong <narmstrong@baylibre.com> wrote:
> >
> > > Amlogic uses a proprietary lossless image compression protocol and format
> > > for their hardware video codec accelerators, either video decoders or
> > > video input encoders.
> > >
> > > It considerably reduces memory bandwidth while writing and reading
> > > frames in memory.
> > >
> > > The underlying storage is considered to be 3 components, 8bit or 10-bit
> > > per component, YCbCr 420, single plane :
> > > - DRM_FORMAT_YUV420_8BIT
> > > - DRM_FORMAT_YUV420_10BIT
> > >
> > > This modifier will be notably added to DMA-BUF frames imported from the V4L2
> > > Amlogic VDEC decoder.
> > >
> > > At least two options are supported :
> > > - Scatter mode: the buffer is filled with a IOMMU scatter table referring
> > >   to the encoder current memory layout. This mode if more efficient in terms
> > >   of memory allocation but frames are not dumpable and only valid during until
> > >   the buffer is freed and back in control of the encoder
> > > - Memory saving: when the pixel bpp is 8b, the size of the superblock can
> > >   be reduced, thus saving memory.
> > >
> > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> > > ---
> > >  include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
> > >  1 file changed, 56 insertions(+)
> > >
> > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > > index 8bc0b31597d8..8a6e87bacadb 100644
> > > --- a/include/uapi/drm/drm_fourcc.h
> > > +++ b/include/uapi/drm/drm_fourcc.h
> > > @@ -309,6 +309,7 @@ extern "C" {
> > >  #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
> > >  #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
> > >  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> > > +#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> > >
> > >  /* add more to the end as needed */
> > >
> > > @@ -804,6 +805,61 @@ extern "C" {
> > >   */
> > >  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
> > >
> > > +/*
> > > + * Amlogic Video Framebuffer Compression modifiers
> > > + *
> > > + * Amlogic uses a proprietary lossless image compression protocol and format
> > > + * for their hardware video codec accelerators, either video decoders or
> > > + * video input encoders.
> > > + *
> > > + * It considerably reduces memory bandwidth while writing and reading
> > > + * frames in memory.
> > > + * Implementation details may be platform and SoC specific, and shared
> > > + * between the producer and the decoder on the same platform.
> >
> > Hi,
> >
> > after a lengthy IRC discussion on #dri-devel, this "may be platform and
> > SoC specific" is a problem.
> >
> > It can be an issue in two ways:
> >
> > - If something in the data acts like a sub-modifier, then advertising
> >   support for one modifier does not really tell if the data layout is
> >   supported or not.
> >
> > - If you need to know the platform and/or SoC to be able to interpret
> >   the data, it means the modifier is ill-defined and cannot be used in
> >   inter-machine communication (e.g. Pipewire).
> >
>
> Playing devil's advocate, the comment sounds similar to
> I915_FORMAT_MOD_{X,Y}_TILED:
>
>  * This format is highly platforms specific and not useful for cross-driver
>  * sharing. It exists since on a given platform it does uniquely identify the
>  * layout in a simple way for i915-specific userspace.

Yeah which we regret now. We need to now roll out a new set of
modifiers for at least some of the differences in these on the
modern-ish chips (the old crap is pretty much lost cause anyway).

This was kinda a nasty hack to smooth things over since we have epic
amounts of userspace, but it's really not a great idea (and no one
else really has epic amounts of existing userspace that uses tiling
flags everywhere, this is all new code).
-Daniel

> Isn't the statement that this for sharing between producer and decoder
> _on the same platform_ a similar clause with the same effect?
>
> What advantage is there to exposing the gory details? For Arm AFBC
> it's necessary because IP on the SoC can be (likely to be) from
> different vendors with different capabilities.
>
> If this is only for talking between Amlogic IP on the same SoC, and
> those devices support all the same "flavours", I don't see what is
> gained by making userspace care about internals.

The trouble is if you mix&match IP cores, and one of them supports
flavours A, B, C and the other C, D, E. But all you have is a single
magic modifier for "whatever the flavour is that soc prefers". So
someone gets to stuff this in DT.

Also eventually, maybe, perhaps ARM does grow up into the
client/server space with add-on pcie graphics, and at least for client
you very often end up with integrated + add-in pcie gpu. At that point
you really can't have magic per-soc modifiers anymore.

If people get confused I'm happy to add a "WARNING: This was a dumb
idea for backwards compat with legacy code, no one with new stuff ever
repeat it" to the i915 modifers.
-Daniel

>
> Thanks,
> -Brian
>
> > Neil mentioned the data contains a "header" that further specifies
> > things, but there is no specification about the header itself.
> > Therefore I don't think we can even know if the header contains
> > something that acts like a sub-modifier or not.
> >
> > All this sounds like the modifier definitions here are not enough to
> > fully interpret the data. At the very least I would expect a reference
> > to a document explaining the "header", or even better, a kernel ReST
> > doc.
> >
> > I wonder if this is at all suitable as a DRM format modifier as is. I
> > have been assuming that a modifier together with all the usual FB
> > parameters should be enough to interpret the stored data, but in this
> > case I have doubt it actually is.
> >
> > I have no problem with proprietary data layouts as long as they are
> > fully specified.
> >
> > I do feel like I would not be able to write a software decoder for this
> > set of modifiers given the details below.
> >
> >
> > Thanks,
> > pq
> >
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
Pekka Paalanen March 3, 2020, 1:25 p.m. UTC | #5
On Tue, 3 Mar 2020 12:37:16 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey <brian.starkey@arm.com> wrote:
> >
> > Hi,
> >
> > On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:  
> > > On Fri, 21 Feb 2020 10:08:42 +0100
> > > Neil Armstrong <narmstrong@baylibre.com> wrote:
> > >  
> > > > Amlogic uses a proprietary lossless image compression protocol and format
> > > > for their hardware video codec accelerators, either video decoders or
> > > > video input encoders.
> > > >
> > > > It considerably reduces memory bandwidth while writing and reading
> > > > frames in memory.
> > > >
> > > > The underlying storage is considered to be 3 components, 8bit or 10-bit
> > > > per component, YCbCr 420, single plane :
> > > > - DRM_FORMAT_YUV420_8BIT
> > > > - DRM_FORMAT_YUV420_10BIT
> > > >
> > > > This modifier will be notably added to DMA-BUF frames imported from the V4L2
> > > > Amlogic VDEC decoder.
> > > >
> > > > At least two options are supported :
> > > > - Scatter mode: the buffer is filled with a IOMMU scatter table referring
> > > >   to the encoder current memory layout. This mode if more efficient in terms
> > > >   of memory allocation but frames are not dumpable and only valid during until
> > > >   the buffer is freed and back in control of the encoder
> > > > - Memory saving: when the pixel bpp is 8b, the size of the superblock can
> > > >   be reduced, thus saving memory.
> > > >
> > > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> > > > ---
> > > >  include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 56 insertions(+)
> > > >
> > > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > > > index 8bc0b31597d8..8a6e87bacadb 100644
> > > > --- a/include/uapi/drm/drm_fourcc.h
> > > > +++ b/include/uapi/drm/drm_fourcc.h
> > > > @@ -309,6 +309,7 @@ extern "C" {
> > > >  #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
> > > >  #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
> > > >  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> > > > +#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> > > >
> > > >  /* add more to the end as needed */
> > > >
> > > > @@ -804,6 +805,61 @@ extern "C" {
> > > >   */
> > > >  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
> > > >
> > > > +/*
> > > > + * Amlogic Video Framebuffer Compression modifiers
> > > > + *
> > > > + * Amlogic uses a proprietary lossless image compression protocol and format
> > > > + * for their hardware video codec accelerators, either video decoders or
> > > > + * video input encoders.
> > > > + *
> > > > + * It considerably reduces memory bandwidth while writing and reading
> > > > + * frames in memory.
> > > > + * Implementation details may be platform and SoC specific, and shared
> > > > + * between the producer and the decoder on the same platform.  
> > >
> > > Hi,
> > >
> > > after a lengthy IRC discussion on #dri-devel, this "may be platform and
> > > SoC specific" is a problem.
> > >
> > > It can be an issue in two ways:
> > >
> > > - If something in the data acts like a sub-modifier, then advertising
> > >   support for one modifier does not really tell if the data layout is
> > >   supported or not.
> > >
> > > - If you need to know the platform and/or SoC to be able to interpret
> > >   the data, it means the modifier is ill-defined and cannot be used in
> > >   inter-machine communication (e.g. Pipewire).
> > >  
> >
> > Playing devil's advocate, the comment sounds similar to
> > I915_FORMAT_MOD_{X,Y}_TILED:
> >
> >  * This format is highly platforms specific and not useful for cross-driver
> >  * sharing. It exists since on a given platform it does uniquely identify the
> >  * layout in a simple way for i915-specific userspace.  
> 
> Yeah which we regret now. We need to now roll out a new set of
> modifiers for at least some of the differences in these on the
> modern-ish chips (the old crap is pretty much lost cause anyway).
> 
> This was kinda a nasty hack to smooth things over since we have epic
> amounts of userspace, but it's really not a great idea (and no one
> else really has epic amounts of existing userspace that uses tiling
> flags everywhere, this is all new code).
> -Daniel
> 
> > Isn't the statement that this for sharing between producer and decoder
> > _on the same platform_ a similar clause with the same effect?
> >
> > What advantage is there to exposing the gory details? For Arm AFBC
> > it's necessary because IP on the SoC can be (likely to be) from
> > different vendors with different capabilities.
> >
> > If this is only for talking between Amlogic IP on the same SoC, and
> > those devices support all the same "flavours", I don't see what is
> > gained by making userspace care about internals.  
> 
> The trouble is if you mix&match IP cores, and one of them supports
> flavours A, B, C and the other C, D, E. But all you have is a single
> magic modifier for "whatever the flavour is that soc prefers". So
> someone gets to stuff this in DT.
> 
> Also eventually, maybe, perhaps ARM does grow up into the
> client/server space with add-on pcie graphics, and at least for client
> you very often end up with integrated + add-in pcie gpu. At that point
> you really can't have magic per-soc modifiers anymore.

Hi,

I also heard that Pipewire will copy buffers and modifiers verbatim
from one machine to another when streaming across network, assuming
that the same modifier means the same thing on all machines.[Citation needed]

If that is something that must not be done with DRM modifiers, then
please contact them and document that.


Thanks,
pq


> If people get confused I'm happy to add a "WARNING: This was a dumb
> idea for backwards compat with legacy code, no one with new stuff ever
> repeat it" to the i915 modifers.
> -Daniel
> 
> >
> > Thanks,
> > -Brian
> >  
> > > Neil mentioned the data contains a "header" that further specifies
> > > things, but there is no specification about the header itself.
> > > Therefore I don't think we can even know if the header contains
> > > something that acts like a sub-modifier or not.
> > >
> > > All this sounds like the modifier definitions here are not enough to
> > > fully interpret the data. At the very least I would expect a reference
> > > to a document explaining the "header", or even better, a kernel ReST
> > > doc.
> > >
> > > I wonder if this is at all suitable as a DRM format modifier as is. I
> > > have been assuming that a modifier together with all the usual FB
> > > parameters should be enough to interpret the stored data, but in this
> > > case I have doubt it actually is.
> > >
> > > I have no problem with proprietary data layouts as long as they are
> > > fully specified.
> > >
> > > I do feel like I would not be able to write a software decoder for this
> > > set of modifiers given the details below.
> > >
> > >
> > > Thanks,
> > > pq
> > >
Brian Starkey March 3, 2020, 1:43 p.m. UTC | #6
On Tue, Mar 03, 2020 at 12:37:16PM +0100, Daniel Vetter wrote:
> On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey <brian.starkey@arm.com> wrote:
> >
> > Hi,
> >
> > On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:
> > > On Fri, 21 Feb 2020 10:08:42 +0100
> > > Neil Armstrong <narmstrong@baylibre.com> wrote:
> > >
> > > > Amlogic uses a proprietary lossless image compression protocol and format
> > > > for their hardware video codec accelerators, either video decoders or
> > > > video input encoders.
> > > >
> > > > It considerably reduces memory bandwidth while writing and reading
> > > > frames in memory.
> > > >
> > > > The underlying storage is considered to be 3 components, 8bit or 10-bit
> > > > per component, YCbCr 420, single plane :
> > > > - DRM_FORMAT_YUV420_8BIT
> > > > - DRM_FORMAT_YUV420_10BIT
> > > >
> > > > This modifier will be notably added to DMA-BUF frames imported from the V4L2
> > > > Amlogic VDEC decoder.
> > > >
> > > > At least two options are supported :
> > > > - Scatter mode: the buffer is filled with a IOMMU scatter table referring
> > > >   to the encoder current memory layout. This mode if more efficient in terms
> > > >   of memory allocation but frames are not dumpable and only valid during until
> > > >   the buffer is freed and back in control of the encoder
> > > > - Memory saving: when the pixel bpp is 8b, the size of the superblock can
> > > >   be reduced, thus saving memory.
> > > >
> > > > Signed-off-by: Neil Armstrong <narmstrong@baylibre.com>
> > > > ---
> > > >  include/uapi/drm/drm_fourcc.h | 56 +++++++++++++++++++++++++++++++++++
> > > >  1 file changed, 56 insertions(+)
> > > >
> > > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
> > > > index 8bc0b31597d8..8a6e87bacadb 100644
> > > > --- a/include/uapi/drm/drm_fourcc.h
> > > > +++ b/include/uapi/drm/drm_fourcc.h
> > > > @@ -309,6 +309,7 @@ extern "C" {
> > > >  #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
> > > >  #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
> > > >  #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
> > > > +#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
> > > >
> > > >  /* add more to the end as needed */
> > > >
> > > > @@ -804,6 +805,61 @@ extern "C" {
> > > >   */
> > > >  #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
> > > >
> > > > +/*
> > > > + * Amlogic Video Framebuffer Compression modifiers
> > > > + *
> > > > + * Amlogic uses a proprietary lossless image compression protocol and format
> > > > + * for their hardware video codec accelerators, either video decoders or
> > > > + * video input encoders.
> > > > + *
> > > > + * It considerably reduces memory bandwidth while writing and reading
> > > > + * frames in memory.
> > > > + * Implementation details may be platform and SoC specific, and shared
> > > > + * between the producer and the decoder on the same platform.
> > >
> > > Hi,
> > >
> > > after a lengthy IRC discussion on #dri-devel, this "may be platform and
> > > SoC specific" is a problem.
> > >
> > > It can be an issue in two ways:
> > >
> > > - If something in the data acts like a sub-modifier, then advertising
> > >   support for one modifier does not really tell if the data layout is
> > >   supported or not.
> > >
> > > - If you need to know the platform and/or SoC to be able to interpret
> > >   the data, it means the modifier is ill-defined and cannot be used in
> > >   inter-machine communication (e.g. Pipewire).
> > >
> >
> > Playing devil's advocate, the comment sounds similar to
> > I915_FORMAT_MOD_{X,Y}_TILED:
> >
> >  * This format is highly platforms specific and not useful for cross-driver
> >  * sharing. It exists since on a given platform it does uniquely identify the
> >  * layout in a simple way for i915-specific userspace.
> 
> Yeah which we regret now. We need to now roll out a new set of
> modifiers for at least some of the differences in these on the
> modern-ish chips (the old crap is pretty much lost cause anyway).
> 
> This was kinda a nasty hack to smooth things over since we have epic
> amounts of userspace, but it's really not a great idea (and no one
> else really has epic amounts of existing userspace that uses tiling
> flags everywhere, this is all new code).
> -Daniel
> 
> > Isn't the statement that this for sharing between producer and decoder
> > _on the same platform_ a similar clause with the same effect?
> >
> > What advantage is there to exposing the gory details? For Arm AFBC
> > it's necessary because IP on the SoC can be (likely to be) from
> > different vendors with different capabilities.
> >
> > If this is only for talking between Amlogic IP on the same SoC, and
> > those devices support all the same "flavours", I don't see what is
> > gained by making userspace care about internals.
> 
> The trouble is if you mix&match IP cores, and one of them supports
> flavours A, B, C and the other C, D, E. But all you have is a single
> magic modifier for "whatever the flavour is that soc prefers". So
> someone gets to stuff this in DT.
> 

Yes, if incompatible support levels are possible, then they must be
described, no disagreement there. That's why AFBC is so explicit.

> Also eventually, maybe, perhaps ARM does grow up into the
> client/server space with add-on pcie graphics, and at least for client
> you very often end up with integrated + add-in pcie gpu. At that point
> you really can't have magic per-soc modifiers anymore.
> 

I don't entirely agree. This is only relevant for modifiers which
might be used between the PCIe GPU and the SoC (in your example).
Per-SoC modifiers still work, they just lose meaning at the SoC
boundary.

Looking at the description of DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER in
particular, it sounds like that would never be shareable even if it
had a more "complete" modifier.

> If people get confused I'm happy to add a "WARNING: This was a dumb
> idea for backwards compat with legacy code, no one with new stuff ever
> repeat it" to the i915 modifers.
> -Daniel

I think marking it as non-preferred (and why) would be a good idea, so
as not to use it as an example.

Cheers,
-Brian

> 
> >
> > Thanks,
> > -Brian
> >
> > > Neil mentioned the data contains a "header" that further specifies
> > > things, but there is no specification about the header itself.
> > > Therefore I don't think we can even know if the header contains
> > > something that acts like a sub-modifier or not.
> > >
> > > All this sounds like the modifier definitions here are not enough to
> > > fully interpret the data. At the very least I would expect a reference
> > > to a document explaining the "header", or even better, a kernel ReST
> > > doc.
> > >
> > > I wonder if this is at all suitable as a DRM format modifier as is. I
> > > have been assuming that a modifier together with all the usual FB
> > > parameters should be enough to interpret the stored data, but in this
> > > case I have doubt it actually is.
> > >
> > > I have no problem with proprietary data layouts as long as they are
> > > fully specified.
> > >
> > > I do feel like I would not be able to write a software decoder for this
> > > set of modifiers given the details below.
> > >
> > >
> > > Thanks,
> > > pq
> > >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> 
> 
> -- 
> Daniel Vetter
> Software Engineer, Intel Corporation
> +41 (0) 79 365 57 48 - http://blog.ffwll.ch
Pekka Paalanen March 3, 2020, 3:33 p.m. UTC | #7
On Tue, 3 Mar 2020 15:25:41 +0200
Pekka Paalanen <ppaalanen@gmail.com> wrote:

> On Tue, 3 Mar 2020 12:37:16 +0100
> Daniel Vetter <daniel@ffwll.ch> wrote:
> 
> > On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey <brian.starkey@arm.com> wrote:  
> > >
> > > Hi,
> > >
> > > On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:    
> > > > On Fri, 21 Feb 2020 10:08:42 +0100
> > > > Neil Armstrong <narmstrong@baylibre.com> wrote:
> > > >    
...
> > > > > +/*
> > > > > + * Amlogic Video Framebuffer Compression modifiers
> > > > > + *
> > > > > + * Amlogic uses a proprietary lossless image compression protocol and format
> > > > > + * for their hardware video codec accelerators, either video decoders or
> > > > > + * video input encoders.
> > > > > + *
> > > > > + * It considerably reduces memory bandwidth while writing and reading
> > > > > + * frames in memory.
> > > > > + * Implementation details may be platform and SoC specific, and shared
> > > > > + * between the producer and the decoder on the same platform.    
> > > >
> > > > Hi,
> > > >
> > > > after a lengthy IRC discussion on #dri-devel, this "may be platform and
> > > > SoC specific" is a problem.
> > > >
> > > > It can be an issue in two ways:
> > > >
> > > > - If something in the data acts like a sub-modifier, then advertising
> > > >   support for one modifier does not really tell if the data layout is
> > > >   supported or not.
> > > >
> > > > - If you need to know the platform and/or SoC to be able to interpret
> > > >   the data, it means the modifier is ill-defined and cannot be used in
> > > >   inter-machine communication (e.g. Pipewire).
> > > >    
> > >
> > > Playing devil's advocate, the comment sounds similar to
> > > I915_FORMAT_MOD_{X,Y}_TILED:
> > >
> > >  * This format is highly platforms specific and not useful for cross-driver
> > >  * sharing. It exists since on a given platform it does uniquely identify the
> > >  * layout in a simple way for i915-specific userspace.    
> > 
> > Yeah which we regret now. We need to now roll out a new set of
> > modifiers for at least some of the differences in these on the
> > modern-ish chips (the old crap is pretty much lost cause anyway).
> > 
> > This was kinda a nasty hack to smooth things over since we have epic
> > amounts of userspace, but it's really not a great idea (and no one
> > else really has epic amounts of existing userspace that uses tiling
> > flags everywhere, this is all new code).
> > -Daniel
> >   
> > > Isn't the statement that this for sharing between producer and decoder
> > > _on the same platform_ a similar clause with the same effect?
> > >
> > > What advantage is there to exposing the gory details? For Arm AFBC
> > > it's necessary because IP on the SoC can be (likely to be) from
> > > different vendors with different capabilities.
> > >
> > > If this is only for talking between Amlogic IP on the same SoC, and
> > > those devices support all the same "flavours", I don't see what is
> > > gained by making userspace care about internals.    
> > 
> > The trouble is if you mix&match IP cores, and one of them supports
> > flavours A, B, C and the other C, D, E. But all you have is a single
> > magic modifier for "whatever the flavour is that soc prefers". So
> > someone gets to stuff this in DT.
> > 
> > Also eventually, maybe, perhaps ARM does grow up into the
> > client/server space with add-on pcie graphics, and at least for client
> > you very often end up with integrated + add-in pcie gpu. At that point
> > you really can't have magic per-soc modifiers anymore.  
> 
> Hi,
> 
> I also heard that Pipewire will copy buffers and modifiers verbatim
> from one machine to another when streaming across network, assuming
> that the same modifier means the same thing on all machines.[Citation needed]
> 
> If that is something that must not be done with DRM modifiers, then
> please contact them and document that.

Sorry, it's waypipe, not pipewire:
https://gitlab.freedesktop.org/mstoeckl/waypipe/


Thanks,
pq
Daniel Vetter March 6, 2020, 10:13 a.m. UTC | #8
On Tue, Mar 03, 2020 at 05:33:32PM +0200, Pekka Paalanen wrote:
> On Tue, 3 Mar 2020 15:25:41 +0200
> Pekka Paalanen <ppaalanen@gmail.com> wrote:
> 
> > On Tue, 3 Mar 2020 12:37:16 +0100
> > Daniel Vetter <daniel@ffwll.ch> wrote:
> > 
> > > On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey <brian.starkey@arm.com> wrote:  
> > > >
> > > > Hi,
> > > >
> > > > On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:    
> > > > > On Fri, 21 Feb 2020 10:08:42 +0100
> > > > > Neil Armstrong <narmstrong@baylibre.com> wrote:
> > > > >    
> ...
> > > > > > +/*
> > > > > > + * Amlogic Video Framebuffer Compression modifiers
> > > > > > + *
> > > > > > + * Amlogic uses a proprietary lossless image compression protocol and format
> > > > > > + * for their hardware video codec accelerators, either video decoders or
> > > > > > + * video input encoders.
> > > > > > + *
> > > > > > + * It considerably reduces memory bandwidth while writing and reading
> > > > > > + * frames in memory.
> > > > > > + * Implementation details may be platform and SoC specific, and shared
> > > > > > + * between the producer and the decoder on the same platform.    
> > > > >
> > > > > Hi,
> > > > >
> > > > > after a lengthy IRC discussion on #dri-devel, this "may be platform and
> > > > > SoC specific" is a problem.
> > > > >
> > > > > It can be an issue in two ways:
> > > > >
> > > > > - If something in the data acts like a sub-modifier, then advertising
> > > > >   support for one modifier does not really tell if the data layout is
> > > > >   supported or not.
> > > > >
> > > > > - If you need to know the platform and/or SoC to be able to interpret
> > > > >   the data, it means the modifier is ill-defined and cannot be used in
> > > > >   inter-machine communication (e.g. Pipewire).
> > > > >    
> > > >
> > > > Playing devil's advocate, the comment sounds similar to
> > > > I915_FORMAT_MOD_{X,Y}_TILED:
> > > >
> > > >  * This format is highly platforms specific and not useful for cross-driver
> > > >  * sharing. It exists since on a given platform it does uniquely identify the
> > > >  * layout in a simple way for i915-specific userspace.    
> > > 
> > > Yeah which we regret now. We need to now roll out a new set of
> > > modifiers for at least some of the differences in these on the
> > > modern-ish chips (the old crap is pretty much lost cause anyway).
> > > 
> > > This was kinda a nasty hack to smooth things over since we have epic
> > > amounts of userspace, but it's really not a great idea (and no one
> > > else really has epic amounts of existing userspace that uses tiling
> > > flags everywhere, this is all new code).
> > > -Daniel
> > >   
> > > > Isn't the statement that this for sharing between producer and decoder
> > > > _on the same platform_ a similar clause with the same effect?
> > > >
> > > > What advantage is there to exposing the gory details? For Arm AFBC
> > > > it's necessary because IP on the SoC can be (likely to be) from
> > > > different vendors with different capabilities.
> > > >
> > > > If this is only for talking between Amlogic IP on the same SoC, and
> > > > those devices support all the same "flavours", I don't see what is
> > > > gained by making userspace care about internals.    
> > > 
> > > The trouble is if you mix&match IP cores, and one of them supports
> > > flavours A, B, C and the other C, D, E. But all you have is a single
> > > magic modifier for "whatever the flavour is that soc prefers". So
> > > someone gets to stuff this in DT.
> > > 
> > > Also eventually, maybe, perhaps ARM does grow up into the
> > > client/server space with add-on pcie graphics, and at least for client
> > > you very often end up with integrated + add-in pcie gpu. At that point
> > > you really can't have magic per-soc modifiers anymore.  
> > 
> > Hi,
> > 
> > I also heard that Pipewire will copy buffers and modifiers verbatim
> > from one machine to another when streaming across network, assuming
> > that the same modifier means the same thing on all machines.[Citation needed]
> > 
> > If that is something that must not be done with DRM modifiers, then
> > please contact them and document that.
> 
> Sorry, it's waypipe, not pipewire:
> https://gitlab.freedesktop.org/mstoeckl/waypipe/

I do think this is very much something we want to make possible. They
might pick a silly modifier (compression modifiers only compress bw, by
necessity the lossless ones have to increase storage space so kinda dumb
thing to push over the network if you don't add .xz or whatever on top).

I'm also hoping that intel's modifiers are definitely the one and only
that we ever screwed up, and we should be getting those fixed in the near
future too.

So maybe what we should do instead is add a comment to the modifier docs
that this stuff _is_ supposed to be transferrable over networks and work.
-Daniel
Pekka Paalanen March 6, 2020, 12:30 p.m. UTC | #9
On Fri, 6 Mar 2020 11:13:28 +0100
Daniel Vetter <daniel@ffwll.ch> wrote:

> On Tue, Mar 03, 2020 at 05:33:32PM +0200, Pekka Paalanen wrote:
> > On Tue, 3 Mar 2020 15:25:41 +0200
> > Pekka Paalanen <ppaalanen@gmail.com> wrote:
> >   
> > > On Tue, 3 Mar 2020 12:37:16 +0100
> > > Daniel Vetter <daniel@ffwll.ch> wrote:
> > >   
> > > > On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey <brian.starkey@arm.com> wrote:    
> > > > >
> > > > > Hi,
> > > > >
> > > > > On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:      
> > > > > > On Fri, 21 Feb 2020 10:08:42 +0100
> > > > > > Neil Armstrong <narmstrong@baylibre.com> wrote:
> > > > > >      
> > ...  
> > > > > > > +/*
> > > > > > > + * Amlogic Video Framebuffer Compression modifiers
> > > > > > > + *
> > > > > > > + * Amlogic uses a proprietary lossless image compression protocol and format
> > > > > > > + * for their hardware video codec accelerators, either video decoders or
> > > > > > > + * video input encoders.
> > > > > > > + *
> > > > > > > + * It considerably reduces memory bandwidth while writing and reading
> > > > > > > + * frames in memory.
> > > > > > > + * Implementation details may be platform and SoC specific, and shared
> > > > > > > + * between the producer and the decoder on the same platform.      
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > after a lengthy IRC discussion on #dri-devel, this "may be platform and
> > > > > > SoC specific" is a problem.
> > > > > >
> > > > > > It can be an issue in two ways:
> > > > > >
> > > > > > - If something in the data acts like a sub-modifier, then advertising
> > > > > >   support for one modifier does not really tell if the data layout is
> > > > > >   supported or not.
> > > > > >
> > > > > > - If you need to know the platform and/or SoC to be able to interpret
> > > > > >   the data, it means the modifier is ill-defined and cannot be used in
> > > > > >   inter-machine communication (e.g. Pipewire).
> > > > > >      
> > > > >
> > > > > Playing devil's advocate, the comment sounds similar to
> > > > > I915_FORMAT_MOD_{X,Y}_TILED:
> > > > >
> > > > >  * This format is highly platforms specific and not useful for cross-driver
> > > > >  * sharing. It exists since on a given platform it does uniquely identify the
> > > > >  * layout in a simple way for i915-specific userspace.      
> > > > 
> > > > Yeah which we regret now. We need to now roll out a new set of
> > > > modifiers for at least some of the differences in these on the
> > > > modern-ish chips (the old crap is pretty much lost cause anyway).
> > > > 
> > > > This was kinda a nasty hack to smooth things over since we have epic
> > > > amounts of userspace, but it's really not a great idea (and no one
> > > > else really has epic amounts of existing userspace that uses tiling
> > > > flags everywhere, this is all new code).
> > > > -Daniel
> > > >     
> > > > > Isn't the statement that this for sharing between producer and decoder
> > > > > _on the same platform_ a similar clause with the same effect?
> > > > >
> > > > > What advantage is there to exposing the gory details? For Arm AFBC
> > > > > it's necessary because IP on the SoC can be (likely to be) from
> > > > > different vendors with different capabilities.
> > > > >
> > > > > If this is only for talking between Amlogic IP on the same SoC, and
> > > > > those devices support all the same "flavours", I don't see what is
> > > > > gained by making userspace care about internals.      
> > > > 
> > > > The trouble is if you mix&match IP cores, and one of them supports
> > > > flavours A, B, C and the other C, D, E. But all you have is a single
> > > > magic modifier for "whatever the flavour is that soc prefers". So
> > > > someone gets to stuff this in DT.
> > > > 
> > > > Also eventually, maybe, perhaps ARM does grow up into the
> > > > client/server space with add-on pcie graphics, and at least for client
> > > > you very often end up with integrated + add-in pcie gpu. At that point
> > > > you really can't have magic per-soc modifiers anymore.    
> > > 
> > > Hi,
> > > 
> > > I also heard that Pipewire will copy buffers and modifiers verbatim
> > > from one machine to another when streaming across network, assuming
> > > that the same modifier means the same thing on all machines.[Citation needed]
> > > 
> > > If that is something that must not be done with DRM modifiers, then
> > > please contact them and document that.  
> > 
> > Sorry, it's waypipe, not pipewire:
> > https://gitlab.freedesktop.org/mstoeckl/waypipe/  
> 
> I do think this is very much something we want to make possible. They
> might pick a silly modifier (compression modifiers only compress bw, by
> necessity the lossless ones have to increase storage space so kinda dumb
> thing to push over the network if you don't add .xz or whatever on top).
> 
> I'm also hoping that intel's modifiers are definitely the one and only
> that we ever screwed up, and we should be getting those fixed in the near
> future too.
> 
> So maybe what we should do instead is add a comment to the modifier docs
> that this stuff _is_ supposed to be transferrable over networks and work.

Personally I was not sure if it was so. Good to hear it is. Writing it
down would be much appreciated.

While at it, could you also write down something about the requirements
of memory layout documentation? What I mean is, is it required that the
memory layout is publicly specified *somewhere* if not in the modifier
doc itself?

It's not necessary for anyone to actually know the memory layout when
the use cases only involve hardware access, but if there is no public
spec I fear it would be easy to adapt an incompatible layout somewhere
and never be able to notice until some rare case of interoperability
mysteriously produces garbage.


Thanks,
pq
Neil Armstrong March 6, 2020, 2:40 p.m. UTC | #10
Hi Pekka, Brian, Daniel,

On 06/03/2020 11:13, Daniel Vetter wrote:
> On Tue, Mar 03, 2020 at 05:33:32PM +0200, Pekka Paalanen wrote:
>> On Tue, 3 Mar 2020 15:25:41 +0200
>> Pekka Paalanen <ppaalanen@gmail.com> wrote:
>>
>>> On Tue, 3 Mar 2020 12:37:16 +0100
>>> Daniel Vetter <daniel@ffwll.ch> wrote:
>>>
>>>> On Tue, Mar 3, 2020 at 11:53 AM Brian Starkey <brian.starkey@arm.com> wrote:  
>>>>>
>>>>> Hi,
>>>>>
>>>>> On Tue, Mar 03, 2020 at 12:10:29PM +0200, Pekka Paalanen wrote:    
>>>>>> On Fri, 21 Feb 2020 10:08:42 +0100
>>>>>> Neil Armstrong <narmstrong@baylibre.com> wrote:
>>>>>>    
>> ...
>>>>>>> +/*
>>>>>>> + * Amlogic Video Framebuffer Compression modifiers
>>>>>>> + *
>>>>>>> + * Amlogic uses a proprietary lossless image compression protocol and format
>>>>>>> + * for their hardware video codec accelerators, either video decoders or
>>>>>>> + * video input encoders.
>>>>>>> + *
>>>>>>> + * It considerably reduces memory bandwidth while writing and reading
>>>>>>> + * frames in memory.
>>>>>>> + * Implementation details may be platform and SoC specific, and shared
>>>>>>> + * between the producer and the decoder on the same platform.    
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> after a lengthy IRC discussion on #dri-devel, this "may be platform and
>>>>>> SoC specific" is a problem.

This one is definitely only for the SCATTER modifier, not the DEFAULT and MEM_SAVING.

>>>>>>
>>>>>> It can be an issue in two ways:
>>>>>>
>>>>>> - If something in the data acts like a sub-modifier, then advertising
>>>>>>   support for one modifier does not really tell if the data layout is
>>>>>>   supported or not.

It's clearly not.

The DEFAULT and MEM_SAVING modifiers are clearly transferable, and their layout is
extremely simple. While we don't have the memory compression algorithm, the memory
layout is simple to describe and doesn't act as a sub-modifier.

The complexity lies in the SCATTER modifier, which describe an instant live memory
layout, that is not transferable and with an unknown and variable layout.

>>>>>>
>>>>>> - If you need to know the platform and/or SoC to be able to interpret
>>>>>>   the data, it means the modifier is ill-defined and cannot be used in
>>>>>>   inter-machine communication (e.g. Pipewire).

It's not the case for the DEFAULT and MEM_SAVING modifiers.

The SCATTER modifier is mandatory for the Amlogic G12A and G12B HW video decoder,
but the same HW is capable of displaying the non-SCATTER buffer for example.

>>>>>>    
>>>>>
>>>>> Playing devil's advocate, the comment sounds similar to
>>>>> I915_FORMAT_MOD_{X,Y}_TILED:
>>>>>
>>>>>  * This format is highly platforms specific and not useful for cross-driver
>>>>>  * sharing. It exists since on a given platform it does uniquely identify the
>>>>>  * layout in a simple way for i915-specific userspace.    
>>>>
>>>> Yeah which we regret now. We need to now roll out a new set of
>>>> modifiers for at least some of the differences in these on the
>>>> modern-ish chips (the old crap is pretty much lost cause anyway).
>>>>
>>>> This was kinda a nasty hack to smooth things over since we have epic
>>>> amounts of userspace, but it's really not a great idea (and no one
>>>> else really has epic amounts of existing userspace that uses tiling
>>>> flags everywhere, this is all new code).
>>>> -Daniel
>>>>   
>>>>> Isn't the statement that this for sharing between producer and decoder
>>>>> _on the same platform_ a similar clause with the same effect?
>>>>>
>>>>> What advantage is there to exposing the gory details? For Arm AFBC
>>>>> it's necessary because IP on the SoC can be (likely to be) from
>>>>> different vendors with different capabilities.
>>>>>
>>>>> If this is only for talking between Amlogic IP on the same SoC, and
>>>>> those devices support all the same "flavours", I don't see what is
>>>>> gained by making userspace care about internals.    
>>>>
>>>> The trouble is if you mix&match IP cores, and one of them supports
>>>> flavours A, B, C and the other C, D, E. But all you have is a single
>>>> magic modifier for "whatever the flavour is that soc prefers". So
>>>> someone gets to stuff this in DT.

This is not the case here, maybe I should explicit the "DEFAULT" modifier with
a bit like "BASIC" to explicitly define support for the currently defined
DEFAULT mode.

>>>>
>>>> Also eventually, maybe, perhaps ARM does grow up into the
>>>> client/server space with add-on pcie graphics, and at least for client
>>>> you very often end up with integrated + add-in pcie gpu. At that point
>>>> you really can't have magic per-soc modifiers anymore.  
>>>
>>> Hi,
>>>
>>> I also heard that Pipewire will copy buffers and modifiers verbatim
>>> from one machine to another when streaming across network, assuming
>>> that the same modifier means the same thing on all machines.[Citation needed]

Transferring AFBC buffers doesn't sound like a good idea to me....

>>>
>>> If that is something that must not be done with DRM modifiers, then
>>> please contact them and document that.
>>
>> Sorry, it's waypipe, not pipewire:
>> https://gitlab.freedesktop.org/mstoeckl/waypipe/
> 
> I do think this is very much something we want to make possible. They
> might pick a silly modifier (compression modifiers only compress bw, by
> necessity the lossless ones have to increase storage space so kinda dumb
> thing to push over the network if you don't add .xz or whatever on top).

The AFBC, and Amlogic FBC are not size optimized compressions, but really
layout and memory access optimized compressions, without a proper network
size compression, transferring plain NV12 would be the same.

> 
> I'm also hoping that intel's modifiers are definitely the one and only
> that we ever screwed up, and we should be getting those fixed in the near
> future too.

I'd like too.

> 
> So maybe what we should do instead is add a comment to the modifier docs
> that this stuff _is_ supposed to be transferrable over networks and work.

Only the "SCATTER" is not transferable, the other options are definitely
transferable, and across 6 families and at least between a minimum of 15
different upstream supported SoCs.

Should it be in the modifier description ? should I add a reserved bit
in the Amlogic modifier space describing it's non-transferable nature ?


> -Daniel
> 

Neil
Pekka Paalanen March 6, 2020, 4:17 p.m. UTC | #11
On Fri, 6 Mar 2020 15:40:01 +0100
Neil Armstrong <narmstrong@baylibre.com> wrote:

> Hi Pekka, Brian, Daniel,
> 
> On 06/03/2020 11:13, Daniel Vetter wrote:
> > On Tue, Mar 03, 2020 at 05:33:32PM +0200, Pekka Paalanen wrote:  

...

> >> Sorry, it's waypipe, not pipewire:
> >> https://gitlab.freedesktop.org/mstoeckl/waypipe/  
> > 
> > I do think this is very much something we want to make possible. They
> > might pick a silly modifier (compression modifiers only compress bw, by
> > necessity the lossless ones have to increase storage space so kinda dumb
> > thing to push over the network if you don't add .xz or whatever on top).  
> 
> The AFBC, and Amlogic FBC are not size optimized compressions, but really
> layout and memory access optimized compressions, without a proper network
> size compression, transferring plain NV12 would be the same.

FWIW, waypipe is not intended to be the most efficient network
streaming protocol, but it is intended to be a direct Wayland protocol
proxy (X11 forwarding, anyone?), which means that it needs to be able to
transmit also dmabuf buffers as is. It does not want to understand
modifiers but just send opaque data.

It may or may not do lossless compression of the data it sends over the
wire, but it will replicate the dmabuf on the remote end.

Or so I'm told.


Thanks,
pq
diff mbox series

Patch

diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h
index 8bc0b31597d8..8a6e87bacadb 100644
--- a/include/uapi/drm/drm_fourcc.h
+++ b/include/uapi/drm/drm_fourcc.h
@@ -309,6 +309,7 @@  extern "C" {
 #define DRM_FORMAT_MOD_VENDOR_BROADCOM 0x07
 #define DRM_FORMAT_MOD_VENDOR_ARM     0x08
 #define DRM_FORMAT_MOD_VENDOR_ALLWINNER 0x09
+#define DRM_FORMAT_MOD_VENDOR_AMLOGIC 0x0a
 
 /* add more to the end as needed */
 
@@ -804,6 +805,61 @@  extern "C" {
  */
 #define DRM_FORMAT_MOD_ALLWINNER_TILED fourcc_mod_code(ALLWINNER, 1)
 
+/*
+ * Amlogic Video Framebuffer Compression modifiers
+ *
+ * Amlogic uses a proprietary lossless image compression protocol and format
+ * for their hardware video codec accelerators, either video decoders or
+ * video input encoders.
+ *
+ * It considerably reduces memory bandwidth while writing and reading
+ * frames in memory.
+ * Implementation details may be platform and SoC specific, and shared
+ * between the producer and the decoder on the same platform.
+ *
+ * The underlying storage is considered to be 3 components, 8bit or 10-bit
+ * per component YCbCr 420, single plane :
+ * - DRM_FORMAT_YUV420_8BIT
+ * - DRM_FORMAT_YUV420_10BIT
+ *
+ * The classic memory storage is composed of:
+ * - a body content organized in 64x32 superblocks with 4096 bytes per
+ *   superblock in default mode.
+ * - a 32 bytes per 128x64 header block
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_DEFAULT fourcc_mod_code(AMLOGIC, 0)
+
+/*
+ * Amlogic Video Framebuffer Compression Options
+ *
+ * Two optional features are available which may not supported/used on every
+ * SoCs and Compressed Framebuffer producers.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC(__modes) fourcc_mod_code(AMLOGIC, __modes)
+
+/*
+ * Amlogic FBC Scatter Memory layout
+ *
+ * Indicates the header contains IOMMU references to the compressed
+ * frames content to optimize memory access and layout.
+ * In this mode, only the header memory address is needed, thus the
+ * content memory organization is tied to the current producer
+ * execution and cannot be saved/dumped.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_SCATTER	(1ULL << 0)
+
+/*
+ * Amlogic FBC Memory Saving mode
+ *
+ * Indicates the storage is packed when pixel size is multiple of word
+ * boudaries, i.e. 8bit should be stored in this mode to save allocation
+ * memory.
+ *
+ * This mode reduces body layout to 3072 bytes per 64x32 superblock and
+ * 3200 bytes per 64x32 superblock combined with scatter mode.
+ */
+#define DRM_FORMAT_MOD_AMLOGIC_FBC_MEM_SAVING	(1ULL << 1)
+
 #if defined(__cplusplus)
 }
 #endif