Message ID | 20230807140317.26146-2-jfalempe@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | drm/fourcc: Add documentation about software color conversion. | expand |
Looks good to me. Maybe the IN_FORMATS prop docs is a better place for this?
Hi On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: > After discussions on IRC, the consensus is that the DRM drivers should > not do software color conversion, and only advertise the supported formats. > Update the doc accordingly so that the rule and exceptions are clear for > everyone. > > Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> > --- > include/uapi/drm/drm_fourcc.h | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > index 8db7fd3f743e..00a29152da9f 100644 > --- a/include/uapi/drm/drm_fourcc.h > +++ b/include/uapi/drm/drm_fourcc.h > @@ -38,6 +38,13 @@ extern "C" { > * fourcc code, a Format Modifier may optionally be provided, in order to > * further describe the buffer's format - for example tiling or compression. > * > + * DRM drivers should not do software color conversion, and only advertise the > + * format they support in hardware. But there are two exceptions: I would do a bullet list here: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks > + * The first is to support XRGB8888 if the hardware doesn't support it, because > + * it's the de facto standard for userspace applications. We can also provide a bit more context here, something like: All drivers must support XRGB8888, even if the hardware cannot support it. This has become the de-facto standard and a lot of user-space assume it will be present. > + * The second is to drop the unused bits when sending the data to the hardware, > + * to improve the bandwidth, like dropping the "X" in XRGB8888. I think it can be made a bit more generic, with something like: Any driver is free to modify its internal representation of the format, as long as it doesn't alter the visible content in any way. An example would be to drop the padding component from a format to save some memory bandwidth. Otherwise, on principle, I'm fine with that change :) Maxime
On 09/08/2023 18:29, Simon Ser wrote: > Looks good to me. > Thanks for reviewing it. > Maybe the IN_FORMATS prop docs is a better place for this? > you mean to move it here ?: https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_plane.c#L132 I don't have a preference since it's my first contribution to the drm documentation.
On 10/08/2023 09:45, Maxime Ripard wrote: > Hi > > On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: >> After discussions on IRC, the consensus is that the DRM drivers should >> not do software color conversion, and only advertise the supported formats. >> Update the doc accordingly so that the rule and exceptions are clear for >> everyone. >> >> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> >> --- >> include/uapi/drm/drm_fourcc.h | 7 +++++++ >> 1 file changed, 7 insertions(+) >> >> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h >> index 8db7fd3f743e..00a29152da9f 100644 >> --- a/include/uapi/drm/drm_fourcc.h >> +++ b/include/uapi/drm/drm_fourcc.h >> @@ -38,6 +38,13 @@ extern "C" { >> * fourcc code, a Format Modifier may optionally be provided, in order to >> * further describe the buffer's format - for example tiling or compression. >> * >> + * DRM drivers should not do software color conversion, and only advertise the >> + * format they support in hardware. But there are two exceptions: > > I would do a bullet list here: > https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks > ok, that would look better. >> + * The first is to support XRGB8888 if the hardware doesn't support it, because >> + * it's the de facto standard for userspace applications. > > We can also provide a bit more context here, something like: > > All drivers must support XRGB8888, even if the hardware cannot support > it. This has become the de-facto standard and a lot of user-space assume > it will be present. ok, I can add this before the first paragraph. > >> + * The second is to drop the unused bits when sending the data to the hardware, >> + * to improve the bandwidth, like dropping the "X" in XRGB8888. > > I think it can be made a bit more generic, with something like: > > Any driver is free to modify its internal representation of the format, > as long as it doesn't alter the visible content in any way. An example > would be to drop the padding component from a format to save some memory > bandwidth. ok, > > Otherwise, on principle, I'm fine with that change :) Thanks, I will wait a bit for other comments, and send a v2. > > Maxime
Jocelyn Falempe <jfalempe@redhat.com> writes: Hello Jocelyn, > On 10/08/2023 09:45, Maxime Ripard wrote: >> Hi >> >> On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: >>> After discussions on IRC, the consensus is that the DRM drivers should >>> not do software color conversion, and only advertise the supported formats. >>> Update the doc accordingly so that the rule and exceptions are clear for >>> everyone. >>> >>> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> >>> --- >>> include/uapi/drm/drm_fourcc.h | 7 +++++++ >>> 1 file changed, 7 insertions(+) >>> >>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h >>> index 8db7fd3f743e..00a29152da9f 100644 >>> --- a/include/uapi/drm/drm_fourcc.h >>> +++ b/include/uapi/drm/drm_fourcc.h >>> @@ -38,6 +38,13 @@ extern "C" { >>> * fourcc code, a Format Modifier may optionally be provided, in order to >>> * further describe the buffer's format - for example tiling or compression. >>> * >>> + * DRM drivers should not do software color conversion, and only advertise the >>> + * format they support in hardware. But there are two exceptions: >> >> I would do a bullet list here: >> https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks >> > ok, that would look better. > >>> + * The first is to support XRGB8888 if the hardware doesn't support it, because >>> + * it's the de facto standard for userspace applications. >> >> We can also provide a bit more context here, something like: >> >> All drivers must support XRGB8888, even if the hardware cannot support >> it. This has become the de-facto standard and a lot of user-space assume >> it will be present. > > ok, I can add this before the first paragraph. >> Agreed with Maxime's suggestion but I would also mention that if XRGB8888 is not supported, the native formats should be the default and XRGB8888 be only used as a fallback for user-space compatiblity. In other words, that XRGB8888 must not be the default, e.g: mode_config.preferred_depth = 24 or drm_fbdev_generic_setup(drm, 32) only if is a supported native format. I agree with the general content of the patch, so if you post a v2 feel free to add my Reviewed-by: Javier Martinez Canillas <javierm@redhat.com>
On Thu, 10 Aug 2023 09:45:27 +0200 Maxime Ripard <mripard@kernel.org> wrote: > Hi > > On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: > > After discussions on IRC, the consensus is that the DRM drivers should > > not do software color conversion, and only advertise the supported formats. > > Update the doc accordingly so that the rule and exceptions are clear for > > everyone. > > > > Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> > > --- > > include/uapi/drm/drm_fourcc.h | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > > index 8db7fd3f743e..00a29152da9f 100644 > > --- a/include/uapi/drm/drm_fourcc.h > > +++ b/include/uapi/drm/drm_fourcc.h > > @@ -38,6 +38,13 @@ extern "C" { > > * fourcc code, a Format Modifier may optionally be provided, in order to > > * further describe the buffer's format - for example tiling or compression. > > * > > + * DRM drivers should not do software color conversion, and only advertise the > > + * format they support in hardware. But there are two exceptions: > > I would do a bullet list here: > https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks > > > + * The first is to support XRGB8888 if the hardware doesn't support it, because > > + * it's the de facto standard for userspace applications. > > We can also provide a bit more context here, something like: > > All drivers must support XRGB8888, even if the hardware cannot support > it. This has become the de-facto standard and a lot of user-space assume > it will be present. > > > + * The second is to drop the unused bits when sending the data to the hardware, > > + * to improve the bandwidth, like dropping the "X" in XRGB8888. > > I think it can be made a bit more generic, with something like: > > Any driver is free to modify its internal representation of the format, > as long as it doesn't alter the visible content in any way. An example > would be to drop the padding component from a format to save some memory > bandwidth. Hi, to my understanding and desire, the rule to not "fake" pixel format support is strictly related to performance. When a KMS client does a page flip, it usually does not expect a massive amount of CPU or GPU work to occur just because of the flip. A name for such work is "copy", referring to any kind of copying of large amounts of pixel data, including a format conversion or not. This is especially important with GPU rendering and hardware video playback systems, where any such copy could destroy the usability of the whole system. This is the main reason why KMS must not do any expensive processing unexpectedly (as in, not documented in UAPI). Doing any kind of copy could cause a vblank to be missed, ruining display timings. I believe the above is the spirit of the rule. Then there will be exceptions. I'd like to think that everything below (except for XRGB8888) can be derived from the above with common sense - that's what I did. XRGB8888 support is the prime exception. I suspect it originates from the legacy KMS UAPI, and the practise that XRGB8888 has been widely supported always. This makes it plausible for userspace to exist that cannot produce any other format. Hence, it is good to support XRGB8888 through a conversion (copy) in the kernel for dumb buffers (that is, for software rendered framebuffers). I would be very hesitant to extend this exception to GPU rendered buffers, but OTOH if you have a GPU, presumably you also have a display controller capable of scanning out what the GPU renders, so you wouldn't even consider copying under the hood. DRM devices that cannot directly scan out buffers at all are a whole category of exceptions. They include USB display adapters (literal USB, not USB-C alt mode), perhaps networked and wireless displays, VKMS which does everything in software, and so on. They simply have to process the bulk pixel data with a CPU one way or another, and hopefully they make use of damage rectangles to minimise the work. Old-school special cursor planes may have been using special pixel formats that may not be supported by userspace. Cursors are usually small images and they can make a huge performance impact, so it makes sense to support ARGB8888 even with a CPU conversion. Then we have display controllers without GPUs. Everything is software-rendered. If it so happens that software rendering into sysram and then copying (with conversion) into VRAM is more performant than rendering into VRAM, then the copy is well justified. Software-rendering into sysram and then copying into VRAM is actually so commonly preferred, that KMS has a special flag to suggest userspace does it: DRM_CAP_DUMB_PREFER_SHADOW [1]. A well-behaved software-rendering KMS client checks this flag and honours it. If a driver both sets the flag, and copies itself, then that's two copies for each flip. The driver's copy is unexpected, but is there a good reason for the driver to do it? I can only think one reason: hardware scanout pixel format being one that userspace cannot reasonably be expected to produce. I think nowadays RGB888 (the literally 3 bytes per pixel) format counts as one. If hardware requires RGB888 to e.g. reach a specific resolution, should the driver set DRM_CAP_DUMB_PREFER_SHADOW or not? If the driver always allocates sysram as dumb buffers because there simply is not enough VRAM to give out, then definitely not. That is a very good reason for the driver to do a copy/conversion with damage under the hood. It sucks, but it's the only way it can work. But if the dumb buffers are allocated in VRAM, then DRM_CAP_DUMB_PREFER_SHADOW should likely be set because direct VRAM access tends to be slow, and the driver should not copy - unless maybe for XRGB8888 and cursors. A CPU copy from VRAM into VRAM is the worst. For RGB888 hardware and software rendering, it would be best if: - Dumb buffers are allocated from VRAM, making them directly scanout-able for RGB888. - DRM_CAP_DUMB_PREFER_SHADOW is set, telling userspace to render into a shadow and then copy into a dumb buffer. - Userspace's copy into a dumb buffer produces RGB888, while the shadow buffer can be of any format userspace likes. This minimises the amount of work done, and page flips are literal flips without a hidden copy in the driver. If userspace does not support RGB888, things get hairy. If XRGB8888 is faked via a driver copy, then you would not want to be copying from VRAM into VRAM. Create-dumb ioctl passes in bpp, so the driver could special-case 24 vs. 32 I guess, allocating 24 from VRAM and 32 from sysram. But do you set DRM_CAP_DUMB_PREFER_SHADOW? It would always be wrong for the other format. Ideally, XRGB8888 would be supported without artificially crippling more optimal pixel formats by lack of DRM_CAP_DUMB_PREFER_SHADOW, even if XRGB8888 support is fake and hurt by DRM_CAP_DUMB_PREFER_SHADOW. But that also depends on the expected userspace, which format does it use. [1] https://dri.freedesktop.org/docs/drm/gpu/drm-uapi.html#c.DRM_CAP_DUMB_PREFER_SHADOW Thanks, pq
Hi Pekka, Thanks for answering On Fri, Aug 18, 2023 at 04:24:15PM +0300, Pekka Paalanen wrote: > On Thu, 10 Aug 2023 09:45:27 +0200 > Maxime Ripard <mripard@kernel.org> wrote: > > On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: > > > After discussions on IRC, the consensus is that the DRM drivers should > > > not do software color conversion, and only advertise the supported formats. > > > Update the doc accordingly so that the rule and exceptions are clear for > > > everyone. > > > > > > Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> > > > --- > > > include/uapi/drm/drm_fourcc.h | 7 +++++++ > > > 1 file changed, 7 insertions(+) > > > > > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > > > index 8db7fd3f743e..00a29152da9f 100644 > > > --- a/include/uapi/drm/drm_fourcc.h > > > +++ b/include/uapi/drm/drm_fourcc.h > > > @@ -38,6 +38,13 @@ extern "C" { > > > * fourcc code, a Format Modifier may optionally be provided, in order to > > > * further describe the buffer's format - for example tiling or compression. > > > * > > > + * DRM drivers should not do software color conversion, and only advertise the > > > + * format they support in hardware. But there are two exceptions: > > > > I would do a bullet list here: > > https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks > > > > > + * The first is to support XRGB8888 if the hardware doesn't support it, because > > > + * it's the de facto standard for userspace applications. > > > > We can also provide a bit more context here, something like: > > > > All drivers must support XRGB8888, even if the hardware cannot support > > it. This has become the de-facto standard and a lot of user-space assume > > it will be present. > > > > > + * The second is to drop the unused bits when sending the data to the hardware, > > > + * to improve the bandwidth, like dropping the "X" in XRGB8888. > > > > I think it can be made a bit more generic, with something like: > > > > Any driver is free to modify its internal representation of the format, > > as long as it doesn't alter the visible content in any way. An example > > would be to drop the padding component from a format to save some memory > > bandwidth. > > to my understanding and desire, the rule to not "fake" pixel format > support is strictly related to performance. When a KMS client does a > page flip, it usually does not expect a massive amount of CPU or GPU > work to occur just because of the flip. A name for such work is "copy", > referring to any kind of copying of large amounts of pixel data, > including a format conversion or not. Should we add that to the suggested documentation that it shouldn't degrade performance and shouldn't be something that the userspace can notice? > This is especially important with GPU rendering and hardware video > playback systems, where any such copy could destroy the usability of > the whole system. This is the main reason why KMS must not do any > expensive processing unexpectedly (as in, not documented in UAPI). > Doing any kind of copy could cause a vblank to be missed, ruining > display timings. > > I believe the above is the spirit of the rule. That's totally reasonable to me :) > Then there will be exceptions. I'd like to think that everything below > (except for XRGB8888) can be derived from the above with common sense > - that's what I did. > > XRGB8888 support is the prime exception. I suspect it originates from > the legacy KMS UAPI, and the practise that XRGB8888 has been widely > supported always. This makes it plausible for userspace to exist that > cannot produce any other format. Hence, it is good to support XRGB8888 > through a conversion (copy) in the kernel for dumb buffers (that is, > for software rendered framebuffers). I would be very hesitant to extend > this exception to GPU rendered buffers, but OTOH if you have a GPU, > presumably you also have a display controller capable of scanning out > what the GPU renders, so you wouldn't even consider copying under the > hood. > > DRM devices that cannot directly scan out buffers at all are a whole > category of exceptions. They include USB display adapters (literal USB, > not USB-C alt mode), perhaps networked and wireless displays, VKMS > which does everything in software, and so on. They simply have to > process the bulk pixel data with a CPU one way or another, and > hopefully they make use of damage rectangles to minimise the work. > > Old-school special cursor planes may have been using special pixel > formats that may not be supported by userspace. Cursors are usually > small images and they can make a huge performance impact, so it makes > sense to support ARGB8888 even with a CPU conversion. > > Then we have display controllers without GPUs. Everything is > software-rendered. If it so happens that software rendering into sysram > and then copying (with conversion) into VRAM is more performant than > rendering into VRAM, then the copy is well justified. > > Software-rendering into sysram and then copying into VRAM is actually > so commonly preferred, that KMS has a special flag to suggest userspace > does it: DRM_CAP_DUMB_PREFER_SHADOW [1]. A well-behaved > software-rendering KMS client checks this flag and honours it. If a > driver both sets the flag, and copies itself, then that's two copies > for each flip. The driver's copy is unexpected, but is there a good > reason for the driver to do it? > > I can only think one reason: hardware scanout pixel format being one > that userspace cannot reasonably be expected to produce. I think > nowadays RGB888 (the literally 3 bytes per pixel) format counts as one. I guess any tiled format would count there too, right? > If hardware requires RGB888 to e.g. reach a specific resolution, should > the driver set DRM_CAP_DUMB_PREFER_SHADOW or not? If the driver always > allocates sysram as dumb buffers because there simply is not enough > VRAM to give out, then definitely not. That is a very good reason for > the driver to do a copy/conversion with damage under the hood. It > sucks, but it's the only way it can work. > > But if the dumb buffers are allocated in VRAM, then > DRM_CAP_DUMB_PREFER_SHADOW should likely be set because direct VRAM > access tends to be slow, and the driver should not copy - unless maybe > for XRGB8888 and cursors. A CPU copy from VRAM into VRAM is the worst. > > For RGB888 hardware and software rendering, it would be best if: > > - Dumb buffers are allocated from VRAM, making them directly > scanout-able for RGB888. > > - DRM_CAP_DUMB_PREFER_SHADOW is set, telling userspace to render into a > shadow and then copy into a dumb buffer. > > - Userspace's copy into a dumb buffer produces RGB888, while the shadow > buffer can be of any format userspace likes. > > This minimises the amount of work done, and page flips are literal > flips without a hidden copy in the driver. > > If userspace does not support RGB888, things get hairy. If XRGB8888 is > faked via a driver copy, then you would not want to be copying from > VRAM into VRAM. Create-dumb ioctl passes in bpp, so the driver could > special-case 24 vs. 32 I guess, allocating 24 from VRAM and 32 from > sysram. But do you set DRM_CAP_DUMB_PREFER_SHADOW? It would always be > wrong for the other format. > > Ideally, XRGB8888 would be supported without artificially crippling > more optimal pixel formats by lack of DRM_CAP_DUMB_PREFER_SHADOW, even > if XRGB8888 support is fake and hurt by DRM_CAP_DUMB_PREFER_SHADOW. But > that also depends on the expected userspace, which format does it use. Jocelyn, I think we should have a link to that mail too in the doc you've written. Pekka, it's not clear to me though if what you wrote here are changes you would like on the original patch, or if it's just general thoughts? Maxime
On Mon, 21 Aug 2023 17:55:33 +0200 Maxime Ripard <mripard@kernel.org> wrote: > Hi Pekka, > > Thanks for answering > > On Fri, Aug 18, 2023 at 04:24:15PM +0300, Pekka Paalanen wrote: > > On Thu, 10 Aug 2023 09:45:27 +0200 > > Maxime Ripard <mripard@kernel.org> wrote: > > > On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: > > > > After discussions on IRC, the consensus is that the DRM drivers should > > > > not do software color conversion, and only advertise the supported formats. > > > > Update the doc accordingly so that the rule and exceptions are clear for > > > > everyone. > > > > > > > > Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> > > > > --- > > > > include/uapi/drm/drm_fourcc.h | 7 +++++++ > > > > 1 file changed, 7 insertions(+) > > > > > > > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > > > > index 8db7fd3f743e..00a29152da9f 100644 > > > > --- a/include/uapi/drm/drm_fourcc.h > > > > +++ b/include/uapi/drm/drm_fourcc.h > > > > @@ -38,6 +38,13 @@ extern "C" { > > > > * fourcc code, a Format Modifier may optionally be provided, in order to > > > > * further describe the buffer's format - for example tiling or compression. > > > > * > > > > + * DRM drivers should not do software color conversion, and only advertise the > > > > + * format they support in hardware. But there are two exceptions: > > > > > > I would do a bullet list here: > > > https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks > > > > > > > + * The first is to support XRGB8888 if the hardware doesn't support it, because > > > > + * it's the de facto standard for userspace applications. > > > > > > We can also provide a bit more context here, something like: > > > > > > All drivers must support XRGB8888, even if the hardware cannot support > > > it. This has become the de-facto standard and a lot of user-space assume > > > it will be present. > > > > > > > + * The second is to drop the unused bits when sending the data to the hardware, > > > > + * to improve the bandwidth, like dropping the "X" in XRGB8888. > > > > > > I think it can be made a bit more generic, with something like: > > > > > > Any driver is free to modify its internal representation of the format, > > > as long as it doesn't alter the visible content in any way. An example > > > would be to drop the padding component from a format to save some memory > > > bandwidth. > > > > to my understanding and desire, the rule to not "fake" pixel format > > support is strictly related to performance. When a KMS client does a > > page flip, it usually does not expect a massive amount of CPU or GPU > > work to occur just because of the flip. A name for such work is "copy", > > referring to any kind of copying of large amounts of pixel data, > > including a format conversion or not. > > Should we add that to the suggested documentation that it shouldn't > degrade performance and shouldn't be something that the userspace can > notice? I would let Sima (or Simon Ser) answer that, and verify my understanding too. > > This is especially important with GPU rendering and hardware video > > playback systems, where any such copy could destroy the usability of > > the whole system. This is the main reason why KMS must not do any > > expensive processing unexpectedly (as in, not documented in UAPI). > > Doing any kind of copy could cause a vblank to be missed, ruining > > display timings. > > > > I believe the above is the spirit of the rule. > > That's totally reasonable to me :) > > > Then there will be exceptions. I'd like to think that everything below > > (except for XRGB8888) can be derived from the above with common sense > > - that's what I did. > > > > XRGB8888 support is the prime exception. I suspect it originates from > > the legacy KMS UAPI, and the practise that XRGB8888 has been widely > > supported always. This makes it plausible for userspace to exist that > > cannot produce any other format. Hence, it is good to support XRGB8888 > > through a conversion (copy) in the kernel for dumb buffers (that is, > > for software rendered framebuffers). I would be very hesitant to extend > > this exception to GPU rendered buffers, but OTOH if you have a GPU, > > presumably you also have a display controller capable of scanning out > > what the GPU renders, so you wouldn't even consider copying under the > > hood. > > > > DRM devices that cannot directly scan out buffers at all are a whole > > category of exceptions. They include USB display adapters (literal USB, > > not USB-C alt mode), perhaps networked and wireless displays, VKMS > > which does everything in software, and so on. They simply have to > > process the bulk pixel data with a CPU one way or another, and > > hopefully they make use of damage rectangles to minimise the work. > > > > Old-school special cursor planes may have been using special pixel > > formats that may not be supported by userspace. Cursors are usually > > small images and they can make a huge performance impact, so it makes > > sense to support ARGB8888 even with a CPU conversion. > > > > Then we have display controllers without GPUs. Everything is > > software-rendered. If it so happens that software rendering into sysram > > and then copying (with conversion) into VRAM is more performant than > > rendering into VRAM, then the copy is well justified. > > > > Software-rendering into sysram and then copying into VRAM is actually > > so commonly preferred, that KMS has a special flag to suggest userspace > > does it: DRM_CAP_DUMB_PREFER_SHADOW [1]. A well-behaved > > software-rendering KMS client checks this flag and honours it. If a > > driver both sets the flag, and copies itself, then that's two copies > > for each flip. The driver's copy is unexpected, but is there a good > > reason for the driver to do it? > > > > I can only think one reason: hardware scanout pixel format being one > > that userspace cannot reasonably be expected to produce. I think > > nowadays RGB888 (the literally 3 bytes per pixel) format counts as one. > > I guess any tiled format would count there too, right? Only if the display controller is unable to read a linear format, or some other extremely pressing reason that makes the use of a linear format infeasible, e.g. hardware bugs or prohibitive memory bandwidth usage. That would be very surprising though, given that all video signal formats I know about scan in a linear fashion, meaning that any other framebuffer layout would cause complications like needing lots of CRTC pixel buffer memory. In short, I think not. Sub-sampled YUV is another matter, but I assume you meant non-linear DRM format modifiers rather than just YUV layouts. Again, there could be exceptions in ancient hardware. I recall hearing that some hardware might have stored pixels as "bit planes", meaning that in memory you have a bit plane after a bit plane, and each bit plane contains one specific bit of all pixels in the image. We do not have DRM format codes nor modifiers to express such layout that I know of, so under-the-hood driver conversion would be appropriate since I don't think any software would want to produce such images today. > > If hardware requires RGB888 to e.g. reach a specific resolution, should > > the driver set DRM_CAP_DUMB_PREFER_SHADOW or not? If the driver always > > allocates sysram as dumb buffers because there simply is not enough > > VRAM to give out, then definitely not. That is a very good reason for > > the driver to do a copy/conversion with damage under the hood. It > > sucks, but it's the only way it can work. > > > > But if the dumb buffers are allocated in VRAM, then > > DRM_CAP_DUMB_PREFER_SHADOW should likely be set because direct VRAM > > access tends to be slow, and the driver should not copy - unless maybe > > for XRGB8888 and cursors. A CPU copy from VRAM into VRAM is the worst. > > > > For RGB888 hardware and software rendering, it would be best if: > > > > - Dumb buffers are allocated from VRAM, making them directly > > scanout-able for RGB888. > > > > - DRM_CAP_DUMB_PREFER_SHADOW is set, telling userspace to render into a > > shadow and then copy into a dumb buffer. > > > > - Userspace's copy into a dumb buffer produces RGB888, while the shadow > > buffer can be of any format userspace likes. > > > > This minimises the amount of work done, and page flips are literal > > flips without a hidden copy in the driver. > > > > If userspace does not support RGB888, things get hairy. If XRGB8888 is > > faked via a driver copy, then you would not want to be copying from > > VRAM into VRAM. Create-dumb ioctl passes in bpp, so the driver could > > special-case 24 vs. 32 I guess, allocating 24 from VRAM and 32 from > > sysram. But do you set DRM_CAP_DUMB_PREFER_SHADOW? It would always be > > wrong for the other format. > > > > Ideally, XRGB8888 would be supported without artificially crippling > > more optimal pixel formats by lack of DRM_CAP_DUMB_PREFER_SHADOW, even > > if XRGB8888 support is fake and hurt by DRM_CAP_DUMB_PREFER_SHADOW. But > > that also depends on the expected userspace, which format does it use. > > Jocelyn, I think we should have a link to that mail too in the doc > you've written. > > Pekka, it's not clear to me though if what you wrote here are changes > you would like on the original patch, or if it's just general thoughts? It is my understanding of the matter and of the background. Take from it what you feel is important. The proposed patch is not wrong, but it is perhaps a bit too simple description, like not limiting CPU conversions to dumb buffers, and ignoring the worst case: dumb buffers in VRAM + CPU conversion which would have utterly horrible performance. I feel the interaction with DRM_CAP_DUMB_PREFER_SHADOW is important to note. I'm not sure what should be documented of it. Also, things like the Pixman library support conversions between tons of pixel formats, and is able to use hand-tuned SSE, vector and whatnot instructions in order to optimise conversions. I doubt you could reach that level of flexibility and performance in-kernel. In the XRGB8888 to RGB888 case, the kernel doing the conversion may allow for higher resolutions, but it may also hurt framerate more than userspace doing conversion, theoretically. For lower resolution where XRGB8888 could be scanned out directly from VRAM, the conversion would be strictly hurting. Thanks, pq
On 22/08/2023 10:20, Pekka Paalanen wrote: > On Mon, 21 Aug 2023 17:55:33 +0200 > Maxime Ripard <mripard@kernel.org> wrote: > >> Hi Pekka, >> >> Thanks for answering >> >> On Fri, Aug 18, 2023 at 04:24:15PM +0300, Pekka Paalanen wrote: >>> On Thu, 10 Aug 2023 09:45:27 +0200 >>> Maxime Ripard <mripard@kernel.org> wrote: >>>> On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: >>>>> After discussions on IRC, the consensus is that the DRM drivers should >>>>> not do software color conversion, and only advertise the supported formats. >>>>> Update the doc accordingly so that the rule and exceptions are clear for >>>>> everyone. >>>>> >>>>> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> >>>>> --- >>>>> include/uapi/drm/drm_fourcc.h | 7 +++++++ >>>>> 1 file changed, 7 insertions(+) >>>>> >>>>> diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h >>>>> index 8db7fd3f743e..00a29152da9f 100644 >>>>> --- a/include/uapi/drm/drm_fourcc.h >>>>> +++ b/include/uapi/drm/drm_fourcc.h >>>>> @@ -38,6 +38,13 @@ extern "C" { >>>>> * fourcc code, a Format Modifier may optionally be provided, in order to >>>>> * further describe the buffer's format - for example tiling or compression. >>>>> * >>>>> + * DRM drivers should not do software color conversion, and only advertise the >>>>> + * format they support in hardware. But there are two exceptions: >>>> >>>> I would do a bullet list here: >>>> https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html#lists-and-quote-like-blocks >>>> >>>>> + * The first is to support XRGB8888 if the hardware doesn't support it, because >>>>> + * it's the de facto standard for userspace applications. >>>> >>>> We can also provide a bit more context here, something like: >>>> >>>> All drivers must support XRGB8888, even if the hardware cannot support >>>> it. This has become the de-facto standard and a lot of user-space assume >>>> it will be present. >>>> >>>>> + * The second is to drop the unused bits when sending the data to the hardware, >>>>> + * to improve the bandwidth, like dropping the "X" in XRGB8888. >>>> >>>> I think it can be made a bit more generic, with something like: >>>> >>>> Any driver is free to modify its internal representation of the format, >>>> as long as it doesn't alter the visible content in any way. An example >>>> would be to drop the padding component from a format to save some memory >>>> bandwidth. >>> >>> to my understanding and desire, the rule to not "fake" pixel format >>> support is strictly related to performance. When a KMS client does a >>> page flip, it usually does not expect a massive amount of CPU or GPU >>> work to occur just because of the flip. A name for such work is "copy", >>> referring to any kind of copying of large amounts of pixel data, >>> including a format conversion or not. >> >> Should we add that to the suggested documentation that it shouldn't >> degrade performance and shouldn't be something that the userspace can >> notice? > > I would let Sima (or Simon Ser) answer that, and verify my > understanding too. > >>> This is especially important with GPU rendering and hardware video >>> playback systems, where any such copy could destroy the usability of >>> the whole system. This is the main reason why KMS must not do any >>> expensive processing unexpectedly (as in, not documented in UAPI). >>> Doing any kind of copy could cause a vblank to be missed, ruining >>> display timings. >>> >>> I believe the above is the spirit of the rule. >> >> That's totally reasonable to me :) >> >>> Then there will be exceptions. I'd like to think that everything below >>> (except for XRGB8888) can be derived from the above with common sense >>> - that's what I did. >>> >>> XRGB8888 support is the prime exception. I suspect it originates from >>> the legacy KMS UAPI, and the practise that XRGB8888 has been widely >>> supported always. This makes it plausible for userspace to exist that >>> cannot produce any other format. Hence, it is good to support XRGB8888 >>> through a conversion (copy) in the kernel for dumb buffers (that is, >>> for software rendered framebuffers). I would be very hesitant to extend >>> this exception to GPU rendered buffers, but OTOH if you have a GPU, >>> presumably you also have a display controller capable of scanning out >>> what the GPU renders, so you wouldn't even consider copying under the >>> hood. >>> >>> DRM devices that cannot directly scan out buffers at all are a whole >>> category of exceptions. They include USB display adapters (literal USB, >>> not USB-C alt mode), perhaps networked and wireless displays, VKMS >>> which does everything in software, and so on. They simply have to >>> process the bulk pixel data with a CPU one way or another, and >>> hopefully they make use of damage rectangles to minimise the work. >>> >>> Old-school special cursor planes may have been using special pixel >>> formats that may not be supported by userspace. Cursors are usually >>> small images and they can make a huge performance impact, so it makes >>> sense to support ARGB8888 even with a CPU conversion. >>> >>> Then we have display controllers without GPUs. Everything is >>> software-rendered. If it so happens that software rendering into sysram >>> and then copying (with conversion) into VRAM is more performant than >>> rendering into VRAM, then the copy is well justified. >>> >>> Software-rendering into sysram and then copying into VRAM is actually >>> so commonly preferred, that KMS has a special flag to suggest userspace >>> does it: DRM_CAP_DUMB_PREFER_SHADOW [1]. A well-behaved >>> software-rendering KMS client checks this flag and honours it. If a >>> driver both sets the flag, and copies itself, then that's two copies >>> for each flip. The driver's copy is unexpected, but is there a good >>> reason for the driver to do it? >>> >>> I can only think one reason: hardware scanout pixel format being one >>> that userspace cannot reasonably be expected to produce. I think >>> nowadays RGB888 (the literally 3 bytes per pixel) format counts as one. >> >> I guess any tiled format would count there too, right? > > Only if the display controller is unable to read a linear format, or > some other extremely pressing reason that makes the use of a linear > format infeasible, e.g. hardware bugs or prohibitive memory bandwidth > usage. That would be very surprising though, given that all video > signal formats I know about scan in a linear fashion, meaning that any > other framebuffer layout would cause complications like needing lots of > CRTC pixel buffer memory. > > In short, I think not. > > Sub-sampled YUV is another matter, but I assume you meant non-linear > DRM format modifiers rather than just YUV layouts. > > Again, there could be exceptions in ancient hardware. I recall hearing > that some hardware might have stored pixels as "bit planes", meaning > that in memory you have a bit plane after a bit plane, and each bit > plane contains one specific bit of all pixels in the image. We do not > have DRM format codes nor modifiers to express such layout that I know > of, so under-the-hood driver conversion would be appropriate since I > don't think any software would want to produce such images today. > >>> If hardware requires RGB888 to e.g. reach a specific resolution, should >>> the driver set DRM_CAP_DUMB_PREFER_SHADOW or not? If the driver always >>> allocates sysram as dumb buffers because there simply is not enough >>> VRAM to give out, then definitely not. That is a very good reason for >>> the driver to do a copy/conversion with damage under the hood. It >>> sucks, but it's the only way it can work. >>> >>> But if the dumb buffers are allocated in VRAM, then >>> DRM_CAP_DUMB_PREFER_SHADOW should likely be set because direct VRAM >>> access tends to be slow, and the driver should not copy - unless maybe >>> for XRGB8888 and cursors. A CPU copy from VRAM into VRAM is the worst. >>> >>> For RGB888 hardware and software rendering, it would be best if: >>> >>> - Dumb buffers are allocated from VRAM, making them directly >>> scanout-able for RGB888. >>> >>> - DRM_CAP_DUMB_PREFER_SHADOW is set, telling userspace to render into a >>> shadow and then copy into a dumb buffer. >>> >>> - Userspace's copy into a dumb buffer produces RGB888, while the shadow >>> buffer can be of any format userspace likes. >>> >>> This minimises the amount of work done, and page flips are literal >>> flips without a hidden copy in the driver. >>> >>> If userspace does not support RGB888, things get hairy. If XRGB8888 is >>> faked via a driver copy, then you would not want to be copying from >>> VRAM into VRAM. Create-dumb ioctl passes in bpp, so the driver could >>> special-case 24 vs. 32 I guess, allocating 24 from VRAM and 32 from >>> sysram. But do you set DRM_CAP_DUMB_PREFER_SHADOW? It would always be >>> wrong for the other format. >>> >>> Ideally, XRGB8888 would be supported without artificially crippling >>> more optimal pixel formats by lack of DRM_CAP_DUMB_PREFER_SHADOW, even >>> if XRGB8888 support is fake and hurt by DRM_CAP_DUMB_PREFER_SHADOW. But >>> that also depends on the expected userspace, which format does it use. >> >> Jocelyn, I think we should have a link to that mail too in the doc >> you've written. >> >> Pekka, it's not clear to me though if what you wrote here are changes >> you would like on the original patch, or if it's just general thoughts? > > It is my understanding of the matter and of the background. Take from it > what you feel is important. The proposed patch is not wrong, but it is > perhaps a bit too simple description, like not limiting CPU conversions > to dumb buffers, and ignoring the worst case: dumb buffers in VRAM + > CPU conversion which would have utterly horrible performance. I feel > the interaction with DRM_CAP_DUMB_PREFER_SHADOW is important to note. Yes, I will add the dumb buffer limitation, and a paragraph about DRM_CAP_DUMB_PREFER_SHADOW in v2. I can also add a link to this mail thread. > > I'm not sure what should be documented of it. > > Also, things like the Pixman library support conversions between tons > of pixel formats, and is able to use hand-tuned SSE, vector and whatnot > instructions in order to optimise conversions. I doubt you could reach > that level of flexibility and performance in-kernel. > > In the XRGB8888 to RGB888 case, the kernel doing the conversion may > allow for higher resolutions, but it may also hurt framerate more than > userspace doing conversion, theoretically. For lower resolution where > XRGB8888 could be scanned out directly from VRAM, the conversion would > be strictly hurting. > I think that depends on the hardware. For the Matrox, the bandwidth between system RAM and VRAM is so low, that doing the conversion is much faster than copying XRGB8888 to the VRAM. It also uses less CPU cycles, because the copy is done with memcpy(), (DMA is broken or even slower on most mgag200 hardware). To get numbers, on my test machine, copying the 5MB framebuffer XRGB8888 to VRAM takes 125ms. Copying 3.75MB RGB888 framebuffer takes 95ms. The conversion has no measurable impact, as it is done on the fly during the memcpy, when the CPU is waiting for the bus to accept more data. Doing the conversion in user-space would actually be slower, even with SSE, because they won't use the "wasted" cpu cycle. But anyway the conversion can take a few micro-seconds on the CPU, which is still negligible compared to the memory transfer. > > Thanks, > pq Before sending the v2, Simon Ser proposed to move the paragraph to https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_plane.c#L132 instead of fourcc.h. I'm wondering what other thinks about this. Thanks,
On Tue, 22 Aug 2023 17:49:08 +0200 Jocelyn Falempe <jfalempe@redhat.com> wrote: > On 22/08/2023 10:20, Pekka Paalanen wrote: > > On Mon, 21 Aug 2023 17:55:33 +0200 > > Maxime Ripard <mripard@kernel.org> wrote: > > > >> Hi Pekka, > >> > >> Thanks for answering > >> > >> On Fri, Aug 18, 2023 at 04:24:15PM +0300, Pekka Paalanen wrote: > >>> On Thu, 10 Aug 2023 09:45:27 +0200 > >>> Maxime Ripard <mripard@kernel.org> wrote: > >>>> On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: > >>>>> After discussions on IRC, the consensus is that the DRM drivers should > >>>>> not do software color conversion, and only advertise the supported formats. > >>>>> Update the doc accordingly so that the rule and exceptions are clear for > >>>>> everyone. > >>>>> > >>>>> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> > >>>>> --- > >>>>> include/uapi/drm/drm_fourcc.h | 7 +++++++ > >>>>> 1 file changed, 7 insertions(+) ... > > In the XRGB8888 to RGB888 case, the kernel doing the conversion may > > allow for higher resolutions, but it may also hurt framerate more than > > userspace doing conversion, theoretically. For lower resolution where > > XRGB8888 could be scanned out directly from VRAM, the conversion would > > be strictly hurting. > > > I think that depends on the hardware. For the Matrox, the bandwidth > between system RAM and VRAM is so low, that doing the conversion is much > faster than copying XRGB8888 to the VRAM. It also uses less CPU cycles, > because the copy is done with memcpy(), (DMA is broken or even slower on > most mgag200 hardware). > To get numbers, on my test machine, copying the 5MB framebuffer XRGB8888 > to VRAM takes 125ms. Copying 3.75MB RGB888 framebuffer takes 95ms. The > conversion has no measurable impact, as it is done on the fly during the > memcpy, when the CPU is waiting for the bus to accept more data. > Doing the conversion in user-space would actually be slower, even with > SSE, because they won't use the "wasted" cpu cycle. But anyway the > conversion can take a few micro-seconds on the CPU, which is still > negligible compared to the memory transfer. I stand corrected! Always exceptions. :-) I suppose you have dumb alloc returning sysmem, and PREFER_SHADOW set to false to go with that? Sounds good for XRGB8888. I guess there is not enough VRAM to alloc dumb buffers from there anyway? > Before sending the v2, Simon Ser proposed to move the paragraph to > https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_plane.c#L132 > instead of fourcc.h. > I'm wondering what other thinks about this. I like moving it out of drm_fourcc.h. drm_fourcc.h is about format definitions which are used by things like EGL, Wayland, and whatnot which are not KMS specific. Thanks, pq
On 23/08/2023 10:11, Pekka Paalanen wrote: > On Tue, 22 Aug 2023 17:49:08 +0200 > Jocelyn Falempe <jfalempe@redhat.com> wrote: > >> On 22/08/2023 10:20, Pekka Paalanen wrote: >>> On Mon, 21 Aug 2023 17:55:33 +0200 >>> Maxime Ripard <mripard@kernel.org> wrote: >>> >>>> Hi Pekka, >>>> >>>> Thanks for answering >>>> >>>> On Fri, Aug 18, 2023 at 04:24:15PM +0300, Pekka Paalanen wrote: >>>>> On Thu, 10 Aug 2023 09:45:27 +0200 >>>>> Maxime Ripard <mripard@kernel.org> wrote: >>>>>> On Mon, Aug 07, 2023 at 03:45:15PM +0200, Jocelyn Falempe wrote: >>>>>>> After discussions on IRC, the consensus is that the DRM drivers should >>>>>>> not do software color conversion, and only advertise the supported formats. >>>>>>> Update the doc accordingly so that the rule and exceptions are clear for >>>>>>> everyone. >>>>>>> >>>>>>> Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> >>>>>>> --- >>>>>>> include/uapi/drm/drm_fourcc.h | 7 +++++++ >>>>>>> 1 file changed, 7 insertions(+) > > ... > >>> In the XRGB8888 to RGB888 case, the kernel doing the conversion may >>> allow for higher resolutions, but it may also hurt framerate more than >>> userspace doing conversion, theoretically. For lower resolution where >>> XRGB8888 could be scanned out directly from VRAM, the conversion would >>> be strictly hurting. >>> >> I think that depends on the hardware. For the Matrox, the bandwidth >> between system RAM and VRAM is so low, that doing the conversion is much >> faster than copying XRGB8888 to the VRAM. It also uses less CPU cycles, >> because the copy is done with memcpy(), (DMA is broken or even slower on >> most mgag200 hardware). >> To get numbers, on my test machine, copying the 5MB framebuffer XRGB8888 >> to VRAM takes 125ms. Copying 3.75MB RGB888 framebuffer takes 95ms. The >> conversion has no measurable impact, as it is done on the fly during the >> memcpy, when the CPU is waiting for the bus to accept more data. >> Doing the conversion in user-space would actually be slower, even with >> SSE, because they won't use the "wasted" cpu cycle. But anyway the >> conversion can take a few micro-seconds on the CPU, which is still >> negligible compared to the memory transfer. > > I stand corrected! > > Always exceptions. :-) > > I suppose you have dumb alloc returning sysmem, and PREFER_SHADOW set > to false to go with that? Sounds good for XRGB8888. I guess there is > not enough VRAM to alloc dumb buffers from there anyway? Yes VRAM is between 4MB to 16MB, so the driver only expose dumb buffers in system RAM, and PREFER_SHADOW is set to false. > >> Before sending the v2, Simon Ser proposed to move the paragraph to >> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_plane.c#L132 >> instead of fourcc.h. >> I'm wondering what other thinks about this. > > I like moving it out of drm_fourcc.h. drm_fourcc.h is about format > definitions which are used by things like EGL, Wayland, and whatnot > which are not KMS specific. Ok thanks, I will move it to drm_plane then. > > > Thanks, > pq Best regards,
diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h index 8db7fd3f743e..00a29152da9f 100644 --- a/include/uapi/drm/drm_fourcc.h +++ b/include/uapi/drm/drm_fourcc.h @@ -38,6 +38,13 @@ extern "C" { * fourcc code, a Format Modifier may optionally be provided, in order to * further describe the buffer's format - for example tiling or compression. * + * DRM drivers should not do software color conversion, and only advertise the + * format they support in hardware. But there are two exceptions: + * The first is to support XRGB8888 if the hardware doesn't support it, because + * it's the de facto standard for userspace applications. + * The second is to drop the unused bits when sending the data to the hardware, + * to improve the bandwidth, like dropping the "X" in XRGB8888. + * * Format Modifiers * ---------------- *
After discussions on IRC, the consensus is that the DRM drivers should not do software color conversion, and only advertise the supported formats. Update the doc accordingly so that the rule and exceptions are clear for everyone. Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> --- include/uapi/drm/drm_fourcc.h | 7 +++++++ 1 file changed, 7 insertions(+)