Message ID | 20230825140434.182664-1-jfalempe@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v3] drm/plane: Add documentation about software color conversion. | expand |
On Fri, 25 Aug 2023 16:04:18 +0200, Jocelyn Falempe wrote: > After discussions on IRC, the consensus is that the DRM drivers should > avoid software color conversion, and only advertise the formats supported > by hardware. > Update the doc accordingly so that the rule and exceptions are clear for > everyone. > > [ ... ] Acked-by: Maxime Ripard <mripard@kernel.org> Thanks! Maxime
Jocelyn Falempe <jfalempe@redhat.com> writes: Hello Jocelyn, > After discussions on IRC, the consensus is that the DRM drivers should > avoid software color conversion, and only advertise the formats supported > by hardware. > Update the doc accordingly so that the rule and exceptions are clear for > everyone. > > Acked-by: Simon Ser <contact@emersion.fr> > Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> > --- Thanks a lot for writing this! Acked-by: Javier Martinez Canillas <javierm@redhat.com>
On Fri, 25 Aug 2023 16:04:18 +0200 Jocelyn Falempe <jfalempe@redhat.com> wrote: > After discussions on IRC, the consensus is that the DRM drivers should > avoid software color conversion, and only advertise the formats supported > by hardware. > Update the doc accordingly so that the rule and exceptions are clear for > everyone. > > Acked-by: Simon Ser <contact@emersion.fr> > Signed-off-by: Jocelyn Falempe <jfalempe@redhat.com> > --- > drivers/gpu/drm/drm_plane.c | 24 ++++++++++++++++++++++++ > 1 file changed, 24 insertions(+) > > diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c > index 24e7998d1731..d05642033202 100644 > --- a/drivers/gpu/drm/drm_plane.c > +++ b/drivers/gpu/drm/drm_plane.c > @@ -140,6 +140,30 @@ > * DRM_FORMAT_MOD_LINEAR. Before linux kernel release v5.1 there have been > * various bugs in this area with inconsistencies between the capability > * flag and per-plane properties. > + * > + * All drivers should support XRGB8888, even if the hardware cannot support > + * it. This has become the de-facto standard and a lot of user-space assume > + * it will be present. If XRGB8888 is not natively supported, then it > + * shouldn't be the default for preferred depth or fbdev emulation. > + * > + * DRM drivers should not do software color conversion, and > + * only advertise the formats they support in hardware. This is for > + * performance reason, and to avoid multiple conversions in userspace and > + * kernel space. KMS page flips are generally expected to be very cheap > + * operations. > + * > + * But there are two exceptions only for dumb buffers: > + * * To support XRGB8888 if it's not supported by the hardware. > + * * Any driver is free to modify its internal representation of the format, > + * as long as it doesn't alter the visible content in any way, and doesn't > + * modify the user-provided buffer. An example would be to drop the > + * padding component from a format to save some memory bandwidth. > + * On most hardware, VRAM read access are slow, so when doing the software > + * conversion, the dumb buffer should be allocated in system RAM in order to > + * have decent performance. > + * Extra care should be taken when doing software conversion with > + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: > + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ > */ Acked-by: Pekka Paalanen <pekka.paalanen@collabora.com> Thanks, pq
Hi Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: [...] > + * > + * But there are two exceptions only for dumb buffers: > + * * To support XRGB8888 if it's not supported by the hardware. > + * * Any driver is free to modify its internal representation of the format, > + * as long as it doesn't alter the visible content in any way, and doesn't > + * modify the user-provided buffer. An example would be to drop the > + * padding component from a format to save some memory bandwidth. I have strong objections to this point, _especially_ as you're apparently trying to sneak this in after our discussion. NAK on this part from my side. If you want userspace to be able to use a certain format, then export the corresponding 4cc code. Then let userspace decide what to do about it. If userspace pick a certain format, go with it. Hence, no implicit conversion from XRGB888 to RGB888, just because it's possible. Best regards Thomas > + * On most hardware, VRAM read access are slow, so when doing the software > + * conversion, the dumb buffer should be allocated in system RAM in order to > + * have decent performance. > + * Extra care should be taken when doing software conversion with > + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: > + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ > */ > > static unsigned int drm_num_planes(struct drm_device *dev) > > base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781
Hi, On Fri, Sep 08, 2023 at 11:21:51AM +0200, Thomas Zimmermann wrote: > Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: > [...] > > + * > > + * But there are two exceptions only for dumb buffers: > > + * * To support XRGB8888 if it's not supported by the hardware. > > > > + * * Any driver is free to modify its internal representation of the format, > > + * as long as it doesn't alter the visible content in any way, and doesn't > > + * modify the user-provided buffer. An example would be to drop the > > + * padding component from a format to save some memory bandwidth. > > I have strong objections to this point, _especially_ as you're apparently > trying to sneak this in after our discussion. I think it's an unfair characterization. This was discussed on #dri-devel, and went through several rounds over the mailing lists, with you in Cc for each. How is that sneaking something in? > NAK on this part from my side. > > If you want userspace to be able to use a certain format, then export the > corresponding 4cc code. Then let userspace decide what to do about it. If > userspace pick a certain format, go with it. > > Hence, no implicit conversion from XRGB888 to RGB888, just because it's > possible. I'm not sure what's your argument is, really. Last time we discussed this, you were saying that you were enforcing that rule because that was the outcome of that (painful) discussion with Pekka and Javier. It turns out that both Pekka and Javier are ok with the patch, so it's not clear to me what your objections are at this point. Are you really arguing that we shouldn't allow a driver to consume less resources while not affecting any other component in the system in any way? Maxime
On Fri, 8 Sep 2023 11:21:51 +0200 Thomas Zimmermann <tzimmermann@suse.de> wrote: > Hi > > Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: > [...] > > + * > > + * But there are two exceptions only for dumb buffers: > > + * * To support XRGB8888 if it's not supported by the hardware. > > > > + * * Any driver is free to modify its internal representation of the format, > > + * as long as it doesn't alter the visible content in any way, and doesn't > > + * modify the user-provided buffer. An example would be to drop the > > + * padding component from a format to save some memory bandwidth. > > I have strong objections to this point, _especially_ as you're > apparently trying to sneak this in after our discussion. NAK on this > part from my side. > > If you want userspace to be able to use a certain format, then export > the corresponding 4cc code. Then let userspace decide what to do about > it. If userspace pick a certain format, go with it. What is the reason for your objection, exactly? > Hence, no implicit conversion from XRGB888 to RGB888, just because it's > possible. For the particular driver in question though, the conversion allows using a display resolution that is otherwise not possible. I also hear it improves performance since 25% less data needs to travel across a slow bus. There is also so little VRAM, than all dumb buffers need to be allocated from sysram instead anyway, so a copy is always necessary. Since XRGB8888 is the one format that is recommended to be supported by all drivers, I don't see a problem here. Did you test on your incredibly slow g200 test rig if the conversion ends up hurting instead of helping performance there? If it hurts, then I see that you have a good reason to NAK this. It's hard to imagine how it would hurt, since you always need a copy from sysram dumb buffers to VRAM - or do you? Thanks, pq > > + * On most hardware, VRAM read access are slow, so when doing the software > > + * conversion, the dumb buffer should be allocated in system RAM in order to > > + * have decent performance. > > + * Extra care should be taken when doing software conversion with > > + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: > > + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ > > */ > > > > static unsigned int drm_num_planes(struct drm_device *dev) > > > > base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781 >
Hi Maxime Am 08.09.23 um 12:58 schrieb Maxime Ripard: > Hi, > > On Fri, Sep 08, 2023 at 11:21:51AM +0200, Thomas Zimmermann wrote: >> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >> [...] >>> + * >>> + * But there are two exceptions only for dumb buffers: >>> + * * To support XRGB8888 if it's not supported by the hardware. >> >> >>> + * * Any driver is free to modify its internal representation of the format, >>> + * as long as it doesn't alter the visible content in any way, and doesn't >>> + * modify the user-provided buffer. An example would be to drop the >>> + * padding component from a format to save some memory bandwidth. >> >> I have strong objections to this point, _especially_ as you're apparently >> trying to sneak this in after our discussion. > > I think it's an unfair characterization. This was discussed on > #dri-devel, and went through several rounds over the mailing lists, with > you in Cc for each. How is that sneaking something in? A few months ago, we had a flamewar'ish IRC discussion on format conversion within the kernel. The general sentiment was that the kernel drivers should use what ever is provided by userspace without further processing. The short argument was 'userspace knows better'. The only exception is for supporting XRGB8888 on hardware that would otherwise not support it. After some consideration, I agree with all that. (Back then I didn't.) A few weeks ago I received a patch to do an implicit conversion from XRGB8888 to RGB888 within mgag200. [1] I don't have a link to the discussion, but I NAK'ed that patch pretty hard on IRC by following that other discussion. And know I find that this patch (even in its v1) contains language that retroactively legitimizes the mgag200 patch. I wrote 'apparently' I my reply, as I assume that there's more to it, but how does it not look like an attempt to sneak in something that is known to be controversial? It might have been better to discuss the question separately on the dri-devel ML. Maybe we can do this here. Best regards Thomas [1] https://patchwork.freedesktop.org/patch/531879/?series=116381&rev=1 > >> NAK on this part from my side. >> >> If you want userspace to be able to use a certain format, then export the >> corresponding 4cc code. Then let userspace decide what to do about it. If >> userspace pick a certain format, go with it. >> >> Hence, no implicit conversion from XRGB888 to RGB888, just because it's >> possible. > > I'm not sure what's your argument is, really. Last time we discussed > this, you were saying that you were enforcing that rule because that was > the outcome of that (painful) discussion with Pekka and Javier. It turns > out that both Pekka and Javier are ok with the patch, so it's not clear > to me what your objections are at this point. > > Are you really arguing that we shouldn't allow a driver to consume less > resources while not affecting any other component in the system in any > way? > > Maxime
On Friday, September 8th, 2023 at 22:22, Thomas Zimmermann <tzimmermann@suse.de> wrote: > Am 08.09.23 um 12:58 schrieb Maxime Ripard: > > > Hi, > > > > On Fri, Sep 08, 2023 at 11:21:51AM +0200, Thomas Zimmermann wrote: > > > > > Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: > > > [...] > > > > > > > + * > > > > + * But there are two exceptions only for dumb buffers: > > > > + * * To support XRGB8888 if it's not supported by the hardware. > > > > > > > + * * Any driver is free to modify its internal representation of the format, > > > > + * as long as it doesn't alter the visible content in any way, and doesn't > > > > + * modify the user-provided buffer. An example would be to drop the > > > > + * padding component from a format to save some memory bandwidth. > > > > > > I have strong objections to this point, especially as you're apparently > > > trying to sneak this in after our discussion. > > > > I think it's an unfair characterization. This was discussed on > > #dri-devel, and went through several rounds over the mailing lists, with > > you in Cc for each. How is that sneaking something in? > > > A few months ago, we had a flamewar'ish IRC discussion on format > conversion within the kernel. The general sentiment was that the kernel > drivers should use what ever is provided by userspace without further > processing. The short argument was 'userspace knows better'. The only > exception is for supporting XRGB8888 on hardware that would otherwise > not support it. After some consideration, I agree with all that. (Back > then I didn't.) (FWIW, as a userspace dev, the above makes perfect sense to me.)
Thomas Zimmermann <tzimmermann@suse.de> writes: Hello Thomas, > Hi Maxime > > Am 08.09.23 um 12:58 schrieb Maxime Ripard: >> Hi, >> >> On Fri, Sep 08, 2023 at 11:21:51AM +0200, Thomas Zimmermann wrote: >>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >>> [...] >>>> + * >>>> + * But there are two exceptions only for dumb buffers: >>>> + * * To support XRGB8888 if it's not supported by the hardware. >>> >>> >>>> + * * Any driver is free to modify its internal representation of the format, >>>> + * as long as it doesn't alter the visible content in any way, and doesn't >>>> + * modify the user-provided buffer. An example would be to drop the >>>> + * padding component from a format to save some memory bandwidth. >>> >>> I have strong objections to this point, _especially_ as you're apparently >>> trying to sneak this in after our discussion. >> >> I think it's an unfair characterization. This was discussed on >> #dri-devel, and went through several rounds over the mailing lists, with >> you in Cc for each. How is that sneaking something in? > > A few months ago, we had a flamewar'ish IRC discussion on format > conversion within the kernel. The general sentiment was that the kernel > drivers should use what ever is provided by userspace without further > processing. The short argument was 'userspace knows better'. The only > exception is for supporting XRGB8888 on hardware that would otherwise > not support it. After some consideration, I agree with all that. (Back > then I didn't.) > > A few weeks ago I received a patch to do an implicit conversion from > XRGB8888 to RGB888 within mgag200. [1] I don't have a link to the > discussion, but I NAK'ed that patch pretty hard on IRC by following that > other discussion. > > And know I find that this patch (even in its v1) contains language that > retroactively legitimizes the mgag200 patch. I wrote 'apparently' I my > reply, as I assume that there's more to it, but how does it not look > like an attempt to sneak in something that is known to be controversial? > While is true that the motivation for Jocelyn's patch was to make explicit what are the rules with regard to drivers emulating formats (other than "we had a flamewar on IRC a while back" which is quite ambiguous), it was not attempt to sneak something that is known to be controversial. In fact, it is an attempt to dispel the controversy and document what is acceptable and what is not for a driver. > It might have been better to discuss the question separately on the > dri-devel ML. Maybe we can do this here. > This was discussed in the #dri-devel IRC channel, I believe you were on PTO at the time and probably that's why you missed. I found the logs here: https://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2023-08-04 As you can see there, most people agreed that what Jocelyn wrote in his doc patch is the most pragmatic compromise.
Hi Am 08.09.23 um 13:16 schrieb Pekka Paalanen: > On Fri, 8 Sep 2023 11:21:51 +0200 > Thomas Zimmermann <tzimmermann@suse.de> wrote: > >> Hi >> >> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >> [...] >>> + * >>> + * But there are two exceptions only for dumb buffers: >>> + * * To support XRGB8888 if it's not supported by the hardware. >> >> >>> + * * Any driver is free to modify its internal representation of the format, >>> + * as long as it doesn't alter the visible content in any way, and doesn't >>> + * modify the user-provided buffer. An example would be to drop the >>> + * padding component from a format to save some memory bandwidth. >> >> I have strong objections to this point, _especially_ as you're >> apparently trying to sneak this in after our discussion. NAK on this >> part from my side. >> >> If you want userspace to be able to use a certain format, then export >> the corresponding 4cc code. Then let userspace decide what to do about >> it. If userspace pick a certain format, go with it. > > What is the reason for your objection, exactly? > >> Hence, no implicit conversion from XRGB888 to RGB888, just because it's >> possible. > > For the particular driver in question though, the conversion allows > using a display resolution that is otherwise not possible. I also hear > it improves performance since 25% less data needs to travel across a > slow bus. There is also so little VRAM, than all dumb buffers need to > be allocated from sysram instead anyway, so a copy is always necessary. > > Since XRGB8888 is the one format that is recommended to be supported by > all drivers, I don't see a problem here. Did you test on your > incredibly slow g200 test rig if the conversion ends up hurting instead > of helping performance there? > > If it hurts, then I see that you have a good reason to NAK this. > > It's hard to imagine how it would hurt, since you always need a copy > from sysram dumb buffers to VRAM - or do you? I have a number of concerns. My point it not that we shouldn't optimize. I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 for userspace to use. AFAICT the main argument against userspace is that Mesa doesn't like 3-byte pixels. But I don't see how this conversion cannot be a post-processing step within Mesa: do the rendering in RGB32 and then convert to a framebuffer in RGB24. Userspace can do that more efficiently than the kernel. This has all of the upsides of reduced bandwidth, but none of the downsides of kernel code. Applications and/or Mesa would be in control of the buffer format and apply the optimization where it makes sense. And it would be available for all drivers that are similar to mgag200. My main point is simplicity of the driver: I prefer the driver to be simple without unnecessary indirection or overhead. Optimizations like these my or may not work on a given system with a certain workload. I'd better leave this heuristic to userspace. Another point of concern is CPU consumption: Slow I/O buses may stall the display thread, but the CPU could do something else in the meantime. Doing format conversion on the CPU prevents that, hence affecting other parts of the system negatively. Of course, that's more of a gut feeling than hard data. Please note that the kernel's conversion code uses memory allocation of intermediate buffers. We even recently had a discussion about allocation overhead during display updates. Userspace can surely do a better job at keeping such buffers around. And finally a note the hardware itself: on low-end hardware like those Matrox chips, just switch to RGB16. That will be pretty and fast enough for these chips' server systems. Anyone who cares about fast and beautiful should buy a real graphics card. Best regards Thomas > > > Thanks, > pq > >>> + * On most hardware, VRAM read access are slow, so when doing the software >>> + * conversion, the dumb buffer should be allocated in system RAM in order to >>> + * have decent performance. >>> + * Extra care should be taken when doing software conversion with >>> + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: >>> + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ >>> */ >>> >>> static unsigned int drm_num_planes(struct drm_device *dev) >>> >>> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781 >> >
On 08/09/2023 15:46, Javier Martinez Canillas wrote: > Thomas Zimmermann <tzimmermann@suse.de> writes: > > Hello Thomas, > >> Hi Maxime >> >> Am 08.09.23 um 12:58 schrieb Maxime Ripard: >>> Hi, >>> >>> On Fri, Sep 08, 2023 at 11:21:51AM +0200, Thomas Zimmermann wrote: >>>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >>>> [...] >>>>> + * >>>>> + * But there are two exceptions only for dumb buffers: >>>>> + * * To support XRGB8888 if it's not supported by the hardware. >>>> >>>> >>>>> + * * Any driver is free to modify its internal representation of the format, >>>>> + * as long as it doesn't alter the visible content in any way, and doesn't >>>>> + * modify the user-provided buffer. An example would be to drop the >>>>> + * padding component from a format to save some memory bandwidth. >>>> >>>> I have strong objections to this point, _especially_ as you're apparently >>>> trying to sneak this in after our discussion. >>> >>> I think it's an unfair characterization. This was discussed on >>> #dri-devel, and went through several rounds over the mailing lists, with >>> you in Cc for each. How is that sneaking something in? >> >> A few months ago, we had a flamewar'ish IRC discussion on format >> conversion within the kernel. The general sentiment was that the kernel >> drivers should use what ever is provided by userspace without further >> processing. The short argument was 'userspace knows better'. The only >> exception is for supporting XRGB8888 on hardware that would otherwise >> not support it. After some consideration, I agree with all that. (Back >> then I didn't.) I wasn't part of this "flamewar", and though my patch was a bit unrelated to this. That's why I started this work to document clearly what is acceptable in the kernel or not. I discuss it on IRC, and then proposed the patch on dri-devel to find a compromise, and see if my case can be acceptable or not. >> >> A few weeks ago I received a patch to do an implicit conversion from >> XRGB8888 to RGB888 within mgag200. [1] I don't have a link to the >> discussion, but I NAK'ed that patch pretty hard on IRC by following that >> other discussion. >> >> And know I find that this patch (even in its v1) contains language that >> retroactively legitimizes the mgag200 patch. I wrote 'apparently' I my >> reply, as I assume that there's more to it, but how does it not look >> like an attempt to sneak in something that is known to be controversial? >> That was not my intention, and I apologize if you feel it this way. My goal was just to clarify if this optimization is acceptable for other kernel developers, since I though you were willing to accept it, but some other developers from the "flamewar" were against. > > While is true that the motivation for Jocelyn's patch was to make explicit > what are the rules with regard to drivers emulating formats (other than > "we had a flamewar on IRC a while back" which is quite ambiguous), it was > not attempt to sneak something that is known to be controversial. > > In fact, it is an attempt to dispel the controversy and document what is > acceptable and what is not for a driver. > >> It might have been better to discuss the question separately on the >> dri-devel ML. Maybe we can do this here. >> > > This was discussed in the #dri-devel IRC channel, I believe you were on > PTO at the time and probably that's why you missed. I found the logs here: > > https://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2023-08-04 > > As you can see there, most people agreed that what Jocelyn wrote in his > doc patch is the most pragmatic compromise. > Best regards,
Hi Javier Am 08.09.23 um 15:46 schrieb Javier Martinez Canillas: > Thomas Zimmermann <tzimmermann@suse.de> writes: > > Hello Thomas, > >> Hi Maxime >> >> Am 08.09.23 um 12:58 schrieb Maxime Ripard: >>> Hi, >>> >>> On Fri, Sep 08, 2023 at 11:21:51AM +0200, Thomas Zimmermann wrote: >>>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >>>> [...] >>>>> + * >>>>> + * But there are two exceptions only for dumb buffers: >>>>> + * * To support XRGB8888 if it's not supported by the hardware. >>>> >>>> >>>>> + * * Any driver is free to modify its internal representation of the format, >>>>> + * as long as it doesn't alter the visible content in any way, and doesn't >>>>> + * modify the user-provided buffer. An example would be to drop the >>>>> + * padding component from a format to save some memory bandwidth. >>>> >>>> I have strong objections to this point, _especially_ as you're apparently >>>> trying to sneak this in after our discussion. >>> >>> I think it's an unfair characterization. This was discussed on >>> #dri-devel, and went through several rounds over the mailing lists, with >>> you in Cc for each. How is that sneaking something in? >> >> A few months ago, we had a flamewar'ish IRC discussion on format >> conversion within the kernel. The general sentiment was that the kernel >> drivers should use what ever is provided by userspace without further >> processing. The short argument was 'userspace knows better'. The only >> exception is for supporting XRGB8888 on hardware that would otherwise >> not support it. After some consideration, I agree with all that. (Back >> then I didn't.) >> >> A few weeks ago I received a patch to do an implicit conversion from >> XRGB8888 to RGB888 within mgag200. [1] I don't have a link to the >> discussion, but I NAK'ed that patch pretty hard on IRC by following that >> other discussion. >> >> And know I find that this patch (even in its v1) contains language that >> retroactively legitimizes the mgag200 patch. I wrote 'apparently' I my >> reply, as I assume that there's more to it, but how does it not look >> like an attempt to sneak in something that is known to be controversial? >> > > While is true that the motivation for Jocelyn's patch was to make explicit > what are the rules with regard to drivers emulating formats (other than > "we had a flamewar on IRC a while back" which is quite ambiguous), it was > not attempt to sneak something that is known to be controversial. > > In fact, it is an attempt to dispel the controversy and document what is > acceptable and what is not for a driver. > >> It might have been better to discuss the question separately on the >> dri-devel ML. Maybe we can do this here. >> > > This was discussed in the #dri-devel IRC channel, I believe you were on > PTO at the time and probably that's why you missed. I found the logs here: > > https://people.freedesktop.org/~cbrill/dri-log/?channel=dri-devel&date=2023-08-04 > > As you can see there, most people agreed that what Jocelyn wrote in his > doc patch is the most pragmatic compromise. Thanks for the pointer to the URL. Quoting Daniel (sima) from that discussion. "imo just document that for hw/drivers where XRGB8888 is not support or has a very bad cost in terms of upload/scanout bw it's ok to emulate it in the kernel driver, but _only_ for that format " This seems the overall sentiment. I disagree with the "has very bad cost" part, though. The cost/benefit ratio should be determined by userspace IMHO. Please see my reply to Pekka for some details. I don't have a pointer to that other IRC discussion, but IIRC there where quite a lot more people involved, including from userspace. Many of those are absent here. Best regards Thomas >
Hi Jocelyn Am 08.09.23 um 16:06 schrieb Jocelyn Falempe: [...] >>> And know I find that this patch (even in its v1) contains language that >>> retroactively legitimizes the mgag200 patch. I wrote 'apparently' I my >>> reply, as I assume that there's more to it, but how does it not look >>> like an attempt to sneak in something that is known to be controversial? >>> > > That was not my intention, and I apologize if you feel it this way. My > goal was just to clarify if this optimization is acceptable for other Sorry for throwing accusations around. You certainly act in good faith. Best regards Thomas
On Fri, 8 Sep 2023 15:56:51 +0200 Thomas Zimmermann <tzimmermann@suse.de> wrote: > Hi > > Am 08.09.23 um 13:16 schrieb Pekka Paalanen: > > On Fri, 8 Sep 2023 11:21:51 +0200 > > Thomas Zimmermann <tzimmermann@suse.de> wrote: > > > >> Hi > >> > >> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: > >> [...] > >>> + * > >>> + * But there are two exceptions only for dumb buffers: > >>> + * * To support XRGB8888 if it's not supported by the hardware. > >> > >> > >>> + * * Any driver is free to modify its internal representation of the format, > >>> + * as long as it doesn't alter the visible content in any way, and doesn't > >>> + * modify the user-provided buffer. An example would be to drop the > >>> + * padding component from a format to save some memory bandwidth. > >> > >> I have strong objections to this point, _especially_ as you're > >> apparently trying to sneak this in after our discussion. NAK on this > >> part from my side. > >> > >> If you want userspace to be able to use a certain format, then export > >> the corresponding 4cc code. Then let userspace decide what to do about > >> it. If userspace pick a certain format, go with it. > > > > What is the reason for your objection, exactly? > > > >> Hence, no implicit conversion from XRGB888 to RGB888, just because it's > >> possible. > > > > For the particular driver in question though, the conversion allows > > using a display resolution that is otherwise not possible. I also hear > > it improves performance since 25% less data needs to travel across a > > slow bus. There is also so little VRAM, than all dumb buffers need to > > be allocated from sysram instead anyway, so a copy is always necessary. > > > > Since XRGB8888 is the one format that is recommended to be supported by > > all drivers, I don't see a problem here. Did you test on your > > incredibly slow g200 test rig if the conversion ends up hurting instead > > of helping performance there? > > > > If it hurts, then I see that you have a good reason to NAK this. > > > > It's hard to imagine how it would hurt, since you always need a copy > > from sysram dumb buffers to VRAM - or do you? > > I have a number of concerns. My point it not that we shouldn't optimize. > I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 > for userspace to use. > > AFAICT the main argument against userspace is that Mesa doesn't like > 3-byte pixels. But I don't see how this conversion cannot be a > post-processing step within Mesa: do the rendering in RGB32 and then > convert to a framebuffer in RGB24. Userspace can do that more > efficiently than the kernel. This has all of the upsides of reduced > bandwidth, but none of the downsides of kernel code. Applications and/or > Mesa would be in control of the buffer format and apply the optimization > where it makes sense. And it would be available for all drivers that are > similar to mgag200. > > My main point is simplicity of the driver: I prefer the driver to be > simple without unnecessary indirection or overhead. Optimizations like > these my or may not work on a given system with a certain workload. I'd > better leave this heuristic to userspace. > > Another point of concern is CPU consumption: Slow I/O buses may stall > the display thread, but the CPU could do something else in the meantime. > Doing format conversion on the CPU prevents that, hence affecting other > parts of the system negatively. Of course, that's more of a gut feeling > than hard data. > > Please note that the kernel's conversion code uses memory allocation of > intermediate buffers. We even recently had a discussion about allocation > overhead during display updates. Userspace can surely do a better job at > keeping such buffers around. > > And finally a note the hardware itself: on low-end hardware like those > Matrox chips, just switch to RGB16. That will be pretty and fast enough > for these chips' server systems. Anyone who cares about fast and > beautiful should buy a real graphics card. Fair enough. Did you consider that every frame will be copied twice: once in userspace to get RGB888, then again by the driver into VRAM, since dumb buffers reside in sysram? RGB565 would probably go the same route I guess. I suspect the intermediate buffers (dumb buffers in this case) need to be full frame size rather than a scanline, too. I'm not sure why the driver would need any scratch buffers beyond a couple dozen bytes while doing a software conversion, just enough to have the lowest common multiple of 3 bytes and optimal bus write width. Thanks, pq > > Best regards > Thomas > > > > > > > Thanks, > > pq > > > >>> + * On most hardware, VRAM read access are slow, so when doing the software > >>> + * conversion, the dumb buffer should be allocated in system RAM in order to > >>> + * have decent performance. > >>> + * Extra care should be taken when doing software conversion with > >>> + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: > >>> + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ > >>> */ > >>> > >>> static unsigned int drm_num_planes(struct drm_device *dev) > >>> > >>> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781 > >> > > >
On 08/09/2023 15:56, Thomas Zimmermann wrote: > Hi > > Am 08.09.23 um 13:16 schrieb Pekka Paalanen: >> On Fri, 8 Sep 2023 11:21:51 +0200 >> Thomas Zimmermann <tzimmermann@suse.de> wrote: >> >>> Hi >>> >>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >>> [...] >>>> + * >>>> + * But there are two exceptions only for dumb buffers: >>>> + * * To support XRGB8888 if it's not supported by the hardware. >>> >>> >>>> + * * Any driver is free to modify its internal representation >>>> of the format, >>>> + * as long as it doesn't alter the visible content in any >>>> way, and doesn't >>>> + * modify the user-provided buffer. An example would be to >>>> drop the >>>> + * padding component from a format to save some memory >>>> bandwidth. >>> >>> I have strong objections to this point, _especially_ as you're >>> apparently trying to sneak this in after our discussion. NAK on this >>> part from my side. >>> >>> If you want userspace to be able to use a certain format, then export >>> the corresponding 4cc code. Then let userspace decide what to do about >>> it. If userspace pick a certain format, go with it. >> >> What is the reason for your objection, exactly? >> >>> Hence, no implicit conversion from XRGB888 to RGB888, just because it's >>> possible. >> >> For the particular driver in question though, the conversion allows >> using a display resolution that is otherwise not possible. I also hear >> it improves performance since 25% less data needs to travel across a >> slow bus. There is also so little VRAM, than all dumb buffers need to >> be allocated from sysram instead anyway, so a copy is always necessary. >> >> Since XRGB8888 is the one format that is recommended to be supported by >> all drivers, I don't see a problem here. Did you test on your >> incredibly slow g200 test rig if the conversion ends up hurting instead >> of helping performance there? >> >> If it hurts, then I see that you have a good reason to NAK this. >> >> It's hard to imagine how it would hurt, since you always need a copy >> from sysram dumb buffers to VRAM - or do you? > > I have a number of concerns. My point it not that we shouldn't optimize. > I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 > for userspace to use. It already does, it's just userspace doesn't want to support it. > > AFAICT the main argument against userspace is that Mesa doesn't like > 3-byte pixels. But I don't see how this conversion cannot be a > post-processing step within Mesa: do the rendering in RGB32 and then > convert to a framebuffer in RGB24. Userspace can do that more > efficiently than the kernel. This has all of the upsides of reduced > bandwidth, but none of the downsides of kernel code. Applications and/or > Mesa would be in control of the buffer format and apply the optimization > where it makes sense. And it would be available for all drivers that are > similar to mgag200. For this particular case, user-space would use more memory and CPU, because the copy to VRAM is done on kernel side, and it is where the conversion must be done for maximum performances. So there is no way for userspace to be as efficient as the kernel in this use-case. For the conversion, the kernel allocate only 1 line, and convert/copy one line at a time. And because the CPU is out-of-order, it can start converting the next line using CPU registers while waiting for the bus. Userspace would need to allocate a whole framebuffer, and can't use the "wasted" CPU cycle to do the conversion. > > My main point is simplicity of the driver: I prefer the driver to be > simple without unnecessary indirection or overhead. Optimizations like > these my or may not work on a given system with a certain workload. I'd > better leave this heuristic to userspace. I agree that the driver is simpler without optimizing thing. But I think it's the role of the driver to get the maximum from the hardware, even if it's old and broken like g200. And userspace don't want to optimize for such hardware. > > Another point of concern is CPU consumption: Slow I/O buses may stall > the display thread, but the CPU could do something else in the meantime. > Doing format conversion on the CPU prevents that, hence affecting other > parts of the system negatively. Of course, that's more of a gut feeling > than hard data. I think it's the reverse. Without dropping the X data, the CPU is actually stalling much longer sending useless bytes to the mgag200's VRAM. And the CPU can't do anything else while doing memcpy_toio(). Using DMA is the only way to free the CPU during the copy, but it appears to be either broken or significantly slower on most mgag200 hardware. I actually have made the work to support DMA, and I'm a bit sad it didn't work on all g200, so this optimization is really a last resort, on a really broken hardware. > > Please note that the kernel's conversion code uses memory allocation of > intermediate buffers. We even recently had a discussion about allocation > overhead during display updates. Userspace can surely do a better job at > keeping such buffers around. > > And finally a note the hardware itself: on low-end hardware like those > Matrox chips, just switch to RGB16. That will be pretty and fast enough > for these chips' server systems. Anyone who cares about fast and > beautiful should buy a real graphics card. There are still sysadmin users that occasionally have to browse the web to find answer, or read their mail in a GUI client. It turns out that rgb16 is pretty ugly for today standard, and buying an external card would be a bit too much, and won't work for remote control. Best regards,
Hi Am 08.09.23 um 16:41 schrieb Pekka Paalanen: > On Fri, 8 Sep 2023 15:56:51 +0200 > Thomas Zimmermann <tzimmermann@suse.de> wrote: > >> Hi >> >> Am 08.09.23 um 13:16 schrieb Pekka Paalanen: >>> On Fri, 8 Sep 2023 11:21:51 +0200 >>> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>> >>>> Hi >>>> >>>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >>>> [...] >>>>> + * >>>>> + * But there are two exceptions only for dumb buffers: >>>>> + * * To support XRGB8888 if it's not supported by the hardware. >>>> >>>> >>>>> + * * Any driver is free to modify its internal representation of the format, >>>>> + * as long as it doesn't alter the visible content in any way, and doesn't >>>>> + * modify the user-provided buffer. An example would be to drop the >>>>> + * padding component from a format to save some memory bandwidth. >>>> >>>> I have strong objections to this point, _especially_ as you're >>>> apparently trying to sneak this in after our discussion. NAK on this >>>> part from my side. >>>> >>>> If you want userspace to be able to use a certain format, then export >>>> the corresponding 4cc code. Then let userspace decide what to do about >>>> it. If userspace pick a certain format, go with it. >>> >>> What is the reason for your objection, exactly? >>> >>>> Hence, no implicit conversion from XRGB888 to RGB888, just because it's >>>> possible. >>> >>> For the particular driver in question though, the conversion allows >>> using a display resolution that is otherwise not possible. I also hear >>> it improves performance since 25% less data needs to travel across a >>> slow bus. There is also so little VRAM, than all dumb buffers need to >>> be allocated from sysram instead anyway, so a copy is always necessary. >>> >>> Since XRGB8888 is the one format that is recommended to be supported by >>> all drivers, I don't see a problem here. Did you test on your >>> incredibly slow g200 test rig if the conversion ends up hurting instead >>> of helping performance there? >>> >>> If it hurts, then I see that you have a good reason to NAK this. >>> >>> It's hard to imagine how it would hurt, since you always need a copy >>> from sysram dumb buffers to VRAM - or do you? >> >> I have a number of concerns. My point it not that we shouldn't optimize. >> I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 >> for userspace to use. >> >> AFAICT the main argument against userspace is that Mesa doesn't like >> 3-byte pixels. But I don't see how this conversion cannot be a >> post-processing step within Mesa: do the rendering in RGB32 and then >> convert to a framebuffer in RGB24. Userspace can do that more >> efficiently than the kernel. This has all of the upsides of reduced >> bandwidth, but none of the downsides of kernel code. Applications and/or >> Mesa would be in control of the buffer format and apply the optimization >> where it makes sense. And it would be available for all drivers that are >> similar to mgag200. >> >> My main point is simplicity of the driver: I prefer the driver to be >> simple without unnecessary indirection or overhead. Optimizations like >> these my or may not work on a given system with a certain workload. I'd >> better leave this heuristic to userspace. >> >> Another point of concern is CPU consumption: Slow I/O buses may stall >> the display thread, but the CPU could do something else in the meantime. >> Doing format conversion on the CPU prevents that, hence affecting other >> parts of the system negatively. Of course, that's more of a gut feeling >> than hard data. >> >> Please note that the kernel's conversion code uses memory allocation of >> intermediate buffers. We even recently had a discussion about allocation >> overhead during display updates. Userspace can surely do a better job at >> keeping such buffers around. >> >> And finally a note the hardware itself: on low-end hardware like those >> Matrox chips, just switch to RGB16. That will be pretty and fast enough >> for these chips' server systems. Anyone who cares about fast and >> beautiful should buy a real graphics card. > > Fair enough. > > Did you consider that every frame will be copied twice: once in > userspace to get RGB888, then again by the driver into VRAM, since dumb > buffers reside in sysram? In the kernel, we reduce the copying to the changed parts, if we have damage information from userspace. IDK Mesa's software renderer, but it could certainly apply a similar optimization. > > RGB565 would probably go the same route I guess. From what I know userspace still supports RGB565 rendering directly. Last time I tested, it worked for me. > > I suspect the intermediate buffers (dumb buffers in this case) need to > be full frame size rather than a scanline, too. I'm not sure why the > driver would need any scratch buffers beyond a couple dozen bytes while > doing a software conversion, just enough to have the lowest common > multiple of 3 bytes and optimal bus write width. For the conversion in the kernel we allocate enough temporary memory to hold the part of each scanline that changed. Then we go over dirty scanlines, convert each into the output format and store it in that temporary buffer. Then copy the temporary buffer to VRAM. The buffer can be reused for the scanlines of a single conversion. But the nature of the framebuffer (buffers are possibly shared with concurrent access from multiple planes) makes it hard to keep that temporary memory around. Hence it's freed after each conversion. The code is at [1]. I assume that a userspace software renderer could do a much better job at keeping the temporary memory allocated. Best regards Thomas [1] https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/drm_format_helper.c#L88 > > > Thanks, > pq > >> >> Best regards >> Thomas >> >>> >>> >>> Thanks, >>> pq >>> >>>>> + * On most hardware, VRAM read access are slow, so when doing the software >>>>> + * conversion, the dumb buffer should be allocated in system RAM in order to >>>>> + * have decent performance. >>>>> + * Extra care should be taken when doing software conversion with >>>>> + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: >>>>> + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ >>>>> */ >>>>> >>>>> static unsigned int drm_num_planes(struct drm_device *dev) >>>>> >>>>> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781 >>>> >>> >> >
Hi Am 08.09.23 um 16:48 schrieb Jocelyn Falempe: > On 08/09/2023 15:56, Thomas Zimmermann wrote: >> Hi >> >> Am 08.09.23 um 13:16 schrieb Pekka Paalanen: >>> On Fri, 8 Sep 2023 11:21:51 +0200 >>> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>> >>>> Hi >>>> >>>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >>>> [...] >>>>> + * >>>>> + * But there are two exceptions only for dumb buffers: >>>>> + * * To support XRGB8888 if it's not supported by the hardware. >>>> >>>> >>>>> + * * Any driver is free to modify its internal representation >>>>> of the format, >>>>> + * as long as it doesn't alter the visible content in any >>>>> way, and doesn't >>>>> + * modify the user-provided buffer. An example would be to >>>>> drop the >>>>> + * padding component from a format to save some memory >>>>> bandwidth. >>>> >>>> I have strong objections to this point, _especially_ as you're >>>> apparently trying to sneak this in after our discussion. NAK on this >>>> part from my side. >>>> >>>> If you want userspace to be able to use a certain format, then export >>>> the corresponding 4cc code. Then let userspace decide what to do about >>>> it. If userspace pick a certain format, go with it. >>> >>> What is the reason for your objection, exactly? >>> >>>> Hence, no implicit conversion from XRGB888 to RGB888, just because it's >>>> possible. >>> >>> For the particular driver in question though, the conversion allows >>> using a display resolution that is otherwise not possible. I also hear >>> it improves performance since 25% less data needs to travel across a >>> slow bus. There is also so little VRAM, than all dumb buffers need to >>> be allocated from sysram instead anyway, so a copy is always necessary. >>> >>> Since XRGB8888 is the one format that is recommended to be supported by >>> all drivers, I don't see a problem here. Did you test on your >>> incredibly slow g200 test rig if the conversion ends up hurting instead >>> of helping performance there? >>> >>> If it hurts, then I see that you have a good reason to NAK this. >>> >>> It's hard to imagine how it would hurt, since you always need a copy >>> from sysram dumb buffers to VRAM - or do you? >> >> I have a number of concerns. My point it not that we shouldn't >> optimize. I just don't want it in the kernel. Mgag200 can export >> DRM_FORMAT_RGB888 for userspace to use. > > It already does, it's just userspace doesn't want to support it. >> >> AFAICT the main argument against userspace is that Mesa doesn't like >> 3-byte pixels. But I don't see how this conversion cannot be a >> post-processing step within Mesa: do the rendering in RGB32 and then >> convert to a framebuffer in RGB24. Userspace can do that more >> efficiently than the kernel. This has all of the upsides of reduced >> bandwidth, but none of the downsides of kernel code. Applications >> and/or Mesa would be in control of the buffer format and apply the >> optimization where it makes sense. And it would be available for all >> drivers that are similar to mgag200. > > For this particular case, user-space would use more memory and CPU, > because the copy to VRAM is done on kernel side, and it is where the > conversion must be done for maximum performances. So there is no way for > userspace to be as efficient as the kernel in this use-case. > > For the conversion, the kernel allocate only 1 line, and convert/copy > one line at a time. And because the CPU is out-of-order, it can start > converting the next line using CPU registers while waiting for the bus. Access is writecombined, so you cannot throw large amounts of data towards the bus and do something else meanwhile. > > Userspace would need to allocate a whole framebuffer, and can't use the > "wasted" CPU cycle to do the conversion. Yes, userspace would probably need a full extra framebuffer to store the RGB32 data. But just as in the kernel, userspace can do format conversion of only the damaged areas. No extra CPU overhead here. > >> >> My main point is simplicity of the driver: I prefer the driver to be >> simple without unnecessary indirection or overhead. Optimizations like >> these my or may not work on a given system with a certain workload. >> I'd better leave this heuristic to userspace. > > I agree that the driver is simpler without optimizing thing. But I think > it's the role of the driver to get the maximum from the hardware, even > if it's old and broken like g200. And userspace don't want to optimize > for such hardware. Optimization always depends on the workload; something that the driver doesn't know. For example, as we mostly move the mouse cursor around the screen, the damages areas are usually small. Optimizing this might be pointless in any case. > >> >> Another point of concern is CPU consumption: Slow I/O buses may stall >> the display thread, but the CPU could do something else in the >> meantime. Doing format conversion on the CPU prevents that, hence >> affecting other parts of the system negatively. Of course, that's more >> of a gut feeling than hard data. > > I think it's the reverse. Without dropping the X data, the CPU is > actually stalling much longer sending useless bytes to the mgag200's > VRAM. And the CPU can't do anything else while doing memcpy_toio(). Hyperthreading. You are also arguing against XRGB in general, which is a different topic. > Using DMA is the only way to free the CPU during the copy, but it > appears to be either broken or significantly slower on most mgag200 > hardware. > > I actually have made the work to support DMA, and I'm a bit sad it > didn't work on all g200, so this optimization is really a last resort, > on a really broken hardware. > >> >> Please note that the kernel's conversion code uses memory allocation >> of intermediate buffers. We even recently had a discussion about >> allocation overhead during display updates. Userspace can surely do a >> better job at keeping such buffers around. >> >> And finally a note the hardware itself: on low-end hardware like those >> Matrox chips, just switch to RGB16. That will be pretty and fast >> enough for these chips' server systems. Anyone who cares about fast >> and beautiful should buy a real graphics card. > > There are still sysadmin users that occasionally have to browse the web > to find answer, or read their mail in a GUI client. It turns out that > rgb16 is pretty ugly for today standard, and buying an external card > would be a bit too much, and won't work for remote control. I'm sure sysadmins have a computer for work with a decent GPU and don't need to browse the web on their server systems. Best regards Thomas > > > Best regards, >
On Fri, 8 Sep 2023 17:10:46 +0200 Thomas Zimmermann <tzimmermann@suse.de> wrote: > Hi > > Am 08.09.23 um 16:41 schrieb Pekka Paalanen: > > On Fri, 8 Sep 2023 15:56:51 +0200 > > Thomas Zimmermann <tzimmermann@suse.de> wrote: > > > >> Hi > >> > >> Am 08.09.23 um 13:16 schrieb Pekka Paalanen: > >>> On Fri, 8 Sep 2023 11:21:51 +0200 > >>> Thomas Zimmermann <tzimmermann@suse.de> wrote: > >>> > >>>> Hi > >>>> > >>>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: > >>>> [...] > >>>>> + * > >>>>> + * But there are two exceptions only for dumb buffers: > >>>>> + * * To support XRGB8888 if it's not supported by the hardware. > >>>> > >>>> > >>>>> + * * Any driver is free to modify its internal representation of the format, > >>>>> + * as long as it doesn't alter the visible content in any way, and doesn't > >>>>> + * modify the user-provided buffer. An example would be to drop the > >>>>> + * padding component from a format to save some memory bandwidth. > >>>> > >>>> I have strong objections to this point, _especially_ as you're > >>>> apparently trying to sneak this in after our discussion. NAK on this > >>>> part from my side. > >>>> > >>>> If you want userspace to be able to use a certain format, then export > >>>> the corresponding 4cc code. Then let userspace decide what to do about > >>>> it. If userspace pick a certain format, go with it. > >>> > >>> What is the reason for your objection, exactly? > >>> > >>>> Hence, no implicit conversion from XRGB888 to RGB888, just because it's > >>>> possible. > >>> > >>> For the particular driver in question though, the conversion allows > >>> using a display resolution that is otherwise not possible. I also hear > >>> it improves performance since 25% less data needs to travel across a > >>> slow bus. There is also so little VRAM, than all dumb buffers need to > >>> be allocated from sysram instead anyway, so a copy is always necessary. > >>> > >>> Since XRGB8888 is the one format that is recommended to be supported by > >>> all drivers, I don't see a problem here. Did you test on your > >>> incredibly slow g200 test rig if the conversion ends up hurting instead > >>> of helping performance there? > >>> > >>> If it hurts, then I see that you have a good reason to NAK this. > >>> > >>> It's hard to imagine how it would hurt, since you always need a copy > >>> from sysram dumb buffers to VRAM - or do you? > >> > >> I have a number of concerns. My point it not that we shouldn't optimize. > >> I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 > >> for userspace to use. > >> > >> AFAICT the main argument against userspace is that Mesa doesn't like > >> 3-byte pixels. But I don't see how this conversion cannot be a > >> post-processing step within Mesa: do the rendering in RGB32 and then > >> convert to a framebuffer in RGB24. Userspace can do that more > >> efficiently than the kernel. This has all of the upsides of reduced > >> bandwidth, but none of the downsides of kernel code. Applications and/or > >> Mesa would be in control of the buffer format and apply the optimization > >> where it makes sense. And it would be available for all drivers that are > >> similar to mgag200. > >> > >> My main point is simplicity of the driver: I prefer the driver to be > >> simple without unnecessary indirection or overhead. Optimizations like > >> these my or may not work on a given system with a certain workload. I'd > >> better leave this heuristic to userspace. > >> > >> Another point of concern is CPU consumption: Slow I/O buses may stall > >> the display thread, but the CPU could do something else in the meantime. > >> Doing format conversion on the CPU prevents that, hence affecting other > >> parts of the system negatively. Of course, that's more of a gut feeling > >> than hard data. > >> > >> Please note that the kernel's conversion code uses memory allocation of > >> intermediate buffers. We even recently had a discussion about allocation > >> overhead during display updates. Userspace can surely do a better job at > >> keeping such buffers around. > >> > >> And finally a note the hardware itself: on low-end hardware like those > >> Matrox chips, just switch to RGB16. That will be pretty and fast enough > >> for these chips' server systems. Anyone who cares about fast and > >> beautiful should buy a real graphics card. > > > > Fair enough. > > > > Did you consider that every frame will be copied twice: once in > > userspace to get RGB888, then again by the driver into VRAM, since dumb > > buffers reside in sysram? > > In the kernel, we reduce the copying to the changed parts, if we have > damage information from userspace. IDK Mesa's software renderer, but it > could certainly apply a similar optimization. I have already assumed that everything uses damage information to optimize the regions to copy. It's still two copies instead of one. Actually, it's slightly more than two copies. Due to damage tracking and the driver needing to flip between two VRAM buffers, it is always copying current + previous damage, not only current damage. > > I suspect the intermediate buffers (dumb buffers in this case) need to > > be full frame size rather than a scanline, too. I'm not sure why the > > driver would need any scratch buffers beyond a couple dozen bytes while > > doing a software conversion, just enough to have the lowest common > > multiple of 3 bytes and optimal bus write width. > > For the conversion in the kernel we allocate enough temporary memory to > hold the part of each scanline that changed. Then we go over dirty > scanlines, convert each into the output format and store it in that > temporary buffer. Then copy the temporary buffer to VRAM. The buffer can > be reused for the scanlines of a single conversion. But the nature of > the framebuffer (buffers are possibly shared with concurrent access from > multiple planes) makes it hard to keep that temporary memory around. > Hence it's freed after each conversion. The code is at [1]. Yes, that's the conversion in the kernel. However, before the kernel can run that conversion, userspace must have already allocated full sized dumb buffers to convert its full sized internal framebuffer into. Since KMS is modernly used with double-buffering, there must always be two dumb buffers, which means updating a dumb buffer needs to copy not just current damage but also previous damage. Userspace has no way to know that single-buffering would be equally good in this case for this particular driver. If userspace gave its internal framebuffer to the kernel without doing the conversion to RGB888, then that internal buffer would be the dumb buffer, and there would be no need to allocate a second full size buffer (third, because double-buffering towards KMS). The driver allocating even full scanlines would be a lot less memory touched if userspace didn't convert to RGB888, and if the driver used a tailored conversion function (it literally needs to handle only a single combination of input and output conditions), it wouldn't need even that nor to allocate and free on every conversion. I understand you do not want to pay the cost having such special-case code, and that's ok. It would just be even further optimization after getting rid of a static full sized buffer allocation. > I assume that a userspace software renderer could do a much better job > at keeping the temporary memory allocated. I'm not sure why the kernel can't keep the temporary scanline buffer allocated with the CRTC. It only needs to be reallocated if it's too small. Sure, people probably don't want to spend time on such code. All the above discussion is based on the assumption that dumb buffers are allocated in sysram. It's fine to say you don't want to optimize, but I want to be clear on the exact trade-off. The trade-offs vary greatly depending on each particular use case, which is why I wouldn't make a hard rule of no in-kernel conversions, just a rule of thumb since it's *usually* not useful or is actively hurting. Here we are talking about XRGB8888 which is already exempt from the rule of thumb, with two more special conditions: dumb buffers in sysram, and the performance traits of RGB888 on the specific hardware. Thanks, pq > > Best regards > Thomas > > [1] > https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/drm_format_helper.c#L88 > > > > > > > Thanks, > > pq > > > >> > >> Best regards > >> Thomas > >> > >>> > >>> > >>> Thanks, > >>> pq > >>> > >>>>> + * On most hardware, VRAM read access are slow, so when doing the software > >>>>> + * conversion, the dumb buffer should be allocated in system RAM in order to > >>>>> + * have decent performance. > >>>>> + * Extra care should be taken when doing software conversion with > >>>>> + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: > >>>>> + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ > >>>>> */ > >>>>> > >>>>> static unsigned int drm_num_planes(struct drm_device *dev) > >>>>> > >>>>> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781 > >>>> > >>> > >> > > >
On 08/09/2023 17:37, Thomas Zimmermann wrote: > Hi > > Am 08.09.23 um 16:48 schrieb Jocelyn Falempe: >> On 08/09/2023 15:56, Thomas Zimmermann wrote: >>> Hi >>> >>> Am 08.09.23 um 13:16 schrieb Pekka Paalanen: >>>> On Fri, 8 Sep 2023 11:21:51 +0200 >>>> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>>> >>>>> Hi >>>>> >>>>> Am 25.08.23 um 16:04 schrieb Jocelyn Falempe: >>>>> [...] >>>>>> + * >>>>>> + * But there are two exceptions only for dumb buffers: >>>>>> + * * To support XRGB8888 if it's not supported by the hardware. >>>>> >>>>> >>>>>> + * * Any driver is free to modify its internal representation >>>>>> of the format, >>>>>> + * as long as it doesn't alter the visible content in any >>>>>> way, and doesn't >>>>>> + * modify the user-provided buffer. An example would be to >>>>>> drop the >>>>>> + * padding component from a format to save some memory >>>>>> bandwidth. >>>>> >>>>> I have strong objections to this point, _especially_ as you're >>>>> apparently trying to sneak this in after our discussion. NAK on this >>>>> part from my side. >>>>> >>>>> If you want userspace to be able to use a certain format, then export >>>>> the corresponding 4cc code. Then let userspace decide what to do about >>>>> it. If userspace pick a certain format, go with it. >>>> >>>> What is the reason for your objection, exactly? >>>> >>>>> Hence, no implicit conversion from XRGB888 to RGB888, just because >>>>> it's >>>>> possible. >>>> >>>> For the particular driver in question though, the conversion allows >>>> using a display resolution that is otherwise not possible. I also hear >>>> it improves performance since 25% less data needs to travel across a >>>> slow bus. There is also so little VRAM, than all dumb buffers need to >>>> be allocated from sysram instead anyway, so a copy is always necessary. >>>> >>>> Since XRGB8888 is the one format that is recommended to be supported by >>>> all drivers, I don't see a problem here. Did you test on your >>>> incredibly slow g200 test rig if the conversion ends up hurting instead >>>> of helping performance there? >>>> >>>> If it hurts, then I see that you have a good reason to NAK this. >>>> >>>> It's hard to imagine how it would hurt, since you always need a copy >>>> from sysram dumb buffers to VRAM - or do you? >>> >>> I have a number of concerns. My point it not that we shouldn't >>> optimize. I just don't want it in the kernel. Mgag200 can export >>> DRM_FORMAT_RGB888 for userspace to use. >> >> It already does, it's just userspace doesn't want to support it. >>> >>> AFAICT the main argument against userspace is that Mesa doesn't like >>> 3-byte pixels. But I don't see how this conversion cannot be a >>> post-processing step within Mesa: do the rendering in RGB32 and then >>> convert to a framebuffer in RGB24. Userspace can do that more >>> efficiently than the kernel. This has all of the upsides of reduced >>> bandwidth, but none of the downsides of kernel code. Applications >>> and/or Mesa would be in control of the buffer format and apply the >>> optimization where it makes sense. And it would be available for all >>> drivers that are similar to mgag200. >> >> For this particular case, user-space would use more memory and CPU, >> because the copy to VRAM is done on kernel side, and it is where the >> conversion must be done for maximum performances. So there is no way >> for userspace to be as efficient as the kernel in this use-case. >> >> For the conversion, the kernel allocate only 1 line, and convert/copy >> one line at a time. And because the CPU is out-of-order, it can start >> converting the next line using CPU registers while waiting for the bus. > > Access is writecombined, so you cannot throw large amounts of data > towards the bus and do something else meanwhile. > >> >> Userspace would need to allocate a whole framebuffer, and can't use >> the "wasted" CPU cycle to do the conversion. > > Yes, userspace would probably need a full extra framebuffer to store the > RGB32 data. But just as in the kernel, userspace can do format > conversion of only the damaged areas. No extra CPU overhead here. > >> >>> >>> My main point is simplicity of the driver: I prefer the driver to be >>> simple without unnecessary indirection or overhead. Optimizations >>> like these my or may not work on a given system with a certain >>> workload. I'd better leave this heuristic to userspace. >> >> I agree that the driver is simpler without optimizing thing. But I >> think it's the role of the driver to get the maximum from the >> hardware, even if it's old and broken like g200. And userspace don't >> want to optimize for such hardware. > > Optimization always depends on the workload; something that the driver > doesn't know. For example, as we mostly move the mouse cursor around the > screen, the damages areas are usually small. Optimizing this might be > pointless in any case. So your point is we should not optimize because sometime it might not be necessary ? And even for cursor update, the conversion is still 25% faster. > >> >>> >>> Another point of concern is CPU consumption: Slow I/O buses may stall >>> the display thread, but the CPU could do something else in the >>> meantime. Doing format conversion on the CPU prevents that, hence >>> affecting other parts of the system negatively. Of course, that's >>> more of a gut feeling than hard data. >> >> I think it's the reverse. Without dropping the X data, the CPU is >> actually stalling much longer sending useless bytes to the mgag200's >> VRAM. And the CPU can't do anything else while doing memcpy_toio(). > > Hyperthreading. I still doubt a user-space conversion would do a better job than the kernel. > > You are also arguing against XRGB in general, which is a different topic. yes, the issue is human eyes only sees 3 colors, and it's not a power of two. So compromise have been made, and that Matrox card, is from the era of the transition from 16bits to 32bits, and works significantly better in 24bits. And it's probably the only remaining GPU with this problem. > >> Using DMA is the only way to free the CPU during the copy, but it >> appears to be either broken or significantly slower on most mgag200 >> hardware. >> >> I actually have made the work to support DMA, and I'm a bit sad it >> didn't work on all g200, so this optimization is really a last resort, >> on a really broken hardware. >> >>> >>> Please note that the kernel's conversion code uses memory allocation >>> of intermediate buffers. We even recently had a discussion about >>> allocation overhead during display updates. Userspace can surely do a >>> better job at keeping such buffers around. >>> >>> And finally a note the hardware itself: on low-end hardware like >>> those Matrox chips, just switch to RGB16. That will be pretty and >>> fast enough for these chips' server systems. Anyone who cares about >>> fast and beautiful should buy a real graphics card. >> >> There are still sysadmin users that occasionally have to browse the >> web to find answer, or read their mail in a GUI client. It turns out >> that rgb16 is pretty ugly for today standard, and buying an external >> card would be a bit too much, and won't work for remote control. > > I'm sure sysadmins have a computer for work with a decent GPU and don't > need to browse the web on their server systems. The GUI applications also include graphical installer, that obviously you can't run on other system. I do have bug reports, and I already fixed a few regressions in the mgag200 driver from this reports. But if you think they shouldn't use this GPU, then why maintaining a driver in the first place ? Simpledrm is enough if you don't use graphics. > > Best regards > Thomas > >> >> >> Best regards, >> > Best regards,
Hi Am 11.09.23 um 10:38 schrieb Pekka Paalanen: [...] >> In the kernel, we reduce the copying to the changed parts, if we have >> damage information from userspace. IDK Mesa's software renderer, but it >> could certainly apply a similar optimization. > > I have already assumed that everything uses damage information to > optimize the regions to copy. It's still two copies instead of one. > Actually, it's slightly more than two copies. > > Due to damage tracking and the driver needing to flip between two VRAM > buffers, it is always copying current + previous damage, not only > current damage. All dumb buffers are in system memory. From there, we always copy into the same region of VRAM. So from the kernel's perspective, it's really just the latest damage. > >>> I suspect the intermediate buffers (dumb buffers in this case) need to >>> be full frame size rather than a scanline, too. I'm not sure why the >>> driver would need any scratch buffers beyond a couple dozen bytes while >>> doing a software conversion, just enough to have the lowest common >>> multiple of 3 bytes and optimal bus write width. >> >> For the conversion in the kernel we allocate enough temporary memory to >> hold the part of each scanline that changed. Then we go over dirty >> scanlines, convert each into the output format and store it in that >> temporary buffer. Then copy the temporary buffer to VRAM. The buffer can >> be reused for the scanlines of a single conversion. But the nature of >> the framebuffer (buffers are possibly shared with concurrent access from >> multiple planes) makes it hard to keep that temporary memory around. >> Hence it's freed after each conversion. The code is at [1]. > > Yes, that's the conversion in the kernel. However, before the kernel > can run that conversion, userspace must have already allocated full > sized dumb buffers to convert its full sized internal framebuffer into. > Since KMS is modernly used with double-buffering, there must always be > two dumb buffers, which means updating a dumb buffer needs to copy not > just current damage but also previous damage. Userspace has no way to > know that single-buffering would be equally good in this case for this > particular driver. I think that this would be a good optimization. If we let userspace know that a certain GEM BO is a shadow buffer, the compositor could avoid that second dumb buffer entirely. We have plenty of DRM drivers that would benefit from this (grep the DRM code for 'shadow_plane'). It would also allow a format conversion in userspace with little memory overhead compared to the two dumb buffers of the current code. > > If userspace gave its internal framebuffer to the kernel without doing > the conversion to RGB888, then that internal buffer would be the dumb > buffer, and there would be no need to allocate a second full size > buffer (third, because double-buffering towards KMS). > > The driver allocating even full scanlines would be a lot less memory > touched if userspace didn't convert to RGB888, and if the driver used a > tailored conversion function (it literally needs to handle only a single > combination of input and output conditions), it wouldn't need even that > nor to allocate and free on every conversion. I understand you do not > want to pay the cost having such special-case code, and that's ok. It > would just be even further optimization after getting rid of a static > full sized buffer allocation. > > >> I assume that a userspace software renderer could do a much better job >> at keeping the temporary memory allocated. > > I'm not sure why the kernel can't keep the temporary scanline buffer > allocated with the CRTC. It only needs to be reallocated if it's too > small. Sure, people probably don't want to spend time on such code. I wanted to make that happen at some point. But it's a bit of a hassle. Either we pass a temporary buffer to the kernel's format-conversion code or we store it in one of the involved data structures, preferable drm_framebuffer. The former solution complicates the caller and is prone to memory leaks; the latter is complicated as drm_framebuffer could be used concurrently. > > All the above discussion is based on the assumption that dumb buffers > are allocated in sysram. > > It's fine to say you don't want to optimize, but I want to be clear on > the exact trade-off. > > The trade-offs vary greatly depending on each particular use case, > which is why I wouldn't make a hard rule of no in-kernel conversions, > just a rule of thumb since it's *usually* not useful or is actively > hurting. Here we are talking about XRGB8888 which is already exempt > from the rule of thumb, with two more special conditions: dumb buffers > in sysram, and the performance traits of RGB888 on the specific > hardware. I don't like this optimization from an architectural POV. But I also haven't seen that it makes a difference in practice. As I said in another email, the current XRGB-based code is not too slow to be unusable in the case of a full-screen update. And as we're mostly moving mouse cursors around the screen, screen updates are small most of the time. Best regards Thomas > > > Thanks, > pq > >> >> Best regards >> Thomas >> >> [1] >> https://elixir.bootlin.com/linux/v6.5/source/drivers/gpu/drm/drm_format_helper.c#L88 >> >>> >>> >>> Thanks, >>> pq >>> >>>> >>>> Best regards >>>> Thomas >>>> >>>>> >>>>> >>>>> Thanks, >>>>> pq >>>>> >>>>>>> + * On most hardware, VRAM read access are slow, so when doing the software >>>>>>> + * conversion, the dumb buffer should be allocated in system RAM in order to >>>>>>> + * have decent performance. >>>>>>> + * Extra care should be taken when doing software conversion with >>>>>>> + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: >>>>>>> + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ >>>>>>> */ >>>>>>> >>>>>>> static unsigned int drm_num_planes(struct drm_device *dev) >>>>>>> >>>>>>> base-commit: 82d750e9d2f5d0594c8f7057ce59127e701af781 >>>>>> >>>>> >>>> >>> >> >
Hi Am 11.09.23 um 12:05 schrieb Jocelyn Falempe: [...] >> Optimization always depends on the workload; something that the driver >> doesn't know. For example, as we mostly move the mouse cursor around >> the screen, the damages areas are usually small. Optimizing this might >> be pointless in any case. > > So your point is we should not optimize because sometime it might not be > necessary ? And even for cursor update, the conversion is still 25% faster. Yes, kind of. Those 25% are meaningless in the big picture. And the current code is fast enough for what these chips do (server consoles). So there's no pointing micro-optimizing anything here. See my reply to Pekka for an optimization that I'd find much more useful. If I have to choose between fast-enough-but-simple and faster-but-complicated, the former is much preferable. Anyone who wants fast and beautiful graphics has a better graphics card anyway. > >> >>> >>>> >>>> Another point of concern is CPU consumption: Slow I/O buses may >>>> stall the display thread, but the CPU could do something else in the >>>> meantime. Doing format conversion on the CPU prevents that, hence >>>> affecting other parts of the system negatively. Of course, that's >>>> more of a gut feeling than hard data. >>> >>> I think it's the reverse. Without dropping the X data, the CPU is >>> actually stalling much longer sending useless bytes to the mgag200's >>> VRAM. And the CPU can't do anything else while doing memcpy_toio(). >> >> Hyperthreading. > > I still doubt a user-space conversion would do a better job than the > kernel. >> >> You are also arguing against XRGB in general, which is a different topic. > > yes, the issue is human eyes only sees 3 colors, and it's not a power of > two. So compromise have been made, and that Matrox card, is from the era > of the transition from 16bits to 32bits, and works significantly better > in 24bits. And it's probably the only remaining GPU with this problem. No. There's at least udl hardware, which does only support RGB16 and RGB24. And for simpledrm and ofdrm, we have to take whatever the system provided us with. That could be RGB24. > >> >>> Using DMA is the only way to free the CPU during the copy, but it >>> appears to be either broken or significantly slower on most mgag200 >>> hardware. >>> >>> I actually have made the work to support DMA, and I'm a bit sad it >>> didn't work on all g200, so this optimization is really a last >>> resort, on a really broken hardware. >>> >>>> >>>> Please note that the kernel's conversion code uses memory allocation >>>> of intermediate buffers. We even recently had a discussion about >>>> allocation overhead during display updates. Userspace can surely do >>>> a better job at keeping such buffers around. >>>> >>>> And finally a note the hardware itself: on low-end hardware like >>>> those Matrox chips, just switch to RGB16. That will be pretty and >>>> fast enough for these chips' server systems. Anyone who cares about >>>> fast and beautiful should buy a real graphics card. >>> >>> There are still sysadmin users that occasionally have to browse the >>> web to find answer, or read their mail in a GUI client. It turns out >>> that rgb16 is pretty ugly for today standard, and buying an external >>> card would be a bit too much, and won't work for remote control. >> >> I'm sure sysadmins have a computer for work with a decent GPU and >> don't need to browse the web on their server systems. > > The GUI applications also include graphical installer, that obviously > you can't run on other system. > I do have bug reports, and I already fixed a few regressions in the > mgag200 driver from this reports. This isn't a driver bug. > But if you think they shouldn't use this GPU, then why maintaining a > driver in the first place ? Simpledrm is enough if you don't use graphics. I've done my share of graphic installations on these chips. Neither low-color modes nor performance has ever been a problem. And if, it is still better to spend the time on the renderer, so that all drivers can benefit. Best regards Thomas > >> >> Best regards >> Thomas >> >>> >>> >>> Best regards, >>> >> > > Best regards, >
On Fri, Sep 08, 2023 at 05:37:27PM +0200, Thomas Zimmermann wrote: > > > Please note that the kernel's conversion code uses memory allocation > > > of intermediate buffers. We even recently had a discussion about > > > allocation overhead during display updates. Userspace can surely do > > > a better job at keeping such buffers around. > > > > > > And finally a note the hardware itself: on low-end hardware like > > > those Matrox chips, just switch to RGB16. That will be pretty and > > > fast enough for these chips' server systems. Anyone who cares about > > > fast and beautiful should buy a real graphics card. > > > > There are still sysadmin users that occasionally have to browse the web > > to find answer, or read their mail in a GUI client. It turns out that > > rgb16 is pretty ugly for today standard, and buying an external card > > would be a bit too much, and won't work for remote control. > > I'm sure sysadmins have a computer for work with a decent GPU and don't need > to browse the web on their server systems. If your expectation is that there's no capable display stack running on those systems, what user-space component would you expect to optimize the format/transfers like you previously hinted at exactly? Maxime
On 9/11/23 10:38, Pekka Paalanen wrote: > On Fri, 8 Sep 2023 17:10:46 +0200 > Thomas Zimmermann <tzimmermann@suse.de> wrote: >> Am 08.09.23 um 16:41 schrieb Pekka Paalanen: >>> On Fri, 8 Sep 2023 15:56:51 +0200 >>> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>>> >>>> I have a number of concerns. My point it not that we shouldn't optimize. >>>> I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 >>>> for userspace to use. >>>> >>>> AFAICT the main argument against userspace is that Mesa doesn't like >>>> 3-byte pixels. But I don't see how this conversion cannot be a >>>> post-processing step within Mesa: do the rendering in RGB32 and then >>>> convert to a framebuffer in RGB24. Even assuming the conversion could be handled transparently in Mesa, it would still require the KMS client to pick RGB888 instead of XRGB8888. Most (all?) KMS clients support XRGB8888, many of them will realistically never support RGB888. (Or even if they did, they might prefer XRGB8888 anyway, if RGB888 requires a final conversion step) >>>> Another point of concern is CPU consumption: Slow I/O buses may stall >>>> the display thread, but the CPU could do something else in the meantime. >>>> Doing format conversion on the CPU prevents that, hence affecting other >>>> parts of the system negatively. Of course, that's more of a gut feeling >>>> than hard data. Jocelyn, have you measured if the XRGB8888 -> RGB888 conversion copy takes longer than a straight RGB888 -> RGB888 copy in the kernel?
On 12/09/2023 17:57, Michel Dänzer wrote: > On 9/11/23 10:38, Pekka Paalanen wrote: >> On Fri, 8 Sep 2023 17:10:46 +0200 >> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>> Am 08.09.23 um 16:41 schrieb Pekka Paalanen: >>>> On Fri, 8 Sep 2023 15:56:51 +0200 >>>> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>>>> >>>>> I have a number of concerns. My point it not that we shouldn't optimize. >>>>> I just don't want it in the kernel. Mgag200 can export DRM_FORMAT_RGB888 >>>>> for userspace to use. >>>>> >>>>> AFAICT the main argument against userspace is that Mesa doesn't like >>>>> 3-byte pixels. But I don't see how this conversion cannot be a >>>>> post-processing step within Mesa: do the rendering in RGB32 and then >>>>> convert to a framebuffer in RGB24. > > Even assuming the conversion could be handled transparently in Mesa, it would still require the KMS client to pick RGB888 instead of XRGB8888. Most (all?) KMS clients support XRGB8888, many of them will realistically never support RGB888. (Or even if they did, they might prefer XRGB8888 anyway, if RGB888 requires a final conversion step) > > >>>>> Another point of concern is CPU consumption: Slow I/O buses may stall >>>>> the display thread, but the CPU could do something else in the meantime. >>>>> Doing format conversion on the CPU prevents that, hence affecting other >>>>> parts of the system negatively. Of course, that's more of a gut feeling >>>>> than hard data. > > Jocelyn, have you measured if the XRGB8888 -> RGB888 conversion copy takes longer than a straight RGB888 -> RGB888 copy in the kernel? > > At least on my Matrox system, the conversion is really negligible compared to the copy, due to slow memory bandwidth. I wasn't able to see a difference, using ktime_get_ns(). The CPU is an old Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz in 1280x1024, the RGB888 copy takes 95ms. and the XRGB8888->RGB888 takes also 95ms. (and the full XRGB8888 copy takes 125ms) Also we say "conversion", but when talking about XRGB8888->RGB888, there is no math involved, only dropping one bytes every 4. But really performance is not the main concern here as it is obvious that it's much faster in RGB888. I tried to summarize the other arguments below, which I still find not convincing: * The driver is already fast enough, if you want faster, buy another GPU. * This adds too much complexity in the driver (about ~20 lines of code) * It breaks an unwritten rule of not changing the color format inside the kernel (which I tried to write with this patch, while adding an exception for legitimate use cases, like this one). But let's admit that this discussion is going nowhere, and that I failed to reach a consensus, so I'm now focusing on other things. Best regards,
On 9/13/23 10:14, Jocelyn Falempe wrote: > On 12/09/2023 17:57, Michel Dänzer wrote: >> On 9/11/23 10:38, Pekka Paalanen wrote: >>> On Fri, 8 Sep 2023 17:10:46 +0200 >>> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>>> Am 08.09.23 um 16:41 schrieb Pekka Paalanen: >>>>> On Fri, 8 Sep 2023 15:56:51 +0200 >>>>> Thomas Zimmermann <tzimmermann@suse.de> wrote: >>>>>> >>>>>> Another point of concern is CPU consumption: Slow I/O buses may stall >>>>>> the display thread, but the CPU could do something else in the meantime. >>>>>> Doing format conversion on the CPU prevents that, hence affecting other >>>>>> parts of the system negatively. Of course, that's more of a gut feeling >>>>>> than hard data. >> >> Jocelyn, have you measured if the XRGB8888 -> RGB888 conversion copy takes longer than a straight RGB888 -> RGB888 copy in the kernel? > > At least on my Matrox system, the conversion is really negligible compared to the copy, due to slow memory bandwidth. I wasn't able to see a difference, using ktime_get_ns(). > > The CPU is an old Intel(R) Core(TM) i3 CPU 540 @ 3.07GHz > in 1280x1024, the RGB888 copy takes 95ms. > and the XRGB8888->RGB888 takes also 95ms. > (and the full XRGB8888 copy takes 125ms) Then any XRGB8888 → RGB888 conversion in user space has to result in worse performance. > But let's admit that this discussion is going nowhere, and that I failed to reach a consensus, so I'm now focusing on other things. I see consensus, with one person disagreeing.
diff --git a/drivers/gpu/drm/drm_plane.c b/drivers/gpu/drm/drm_plane.c index 24e7998d1731..d05642033202 100644 --- a/drivers/gpu/drm/drm_plane.c +++ b/drivers/gpu/drm/drm_plane.c @@ -140,6 +140,30 @@ * DRM_FORMAT_MOD_LINEAR. Before linux kernel release v5.1 there have been * various bugs in this area with inconsistencies between the capability * flag and per-plane properties. + * + * All drivers should support XRGB8888, even if the hardware cannot support + * it. This has become the de-facto standard and a lot of user-space assume + * it will be present. If XRGB8888 is not natively supported, then it + * shouldn't be the default for preferred depth or fbdev emulation. + * + * DRM drivers should not do software color conversion, and + * only advertise the formats they support in hardware. This is for + * performance reason, and to avoid multiple conversions in userspace and + * kernel space. KMS page flips are generally expected to be very cheap + * operations. + * + * But there are two exceptions only for dumb buffers: + * * To support XRGB8888 if it's not supported by the hardware. + * * Any driver is free to modify its internal representation of the format, + * as long as it doesn't alter the visible content in any way, and doesn't + * modify the user-provided buffer. An example would be to drop the + * padding component from a format to save some memory bandwidth. + * On most hardware, VRAM read access are slow, so when doing the software + * conversion, the dumb buffer should be allocated in system RAM in order to + * have decent performance. + * Extra care should be taken when doing software conversion with + * DRM_CAP_DUMB_PREFER_SHADOW, there are more detailed explanations here: + * https://lore.kernel.org/dri-devel/20230818162415.2185f8e3@eldfell/ */ static unsigned int drm_num_planes(struct drm_device *dev)