Message ID | 20240113183334.1690740-1-aurelien@aurel32.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | media: solo6x10: replace max(a, min(b, c)) by clamp(b, a, c) | expand |
From: Aurelien Jarno > Sent: 13 January 2024 18:34 > > This patch replaces max(a, min(b, c)) by clamp(b, a, c) in the solo6x10 > driver. This improves the readability and more importantly, for the > solo6x10-p2m.c file, this reduces on my system (x86-64, gcc 13): > - the preprocessed size from 121 MiB to 4.5 MiB; > - the build CPU time from 46.8 s to 1.6 s; > - the build memory from 2786 MiB to 98MiB. > > In fine, this allows this relatively simple C file to be built on a > 32-bit system. > > Reported-by: Jiri Slaby <jirislaby@gmail.com> > Closes: https://lore.kernel.org/lkml/18c6df0d-45ed-450c-9eda-95160a2bbb8e@gmail.com/ > Cc: stable@vger.kernel.org # v6.7+ > Suggested-by: David Laight <David.Laight@ACULAB.COM> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> Reviewed-by: David Laight <David.Laight@ACULAB.COM> I was about to send the same patch. Although I'm not sure why the cpu time is so large. It compiles pretty immediately on my system. I do have some patches to minmax.h that reduce the .i for the nested clamp() to around 200k. Mostly obtained be adding min/max_const() for the few places that need a constant and min/max_ptr() for pointer types. Supporting both causes the expansion to be a lot larger. David > --- > drivers/media/pci/solo6x10/solo6x10-offsets.h | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/media/pci/solo6x10/solo6x10-offsets.h b/drivers/media/pci/solo6x10/solo6x10- > offsets.h > index f414ee1316f2..fdbb817e6360 100644 > --- a/drivers/media/pci/solo6x10/solo6x10-offsets.h > +++ b/drivers/media/pci/solo6x10/solo6x10-offsets.h > @@ -57,16 +57,16 @@ > #define SOLO_MP4E_EXT_ADDR(__solo) \ > (SOLO_EREF_EXT_ADDR(__solo) + SOLO_EREF_EXT_AREA(__solo)) > #define SOLO_MP4E_EXT_SIZE(__solo) \ > - max((__solo->nr_chans * 0x00080000), \ > - min(((__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo)) - \ > - __SOLO_JPEG_MIN_SIZE(__solo)), 0x00ff0000)) > + clamp(__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo) - \ > + __SOLO_JPEG_MIN_SIZE(__solo), \ > + __solo->nr_chans * 0x00080000, 0x00ff0000) > > #define __SOLO_JPEG_MIN_SIZE(__solo) (__solo->nr_chans * 0x00080000) > #define SOLO_JPEG_EXT_ADDR(__solo) \ > (SOLO_MP4E_EXT_ADDR(__solo) + SOLO_MP4E_EXT_SIZE(__solo)) > #define SOLO_JPEG_EXT_SIZE(__solo) \ > - max(__SOLO_JPEG_MIN_SIZE(__solo), \ > - min((__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo)), 0x00ff0000)) > + clamp(__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo), \ > + __SOLO_JPEG_MIN_SIZE(__solo), 0x00ff0000) > > #define SOLO_SDRAM_END(__solo) \ > (SOLO_JPEG_EXT_ADDR(__solo) + SOLO_JPEG_EXT_SIZE(__solo)) > -- > 2.42.0 - Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK Registration No: 1397386 (Wales)
Hi Aurelien, On 13/01/2024 19:33, Aurelien Jarno wrote: > This patch replaces max(a, min(b, c)) by clamp(b, a, c) in the solo6x10 > driver. This improves the readability and more importantly, for the > solo6x10-p2m.c file, this reduces on my system (x86-64, gcc 13): > - the preprocessed size from 121 MiB to 4.5 MiB; > - the build CPU time from 46.8 s to 1.6 s; > - the build memory from 2786 MiB to 98MiB. Thank you for this patch. Reviewed-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> I'll pick this up as a fix for v6.8. Linus, if you prefer to pick this up directly, then that's fine as well. I was aware for some time that something had changed elsewhere that caused the solo driver compilation to take a lot of memory, but I didn't have the time yet to investigate it. I now see that this was discussed in the thread from the Closes tag. Note that it would have been nice if that was CCed to linux-media, since it concerned a media driver. The use of clamp() is definitely cleaner code, but it is rather insane that max(a, min(b, c)) takes 2786 MiB over 98 MiB for clamp(). The driver code looks perfectly innocuous, and you won't see the problem unless you compile on a system or a VM with a lot less memory. (I only noticed it myself when compiling in a VM. Perhaps sparse or smatch should be extended with a check for nested min/max and warn for that with a recommendation to use clamp instead? Although right now these tools skip the solo driver since it takes too much memory :-) I even found a min(max(max())) case in drivers/net/wireless/ath/ath11k/mac.c: nss = min(nss, max(max(ath11k_mac_max_ht_nss(ht_mcs_mask), ath11k_mac_max_vht_nss(vht_mcs_mask)), ath11k_mac_max_he_nss(he_mcs_mask))); I wonder how that expands in cpp. So while this patch is a nice cleanup, the underlying problem remains. Regards, Hans > > In fine, this allows this relatively simple C file to be built on a > 32-bit system. > > Reported-by: Jiri Slaby <jirislaby@gmail.com> > Closes: https://lore.kernel.org/lkml/18c6df0d-45ed-450c-9eda-95160a2bbb8e@gmail.com/ > Cc: stable@vger.kernel.org # v6.7+ > Suggested-by: David Laight <David.Laight@ACULAB.COM> > Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> > --- > drivers/media/pci/solo6x10/solo6x10-offsets.h | 10 +++++----- > 1 file changed, 5 insertions(+), 5 deletions(-) > > diff --git a/drivers/media/pci/solo6x10/solo6x10-offsets.h b/drivers/media/pci/solo6x10/solo6x10-offsets.h > index f414ee1316f2..fdbb817e6360 100644 > --- a/drivers/media/pci/solo6x10/solo6x10-offsets.h > +++ b/drivers/media/pci/solo6x10/solo6x10-offsets.h > @@ -57,16 +57,16 @@ > #define SOLO_MP4E_EXT_ADDR(__solo) \ > (SOLO_EREF_EXT_ADDR(__solo) + SOLO_EREF_EXT_AREA(__solo)) > #define SOLO_MP4E_EXT_SIZE(__solo) \ > - max((__solo->nr_chans * 0x00080000), \ > - min(((__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo)) - \ > - __SOLO_JPEG_MIN_SIZE(__solo)), 0x00ff0000)) > + clamp(__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo) - \ > + __SOLO_JPEG_MIN_SIZE(__solo), \ > + __solo->nr_chans * 0x00080000, 0x00ff0000) > > #define __SOLO_JPEG_MIN_SIZE(__solo) (__solo->nr_chans * 0x00080000) > #define SOLO_JPEG_EXT_ADDR(__solo) \ > (SOLO_MP4E_EXT_ADDR(__solo) + SOLO_MP4E_EXT_SIZE(__solo)) > #define SOLO_JPEG_EXT_SIZE(__solo) \ > - max(__SOLO_JPEG_MIN_SIZE(__solo), \ > - min((__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo)), 0x00ff0000)) > + clamp(__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo), \ > + __SOLO_JPEG_MIN_SIZE(__solo), 0x00ff0000) > > #define SOLO_SDRAM_END(__solo) \ > (SOLO_JPEG_EXT_ADDR(__solo) + SOLO_JPEG_EXT_SIZE(__solo))
On Sun, 14 Jan 2024 at 03:04, Hans Verkuil <hverkuil@xs4all.nl> wrote: > > I'll pick this up as a fix for v6.8. > > Linus, if you prefer to pick this up directly, then that's fine as well. Bah, missed this email, and so a belated note that I picked the patch up as commit 31e97d7c9ae3. It even got your Reviewed-by thanks to b4 picking that up automatically. Linus
diff --git a/drivers/media/pci/solo6x10/solo6x10-offsets.h b/drivers/media/pci/solo6x10/solo6x10-offsets.h index f414ee1316f2..fdbb817e6360 100644 --- a/drivers/media/pci/solo6x10/solo6x10-offsets.h +++ b/drivers/media/pci/solo6x10/solo6x10-offsets.h @@ -57,16 +57,16 @@ #define SOLO_MP4E_EXT_ADDR(__solo) \ (SOLO_EREF_EXT_ADDR(__solo) + SOLO_EREF_EXT_AREA(__solo)) #define SOLO_MP4E_EXT_SIZE(__solo) \ - max((__solo->nr_chans * 0x00080000), \ - min(((__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo)) - \ - __SOLO_JPEG_MIN_SIZE(__solo)), 0x00ff0000)) + clamp(__solo->sdram_size - SOLO_MP4E_EXT_ADDR(__solo) - \ + __SOLO_JPEG_MIN_SIZE(__solo), \ + __solo->nr_chans * 0x00080000, 0x00ff0000) #define __SOLO_JPEG_MIN_SIZE(__solo) (__solo->nr_chans * 0x00080000) #define SOLO_JPEG_EXT_ADDR(__solo) \ (SOLO_MP4E_EXT_ADDR(__solo) + SOLO_MP4E_EXT_SIZE(__solo)) #define SOLO_JPEG_EXT_SIZE(__solo) \ - max(__SOLO_JPEG_MIN_SIZE(__solo), \ - min((__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo)), 0x00ff0000)) + clamp(__solo->sdram_size - SOLO_JPEG_EXT_ADDR(__solo), \ + __SOLO_JPEG_MIN_SIZE(__solo), 0x00ff0000) #define SOLO_SDRAM_END(__solo) \ (SOLO_JPEG_EXT_ADDR(__solo) + SOLO_JPEG_EXT_SIZE(__solo))
This patch replaces max(a, min(b, c)) by clamp(b, a, c) in the solo6x10 driver. This improves the readability and more importantly, for the solo6x10-p2m.c file, this reduces on my system (x86-64, gcc 13): - the preprocessed size from 121 MiB to 4.5 MiB; - the build CPU time from 46.8 s to 1.6 s; - the build memory from 2786 MiB to 98MiB. In fine, this allows this relatively simple C file to be built on a 32-bit system. Reported-by: Jiri Slaby <jirislaby@gmail.com> Closes: https://lore.kernel.org/lkml/18c6df0d-45ed-450c-9eda-95160a2bbb8e@gmail.com/ Cc: stable@vger.kernel.org # v6.7+ Suggested-by: David Laight <David.Laight@ACULAB.COM> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net> --- drivers/media/pci/solo6x10/solo6x10-offsets.h | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-)