Message ID | 20231018182412.80291-1-hamza.mahfooz@amd.com (mailing list archive) |
---|---|
State | Changes Requested |
Headers | show |
Series | lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Not a local patch |
Hi Hamza, On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote: > With every release of LLVM, both of these sanitizers eat up more and > more of the stack. So, set FRAME_WARN to 0 if either of them is enabled > for a given build. > > Cc: stable@vger.kernel.org > Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Thanks for your patch! > --- a/lib/Kconfig.debug > +++ b/lib/Kconfig.debug > @@ -429,11 +429,10 @@ endif # DEBUG_INFO > config FRAME_WARN > int "Warn for stack frames larger than" > range 0 8192 > - default 0 if KMSAN > + default 0 if KASAN || KCSAN || KMSAN Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable? Stack overflows do cause crashes. > default 2048 if GCC_PLUGIN_LATENT_ENTROPY > default 2048 if PARISC > default 1536 if (!64BIT && XTENSA) > - default 1280 if KASAN && !64BIT > default 1024 if !64BIT > default 2048 if 64BIT > help Gr{oetje,eeting}s, Geert
On 10/18/23 14:29, Geert Uytterhoeven wrote: > Hi Hamza, > > On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote: >> With every release of LLVM, both of these sanitizers eat up more and >> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled >> for a given build. >> >> Cc: stable@vger.kernel.org >> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> > > Thanks for your patch! > >> --- a/lib/Kconfig.debug >> +++ b/lib/Kconfig.debug >> @@ -429,11 +429,10 @@ endif # DEBUG_INFO >> config FRAME_WARN >> int "Warn for stack frames larger than" >> range 0 8192 >> - default 0 if KMSAN >> + default 0 if KASAN || KCSAN || KMSAN > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable? They are all intended to be used for runtime debugging, so I'd imagine so. > Stack overflows do cause crashes. It is worth noting that FRAME_WARN has been disabled for KMSAN for quite a while and as far as I can tell no one has complained. > >> default 2048 if GCC_PLUGIN_LATENT_ENTROPY >> default 2048 if PARISC >> default 1536 if (!64BIT && XTENSA) >> - default 1280 if KASAN && !64BIT >> default 1024 if !64BIT >> default 2048 if 64BIT >> help > > Gr{oetje,eeting}s, > > Geert >
Hi Hamza, On Wed, Oct 18, 2023 at 8:39 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote: > On 10/18/23 14:29, Geert Uytterhoeven wrote: > > On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote: > >> With every release of LLVM, both of these sanitizers eat up more and > >> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled > >> for a given build. > >> > >> Cc: stable@vger.kernel.org > >> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> > > > > Thanks for your patch! > > > >> --- a/lib/Kconfig.debug > >> +++ b/lib/Kconfig.debug > >> @@ -429,11 +429,10 @@ endif # DEBUG_INFO > >> config FRAME_WARN > >> int "Warn for stack frames larger than" > >> range 0 8192 > >> - default 0 if KMSAN > >> + default 0 if KASAN || KCSAN || KMSAN > > > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable? > > They are all intended to be used for runtime debugging, so I'd imagine so. Then I strongly suggest putting a nonzero value here. As you write that "with every release of LLVM, both of these sanitizers eat up more and more of the stack", don't you want to have at least some canary to detect when "more and more" is guaranteed to run into problems? > > Stack overflows do cause crashes. > > It is worth noting that FRAME_WARN has been disabled for KMSAN for quite > a while and as far as I can tell no one has complained. ROTFL... > >> default 2048 if GCC_PLUGIN_LATENT_ENTROPY > >> default 2048 if PARISC > >> default 1536 if (!64BIT && XTENSA) > >> - default 1280 if KASAN && !64BIT > >> default 1024 if !64BIT > >> default 2048 if 64BIT > >> help Gr{oetje,eeting}s, Geert
> > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable? > > > > They are all intended to be used for runtime debugging, so I'd imagine so. > > Then I strongly suggest putting a nonzero value here. As you write > that "with every release of LLVM, both of these sanitizers eat up more and more > of the stack", don't you want to have at least some canary to detect > when "more and more" is guaranteed to run into problems? FRAME_WARN is a poor canary. First, it does not necessarily indicate that a build is faulty (a single bloated stack frame won't crash the system). Second, devs are unlikely to fix a function because its stack frame is too big under some exotic tool+compiler combination. So the remaining option would be to just increase the frame size every time a new function surpasses the limit.
On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote: >> > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable? >> > >> > They are all intended to be used for runtime debugging, so I'd imagine so. >> >> Then I strongly suggest putting a nonzero value here. As you write >> that "with every release of LLVM, both of these sanitizers eat up more and more >> of the stack", don't you want to have at least some canary to detect >> when "more and more" is guaranteed to run into problems? > > FRAME_WARN is a poor canary. First, it does not necessarily indicate > that a build is faulty (a single bloated stack frame won't crash the > system). I agree it's flawed, but it does catch a lot of bugs, both in the driver and the compiler. What we should probably have is some better runtime debugging in addition to FRAME_WARN, but it's better than nothing. One idea that I've suggested in the past is to add a soft stack limit that is lower than THREAD_SIZE, using VMAP_STACK with a custom stack start and a read-only page at the end to catch a thread exceeding the soft limit and print a backtrace before marking the page writable. > Second, devs are unlikely to fix a function because its stack frame is > too big under some exotic tool+compiler combination. I've probably sent hundreds of fixes for these in the past. Most of the time there is an actual driver bug, and almost always the driver maintainers are responsive and treat the report with the appropriate urgency: even if only some configurations actually push it over the limit, the general case is some data structure that is hundreds of bytes long and was not actually meant to be on the stack. The gcc bug reports also usually get addressed quickly, though we've had problems with clang not making progress on known bugs for years. It sounds like Nick has made some important progress on clang very recently, so we should be able to raise the minimum clang version for kasan and kcsan once there is a known good release. > So the remaining option would be to just increase the frame size every > time a new function surpasses the limit. That is clearly not an option, though we could try to add Kconfig dependencies that avoid the known bad combinations, such as annotating the AMD GPU driver as depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN) Arnd
On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote: > On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote: > > So the remaining option would be to just increase the frame size every > > time a new function surpasses the limit. > > That is clearly not an option, though we could try to > add Kconfig dependencies that avoid the known bad combinations, > such as annotating the AMD GPU driver as > > depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN) This would effectively disable the AMDGPU driver for allmodconfig, which is somewhat unfortunate as it is an easy testing target. Taking a step back, this is all being done because of a couple of warnings in the AMDGPU code. If fixing those in the source is too much effort (I did note [1] that GCC is at the current limit for that file even with Rodrigo's series applied [2]), couldn't we just take the existing workaround that this Makefile has for this file and its high stack usage and just extend it slightly for clang? diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile index 66431525f2a0..fd49e3526c0d 100644 --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile @@ -58,7 +58,7 @@ endif endif ifneq ($(CONFIG_FRAME_WARN),0) -frame_warn_flag := -Wframe-larger-than=2048 +frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048) endif CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag) That would address the immediate concern of the warning breaking builds with CONFIG_WERROR=y while not raising the limit for other files in the kernel (just this one file in AMDGPU) and avoiding disabling the whole driver. The number could be lower, I think ~2500 bytes is the most usage I see with Rodrigo's series applied, so maybe 2800 would be a decent limit? Once there is a fix in the compiler, this expression could be changed to use clang-min-version or something of that sort. [1]: https://lore.kernel.org/20231017172231.GA2348194@dev-arch.thelio-3990X/ [2]: https://lore.kernel.org/20231016142031.241912-1-Rodrigo.Siqueira@amd.com/ Cheers, Nathan
On 10/19/23 11:56, Nathan Chancellor wrote: > On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote: >> On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote: >>> So the remaining option would be to just increase the frame size every >>> time a new function surpasses the limit. >> >> That is clearly not an option, though we could try to >> add Kconfig dependencies that avoid the known bad combinations, >> such as annotating the AMD GPU driver as >> >> depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN) > > This would effectively disable the AMDGPU driver for allmodconfig, which > is somewhat unfortunate as it is an easy testing target. > > Taking a step back, this is all being done because of a couple of > warnings in the AMDGPU code. If fixing those in the source is too much > effort (I did note [1] that GCC is at the current limit for that file > even with Rodrigo's series applied [2]), couldn't we just take the > existing workaround that this Makefile has for this file and its high > stack usage and just extend it slightly for clang? I personally don't mind fixing these issues in the driver, but the fact that they the creep back every time a new major version of Clang rolls out (that has been true for the past couple of years at the very least), makes it rather annoying to deal with. > > diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile > index 66431525f2a0..fd49e3526c0d 100644 > --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile > +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile > @@ -58,7 +58,7 @@ endif > endif > > ifneq ($(CONFIG_FRAME_WARN),0) > -frame_warn_flag := -Wframe-larger-than=2048 > +frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048) > endif > > CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag) > > That would address the immediate concern of the warning breaking builds > with CONFIG_WERROR=y while not raising the limit for other files in the > kernel (just this one file in AMDGPU) and avoiding disabling the whole > driver. The number could be lower, I think ~2500 bytes is the most usage > I see with Rodrigo's series applied, so maybe 2800 would be a decent > limit? Once there is a fix in the compiler, this expression could be > changed to use clang-min-version or something of that sort. > > [1]: https://lore.kernel.org/20231017172231.GA2348194@dev-arch.thelio-3990X/ > [2]: https://lore.kernel.org/20231016142031.241912-1-Rodrigo.Siqueira@amd.com/ > > Cheers, > Nathan
On Thu, Oct 19, 2023 at 04:17:26PM -0400, Hamza Mahfooz wrote: > On 10/19/23 11:56, Nathan Chancellor wrote: > > On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote: > > > On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote: > > > > So the remaining option would be to just increase the frame size every > > > > time a new function surpasses the limit. > > > > > > That is clearly not an option, though we could try to > > > add Kconfig dependencies that avoid the known bad combinations, > > > such as annotating the AMD GPU driver as > > > > > > depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN) > > > > This would effectively disable the AMDGPU driver for allmodconfig, which > > is somewhat unfortunate as it is an easy testing target. > > > > Taking a step back, this is all being done because of a couple of > > warnings in the AMDGPU code. If fixing those in the source is too much > > effort (I did note [1] that GCC is at the current limit for that file > > even with Rodrigo's series applied [2]), couldn't we just take the > > existing workaround that this Makefile has for this file and its high > > stack usage and just extend it slightly for clang? > > I personally don't mind fixing these issues in the driver, but the fact > that they the creep back every time a new major version of Clang rolls > out (that has been true for the past couple of years at the very > least), makes it rather annoying to deal with. I am not sure I agree with that characterization of the situation. clang has been pretty consistent for the most part (which is certainly on us), as all versions that the kernel supports warns about this code. I believe it is more so the fact that there is a new copy of the dcn code added every year that has none of the fixes applied from earlier generations... It is not just me that has fixed issues like this, just run 'git log --grep=stack drivers/gpu/drm/amd/display'. It is not just clang that complains about the code when sanitizers are turned on, GCC does as well since Stephen reported them. Cheers, Nathan > > diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile > > index 66431525f2a0..fd49e3526c0d 100644 > > --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile > > +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile > > @@ -58,7 +58,7 @@ endif > > endif > > ifneq ($(CONFIG_FRAME_WARN),0) > > -frame_warn_flag := -Wframe-larger-than=2048 > > +frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048) > > endif > > CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag) > > > > That would address the immediate concern of the warning breaking builds > > with CONFIG_WERROR=y while not raising the limit for other files in the > > kernel (just this one file in AMDGPU) and avoiding disabling the whole > > driver. The number could be lower, I think ~2500 bytes is the most usage > > I see with Rodrigo's series applied, so maybe 2800 would be a decent > > limit? Once there is a fix in the compiler, this expression could be > > changed to use clang-min-version or something of that sort. > > > > [1]: https://lore.kernel.org/20231017172231.GA2348194@dev-arch.thelio-3990X/ > > [2]: https://lore.kernel.org/20231016142031.241912-1-Rodrigo.Siqueira@amd.com/ > > > > Cheers, > > Nathan > -- > Hamza >
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug index 39d1d93164bd..15ad742729ca 100644 --- a/lib/Kconfig.debug +++ b/lib/Kconfig.debug @@ -429,11 +429,10 @@ endif # DEBUG_INFO config FRAME_WARN int "Warn for stack frames larger than" range 0 8192 - default 0 if KMSAN + default 0 if KASAN || KCSAN || KMSAN default 2048 if GCC_PLUGIN_LATENT_ENTROPY default 2048 if PARISC default 1536 if (!64BIT && XTENSA) - default 1280 if KASAN && !64BIT default 1024 if !64BIT default 2048 if 64BIT help
With every release of LLVM, both of these sanitizers eat up more and more of the stack. So, set FRAME_WARN to 0 if either of them is enabled for a given build. Cc: stable@vger.kernel.org Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> --- lib/Kconfig.debug | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-)