diff mbox series

lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan

Message ID 20231018182412.80291-1-hamza.mahfooz@amd.com (mailing list archive)
State Changes Requested
Headers show
Series lib/Kconfig.debug: disable FRAME_WARN for kasan and kcsan | expand

Checks

Context Check Description
netdev/tree_selection success Not a local patch

Commit Message

Hamza Mahfooz Oct. 18, 2023, 6:24 p.m. UTC
With every release of LLVM, both of these sanitizers eat up more and
more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
for a given build.

Cc: stable@vger.kernel.org
Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
---
 lib/Kconfig.debug | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

Comments

Geert Uytterhoeven Oct. 18, 2023, 6:29 p.m. UTC | #1
Hi Hamza,

On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote:
> With every release of LLVM, both of these sanitizers eat up more and
> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
> for a given build.
>
> Cc: stable@vger.kernel.org
> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>

Thanks for your patch!

> --- a/lib/Kconfig.debug
> +++ b/lib/Kconfig.debug
> @@ -429,11 +429,10 @@ endif # DEBUG_INFO
>  config FRAME_WARN
>         int "Warn for stack frames larger than"
>         range 0 8192
> -       default 0 if KMSAN
> +       default 0 if KASAN || KCSAN || KMSAN

Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
Stack overflows do cause crashes.

>         default 2048 if GCC_PLUGIN_LATENT_ENTROPY
>         default 2048 if PARISC
>         default 1536 if (!64BIT && XTENSA)
> -       default 1280 if KASAN && !64BIT
>         default 1024 if !64BIT
>         default 2048 if 64BIT
>         help

Gr{oetje,eeting}s,

                        Geert
Hamza Mahfooz Oct. 18, 2023, 6:39 p.m. UTC | #2
On 10/18/23 14:29, Geert Uytterhoeven wrote:
> Hi Hamza,
> 
> On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote:
>> With every release of LLVM, both of these sanitizers eat up more and
>> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
>> for a given build.
>>
>> Cc: stable@vger.kernel.org
>> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
> 
> Thanks for your patch!
> 
>> --- a/lib/Kconfig.debug
>> +++ b/lib/Kconfig.debug
>> @@ -429,11 +429,10 @@ endif # DEBUG_INFO
>>   config FRAME_WARN
>>          int "Warn for stack frames larger than"
>>          range 0 8192
>> -       default 0 if KMSAN
>> +       default 0 if KASAN || KCSAN || KMSAN
> 
> Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?

They are all intended to be used for runtime debugging, so I'd imagine so.

> Stack overflows do cause crashes.

It is worth noting that FRAME_WARN has been disabled for KMSAN for quite
a while and as far as I can tell no one has complained.

> 
>>          default 2048 if GCC_PLUGIN_LATENT_ENTROPY
>>          default 2048 if PARISC
>>          default 1536 if (!64BIT && XTENSA)
>> -       default 1280 if KASAN && !64BIT
>>          default 1024 if !64BIT
>>          default 2048 if 64BIT
>>          help
> 
> Gr{oetje,eeting}s,
> 
>                          Geert
>
Geert Uytterhoeven Oct. 18, 2023, 7:12 p.m. UTC | #3
Hi Hamza,

On Wed, Oct 18, 2023 at 8:39 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote:
> On 10/18/23 14:29, Geert Uytterhoeven wrote:
> > On Wed, Oct 18, 2023 at 8:24 PM Hamza Mahfooz <hamza.mahfooz@amd.com> wrote:
> >> With every release of LLVM, both of these sanitizers eat up more and
> >> more of the stack. So, set FRAME_WARN to 0 if either of them is enabled
> >> for a given build.
> >>
> >> Cc: stable@vger.kernel.org
> >> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com>
> >
> > Thanks for your patch!
> >
> >> --- a/lib/Kconfig.debug
> >> +++ b/lib/Kconfig.debug
> >> @@ -429,11 +429,10 @@ endif # DEBUG_INFO
> >>   config FRAME_WARN
> >>          int "Warn for stack frames larger than"
> >>          range 0 8192
> >> -       default 0 if KMSAN
> >> +       default 0 if KASAN || KCSAN || KMSAN
> >
> > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
>
> They are all intended to be used for runtime debugging, so I'd imagine so.

Then I strongly suggest putting a nonzero value here.  As you write
that "with every release of LLVM, both of these sanitizers eat up more and more
of the stack", don't you want to have at least some canary to detect
when "more and more" is guaranteed to run into problems?

> > Stack overflows do cause crashes.
>
> It is worth noting that FRAME_WARN has been disabled for KMSAN for quite
> a while and as far as I can tell no one has complained.

ROTFL...

> >>          default 2048 if GCC_PLUGIN_LATENT_ENTROPY
> >>          default 2048 if PARISC
> >>          default 1536 if (!64BIT && XTENSA)
> >> -       default 1280 if KASAN && !64BIT
> >>          default 1024 if !64BIT
> >>          default 2048 if 64BIT
> >>          help

Gr{oetje,eeting}s,

                        Geert
Alexander Potapenko Oct. 19, 2023, 10:04 a.m. UTC | #4
> > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
> >
> > They are all intended to be used for runtime debugging, so I'd imagine so.
>
> Then I strongly suggest putting a nonzero value here.  As you write
> that "with every release of LLVM, both of these sanitizers eat up more and more
> of the stack", don't you want to have at least some canary to detect
> when "more and more" is guaranteed to run into problems?

FRAME_WARN is a poor canary. First, it does not necessarily indicate
that a build is faulty (a single bloated stack frame won't crash the
system).
Second, devs are unlikely to fix a function because its stack frame is
too big under some exotic tool+compiler combination.
So the remaining option would be to just increase the frame size every
time a new function surpasses the limit.
Arnd Bergmann Oct. 19, 2023, 12:53 p.m. UTC | #5
On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
>> > > Are kernels with KASAN || KCSAN || KMSAN enabled supposed to be bootable?
>> >
>> > They are all intended to be used for runtime debugging, so I'd imagine so.
>>
>> Then I strongly suggest putting a nonzero value here.  As you write
>> that "with every release of LLVM, both of these sanitizers eat up more and more
>> of the stack", don't you want to have at least some canary to detect
>> when "more and more" is guaranteed to run into problems?
>
> FRAME_WARN is a poor canary. First, it does not necessarily indicate
> that a build is faulty (a single bloated stack frame won't crash the
> system).

I agree it's flawed, but it does catch a lot of bugs, both in the
driver and the compiler. What we should probably have is some better
runtime debugging in addition to FRAME_WARN, but it's better than
nothing.

One idea that I've suggested in the past is to add a soft stack
limit that is lower than THREAD_SIZE, using VMAP_STACK with a custom
stack start and a read-only page at the end to catch a thread
exceeding the soft limit and print a backtrace before marking
the page writable.

> Second, devs are unlikely to fix a function because its stack frame is
> too big under some exotic tool+compiler combination.

I've probably sent hundreds of fixes for these in the past. Most
of the time there is an actual driver bug, and almost always
the driver maintainers are responsive and treat the report with
the appropriate urgency: even if only some configurations actually
push it over the limit, the general case is some data structure that
is hundreds of bytes long and was not actually meant to be on
the stack.

The gcc bug reports also usually get addressed quickly, though
we've had problems with clang not making progress on known
bugs for years. It sounds like Nick has made some important
progress on clang very recently, so we should be able to
raise the minimum clang version for kasan and kcsan once
there is a known good release.

> So the remaining option would be to just increase the frame size every
> time a new function surpasses the limit.

That is clearly not an option, though we could try to
add Kconfig dependencies that avoid the known bad combinations,
such as annotating the AMD GPU driver as

      depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)

    Arnd
Nathan Chancellor Oct. 19, 2023, 3:56 p.m. UTC | #6
On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote:
> On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
> > So the remaining option would be to just increase the frame size every
> > time a new function surpasses the limit.
> 
> That is clearly not an option, though we could try to
> add Kconfig dependencies that avoid the known bad combinations,
> such as annotating the AMD GPU driver as
> 
>       depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)

This would effectively disable the AMDGPU driver for allmodconfig, which
is somewhat unfortunate as it is an easy testing target.

Taking a step back, this is all being done because of a couple of
warnings in the AMDGPU code. If fixing those in the source is too much
effort (I did note [1] that GCC is at the current limit for that file
even with Rodrigo's series applied [2]), couldn't we just take the
existing workaround that this Makefile has for this file and its high
stack usage and just extend it slightly for clang?

diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
index 66431525f2a0..fd49e3526c0d 100644
--- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
+++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
@@ -58,7 +58,7 @@ endif
 endif
 
 ifneq ($(CONFIG_FRAME_WARN),0)
-frame_warn_flag := -Wframe-larger-than=2048
+frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048)
 endif
 
 CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag)

That would address the immediate concern of the warning breaking builds
with CONFIG_WERROR=y while not raising the limit for other files in the
kernel (just this one file in AMDGPU) and avoiding disabling the whole
driver. The number could be lower, I think ~2500 bytes is the most usage
I see with Rodrigo's series applied, so maybe 2800 would be a decent
limit? Once there is a fix in the compiler, this expression could be
changed to use clang-min-version or something of that sort.

[1]: https://lore.kernel.org/20231017172231.GA2348194@dev-arch.thelio-3990X/
[2]: https://lore.kernel.org/20231016142031.241912-1-Rodrigo.Siqueira@amd.com/

Cheers,
Nathan
Hamza Mahfooz Oct. 19, 2023, 8:17 p.m. UTC | #7
On 10/19/23 11:56, Nathan Chancellor wrote:
> On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote:
>> On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
>>> So the remaining option would be to just increase the frame size every
>>> time a new function surpasses the limit.
>>
>> That is clearly not an option, though we could try to
>> add Kconfig dependencies that avoid the known bad combinations,
>> such as annotating the AMD GPU driver as
>>
>>        depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)
> 
> This would effectively disable the AMDGPU driver for allmodconfig, which
> is somewhat unfortunate as it is an easy testing target.
> 
> Taking a step back, this is all being done because of a couple of
> warnings in the AMDGPU code. If fixing those in the source is too much
> effort (I did note [1] that GCC is at the current limit for that file
> even with Rodrigo's series applied [2]), couldn't we just take the
> existing workaround that this Makefile has for this file and its high
> stack usage and just extend it slightly for clang?

I personally don't mind fixing these issues in the driver, but the fact
that they the creep back every time a new major version of Clang rolls
out (that has been true for the past couple of years at the very
least), makes it rather annoying to deal with.

> 
> diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> index 66431525f2a0..fd49e3526c0d 100644
> --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> @@ -58,7 +58,7 @@ endif
>   endif
>   
>   ifneq ($(CONFIG_FRAME_WARN),0)
> -frame_warn_flag := -Wframe-larger-than=2048
> +frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048)
>   endif
>   
>   CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag)
> 
> That would address the immediate concern of the warning breaking builds
> with CONFIG_WERROR=y while not raising the limit for other files in the
> kernel (just this one file in AMDGPU) and avoiding disabling the whole
> driver. The number could be lower, I think ~2500 bytes is the most usage
> I see with Rodrigo's series applied, so maybe 2800 would be a decent
> limit? Once there is a fix in the compiler, this expression could be
> changed to use clang-min-version or something of that sort.
> 
> [1]: https://lore.kernel.org/20231017172231.GA2348194@dev-arch.thelio-3990X/
> [2]: https://lore.kernel.org/20231016142031.241912-1-Rodrigo.Siqueira@amd.com/
> 
> Cheers,
> Nathan
Nathan Chancellor Oct. 19, 2023, 8:51 p.m. UTC | #8
On Thu, Oct 19, 2023 at 04:17:26PM -0400, Hamza Mahfooz wrote:
> On 10/19/23 11:56, Nathan Chancellor wrote:
> > On Thu, Oct 19, 2023 at 02:53:01PM +0200, Arnd Bergmann wrote:
> > > On Thu, Oct 19, 2023, at 12:04, Alexander Potapenko wrote:
> > > > So the remaining option would be to just increase the frame size every
> > > > time a new function surpasses the limit.
> > > 
> > > That is clearly not an option, though we could try to
> > > add Kconfig dependencies that avoid the known bad combinations,
> > > such as annotating the AMD GPU driver as
> > > 
> > >        depends on (CC_IS_GCC || CLANG_VERSION >=180000) || !(KASAN || KCSAN)
> > 
> > This would effectively disable the AMDGPU driver for allmodconfig, which
> > is somewhat unfortunate as it is an easy testing target.
> > 
> > Taking a step back, this is all being done because of a couple of
> > warnings in the AMDGPU code. If fixing those in the source is too much
> > effort (I did note [1] that GCC is at the current limit for that file
> > even with Rodrigo's series applied [2]), couldn't we just take the
> > existing workaround that this Makefile has for this file and its high
> > stack usage and just extend it slightly for clang?
> 
> I personally don't mind fixing these issues in the driver, but the fact
> that they the creep back every time a new major version of Clang rolls
> out (that has been true for the past couple of years at the very
> least), makes it rather annoying to deal with.

I am not sure I agree with that characterization of the situation. clang
has been pretty consistent for the most part (which is certainly on us),
as all versions that the kernel supports warns about this code. I
believe it is more so the fact that there is a new copy of the dcn code
added every year that has none of the fixes applied from earlier
generations... It is not just me that has fixed issues like this, just
run 'git log --grep=stack drivers/gpu/drm/amd/display'. It is not just
clang that complains about the code when sanitizers are turned on, GCC
does as well since Stephen reported them.

Cheers,
Nathan

> > diff --git a/drivers/gpu/drm/amd/display/dc/dml2/Makefile b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> > index 66431525f2a0..fd49e3526c0d 100644
> > --- a/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> > +++ b/drivers/gpu/drm/amd/display/dc/dml2/Makefile
> > @@ -58,7 +58,7 @@ endif
> >   endif
> >   ifneq ($(CONFIG_FRAME_WARN),0)
> > -frame_warn_flag := -Wframe-larger-than=2048
> > +frame_warn_flag := -Wframe-larger-than=$(if $(CONFIG_CC_IS_CLANG),3072,2048)
> >   endif
> >   CFLAGS_$(AMDDALPATH)/dc/dml2/display_mode_core.o := $(dml2_ccflags) $(frame_warn_flag)
> > 
> > That would address the immediate concern of the warning breaking builds
> > with CONFIG_WERROR=y while not raising the limit for other files in the
> > kernel (just this one file in AMDGPU) and avoiding disabling the whole
> > driver. The number could be lower, I think ~2500 bytes is the most usage
> > I see with Rodrigo's series applied, so maybe 2800 would be a decent
> > limit? Once there is a fix in the compiler, this expression could be
> > changed to use clang-min-version or something of that sort.
> > 
> > [1]: https://lore.kernel.org/20231017172231.GA2348194@dev-arch.thelio-3990X/
> > [2]: https://lore.kernel.org/20231016142031.241912-1-Rodrigo.Siqueira@amd.com/
> > 
> > Cheers,
> > Nathan
> -- 
> Hamza
>
diff mbox series

Patch

diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 39d1d93164bd..15ad742729ca 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -429,11 +429,10 @@  endif # DEBUG_INFO
 config FRAME_WARN
 	int "Warn for stack frames larger than"
 	range 0 8192
-	default 0 if KMSAN
+	default 0 if KASAN || KCSAN || KMSAN
 	default 2048 if GCC_PLUGIN_LATENT_ENTROPY
 	default 2048 if PARISC
 	default 1536 if (!64BIT && XTENSA)
-	default 1280 if KASAN && !64BIT
 	default 1024 if !64BIT
 	default 2048 if 64BIT
 	help