diff mbox series

kasan: turn off asan-stack for clang-8 and earlier

Message ID 20190219214940.391081-1-arnd@arndb.de (mailing list archive)
State New, archived
Headers show
Series kasan: turn off asan-stack for clang-8 and earlier | expand

Commit Message

Arnd Bergmann Feb. 19, 2019, 9:49 p.m. UTC
Building an arm64 allmodconfig kernel with clang results in over 140 warnings
about overly large stack frames, the worst ones being:

drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'

None of these happen with gcc today, and almost all of these are the result
of a single known bug in llvm.  Hopefully it will eventually get fixed with the
clang-9 release.

In the meantime, the best idea I have is to turn off asan-stack for clang-8
and earlier, so we can produce a kernel that is safe to run.

I have posted three patches that address the frame overflow warnings that are
not addressed by turning off asan-stack, so in combination with this change,
we get much closer to a clean allmodconfig build, which in turn is necessary
to do meaningful build regression testing.

Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Qian Cai <cai@lca.pw>
Link: https://bugs.llvm.org/show_bug.cgi?id=38809
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
---
 lib/Kconfig.kasan      | 13 +++++++++++++
 scripts/Makefile.kasan |  2 +-
 2 files changed, 14 insertions(+), 1 deletion(-)

Comments

Qian Cai Feb. 19, 2019, 10:17 p.m. UTC | #1
On Tue, 2019-02-19 at 22:49 +0100, Arnd Bergmann wrote:
> Building an arm64 allmodconfig kernel with clang results in over 140 warnings
> about overly large stack frames, the worst ones being:
> 
> drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size
> of 20224 bytes in function 'st7789v_prepare'
> drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12:
> error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
> drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes
> in function 'max3421_spi_thread'
> drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes
> in function 'slic_ds26522_probe'
> drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in
> function 'ccp_run_cmd'
> drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840
> bytes in function 'stv0367ter_algo'
> 
> None of these happen with gcc today, and almost all of these are the result
> of a single known bug in llvm.  Hopefully it will eventually get fixed with
> the
> clang-9 release.
> 
> In the meantime, the best idea I have is to turn off asan-stack for clang-8
> and earlier, so we can produce a kernel that is safe to run.
> 
> I have posted three patches that address the frame overflow warnings that are
> not addressed by turning off asan-stack, so in combination with this change,
> we get much closer to a clean allmodconfig build, which in turn is necessary
> to do meaningful build regression testing.

Well, I am using clang 8.0 on arm64 and running the kernel just fine for a few
weeks now and never trigger a single stack overflow (THREAD_SHIFT = 15) because
I never use any of those drivers you mentioned above. I don't think it is a good
idea to blankly remove the testing coverage here and affect people don't use all
those offensive functions at all.
Nick Desaulniers Feb. 19, 2019, 10:43 p.m. UTC | #2
+ Evgenii, Kostya for KASAN

On Tue, Feb 19, 2019 at 2:17 PM Qian Cai <cai@lca.pw> wrote:
>
> On Tue, 2019-02-19 at 22:49 +0100, Arnd Bergmann wrote:
> > Building an arm64 allmodconfig kernel with clang results in over 140 warnings
> > about overly large stack frames, the worst ones being:
> >
> > drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size
> > of 20224 bytes in function 'st7789v_prepare'
> > drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12:
> > error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
> > drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes
> > in function 'max3421_spi_thread'
> > drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes
> > in function 'slic_ds26522_probe'
> > drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in
> > function 'ccp_run_cmd'
> > drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840
> > bytes in function 'stv0367ter_algo'
> >
> > None of these happen with gcc today, and almost all of these are the result
> > of a single known bug in llvm.  Hopefully it will eventually get fixed with
> > the
> > clang-9 release.

Would it be better to disable outright if CC_IS_CLANG, then go back
and adjust it once we know explicitly which version it lands in?
(Maybe I'm being too pessimistic about when it may be fixed).

> >
> > In the meantime, the best idea I have is to turn off asan-stack for clang-8
> > and earlier, so we can produce a kernel that is safe to run.
> >
> > I have posted three patches that address the frame overflow warnings that are
> > not addressed by turning off asan-stack, so in combination with this change,
> > we get much closer to a clean allmodconfig build, which in turn is necessary
> > to do meaningful build regression testing.
>
> Well, I am using clang 8.0 on arm64 and running the kernel just fine for a few
> weeks now and never trigger a single stack overflow (THREAD_SHIFT = 15) because
> I never use any of those drivers you mentioned above. I don't think it is a good
> idea to blankly remove the testing coverage here and affect people don't use all
> those offensive functions at all.

Thanks for the patch, Arnd!  Hopefully we can fix that up in Clang
soon.  Qian, I guess the alternative would be to add `-mllvm
-asan-stack=0` on potentially up to 140 Makefiles?
Kostya Serebryany Feb. 20, 2019, 12:33 a.m. UTC | #3
On Tue, Feb 19, 2019 at 2:43 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> + Evgenii, Kostya for KASAN
>
> On Tue, Feb 19, 2019 at 2:17 PM Qian Cai <cai@lca.pw> wrote:
> >
> > On Tue, 2019-02-19 at 22:49 +0100, Arnd Bergmann wrote:
> > > Building an arm64 allmodconfig kernel with clang results in over 140 warnings
> > > about overly large stack frames, the worst ones being:
> > >
> > > drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size
> > > of 20224 bytes in function 'st7789v_prepare'
> > > drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12:
> > > error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
> > > drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes
> > > in function 'max3421_spi_thread'
> > > drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes
> > > in function 'slic_ds26522_probe'
> > > drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in
> > > function 'ccp_run_cmd'
> > > drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840
> > > bytes in function 'stv0367ter_algo'
> > >
> > > None of these happen with gcc today, and almost all of these are the result
> > > of a single known bug in llvm.  Hopefully it will eventually get fixed with
> > > the
> > > clang-9 release.

I am not confident we can fix this in clang.
The difference between gcc and clang, AFAICT, is not in the asan
instrumentation, but in inining.
Looking at the reproducer from https://bugs.llvm.org/show_bug.cgi?id=38809#c4,
if I change ltv350qv_write_reg to always inline, gcc also produces a
huge 10K stack frame,
and making it noinline fixes the stack frame for both gcc and clang.


>
> Would it be better to disable outright if CC_IS_CLANG, then go back
> and adjust it once we know explicitly which version it lands in?
> (Maybe I'm being too pessimistic about when it may be fixed).
>
> > >
> > > In the meantime, the best idea I have is to turn off asan-stack for clang-8
> > > and earlier, so we can produce a kernel that is safe to run.
> > >
> > > I have posted three patches that address the frame overflow warnings that are
> > > not addressed by turning off asan-stack, so in combination with this change,
> > > we get much closer to a clean allmodconfig build, which in turn is necessary
> > > to do meaningful build regression testing.
> >
> > Well, I am using clang 8.0 on arm64 and running the kernel just fine for a few
> > weeks now and never trigger a single stack overflow (THREAD_SHIFT = 15) because
> > I never use any of those drivers you mentioned above. I don't think it is a good
> > idea to blankly remove the testing coverage here and affect people don't use all
> > those offensive functions at all.
>
> Thanks for the patch, Arnd!  Hopefully we can fix that up in Clang
> soon.  Qian, I guess the alternative would be to add `-mllvm
> -asan-stack=0` on potentially up to 140 Makefiles?
>
> --
> Thanks,
> ~Nick Desaulniers
Qian Cai Feb. 20, 2019, 1:25 a.m. UTC | #4
On 2/19/19 7:33 PM, Kostya Serebryany wrote:
>>> Well, I am using clang 8.0 on arm64 and running the kernel just fine for a few
>>> weeks now and never trigger a single stack overflow (THREAD_SHIFT = 15) because
>>> I never use any of those drivers you mentioned above. I don't think it is a good
>>> idea to blankly remove the testing coverage here and affect people don't use all
>>> those offensive functions at all.
>>
>> Thanks for the patch, Arnd!  Hopefully we can fix that up in Clang
>> soon.  Qian, I guess the alternative would be to add `-mllvm
>> -asan-stack=0` on potentially up to 140 Makefiles?

Depends on the exact stack consumption of those 140 functions. For example, I
don't care if you turn-off -asan-stack if the current stack size is < 32k.

For those functions always consume like a lot of stack like 8k or more, fix them
in Makefile if not too many of them.
Dmitry Vyukov Feb. 20, 2019, 6:44 a.m. UTC | #5
On Wed, Feb 20, 2019 at 1:34 AM Kostya Serebryany <kcc@google.com> wrote:
>
> On Tue, Feb 19, 2019 at 2:43 PM Nick Desaulniers
> <ndesaulniers@google.com> wrote:
> >
> > + Evgenii, Kostya for KASAN
> >
> > On Tue, Feb 19, 2019 at 2:17 PM Qian Cai <cai@lca.pw> wrote:
> > >
> > > On Tue, 2019-02-19 at 22:49 +0100, Arnd Bergmann wrote:
> > > > Building an arm64 allmodconfig kernel with clang results in over 140 warnings
> > > > about overly large stack frames, the worst ones being:
> > > >
> > > > drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size
> > > > of 20224 bytes in function 'st7789v_prepare'
> > > > drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12:
> > > > error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
> > > > drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes
> > > > in function 'max3421_spi_thread'
> > > > drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes
> > > > in function 'slic_ds26522_probe'
> > > > drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in
> > > > function 'ccp_run_cmd'
> > > > drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840
> > > > bytes in function 'stv0367ter_algo'
> > > >
> > > > None of these happen with gcc today, and almost all of these are the result
> > > > of a single known bug in llvm.  Hopefully it will eventually get fixed with
> > > > the
> > > > clang-9 release.
>
> I am not confident we can fix this in clang.
> The difference between gcc and clang, AFAICT, is not in the asan
> instrumentation, but in inining.
> Looking at the reproducer from https://bugs.llvm.org/show_bug.cgi?id=38809#c4,
> if I change ltv350qv_write_reg to always inline, gcc also produces a
> huge 10K stack frame,
> and making it noinline fixes the stack frame for both gcc and clang.

Does your gcc have fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 ?

For me gcc rejects to inline it:

gcc version 7.3.0 (Debian 7.3.0-5)

$ gcc /tmp/test.c -Wframe-larger-than=128 -c -fsanitize=kernel-address
-O2 --param asan-stack=1
/tmp/test.c:23:13: warning: always_inline function might not be
inlinable [-Wattributes]
 static void ltv350qv_write_reg(void) {
             ^~~~~~~~~~~~~~~~~~
/tmp/test.c: In function ‘ltv350qv_power_on’:
/tmp/test.c:57:1: warning: the frame size of 416 bytes is larger than
128 bytes [-Wframe-larger-than=]
 }
 ^
Arnd Bergmann Feb. 20, 2019, 9:19 a.m. UTC | #6
On Wed, Feb 20, 2019 at 7:44 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> On Wed, Feb 20, 2019 at 1:34 AM Kostya Serebryany <kcc@google.com> wrote:
> >
> > On Tue, Feb 19, 2019 at 2:43 PM Nick Desaulniers
> > <ndesaulniers@google.com> wrote:
> > >
> > > + Evgenii, Kostya for KASAN
> > >
> > > On Tue, Feb 19, 2019 at 2:17 PM Qian Cai <cai@lca.pw> wrote:
> > > >
> > > > On Tue, 2019-02-19 at 22:49 +0100, Arnd Bergmann wrote:
> > > > > Building an arm64 allmodconfig kernel with clang results in over 140 warnings
> > > > > about overly large stack frames, the worst ones being:
> > > > >
> > > > > drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size
> > > > > of 20224 bytes in function 'st7789v_prepare'
> > > > > drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12:
> > > > > error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
> > > > > drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes
> > > > > in function 'max3421_spi_thread'
> > > > > drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes
> > > > > in function 'slic_ds26522_probe'
> > > > > drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in
> > > > > function 'ccp_run_cmd'
> > > > > drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840
> > > > > bytes in function 'stv0367ter_algo'
> > > > >
> > > > > None of these happen with gcc today, and almost all of these are the result
> > > > > of a single known bug in llvm.  Hopefully it will eventually get fixed with
> > > > > the
> > > > > clang-9 release.
> >
> > I am not confident we can fix this in clang.
> > The difference between gcc and clang, AFAICT, is not in the asan
> > instrumentation, but in inining.
> > Looking at the reproducer from https://bugs.llvm.org/show_bug.cgi?id=38809#c4,
> > if I change ltv350qv_write_reg to always inline, gcc also produces a
> > huge 10K stack frame,
> > and making it noinline fixes the stack frame for both gcc and clang.
>
> Does your gcc have fix for https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81715 ?
>
> For me gcc rejects to inline it:
>
> gcc version 7.3.0 (Debian 7.3.0-5)
>
> $ gcc /tmp/test.c -Wframe-larger-than=128 -c -fsanitize=kernel-address
> -O2 --param asan-stack=1
> /tmp/test.c:23:13: warning: always_inline function might not be
> inlinable [-Wattributes]

I don't see that warning here

>  static void ltv350qv_write_reg(void) {
>              ^~~~~~~~~~~~~~~~~~
> /tmp/test.c: In function ‘ltv350qv_power_on’:
> /tmp/test.c:57:1: warning: the frame size of 416 bytes is larger than
> 128 bytes [-Wframe-larger-than=]
>  }

but I also see the small stack size here: https://godbolt.org/z/Uz5Ruv

gcc definitely inlines the function here but only has one copy of
spi_message and spi_transfer on the stack, while clang inserts
a call to __asan_report_store8_noabort() and has one copy per
inlined call to ltv350qv_write_reg().

      Arnd
Andrey Konovalov Feb. 20, 2019, 2:44 p.m. UTC | #7
On Tue, Feb 19, 2019 at 10:49 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> Building an arm64 allmodconfig kernel with clang results in over 140 warnings
> about overly large stack frames, the worst ones being:
>
> drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
> drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
> drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
> drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
> drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
> drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'
>
> None of these happen with gcc today, and almost all of these are the result
> of a single known bug in llvm.  Hopefully it will eventually get fixed with the
> clang-9 release.
>
> In the meantime, the best idea I have is to turn off asan-stack for clang-8
> and earlier, so we can produce a kernel that is safe to run.

Hi Arnd,

I don't think it's good to disable KASAN stack instrumentation for the
whole kernel by default with clang. It makes more sense to disable
stack instrumentation only for these few drivers.

Thanks!


>
> I have posted three patches that address the frame overflow warnings that are
> not addressed by turning off asan-stack, so in combination with this change,
> we get much closer to a clean allmodconfig build, which in turn is necessary
> to do meaningful build regression testing.
>
> Cc: Andrey Ryabinin <aryabinin@virtuozzo.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Mark Brown <broonie@kernel.org>
> Cc: Qian Cai <cai@lca.pw>
> Link: https://bugs.llvm.org/show_bug.cgi?id=38809
> Signed-off-by: Arnd Bergmann <arnd@arndb.de>
> ---
>  lib/Kconfig.kasan      | 13 +++++++++++++
>  scripts/Makefile.kasan |  2 +-
>  2 files changed, 14 insertions(+), 1 deletion(-)
>
> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> index 67d7d1309c52..219cddc913ac 100644
> --- a/lib/Kconfig.kasan
> +++ b/lib/Kconfig.kasan
> @@ -103,6 +103,19 @@ config KASAN_INLINE
>
>  endchoice
>
> +config KASAN_STACK
> +       int
> +       default 0 if CC_IS_CLANG && (CLANG_VERSION < 90000)
> +       default 1
> +       help
> +         The LLVM stack address sanitizer has a know bug that
> +         causes excessive stack usage in a lot of functions, see
> +         https://bugs.llvm.org/show_bug.cgi?id=38809
> +         Disabling asan-stack makes it safe to run kernels build
> +         with clang-8 with KASAN enabled, though it loses some of
> +         the functionality.  We assume that clang-9 will have a fix,
> +         so the feature can be used.
> +
>  config KASAN_S390_4_LEVEL_PAGING
>         bool "KASan: use 4-level paging"
>         depends on KASAN && S390
> diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
> index f1fb8e502657..6410bd22fe38 100644
> --- a/scripts/Makefile.kasan
> +++ b/scripts/Makefile.kasan
> @@ -26,7 +26,7 @@ else
>         CFLAGS_KASAN := $(CFLAGS_KASAN_SHADOW) \
>          $(call cc-param,asan-globals=1) \
>          $(call cc-param,asan-instrumentation-with-call-threshold=$(call_threshold)) \
> -        $(call cc-param,asan-stack=1) \
> +        $(call cc-param,asan-stack=$(CONFIG_KASAN_STACK)) \
>          $(call cc-param,asan-instrument-allocas=1)
>  endif
>
> --
> 2.20.0
>
Arnd Bergmann Feb. 20, 2019, 2:51 p.m. UTC | #8
On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov <andreyknvl@google.com> wrote:
>
> On Tue, Feb 19, 2019 at 10:49 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > Building an arm64 allmodconfig kernel with clang results in over 140 warnings
> > about overly large stack frames, the worst ones being:
> >
> > drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
> > drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
> > drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
> > drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
> > drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
> > drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'
> >
> > None of these happen with gcc today, and almost all of these are the result
> > of a single known bug in llvm.  Hopefully it will eventually get fixed with the
> > clang-9 release.
> >
> > In the meantime, the best idea I have is to turn off asan-stack for clang-8
> > and earlier, so we can produce a kernel that is safe to run.
>
> Hi Arnd,
>
> I don't think it's good to disable KASAN stack instrumentation for the
> whole kernel by default with clang. It makes more sense to disable
> stack instrumentation only for these few drivers.

Do you mean just the 114 files that we get warnings for in allmodconfig,
or also those that run into the same bug but stay below the warning limit,
and the ones that don't warn in allmodconfig but do warn in other
configurations?

I would have to some more research, but I expect several hundred
patches before we get to a clean randconfig build with a broken
compiler.

     Arnd
Andrey Ryabinin Feb. 20, 2019, 5 p.m. UTC | #9
On 2/20/19 5:51 PM, Arnd Bergmann wrote:
> On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov <andreyknvl@google.com> wrote:
>>
>> On Tue, Feb 19, 2019 at 10:49 PM Arnd Bergmann <arnd@arndb.de> wrote:
>>>
>>> Building an arm64 allmodconfig kernel with clang results in over 140 warnings
>>> about overly large stack frames, the worst ones being:
>>>
>>> drivers/gpu/drm/panel/panel-sitronix-st7789v.c:196:12: error: stack frame size of 20224 bytes in function 'st7789v_prepare'
>>> drivers/video/fbdev/omap2/omapfb/displays/panel-tpo-td028ttec1.c:196:12: error: stack frame size of 13120 bytes in function 'td028ttec1_panel_enable'
>>> drivers/usb/host/max3421-hcd.c:1395:1: error: stack frame size of 10048 bytes in function 'max3421_spi_thread'
>>> drivers/net/wan/slic_ds26522.c:209:12: error: stack frame size of 9664 bytes in function 'slic_ds26522_probe'
>>> drivers/crypto/ccp/ccp-ops.c:2434:5: error: stack frame size of 8832 bytes in function 'ccp_run_cmd'
>>> drivers/media/dvb-frontends/stv0367.c:1005:12: error: stack frame size of 7840 bytes in function 'stv0367ter_algo'
>>>
>>> None of these happen with gcc today, and almost all of these are the result
>>> of a single known bug in llvm.  Hopefully it will eventually get fixed with the
>>> clang-9 release.
>>>
>>> In the meantime, the best idea I have is to turn off asan-stack for clang-8
>>> and earlier, so we can produce a kernel that is safe to run.
>>
>> Hi Arnd,
>>
>> I don't think it's good to disable KASAN stack instrumentation for the
>> whole kernel by default with clang. It makes more sense to disable
>> stack instrumentation only for these few drivers.
> 
> Do you mean just the 114 files that we get warnings for in allmodconfig,
> or also those that run into the same bug but stay below the warning limit,
> and the ones that don't warn in allmodconfig but do warn in other
> configurations?
> 
> I would have to some more research, but I expect several hundred
> patches before we get to a clean randconfig build with a broken
> compiler.

Manually maintaining asan-stack parameter for the sake of one broken compiler isn't a great idea either.

Couple alternative suggestions:

1) If we can't fix the problem or the cost of fixing is too high, maybe just hide it? Disable -Wframe-larger-then on pre clang-9 compilers.

2) Fallback cflags. The idea is to try to compile every the file with "-mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first,
 and fallback to "-mllvm -asan-stack=0" if failed. So it would be something similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror, -mllvm -asan-stack=0)
 except that "cc-option" tries options only once on some code example while  we need to try options on every file that we actually compile.
 Honestly, I'm not sure that it's worthy to hack Kbuild engine for that particular use-case.
Arnd Bergmann Feb. 20, 2019, 5:35 p.m. UTC | #10
On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> On 2/20/19 5:51 PM, Arnd Bergmann wrote:
> > On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov <andreyknvl@google.com> wrote:
> > I would have to some more research, but I expect several hundred
> > patches before we get to a clean randconfig build with a broken
> > compiler.
>
> Manually maintaining asan-stack parameter for the sake of one broken compiler isn't a great idea either.
>
> Couple alternative suggestions:
>
> 1) If we can't fix the problem or the cost of fixing is too high, maybe just hide it? Disable -Wframe-larger-then on pre clang-9 compilers.
>
> 2) Fallback cflags. The idea is to try to compile every the file with "-mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first,
>  and fallback to "-mllvm -asan-stack=0" if failed. So it would be something similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror, -mllvm -asan-stack=0)
>  except that "cc-option" tries options only once on some code example while  we need to try options on every file that we actually compile.
>  Honestly, I'm not sure that it's worthy to hack Kbuild engine for that particular use-case.

My original plan was to put this under CONFIG_KASAN_EXTRA to allow you
to still enable it in older compilers, but you just removed that option ;-)

Maybe bringing it back would be a compromise? That way it's hidden from
all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency),
but anyone who really wants it can still have the option, and set
CONFIG_FRAME_WARN
to whichever value they like.

     Arnd
Nick Desaulniers Feb. 20, 2019, 6:07 p.m. UTC | #11
+ Evgenii

On Wed, Feb 20, 2019 at 9:36 AM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> > On 2/20/19 5:51 PM, Arnd Bergmann wrote:
> > > On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov <andreyknvl@google.com> wrote:
> > > I would have to some more research, but I expect several hundred
> > > patches before we get to a clean randconfig build with a broken
> > > compiler.
> >
> > Manually maintaining asan-stack parameter for the sake of one broken compiler isn't a great idea either.
> >
> > Couple alternative suggestions:
> >
> > 1) If we can't fix the problem or the cost of fixing is too high, maybe just hide it? Disable -Wframe-larger-then on pre clang-9 compilers.
> >
> > 2) Fallback cflags. The idea is to try to compile every the file with "-mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first,
> >  and fallback to "-mllvm -asan-stack=0" if failed. So it would be something similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror, -mllvm -asan-stack=0)
> >  except that "cc-option" tries options only once on some code example while  we need to try options on every file that we actually compile.
> >  Honestly, I'm not sure that it's worthy to hack Kbuild engine for that particular use-case.
>
> My original plan was to put this under CONFIG_KASAN_EXTRA to allow you
> to still enable it in older compilers, but you just removed that option ;-)
>
> Maybe bringing it back would be a compromise? That way it's hidden from
> all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency),
> but anyone who really wants it can still have the option, and set
> CONFIG_FRAME_WARN
> to whichever value they like.
>
>      Arnd

I like Evgenii's idea:
https://bugs.llvm.org/show_bug.cgi?id=38809#c10

Even though something like that wouldn't make the clang-8 train, I
think it's ok.

While I myself share Arnd's goal of driving compiler warnings to zero,
in general I'd prefer not to disable warning-producing-features or
disable warnings outright for cases where we have some ideas of
changes we can make to the compiler.  There's probably a list now of
false warnings produced by old versions of Clang from bugs in Clang
that we fixed.  I'm not interested in additionally trying to work
around those somehow in kernel sources.

Qian previously pointed out that most drivers don't produce this
warning under KASAN+Clang.  While 114 is a lot, what are the chances
that someone NEEDS a KASAN+Clang build to compile warning free and
happen to include one of these problematic drivers?  And if there is a
chance they do observe the warning, are we doing a disservice by
disabling the feature (-asan-stack=1) outright for the whole kernel,
or disabling the warning (`-Wstack-frame-larger-than=`) which can flag
issues unrelated to KASAN?

To Evgenii's idea, I vote that the compiler is incorrect here, and we
shouldn't start turning things off.  Evgenii, do you have some sense
of how to tune the inliner as you described?
Mark Brown Feb. 20, 2019, 6:44 p.m. UTC | #12
On Wed, Feb 20, 2019 at 10:07:36AM -0800, Nick Desaulniers wrote:

> I like Evgenii's idea:
> https://bugs.llvm.org/show_bug.cgi?id=38809#c10

That's a suggestion to tune the inlining heuristics.

> While I myself share Arnd's goal of driving compiler warnings to zero,
> in general I'd prefer not to disable warning-producing-features or
> disable warnings outright for cases where we have some ideas of
> changes we can make to the compiler.  There's probably a list now of
> false warnings produced by old versions of Clang from bugs in Clang
> that we fixed.  I'm not interested in additionally trying to work
> around those somehow in kernel sources.

We do have infrastructure in the kernel for managing warnings based on
compiler version (Arnd was looking at some improvements to that IIRC),
if we've got a kernel that builds with a given compiler it's worth
looking at tuning what we do with that compiler.  If newer versions of
the compiler work better or have new options we can turn things on for
them.

> Qian previously pointed out that most drivers don't produce this
> warning under KASAN+Clang.  While 114 is a lot, what are the chances
> that someone NEEDS a KASAN+Clang build to compile warning free and
> happen to include one of these problematic drivers?  And if there is a
> chance they do observe the warning, are we doing a disservice by
> disabling the feature (-asan-stack=1) outright for the whole kernel,
> or disabling the warning (`-Wstack-frame-larger-than=`) which can flag
> issues unrelated to KASAN?

People doing treewide work and subsystem maintainers are a reasonably
important target for this sort of thing - for example people looking at
the kernelci output.  It's a lot easier to pay attention to problems if
you don't have to wade through large numbers of false positives.
Nick Desaulniers Feb. 20, 2019, 8:02 p.m. UTC | #13
On Wed, Feb 20, 2019 at 10:44 AM Mark Brown <broonie@kernel.org> wrote:
>
> On Wed, Feb 20, 2019 at 10:07:36AM -0800, Nick Desaulniers wrote:
>
> > I like Evgenii's idea:
> > https://bugs.llvm.org/show_bug.cgi?id=38809#c10
>
> That's a suggestion to tune the inlining heuristics.

Yes; but it will also improve KASAN (if feasible).

> > While I myself share Arnd's goal of driving compiler warnings to zero,
> > in general I'd prefer not to disable warning-producing-features or
> > disable warnings outright for cases where we have some ideas of
> > changes we can make to the compiler.  There's probably a list now of
> > false warnings produced by old versions of Clang from bugs in Clang
> > that we fixed.  I'm not interested in additionally trying to work
> > around those somehow in kernel sources.
>
> We do have infrastructure in the kernel for managing warnings based on
> compiler version (Arnd was looking at some improvements to that IIRC),
> if we've got a kernel that builds with a given compiler it's worth
> looking at tuning what we do with that compiler.  If newer versions of
> the compiler work better or have new options we can turn things on for
> them.

so maybe something like (pseudocode):
if kasan && clang && clang_version < 9:
  disable -Wframe-larger-than=

If you overrun the stack with KASAN, a warning would be nice, but
you'll hopefully find out the hard way at runtime.  And that doesn't
require up to 114 Makefile changes, which would be kind of obnoxious
for this papercut.

>
> > Qian previously pointed out that most drivers don't produce this
> > warning under KASAN+Clang.  While 114 is a lot, what are the chances
> > that someone NEEDS a KASAN+Clang build to compile warning free and
> > happen to include one of these problematic drivers?  And if there is a
> > chance they do observe the warning, are we doing a disservice by
> > disabling the feature (-asan-stack=1) outright for the whole kernel,
> > or disabling the warning (`-Wstack-frame-larger-than=`) which can flag
> > issues unrelated to KASAN?
>
> People doing treewide work and subsystem maintainers are a reasonably
> important target for this sort of thing - for example people looking at
> the kernelci output.  It's a lot easier to pay attention to problems if
> you don't have to wade through large numbers of false positives.

Good point.  Current reports are a flood of -Wframe-larger-than=
because of KASAN (we've fixed just about everything else), and I have
to pick out what's new from that sea of false positives.  I would hate
for these warnings from KASAN to be the last thing before people start
taking clang builds seriously due to false positive warnings.
Arnd Bergmann Feb. 20, 2019, 9:13 p.m. UTC | #14
On Wed, Feb 20, 2019 at 9:02 PM Nick Desaulniers
<ndesaulniers@google.com> wrote:
>
> On Wed, Feb 20, 2019 at 10:44 AM Mark Brown <broonie@kernel.org> wrote:
> >
> > On Wed, Feb 20, 2019 at 10:07:36AM -0800, Nick Desaulniers wrote:
> >
> > > I like Evgenii's idea:
> > > https://bugs.llvm.org/show_bug.cgi?id=38809#c10
> >
> > That's a suggestion to tune the inlining heuristics.
>
> Yes; but it will also improve KASAN (if feasible).

From my experiments that I cited in the bug report, very few cases
seem to actually be the result of different inlining decisions, and those
that are tend to be kernel bugs (e.g. in drivers/scsi/bfa/bfa_fcs_lport.c
clang reports a larger stack than gcc because it combines two functions
that individually use 800 bytes each, but if one calls the other, we
use even more stack than the combined function).

In the example in https://bugs.llvm.org/show_bug.cgi?id=38809#c12
(https://godbolt.org/z/ylsGSQ) there is no inlining, yet clang uses
over ten times as much stack space as gcc, for reasons I still
can't explain. My assumption right now is that the underlying bug
causes most of the problems with excessive stack usage in
allmodconfig kernels.

> > > While I myself share Arnd's goal of driving compiler warnings to zero,
> > > in general I'd prefer not to disable warning-producing-features or
> > > disable warnings outright for cases where we have some ideas of
> > > changes we can make to the compiler.  There's probably a list now of
> > > false warnings produced by old versions of Clang from bugs in Clang
> > > that we fixed.  I'm not interested in additionally trying to work
> > > around those somehow in kernel sources.
> >
> > We do have infrastructure in the kernel for managing warnings based on
> > compiler version (Arnd was looking at some improvements to that IIRC),
> > if we've got a kernel that builds with a given compiler it's worth
> > looking at tuning what we do with that compiler.  If newer versions of
> > the compiler work better or have new options we can turn things on for
> > them.
>
> so maybe something like (pseudocode):
> if kasan && clang && clang_version < 9:
>   disable -Wframe-larger-than=
>
> If you overrun the stack with KASAN, a warning would be nice, but
> you'll hopefully find out the hard way at runtime.  And that doesn't
> require up to 114 Makefile changes, which would be kind of obnoxious
> for this papercut.

That would mean that build testing allmodconfig with clang would
ignore all the interesting stack overflow bugs that happen in real
user systems without KASAN.

       Arnd
Arnd Bergmann Feb. 20, 2019, 9:40 p.m. UTC | #15
On Wed, Feb 20, 2019 at 10:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> In the example in https://bugs.llvm.org/show_bug.cgi?id=38809#c12
> (https://godbolt.org/z/ylsGSQ) there is no inlining, yet clang uses
> over ten times as much stack space as gcc, for reasons I still
> can't explain. My assumption right now is that the underlying bug
> causes most of the problems with excessive stack usage in
> allmodconfig kernels.

Here is an even more minimal example:

struct s { int i[5]; } f(void);
void g(void) { f(); f();}

https://godbolt.org/z/d_KWkh

It's clear that clang does /something/ here when asan-stack=1 is
set, but I fail to see what it is, or why that is necessary.

The output of clang with asan-stack=0 is the expected
code, and basically identical to what gcc produces with or
without asan-stack.

      Arnd
Kostya Serebryany Feb. 20, 2019, 10:12 p.m. UTC | #16
On Wed, Feb 20, 2019 at 1:40 PM Arnd Bergmann <arnd@arndb.de> wrote:
>
> On Wed, Feb 20, 2019 at 10:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > In the example in https://bugs.llvm.org/show_bug.cgi?id=38809#c12
> > (https://godbolt.org/z/ylsGSQ) there is no inlining, yet clang uses
> > over ten times as much stack space as gcc, for reasons I still
> > can't explain. My assumption right now is that the underlying bug
> > causes most of the problems with excessive stack usage in
> > allmodconfig kernels.
>
> Here is an even more minimal example:
>
> struct s { int i[5]; } f(void);
> void g(void) { f(); f();}

On this example I can see some stupidity that clang/asan is doing.
Let me try fixing it and see if it helps bigger cases.
Thanks for reducing the case!

This is the input we get in the asan instrumentation:

; Function Attrs: noinline nounwind optnone sanitize_address uwtable
define dso_local void @g() #0 {
entry:
  %tmp = alloca %struct.s, align 4  <<<<<<<<<<<<<<<<<<<<<<<
  %tmp1 = alloca %struct.s, align 4
  %0 = bitcast %struct.s* %tmp to i8*
  call void @llvm.lifetime.start.p0i8(i64 20, i8* %0) #3
  call void @f(%struct.s* sret %tmp)
  %1 = bitcast %struct.s* %tmp to i8* <<<<<<<<<<<<<<<<<<<<<
  call void @llvm.lifetime.end.p0i8(i64 20, i8* %1) #3
  %2 = bitcast %struct.s* %tmp1 to i8*
  call void @llvm.lifetime.start.p0i8(i64 20, i8* %2) #3  <<<<<<<<<<<<<
  call void @f(%struct.s* sret %tmp1)
  %3 = bitcast %struct.s* %tmp1 to i8*
  call void @llvm.lifetime.end.p0i8(i64 20, i8* %3) #3
  ret void
}

the stack variables are not *really* used, but since they are "used"
inside the lifetime markers they are not eliminated by asan,
and so asan instruments them, after which no one can remove them any more...




>
> https://godbolt.org/z/d_KWkh
>
> It's clear that clang does /something/ here when asan-stack=1 is
> set, but I fail to see what it is, or why that is necessary.
>
> The output of clang with asan-stack=0 is the expected
> code, and basically identical to what gcc produces with or
> without asan-stack.
>
>       Arnd
Kostya Serebryany Feb. 20, 2019, 11:46 p.m. UTC | #17
On Wed, Feb 20, 2019 at 2:12 PM Kostya Serebryany <kcc@google.com> wrote:
>
> On Wed, Feb 20, 2019 at 1:40 PM Arnd Bergmann <arnd@arndb.de> wrote:
> >
> > On Wed, Feb 20, 2019 at 10:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > In the example in https://bugs.llvm.org/show_bug.cgi?id=38809#c12
> > > (https://godbolt.org/z/ylsGSQ) there is no inlining, yet clang uses
> > > over ten times as much stack space as gcc, for reasons I still
> > > can't explain. My assumption right now is that the underlying bug
> > > causes most of the problems with excessive stack usage in
> > > allmodconfig kernels.
> >
> > Here is an even more minimal example:
> >
> > struct s { int i[5]; } f(void);
> > void g(void) { f(); f();}
>
> On this example I can see some stupidity that clang/asan is doing.
> Let me try fixing it and see if it helps bigger cases.
> Thanks for reducing the case!
>
> This is the input we get in the asan instrumentation:
>
> ; Function Attrs: noinline nounwind optnone sanitize_address uwtable
> define dso_local void @g() #0 {
> entry:
>   %tmp = alloca %struct.s, align 4  <<<<<<<<<<<<<<<<<<<<<<<
>   %tmp1 = alloca %struct.s, align 4
>   %0 = bitcast %struct.s* %tmp to i8*
>   call void @llvm.lifetime.start.p0i8(i64 20, i8* %0) #3
>   call void @f(%struct.s* sret %tmp)
>   %1 = bitcast %struct.s* %tmp to i8* <<<<<<<<<<<<<<<<<<<<<
>   call void @llvm.lifetime.end.p0i8(i64 20, i8* %1) #3
>   %2 = bitcast %struct.s* %tmp1 to i8*
>   call void @llvm.lifetime.start.p0i8(i64 20, i8* %2) #3  <<<<<<<<<<<<<
>   call void @f(%struct.s* sret %tmp1)
>   %3 = bitcast %struct.s* %tmp1 to i8*
>   call void @llvm.lifetime.end.p0i8(i64 20, i8* %3) #3
>   ret void
> }

Err.. taking my words back.
These allocas *are* used other then in lifetime markers, since they
are passed to f() as 'sret'.
And we can not drop instrumentation for such allocas. Example:

static volatile int zero = 0;
typedef struct {
  int ar[5];
} S;
S foo() {
  S s;
  s.ar[zero + 6] = 42;
  return s;
}
int main(int argc, char **argv) {
  S s = foo();
  return s.ar[argc];
}

% clang -g  -O1  -fsanitize=address  sret.c && ./a.out

==5822==ERROR: AddressSanitizer: stack-buffer-overflow on address
0x7ffcad027f78 at pc 0x0000004f878d bp 0x7ffcad027f20 sp
0x7ffcad027f18
WRITE of size 4 at 0x7ffcad027f78 thread T0
    #0 0x4f878c in foo sret.c:7:18
    #1 0x4f8838 in main sret.c:11:9
Address 0x7ffcad027f78 is located in stack of thread T0 at offset 56 in frame
    #0 0x4f879f in main sret.c:10

  This frame has 1 object(s):
    [32, 52) 's' (line 11) <== Memory access at offset 56 overflows
this variable

Here we have a struct return that needs to be instrumented inside main
so that a buffer overflow in foo() is detected.

Now, I am also not confident that the reduced case reflects the real problem.

--kcc









>
> the stack variables are not *really* used, but since they are "used"
> inside the lifetime markers they are not eliminated by asan,
> and so asan instruments them, after which no one can remove them any more...
>
>
>
>
> >
> > https://godbolt.org/z/d_KWkh
> >
> > It's clear that clang does /something/ here when asan-stack=1 is
> > set, but I fail to see what it is, or why that is necessary.
> >
> > The output of clang with asan-stack=0 is the expected
> > code, and basically identical to what gcc produces with or
> > without asan-stack.
> >
> >       Arnd
Andrey Ryabinin Feb. 21, 2019, 10:06 a.m. UTC | #18
On 2/20/19 8:35 PM, Arnd Bergmann wrote:
> On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>> On 2/20/19 5:51 PM, Arnd Bergmann wrote:
>>> On Wed, Feb 20, 2019 at 3:45 PM Andrey Konovalov <andreyknvl@google.com> wrote:
>>> I would have to some more research, but I expect several hundred
>>> patches before we get to a clean randconfig build with a broken
>>> compiler.
>>
>> Manually maintaining asan-stack parameter for the sake of one broken compiler isn't a great idea either.
>>
>> Couple alternative suggestions:
>>
>> 1) If we can't fix the problem or the cost of fixing is too high, maybe just hide it? Disable -Wframe-larger-then on pre clang-9 compilers.
>>
>> 2) Fallback cflags. The idea is to try to compile every the file with "-mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror" at first,
>>  and fallback to "-mllvm -asan-stack=0" if failed. So it would be something similar to $(call cc-option, -mllvm -asan-stack=1 -Wframe-larger-than=2048 -Werror, -mllvm -asan-stack=0)
>>  except that "cc-option" tries options only once on some code example while  we need to try options on every file that we actually compile.
>>  Honestly, I'm not sure that it's worthy to hack Kbuild engine for that particular use-case.
> 
> My original plan was to put this under CONFIG_KASAN_EXTRA to allow you
> to still enable it in older compilers, but you just removed that option ;-)
> 
> Maybe bringing it back would be a compromise? That way it's hidden from
> all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency),
> but anyone who really wants it can still have the option, and set
> CONFIG_FRAME_WARN
> to whichever value they like.
> 


I think there is much simpler solution:

---
 lib/Kconfig.kasan | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 219cddc913ac..6cd035f06cee 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -105,6 +105,8 @@ endchoice
 
 config KASAN_STACK
 	int
+	range 0 1
+	prompt "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && (CLANG_VERSION < 90000)
 	default 0 if CC_IS_CLANG && (CLANG_VERSION < 90000)
 	default 1
 	help
Arnd Bergmann Feb. 21, 2019, 3:19 p.m. UTC | #19
On Thu, Feb 21, 2019 at 11:06 AM Andrey Ryabinin
<aryabinin@virtuozzo.com> wrote:
> On 2/20/19 8:35 PM, Arnd Bergmann wrote:
> > On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
> >> On 2/20/19 5:51 PM, Arnd Bergmann wrote:

> > Maybe bringing it back would be a compromise? That way it's hidden from
> > all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency),
> > but anyone who really wants it can still have the option, and set
> > CONFIG_FRAME_WARN
> > to whichever value they like.
> >
>
>
> I think there is much simpler solution:
>
> ---
>  lib/Kconfig.kasan | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
> index 219cddc913ac..6cd035f06cee 100644
> --- a/lib/Kconfig.kasan
> +++ b/lib/Kconfig.kasan
> @@ -105,6 +105,8 @@ endchoice
>
>  config KASAN_STACK
>         int
> +       range 0 1
> +       prompt "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && (CLANG_VERSION < 90000)
>         default 0 if CC_IS_CLANG && (CLANG_VERSION < 90000)
>         default 1
>         help
> --
>
>
> AFAIK, randconfig is not able to randomize int config options, so it will be disabled for build robots,
> but users still will be able to enable it.

Right, this will work, but I find it a bit awkward to require users to
enter 0 or 1.

My assumption is that build bots turn on CONFIG_COMPILE_TEST, so
having a bool option that depends on COMPILE_TEST would be more
conventional. We can debate whether it should also depend on
CONFIG_EXPERT or not. Something like

config KASAN_STACK
         bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG
&& !COMPILE_TEST
         default CC_IS_GCC || (CLANG_VERSION >= 90000)

And then a simpler Makefile logic (could also be done in Kconfig) to turn
that bool symbol into an integer argument for asan-stack=

       Arnd
Andrey Ryabinin Feb. 21, 2019, 4:14 p.m. UTC | #20
On 2/21/19 6:19 PM, Arnd Bergmann wrote:
> On Thu, Feb 21, 2019 at 11:06 AM Andrey Ryabinin
> <aryabinin@virtuozzo.com> wrote:
>> On 2/20/19 8:35 PM, Arnd Bergmann wrote:
>>> On Wed, Feb 20, 2019 at 6:00 PM Andrey Ryabinin <aryabinin@virtuozzo.com> wrote:
>>>> On 2/20/19 5:51 PM, Arnd Bergmann wrote:
> 
>>> Maybe bringing it back would be a compromise? That way it's hidden from
>>> all the build testing bots (because of the !CONFIG_COMPILE_TEST dependency),
>>> but anyone who really wants it can still have the option, and set
>>> CONFIG_FRAME_WARN
>>> to whichever value they like.
>>>
>>
>>
>> I think there is much simpler solution:
>>
>> ---
>>  lib/Kconfig.kasan | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
>> index 219cddc913ac..6cd035f06cee 100644
>> --- a/lib/Kconfig.kasan
>> +++ b/lib/Kconfig.kasan
>> @@ -105,6 +105,8 @@ endchoice
>>
>>  config KASAN_STACK
>>         int
>> +       range 0 1
>> +       prompt "Enable stack instrumentation (unsafe)" if CC_IS_CLANG && (CLANG_VERSION < 90000)
>>         default 0 if CC_IS_CLANG && (CLANG_VERSION < 90000)
>>         default 1
>>         help
>> --
>>
>>
>> AFAIK, randconfig is not able to randomize int config options, so it will be disabled for build robots,
>> but users still will be able to enable it.
> 
> Right, this will work, but I find it a bit awkward to require users to
> enter 0 or 1.
> 
> My assumption is that build bots turn on CONFIG_COMPILE_TEST, so
> having a bool option that depends on COMPILE_TEST would be more
> conventional. We can debate whether it should also depend on
> CONFIG_EXPERT or not. Something like
> 
> config KASAN_STACK
>          bool "Enable stack instrumentation (unsafe)" if CC_IS_CLANG
> && !COMPILE_TEST
>          default CC_IS_GCC || (CLANG_VERSION >= 90000)
> 
> And then a simpler Makefile logic (could also be done in Kconfig) to turn
> that bool symbol into an integer argument for asan-stack=
> 

Sounds good.
Arnd Bergmann Feb. 21, 2019, 5:19 p.m. UTC | #21
On Thu, Feb 21, 2019 at 12:47 AM Kostya Serebryany <kcc@google.com> wrote:
>
> On Wed, Feb 20, 2019 at 2:12 PM Kostya Serebryany <kcc@google.com> wrote:
> >
> > On Wed, Feb 20, 2019 at 1:40 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > >
> > > On Wed, Feb 20, 2019 at 10:13 PM Arnd Bergmann <arnd@arndb.de> wrote:
> > > >
> > > > In the example in https://bugs.llvm.org/show_bug.cgi?id=38809#c12
> > > > (https://godbolt.org/z/ylsGSQ) there is no inlining, yet clang uses
> > > > over ten times as much stack space as gcc, for reasons I still
> > > > can't explain. My assumption right now is that the underlying bug
> > > > causes most of the problems with excessive stack usage in
> > > > allmodconfig kernels.
> > >
> > > Here is an even more minimal example:
> > >
> > > struct s { int i[5]; } f(void);
> > > void g(void) { f(); f();}
> >
> > On this example I can see some stupidity that clang/asan is doing.
> > Let me try fixing it and see if it helps bigger cases.
> > Thanks for reducing the case!
> >
> > This is the input we get in the asan instrumentation:
> >
> > ; Function Attrs: noinline nounwind optnone sanitize_address uwtable
> > define dso_local void @g() #0 {
> > entry:
> >   %tmp = alloca %struct.s, align 4  <<<<<<<<<<<<<<<<<<<<<<<
> >   %tmp1 = alloca %struct.s, align 4
> >   %0 = bitcast %struct.s* %tmp to i8*
> >   call void @llvm.lifetime.start.p0i8(i64 20, i8* %0) #3
> >   call void @f(%struct.s* sret %tmp)
> >   %1 = bitcast %struct.s* %tmp to i8* <<<<<<<<<<<<<<<<<<<<<
> >   call void @llvm.lifetime.end.p0i8(i64 20, i8* %1) #3
> >   %2 = bitcast %struct.s* %tmp1 to i8*
> >   call void @llvm.lifetime.start.p0i8(i64 20, i8* %2) #3  <<<<<<<<<<<<<
> >   call void @f(%struct.s* sret %tmp1)
> >   %3 = bitcast %struct.s* %tmp1 to i8*
> >   call void @llvm.lifetime.end.p0i8(i64 20, i8* %3) #3
> >   ret void
> > }
>
> Err.. taking my words back.
> These allocas *are* used other then in lifetime markers, since they
> are passed to f() as 'sret'.

Thanks a lot for the analysis!

> And we can not drop instrumentation for such allocas. Example:
>
> static volatile int zero = 0;
> typedef struct {
>   int ar[5];
> } S;
> S foo() {
>   S s;
>   s.ar[zero + 6] = 42;
>   return s;
> }
> int main(int argc, char **argv) {
>   S s = foo();
>   return s.ar[argc];
> }
>
> % clang -g  -O1  -fsanitize=address  sret.c && ./a.out
>
> ==5822==ERROR: AddressSanitizer: stack-buffer-overflow on address
> 0x7ffcad027f78 at pc 0x0000004f878d bp 0x7ffcad027f20 sp
> 0x7ffcad027f18
> WRITE of size 4 at 0x7ffcad027f78 thread T0
>     #0 0x4f878c in foo sret.c:7:18
>     #1 0x4f8838 in main sret.c:11:9
> Address 0x7ffcad027f78 is located in stack of thread T0 at offset 56 in frame
>     #0 0x4f879f in main sret.c:10
>
>   This frame has 1 object(s):
>     [32, 52) 's' (line 11) <== Memory access at offset 56 overflows
> this variable
>
> Here we have a struct return that needs to be instrumented inside main
> so that a buffer overflow in foo() is detected.

I tried reproducing this with gcc. The example you list above
will generate very similar code on gcc. However, if I change main() to
not assign the return value of foo() to a local variable, I get a trivial
main main function without sanitizer code, but foo() still detects the
overflow:

=================================================================
==3442660==ERROR: AddressSanitizer: stack-buffer-overflow on address
0x7fff5682bd58 at pc 0x0000004009c3 bp 0x7fff5682bd10 sp
0x7fff5682bd00
WRITE of size 4 at 0x7fff5682bd58 thread T0
    #0 0x4009c2 in foo (/tmp/a.out+0x4009c2)
    #1 0x4009dd in main (/tmp/a.out+0x4009dd)
    #2 0x7fc6f0ce482f in __libc_start_main
(/lib/x86_64-linux-gnu/libc.so.6+0x2082f)
    #3 0x400808 in _start (/tmp/a.out+0x400808)

Address 0x7fff5682bd58 is located in stack of thread T0 at offset 56 in frame
    #0 0x4008c6 in foo (/tmp/a.out+0x4008c6)

  This frame has 1 object(s):
    [32, 52) 's' <== Memory access at offset 56 overflows this variable
HINT: this may be a false positive if your program uses some custom
stack unwind mechanism or swapcontext
      (longjmp and C++ exceptions *are* supported)
SUMMARY: AddressSanitizer: stack-buffer-overflow (/tmp/a.out+0x4009c2) in foo

The other difference is that with gcc, we only get one copy of each stack
variable even when it adds code from asan-stack to check it, while clang
generally creates one copy per instance when asan-stack=1 is set
(but doesn't otherwise).

> Now, I am also not confident that the reduced case reflects the real problem.

No, it probably doesn't, but it seems vaguely related. This is a problem
of using creduce, it sometimes reduces the test cases too much and
runs into a different symptom.

The case we frequently see in the kernel is probably more like

void extf(long *);
static inline __attribute__((always_inline)) void f(void)
{
    long i = 7;
    extg(&i);
}
int main(void)
{
     f();
     f();
}

https://godbolt.org/z/ReEQLo

Here, both gcc and clang add sanitizing for 'i', but clang has multiple
copies, while gcc only has one. This happens for both scalars and
structs, and inline functions as well as open-coded blocks or macros
with the same code in them. Using the inline function for illustration
above since that is the most common way it appears in the kernel.

      Arnd
diff mbox series

Patch

diff --git a/lib/Kconfig.kasan b/lib/Kconfig.kasan
index 67d7d1309c52..219cddc913ac 100644
--- a/lib/Kconfig.kasan
+++ b/lib/Kconfig.kasan
@@ -103,6 +103,19 @@  config KASAN_INLINE
 
 endchoice
 
+config KASAN_STACK
+	int
+	default 0 if CC_IS_CLANG && (CLANG_VERSION < 90000)
+	default 1
+	help
+	  The LLVM stack address sanitizer has a know bug that
+	  causes excessive stack usage in a lot of functions, see
+	  https://bugs.llvm.org/show_bug.cgi?id=38809
+	  Disabling asan-stack makes it safe to run kernels build
+	  with clang-8 with KASAN enabled, though it loses some of
+	  the functionality.  We assume that clang-9 will have a fix,
+	  so the feature can be used.
+
 config KASAN_S390_4_LEVEL_PAGING
 	bool "KASan: use 4-level paging"
 	depends on KASAN && S390
diff --git a/scripts/Makefile.kasan b/scripts/Makefile.kasan
index f1fb8e502657..6410bd22fe38 100644
--- a/scripts/Makefile.kasan
+++ b/scripts/Makefile.kasan
@@ -26,7 +26,7 @@  else
 	CFLAGS_KASAN := $(CFLAGS_KASAN_SHADOW) \
 	 $(call cc-param,asan-globals=1) \
 	 $(call cc-param,asan-instrumentation-with-call-threshold=$(call_threshold)) \
-	 $(call cc-param,asan-stack=1) \
+	 $(call cc-param,asan-stack=$(CONFIG_KASAN_STACK)) \
 	 $(call cc-param,asan-instrument-allocas=1)
 endif