diff mbox

Build failure with v4.9-rc1 and GCC trunk -- compiler weirdness

Message ID 20161017183806.GG5601@arm.com (mailing list archive)
State New, archived
Headers show

Commit Message

Will Deacon Oct. 17, 2016, 6:38 p.m. UTC
Hi all,

I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
believe that the new compiler behaviour at the heart of the problem
has the potential to affect other architectures and other pieces of
kernel code relying on dead-code elimination to remove deliberately
undefined functions.

The failure looks like:

  | drivers/built-in.o: In function `armada_3700_add_composite_clk':
  |
  | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
  | undefined reference to `____ilog2_NaN'
  |
  | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
  | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
  | `____ilog2_NaN'
  |
  | make: *** [vmlinux] Error 1

and if we look at the source for armada_3700_add_composite_clk, we see
that this is caused by:

  int table_size = 0;

  rate->reg = reg + (u64)rate->reg;
  for (clkt = rate->table; clkt->div; clkt++)
	 table_size++;
  rate->width = order_base_2(table_size);

order_base_2 calls ilog2, which has the ____ilog2_NaN call:

#define ilog2(n)				\
(						\
	__builtin_constant_p(n) ? (		\
		(n) < 1 ? ____ilog2_NaN() :	\

This is because we're in a curious case where GCC has emitted a
special-cased version of armada_3700_add_composite_clk, with table_size
effectively constant-folded as 0. Whilst we shouldn't see this in a
non-buggy kernel (hence the deliberate call to the undefined function
____ilog2_NaN), it means that the final link fails because we have a
____ilog2_NaN in the code, with a runtime check on table_size.

In other words, __builtin_constant_p appears to be weaker than we've
been assuming. Talking to the compiler guys here, this is due to the
"jump-threading" optimisation pass, so the patch below disables that.

A simpler example is:

int foo();
int bar();

int count(int *argc)
{
	int table_size = 0;

	for (; *argc; argc++)
		table_size++;

	if (__builtin_constant_p(table_size))
		return table_size == 0 ? foo() : bar();

	return bar();
}

which compiles to:

count:
	ldr	w0, [x0]
	cbz	w0, .L4
	b	bar
	.p2align 3
.L4:
	b	foo

and, with the "optimisation" disabled:

count:
	b	bar

Thoughts? It feels awfully fragile disabling passes like this, but with
GCC transforming the code like this, I can't immediately think of a way
to preserve the intended behaviour of the code.

Will

--->8

Comments

Ard Biesheuvel Oct. 17, 2016, 7:43 p.m. UTC | #1
On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
> Hi all,
>
> I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
> believe that the new compiler behaviour at the heart of the problem
> has the potential to affect other architectures and other pieces of
> kernel code relying on dead-code elimination to remove deliberately
> undefined functions.
>
> The failure looks like:
>
>   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
>   | undefined reference to `____ilog2_NaN'
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
>   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
>   | `____ilog2_NaN'
>   |
>   | make: *** [vmlinux] Error 1
>
> and if we look at the source for armada_3700_add_composite_clk, we see
> that this is caused by:
>
>   int table_size = 0;
>
>   rate->reg = reg + (u64)rate->reg;
>   for (clkt = rate->table; clkt->div; clkt++)
>          table_size++;
>   rate->width = order_base_2(table_size);
>
> order_base_2 calls ilog2, which has the ____ilog2_NaN call:
>
> #define ilog2(n)                                \
> (                                               \
>         __builtin_constant_p(n) ? (             \
>                 (n) < 1 ? ____ilog2_NaN() :     \
>
> This is because we're in a curious case where GCC has emitted a
> special-cased version of armada_3700_add_composite_clk, with table_size
> effectively constant-folded as 0. Whilst we shouldn't see this in a
> non-buggy kernel (hence the deliberate call to the undefined function
> ____ilog2_NaN), it means that the final link fails because we have a
> ____ilog2_NaN in the code, with a runtime check on table_size.
>

This is indeed an unintended side effect, but I would not call it
weird behaviour at all. The code in its current form does not handle
the case where it could end up passing 0 into order_base_2(), and we
simply need to handle that case. If order_base_2() is not defined for
input 0, it should BUG() in that case, and the associated
__builtin_unreachable() should prevent the special version from being
emitted. If order_base_2() is defined for input 0, it should not
invoke ilog2() with that argument, and the problem should go away as
well.
Will Deacon Oct. 19, 2016, 1:35 p.m. UTC | #2
Hi Ard,

On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
> On 17 October 2016 at 19:38, Will Deacon <will.deacon@arm.com> wrote:
> > I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
> > believe that the new compiler behaviour at the heart of the problem
> > has the potential to affect other architectures and other pieces of
> > kernel code relying on dead-code elimination to remove deliberately
> > undefined functions.
> >
> > The failure looks like:
> >
> >   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
> >   |
> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
> >   | undefined reference to `____ilog2_NaN'
> >   |
> >   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
> >   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
> >   | `____ilog2_NaN'
> >   |
> >   | make: *** [vmlinux] Error 1
> >
> > and if we look at the source for armada_3700_add_composite_clk, we see
> > that this is caused by:
> >
> >   int table_size = 0;
> >
> >   rate->reg = reg + (u64)rate->reg;
> >   for (clkt = rate->table; clkt->div; clkt++)
> >          table_size++;
> >   rate->width = order_base_2(table_size);
> >
> > order_base_2 calls ilog2, which has the ____ilog2_NaN call:
> >
> > #define ilog2(n)                                \
> > (                                               \
> >         __builtin_constant_p(n) ? (             \
> >                 (n) < 1 ? ____ilog2_NaN() :     \
> >
> > This is because we're in a curious case where GCC has emitted a
> > special-cased version of armada_3700_add_composite_clk, with table_size
> > effectively constant-folded as 0. Whilst we shouldn't see this in a
> > non-buggy kernel (hence the deliberate call to the undefined function
> > ____ilog2_NaN), it means that the final link fails because we have a
> > ____ilog2_NaN in the code, with a runtime check on table_size.
> >
> 
> This is indeed an unintended side effect, but I would not call it
> weird behaviour at all. The code in its current form does not handle
> the case where it could end up passing 0 into order_base_2(), and we
> simply need to handle that case.

The reasons I think it's weird are:

  (1) The optimisation doesn't generate better code in this case --
      optimising for the table_size == 0 case is uninformed, particularly
      as that *cannot* happen at runtime (GCC probably can't tell, due
      to things like container_of, but all the clock data is static).

  (2) __builtin_constant_p(n) could be interpreted by a developer as
     "this code will execute with a constant n at runtime". With this
     issue, GCC could (in theory) generate a specialisation for every
     possible value of a variable, and return __builtin_constant_p as
     true for all of them, which somewhat undermines the point of the
     builtin.

> If order_base_2() is not defined for input 0, it should BUG() in that
> case, and the associated __builtin_unreachable() should prevent the
> special version from being emitted. If order_base_2() is defined for input
> 0, it should not invoke ilog2() with that argument, and the problem should
> go away as well.

I don't necessarily think it should BUG() if it's not defined for input
0; things like __ffs don't do that and we'd be introducing conditional
checks for cases that should not happen. The comment above order_base_2
does suggest that ob2(0) should return 0, but it can actually end up
invoking ilog2(-1), which is obviously wrong.

I could update the comment, but that doesn't fix the build issue.

Will
Markus Trippelsdorf Oct. 19, 2016, 3:37 p.m. UTC | #3
On 2016.10.17 at 19:38 +0100, Will Deacon wrote:
> Hi all,
> 
> I'm seeing an arm64 build failure with -rc1 and GCC trunk, although I
> believe that the new compiler behaviour at the heart of the problem
> has the potential to affect other architectures and other pieces of
> kernel code relying on dead-code elimination to remove deliberately
> undefined functions.
> 
> The failure looks like:
> 
>   | drivers/built-in.o: In function `armada_3700_add_composite_clk':
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:
>   | undefined reference to `____ilog2_NaN'
>   |
>   | linux/drivers/clk/mvebu/armada-37xx-periph.c:351:(.text+0xc72e0):
>   | relocation truncated to fit: R_AARCH64_CALL26 against undefined symbol
>   | `____ilog2_NaN'
>   |
>   | make: *** [vmlinux] Error 1
> 

This is a gcc bug, see:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785
Linus Torvalds Oct. 19, 2016, 3:55 p.m. UTC | #4
On Wed, Oct 19, 2016 at 8:37 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
>
> This is a gcc bug, see:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785

Well, in the meantime we apparently have to live with it. Unless Will
is using some unreleased gcc version that nobody else is using and we
can just ignore it?

I don't think the link-time check is so important that we need to
notice it, and the "____ilog2_NaN()" could just be replaced with "0".

                 Linus
Markus Trippelsdorf Oct. 19, 2016, 3:56 p.m. UTC | #5
On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
> On Wed, Oct 19, 2016 at 8:37 AM, Markus Trippelsdorf
> <markus@trippelsdorf.de> wrote:
> >
> > This is a gcc bug, see:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785
> 
> Well, in the meantime we apparently have to live with it. Unless Will
> is using some unreleased gcc version that nobody else is using and we
> can just ignore it?

Yes, he is using gcc-7 that is unreleased. (It will be released April
next year.)
Russell King (Oracle) Oct. 19, 2016, 3:58 p.m. UTC | #6
On Wed, Oct 19, 2016 at 02:35:00PM +0100, Will Deacon wrote:
> Hi Ard,
> 
> On Mon, Oct 17, 2016 at 08:43:19PM +0100, Ard Biesheuvel wrote:
> > If order_base_2() is not defined for input 0, it should BUG() in that
> > case, and the associated __builtin_unreachable() should prevent the
> > special version from being emitted. If order_base_2() is defined for input
> > 0, it should not invoke ilog2() with that argument, and the problem should
> > go away as well.
> 
> I don't necessarily think it should BUG() if it's not defined for input
> 0;

In any case, Linus will have a rant about that: Linus has already been
concerned about the abuse of BUG().  BUG() should not be used as an
assert() replacement, but should be used where we have absolutely
no other option than to crash the kernel, because (eg) continuing
would result in the users' data being corrupted.

So no, BUG() is not the answer here.
Ard Biesheuvel Oct. 19, 2016, 4 p.m. UTC | #7
On 19 October 2016 at 16:56, Markus Trippelsdorf <markus@trippelsdorf.de> wrote:
> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>> On Wed, Oct 19, 2016 at 8:37 AM, Markus Trippelsdorf
>> <markus@trippelsdorf.de> wrote:
>> >
>> > This is a gcc bug, see:
>> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785
>>
>> Well, in the meantime we apparently have to live with it. Unless Will
>> is using some unreleased gcc version that nobody else is using and we
>> can just ignore it?
>
> Yes, he is using gcc-7 that is unreleased. (It will be released April
> next year.)
>

order_base_2() is still broken though, given that it is documented as

 * The first few values calculated by this routine:
 *  ob2(0) = 0
 *  ob2(1) = 0
 *  ob2(2) = 1
 *  ob2(3) = 2
 *  ob2(4) = 2
 *  ob2(5) = 3

whereas order_base_2(0) actually ends up invoking
roundup_pow_of_two(0), which is documented as being undefined.
Linus Torvalds Oct. 19, 2016, 4:01 p.m. UTC | #8
On Wed, Oct 19, 2016 at 8:56 AM, Markus Trippelsdorf
<markus@trippelsdorf.de> wrote:
> On 2016.10.19 at 08:55 -0700, Linus Torvalds wrote:
>>
>> Well, in the meantime we apparently have to live with it. Unless Will
>> is using some unreleased gcc version that nobody else is using and we
>> can just ignore it?
>
> Yes, he is using gcc-7 that is unreleased. (It will be released April
> next year.)

Ahh, self-built? So it's not part of some experimental ARM distro
setup and this will be annoying lots of people?

If so, still think that we could just get rid of the ____ilog2_NaN()
thing as it's not _that_ important, but it's certainly not very
high-priority. Will can do it in his tree too for testing, and it can
remind people to get the gcc problem fixed.

              Linus
diff mbox

Patch

diff --git a/Makefile b/Makefile
index 512e47a53e9a..750873d6d11e 100644
--- a/Makefile
+++ b/Makefile
@@ -641,6 +641,11 @@  endif
 # Tell gcc to never replace conditional load with a non-conditional one
 KBUILD_CFLAGS	+= $(call cc-option,--param=allow-store-data-races=0)
 
+# Stop gcc from converting switches into a form that defeats dead code
+# elimination and can subsequently lead to calls to intentionally
+# undefined functions appearing in the final link.
+KBUILD_CFLAGS	+= $(call cc-option,--param=max-fsm-thread-path-insns=1)
+
 include scripts/Makefile.gcc-plugins
 
 ifdef CONFIG_READABLE_ASM