Patchwork [Bug,13012] 2.6.28.9 causes init to segfault on Debian etch; 2.6.28.8 OK

Submitter Frans Pop
Date July 10, 2009, 7:28 a.m.
Message ID <200907100928.07369.elendil@planet.nl>
Permalink /patch/34925/
State New, archived

Comments

Frans Pop - July 10, 2009, 7:28 a.m.
On Thu, 9 Apr 2009, Linus Torvalds wrote:
> On Thu, 9 Apr 2009, Andrew Morton wrote:
> > -fwrapv killed Barry's gcc-4.1.2-compiled kernel in 2.6.27.x,
> > 2.6.28.x and presumably 2.6.29, 2.6.30.
>
> Auughh. I hate compiler bugs. They're horrible to debug.
>
> I _think_ 'fwrapv' only really matters with gcc-4.3, so maybe we could
> just enable it for new versions.
>
> HOWEVER, I also wonder if we could instead of "-fwrapv" use
> "-fno-strict-overflow". They are apparently subtly different, and maybe
> the bug literally only happens with -fwrapv.
>
> Barry, can you see if that simple "replace -fwrapv with
> -fno-strict-overflow" works for you?
>
> Or just go with Barry's helpful debugging:
> > > I also noticed that the problem only happens with some gcc's:
> > >
> > > Problem occurs:
> > > gcc (GCC) 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
> > > gcc-4.1 (GCC) 4.1.3 20080704 (prerelease) (Debian 4.1.2-25)
> > >
> > > Problem does not occur (i.e. 2.6.28.9 works and I don't have to
> > > revert anything):
> > > gcc-4.2 (GCC) 4.2.4 (Debian 4.2.4-6)
> > > gcc (Debian 4.3.2-1.1) 4.3.2
>
> and consider 4.2 to be the point where it's ok.
>
> Do we have some gcc developer who
>  (a) knows what the rules are
> and
>  (b) might even help us figure out where the bug occurs?

The discussion on this issue looks to have died, but it has bitten Debian 
stable ("Lenny") [1] as it causes init to die on s390 after a kernel 
update.

Here's a possible patch. The exact gcc version to check for is still a bit 
open I guess. For the s390 issue I've confirmed that 4.2.4 is OK, but for 
safety and because of Andrew's comment above I've set the test for 4.3 in 
the patch.

Cheers,
FJP

[1] http://bugs.debian.org/536354

---
From: Frans Pop <elendil@planet.nl>
Subject: Only add '-fwrapv' to gcc CFLAGS for gcc 4.3 and later

This flag has been shown to cause init to segfault for kernels
compiled with gcc-4.1. gcc version 4.2.4 has been shown to be OK,
but as there is some uncertainty the flag is only added for 4.3
and later.

This fixes http://bugzilla.kernel.org/show_bug.cgi?id=13012.

Reported-by: Barry K. Nathan <barryn@pobox.com>
Signed-off-by: Frans Pop <elendil@planet.nl>

 KBUILD_CFLAGS	+= $(call cc-option,-fno-dwarf2-cfi-asm)
--
To unsubscribe from this list: send the line "unsubscribe linux-kbuild" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Frans Pop - July 10, 2009, 2:59 p.m.
On Friday 10 July 2009, Frans Pop wrote:
> On Thu, 9 Apr 2009, Linus Torvalds wrote:
> > On Thu, 9 Apr 2009, Andrew Morton wrote:
> > > -fwrapv killed Barry's gcc-4.1.2-compiled kernel in 2.6.27.x,
> > > 2.6.28.x and presumably 2.6.29, 2.6.30.
> >
> > Auughh. I hate compiler bugs. They're horrible to debug.
> >
> > I _think_ 'fwrapv' only really matters with gcc-4.3, so maybe we
> > could just enable it for new versions.
> >
> > HOWEVER, I also wonder if we could instead of "-fwrapv" use
> > "-fno-strict-overflow". They are apparently subtly different, and
> > maybe the bug literally only happens with -fwrapv.
> >
> > Barry, can you see if that simple "replace -fwrapv with
> > -fno-strict-overflow" works for you?

Prompted by the same suggestion from Ben Hutchings I checked this too, 
but -fno-strict-overflow was only introduced in gcc 4.2.
So using it instead of -fwrapv *would* fix the problem for gcc 4.1, but 
*only* because it would effectively do the same as the patch I proposed: 
not add an option at all for gcc 4.1.

So that change seems illogical unless there are other reasons to 
prefer -fno-strict-overflow over -fwrapv (well, it would avoid the
gcc version check).

It does however make it somewhat more logical to change the test in my 
proposed patch to also allow -fwrapv for gcc 4.2.

Cheers,
FJP
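
To illustrate why an unsupported option simply drops out of the flags, here is a minimal shell sketch of the kind of probe cc-option performs. The helper name cc_option, its fallback argument and the use of mktemp are assumptions made for this example; it is not the kbuild implementation.

```shell
# Hypothetical helper, loosely modelled on the idea behind kbuild's cc-option.
# Prints the flag if the compiler accepts it, otherwise the fallback
# (which defaults to nothing).
cc_option() {
    flag=$1
    fallback=${2-}
    # Write the object to a temporary file rather than to /dev/null.
    out=$(mktemp) || return 1
    # Compile an empty translation unit; only the exit status matters.
    if ${CC:-gcc} -Werror "$flag" -c -x c /dev/null -o "$out" >/dev/null 2>&1; then
        printf '%s\n' "$flag"
    else
        printf '%s\n' "$fallback"
    fi
    rm -f "$out"
}
```

If gcc 4.1 rejects -fno-strict-overflow as described above, then with CC pointing at it `cc_option -fno-strict-overflow` should print only an empty line, so nothing would be added to the flags.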
Linus Torvalds - July 12, 2009, 5:58 p.m.
On Fri, 10 Jul 2009, Frans Pop wrote:
> 
> Prompted by the same suggestion from Ben Hutchings I checked this too, 
> but -fno-strict-overflow was only introduced in gcc 4.2.
> So using it instead of -fwrapv *would* fix the problem for gcc 4.1, but 
> *only* because it would effectively do the same as the patch I proposed: 
> not add an option at all for gcc 4.1.
> 
> So that change seems illogical unless there are other reasons to 
> prefer -fno-strict-overflow over -fwrapv (well, it would avoid the
> gcc version check).
>
> It does however make it somewhat more logical to change the test in my 
> proposed patch to also allow -fwrapv for gcc 4.2.

Hmm. It all really makes me suspect that we should really be using
-fno-strict-overflow instead.

That not only apparently avoids the unnecessary gcc version check (by 
virtue of the option only existing in compilers that don't have the 
problem), but quite frankly, one of the core people involved with the 
whole thing (Ian Lance Taylor) seems to think it's the better option.

See for example

	http://www.airs.com/blog/archives/120

on how gcc actually generates better code with -fno-strict-overflow.

I added Ian to the cc.

Ian: we generally do try to be careful and use explicit unsigned types for 
code that cares about overflow, but we use -fwrapv because there have been 
some cases where we didn't (and used pointer comparisons or signed 
integers).

The problem is that apparently gcc-4.1.x was literally generating buggy 
code with -fwrapv. So now the choice for us is between switching to an 
explicit version test:

	-KBUILD_CFLAGS  += $(call cc-option,-fwrapv)
	+KBUILD_CFLAGS  += $(shell if [ $(call cc-version) -ge 0402 ]; then \
	+                   echo $(call cc-option,-fwrapv); fi ;)

or just switching to -fno-strict-overflow instead:

	-KBUILD_CFLAGS  += $(call cc-option,-fwrapv)
	+KBUILD_CFLAGS  += $(call cc-option,-fno-strict-overflow)

which avoids the buggy gcc versions because it's simply not even supported 
by gcc-4.1.x (and even if that wasn't the case, possibly because only 
'wrapv' is the problematic case - apparently the difference _does_ 
matter to gcc).

From everything I have been able to find, I really prefer the second 
version. Not only is the patch cleaner, but it looks like code generation 
is better too (for some inexplicable reason, but I suspect it's because 
-fno-strict-overflow is just saner).

		Linus
Ian Lance Taylor - July 13, 2009, 5:29 a.m.
Linus Torvalds <torvalds@linux-foundation.org> writes:

> Ian: we generally do try to be careful and use explicit unsigned types for 
> code that cares about overflow, but we use -fwrapv because there have been 
> some cases where we didn't (and used pointer comparisons or signed 
> integers).
>
> The problem is that apparently gcc-4.1.x was literally generating buggy 
> code with -fwrapv. So now the choice for us is between switching to an 
> explicit version test:
>
> 	-KBUILD_CFLAGS  += $(call cc-option,-fwrapv)
> 	+KBUILD_CFLAGS  += $(shell if [ $(call cc-version) -ge 0402 ]; then \
> 	+                   echo $(call cc-option,-fwrapv); fi ;)
>
> or just switching to -fno-strict-overflow instead:
>
> 	-KBUILD_CFLAGS  += $(call cc-option,-fwrapv)
> 	+KBUILD_CFLAGS  += $(call cc-option,-fno-strict-overflow)
>
> which avoids the buggy gcc versions because it's simply not even supported 
> by gcc-4.1.x (and even if that wasn't the case, possibly because only 
> 'wrapv' is the problematic case - apparently the difference _does_ 
> matter to gcc).

My instinctive advice is that y'all should track down and fix the cases
where the program relies on signed overflow being defined.  However, if
that is difficult--and it is--then I agree that -fno-strict-overflow is
preferable when using a compiler which supports it (gcc 4.2.0 and
later).

(The gcc 4.2 and later option -Wstrict-overflow=N can help find the
cases where a program relies on defined signed overflow, but only if
somebody is patient enough to wade through all the false positives.)

Ian

Patch

diff --git a/Makefile b/Makefile
index 0aeec59..2f8756e 100644
--- a/Makefile
+++ b/Makefile
@@ -565,7 +565,8 @@ KBUILD_CFLAGS += $(call cc-option,-Wdeclaration-after-statement,)
 KBUILD_CFLAGS += $(call cc-option,-Wno-pointer-sign,)
 
 # disable invalid "can't wrap" optimizations for signed / pointers
-KBUILD_CFLAGS	+= $(call cc-option,-fwrapv)
+KBUILD_CFLAGS  += $(shell if [ $(call cc-version) -ge 0430 ]; then \
+		    echo $(call cc-option,-fwrapv); fi ;)
 
 # revert to pre-gcc-4.4 behaviour of .eh_frame
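
The version tests in this thread compare the compiler version against a four-digit code. Below is a minimal sketch of that comparison, assuming cc-version yields two digits of major version followed by two digits of minor version, as the 0402 in the quoted alternative suggests. gcc_version_code and meets_threshold are hypothetical helpers, not kbuild functions.

```shell
# Hypothetical stand-in for the value cc-version is assumed to produce:
# two digits of major version followed by two digits of minor version.
gcc_version_code() {
    printf '%02d%02d\n' "$1" "$2"
}

# Succeeds when gcc <major>.<minor> meets the given four-digit threshold.
# The [ ... -ge ... ] test compares both operands as decimal integers.
meets_threshold() {
    [ "$(gcc_version_code "$1" "$2")" -ge "$3" ]
}
```

Under that assumption, `meets_threshold 4 3 0430` fails while `meets_threshold 4 3 0403` succeeds, so a test meant to start at gcc 4.3 would need 0403 as its threshold.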