diff mbox

[regression,drm/noveau] suspend to ram -> BOOM: exception RIP: drm_calc_vbltimestamp_from_scanoutpos+335

Message ID 1500039368.5763.12.camel@gmx.de (mailing list archive)
State New, archived
Headers show

Commit Message

Mike Galbraith July 14, 2017, 1:36 p.m. UTC
On Wed, 2017-07-12 at 07:37 -0400, Ilia Mirkin wrote:
> On Wed, Jul 12, 2017 at 7:25 AM, Mike Galbraith <efault@gmx.de> wrote:
> > On Wed, 2017-07-12 at 11:55 +0200, Mike Galbraith wrote:
> >> On Tue, 2017-07-11 at 14:22 -0400, Ilia Mirkin wrote:
> >> >
> >> > Some display stuff did change for 4.13 for GM20x+ boards. If it's not
> >> > too much trouble, a bisect would be pretty useful.
> >>
> >> Bisection seemingly went fine, but the result is odd.
> >>
> >> e98c58e55f68f8785aebfab1f8c9a03d8de0afe1 is the first bad commit
> >
> > But it really really is bad.  Looking at gitk fork in the road leading
> > to it...
> >
> > 52d9d38c183b drm/sti:fix spelling mistake: "compoment" -> "component" - good
> > e4e818cc2d7c drm: make drm_panel.h self-contained                     - good
> > 9cf8f5802f39 drm: add missing declaration to drm_blend.h              - good
> >
> > Before the git highway splits, all is well.  The lane with commits
> > works fine at both ends, but e98c58e55f68 is busted.  Merge arfifact?
> 
> Hmmm... that tree does not appear to have gotten a v4.12 backmerge at
> any point. The last backmerge from Linus as far as I can tell was
> v4.11-rc7. Could be an interaction with some out-of-tree change.

Ok, a network outage gave me time to go hunting.  Indeed it is a bad
interaction with the tree DRM merged into.  All DRM did was to slip a
WARN_ON_ONCE() that nouveau triggers into a kernel module where such
things no longer warn, they blow the box out of the water.  I made a
dinky testcase module (attached), and bisected to the real root....

19d436268dde95389c616bb3819da73f0a8b28a8 is the first bad commit
commit 19d436268dde95389c616bb3819da73f0a8b28a8
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Sat Feb 25 08:56:53 2017 +0100

    debug: Add _ONCE() logic to report_bug()
    
    Josh suggested moving the _ONCE logic inside the trap handler, using a
    bit in the bug_entry::flags field, avoiding the need for the extra
    variable.
    
    Sadly this only works for WARN_ON_ONCE(), since the others have
    printk() statements prior to triggering the trap.
    
    Still, this saves a fair amount of text and some data:
    
      text         data       filename
      10682460     4530992    defconfig-build/vmlinux.orig
      10665111     4530096    defconfig-build/vmlinux.patched
    
    Suggested-by: Josh Poimboeuf <jpoimboe@redhat.com>
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Andy Lutomirski <luto@kernel.org>
    Cc: Arnd Bergmann <arnd@arndb.de>
    Cc: Borislav Petkov <bp@alien8.de>
    Cc: Brian Gerst <brgerst@gmail.com>
    Cc: Denys Vlasenko <dvlasenk@redhat.com>
    Cc: H. Peter Anvin <hpa@zytor.com>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

:040000 040000 9f47f66ec4c234f6ee8e2a09e991c95fe47cf2c1 3e92aa9e77b39ed075ae2c3bdf041d92ef898f62 M	arch
:040000 040000 34f70b73d40c82533dd7df9b289106be69e2fa8d dd5d7248694a36b3e170f2dca5d9c4121535a990 M	include
:040000 040000 f6e627b0d378f0a00d2987fdd0c7b215306e6e3c b360d4ee2579744cce530184d7dab13493f73ee0 M	lib

Comments

Mike Galbraith July 14, 2017, 4:33 p.m. UTC | #1
On Fri, 2017-07-14 at 18:10 +0200, Peter Zijlstra wrote:
> On Fri, Jul 14, 2017 at 05:58:18PM +0200, Mike Galbraith wrote:
> > On Fri, 2017-07-14 at 17:50 +0200, Peter Zijlstra wrote:
> 
> > > Urgh, is for some mysterious reason the __bug_table section of modules
> > > ending up in RO memory?
> > > 
> > > I forever get lost in that link magic :/
> > 
> > +1
> > 
> > drm.ko
> >  20 __bug_table   00000630  0000000000000000  0000000000000000  0004bff3  2**0
> >                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
> > vmlinux
> >  15 __bug_table   0000ba84  ffffffff81af26c0  0000000001af26c0  00cf26c0  2**0
> >                   CONTENTS, ALLOC, LOAD, READONLY, DATA
> > 
> > Danged if I know... um um RELOC business mucks things up?
> 
> Argh, it shouldn't be READONLY for vmlinux either, but apparently that
> is working for mysterious reasons.
> 
> Some architectures were in fact complaining that I broke that, and hence
> patch:
> 
> b5effd3815cc ("debug: Fix __bug_table[] in arch linker scripts")
> 
> I think we need professional help with this linking stuff, but who to
> ask?

Andy Lutomirski?
Jessica Yu July 17, 2017, 5:20 p.m. UTC | #2
+++ Peter Zijlstra [14/07/17 18:10 +0200]:
>On Fri, Jul 14, 2017 at 05:58:18PM +0200, Mike Galbraith wrote:
>> On Fri, 2017-07-14 at 17:50 +0200, Peter Zijlstra wrote:
>
>> > Urgh, is for some mysterious reason the __bug_table section of modules
>> > ending up in RO memory?
>> >
>> > I forever get lost in that link magic :/
>>
>> +1
>>
>> drm.ko
>>  20 __bug_table   00000630  0000000000000000  0000000000000000  0004bff3  2**0
>>                   CONTENTS, ALLOC, LOAD, RELOC, READONLY, DATA
>> vmlinux
>>  15 __bug_table   0000ba84  ffffffff81af26c0  0000000001af26c0  00cf26c0  2**0
>>                   CONTENTS, ALLOC, LOAD, READONLY, DATA
>>
>> Danged if I know... um um RELOC business mucks things up?
>
>Argh, it shouldn't be READONLY for vmlinux either, but apparently that
>is working for mysterious reasons.

If I'm not mistaken, this works because __bug_table falls outside of
the RO range as specified in the vmlinux linker script (using x86_64
as the example, that means _text - __end_rodata_hpage_align).
mark_rodata_ro() only sets ro memory protections for pages within this
range, so __bug_table remains rw in memory despite its Elf section
flags. Interestingly enough, my .rodata section is set 'WA' (rw) in
vmlinux on my f25 system, so that leads me to think that Elf section
flags in vmlinux don't seem to matter much when it comes to setting
ro/nx protections..

However, in the module loader it's a different story; we do set page
protections strictly according to the section flags, so since
__bug_table only has SHF_ALLOC set, it assumes it's a readonly section
and gets treated as such. So I would think that Josh's patch would fix
this issue.

Jessica
diff mbox

Patch

---
 kernel/Makefile |    2 ++
 kernel/foo.c    |   15 +++++++++++++++
 2 files changed, 17 insertions(+)

--- a/kernel/Makefile
+++ b/kernel/Makefile
@@ -111,6 +111,8 @@  obj-$(CONFIG_MEMBARRIER) += membarrier.o
 
 obj-$(CONFIG_HAS_IOMEM) += memremap.o
 
+obj-m += foo.o
+
 $(obj)/configs.o: $(obj)/config_data.h
 
 targets += config_data.gz
--- /dev/null
+++ b/kernel/foo.c
@@ -0,0 +1,15 @@ 
+#include <linux/module.h>
+#include <linux/bug.h>
+
+static int __init foo_init(void)
+{
+	printk(KERN_INFO "foo: module loaded\n");
+	WARN_ON_ONCE(1);
+	return 0;
+}
+
+static void __exit foo_exit(void) { }
+
+module_init(foo_init);
+module_exit(foo_exit);
+MODULE_LICENSE("GPL");