Re: qemu:metag image runtime failure in -next due to 'kthread: allow to cancel kthread work'
diff mbox

Message ID 20160916233249.GB18931@jhogan-linux.le.imgtec.org
State New
Headers show

Commit Message

James Hogan Sept. 16, 2016, 11:32 p.m. UTC
On Fri, Sep 16, 2016 at 02:37:18PM -0700, Guenter Roeck wrote:
> On Fri, Sep 16, 2016 at 10:27:20PM +0100, James Hogan wrote:
> > On Fri, Sep 16, 2016 at 01:38:19PM -0700, Guenter Roeck wrote:
> > > Hi,
> > > 
> > > I see the following runtime error in -next when running a metag qemu emulation.
> > > 
> > > [ ... ]
> > > workingset: timestamp_bits=30 max_order=16 bucket_order=0
> > > io scheduler noop registered (default)
> > > brd: module loaded
> > > Warning: unable to open an initial console.
> > > List of all partitions:
> > > 0100           16384 ram0  (driver?)
> > > No filesystem could mount root, tried: 
> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
> > > 
> > > An example for a complete log is at:
> > > http://kerneltests.org/builders/qemu-metag-next/builds/489/steps/qemubuildcommand/logs/stdio
> > > 
> > > bisect points to commit ef98de028afde ("kthread: allow to cancel kthread work").
> > > I don't know (yet) if other architectures are affected. bisect log is attached.
> > > 
> > > The scripts to run this test are available at
> > > https://github.com/groeck/linux-build-test/tree/master/rootfs/metag.
> > > 
> > > Guenter
> > 
> > Thanks Guenter,
> > 
> > It appears to be related to the command line. After that commit the
> > command line is shown as empty (rather than your "rdinit=/sbin/init
> > doreboot"), but it can still be overridden in the config and then it
> > continues to work.
> > 
> Weird.

Previously the Elf had a single load program header:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x004000 0x40000000 0x40000000 0x34b304 0x376230 RWE 0x4000
  NOTE           0x1ecc08 0x401e8c08 0x401e8c08 0x00000 0x00000 R   0x4

QEMU puts the args at 40376230, straight after the load region (unlike a
real Meta Linux bootloader).

After the above commit the ELF gets two load program headers, with a
small alignment gap in the middle:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x004000 0x40000000 0x40000000 0x18f118 0x18f118 R E 0x4000
  LOAD           0x194000 0x40190000 0x40190000 0x1bd304 0x1e8230 RWE 0x4000
  NOTE           0x1eec08 0x401eac08 0x401eac08 0x00000 0x00000 R   0x4

Here this version of QEMU puts the args at where it thinks the end of
the loaded image is, which is based on the number of bytes copied from
the ELF, i.e. the total MemSiz's, not taking into account the alignment
gap in between, so it puts them at 0x40377348.

But of course:
40378230 B ___bss_stop

so it wipes them out while clearing bss during early init.

Previously:
4018ebd0 T __sdata
4018f000 R ___start_rodata

now:
4018ed98 T __sdata
40190000 R ___start_rodata

So I'm thinking this may have been triggered by c74ba8b3480d ("arch:
Introduce post-init read-only memory").

The hack below does indeed reduce it to a single load section and this
version of QEMU then succeeds:

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  LOAD           0x004000 0x40000000 0x40000000 0x34d304 0x378230 RWE 0x4000
  NOTE           0x1eec88 0x401eac88 0x401eac88 0x00000 0x00000 R   0x4


Kees: Is it expected to get multiple load headers like this since your
patch c74ba8b3480d ("arch: Introduce post-init read-only memory"),
depending on alignment of the read only section?

Cheers
James

Comments

Kees Cook Sept. 17, 2016, 2:58 a.m. UTC | #1
On Fri, Sep 16, 2016 at 4:32 PM, James Hogan <james.hogan@imgtec.com> wrote:
> On Fri, Sep 16, 2016 at 02:37:18PM -0700, Guenter Roeck wrote:
>> On Fri, Sep 16, 2016 at 10:27:20PM +0100, James Hogan wrote:
>> > On Fri, Sep 16, 2016 at 01:38:19PM -0700, Guenter Roeck wrote:
>> > > Hi,
>> > >
>> > > I see the following runtime error in -next when running a metag qemu emulation.
>> > >
>> > > [ ... ]
>> > > workingset: timestamp_bits=30 max_order=16 bucket_order=0
>> > > io scheduler noop registered (default)
>> > > brd: module loaded
>> > > Warning: unable to open an initial console.
>> > > List of all partitions:
>> > > 0100           16384 ram0  (driver?)
>> > > No filesystem could mount root, tried:
>> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
>> > >
>> > > An example for a complete log is at:
>> > > http://kerneltests.org/builders/qemu-metag-next/builds/489/steps/qemubuildcommand/logs/stdio
>> > >
>> > > bisect points to commit ef98de028afde ("kthread: allow to cancel kthread work").
>> > > I don't know (yet) if other architectures are affected. bisect log is attached.
>> > >
>> > > The scripts to run this test are available at
>> > > https://github.com/groeck/linux-build-test/tree/master/rootfs/metag.
>> > >
>> > > Guenter
>> >
>> > Thanks Guenter,
>> >
>> > It appears to be related to the command line. After that commit the
>> > command line is shown as empty (rather than your "rdinit=/sbin/init
>> > doreboot"), but it can still be overridden in the config and then it
>> > continues to work.
>> >
>> Weird.
>
> Previously the Elf had a single load program header:
>
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x004000 0x40000000 0x40000000 0x34b304 0x376230 RWE 0x4000
>   NOTE           0x1ecc08 0x401e8c08 0x401e8c08 0x00000 0x00000 R   0x4
>
> QEMU puts the args at 40376230, straight after the load region (unlike a
> real Meta Linux bootloader).
>
> After the above commit the ELF gets two load program headers, with a
> small alignment gap in the middle:
>
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x004000 0x40000000 0x40000000 0x18f118 0x18f118 R E 0x4000
>   LOAD           0x194000 0x40190000 0x40190000 0x1bd304 0x1e8230 RWE 0x4000
>   NOTE           0x1eec08 0x401eac08 0x401eac08 0x00000 0x00000 R   0x4
>
> Here this version of QEMU puts the args at where it thinks the end of
> the loaded image is, which is based on the number of bytes copied from
> the ELF, i.e. the total MemSiz's, not taking into account the alignment
> gap in between, so it puts them at 0x40377348.
>
> But of course:
> 40378230 B ___bss_stop
>
> so it wipes them out while clearing bss during early init.
>
> Previously:
> 4018ebd0 T __sdata
> 4018f000 R ___start_rodata
>
> now:
> 4018ed98 T __sdata
> 40190000 R ___start_rodata
>
> So I'm thinking this may have been triggered by c74ba8b3480d ("arch:
> Introduce post-init read-only memory").
>
> The hack below does indeed reduce it to a single load section and this
> version of QEMU then succeeds:
>
> Program Headers:
>   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>   LOAD           0x004000 0x40000000 0x40000000 0x34d304 0x378230 RWE 0x4000
>   NOTE           0x1eec88 0x401eac88 0x401eac88 0x00000 0x00000 R   0x4
>
> diff --git a/arch/metag/include/asm/cache.h b/arch/metag/include/asm/cache.h
> index a43b650cfdc0..b5c7364a94da 100644
> --- a/arch/metag/include/asm/cache.h
> +++ b/arch/metag/include/asm/cache.h
> @@ -20,4 +20,6 @@
>
>  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>
> +#define __ro_after_init        __read_mostly
> +
>  #endif
>
> Kees: Is it expected to get multiple load headers like this since your
> patch c74ba8b3480d ("arch: Introduce post-init read-only memory"),
> depending on alignment of the read only section?

I'm not expecting that, and I'm especially not expecting any LOAD to
have "RWE" flags. It looks like metag's vmlinux.lds.S is using the
common RO_DATA_SECTION, so that doesn't jump out at me as a reason for
RO_AFTER_INIT_DATA to end up in the wrong place. Also the second LOAD
in the ELF is really huge, so it's not just the RO_AFTER_INIT_DATA
section (which is currently very small).

Could the metag linker be failing to override the section flags when
putting RO_AFTER_INIT_DATA into RO_DATA_SECTION and this cuts the LOAD
in half? This smells like a linker glitch.

However, since metag doesn't support CONFIG_DEBUG_RODATA, your above
patch is likely correct. If metag marks memory read-only at kernel
load time, it's doing it early enough that __ro_after_init will fail
to work. If it never marks memory read-only, then __ro_after_init will
have no effect. (i.e. Both situations need the proposed patch).

Probably the best solution for all architectures is to invent a new
CONFIG_ARCH_HAS... to identify the style of kernel memory protection a
given architecture has so that the default for __ro_after_init can be
more sensible (and the warnings about mark_rodata_ro() can better
reflect reality).

-Kees
James Hogan Sept. 19, 2016, 1:59 p.m. UTC | #2
On Fri, Sep 16, 2016 at 07:58:12PM -0700, Kees Cook wrote:
> On Fri, Sep 16, 2016 at 4:32 PM, James Hogan <james.hogan@imgtec.com> wrote:
> > On Fri, Sep 16, 2016 at 02:37:18PM -0700, Guenter Roeck wrote:
> >> On Fri, Sep 16, 2016 at 10:27:20PM +0100, James Hogan wrote:
> >> > On Fri, Sep 16, 2016 at 01:38:19PM -0700, Guenter Roeck wrote:
> >> > > Hi,
> >> > >
> >> > > I see the following runtime error in -next when running a metag qemu emulation.
> >> > >
> >> > > [ ... ]
> >> > > workingset: timestamp_bits=30 max_order=16 bucket_order=0
> >> > > io scheduler noop registered (default)
> >> > > brd: module loaded
> >> > > Warning: unable to open an initial console.
> >> > > List of all partitions:
> >> > > 0100           16384 ram0  (driver?)
> >> > > No filesystem could mount root, tried:
> >> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
> >> > >
> >> > > An example for a complete log is at:
> >> > > http://kerneltests.org/builders/qemu-metag-next/builds/489/steps/qemubuildcommand/logs/stdio
> >> > >
> >> > > bisect points to commit ef98de028afde ("kthread: allow to cancel kthread work").
> >> > > I don't know (yet) if other architectures are affected. bisect log is attached.
> >> > >
> >> > > The scripts to run this test are available at
> >> > > https://github.com/groeck/linux-build-test/tree/master/rootfs/metag.
> >> > >
> >> > > Guenter
> >> >
> >> > Thanks Guenter,
> >> >
> >> > It appears to be related to the command line. After that commit the
> >> > command line is shown as empty (rather than your "rdinit=/sbin/init
> >> > doreboot"), but it can still be overridden in the config and then it
> >> > continues to work.
> >> >
> >> Weird.
> >
> > Previously the Elf had a single load program header:
> >
> > Program Headers:
> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >   LOAD           0x004000 0x40000000 0x40000000 0x34b304 0x376230 RWE 0x4000
> >   NOTE           0x1ecc08 0x401e8c08 0x401e8c08 0x00000 0x00000 R   0x4
> >
> > QEMU puts the args at 40376230, straight after the load region (unlike a
> > real Meta Linux bootloader).
> >
> > After the above commit the ELF gets two load program headers, with a
> > small alignment gap in the middle:
> >
> > Program Headers:
> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >   LOAD           0x004000 0x40000000 0x40000000 0x18f118 0x18f118 R E 0x4000
> >   LOAD           0x194000 0x40190000 0x40190000 0x1bd304 0x1e8230 RWE 0x4000
> >   NOTE           0x1eec08 0x401eac08 0x401eac08 0x00000 0x00000 R   0x4
> >
> > Here this version of QEMU puts the args at where it thinks the end of
> > the loaded image is, which is based on the number of bytes copied from
> > the ELF, i.e. the total MemSiz's, not taking into account the alignment
> > gap in between, so it puts them at 0x40377348.
> >
> > But of course:
> > 40378230 B ___bss_stop
> >
> > so it wipes them out while clearing bss during early init.
> >
> > Previously:
> > 4018ebd0 T __sdata
> > 4018f000 R ___start_rodata
> >
> > now:
> > 4018ed98 T __sdata
> > 40190000 R ___start_rodata
> >
> > So I'm thinking this may have been triggered by c74ba8b3480d ("arch:
> > Introduce post-init read-only memory").
> >
> > The hack below does indeed reduce it to a single load section and this
> > version of QEMU then succeeds:
> >
> > Program Headers:
> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >   LOAD           0x004000 0x40000000 0x40000000 0x34d304 0x378230 RWE 0x4000
> >   NOTE           0x1eec88 0x401eac88 0x401eac88 0x00000 0x00000 R   0x4
> >
> > diff --git a/arch/metag/include/asm/cache.h b/arch/metag/include/asm/cache.h
> > index a43b650cfdc0..b5c7364a94da 100644
> > --- a/arch/metag/include/asm/cache.h
> > +++ b/arch/metag/include/asm/cache.h
> > @@ -20,4 +20,6 @@
> >
> >  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
> >
> > +#define __ro_after_init        __read_mostly
> > +
> >  #endif
> >
> > Kees: Is it expected to get multiple load headers like this since your
> > patch c74ba8b3480d ("arch: Introduce post-init read-only memory"),
> > depending on alignment of the read only section?
> 
> I'm not expecting that, and I'm especially not expecting any LOAD to
> have "RWE" flags. It looks like metag's vmlinux.lds.S is using the
> common RO_DATA_SECTION, so that doesn't jump out at me as a reason for
> RO_AFTER_INIT_DATA to end up in the wrong place. Also the second LOAD
> in the ELF is really huge, so it's not just the RO_AFTER_INIT_DATA
> section (which is currently very small).
> 
> Could the metag linker be failing to override the section flags when
> putting RO_AFTER_INIT_DATA into RO_DATA_SECTION and this cuts the LOAD
> in half? This smells like a linker glitch.

Where is that section flags override supposed to be happening?

The .rodata section is indeed becoming slightly larger to fit the
ptmx_fops __ro_after_init, and in the process changing from flag A to
flag WA (presumably means becoming writable).

This then appears to be hitting the condition in the linker whereby a
new segment is started if a writable section is found after a readonly
section, and the sections are on separate pages:

https://sourceware.org/git/?p=binutils.git;a=blob;f=bfd/elf.c;h=8df38ee37923c020994715f111a0ffc6bae83c8c;hb=HEAD#l3952

> 
> However, since metag doesn't support CONFIG_DEBUG_RODATA, your above
> patch is likely correct. If metag marks memory read-only at kernel
> load time, it's doing it early enough that __ro_after_init will fail
> to work. If it never marks memory read-only, then __ro_after_init will
> have no effect.

Yes, metag doesn't make that memory read only yet, although it would be
possible to do so (maybe delaying the switch to 4MB pages until
afterwards).

Either way it doesn't sound that unreasonable to have multiple load
segments generated.

> (i.e. Both situations need the proposed patch).

purely to avoid the curious linker segment behaviour (which only trips
up that QEMU because its broken), or for other reasons too?

Thanks
James

> 
> Probably the best solution for all architectures is to invent a new
> CONFIG_ARCH_HAS... to identify the style of kernel memory protection a
> given architecture has so that the default for __ro_after_init can be
> more sensible (and the warnings about mark_rodata_ro() can better
> reflect reality).
> 
> -Kees
> 
> -- 
> Kees Cook
> Nexus Security
James Hogan Sept. 19, 2016, 2:55 p.m. UTC | #3
On Sat, Sep 17, 2016 at 12:32:49AM +0100, James Hogan wrote:
> Here this version of QEMU puts the args at where it thinks the end of
> the loaded image is, which is based on the number of bytes copied from
> the ELF, i.e. the total MemSiz's, not taking into account the alignment
> gap in between, so it puts them at 0x40377348.

QEMU meta-v1.3.1 branch updated at:
https://github.com/img-meta/qemu.git

Hopefully that'll fix it for you Guenter.

Cheers
James
Guenter Roeck Sept. 19, 2016, 3:45 p.m. UTC | #4
On Mon, Sep 19, 2016 at 03:55:29PM +0100, James Hogan wrote:
> On Sat, Sep 17, 2016 at 12:32:49AM +0100, James Hogan wrote:
> > Here this version of QEMU puts the args at where it thinks the end of
> > the loaded image is, which is based on the number of bytes copied from
> > the ELF, i.e. the total MemSiz's, not taking into account the alignment
> > gap in between, so it puts them at 0x40377348.
> 
> QEMU meta-v1.3.1 branch updated at:
> https://github.com/img-meta/qemu.git
> 
> Hopefully that'll fix it for you Guenter.
> 
Confirmed fixed.

Thanks a lot!
Guenter
Kees Cook Sept. 19, 2016, 7:53 p.m. UTC | #5
On Mon, Sep 19, 2016 at 6:59 AM, James Hogan <james.hogan@imgtec.com> wrote:
> On Fri, Sep 16, 2016 at 07:58:12PM -0700, Kees Cook wrote:
>> On Fri, Sep 16, 2016 at 4:32 PM, James Hogan <james.hogan@imgtec.com> wrote:
>> > On Fri, Sep 16, 2016 at 02:37:18PM -0700, Guenter Roeck wrote:
>> >> On Fri, Sep 16, 2016 at 10:27:20PM +0100, James Hogan wrote:
>> >> > On Fri, Sep 16, 2016 at 01:38:19PM -0700, Guenter Roeck wrote:
>> >> > > Hi,
>> >> > >
>> >> > > I see the following runtime error in -next when running a metag qemu emulation.
>> >> > >
>> >> > > [ ... ]
>> >> > > workingset: timestamp_bits=30 max_order=16 bucket_order=0
>> >> > > io scheduler noop registered (default)
>> >> > > brd: module loaded
>> >> > > Warning: unable to open an initial console.
>> >> > > List of all partitions:
>> >> > > 0100           16384 ram0  (driver?)
>> >> > > No filesystem could mount root, tried:
>> >> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
>> >> > >
>> >> > > An example for a complete log is at:
>> >> > > http://kerneltests.org/builders/qemu-metag-next/builds/489/steps/qemubuildcommand/logs/stdio
>> >> > >
>> >> > > bisect points to commit ef98de028afde ("kthread: allow to cancel kthread work").
>> >> > > I don't know (yet) if other architectures are affected. bisect log is attached.
>> >> > >
>> >> > > The scripts to run this test are available at
>> >> > > https://github.com/groeck/linux-build-test/tree/master/rootfs/metag.
>> >> > >
>> >> > > Guenter
>> >> >
>> >> > Thanks Guenter,
>> >> >
>> >> > It appears to be related to the command line. After that commit the
>> >> > command line is shown as empty (rather than your "rdinit=/sbin/init
>> >> > doreboot"), but it can still be overridden in the config and then it
>> >> > continues to work.
>> >> >
>> >> Weird.
>> >
>> > Previously the Elf had a single load program header:
>> >
>> > Program Headers:
>> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>> >   LOAD           0x004000 0x40000000 0x40000000 0x34b304 0x376230 RWE 0x4000
>> >   NOTE           0x1ecc08 0x401e8c08 0x401e8c08 0x00000 0x00000 R   0x4
>> >
>> > QEMU puts the args at 40376230, straight after the load region (unlike a
>> > real Meta Linux bootloader).
>> >
>> > After the above commit the ELF gets two load program headers, with a
>> > small alignment gap in the middle:
>> >
>> > Program Headers:
>> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>> >   LOAD           0x004000 0x40000000 0x40000000 0x18f118 0x18f118 R E 0x4000
>> >   LOAD           0x194000 0x40190000 0x40190000 0x1bd304 0x1e8230 RWE 0x4000
>> >   NOTE           0x1eec08 0x401eac08 0x401eac08 0x00000 0x00000 R   0x4
>> >
>> > Here this version of QEMU puts the args at where it thinks the end of
>> > the loaded image is, which is based on the number of bytes copied from
>> > the ELF, i.e. the total MemSiz's, not taking into account the alignment
>> > gap in between, so it puts them at 0x40377348.
>> >
>> > But of course:
>> > 40378230 B ___bss_stop
>> >
>> > so it wipes them out while clearing bss during early init.
>> >
>> > Previously:
>> > 4018ebd0 T __sdata
>> > 4018f000 R ___start_rodata
>> >
>> > now:
>> > 4018ed98 T __sdata
>> > 40190000 R ___start_rodata
>> >
>> > So I'm thinking this may have been triggered by c74ba8b3480d ("arch:
>> > Introduce post-init read-only memory").
>> >
>> > The hack below does indeed reduce it to a single load section and this
>> > version of QEMU then succeeds:
>> >
>> > Program Headers:
>> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>> >   LOAD           0x004000 0x40000000 0x40000000 0x34d304 0x378230 RWE 0x4000
>> >   NOTE           0x1eec88 0x401eac88 0x401eac88 0x00000 0x00000 R   0x4
>> >
>> > diff --git a/arch/metag/include/asm/cache.h b/arch/metag/include/asm/cache.h
>> > index a43b650cfdc0..b5c7364a94da 100644
>> > --- a/arch/metag/include/asm/cache.h
>> > +++ b/arch/metag/include/asm/cache.h
>> > @@ -20,4 +20,6 @@
>> >
>> >  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>> >
>> > +#define __ro_after_init        __read_mostly
>> > +
>> >  #endif
>> >
>> > Kees: Is it expected to get multiple load headers like this since your
>> > patch c74ba8b3480d ("arch: Introduce post-init read-only memory"),
>> > depending on alignment of the read only section?
>>
>> I'm not expecting that, and I'm especially not expecting any LOAD to
>> have "RWE" flags. It looks like metag's vmlinux.lds.S is using the
>> common RO_DATA_SECTION, so that doesn't jump out at me as a reason for
>> RO_AFTER_INIT_DATA to end up in the wrong place. Also the second LOAD
>> in the ELF is really huge, so it's not just the RO_AFTER_INIT_DATA
>> section (which is currently very small).
>>
>> Could the metag linker be failing to override the section flags when
>> putting RO_AFTER_INIT_DATA into RO_DATA_SECTION and this cuts the LOAD
>> in half? This smells like a linker glitch.
>
> Where is that section flags override supposed to be happening?
>
> The .rodata section is indeed becoming slightly larger to fit the
> ptmx_fops __ro_after_init, and in the process changing from flag A to
> flag WA (presumably means becoming writable).
>
> This then appears to be hitting the condition in the linker whereby a
> new segment is started if a writable section is found after a readonly
> section, and the sections are on separate pages:
>
> https://sourceware.org/git/?p=binutils.git;a=blob;f=bfd/elf.c;h=8df38ee37923c020994715f111a0ffc6bae83c8c;hb=HEAD#l3952

Hm, well, I think x86, arm, and arm64 don't do this. They just clobber
the flags from the section and force it to match the master section
(i.e. ro_after_init gets rodata's flags).

>> However, since metag doesn't support CONFIG_DEBUG_RODATA, your above
>> patch is likely correct. If metag marks memory read-only at kernel
>> load time, it's doing it early enough that __ro_after_init will fail
>> to work. If it never marks memory read-only, then __ro_after_init will
>> have no effect.
>
> Yes, metag doesn't make that memory read only yet, although it would be
> possible to do so (maybe delaying the switch to 4MB pages until
> afterwards).
>
> Either way it doesn't sound that unreasonable to have multiple load
> segments generated.

The bug here is that the ro_after_init portion of .rodata shouldn't be
writable according to the ELF headers. It should be writable only
because memory protection of .rodata hasn't happened yet. :)

>> (i.e. Both situations need the proposed patch).
>
> purely to avoid the curious linker segment behaviour (which only trips
> up that QEMU because its broken), or for other reasons too?

Well, it means the linker behavior will go away since there will be no
mixing of sections with different flags, and it'll match current
expectations: __ro_after_init won't be read-only ever (same as
.rodata).

Fixing memory permissions for metag would be nice, though! :)

-Kees

>
> Thanks
> James
>
>>
>> Probably the best solution for all architectures is to invent a new
>> CONFIG_ARCH_HAS... to identify the style of kernel memory protection a
>> given architecture has so that the default for __ro_after_init can be
>> more sensible (and the warnings about mark_rodata_ro() can better
>> reflect reality).
>>
>> -Kees
>>
>> --
>> Kees Cook
>> Nexus Security
James Hogan Sept. 19, 2016, 9:37 p.m. UTC | #6
On Mon, Sep 19, 2016 at 12:53:49PM -0700, Kees Cook wrote:
> On Mon, Sep 19, 2016 at 6:59 AM, James Hogan <james.hogan@imgtec.com> wrote:
> > On Fri, Sep 16, 2016 at 07:58:12PM -0700, Kees Cook wrote:
> >> On Fri, Sep 16, 2016 at 4:32 PM, James Hogan <james.hogan@imgtec.com> wrote:
> >> > On Fri, Sep 16, 2016 at 02:37:18PM -0700, Guenter Roeck wrote:
> >> >> On Fri, Sep 16, 2016 at 10:27:20PM +0100, James Hogan wrote:
> >> >> > On Fri, Sep 16, 2016 at 01:38:19PM -0700, Guenter Roeck wrote:
> >> >> > > Hi,
> >> >> > >
> >> >> > > I see the following runtime error in -next when running a metag qemu emulation.
> >> >> > >
> >> >> > > [ ... ]
> >> >> > > workingset: timestamp_bits=30 max_order=16 bucket_order=0
> >> >> > > io scheduler noop registered (default)
> >> >> > > brd: module loaded
> >> >> > > Warning: unable to open an initial console.
> >> >> > > List of all partitions:
> >> >> > > 0100           16384 ram0  (driver?)
> >> >> > > No filesystem could mount root, tried:
> >> >> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
> >> >> > >
> >> >> > > An example for a complete log is at:
> >> >> > > http://kerneltests.org/builders/qemu-metag-next/builds/489/steps/qemubuildcommand/logs/stdio
> >> >> > >
> >> >> > > bisect points to commit ef98de028afde ("kthread: allow to cancel kthread work").
> >> >> > > I don't know (yet) if other architectures are affected. bisect log is attached.
> >> >> > >
> >> >> > > The scripts to run this test are available at
> >> >> > > https://github.com/groeck/linux-build-test/tree/master/rootfs/metag.
> >> >> > >
> >> >> > > Guenter
> >> >> >
> >> >> > Thanks Guenter,
> >> >> >
> >> >> > It appears to be related to the command line. After that commit the
> >> >> > command line is shown as empty (rather than your "rdinit=/sbin/init
> >> >> > doreboot"), but it can still be overridden in the config and then it
> >> >> > continues to work.
> >> >> >
> >> >> Weird.
> >> >
> >> > Previously the Elf had a single load program header:
> >> >
> >> > Program Headers:
> >> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >> >   LOAD           0x004000 0x40000000 0x40000000 0x34b304 0x376230 RWE 0x4000
> >> >   NOTE           0x1ecc08 0x401e8c08 0x401e8c08 0x00000 0x00000 R   0x4
> >> >
> >> > QEMU puts the args at 40376230, straight after the load region (unlike a
> >> > real Meta Linux bootloader).
> >> >
> >> > After the above commit the ELF gets two load program headers, with a
> >> > small alignment gap in the middle:
> >> >
> >> > Program Headers:
> >> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >> >   LOAD           0x004000 0x40000000 0x40000000 0x18f118 0x18f118 R E 0x4000
> >> >   LOAD           0x194000 0x40190000 0x40190000 0x1bd304 0x1e8230 RWE 0x4000
> >> >   NOTE           0x1eec08 0x401eac08 0x401eac08 0x00000 0x00000 R   0x4
> >> >
> >> > Here this version of QEMU puts the args at where it thinks the end of
> >> > the loaded image is, which is based on the number of bytes copied from
> >> > the ELF, i.e. the total MemSiz's, not taking into account the alignment
> >> > gap in between, so it puts them at 0x40377348.
> >> >
> >> > But of course:
> >> > 40378230 B ___bss_stop
> >> >
> >> > so it wipes them out while clearing bss during early init.
> >> >
> >> > Previously:
> >> > 4018ebd0 T __sdata
> >> > 4018f000 R ___start_rodata
> >> >
> >> > now:
> >> > 4018ed98 T __sdata
> >> > 40190000 R ___start_rodata
> >> >
> >> > So I'm thinking this may have been triggered by c74ba8b3480d ("arch:
> >> > Introduce post-init read-only memory").
> >> >
> >> > The hack below does indeed reduce it to a single load section and this
> >> > version of QEMU then succeeds:
> >> >
> >> > Program Headers:
> >> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
> >> >   LOAD           0x004000 0x40000000 0x40000000 0x34d304 0x378230 RWE 0x4000
> >> >   NOTE           0x1eec88 0x401eac88 0x401eac88 0x00000 0x00000 R   0x4
> >> >
> >> > diff --git a/arch/metag/include/asm/cache.h b/arch/metag/include/asm/cache.h
> >> > index a43b650cfdc0..b5c7364a94da 100644
> >> > --- a/arch/metag/include/asm/cache.h
> >> > +++ b/arch/metag/include/asm/cache.h
> >> > @@ -20,4 +20,6 @@
> >> >
> >> >  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
> >> >
> >> > +#define __ro_after_init        __read_mostly
> >> > +
> >> >  #endif
> >> >
> >> > Kees: Is it expected to get multiple load headers like this since your
> >> > patch c74ba8b3480d ("arch: Introduce post-init read-only memory"),
> >> > depending on alignment of the read only section?
> >>
> >> I'm not expecting that, and I'm especially not expecting any LOAD to
> >> have "RWE" flags. It looks like metag's vmlinux.lds.S is using the
> >> common RO_DATA_SECTION, so that doesn't jump out at me as a reason for
> >> RO_AFTER_INIT_DATA to end up in the wrong place. Also the second LOAD
> >> in the ELF is really huge, so it's not just the RO_AFTER_INIT_DATA
> >> section (which is currently very small).
> >>
> >> Could the metag linker be failing to override the section flags when
> >> putting RO_AFTER_INIT_DATA into RO_DATA_SECTION and this cuts the LOAD
> >> in half? This smells like a linker glitch.
> >
> > Where is that section flags override supposed to be happening?
> >
> > The .rodata section is indeed becoming slightly larger to fit the
> > ptmx_fops __ro_after_init, and in the process changing from flag A to
> > flag WA (presumably means becoming writable).
> >
> > This then appears to be hitting the condition in the linker whereby a
> > new segment is started if a writable section is found after a readonly
> > section, and the sections are on separate pages:
> >
> > https://sourceware.org/git/?p=binutils.git;a=blob;f=bfd/elf.c;h=8df38ee37923c020994715f111a0ffc6bae83c8c;hb=HEAD#l3952
> 
> Hm, well, I think x86, arm, and arm64 don't do this. They just clobber
> the flags from the section and force it to match the master section
> (i.e. ro_after_init gets rodata's flags).

Okay. Please forgive my ignorance, I'm just trying to understand the
mechanism, but is that thought to happen automatically due to
*(.data..ro_after_init) being in a section after *(.rodata) in the
linker script?

The definition in question is:
static struct file_operations ptmx_fops __ro_after_init;

Which isn't const or anything, so I'm not sure how else the linker is
supposed to know to make a section read-only that contains non-read-only
data.


Okay, I just built x86_64 default defconfig (on ef98de028afd, half way
through the mm patches on linux-next from the other day where metag
stopped booting). Perhaps I'm missing some important config option to
enable the memory protection (if so I appologise).

For metag:

$ readelf -S drivers/tty/pty.o
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [51] .data..ro_after_i PROGBITS        00000000 00f0c0 00007c 00  WA  0   0  4

$ readelf -S vmlinux.bust:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 4] .rodata           PROGBITS        40190000 194000 04c9c8 00  WA  0   0 64

And x86_64:

$ readelf -S drivers/tty/pty.o
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [18] .data..ro_after_i PROGBITS         0000000000000000  00001140
       00000000000000f8  0000000000000000  WA       0     0     64

$ readelf -S vmlinux
  [Nr] Name              Type             Address           Offset
       Size              EntSize          Flags  Link  Info  Align
  [ 4] .rodata           PROGBITS         ffffffff81a00000  00c00000
       00000000002663d0  0000000000000000  WA       0     0     4096

Both have WA on that section in the object file and the final vmlinux
ELF too.

Thanks
James


> 
> >> However, since metag doesn't support CONFIG_DEBUG_RODATA, your above
> >> patch is likely correct. If metag marks memory read-only at kernel
> >> load time, it's doing it early enough that __ro_after_init will fail
> >> to work. If it never marks memory read-only, then __ro_after_init will
> >> have no effect.
> >
> > Yes, metag doesn't make that memory read only yet, although it would be
> > possible to do so (maybe delaying the switch to 4MB pages until
> > afterwards).
> >
> > Either way it doesn't sound that unreasonable to have multiple load
> > segments generated.
> 
> The bug here is that the ro_after_init portion of .rodata shouldn't be
> writable according to the ELF headers. It should be writable only
> because memory protection of .rodata hasn't happened yet. :)
> 
> >> (i.e. Both situations need the proposed patch).
> >
> > purely to avoid the curious linker segment behaviour (which only trips
> > up that QEMU because its broken), or for other reasons too?
> 
> Well, it means the linker behavior will go away since there will be no
> mixing of sections with different flags, and it'll match current
> expectations: __ro_after_init won't be read-only ever (same as
> .rodata).
> 
> Fixing memory permissions for metag would be nice, though! :)
> 
> -Kees
> 
> >
> > Thanks
> > James
> >
> >>
> >> Probably the best solution for all architectures is to invent a new
> >> CONFIG_ARCH_HAS... to identify the style of kernel memory protection a
> >> given architecture has so that the default for __ro_after_init can be
> >> more sensible (and the warnings about mark_rodata_ro() can better
> >> reflect reality).
> >>
> >> -Kees
> >>
> >> --
> >> Kees Cook
> >> Nexus Security
> 
> 
> 
> -- 
> Kees Cook
> Nexus Security
Kees Cook Sept. 19, 2016, 9:51 p.m. UTC | #7
On Mon, Sep 19, 2016 at 2:37 PM, James Hogan <james.hogan@imgtec.com> wrote:
> On Mon, Sep 19, 2016 at 12:53:49PM -0700, Kees Cook wrote:
>> On Mon, Sep 19, 2016 at 6:59 AM, James Hogan <james.hogan@imgtec.com> wrote:
>> > On Fri, Sep 16, 2016 at 07:58:12PM -0700, Kees Cook wrote:
>> >> On Fri, Sep 16, 2016 at 4:32 PM, James Hogan <james.hogan@imgtec.com> wrote:
>> >> > On Fri, Sep 16, 2016 at 02:37:18PM -0700, Guenter Roeck wrote:
>> >> >> On Fri, Sep 16, 2016 at 10:27:20PM +0100, James Hogan wrote:
>> >> >> > On Fri, Sep 16, 2016 at 01:38:19PM -0700, Guenter Roeck wrote:
>> >> >> > > Hi,
>> >> >> > >
>> >> >> > > I see the following runtime error in -next when running a metag qemu emulation.
>> >> >> > >
>> >> >> > > [ ... ]
>> >> >> > > workingset: timestamp_bits=30 max_order=16 bucket_order=0
>> >> >> > > io scheduler noop registered (default)
>> >> >> > > brd: module loaded
>> >> >> > > Warning: unable to open an initial console.
>> >> >> > > List of all partitions:
>> >> >> > > 0100           16384 ram0  (driver?)
>> >> >> > > No filesystem could mount root, tried:
>> >> >> > > Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(1,0)
>> >> >> > >
>> >> >> > > An example for a complete log is at:
>> >> >> > > http://kerneltests.org/builders/qemu-metag-next/builds/489/steps/qemubuildcommand/logs/stdio
>> >> >> > >
>> >> >> > > bisect points to commit ef98de028afde ("kthread: allow to cancel kthread work").
>> >> >> > > I don't know (yet) if other architectures are affected. bisect log is attached.
>> >> >> > >
>> >> >> > > The scripts to run this test are available at
>> >> >> > > https://github.com/groeck/linux-build-test/tree/master/rootfs/metag.
>> >> >> > >
>> >> >> > > Guenter
>> >> >> >
>> >> >> > Thanks Guenter,
>> >> >> >
>> >> >> > It appears to be related to the command line. After that commit the
>> >> >> > command line is shown as empty (rather than your "rdinit=/sbin/init
>> >> >> > doreboot"), but it can still be overridden in the config and then it
>> >> >> > continues to work.
>> >> >> >
>> >> >> Weird.
>> >> >
>> >> > Previously the Elf had a single load program header:
>> >> >
>> >> > Program Headers:
>> >> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>> >> >   LOAD           0x004000 0x40000000 0x40000000 0x34b304 0x376230 RWE 0x4000
>> >> >   NOTE           0x1ecc08 0x401e8c08 0x401e8c08 0x00000 0x00000 R   0x4
>> >> >
>> >> > QEMU puts the args at 40376230, straight after the load region (unlike a
>> >> > real Meta Linux bootloader).
>> >> >
>> >> > After the above commit the ELF gets two load program headers, with a
>> >> > small alignment gap in the middle:
>> >> >
>> >> > Program Headers:
>> >> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>> >> >   LOAD           0x004000 0x40000000 0x40000000 0x18f118 0x18f118 R E 0x4000
>> >> >   LOAD           0x194000 0x40190000 0x40190000 0x1bd304 0x1e8230 RWE 0x4000
>> >> >   NOTE           0x1eec08 0x401eac08 0x401eac08 0x00000 0x00000 R   0x4
>> >> >
>> >> > Here this version of QEMU puts the args at where it thinks the end of
>> >> > the loaded image is, which is based on the number of bytes copied from
>> >> > the ELF, i.e. the total MemSiz's, not taking into account the alignment
>> >> > gap in between, so it puts them at 0x40377348.
>> >> >
>> >> > But of course:
>> >> > 40378230 B ___bss_stop
>> >> >
>> >> > so it wipes them out while clearing bss during early init.
>> >> >
>> >> > Previously:
>> >> > 4018ebd0 T __sdata
>> >> > 4018f000 R ___start_rodata
>> >> >
>> >> > now:
>> >> > 4018ed98 T __sdata
>> >> > 40190000 R ___start_rodata
>> >> >
>> >> > So I'm thinking this may have been triggered by c74ba8b3480d ("arch:
>> >> > Introduce post-init read-only memory").
>> >> >
>> >> > The hack below does indeed reduce it to a single load section and this
>> >> > version of QEMU then succeeds:
>> >> >
>> >> > Program Headers:
>> >> >   Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
>> >> >   LOAD           0x004000 0x40000000 0x40000000 0x34d304 0x378230 RWE 0x4000
>> >> >   NOTE           0x1eec88 0x401eac88 0x401eac88 0x00000 0x00000 R   0x4
>> >> >
>> >> > diff --git a/arch/metag/include/asm/cache.h b/arch/metag/include/asm/cache.h
>> >> > index a43b650cfdc0..b5c7364a94da 100644
>> >> > --- a/arch/metag/include/asm/cache.h
>> >> > +++ b/arch/metag/include/asm/cache.h
>> >> > @@ -20,4 +20,6 @@
>> >> >
>> >> >  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>> >> >
>> >> > +#define __ro_after_init        __read_mostly
>> >> > +
>> >> >  #endif
>> >> >
>> >> > Kees: Is it expected to get multiple load headers like this since your
>> >> > patch c74ba8b3480d ("arch: Introduce post-init read-only memory"),
>> >> > depending on alignment of the read only section?
>> >>
>> >> I'm not expecting that, and I'm especially not expecting any LOAD to
>> >> have "RWE" flags. It looks like metag's vmlinux.lds.S is using the
>> >> common RO_DATA_SECTION, so that doesn't jump out at me as a reason for
>> >> RO_AFTER_INIT_DATA to end up in the wrong place. Also the second LOAD
>> >> in the ELF is really huge, so it's not just the RO_AFTER_INIT_DATA
>> >> section (which is currently very small).
>> >>
>> >> Could the metag linker be failing to override the section flags when
>> >> putting RO_AFTER_INIT_DATA into RO_DATA_SECTION and this cuts the LOAD
>> >> in half? This smells like a linker glitch.
>> >
>> > Where is that section flags override supposed to be happening?
>> >
>> > The .rodata section is indeed becoming slightly larger to fit the
>> > ptmx_fops __ro_after_init, and in the process changing from flag A to
>> > flag WA (presumably means becoming writable).
>> >
>> > This then appears to be hitting the condition in the linker whereby a
>> > new segment is started if a writable section is found after a readonly
>> > section, and the sections are on separate pages:
>> >
>> > https://sourceware.org/git/?p=binutils.git;a=blob;f=bfd/elf.c;h=8df38ee37923c020994715f111a0ffc6bae83c8c;hb=HEAD#l3952
>>
>> Hm, well, I think x86, arm, and arm64 don't do this. They just clobber
>> the flags from the section and force it to match the master section
>> (i.e. ro_after_init gets rodata's flags).
>
> Okay. Please forgive my ignorance, I'm just trying to understand the
> mechanism, but is that thought to happen automatically due to
> *(.data..ro_after_init) being in a section after *(.rodata) in the
> linker script?
>
> The definition in question is:
> static struct file_operations ptmx_fops __ro_after_init;
>
> Which isn't const or anything, so I'm not sure how else the linker is
> supposed to know to make a section read-only that contains non-read-only
> data.
>
>
> Okay, I just built x86_64 default defconfig (on ef98de028afd, half way
> through the mm patches on linux-next from the other day where metag
> stopped booting). Perhaps I'm missing some important config option to
> enable the memory protection (if so I appologise).
>
> For metag:
>
> $ readelf -S drivers/tty/pty.o
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [51] .data..ro_after_i PROGBITS        00000000 00f0c0 00007c 00  WA  0   0  4
>
> $ readelf -S vmlinux.bust:
>   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>   [ 4] .rodata           PROGBITS        40190000 194000 04c9c8 00  WA  0   0 64
>
> And x86_64:
>
> $ readelf -S drivers/tty/pty.o
>   [Nr] Name              Type             Address           Offset
>        Size              EntSize          Flags  Link  Info  Align
>   [18] .data..ro_after_i PROGBITS         0000000000000000  00001140
>        00000000000000f8  0000000000000000  WA       0     0     64
>
> $ readelf -S vmlinux
>   [Nr] Name              Type             Address           Offset
>        Size              EntSize          Flags  Link  Info  Align
>   [ 4] .rodata           PROGBITS         ffffffff81a00000  00c00000
>        00000000002663d0  0000000000000000  WA       0     0     4096
>
> Both have WA on that section in the object file and the final vmlinux
> ELF too.

Hm, very true, I never noticed that. Oddly, the LOAD flags don't pay
any attention on x86:

$ readelf -l vmlinux

Elf file type is EXEC (Executable file)
Entry point 0x1000000
There are 5 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
                 0x0000000000fdc000 0x0000000000fdc000  R E    200000
  LOAD           0x0000000001200000 0xffffffff82000000 0x0000000002000000
                 0x0000000000155000 0x0000000000155000  RW     200000
  LOAD           0x0000000001400000 0x0000000000000000 0x0000000002155000
                 0x0000000000019488 0x0000000000019488  RW     200000
  LOAD           0x000000000156f000 0xffffffff8216f000 0x000000000216f000
                 0x0000000000122000 0x0000000000eb4000  RWE    200000
  NOTE           0x0000000000ca0248 0xffffffff81aa0248 0x0000000001aa0248
                 0x0000000000000024 0x0000000000000024         4

 Section to Segment mapping:
  Segment Sections...
   00     .text .notes __ex_table .rodata __bug_table .pci_fixup
.builtin_fw .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings
__param __modver
   01     .data .vvar
   02     .data..percpu
   03     .init.text .altinstr_aux .init.data .x86_cpu_dev.init
.altinstructions .altinstr_replacement .iommu_table .apicdrivers
.exit.text .smp_locks .bss .brk
   04     .notes

The first load (containing .rodata) is "R E".

But, the point is: the kernel is what sets up the permissions, so the
flags are ignored anyway.

-Kees
James Hogan Sept. 19, 2016, 10:57 p.m. UTC | #8
On Mon, Sep 19, 2016 at 02:51:54PM -0700, Kees Cook wrote:
> On Mon, Sep 19, 2016 at 2:37 PM, James Hogan <james.hogan@imgtec.com> wrote:
> > Okay, I just built x86_64 default defconfig (on ef98de028afd, half way
> > through the mm patches on linux-next from the other day where metag
> > stopped booting). Perhaps I'm missing some important config option to
> > enable the memory protection (if so I appologise).
> >
> > For metag:
> >
> > $ readelf -S drivers/tty/pty.o
> >   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
> >   [51] .data..ro_after_i PROGBITS        00000000 00f0c0 00007c 00  WA  0   0  4
> >
> > $ readelf -S vmlinux.bust:
> >   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
> >   [ 4] .rodata           PROGBITS        40190000 194000 04c9c8 00  WA  0   0 64
> >
> > And x86_64:
> >
> > $ readelf -S drivers/tty/pty.o
> >   [Nr] Name              Type             Address           Offset
> >        Size              EntSize          Flags  Link  Info  Align
> >   [18] .data..ro_after_i PROGBITS         0000000000000000  00001140
> >        00000000000000f8  0000000000000000  WA       0     0     64
> >
> > $ readelf -S vmlinux
> >   [Nr] Name              Type             Address           Offset
> >        Size              EntSize          Flags  Link  Info  Align
> >   [ 4] .rodata           PROGBITS         ffffffff81a00000  00c00000
> >        00000000002663d0  0000000000000000  WA       0     0     4096
> >
> > Both have WA on that section in the object file and the final vmlinux
> > ELF too.
> 
> Hm, very true, I never noticed that. Oddly, the LOAD flags don't pay
> any attention on x86:
> 
> $ readelf -l vmlinux
> 
> Elf file type is EXEC (Executable file)
> Entry point 0x1000000
> There are 5 program headers, starting at offset 64
> 
> Program Headers:
>   Type           Offset             VirtAddr           PhysAddr
>                  FileSiz            MemSiz              Flags  Align
>   LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
>                  0x0000000000fdc000 0x0000000000fdc000  R E    200000
>   LOAD           0x0000000001200000 0xffffffff82000000 0x0000000002000000
>                  0x0000000000155000 0x0000000000155000  RW     200000
>   LOAD           0x0000000001400000 0x0000000000000000 0x0000000002155000
>                  0x0000000000019488 0x0000000000019488  RW     200000
>   LOAD           0x000000000156f000 0xffffffff8216f000 0x000000000216f000
>                  0x0000000000122000 0x0000000000eb4000  RWE    200000
>   NOTE           0x0000000000ca0248 0xffffffff81aa0248 0x0000000001aa0248
>                  0x0000000000000024 0x0000000000000024         4
> 
>  Section to Segment mapping:
>   Segment Sections...
>    00     .text .notes __ex_table .rodata __bug_table .pci_fixup
> .builtin_fw .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings
> __param __modver
>    01     .data .vvar
>    02     .data..percpu
>    03     .init.text .altinstr_aux .init.data .x86_cpu_dev.init
> .altinstructions .altinstr_replacement .iommu_table .apicdrivers
> .exit.text .smp_locks .bss .brk
>    04     .notes
> 
> The first load (containing .rodata) is "R E".

Aah, right, I think thats because the program headers are specified
explicitly in arch/x86/kernel/vmlinux.lds.S:

PHDRS {
	text PT_LOAD FLAGS(5);          /* R_E */
	data PT_LOAD FLAGS(6);          /* RW_ */
#ifdef CONFIG_X86_64
#ifdef CONFIG_SMP
	percpu PT_LOAD FLAGS(6);        /* RW_ */
#endif
	init PT_LOAD FLAGS(7);          /* RWE */
#endif
	note PT_NOTE FLAGS(0);          /* ___ */
}

The bit I was missing is that RO_DATA() is after .text, but before
.data, so counts as part of the PT_LOAD program header for text.

> 
> But, the point is: the kernel is what sets up the permissions, so the
> flags are ignored anyway.

Indeed.

Thanks for your patience working through this stuff with me :)

Cheers
James
Kees Cook Sept. 19, 2016, 11:19 p.m. UTC | #9
On Mon, Sep 19, 2016 at 3:57 PM, James Hogan <james.hogan@imgtec.com> wrote:
> On Mon, Sep 19, 2016 at 02:51:54PM -0700, Kees Cook wrote:
>> On Mon, Sep 19, 2016 at 2:37 PM, James Hogan <james.hogan@imgtec.com> wrote:
>> > Okay, I just built x86_64 default defconfig (on ef98de028afd, half way
>> > through the mm patches on linux-next from the other day where metag
>> > stopped booting). Perhaps I'm missing some important config option to
>> > enable the memory protection (if so I appologise).
>> >
>> > For metag:
>> >
>> > $ readelf -S drivers/tty/pty.o
>> >   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>> >   [51] .data..ro_after_i PROGBITS        00000000 00f0c0 00007c 00  WA  0   0  4
>> >
>> > $ readelf -S vmlinux.bust:
>> >   [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
>> >   [ 4] .rodata           PROGBITS        40190000 194000 04c9c8 00  WA  0   0 64
>> >
>> > And x86_64:
>> >
>> > $ readelf -S drivers/tty/pty.o
>> >   [Nr] Name              Type             Address           Offset
>> >        Size              EntSize          Flags  Link  Info  Align
>> >   [18] .data..ro_after_i PROGBITS         0000000000000000  00001140
>> >        00000000000000f8  0000000000000000  WA       0     0     64
>> >
>> > $ readelf -S vmlinux
>> >   [Nr] Name              Type             Address           Offset
>> >        Size              EntSize          Flags  Link  Info  Align
>> >   [ 4] .rodata           PROGBITS         ffffffff81a00000  00c00000
>> >        00000000002663d0  0000000000000000  WA       0     0     4096
>> >
>> > Both have WA on that section in the object file and the final vmlinux
>> > ELF too.
>>
>> Hm, very true, I never noticed that. Oddly, the LOAD flags don't pay
>> any attention on x86:
>>
>> $ readelf -l vmlinux
>>
>> Elf file type is EXEC (Executable file)
>> Entry point 0x1000000
>> There are 5 program headers, starting at offset 64
>>
>> Program Headers:
>>   Type           Offset             VirtAddr           PhysAddr
>>                  FileSiz            MemSiz              Flags  Align
>>   LOAD           0x0000000000200000 0xffffffff81000000 0x0000000001000000
>>                  0x0000000000fdc000 0x0000000000fdc000  R E    200000
>>   LOAD           0x0000000001200000 0xffffffff82000000 0x0000000002000000
>>                  0x0000000000155000 0x0000000000155000  RW     200000
>>   LOAD           0x0000000001400000 0x0000000000000000 0x0000000002155000
>>                  0x0000000000019488 0x0000000000019488  RW     200000
>>   LOAD           0x000000000156f000 0xffffffff8216f000 0x000000000216f000
>>                  0x0000000000122000 0x0000000000eb4000  RWE    200000
>>   NOTE           0x0000000000ca0248 0xffffffff81aa0248 0x0000000001aa0248
>>                  0x0000000000000024 0x0000000000000024         4
>>
>>  Section to Segment mapping:
>>   Segment Sections...
>>    00     .text .notes __ex_table .rodata __bug_table .pci_fixup
>> .builtin_fw .tracedata __ksymtab __ksymtab_gpl __ksymtab_strings
>> __param __modver
>>    01     .data .vvar
>>    02     .data..percpu
>>    03     .init.text .altinstr_aux .init.data .x86_cpu_dev.init
>> .altinstructions .altinstr_replacement .iommu_table .apicdrivers
>> .exit.text .smp_locks .bss .brk
>>    04     .notes
>>
>> The first load (containing .rodata) is "R E".
>
> Aah, right, I think thats because the program headers are specified
> explicitly in arch/x86/kernel/vmlinux.lds.S:
>
> PHDRS {
>         text PT_LOAD FLAGS(5);          /* R_E */
>         data PT_LOAD FLAGS(6);          /* RW_ */
> #ifdef CONFIG_X86_64
> #ifdef CONFIG_SMP
>         percpu PT_LOAD FLAGS(6);        /* RW_ */
> #endif
>         init PT_LOAD FLAGS(7);          /* RWE */
> #endif
>         note PT_NOTE FLAGS(0);          /* ___ */
> }

Ah-ha! That solves that mystery for me. :)

> The bit I was missing is that RO_DATA() is after .text, but before
> .data, so counts as part of the PT_LOAD program header for text.

Right, originally, it was so that there could be a single read-only
mapping covering both, but ultimately it doesn't matter now since they
can't share a mapping anyway: text needs to be read-only and
executable and rodata needs to be read-only and non-executable.

>> But, the point is: the kernel is what sets up the permissions, so the
>> flags are ignored anyway.
>
> Indeed.
>
> Thanks for your patience working through this stuff with me :)

No problem; I learned some stuff too. :)

-Kees
Petr Mladek Sept. 27, 2016, 10:12 a.m. UTC | #10
On Mon 2016-09-19 08:45:09, Guenter Roeck wrote:
> On Mon, Sep 19, 2016 at 03:55:29PM +0100, James Hogan wrote:
> > On Sat, Sep 17, 2016 at 12:32:49AM +0100, James Hogan wrote:
> > > Here this version of QEMU puts the args at where it thinks the end of
> > > the loaded image is, which is based on the number of bytes copied from
> > > the ELF, i.e. the total MemSiz's, not taking into account the alignment
> > > gap in between, so it puts them at 0x40377348.
> > 
> > QEMU meta-v1.3.1 branch updated at:
> > https://github.com/img-meta/qemu.git
> > 
> > Hopefully that'll fix it for you Guenter.
> > 
> Confirmed fixed.

Could you please confirm that the boot problem has been fixed
on the qemu side? I guess that it is
https://github.com/img-meta/qemu/commit/0a2402860228198ae2729048f1de05aeedb7d642

Could Andrew enable all the kthread worker API improvements in -mm
tree again?

I think that kthread worker patch has been an innocent victim.
It added some functions that were not used anywhere. I think
that it has triggered the boot problem just by chance.

Best Regards,
Petr
James Hogan Sept. 27, 2016, 10:19 a.m. UTC | #11
On 27 September 2016 11:12:36 BST, Petr Mladek <pmladek@suse.com> wrote:
>On Mon 2016-09-19 08:45:09, Guenter Roeck wrote:
>> On Mon, Sep 19, 2016 at 03:55:29PM +0100, James Hogan wrote:
>> > On Sat, Sep 17, 2016 at 12:32:49AM +0100, James Hogan wrote:
>> > > Here this version of QEMU puts the args at where it thinks the
>end of
>> > > the loaded image is, which is based on the number of bytes copied
>from
>> > > the ELF, i.e. the total MemSiz's, not taking into account the
>alignment
>> > > gap in between, so it puts them at 0x40377348.
>> > 
>> > QEMU meta-v1.3.1 branch updated at:
>> > https://github.com/img-meta/qemu.git
>> > 
>> > Hopefully that'll fix it for you Guenter.
>> > 
>> Confirmed fixed.
>
>Could you please confirm that the boot problem has been fixed
>on the qemu side? I guess that it is
>https://github.com/img-meta/qemu/commit/0a2402860228198ae2729048f1de05aeedb7d642

Yes, that patch is sufficient.

>
>Could Andrew enable all the kthread worker API improvements in -mm
>tree again?
>
>I think that kthread worker patch has been an innocent victim.
>It added some functions that were not used anywhere. I think
>that it has triggered the boot problem just by chance.

Agreed, it altered the alignment of the sections enough to trigger linker generating multiple load program headers with a gap in between which confused qemu.

Cheers
James

Patch
diff mbox

diff --git a/arch/metag/include/asm/cache.h b/arch/metag/include/asm/cache.h
index a43b650cfdc0..b5c7364a94da 100644
--- a/arch/metag/include/asm/cache.h
+++ b/arch/metag/include/asm/cache.h
@@ -20,4 +20,6 @@ 
 
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
+#define __ro_after_init        __read_mostly
+
 #endif