Message ID | 20151123155132.GC32300@e104818-lin.cambridge.arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 11/23/2015 09:51 AM, Catalin Marinas wrote: > Call trace: > [<ffffffc0000952b8>] __create_mapping.isra.5+0x360/0x530 > [<ffffffc0000954ec>] fixup_init+0x64/0x80 > [<ffffffc0000945a4>] free_initmem+0xc/0x38 > [<ffffffc0005eb9f8>] kernel_init+0x20/0xe0 > [<ffffffc000085c50>] ret_from_fork+0x10/0x40 > > What I don't get is why we have fixup_init() even when > !CONFIG_DEBUG_RODATA. It probably breaks the initial mapping just to get > a non-executable init section. However, the other sections are left > executable when this config option is disabled. The patch below fixes > the warnings above: Well the kernel permissions are a bit of a mess, and not at all "secure" in their current state. But I guess the thought must have been that turning off execute on the init sections was a good way to find functions incorrectly marked _init(). Which is different from RO. Frankly, I expect someone will push to cleanup the kernel permissions at some point (I've got it on my "spare time todo, list"), but this will of course make the create_mapping_late a lot more popular as I see it being called during module load/unload. Anyway, I've been saying the problem is create_mapping_late() all along, as you notice there isn't any TLB flushes in fixup_init() and that is the core of the problem, not all this other discussion. > Jeremy, can you give this fixup_init() patch a try, see whether it makes > any difference (it's just a temporary hack which may prevent us from > reverting the PTE_CONT patches until we have a better solution). I will try it, but i'm pretty sure that fixes it too.
On 11/23/15 8:02 AM, Jeremy Linton wrote: > On 11/23/2015 09:51 AM, Catalin Marinas wrote: >> Call trace: >> [<ffffffc0000952b8>] __create_mapping.isra.5+0x360/0x530 >> [<ffffffc0000954ec>] fixup_init+0x64/0x80 >> [<ffffffc0000945a4>] free_initmem+0xc/0x38 >> [<ffffffc0005eb9f8>] kernel_init+0x20/0xe0 >> [<ffffffc000085c50>] ret_from_fork+0x10/0x40 >> >> What I don't get is why we have fixup_init() even when >> !CONFIG_DEBUG_RODATA. It probably breaks the initial mapping just to get >> a non-executable init section. However, the other sections are left >> executable when this config option is disabled. The patch below fixes >> the warnings above: > > Well the kernel permissions are a bit of a mess, and not at all > "secure" in their current state. But I guess the thought must have been > that turning off execute on the init sections was a good way to find > functions incorrectly marked _init(). Which is different from RO. > Frankly, I expect someone will push to cleanup the kernel permissions at > some point (I've got it on my "spare time todo, list"), but this will of > course make the create_mapping_late a lot more popular as I see it being > called during module load/unload. > Anyway, I've been saying the problem is create_mapping_late() all > along, as you notice there isn't any TLB flushes in fixup_init() and > that is the core of the problem, not all this other discussion. > fixup_init was deliberately designed to change the sections even if DEBUG_RODATA was not enabled. This was partially designed to match the behavior of arm(32) which also drops the nx bit but also good practice in general for security. Which permissions still need to be cleaned up from your perspective? Thanks, Laura
On 11/23/2015 10:37 AM, Laura Abbott wrote:
> Which permissions still need to be cleaned up from your perspective?
IMHO, the vast majority of the linear map should not be
executable/read/write, which is what happens today.
On 11/23/15 8:42 AM, Jeremy Linton wrote: > On 11/23/2015 10:37 AM, Laura Abbott wrote: >> Which permissions still need to be cleaned up from your perspective? > > IMHO, the vast majority of the linear map should not be executable/read/write, which is what happens today. > With CONFIG_DEBUG_RODATA on I'm not seeing any rwx regions # cat /sys/kernel/debug/kernel_page_tables ---[ vmalloc() Area ]--- 0xffffff8000000000-0xffffff8000010000 64K RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000012000-0xffffff8000013000 4K RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000018000-0xffffff8000019000 4K RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000020000-0xffffff8000030000 64K RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000031000-0xffffff8000071000 256K RW NX SHD AF UXN MEM/NORMAL-NC 0xffffff8000080000-0xffffff8000180000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff80001ae000-0xffffff80001b1000 12K RW NX SHD AF UXN MEM/NORMAL 0xffffff8000200000-0xffffff8000300000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000380000-0xffffff8000480000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000500000-0xffffff8000600000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000680000-0xffffff8000780000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000800000-0xffffff8000900000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000980000-0xffffff8000a80000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000b00000-0xffffff8000c00000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000c80000-0xffffff8000d80000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000e00000-0xffffff8000f00000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8000f80000-0xffffff8001080000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8001100000-0xffffff8001200000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8001280000-0xffffff8001380000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8001400000-0xffffff8001500000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8001580000-0xffffff8001680000 1M RW NX SHD AF UXN DEVICE/nGnRE 0xffffff8001700000-0xffffff8001800000 1M RW NX SHD AF UXN DEVICE/nGnRE ---[ vmalloc() End ]--- ---[ vmemmap start ]--- 0xffffffbdc1000000-0xffffffbdc3000000 32M RW NX SHD AF BLK UXN MEM/NORMAL ---[ vmemmap end ]--- ---[ Fixmap start ]--- 0xffffffbffa800000-0xffffffbffaa00000 2M ro NX SHD AF BLK UXN MEM/NORMAL ---[ Fixmap end ]--- ---[ PCI I/O start ]--- 0xffffffbffae00000-0xffffffbffae10000 64K RW NX SHD AF UXN DEVICE/nGnRE ---[ PCI I/O end ]--- ---[ Modules start ]--- ---[ Modules end ]--- ---[ Kernel Mapping ]--- 0xffffffc000000000-0xffffffc000080000 512K RW NX SHD AF CON UXN MEM/NORMAL 0xffffffc000080000-0xffffffc000082000 8K RW NX SHD AF UXN MEM/NORMAL 0xffffffc000082000-0xffffffc000090000 56K ro x SHD AF UXN MEM/NORMAL 0xffffffc000090000-0xffffffc000200000 1472K ro x SHD AF CON UXN MEM/NORMAL 0xffffffc000200000-0xffffffc000800000 6M ro x SHD AF BLK UXN MEM/NORMAL 0xffffffc000800000-0xffffffc000980000 1536K ro x SHD AF CON UXN MEM/NORMAL 0xffffffc000980000-0xffffffc000987000 28K ro x SHD AF UXN MEM/NORMAL 0xffffffc000987000-0xffffffc000990000 36K RW NX SHD AF UXN MEM/NORMAL 0xffffffc000990000-0xffffffc000a00000 448K RW NX SHD AF CON UXN MEM/NORMAL 0xffffffc000a00000-0xffffffc000a10000 64K RW NX SHD AF UXN MEM/NORMAL 0xffffffc000a10000-0xffffffc000c00000 1984K RW NX SHD AF CON UXN MEM/NORMAL 0xffffffc000c00000-0xffffffc040000000 1012M RW NX SHD AF BLK UXN MEM/NORMAL 0xffffffc040000000-0xffffffc080000000 1G RW NX SHD AF BLK UXN MEM/NORMAL are there other parts of the map you're seeing that aren't on my system? Thanks, Laura
On 11/23/2015 11:52 AM, Laura Abbott wrote: > On 11/23/15 8:42 AM, Jeremy Linton wrote: >> On 11/23/2015 10:37 AM, Laura Abbott wrote: >>> Which permissions still need to be cleaned up from your perspective? >> >> IMHO, the vast majority of the linear map should not be >> executable/read/write, which is what happens today. >> > > With CONFIG_DEBUG_RODATA on I'm not seeing any rwx regions > 0xffffffc000000000-0xffffffc000080000 512K RW NX SHD AF > CON UXN MEM/NORMAL > 0xffffffc000080000-0xffffffc000082000 8K RW NX SHD > AF UXN MEM/NORMAL > 0xffffffc000082000-0xffffffc000090000 56K ro x SHD > AF UXN MEM/NORMAL > 0xffffffc000090000-0xffffffc000200000 1472K ro x SHD AF > CON UXN MEM/NORMAL > 0xffffffc000200000-0xffffffc000800000 6M ro x SHD > AF BLK UXN MEM/NORMAL > 0xffffffc000800000-0xffffffc000980000 1536K ro x SHD AF > CON UXN MEM/NORMAL > 0xffffffc000980000-0xffffffc000987000 28K ro x SHD > AF UXN MEM/NORMAL > 0xffffffc000987000-0xffffffc000990000 36K RW NX SHD > AF UXN MEM/NORMAL > 0xffffffc000990000-0xffffffc000a00000 448K RW NX SHD AF > CON UXN MEM/NORMAL > 0xffffffc000a00000-0xffffffc000a10000 64K RW NX SHD > AF UXN MEM/NORMAL > 0xffffffc000a10000-0xffffffc000c00000 1984K RW NX SHD AF > CON UXN MEM/NORMAL > 0xffffffc000c00000-0xffffffc040000000 1012M RW NX SHD > AF BLK UXN MEM/NORMAL > 0xffffffc040000000-0xffffffc080000000 1G RW NX SHD > AF BLK UXN MEM/NORMAL > > are there other parts of the map you're seeing that aren't on my system? Thats my mistake, I didn't remember seeing the NX set for the block mappings with the RO case, the machine I just checked looks good too. So, I you can ignore that little rant.
On 23 November 2015 at 17:42, Jeremy Linton <jlinton@redhat.com> wrote: > On 11/23/2015 10:37 AM, Laura Abbott wrote: >> >> Which permissions still need to be cleaned up from your perspective? > > > IMHO, the vast majority of the linear map should not be > executable/read/write, which is what happens today. > IMHO, this should not be hidden under DEBUG_RODATA (which some people may mistake for a debug option, while others may assume it affects rodata only) since the kernel stacks are executable without it.
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h index defbfde43a43..70e02e3b1a78 100644 --- a/arch/arm64/include/asm/pgtable.h +++ b/arch/arm64/include/asm/pgtable.h @@ -240,6 +240,10 @@ static inline void set_pte(pte_t *ptep, pte_t pte) if (WARN_ON((old & PTE_ATTRINDX_MASK) != (new & PTE_ATTRINDX_MASK))) goto pte_bad; + /* Changing contiguity may lead to a TLB conflict */ + if (WARN_ON((old & PTE_CONT) != (new & PTE_CONT))) + goto pte_bad; + /* Change of OA is only an issue if one mapping is writable */ if (!(old & new & PTE_RDONLY) && WARN_ON(pte_pfn(*ptep) != pte_pfn(pte))) --------------8<-------------------- But it doesn't look nice afterwards. It's the fixup_init() trying to re-write the entries and we start changing the PTE_CONT bit: Call trace: [<ffffffc0000952b8>] __create_mapping.isra.5+0x360/0x530 [<ffffffc0000954ec>] fixup_init+0x64/0x80 [<ffffffc0000945a4>] free_initmem+0xc/0x38 [<ffffffc0005eb9f8>] kernel_init+0x20/0xe0 [<ffffffc000085c50>] ret_from_fork+0x10/0x40 What I don't get is why we have fixup_init() even when !CONFIG_DEBUG_RODATA. It probably breaks the initial mapping just to get a non-executable init section. However, the other sections are left executable when this config option is disabled. The patch below fixes the warnings above: --------------8<-------------------- diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c index abb66f84d4ac..e0f82e1a1c09 100644 --- a/arch/arm64/mm/mmu.c +++ b/arch/arm64/mm/mmu.c @@ -482,9 +482,11 @@ void mark_rodata_ro(void) void fixup_init(void) { +#ifdef CONFIG_DEBUG_RODATA create_mapping_late(__pa(__init_begin), (unsigned long)__init_begin, (unsigned long)__init_end - (unsigned long)__init_begin, PAGE_KERNEL); +#endif } /*