Message ID | 20200206025825.22934-1-yanaijie@huawei.com (mailing list archive) |
---|---|
Headers | show |
Series | implement KASLR for powerpc/fsl_booke/64 | expand |
Hi everyone, any comments or suggestions? Thanks, Jason on 2020/2/6 10:58, Jason Yan wrote: > This is a try to implement KASLR for Freescale BookE64 which is based on > my earlier implementation for Freescale BookE32: > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 > > The implementation for Freescale BookE64 is similar as BookE32. One > difference is that Freescale BookE64 set up a TLB mapping of 1G during > booting. Another difference is that ppc64 needs the kernel to be > 64K-aligned. So we can randomize the kernel in this 1G mapping and make > it 64K-aligned. This can save some code to creat another TLB map at > early boot. The disadvantage is that we only have about 1G/64K = 16384 > slots to put the kernel in. > > KERNELBASE > > 64K |--> kernel <--| > | | | > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > | | | |....| | | | | | | | | |....| | | > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > | | 1G > |-----> offset <-----| > > kernstart_virt_addr > > I'm not sure if the slot numbers is enough or the design has any > defects. If you have some better ideas, I would be happy to hear that. > > Thank you all. > > v2->v3: > Fix build error when KASLR is disabled. > v1->v2: > Add __kaslr_offset for the secondary cpu boot up. > > Jason Yan (6): > powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and > kaslr_early_init() > powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper > powerpc/fsl_booke/64: implement KASLR for fsl_booke64 > powerpc/fsl_booke/64: do not clear the BSS for the second pass > powerpc/fsl_booke/64: clear the original kernel if randomized > powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst > and add 64bit part > > .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++-- > arch/powerpc/Kconfig | 2 +- > arch/powerpc/kernel/exceptions-64e.S | 23 ++++++ > arch/powerpc/kernel/head_64.S | 14 ++++ > arch/powerpc/kernel/setup_64.c | 4 +- > arch/powerpc/mm/mmu_decl.h | 19 ++--- > arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------ > 7 files changed, 132 insertions(+), 36 deletions(-) > rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%) >
ping... on 2020/2/13 11:00, Jason Yan wrote: > Hi everyone, any comments or suggestions? > > Thanks, > Jason > > on 2020/2/6 10:58, Jason Yan wrote: >> This is a try to implement KASLR for Freescale BookE64 which is based on >> my earlier implementation for Freescale BookE32: >> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 >> >> The implementation for Freescale BookE64 is similar as BookE32. One >> difference is that Freescale BookE64 set up a TLB mapping of 1G during >> booting. Another difference is that ppc64 needs the kernel to be >> 64K-aligned. So we can randomize the kernel in this 1G mapping and make >> it 64K-aligned. This can save some code to creat another TLB map at >> early boot. The disadvantage is that we only have about 1G/64K = 16384 >> slots to put the kernel in. >> >> KERNELBASE >> >> 64K |--> kernel <--| >> | | | >> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >> | | | |....| | | | | | | | | |....| | | >> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >> | | 1G >> |-----> offset <-----| >> >> kernstart_virt_addr >> >> I'm not sure if the slot numbers is enough or the design has any >> defects. If you have some better ideas, I would be happy to hear that. >> >> Thank you all. >> >> v2->v3: >> Fix build error when KASLR is disabled. >> v1->v2: >> Add __kaslr_offset for the secondary cpu boot up. >> >> Jason Yan (6): >> powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and >> kaslr_early_init() >> powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper >> powerpc/fsl_booke/64: implement KASLR for fsl_booke64 >> powerpc/fsl_booke/64: do not clear the BSS for the second pass >> powerpc/fsl_booke/64: clear the original kernel if randomized >> powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst >> and add 64bit part >> >> .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++-- >> arch/powerpc/Kconfig | 2 +- >> arch/powerpc/kernel/exceptions-64e.S | 23 ++++++ >> arch/powerpc/kernel/head_64.S | 14 ++++ >> arch/powerpc/kernel/setup_64.c | 4 +- >> arch/powerpc/mm/mmu_decl.h | 19 ++--- >> arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------ >> 7 files changed, 132 insertions(+), 36 deletions(-) >> rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} >> (59%) >> > > > .
Hi Jason, > This is a try to implement KASLR for Freescale BookE64 which is based on > my earlier implementation for Freescale BookE32: > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 > > The implementation for Freescale BookE64 is similar as BookE32. One > difference is that Freescale BookE64 set up a TLB mapping of 1G during > booting. Another difference is that ppc64 needs the kernel to be > 64K-aligned. So we can randomize the kernel in this 1G mapping and make > it 64K-aligned. This can save some code to creat another TLB map at > early boot. The disadvantage is that we only have about 1G/64K = 16384 > slots to put the kernel in. > > KERNELBASE > > 64K |--> kernel <--| > | | | > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > | | | |....| | | | | | | | | |....| | | > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > | | 1G > |-----> offset <-----| > > kernstart_virt_addr > > I'm not sure if the slot numbers is enough or the design has any > defects. If you have some better ideas, I would be happy to hear that. > > Thank you all. > Are you making any attempt to hide kernel address leaks in this series? I've just been looking at the stackdump code just now, and it directly prints link registers and stack pointers, which is probably enough to determine the kernel base address: SPs: LRs: %pS pointer [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable) [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac [ 0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0 [ 0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c [ 0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880 git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all in process.c or in xmon. Maybe replacing the REG format string in KASLR mode would be sufficient? Regards, Daniel > v2->v3: > Fix build error when KASLR is disabled. > v1->v2: > Add __kaslr_offset for the secondary cpu boot up. > > Jason Yan (6): > powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and > kaslr_early_init() > powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper > powerpc/fsl_booke/64: implement KASLR for fsl_booke64 > powerpc/fsl_booke/64: do not clear the BSS for the second pass > powerpc/fsl_booke/64: clear the original kernel if randomized > powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst > and add 64bit part > > .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++-- > arch/powerpc/Kconfig | 2 +- > arch/powerpc/kernel/exceptions-64e.S | 23 ++++++ > arch/powerpc/kernel/head_64.S | 14 ++++ > arch/powerpc/kernel/setup_64.c | 4 +- > arch/powerpc/mm/mmu_decl.h | 19 ++--- > arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------ > 7 files changed, 132 insertions(+), 36 deletions(-) > rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%) > > -- > 2.17.2
Hi Daniel, 在 2020/2/26 15:16, Daniel Axtens 写道: > Hi Jason, > >> This is a try to implement KASLR for Freescale BookE64 which is based on >> my earlier implementation for Freescale BookE32: >> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 >> >> The implementation for Freescale BookE64 is similar as BookE32. One >> difference is that Freescale BookE64 set up a TLB mapping of 1G during >> booting. Another difference is that ppc64 needs the kernel to be >> 64K-aligned. So we can randomize the kernel in this 1G mapping and make >> it 64K-aligned. This can save some code to creat another TLB map at >> early boot. The disadvantage is that we only have about 1G/64K = 16384 >> slots to put the kernel in. >> >> KERNELBASE >> >> 64K |--> kernel <--| >> | | | >> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >> | | | |....| | | | | | | | | |....| | | >> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >> | | 1G >> |-----> offset <-----| >> >> kernstart_virt_addr >> >> I'm not sure if the slot numbers is enough or the design has any >> defects. If you have some better ideas, I would be happy to hear that. >> >> Thank you all. >> > > Are you making any attempt to hide kernel address leaks in this series? Yes. > I've just been looking at the stackdump code just now, and it directly > prints link registers and stack pointers, which is probably enough to > determine the kernel base address: > > SPs: LRs: %pS pointer > [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable) > [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac > [ 0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0 > [ 0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c > [ 0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880 > > git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all > in process.c or in xmon. > Thanks for reminding this. > Maybe replacing the REG format string in KASLR mode would be sufficient? > Most archs have removed the address printing when dumping stack. Do we really have to print this? If we have to do this, maybe we can use "%pK" so that they will be hidden from unprivileged users. Thanks, Jason > Regards, > Daniel > > >> v2->v3: >> Fix build error when KASLR is disabled. >> v1->v2: >> Add __kaslr_offset for the secondary cpu boot up. >> >> Jason Yan (6): >> powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and >> kaslr_early_init() >> powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper >> powerpc/fsl_booke/64: implement KASLR for fsl_booke64 >> powerpc/fsl_booke/64: do not clear the BSS for the second pass >> powerpc/fsl_booke/64: clear the original kernel if randomized >> powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst >> and add 64bit part >> >> .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++-- >> arch/powerpc/Kconfig | 2 +- >> arch/powerpc/kernel/exceptions-64e.S | 23 ++++++ >> arch/powerpc/kernel/head_64.S | 14 ++++ >> arch/powerpc/kernel/setup_64.c | 4 +- >> arch/powerpc/mm/mmu_decl.h | 19 ++--- >> arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------ >> 7 files changed, 132 insertions(+), 36 deletions(-) >> rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%) >> >> -- >> 2.17.2 > > . >
Jason Yan <yanaijie@huawei.com> writes: > Hi Daniel, > > 在 2020/2/26 15:16, Daniel Axtens 写道: >> Hi Jason, >> >>> This is a try to implement KASLR for Freescale BookE64 which is based on >>> my earlier implementation for Freescale BookE32: >>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 >>> >>> The implementation for Freescale BookE64 is similar as BookE32. One >>> difference is that Freescale BookE64 set up a TLB mapping of 1G during >>> booting. Another difference is that ppc64 needs the kernel to be >>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make >>> it 64K-aligned. This can save some code to creat another TLB map at >>> early boot. The disadvantage is that we only have about 1G/64K = 16384 >>> slots to put the kernel in. >>> >>> KERNELBASE >>> >>> 64K |--> kernel <--| >>> | | | >>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >>> | | | |....| | | | | | | | | |....| | | >>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >>> | | 1G >>> |-----> offset <-----| >>> >>> kernstart_virt_addr >>> >>> I'm not sure if the slot numbers is enough or the design has any >>> defects. If you have some better ideas, I would be happy to hear that. >>> >>> Thank you all. >>> >> >> Are you making any attempt to hide kernel address leaks in this series? > > Yes. > >> I've just been looking at the stackdump code just now, and it directly >> prints link registers and stack pointers, which is probably enough to >> determine the kernel base address: >> >> SPs: LRs: %pS pointer >> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable) >> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac >> [ 0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0 >> [ 0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c >> [ 0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880 >> >> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all >> in process.c or in xmon. >> > > Thanks for reminding this. > >> Maybe replacing the REG format string in KASLR mode would be sufficient? >> > > Most archs have removed the address printing when dumping stack. Do we > really have to print this? > > If we have to do this, maybe we can use "%pK" so that they will be > hidden from unprivileged users. I suspect that you will find it easier to convince people to accept a change to %pK than removal :) BTW, I have a T4240RDB so I might be able to test this series at some point - do I need an updated bootloader to pass in a random seed, or is the kernel able to get enough randomness by itself? (Sorry if this is explained elsewhere in the series, I have only skimmed it lightly!) Regards, Daniel > > Thanks, > Jason > >> Regards, >> Daniel >> >> >>> v2->v3: >>> Fix build error when KASLR is disabled. >>> v1->v2: >>> Add __kaslr_offset for the secondary cpu boot up. >>> >>> Jason Yan (6): >>> powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and >>> kaslr_early_init() >>> powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper >>> powerpc/fsl_booke/64: implement KASLR for fsl_booke64 >>> powerpc/fsl_booke/64: do not clear the BSS for the second pass >>> powerpc/fsl_booke/64: clear the original kernel if randomized >>> powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst >>> and add 64bit part >>> >>> .../{kaslr-booke32.rst => kaslr-booke.rst} | 35 +++++++-- >>> arch/powerpc/Kconfig | 2 +- >>> arch/powerpc/kernel/exceptions-64e.S | 23 ++++++ >>> arch/powerpc/kernel/head_64.S | 14 ++++ >>> arch/powerpc/kernel/setup_64.c | 4 +- >>> arch/powerpc/mm/mmu_decl.h | 19 ++--- >>> arch/powerpc/mm/nohash/kaslr_booke.c | 71 +++++++++++++------ >>> 7 files changed, 132 insertions(+), 36 deletions(-) >>> rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%) >>> >>> -- >>> 2.17.2 >> >> . >>
在 2020/2/26 19:41, Daniel Axtens 写道: > I suspect that you will find it easier to convince people to accept a > change to %pK than removal:) > > BTW, I have a T4240RDB so I might be able to test this series at some > point - do I need an updated bootloader to pass in a random seed, or is > the kernel able to get enough randomness by itself? (Sorry if this is > explained elsewhere in the series, I have only skimmed it lightly!) > Thanks. It will be great if you have time to test this series. You do not need an updated bootloader becuase the kernel is using timer base as a simple source of entropy. This is enough for testing the KASLR code itself. Jason > Regards, > Daniel
On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote: > Hi Daniel, > > 在 2020/2/26 15:16, Daniel Axtens 写道: > > Hi Jason, > > > > > This is a try to implement KASLR for Freescale BookE64 which is based on > > > my earlier implementation for Freescale BookE32: > > > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 > > > > > > The implementation for Freescale BookE64 is similar as BookE32. One > > > difference is that Freescale BookE64 set up a TLB mapping of 1G during > > > booting. Another difference is that ppc64 needs the kernel to be > > > 64K-aligned. So we can randomize the kernel in this 1G mapping and make > > > it 64K-aligned. This can save some code to creat another TLB map at > > > early boot. The disadvantage is that we only have about 1G/64K = 16384 > > > slots to put the kernel in. > > > > > > KERNELBASE > > > > > > 64K |--> kernel <--| > > > | | | > > > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > > > | | | |....| | | | | | | | | |....| | | > > > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > > > | | 1G > > > |-----> offset <-----| > > > > > > kernstart_virt_addr > > > > > > I'm not sure if the slot numbers is enough or the design has any > > > defects. If you have some better ideas, I would be happy to hear that. > > > > > > Thank you all. > > > > > > > Are you making any attempt to hide kernel address leaks in this series? > > Yes. > > > I've just been looking at the stackdump code just now, and it directly > > prints link registers and stack pointers, which is probably enough to > > determine the kernel base address: > > > > SPs: LRs: %pS pointer > > [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 > > (unreliable) > > [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac > > [ 0.424659] [c0000000de403a60] [c0000000024d7a00] > > mount_block_root+0x634/0x7c0 > > [ 0.424734] [c0000000de403be0] [c0000000024d8100] > > prepare_namespace+0x1ec/0x23c > > [ 0.424811] [c0000000de403c60] [c0000000024d7010] > > kernel_init_freeable+0x804/0x880 > > > > git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all > > in process.c or in xmon. > > > > Thanks for reminding this. > > > Maybe replacing the REG format string in KASLR mode would be sufficient? > > > > Most archs have removed the address printing when dumping stack. Do we > really have to print this? > > If we have to do this, maybe we can use "%pK" so that they will be > hidden from unprivileged users. I've found the addresses to be useful, especially if I had a way to dump the stack data itself. Wouldn't the register dump also be likely to give away the addresses? I don't see any debug setting for %pK (or %p) to always print the actual address (closest is kptr_restrict=1 but that only works in certain contexts)... from looking at the code it seems it hashes even if kaslr is entirely disabled? Or am I missing something? -Scott
在 2020/2/28 13:53, Scott Wood 写道: > On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote: >> Hi Daniel, >> >> 在 2020/2/26 15:16, Daniel Axtens 写道: >>> Hi Jason, >>> >>>> This is a try to implement KASLR for Freescale BookE64 which is based on >>>> my earlier implementation for Freescale BookE32: >>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 >>>> >>>> The implementation for Freescale BookE64 is similar as BookE32. One >>>> difference is that Freescale BookE64 set up a TLB mapping of 1G during >>>> booting. Another difference is that ppc64 needs the kernel to be >>>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make >>>> it 64K-aligned. This can save some code to creat another TLB map at >>>> early boot. The disadvantage is that we only have about 1G/64K = 16384 >>>> slots to put the kernel in. >>>> >>>> KERNELBASE >>>> >>>> 64K |--> kernel <--| >>>> | | | >>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >>>> | | | |....| | | | | | | | | |....| | | >>>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >>>> | | 1G >>>> |-----> offset <-----| >>>> >>>> kernstart_virt_addr >>>> >>>> I'm not sure if the slot numbers is enough or the design has any >>>> defects. If you have some better ideas, I would be happy to hear that. >>>> >>>> Thank you all. >>>> >>> >>> Are you making any attempt to hide kernel address leaks in this series? >> >> Yes. >> >>> I've just been looking at the stackdump code just now, and it directly >>> prints link registers and stack pointers, which is probably enough to >>> determine the kernel base address: >>> >>> SPs: LRs: %pS pointer >>> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 >>> (unreliable) >>> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac >>> [ 0.424659] [c0000000de403a60] [c0000000024d7a00] >>> mount_block_root+0x634/0x7c0 >>> [ 0.424734] [c0000000de403be0] [c0000000024d8100] >>> prepare_namespace+0x1ec/0x23c >>> [ 0.424811] [c0000000de403c60] [c0000000024d7010] >>> kernel_init_freeable+0x804/0x880 >>> >>> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all >>> in process.c or in xmon. >>> >> >> Thanks for reminding this. >> >>> Maybe replacing the REG format string in KASLR mode would be sufficient? >>> >> >> Most archs have removed the address printing when dumping stack. Do we >> really have to print this? >> >> If we have to do this, maybe we can use "%pK" so that they will be >> hidden from unprivileged users. > > I've found the addresses to be useful, especially if I had a way to dump the > stack data itself. Wouldn't the register dump also be likely to give away the > addresses? If we have to print the address, then kptr_restrict and dmesg_restrict must be set properly so that unprivileged users cannot see them. > > I don't see any debug setting for %pK (or %p) to always print the actual > address (closest is kptr_restrict=1 but that only works in certain > contexts)... from looking at the code it seems it hashes even if kaslr is > entirely disabled? Or am I missing something? > Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if we want the real value of the address, we cannot use it. But if you only want to distinguish if two pointers are the same, it's ok. > -Scott > > > > . >
On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote: > > 在 2020/2/28 13:53, Scott Wood 写道: > > On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote: > > > Hi Daniel, > > > > > > 在 2020/2/26 15:16, Daniel Axtens 写道: > > > > Maybe replacing the REG format string in KASLR mode would be > > > > sufficient? > > > > > > > > > > Most archs have removed the address printing when dumping stack. Do we > > > really have to print this? > > > > > > If we have to do this, maybe we can use "%pK" so that they will be > > > hidden from unprivileged users. > > > > I've found the addresses to be useful, especially if I had a way to dump > > the > > stack data itself. Wouldn't the register dump also be likely to give away > > the > > addresses? > > If we have to print the address, then kptr_restrict and dmesg_restrict > must be set properly so that unprivileged users cannot see them. And how does that work with crash dumps that could be from any context? dmesg_restrict is irrelevant as it just controls who can see the dmesg, not what goes into it. kptr_restrict=1 will only get the value if you're not in any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG. No other value of kptr_restrict will ever get you the raw value. > > > > I don't see any debug setting for %pK (or %p) to always print the actual > > address (closest is kptr_restrict=1 but that only works in certain > > contexts)... from looking at the code it seems it hashes even if kaslr is > > entirely disabled? Or am I missing something? > > > > Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if > we want the real value of the address, we cannot use it. But if you only > want to distinguish if two pointers are the same, it's ok. Am I the only one that finds this a bit crazy? If you want to lock a system down then fine, but why wage war on debugging even when there's no randomization going on? Comparing two pointers for equality is not always adequate. -Scott
在 2020/2/29 12:28, Scott Wood 写道: > On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote: >> >> 在 2020/2/28 13:53, Scott Wood 写道: >>> On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote: >>>> Hi Daniel, >>>> >>>> 在 2020/2/26 15:16, Daniel Axtens 写道: >>>>> Maybe replacing the REG format string in KASLR mode would be >>>>> sufficient? >>>>> >>>> >>>> Most archs have removed the address printing when dumping stack. Do we >>>> really have to print this? >>>> >>>> If we have to do this, maybe we can use "%pK" so that they will be >>>> hidden from unprivileged users. >>> >>> I've found the addresses to be useful, especially if I had a way to dump >>> the >>> stack data itself. Wouldn't the register dump also be likely to give away >>> the >>> addresses? >> >> If we have to print the address, then kptr_restrict and dmesg_restrict >> must be set properly so that unprivileged users cannot see them. > > And how does that work with crash dumps that could be from any context? > > dmesg_restrict is irrelevant as it just controls who can see the dmesg, not > what goes into it. kptr_restrict=1 will only get the value if you're not in > any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG. > No other value of kptr_restrict will ever get you the raw value. > >>> >>> I don't see any debug setting for %pK (or %p) to always print the actual >>> address (closest is kptr_restrict=1 but that only works in certain >>> contexts)... from looking at the code it seems it hashes even if kaslr is >>> entirely disabled? Or am I missing something? >>> >> >> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if >> we want the real value of the address, we cannot use it. But if you only >> want to distinguish if two pointers are the same, it's ok. > > Am I the only one that finds this a bit crazy? If you want to lock a system > down then fine, but why wage war on debugging even when there's no > randomization going on? Comparing two pointers for equality is not always > adequate. > AFAIK, %p hashing is only exist because of many legacy address printings and force who really want the raw values to switch to %px or even %lx. It's not the opposite of debugging. Raw address printing is not forbidden, only people need to estimate the risk of adrdress leaks. Turnning to %p may not be a good idea in this situation. So for the REG logs printed when dumping stack, we can disable it when KASLR is open. For the REG logs in other places like show_regs(), only privileged can trigger it, and they are not combind with a symbol, so I think it's ok to keep them. diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c index fad50db9dcf2..659c51f0739a 100644 --- a/arch/powerpc/kernel/process.c +++ b/arch/powerpc/kernel/process.c @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned long *stack) newsp = stack[0]; ip = stack[STACK_FRAME_LR_SAVE]; if (!firstframe || ip != lr) { - printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) + printk("%pS", (void *)ip); + else + printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); #ifdef CONFIG_FUNCTION_GRAPH_TRACER ret_addr = ftrace_graph_ret_addr(current, &ftrace_idx, ip, stack); Thanks, Jason > -Scott > > > > . >
On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote: > > 在 2020/2/29 12:28, Scott Wood 写道: > > On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote: > > > > > > 在 2020/2/28 13:53, Scott Wood 写道: > > > > > > > > I don't see any debug setting for %pK (or %p) to always print the > > > > actual > > > > address (closest is kptr_restrict=1 but that only works in certain > > > > contexts)... from looking at the code it seems it hashes even if kaslr > > > > is > > > > entirely disabled? Or am I missing something? > > > > > > > > > > Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if > > > we want the real value of the address, we cannot use it. But if you only > > > want to distinguish if two pointers are the same, it's ok. > > > > Am I the only one that finds this a bit crazy? If you want to lock a > > system > > down then fine, but why wage war on debugging even when there's no > > randomization going on? Comparing two pointers for equality is not always > > adequate. > > > > AFAIK, %p hashing is only exist because of many legacy address printings > and force who really want the raw values to switch to %px or even %lx. > It's not the opposite of debugging. Raw address printing is not > forbidden, only people need to estimate the risk of adrdress leaks. Yes, but I don't see any format specifier to switch to that will hash in a randomized production environment, but not in a debug or other non-randomized environment which seems like the ideal default for most debug output. > > Turnning to %p may not be a good idea in this situation. So > for the REG logs printed when dumping stack, we can disable it when > KASLR is open. For the REG logs in other places like show_regs(), only > privileged can trigger it, and they are not combind with a symbol, so > I think it's ok to keep them. > > diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c > index fad50db9dcf2..659c51f0739a 100644 > --- a/arch/powerpc/kernel/process.c > +++ b/arch/powerpc/kernel/process.c > @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned > long *stack) > newsp = stack[0]; > ip = stack[STACK_FRAME_LR_SAVE]; > if (!firstframe || ip != lr) { > - printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); > + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) > + printk("%pS", (void *)ip); > + else > + printk("["REG"] ["REG"] %pS", sp, ip, > (void *)ip); This doesn't deal with "nokaslr" on the kernel command line. It also doesn't seem like something that every callsite should have to opencode, versus having an appropriate format specifier behaves as I described above (and I still don't see why that format specifier should not be "%p"). -Scott
在 2020/3/1 6:54, Scott Wood 写道: > On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote: >> >> 在 2020/2/29 12:28, Scott Wood 写道: >>> On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote: >>>> >>>> 在 2020/2/28 13:53, Scott Wood 写道: >>>>> >>>>> I don't see any debug setting for %pK (or %p) to always print the >>>>> actual >>>>> address (closest is kptr_restrict=1 but that only works in certain >>>>> contexts)... from looking at the code it seems it hashes even if kaslr >>>>> is >>>>> entirely disabled? Or am I missing something? >>>>> >>>> >>>> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if >>>> we want the real value of the address, we cannot use it. But if you only >>>> want to distinguish if two pointers are the same, it's ok. >>> >>> Am I the only one that finds this a bit crazy? If you want to lock a >>> system >>> down then fine, but why wage war on debugging even when there's no >>> randomization going on? Comparing two pointers for equality is not always >>> adequate. >>> >> >> AFAIK, %p hashing is only exist because of many legacy address printings >> and force who really want the raw values to switch to %px or even %lx. >> It's not the opposite of debugging. Raw address printing is not >> forbidden, only people need to estimate the risk of adrdress leaks. > > Yes, but I don't see any format specifier to switch to that will hash in a > randomized production environment, but not in a debug or other non-randomized > environment which seems like the ideal default for most debug output. > Sorry I have no idea why there is no format specifier considered for switching of randomized or non-randomized environment. May they think that raw address should not leak in non-randomized environment too. May be Kees or Tobin can answer this question. Kees? Tobin? >> >> Turnning to %p may not be a good idea in this situation. So >> for the REG logs printed when dumping stack, we can disable it when >> KASLR is open. For the REG logs in other places like show_regs(), only >> privileged can trigger it, and they are not combind with a symbol, so >> I think it's ok to keep them. >> >> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c >> index fad50db9dcf2..659c51f0739a 100644 >> --- a/arch/powerpc/kernel/process.c >> +++ b/arch/powerpc/kernel/process.c >> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned >> long *stack) >> newsp = stack[0]; >> ip = stack[STACK_FRAME_LR_SAVE]; >> if (!firstframe || ip != lr) { >> - printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip); >> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) >> + printk("%pS", (void *)ip); >> + else >> + printk("["REG"] ["REG"] %pS", sp, ip, >> (void *)ip); > > This doesn't deal with "nokaslr" on the kernel command line. It also doesn't > seem like something that every callsite should have to opencode, versus having > an appropriate format specifier behaves as I described above (and I still > don't see why that format specifier should not be "%p"). > Actually I still do not understand why we should print the raw value here. When KALLSYMS is enabled we have symbol name and offset like put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw address. > -Scott > > > > . >
On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote: > > 在 2020/3/1 6:54, Scott Wood 写道: > > On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote: > > > > > > Turnning to %p may not be a good idea in this situation. So > > > for the REG logs printed when dumping stack, we can disable it when > > > KASLR is open. For the REG logs in other places like show_regs(), only > > > privileged can trigger it, and they are not combind with a symbol, so > > > I think it's ok to keep them. > > > > > > diff --git a/arch/powerpc/kernel/process.c > > > b/arch/powerpc/kernel/process.c > > > index fad50db9dcf2..659c51f0739a 100644 > > > --- a/arch/powerpc/kernel/process.c > > > +++ b/arch/powerpc/kernel/process.c > > > @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned > > > long *stack) > > > newsp = stack[0]; > > > ip = stack[STACK_FRAME_LR_SAVE]; > > > if (!firstframe || ip != lr) { > > > - printk("["REG"] ["REG"] %pS", sp, ip, (void > > > *)ip); > > > + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) > > > + printk("%pS", (void *)ip); > > > + else > > > + printk("["REG"] ["REG"] %pS", sp, ip, > > > (void *)ip); > > > > This doesn't deal with "nokaslr" on the kernel command line. It also > > doesn't > > seem like something that every callsite should have to opencode, versus > > having > > an appropriate format specifier behaves as I described above (and I still > > don't see why that format specifier should not be "%p"). > > > > Actually I still do not understand why we should print the raw value > here. When KALLSYMS is enabled we have symbol name and offset like > put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw > address. I'm more concerned about the stack address for wading through a raw stack dump (to find function call arguments, etc). The return address does help confirm that I'm on the right stack frame though, and also makes looking up a line number slightly easier than having to look up a symbol address and then add the offset (at least for non-module addresses). As a random aside, the mismatch between Linux printing a hex offset and GDB using decimal in disassembly is annoying... -Scott
在 2020/3/2 11:24, Scott Wood 写道: > On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote: >> >> 在 2020/3/1 6:54, Scott Wood 写道: >>> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote: >>>> >>>> Turnning to %p may not be a good idea in this situation. So >>>> for the REG logs printed when dumping stack, we can disable it when >>>> KASLR is open. For the REG logs in other places like show_regs(), only >>>> privileged can trigger it, and they are not combind with a symbol, so >>>> I think it's ok to keep them. >>>> >>>> diff --git a/arch/powerpc/kernel/process.c >>>> b/arch/powerpc/kernel/process.c >>>> index fad50db9dcf2..659c51f0739a 100644 >>>> --- a/arch/powerpc/kernel/process.c >>>> +++ b/arch/powerpc/kernel/process.c >>>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned >>>> long *stack) >>>> newsp = stack[0]; >>>> ip = stack[STACK_FRAME_LR_SAVE]; >>>> if (!firstframe || ip != lr) { >>>> - printk("["REG"] ["REG"] %pS", sp, ip, (void >>>> *)ip); >>>> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) >>>> + printk("%pS", (void *)ip); >>>> + else >>>> + printk("["REG"] ["REG"] %pS", sp, ip, >>>> (void *)ip); >>> >>> This doesn't deal with "nokaslr" on the kernel command line. It also >>> doesn't >>> seem like something that every callsite should have to opencode, versus >>> having >>> an appropriate format specifier behaves as I described above (and I still >>> don't see why that format specifier should not be "%p"). >>> >> >> Actually I still do not understand why we should print the raw value >> here. When KALLSYMS is enabled we have symbol name and offset like >> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw >> address. > > I'm more concerned about the stack address for wading through a raw stack dump > (to find function call arguments, etc). The return address does help confirm > that I'm on the right stack frame though, and also makes looking up a line > number slightly easier than having to look up a symbol address and then add > the offset (at least for non-module addresses). > > As a random aside, the mismatch between Linux printing a hex offset and GDB > using decimal in disassembly is annoying... > OK, I will send a RFC patch to add a new format specifier such as "%pk" or change the exsiting "%pK" to print raw value of addresses when KASLR is disabled and print hash value of addresses when KASLR is enabled. Let's see what the printk guys would say :) > -Scott > > > > . >
On Mon, 2020-03-02 at 15:12 +0800, Jason Yan wrote: > > 在 2020/3/2 11:24, Scott Wood 写道: > > On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote: > > > > > > 在 2020/3/1 6:54, Scott Wood 写道: > > > > On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote: > > > > > > > > > > Turnning to %p may not be a good idea in this situation. So > > > > > for the REG logs printed when dumping stack, we can disable it when > > > > > KASLR is open. For the REG logs in other places like show_regs(), > > > > > only > > > > > privileged can trigger it, and they are not combind with a symbol, > > > > > so > > > > > I think it's ok to keep them. > > > > > > > > > > diff --git a/arch/powerpc/kernel/process.c > > > > > b/arch/powerpc/kernel/process.c > > > > > index fad50db9dcf2..659c51f0739a 100644 > > > > > --- a/arch/powerpc/kernel/process.c > > > > > +++ b/arch/powerpc/kernel/process.c > > > > > @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, > > > > > unsigned > > > > > long *stack) > > > > > newsp = stack[0]; > > > > > ip = stack[STACK_FRAME_LR_SAVE]; > > > > > if (!firstframe || ip != lr) { > > > > > - printk("["REG"] ["REG"] %pS", sp, ip, (void > > > > > *)ip); > > > > > + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) > > > > > + printk("%pS", (void *)ip); > > > > > + else > > > > > + printk("["REG"] ["REG"] %pS", sp, > > > > > ip, > > > > > (void *)ip); > > > > > > > > This doesn't deal with "nokaslr" on the kernel command line. It also > > > > doesn't > > > > seem like something that every callsite should have to opencode, > > > > versus > > > > having > > > > an appropriate format specifier behaves as I described above (and I > > > > still > > > > don't see why that format specifier should not be "%p"). > > > > > > > > > > Actually I still do not understand why we should print the raw value > > > here. When KALLSYMS is enabled we have symbol name and offset like > > > put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw > > > address. > > > > I'm more concerned about the stack address for wading through a raw stack > > dump > > (to find function call arguments, etc). The return address does help > > confirm > > that I'm on the right stack frame though, and also makes looking up a line > > number slightly easier than having to look up a symbol address and then > > add > > the offset (at least for non-module addresses). > > > > As a random aside, the mismatch between Linux printing a hex offset and > > GDB > > using decimal in disassembly is annoying... > > > > OK, I will send a RFC patch to add a new format specifier such as "%pk" > or change the exsiting "%pK" to print raw value of addresses when KASLR > is disabled and print hash value of addresses when KASLR is enabled. > Let's see what the printk guys would say :) I'm not sure that a new format specifier is needed versus changing the behavior of "%p", and "%pK" definitely doesn't seem suitable given that it's intended to be more restricted than "%p" (see commit ef0010a30935de4). The question is whether there is a legitimate reason to hash in the absence of kaslr. -Scott
在 2020/3/2 16:47, Scott Wood 写道: > On Mon, 2020-03-02 at 15:12 +0800, Jason Yan wrote: >> >> 在 2020/3/2 11:24, Scott Wood 写道: >>> On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote: >>>> >>>> 在 2020/3/1 6:54, Scott Wood 写道: >>>>> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote: >>>>>> >>>>>> Turnning to %p may not be a good idea in this situation. So >>>>>> for the REG logs printed when dumping stack, we can disable it when >>>>>> KASLR is open. For the REG logs in other places like show_regs(), >>>>>> only >>>>>> privileged can trigger it, and they are not combind with a symbol, >>>>>> so >>>>>> I think it's ok to keep them. >>>>>> >>>>>> diff --git a/arch/powerpc/kernel/process.c >>>>>> b/arch/powerpc/kernel/process.c >>>>>> index fad50db9dcf2..659c51f0739a 100644 >>>>>> --- a/arch/powerpc/kernel/process.c >>>>>> +++ b/arch/powerpc/kernel/process.c >>>>>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, >>>>>> unsigned >>>>>> long *stack) >>>>>> newsp = stack[0]; >>>>>> ip = stack[STACK_FRAME_LR_SAVE]; >>>>>> if (!firstframe || ip != lr) { >>>>>> - printk("["REG"] ["REG"] %pS", sp, ip, (void >>>>>> *)ip); >>>>>> + if (IS_ENABLED(CONFIG_RANDOMIZE_BASE)) >>>>>> + printk("%pS", (void *)ip); >>>>>> + else >>>>>> + printk("["REG"] ["REG"] %pS", sp, >>>>>> ip, >>>>>> (void *)ip); >>>>> >>>>> This doesn't deal with "nokaslr" on the kernel command line. It also >>>>> doesn't >>>>> seem like something that every callsite should have to opencode, >>>>> versus >>>>> having >>>>> an appropriate format specifier behaves as I described above (and I >>>>> still >>>>> don't see why that format specifier should not be "%p"). >>>>> >>>> >>>> Actually I still do not understand why we should print the raw value >>>> here. When KALLSYMS is enabled we have symbol name and offset like >>>> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw >>>> address. >>> >>> I'm more concerned about the stack address for wading through a raw stack >>> dump >>> (to find function call arguments, etc). The return address does help >>> confirm >>> that I'm on the right stack frame though, and also makes looking up a line >>> number slightly easier than having to look up a symbol address and then >>> add >>> the offset (at least for non-module addresses). >>> >>> As a random aside, the mismatch between Linux printing a hex offset and >>> GDB >>> using decimal in disassembly is annoying... >>> >> >> OK, I will send a RFC patch to add a new format specifier such as "%pk" >> or change the exsiting "%pK" to print raw value of addresses when KASLR >> is disabled and print hash value of addresses when KASLR is enabled. >> Let's see what the printk guys would say :) > > I'm not sure that a new format specifier is needed versus changing the > behavior of "%p", and "%pK" definitely doesn't seem suitable given that it's > intended to be more restricted than "%p" (see commit ef0010a30935de4). The > question is whether there is a legitimate reason to hash in the absence of > kaslr. > The problem is that if we change the behavior of "%p", we have to turn all exsiting "%p" to "%pK". Hashing is still reasonable when there is no kaslr because some architectures support randomize at build time such as arm64. > -Scott > > > > . >
On Wed, 2020-02-26 at 18:16 +1100, Daniel Axtens wrote: > Hi Jason, > > > This is a try to implement KASLR for Freescale BookE64 which is based on > > my earlier implementation for Freescale BookE32: > > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 > > > > The implementation for Freescale BookE64 is similar as BookE32. One > > difference is that Freescale BookE64 set up a TLB mapping of 1G during > > booting. Another difference is that ppc64 needs the kernel to be > > 64K-aligned. So we can randomize the kernel in this 1G mapping and make > > it 64K-aligned. This can save some code to creat another TLB map at > > early boot. The disadvantage is that we only have about 1G/64K = 16384 > > slots to put the kernel in. > > > > KERNELBASE > > > > 64K |--> kernel <--| > > | | | > > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > > | | | |....| | | | | | | | | |....| | | > > +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ > > | | 1G > > |-----> offset <-----| > > > > kernstart_virt_addr > > > > I'm not sure if the slot numbers is enough or the design has any > > defects. If you have some better ideas, I would be happy to hear that. > > > > Thank you all. > > > > Are you making any attempt to hide kernel address leaks in this series? > I've just been looking at the stackdump code just now, and it directly > prints link registers and stack pointers, which is probably enough to > determine the kernel base address: > > SPs: LRs: %pS pointer > [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 > (unreliable) > [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac > [ 0.424659] [c0000000de403a60] [c0000000024d7a00] > mount_block_root+0x634/0x7c0 > [ 0.424734] [c0000000de403be0] [c0000000024d8100] > prepare_namespace+0x1ec/0x23c > [ 0.424811] [c0000000de403c60] [c0000000024d7010] > kernel_init_freeable+0x804/0x880 > > git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all > in process.c or in xmon. > > Maybe replacing the REG format string in KASLR mode would be sufficient? Whatever we decide to do here, it's not book3e-specific so it should be considered separately from these patches. -Scott
在 2020/3/5 5:21, Scott Wood 写道: > On Wed, 2020-02-26 at 18:16 +1100, Daniel Axtens wrote: >> Hi Jason, >> >>> This is a try to implement KASLR for Freescale BookE64 which is based on >>> my earlier implementation for Freescale BookE32: >>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718 >>> >>> The implementation for Freescale BookE64 is similar as BookE32. One >>> difference is that Freescale BookE64 set up a TLB mapping of 1G during >>> booting. Another difference is that ppc64 needs the kernel to be >>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make >>> it 64K-aligned. This can save some code to creat another TLB map at >>> early boot. The disadvantage is that we only have about 1G/64K = 16384 >>> slots to put the kernel in. >>> >>> KERNELBASE >>> >>> 64K |--> kernel <--| >>> | | | >>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >>> | | | |....| | | | | | | | | |....| | | >>> +--+--+--+ +--+--+--+--+--+--+--+--+--+ +--+--+ >>> | | 1G >>> |-----> offset <-----| >>> >>> kernstart_virt_addr >>> >>> I'm not sure if the slot numbers is enough or the design has any >>> defects. If you have some better ideas, I would be happy to hear that. >>> >>> Thank you all. >>> >> >> Are you making any attempt to hide kernel address leaks in this series? >> I've just been looking at the stackdump code just now, and it directly >> prints link registers and stack pointers, which is probably enough to >> determine the kernel base address: >> >> SPs: LRs: %pS pointer >> [ 0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 >> (unreliable) >> [ 0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac >> [ 0.424659] [c0000000de403a60] [c0000000024d7a00] >> mount_block_root+0x634/0x7c0 >> [ 0.424734] [c0000000de403be0] [c0000000024d8100] >> prepare_namespace+0x1ec/0x23c >> [ 0.424811] [c0000000de403c60] [c0000000024d7010] >> kernel_init_freeable+0x804/0x880 >> >> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all >> in process.c or in xmon. >> >> Maybe replacing the REG format string in KASLR mode would be sufficient? > > Whatever we decide to do here, it's not book3e-specific so it should be > considered separately from these patches. > OK, I will continue to work with this series. > -Scott > > > > . >