mbox series

[v3,0/6] implement KASLR for powerpc/fsl_booke/64

Message ID 20200206025825.22934-1-yanaijie@huawei.com (mailing list archive)
Headers show
Series implement KASLR for powerpc/fsl_booke/64 | expand

Message

Jason Yan Feb. 6, 2020, 2:58 a.m. UTC
This is a try to implement KASLR for Freescale BookE64 which is based on
my earlier implementation for Freescale BookE32:
https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718

The implementation for Freescale BookE64 is similar as BookE32. One
difference is that Freescale BookE64 set up a TLB mapping of 1G during
booting. Another difference is that ppc64 needs the kernel to be
64K-aligned. So we can randomize the kernel in this 1G mapping and make
it 64K-aligned. This can save some code to creat another TLB map at
early boot. The disadvantage is that we only have about 1G/64K = 16384
slots to put the kernel in.

    KERNELBASE

          64K                     |--> kernel <--|
           |                      |              |
        +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
        |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
        +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
        |                         |                        1G
        |----->   offset    <-----|

                              kernstart_virt_addr

I'm not sure if the slot numbers is enough or the design has any
defects. If you have some better ideas, I would be happy to hear that.

Thank you all.

v2->v3:
  Fix build error when KASLR is disabled.
v1->v2:
  Add __kaslr_offset for the secondary cpu boot up.

Jason Yan (6):
  powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
    kaslr_early_init()
  powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
  powerpc/fsl_booke/64: implement KASLR for fsl_booke64
  powerpc/fsl_booke/64: do not clear the BSS for the second pass
  powerpc/fsl_booke/64: clear the original kernel if randomized
  powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
    and add 64bit part

 .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 +++++++--
 arch/powerpc/Kconfig                          |  2 +-
 arch/powerpc/kernel/exceptions-64e.S          | 23 ++++++
 arch/powerpc/kernel/head_64.S                 | 14 ++++
 arch/powerpc/kernel/setup_64.c                |  4 +-
 arch/powerpc/mm/mmu_decl.h                    | 19 ++---
 arch/powerpc/mm/nohash/kaslr_booke.c          | 71 +++++++++++++------
 7 files changed, 132 insertions(+), 36 deletions(-)
 rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)

Comments

Jason Yan Feb. 13, 2020, 3 a.m. UTC | #1
Hi everyone, any comments or suggestions?

Thanks,
Jason

on 2020/2/6 10:58, Jason Yan wrote:
> This is a try to implement KASLR for Freescale BookE64 which is based on
> my earlier implementation for Freescale BookE32:
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
> 
> The implementation for Freescale BookE64 is similar as BookE32. One
> difference is that Freescale BookE64 set up a TLB mapping of 1G during
> booting. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> it 64K-aligned. This can save some code to creat another TLB map at
> early boot. The disadvantage is that we only have about 1G/64K = 16384
> slots to put the kernel in.
> 
>      KERNELBASE
> 
>            64K                     |--> kernel <--|
>             |                      |              |
>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>          |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>          |                         |                        1G
>          |----->   offset    <-----|
> 
>                                kernstart_virt_addr
> 
> I'm not sure if the slot numbers is enough or the design has any
> defects. If you have some better ideas, I would be happy to hear that.
> 
> Thank you all.
> 
> v2->v3:
>    Fix build error when KASLR is disabled.
> v1->v2:
>    Add __kaslr_offset for the secondary cpu boot up.
> 
> Jason Yan (6):
>    powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
>      kaslr_early_init()
>    powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>    powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>    powerpc/fsl_booke/64: do not clear the BSS for the second pass
>    powerpc/fsl_booke/64: clear the original kernel if randomized
>    powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
>      and add 64bit part
> 
>   .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 +++++++--
>   arch/powerpc/Kconfig                          |  2 +-
>   arch/powerpc/kernel/exceptions-64e.S          | 23 ++++++
>   arch/powerpc/kernel/head_64.S                 | 14 ++++
>   arch/powerpc/kernel/setup_64.c                |  4 +-
>   arch/powerpc/mm/mmu_decl.h                    | 19 ++---
>   arch/powerpc/mm/nohash/kaslr_booke.c          | 71 +++++++++++++------
>   7 files changed, 132 insertions(+), 36 deletions(-)
>   rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
>
Jason Yan Feb. 20, 2020, 3:33 a.m. UTC | #2
ping...

on 2020/2/13 11:00, Jason Yan wrote:
> Hi everyone, any comments or suggestions?
> 
> Thanks,
> Jason
> 
> on 2020/2/6 10:58, Jason Yan wrote:
>> This is a try to implement KASLR for Freescale BookE64 which is based on
>> my earlier implementation for Freescale BookE32:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>
>> The implementation for Freescale BookE64 is similar as BookE32. One
>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>> booting. Another difference is that ppc64 needs the kernel to be
>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>> it 64K-aligned. This can save some code to creat another TLB map at
>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>> slots to put the kernel in.
>>
>>      KERNELBASE
>>
>>            64K                     |--> kernel <--|
>>             |                      |              |
>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>          |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>          |                         |                        1G
>>          |----->   offset    <-----|
>>
>>                                kernstart_virt_addr
>>
>> I'm not sure if the slot numbers is enough or the design has any
>> defects. If you have some better ideas, I would be happy to hear that.
>>
>> Thank you all.
>>
>> v2->v3:
>>    Fix build error when KASLR is disabled.
>> v1->v2:
>>    Add __kaslr_offset for the secondary cpu boot up.
>>
>> Jason Yan (6):
>>    powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
>>      kaslr_early_init()
>>    powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>>    powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>>    powerpc/fsl_booke/64: do not clear the BSS for the second pass
>>    powerpc/fsl_booke/64: clear the original kernel if randomized
>>    powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
>>      and add 64bit part
>>
>>   .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 +++++++--
>>   arch/powerpc/Kconfig                          |  2 +-
>>   arch/powerpc/kernel/exceptions-64e.S          | 23 ++++++
>>   arch/powerpc/kernel/head_64.S                 | 14 ++++
>>   arch/powerpc/kernel/setup_64.c                |  4 +-
>>   arch/powerpc/mm/mmu_decl.h                    | 19 ++---
>>   arch/powerpc/mm/nohash/kaslr_booke.c          | 71 +++++++++++++------
>>   7 files changed, 132 insertions(+), 36 deletions(-)
>>   rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} 
>> (59%)
>>
> 
> 
> .
Daniel Axtens Feb. 26, 2020, 7:16 a.m. UTC | #3
Hi Jason,

> This is a try to implement KASLR for Freescale BookE64 which is based on
> my earlier implementation for Freescale BookE32:
> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>
> The implementation for Freescale BookE64 is similar as BookE32. One
> difference is that Freescale BookE64 set up a TLB mapping of 1G during
> booting. Another difference is that ppc64 needs the kernel to be
> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> it 64K-aligned. This can save some code to creat another TLB map at
> early boot. The disadvantage is that we only have about 1G/64K = 16384
> slots to put the kernel in.
>
>     KERNELBASE
>
>           64K                     |--> kernel <--|
>            |                      |              |
>         +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>         |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>         +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>         |                         |                        1G
>         |----->   offset    <-----|
>
>                               kernstart_virt_addr
>
> I'm not sure if the slot numbers is enough or the design has any
> defects. If you have some better ideas, I would be happy to hear that.
>
> Thank you all.
>

Are you making any attempt to hide kernel address leaks in this series?
I've just been looking at the stackdump code just now, and it directly
prints link registers and stack pointers, which is probably enough to
determine the kernel base address:

                  SPs:               LRs:             %pS pointer
[    0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable)
[    0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
[    0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0
[    0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c
[    0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880

git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
in process.c or in xmon.

Maybe replacing the REG format string in KASLR mode would be sufficient?

Regards,
Daniel


> v2->v3:
>   Fix build error when KASLR is disabled.
> v1->v2:
>   Add __kaslr_offset for the secondary cpu boot up.
>
> Jason Yan (6):
>   powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
>     kaslr_early_init()
>   powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>   powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>   powerpc/fsl_booke/64: do not clear the BSS for the second pass
>   powerpc/fsl_booke/64: clear the original kernel if randomized
>   powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
>     and add 64bit part
>
>  .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 +++++++--
>  arch/powerpc/Kconfig                          |  2 +-
>  arch/powerpc/kernel/exceptions-64e.S          | 23 ++++++
>  arch/powerpc/kernel/head_64.S                 | 14 ++++
>  arch/powerpc/kernel/setup_64.c                |  4 +-
>  arch/powerpc/mm/mmu_decl.h                    | 19 ++---
>  arch/powerpc/mm/nohash/kaslr_booke.c          | 71 +++++++++++++------
>  7 files changed, 132 insertions(+), 36 deletions(-)
>  rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
>
> -- 
> 2.17.2
Jason Yan Feb. 26, 2020, 8:18 a.m. UTC | #4
Hi Daniel,

在 2020/2/26 15:16, Daniel Axtens 写道:
> Hi Jason,
> 
>> This is a try to implement KASLR for Freescale BookE64 which is based on
>> my earlier implementation for Freescale BookE32:
>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>
>> The implementation for Freescale BookE64 is similar as BookE32. One
>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>> booting. Another difference is that ppc64 needs the kernel to be
>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>> it 64K-aligned. This can save some code to creat another TLB map at
>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>> slots to put the kernel in.
>>
>>      KERNELBASE
>>
>>            64K                     |--> kernel <--|
>>             |                      |              |
>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>          |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>          |                         |                        1G
>>          |----->   offset    <-----|
>>
>>                                kernstart_virt_addr
>>
>> I'm not sure if the slot numbers is enough or the design has any
>> defects. If you have some better ideas, I would be happy to hear that.
>>
>> Thank you all.
>>
> 
> Are you making any attempt to hide kernel address leaks in this series?

Yes.

> I've just been looking at the stackdump code just now, and it directly
> prints link registers and stack pointers, which is probably enough to
> determine the kernel base address:
> 
>                    SPs:               LRs:             %pS pointer
> [    0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable)
> [    0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
> [    0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0
> [    0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c
> [    0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880
> 
> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
> in process.c or in xmon.
> 

Thanks for reminding this.

> Maybe replacing the REG format string in KASLR mode would be sufficient?
> 

Most archs have removed the address printing when dumping stack. Do we 
really have to print this?

If we have to do this, maybe we can use "%pK" so that they will be 
hidden from unprivileged users.

Thanks,
Jason

> Regards,
> Daniel
> 
> 
>> v2->v3:
>>    Fix build error when KASLR is disabled.
>> v1->v2:
>>    Add __kaslr_offset for the secondary cpu boot up.
>>
>> Jason Yan (6):
>>    powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
>>      kaslr_early_init()
>>    powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>>    powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>>    powerpc/fsl_booke/64: do not clear the BSS for the second pass
>>    powerpc/fsl_booke/64: clear the original kernel if randomized
>>    powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
>>      and add 64bit part
>>
>>   .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 +++++++--
>>   arch/powerpc/Kconfig                          |  2 +-
>>   arch/powerpc/kernel/exceptions-64e.S          | 23 ++++++
>>   arch/powerpc/kernel/head_64.S                 | 14 ++++
>>   arch/powerpc/kernel/setup_64.c                |  4 +-
>>   arch/powerpc/mm/mmu_decl.h                    | 19 ++---
>>   arch/powerpc/mm/nohash/kaslr_booke.c          | 71 +++++++++++++------
>>   7 files changed, 132 insertions(+), 36 deletions(-)
>>   rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
>>
>> -- 
>> 2.17.2
> 
> .
>
Daniel Axtens Feb. 26, 2020, 11:41 a.m. UTC | #5
Jason Yan <yanaijie@huawei.com> writes:

> Hi Daniel,
>
> 在 2020/2/26 15:16, Daniel Axtens 写道:
>> Hi Jason,
>> 
>>> This is a try to implement KASLR for Freescale BookE64 which is based on
>>> my earlier implementation for Freescale BookE32:
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>>
>>> The implementation for Freescale BookE64 is similar as BookE32. One
>>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>>> booting. Another difference is that ppc64 needs the kernel to be
>>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>>> it 64K-aligned. This can save some code to creat another TLB map at
>>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>>> slots to put the kernel in.
>>>
>>>      KERNELBASE
>>>
>>>            64K                     |--> kernel <--|
>>>             |                      |              |
>>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>>          |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>>          |                         |                        1G
>>>          |----->   offset    <-----|
>>>
>>>                                kernstart_virt_addr
>>>
>>> I'm not sure if the slot numbers is enough or the design has any
>>> defects. If you have some better ideas, I would be happy to hear that.
>>>
>>> Thank you all.
>>>
>> 
>> Are you making any attempt to hide kernel address leaks in this series?
>
> Yes.
>
>> I've just been looking at the stackdump code just now, and it directly
>> prints link registers and stack pointers, which is probably enough to
>> determine the kernel base address:
>> 
>>                    SPs:               LRs:             %pS pointer
>> [    0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154 (unreliable)
>> [    0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
>> [    0.424659] [c0000000de403a60] [c0000000024d7a00] mount_block_root+0x634/0x7c0
>> [    0.424734] [c0000000de403be0] [c0000000024d8100] prepare_namespace+0x1ec/0x23c
>> [    0.424811] [c0000000de403c60] [c0000000024d7010] kernel_init_freeable+0x804/0x880
>> 
>> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
>> in process.c or in xmon.
>> 
>
> Thanks for reminding this.
>
>> Maybe replacing the REG format string in KASLR mode would be sufficient?
>> 
>
> Most archs have removed the address printing when dumping stack. Do we 
> really have to print this?
>
> If we have to do this, maybe we can use "%pK" so that they will be 
> hidden from unprivileged users.

I suspect that you will find it easier to convince people to accept a
change to %pK than removal :)

BTW, I have a T4240RDB so I might be able to test this series at some
point - do I need an updated bootloader to pass in a random seed, or is
the kernel able to get enough randomness by itself? (Sorry if this is
explained elsewhere in the series, I have only skimmed it lightly!)

Regards,
Daniel
>
> Thanks,
> Jason
>
>> Regards,
>> Daniel
>> 
>> 
>>> v2->v3:
>>>    Fix build error when KASLR is disabled.
>>> v1->v2:
>>>    Add __kaslr_offset for the secondary cpu boot up.
>>>
>>> Jason Yan (6):
>>>    powerpc/fsl_booke/kaslr: refactor kaslr_legal_offset() and
>>>      kaslr_early_init()
>>>    powerpc/fsl_booke/64: introduce reloc_kernel_entry() helper
>>>    powerpc/fsl_booke/64: implement KASLR for fsl_booke64
>>>    powerpc/fsl_booke/64: do not clear the BSS for the second pass
>>>    powerpc/fsl_booke/64: clear the original kernel if randomized
>>>    powerpc/fsl_booke/kaslr: rename kaslr-booke32.rst to kaslr-booke.rst
>>>      and add 64bit part
>>>
>>>   .../{kaslr-booke32.rst => kaslr-booke.rst}    | 35 +++++++--
>>>   arch/powerpc/Kconfig                          |  2 +-
>>>   arch/powerpc/kernel/exceptions-64e.S          | 23 ++++++
>>>   arch/powerpc/kernel/head_64.S                 | 14 ++++
>>>   arch/powerpc/kernel/setup_64.c                |  4 +-
>>>   arch/powerpc/mm/mmu_decl.h                    | 19 ++---
>>>   arch/powerpc/mm/nohash/kaslr_booke.c          | 71 +++++++++++++------
>>>   7 files changed, 132 insertions(+), 36 deletions(-)
>>>   rename Documentation/powerpc/{kaslr-booke32.rst => kaslr-booke.rst} (59%)
>>>
>>> -- 
>>> 2.17.2
>> 
>> .
>>
Jason Yan Feb. 27, 2020, 1:55 a.m. UTC | #6
在 2020/2/26 19:41, Daniel Axtens 写道:
> I suspect that you will find it easier to convince people to accept a
> change to %pK than removal:)
> 
> BTW, I have a T4240RDB so I might be able to test this series at some
> point - do I need an updated bootloader to pass in a random seed, or is
> the kernel able to get enough randomness by itself? (Sorry if this is
> explained elsewhere in the series, I have only skimmed it lightly!)
> 

Thanks. It will be great if you have time to test this series.

You do not need an updated bootloader becuase the kernel is
using timer base as a simple source of entropy. This is enough for
testing the KASLR code itself.

Jason

> Regards,
> Daniel
Crystal Wood Feb. 28, 2020, 5:53 a.m. UTC | #7
On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
> Hi Daniel,
> 
> 在 2020/2/26 15:16, Daniel Axtens 写道:
> > Hi Jason,
> > 
> > > This is a try to implement KASLR for Freescale BookE64 which is based on
> > > my earlier implementation for Freescale BookE32:
> > > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
> > > 
> > > The implementation for Freescale BookE64 is similar as BookE32. One
> > > difference is that Freescale BookE64 set up a TLB mapping of 1G during
> > > booting. Another difference is that ppc64 needs the kernel to be
> > > 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> > > it 64K-aligned. This can save some code to creat another TLB map at
> > > early boot. The disadvantage is that we only have about 1G/64K = 16384
> > > slots to put the kernel in.
> > > 
> > >      KERNELBASE
> > > 
> > >            64K                     |--> kernel <--|
> > >             |                      |              |
> > >          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
> > >          |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
> > >          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
> > >          |                         |                        1G
> > >          |----->   offset    <-----|
> > > 
> > >                                kernstart_virt_addr
> > > 
> > > I'm not sure if the slot numbers is enough or the design has any
> > > defects. If you have some better ideas, I would be happy to hear that.
> > > 
> > > Thank you all.
> > > 
> > 
> > Are you making any attempt to hide kernel address leaks in this series?
> 
> Yes.
> 
> > I've just been looking at the stackdump code just now, and it directly
> > prints link registers and stack pointers, which is probably enough to
> > determine the kernel base address:
> > 
> >                    SPs:               LRs:             %pS pointer
> > [    0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154
> > (unreliable)
> > [    0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
> > [    0.424659] [c0000000de403a60] [c0000000024d7a00]
> > mount_block_root+0x634/0x7c0
> > [    0.424734] [c0000000de403be0] [c0000000024d8100]
> > prepare_namespace+0x1ec/0x23c
> > [    0.424811] [c0000000de403c60] [c0000000024d7010]
> > kernel_init_freeable+0x804/0x880
> > 
> > git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
> > in process.c or in xmon.
> > 
> 
> Thanks for reminding this.
> 
> > Maybe replacing the REG format string in KASLR mode would be sufficient?
> > 
> 
> Most archs have removed the address printing when dumping stack. Do we 
> really have to print this?
> 
> If we have to do this, maybe we can use "%pK" so that they will be 
> hidden from unprivileged users.

I've found the addresses to be useful, especially if I had a way to dump the
stack data itself.  Wouldn't the register dump also be likely to give away the
addresses?

I don't see any debug setting for %pK (or %p) to always print the actual
address (closest is kptr_restrict=1 but that only works in certain
contexts)... from looking at the code it seems it hashes even if kaslr is
entirely disabled?  Or am I missing something?

-Scott
Jason Yan Feb. 28, 2020, 6:47 a.m. UTC | #8
在 2020/2/28 13:53, Scott Wood 写道:
> On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
>> Hi Daniel,
>>
>> 在 2020/2/26 15:16, Daniel Axtens 写道:
>>> Hi Jason,
>>>
>>>> This is a try to implement KASLR for Freescale BookE64 which is based on
>>>> my earlier implementation for Freescale BookE32:
>>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>>>
>>>> The implementation for Freescale BookE64 is similar as BookE32. One
>>>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>>>> booting. Another difference is that ppc64 needs the kernel to be
>>>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>>>> it 64K-aligned. This can save some code to creat another TLB map at
>>>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>>>> slots to put the kernel in.
>>>>
>>>>       KERNELBASE
>>>>
>>>>             64K                     |--> kernel <--|
>>>>              |                      |              |
>>>>           +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>>>           |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>>>>           +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>>>           |                         |                        1G
>>>>           |----->   offset    <-----|
>>>>
>>>>                                 kernstart_virt_addr
>>>>
>>>> I'm not sure if the slot numbers is enough or the design has any
>>>> defects. If you have some better ideas, I would be happy to hear that.
>>>>
>>>> Thank you all.
>>>>
>>>
>>> Are you making any attempt to hide kernel address leaks in this series?
>>
>> Yes.
>>
>>> I've just been looking at the stackdump code just now, and it directly
>>> prints link registers and stack pointers, which is probably enough to
>>> determine the kernel base address:
>>>
>>>                     SPs:               LRs:             %pS pointer
>>> [    0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154
>>> (unreliable)
>>> [    0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
>>> [    0.424659] [c0000000de403a60] [c0000000024d7a00]
>>> mount_block_root+0x634/0x7c0
>>> [    0.424734] [c0000000de403be0] [c0000000024d8100]
>>> prepare_namespace+0x1ec/0x23c
>>> [    0.424811] [c0000000de403c60] [c0000000024d7010]
>>> kernel_init_freeable+0x804/0x880
>>>
>>> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
>>> in process.c or in xmon.
>>>
>>
>> Thanks for reminding this.
>>
>>> Maybe replacing the REG format string in KASLR mode would be sufficient?
>>>
>>
>> Most archs have removed the address printing when dumping stack. Do we
>> really have to print this?
>>
>> If we have to do this, maybe we can use "%pK" so that they will be
>> hidden from unprivileged users.
> 
> I've found the addresses to be useful, especially if I had a way to dump the
> stack data itself.  Wouldn't the register dump also be likely to give away the
> addresses?

If we have to print the address, then kptr_restrict and dmesg_restrict
must be set properly so that unprivileged users cannot see them.

> 
> I don't see any debug setting for %pK (or %p) to always print the actual
> address (closest is kptr_restrict=1 but that only works in certain
> contexts)... from looking at the code it seems it hashes even if kaslr is
> entirely disabled?  Or am I missing something?
> 

Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
we want the real value of the address, we cannot use it. But if you only
want to distinguish if two pointers are the same, it's ok.

> -Scott
> 
> 
> 
> .
>
Crystal Wood Feb. 29, 2020, 4:28 a.m. UTC | #9
On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
> 
> 在 2020/2/28 13:53, Scott Wood 写道:
> > On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
> > > Hi Daniel,
> > > 
> > > 在 2020/2/26 15:16, Daniel Axtens 写道:
> > > > Maybe replacing the REG format string in KASLR mode would be
> > > > sufficient?
> > > > 
> > > 
> > > Most archs have removed the address printing when dumping stack. Do we
> > > really have to print this?
> > > 
> > > If we have to do this, maybe we can use "%pK" so that they will be
> > > hidden from unprivileged users.
> > 
> > I've found the addresses to be useful, especially if I had a way to dump
> > the
> > stack data itself.  Wouldn't the register dump also be likely to give away
> > the
> > addresses?
> 
> If we have to print the address, then kptr_restrict and dmesg_restrict
> must be set properly so that unprivileged users cannot see them.

And how does that work with crash dumps that could be from any context?

dmesg_restrict is irrelevant as it just controls who can see the dmesg, not
what goes into it.  kptr_restrict=1 will only get the value if you're not in
any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG. 
No other value of kptr_restrict will ever get you the raw value.

> > 
> > I don't see any debug setting for %pK (or %p) to always print the actual
> > address (closest is kptr_restrict=1 but that only works in certain
> > contexts)... from looking at the code it seems it hashes even if kaslr is
> > entirely disabled?  Or am I missing something?
> > 
> 
> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
> we want the real value of the address, we cannot use it. But if you only
> want to distinguish if two pointers are the same, it's ok.

Am I the only one that finds this a bit crazy?  If you want to lock a system
down then fine, but why wage war on debugging even when there's no
randomization going on?  Comparing two pointers for equality is not always
adequate.

-Scott
Jason Yan Feb. 29, 2020, 7:27 a.m. UTC | #10
在 2020/2/29 12:28, Scott Wood 写道:
> On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
>>
>> 在 2020/2/28 13:53, Scott Wood 写道:
>>> On Wed, 2020-02-26 at 16:18 +0800, Jason Yan wrote:
>>>> Hi Daniel,
>>>>
>>>> 在 2020/2/26 15:16, Daniel Axtens 写道:
>>>>> Maybe replacing the REG format string in KASLR mode would be
>>>>> sufficient?
>>>>>
>>>>
>>>> Most archs have removed the address printing when dumping stack. Do we
>>>> really have to print this?
>>>>
>>>> If we have to do this, maybe we can use "%pK" so that they will be
>>>> hidden from unprivileged users.
>>>
>>> I've found the addresses to be useful, especially if I had a way to dump
>>> the
>>> stack data itself.  Wouldn't the register dump also be likely to give away
>>> the
>>> addresses?
>>
>> If we have to print the address, then kptr_restrict and dmesg_restrict
>> must be set properly so that unprivileged users cannot see them.
> 
> And how does that work with crash dumps that could be from any context?
> 
> dmesg_restrict is irrelevant as it just controls who can see the dmesg, not
> what goes into it.  kptr_restrict=1 will only get the value if you're not in
> any sort of IRQ, *and* if the crashing context happened to have CAP_SYSLOG.
> No other value of kptr_restrict will ever get you the raw value.
>
>>>
>>> I don't see any debug setting for %pK (or %p) to always print the actual
>>> address (closest is kptr_restrict=1 but that only works in certain
>>> contexts)... from looking at the code it seems it hashes even if kaslr is
>>> entirely disabled?  Or am I missing something?
>>>
>>
>> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
>> we want the real value of the address, we cannot use it. But if you only
>> want to distinguish if two pointers are the same, it's ok.
> 
> Am I the only one that finds this a bit crazy?  If you want to lock a system
> down then fine, but why wage war on debugging even when there's no
> randomization going on?  Comparing two pointers for equality is not always
> adequate.
> 

AFAIK, %p hashing is only exist because of many legacy address printings
and force who really want the raw values to switch to %px or even %lx.
It's not the opposite of debugging. Raw address printing is not
forbidden, only people need to estimate the risk of adrdress leaks.

Turnning to %p may not be a good idea in this situation. So
for the REG logs printed when dumping stack, we can disable it when
KASLR is open. For the REG logs in other places like show_regs(), only
privileged can trigger it, and they are not combind with a symbol, so
I think it's ok to keep them.

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index fad50db9dcf2..659c51f0739a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned 
long *stack)
                 newsp = stack[0];
                 ip = stack[STACK_FRAME_LR_SAVE];
                 if (!firstframe || ip != lr) {
-                       printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
+                       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
+                               printk("%pS", (void *)ip);
+                       else
+                               printk("["REG"] ["REG"] %pS", sp, ip, 
(void *)ip);
  #ifdef CONFIG_FUNCTION_GRAPH_TRACER
                         ret_addr = ftrace_graph_ret_addr(current,
                                                 &ftrace_idx, ip, stack);


Thanks,
Jason

> -Scott
> 
> 
> 
> .
>
Crystal Wood Feb. 29, 2020, 10:54 p.m. UTC | #11
On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
> 
> 在 2020/2/29 12:28, Scott Wood 写道:
> > On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
> > > 
> > > 在 2020/2/28 13:53, Scott Wood 写道:
> > > > 
> > > > I don't see any debug setting for %pK (or %p) to always print the
> > > > actual
> > > > address (closest is kptr_restrict=1 but that only works in certain
> > > > contexts)... from looking at the code it seems it hashes even if kaslr
> > > > is
> > > > entirely disabled?  Or am I missing something?
> > > > 
> > > 
> > > Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
> > > we want the real value of the address, we cannot use it. But if you only
> > > want to distinguish if two pointers are the same, it's ok.
> > 
> > Am I the only one that finds this a bit crazy?  If you want to lock a
> > system
> > down then fine, but why wage war on debugging even when there's no
> > randomization going on?  Comparing two pointers for equality is not always
> > adequate.
> > 
> 
> AFAIK, %p hashing is only exist because of many legacy address printings
> and force who really want the raw values to switch to %px or even %lx.
> It's not the opposite of debugging. Raw address printing is not
> forbidden, only people need to estimate the risk of adrdress leaks.

Yes, but I don't see any format specifier to switch to that will hash in a
randomized production environment, but not in a debug or other non-randomized
environment which seems like the ideal default for most debug output.

> 
> Turnning to %p may not be a good idea in this situation. So
> for the REG logs printed when dumping stack, we can disable it when
> KASLR is open. For the REG logs in other places like show_regs(), only
> privileged can trigger it, and they are not combind with a symbol, so
> I think it's ok to keep them.
> 
> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
> index fad50db9dcf2..659c51f0739a 100644
> --- a/arch/powerpc/kernel/process.c
> +++ b/arch/powerpc/kernel/process.c
> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned 
> long *stack)
>                  newsp = stack[0];
>                  ip = stack[STACK_FRAME_LR_SAVE];
>                  if (!firstframe || ip != lr) {
> -                       printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
> +                       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> +                               printk("%pS", (void *)ip);
> +                       else
> +                               printk("["REG"] ["REG"] %pS", sp, ip, 
> (void *)ip);

This doesn't deal with "nokaslr" on the kernel command line.  It also doesn't
seem like something that every callsite should have to opencode, versus having
an appropriate format specifier behaves as I described above (and I still
don't see why that format specifier should not be "%p").

-Scott
Jason Yan March 2, 2020, 2:17 a.m. UTC | #12
在 2020/3/1 6:54, Scott Wood 写道:
> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
>>
>> 在 2020/2/29 12:28, Scott Wood 写道:
>>> On Fri, 2020-02-28 at 14:47 +0800, Jason Yan wrote:
>>>>
>>>> 在 2020/2/28 13:53, Scott Wood 写道:
>>>>>
>>>>> I don't see any debug setting for %pK (or %p) to always print the
>>>>> actual
>>>>> address (closest is kptr_restrict=1 but that only works in certain
>>>>> contexts)... from looking at the code it seems it hashes even if kaslr
>>>>> is
>>>>> entirely disabled?  Or am I missing something?
>>>>>
>>>>
>>>> Yes, %pK (or %p) always hashes whether kaslr is disabled or not. So if
>>>> we want the real value of the address, we cannot use it. But if you only
>>>> want to distinguish if two pointers are the same, it's ok.
>>>
>>> Am I the only one that finds this a bit crazy?  If you want to lock a
>>> system
>>> down then fine, but why wage war on debugging even when there's no
>>> randomization going on?  Comparing two pointers for equality is not always
>>> adequate.
>>>
>>
>> AFAIK, %p hashing is only exist because of many legacy address printings
>> and force who really want the raw values to switch to %px or even %lx.
>> It's not the opposite of debugging. Raw address printing is not
>> forbidden, only people need to estimate the risk of adrdress leaks.
> 
> Yes, but I don't see any format specifier to switch to that will hash in a
> randomized production environment, but not in a debug or other non-randomized
> environment which seems like the ideal default for most debug output.
> 

Sorry I have no idea why there is no format specifier considered for 
switching of randomized or non-randomized environment. May they think 
that raw address should not leak in non-randomized environment too. May 
be Kees or Tobin can answer this question.

Kees? Tobin?

>>
>> Turnning to %p may not be a good idea in this situation. So
>> for the REG logs printed when dumping stack, we can disable it when
>> KASLR is open. For the REG logs in other places like show_regs(), only
>> privileged can trigger it, and they are not combind with a symbol, so
>> I think it's ok to keep them.
>>
>> diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
>> index fad50db9dcf2..659c51f0739a 100644
>> --- a/arch/powerpc/kernel/process.c
>> +++ b/arch/powerpc/kernel/process.c
>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned
>> long *stack)
>>                   newsp = stack[0];
>>                   ip = stack[STACK_FRAME_LR_SAVE];
>>                   if (!firstframe || ip != lr) {
>> -                       printk("["REG"] ["REG"] %pS", sp, ip, (void *)ip);
>> +                       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>> +                               printk("%pS", (void *)ip);
>> +                       else
>> +                               printk("["REG"] ["REG"] %pS", sp, ip,
>> (void *)ip);
> 
> This doesn't deal with "nokaslr" on the kernel command line.  It also doesn't
> seem like something that every callsite should have to opencode, versus having
> an appropriate format specifier behaves as I described above (and I still
> don't see why that format specifier should not be "%p").
> 

Actually I still do not understand why we should print the raw value 
here. When KALLSYMS is enabled we have symbol name  and  offset like 
put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw 
address.

> -Scott
> 
> 
> 
> .
>
Crystal Wood March 2, 2020, 3:24 a.m. UTC | #13
On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
> 
> 在 2020/3/1 6:54, Scott Wood 写道:
> > On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
> > > 
> > > Turnning to %p may not be a good idea in this situation. So
> > > for the REG logs printed when dumping stack, we can disable it when
> > > KASLR is open. For the REG logs in other places like show_regs(), only
> > > privileged can trigger it, and they are not combind with a symbol, so
> > > I think it's ok to keep them.
> > > 
> > > diff --git a/arch/powerpc/kernel/process.c
> > > b/arch/powerpc/kernel/process.c
> > > index fad50db9dcf2..659c51f0739a 100644
> > > --- a/arch/powerpc/kernel/process.c
> > > +++ b/arch/powerpc/kernel/process.c
> > > @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned
> > > long *stack)
> > >                   newsp = stack[0];
> > >                   ip = stack[STACK_FRAME_LR_SAVE];
> > >                   if (!firstframe || ip != lr) {
> > > -                       printk("["REG"] ["REG"] %pS", sp, ip, (void
> > > *)ip);
> > > +                       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> > > +                               printk("%pS", (void *)ip);
> > > +                       else
> > > +                               printk("["REG"] ["REG"] %pS", sp, ip,
> > > (void *)ip);
> > 
> > This doesn't deal with "nokaslr" on the kernel command line.  It also
> > doesn't
> > seem like something that every callsite should have to opencode, versus
> > having
> > an appropriate format specifier behaves as I described above (and I still
> > don't see why that format specifier should not be "%p").
> > 
> 
> Actually I still do not understand why we should print the raw value 
> here. When KALLSYMS is enabled we have symbol name  and  offset like 
> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw 
> address.

I'm more concerned about the stack address for wading through a raw stack dump
(to find function call arguments, etc).  The return address does help confirm
that I'm on the right stack frame though, and also makes looking up a line
number slightly easier than having to look up a symbol address and then add
the offset (at least for non-module addresses).

As a random aside, the mismatch between Linux printing a hex offset and GDB
using decimal in disassembly is annoying...

-Scott
Jason Yan March 2, 2020, 7:12 a.m. UTC | #14
在 2020/3/2 11:24, Scott Wood 写道:
> On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
>>
>> 在 2020/3/1 6:54, Scott Wood 写道:
>>> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
>>>>
>>>> Turnning to %p may not be a good idea in this situation. So
>>>> for the REG logs printed when dumping stack, we can disable it when
>>>> KASLR is open. For the REG logs in other places like show_regs(), only
>>>> privileged can trigger it, and they are not combind with a symbol, so
>>>> I think it's ok to keep them.
>>>>
>>>> diff --git a/arch/powerpc/kernel/process.c
>>>> b/arch/powerpc/kernel/process.c
>>>> index fad50db9dcf2..659c51f0739a 100644
>>>> --- a/arch/powerpc/kernel/process.c
>>>> +++ b/arch/powerpc/kernel/process.c
>>>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk, unsigned
>>>> long *stack)
>>>>                    newsp = stack[0];
>>>>                    ip = stack[STACK_FRAME_LR_SAVE];
>>>>                    if (!firstframe || ip != lr) {
>>>> -                       printk("["REG"] ["REG"] %pS", sp, ip, (void
>>>> *)ip);
>>>> +                       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>>>> +                               printk("%pS", (void *)ip);
>>>> +                       else
>>>> +                               printk("["REG"] ["REG"] %pS", sp, ip,
>>>> (void *)ip);
>>>
>>> This doesn't deal with "nokaslr" on the kernel command line.  It also
>>> doesn't
>>> seem like something that every callsite should have to opencode, versus
>>> having
>>> an appropriate format specifier behaves as I described above (and I still
>>> don't see why that format specifier should not be "%p").
>>>
>>
>> Actually I still do not understand why we should print the raw value
>> here. When KALLSYMS is enabled we have symbol name  and  offset like
>> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
>> address.
> 
> I'm more concerned about the stack address for wading through a raw stack dump
> (to find function call arguments, etc).  The return address does help confirm
> that I'm on the right stack frame though, and also makes looking up a line
> number slightly easier than having to look up a symbol address and then add
> the offset (at least for non-module addresses).
> 
> As a random aside, the mismatch between Linux printing a hex offset and GDB
> using decimal in disassembly is annoying...
> 

OK, I will send a RFC patch to add a new format specifier such as "%pk" 
or change the exsiting "%pK" to print raw value of addresses when KASLR 
is disabled and print hash value of addresses when KASLR is enabled. 
Let's see what the printk guys would say :)


> -Scott
> 
> 
> 
> .
>
Crystal Wood March 2, 2020, 8:47 a.m. UTC | #15
On Mon, 2020-03-02 at 15:12 +0800, Jason Yan wrote:
> 
> 在 2020/3/2 11:24, Scott Wood 写道:
> > On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
> > > 
> > > 在 2020/3/1 6:54, Scott Wood 写道:
> > > > On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
> > > > > 
> > > > > Turnning to %p may not be a good idea in this situation. So
> > > > > for the REG logs printed when dumping stack, we can disable it when
> > > > > KASLR is open. For the REG logs in other places like show_regs(),
> > > > > only
> > > > > privileged can trigger it, and they are not combind with a symbol,
> > > > > so
> > > > > I think it's ok to keep them.
> > > > > 
> > > > > diff --git a/arch/powerpc/kernel/process.c
> > > > > b/arch/powerpc/kernel/process.c
> > > > > index fad50db9dcf2..659c51f0739a 100644
> > > > > --- a/arch/powerpc/kernel/process.c
> > > > > +++ b/arch/powerpc/kernel/process.c
> > > > > @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk,
> > > > > unsigned
> > > > > long *stack)
> > > > >                    newsp = stack[0];
> > > > >                    ip = stack[STACK_FRAME_LR_SAVE];
> > > > >                    if (!firstframe || ip != lr) {
> > > > > -                       printk("["REG"] ["REG"] %pS", sp, ip, (void
> > > > > *)ip);
> > > > > +                       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
> > > > > +                               printk("%pS", (void *)ip);
> > > > > +                       else
> > > > > +                               printk("["REG"] ["REG"] %pS", sp,
> > > > > ip,
> > > > > (void *)ip);
> > > > 
> > > > This doesn't deal with "nokaslr" on the kernel command line.  It also
> > > > doesn't
> > > > seem like something that every callsite should have to opencode,
> > > > versus
> > > > having
> > > > an appropriate format specifier behaves as I described above (and I
> > > > still
> > > > don't see why that format specifier should not be "%p").
> > > > 
> > > 
> > > Actually I still do not understand why we should print the raw value
> > > here. When KALLSYMS is enabled we have symbol name  and  offset like
> > > put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
> > > address.
> > 
> > I'm more concerned about the stack address for wading through a raw stack
> > dump
> > (to find function call arguments, etc).  The return address does help
> > confirm
> > that I'm on the right stack frame though, and also makes looking up a line
> > number slightly easier than having to look up a symbol address and then
> > add
> > the offset (at least for non-module addresses).
> > 
> > As a random aside, the mismatch between Linux printing a hex offset and
> > GDB
> > using decimal in disassembly is annoying...
> > 
> 
> OK, I will send a RFC patch to add a new format specifier such as "%pk" 
> or change the exsiting "%pK" to print raw value of addresses when KASLR 
> is disabled and print hash value of addresses when KASLR is enabled. 
> Let's see what the printk guys would say :)

I'm not sure that a new format specifier is needed versus changing the
behavior of "%p", and "%pK" definitely doesn't seem suitable given that it's
intended to be more restricted than "%p" (see commit ef0010a30935de4).  The
question is whether there is a legitimate reason to hash in the absence of
kaslr.

-Scott
Jason Yan March 2, 2020, 9:37 a.m. UTC | #16
在 2020/3/2 16:47, Scott Wood 写道:
> On Mon, 2020-03-02 at 15:12 +0800, Jason Yan wrote:
>>
>> 在 2020/3/2 11:24, Scott Wood 写道:
>>> On Mon, 2020-03-02 at 10:17 +0800, Jason Yan wrote:
>>>>
>>>> 在 2020/3/1 6:54, Scott Wood 写道:
>>>>> On Sat, 2020-02-29 at 15:27 +0800, Jason Yan wrote:
>>>>>>
>>>>>> Turnning to %p may not be a good idea in this situation. So
>>>>>> for the REG logs printed when dumping stack, we can disable it when
>>>>>> KASLR is open. For the REG logs in other places like show_regs(),
>>>>>> only
>>>>>> privileged can trigger it, and they are not combind with a symbol,
>>>>>> so
>>>>>> I think it's ok to keep them.
>>>>>>
>>>>>> diff --git a/arch/powerpc/kernel/process.c
>>>>>> b/arch/powerpc/kernel/process.c
>>>>>> index fad50db9dcf2..659c51f0739a 100644
>>>>>> --- a/arch/powerpc/kernel/process.c
>>>>>> +++ b/arch/powerpc/kernel/process.c
>>>>>> @@ -2068,7 +2068,10 @@ void show_stack(struct task_struct *tsk,
>>>>>> unsigned
>>>>>> long *stack)
>>>>>>                     newsp = stack[0];
>>>>>>                     ip = stack[STACK_FRAME_LR_SAVE];
>>>>>>                     if (!firstframe || ip != lr) {
>>>>>> -                       printk("["REG"] ["REG"] %pS", sp, ip, (void
>>>>>> *)ip);
>>>>>> +                       if (IS_ENABLED(CONFIG_RANDOMIZE_BASE))
>>>>>> +                               printk("%pS", (void *)ip);
>>>>>> +                       else
>>>>>> +                               printk("["REG"] ["REG"] %pS", sp,
>>>>>> ip,
>>>>>> (void *)ip);
>>>>>
>>>>> This doesn't deal with "nokaslr" on the kernel command line.  It also
>>>>> doesn't
>>>>> seem like something that every callsite should have to opencode,
>>>>> versus
>>>>> having
>>>>> an appropriate format specifier behaves as I described above (and I
>>>>> still
>>>>> don't see why that format specifier should not be "%p").
>>>>>
>>>>
>>>> Actually I still do not understand why we should print the raw value
>>>> here. When KALLSYMS is enabled we have symbol name  and  offset like
>>>> put_cred_rcu+0x108/0x110, and when KALLSYMS is disabled we have the raw
>>>> address.
>>>
>>> I'm more concerned about the stack address for wading through a raw stack
>>> dump
>>> (to find function call arguments, etc).  The return address does help
>>> confirm
>>> that I'm on the right stack frame though, and also makes looking up a line
>>> number slightly easier than having to look up a symbol address and then
>>> add
>>> the offset (at least for non-module addresses).
>>>
>>> As a random aside, the mismatch between Linux printing a hex offset and
>>> GDB
>>> using decimal in disassembly is annoying...
>>>
>>
>> OK, I will send a RFC patch to add a new format specifier such as "%pk"
>> or change the exsiting "%pK" to print raw value of addresses when KASLR
>> is disabled and print hash value of addresses when KASLR is enabled.
>> Let's see what the printk guys would say :)
> 
> I'm not sure that a new format specifier is needed versus changing the
> behavior of "%p", and "%pK" definitely doesn't seem suitable given that it's
> intended to be more restricted than "%p" (see commit ef0010a30935de4).  The
> question is whether there is a legitimate reason to hash in the absence of
> kaslr.
> 

The problem is that if we change the behavior of "%p", we have to turn
all exsiting "%p" to "%pK". Hashing is still reasonable when there is no
kaslr because some architectures support randomize at build time such as 
arm64.


> -Scott
> 
> 
> 
> .
>
Crystal Wood March 4, 2020, 9:21 p.m. UTC | #17
On Wed, 2020-02-26 at 18:16 +1100, Daniel Axtens wrote:
> Hi Jason,
> 
> > This is a try to implement KASLR for Freescale BookE64 which is based on
> > my earlier implementation for Freescale BookE32:
> > https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
> > 
> > The implementation for Freescale BookE64 is similar as BookE32. One
> > difference is that Freescale BookE64 set up a TLB mapping of 1G during
> > booting. Another difference is that ppc64 needs the kernel to be
> > 64K-aligned. So we can randomize the kernel in this 1G mapping and make
> > it 64K-aligned. This can save some code to creat another TLB map at
> > early boot. The disadvantage is that we only have about 1G/64K = 16384
> > slots to put the kernel in.
> > 
> >     KERNELBASE
> > 
> >           64K                     |--> kernel <--|
> >            |                      |              |
> >         +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
> >         |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
> >         +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
> >         |                         |                        1G
> >         |----->   offset    <-----|
> > 
> >                               kernstart_virt_addr
> > 
> > I'm not sure if the slot numbers is enough or the design has any
> > defects. If you have some better ideas, I would be happy to hear that.
> > 
> > Thank you all.
> > 
> 
> Are you making any attempt to hide kernel address leaks in this series?
> I've just been looking at the stackdump code just now, and it directly
> prints link registers and stack pointers, which is probably enough to
> determine the kernel base address:
> 
>                   SPs:               LRs:             %pS pointer
> [    0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154
> (unreliable)
> [    0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
> [    0.424659] [c0000000de403a60] [c0000000024d7a00]
> mount_block_root+0x634/0x7c0
> [    0.424734] [c0000000de403be0] [c0000000024d8100]
> prepare_namespace+0x1ec/0x23c
> [    0.424811] [c0000000de403c60] [c0000000024d7010]
> kernel_init_freeable+0x804/0x880
> 
> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
> in process.c or in xmon.
> 
> Maybe replacing the REG format string in KASLR mode would be sufficient?

Whatever we decide to do here, it's not book3e-specific so it should be
considered separately from these patches.

-Scott
Jason Yan March 5, 2020, 3:22 a.m. UTC | #18
在 2020/3/5 5:21, Scott Wood 写道:
> On Wed, 2020-02-26 at 18:16 +1100, Daniel Axtens wrote:
>> Hi Jason,
>>
>>> This is a try to implement KASLR for Freescale BookE64 which is based on
>>> my earlier implementation for Freescale BookE32:
>>> https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=131718
>>>
>>> The implementation for Freescale BookE64 is similar as BookE32. One
>>> difference is that Freescale BookE64 set up a TLB mapping of 1G during
>>> booting. Another difference is that ppc64 needs the kernel to be
>>> 64K-aligned. So we can randomize the kernel in this 1G mapping and make
>>> it 64K-aligned. This can save some code to creat another TLB map at
>>> early boot. The disadvantage is that we only have about 1G/64K = 16384
>>> slots to put the kernel in.
>>>
>>>      KERNELBASE
>>>
>>>            64K                     |--> kernel <--|
>>>             |                      |              |
>>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>>          |  |  |  |....|  |  |  |  |  |  |  |  |  |....|  |  |
>>>          +--+--+--+    +--+--+--+--+--+--+--+--+--+    +--+--+
>>>          |                         |                        1G
>>>          |----->   offset    <-----|
>>>
>>>                                kernstart_virt_addr
>>>
>>> I'm not sure if the slot numbers is enough or the design has any
>>> defects. If you have some better ideas, I would be happy to hear that.
>>>
>>> Thank you all.
>>>
>>
>> Are you making any attempt to hide kernel address leaks in this series?
>> I've just been looking at the stackdump code just now, and it directly
>> prints link registers and stack pointers, which is probably enough to
>> determine the kernel base address:
>>
>>                    SPs:               LRs:             %pS pointer
>> [    0.424506] [c0000000de403970] [c000000001fc0458] dump_stack+0xfc/0x154
>> (unreliable)
>> [    0.424593] [c0000000de4039c0] [c000000000267eec] panic+0x258/0x5ac
>> [    0.424659] [c0000000de403a60] [c0000000024d7a00]
>> mount_block_root+0x634/0x7c0
>> [    0.424734] [c0000000de403be0] [c0000000024d8100]
>> prepare_namespace+0x1ec/0x23c
>> [    0.424811] [c0000000de403c60] [c0000000024d7010]
>> kernel_init_freeable+0x804/0x880
>>
>> git grep \\\"REG\\\" arch/powerpc shows a few other uses like this, all
>> in process.c or in xmon.
>>
>> Maybe replacing the REG format string in KASLR mode would be sufficient?
> 
> Whatever we decide to do here, it's not book3e-specific so it should be
> considered separately from these patches.
> 

OK, I will continue to work with this series.

> -Scott
> 
> 
> 
> .
>