[v3,1/3] arm64, vmcoreinfo : Append 'PTRS_PER_PGD' to vmcoreinfo

Message ID 1553058574-18606-2-git-send-email-bhsharma@redhat.com
State New
Series
  • Append new variables to vmcoreinfo (PTRS_PER_PGD for arm64 and MAX_PHYSMEM_BITS for all archs)

Commit Message

Bhupesh Sharma March 20, 2019, 5:09 a.m. UTC
With the ARMv8.2-LVA architecture extension, arm64 hardware
which supports this extension can support a virtual address space of up to
52 bits.

Since we currently enable support for this extension in the kernel
via CONFIG flags, e.g.
 - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52

there is no clear mechanism for user-space right now to
determine these CONFIG flag values, and hence the maximum
virtual address space supported by the underlying kernel.

User-space tools like 'makedumpfile' are therefore currently broken,
as they have no proper method to calculate the 'PTRS_PER_PGD' value,
which is required to perform a page table walk to determine the
physical address corresponding to a virtual address found in
kcore/vmcoreinfo.

If 'PTRS_PER_PGD' is appended to vmcoreinfo for arm64, user-space
can use it to determine the maximum virtual address supported by the
underlying kernel.
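To make the proposal concrete, here is a minimal user-space sketch, assuming the tool already holds the vmcoreinfo note as text. The kernel's VMCOREINFO_NUMBER() macro emits lines of the form NUMBER(name)=value, so a tool could look up PTRS_PER_PGD and add log2(PTRS_PER_PGD) to PGDIR_SHIFT to get the number of VA bits spanned by the top-level table. PGDIR_SHIFT is assumed to be known from the page size and level count (42 for 64K pages with three levels); the function names are hypothetical and not taken from makedumpfile.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Find "NUMBER(<name>)=<value>" at the start of a line in a vmcoreinfo
 * text buffer (the format emitted by the kernel's VMCOREINFO_NUMBER()).
 * Returns 0 and stores the value in *out, or -1 if the key is absent.
 */
int vmcoreinfo_number(const char *info, const char *name, long *out)
{
	char key[64];
	const char *p;

	snprintf(key, sizeof(key), "NUMBER(%s)=", name);
	for (p = strstr(info, key); p != NULL; p = strstr(p + 1, key)) {
		/* Only accept matches at the beginning of a line. */
		if (p == info || p[-1] == '\n')
			return sscanf(p + strlen(key), "%ld", out) == 1 ? 0 : -1;
	}
	return -1;
}

/*
 * VA bits spanned by the top-level table:
 * PGDIR_SHIFT + log2(PTRS_PER_PGD).
 */
int pgd_va_bits(long ptrs_per_pgd, int pgdir_shift)
{
	int bits = pgdir_shift;

	/* PTRS_PER_PGD is a power of two. */
	assert(ptrs_per_pgd > 0 && (ptrs_per_pgd & (ptrs_per_pgd - 1)) == 0);
	while (ptrs_per_pgd > 1) {
		ptrs_per_pgd >>= 1;
		bits++;
	}
	return bits;
}
```

With PGDIR_SHIFT = 42, PTRS_PER_PGD = 64 corresponds to 48 bits and PTRS_PER_PGD = 1024 to 52 bits.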

A reference 'makedumpfile' implementation which uses this approach to
determining the maximum physical address is available in [0].

[0]. https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-va-support-via-vmcore-upstream-v3/arch/arm64.c#L459

Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Dave Anderson <anderson@redhat.com>
Cc: Kazuhito Hagio <k-hagio@ab.jp.nec.com>
Cc: linux-kernel@vger.kernel.org
Cc: kexec@lists.infradead.org
Cc: linux-arm-kernel@lists.infradead.org
Suggested-by: Steve Capper <Steve.Capper@arm.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 arch/arm64/kernel/crash_core.c | 1 +
 1 file changed, 1 insertion(+)

Comments

James Morse March 26, 2019, 4:36 p.m. UTC | #1
Hi Bhupesh,

On 20/03/2019 05:09, Bhupesh Sharma wrote:
> With ARMv8.2-LVA architecture extension availability, arm64 hardware
> which supports this extension can support a virtual address-space upto
> 52-bits.
> 
> Since at the moment we enable the support of this extension in kernel
> via CONFIG flags, e.g.
>  - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
> 
> so, there is no clear mechanism in the user-space right now to
> determine these CONFIG flag values and hence determine the maximum
> virtual address space supported by the underlying kernel.
> 
> User-space tools like 'makedumpfile' therefore are broken currently
> as they have no proper method to calculate the 'PTRS_PER_PGD' value
> which is required to perform a page table walk to determine the
> physical address of a corresponding virtual address found in
> kcore/vmcoreinfo.
> 
> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
> it can be used in user-space to determine the maximum virtual address
> supported by underlying kernel.

I don't think this really solves the problem, it feels fragile.

I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
You can use this to work out that the top level page table size isn't consistent with a
48bit VA, so 52bit VA must be in use...

But wasn't your problem walking the kernel page tables? In particular the offset that we
apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.

Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
hard-codes it when it sees 52bit is in use ... somewhere.
We haven't solved the problem!

Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
could set them the same, or differently the other way round.

Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
there would be no ttbr offset.

If you need another vmcoreinfo flag once that happens, we've done something wrong here.

(Not to mention what happens if the TTBR1_EL1 uses 52bit va, but TTBR0_EL1 doesn't)


> Suggested-by: Steve Capper <Steve.Capper@arm.com>

(CC: +Steve)


Thanks,

James
Kazuhito Hagio March 27, 2019, 4:07 p.m. UTC | #2
On 3/26/2019 12:36 PM, James Morse wrote:
> Hi Bhupesh,
> 
> On 20/03/2019 05:09, Bhupesh Sharma wrote:
> > With ARMv8.2-LVA architecture extension availability, arm64 hardware
> > which supports this extension can support a virtual address-space upto
> > 52-bits.
> >
> > Since at the moment we enable the support of this extension in kernel
> > via CONFIG flags, e.g.
> >  - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
> >
> > so, there is no clear mechanism in the user-space right now to
> > determine these CONFIG flag values and hence determine the maximum
> > virtual address space supported by the underlying kernel.
> >
> > User-space tools like 'makedumpfile' therefore are broken currently
> > as they have no proper method to calculate the 'PTRS_PER_PGD' value
> > which is required to perform a page table walk to determine the
> > physical address of a corresponding virtual address found in
> > kcore/vmcoreinfo.
> >
> > If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
> > it can be used in user-space to determine the maximum virtual address
> > supported by underlying kernel.
> 
> I don't think this really solves the problem, it feels fragile.
> 
> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
> You can use this to work out that the top level page table size isn't consistent with a
> 48bit VA, so 52bit VA must be in use...
> 
> But wasn't your problem walking the kernel page tables? In particular the offset that we
> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
> 
> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
> hard-codes it when it sees 52bit is in use ... somewhere.

My understanding is that the TTBR1_EL1 offset is derived from a kernel
virtual address using the exported PTRS_PER_PGD.

With TTBR1 (T1SZ) at 48-bit VA and TTBR0 (T0SZ) at 52-bit VA,

kva = 0xffff000000000000    <--- start of kernel virtual address
pgd_index(kva) = (kva >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)
               = (0xffff000000000000 >> 42) & (1024 - 1)
               = 0x00000000003fffc0 & 0x3ff
               = 0x3c0      <--- the offset (0x3c0) is included

This is what the kernel does now, so makedumpfile wants to do the same.
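Kazu's arithmetic can be checked with a few lines of C. This sketch hard-codes PGDIR_SHIFT = 42 (64K pages, three levels), an assumption matching the example above, and shows that the same kernel VA lands at index 0x3c0 with 1024 entries but at index 0 with 64 entries.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT 42	/* assumption: 64K pages, 3 levels */

/* pgd_index() as written above, with PTRS_PER_PGD as a parameter. */
uint64_t pgd_index(uint64_t vaddr, uint64_t ptrs_per_pgd)
{
	/* PTRS_PER_PGD must be a power of two for the mask to work. */
	assert(ptrs_per_pgd != 0 && (ptrs_per_pgd & (ptrs_per_pgd - 1)) == 0);
	return (vaddr >> PGDIR_SHIFT) & (ptrs_per_pgd - 1);
}
```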

> We haven't solved the problem!
> 
> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
> could set them the same, or different the other-way-round.
> 
> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
> there would be no ttbr offset.

If T1SZ is 52-bit, the kernel virtual address space probably starts at 0xfff0000000000000,
and then the offset becomes 0 with the pgd_index() above.
I think makedumpfile will keep working with that.
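A sketch of the same reasoning applied to the 52-bit T1SZ case, under the same PGDIR_SHIFT = 42 assumption: the byte offset of the first kernel PGD entry is pgd_index(start of kernel VA) * 8. The start address is passed in as a parameter because it is an assumed input, not something the sketch can discover; kernel_pgd_offset() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT 42	/* assumption: 64K pages, 3 levels */

/*
 * Hypothetical helper: byte offset of the first kernel entry in a
 * top-level table of ptrs_per_pgd 8-byte entries, i.e.
 * pgd_index(kva_start) * 8. kva_start must be supplied by the caller.
 */
uint64_t kernel_pgd_offset(uint64_t kva_start, uint64_t ptrs_per_pgd)
{
	assert(ptrs_per_pgd != 0 && (ptrs_per_pgd & (ptrs_per_pgd - 1)) == 0);
	return ((kva_start >> PGDIR_SHIFT) & (ptrs_per_pgd - 1)) * sizeof(uint64_t);
}
```

With PTRS_PER_PGD = 1024, a kernel VA start of 0xffff000000000000 (48-bit T1SZ) gives 0x1e00 bytes, while 0xfff0000000000000 (52-bit T1SZ) gives 0.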

Thanks,
Kazu

> 
> If you need another vmcoreinfo flag once that happens, we've done something wrong here.
> 
> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but TTBR0_EL1 doesn't)
> 
> 
> > Suggested-by: Steve Capper <Steve.Capper@arm.com>
> 
> (CC: +Steve)
> 
> 
> Thanks,
> 
> James
Bhupesh Sharma March 28, 2019, 11:42 a.m. UTC | #3
Hi James,

Thanks for your review. Please see my comments inline:

On 03/26/2019 10:06 PM, James Morse wrote:
> Hi Bhupesh,
> 
> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>> which supports this extension can support a virtual address-space upto
>> 52-bits.
>>
>> Since at the moment we enable the support of this extension in kernel
>> via CONFIG flags, e.g.
>>   - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>
>> so, there is no clear mechanism in the user-space right now to
>> determine these CONFIG flag values and hence determine the maximum
>> virtual address space supported by the underlying kernel.
>>
>> User-space tools like 'makedumpfile' therefore are broken currently
>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>> which is required to perform a page table walk to determine the
>> physical address of a corresponding virtual address found in
>> kcore/vmcoreinfo.
>>
>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>> it can be used in user-space to determine the maximum virtual address
>> supported by underlying kernel.
> 
> I don't think this really solves the problem, it feels fragile.
> 
> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
> You can use this to work out that the top level page table size isn't consistent with a
> 48bit VA, so 52bit VA must be in use...
> 
> But wasn't your problem walking the kernel page tables? In particular the offset that we
> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
> 
> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
> hard-codes it when it sees 52bit is in use ... somewhere.
> We haven't solved the problem!

But isn't the TTBR1_EL1 offset already applied by the kernel via
e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
for kernel configurations where 52-bit userspace VAs are possible?

Here is a snippet from Steve's commit message for the above, which
explains it:

"   In other words a 48-bit kernel virtual address will have a different
     pgd_index when using PTRS_PER_PGD = 64 and 1024.

     If, however, we note that:
     kva = 0xFFFF << 48 + lower (where lower[63:48] == 0b)
     and, PGDIR_SHIFT = 42 (as we are dealing with 64KB PAGE_SIZE)

     We can consider:
     (kva >> PGDIR_SHIFT) & (1024 - 1) - (kva >> PGDIR_SHIFT) & (64 - 1)
      = (0xFFFF << 6) & 0x3FF - (0xFFFF << 6) & 0x3F     // "lower" cancels out
      = 0x3C0

     In other words, one can switch PTRS_PER_PGD to the 52-bit value globally
     provided that they increment ttbr1_el1 by 0x3C0 * 8 = 0x1E00 bytes when
     running with 48-bit kernel VAs (TCR_EL1.T1SZ = 16).

     For kernel configuration where 52-bit userspace VAs are possible, this
     patch offsets ttbr1_el1 and sets PTRS_PER_PGD corresponding to the
     52-bit value.
"

Accordingly we have the following assembler helper in 
'arch/arm64/include/asm/assembler.h':

        .macro  offset_ttbr1, ttbr
#ifdef CONFIG_ARM64_52BIT_VA
        orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
#endif
        .endm

where:
#ifdef CONFIG_ARM64_52BIT_VA
/* Must be at least 64-byte aligned to prevent corruption of the TTBR */
#define TTBR1_BADDR_4852_OFFSET        (((UL(1) << (52 - PGDIR_SHIFT)) - \
                                 (UL(1) << (48 - PGDIR_SHIFT))) * 8)
#endif
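The offset can be recomputed in user space from PGDIR_SHIFT alone. This is a plain restatement of the TTBR1_BADDR_4852_OFFSET expression above as a C function (the function name is hypothetical); with PGDIR_SHIFT = 42 it yields (1024 - 64) * 8 = 0x1E00 bytes, matching the 0x3C0 * 8 figure in Steve's commit message.

```c
#include <assert.h>
#include <stdint.h>

/*
 * TTBR1_BADDR_4852_OFFSET restated as a function of PGDIR_SHIFT:
 * (entries for 52-bit VA - entries for 48-bit VA) * 8 bytes per entry.
 */
uint64_t ttbr1_baddr_4852_offset(unsigned int pgdir_shift)
{
	uint64_t off;

	assert(pgdir_shift < 48);
	off = ((UINT64_C(1) << (52 - pgdir_shift)) -
	       (UINT64_C(1) << (48 - pgdir_shift))) * 8;

	/* The kernel comment requires at least 64-byte alignment. */
	assert(off % 64 == 0);
	return off;
}
```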

And before any TTBR1 load in the kernel, we offset ttbr1_el1
when CONFIG_ARM64_52BIT_VA is enabled, e.g. in
'arch/arm64/kernel/head.S':

ENTRY(__enable_mmu)
<..snip..>
        offset_ttbr1 x1
        msr     ttbr1_el1, x1                   // load TTBR1
<..snip..>

So user space (makedumpfile, for example) just needs to know the
PTRS_PER_PGD value, and it can calculate the pgd_index for a virtual
address using the following formula:

pgd_index(vaddr) = (((vaddr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1));

Note that the above computation holds true for both the PTRS_PER_PGD = 64
(48-bit kernel with 48-bit User VA) and 1024 (48-bit kernel with 52-bit User
VA) cases. These are the configurations for which we are trying to
fix the recently reported user-space regressions (on arm64).
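To see why the single pgd_index() formula is enough in the 48-bit kernel VA / 52-bit user VA case, compare the entry a software walk finds with PTRS_PER_PGD = 1024 against the entry the MMU uses with T1SZ = 16 and the offset TTBR1. This sketch assumes 64K pages (PGDIR_SHIFT = 42) and takes the 0x1E00 offset from Steve's commit message; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define PGDIR_SHIFT  42		/* assumption: 64K pages, 3 levels */
#define TTBR1_OFFSET 0x1E00	/* 0x3C0 * 8, from the commit message */

/* Entry a software walk finds with PTRS_PER_PGD = 1024 from the table base. */
uint64_t sw_pgd_entry(uint64_t pgd_base, uint64_t kva)
{
	return pgd_base + ((kva >> PGDIR_SHIFT) & (1024 - 1)) * 8;
}

/* Entry the MMU uses with T1SZ = 16 (64 entries) and TTBR1 = base + offset. */
uint64_t hw_pgd_entry(uint64_t pgd_base, uint64_t kva)
{
	/* Only meaningful for 48-bit kernel VAs: bits [63:48] all ones. */
	assert((kva >> 48) == 0xffff);
	return pgd_base + TTBR1_OFFSET + ((kva >> PGDIR_SHIFT) & (64 - 1)) * 8;
}
```

For any address in the 48-bit kernel range both helpers return the same entry address, which is why this argument covers exactly this one configuration.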

> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
> could set them the same, or different the other-way-round.
> 
> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
> there would be no ttbr offset.
>
> If you need another vmcoreinfo flag once that happens, we've done something wrong here.

I am currently experimenting with Steve's patches for 52-bit kernel VA
(<https://lwn.net/Articles/780093/>) and will comment further once I
get user-space utilities like makedumpfile and kexec-tools working
with them, both on the ARMv8 Fast Simulator model and on older CPUs
which don't support the ARMv8.2 extensions.

However, I think we should not hold up fixes for regressions already 
reported, because the 52-bit kernel VA changes probably still need some 
more rework.

> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but TTBR0_EL1 doesn't)

I am wondering if there are any real users of the above combination.

So far, I have generally come across discussions where the following 
variations of the address spaces have been proposed/requested:
- 48-bit kernel VA + 48-bit User VA,
- 48-bit kernel VA + 52-bit User VA,
- 52-bit kernel VA + 52-bit User VA.

Thanks,
Bhupesh

>> Suggested-by: Steve Capper <Steve.Capper@arm.com>
> 
> (CC: +Steve)
> 
> 
> Thanks,
> 
> James
>
James Morse April 2, 2019, 5:26 p.m. UTC | #4
Hi Bhupesh,

On 28/03/2019 11:42, Bhupesh Sharma wrote:
> On 03/26/2019 10:06 PM, James Morse wrote:
>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>> which supports this extension can support a virtual address-space upto
>>> 52-bits.
>>>
>>> Since at the moment we enable the support of this extension in kernel
>>> via CONFIG flags, e.g.
>>>   - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>
>>> so, there is no clear mechanism in the user-space right now to
>>> determine these CONFIG flag values and hence determine the maximum
>>> virtual address space supported by the underlying kernel.
>>>
>>> User-space tools like 'makedumpfile' therefore are broken currently
>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>> which is required to perform a page table walk to determine the
>>> physical address of a corresponding virtual address found in
>>> kcore/vmcoreinfo.
>>>
>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>> it can be used in user-space to determine the maximum virtual address
>>> supported by underlying kernel.
>>
>> I don't think this really solves the problem, it feels fragile.
>>
>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
>> You can use this to work out that the top level page table size isn't consistent with a
>> 48bit VA, so 52bit VA must be in use...
>>
>> But wasn't your problem walking the kernel page tables? In particular the offset that we
>> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
>>
>> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
>> hard-codes it when it sees 52bit is in use ... somewhere.
>> We haven't solved the problem!

> But isn't the TTBR1_EL1 offset already appended by the kernel via e842dfb5a2d3 ("arm64:
> mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
> in case of kernel configuration where 52-bit userspace VAs are possible.

> Accordingly we have the following assembler helper in 'arch/arm64/include/asm/assembler.h':
> 
>        .macro  offset_ttbr1, ttbr
> #ifdef CONFIG_ARM64_52BIT_VA
>        orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
> #endif
>        .endm
> 
> where:
> #ifdef CONFIG_ARM64_52BIT_VA
> /* Must be at least 64-byte aligned to prevent corruption of the TTBR */
> #define TTBR1_BADDR_4852_OFFSET        (((UL(1) << (52 - PGDIR_SHIFT)) - \
>                                 (UL(1) << (48 - PGDIR_SHIFT))) * 8)
> #endif

Sure, and all this would work today, because there is only one weird combination. But once
we support another combination of 52bit-va, you'd either need another value, or to start
using PTRS_PER_PGD as a flag for v5.1_FUNNY_BEHAVIOUR_ONE.


[...]

> Note that the above computation holds true both for PTRS_PER_PGD = 64 (48-bit kernel with
> 48-bit User VA) and 1024 (48-bit with 52-bit User VA) cases. And these are the
> configurations for which we are trying to fix the user-space regressions reported (on
> arm64) recently.

... and revisit it when there is another combination?


>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
>> could set them the same, or different the other-way-round.
>>
>> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
>> there would be no ttbr offset.
>>
>> If you need another vmcoreinfo flag once that happens, we've done something wrong here.
> 
> I am currently experimenting with Steve's patches for 52-bit kernel VA
> (<https://lwn.net/Articles/780093/>) and will comment more on the same when I am able to
> get the user-space utilities like makedumpfile and kexec-tools to work with the same on
> both ARMv8 Fast Simulator model and older CPUs which don't support ARMv8.2 extensions.


> However, I think we should not hold up fixes for regressions already reported, because the
> 52-bit kernel VA changes probably still need some more rework.

Chucking things into vmcoreinfo isn't free: we need to keep them there forever, otherwise
yesterday's version of the tools breaks. Can we take the time to get this right for the
cases we know about?

Yes the kernel code is going to move around, this is why the information we expose via
vmcoreinfo needs to be thought through: something we would always need, regardless of how
the kernel implements it.


>> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but TTBR0_EL1 doesn't)
> 
> I am wondering if there are any real users of the above combination.

Heh! Is there any hardware that supports this?

Pointer-auth changes all this again, as we may prefer to use the bits for pointer-auth in
one TTB or the other. PTRS_PER_PGD may show the 52bit value in this case, but neither TTBR
is mapping 52bits of VA.


> So far, I have generally come across discussions where the following variations of the
> address spaces have been proposed/requested:
> - 48bit kernel VA + 48-bit User VA,
> - 48-bit kernel VA + 52-bit User VA,

+ 52bit kernel, because there are excessive quantities of memory and the kernel maps it
all, but 48-bit user, because user space never maps all the memory, and we prefer the bits for
pointer-auth.

> - 52-bit kernel VA + 52-bit User VA.

And...  all four may happen with the same built image. I don't see how you can tell these
cases apart with the one (build-time-constant!) PTRS_PER_PGD value.

I'm sure some of these cases are hypothetical, but by considering it all now, we can avoid
three more kernel:vmcoreinfo updates, and three more fix-user-space-to-use-the-new-value.


I think you probably do need PTRS_PER_PGD, as this is the one value the mm is using to
generate page tables. I'm pretty sure you also need T0SZ and T1SZ to know if that's
actually in use, or the kernel is bodging round it with an offset.
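One way to read this suggestion: if a tool knew TCR_EL1.T1SZ in addition to PTRS_PER_PGD and PGDIR_SHIFT, it could compute the TTBR1 offset rather than infer it. This is a hypothetical sketch; T1SZ is not exported via vmcoreinfo in this series, and the function name is invented for illustration. Because the kernel's range sits at the top of the address space, the MMU uses the top 2^(64 - T1SZ - PGDIR_SHIFT) entries of the table the mm built, so the offset is the size of the unused lower entries.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical: assumes PTRS_PER_PGD, PGDIR_SHIFT and TCR_EL1.T1SZ are
 * all known to user space. Returns the byte offset from the start of the
 * table the mm built to the entry TTBR1 must point at.
 */
uint64_t ttbr1_offset_from_t1sz(uint64_t ptrs_per_pgd, unsigned int pgdir_shift,
				unsigned int t1sz)
{
	unsigned int va_bits = 64 - t1sz;	/* VA bits translated via TTBR1 */
	uint64_t hw_entries;

	assert(va_bits > pgdir_shift);
	hw_entries = UINT64_C(1) << (va_bits - pgdir_shift);
	assert(hw_entries <= ptrs_per_pgd);
	return (ptrs_per_pgd - hw_entries) * sizeof(uint64_t);
}
```

With PTRS_PER_PGD = 1024 and PGDIR_SHIFT = 42, T1SZ = 16 gives 0x1E00 and T1SZ = 12 gives 0, covering both of the cases discussed without treating PTRS_PER_PGD as a flag.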


Thanks,

James
James Morse April 2, 2019, 5:27 p.m. UTC | #5
Hi Kazu,

On 27/03/2019 16:07, Kazuhito Hagio wrote:
> On 3/26/2019 12:36 PM, James Morse wrote:
>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>> which supports this extension can support a virtual address-space upto
>>> 52-bits.
>>>
>>> Since at the moment we enable the support of this extension in kernel
>>> via CONFIG flags, e.g.
>>>  - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>
>>> so, there is no clear mechanism in the user-space right now to
>>> determine these CONFIG flag values and hence determine the maximum
>>> virtual address space supported by the underlying kernel.
>>>
>>> User-space tools like 'makedumpfile' therefore are broken currently
>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>> which is required to perform a page table walk to determine the
>>> physical address of a corresponding virtual address found in
>>> kcore/vmcoreinfo.
>>>
>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>> it can be used in user-space to determine the maximum virtual address
>>> supported by underlying kernel.
>>
>> I don't think this really solves the problem, it feels fragile.
>>
>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
>> You can use this to work out that the top level page table size isn't consistent with a
>> 48bit VA, so 52bit VA must be in use...
>>
>> But wasn't your problem walking the kernel page tables? In particular the offset that we
>> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
>>
>> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
>> hard-codes it when it sees 52bit is in use ... somewhere.

> My understanding is that the TTBR1_EL1 offset comes from a kernel
> virtual address with the exported PTRS_PER_PGD.
> 
> With T1SZ is 48bit and T0SZ is 52bit,

(PTRS_PER_PGD doesn't tell you this, PTRS_PER_PGD lets you spot something odd is
happening, and this just happens to be the only odd combination today.)


> kva = 0xffff000000000000    <--- start of kernel virtual address

Does makedumpfile have this value? If the kernel were using 52bit VA for TTBR1 this value
would be different.


> pgd_index(kva) = (kva >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)
>                = (0xffff000000000000 >> 42) & (1024 - 1)
>                = 0x00000000003fffc0 & 0x3ff
>                = 0x3c0      <--- the offset (0x3c0) is included
> 
> This is what kernel does now, so makedumpfile also wants to do.

Sure, and it would work today. I'm worried about tomorrow, where we support something new,
and need to bundle new information out through vmcoreinfo. This ends up being used to
fingerprint the kernel support, instead of as the value it was intended to be.


>> We haven't solved the problem!
>>
>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
>> could set them the same, or different the other-way-round.
>>
>> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
>> there would be no ttbr offset.
> 
> If T1SZ is 52bit, probably kernel virtual address starts from 0xfff0000000000000,

I didn't think this 'bottom of the ttbr1 mapping range' value was exposed anywhere.
Where can user-space get this from? (I can't see it in the vmcoreinfo list)


> then the offset becomes 0 with the pgd_index() above.
> I think makedumpfile will keep working with that.


Steve mentions a 52/48 combination in his kernel series:
https://lore.kernel.org/linux-arm-kernel/20190218170245.14915-1-steve.capper@arm.com/


I think vmcoreinfo users will eventually need to spot 52bit used in TTBR1 and/or TTBR0,
and possibly: configured, but not enabled in either. (This is because the bits are also
used for pointer-auth: the kernel may be built for both pointer-auth and 52-bit VA, and
choose which to enable at boot based on some policy.)

I don't see how you can do this with one value.
I'd like to get this right now, so user-space doesn't need updating again!


Thanks,

James
Bhupesh Sharma April 3, 2019, 5:54 p.m. UTC | #6
Hi James,

On 04/02/2019 10:56 PM, James Morse wrote:
> Hi Bhupesh,
> 
> On 28/03/2019 11:42, Bhupesh Sharma wrote:
>> On 03/26/2019 10:06 PM, James Morse wrote:
>>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>>> which supports this extension can support a virtual address-space upto
>>>> 52-bits.
>>>>
>>>> Since at the moment we enable the support of this extension in kernel
>>>> via CONFIG flags, e.g.
>>>>    - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>>
>>>> so, there is no clear mechanism in the user-space right now to
>>>> determine these CONFIG flag values and hence determine the maximum
>>>> virtual address space supported by the underlying kernel.
>>>>
>>>> User-space tools like 'makedumpfile' therefore are broken currently
>>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>>> which is required to perform a page table walk to determine the
>>>> physical address of a corresponding virtual address found in
>>>> kcore/vmcoreinfo.
>>>>
>>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>>> it can be used in user-space to determine the maximum virtual address
>>>> supported by underlying kernel.
>>>
>>> I don't think this really solves the problem, it feels fragile.
>>>
>>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
>>> You can use this to work out that the top level page table size isn't consistent with a
>>> 48bit VA, so 52bit VA must be in use...
>>>
>>> But wasn't your problem walking the kernel page tables? In particular the offset that we
>>> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
>>>
>>> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
>>> hard-codes it when it sees 52bit is in use ... somewhere.
>>> We haven't solved the problem!
> 
>> But isn't the TTBR1_EL1 offset already appended by the kernel via e842dfb5a2d3 ("arm64:
>> mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
>> in case of kernel configuration where 52-bit userspace VAs are possible.
> 
>> Accordingly we have the following assembler helper in 'arch/arm64/include/asm/assembler.h':
>>
>>         .macro  offset_ttbr1, ttbr
>> #ifdef CONFIG_ARM64_52BIT_VA
>>         orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
>> #endif
>>         .endm
>>
>> where:
>> #ifdef CONFIG_ARM64_52BIT_VA
>> /* Must be at least 64-byte aligned to prevent corruption of the TTBR */
>> #define TTBR1_BADDR_4852_OFFSET        (((UL(1) << (52 - PGDIR_SHIFT)) - \
>>                                  (UL(1) << (48 - PGDIR_SHIFT))) * 8)
>> #endif
> 
> Sure, and all this would work today, because there is only one weird combination. But once
> we support another combination of 52bit-va, you'd either need another value, or to start
> using PTRS_PER_PGD as a flag for v5.1_FUNNY_BEHAVIOUR_ONE.

I completed my user-space experimentation with 52-bit kernel VA changes 
from Steve today and have shared a detailed review on his patchset (See 
<http://lists.infradead.org/pipermail/kexec/2019-April/022750.html>).

But first let me share my opinion on how we are adding the 52-bit
address space changes for arm64 in the kernel.

I think we have ended up adding just a bit _too many_ CONFIG and MACRO
values for the increased address space changes. For example, after the
52-bit kernel VA changes we have at least 4 macros which describe the VA
range with CONFIG_ARM64_USER_KERNEL_VA_BITS_52=y:

VA_BITS = 52
VA_BITS_ACTUAL = vabits_actual = 48
VA_BITS_MIN = min(48, VA_BITS) = 48
PTRS_PER_PGD = 64 (48-bit) or 1024 (52-bit)

Of these, VA_BITS, VA_BITS_ACTUAL and PTRS_PER_PGD are definitely of
interest to user space, as they feed into definitions such as:

1. /*
    * VMEMMAP_SIZE - allows the whole linear region to be covered by
    *                a struct page array
    */
   #define VMEMMAP_SIZE (UL(1) << (VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT))

2. #define __is_lm_address(addr)    (!((addr) & BIT(VA_BITS_ACTUAL - 1)))

We have already discussed the usage of PTRS_PER_PGD in user space at
some length, so I will focus on the other two below (VA_BITS and
VA_BITS_ACTUAL).

Both are critical for determining VMEMMAP_SIZE and whether a virtual 
address lies in the linear map range respectively.
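To illustrate why user space needs these values at run time rather than as build-time constants, here are the two quoted macros restated as C functions that take VA_BITS and VA_BITS_ACTUAL as parameters. PAGE_SHIFT = 16 (64K pages) and STRUCT_PAGE_MAX_SHIFT = 6 (a struct page of at most 64 bytes) are assumptions; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define PAGE_SHIFT            16	/* assumption: 64K pages */
#define STRUCT_PAGE_MAX_SHIFT 6		/* assumption: sizeof(struct page) <= 64 */

/* VMEMMAP_SIZE as quoted above, with VA_BITS as a runtime value. */
uint64_t vmemmap_size(unsigned int va_bits)
{
	assert(va_bits > PAGE_SHIFT + 1);
	return UINT64_C(1) << (va_bits - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT);
}

/* __is_lm_address() as quoted above, with VA_BITS_ACTUAL as a runtime value. */
bool is_lm_address(uint64_t addr, unsigned int va_bits_actual)
{
	return !(addr & (UINT64_C(1) << (va_bits_actual - 1)));
}
```

For example, the address 0xffff000000001000 is classified as a linear-map address when VA_BITS_ACTUAL is 48 but not when it is 52, so a tool that guesses VA_BITS_ACTUAL wrongly will misclassify addresses.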

I don't see any standard mechanism other than the following to get a
working user space with these changes:
- a sysfs node (maybe a
'/sys/devices/system/cpu/addressing-capabilities' node?) or a HWCAP
capability export, for user-space utilities which perform a live analysis
and use the above variables;
- exporting these variables in vmcoreinfo (for analysis of crash dumps).

VA_BITS is already exported in vmcoreinfo, whereas I have proposed 
exporting PTRS_PER_PGD to vmcoreinfo via this patch.

For 52-bit kernel VA changes, VA_BITS_ACTUAL will also be needed in 
vmcoreinfo (See 
<http://lists.infradead.org/pipermail/kexec/2019-April/022750.html> for 
details).

>> Note that the above computation holds true both for PTRS_PER_PGD = 64 (48-bit kernel with
>> 48-bit User VA) and 1024 (48-bit with 52-bit User VA) cases. And these are the
>> configurations for which we are trying to fix the user-space regressions reported (on
>> arm64) recently.
> 
> ... and revisit it when there is another combination?

See above.

>>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
>>> could set them the same, or different the other-way-round.
>>>
>>> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
>>> there would be no ttbr offset.
>>>
>>> If you need another vmcoreinfo flag once that happens, we've done something wrong here.
>>
>> I am currently experimenting with Steve's patches for 52-bit kernel VA
>> (<https://lwn.net/Articles/780093/>) and will comment more on the same when I am able to
>> get the user-space utilities like makedumpfile and kexec-tools to work with the same on
>> both ARMv8 Fast Simulator model and older CPUs which don't support ARMv8.2 extensions.
> 
>> However, I think we should not hold up fixes for regressions already reported, because the
>> 52-bit kernel VA changes probably still need some more rework.
> 
> Chucking things into vmcoreinfo isn't free: we need to keep them there forever, otherwise
> yesterdays version of the tools breaks. Can we take the time to get this right for the
> cases we know about?

Sure, but what we export in vmcoreinfo is directly related to the
variable(s) we add on the kernel side, without which user space
would break.

I have listed the requirements for 52-bit kernel VA above (i.e. we need an
additional VA_BITS_ACTUAL variable exported, rather than any change to
the already proposed PTRS_PER_PGD).

Maybe this is also a good time to talk about minimizing the kernel
interfaces we are proposing to describe the normal (48-bit) and
extended (52-bit) address spaces on arm64.

Ideally, we would simplify it further, along the lines of x86:
CONFIG_X86_5LEVEL=y
vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
			pgtable_l5_enabled());

which seems much cleaner.

I am open to any suggestions on the same.

> Yes the kernel code is going to move around, this is why the information we expose via
> vmcoreinfo needs to be thought through: something we would always need, regardless of how
> the kernel implements it.
> 
> 
>>> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but TTBR0_EL1 doesn't)
>>
>> I am wondering if there are any real users of the above combination.
> 
> Heh! Is there any hardware that supports this?
> 
> Pointer-auth changes all this again, as we may prefer to use the bits for pointer-auth in
> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this case, but neither TTBR
> is mapping 52bits of VA.
> 
> 
>> So far, I have generally come across discussions where the following variations of the
>> address spaces have been proposed/requested:
>> - 48bit kernel VA + 48-bit User VA,
>> - 48-bit kernel VA + 52-bit User VA,
> 
> + 52bit kernel, because there is excessive quantities of memory, and the kernel maps it
> all, but 48-bit user, because it never maps all the memory, and we prefer the bits for
> pointer-auth.
> 
>> - 52-bit kernel VA + 52-bit User VA.
> 
> And...  all four may happen with the same built image. I don't see how you can tell these
> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
> 
> I'm sure some of these cases are hypothetical, but by considering it all now, we can avoid
> three more kernel:vmcoreinfo updates, and three more fix-user-space-to-use-the-new-value.

Agree.

> I think you probably do need PTRS_PER_PGD, as this is the one value the mm is using to
> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to know if that's
> actually in use, or the kernel is bodging round it with an offset.

Sure, I am open to suggestions (as I realize that we need an additional
VA_BITS_ACTUAL variable exported for 52-bit kernel VA changes).

Also, how do we standardize reading T0SZ and T1SZ in user-space? Do you
propose I make an enhancement in the cpu-feature-registers interface
(see [1]) or the HWCAPS interface (see [2]) for the same?

[1]. https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
[2]. https://www.kernel.org/doc/Documentation/arm64/elf_hwcaps.txt

Thanks,
Bhupesh
Kazuhito Hagio April 5, 2019, 8:23 p.m. UTC | #7
Hi James,

Thank you for your comment.

-----Original Message-----
> Hi Kazu,
> 
> On 27/03/2019 16:07, Kazuhito Hagio wrote:
> > On 3/26/2019 12:36 PM, James Morse wrote:
> >> On 20/03/2019 05:09, Bhupesh Sharma wrote:
> >>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
> >>> which supports this extension can support a virtual address-space upto
> >>> 52-bits.
> >>>
> >>> Since at the moment we enable the support of this extension in kernel
> >>> via CONFIG flags, e.g.
> >>>  - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
> >>>
> >>> so, there is no clear mechanism in the user-space right now to
> >>> determine these CONFIG flag values and hence determine the maximum
> >>> virtual address space supported by the underlying kernel.
> >>>
> >>> User-space tools like 'makedumpfile' therefore are broken currently
> >>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
> >>> which is required to perform a page table walk to determine the
> >>> physical address of a corresponding virtual address found in
> >>> kcore/vmcoreinfo.
> >>>
> >>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
> >>> it can be used in user-space to determine the maximum virtual address
> >>> supported by underlying kernel.
> >>
> >> I don't think this really solves the problem, it feels fragile.
> >>
> >> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and PTRS_PER_PGD=1024.
> >> You can use this to work out that the top level page table size isn't consistent with a
> >> 48bit VA, so 52bit VA must be in use...
> >>
> >> But wasn't your problem walking the kernel page tables? In particular the offset that we
> >> apply because the tables were based on a 48bit VA shifted up in swapper_pg_dir.
> >>
> >> Where does the TTBR1_EL1 offset come from with this property? I assume makedumpfile
> >> hard-codes it when it sees 52bit is in use ... somewhere.
> 
> > My understanding is that the TTBR1_EL1 offset comes from a kernel
> > virtual address with the exported PTRS_PER_PGD.
> >
> > With T1SZ is 48bit and T0SZ is 52bit,
> 
> (PTRS_PER_PGD doesn't tell you this, PTRS_PER_PGD lets you spot something odd is
> happening, and this just happens to be the only odd combination today.)

I didn't intend to guess other things from PTRS_PER_PGD.

> > kva = 0xffff000000000000    <--- start of kernel virtual address
> 
> Does makedumpfile have this value? If the kernel were using 52bit VA for TTBR1 this value
> would be different.

This was an example address to show that, with the exported PTRS_PER_PGD,
pgd_index() automatically returns a value that includes the offset for any
kernel virtual address. In this case, it returns a non-zero value even for
the first kernel virtual address, and that value is the offset. (Sorry for
the poor explanation.)

So makedumpfile doesn't need the start address specifically to walk the page
tables, and I was thinking that exporting PTRS_PER_PGD would be stable as long
as pgd_index() doesn't change.
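The calculation quoted below can be reproduced with a short standalone C sketch. It assumes 64K pages with three translation levels (PGDIR_SHIFT = 42) and a kernel whose PGD is sized for a 52-bit VA (PTRS_PER_PGD = 1024) while TTBR1 maps only 48 bits. The constants are hard-coded for illustration; a real tool would derive them from vmcoreinfo. The sketch also checks that the index of the first kernel address, multiplied by the 8-byte descriptor size, equals the byte offset given by the kernel's TTBR1_BADDR_4852_OFFSET formula, which is quoted later in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed configuration: 64K pages, 3 levels, PGD sized for 52-bit VA. */
#define PGDIR_SHIFT	42
#define PTRS_PER_PGD	1024ULL

static_assert(PTRS_PER_PGD == (1ULL << (52 - PGDIR_SHIFT)),
	      "PGD must be sized for a 52-bit VA");

/* Same formula as the kernel's pgd_index(). */
static uint64_t pgd_index(uint64_t va)
{
	return (va >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1);
}

/*
 * Same formula as TTBR1_BADDR_4852_OFFSET: the byte offset, within a PGD
 * sized for 52 bits, of the first entry reachable with a 48-bit TTBR1.
 */
static uint64_t ttbr1_offset_4852(void)
{
	return ((1ULL << (52 - PGDIR_SHIFT)) - (1ULL << (48 - PGDIR_SHIFT))) * 8;
}
```

With these values, 0xffff000000000000 lands at index 0x3c0, which is byte offset 0x1e00. A tool that instead assumed the 48-bit PTRS_PER_PGD of 64 would compute index 0 and read the wrong swapper_pg_dir entry.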

> > pgd_index(kva) = (kva >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)
> >                = (0xffff000000000000 >> 42) & (1024 - 1)
> >                = 0x00000000003fffc0 & 0x3ff
> >                = 0x3c0      <--- the offset (0x3c0) is included
> >
> > This is what kernel does now, so makedumpfile also wants to do.
> 
> Sure, and it would work today. I'm worried about tomorrow, where we support something new,
> and need to bundle new information out through vmcoreinfo. This ends up being used to
> fingerprint the kernel support, instead of as the value it was intended to be.

Yes, a more stable and reasonable way is preferable.

> >> We haven't solved the problem!
> >>
> >> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but in the future it
> >> could set them the same, or different the other-way-round.
> >>
> >> Will makedumpfile using this value keep working once T1SZ is 52bit VA too? In this case
> >> there would be no ttbr offset.
> >
> > If T1SZ is 52bit, probably kernel virtual address starts from 0xfff0000000000000,
> 
> I didn't think this 'bottom of the ttbr1 mapping range' value was exposed anywhere.
> Where can user-space get this from? (I can't see it in the vmcoreinfo list)
> 
> 
> > then the offset becomes 0 with the pgd_index() above.
> > I think makedumpfile will keep working with that.
> 
> 
> Steve mentions a 52/48 combination in his kernel series:
> https://lore.kernel.org/linux-arm-kernel/20190218170245.14915-1-steve.capper@arm.com/
> 
> 
> I think vmcoreinfo-users will eventually need to spot 52bit used in TTBR1 and/or TTBR0,
> and possibly: configured, but not enabled in either. (this is because the bits are also
> used for pointer-auth, the kernel may be built for both pointer-auth and 52-bit VA, and
> choose which to enable at boot based on some policy)
> 
> I don't see how you can do this with one value.
> I'd like to get this right now, so user-space doesn't need updating again!

I need to understand the 52-bit kernel VA implementation, pointer-auth,
and Bhupesh's discussion. I will catch up on them.

Thanks a lot!
Kazu

> 
> 
> Thanks,
> 
> James
Bhupesh Sharma May 4, 2019, 12:53 p.m. UTC | #8
On 04/03/2019 11:24 PM, Bhupesh Sharma wrote:
> Hi James,
> 
> On 04/02/2019 10:56 PM, James Morse wrote:
>> Hi Bhupesh,
>>
>> On 28/03/2019 11:42, Bhupesh Sharma wrote:
>>> On 03/26/2019 10:06 PM, James Morse wrote:
>>>> On 20/03/2019 05:09, Bhupesh Sharma wrote:
>>>>> With ARMv8.2-LVA architecture extension availability, arm64 hardware
>>>>> which supports this extension can support a virtual address-space upto
>>>>> 52-bits.
>>>>>
>>>>> Since at the moment we enable the support of this extension in kernel
>>>>> via CONFIG flags, e.g.
>>>>>    - User-space 52-bit LVA via CONFIG_ARM64_USER_VA_BITS_52
>>>>>
>>>>> so, there is no clear mechanism in the user-space right now to
>>>>> determine these CONFIG flag values and hence determine the maximum
>>>>> virtual address space supported by the underlying kernel.
>>>>>
>>>>> User-space tools like 'makedumpfile' therefore are broken currently
>>>>> as they have no proper method to calculate the 'PTRS_PER_PGD' value
>>>>> which is required to perform a page table walk to determine the
>>>>> physical address of a corresponding virtual address found in
>>>>> kcore/vmcoreinfo.
>>>>>
>>>>> If one appends 'PTRS_PER_PGD' number to vmcoreinfo for arm64,
>>>>> it can be used in user-space to determine the maximum virtual address
>>>>> supported by underlying kernel.
>>>>
>>>> I don't think this really solves the problem, it feels fragile.
>>>>
>>>> I can see how vmcoreinfo tells you VA_BITS==48, PAGE_SIZE==64K and 
>>>> PTRS_PER_PGD=1024.
>>>> You can use this to work out that the top level page table size 
>>>> isn't consistent with a
>>>> 48bit VA, so 52bit VA must be in use...
>>>>
>>>> But wasn't your problem walking the kernel page tables? In 
>>>> particular the offset that we
>>>> apply because the tables were based on a 48bit VA shifted up in 
>>>> swapper_pg_dir.
>>>>
>>>> Where does the TTBR1_EL1 offset come from with this property? I 
>>>> assume makedumpfile
>>>> hard-codes it when it sees 52bit is in use ... somewhere.
>>>> We haven't solved the problem!
>>
>>> But isn't the TTBR1_EL1 offset already appended by the kernel via 
>>> e842dfb5a2d3 ("arm64:
>>> mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD")
>>> in case of kernel configuration where 52-bit userspace VAs are possible.
>>
>>> Accordingly we have the following assembler helper in 
>>> 'arch/arm64/include/asm/assembler.h':
>>>
>>>         .macro  offset_ttbr1, ttbr
>>> #ifdef CONFIG_ARM64_52BIT_VA
>>>         orr     \ttbr, \ttbr, #TTBR1_BADDR_4852_OFFSET
>>> #endif
>>>         .endm
>>>
>>> where:
>>> #ifdef CONFIG_ARM64_52BIT_VA
>>> /* Must be at least 64-byte aligned to prevent corruption of the TTBR */
>>> #define TTBR1_BADDR_4852_OFFSET        (((UL(1) << (52 - 
>>> PGDIR_SHIFT)) - \
>>>                                  (UL(1) << (48 - PGDIR_SHIFT))) * 8)
>>> #endif
>>
>> Sure, and all this would work today, because there is only one weird 
>> combination. But once
>> we support another combination of 52bit-va, you'd either need another 
>> value, or to start
>> using PTRS_PER_PGD as a flag for v5.1_FUNNY_BEHAVIOUR_ONE.
> 
> I completed my user-space experimentation with 52-bit kernel VA changes 
> from Steve today and have shared a detailed review on his patchset (See 
> <http://lists.infradead.org/pipermail/kexec/2019-April/022750.html>).
> 
> But first let me share some opinion on how we are adding the 52-bit 
> address space changes for arm64 in the kernel.
> 
> I think we have ended up adding just a bit _too many_ CONFIG and MACRO 
> values for the increased address space changes. For example, after the
> 52-bit kernel VA changes we have at least 4 macros which explain the VA
> address range with CONFIG_ARM64_USER_KERNEL_VA_BITS_52=y:
> 
> VA_BITS = 52,
> VA_BITS_ACTUAL = vabits_actual = 48,
> VA_BITS_MIN = min (48, VA_BITS) = 48.
> PTRS_PER_PGD = 64 (48-bit) or 1024 (52-bit)
> 
> Of these, VA_BITS, VA_BITS_ACTUAL and PTRS_PER_PGD are definitely of 
> interest in the userspace as they define:
> 
> 1.
>   /*
>      * VMEMMAP_SIZE - allows the whole linear region to be covered by
>      *                a struct page array
>      */
>     #define VMEMMAP_SIZE (UL(1) << (VA_BITS - PAGE_SHIFT - 1 + 
> STRUCT_PAGE_MAX_SHIFT))
> 
> 2. #define __is_lm_address(addr)    (!((addr) & BIT(VA_BITS_ACTUAL - 1)))
> 
> We have discussed the usage of PTRS_PER_PGD in userspace already at 
> quite some length, so I will focus on the other two below (VA_BITS and 
> VA_BITS_ACTUAL).
> 
> Both are critical for determining VMEMMAP_SIZE and whether a virtual 
> address lies in the linear map range respectively.
> 
> I don't see any standard mechanism other than the following to achieve a 
> working user-space with these changes:
> - a sysfs node (maybe a
> '/sys/devices/system/cpu/addressing-capabilities' node?) or HWCAP 
> capability export for user-space utilities which perform a live analysis 
> and use the above variables.
> - exporting these variables in vmcoreinfo (for analysis of crash dump).
> 
> VA_BITS is already exported in vmcoreinfo, whereas I have proposed 
> exporting PTRS_PER_PGD to vmcoreinfo via this patch.
> 
> For 52-bit kernel VA changes, VA_BITS_ACTUAL will also be needed in 
> vmcoreinfo (See 
> <http://lists.infradead.org/pipermail/kexec/2019-April/022750.html> for 
> details).
> 
>>> Note that the above computation holds true both for PTRS_PER_PGD = 64 
>>> (48-bit kernel with
>>> 48-bit User VA) and 1024 (48-bit with 52-bit User VA) cases. And 
>>> these are the
>>> configurations for which we are trying to fix the user-space 
>>> regressions reported (on
>>> arm64) recently.
>>
>> ... and revisit it when there is another combination?
> 
> See above.
> 
>>>> Today __cpu_setup() sets T0SZ and T1SZ differently for 52bit VA, but 
>>>> in the future it
>>>> could set them the same, or different the other-way-round.
>>>>
>>>> Will makedumpfile using this value keep working once T1SZ is 52bit 
>>>> VA too? In this case
>>>> there would be no ttbr offset.
>>>>
>>>> If you need another vmcoreinfo flag once that happens, we've done 
>>>> something wrong here.
>>>
>>> I am currently experimenting with Steve's patches for 52-bit kernel VA
>>> (<https://lwn.net/Articles/780093/>) and will comment more on the 
>>> same when I am able to
>>> get the user-space utilities like makedumpfile and kexec-tools to 
>>> work with the same on
>>> both ARMv8 Fast Simulator model and older CPUs which don't support 
>>> ARMv8.2 extensions.
>>
>>> However, I think we should not hold up fixes for regressions already 
>>> reported, because the
>>> 52-bit kernel VA changes probably still need some more rework.
>>
>> Chucking things into vmcoreinfo isn't free: we need to keep them there 
>> forever, otherwise
>> yesterday's version of the tools breaks. Can we take the time to get
>> this right for the
>> cases we know about?
> 
> Sure, but exporting variable(s) in vmcoreinfo is directly related to the
> information variable(s) we add in the kernel side without which the 
> user-space would break.
> 
> I have added the requirements for 52-bit kernel VA above (i.e. we need an
> additional VA_BITS_ACTUAL variable exported rather than any tinkering
> with already proposed PTRS_PER_PGD).
> 
> Maybe this is a good time to also talk about minimizing the kernel
> interfaces we are proposing to hold and indicate normal (48-bit) and 
> extended (52-bit) address spaces on arm64.
> 
> Ideally, we would want to simplify it further to be on similar lines as 
> x86:
> CONFIG_X86_5LEVEL=y
> vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n",
>              pgtable_l5_enabled());
> 
> which seems much cleaner..
> 
> I am open to any suggestions on the same.
> 
>> Yes the kernel code is going to move around, this is why the 
>> information we expose via
>> vmcoreinfo needs to be thought through: something we would always 
>> need, regardless of how
>> the kernel implements it.
>>
>>
>>>> (Not to mention what happens if the TTBR1_EL1 uses 52bit va, but 
>>>> TTBR0_EL1 doesn't)
>>>
>>> I am wondering if there are any real users of the above combination.
>>
>> Heh! Is there any hardware that supports this?
>>
>> Pointer-auth changes all this again, as we may prefer to use the bits 
>> for pointer-auth in
>> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this 
>> case, but neither TTBR
>> is mapping 52bits of VA.
>>
>>
>>> So far, I have generally come across discussions where the following 
>>> variations of the
>>> address spaces have been proposed/requested:
>>> - 48bit kernel VA + 48-bit User VA,
>>> - 48-bit kernel VA + 52-bit User VA,
>>
>> + 52bit kernel, because there is excessive quantities of memory, and 
>> the kernel maps it
>> all, but 48-bit user, because it never maps all the memory, and we 
>> prefer the bits for
>> pointer-auth.
>>
>>> - 52-bit kernel VA + 52-bit User VA.
>>
>> And...  all four may happen with the same built image. I don't see how 
>> you can tell these
>> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
>>
>> I'm sure some of these cases are hypothetical, but by considering it 
>> all now, we can avoid
>> three more kernel:vmcoreinfo updates, and three more 
>> fix-user-space-to-use-the-new-value.
> 
> Agree.
> 
>> I think you probably do need PTRS_PER_PGD, as this is the one value 
>> the mm is using to
>> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to 
>> know if that's
>> actually in use, or the kernel is bodging round it with an offset.
> 
> Sure, I am open to suggestions (as I realize that we need an additional 
> VA_BITS_ACTUAL variable exported for 52-bit kernel VA changes).
> 
> Also how do we standardize reading T0SZ and T1SZ in user-space. Do you 
> propose I make an enhancement in the cpu-feature-registers interface 
> (see [1]) or the HWCAPS interface (see [2]) for the same?
> 
> [1]. https://www.kernel.org/doc/Documentation/arm64/cpu-feature-registers.txt
> [2]. https://www.kernel.org/doc/Documentation/arm64/elf_hwcaps.txt
> 
> Thanks,
> Bhupesh
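To illustrate the dependency described in the quoted message, the two quoted macros can be re-implemented in user space once VA_BITS and VA_BITS_ACTUAL are known. This is a minimal sketch. It assumes 64K pages (PAGE_SHIFT = 16) and STRUCT_PAGE_MAX_SHIFT = 6 (a 64-byte struct page); both values are illustrative assumptions. The linear-map test only reflects the memory layout proposed in the 52-bit kernel VA series under discussion. The function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumptions for illustration: 64K pages, 64-byte struct page. */
#define PAGE_SHIFT		16
#define STRUCT_PAGE_MAX_SHIFT	6

/* VMEMMAP_SIZE: enough struct pages to cover the whole linear region. */
static uint64_t vmemmap_size(unsigned int va_bits)
{
	assert(va_bits > PAGE_SHIFT + 1 && va_bits <= 52);
	return 1ULL << (va_bits - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT);
}

/* __is_lm_address(): linear-map addresses have bit (VA_BITS_ACTUAL - 1) clear. */
static bool is_lm_address(uint64_t addr, unsigned int va_bits_actual)
{
	assert(va_bits_actual >= 1 && va_bits_actual <= 64);
	return !(addr & (1ULL << (va_bits_actual - 1)));
}
```

The same address, for example 0xffff000000000000, is treated as a linear-map address when VA_BITS_ACTUAL is 48 but not when it is 52. This is why a tool cannot get by with VA_BITS alone in this layout.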

Ping.

Hi James, Steve,

Any comments on the above points? At the moment we have to carry these
fixes in the distribution kernels, and I would like to have them fixed
in the upstream kernel itself.

Thanks,
Bhupesh
James Morse June 7, 2019, 3:11 p.m. UTC | #9
Hi Bhupesh,

(sorry for the delay on this)

On 04/05/2019 13:53, Bhupesh Sharma wrote:
> On 04/03/2019 11:24 PM, Bhupesh Sharma wrote:
>> On 04/02/2019 10:56 PM, James Morse wrote:
>>> Yes the kernel code is going to move around, this is why the information we expose via
>>> vmcoreinfo needs to be thought through: something we would always need, regardless of how
>>> the kernel implements it.
>>>

>>> Pointer-auth changes all this again, as we may prefer to use the bits for pointer-auth in
>>> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this case, but neither TTBR
>>> is mapping 52bits of VA.
>>>
>>>
>>>> So far, I have generally come across discussions where the following variations of the
>>>> address spaces have been proposed/requested:
>>>> - 48bit kernel VA + 48-bit User VA,
>>>> - 48-bit kernel VA + 52-bit User VA,
>>>
>>> + 52bit kernel, because there is excessive quantities of memory, and the kernel maps it
>>> all, but 48-bit user, because it never maps all the memory, and we prefer the bits for
>>> pointer-auth.
>>>
>>>> - 52-bit kernel VA + 52-bit User VA.
>>>
>>> And...  all four may happen with the same built image. I don't see how you can tell these
>>> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
>>>
>>> I'm sure some of these cases are hypothetical, but by considering it all now, we can avoid
>>> three more kernel:vmcoreinfo updates, and three more fix-user-space-to-use-the-new-value.
>>
>> Agree.
>>
>>> I think you probably do need PTRS_PER_PGD, as this is the one value the mm is using to
>>> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to know if that's
>>> actually in use, or the kernel is bodging round it with an offset.
>>
>> Sure, I am open to suggestions (as I realize that we need an additional VA_BITS_ACTUAL
>> variable exported for 52-bit kernel VA changes).

(stepping back a bit:)

I'm against exposing arch-specific #ifdefs that correspond to how we've configured the
arch code's interactions with mm. These are all moving targets, we can't have any of it
become ABI.

I have a straw-man for this: What is the value of PTE_FILE_MAX_BITS on your system?
I have no idea what this value is or means, an afternoon's archaeology would be needed(!).
This is something that made sense for one kernel version, a better idea came along, and it
was replaced. If we'd exposed this to user-space, we'd have to generate a value, even if
it doesn't mean anything. Exposing VA_BITS_ACTUAL is the same.

(Keep an eye out for when we change the kernel memory map, and any second-guessing based
on VA_BITS turns out to be wrong)


What we do have are the hardware properties. The kernel can't change these.


>> Also how do we standardize reading T0SZ and T1SZ in user-space. Do you propose I make an
>> enhancement in the cpu-feature-registers interface (see [1]) or the HWCAPS interface
>> (see [2]) for the same?

cpufeature won't help you if you've already panic()d and only have the vmcore file. This
stuff needs to go in vmcoreinfo.

As long as there is a description of how userspace uses these values, I think adding
key/values for TCR_EL1.TxSZ to the vmcoreinfo is a sensible way out of this. You probably
need TTBR1_EL1.BADDR too. (it should be specific fields, to prevent 'new uses' becoming ABI)

This tells you how the hardware was configured, and covers any combination of TxSZ tricks
we play, and whether those address bits are used for VA, or ptrauth for TTBR0 or TTBR1.
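Here is a rough sketch of how user space might consume such values. The TCR_EL1 field positions below are architectural: T0SZ occupies bits [5:0] and T1SZ occupies bits [21:16]. The region translated through TTBRn_EL1 covers 2^(64 - TnSZ) bytes. This sketch does not decide whether the kernel should export the raw fields or derived values, or under which vmcoreinfo key names, so the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* TCR_EL1 field positions: T0SZ = bits[5:0], T1SZ = bits[21:16]. */
#define TCR_T0SZ_SHIFT	0
#define TCR_T1SZ_SHIFT	16
#define TCR_TxSZ_MASK	0x3fULL

/* The region addressed through TTBRn_EL1 spans 2^(64 - TnSZ) bytes. */
static unsigned int txsz_to_va_bits(uint64_t txsz)
{
	assert(txsz <= 63);
	return 64 - (unsigned int)txsz;
}

/* Hypothetical helper: VA bits translated through TTBR0_EL1 (user). */
static unsigned int tcr_user_va_bits(uint64_t tcr)
{
	return txsz_to_va_bits((tcr >> TCR_T0SZ_SHIFT) & TCR_TxSZ_MASK);
}

/* Hypothetical helper: VA bits translated through TTBR1_EL1 (kernel). */
static unsigned int tcr_kernel_va_bits(uint64_t tcr)
{
	return txsz_to_va_bits((tcr >> TCR_T1SZ_SHIFT) & TCR_TxSZ_MASK);
}
```

Consider T0SZ = 12 and T1SZ = 16. The kernel range is 48 bits while the PGD may be sized for 52. A tool can then tell from the hardware configuration that TTBR1 is being offset, without inferring it from PTRS_PER_PGD alone.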


> Any comments on the above points? At the moment we have to carry these fixes in the
> distribution kernels and I would like to have these fixed in upstream kernel itself.


Thanks,

James
Bhupesh Sharma June 10, 2019, 10:52 a.m. UTC | #10
Hi James,

On 06/07/2019 08:41 PM, James Morse wrote:
> Hi Bhupesh,
> 
> (sorry for the delay on this)

No problem.

> On 04/05/2019 13:53, Bhupesh Sharma wrote:
>> On 04/03/2019 11:24 PM, Bhupesh Sharma wrote:
>>> On 04/02/2019 10:56 PM, James Morse wrote:
>>>> Yes the kernel code is going to move around, this is why the information we expose via
>>>> vmcoreinfo needs to be thought through: something we would always need, regardless of how
>>>> the kernel implements it.
>>>>
> 
>>>> Pointer-auth changes all this again, as we may prefer to use the bits for pointer-auth in
>>>> one TTB or the other. PTRS_PER_PGD may show the 52bit value in this case, but neither TTBR
>>>> is mapping 52bits of VA.
>>>>
>>>>
>>>>> So far, I have generally come across discussions where the following variations of the
>>>>> address spaces have been proposed/requested:
>>>>> - 48bit kernel VA + 48-bit User VA,
>>>>> - 48-bit kernel VA + 52-bit User VA,
>>>>
>>>> + 52bit kernel, because there is excessive quantities of memory, and the kernel maps it
>>>> all, but 48-bit user, because it never maps all the memory, and we prefer the bits for
>>>> pointer-auth.
>>>>
>>>>> - 52-bit kernel VA + 52-bit User VA.
>>>>
>>>> And...  all four may happen with the same built image. I don't see how you can tell these
>>>> cases apart with the one (build-time-constant!) PTRS_PER_PGD value.
>>>>
>>>> I'm sure some of these cases are hypothetical, but by considering it all now, we can avoid
>>>> three more kernel:vmcoreinfo updates, and three more fix-user-space-to-use-the-new-value.
>>>
>>> Agree.
>>>
>>>> I think you probably do need PTRS_PER_PGD, as this is the one value the mm is using to
>>>> generate page tables. I'm pretty sure you also need T0SZ and T1SZ to know if that's
>>>> actually in use, or the kernel is bodging round it with an offset.
>>>
>>> Sure, I am open to suggestions (as I realize that we need an additional VA_BITS_ACTUAL
>>> variable exported for 52-bit kernel VA changes).
> 
> (stepping back a bit:)
> 
> I'm against exposing arch-specific #ifdefs that correspond to how we've configured the
> arch code's interactions with mm. These are all moving targets, we can't have any of it
> become ABI.

Sure, I understand your concerns.

> I have a straw-man for this: What is the value of PTE_FILE_MAX_BITS on your system?
> I have no idea what this value is or means, an afternoon's archaeology would be needed(!).
> This is something that made sense for one kernel version, a better idea came along, and it
> was replaced. If we'd exposed this to user-space, we'd have to generate a value, even if
> it doesn't mean anything. Exposing VA_BITS_ACTUAL is the same.
> 
> (Keep an eye out for when we change the kernel memory map, and any second-guessing based
> on VA_BITS turns out to be wrong)
> 
> 
> What we do have are the hardware properties. The kernel can't change these.
> 
> 
>>> Also how do we standardize reading T0SZ and T1SZ in user-space. Do you propose I make an
>>> enhancement in the cpu-feature-registers interface (see [1]) or the HWCAPS interface
>>> (see [2]) for the same?
> 
> cpufeature won't help you if you've already panic()d and only have the vmcore file. This
> stuff needs to go in vmcoreinfo.
> 
> As long as there is a description of how userspace uses these values, I think adding
> key/values for TCR_EL1.TxSZ to the vmcoreinfo is a sensible way out of this. You probably
> need TTBR1_EL1.BADDR too. (it should be specific fields, to prevent 'new uses' becoming ABI)
> 
> This tells you how the hardware was configured, and covers any combination of TxSZ tricks
> we play, and whether those address bits are used for VA, or ptrauth for TTBR0 or TTBR1.

Fair enough. Let me try and experiment with this suggestion a bit and I
will come back with an RFC patch/patchset by this weekend. Hopefully, it
will cover all the weird PA/VA bit combinations we are handling in arm64
distros these days :)

Thanks,
Bhupesh


>> Any comments on the above points? At the moment we have to carry these fixes in the
>> distribution kernels and I would like to have these fixed in upstream kernel itself.
> 
> 
> Thanks,
> 
> James

Patch

diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c
index ca4c3e12d8c5..123a42c56b8e 100644
--- a/arch/arm64/kernel/crash_core.c
+++ b/arch/arm64/kernel/crash_core.c
@@ -10,6 +10,7 @@ 
 void arch_crash_save_vmcoreinfo(void)
 {
 	VMCOREINFO_NUMBER(VA_BITS);
+	VMCOREINFO_NUMBER(PTRS_PER_PGD);
 	/* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */
 	vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n",
 						kimage_voffset);
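For reference, this sketch shows roughly the inference discussed in the thread: VA_BITS==48, 64K pages, and PTRS_PER_PGD=1024 imply a PGD sized for 52 bits. It mirrors the arm64 level-count (ARM64_HW_PGTABLE_LEVELS) and PGDIR_SHIFT arithmetic. It assumes that the level count is the same for the exported VA_BITS and for the wider PGD, which holds for 64K pages at 48 and 52 bits. The function names are hypothetical. As James points out above, this only reveals how the PGD was sized, not which TTBR actually uses 52 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors ARM64_HW_PGTABLE_LEVELS(): translation levels needed for va_bits. */
static unsigned int pgtable_levels(unsigned int va_bits, unsigned int page_shift)
{
	return (va_bits - 4) / (page_shift - 3);
}

/* Mirrors PGDIR_SHIFT = ARM64_HW_PGTABLE_LEVEL_SHIFT(4 - levels). */
static unsigned int pgdir_shift(unsigned int levels, unsigned int page_shift)
{
	return (page_shift - 3) * levels + 3;
}

/*
 * Hypothetical helper: VA bits covered by a PGD with ptrs_per_pgd entries,
 * using the level count implied by the exported VA_BITS.
 */
static unsigned int pgd_va_bits(unsigned int va_bits, unsigned int page_shift,
				uint64_t ptrs_per_pgd)
{
	unsigned int shift = pgdir_shift(pgtable_levels(va_bits, page_shift),
					 page_shift);
	unsigned int bits = 0;

	/* PTRS_PER_PGD is always a power of two. */
	assert(ptrs_per_pgd != 0 && (ptrs_per_pgd & (ptrs_per_pgd - 1)) == 0);
	while ((1ULL << bits) < ptrs_per_pgd)	/* log2(ptrs_per_pgd) */
		bits++;
	return shift + bits;
}
```

If pgd_va_bits() returns more than the exported VA_BITS, the PGD was sized for a wider address space than VA_BITS suggests.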