Message ID | 1548850991-11879-1-git-send-email-bhsharma@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64, vmcoreinfo : Append 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo | expand |
Hi Bhupesh, On 01/30/2019 12:23 PM, Bhupesh Sharma wrote: > With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which > supports these extensions can support upto 52-bit virtual and 52-bit > physical addresses respectively. > > Since at the moment we enable the support of these extensions via CONFIG > flags, e.g. > - LPA via CONFIG_ARM64_PA_BITS_52 > > there are no clear mechanisms in user-space right now to > deteremine these CONFIG flag values and also determine the PARange and > VARange address values. > User-space tools like 'makedumpfile' and 'crash-utility' can instead > use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine > the maximum virtual address and physical address (respectively) > supported by underlying kernel. > > A reference 'makedumpfile' implementation which uses this approach to > determining the maximum physical address is available in [0]. Why does it need to know? (Suzuki asked the same question on your earlier version) https://lore.kernel.org/linux-arm-kernel/cff44754-7fe4-efea-bc8e-4dde2277c821@arm.com/ From your github link it looks like you use this to re-assemble the two bits of the PFN from the pte. Can't you always do this for 64K pages? CPUs with the feature always do this too, its not something the kernel turns on. Thanks, James > diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c > index ca4c3e12d8c5..ad231be5c0d8 100644 > --- a/arch/arm64/kernel/crash_core.c > +++ b/arch/arm64/kernel/crash_core.c > @@ -10,6 +10,8 @@ > void arch_crash_save_vmcoreinfo(void) > { > VMCOREINFO_NUMBER(VA_BITS); > + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); > + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); > /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ > vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", > kimage_voffset); >
Hi James, Thanks for review. Please see my comments inline. On 01/30/2019 08:51 PM, James Morse wrote: > Hi Bhupesh, > > On 01/30/2019 12:23 PM, Bhupesh Sharma wrote: >> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which >> supports these extensions can support upto 52-bit virtual and 52-bit >> physical addresses respectively. >> >> Since at the moment we enable the support of these extensions via CONFIG >> flags, e.g. >> - LPA via CONFIG_ARM64_PA_BITS_52 >> >> there are no clear mechanisms in user-space right now to >> deteremine these CONFIG flag values and also determine the PARange and >> VARange address values. >> User-space tools like 'makedumpfile' and 'crash-utility' can instead >> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine >> the maximum virtual address and physical address (respectively) >> supported by underlying kernel. >> >> A reference 'makedumpfile' implementation which uses this approach to >> determining the maximum physical address is available in [0]. > > Why does it need to know? > > (Suzuki asked the same question on your earlier version) > https://lore.kernel.org/linux-arm-kernel/cff44754-7fe4-efea-bc8e-4dde2277c821@arm.com/ I have shared some details (after discussion with our test teams) in reply to the review comments from Suzuki here: http://lists.infradead.org/pipermail/kexec/2019-January/022389.html, and http://lists.infradead.org/pipermail/kexec/2019-January/022390.html Just to summarize, I mentioned in my replies to the review comments tha the makedumpfile implementation (for decoding the PTE) was just as an example, however there can be other user-space applications, for e.g a user-space application running with 48-bit kernel VA and 52-bit user space VA and requesting allocation in 'high' address via a 'hint' to mmap. > From your github link it looks like you use this to re-assemble the two > bits of the PFN from the pte. Can't you always do this for 64K pages? > CPUs with the feature always do this too, its not something the kernel > turns on. Ok, let me try to give some perspective of a common makedumpfile use-case before I jump into the details: (a) makedumpfile tool can be used to generate a vmcore and analyze it later. So for example we can create vmcore for a system running with page-size = 64K and analyze it later on a different system using page-size = 4K. Since several makedumpfile code legs (for page-table walk) are common in both the paths (creating a vmcore and analyzing a vmcore), we cannot hardcode the PTE calculation masks for either 48-bit or 52-bit address spaces (or 4K/64K page sizes). The example invocations for the two cases is given below: Create a vmcore dump on a 64K machine: # makedumpfile -l --message-level 1 -d 31 /proc/vmcore vmcore Analyze the vmcore dump on a 4K machine: # makedumpfile -d 31 -x vmlinux vmcore dumpfile Also hardcoding the PTE calculation to use the high address bit mask always will break the backward compatibility with older kernels (which don't support 52-bit address space extensions). (b). Also x86_64 already has a vmcoreinfo export for 'pgtable_l5_enabled': void arch_crash_save_vmcoreinfo(void) { <.. snip..> vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", pgtable_l5_enabled()); } And the makedumpfile code uses the same to determine support for 5-level page tables in x86_64, see <https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/x86_64.c#L36> for example. Thanks, Bhupesh > >> diff --git a/arch/arm64/kernel/crash_core.c >> b/arch/arm64/kernel/crash_core.c >> index ca4c3e12d8c5..ad231be5c0d8 100644 >> --- a/arch/arm64/kernel/crash_core.c >> +++ b/arch/arm64/kernel/crash_core.c >> @@ -10,6 +10,8 @@ >> void arch_crash_save_vmcoreinfo(void) >> { >> VMCOREINFO_NUMBER(VA_BITS); >> + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); >> + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); >> /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ >> vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", >> kimage_voffset); >> >
+ more people On 01/30/19 at 05:53pm, Bhupesh Sharma wrote: > With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which > supports these extensions can support upto 52-bit virtual and 52-bit > physical addresses respectively. > > Since at the moment we enable the support of these extensions via CONFIG > flags, e.g. > - LPA via CONFIG_ARM64_PA_BITS_52 > > there are no clear mechanisms in user-space right now to > deteremine these CONFIG flag values and also determine the PARange and > VARange address values. > > User-space tools like 'makedumpfile' and 'crash-utility' can instead > use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine > the maximum virtual address and physical address (respectively) > supported by underlying kernel. > > A reference 'makedumpfile' implementation which uses this approach to > determining the maximum physical address is available in [0]. > > [0]. https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 I'm not objecting the patch, just want to make sure to make clear about things and make sure these issues are aware by people, and leave arm people to review the arm bits. 1. MAX_PHYSMEM_BITS As we previously found, back to 2014 makedumpfile took a patch to read the value from vmcore but the kernel patch was not accepted. So we should first make clear if this is really needed, why other arches do not need this in makedumpfile. If we really need it then should it be arm64 only? If it is arm64 only then the makedumpfile code should read this number only for arm64. Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip tree, need to make sure to document this as well. 2. MAX_USER_VA_BITS Does makedumpfile care about userspace VA bits? I do not see other code doing this, Kazu and Dave A should be able to comment. I tend to doubt about this. > > Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> > Cc: Mark Rutland <mark.rutland@arm.com> > Cc: Will Deacon <will.deacon@arm.com> > Cc: James Morse <james.morse@arm.com> > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> > --- > arch/arm64/kernel/crash_core.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c > index ca4c3e12d8c5..ad231be5c0d8 100644 > --- a/arch/arm64/kernel/crash_core.c > +++ b/arch/arm64/kernel/crash_core.c > @@ -10,6 +10,8 @@ > void arch_crash_save_vmcoreinfo(void) > { > VMCOREINFO_NUMBER(VA_BITS); > + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); > + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); > /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ > vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", > kimage_voffset); > -- > 2.7.4 > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec Thanks Dave
On 01/31/2019 07:18 AM, Dave Young wrote: > + more people > On 01/30/19 at 05:53pm, Bhupesh Sharma wrote: >> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which >> supports these extensions can support upto 52-bit virtual and 52-bit >> physical addresses respectively. >> >> Since at the moment we enable the support of these extensions via CONFIG >> flags, e.g. >> - LPA via CONFIG_ARM64_PA_BITS_52 >> >> there are no clear mechanisms in user-space right now to >> deteremine these CONFIG flag values and also determine the PARange and >> VARange address values. >> >> User-space tools like 'makedumpfile' and 'crash-utility' can instead >> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine >> the maximum virtual address and physical address (respectively) >> supported by underlying kernel. >> >> A reference 'makedumpfile' implementation which uses this approach to >> determining the maximum physical address is available in [0]. >> >> [0]. https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 > > I'm not objecting the patch, just want to make sure to make clear about > things and make sure these issues are aware by people, and leave arm > people to review the arm bits. > > 1. MAX_PHYSMEM_BITS > As we previously found, back to 2014 makedumpfile took a patch to read the > value from vmcore but the kernel patch was not accepted. > So we should first make clear if this is really needed, why other arches > do not need this in makedumpfile. I explained this a bit in my reply to Suzuki's and James's review comments yesterday, but let me summarize the same again for better clarity: Let's take the example of x86. We have CONFIG_X86_5LEVEL config flag to indicate 5 level page-table support. We export the same in vmcoreinfo for x86_64 using: void arch_crash_save_vmcoreinfo(void) { <.. snip..> vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", pgtable_l5_enabled()); } Also a simple grep in makedumpfile and crash for the same indicates that the user-space code determines the MAX_PHYSMEM_BITS value using the 'pgtable_l5_enabled' value available in vmcoreinfo: Example from makedumpfile: ------------------------- int get_versiondep_info_x86_64(void) { /* * On linux-2.6.26, MAX_PHYSMEM_BITS is changed to 44 from 40. */ if (info->kernel_version < KERNEL_VERSION(2, 6, 26)) info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG; else if (info->kernel_version < KERNEL_VERSION(2, 6, 31)) info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_26; else if(check_5level_paging()) info->max_physmem_bits = _MAX_PHYSMEM_BITS_5LEVEL; else info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_31; ... } As we can see above, we use several if-else cases to determine the 'MAX_PHYSMEM_BITS', one of which is setting it to 52-bit, if the 'pgtable_l5_enabled' value is available and TRUE in vmcoreinfo. So, we determine 'MAX_PHYSMEM_BITS' value in makedumpfile via a vmcoreinfo export'ed variable (rather than using named 'MAX_PHYSMEM_BITS' its 'pgtable_l5_enabled' that is being used). Since for arm64, we don't have a single CONFIG flag for 52-bit addresses spaces (kernel VA, user-space VA and PA), so its better to export the respective CONFIG flags in the vmcoreinfo directly. > If we really need it then should it be arm64 only? See above, since archs like x86 use a single flag: CONFIG_X86_5LEVEL, whereas arm64 can use the combination of following flags to indicate combinations of various address spaces: - 48-bit kernel VA + 48-bit user-space VA + 52-bit PA - 48-bit kernel VA + 52-bit user-space VA + 52-bit PA - 52-bit kernel VA + 52-bit user-space VA + 52-bit PA CONFIG_ARM64_64K_PAGES CONFIG_ARM64_USER_VA_BITS_52 CONFIG_ARM64_VA_BITS CONFIG_ARM64_PA_BITS_52 CONFIG_ARM64_PA_BITS CONFIG_EXPERT, and CONFIG_ARM64_FORCE_52BIT so probably its not correct to compare the two cases one-to-one (its more like an apple and an orange comparison). > If it is arm64 only then the makedumpfile code should read this number > only for arm64. > > Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip > tree, need to make sure to document this as well. Sure, I will send a separate patch to fix the same, once this gets in. > > 2. MAX_USER_VA_BITS > Does makedumpfile care about userspace VA bits? Yes. Consider the case 48-bit kernel VA and 52-bit user-space VA, which is a perfectly valid case on arm64. In such cases VA_BITS is set to 48, whereas MAX_USER_VA_BITS is set to 52, which allows user-space applications which use 52-bit virtual address to pass a hint to 'mmap' to get high addresses. I do not see other code > doing this, Kazu and Dave A should be able to comment. I talked to Dave A. yesterday off-list, I think he mentioned that these changes are useful for crash-utility as well and he was hoping it gets accepted soon so that kernel-debugging tools can handle increased address spaces on arm64. But it will be great to have reviews/ACKs from Dave A and others as well. Thanks, Bhupesh > > I tend to doubt about this. > >> >> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> >> Cc: Mark Rutland <mark.rutland@arm.com> >> Cc: Will Deacon <will.deacon@arm.com> >> Cc: James Morse <james.morse@arm.com> >> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> >> --- >> arch/arm64/kernel/crash_core.c | 2 ++ >> 1 file changed, 2 insertions(+) >> >> diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c >> index ca4c3e12d8c5..ad231be5c0d8 100644 >> --- a/arch/arm64/kernel/crash_core.c >> +++ b/arch/arm64/kernel/crash_core.c >> @@ -10,6 +10,8 @@ >> void arch_crash_save_vmcoreinfo(void) >> { >> VMCOREINFO_NUMBER(VA_BITS); >> + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); >> + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); >> /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ >> vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", >> kimage_voffset); >> -- >> 2.7.4 >> >> >> _______________________________________________ >> kexec mailing list >> kexec@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/kexec > > Thanks > Dave >
----- Original Message ----- > + more people > On 01/30/19 at 05:53pm, Bhupesh Sharma wrote: > > With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which > > supports these extensions can support upto 52-bit virtual and 52-bit > > physical addresses respectively. > > > > Since at the moment we enable the support of these extensions via CONFIG > > flags, e.g. > > - LPA via CONFIG_ARM64_PA_BITS_52 > > > > there are no clear mechanisms in user-space right now to > > deteremine these CONFIG flag values and also determine the PARange and > > VARange address values. > > > > User-space tools like 'makedumpfile' and 'crash-utility' can instead > > use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine > > the maximum virtual address and physical address (respectively) > > supported by underlying kernel. > > > > A reference 'makedumpfile' implementation which uses this approach to > > determining the maximum physical address is available in [0]. > > > > [0]. > > https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 > > I'm not objecting the patch, just want to make sure to make clear about > things and make sure these issues are aware by people, and leave arm > people to review the arm bits. > > 1. MAX_PHYSMEM_BITS > As we previously found, back to 2014 makedumpfile took a patch to read the > value from vmcore but the kernel patch was not accepted. > So we should first make clear if this is really needed, why other arches > do not need this in makedumpfile. > > If we really need it then should it be arm64 only? > > If it is arm64 only then the makedumpfile code should read this number > only for arm64. With respect to the crash utility, this is the first time/architecture where the determination of MAX_PHYSMEM_BITS cannot be determined. > > Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip > tree, need to make sure to document this as well. > > 2. MAX_USER_VA_BITS > Does makedumpfile care about userspace VA bits? I do not see other code > doing this, Kazu and Dave A should be able to comment. > > I tend to doubt about this. With respect to the crash utility, this won't be required because there is the exported "vabits_user" variable. Dave Anderson > > > > > Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Will Deacon <will.deacon@arm.com> > > Cc: James Morse <james.morse@arm.com> > > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> > > --- > > arch/arm64/kernel/crash_core.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/arch/arm64/kernel/crash_core.c > > b/arch/arm64/kernel/crash_core.c > > index ca4c3e12d8c5..ad231be5c0d8 100644 > > --- a/arch/arm64/kernel/crash_core.c > > +++ b/arch/arm64/kernel/crash_core.c > > @@ -10,6 +10,8 @@ > > void arch_crash_save_vmcoreinfo(void) > > { > > VMCOREINFO_NUMBER(VA_BITS); > > + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); > > + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); > > /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ > > vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", > > kimage_voffset); > > -- > > 2.7.4 > > > > > > _______________________________________________ > > kexec mailing list > > kexec@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kexec > > Thanks > Dave >
Hi James, All, On 01/31/2019 03:09 AM, Bhupesh Sharma wrote: > Hi James, > > Thanks for review. > Please see my comments inline. > > > On 01/30/2019 08:51 PM, James Morse wrote: >> Hi Bhupesh, >> >> On 01/30/2019 12:23 PM, Bhupesh Sharma wrote: >>> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which >>> supports these extensions can support upto 52-bit virtual and 52-bit >>> physical addresses respectively. >>> >>> Since at the moment we enable the support of these extensions via CONFIG >>> flags, e.g. >>> - LPA via CONFIG_ARM64_PA_BITS_52 >>> >>> there are no clear mechanisms in user-space right now to >>> deteremine these CONFIG flag values and also determine the PARange and >>> VARange address values. >>> User-space tools like 'makedumpfile' and 'crash-utility' can instead >>> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine >>> the maximum virtual address and physical address (respectively) >>> supported by underlying kernel. >>> >>> A reference 'makedumpfile' implementation which uses this approach to >>> determining the maximum physical address is available in [0]. >> >> Why does it need to know? >> >> (Suzuki asked the same question on your earlier version) >> https://lore.kernel.org/linux-arm-kernel/cff44754-7fe4-efea-bc8e-4dde2277c821@arm.com/ > > > I have shared some details (after discussion with our test teams) in > reply to the review comments from Suzuki here: > http://lists.infradead.org/pipermail/kexec/2019-January/022389.html, and > http://lists.infradead.org/pipermail/kexec/2019-January/022390.html > > Just to summarize, I mentioned in my replies to the review comments tha > the makedumpfile implementation (for decoding the PTE) was just as an > example, however there can be other user-space applications, for e.g a > user-space application running with 48-bit kernel VA and 52-bit user > space VA and requesting allocation in 'high' address via a 'hint' to mmap. > >> From your github link it looks like you use this to re-assemble the >> two bits of the PFN from the pte. Can't you always do this for 64K >> pages? CPUs with the feature always do this too, its not something the >> kernel turns on. > > Ok, let me try to give some perspective of a common makedumpfile > use-case before I jump into the details: > > (a) makedumpfile tool can be used to generate a vmcore and analyze it > later. So for example we can create vmcore for a system running with > page-size = 64K and analyze it later on a different system using > page-size = 4K. > > Since several makedumpfile code legs (for page-table walk) are common in > both the paths (creating a vmcore and analyzing a vmcore), we cannot > hardcode the PTE calculation masks for either 48-bit or 52-bit address > spaces (or 4K/64K page sizes). The example invocations for the two cases > is given below: > > Create a vmcore dump on a 64K machine: > # makedumpfile -l --message-level 1 -d 31 /proc/vmcore vmcore > > Analyze the vmcore dump on a 4K machine: > # makedumpfile -d 31 -x vmlinux vmcore dumpfile > > Also hardcoding the PTE calculation to use the high address bit mask > always will break the backward compatibility with older kernels (which > don't support 52-bit address space extensions). > > (b). Also x86_64 already has a vmcoreinfo export for 'pgtable_l5_enabled': > > void arch_crash_save_vmcoreinfo(void) > { > <.. snip..> > vmcoreinfo_append_str("NUMBER(pgtable_l5_enabled)=%d\n", > pgtable_l5_enabled()); > } > > And the makedumpfile code uses the same to determine support for 5-level > page tables in x86_64, see > <https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/x86_64.c#L36> > for example. Ping. Since this patch fixes a regression with user-space tools like makedumpfile and crash-utility which are broken since arm64 kernels with 52-bit VA and PA support are available (and distributions which enable them), would request review comments/ack on this simple change. Thanks, Bhupesh >> >>> diff --git a/arch/arm64/kernel/crash_core.c >>> b/arch/arm64/kernel/crash_core.c >>> index ca4c3e12d8c5..ad231be5c0d8 100644 >>> --- a/arch/arm64/kernel/crash_core.c >>> +++ b/arch/arm64/kernel/crash_core.c >>> @@ -10,6 +10,8 @@ >>> void arch_crash_save_vmcoreinfo(void) >>> { >>> VMCOREINFO_NUMBER(VA_BITS); >>> + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); >>> + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); >>> /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ >>> vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", >>> kimage_voffset); >>> >> >
On 04/02/2019 14:35, Bhupesh Sharma wrote: [...] >> Also hardcoding the PTE calculation to use the high address bit mask >> always will break the backward compatibility with older kernels (which >> don't support 52-bit address space extensions). No it won't. There's no difference between an old kernel, a new kernel on a CPU without ARMv8.2-LPA, or a new kernel on a CPU with ARMv8.2-LPA in a system which happens to have less than 49 bits of physical memory map - in all those cases the relevant bits are either RES0 or just actually 0 in the PTE, so replacing 4 bits of zeros with 4 bits of other zeros in the final physical address has no effect whatsoever other than taking a couple of extra instructions to perform. If you're running a 64K page kernel on a system with an SMMU, note how that's already been "broken" for nearly a year now ;) https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/io-pgtable-arm.c#n211 Robin.
On 1/30/2019 8:48 PM, Dave Young wrote: > + more people > On 01/30/19 at 05:53pm, Bhupesh Sharma wrote: > > With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which > > supports these extensions can support upto 52-bit virtual and 52-bit > > physical addresses respectively. > > > > Since at the moment we enable the support of these extensions via CONFIG > > flags, e.g. > > - LPA via CONFIG_ARM64_PA_BITS_52 > > > > there are no clear mechanisms in user-space right now to > > deteremine these CONFIG flag values and also determine the PARange and > > VARange address values. > > > > User-space tools like 'makedumpfile' and 'crash-utility' can instead > > use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine > > the maximum virtual address and physical address (respectively) > > supported by underlying kernel. > > > > A reference 'makedumpfile' implementation which uses this approach to > > determining the maximum physical address is available in [0]. > > > > [0]. > https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 > > I'm not objecting the patch, just want to make sure to make clear about > things and make sure these issues are aware by people, and leave arm > people to review the arm bits. > > 1. MAX_PHYSMEM_BITS > As we previously found, back to 2014 makedumpfile took a patch to read the > value from vmcore but the kernel patch was not accepted. > So we should first make clear if this is really needed, why other arches > do not need this in makedumpfile. > > If we really need it then should it be arm64 only? > > If it is arm64 only then the makedumpfile code should read this number > only for arm64. Sorry for the delay. According to the kernel patch, some of arm32 platforms may need it http://lists.infradead.org/pipermail/kexec/2014-May/011909.html but except for them (and arm64), makedumpfile can manage with kernel version and some switches to determine this value so far. > > Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip > tree, need to make sure to document this as well. > > 2. MAX_USER_VA_BITS > Does makedumpfile care about userspace VA bits? I do not see other code > doing this, Kazu and Dave A should be able to comment. The mapping makedumpfile uses on arm64 is swapper_pg_dir only, so unless the config affects its structure or something, makedumpfile will not need this value. Thanks, Kazu > > I tend to doubt about this. > > > > > Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> > > Cc: Mark Rutland <mark.rutland@arm.com> > > Cc: Will Deacon <will.deacon@arm.com> > > Cc: James Morse <james.morse@arm.com> > > Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> > > --- > > arch/arm64/kernel/crash_core.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c > > index ca4c3e12d8c5..ad231be5c0d8 100644 > > --- a/arch/arm64/kernel/crash_core.c > > +++ b/arch/arm64/kernel/crash_core.c > > @@ -10,6 +10,8 @@ > > void arch_crash_save_vmcoreinfo(void) > > { > > VMCOREINFO_NUMBER(VA_BITS); > > + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); > > + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); > > /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ > > vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", > > kimage_voffset); > > -- > > 2.7.4 > > > > > > _______________________________________________ > > kexec mailing list > > kexec@lists.infradead.org > > http://lists.infradead.org/mailman/listinfo/kexec > > Thanks > Dave
Hi Bhupesh, On 04/02/2019 14:35, Bhupesh Sharma wrote: > On 01/31/2019 03:09 AM, Bhupesh Sharma wrote: >> On 01/30/2019 08:51 PM, James Morse wrote: >>> On 01/30/2019 12:23 PM, Bhupesh Sharma wrote: >>>> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which >>>> supports these extensions can support upto 52-bit virtual and 52-bit >>>> physical addresses respectively. >>>> >>>> Since at the moment we enable the support of these extensions via CONFIG >>>> flags, e.g. >>>> - LPA via CONFIG_ARM64_PA_BITS_52 >>>> >>>> there are no clear mechanisms in user-space right now to >>>> deteremine these CONFIG flag values and also determine the PARange and >>>> VARange address values. >>>> User-space tools like 'makedumpfile' and 'crash-utility' can instead >>>> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine >>>> the maximum virtual address and physical address (respectively) >>>> supported by underlying kernel. >>>> >>>> A reference 'makedumpfile' implementation which uses this approach to >>>> determining the maximum physical address is available in [0]. >>> >>> Why does it need to know? >>> >>> (Suzuki asked the same question on your earlier version) >>> https://lore.kernel.org/linux-arm-kernel/cff44754-7fe4-efea-bc8e-4dde2277c821@arm.com/ >> I have shared some details (after discussion with our test teams) in reply to >> the review comments from Suzuki here: >> http://lists.infradead.org/pipermail/kexec/2019-January/022389.html, and >> http://lists.infradead.org/pipermail/kexec/2019-January/022390.html >> >> Just to summarize, I mentioned in my replies to the review comments tha the >> makedumpfile implementation (for decoding the PTE) was just as an example, >> however there can be other user-space applications, for e.g a user-space >> application running with 48-bit kernel VA and 52-bit user space VA and >> requesting allocation in 'high' address via a 'hint' to mmap. But vmcoreinfo is the wrong place to expose this information. (it can be configured off, and is only accessible to root) >>> From your github link it looks like you use this to re-assemble the two bits >>> of the PFN from the pte. Can't you always do this for 64K pages? CPUs with >>> the feature always do this too, its not something the kernel turns on. >> >> Ok, let me try to give some perspective of a common makedumpfile use-case >> before I jump into the details: >> Also hardcoding the PTE calculation to use the high address bit mask always >> will break the backward compatibility with older kernels (which don't support >> 52-bit address space extensions). What would go wrong? The hardware ignores those bits and supplies zero. As far as I can see the kernel has always generated zero there. >> (b). Also x86_64 already has a vmcoreinfo export for 'pgtable_l5_enabled': So? 5-level page tables is a different feature. I agree you need to know the number of levels to walk the page-tables, but that isn't how arm64's 52bit stuff works. > Ping. Since this patch fixes a regression with user-space tools like > makedumpfile and crash-utility which are broken since arm64 kernels with 52-bit > VA and PA support are available (and distributions which enable them), would > request review comments/ack on this simple change. Broken how? What goes wrong? I can see how a not-52bit-aware crash/gdb/whatever would be confused by high bits being set in the physical address, and possibly throw them away. But once it supports this for 64K pages, I don't see what can go wrong if those bits aren't set. Why does it need to know? Thanks, James
Hi Robin, On 02/04/2019 09:01 PM, Robin Murphy wrote: > On 04/02/2019 14:35, Bhupesh Sharma wrote: > [...] >>> Also hardcoding the PTE calculation to use the high address bit mask >>> always will break the backward compatibility with older kernels >>> (which don't support 52-bit address space extensions). > > No it won't. There's no difference between an old kernel, a new kernel > on a CPU without ARMv8.2-LPA, or a new kernel on a CPU with ARMv8.2-LPA > in a system which happens to have less than 49 bits of physical memory > map - in all those cases the relevant bits are either RES0 or just > actually 0 in the PTE, so replacing 4 bits of zeros with 4 bits of other > zeros in the final physical address has no effect whatsoever other than > taking a couple of extra instructions to perform. Right, but lets think this from a distribution user p-o-v. Why would a user with newer kernel and CPU which doesn't support ARMv8.2 LPA want to update a user-space utility? Well unless something is broken in the user-space. Requesting an upgrade is easier in this case, as the user-space components like crash-utility/kexec-tools (or for that matter any user-space tool which requires a page-table walk to determine vaddr/paddr values) are indeed broken with this combination. On the other hand, if a user a having a old kernel, CPU which doesn't support LPA, but if we still want them to upgrade to a newer version of user-space utility (with a upgrade fix note reading something like: "Add 52-bit PA support"), surely that would not be an ideal case requiring an upgrade as his underlying environment doesn't support the same. That's what I meant when I said that we need to take care of the backward compatibility also when we propose new code changes to user-space packages. Thanks, Bhupesh > If you're running a 64K page kernel on a system with an SMMU, note how > that's already been "broken" for nearly a year now ;) > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/iommu/io-pgtable-arm.c#n211 > > > > Robin.
Hi Kazu, On 02/04/2019 09:34 PM, Kazuhito Hagio wrote: > On 1/30/2019 8:48 PM, Dave Young wrote: >> + more people >> On 01/30/19 at 05:53pm, Bhupesh Sharma wrote: >>> With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which >>> supports these extensions can support upto 52-bit virtual and 52-bit >>> physical addresses respectively. >>> >>> Since at the moment we enable the support of these extensions via CONFIG >>> flags, e.g. >>> - LPA via CONFIG_ARM64_PA_BITS_52 >>> >>> there are no clear mechanisms in user-space right now to >>> deteremine these CONFIG flag values and also determine the PARange and >>> VARange address values. >>> >>> User-space tools like 'makedumpfile' and 'crash-utility' can instead >>> use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine >>> the maximum virtual address and physical address (respectively) >>> supported by underlying kernel. >>> >>> A reference 'makedumpfile' implementation which uses this approach to >>> determining the maximum physical address is available in [0]. >>> >>> [0]. >> https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 >> >> I'm not objecting the patch, just want to make sure to make clear about >> things and make sure these issues are aware by people, and leave arm >> people to review the arm bits. >> >> 1. MAX_PHYSMEM_BITS >> As we previously found, back to 2014 makedumpfile took a patch to read the >> value from vmcore but the kernel patch was not accepted. >> So we should first make clear if this is really needed, why other arches >> do not need this in makedumpfile. >> >> If we really need it then should it be arm64 only? >> >> If it is arm64 only then the makedumpfile code should read this number >> only for arm64. > > Sorry for the delay. > > According to the kernel patch, some of arm32 platforms may need it > http://lists.infradead.org/pipermail/kexec/2014-May/011909.html > but except for them (and arm64), makedumpfile can manage with kernel > version and some switches to determine this value so far. > >> >> Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip >> tree, need to make sure to document this as well. >> >> 2. MAX_USER_VA_BITS >> Does makedumpfile care about userspace VA bits? I do not see other code >> doing this, Kazu and Dave A should be able to comment. > > The mapping makedumpfile uses on arm64 is swapper_pg_dir only, so > unless the config affects its structure or something, makedumpfile > will not need this value. I captured this case in more details while sending out the makedumpfile enablement patch for ARMv8.2-LVA (see [0]), but here is a brief summary on the same: Since at the moment we enable the support of the ARMv8.2-LVA extension for 52-bit user-space VA in the kernel via a CONFIG flags (CONFIG_ARM64_USER_VA_BITS_52), so there are no clear mechanisms in user-space to determine this CONFIG flag value and use it to determine the address range values. Since 'VA_BITS' are already exported via vmcoreinfo, if we export 'MAX_USER_VA_BITS' as well, we can use the same in user-space to check if the 'MAX_USER_VA_BITS' value is greater than 'VA_BITS'. If yes, then we are running a use-case where user-space is 52-bit while the underlying kernel is still 48-bit. The increased 'PTRS_PER_PGD' value for such cases needs to be then calculated as is done by the underlying kernel (see 'arch/arm64/include/asm/pgtable-hwdef.h' for details): #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) Also, note that 'arch/arm64/include/asm/memory.h' defines 'MAX_USER_VA_BITS' as 'VA_BITS' in case 'CONFIG_ARM64_USER_VA_BITS_52' is set to 'n': #ifdef CONFIG_ARM64_USER_VA_BITS_52 #define MAX_USER_VA_BITS 52 #else #define MAX_USER_VA_BITS VA_BITS #endif So, makedumpfile will need this symbol exported in vmcore to make the above determination. [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022425.html Thanks, Bhupesh
On 02/12/19 at 10:37am, Bhupesh Sharma wrote: > Hi Kazu, > > On 02/04/2019 09:34 PM, Kazuhito Hagio wrote: > > On 1/30/2019 8:48 PM, Dave Young wrote: > > > + more people > > > On 01/30/19 at 05:53pm, Bhupesh Sharma wrote: > > > > With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which > > > > supports these extensions can support upto 52-bit virtual and 52-bit > > > > physical addresses respectively. > > > > > > > > Since at the moment we enable the support of these extensions via CONFIG > > > > flags, e.g. > > > > - LPA via CONFIG_ARM64_PA_BITS_52 > > > > > > > > there are no clear mechanisms in user-space right now to > > > > deteremine these CONFIG flag values and also determine the PARange and > > > > VARange address values. > > > > > > > > User-space tools like 'makedumpfile' and 'crash-utility' can instead > > > > use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine > > > > the maximum virtual address and physical address (respectively) > > > > supported by underlying kernel. > > > > > > > > A reference 'makedumpfile' implementation which uses this approach to > > > > determining the maximum physical address is available in [0]. > > > > > > > > [0]. > > > https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 > > > > > > I'm not objecting the patch, just want to make sure to make clear about > > > things and make sure these issues are aware by people, and leave arm > > > people to review the arm bits. > > > > > > 1. MAX_PHYSMEM_BITS > > > As we previously found, back to 2014 makedumpfile took a patch to read the > > > value from vmcore but the kernel patch was not accepted. > > > So we should first make clear if this is really needed, why other arches > > > do not need this in makedumpfile. > > > > > > If we really need it then should it be arm64 only? > > > > > > If it is arm64 only then the makedumpfile code should read this number > > > only for arm64. > > > > Sorry for the delay. > > > > According to the kernel patch, some of arm32 platforms may need it > > http://lists.infradead.org/pipermail/kexec/2014-May/011909.html > > but except for them (and arm64), makedumpfile can manage with kernel > > version and some switches to determine this value so far. > > > > > > > > Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip > > > tree, need to make sure to document this as well. > > > > > > 2. MAX_USER_VA_BITS > > > Does makedumpfile care about userspace VA bits? I do not see other code > > > doing this, Kazu and Dave A should be able to comment. > > > > The mapping makedumpfile uses on arm64 is swapper_pg_dir only, so > > unless the config affects its structure or something, makedumpfile > > will not need this value. > > I captured this case in more details while sending out the makedumpfile > enablement patch for ARMv8.2-LVA (see [0]), but here is a brief summary on > the same: > > Since at the moment we enable the support of the ARMv8.2-LVA extension for > 52-bit user-space VA in the kernel via a CONFIG flags > (CONFIG_ARM64_USER_VA_BITS_52), so there are no clear mechanisms in > user-space to determine this CONFIG > flag value and use it to determine the address range values. > > Since 'VA_BITS' are already exported via vmcoreinfo, if we export > 'MAX_USER_VA_BITS' as well, we can use the same in user-space to check if > the 'MAX_USER_VA_BITS' value is greater than 'VA_BITS'. If yes, then we are > running a use-case where user-space is 52-bit while the underlying kernel is > still 48-bit. Problem is why this is needed, it sounds like you are talking about some non-exist use case. > > The increased 'PTRS_PER_PGD' value for such cases needs to be then > calculated as is done by the underlying kernel (see > 'arch/arm64/include/asm/pgtable-hwdef.h' for details): > > #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) > > Also, note that 'arch/arm64/include/asm/memory.h' defines 'MAX_USER_VA_BITS' > as 'VA_BITS' in case 'CONFIG_ARM64_USER_VA_BITS_52' is set to 'n': > > #ifdef CONFIG_ARM64_USER_VA_BITS_52 > #define MAX_USER_VA_BITS 52 > #else > #define MAX_USER_VA_BITS VA_BITS > #endif > > So, makedumpfile will need this symbol exported in vmcore to make the above > determination. > > [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022425.html > > Thanks, > Bhupesh Thanks Dave
On 12/02/2019 04:55, Bhupesh Sharma wrote: > Hi Robin, > > On 02/04/2019 09:01 PM, Robin Murphy wrote: >> On 04/02/2019 14:35, Bhupesh Sharma wrote: >> [...] >>>> Also hardcoding the PTE calculation to use the high address bit mask >>>> always will break the backward compatibility with older kernels >>>> (which don't support 52-bit address space extensions). >> >> No it won't. There's no difference between an old kernel, a new kernel >> on a CPU without ARMv8.2-LPA, or a new kernel on a CPU with >> ARMv8.2-LPA in a system which happens to have less than 49 bits of >> physical memory map - in all those cases the relevant bits are either >> RES0 or just actually 0 in the PTE, so replacing 4 bits of zeros with >> 4 bits of other zeros in the final physical address has no effect >> whatsoever other than taking a couple of extra instructions to perform. > > Right, but lets think this from a distribution user p-o-v. Why would a > user with newer kernel and CPU which doesn't support ARMv8.2 LPA want to > update a user-space utility? Well unless something is broken in the > user-space. Requesting an upgrade is easier in this case, as the > user-space components like crash-utility/kexec-tools (or for that matter > any user-space tool which requires a page-table walk to determine > vaddr/paddr values) are indeed broken with this combination. Huh? Nothing gets broken unless we go out of our way to break it. The point I'm making is that the format of vmcoreinfo *does not* need to change in order for the userspace tool to support LPA. There is no compatibility issue either way. If a user's machine doesn't support or make use of LPA, then yeah, they should absolutely not have to upgrade their tools for the sake of LPA regardless of what kernel they use. > On the other hand, if a user a having a old kernel, CPU which doesn't > support LPA, but if we still want them to upgrade to a newer version of > user-space utility (with a upgrade fix note reading something like: "Add > 52-bit PA support"), surely that would not be an ideal case requiring an > upgrade as his underlying environment doesn't support the same. I don't get why we would "still want them to upgrade" if there's no technical reason for them to do so. That sounds like the kind of crap that proprietary OS vendors pull :/ > That's what I meant when I said that we need to take care of the > backward compatibility also when we propose new code changes to > user-space packages. And what I meant is that you can trivially update the userspace tools to support LPA entirely transparently, without requiring kernel changes or creating any compatibility issues in any direction. Robin.
On Tue, Feb 12, 2019 at 4:14 PM Dave Young <dyoung@redhat.com> wrote: > > On 02/12/19 at 10:37am, Bhupesh Sharma wrote: > > Hi Kazu, > > > > On 02/04/2019 09:34 PM, Kazuhito Hagio wrote: > > > On 1/30/2019 8:48 PM, Dave Young wrote: > > > > + more people > > > > On 01/30/19 at 05:53pm, Bhupesh Sharma wrote: > > > > > With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which > > > > > supports these extensions can support upto 52-bit virtual and 52-bit > > > > > physical addresses respectively. > > > > > > > > > > Since at the moment we enable the support of these extensions via CONFIG > > > > > flags, e.g. > > > > > - LPA via CONFIG_ARM64_PA_BITS_52 > > > > > > > > > > there are no clear mechanisms in user-space right now to > > > > > deteremine these CONFIG flag values and also determine the PARange and > > > > > VARange address values. > > > > > > > > > > User-space tools like 'makedumpfile' and 'crash-utility' can instead > > > > > use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine > > > > > the maximum virtual address and physical address (respectively) > > > > > supported by underlying kernel. > > > > > > > > > > A reference 'makedumpfile' implementation which uses this approach to > > > > > determining the maximum physical address is available in [0]. > > > > > > > > > > [0]. > > > > https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 > > > > > > > > I'm not objecting the patch, just want to make sure to make clear about > > > > things and make sure these issues are aware by people, and leave arm > > > > people to review the arm bits. > > > > > > > > 1. MAX_PHYSMEM_BITS > > > > As we previously found, back to 2014 makedumpfile took a patch to read the > > > > value from vmcore but the kernel patch was not accepted. > > > > So we should first make clear if this is really needed, why other arches > > > > do not need this in makedumpfile. > > > > > > > > If we really need it then should it be arm64 only? > > > > > > > > If it is arm64 only then the makedumpfile code should read this number > > > > only for arm64. > > > > > > Sorry for the delay. > > > > > > According to the kernel patch, some of arm32 platforms may need it > > > http://lists.infradead.org/pipermail/kexec/2014-May/011909.html > > > but except for them (and arm64), makedumpfile can manage with kernel > > > version and some switches to determine this value so far. > > > > > > > > > > > Also Lianbo added the vmcoreinfo documents, I believe it stays in -tip > > > > tree, need to make sure to document this as well. > > > > > > > > 2. MAX_USER_VA_BITS > > > > Does makedumpfile care about userspace VA bits? I do not see other code > > > > doing this, Kazu and Dave A should be able to comment. > > > > > > The mapping makedumpfile uses on arm64 is swapper_pg_dir only, so > > > unless the config affects its structure or something, makedumpfile > > > will not need this value. > > > > I captured this case in more details while sending out the makedumpfile > > enablement patch for ARMv8.2-LVA (see [0]), but here is a brief summary on > > the same: > > > > Since at the moment we enable the support of the ARMv8.2-LVA extension for > > 52-bit user-space VA in the kernel via a CONFIG flags > > (CONFIG_ARM64_USER_VA_BITS_52), so there are no clear mechanisms in > > user-space to determine this CONFIG > > flag value and use it to determine the address range values. > > > > Since 'VA_BITS' are already exported via vmcoreinfo, if we export > > 'MAX_USER_VA_BITS' as well, we can use the same in user-space to check if > > the 'MAX_USER_VA_BITS' value is greater than 'VA_BITS'. If yes, then we are > > running a use-case where user-space is 52-bit while the underlying kernel is > > still 48-bit. > > Problem is why this is needed, it sounds like you are talking about some > non-exist use case. I already explained this in an earlier reply to your initial comments (see <http://lists.infradead.org/pipermail/kexec/2019-January/022395.html>). Perhaps you missed reading it, so here are the architecturally possible and kernel supported uses cases with ARMv8.2 extensions (depending on the combination of CONFIG flags and kernel version): - 48-bit kernel VA + 48-bit user-space VA + 52-bit PA - 48-bit kernel VA + 52-bit user-space VA + 52-bit PA - 52-bit kernel VA + 52-bit user-space VA + 52-bit PA Please see 'config ARM64_USER_VA_BITS_52' help text inside 'arch/arm64/Kconfig' for more details: config ARM64_USER_VA_BITS_52 bool "52-bit (user)" depends on ARM64_64K_PAGES && (ARM64_PAN || !ARM64_SW_TTBR0_PAN) help Enable 52-bit virtual addressing for userspace when explicitly requested via a hint to mmap(). The kernel will continue to use 48-bit virtual addresses for its own mappings. BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu mentioned that the changes look necessary. [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html Thanks, Bhupesh > > > > The increased 'PTRS_PER_PGD' value for such cases needs to be then > > calculated as is done by the underlying kernel (see > > 'arch/arm64/include/asm/pgtable-hwdef.h' for details): > > > > #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) > > > > Also, note that 'arch/arm64/include/asm/memory.h' defines 'MAX_USER_VA_BITS' > > as 'VA_BITS' in case 'CONFIG_ARM64_USER_VA_BITS_52' is set to 'n': > > > > #ifdef CONFIG_ARM64_USER_VA_BITS_52 > > #define MAX_USER_VA_BITS 52 > > #else > > #define MAX_USER_VA_BITS VA_BITS > > #endif > > > > So, makedumpfile will need this symbol exported in vmcore to make the above > > determination. > > > > [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022425.html > > > > Thanks, > > Bhupesh > > Thanks > Dave
On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: > BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA > (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu > mentioned that the changes look necessary. > > [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html > > > The increased 'PTRS_PER_PGD' value for such cases needs to be then > > > calculated as is done by the underlying kernel (see > > > 'arch/arm64/include/asm/pgtable-hwdef.h' for details): > > > > > > #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. It is used for pgd_index() also in makedumpfile to walk page tables. /* to find an entry in a page-table-directory */ #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) Thanks, Kazu > > > > > > Also, note that 'arch/arm64/include/asm/memory.h' defines 'MAX_USER_VA_BITS' > > > as 'VA_BITS' in case 'CONFIG_ARM64_USER_VA_BITS_52' is set to 'n': > > > > > > #ifdef CONFIG_ARM64_USER_VA_BITS_52 > > > #define MAX_USER_VA_BITS 52 > > > #else > > > #define MAX_USER_VA_BITS VA_BITS > > > #endif > > > > > > So, makedumpfile will need this symbol exported in vmcore to make the above > > > determination. > > > > > > [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022425.html > > > > > > Thanks, > > > Bhupesh > > > > Thanks > > Dave
On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: > > On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: > > BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA > > (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu > > mentioned that the changes look necessary. > > > > [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html > > > > > The increased 'PTRS_PER_PGD' value for such cases needs to be then > > > > calculated as is done by the underlying kernel (see > > > > 'arch/arm64/include/asm/pgtable-hwdef.h' for details): > > > > > > > > #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) > > Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. > It is used for pgd_index() also in makedumpfile to walk page tables. > > /* to find an entry in a page-table-directory */ > #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) Since Dave mentioned crash tool does not need it, but crash should also travel the pg tables. If this is really necessary it would be good to describe what will happen without the patch, eg. some user visible error from an actual test etc. > > Thanks, > Kazu > > > > > > > > > Also, note that 'arch/arm64/include/asm/memory.h' defines 'MAX_USER_VA_BITS' > > > > as 'VA_BITS' in case 'CONFIG_ARM64_USER_VA_BITS_52' is set to 'n': > > > > > > > > #ifdef CONFIG_ARM64_USER_VA_BITS_52 > > > > #define MAX_USER_VA_BITS 52 > > > > #else > > > > #define MAX_USER_VA_BITS VA_BITS > > > > #endif > > > > > > > > So, makedumpfile will need this symbol exported in vmcore to make the above > > > > determination. > > > > > > > > [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022425.html > > > > > > > > Thanks, > > > > Bhupesh > > > > > > Thanks > > > Dave >
Hi guys, On 13/02/2019 11:15, Dave Young wrote: > On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: >> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: >>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA >>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu >>> mentioned that the changes look necessary. >>> >>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html >> >>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then >>>>> calculated as is done by the underlying kernel Aha! Nothing to do with which-bits-are-pfn in the tables... You need to know if the top level PGD is 512bytes or bigger. As we use a kmem-cache the adjacent data could be some else's page tables. Is this really a problem though? You can't pull the user-space pgd pointers out of no-where, you must have walked some task_struct and struct_mm's to find them. In which case you would have the VMAs on hand to tell you if its in the mapped user range. It would be good to avoid putting something arch-specific in here if we can at all help it. >>>>> (see >>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details): >>>>> >>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) >> >> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. >> It is used for pgd_index() also in makedumpfile to walk page tables. >> >> /* to find an entry in a page-table-directory */ >> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) > > Since Dave mentioned crash tool does not need it, but crash should also > travel the pg tables. > > If this is really necessary it would be good to describe what will > happen without the patch, eg. some user visible error from an actual test etc. Yes please, it would really help if there was a specific example we could discuss. Thanks, James
On 2/13/2019 1:22 PM, James Morse wrote: > Hi guys, > > On 13/02/2019 11:15, Dave Young wrote: > > On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: > >> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: > >>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA > >>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu > >>> mentioned that the changes look necessary. > >>> > >>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html > >> > >>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then > >>>>> calculated as is done by the underlying kernel > > Aha! Nothing to do with which-bits-are-pfn in the tables... > > You need to know if the top level PGD is 512bytes or bigger. As we use a > kmem-cache the adjacent data could be some else's page tables. > > Is this really a problem though? You can't pull the user-space pgd pointers out > of no-where, you must have walked some task_struct and struct_mm's to find them. > In which case you would have the VMAs on hand to tell you if its in the mapped > user range. > > It would be good to avoid putting something arch-specific in here if we can at > all help it. > > > >>>>> (see > >>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details): > >>>>> > >>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) > >> > >> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. > >> It is used for pgd_index() also in makedumpfile to walk page tables. > >> > >> /* to find an entry in a page-table-directory */ > >> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) > > > > Since Dave mentioned crash tool does not need it, but crash should also > > travel the pg tables. The crash utility is always invoked with vmlinux, so it can read the vabits_user variable directly from vmcore, but makedumpfile can not. > > If this is really necessary it would be good to describe what will > > happen without the patch, eg. some user visible error from an actual test etc. > > Yes please, it would really help if there was a specific example we could discuss. With 52-bit user space and 48-bit kernel space configuration, makedumpfile will not be able to convert a virtual kernel address to a physical address, and fail to capture a dumpfile, because the pgd_index() will return a wrong index. But I don't have any suitable test system on hand, so have not tried the kernel configuration actually. If found, I'll try. Bhupesh, do you have any test result? Thanks, Kazu > > > Thanks, > > James
Hi James, On 02/13/2019 11:52 PM, James Morse wrote: > Hi guys, > > On 13/02/2019 11:15, Dave Young wrote: >> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: >>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: >>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA >>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu >>>> mentioned that the changes look necessary. >>>> >>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html >>> >>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then >>>>>> calculated as is done by the underlying kernel > > Aha! Nothing to do with which-bits-are-pfn in the tables... > > You need to know if the top level PGD is 512bytes or bigger. As we use a > kmem-cache the adjacent data could be some else's page tables. > > Is this really a problem though? You can't pull the user-space pgd pointers out > of no-where, you must have walked some task_struct and struct_mm's to find them. > In which case you would have the VMAs on hand to tell you if its in the mapped > user range. > > It would be good to avoid putting something arch-specific in here if we can at > all help it. > > >>>>>> (see >>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details): >>>>>> >>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) >>> >>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. >>> It is used for pgd_index() also in makedumpfile to walk page tables. >>> >>> /* to find an entry in a page-table-directory */ >>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) >> >> Since Dave mentioned crash tool does not need it, but crash should also >> travel the pg tables. >> >> If this is really necessary it would be good to describe what will >> happen without the patch, eg. some user visible error from an actual test etc. > > Yes please, it would really help if there was a specific example we could discuss. Sure. Here are two use-cases/regressions reported and which I have been able to reproduce. Note that I tested them both on a CPU which does not support ARMv8.2-LPA/LVA and on ARMv8 FVP model (which supports ARMv8.2 extensions). Environment: ------------ Latest Upstream kernel: sha-id: 1f947a7a011fcceb14cb912f5481a53b18f1879a ("Merge branch 'akpm' (patches from Andrew)") Latest makedumpfile code: (git://git.code.sf.net/p/makedumpfile/code , branch: devel) crash-utility code: (https://github.com/crash-utility/crash.git, sha-id: e082c372c7f1a782b058ec359dfbbbee0f0b6aad) Note that Dave A. has since fixed crash-utility by using a hardcoded value of 'MAX_PHYSMEM_BITS' (via sha id: ac5a7889d31bb37aa0687110ecea08837f8a66a8) and determining 'vabits_user' value via vmlinux (via sha id: 8618ddd817621c40c1f44f0ab6df7c7805234416) (1). Regression Case 1 (ARMv8.2-LPA enabled kernel): - Upstream makedumpfile and crash-utility (with sha-id e082c372c7f1a782b058ec359dfbbbee0f0b6aad) are broken on following kind of platforms: a. Upstream Kernel 5.0.0-rc6+ with the following kernel configuration: CONFIG_ARM64_64K_PAGES=y # CONFIG_ARM64_VA_BITS_42 is not set CONFIG_ARM64_VA_BITS_48=y # CONFIG_ARM64_USER_VA_BITS_52 is not set CONFIG_ARM64_VA_BITS=48 # CONFIG_ARM64_PA_BITS_48 is not set CONFIG_ARM64_PA_BITS_52=y CONFIG_ARM64_PA_BITS=52 b. Both on CPUs which don't support ARMv8.2 LPA extension and on ARMv8 FVP model with ARMv8.2 LPA extensions. - Error message from makedumpfile: $ makedumpfile -f --mem-usage /proc/kcore -D max_mapnr : a00000 kimage_voffset : fffeffff80000000 max_physmem_bits : 30 section_size_bits: 1e vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=ffffff80ffffffd0 vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003 vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003 vaddr_to_paddr_arm64: paddr=911a962c va_bits : 48 page_offset : ffff800000000000 vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=41a474 vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003 vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003 vaddr_to_paddr_arm64: paddr=911a1320 num of NODEs : 1 Memory type : SPARSEMEM vaddr_to_paddr_arm64: pgda=90d70100, pudv.pgd=a657470206461 vaddr_to_paddr_arm64: puda=90d70100, pudv.pgd=9ffffd0003 vaddr_to_paddr_arm64: pmda=9ffffd27d8, pmdv.pud=9ffeee0003 vaddr_to_paddr_arm64: paddr=9ffeedc600 vaddr_to_paddr_arm64: pgda=90d70100, pudv.pgd=ffff97d98224 vaddr_to_paddr_arm64: puda=90d70100, pudv.pgd=9ffffd0003 vaddr_to_paddr_arm64: pmda=9ffffd27d8, pmdv.pud=9ffeee0003 vaddr_to_paddr_arm64: paddr=9ffeedc600 get_mm_sparsemem: Can't get the address of mem_section. c. Root Cause Analysis - - After the PA_BITS changes in arm64 kernel we set: #define MAX_PHYSMEM_BITS CONFIG_ARM64_PA_BITS - For SPARSEMEM, this value is used to calculate the bits space required to store a section: #define SECTIONS_SHIFT (MAX_PHYSMEM_BITS - SECTION_SIZE_BITS) #define NR_MEM_SECTIONS (1UL << SECTIONS_SHIFT) - User-space tools use a similar mechanism to determine the SPARSEMEM type (extreme or not) using the 'NR_MEM_SECTIONS' value (an example from makedumpfile code): int is_sparsemem_extreme(void) { if ((ARRAY_LENGTH(mem_section) == divideup(NR_MEM_SECTIONS(), _SECTIONS_PER_ROOT_EXTREME())) || (ARRAY_LENGTH(mem_section) == NOT_FOUND_STRUCTURE)) return TRUE; else return FALSE; } - Since MAX_PHYSMEM_BITS are 48 bits for normal cases and are 52 bits for extended PA address space, the memory type is incorrectly calculated as SPARSEMEM rather than SPARSEMEM_EX in above case. - Exporting correct 'MAX_PHYSMEM_BITS' via vmcoreinfo for 52-bit PA case, fixes the above mentioned issue: $ makedumpfile -f --mem-usage /proc/kcore -D <..snip..> NUMBER(MAX_PHYSMEM_BITS)=52 <..snip..> max_mapnr : a00000 kimage_voffset : fffeffff80000000 max_physmem_bits : 30 section_size_bits: 1e vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=ffffff80ffffffd0 vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003 vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003 vaddr_to_paddr_arm64: paddr=911a962c va_bits : 48 page_offset : ffff800000000000 vaddr_to_paddr_arm64: pgda=90d70000, pudv.pgd=41a474 vaddr_to_paddr_arm64: puda=90d70000, pudv.pgd=9fffff0003 vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0003 vaddr_to_paddr_arm64: paddr=911a1320 num of NODEs : 1 Memory type : SPARSEMEM_EX <..snip..> TYPE PAGES EXCLUDABLE DESCRIPTION ---------------------------------------------------------------------- ZERO 2626 yes Pages filled with zero NON_PRI_CACHE 569 yes Cache pages without private flag PRI_CACHE 5446 yes Cache pages with private flag USER 3213 yes User process pages FREE 2048971 yes Free pages KERN_DATA 19034 no Dumpable kernel data page size: 65536 Total pages on system: 2079859 Total size on system: 136305639424 Byte (2). Regression Case 2 (ARMv8.2-LPA + LVA [52-bit user-space VA] enabled kernel): - Upstream makedumpfile and crash-utility (with sha-id e082c372c7f1a782b058ec359dfbbbee0f0b6aad) are broken on following kind of platforms: a. Upstream Kernel 5.0.0-rc6+ with the following kernel configuration: CONFIG_ARM64_64K_PAGES=y # CONFIG_ARM64_VA_BITS_42 is not set # CONFIG_ARM64_VA_BITS_48 is not set CONFIG_ARM64_USER_VA_BITS_52=y CONFIG_ARM64_VA_BITS=48 # CONFIG_ARM64_PA_BITS_48 is not set CONFIG_ARM64_PA_BITS_52=y CONFIG_ARM64_PA_BITS=52 b. Both on CPUs which don't support ARMv8.2 extensions and on ARMv8 FVP model with ARMv8.2 extensions. - Error message from makedumpfile: $ makedumpfile -f --mem-usage /proc/kcore -D max_mapnr : a00000 kimage_voffset : fffeffff78000000 max_physmem_bits : 30 section_size_bits: 1e vaddr_to_paddr_arm64: pgda=90f30000, pudv.pgd=ffffff80ffffffd0 vaddr_to_paddr_arm64: puda=90f30000, pudv.pgd=0 readpage_elf: Attempt to read non-existent page at 0x0. readmem: type_addr: 1, addr:0, size:8 vaddr_to_paddr_arm64: Can't read pmd readmem: Can't convert a virtual address(ffff0000093c576c) to physical address. readmem: type_addr: 0, addr:ffff0000093c576c, size:390 check_release: Can't get the address of system_utsname. c. Root Cause Analysis - - After the 52-bit user-space VA_BIT changes in arm64 kernel we set: #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) - User-space tools like makedumpfile and crash use the 'PTRS_PER_PGD' value to calculate the 'pgd_index()' of a vaddr: #define pgd_index(vaddr) (((vaddr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) - Since the kernel now defines 'MAX_USER_VA_BITS' as: #ifdef CONFIG_ARM64_USER_VA_BITS_52 #define MAX_USER_VA_BITS 52 #else #define MAX_USER_VA_BITS VA_BITS #endif so, the user-space also needs this value to calculate the 'PTRS_PER_PGD' and hence 'pgd_index()' correctly. - Exporting correct 'MAX_USER_VA_BITS' via vmcoreinfo for the above case, fixes the above mentioned issue: $ makedumpfile -f --mem-usage /proc/kcore -D <..snip..> max_mapnr : a00000 pa_bits : 52 va_bits : 48 (vmcoreinfo) max_user_va_bits : 52 (vmcoreinfo) kimage_voffset : fffeffff78000000 max_physmem_bits : 52 section_size_bits: 30 vaddr_to_paddr_arm64: pgda=90f31e00, pudv.pgd=ffffff80ffffffd0 vaddr_to_paddr_arm64: puda=90f31e00, pudv.pgd=9fffff0803 vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0803 vaddr_to_paddr_arm64: paddr=913c576c page_offset : ffff800000000000 vaddr_to_paddr_arm64: pgda=90f31e00, pudv.pgd=16e28e8bed294900 vaddr_to_paddr_arm64: puda=90f31e00, pudv.pgd=9fffff0803 vaddr_to_paddr_arm64: pmda=9fffff0000, pmdv.pud=9ffffe0803 vaddr_to_paddr_arm64: paddr=913bd2f8 num of NODEs : 1 Memory type : SPARSEMEM_EX <..snip..> Other important notes --------------------- 1. I have quoted only one makedumpfile use-case failure above (i.e. calculating --mem-usage on the primary kernel). Other use-cases like creating a dumpfile using /proc/vmcore or post-processing a vmcore are also broken similarly and get fixed when a kernel which exports 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is used along with a modified user-space which can read this information from the vmcoreinfo. 2. I was also going through some of the suggestions on earlier threads about the PTE calculations for the 52-bit LPA case and discussed them with some partner arm64 SoC enggs. The suggestions to convert a page table entry to a physical address without awareness of 52-bit (with an assumption of 64k page size) can be risky. With 64k page and older non-52-bit kernels, while it looks like in the current checks that bits [15:12] are zero, and we can move the zeros to bits [51:48] (because the zeros don't affect the overall PA) to generate the overall 52-bit PA. However, this can cause IMPLEMENTATION SPECIFIC issues on different platforms while generating a PA and IPA. Lets see what the ARMv8 architecture reference manual says about the Bits [15:12] for a 64KB page size: "Bits [15:12] of each valid translation table descriptor hold Bits [51:48] of the output address, or of the address of the translation table to be used for the initial lookup at the next level of translation. If the implementation does not support 52-bit physical addresses, then it is IMPLEMENTATION DEFINED whether non-zero values for these bits generate an Address size fault. In this case, not generating an Address Size Fault is deprecated." As per the vendors, we should not assume that hardware (which does not support 52-bit physical addresses) would generate an Address size fault for non-zero values of Bits[15:12], so extending them to bits [51:48] always can lead to PA address which might cause UNDEFINED behavior on some SoCs. Hope the above text clarifies the problem and what I am trying to fix via this patch. Please let me know if something is missing here. Thanks, Bhupesh
Hi guys, (CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU config when 52-bit VA and pointer-auth may be in use?" On 13/02/2019 19:52, Kazuhito Hagio wrote: > On 2/13/2019 1:22 PM, James Morse wrote: >> On 13/02/2019 11:15, Dave Young wrote: >>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: >>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: >>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA >>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu >>>>> mentioned that the changes look necessary. >>>>> >>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html >>>> >>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then >>>>>>> calculated as is done by the underlying kernel >> >> Aha! Nothing to do with which-bits-are-pfn in the tables... >> >> You need to know if the top level PGD is 512bytes or bigger. As we use a >> kmem-cache the adjacent data could be some else's page tables. >> >> Is this really a problem though? You can't pull the user-space pgd pointers out >> of no-where, you must have walked some task_struct and struct_mm's to find them. >> In which case you would have the VMAs on hand to tell you if its in the mapped >> user range. >> >> It would be good to avoid putting something arch-specific in here if we can at >> all help it. >>>>>>> (see >>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details): >>>>>>> >>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) >>>> >>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. >>>> It is used for pgd_index() also in makedumpfile to walk page tables. >>>> >>>> /* to find an entry in a page-table-directory */ >>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) >>> >>> Since Dave mentioned crash tool does not need it, but crash should also >>> travel the pg tables. > > The crash utility is always invoked with vmlinux, so it can read the > vabits_user variable directly from vmcore, but makedumpfile can not. (This sounds fragile. That symbol's name may change, it may disappear completely! ... but I guess crash changes with every kernel release anyway) >>> If this is really necessary it would be good to describe what will >>> happen without the patch, eg. some user visible error from an actual test etc. >> >> Yes please, it would really help if there was a specific example we could discuss. > > With 52-bit user space and 48-bit kernel space configuration, > makedumpfile will not be able to convert a virtual kernel address > to a physical address, and fail to capture a dumpfile, because the > pgd_index() will return a wrong index. Got it, thanks! (all this user stuff had me thinking it was user-space you were trying to walk). Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow 52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and walk it without knowing the offset you get junk. Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if the offsetting code disappears, the kernel would still have to provide 'ttbr1_offset=0' for user-space to keep working. I'd like to find something future-proof that always has an unambiguous meaning, and isn't a problem if the kernel variable/symbol/kconfig names change. With pointer-auth in use too you can't guess which bits are address and which bits are data. Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a problem if we ever switch that per task (some new bits may turn up with a new feature). Some of those bits vary per cpu too, so we'd have to mask them out in case user-space tries to conclude something from them. My current best suggestion is to export: from core code: * USER_MMAP_END, the maximum value a user-space can try and mmap(). This would normally be TASK_SIZE, but x86 and powerpc also have support for larger VA space, and its plumbed into mm slightly differently. We should have one arch-independent property that covers all these. On arm64 this would be the runtime va bits for user-space's TTBR. (This assumes the value isn't per-task) arch specific: * ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll never flip user/kernel space). This has to be arch specific, it will always have a value and its meaning comes from the ARM-ARM (so linux can't change it in the future). It should be the same on every CPU. * ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the offset. Again this always has a value, and its meaning comes from the ARM-ARM. If we ever get clever with different page-tables/TCR values on different CPUs, these two should come from the same CPU. I think this gives you what you need if user/kernel may both be using pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can be picked at boot time depending on the kernel kconfig and the hardware support. Does anyone have a better idea? (or a corner where this won't work?) Thanks, James
Hi James, On Fri, Feb 15, 2019 at 11:04 PM James Morse <james.morse@arm.com> wrote: > > Hi guys, > > (CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU > config when 52-bit VA and pointer-auth may be in use?" > > On 13/02/2019 19:52, Kazuhito Hagio wrote: > > On 2/13/2019 1:22 PM, James Morse wrote: > >> On 13/02/2019 11:15, Dave Young wrote: > >>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: > >>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: > >>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA > >>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu > >>>>> mentioned that the changes look necessary. > >>>>> > >>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html > >>>> > >>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then > >>>>>>> calculated as is done by the underlying kernel > >> > >> Aha! Nothing to do with which-bits-are-pfn in the tables... > >> > >> You need to know if the top level PGD is 512bytes or bigger. As we use a > >> kmem-cache the adjacent data could be some else's page tables. > >> > >> Is this really a problem though? You can't pull the user-space pgd pointers out > >> of no-where, you must have walked some task_struct and struct_mm's to find them. > >> In which case you would have the VMAs on hand to tell you if its in the mapped > >> user range. > >> > >> It would be good to avoid putting something arch-specific in here if we can at > >> all help it. > > >>>>>>> (see > >>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details): > >>>>>>> > >>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) > >>>> > >>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. > >>>> It is used for pgd_index() also in makedumpfile to walk page tables. > >>>> > >>>> /* to find an entry in a page-table-directory */ > >>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) > >>> > >>> Since Dave mentioned crash tool does not need it, but crash should also > >>> travel the pg tables. > > > > The crash utility is always invoked with vmlinux, so it can read the > > vabits_user variable directly from vmcore, but makedumpfile can not. > > (This sounds fragile. That symbol's name may change, it may disappear > completely! ... but I guess crash changes with every kernel release anyway) > > > >>> If this is really necessary it would be good to describe what will > >>> happen without the patch, eg. some user visible error from an actual test etc. > >> > >> Yes please, it would really help if there was a specific example we could discuss. > > > > With 52-bit user space and 48-bit kernel space configuration, > > makedumpfile will not be able to convert a virtual kernel address > > to a physical address, and fail to capture a dumpfile, because the > > pgd_index() will return a wrong index. > > Got it, thanks! > (all this user stuff had me thinking it was user-space you were trying to walk). > > Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow > 52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and > walk it without knowing the offset you get junk. > > Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if > the offsetting code disappears, the kernel would still have to provide > 'ttbr1_offset=0' for user-space to keep working. > > I'd like to find something future-proof that always has an unambiguous meaning, > and isn't a problem if the kernel variable/symbol/kconfig names change. > > With pointer-auth in use too you can't guess which bits are address and which > bits are data. > > Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a > problem if we ever switch that per task (some new bits may turn up with a new > feature). Some of those bits vary per cpu too, so we'd have to mask them out in > case user-space tries to conclude something from them. > > > My current best suggestion is to export: > from core code: > * USER_MMAP_END, the maximum value a user-space can try and mmap(). > This would normally be TASK_SIZE, but x86 and powerpc also have support for > larger VA space, and its plumbed into mm slightly differently. We should have > one arch-independent property that covers all these. On arm64 this would be the > runtime va bits for user-space's TTBR. (This assumes the value isn't per-task) > > arch specific: > * ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll > never flip user/kernel space). This has to be arch specific, it will always have > a value and its meaning comes from the ARM-ARM (so linux can't change it in the > future). It should be the same on every CPU. > * ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the > offset. Again this always has a value, and its meaning comes from the ARM-ARM. > If we ever get clever with different page-tables/TCR values on different CPUs, > these two should come from the same CPU. > > > I think this gives you what you need if user/kernel may both be using > pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can > be picked at boot time depending on the kernel kconfig and the hardware support. > > Does anyone have a better idea? (or a corner where this won't work?) I am not sure you got a chance to look at the two regression cases I reported here: <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> Unfortunately the above suggestion doesn't provide any fix for ARMv8.2-LPA regression (see text under heading ' (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') After going through the regression reports, I think exporting 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is sufficient for the above regressions (without over-complicating the stuff) as ARM64_TCR.T1SZ and friends seem to arch specific as compared to VA_BITS + 'MAX_USER_VA_BITS' . Thanks, Bhupesh
Hi Bhupesh, Sorry for joining this thread late... On Fri, Feb 15, 2019 at 11:31:56PM +0530, Bhupesh Sharma wrote: > Hi James, > > On Fri, Feb 15, 2019 at 11:04 PM James Morse <james.morse@arm.com> wrote: > > > > Hi guys, > > > > (CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU > > config when 52-bit VA and pointer-auth may be in use?" > > > > On 13/02/2019 19:52, Kazuhito Hagio wrote: > > > On 2/13/2019 1:22 PM, James Morse wrote: > > >> On 13/02/2019 11:15, Dave Young wrote: > > >>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: > > >>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: > > >>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA > > >>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu > > >>>>> mentioned that the changes look necessary. > > >>>>> > > >>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html > > >>>> > > >>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then > > >>>>>>> calculated as is done by the underlying kernel > > >> > > >> Aha! Nothing to do with which-bits-are-pfn in the tables... > > >> > > >> You need to know if the top level PGD is 512bytes or bigger. As we use a > > >> kmem-cache the adjacent data could be some else's page tables. > > >> > > >> Is this really a problem though? You can't pull the user-space pgd pointers out > > >> of no-where, you must have walked some task_struct and struct_mm's to find them. > > >> In which case you would have the VMAs on hand to tell you if its in the mapped > > >> user range. > > >> > > >> It would be good to avoid putting something arch-specific in here if we can at > > >> all help it. > > > > >>>>>>> (see > > >>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details): > > >>>>>>> > > >>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) > > >>>> > > >>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. > > >>>> It is used for pgd_index() also in makedumpfile to walk page tables. > > >>>> > > >>>> /* to find an entry in a page-table-directory */ > > >>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) > > >>> > > >>> Since Dave mentioned crash tool does not need it, but crash should also > > >>> travel the pg tables. > > > > > > The crash utility is always invoked with vmlinux, so it can read the > > > vabits_user variable directly from vmcore, but makedumpfile can not. > > > > (This sounds fragile. That symbol's name may change, it may disappear > > completely! ... but I guess crash changes with every kernel release anyway) > > > > > > >>> If this is really necessary it would be good to describe what will > > >>> happen without the patch, eg. some user visible error from an actual test etc. > > >> > > >> Yes please, it would really help if there was a specific example we could discuss. > > > > > > With 52-bit user space and 48-bit kernel space configuration, > > > makedumpfile will not be able to convert a virtual kernel address > > > to a physical address, and fail to capture a dumpfile, because the > > > pgd_index() will return a wrong index. > > > > Got it, thanks! > > (all this user stuff had me thinking it was user-space you were trying to walk). > > > > Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow > > 52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and > > walk it without knowing the offset you get junk. > > > > Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if > > the offsetting code disappears, the kernel would still have to provide > > 'ttbr1_offset=0' for user-space to keep working. > > > > I'd like to find something future-proof that always has an unambiguous meaning, > > and isn't a problem if the kernel variable/symbol/kconfig names change. > > > > With pointer-auth in use too you can't guess which bits are address and which > > bits are data. > > > > Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a > > problem if we ever switch that per task (some new bits may turn up with a new > > feature). Some of those bits vary per cpu too, so we'd have to mask them out in > > case user-space tries to conclude something from them. > > > > > > My current best suggestion is to export: > > from core code: > > * USER_MMAP_END, the maximum value a user-space can try and mmap(). > > This would normally be TASK_SIZE, but x86 and powerpc also have support for > > larger VA space, and its plumbed into mm slightly differently. We should have > > one arch-independent property that covers all these. On arm64 this would be the > > runtime va bits for user-space's TTBR. (This assumes the value isn't per-task) > > > > arch specific: > > * ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll > > never flip user/kernel space). This has to be arch specific, it will always have > > a value and its meaning comes from the ARM-ARM (so linux can't change it in the > > future). It should be the same on every CPU. > > * ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the > > offset. Again this always has a value, and its meaning comes from the ARM-ARM. > > If we ever get clever with different page-tables/TCR values on different CPUs, > > these two should come from the same CPU. > > > > > > I think this gives you what you need if user/kernel may both be using > > pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can > > be picked at boot time depending on the kernel kconfig and the hardware support. > > > > Does anyone have a better idea? (or a corner where this won't work?) > > I am not sure you got a chance to look at the two regression cases I > reported here: > <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> > > Unfortunately the above suggestion doesn't provide any fix for > ARMv8.2-LPA regression (see text under heading ' > (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') > > After going through the regression reports, I think exporting > 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is sufficient > for the above regressions (without over-complicating the stuff) as > ARM64_TCR.T1SZ and friends seem to arch specific as compared to > VA_BITS + 'MAX_USER_VA_BITS' . > For MAX_USER_VA_BITS, IIUC you are just after a value of PTRS_PER_PGD? Why not just add PTRS_PER_PGD to the vmcoreinfo? FWIW it is possible in vaddr_to_paddr_arm64 to detect a zero pgd entry then try again with another ptrs_per_pgd value (granted this is a little hacky). Cheers,
Hi Bhupesh, -----Original Message----- > I am not sure you got a chance to look at the two regression cases I > reported here: > <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> > > Unfortunately the above suggestion doesn't provide any fix for > ARMv8.2-LPA regression (see text under heading ' > (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') As for MAX_PHYSMEM_BITS, I realized that ppc64 makedumpfile can detect it because there is only one SECTION_SIZE_BITS for ppc64. I think we can use the same way as set_ppc64_max_physmem_bits() does also for arm64 for now. I'm going to write it for kernels not having NUMBER(MAX_PHYSMEM_BITS) in vmcoreinfo. Thanks, Kazu > > After going through the regression reports, I think exporting > 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is sufficient > for the above regressions (without over-complicating the stuff) as > ARM64_TCR.T1SZ and friends seem to arch specific as compared to > VA_BITS + 'MAX_USER_VA_BITS' . > > Thanks, > Bhupesh
Hi Steve, On 02/18/2019 08:57 PM, Steve Capper wrote: > Hi Bhupesh, > > Sorry for joining this thread late... > > On Fri, Feb 15, 2019 at 11:31:56PM +0530, Bhupesh Sharma wrote: >> Hi James, >> >> On Fri, Feb 15, 2019 at 11:04 PM James Morse <james.morse@arm.com> wrote: >>> >>> Hi guys, >>> >>> (CC: +Steve, +Kristina) "What's the best way of letting user-space know the MMU >>> config when 52-bit VA and pointer-auth may be in use?" >>> >>> On 13/02/2019 19:52, Kazuhito Hagio wrote: >>>> On 2/13/2019 1:22 PM, James Morse wrote: >>>>> On 13/02/2019 11:15, Dave Young wrote: >>>>>> On 02/12/19 at 11:03pm, Kazuhito Hagio wrote: >>>>>>> On 2/12/2019 2:59 PM, Bhupesh Sharma wrote: >>>>>>>> BTW, in the makedumpfile enablement patch thread for ARMv8.2 LVA >>>>>>>> (which I sent out for 52-bit User space VA enablement) (see [0]), Kazu >>>>>>>> mentioned that the changes look necessary. >>>>>>>> >>>>>>>> [0]. http://lists.infradead.org/pipermail/kexec/2019-February/022431.html >>>>>>> >>>>>>>>>> The increased 'PTRS_PER_PGD' value for such cases needs to be then >>>>>>>>>> calculated as is done by the underlying kernel >>>>> >>>>> Aha! Nothing to do with which-bits-are-pfn in the tables... >>>>> >>>>> You need to know if the top level PGD is 512bytes or bigger. As we use a >>>>> kmem-cache the adjacent data could be some else's page tables. >>>>> >>>>> Is this really a problem though? You can't pull the user-space pgd pointers out >>>>> of no-where, you must have walked some task_struct and struct_mm's to find them. >>>>> In which case you would have the VMAs on hand to tell you if its in the mapped >>>>> user range. >>>>> >>>>> It would be good to avoid putting something arch-specific in here if we can at >>>>> all help it. >>> >>>>>>>>>> (see >>>>>>>>>> 'arch/arm64/include/asm/pgtable-hwdef.h' for details): >>>>>>>>>> >>>>>>>>>> #define PTRS_PER_PGD (1 << (MAX_USER_VA_BITS - PGDIR_SHIFT)) >>>>>>> >>>>>>> Yes, this is the reason why makedumpfile needs the MAX_USER_VA_BITS. >>>>>>> It is used for pgd_index() also in makedumpfile to walk page tables. >>>>>>> >>>>>>> /* to find an entry in a page-table-directory */ >>>>>>> #define pgd_index(addr) (((addr) >> PGDIR_SHIFT) & (PTRS_PER_PGD - 1)) >>>>>> >>>>>> Since Dave mentioned crash tool does not need it, but crash should also >>>>>> travel the pg tables. >>>> >>>> The crash utility is always invoked with vmlinux, so it can read the >>>> vabits_user variable directly from vmcore, but makedumpfile can not. >>> >>> (This sounds fragile. That symbol's name may change, it may disappear >>> completely! ... but I guess crash changes with every kernel release anyway) >>> >>> >>>>>> If this is really necessary it would be good to describe what will >>>>>> happen without the patch, eg. some user visible error from an actual test etc. >>>>> >>>>> Yes please, it would really help if there was a specific example we could discuss. >>>> >>>> With 52-bit user space and 48-bit kernel space configuration, >>>> makedumpfile will not be able to convert a virtual kernel address >>>> to a physical address, and fail to capture a dumpfile, because the >>>> pgd_index() will return a wrong index. >>> >>> Got it, thanks! >>> (all this user stuff had me thinking it was user-space you were trying to walk). >>> >>> Yes, this is because of commit e842dfb5a2d3 ("arm64: mm: Offset TTBR1 to allow >>> 52-bit PTRS_PER_PGD"). The kernel has offset the ttbr1 value, if you try and >>> walk it without knowing the offset you get junk. >>> >>> Ideally we tell you the offset with some 'ttbr1_offset=' in vmcoreinfo, but if >>> the offsetting code disappears, the kernel would still have to provide >>> 'ttbr1_offset=0' for user-space to keep working. >>> >>> I'd like to find something future-proof that always has an unambiguous meaning, >>> and isn't a problem if the kernel variable/symbol/kconfig names change. >>> >>> With pointer-auth in use too you can't guess which bits are address and which >>> bits are data. >>> >>> Taking arch-specific to its extreme, we could expose TCR_EL1, but this is a >>> problem if we ever switch that per task (some new bits may turn up with a new >>> feature). Some of those bits vary per cpu too, so we'd have to mask them out in >>> case user-space tries to conclude something from them. >>> >>> >>> My current best suggestion is to export: >>> from core code: >>> * USER_MMAP_END, the maximum value a user-space can try and mmap(). >>> This would normally be TASK_SIZE, but x86 and powerpc also have support for >>> larger VA space, and its plumbed into mm slightly differently. We should have >>> one arch-independent property that covers all these. On arm64 this would be the >>> runtime va bits for user-space's TTBR. (This assumes the value isn't per-task) >>> >>> arch specific: >>> * ARM64_TCR.T1SZ, the va bits mapped by the kernel's TTBR. (We can assume we'll >>> never flip user/kernel space). This has to be arch specific, it will always have >>> a value and its meaning comes from the ARM-ARM (so linux can't change it in the >>> future). It should be the same on every CPU. >>> * ARM64_TTBR1.BADDR, the pa of the kernel page tables, which implicitly has the >>> offset. Again this always has a value, and its meaning comes from the ARM-ARM. >>> If we ever get clever with different page-tables/TCR values on different CPUs, >>> these two should come from the same CPU. >>> >>> >>> I think this gives you what you need if user/kernel may both be using >>> pointer-auth and both may be using 52-bit va. I'm pretty sure the 48:52 bits can >>> be picked at boot time depending on the kernel kconfig and the hardware support. >>> >>> Does anyone have a better idea? (or a corner where this won't work?) >> >> I am not sure you got a chance to look at the two regression cases I >> reported here: >> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> >> >> Unfortunately the above suggestion doesn't provide any fix for >> ARMv8.2-LPA regression (see text under heading ' >> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') >> >> After going through the regression reports, I think exporting >> 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' to vmcoreinfo is sufficient >> for the above regressions (without over-complicating the stuff) as >> ARM64_TCR.T1SZ and friends seem to arch specific as compared to >> VA_BITS + 'MAX_USER_VA_BITS' . >> > > For MAX_USER_VA_BITS, IIUC you are just after a value of PTRS_PER_PGD? > Why not just add PTRS_PER_PGD to the vmcoreinfo? That's a good suggestion. I will re-spin the v2 with the same. > FWIW it is possible in vaddr_to_paddr_arm64 to detect a zero pgd entry > then try again with another ptrs_per_pgd value (granted this is a little > hacky). Right, but having this hack replicated across various user-space tools is perhaps not the ideal portable solution, when we can simply add a valid hint in the vmcoreinfo itself. Thanks, Bhupesh
Hi Kazu, On 02/20/2019 02:17 AM, Kazuhito Hagio wrote: > Hi Bhupesh, > > -----Original Message----- >> I am not sure you got a chance to look at the two regression cases I >> reported here: >> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> >> >> Unfortunately the above suggestion doesn't provide any fix for >> ARMv8.2-LPA regression (see text under heading ' >> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') > > As for MAX_PHYSMEM_BITS, I realized that ppc64 makedumpfile can detect > it because there is only one SECTION_SIZE_BITS for ppc64. I think we > can use the same way as set_ppc64_max_physmem_bits() does also for > arm64 for now. I'm going to write it for kernels not having > NUMBER(MAX_PHYSMEM_BITS) in vmcoreinfo. I see two drawbacks with the above approach: a). This means that other user-space tools like crash-utility would still be broken and would probably need to find MAX_PHYSMEM_BITS for arm64 via a similar (hack'ish ?) approach. b). I am looking at the makedumpfile code for 'MAX_PHYSMEM_BITS' determination for two archs as an example: ppc --- int set_ppc64_max_physmem_bits(void) { long array_len = ARRAY_LENGTH(mem_section); /* * The older ppc64 kernels uses _MAX_PHYSMEM_BITS as 42 and the * newer kernels 3.7 onwards uses 46 bits. */ info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG ; if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) return TRUE; info->max_physmem_bits = _MAX_PHYSMEM_BITS_3_7; if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) return TRUE; info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_19; if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) return TRUE; info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_20; if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) return TRUE; return FALSE; } x86_64: ------ int get_versiondep_info_x86_64(void) { /* * On linux-2.6.26, MAX_PHYSMEM_BITS is changed to 44 from 40. */ if (info->kernel_version < KERNEL_VERSION(2, 6, 26)) info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG; else if (info->kernel_version < KERNEL_VERSION(2, 6, 31)) info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_26; else if(check_5level_paging()) info->max_physmem_bits = _MAX_PHYSMEM_BITS_5LEVEL; else info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_31; ... } Looking at the above, two questions come to my mind: - Do we really need all the above complexity in user-space code, to hoop across various kernel versions and perform allocations for something that can be so easily exported via vmcoreinfo? Also we need to see how portable is the above code for a new kernel version - IMO, it will need another fix patch when we update to a new kernel version in near future. - Also do we need to replicate the above implementations across user-space tools when they can also utilize the vmcoreinfo information to determine the PA_BITS range without any additional arch/kernel version specific details as the single point of obtaining this information from the kernel? So, in view of the above, I would still advocate that we use a vmcoreinfo export for 'MAX_PHYSMEM_BITS' as well to have a uniform interface for the same across all user-land applications. Thanks, Bhupesh
----- Original Message ----- > Hi Kazu, > > On 02/20/2019 02:17 AM, Kazuhito Hagio wrote: > > Hi Bhupesh, > > > > -----Original Message----- > >> I am not sure you got a chance to look at the two regression cases I > >> reported here: > >> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> > >> > >> Unfortunately the above suggestion doesn't provide any fix for > >> ARMv8.2-LPA regression (see text under heading ' > >> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') > > > > As for MAX_PHYSMEM_BITS, I realized that ppc64 makedumpfile can detect > > it because there is only one SECTION_SIZE_BITS for ppc64. I think we > > can use the same way as set_ppc64_max_physmem_bits() does also for > > arm64 for now. I'm going to write it for kernels not having > > NUMBER(MAX_PHYSMEM_BITS) in vmcoreinfo. > > I see two drawbacks with the above approach: > > a). This means that other user-space tools like crash-utility would > still be broken and would probably need to find MAX_PHYSMEM_BITS for > arm64 via a similar (hack'ish ?) approach. > > b). I am looking at the makedumpfile code for 'MAX_PHYSMEM_BITS' > determination for two archs as an example: > > ppc > --- > > int > set_ppc64_max_physmem_bits(void) > { > long array_len = ARRAY_LENGTH(mem_section); > /* > * The older ppc64 kernels uses _MAX_PHYSMEM_BITS as 42 and the > * newer kernels 3.7 onwards uses 46 bits. > */ > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG ; > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > return TRUE; > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_3_7; > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > return TRUE; > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_19; > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > return TRUE; > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_20; > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > return TRUE; > > return FALSE; > } > > x86_64: > ------ > > int > get_versiondep_info_x86_64(void) > { > /* > * On linux-2.6.26, MAX_PHYSMEM_BITS is changed to 44 from 40. > */ > if (info->kernel_version < KERNEL_VERSION(2, 6, 26)) > info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG; > else if (info->kernel_version < KERNEL_VERSION(2, 6, 31)) > info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_26; > else if(check_5level_paging()) > info->max_physmem_bits = _MAX_PHYSMEM_BITS_5LEVEL; > else > info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_31; > > ... > } > > Looking at the above, two questions come to my mind: > > - Do we really need all the above complexity in user-space code, to hoop > across various kernel versions and perform allocations for something > that can be so easily exported via vmcoreinfo? Also we need to see how > portable is the above code for a new kernel version - IMO, it will need > another fix patch when we update to a new kernel version in near future. I agree -- not to mention that the "kernel version" way of determining things does not account for distribution-specific backports. > > - Also do we need to replicate the above implementations across > user-space tools when they can also utilize the vmcoreinfo information > to determine the PA_BITS range without any additional arch/kernel > version specific details as the single point of obtaining this > information from the kernel? > > So, in view of the above, I would still advocate that we use a > vmcoreinfo export for 'MAX_PHYSMEM_BITS' as well to have a uniform > interface for the same across all user-land applications. Again, totally agree. Dave > Thanks, > Bhupesh > >
-----Original Message----- > ----- Original Message ----- > > Hi Kazu, > > > > On 02/20/2019 02:17 AM, Kazuhito Hagio wrote: > > > Hi Bhupesh, > > > > > > -----Original Message----- > > >> I am not sure you got a chance to look at the two regression cases I > > >> reported here: > > >> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> > > >> > > >> Unfortunately the above suggestion doesn't provide any fix for > > >> ARMv8.2-LPA regression (see text under heading ' > > >> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') > > > > > > As for MAX_PHYSMEM_BITS, I realized that ppc64 makedumpfile can detect > > > it because there is only one SECTION_SIZE_BITS for ppc64. I think we > > > can use the same way as set_ppc64_max_physmem_bits() does also for > > > arm64 for now. I'm going to write it for kernels not having > > > NUMBER(MAX_PHYSMEM_BITS) in vmcoreinfo. > > > > I see two drawbacks with the above approach: > > > > a). This means that other user-space tools like crash-utility would > > still be broken and would probably need to find MAX_PHYSMEM_BITS for > > arm64 via a similar (hack'ish ?) approach. > > > > b). I am looking at the makedumpfile code for 'MAX_PHYSMEM_BITS' > > determination for two archs as an example: > > > > ppc > > --- > > > > int > > set_ppc64_max_physmem_bits(void) > > { > > long array_len = ARRAY_LENGTH(mem_section); > > /* > > * The older ppc64 kernels uses _MAX_PHYSMEM_BITS as 42 and the > > * newer kernels 3.7 onwards uses 46 bits. > > */ > > > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG ; > > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > > return TRUE; > > > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_3_7; > > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > > return TRUE; > > > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_19; > > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > > return TRUE; > > > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_20; > > if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) > > || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) > > return TRUE; > > > > return FALSE; > > } > > > > x86_64: > > ------ > > > > int > > get_versiondep_info_x86_64(void) > > { > > /* > > * On linux-2.6.26, MAX_PHYSMEM_BITS is changed to 44 from 40. > > */ > > if (info->kernel_version < KERNEL_VERSION(2, 6, 26)) > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG; > > else if (info->kernel_version < KERNEL_VERSION(2, 6, 31)) > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_26; > > else if(check_5level_paging()) > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_5LEVEL; > > else > > info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_31; > > > > ... > > } > > > > Looking at the above, two questions come to my mind: > > > > - Do we really need all the above complexity in user-space code, to hoop > > across various kernel versions and perform allocations for something > > that can be so easily exported via vmcoreinfo? Also we need to see how > > portable is the above code for a new kernel version - IMO, it will need > > another fix patch when we update to a new kernel version in near future. > > I agree -- not to mention that the "kernel version" way of determining things > does not account for distribution-specific backports. > > > > > - Also do we need to replicate the above implementations across > > user-space tools when they can also utilize the vmcoreinfo information > > to determine the PA_BITS range without any additional arch/kernel > > version specific details as the single point of obtaining this > > information from the kernel? > > > > So, in view of the above, I would still advocate that we use a > > vmcoreinfo export for 'MAX_PHYSMEM_BITS' as well to have a uniform > > interface for the same across all user-land applications. > > Again, totally agree. I also agree that we should do so. Then it will be better to have it in kernel core code, not in arch-specific code. Although makedumpfile may have to have the kludge for kernels that support 52-bit PA and don't have the exported MAX_PHYSMEM_BITS sooner or later.. Thanks, Kazu > > Dave > > > > Thanks, > > Bhupesh > > > > > > _______________________________________________ > kexec mailing list > kexec@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/kexec
Hi Kazu, On 02/22/2019 12:32 AM, Kazuhito Hagio wrote: > -----Original Message----- >> ----- Original Message ----- >>> Hi Kazu, >>> >>> On 02/20/2019 02:17 AM, Kazuhito Hagio wrote: >>>> Hi Bhupesh, >>>> >>>> -----Original Message----- >>>>> I am not sure you got a chance to look at the two regression cases I >>>>> reported here: >>>>> <http://lists.infradead.org/pipermail/kexec/2019-February/022449.html> >>>>> >>>>> Unfortunately the above suggestion doesn't provide any fix for >>>>> ARMv8.2-LPA regression (see text under heading ' >>>>> (1). Regression Case 1 (ARMv8.2-LPA enabled kernel)') >>>> >>>> As for MAX_PHYSMEM_BITS, I realized that ppc64 makedumpfile can detect >>>> it because there is only one SECTION_SIZE_BITS for ppc64. I think we >>>> can use the same way as set_ppc64_max_physmem_bits() does also for >>>> arm64 for now. I'm going to write it for kernels not having >>>> NUMBER(MAX_PHYSMEM_BITS) in vmcoreinfo. >>> >>> I see two drawbacks with the above approach: >>> >>> a). This means that other user-space tools like crash-utility would >>> still be broken and would probably need to find MAX_PHYSMEM_BITS for >>> arm64 via a similar (hack'ish ?) approach. >>> >>> b). I am looking at the makedumpfile code for 'MAX_PHYSMEM_BITS' >>> determination for two archs as an example: >>> >>> ppc >>> --- >>> >>> int >>> set_ppc64_max_physmem_bits(void) >>> { >>> long array_len = ARRAY_LENGTH(mem_section); >>> /* >>> * The older ppc64 kernels uses _MAX_PHYSMEM_BITS as 42 and the >>> * newer kernels 3.7 onwards uses 46 bits. >>> */ >>> >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG ; >>> if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) >>> || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) >>> return TRUE; >>> >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_3_7; >>> if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) >>> || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) >>> return TRUE; >>> >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_19; >>> if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) >>> || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) >>> return TRUE; >>> >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_4_20; >>> if ((array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT_EXTREME())) >>> || (array_len == (NR_MEM_SECTIONS() / _SECTIONS_PER_ROOT()))) >>> return TRUE; >>> >>> return FALSE; >>> } >>> >>> x86_64: >>> ------ >>> >>> int >>> get_versiondep_info_x86_64(void) >>> { >>> /* >>> * On linux-2.6.26, MAX_PHYSMEM_BITS is changed to 44 from 40. >>> */ >>> if (info->kernel_version < KERNEL_VERSION(2, 6, 26)) >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_ORIG; >>> else if (info->kernel_version < KERNEL_VERSION(2, 6, 31)) >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_26; >>> else if(check_5level_paging()) >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_5LEVEL; >>> else >>> info->max_physmem_bits = _MAX_PHYSMEM_BITS_2_6_31; >>> >>> ... >>> } >>> >>> Looking at the above, two questions come to my mind: >>> >>> - Do we really need all the above complexity in user-space code, to hoop >>> across various kernel versions and perform allocations for something >>> that can be so easily exported via vmcoreinfo? Also we need to see how >>> portable is the above code for a new kernel version - IMO, it will need >>> another fix patch when we update to a new kernel version in near future. >> >> I agree -- not to mention that the "kernel version" way of determining things >> does not account for distribution-specific backports. >> >>> >>> - Also do we need to replicate the above implementations across >>> user-space tools when they can also utilize the vmcoreinfo information >>> to determine the PA_BITS range without any additional arch/kernel >>> version specific details as the single point of obtaining this >>> information from the kernel? >>> >>> So, in view of the above, I would still advocate that we use a >>> vmcoreinfo export for 'MAX_PHYSMEM_BITS' as well to have a uniform >>> interface for the same across all user-land applications. >> >> Again, totally agree. > > I also agree that we should do so. Then it will be better to have > it in kernel core code, not in arch-specific code. > > Although makedumpfile may have to have the kludge for kernels that > support 52-bit PA and don't have the exported MAX_PHYSMEM_BITS > sooner or later.. I was waiting for maintainers Cc'ed in this thread to share their opinions on the above suggestion. We especially need comments from x86 maintainers on this (Boris), since x86 and ppc user-space utilities are currently not broken (no regression reported) due to a lacking export of 'MAX_PHYSMEM_BITS' via vmcore, whereas as I shared earlier arm64 user-space utilities are broken right now with 52-bit address space changes. Thanks, Bhupesh
diff --git a/arch/arm64/kernel/crash_core.c b/arch/arm64/kernel/crash_core.c index ca4c3e12d8c5..ad231be5c0d8 100644 --- a/arch/arm64/kernel/crash_core.c +++ b/arch/arm64/kernel/crash_core.c @@ -10,6 +10,8 @@ void arch_crash_save_vmcoreinfo(void) { VMCOREINFO_NUMBER(VA_BITS); + VMCOREINFO_NUMBER(MAX_USER_VA_BITS); + VMCOREINFO_NUMBER(MAX_PHYSMEM_BITS); /* Please note VMCOREINFO_NUMBER() uses "%d", not "%x" */ vmcoreinfo_append_str("NUMBER(kimage_voffset)=0x%llx\n", kimage_voffset);
With ARMv8.2-LVA and LPA architecture extensions, arm64 hardware which supports these extensions can support upto 52-bit virtual and 52-bit physical addresses respectively. Since at the moment we enable the support of these extensions via CONFIG flags, e.g. - LPA via CONFIG_ARM64_PA_BITS_52 there are no clear mechanisms in user-space right now to deteremine these CONFIG flag values and also determine the PARange and VARange address values. User-space tools like 'makedumpfile' and 'crash-utility' can instead use the 'MAX_USER_VA_BITS' and 'MAX_PHYSMEM_BITS' values to determine the maximum virtual address and physical address (respectively) supported by underlying kernel. A reference 'makedumpfile' implementation which uses this approach to determining the maximum physical address is available in [0]. [0]. https://github.com/bhupesh-sharma/makedumpfile/blob/52-bit-pa-support-via-vmcore-v1/arch/arm64.c#L490 Cc: AKASHI Takahiro <takahiro.akashi@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Will Deacon <will.deacon@arm.com> Cc: James Morse <james.morse@arm.com> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com> --- arch/arm64/kernel/crash_core.c | 2 ++ 1 file changed, 2 insertions(+)