Message ID | 1528785361-24477-1-git-send-email-bhsharma@redhat.com (mailing list archive) |
---|---|
State | New, archived |
On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> The start of the linear region map on a KASLR enabled ARM64 machine -
> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
> support), is no longer correctly represented by the PAGE_OFFSET macro,
> since it is defined as:
>
> (UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)
>

PAGE_OFFSET is the VA of the start of the linear map. The linear map
can be sparsely populated with actual memory, regardless of whether
KASLR is in effect or not. The only difference in the presence of
KASLR is that there may be such a hole at the beginning, but that does
not mean the linear map has moved, or that the value of PAGE_OFFSET is
now wrong.

> So taking an example of a platform with VA_BITS=48, this gives a static
> value of:
> PAGE_OFFSET = 0xffff800000000000
>
> However, for the KASLR case, we use the 'memstart_offset_seed'
> to randomize the linear region - since 'memstart_addr' indicates the
> start of physical RAM, we randomize the same on basis
> of 'memstart_offset_seed' value.
>
> As the PAGE_OFFSET value is used presently by several user space
> tools (for e.g. makedumpfile and crash tools) to determine the start
> of linear region and hence to read addresses (like PT_NOTE fields) from
> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
> use 'memblock_start_of_DRAM()' value (converted to virtual) as
> the start of linear region for the KASLR cases and default to
> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
> linear region.
>

Userland code that assumes that the linear map cannot have a hole at
the beginning should be fixed.

> I tested this on my qualcomm (which supports EFI_RNG_PROTOCOL)
> and apm mustang (which does not support EFI_RNG_PROTOCOL) arm64 boards
> and was able to use a modified user space utility (like kexec-tools and
> makedumpfile) to determine the start of linear region correctly for
> both the KASLR and non-KASLR boot cases.
>

Can you explain the nature of the changes to the userland code?

> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
> Cc: James Morse <james.morse@arm.com>
> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
> ---
>  arch/arm64/include/asm/memory.h | 3 +++
>  arch/arm64/kernel/arm64ksyms.c  | 1 +
>  arch/arm64/mm/init.c            | 3 +++
>  3 files changed, 7 insertions(+)
>
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 49d99214f43c..bfd0915ecaf8 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -178,6 +178,9 @@ extern s64 memstart_addr;
>  /* PHYS_OFFSET - the physical address of the start of memory. */
>  #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
>
> +/* the virtual base of the linear region. */
> +extern s64 linear_reg_start_addr;
> +
>  /* the virtual base of the kernel image (minus TEXT_OFFSET) */
>  extern u64 kimage_vaddr;
>
> diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
> index d894a20b70b2..a92238ea45ff 100644
> --- a/arch/arm64/kernel/arm64ksyms.c
> +++ b/arch/arm64/kernel/arm64ksyms.c
> @@ -42,6 +42,7 @@ EXPORT_SYMBOL(__arch_copy_in_user);
>
>  /* physical memory */
>  EXPORT_SYMBOL(memstart_addr);
> +EXPORT_SYMBOL(linear_reg_start_addr);
>
>  /* string / mem functions */
>  EXPORT_SYMBOL(strchr);
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index 325cfb3b858a..29447adb0eef 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -60,6 +60,7 @@
>   * that cannot be mistaken for a real physical address.
>   */
>  s64 memstart_addr __ro_after_init = -1;
> +s64 linear_reg_start_addr __ro_after_init = PAGE_OFFSET;
>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>
>  #ifdef CONFIG_BLK_DEV_INITRD
> @@ -452,6 +453,8 @@ void __init arm64_memblock_init(void)
>  		}
>  	}
>
> +	linear_reg_start_addr = __phys_to_virt(memblock_start_of_DRAM());
> +
>  	/*
>  	 * Register the kernel text, kernel data, initrd, and initial
>  	 * pagetables with memblock.
> --
> 2.7.4
>
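The VA_BITS=48 figure quoted in this exchange can be reproduced with a few lines of C. This is only a minimal sketch: it assumes the arm64 definition of that kernel era, PAGE_OFFSET = 0xffffffffffffffff - (1 << (VA_BITS - 1)) + 1, and the helper name page_offset() is made up for illustration and is not a kernel symbol.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Start of the arm64 linear map for a given VA_BITS.
 * Assumption: the kernel defines PAGE_OFFSET as
 *   0xffffffffffffffff - (1 << (VA_BITS - 1)) + 1
 * page_offset() is a hypothetical helper, not a kernel function.
 */
static uint64_t page_offset(unsigned int va_bits)
{
	return UINT64_C(0xffffffffffffffff)
		- (UINT64_C(1) << (va_bits - 1)) + 1;
}
```

The result depends only on VA_BITS, which is consistent with Ard's point that the value is a build-time constant and does not move when KASLR shifts where RAM lands inside the linear map.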
Hi Ard,

On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>> The start of the linear region map on a KASLR enabled ARM64 machine -
>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
>> support), is no longer correctly represented by the PAGE_OFFSET macro,
>> since it is defined as:
>>
>> (UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)
>>
>
> PAGE_OFFSET is the VA of the start of the linear map. The linear map
> can be sparsely populated with actual memory, regardless of whether
> KASLR is in effect or not. The only difference in the presence of
> KASLR is that there may be such a hole at the beginning, but that does
> not mean the linear map has moved, or that the value of PAGE_OFFSET is
> now wrong.
>
>> So taking an example of a platform with VA_BITS=48, this gives a static
>> value of:
>> PAGE_OFFSET = 0xffff800000000000
>>
>> However, for the KASLR case, we use the 'memstart_offset_seed'
>> to randomize the linear region - since 'memstart_addr' indicates the
>> start of physical RAM, we randomize the same on basis
>> of 'memstart_offset_seed' value.
>>
>> As the PAGE_OFFSET value is used presently by several user space
>> tools (for e.g. makedumpfile and crash tools) to determine the start
>> of linear region and hence to read addresses (like PT_NOTE fields) from
>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
>> the start of linear region for the KASLR cases and default to
>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
>> linear region.
>>
>
> Userland code that assumes that the linear map cannot have a hole at
> the beginning should be fixed.

That is a separate case (although that needs fixing as well via a
kernel patch probably as the user-space tools rely on '/proc/iomem'
contents to determine the first System RAM/reserved range).

1. In that particular case (see [1]) the EFI firmware sets the first
EFI block as EfiReservedMemType:

Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]

Since EFI firmware won't return the "EfiReservedMemType" memory to
Linux kernel, so the kernel can't get any info about the first mem
block, and kernel can only see region2 as below:

efi: Processing EFI memory map:
efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | |
| | | | |WB|WT|WC|UC]

# head -1 /proc/iomem
00200000-0021ffff : reserved

2a. If we add debug prints to 'arch/arm64/mm/init.c' to print the
kernel Virtual map we can see that the memory node is set to:

# dmesg | grep memory
..........
memory : 0xffff800000200000 - 0xffff801800000000

2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
if we use 'readelf' to get the last program Header from vmcore (logs
below are for the non-kaslr case):

# readelf -l vmcore

ELF Header:
........................

Program Headers:
  Type   Offset             VirtAddr           PhysAddr
         FileSiz            MemSiz             Flags  Align
  ..........................................................
  LOAD   0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
         0x0000001680000000 0x0000001680000000 RWE    0

3. So if we do a simple calculation:

(VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
0xFFFF8017FFE00000 != 0xffff801800000000.

which indicates that the end virtual memory nodes are not the same
between vmlinux and vmcore.

This happens because the kexec-tools rely on '/proc/iomem' contents
while 'memstart_addr' is computed as 0 by kernel (as value of
memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN).

Returning back to this patch, this is a generic requirement where we
need the linear region start/base addresses in user-space applications
which is used to read addresses which lie in the linear region (for
e.g. when we read /proc/kcore contents).

>> I tested this on my qualcomm (which supports EFI_RNG_PROTOCOL)
>> and apm mustang (which does not support EFI_RNG_PROTOCOL) arm64 boards
>> and was able to use a modified user space utility (like kexec-tools and
>> makedumpfile) to determine the start of linear region correctly for
>> both the KASLR and non-KASLR boot cases.
>>
>
> Can you explain the nature of the changes to the userland code?

The changes are not to rely on the fixed PAGE_OFFSET macro value for
determining the base address of the linear region, but rather read the
'linear_reg_start_addr' symbol from kernel and use the same both in
case of KASLR and non-KASLR boots to determine the base of the linear
region (in [2], I have implemented a test change to kexec-tools to
read the 'linear_reg_start_addr' symbol which is available on my
public github tree, I have a similar change available in makedumpfile
which I have not yet pushed to github, as it implements other features
as well)

[1] http://lists.infradead.org/pipermail/linux-arm-kernel/2018-June/582407.html
[2] https://github.com/bhupesh-sharma/kexec-tools/commit/ae511833e948ccf864fae142ccd903f9c7b3461d

Regards,
Bhupesh

>> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
>> Cc: Mark Rutland <mark.rutland@arm.com>
>> Cc: Will Deacon <will.deacon@arm.com>
>> Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> Cc: James Morse <james.morse@arm.com>
>> Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
>> ---
>>  arch/arm64/include/asm/memory.h | 3 +++
>>  arch/arm64/kernel/arm64ksyms.c  | 1 +
>>  arch/arm64/mm/init.c            | 3 +++
>>  3 files changed, 7 insertions(+)
>>
>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>> index 49d99214f43c..bfd0915ecaf8 100644
>> --- a/arch/arm64/include/asm/memory.h
>> +++ b/arch/arm64/include/asm/memory.h
>> @@ -178,6 +178,9 @@ extern s64 memstart_addr;
>>  /* PHYS_OFFSET - the physical address of the start of memory. */
>>  #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
>>
>> +/* the virtual base of the linear region. */
>> +extern s64 linear_reg_start_addr;
>> +
>>  /* the virtual base of the kernel image (minus TEXT_OFFSET) */
>>  extern u64 kimage_vaddr;
>>
>> diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
>> index d894a20b70b2..a92238ea45ff 100644
>> --- a/arch/arm64/kernel/arm64ksyms.c
>> +++ b/arch/arm64/kernel/arm64ksyms.c
>> @@ -42,6 +42,7 @@ EXPORT_SYMBOL(__arch_copy_in_user);
>>
>>  /* physical memory */
>>  EXPORT_SYMBOL(memstart_addr);
>> +EXPORT_SYMBOL(linear_reg_start_addr);
>>
>>  /* string / mem functions */
>>  EXPORT_SYMBOL(strchr);
>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>> index 325cfb3b858a..29447adb0eef 100644
>> --- a/arch/arm64/mm/init.c
>> +++ b/arch/arm64/mm/init.c
>> @@ -60,6 +60,7 @@
>>   * that cannot be mistaken for a real physical address.
>>   */
>>  s64 memstart_addr __ro_after_init = -1;
>> +s64 linear_reg_start_addr __ro_after_init = PAGE_OFFSET;
>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>
>>  #ifdef CONFIG_BLK_DEV_INITRD
>> @@ -452,6 +453,8 @@ void __init arm64_memblock_init(void)
>>  		}
>>  	}
>>
>> +	linear_reg_start_addr = __phys_to_virt(memblock_start_of_DRAM());
>> +
>>  	/*
>>  	 * Register the kernel text, kernel data, initrd, and initial
>>  	 * pagetables with memblock.
>> --
>> 2.7.4
>>
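The address arithmetic in steps 2a to 3 of this message can be worked through in C. The sketch below is illustrative only: it assumes VA_BITS=48, models the linear map as virt = (phys - phys_offset) + PAGE_OFFSET, and assumes user space took the first '/proc/iomem' entry (0x200000) as its physical base while the kernel used memstart_addr = 0. The helper names are hypothetical and do not come from the kernel or kexec-tools.

```c
#include <assert.h>
#include <stdint.h>

/* PAGE_OFFSET for VA_BITS=48, as given in the thread. */
#define PAGE_OFFSET_48 UINT64_C(0xffff800000000000)

/*
 * Linear-map translation, written with '+' instead of the kernel's '|'.
 * Hypothetical helper: phys_offset is whatever base the caller believes in.
 */
static uint64_t linear_phys_to_virt(uint64_t phys, uint64_t phys_offset)
{
	return (phys - phys_offset) + PAGE_OFFSET_48;
}

/* End virtual address of a PT_LOAD segment: VirtAddr + MemSiz. */
static uint64_t segment_end(uint64_t vaddr, uint64_t memsz)
{
	return vaddr + memsz;
}
```

With a base of 0 the segment at physical 0x180000000 maps to 0xffff800180000000, and with a base of 0x200000 it maps to 0xffff80017fe00000, which is the VirtAddr shown by readelf. The end addresses then differ by exactly 0x200000, the size of the reserved region at the start of RAM.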
Hi Bhupesh, Ard,

On 12/06/18 09:25, Bhupesh Sharma wrote:
> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel
> <ard.biesheuvel@linaro.org> wrote:
>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>>> The start of the linear region map on a KASLR enabled ARM64 machine -
>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
>>> support), is no longer correctly represented by the PAGE_OFFSET macro,
>>> since it is defined as:
>>>
>>> (UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)

>> PAGE_OFFSET is the VA of the start of the linear map. The linear map
>> can be sparsely populated with actual memory, regardless of whether
>> KASLR is in effect or not. The only difference in the presence of
>> KASLR is that there may be such a hole at the beginning, but that does
>> not mean the linear map has moved, or that the value of PAGE_OFFSET is
>> now wrong.

>>> So taking an example of a platform with VA_BITS=48, this gives a static
>>> value of:
>>> PAGE_OFFSET = 0xffff800000000000
>>>
>>> However, for the KASLR case, we use the 'memstart_offset_seed'
>>> to randomize the linear region - since 'memstart_addr' indicates the
>>> start of physical RAM, we randomize the same on basis
>>> of 'memstart_offset_seed' value.
>>>
>>> As the PAGE_OFFSET value is used presently by several user space
>>> tools (for e.g. makedumpfile and crash tools) to determine the start
>>> of linear region and hence to read addresses (like PT_NOTE fields) from
>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
>>> the start of linear region for the KASLR cases and default to
>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
>>> linear region.

>> Userland code that assumes that the linear map cannot have a hole at
>> the beginning should be fixed.

> That is a separate case (although that needs fixing as well via a
> kernel patch probably as the user-space tools rely on '/proc/iomem'
> contents to determine the first System RAM/reserved range).

This is for kexec-tools generating the kdump vmcore ELF headers in user-space?

> 1. In that particular case (see [1]) the EFI firmware sets the first
> EFI block as EfiReservedMemType:
>
> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
> Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
>
> Since EFI firmware won't return the "EfiReservedMemType" memory to
> Linux kernel,

(It's Linux that makes this choice in
drivers/firmware/efi/arm-init.c::is_usable_memory())

> so the kernel can't get any info about the first mem
> block, and kernel can only see region2 as below:
>
> efi: Processing EFI memory map:
> efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | |
> | | | | |WB|WT|WC|UC]
>
> # head -1 /proc/iomem
> 00200000-0021ffff : reserved
>
> 2a. If we add debug prints to 'arch/arm64/mm/init.c' to print the
> kernel Virtual map we can see that the memory node is set to:
>
> # dmesg | grep memory
> ..........
> memory : 0xffff800000200000 - 0xffff801800000000
>
> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
> if we use 'readelf' to get the last program Header from vmcore (logs
> below are for the non-kaslr case):
>
> # readelf -l vmcore
>
> ELF Header:
> ........................
>
> Program Headers:
>   Type   Offset             VirtAddr           PhysAddr
>          FileSiz            MemSiz             Flags  Align
>   ..........................................................
>   LOAD   0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
>          0x0000001680000000 0x0000001680000000 RWE    0
>
> 3. So if we do a simple calculation:
>
> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
> 0xFFFF8017FFE00000 != 0xffff801800000000.
>
> which indicates that the end virtual memory nodes are not the same
> between vmlinux and vmcore.

If I've followed this properly: the problem is that to generate the ELF headers
in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the
virtual addresses of the 'System RAM' regions it can see in /proc/iomem.

The problem you are hitting is an invisible hole at the beginning of RAM,
meaning user-space's guess_phys_to_virt() is off by the size of this hole.

Isn't KASLR a special case for this? You must have to correct for that after
kdump has happened, based on an elf-note in the vmcore. Can't we always do this?

> This happens because the kexec-tools rely on '/proc/iomem' contents
> while 'memstart_addr' is computed as 0 by kernel (as value of
> memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN).

> Returning back to this patch, this is a generic requirement where we
> need the linear region start/base addresses in user-space applications
> which is used to read addresses which lie in the linear region (for
> e.g. when we read /proc/kcore contents).

>>> I tested this on my qualcomm (which supports EFI_RNG_PROTOCOL)
>>> and apm mustang (which does not support EFI_RNG_PROTOCOL) arm64 boards
>>> and was able to use a modified user space utility (like kexec-tools and
>>> makedumpfile) to determine the start of linear region correctly for
>>> both the KASLR and non-KASLR boot cases.
>>>
>>
>> Can you explain the nature of the changes to the userland code?
>
> The changes are not to rely on the fixed PAGE_OFFSET macro value for
> determining the base address of the linear region, but rather read the
> 'linear_reg_start_addr' symbol from kernel and use the same both in
> case of KASLR and non-KASLR boots to determine the base of the linear
> region (in [2], I have implemented a test change to kexec-tools to
> read the 'linear_reg_start_addr' symbol which is available on my

Don't use /dev/mem.

> public github tree, I have a similar change available in makedumpfile
> which I have not yet pushed to github, as it implements other features
> as well)

>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>> index 49d99214f43c..bfd0915ecaf8 100644
>>> --- a/arch/arm64/include/asm/memory.h
>>> +++ b/arch/arm64/include/asm/memory.h
>>> @@ -178,6 +178,9 @@ extern s64 memstart_addr;
>>>  /* PHYS_OFFSET - the physical address of the start of memory. */
>>>  #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
>>>
>>> +/* the virtual base of the linear region. */
>>> +extern s64 linear_reg_start_addr;
>>> +
>>>  /* the virtual base of the kernel image (minus TEXT_OFFSET) */
>>>  extern u64 kimage_vaddr;
>>>
>>> diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
>>> index d894a20b70b2..a92238ea45ff 100644
>>> --- a/arch/arm64/kernel/arm64ksyms.c
>>> +++ b/arch/arm64/kernel/arm64ksyms.c
>>> @@ -42,6 +42,7 @@ EXPORT_SYMBOL(__arch_copy_in_user);
>>>
>>>  /* physical memory */
>>>  EXPORT_SYMBOL(memstart_addr);
>>> +EXPORT_SYMBOL(linear_reg_start_addr);
>>>
>>>  /* string / mem functions */
>>>  EXPORT_SYMBOL(strchr);
>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>> index 325cfb3b858a..29447adb0eef 100644
>>> --- a/arch/arm64/mm/init.c
>>> +++ b/arch/arm64/mm/init.c
>>> @@ -60,6 +60,7 @@
>>>   * that cannot be mistaken for a real physical address.
>>>   */
>>>  s64 memstart_addr __ro_after_init = -1;
>>> +s64 linear_reg_start_addr __ro_after_init = PAGE_OFFSET;
>>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>>
>>>  #ifdef CONFIG_BLK_DEV_INITRD
>>> @@ -452,6 +453,8 @@ void __init arm64_memblock_init(void)
>>>  		}
>>>  	}
>>>
>>> +	linear_reg_start_addr = __phys_to_virt(memblock_start_of_DRAM());

This patch adds a variable that nothing uses, it's going to be removed. You can't
depend on reading this via /dev/mem.

Could you add the information you need as an elf-note to the vmcore instead? You
must already pick these up to handle kaslr. (from memory, this is where the
kaslr-offset is described to user-space after we kdump).

Thanks,

James
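James's suggestion of carrying the value in an ELF note can be pictured from the consumer's side. The sketch below assumes the note payload is vmcoreinfo-style text, that is, newline-separated "key=value" lines, and that a key such as NUMBER(PHYS_OFFSET) is present. Both the key name and the sample values in the usage note are assumptions made for illustration, and vmcoreinfo_hex() is a hypothetical helper rather than code from makedumpfile or crash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Find "key=value" in newline-separated vmcoreinfo-style text and parse
 * the value as hexadecimal (an optional "0x" prefix is accepted).
 * Returns 0 on success, -1 if the key is not present.
 * Only suitable for hex-valued keys; decimal entries need another parser.
 */
static int vmcoreinfo_hex(const char *text, const char *key, uint64_t *out)
{
	size_t klen = strlen(key);
	const char *line = text;

	while (line && *line) {
		/* A match must be the whole key, followed directly by '='. */
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			*out = strtoull(line + klen + 1, NULL, 16);
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

Given a made-up note body such as "NUMBER(PHYS_OFFSET)=0x200000\nKERNELOFFSET=0\n", looking up NUMBER(PHYS_OFFSET) yields 0x200000. A tool reading the value this way would not need to guess the physical base from '/proc/iomem' or read a kernel symbol through /dev/mem.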
Hi James,

On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote:
> Hi Bhupesh, Ard,
>
> On 12/06/18 09:25, Bhupesh Sharma wrote:
>> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel
>> <ard.biesheuvel@linaro.org> wrote:
>>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote:
>>>> The start of the linear region map on a KASLR enabled ARM64 machine -
>>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
>>>> support), is no longer correctly represented by the PAGE_OFFSET macro,
>>>> since it is defined as:
>>>>
>>>> (UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)
>
>>> PAGE_OFFSET is the VA of the start of the linear map. The linear map
>>> can be sparsely populated with actual memory, regardless of whether
>>> KASLR is in effect or not. The only difference in the presence of
>>> KASLR is that there may be such a hole at the beginning, but that does
>>> not mean the linear map has moved, or that the value of PAGE_OFFSET is
>>> now wrong.
>
>>>> So taking an example of a platform with VA_BITS=48, this gives a static
>>>> value of:
>>>> PAGE_OFFSET = 0xffff800000000000
>>>>
>>>> However, for the KASLR case, we use the 'memstart_offset_seed'
>>>> to randomize the linear region - since 'memstart_addr' indicates the
>>>> start of physical RAM, we randomize the same on basis
>>>> of 'memstart_offset_seed' value.
>>>>
>>>> As the PAGE_OFFSET value is used presently by several user space
>>>> tools (for e.g. makedumpfile and crash tools) to determine the start
>>>> of linear region and hence to read addresses (like PT_NOTE fields) from
>>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
>>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
>>>> the start of linear region for the KASLR cases and default to
>>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
>>>> linear region.
>
>>> Userland code that assumes that the linear map cannot have a hole at
>>> the beginning should be fixed.
>
>> That is a separate case (although that needs fixing as well via a
>> kernel patch probably as the user-space tools rely on '/proc/iomem'
>> contents to determine the first System RAM/reserved range).
>
> This is for kexec-tools generating the kdump vmcore ELF headers in user-space?

Yes, but again, I would like to reiterate that the case where I see a
hole at the start of the System RAM range (as I listed above) is just
a specific case, which probably deserves a separate patch. The current
patch though is for a generic issue (please see more details below).

>> 1. In that particular case (see [1]) the EFI firmware sets the first
>> EFI block as EfiReservedMemType:
>>
>> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
>> Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
>>
>> Since EFI firmware won't return the "EfiReservedMemType" memory to
>> Linux kernel,
>
> (It's Linux that makes this choice in
> drivers/firmware/efi/arm-init.c::is_usable_memory())
>
>
>> so the kernel can't get any info about the first mem
>> block, and kernel can only see region2 as below:
>>
>> efi: Processing EFI memory map:
>> efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | |
>> | | | | |WB|WT|WC|UC]
>>
>> # head -1 /proc/iomem
>> 00200000-0021ffff : reserved
>>
>> 2a. If we add debug prints to 'arch/arm64/mm/init.c' to print the
>> kernel Virtual map we can see that the memory node is set to:
>>
>> # dmesg | grep memory
>> ..........
>> memory : 0xffff800000200000 - 0xffff801800000000
>>
>> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
>> if we use 'readelf' to get the last program Header from vmcore (logs
>> below are for the non-kaslr case):
>>
>> # readelf -l vmcore
>>
>> ELF Header:
>> ........................
>>
>> Program Headers:
>>   Type   Offset             VirtAddr           PhysAddr
>>          FileSiz            MemSiz             Flags  Align
>>   ..........................................................
>>   LOAD   0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
>>          0x0000001680000000 0x0000001680000000 RWE    0
>>
>> 3. So if we do a simple calculation:
>>
>> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
>> 0xFFFF8017FFE00000 != 0xffff801800000000.
>>
>> which indicates that the end virtual memory nodes are not the same
>> between vmlinux and vmcore.
>
> If I've followed this properly: the problem is that to generate the ELF headers
> in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the
> virtual addresses of the 'System RAM' regions it can see in /proc/iomem.
>
> The problem you are hitting is an invisible hole at the beginning of RAM,
> meaning user-space's guess_phys_to_virt() is off by the size of this hole.
>
> Isn't KASLR a special case for this? You must have to correct for that after
> kdump has happened, based on an elf-note in the vmcore. Can't we always do this?

No, I hit this issue both for the KASLR and non-KASLR boot cases. We
can fix this either in kernel or user-space.

Fixing this in kernel space seems better to me as the definition of
'memstart_addr' is that it indicates the start of the physical ram,
but since in this case there is a hole at the start of the system ram
visible in Linux (and thus to user-space), but 'memstart_addr' is
still 0 which seems contradictory at the least. This causes PHYS_OFFSET
to be 0 as well, which is again contradictory.

>> This happens because the kexec-tools rely on '/proc/iomem' contents
>> while 'memstart_addr' is computed as 0 by kernel (as value of
>> memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN).
>
>> Returning back to this patch, this is a generic requirement where we
>> need the linear region start/base addresses in user-space applications
>> which is used to read addresses which lie in the linear region (for
>> e.g. when we read /proc/kcore contents).
>
>
>>>> I tested this on my qualcomm (which supports EFI_RNG_PROTOCOL)
>>>> and apm mustang (which does not support EFI_RNG_PROTOCOL) arm64 boards
>>>> and was able to use a modified user space utility (like kexec-tools and
>>>> makedumpfile) to determine the start of linear region correctly for
>>>> both the KASLR and non-KASLR boot cases.
>>>>
>>>
>>> Can you explain the nature of the changes to the userland code?
>>
>> The changes are not to rely on the fixed PAGE_OFFSET macro value for
>> determining the base address of the linear region, but rather read the
>> 'linear_reg_start_addr' symbol from kernel and use the same both in
>> case of KASLR and non-KASLR boots to determine the base of the linear
>> region (in [2], I have implemented a test change to kexec-tools to
>> read the 'linear_reg_start_addr' symbol which is available on my
>
> Don't use /dev/mem.

I just used it as a quick hack, we can use other approaches as well.

>> public github tree, I have a similar change available in makedumpfile
>> which I have not yet pushed to github, as it implements other features
>> as well)
>
>>>> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
>>>> index 49d99214f43c..bfd0915ecaf8 100644
>>>> --- a/arch/arm64/include/asm/memory.h
>>>> +++ b/arch/arm64/include/asm/memory.h
>>>> @@ -178,6 +178,9 @@ extern s64 memstart_addr;
>>>>  /* PHYS_OFFSET - the physical address of the start of memory. */
>>>>  #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })
>>>>
>>>> +/* the virtual base of the linear region. */
>>>> +extern s64 linear_reg_start_addr;
>>>> +
>>>>  /* the virtual base of the kernel image (minus TEXT_OFFSET) */
>>>>  extern u64 kimage_vaddr;
>>>>
>>>> diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
>>>> index d894a20b70b2..a92238ea45ff 100644
>>>> --- a/arch/arm64/kernel/arm64ksyms.c
>>>> +++ b/arch/arm64/kernel/arm64ksyms.c
>>>> @@ -42,6 +42,7 @@ EXPORT_SYMBOL(__arch_copy_in_user);
>>>>
>>>>  /* physical memory */
>>>>  EXPORT_SYMBOL(memstart_addr);
>>>> +EXPORT_SYMBOL(linear_reg_start_addr);
>>>>
>>>>  /* string / mem functions */
>>>>  EXPORT_SYMBOL(strchr);
>>>> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
>>>> index 325cfb3b858a..29447adb0eef 100644
>>>> --- a/arch/arm64/mm/init.c
>>>> +++ b/arch/arm64/mm/init.c
>>>> @@ -60,6 +60,7 @@
>>>>   * that cannot be mistaken for a real physical address.
>>>>   */
>>>>  s64 memstart_addr __ro_after_init = -1;
>>>> +s64 linear_reg_start_addr __ro_after_init = PAGE_OFFSET;
>>>>  phys_addr_t arm64_dma_phys_limit __ro_after_init;
>>>>
>>>>  #ifdef CONFIG_BLK_DEV_INITRD
>>>> @@ -452,6 +453,8 @@ void __init arm64_memblock_init(void)
>>>>  		}
>>>>  	}
>>>>
>>>> +	linear_reg_start_addr = __phys_to_virt(memblock_start_of_DRAM());
>
> This patch adds a variable that nothing uses, it's going to be removed. You can't
> depend on reading this via /dev/mem.
>
> Could you add the information you need as an elf-note to the vmcore instead? You
> must already pick these up to handle kaslr. (from memory, this is where the
> kaslr-offset is described to user-space after we kdump).

No you are mixing up the two cases (please see above), the issue which
this patch fixes is for use cases where we don't have the vmcore
available in case of 'live' debugging via makedumpfile and crash tools
(we only have '/proc/kcore' or 'vmlinux' available in such cases).

I detailed the use case in [1] better (in a reply to Ard), I will
detail the use-case again below:

One specific use case that I am working on at the moment is the
makedumpfile '--mem-usage', which allows one to see the page numbers
of current system (1st kernel) in different use (please see
MAKEDUMPFILE(8) for more details).

Using this we can know how many pages are dumpable when different
dump_level is specified when invoking the makedumpfile.

Normally, makedumpfile analyses the contents of '/proc/kcore' (while
excluding the crashkernel range), and then calculates the page number
of different kind per vmcoreinfo.

For e.g. here is an output from my arm64 board (a non KASLR boot):

TYPE            PAGES     EXCLUDABLE  DESCRIPTION
----------------------------------------------------------------------
ZERO            49524     yes         Pages filled with zero
NON_PRI_CACHE   15143     yes         Cache pages without private flag
PRI_CACHE       29147     yes         Cache pages with private flag
USER            3684      yes         User process pages
FREE            1450569   yes         Free pages
KERN_DATA       14243     no          Dumpable kernel data

page size:              65536
Total pages on system:  1562310
Total size on system:   102387548160 Byte

This use case requires directly reading the '/proc/kcore' and hence
the PAGE_OFFSET value is used to determine the base address of the
linear region, whose value is not static in case of KASLR boot.

Another use-case is where the crash-utility uses the PAGE_OFFSET value
to perform a virtual-to-physical conversion for the address lying in
the linear region:

    ulong
    arm64_VTOP(ulong addr)
    {
            if (machdep->flags & NEW_VMEMMAP) {
                    if (addr >= machdep->machspec->page_offset)
                            return machdep->machspec->phys_offset
                                    + (addr - machdep->machspec->page_offset);
    <..snip..>
    }

[1] https://www.spinics.net/lists/arm-kernel/msg656751.html

Regards,
Bhupesh
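The linear-region branch of the conversion Bhupesh quotes from the crash utility can be written as a small standalone function. This is a sketch under stated assumptions, not the crash utility's code: struct linear_map and linear_vtop() are hypothetical names, and only the linear-map case is handled (addresses below page_offset are reported as an error instead of being translated through the kernel image mapping).

```c
#include <assert.h>
#include <stdint.h>

/* The two values a tool needs in order to translate linear-map addresses. */
struct linear_map {
	uint64_t page_offset;	/* VA of the start of the linear map */
	uint64_t phys_offset;	/* physical address that page_offset maps to */
};

/*
 * Virtual-to-physical conversion for an address in the linear region:
 *   phys = phys_offset + (vaddr - page_offset)
 * Returns 0 on success, -1 if vaddr lies below the linear map.
 */
static int linear_vtop(const struct linear_map *m, uint64_t vaddr,
		       uint64_t *paddr)
{
	if (vaddr < m->page_offset)
		return -1;
	*paddr = m->phys_offset + (vaddr - m->page_offset);
	return 0;
}
```

With page_offset = 0xffff800000000000 and phys_offset = 0, the address 0xffff800000200000 from the dmesg output earlier in the thread translates to physical 0x200000. The result is only as good as the phys_offset the tool is given, which is the value this thread is arguing about how to obtain.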
On Wed, Jun 13, 2018 at 10:46:56AM +0530, Bhupesh Sharma wrote:
> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote:
> > On 12/06/18 09:25, Bhupesh Sharma wrote:
> >> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel
> >> <ard.biesheuvel@linaro.org> wrote:
> >>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote:
> >>>> The start of the linear region map on a KASLR enabled ARM64 machine -
> >>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
> >>>> support), is no longer correctly represented by the PAGE_OFFSET macro,
> >>>> since it is defined as:
> >>>>
> >>>> (UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)
> >
> >>> PAGE_OFFSET is the VA of the start of the linear map. The linear map
> >>> can be sparsely populated with actual memory, regardless of whether
> >>> KASLR is in effect or not. The only difference in the presence of
> >>> KASLR is that there may be such a hole at the beginning, but that does
> >>> not mean the linear map has moved, or that the value of PAGE_OFFSET is
> >>> now wrong.
> >
> >>>> So taking an example of a platform with VA_BITS=48, this gives a static
> >>>> value of:
> >>>> PAGE_OFFSET = 0xffff800000000000
> >>>>
> >>>> However, for the KASLR case, we use the 'memstart_offset_seed'
> >>>> to randomize the linear region - since 'memstart_addr' indicates the
> >>>> start of physical RAM, we randomize the same on basis
> >>>> of 'memstart_offset_seed' value.
> >>>>
> >>>> As the PAGE_OFFSET value is used presently by several user space
> >>>> tools (for e.g. makedumpfile and crash tools) to determine the start
> >>>> of linear region and hence to read addresses (like PT_NOTE fields) from
> >>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to
> >>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as
> >>>> the start of linear region for the KASLR cases and default to
> >>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of
> >>>> linear region.
> >
> >>> Userland code that assumes that the linear map cannot have a hole at
> >>> the beginning should be fixed.
> >
> >> That is a separate case (although that needs fixing as well via a
> >> kernel patch probably as the user-space tools rely on '/proc/iomem'
> >> contents to determine the first System RAM/reserved range).
> >
> > This is for kexec-tools generating the kdump vmcore ELF headers in user-space?
>
> Yes, but again, I would like to reiterate that the case where I see a
> hole at the start of the System RAM range (as I listed above) is just
> a specific case, which probably deserves a separate patch. The current
> patch though is for a generic issue (please see more details below).
>
> >> 1. In that particular case (see [1]) the EFI firmware sets the first
> >> EFI block as EfiReservedMemType:
> >>
> >> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType]
> >> Region2: 0x000000200000-0x00000021ffff [EfiRuntimeServiceData]
> >>
> >> Since EFI firmware won't return the "EfiReservedMemType" memory to
> >> Linux kernel,
> >
> > (It's Linux that makes this choice in
> > drivers/firmware/efi/arm-init.c::is_usable_memory())
> >
> >
> >> so the kernel can't get any info about the first mem
> >> block, and kernel can only see region2 as below:
> >>
> >> efi: Processing EFI memory map:
> >> efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | |
> >> | | | | |WB|WT|WC|UC]
> >>
> >> # head -1 /proc/iomem
> >> 00200000-0021ffff : reserved
> >>
> >> 2a. If we add debug prints to 'arch/arm64/mm/init.c' to print the
> >> kernel Virtual map we can see that the memory node is set to:
> >>
> >> # dmesg | grep memory
> >> ..........
> >> memory : 0xffff800000200000 - 0xffff801800000000
> >>
> >> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that
> >> if we use 'readelf' to get the last program Header from vmcore (logs
> >> below are for the non-kaslr case):
> >>
> >> # readelf -l vmcore
> >>
> >> ELF Header:
> >> ........................
> >>
> >> Program Headers:
> >>   Type   Offset             VirtAddr           PhysAddr
> >>          FileSiz            MemSiz             Flags  Align
> >>   ..........................................................
> >>   LOAD   0x0000000076d40000 0xffff80017fe00000 0x0000000180000000
> >>          0x0000001680000000 0x0000001680000000 RWE    0
> >>
> >> 3. So if we do a simple calculation:
> >>
> >> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 =
> >> 0xFFFF8017FFE00000 != 0xffff801800000000.
> >>
> >> which indicates that the end virtual memory nodes are not the same
> >> between vmlinux and vmcore.
> >
> > If I've followed this properly: the problem is that to generate the ELF headers
> > in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the
> > virtual addresses of the 'System RAM' regions it can see in /proc/iomem.
> >
> > The problem you are hitting is an invisible hole at the beginning of RAM,
> > meaning user-space's guess_phys_to_virt() is off by the size of this hole.
> >
> > Isn't KASLR a special case for this? You must have to correct for that after
> > kdump has happened, based on an elf-note in the vmcore. Can't we always do this?
>
> No, I hit this issue both for the KASLR and non-KASLR boot cases. We
> can fix this either in kernel or user-space.
>
> Fixing this in kernel space seems better to me as the definition of
> 'memstart_addr' is that it indicates the start of the physical ram,
> but since in this case there is a hole at the start of the system ram
> visible in Linux (and thus to user-space), but 'memstart_addr' is
> still 0 which seems contradictory at the least. This causes PHYS_OFFSET
> to be 0 as well, which is again contradictory.

Contradictory to who? Userspace has no business messing around with this
stuff and I'm reluctant to make this an ABI by adding a symbol with a
special name.
Why can't the various constants needed by these tools be exported in the
ELF headers for kcore/vmcore, or as a NOTE as James suggests? That sounds
a lot less fragile to me.

Will
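The end-address comparison quoted in this message can be checked with a few lines of C. This is only a sketch: segment_end() and end_mismatch() are hypothetical helper names, and the inputs mentioned in the note below are the values from the readelf and dmesg output quoted above.

```c
#include <stdint.h>

/* End virtual address of a PT_LOAD segment: VirtAddr + MemSiz. */
uint64_t segment_end(uint64_t virt_addr, uint64_t mem_siz)
{
    return virt_addr + mem_siz;
}

/* Gap between the kernel's end of memory and the vmcore segment's end. */
uint64_t end_mismatch(uint64_t kernel_end, uint64_t vmcore_end)
{
    return kernel_end - vmcore_end;
}
```

For the quoted values, the segment ends at 0xffff8017ffe00000, and the mismatch against 0xffff801800000000 is 0x200000, i.e. 2MB, which is the size of the EfiReservedMemType region at the start of RAM.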
Hi Bhupesh, On 13/06/18 06:16, Bhupesh Sharma wrote: > On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote: >> On 12/06/18 09:25, Bhupesh Sharma wrote: >>> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel wrote: >>>> Userland code that assumes that the linear map cannot have a hole at >>>> the beginning should be fixed. >>> That is a separate case (although that needs fixing as well via a >>> kernel patch probably as the user-space tools rely on '/proc/iomem' >>> contents to determine the first System RAM/reserved range). >> >> This is for kexec-tools generating the kdump vmcore ELF headers in user-space? > > Yes, but again, I would like to reiterate that the case where I see a > hole at the start of the System RAM range (as I listed above) is just > a specific case, which probably deserves a separate patch. The current > patch though is for a generic issue (please see more details below). >>> # readelf -l vmcore >>> >>> ELF Header: >>> ........................ >>> >>> Program Headers: >>> Type Offset VirtAddr PhysAddr >>> FileSiz MemSiz Flags Align >>> .............................................................................................................................................................. >>> LOAD 0x0000000076d40000 0xffff80017fe00000 0x0000000180000000 >>> 0x0000001680000000 0x0000001680000000 RWE 0 >>> >>> 3. So if we do a simple calculation: >>> >>> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 = >>> 0xFFFF8017FFE00000 != 0xffff801800000000. >>> >>> which indicates that the end virtual memory nodes are not the same >>> between vmlinux and vmcore. >> >> If I've followed this properly: the problem is that to generate the ELF headers >> in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the >> virtual addresses of the 'System RAM' regions it can see in /proc/iomem. 
>> >> The problem you are hitting is an invisible hole at the beginning of RAM, >> meaning user-space's guess_phys_to_virt() is off by the size of this hole. >> >> Isn't KASLR a special case for this? You must have to correct for that after >> kdump has happened, based on an elf-note in the vmcore. Can't we always do this? > > No, I hit this issue both for the KASLR and non-KASLR boot cases. Because in both cases there is a hole at the beginning of the linear-map. KASLR is a special-case of this as the kernel adds a variable sized hole to do the randomization. Surely treating this as one case makes your user-space code simpler. > Fixing this in kernel space seems better to me as the definition of Is there a kernel bug? Changing the definitions of internal kernel variables for the benefit of code digging in /proc/kcore|/dev/mem isn't going to fly. > 'memstart_addr' is that it indicates the start of the physical ram, > but since in this case there is a hole at the start of the system ram > visible in Linux (and thus to user-space), but 'memstart_addr' is > still 0 which seems contradictory at the least. This causes PHY_OFFSET > to be 0 as well, which is again contradictory. >>> This happens because the kexec-tools rely on 'proc/iomem' contents >>> while 'memstart_addr' is computed as 0 by kernel (as value of >>> memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN). >> >>> Returning back to this patch, this is a generic requirement where we >>> need the linear region start/base addresses in user-space applications >>> which is used to read addresses which lie in the linear region (for >>> e.g. when we read /proc/kcore contents). [...] >> This patch adds a variable that nothing uses, its going to be removed. You can't >> depend on reading this via /dev/mem. >> >> Could you add the information you need as an elf-note to the vmcore instead? You >> must already pick these up to handle kaslr. 
(from memory, this is where the
>> kaslr-offset is described to user-space after we kdump).

> No you are mixing up the two cases (please see above), the issue which
> this patch fixes is for use cases where we don't have the vmcore
> available in case of 'live' debugging via makedumpfile and crash tools
> (we only have '/proc/kcore' or 'vmlinux' available in such cases). I
> detailed the use case in [1] better (in a reply to Ard), I will detail
> the use-case again below:

Okay, so not kdump...

> One specific use case that I am working on at the moment is the
> makedumpfile '--mem-usage', which allows one to see the page numbers
> of current system (1st kernel) in different use (please see
> MAKEDUMPFILE(8) for more details).

https://linux.die.net/man/8/makedumpfile :
| Name: makedumpfile - make a small dumpfile of kdump

... but now we are talking about kdump again ...

> Using this we can know how many pages are dumpable when different
> dump_level is specified when invoking the makedumpfile.
>
> Normally, makedumpfile analyses the contents of '/proc/kcore' (while
> excluding the crashkernel range), and then calculates the page number
> of different kind per vmcoreinfo.

$ apt-get source makedumpfile
$ cd makedumpfile-1.5.3
$ grep -r "kcore" .
$

I suspect there are two pieces of software with the same name here.

> This use case requires directly reading the '/proc/kcore' and the
> hence the PAGE_OFFSET value is used to determine the base address of
> the linear region, whose value is not static in case of KASLR boot.

Eh? I thought PAGE_OFFSET was a compile-time constant, and it was PHYS_OFFSET
that has a value other than the aligned base of memory for KASLR.
> Another use-case is where the crash-utility uses the PAGE_OFFSET value
> to perform a virtual-to-physical conversion for the address lying in
> the linear region:

In all cases the problem you have is assuming the first 'System RAM' value in
/proc/iomem is the base of DRAM, which you can use as PHYS_OFFSET in your
user-space phys2virt() calculation.

What information do you need to make this work?

You can evidently read kernel variables, why can't you read memstart_addr and do:
| #define __phys_to_virt(x) \
|         ((unsigned long)((x) - memstart_addr) | PAGE_OFFSET)

based on the physical addresses in /proc/iomem, and PAGE_OFFSET pulled out of
the vmlinux.

Reading memstart_addr is fragile, we might need to rename it
wednesday_memstart_addr. If user-space needs this value to work with
/proc/{kcore,vmcore} we should expose something like 'p2v_offset' as an elf-note
on those files. (looks like they both have elf-headers).

Thanks,

James
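As a rough user-space rendering of the macro above, the sketch below assumes VA_BITS=48 (the example used in this thread) and assumes memstart_addr has already been read from the running kernel as a signed 64-bit value. The function name phys_to_virt_user() is hypothetical and does not come from any of the tools discussed.

```c
#include <stdint.h>

/* Assumption: VA_BITS=48, as in the example given earlier in the thread. */
#define VA_BITS      48
#define PAGE_OFFSET  (UINT64_C(0xffffffffffffffff) - \
                      (UINT64_C(1) << (VA_BITS - 1)) + 1)

/*
 * Mirror of the quoted __phys_to_virt(): subtract memstart_addr, then OR in
 * PAGE_OFFSET. memstart_addr is taken as signed here because the thread
 * notes that it may be negative.
 */
uint64_t phys_to_virt_user(uint64_t phys, int64_t memstart_addr)
{
    return (phys - (uint64_t)memstart_addr) | PAGE_OFFSET;
}
```

With memstart_addr equal to 0, the physical address 0x200000 maps to 0xffff800000200000, which matches the start of the 'memory' range in the dmesg output quoted earlier in the thread.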
Hi Will, On Wed, Jun 13, 2018 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote: > On Wed, Jun 13, 2018 at 10:46:56AM +0530, Bhupesh Sharma wrote: >> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote: >> > On 12/06/18 09:25, Bhupesh Sharma wrote: >> >> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel >> >> <ard.biesheuvel@linaro.org> wrote: >> >>> On 12 June 2018 at 08:36, Bhupesh Sharma <bhsharma@redhat.com> wrote: >> >>>> The start of the linear region map on a KASLR enabled ARM64 machine - >> >>>> which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL >> >>>> support), is no longer correctly represented by the PAGE_OFFSET macro, >> >>>> since it is defined as: >> >>>> >> >>>> (UL(1) << (VA_BITS - 1)) + 1) >> > >> >>> PAGE_OFFSET is the VA of the start of the linear map. The linear map >> >>> can be sparsely populated with actual memory, regardless of whether >> >>> KASLR is in effect or not. The only difference in the presence of >> >>> KASLR is that there may be such a hole at the beginning, but that does >> >>> not mean the linear map has moved, or that the value of PAGE_OFFSET is >> >>> now wrong. >> > >> >>>> So taking an example of a platform with VA_BITS=48, this gives a static >> >>>> value of: >> >>>> PAGE_OFFSET = 0xffff800000000000 >> >>>> >> >>>> However, for the KASLR case, we use the 'memstart_offset_seed' >> >>>> to randomize the linear region - since 'memstart_addr' indicates the >> >>>> start of physical RAM, we randomize the same on basis >> >>>> of 'memstart_offset_seed' value. >> >>>> >> >>>> As the PAGE_OFFSET value is used presently by several user space >> >>>> tools (for e.g. 
makedumpfile and crash tools) to determine the start >> >>>> of linear region and hence to read addresses (like PT_NOTE fields) from >> >>>> '/proc/kcore' for the non-KASLR boot cases, so it would be better to >> >>>> use 'memblock_start_of_DRAM()' value (converted to virtual) as >> >>>> the start of linear region for the KASLR cases and default to >> >>>> the PAGE_OFFSET value for non-KASLR cases to indicate the start of >> >>>> linear region. >> > >> >>> Userland code that assumes that the linear map cannot have a hole at >> >>> the beginning should be fixed. >> > >> >> That is a separate case (although that needs fixing as well via a >> >> kernel patch probably as the user-space tools rely on '/proc/iomem' >> >> contents to determine the first System RAM/reserved range). >> > >> > This is for kexec-tools generating the kdump vmcore ELF headers in user-space? >> >> Yes, but again, I would like to reiterate that the case where I see a >> hole at the start of the System RAM range (as I listed above) is just >> a specific case, which probably deserves a separate patch. The current >> patch though is for a generic issue (please see more details below). >> >> >> 1. In that particular case (see [1]) the EFI firmware sets the first >> >> EFI block as EfiReservedMemType: >> >> >> >> Region1: 0x000000000000-0x000000200000 [EfiReservedMemType] >> >> Region2: 0x000000200000-0x00000021fffff [EfiRuntimeServiceData] >> >> >> >> Since EFI firmware won't return the "EfiReservedMemType" memory to >> >> Linux kernel, >> > >> > (Its linux that makes this choice in >> > drivers/firmware/efi/arm-init.c::is_usable_memory()) >> > >> > >> >> so the kernel can't get any info about the first mem >> >> block, and kernel can only see region2 as below: >> >> >> >> efi: Processing EFI memory map: >> >> efi: 0x000000200000-0x00000021ffff [Runtime Data |RUN| | | >> >> | | | | |WB|WT|WC|UC] >> >> >> >> # head -1 /proc/iomem >> >> 00200000-0021ffff : reserved >> >> >> >> 2a. 
If we add debug prints to 'arch/arm64/mm/init.c' to print the >> >> kernel Virtual map we can see that the memory node is set to: >> >> >> >> # dmesg | grep memory >> >> .......... >> >> memory : 0xffff800000200000 - 0xffff801800000000 >> >> >> >> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that >> >> if we use 'readelf' to get the last program Header from vmcore (logs >> >> below are for the non-kaslr case): >> >> >> >> # readelf -l vmcore >> >> >> >> ELF Header: >> >> ........................ >> >> >> >> Program Headers: >> >> Type Offset VirtAddr PhysAddr >> >> FileSiz MemSiz Flags Align >> >> .............................................................................................................................................................. >> >> LOAD 0x0000000076d40000 0xffff80017fe00000 0x0000000180000000 >> >> 0x0000001680000000 0x0000001680000000 RWE 0 >> >> >> >> 3. So if we do a simple calculation: >> >> >> >> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 = >> >> 0xFFFF8017FFE00000 != 0xffff801800000000. >> >> >> >> which indicates that the end virtual memory nodes are not the same >> >> between vmlinux and vmcore. >> > >> > If I've followed this properly: the problem is that to generate the ELF headers >> > in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the >> > virtual addresses of the 'System RAM' regions it can see in /proc/iomem. >> > >> > The problem you are hitting is an invisible hole at the beginning of RAM, >> > meaning user-space's guess_phys_to_virt() is off by the size of this hole. >> > >> > Isn't KASLR a special case for this? You must have to correct for that after >> > kdump has happened, based on an elf-note in the vmcore. Can't we always do this? >> >> No, I hit this issue both for the KASLR and non-KASLR boot cases. We >> can fix this either in kernel or user-space. 
>> >> Fixing this in kernel space seems better to me as the definition of >> 'memstart_addr' is that it indicates the start of the physical ram, >> but since in this case there is a hole at the start of the system ram >> visible in Linux (and thus to user-space), but 'memstart_addr' is >> still 0 which seems contradictory at the least. This causes PHY_OFFSET >> to be 0 as well, which is again contradictory. > > Contradictory to who? I meant that the 'memstart_addr' and PHY_OFFSET value are computed as 0 in the above particular case, which is not the real representation of the start of System RAM as the 1st memory block available in Linux starts from 2MB [as confirmed by the 'memblock_start_of_DRAM()' value of 0x200000] and indicated by '/proc/iomem': # head -1 /proc/iomem 00200000-0021ffff : reserved > Userspace has no business messing around with this > stuff and I'm reluctant to make this an ABI by adding a symbol with a > special name. Why can't the various constants needed by these tools be > exported in the ELF headers for kcore/vmcore, or as a NOTE as James > suggests? That sounds a lot less fragile to me. But we already add the 'memstart_addr' variable to kallsyms in the kernel, don't we? And so user-space tools do use the same - so we already have a precedent available. Again this patch was an attempt to start a conversation as my query towards determining the base of linear range by either: - reading the 'memstart_addr' and backcomputing the start of linear range, or - adding a new variable (which this patch does), or - use other approaches did not see a conclusion (please see [1]). [1] https://www.spinics.net/lists/arm-kernel/msg655933.html Regards, Bhupesh
Hello James, Thanks for your inputs, please see my responses inline. On Wed, Jun 13, 2018 at 3:59 PM, James Morse <james.morse@arm.com> wrote: > Hi Bhupesh, > > On 13/06/18 06:16, Bhupesh Sharma wrote: >> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote: >>> On 12/06/18 09:25, Bhupesh Sharma wrote: >>>> On Tue, Jun 12, 2018 at 12:23 PM, Ard Biesheuvel wrote: >>>>> Userland code that assumes that the linear map cannot have a hole at >>>>> the beginning should be fixed. > >>>> That is a separate case (although that needs fixing as well via a >>>> kernel patch probably as the user-space tools rely on '/proc/iomem' >>>> contents to determine the first System RAM/reserved range). >>> >>> This is for kexec-tools generating the kdump vmcore ELF headers in user-space? >> >> Yes, but again, I would like to reiterate that the case where I see a >> hole at the start of the System RAM range (as I listed above) is just >> a specific case, which probably deserves a separate patch. The current >> patch though is for a generic issue (please see more details below). > > >>>> # readelf -l vmcore >>>> >>>> ELF Header: >>>> ........................ >>>> >>>> Program Headers: >>>> Type Offset VirtAddr PhysAddr >>>> FileSiz MemSiz Flags Align >>>> .............................................................................................................................................................. >>>> LOAD 0x0000000076d40000 0xffff80017fe00000 0x0000000180000000 >>>> 0x0000001680000000 0x0000001680000000 RWE 0 >>>> >>>> 3. So if we do a simple calculation: >>>> >>>> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 = >>>> 0xFFFF8017FFE00000 != 0xffff801800000000. >>>> >>>> which indicates that the end virtual memory nodes are not the same >>>> between vmlinux and vmcore. 
>>> >>> If I've followed this properly: the problem is that to generate the ELF headers >>> in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the >>> virtual addresses of the 'System RAM' regions it can see in /proc/iomem. >>> >>> The problem you are hitting is an invisible hole at the beginning of RAM, >>> meaning user-space's guess_phys_to_virt() is off by the size of this hole. >>> >>> Isn't KASLR a special case for this? You must have to correct for that after >>> kdump has happened, based on an elf-note in the vmcore. Can't we always do this? >> >> No, I hit this issue both for the KASLR and non-KASLR boot cases. > > Because in both cases there is a hole at the beginning of the linear-map. KASLR > is a special-case of this as the kernel adds a variable sized hole to do the > randomization. > > Surely treating this as one case makes your user-space code simpler. Ok. >> Fixing this in kernel space seems better to me as the definition of > > Is there a kernel bug? Changing the definitions of internal kernel variables for > the benefit of code digging in /proc/kcore|/dev/mem isn't going to fly. Indeed, I am not advocating to change the kernel space code just to suit the user-space tools. 
However in this particular case the 'memstart_addr' and PHY_OFFSET value are computed as 0 which IMO is not the real representation of the start of System RAM as the 1st memory block available in Linux starts from 2MB [as confirmed by the 'memblock_start_of_DRAM()' value of 0x200000] and indicated by '/proc/iomem': # head -1 /proc/iomem 00200000-0021ffff : reserved I think reading the kernel code and finding 'memstart_addr' and PHY_OFFSET as 0, one gets the notion that the base of System RAM starts from 0, which is incorrect in the above case as it starts from 2MB as the 1st block is of the type EfiReservedMemType >> 'memstart_addr' is that it indicates the start of the physical ram, >> but since in this case there is a hole at the start of the system ram >> visible in Linux (and thus to user-space), but 'memstart_addr' is >> still 0 which seems contradictory at the least. This causes PHY_OFFSET >> to be 0 as well, which is again contradictory. > > >>>> This happens because the kexec-tools rely on 'proc/iomem' contents >>>> while 'memstart_addr' is computed as 0 by kernel (as value of >>>> memblock_start_of_DRAM() < ARM64_MEMSTART_ALIGN). >>> >>>> Returning back to this patch, this is a generic requirement where we >>>> need the linear region start/base addresses in user-space applications >>>> which is used to read addresses which lie in the linear region (for >>>> e.g. when we read /proc/kcore contents). > > [...] > >>> This patch adds a variable that nothing uses, its going to be removed. You can't >>> depend on reading this via /dev/mem. >>> >>> Could you add the information you need as an elf-note to the vmcore instead? You >>> must already pick these up to handle kaslr. (from memory, this is where the >>> kaslr-offset is described to user-space after we kdump). 
> > >> No you are mixing up the two cases (please see above), the issue which >> this patch fixes is for use cases where we don't have the vmcore >> available in case of 'live' debugging via makedumpfile and crash tools >> (we only have '/proc/kcore' or 'vmlinux' available in such cases). I >> detailed the use case in [1] better (in a reply to Ard), I will detail >> the use-case again below: > > Okay, so not kdump... > > >> One specific use case that I am working on at the moment is the >> makedumpfile '--mem-usage', which allows one to see the page numbers >> of current system (1st kernel) in different use (please see >> MAKEDUMPFILE(8) for more details). > > https://linux.die.net/man/8/makedumpfile : > | Name: makedumpfile - make a small dumpfile of kdump > > ... but now we are talking about kdump again ... > > >> Using this we can know how many pages are dumpable when different >> dump_level is specified when invoking the makedumpfile. >> >> Normally, makedumpfile analyses the contents of '/proc/kcore' (while >> excluding the crashkernel range), and then calculates the page number >> of different kind per vmcoreinfo. > > $ apt-get source makedumpfile > $ cd makedumpfile-1.5.3 > $ grep -r "kcore" . > $ > > I suspect there are two pieces of software with the same name here. Here is the makedumpfile upstream git tree - git://git.code.sf.net/p/makedumpfile/code $ grep -r "kcore" . ./elf_info.c:int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len) <..snip..> ./makedumpfile.8:# makedumpfile \-f \-\-mem\-usage /proc/kcore <..snip..> >> This use case requires directly reading the '/proc/kcore' and the >> hence the PAGE_OFFSET value is used to determine the base address of >> the linear region, whose value is not static in case of KASLR boot. > > Eh? I thought PAGE_OFFSET was a compile-time constant, and it was PHYS_OFFSET > has a value other the aligned base of memory for KASLR. 
Indeed, I tried to capture the dilemma in [1], just to recap:

'arch/arm64/include/asm/memory.h' defines PAGE_OFFSET as:

/*
 * PAGE_OFFSET - the virtual address of the start of the linear map (top
 *               (VA_BITS - 1))
 */
#define PAGE_OFFSET             (UL(0xffffffffffffffff) - \
        (UL(1) << (VA_BITS - 1)) + 1)

However, for the KASLR case, we set the 'memstart_offset_seed' to use
the top 16 bits of the 'kaslr-seed' to randomize the linear region in
'arch/arm64/kernel/kaslr.c':

u64 __init kaslr_early_init(u64 dt_phys)
{
<snip..>
        /* use the top 16 bits to randomize the linear region */
        memstart_offset_seed = seed >> 48;
<snip..>
}

So, either we should have a uniform way of representing the virtual
base of the linear range both in KASLR and non-KASLR boot cases (macro
or variable?),

or we should rather look at removing the PAGE_OFFSET usage from the
kernel (or at least the confusing comment from 'memory.h') - again
please see [1] for the suggested approaches (bottom part of the query).

>
>> Another use-case is where the crash-utility uses the PAGE_OFFSET value
>> to perform a virtual-to-physical conversion for the address lying in
>> the linear region:
>
> In all cases the problem you have is assuming the first 'System RAM' value in
> /proc/iomem is the base of DRAM, which you can use a PHYS_OFFSET in your
> user-space phys2virt() calculation.
>
> What information do you need to make this work?
>
> You can evidently read kernel variables, why can't you read memstart_addr and do:
> | #define __phys_to_virt(x) \
> |         ((unsigned long)((x) - memstart_addr) | PAGE_OFFSET)
>
> based on the physical addresses in /proc/iomem, and PAGE_OFFSET pulled out of
> the vmlinux.
>
> Reading memstart_addr is fragile, we might need to rename it
> wednesday_memstart_addr. If user-space needs this value to work with
> /proc/{kcore,vmcore} we should expose something like 'p2v_offset' as an elf-note
> on those files. (looks like they both have elf-headers).
Again I had suggested reading memstart_addr as one of the approaches
in [1], but it seems we couldn't reach a conclusion, so I sent out this
approach to trigger another round of discussion.

BTW adding 'p2v_offset' as an elf-note seems like a good idea. If this
seems suitable, I can try and spin patch(es) using this approach (both
for the kernel and user-space tools).

Please share your views.

[1] https://www.spinics.net/lists/arm-kernel/msg655933.html

Thanks,
Bhupesh
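The PAGE_OFFSET definition quoted in this message depends only on VA_BITS, so it can be evaluated without any knowledge of where RAM sits. A small sketch, with page_offset_for() as a hypothetical helper name and the VA_BITS values other than 48 chosen purely for illustration:

```c
#include <stdint.h>

/*
 * Same arithmetic as the quoted macro, with VA_BITS passed as a parameter:
 *   0xffffffffffffffff - (1 << (VA_BITS - 1)) + 1
 * Assumes 1 <= va_bits <= 64 so that the shift is well defined.
 */
uint64_t page_offset_for(unsigned int va_bits)
{
    return UINT64_C(0xffffffffffffffff) - (UINT64_C(1) << (va_bits - 1)) + 1;
}
```

For va_bits = 48 this yields 0xffff800000000000, the static value given at the start of the thread. The result is the same for a KASLR and a non-KASLR boot because no boot-time input enters the calculation.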
Hi Bhupesh, On 14/06/18 08:53, Bhupesh Sharma wrote: > On Wed, Jun 13, 2018 at 3:59 PM, James Morse <james.morse@arm.com> wrote: >> On 13/06/18 06:16, Bhupesh Sharma wrote: >>> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote: >>>> If I've followed this properly: the problem is that to generate the ELF headers >>>> in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the >>>> virtual addresses of the 'System RAM' regions it can see in /proc/iomem. >>>> >>>> The problem you are hitting is an invisible hole at the beginning of RAM, >>>> meaning user-space's guess_phys_to_virt() is off by the size of this hole. >>>> >>>> Isn't KASLR a special case for this? You must have to correct for that after >>>> kdump has happened, based on an elf-note in the vmcore. Can't we always do this? >>> >>> No, I hit this issue both for the KASLR and non-KASLR boot cases. >> >> Because in both cases there is a hole at the beginning of the linear-map. KASLR >> is a special-case of this as the kernel adds a variable sized hole to do the >> randomization. >> >> Surely treating this as one case makes your user-space code simpler. > > Ok. > >>> Fixing this in kernel space seems better to me as the definition of >> >> Is there a kernel bug? Changing the definitions of internal kernel variables for >> the benefit of code digging in /proc/kcore|/dev/mem isn't going to fly. > > Indeed, I am not advocating to change the kernel space code just to > suit the user-space tools. However in this particular case the > 'memstart_addr' and PHY_OFFSET value are computed as 0 which IMO (What is PHY_OFFSET? I assume you mean PHYS_OFFSET, which is the same as memstart_addr ... why do you quote them together?) 
> is
> not the real representation of the start of System RAM as the 1st
> memory block available in Linux starts from 2MB [as confirmed by the
> 'memblock_start_of_DRAM()' value of 0x200000] and indicated by
> '/proc/iomem':
>
> # head -1 /proc/iomem
> 00200000-0021ffff : reserved

You have assumptions about what memstart_addr is based on its name. Names of
kernel variables get further from their actual use over time. The purpose of
this variable isn't to store where a hypothetical-lowest-page of memory would be
in the linear map. The kernel doesn't have a handy variable for this, because
no one needs to know.

> I think reading the kernel code and finding 'memstart_addr' and
> PHY_OFFSET as 0, one gets the notion

notion -> assumption based on the name

It's just a name. Anyone reading this should grep for how the value is used.
It's added/subtracted from addresses as part of phys_to_virt()/virt_to_phys().
It must be some kind of offset. What does it mean on its own? Probably nothing.

> that the base of System RAM starts from 0,
> which is incorrect in the above case as it starts from
> 2MB as the 1st block is of the type EfiReservedMemType

What will they assume if the value is negative?

[...]

> So, either we should have a uniform way of representing the virtual
> base of the linear range

What needs to know this?
RAM will be somewhere between PAGE_OFFSET and the top of the address space.
Anyone who wants to know where has a specific page in mind, phys_to_virt() or
page_address() tell them where their page is.

> or we should rather look at removing the PAGE_OFFSET
> usage from
> the kernel (or atleast the confusing comment from 'memory.h')

This?:
| PAGE_OFFSET - the virtual address of the start of the linear map

Nothing here says it's the virtual address of any particular physical page. It's
the start of the region of VA space that holds the 1:1 mapping of RAM.
Its value is generated at compile time, we have no idea where RAM will be until
we boot, how could this be the address of any particular page?

> BTW adding 'p2v_offset' as an elf-note seems like a good idea. If this
> seems suitable, I can try and spin patch(es) using this approach (both
> for the kernel and user-space tools).

You seem to be using this for user-space phys_to_virt() based on values found in
/proc/iomem. This should give you what you want, and isolate your user-space
from the kernel's unexpected naming of variables.

I'd suggest a 64bit offset that is added to a physical address to get where in
the linear map this page would be, if it's mapped.

Thanks,

James
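One way the suggested offset could behave is sketched below. Nothing here is an existing kernel or tool interface: 'p2v_offset' is only the name floated in this thread, both function names are hypothetical, and deriving the offset as PAGE_OFFSET minus memstart_addr is an assumption based on the __phys_to_virt() macro quoted earlier.

```c
#include <stdint.h>

/*
 * Hypothetical: the value a 'p2v_offset' note might carry, derived here
 * from PAGE_OFFSET and a (possibly negative) memstart_addr.
 */
uint64_t p2v_offset_from(uint64_t page_offset, int64_t memstart_addr)
{
    return page_offset - (uint64_t)memstart_addr;
}

/*
 * User-space phys_to_virt(): a single 64-bit addition. Unsigned arithmetic
 * wraps modulo 2^64, which is what makes a plain offset sufficient.
 */
uint64_t phys_to_linear(uint64_t phys, uint64_t p2v_offset)
{
    return phys + p2v_offset;
}
```

With this shape, user-space would need neither PAGE_OFFSET nor memstart_addr by name. For PAGE_OFFSET = 0xffff800000000000 and memstart_addr = 0, the physical address 0x200000 lands at 0xffff800000200000, the same result as the OR-based macro gives as long as the subtracted value fits within the linear region.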
Hi Bhupesh, On Thu, Jun 14, 2018 at 11:53:53AM +0530, Bhupesh Sharma wrote: > On Wed, Jun 13, 2018 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote: > > On Wed, Jun 13, 2018 at 10:46:56AM +0530, Bhupesh Sharma wrote: > >> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote: > >> > On 12/06/18 09:25, Bhupesh Sharma wrote: > >> >> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that > >> >> if we use 'readelf' to get the last program Header from vmcore (logs > >> >> below are for the non-kaslr case): > >> >> > >> >> # readelf -l vmcore > >> >> > >> >> ELF Header: > >> >> ........................ > >> >> > >> >> Program Headers: > >> >> Type Offset VirtAddr PhysAddr > >> >> FileSiz MemSiz Flags Align > >> >> .............................................................................................................................................................. > >> >> LOAD 0x0000000076d40000 0xffff80017fe00000 0x0000000180000000 > >> >> 0x0000001680000000 0x0000001680000000 RWE 0 > >> >> > >> >> 3. So if we do a simple calculation: > >> >> > >> >> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 = > >> >> 0xFFFF8017FFE00000 != 0xffff801800000000. > >> >> > >> >> which indicates that the end virtual memory nodes are not the same > >> >> between vmlinux and vmcore. > >> > > >> > If I've followed this properly: the problem is that to generate the ELF headers > >> > in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the > >> > virtual addresses of the 'System RAM' regions it can see in /proc/iomem. > >> > > >> > The problem you are hitting is an invisible hole at the beginning of RAM, > >> > meaning user-space's guess_phys_to_virt() is off by the size of this hole. > >> > > >> > Isn't KASLR a special case for this? You must have to correct for that after > >> > kdump has happened, based on an elf-note in the vmcore. Can't we always do this? 
> >> > >> No, I hit this issue both for the KASLR and non-KASLR boot cases. We > >> can fix this either in kernel or user-space. > >> > >> Fixing this in kernel space seems better to me as the definition of > >> 'memstart_addr' is that it indicates the start of the physical ram, > >> but since in this case there is a hole at the start of the system ram > >> visible in Linux (and thus to user-space), but 'memstart_addr' is > >> still 0 which seems contradictory at the least. This causes PHY_OFFSET > >> to be 0 as well, which is again contradictory. > > > > Contradictory to who? > > I meant that the 'memstart_addr' and PHY_OFFSET value are computed as > 0 in the above particular case, which is not the real representation > of the start of System RAM as the 1st memory block available in Linux > starts from 2MB [as confirmed by the 'memblock_start_of_DRAM()' value > of 0x200000] and indicated by '/proc/iomem': > > # head -1 /proc/iomem > 00200000-0021ffff : reserved Who said it's supposed to be the "real representation of the start of System RAM"? The kernel is fine with this being 0 in the case you describe. How about we rename the variable to 'memstart_addr_sometimes_zero'? Does that help? > > Userspace has no business messing around with this > > stuff and I'm reluctant to make this an ABI by adding a symbol with a > > special name. Why can't the various constants needed by these tools be > > exported in the ELF headers for kcore/vmcore, or as a NOTE as James > > suggests? That sounds a lot less fragile to me. > > But we already add the 'memstart_addr' variable to kallsyms in the > kernel, don't we? And so user-space tools do use the same - so we > already have a precedent available. Whoa, whoa -- hold up! The implication here is that variables exposed via kallsyms are ABI. That's simply not true, otherwise we'd be breaking the ABI every kernel release. Will
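The arithmetic quoted in this message can be checked mechanically. The sketch below assumes PAGE_OFFSET = 0xffff800000000000 (VA_BITS=48, as in the original patch description); the helper names are hypothetical. It shows that the LOAD segment ends 2MB short of the expected address, and that the VirtAddr/PhysAddr pair in the header implies the tool assumed a physical offset of 0x200000, which is the first entry in /proc/iomem, while the kernel's memstart_addr was 0.

```c
#include <stdint.h>

/* End of a PT_LOAD segment in virtual address space. */
static uint64_t segment_virt_end(uint64_t vaddr, uint64_t memsz)
{
	return vaddr + memsz;
}

/*
 * Physical offset a tool must have assumed to produce this vaddr for
 * this paddr, given virt = (phys - phys_offset) + page_offset.
 */
static uint64_t implied_phys_offset(uint64_t vaddr, uint64_t paddr,
				    uint64_t page_offset)
{
	return paddr - (vaddr - page_offset);
}
```

Feeding in the readelf values (VirtAddr 0xffff80017fe00000, PhysAddr 0x180000000, MemSiz 0x1680000000) gives an end address of 0xffff8017ffe00000 and an implied offset of 0x200000.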
Hi Will, On Fri, Jun 15, 2018 at 10:22 PM, Will Deacon <will.deacon@arm.com> wrote: > Hi Bhupesh, > > On Thu, Jun 14, 2018 at 11:53:53AM +0530, Bhupesh Sharma wrote: >> On Wed, Jun 13, 2018 at 3:41 PM, Will Deacon <will.deacon@arm.com> wrote: >> > On Wed, Jun 13, 2018 at 10:46:56AM +0530, Bhupesh Sharma wrote: >> >> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> wrote: >> >> > On 12/06/18 09:25, Bhupesh Sharma wrote: >> >> >> 2b. Now if we use kexec-tools to obtain a crash vmcore we can see that >> >> >> if we use 'readelf' to get the last program Header from vmcore (logs >> >> >> below are for the non-kaslr case): >> >> >> >> >> >> # readelf -l vmcore >> >> >> >> >> >> ELF Header: >> >> >> ........................ >> >> >> >> >> >> Program Headers: >> >> >> Type Offset VirtAddr PhysAddr >> >> >> FileSiz MemSiz Flags Align >> >> >> .............................................................................................................................................................. >> >> >> LOAD 0x0000000076d40000 0xffff80017fe00000 0x0000000180000000 >> >> >> 0x0000001680000000 0x0000001680000000 RWE 0 >> >> >> >> >> >> 3. So if we do a simple calculation: >> >> >> >> >> >> (VirtAddr + MemSiz) = 0xffff80017fe00000 + 0x0000001680000000 = >> >> >> 0xFFFF8017FFE00000 != 0xffff801800000000. >> >> >> >> >> >> which indicates that the end virtual memory nodes are not the same >> >> >> between vmlinux and vmcore. >> >> > >> >> > If I've followed this properly: the problem is that to generate the ELF headers >> >> > in the post-kdump vmcore, at kdump-load-time kexec-tools has to guess the >> >> > virtual addresses of the 'System RAM' regions it can see in /proc/iomem. >> >> > >> >> > The problem you are hitting is an invisible hole at the beginning of RAM, >> >> > meaning user-space's guess_phys_to_virt() is off by the size of this hole. >> >> > >> >> > Isn't KASLR a special case for this? 
You must have to correct for that after >> >> > kdump has happened, based on an elf-note in the vmcore. Can't we always do this? >> >> >> >> No, I hit this issue both for the KASLR and non-KASLR boot cases. We >> >> can fix this either in kernel or user-space. >> >> >> >> Fixing this in kernel space seems better to me as the definition of >> >> 'memstart_addr' is that it indicates the start of the physical ram, >> >> but since in this case there is a hole at the start of the system ram >> >> visible in Linux (and thus to user-space), but 'memstart_addr' is >> >> still 0 which seems contradictory at the least. This causes PHY_OFFSET >> >> to be 0 as well, which is again contradictory. >> > >> > Contradictory to who? >> >> I meant that the 'memstart_addr' and PHY_OFFSET value are computed as >> 0 in the above particular case, which is not the real representation >> of the start of System RAM as the 1st memory block available in Linux >> starts from 2MB [as confirmed by the 'memblock_start_of_DRAM()' value >> of 0x200000] and indicated by '/proc/iomem': >> >> # head -1 /proc/iomem >> 00200000-0021ffff : reserved > > Who said it's supposed to be the "real representation of the start of System > RAM"? The kernel is fine with this being 0 in the case you describe. How > about we rename the variable to 'memstart_addr_sometimes_zero'? Does that > help? Other architectures (like ppc) have historically used 'memstart_addr' as the representation of the start of System RAM (and it probably inspired the usage of the same in arm64, but I am not sure..). $ grep -inr "memstart_addr" arch/powerpc/ <..snip..> arch/powerpc/include/asm/page.h:123:#define MEMORY_START memstart_addr <..snip..> If we want to have a special interpretation of 'memstart_addr' for arm64, I personally have no issues with it (other than it being, well *confusing*), so I would leave that to your and other arm64 maintainer's discretion. 
>> > Userspace has no business messing around with this
>> > stuff and I'm reluctant to make this an ABI by adding a symbol with a
>> > special name. Why can't the various constants needed by these tools be
>> > exported in the ELF headers for kcore/vmcore, or as a NOTE as James
>> > suggests? That sounds a lot less fragile to me.
>>
>> But we already add the 'memstart_addr' variable to kallsyms in the
>> kernel, don't we? And so user-space tools do use the same - so we
>> already have a precedent available.
>
> Whoa, whoa -- hold up! The implication here is that variables exposed via
> kallsyms are ABI. That's simply not true, otherwise we'd be breaking the
> ABI every kernel release.

I understand. Just to provide some detailed background: we currently have use cases in user-space (tools like crash-utility and makedumpfile) where we need to perform a virt_to_phys conversion to determine the physical address corresponding to a virtual address, and for that we need a computation similar to the one in the kernel's 'memory.h':

	phys_addr_t __x = (phys_addr_t)(x);				\
	__x & BIT(VA_BITS - 1) ? (__x & ~PAGE_OFFSET) + PHYS_OFFSET :	\
				 (__x - kimage_voffset); })

Also, we define PHYS_OFFSET as:

	# define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })

So, currently we calculate PHYS_OFFSET (i.e. the 'memstart_addr' value) in user-space by reading the '/proc/iomem' nodes (or by reading the 'memstart_addr' value, whose symbol is available via '/proc/kallsyms', from '/dev/mem') and use it to perform the virt_to_phys conversions.
An example of how we do the same virt_to_phys conversions in the crash-utility code (see [1]) is shared below for reference:

	ulong
	arm64_VTOP(ulong addr)
	{
		if (machdep->flags & NEW_VMEMMAP) {
			if (addr >= machdep->machspec->page_offset)
				return machdep->machspec->phys_offset
					+ (addr - machdep->machspec->page_offset);
			else if (machdep->machspec->kimage_voffset)
				return addr - machdep->machspec->kimage_voffset;
			else /* no randomness */
				return machdep->machspec->phys_offset
					+ (addr - machdep->machspec->vmalloc_start_addr);
		} else {
			return machdep->machspec->phys_offset
				+ (addr - machdep->machspec->page_offset);
		}
	}

Please share your views.

[1] https://github.com/crash-utility/crash/blob/master/arm64.c#L955

Thanks,
Bhupesh
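For comparison, the 'memory.h' computation quoted in this message can be written as a standalone user-space function. This is only a sketch under stated assumptions: VA_BITS is fixed at 48, and `struct vtop_params` and `arm64_virt_to_phys()` are hypothetical names, not crash-utility or kernel symbols. The caller has to obtain `phys_offset` and `kimage_voffset` by some other means, which is exactly the problem under discussion in this thread.

```c
#include <stdint.h>

#define VA_BITS      48	/* assumption: 48-bit VA configuration */
#define PAGE_OFFSET  (UINT64_C(0xffffffffffffffff) << (VA_BITS - 1))

/* Hypothetical container for the two values user-space has to discover. */
struct vtop_params {
	uint64_t phys_offset;	 /* kernel's PHYS_OFFSET (memstart_addr) */
	uint64_t kimage_voffset; /* kernel image virtual minus physical address */
};

static uint64_t arm64_virt_to_phys(uint64_t vaddr, const struct vtop_params *p)
{
	/* Bit (VA_BITS - 1) set: the address lies in the linear map. */
	if (vaddr & (UINT64_C(1) << (VA_BITS - 1)))
		return (vaddr & ~PAGE_OFFSET) + p->phys_offset;

	/* Otherwise treat it as a kernel image address. */
	return vaddr - p->kimage_voffset;
}
```

Both branches depend on values that only the running kernel knows, so a wrong guess for `phys_offset` shifts every linear-map translation by the same amount.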
> -----Original Message----- > From: kexec [mailto:kexec-bounces@lists.infradead.org] On Behalf Of James > Morse > Sent: 2018年6月15日 0:17 > To: Bhupesh Sharma <bhsharma@redhat.com> > Cc: Mark Rutland <mark.rutland@arm.com>; Ard Biesheuvel > <ard.biesheuvel@linaro.org>; Catalin Marinas <catalin.marinas@arm.com>; > Kexec Mailing List <kexec@lists.infradead.org>; Will Deacon > <will.deacon@arm.com>; AKASHI Takahiro <takahiro.akashi@linaro.org>; > Bhupesh SHARMA <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- > kernel@lists.infradead.org> > Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of > linear region > > Hi Bhupesh, > > On 14/06/18 08:53, Bhupesh Sharma wrote: > > On Wed, Jun 13, 2018 at 3:59 PM, James Morse <james.morse@arm.com> > wrote: > >> On 13/06/18 06:16, Bhupesh Sharma wrote: > >>> On Tue, Jun 12, 2018 at 3:42 PM, James Morse <james.morse@arm.com> > wrote: > >>>> If I've followed this properly: the problem is that to generate the > >>>> ELF headers in the post-kdump vmcore, at kdump-load-time > >>>> kexec-tools has to guess the virtual addresses of the 'System RAM' regions > it can see in /proc/iomem. > >>>> > >>>> The problem you are hitting is an invisible hole at the beginning > >>>> of RAM, meaning user-space's guess_phys_to_virt() is off by the size of this > hole. > >>>> > >>>> Isn't KASLR a special case for this? You must have to correct for > >>>> that after kdump has happened, based on an elf-note in the vmcore. Can't > we always do this? > >>> > >>> No, I hit this issue both for the KASLR and non-KASLR boot cases. > >> > >> Because in both cases there is a hole at the beginning of the > >> linear-map. KASLR is a special-case of this as the kernel adds a > >> variable sized hole to do the randomization. > >> > >> Surely treating this as one case makes your user-space code simpler. > > > > Ok. > > > >>> Fixing this in kernel space seems better to me as the definition of > >> > >> Is there a kernel bug? 
Changing the definitions of internal kernel > >> variables for the benefit of code digging in /proc/kcore|/dev/mem isn't going > to fly. > > > > Indeed, I am not advocating to change the kernel space code just to > > suit the user-space tools. However in this particular case the > > 'memstart_addr' and PHY_OFFSET value are computed as 0 which IMO > > (What is PHY_OFFSET? I assume you mean PHYS_OFFSET, which is the same as > memstart_addr ... why do you quote them together?) > > > > is > > not the real representation of the start of System RAM as the 1st > > memory block available in Linux starts from 2MB [as confirmed by the > > 'memblock_start_of_DRAM()' value of 0x200000] and indicated by > > '/proc/iomem': > > > > # head -1 /proc/iomem > > 00200000-0021ffff : reserved > > You have assumptions about what memstart_addr is based on its name. Names > of kernel variables get further from their actual use over time. > > The purpose of this variable isn't to store where a hypothetical-lowest-page of > memory would be in the linear map. The kernel doesn't have a handy variable for > this, because on-one needs to know. > > > > I think reading the kernel code and finding 'memstart_addr' and > > PHY_OFFSET as 0, one gets the notion > > notion -> assumption based on the name > > It's just a name. Anyone reading this should grep for how the value is used. > It's added/subtracted from addresses as part of phys_to_virt()/virt_to_phs(). It > must be some kind of offset. What does it mean on its own? Probably nothing. > > > > that the base of System RAM starts from 0, which is incorrect in the > > above case as it starts from 2MB as the 1st block is of the type > > EfiReservedMemType > > What will they assume if the value is negative? > > [...] > > > So, either we should have a uniform way of representing the virtual > > base of the linear range > > What needs to know this? RAM will be somewhere between PAGE_OFFSET and > the top of the address space. 
Anyone who wants to know where has a specific > page in mind, phys_to_virt() or page_address() tell them where their page is. > > > > or we should rather look at removing the PAGE_OFFSET usage from the > > kernel (or atleast the confusing comment from 'memory.h') > > This?: > | PAGE_OFFSET - the virtual address of the start of the linear map > > Nothing here says its the virtual address of any particular physical page. Its the > start of the region of VA space that holds the 1:1 mapping of RAM. Its value is > generated at compile time, we have no idea where RAM will be until we boot, > how could this be the address of any particular page? > > > > BTW adding 'p2v_offset' as an elf-note seems like a good idea. If this > > seems suitable, I can try and spin patch(es) using this approach (both > > for the kernel and user-space tools). > > You seem to be using this for user-space phys_to_virt() based on values found in > /proc/iomem. This should give you what you want, and isolate your user-space > from the kernel's unexpected naming of variables. I don't know could I simplify this problem? Let's ignore what memstart_addr represents here, we just want to implement phys_to_virt() in an userspace applications(kexec-tools or others). ARM64 Kernel has a below definition: #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET) So userspace app must know PHYS_OFFSET(equal to memstart_addr now). Seems this is very simple, but memstart_addr has gone through several operations in arm64_memblock_init() depends on different Kernel configurations, so userspace app needs to know many additional definitions as following: memblock_start_of_DRAM(), (ifdef CONFIG_SPARSEMEM_VMEMMAP), ARM64_MEMSTART_SHIFT, SECTION_SIZE_BITS, PAGE_OFFSET, memblock_end_of_DRAM(), IS_ENABLED(CONFIG_RANDOMIZE_BASE), memstart_offset_seed. It is hard to know all above in kexec-tools now. 
Originally I planned to read memstart_addr's value from "/dev/mem", but it was pointed out that not all kernels enable "/dev/mem", so we had better find a more generic approach. So we would like some suggestions from the ARM kernel community. Can we export this variable on the kernel side through sysconf() or a similar method? Or can someone suggest an effective way to get memstart_addr's value?

Thanks,
Yanjiang

> > I'd suggest a 64bit offset that is added to a physical address to get where in the
> linear map this page would be, if its mapped.
>
>
> Thanks,
>
> James
>
> _______________________________________________
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec

This email is intended only for the named addressee. It may contain information that is confidential/private, legally privileged, or copyright-protected, and you should handle it accordingly. If you are not the intended recipient, you do not have legal rights to retain, copy, or distribute this email or its contents, and should promptly delete the email and all electronic copies in your system; do not retain copies in any media. If you have received this email in error, please notify the sender promptly. Thank you.
On Tue, Jun 19, 2018 at 03:02:15AM +0000, Jin, Yanjiang wrote: > > You seem to be using this for user-space phys_to_virt() based on values found in > > /proc/iomem. This should give you what you want, and isolate your user-space > > from the kernel's unexpected naming of variables. > > I don't know could I simplify this problem? > Let's ignore what memstart_addr represents here, we just want to implement > phys_to_virt() in an userspace applications(kexec-tools or others). > > ARM64 Kernel has a below definition: > > #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | PAGE_OFFSET) > > So userspace app must know PHYS_OFFSET(equal to memstart_addr now). Seems > this is very simple, but memstart_addr has gone through several operations > in arm64_memblock_init() depends on different Kernel configurations, so > userspace app needs to know many additional definitions as following: > > memblock_start_of_DRAM(), (ifdef CONFIG_SPARSEMEM_VMEMMAP), > ARM64_MEMSTART_SHIFT, SECTION_SIZE_BITS, PAGE_OFFSET, > memblock_end_of_DRAM(), IS_ENABLED(CONFIG_RANDOMIZE_BASE), > memstart_offset_seed. > > It is hard to know all above in kexec-tools now. Originally I planned to > read memstart_addr's value from "/dev/mem", but someone thought not all > Kernels enable "/dev/mem", we'd better find a more generic approach. So we > want to get some suggestions from ARM kernel community. > Can we export this variable in Kernel side through sysconf() or other > similar methods? Or someone can provide an effect way to get > memstart_addr's value? I thought the suggestion from James was to expose this via an ELF NOTE in kcore and vmcore (or in the header directly if that's possible, but I'm not sure about it)? Will
> -----Original Message----- > From: Will Deacon [mailto:will.deacon@arm.com] > Sent: 2018年6月19日 16:56 > To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> > Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma > <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard > Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas > <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; > AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA > <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- > kernel@lists.infradead.org> > Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of > linear region > > On Tue, Jun 19, 2018 at 03:02:15AM +0000, Jin, Yanjiang wrote: > > > You seem to be using this for user-space phys_to_virt() based on > > > values found in /proc/iomem. This should give you what you want, and > > > isolate your user-space from the kernel's unexpected naming of variables. > > > > I don't know could I simplify this problem? > > Let's ignore what memstart_addr represents here, we just want to > > implement > > phys_to_virt() in an userspace applications(kexec-tools or others). > > > > ARM64 Kernel has a below definition: > > > > #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | > PAGE_OFFSET) > > > > So userspace app must know PHYS_OFFSET(equal to memstart_addr now). > > Seems this is very simple, but memstart_addr has gone through several > > operations in arm64_memblock_init() depends on different Kernel > > configurations, so userspace app needs to know many additional definitions as > following: > > > > memblock_start_of_DRAM(), (ifdef CONFIG_SPARSEMEM_VMEMMAP), > > ARM64_MEMSTART_SHIFT, SECTION_SIZE_BITS, PAGE_OFFSET, > > memblock_end_of_DRAM(), IS_ENABLED(CONFIG_RANDOMIZE_BASE), > > memstart_offset_seed. > > > > It is hard to know all above in kexec-tools now. 
Originally I planned > > to read memstart_addr's value from "/dev/mem", but someone thought not > > all Kernels enable "/dev/mem", we'd better find a more generic > > approach. So we want to get some suggestions from ARM kernel community. > > Can we export this variable in Kernel side through sysconf() or other > > similar methods? Or someone can provide an effect way to get > > memstart_addr's value? > > I thought the suggestion from James was to expose this via an ELF NOTE in kcore > and vmcore (or in the header directly if that's possible, but I'm not sure about it)? Hi Will, Thanks for your reply firstly. But same as DEVMEM, kcore is not a must-have, so we can't depend on it. On the other hand, phys_to_virt() is called during generating vmcore in Kexec-tools, vmcore also can't help this issue. Unfortunately, not all platforms support analyzing Kernel config in userspace application, so Kexec-tools can't know some key kernel options. If not so, we can simulate the whole arm64_memblock_init() progress in kexec-tools. Thanks, Yanjiang > > Will This email is intended only for the named addressee. It may contain information that is confidential/private, legally privileged, or copyright-protected, and you should handle it accordingly. If you are not the intended recipient, you do not have legal rights to retain, copy, or distribute this email or its contents, and should promptly delete the email and all electronic copies in your system; do not retain copies in any media. If you have received this email in error, please notify the sender promptly. Thank you.
On Tue, Jun 19, 2018 at 09:34:56AM +0000, Jin, Yanjiang wrote: > > On Tue, Jun 19, 2018 at 03:02:15AM +0000, Jin, Yanjiang wrote: > > > > You seem to be using this for user-space phys_to_virt() based on > > > > values found in /proc/iomem. This should give you what you want, and > > > > isolate your user-space from the kernel's unexpected naming of variables. > > > > > > I don't know could I simplify this problem? > > > Let's ignore what memstart_addr represents here, we just want to > > > implement > > > phys_to_virt() in an userspace applications(kexec-tools or others). > > > > > > ARM64 Kernel has a below definition: > > > > > > #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | > > PAGE_OFFSET) > > > > > > So userspace app must know PHYS_OFFSET(equal to memstart_addr now). > > > Seems this is very simple, but memstart_addr has gone through several > > > operations in arm64_memblock_init() depends on different Kernel > > > configurations, so userspace app needs to know many additional definitions as > > following: > > > > > > memblock_start_of_DRAM(), (ifdef CONFIG_SPARSEMEM_VMEMMAP), > > > ARM64_MEMSTART_SHIFT, SECTION_SIZE_BITS, PAGE_OFFSET, > > > memblock_end_of_DRAM(), IS_ENABLED(CONFIG_RANDOMIZE_BASE), > > > memstart_offset_seed. > > > > > > It is hard to know all above in kexec-tools now. Originally I planned > > > to read memstart_addr's value from "/dev/mem", but someone thought not > > > all Kernels enable "/dev/mem", we'd better find a more generic > > > approach. So we want to get some suggestions from ARM kernel community. > > > Can we export this variable in Kernel side through sysconf() or other > > > similar methods? Or someone can provide an effect way to get > > > memstart_addr's value? > > > > I thought the suggestion from James was to expose this via an ELF NOTE in kcore > > and vmcore (or in the header directly if that's possible, but I'm not sure about it)? > > Thanks for your reply firstly. 
But same as DEVMEM, kcore is not a > must-have, so we can't depend on it. Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. > On the other hand, phys_to_virt() is called during generating vmcore in > Kexec-tools, vmcore also can't help this issue. I don't understand this part. If you have the vmcore in your hand, why can't you grok the pv offset from the note and use that in phys_to_virt()? > Unfortunately, not all platforms support analyzing Kernel config in > userspace application, so Kexec-tools can't know some key kernel options. > If not so, we can simulate the whole arm64_memblock_init() progress in > kexec-tools. I don't understand what the kernel config has to do with kexec tools. Will
> -----Original Message----- > From: Will Deacon [mailto:will.deacon@arm.com] > Sent: 2018年6月19日 17:41 > To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> > Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma > <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard > Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas > <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; > AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA > <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- > kernel@lists.infradead.org> > Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of > linear region > > On Tue, Jun 19, 2018 at 09:34:56AM +0000, Jin, Yanjiang wrote: > > > On Tue, Jun 19, 2018 at 03:02:15AM +0000, Jin, Yanjiang wrote: > > > > > You seem to be using this for user-space phys_to_virt() based on > > > > > values found in /proc/iomem. This should give you what you want, > > > > > and isolate your user-space from the kernel's unexpected naming of > variables. > > > > > > > > I don't know could I simplify this problem? > > > > Let's ignore what memstart_addr represents here, we just want to > > > > implement > > > > phys_to_virt() in an userspace applications(kexec-tools or others). > > > > > > > > ARM64 Kernel has a below definition: > > > > > > > > #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | > > > PAGE_OFFSET) > > > > > > > > So userspace app must know PHYS_OFFSET(equal to memstart_addr now). > > > > Seems this is very simple, but memstart_addr has gone through > > > > several operations in arm64_memblock_init() depends on different > > > > Kernel configurations, so userspace app needs to know many > > > > additional definitions as > > > following: > > > > > > > > memblock_start_of_DRAM(), (ifdef CONFIG_SPARSEMEM_VMEMMAP), > > > > ARM64_MEMSTART_SHIFT, SECTION_SIZE_BITS, PAGE_OFFSET, > > > > memblock_end_of_DRAM(), IS_ENABLED(CONFIG_RANDOMIZE_BASE), > > > > memstart_offset_seed. 
> > > > > > > > It is hard to know all above in kexec-tools now. Originally I > > > > planned to read memstart_addr's value from "/dev/mem", but someone > > > > thought not all Kernels enable "/dev/mem", we'd better find a more > > > > generic approach. So we want to get some suggestions from ARM kernel > community. > > > > Can we export this variable in Kernel side through sysconf() or > > > > other similar methods? Or someone can provide an effect way to get > > > > memstart_addr's value? > > > > > > I thought the suggestion from James was to expose this via an ELF > > > NOTE in kcore and vmcore (or in the header directly if that's possible, but I'm > not sure about it)? > > > > Thanks for your reply firstly. But same as DEVMEM, kcore is not a > > must-have, so we can't depend on it. > > Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. > > > On the other hand, phys_to_virt() is called during generating vmcore > > in Kexec-tools, vmcore also can't help this issue. > > I don't understand this part. If you have the vmcore in your hand, why can't you > grok the pv offset from the note and use that in phys_to_virt()? It is a chicken-and-egg issue. phys_to virt() is for crashdump setup. To generate vmcore, we must call phys_to_virt(). At this point, no vmcore exists. Yanjiang > > > Unfortunately, not all platforms support analyzing Kernel config in > > userspace application, so Kexec-tools can't know some key kernel options. > > If not so, we can simulate the whole arm64_memblock_init() progress > > in kexec-tools. > > I don't understand what the kernel config has to do with kexec tools. I mean that if we can know kernel .config in all circumstances, we can calculate memstart_addr as below in Kexec-tools: memstart_addr = round_down(memblock_start_of_DRAM(), ARM64_MEMSTART_ALIGN); #if defined(CONFIG_SPARSEMEM_VMEMMAP) && ARM64_MEMSTART_SHIFT < SECTION_SIZE_BITS #define ARM64_MEMSTART_ALIGN (1UL << SECTION_SIZE_BITS) ...... 
#endif #define ARM64_MEMSTART_SHIFT PMD_SHIFT #if CONFIG_PGTABLE_LEVELS > 2 #define PMD_SHIFT ARM64_HW_PGTABLE_LEVEL_SHIFT(2) ........... #endif Yanjiang > > Will
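To illustrate why memstart_addr can come out as 0 even though the first memblock starts at 2MB, here is a small sketch of the round_down() step quoted above. It assumes CONFIG_SPARSEMEM_VMEMMAP with SECTION_SIZE_BITS = 30, so that ARM64_MEMSTART_ALIGN is 1GB; the real value depends on the kernel configuration. The function names are hypothetical, and the sketch deliberately ignores the later adjustments made in arm64_memblock_init(), such as the randomization driven by memstart_offset_seed.

```c
#include <assert.h>
#include <stdint.h>

#define SECTION_SIZE_BITS    30	/* assumption: 1GB sparsemem sections */
#define ARM64_MEMSTART_ALIGN (UINT64_C(1) << SECTION_SIZE_BITS)

/* Round value down to a power-of-two alignment. */
static uint64_t round_down_pow2(uint64_t value, uint64_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return value & ~(align - 1);
}

/* First step of the memstart_addr computation only (no KASLR adjustment). */
static uint64_t guess_memstart_addr(uint64_t dram_start)
{
	return round_down_pow2(dram_start, ARM64_MEMSTART_ALIGN);
}
```

With a DRAM start of 0x200000, as reported by memblock_start_of_DRAM() earlier in the thread, the rounded value is 0, which matches the PHYS_OFFSET observed on that system.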
Hi Yanjiang, Will, On 19/06/18 10:57, Jin, Yanjiang wrote: >> -----Original Message----- >> From: Will Deacon [mailto:will.deacon@arm.com] >> Sent: 2018年6月19日 17:41 >> To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> >> Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma >> <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard >> Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas >> <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; >> AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA >> <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- >> kernel@lists.infradead.org> >> Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of >> linear region >> >> On Tue, Jun 19, 2018 at 09:34:56AM +0000, Jin, Yanjiang wrote: >>>> On Tue, Jun 19, 2018 at 03:02:15AM +0000, Jin, Yanjiang wrote: >>>>>> You seem to be using this for user-space phys_to_virt() based on >>>>>> values found in /proc/iomem. This should give you what you want, >>>>>> and isolate your user-space from the kernel's unexpected naming of >> variables. >>>>> >>>>> I don't know could I simplify this problem? >>>>> Let's ignore what memstart_addr represents here, we just want to >>>>> implement >>>>> phys_to_virt() in an userspace applications(kexec-tools or others). >>>>> >>>>> ARM64 Kernel has a below definition: >>>>> >>>>> #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | >>>> PAGE_OFFSET) >>>>> >>>>> So userspace app must know PHYS_OFFSET(equal to memstart_addr now). 
>>>>> Seems this is very simple, but memstart_addr has gone through >>>>> several operations in arm64_memblock_init() depends on different >>>>> Kernel configurations, so userspace app needs to know many >>>>> additional definitions as >>>> following: >>>>> >>>>> memblock_start_of_DRAM(), (ifdef CONFIG_SPARSEMEM_VMEMMAP), >>>>> ARM64_MEMSTART_SHIFT, SECTION_SIZE_BITS, PAGE_OFFSET, >>>>> memblock_end_of_DRAM(), IS_ENABLED(CONFIG_RANDOMIZE_BASE), >>>>> memstart_offset_seed. >>>>> >>>>> It is hard to know all above in kexec-tools now. Originally I >>>>> planned to read memstart_addr's value from "/dev/mem", but someone >>>>> thought not all Kernels enable "/dev/mem", we'd better find a more >>>>> generic approach. So we want to get some suggestions from ARM kernel >> community. >>>>> Can we export this variable in Kernel side through sysconf() or >>>>> other similar methods? Or someone can provide an effect way to get >>>>> memstart_addr's value? >>>> >>>> I thought the suggestion from James was to expose this via an ELF >>>> NOTE in kcore and vmcore (or in the header directly if that's possible, but I'm >> not sure about it)? >>> >>> Thanks for your reply firstly. But same as DEVMEM, kcore is not a >>> must-have, so we can't depend on it. >> >> Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. >> >>> On the other hand, phys_to_virt() is called during generating vmcore >>> in Kexec-tools, vmcore also can't help this issue. >> >> I don't understand this part. If you have the vmcore in your hand, why can't you >> grok the pv offset from the note and use that in phys_to_virt()? > > It is a chicken-and-egg issue. > phys_to virt() is for crashdump setup. To generate vmcore, we must call > phys_to_virt(). At this point, no vmcore exists. Its needed for the parts of the ELF header that kexec-tools generates at kdump load time? So adding this pv_offset to the key=value data crash_save_vmcoreinfo_init() saves isn't available early enough? 
If we select PROC_KCORE for KEXEC, you know you will have /proc/kcore if the system supports kdump. We should probably provide the same information in the PT_NOTE section of the /proc/kcore file.

(I thought the kdump kernel exported that crash_save_vmcoreinfo_init() data as an elf-note itself, but digging deeper I see the kernel exposes the physical address in /sys/kernel/vmcoreinfo. Presumably it's passed back via the kdump elfcorehdr.)

>>> Unfortunately, not all platforms support analyzing Kernel config in
>>> userspace application, so Kexec-tools can't know some key kernel options.
>>> If not so, we can simulate the whole arm64_memblock_init() progress
>>> in kexec-tools.
>>
>> I don't understand what the kernel config has to do with kexec tools.
>
> I mean that if we can know kernel .config in all circumstances, we can calculate memstart_addr as below in Kexec-tools:
>
>
> memstart_addr = round_down(memblock_start_of_DRAM(),
> ARM64_MEMSTART_ALIGN);

This wouldn't work for KASLR.

Having the kernel provide you with the offset means you are insulated from the details of phys_to_virt() and what affects these values. It should be possible to do this in the same way for all architectures.

Thanks,

James
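If the offset were added to the key=value data that crash_save_vmcoreinfo_init() saves, a tool could pick it out with a few lines of parsing. The sketch below assumes the vmcoreinfo text has already been read into a NUL-terminated buffer with one KEY=value pair per line. The key name `NUMBER(p2v_offset)` in the usage note and the function name are hypothetical; no such key is claimed to exist.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up "key=<number>" in newline-separated vmcoreinfo text.
 * Returns 0 and stores the value on success, -1 if the key is
 * missing or its value is not a valid number.
 */
static int vmcoreinfo_get_u64(const char *info, const char *key, uint64_t *out)
{
	size_t klen = strlen(key);
	const char *line = info;

	while (line && *line) {
		if (strncmp(line, key, klen) == 0 && line[klen] == '=') {
			const char *val = line + klen + 1;
			char *end;
			unsigned long long v;

			errno = 0;
			v = strtoull(val, &end, 0); /* base 0 accepts 0x... */
			if (errno != 0 || end == val)
				return -1;
			*out = (uint64_t)v;
			return 0;
		}
		/* Advance to the start of the next line, if any. */
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return -1;
}
```

A caller would pass the buffer and a key such as the hypothetical `NUMBER(p2v_offset)`, then add the returned value to a physical address to get its linear-map address, with no knowledge of how the kernel computed it.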
Hi James, On Tue, Jun 19, 2018 at 3:46 PM, James Morse <james.morse@arm.com> wrote: > Hi Yanjiang, Will, > > On 19/06/18 10:57, Jin, Yanjiang wrote: >>> -----Original Message----- >>> From: Will Deacon [mailto:will.deacon@arm.com] >>> Sent: 2018年6月19日 17:41 >>> To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> >>> Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma >>> <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard >>> Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas >>> <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; >>> AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA >>> <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- >>> kernel@lists.infradead.org> >>> Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of >>> linear region >>> >>> On Tue, Jun 19, 2018 at 09:34:56AM +0000, Jin, Yanjiang wrote: >>>>> On Tue, Jun 19, 2018 at 03:02:15AM +0000, Jin, Yanjiang wrote: >>>>>>> You seem to be using this for user-space phys_to_virt() based on >>>>>>> values found in /proc/iomem. This should give you what you want, >>>>>>> and isolate your user-space from the kernel's unexpected naming of >>> variables. >>>>>> >>>>>> I don't know could I simplify this problem? >>>>>> Let's ignore what memstart_addr represents here, we just want to >>>>>> implement >>>>>> phys_to_virt() in an userspace applications(kexec-tools or others). >>>>>> >>>>>> ARM64 Kernel has a below definition: >>>>>> >>>>>> #define __phys_to_virt(x) ((unsigned long)((x) - PHYS_OFFSET) | >>>>> PAGE_OFFSET) >>>>>> >>>>>> So userspace app must know PHYS_OFFSET(equal to memstart_addr now). 
>>>>>> Seems this is very simple, but memstart_addr has gone through >>>>>> several operations in arm64_memblock_init() depends on different >>>>>> Kernel configurations, so userspace app needs to know many >>>>>> additional definitions as >>>>> following: >>>>>> >>>>>> memblock_start_of_DRAM(), (ifdef CONFIG_SPARSEMEM_VMEMMAP), >>>>>> ARM64_MEMSTART_SHIFT, SECTION_SIZE_BITS, PAGE_OFFSET, >>>>>> memblock_end_of_DRAM(), IS_ENABLED(CONFIG_RANDOMIZE_BASE), >>>>>> memstart_offset_seed. >>>>>> >>>>>> It is hard to know all above in kexec-tools now. Originally I >>>>>> planned to read memstart_addr's value from "/dev/mem", but someone >>>>>> thought not all Kernels enable "/dev/mem", we'd better find a more >>>>>> generic approach. So we want to get some suggestions from ARM kernel >>> community. >>>>>> Can we export this variable in Kernel side through sysconf() or >>>>>> other similar methods? Or someone can provide an effect way to get >>>>>> memstart_addr's value? >>>>> >>>>> I thought the suggestion from James was to expose this via an ELF >>>>> NOTE in kcore and vmcore (or in the header directly if that's possible, but I'm >>> not sure about it)? >>>> >>>> Thanks for your reply firstly. But same as DEVMEM, kcore is not a >>>> must-have, so we can't depend on it. >>> >>> Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. >>> >>>> On the other hand, phys_to_virt() is called during generating vmcore >>>> in Kexec-tools, vmcore also can't help this issue. >>> >>> I don't understand this part. If you have the vmcore in your hand, why can't you >>> grok the pv offset from the note and use that in phys_to_virt()? >> >> It is a chicken-and-egg issue. >> phys_to virt() is for crashdump setup. To generate vmcore, we must call >> phys_to_virt(). At this point, no vmcore exists. > > Its needed for the parts of the ELF header that kexec-tools generates at kdump > load time? 
> > So adding this pv_offset to the key=value data crash_save_vmcoreinfo_init() > saves isn't available early enough? Yes, one case where it is not actually available early enough for makedumpfile usage is if we are determining the PT_NOTE contents from the '/proc/kcore' on a 'live' system See <https://github.com/bhupesh-sharma/makedumpfile/blob/devel/elf_info.c#L375> for example: int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len) { <snip..> kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET; } Now the problem at hand is to determine the offset at which the pv_offset (key=value data pair) lies in the '/proc/kcore' (I assume that when you mentioned above and earlier about adding this pair to the elfnotes you meant both the vmcoreinfo and 'proc/kcore'), as we can have 'n' number of PT_LOAD segments. So, we have a chicken and egg situation in such case(s). Do you have any pointers on how we can fix such use-cases. Thanks, Bhupesh > If we select PROC_KCORE for KEXEC so you know you will have /proc/kcore if the > system supports kdump. We should probably provide the same information in the > PT_NOTE section of the /proc/kcore file. > > > (I thought the kdump kernel exported that crash_save_vmcoreinfo_init() data as > an elf-note itself, but digging deeper I see the kernel exposes the physical > address in /sys/kernel/vmcoreinfo. Presumably its passed back via the kdump > elfcorehdr.) > > >>>> Unfortunately, not all platforms support analyzing Kernel config in >>>> userspace application, so Kexec-tools can't know some key kernel options. >>>> If not so, we can simulate the whole arm64_memblock_init() progress >>>> in kexec-tools. >>> >>> I don't understand what the kernel config has to do with kexec tools. >> >> I mean that if we can know kernel .config in all circumstances, we can calculate memstart_addr as below in Kexec-tools: >> >> >> memstart_addr = round_down(memblock_start_of_DRAM(), >> ARM64_MEMSTART_ALIGN); > > This wouldn't work for KASLR. 
Having the kernel provide you with the offset > means you are insulated from the details of phys_to_virt() and what affects > these values. It should be possible to do this in the same way for all > architectures. > > > Thanks, > > James
Hi Bhupesh, On 19/06/18 11:37, Bhupesh Sharma wrote: > On Tue, Jun 19, 2018 at 3:46 PM, James Morse <james.morse@arm.com> wrote: >> On 19/06/18 10:57, Jin, Yanjiang wrote: >>>> -----Original Message----- >>>> From: Will Deacon [mailto:will.deacon@arm.com] >>>> Sent: 2018年6月19日 17:41 >>>> To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> >>>> Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma >>>> <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard >>>> Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas >>>> <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; >>>> AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA >>>> <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- >>>> kernel@lists.infradead.org> >>>> Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of >>>> linear region >>>>>>> It is hard to know all above in kexec-tools now. Originally I >>>>>>> planned to read memstart_addr's value from "/dev/mem", but someone >>>>>>> thought not all Kernels enable "/dev/mem", we'd better find a more >>>>>>> generic approach. So we want to get some suggestions from ARM kernel >>>> community. >>>>>>> Can we export this variable in Kernel side through sysconf() or >>>>>>> other similar methods? Or someone can provide an effect way to get >>>>>>> memstart_addr's value? >>>>>> >>>>>> I thought the suggestion from James was to expose this via an ELF >>>>>> NOTE in kcore and vmcore (or in the header directly if that's possible, but I'm >>>> not sure about it)? >>>>> >>>>> Thanks for your reply firstly. But same as DEVMEM, kcore is not a >>>>> must-have, so we can't depend on it. >>>> >>>> Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. >>>> >>>>> On the other hand, phys_to_virt() is called during generating vmcore >>>>> in Kexec-tools, vmcore also can't help this issue. >>>> >>>> I don't understand this part. 
If you have the vmcore in your hand, why can't you >>>> grok the pv offset from the note and use that in phys_to_virt()? >>> >>> It is a chicken-and-egg issue. >>> phys_to virt() is for crashdump setup. To generate vmcore, we must call >>> phys_to_virt(). At this point, no vmcore exists. >> >> Its needed for the parts of the ELF header that kexec-tools generates at kdump >> load time? >> >> So adding this pv_offset to the key=value data crash_save_vmcoreinfo_init() >> saves isn't available early enough? > Yes, one case where it is not actually available early enough for > makedumpfile usage is if we are determining the PT_NOTE contents from > the '/proc/kcore' on a 'live' system > int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len) > > { > > <snip..> > kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET; > > } You are trying to read the vmcoreinfo through /proc/kcore given knowledge of its physical address. I'm suggesting adding the contents of vmcoreinfo as a PT_NOTE section of /proc/kcore's ELF header. No special knowledge necessary, any elf-parser should be able to dump the values. > Now the problem at hand is to determine the offset at which the > pv_offset (key=value data pair) lies in the '/proc/kcore' (I assume > that when you mentioned above and earlier about adding this pair to > the elfnotes you meant both the vmcoreinfo and 'proc/kcore'), as we > can have 'n' number of PT_LOAD segments. It looks like there is already a NOTE section with core info in there: | # readelf -l /proc/kcore | | Elf file type is CORE (Core file) | Entry point 0x0 | There are 16 program headers, starting at offset 64 | | Program Headers: | Type Offset VirtAddr PhysAddr | FileSiz MemSiz Flags Align | NOTE 0x00000000000003c0 0x0000000000000000 0x0000000000000000 | 0x0000000000001114 0x0000000000000000 0x0 I assume we can add more notes without breaking the existing user... (and it looks like there are some broken __pa(kernel symbol) users in there. Thanks, James
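To illustrate the "any elf-parser should be able to dump the values" point: a minimal sketch of walking the contents of a PT_NOTE segment that has already been read into a buffer. It assumes the usual ELF note layout (three 32-bit words, then name and descriptor each padded to 4 bytes) and host byte order. The helper name `find_note` is hypothetical, and whether /proc/kcore carries a note named "VMCOREINFO" depends on the kernel change being proposed here.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* ELF note header: namesz, descsz, type (same layout as Elf64_Nhdr). */
struct note_hdr {
    uint32_t namesz;
    uint32_t descsz;
    uint32_t type;
};

/* Round up to the 4-byte padding used between note fields. */
static size_t align4(size_t n)
{
    return (n + 3) & ~(size_t)3;
}

/*
 * Scan 'len' bytes of PT_NOTE data for a note called 'name'.
 * Returns a pointer to its descriptor (not NUL-terminated) and stores
 * the descriptor length in *desc_len, or returns NULL if not found.
 */
static const char *find_note(const unsigned char *buf, size_t len,
                             const char *name, size_t *desc_len)
{
    size_t want = strlen(name) + 1;   /* namesz counts the trailing NUL */
    size_t off = 0;

    while (off + sizeof(struct note_hdr) <= len) {
        struct note_hdr h;
        size_t name_off, desc_off;

        memcpy(&h, buf + off, sizeof(h));
        name_off = off + sizeof(h);
        if (h.namesz > len - name_off)
            break;                    /* truncated name */
        desc_off = name_off + align4(h.namesz);
        if (desc_off > len || h.descsz > len - desc_off)
            break;                    /* truncated descriptor */

        if (h.namesz == want && memcmp(buf + name_off, name, want) == 0) {
            *desc_len = h.descsz;
            return (const char *)(buf + desc_off);
        }
        off = desc_off + align4(h.descsz);
    }
    return NULL;
}
```

The caller only needs the Offset and FileSiz of the NOTE program header shown by readelf to know which bytes to read; no physical address or PAGE_OFFSET is involved.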
Hi James, On Tue, Jun 19, 2018 at 4:56 PM, James Morse <james.morse@arm.com> wrote: > Hi Bhupesh, > > On 19/06/18 11:37, Bhupesh Sharma wrote: >> On Tue, Jun 19, 2018 at 3:46 PM, James Morse <james.morse@arm.com> wrote: >>> On 19/06/18 10:57, Jin, Yanjiang wrote: >>>>> -----Original Message----- >>>>> From: Will Deacon [mailto:will.deacon@arm.com] >>>>> Sent: 2018年6月19日 17:41 >>>>> To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> >>>>> Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma >>>>> <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard >>>>> Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas >>>>> <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; >>>>> AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA >>>>> <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- >>>>> kernel@lists.infradead.org> >>>>> Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of >>>>> linear region > >>>>>>>> It is hard to know all above in kexec-tools now. Originally I >>>>>>>> planned to read memstart_addr's value from "/dev/mem", but someone >>>>>>>> thought not all Kernels enable "/dev/mem", we'd better find a more >>>>>>>> generic approach. So we want to get some suggestions from ARM kernel >>>>> community. >>>>>>>> Can we export this variable in Kernel side through sysconf() or >>>>>>>> other similar methods? Or someone can provide an effect way to get >>>>>>>> memstart_addr's value? >>>>>>> >>>>>>> I thought the suggestion from James was to expose this via an ELF >>>>>>> NOTE in kcore and vmcore (or in the header directly if that's possible, but I'm >>>>> not sure about it)? >>>>>> >>>>>> Thanks for your reply firstly. But same as DEVMEM, kcore is not a >>>>>> must-have, so we can't depend on it. >>>>> >>>>> Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. 
>>>>> >>>>>> On the other hand, phys_to_virt() is called during generating vmcore >>>>>> in Kexec-tools, vmcore also can't help this issue. >>>>> >>>>> I don't understand this part. If you have the vmcore in your hand, why can't you >>>>> grok the pv offset from the note and use that in phys_to_virt()? >>>> >>>> It is a chicken-and-egg issue. >>>> phys_to virt() is for crashdump setup. To generate vmcore, we must call >>>> phys_to_virt(). At this point, no vmcore exists. >>> >>> Its needed for the parts of the ELF header that kexec-tools generates at kdump >>> load time? >>> >>> So adding this pv_offset to the key=value data crash_save_vmcoreinfo_init() >>> saves isn't available early enough? > >> Yes, one case where it is not actually available early enough for >> makedumpfile usage is if we are determining the PT_NOTE contents from >> the '/proc/kcore' on a 'live' system > >> int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t vmcoreinfo_len) >> >> { >> >> <snip..> >> kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET; >> >> } > > You are trying to read the vmcoreinfo through /proc/kcore given knowledge of its > physical address. > > I'm suggesting adding the contents of vmcoreinfo as a PT_NOTE section of > /proc/kcore's ELF header. No special knowledge necessary, any elf-parser should > be able to dump the values. > > >> Now the problem at hand is to determine the offset at which the >> pv_offset (key=value data pair) lies in the '/proc/kcore' (I assume >> that when you mentioned above and earlier about adding this pair to >> the elfnotes you meant both the vmcoreinfo and 'proc/kcore'), as we >> can have 'n' number of PT_LOAD segments. 
> > It looks like there is already a NOTE section with core info in there: > | # readelf -l /proc/kcore > | > | Elf file type is CORE (Core file) > | Entry point 0x0 > | There are 16 program headers, starting at offset 64 > | > | Program Headers: > | Type Offset VirtAddr PhysAddr > | FileSiz MemSiz Flags Align > | NOTE 0x00000000000003c0 0x0000000000000000 0x0000000000000000 > | 0x0000000000001114 0x0000000000000000 0x0 > > I assume we can add more notes without breaking the existing user... > > (and it looks like there are some broken __pa(kernel symbol) users in there. Thanks for your inputs. I am working on fixes on the above lines for kernel and user-space tools (like makedumpfile, crash-utility and kexec-tools). I will post some RFC patches on the same lines (or come back in case I get stuck somewhere) shortly. Thanks, Bhupesh
Hi James, Bhupesh, If /proc/kcore always exists in kexec/kdump, I think this issue can be fixed easily. But it requires that Kexec/kdump have to rely on " CONFIG_PROC_KCORE=y". I am not sure if we can persuade Kexec-tools community to accept this. Thanks, Yanjiang > -----Original Message----- > From: Bhupesh Sharma [mailto:bhsharma@redhat.com] > Sent: 2018年6月19日 19:58 > To: James Morse <james.morse@arm.com> > Cc: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com>; Will Deacon > <will.deacon@arm.com>; Mark Rutland <mark.rutland@arm.com>; Ard > Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas > <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; > AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA > <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- > kernel@lists.infradead.org> > Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of > linear region > > Hi James, > > On Tue, Jun 19, 2018 at 4:56 PM, James Morse <james.morse@arm.com> wrote: > > Hi Bhupesh, > > > > On 19/06/18 11:37, Bhupesh Sharma wrote: > >> On Tue, Jun 19, 2018 at 3:46 PM, James Morse <james.morse@arm.com> > wrote: > >>> On 19/06/18 10:57, Jin, Yanjiang wrote: > >>>>> -----Original Message----- > >>>>> From: Will Deacon [mailto:will.deacon@arm.com] > >>>>> Sent: 2018年6月19日 17:41 > >>>>> To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> > >>>>> Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma > >>>>> <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard > >>>>> Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas > >>>>> <catalin.marinas@arm.com>; Kexec Mailing List > >>>>> <kexec@lists.infradead.org>; AKASHI Takahiro > >>>>> <takahiro.akashi@linaro.org>; Bhupesh SHARMA > >>>>> <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- > >>>>> kernel@lists.infradead.org> > >>>>> Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base > >>>>> address of linear region > > > >>>>>>>> It is hard to know all above in 
kexec-tools now. Originally I > >>>>>>>> planned to read memstart_addr's value from "/dev/mem", but > >>>>>>>> someone thought not all Kernels enable "/dev/mem", we'd better > >>>>>>>> find a more generic approach. So we want to get some > >>>>>>>> suggestions from ARM kernel > >>>>> community. > >>>>>>>> Can we export this variable in Kernel side through sysconf() or > >>>>>>>> other similar methods? Or someone can provide an effect way to > >>>>>>>> get memstart_addr's value? > >>>>>>> > >>>>>>> I thought the suggestion from James was to expose this via an > >>>>>>> ELF NOTE in kcore and vmcore (or in the header directly if > >>>>>>> that's possible, but I'm > >>>>> not sure about it)? > >>>>>> > >>>>>> Thanks for your reply firstly. But same as DEVMEM, kcore is not a > >>>>>> must-have, so we can't depend on it. > >>>>> > >>>>> Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. > >>>>> > >>>>>> On the other hand, phys_to_virt() is called during generating > >>>>>> vmcore in Kexec-tools, vmcore also can't help this issue. > >>>>> > >>>>> I don't understand this part. If you have the vmcore in your hand, > >>>>> why can't you grok the pv offset from the note and use that in > phys_to_virt()? > >>>> > >>>> It is a chicken-and-egg issue. > >>>> phys_to virt() is for crashdump setup. To generate vmcore, we must > >>>> call phys_to_virt(). At this point, no vmcore exists. > >>> > >>> Its needed for the parts of the ELF header that kexec-tools > >>> generates at kdump load time? > >>> > >>> So adding this pv_offset to the key=value data > >>> crash_save_vmcoreinfo_init() saves isn't available early enough? 
> > > >> Yes, one case where it is not actually available early enough for > >> makedumpfile usage is if we are determining the PT_NOTE contents from > >> the '/proc/kcore' on a 'live' system > > > >> int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t > >> vmcoreinfo_len) > >> > >> { > >> > >> <snip..> > >> kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET; > >> > >> } > > > > You are trying to read the vmcoreinfo through /proc/kcore given > > knowledge of its physical address. > > > > I'm suggesting adding the contents of vmcoreinfo as a PT_NOTE section > > of /proc/kcore's ELF header. No special knowledge necessary, any > > elf-parser should be able to dump the values. > > > > > >> Now the problem at hand is to determine the offset at which the > >> pv_offset (key=value data pair) lies in the '/proc/kcore' (I assume > >> that when you mentioned above and earlier about adding this pair to > >> the elfnotes you meant both the vmcoreinfo and 'proc/kcore'), as we > >> can have 'n' number of PT_LOAD segments. > > > > It looks like there is already a NOTE section with core info in there: > > | # readelf -l /proc/kcore > > | > > | Elf file type is CORE (Core file) > > | Entry point 0x0 > > | There are 16 program headers, starting at offset 64 > > | > > | Program Headers: > > | Type Offset VirtAddr PhysAddr > > | FileSiz MemSiz Flags Align > > | NOTE 0x00000000000003c0 0x0000000000000000 0x0000000000000000 > > | 0x0000000000001114 0x0000000000000000 0x0 > > > > I assume we can add more notes without breaking the existing user... > > > > (and it looks like there are some broken __pa(kernel symbol) users in there. > > Thanks for your inputs. > > I am working on fixes on the above lines for kernel and user-space tools (like > makedumpfile, crash-utility and kexec-tools). > > I will post some RFC patches on the same lines (or come back in case I get stuck > somewhere) shortly. > > Thanks, > Bhupesh
Hi Yanjiang,

On Wed, Jun 20, 2018 at 7:46 AM, Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> wrote:
> Hi James, Bhupesh,
>
> If /proc/kcore always exists in kexec/kdump, I think this issue can be fixed easily. But it requires that Kexec/kdump have to rely on " CONFIG_PROC_KCORE=y".
> I am not sure if we can persuade Kexec-tools community to accept this.

Most distributions, like Ubuntu and Fedora, already enable CONFIG_PROC_KCORE by default to support user-space tools like crash-utility and makedumpfile, which can be used for 'live' debugging of a primary kernel (without the requirement of being in the secondary or crash kernel).

For such cases, '/proc/kcore' and 'vmlinux' are the only available sources for PT_NOTE/PT_LOAD segments and kernel symbols respectively.

Since we need to support all such existing user-space utilities (which work well with other archs like x86 and ppc64), we need a solution which works without modifying most of them. The rest (like kexec-tools) can easily be modified to follow the same approach.

I will share some patches soon along the same lines, for both kernel and user-space.

Thanks, Bhupesh >> -----Original Message----- >> From: Bhupesh Sharma [mailto:bhsharma@redhat.com] >> Sent: 2018年6月19日 19:58 >> To: James Morse <james.morse@arm.com> >> Cc: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com>; Will Deacon >> <will.deacon@arm.com>; Mark Rutland <mark.rutland@arm.com>; Ard >> Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas >> <catalin.marinas@arm.com>; Kexec Mailing List <kexec@lists.infradead.org>; >> AKASHI Takahiro <takahiro.akashi@linaro.org>; Bhupesh SHARMA >> <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- >> kernel@lists.infradead.org> >> Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base address of >> linear region >> >> Hi James, >> >> On Tue, Jun 19, 2018 at 4:56 PM, James Morse <james.morse@arm.com> wrote: >> > Hi Bhupesh, >> > >> > On 19/06/18 11:37, Bhupesh Sharma wrote: >> >> On Tue, Jun 19, 2018 at 3:46 PM, James Morse <james.morse@arm.com> >> wrote: >> >>> On 19/06/18 10:57, Jin, Yanjiang wrote: >> >>>>> -----Original Message----- >> >>>>> From: Will Deacon [mailto:will.deacon@arm.com] >> >>>>> Sent: 2018年6月19日 17:41 >> >>>>> To: Jin, Yanjiang <yanjiang.jin@hxt-semitech.com> >> >>>>> Cc: James Morse <james.morse@arm.com>; Bhupesh Sharma >> >>>>> <bhsharma@redhat.com>; Mark Rutland <mark.rutland@arm.com>; Ard >> >>>>> Biesheuvel <ard.biesheuvel@linaro.org>; Catalin Marinas >> >>>>> <catalin.marinas@arm.com>; Kexec Mailing List >> >>>>> <kexec@lists.infradead.org>; AKASHI Takahiro >> >>>>> <takahiro.akashi@linaro.org>; Bhupesh SHARMA >> >>>>> <bhupesh.linux@gmail.com>; linux-arm-kernel <linux-arm- >> >>>>> kernel@lists.infradead.org> >> >>>>> Subject: Re: [PATCH] arm64/mm: Introduce a variable to hold base >> >>>>> address of linear region >> > >> >>>>>>>> It is hard to know all above in kexec-tools now. 
Originally I >> >>>>>>>> planned to read memstart_addr's value from "/dev/mem", but >> >>>>>>>> someone thought not all Kernels enable "/dev/mem", we'd better >> >>>>>>>> find a more generic approach. So we want to get some >> >>>>>>>> suggestions from ARM kernel >> >>>>> community. >> >>>>>>>> Can we export this variable in Kernel side through sysconf() or >> >>>>>>>> other similar methods? Or someone can provide an effect way to >> >>>>>>>> get memstart_addr's value? >> >>>>>>> >> >>>>>>> I thought the suggestion from James was to expose this via an >> >>>>>>> ELF NOTE in kcore and vmcore (or in the header directly if >> >>>>>>> that's possible, but I'm >> >>>>> not sure about it)? >> >>>>>> >> >>>>>> Thanks for your reply firstly. But same as DEVMEM, kcore is not a >> >>>>>> must-have, so we can't depend on it. >> >>>>> >> >>>>> Neither is KEXEC. We can select PROC_KCORE from KEXEC if it helps. >> >>>>> >> >>>>>> On the other hand, phys_to_virt() is called during generating >> >>>>>> vmcore in Kexec-tools, vmcore also can't help this issue. >> >>>>> >> >>>>> I don't understand this part. If you have the vmcore in your hand, >> >>>>> why can't you grok the pv offset from the note and use that in >> phys_to_virt()? >> >>>> >> >>>> It is a chicken-and-egg issue. >> >>>> phys_to virt() is for crashdump setup. To generate vmcore, we must >> >>>> call phys_to_virt(). At this point, no vmcore exists. >> >>> >> >>> Its needed for the parts of the ELF header that kexec-tools >> >>> generates at kdump load time? >> >>> >> >>> So adding this pv_offset to the key=value data >> >>> crash_save_vmcoreinfo_init() saves isn't available early enough? 
>> > >> >> Yes, one case where it is not actually available early enough for >> >> makedumpfile usage is if we are determining the PT_NOTE contents from >> >> the '/proc/kcore' on a 'live' system >> > >> >> int set_kcore_vmcoreinfo(uint64_t vmcoreinfo_addr, uint64_t >> >> vmcoreinfo_len) >> >> >> >> { >> >> >> >> <snip..> >> >> kvaddr = (ulong)vmcoreinfo_addr + PAGE_OFFSET; >> >> >> >> } >> > >> > You are trying to read the vmcoreinfo through /proc/kcore given >> > knowledge of its physical address. >> > >> > I'm suggesting adding the contents of vmcoreinfo as a PT_NOTE section >> > of /proc/kcore's ELF header. No special knowledge necessary, any >> > elf-parser should be able to dump the values. >> > >> > >> >> Now the problem at hand is to determine the offset at which the >> >> pv_offset (key=value data pair) lies in the '/proc/kcore' (I assume >> >> that when you mentioned above and earlier about adding this pair to >> >> the elfnotes you meant both the vmcoreinfo and 'proc/kcore'), as we >> >> can have 'n' number of PT_LOAD segments. >> > >> > It looks like there is already a NOTE section with core info in there: >> > | # readelf -l /proc/kcore >> > | >> > | Elf file type is CORE (Core file) >> > | Entry point 0x0 >> > | There are 16 program headers, starting at offset 64 >> > | >> > | Program Headers: >> > | Type Offset VirtAddr PhysAddr >> > | FileSiz MemSiz Flags Align >> > | NOTE 0x00000000000003c0 0x0000000000000000 0x0000000000000000 >> > | 0x0000000000001114 0x0000000000000000 0x0 >> > >> > I assume we can add more notes without breaking the existing user... >> > >> > (and it looks like there are some broken __pa(kernel symbol) users in there. >> >> Thanks for your inputs. >> >> I am working on fixes on the above lines for kernel and user-space tools (like >> makedumpfile, crash-utility and kexec-tools). >> >> I will post some RFC patches on the same lines (or come back in case I get stuck >> somewhere) shortly. 
>>
>> Thanks,
>> Bhupesh
Hi Bhupesh,

(CC: +Omar)

On 20/06/18 08:26, Bhupesh Sharma wrote:
> On Wed, Jun 20, 2018 at 7:46 AM, Jin, Yanjiang
> <yanjiang.jin@hxt-semitech.com> wrote:
>>> From: Bhupesh Sharma [mailto:bhsharma@redhat.com]
>>> On Tue, Jun 19, 2018 at 4:56 PM, James Morse <james.morse@arm.com> wrote:
>>>> I'm suggesting adding the contents of vmcoreinfo as a PT_NOTE section
>>>> of /proc/kcore's ELF header. No special knowledge necessary, any
>>>> elf-parser should be able to dump the values.

[..]

>>> I am working on fixes on the above lines for kernel and user-space tools (like
>>> makedumpfile, crash-utility and kexec-tools).
>>>
>>> I will post some RFC patches on the same lines (or come back in case I get stuck
>>> somewhere) shortly.

I spotted this series from Omar:
https://lkml.org/lkml/2018/7/6/866

Hopefully it does what you need?

Thanks,

James
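Once the vmcoreinfo text can be read as a note, pulling a single value out of it is a line-by-line key lookup. The following is a minimal sketch under that assumption. The helper name `vmcoreinfo_get_u64` is hypothetical, and the caller is assumed to have copied the note descriptor into a NUL-terminated string first, since the descriptor itself is not guaranteed to be terminated.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Look up 'key' in vmcoreinfo-style "KEY=VALUE\n" text and parse the
 * value as an unsigned integer (base auto-detected, so "0x..." works).
 * 'text' must be NUL-terminated. Returns 1 and fills *out on success,
 * 0 if the key is absent or its value is not a number.
 */
static int vmcoreinfo_get_u64(const char *text, const char *key,
                              uint64_t *out)
{
    size_t klen = strlen(key);
    const char *p = text;

    while (p && *p) {
        if (strncmp(p, key, klen) == 0 && p[klen] == '=') {
            const char *val = p + klen + 1;
            char *end;
            unsigned long long v = strtoull(val, &end, 0);

            if (end == val)
                return 0;             /* no digits after '=' */
            *out = (uint64_t)v;
            return 1;
        }
        p = strchr(p, '\n');          /* advance to the next line */
        if (p)
            p++;
    }
    return 0;
}
```

A tool would call this with a key such as "NUMBER(PHYS_OFFSET)" instead of re-deriving the value from kernel configuration.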
Hi James,

On Wed, Jul 11, 2018 at 6:54 PM, James Morse <james.morse@arm.com> wrote:
> Hi Bhupesh,
>
> (CC: +Omar)
>
> On 20/06/18 08:26, Bhupesh Sharma wrote:
>> On Wed, Jun 20, 2018 at 7:46 AM, Jin, Yanjiang
>> <yanjiang.jin@hxt-semitech.com> wrote:
>>>> From: Bhupesh Sharma [mailto:bhsharma@redhat.com]
>>>> On Tue, Jun 19, 2018 at 4:56 PM, James Morse <james.morse@arm.com> wrote:
>>>>> I'm suggesting adding the contents of vmcoreinfo as a PT_NOTE section
>>>>> of /proc/kcore's ELF header. No special knowledge necessary, any
>>>>> elf-parser should be able to dump the values.
>
> [..]
>>>> I am working on fixes on the above lines for kernel and user-space tools (like
>>>> makedumpfile, crash-utility and kexec-tools).
>>>>
>>>> I will post some RFC patches on the same lines (or come back in case I get stuck
>>>> somewhere) shortly.
>
> I spotted this series from Omar:
> https://lkml.org/lkml/2018/7/6/866
>
> Hopefully it does what you need?

Thanks a lot for sharing this useful series.

BTW, I am sorry for taking so long to reply to this thread, but I was reading some legacy x86_64/ppc code and experimenting with approaches in both user-space and kernel-space, and I have some interesting updates.

Just to recap, we are seeing two separate issues on arm64 with the user-space utilities used for debugging live systems or crashed kernels:

- Availability of PHYS_OFFSET in user-space (both for KASLR and non-KASLR boot cases):

I see two approaches to fix this issue:

1. Fix inside Kernel:

a). See <https://www.spinics.net/lists/kexec/msg20847.html> for background details. Adding PHYS_OFFSET to '/proc/kcore' as a PT_NOTE (it is already added to vmcore as a NUMBER) would suffice.

b). Omar's series adds the vmcoreinfo to the kcore itself, so it would be sufficient for the above case as well, since PHYS_OFFSET is already added to the vmcoreinfo inside 'arch/arm64/kernel/machine_kexec.c':

    void arch_crash_save_vmcoreinfo(void)
    {
        <..snip..>
        vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
                              PHYS_OFFSET);
        <..snip..>
    }

c). This will help the cases where we are debugging a 'live' (or running) system.

2. Fix inside user-space:

a). As an example, see a flaky reference implementation in 'kexec-tools':
<https://github.com/bhupesh-sharma/kexec-tools/commit/e8f920158ce57399c770c7160711a72fc740f1d6>

- Note that calculating the 'ARM64_MEMSTART_ALIGN' value in user-space is quite tricky (as is evident from the above implementation, where I took an easy route for my specific PAGE_SIZE and VA_BITS combination).

b). For some user-space tools like crash and makedumpfile, the underlying macros like PMD_SHIFT etc. have been added as arch-specific code, so they can handle such an implementation better.

c). But this again means adding more arch-specific code to user-space, which is probably not advisable.

So we will be better served by a KERNEL fix for this case, and Omar's series should help. I will go ahead and give it a try for arm64.

- Availability of PAGE_OFFSET in user-space (both for KASLR and non-KASLR boot cases):

1. I had a look at the legacy x86_64 and ppc64 code in some of the user-space tools to see how they handle and calculate the PAGE_OFFSET.

a). As an example, let's consider the 'makedumpfile' tool, which determines the PAGE_OFFSET for x86_64 from the PT_LOAD segments inside '/proc/kcore':

    static int
    get_page_offset_x86_64(void)
    {
        <..snip..>
        if (get_num_pt_loads()) {
            for (i = 0;
                 get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
                 i++) {
                if (virt_start != NOT_KV_ADDR
                    && virt_start < __START_KERNEL_map
                    && phys_start != NOT_PADDR) {
                    info->page_offset = virt_start - phys_start;
                    return TRUE;
                }
            }
        }
        <..snip..>
    }

b). Note the values of the macros used in the above computation:

    #define __START_KERNEL_map (0xffffffff80000000)
    #define NOT_KV_ADDR        (0x0)
    #define NOT_PADDR          (ULONGLONG_MAX)

2. I have a working approach (completely user-space, no kernel changes needed) in place for makedumpfile:
<https://github.com/bhupesh-sharma/makedumpfile/commit/18c1c9d798c3efc89b07c731365a0a0a57764003>,
using a similar approach to the one listed for x86_64 above:

    int
    get_versiondep_info_arm64(void)
    {
        <..snip..>
        if (get_num_pt_loads()) {
            for (i = 0;
                 get_pt_load(i, &phys_start, NULL, &virt_start, NULL);
                 i++) {
                if (virt_start != NOT_KV_ADDR
                    && virt_start < __START_KERNEL_map
                    && phys_start != NOT_PADDR
                    && phys_start != 0x0000000010a80000) {
                    info->page_offset = virt_start - phys_start;
                    return TRUE;
                }
            }
        }
        <..snip..>
    }

a). Note the values of the macros used in the above computation:

    #define __START_KERNEL_map (0xffffffff80000000)
    #define NOT_KV_ADDR        (0x0)
    #define NOT_PADDR          (ULONGLONG_MAX)

I also need an additional check of 'phys_start != 0x0000000010a80000' for arm64 on my Qualcomm board. This works for both KASLR and non-KASLR cases and with all the combinations of PAGE_SIZE (4K and 64K) and VA_BITS (42 bits and 48 bits).

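The selection rule above can be exercised stand-alone with a few of the program headers from the '/proc/kcore' listing in this mail. This is only a sketch of the same loop: `struct pt_load`, `guess_page_offset` and the `skip_phys` parameter are hypothetical stand-ins for makedumpfile's get_pt_load() iteration and the hard-coded board-specific check, not part of makedumpfile itself.

```c
#include <stddef.h>
#include <stdint.h>

#define START_KERNEL_MAP 0xffffffff80000000ULL
#define NOT_KV_ADDR      0x0ULL
#define NOT_PADDR        UINT64_MAX   /* PhysAddr of all ones in the listing */

/* One program header, reduced to the two fields the rule looks at. */
struct pt_load {
    uint64_t phys_start;
    uint64_t virt_start;
};

/*
 * Return 1 and store virt_start - phys_start of the first segment that
 * passes the checks quoted above; return 0 if none does. 'skip_phys'
 * plays the role of the extra 'phys_start != 0x10a80000' test.
 */
static int guess_page_offset(const struct pt_load *segs, size_t n,
                             uint64_t skip_phys, uint64_t *page_offset)
{
    for (size_t i = 0; i < n; i++) {
        if (segs[i].virt_start != NOT_KV_ADDR &&
            segs[i].virt_start < START_KERNEL_MAP &&
            segs[i].phys_start != NOT_PADDR &&
            segs[i].phys_start != skip_phys) {
            *page_offset = segs[i].virt_start - segs[i].phys_start;
            return 1;
        }
    }
    return 0;
}
```

Fed the first five headers of the listing, the loop skips the NOTE entry, the segment at physical 0x10a80000 and the two segments with no physical address, and settles on the segment at virtual 0xfffffe6f80820000.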
Just for reference, here are the contents of '/proc/kcore' on this
system:

# readelf -l /proc/kcore

Elf file type is CORE (Core file)
Entry point 0x0
There are 33 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  NOTE           0x0000000000000778 0x0000000000000000 0x0000000000000000
                 0x000000000000134c 0x0000000000000000         0x0
  LOAD           0x00000085c2c90000 0xfffffc85c2c80000 0x0000000010a80000
                 0x0000000001b90000 0x0000000001b90000  RWE    0x10000
  LOAD           0x0000000008010000 0xfffffc0008000000 0xffffffffffffffff
                 0x000001ff57ff0000 0x000001ff57ff0000  RWE    0x10000
  LOAD           0x0000000000010000 0xfffffc0000000000 0xffffffffffffffff
                 0x0000000008000000 0x0000000008000000  RWE    0x10000
  LOAD           0x0000026f80830000 0xfffffe6f80820000 0x0000000000820000
                 0x0000000002820000 0x0000000002820000  RWE    0x10000
  LOAD           0x000001ff9be10000 0xfffffdff9be00000 0xffffffffffffffff
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x0000026f830a0000 0xfffffe6f83090000 0x0000000003090000
                 0x0000000000050000 0x0000000000050000  RWE    0x10000
  LOAD           0x0000026f83180000 0xfffffe6f83170000 0x0000000003170000
                 0x0000000000090000 0x0000000000090000  RWE    0x10000
  LOAD           0x0000026f83420000 0xfffffe6f83410000 0x0000000003410000
                 0x0000000000040000 0x0000000000040000  RWE    0x10000
  LOAD           0x0000026f834c0000 0xfffffe6f834b0000 0x00000000034b0000
                 0x00000000000b0000 0x00000000000b0000  RWE    0x10000
  LOAD           0x0000026f83650000 0xfffffe6f83640000 0x0000000003640000
                 0x0000000000040000 0x0000000000040000  RWE    0x10000
  LOAD           0x0000026f866b0000 0xfffffe6f866a0000 0x00000000066a0000
                 0x0000000000080000 0x0000000000080000  RWE    0x10000
  LOAD           0x000001ff9be20000 0xfffffdff9be10000 0xffffffffffffffff
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x0000026f87070000 0xfffffe6f87060000 0x0000000007060000
                 0x0000000000280000 0x0000000000280000  RWE    0x10000
  LOAD           0x0000026f87390000 0xfffffe6f87380000 0x0000000007380000
                 0x00000000000b0000 0x00000000000b0000  RWE    0x10000
  LOAD           0x0000026f88250000 0xfffffe6f88240000 0x0000000008240000
                 0x0000000000070000 0x0000000000070000  RWE    0x10000
  LOAD           0x000001ff9be30000 0xfffffdff9be20000 0xffffffffffffffff
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x0000026f882f0000 0xfffffe6f882e0000 0x00000000082e0000
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x0000026f88310000 0xfffffe6f88300000 0x0000000008300000
                 0x0000000000040000 0x0000000000040000  RWE    0x10000
  LOAD           0x0000026f88a00000 0xfffffe6f889f0000 0x00000000089f0000
                 0x0000000000040000 0x0000000000040000  RWE    0x10000
  LOAD           0x0000026f88a60000 0xfffffe6f88a50000 0x0000000008a50000
                 0x0000000000020000 0x0000000000020000  RWE    0x10000
  LOAD           0x0000026f88aa0000 0xfffffe6f88a90000 0x0000000008a90000
                 0x0000000000020000 0x0000000000020000  RWE    0x10000
  LOAD           0x0000026f88fd0000 0xfffffe6f88fc0000 0x0000000008fc0000
                 0x0000000004fe0000 0x0000000004fe0000  RWE    0x10000
  LOAD           0x0000026f8dfe0000 0xfffffe6f8dfd0000 0x000000000dfd0000
                 0x0000000002030000 0x0000000002030000  RWE    0x10000
  LOAD           0x000001ff9be40000 0xfffffdff9be30000 0xffffffffffffffff
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x0000026f90810000 0xfffffe6f90800000 0x0000000010800000
                 0x00000000077f0000 0x00000000077f0000  RWE    0x10000
  LOAD           0x000001ff9be50000 0xfffffdff9be40000 0xffffffffffffffff
                 0x0000000000020000 0x0000000000020000  RWE    0x10000
  LOAD           0x0000026f9c020000 0xfffffe6f9c010000 0x000000001c010000
                 0x00000000007f0000 0x00000000007f0000  RWE    0x10000
  LOAD           0x000001ff9be80000 0xfffffdff9be70000 0xffffffffffffffff
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x0000026f9c820000 0xfffffe6f9c810000 0x000000001c810000
                 0x00000000627b0000 0x00000000627b0000  RWE    0x10000
  LOAD           0x0000026ffeff0000 0xfffffe6ffefe0000 0x000000007efe0000
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x000001ff9c000000 0xfffffdff9bff0000 0xffffffffffffffff
                 0x0000000000010000 0x0000000000010000  RWE    0x10000
  LOAD           0x0000026fff010000 0xfffffe6fff000000 0x000000007f000000
                 0x0000001781000000 0x0000001781000000  RWE    0x10000

b). The above approach works fine for me with multiple user-space
utils, so probably we don't need a kernel fix for this case and can
calculate PAGE_OFFSET in user-space via PT_LOAD 'virt_start -
phys_start' manipulation.

Please share your views.

Regards,
Bhupesh
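The PT_LOAD scan described above can be tried out on a few rows of the readelf output. What follows is only a minimal sketch, not the makedumpfile code: 'struct pt_load', 'find_page_offset()', 'sample_segs' and the 'skip_phys' parameter are names made up for illustration, and the macros are renamed (no leading underscores) to stay clear of reserved identifiers. The board-specific 'phys_start != 0x0000000010a80000' check is passed in as an argument rather than hard-coded.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Values as quoted in the mail above (names adapted for this sketch). */
#define START_KERNEL_MAP UINT64_C(0xffffffff80000000)
#define NOT_KV_ADDR      UINT64_C(0)
#define NOT_PADDR        UINT64_MAX

/* Hypothetical stand-in for one PT_LOAD entry of /proc/kcore. */
struct pt_load {
    uint64_t virt_start;
    uint64_t phys_start;
};

/* First four LOAD rows of the readelf output above. */
static const struct pt_load sample_segs[] = {
    { UINT64_C(0xfffffc85c2c80000), UINT64_C(0x0000000010a80000) },
    { UINT64_C(0xfffffc0008000000), NOT_PADDR },
    { UINT64_C(0xfffffc0000000000), NOT_PADDR },
    { UINT64_C(0xfffffe6f80820000), UINT64_C(0x0000000000820000) },
};

/*
 * Return 1 and store 'virt_start - phys_start' of the first usable
 * segment in *page_offset, or return 0 if no segment qualifies.
 * 'skip_phys' models the extra board-specific check from the mail.
 */
static int find_page_offset(const struct pt_load *segs, size_t n,
                            uint64_t skip_phys, uint64_t *page_offset)
{
    assert(page_offset != NULL);
    assert(segs != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (segs[i].virt_start != NOT_KV_ADDR &&
            segs[i].virt_start < START_KERNEL_MAP &&
            segs[i].phys_start != NOT_PADDR &&
            segs[i].phys_start != skip_phys) {
            *page_offset = segs[i].virt_start - segs[i].phys_start;
            return 1;
        }
    }
    return 0;
}
```

Called on 'sample_segs' with 'skip_phys' set to 0x0000000010a80000, the first row is skipped by the extra check, the next two because their physical address is unknown, and the fourth row yields 0xfffffe6f80820000 - 0x820000. Note that the 'virt_start < START_KERNEL_MAP' test is true for every row here, which is presumably why the extra check was needed on this board.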
On Wed, Jul 11, 2018 at 09:06:27PM +0530, Bhupesh Sharma wrote:
> Hi James,
>
> On Wed, Jul 11, 2018 at 6:54 PM, James Morse <james.morse@arm.com> wrote:
> > Hi Bhupesh,
> >
> > (CC: +Omar)
> >
> > On 20/06/18 08:26, Bhupesh Sharma wrote:
> >> On Wed, Jun 20, 2018 at 7:46 AM, Jin, Yanjiang
> >> <yanjiang.jin@hxt-semitech.com> wrote:
> >>>> From: Bhupesh Sharma [mailto:bhsharma@redhat.com]
> >>>> On Tue, Jun 19, 2018 at 4:56 PM, James Morse <james.morse@arm.com> wrote:
> >>>>> I'm suggesting adding the contents of vmcoreinfo as a PT_NOTE section
> >>>>> of /proc/kcore's ELF header. No special knowledge necessary, any
> >>>>> elf-parser should be able to dump the values.
> >
> > [..]
> >>>> I am working on fixes on the above lines for kernel and user-space tools (like
> >>>> makedumpfile, crash-utility and kexec-tools).
> >>>>
> >>>> I will post some RFC patches on the same lines (or come back in case I get stuck
> >>>> somewhere) shortly.
> >
> > I spotted this series from Omar:
> > https://lkml.org/lkml/2018/7/6/866
> >
> > Hopefully it does what you need?
>
> Thanks a lot for sharing this useful series.
>
> BTW, I am sorry for taking a long time to reply to this thread, but I
> was reading some x86_64/ppc legacy code and also experimenting with
> approaches in both user-space and kernel-space and have some
> interesting updates.
>
> Just to recap, there are two separate issues we are seeing with arm64
> with user-space utilities which are used for debugging live systems or
> crashed kernels:
>
> - Availability of PHYS_OFFSET in user-space (both for KASLR and
> non-KASLR boot cases):
>
> I see two approaches to fix this issue:
> 1. Fix inside Kernel:
> a). See <https://www.spinics.net/lists/kexec/msg20847.html> for
> background details. Having PHYS_OFFSET added to the '/proc/kcore' as a
> PT_NOTE (it is already added to vmcore as a NUMBER) would suffice.
>
> b). Omar's series adds the vmcoreinfo to the kcore itself, so it would
> be sufficient for the above case as well, since PHYS_OFFSET is already
> added to the vmcoreinfo inside 'arch/arm64/kernel/machine_kexec.c':
>
>     void arch_crash_save_vmcoreinfo(void)
>     {
>         <..snip..>
>         vmcoreinfo_append_str("NUMBER(PHYS_OFFSET)=0x%llx\n",
>                               PHYS_OFFSET);
>         <..snip..>
>     }
>
> c). This will help the cases where we are debugging a 'live' (or
> running) system.
>
> 2. Fix inside user-space:
> a). As an example, see a flaky reference implementation in
> 'kexec-tools':
> <https://github.com/bhupesh-sharma/kexec-tools/commit/e8f920158ce57399c770c7160711a72fc740f1d6>
> - Note that the calculation of 'ARM64_MEMSTART_ALIGN' value in
> user-space is quite tricky (as is evident from the above
> implementation and I took an easy route for my specific PAGE_SIZE and
> VA_BITS combination).
>
> b). For some user-space tools like crash and makedumpfile, the
> underlying macros like PMD_SHIFT etc have been added as arch-specific
> code, so they can handle such implementation better.
>
> c). But this again means adding more arch specific code to user-space,
> which is probably not advisable.
>
> So, we will be better suited to go with a KERNEL fix for this case and
> Omar's series should help. I will go ahead and give it a try for
> arm64.

Thanks, please do take a look. A Reviewed-by (or at least Tested-by)
would help get it merged.

Note that for my use case, the workaround I've been using for now is to
get the physical address and size of vmcoreinfo from
/sys/kernel/vmcoreinfo, then read from that physical address in
/proc/kcore (assuming that your kernel is new enough to fill in p_paddr
in the /proc/kcore segments).
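The workaround in the last paragraph boils down to two steps: parse the address/size pair, then translate the physical address into a file offset using the p_paddr of the PT_LOAD segments. A minimal sketch follows, under stated assumptions: the helper names 'parse_vmcoreinfo_location()' and 'kcore_offset_for_paddr()' are invented here, the sysfs file is assumed to contain two hexadecimal numbers separated by whitespace, and reading the ELF headers out of /proc/kcore is left to the caller. It relies on the Linux <elf.h> header.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Parse "<paddr> <size>" (both assumed hexadecimal, "0x" prefix
 * optional) as read from /sys/kernel/vmcoreinfo. Returns 0 on success.
 */
static int parse_vmcoreinfo_location(const char *text,
                                     uint64_t *paddr, uint64_t *size)
{
    unsigned long long a, s;

    assert(text != NULL && paddr != NULL && size != NULL);
    if (sscanf(text, "%llx %llx", &a, &s) != 2)
        return -1;
    *paddr = a;
    *size = s;
    return 0;
}

/*
 * Find the PT_LOAD segment whose physical range covers
 * [paddr, paddr + size) and return the matching file offset.
 * Segments with p_paddr == -1 (physical address not filled in)
 * are ignored. Returns 0 on success, -1 if nothing matches.
 */
static int kcore_offset_for_paddr(const Elf64_Phdr *phdrs, size_t n,
                                  uint64_t paddr, uint64_t size,
                                  uint64_t *offset)
{
    assert(offset != NULL);

    for (size_t i = 0; i < n; i++) {
        const Elf64_Phdr *p = &phdrs[i];

        if (p->p_type != PT_LOAD || p->p_paddr == (Elf64_Addr)-1)
            continue;
        if (paddr < p->p_paddr)
            continue;

        uint64_t delta = paddr - p->p_paddr;
        if (delta < p->p_memsz && size <= p->p_memsz - delta) {
            *offset = p->p_offset + delta;
            return 0;
        }
    }
    return -1;
}
```

The returned offset is what one would pass to pread() on /proc/kcore to fetch the vmcoreinfo note. The range check is written with subtraction so that it cannot overflow for segments near the top of the address space.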
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 49d99214f43c..bfd0915ecaf8 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -178,6 +178,9 @@ extern s64 memstart_addr;
 /* PHYS_OFFSET - the physical address of the start of memory. */
 #define PHYS_OFFSET ({ VM_BUG_ON(memstart_addr & 1); memstart_addr; })

+/* the virtual base of the linear region. */
+extern s64 linear_reg_start_addr;
+
 /* the virtual base of the kernel image (minus TEXT_OFFSET) */
 extern u64 kimage_vaddr;

diff --git a/arch/arm64/kernel/arm64ksyms.c b/arch/arm64/kernel/arm64ksyms.c
index d894a20b70b2..a92238ea45ff 100644
--- a/arch/arm64/kernel/arm64ksyms.c
+++ b/arch/arm64/kernel/arm64ksyms.c
@@ -42,6 +42,7 @@ EXPORT_SYMBOL(__arch_copy_in_user);

 /* physical memory */
 EXPORT_SYMBOL(memstart_addr);
+EXPORT_SYMBOL(linear_reg_start_addr);

 /* string / mem functions */
 EXPORT_SYMBOL(strchr);
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index 325cfb3b858a..29447adb0eef 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -60,6 +60,7 @@
  * that cannot be mistaken for a real physical address.
  */
 s64 memstart_addr __ro_after_init = -1;
+s64 linear_reg_start_addr __ro_after_init = PAGE_OFFSET;
 phys_addr_t arm64_dma_phys_limit __ro_after_init;

 #ifdef CONFIG_BLK_DEV_INITRD
@@ -452,6 +453,8 @@ void __init arm64_memblock_init(void)
 		}
 	}

+	linear_reg_start_addr = __phys_to_virt(memblock_start_of_DRAM());
+
 	/*
 	 * Register the kernel text, kernel data, initrd, and initial
 	 * pagetables with memblock.
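To see what value the new variable would end up holding, the translation done by '__phys_to_virt()' can be modelled in user space. This is a sketch only: it assumes the arm64 definition of that era is '((x) - PHYS_OFFSET) | PAGE_OFFSET', and 'phys_to_virt_model()' is a made-up name, not a kernel function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * User-space model of the linear-map translation, assuming
 * __phys_to_virt(x) == ((x) - PHYS_OFFSET) | PAGE_OFFSET.
 * memstart_addr is signed (s64 in the kernel) and may be negative
 * once the linear region has been randomized.
 */
static uint64_t phys_to_virt_model(uint64_t phys, int64_t memstart_addr,
                                   uint64_t page_offset)
{
    /* The linear map lives in the upper half of the address space. */
    assert(page_offset >> 63);

    return (phys - (uint64_t)memstart_addr) | page_offset;
}
```

With PAGE_OFFSET = 0xfffffe0000000000 (VA_BITS=42) and a memstart_addr of -0x6f80000000 (a value inferred from the '/proc/kcore' listing earlier in the thread, not one reported by the kernel), physical address 0x820000 maps to 0xfffffe6f80820000. That is the VirtAddr readelf shows for the lowest physical segment, and it is the value 'linear_reg_start_addr' would hold, while PAGE_OFFSET itself stays unchanged.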
The start of the linear region map on a KASLR enabled ARM64 machine -
which supports a compatible EFI firmware (with EFI_RNG_PROTOCOL
support), is no longer correctly represented by the PAGE_OFFSET macro,
since it is defined as:

    (UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1)

So taking an example of a platform with VA_BITS=48, this gives a static
value of:

    PAGE_OFFSET = 0xffff800000000000

However, for the KASLR case, we use the 'memstart_offset_seed'
to randomize the linear region - since 'memstart_addr' indicates the
start of physical RAM, we randomize the same on the basis
of the 'memstart_offset_seed' value.

Since the PAGE_OFFSET value is currently used by several user-space
tools (e.g. makedumpfile and crash) to determine the start of the
linear region, and hence to read addresses (like PT_NOTE fields) from
'/proc/kcore' for the non-KASLR boot cases, it would be better to
use the 'memblock_start_of_DRAM()' value (converted to virtual) as
the start of the linear region for the KASLR cases and default to
the PAGE_OFFSET value for non-KASLR cases.

I tested this on my qualcomm (which supports EFI_RNG_PROTOCOL)
and apm mustang (which does not support EFI_RNG_PROTOCOL) arm64 boards
and was able to use modified user-space utilities (like kexec-tools and
makedumpfile) to determine the start of the linear region correctly for
both the KASLR and non-KASLR boot cases.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: AKASHI Takahiro <takahiro.akashi@linaro.org>
Cc: James Morse <james.morse@arm.com>
Signed-off-by: Bhupesh Sharma <bhsharma@redhat.com>
---
 arch/arm64/include/asm/memory.h | 3 +++
 arch/arm64/kernel/arm64ksyms.c  | 1 +
 arch/arm64/mm/init.c            | 3 +++
 3 files changed, 7 insertions(+)
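The static PAGE_OFFSET value quoted in the commit message can be reproduced for any VA_BITS with a few lines of C. This is a small sketch, assuming the definition 'UL(0xffffffffffffffff) - (UL(1) << (VA_BITS - 1)) + 1'; the function name 'page_offset_for()' is made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Start of the arm64 linear map for a given VA_BITS, i.e. the
 * upper half of the kernel's virtual address range:
 *     0xffffffffffffffff - (1 << (VA_BITS - 1)) + 1
 */
static uint64_t page_offset_for(unsigned int va_bits)
{
    /* Keep the shift well-defined for a 64-bit operand. */
    assert(va_bits >= 1 && va_bits <= 64);

    return UINT64_MAX - (UINT64_C(1) << (va_bits - 1)) + 1;
}
```

For VA_BITS=48 this gives 0xffff800000000000, the value in the commit message. For VA_BITS=42 it gives 0xfffffe0000000000, which is the base below which none of the 0xfffffe6f... linear-map segments of the '/proc/kcore' listing in this thread fall.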