Message ID | 20210531084540.78546-1-kernelfans@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | use __create_pgd_mapping() to implement idmap and unify codes | expand |
On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote: > > v2 -> v3: > -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL, > concentrate on sharing __create_pgd_mapping() in head.S as the first > step. > -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5]) > > rfc -> v2: > more debug and test > > *** Goal of this series *** > > __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) -> > paddr under the MMU-on situation. Since pgtable upper level holds the > paddr of the lower level, with a slight adaptation, > __create_pgd_mapping() can also set up the mapping under the MMU-off > situation. ([4/5]) > > After that, both idmap_pg_dir and init_pg_dir can be created by > __create_pgd_mapping(). And the counterpart asm code can be simplified. > I understand the desire to simplify the page table construction code in head.S, but up until now, we have been very careful to avoid calling into C code with the MMU off. There are simply too many assumptions in the compiler about the context the generated code will execute in: one example is unaligned access, which must be disabled for source files that may be called with the MMU off, as otherwise, the compiler is permitted to emit loads and stores that are not allowed on device memory (which is the default memory type used for all memory with the MMU off) Do you have a killer use case for this feature? Or is it just a nice cleanup? > This series can be applied against commit 4284bdf9405a ("Linux 5.13-rc2"). > > > *** Plan for the next *** > > The omitted part in V2, which resorts to redefinition of > CONFIG_PGTABLE_LEVEL to provides two sets of routines. One set is proper > to set up a idmap which can adress total system RAM. That can simplify > the asm code in head.S furtherly, also provide an create_idmap() API. > > > *** Test *** > > This series can success booting with the following configuration on either Cavium > ThunderX2 99xx or Qualcomm Falkor: > PAGE_SIZE VA PA PGTABLE_LEVEL > Qualcomm 4K 48 48 4 > 4K 39 48 3 > 16K 48 48 4 > 16K 47 48 3 > Cavium 64K 52 52 3 > 64K 48 52 3 > 64K 42 52 2 > > > > *** History *** > > RFC: > https://lore.kernel.org/linux-arm-kernel/20210410095654.24102-1-kernelfans@gmail.com/ > V2: > https://lore.kernel.org/linux-arm-kernel/20210425141304.32721-1-kernelfans@gmail.com/ > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > Cc: Will Deacon <will@kernel.org> > Cc: Ard Biesheuvel <ardb@kernel.org> > Cc: Marc Zyngier <maz@kernel.org> > Cc: Kristina Martsenko <kristina.martsenko@arm.com> > Cc: James Morse <james.morse@arm.com> > Cc: Steven Price <steven.price@arm.com> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com> > Cc: Anshuman Khandual <anshuman.khandual@arm.com> > Cc: Atish Patra <atish.patra@wdc.com> > Cc: Mike Rapoport <rppt@kernel.org> > Cc: Logan Gunthorpe <logang@deltatee.com> > Cc: Mark Brown <broonie@kernel.org> > To: linux-arm-kernel@lists.infradead.org > > Pingfan Liu (5): > arm64/mm: introduce pgtable allocator for idmap_pg_dir and init_pg_dir > arm64/mm: disable WARN_ON() and BUG_ON() in __create_pgd_mapping() if > too early > arm64/mm: unconditionally set IDMAP_PGTABLE_LEVELS to max pgtable > level > arm64/mm: make __create_pgd_mapping() capable to handle pgtable's > paddr > arm64/mm: use __create_pgd_mapping() to create pgtable for > idmap_pg_dir and init_pg_dir > > arch/arm64/include/asm/kernel-pgtable.h | 33 +++-- > arch/arm64/include/asm/pgalloc.h | 9 ++ > arch/arm64/kernel/head.S | 164 +++++++----------------- > arch/arm64/mm/mmu.c | 108 ++++++++++++---- > 4 files changed, 153 insertions(+), 161 deletions(-) > > -- > 2.29.2 >
Hi Ard, Thanks for your reviewing. On Tue, Jun 1, 2021 at 3:50 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote: > > > > v2 -> v3: > > -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL, > > concentrate on sharing __create_pgd_mapping() in head.S as the first > > step. > > -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5]) > > > > rfc -> v2: > > more debug and test > > > > *** Goal of this series *** > > > > __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) -> > > paddr under the MMU-on situation. Since pgtable upper level holds the > > paddr of the lower level, with a slight adaptation, > > __create_pgd_mapping() can also set up the mapping under the MMU-off > > situation. ([4/5]) > > > > After that, both idmap_pg_dir and init_pg_dir can be created by > > __create_pgd_mapping(). And the counterpart asm code can be simplified. > > > > I understand the desire to simplify the page table construction code > in head.S, but up until now, we have been very careful to avoid > calling into C code with the MMU off. There are simply too many > assumptions in the compiler about the context the generated code will > execute in: one example is unaligned access, which must be disabled > for source files that may be called with the MMU off, as otherwise, > the compiler is permitted to emit loads and stores that are not > allowed on device memory (which is the default memory type used for > all memory with the MMU off) > You are right. These C routines happen to use "unsigned long", which can exclude this unaligned case. To make an guarantee, is "-mno-unaligned-access" good enough? Besides unaligned-access, any further risk originating from compiler assumption? (I think that the common optimization: reordering, merging, reloading on this "device" memory has no bad effect) > Do you have a killer use case for this feature? Or is it just a nice cleanup? > Yes, in the omitted part in v2, I had planned to provide an unified page table manipulation routine, and provide an create_idmap() API. So there can be an handy interface to create a whole RAM addressable idmapX where needed. And three place can share the API create_idmap(): head.S, trans_pgd_idmap_page(), kexec boot with mmu-on. As a result, most of mm/trans_pgd.c can be cleaned after hibernation records paddr. Thanks, Pingfan > > This series can be applied against commit 4284bdf9405a ("Linux 5.13-rc2"). > > > > > > *** Plan for the next *** > > > > The omitted part in V2, which resorts to redefinition of > > CONFIG_PGTABLE_LEVEL to provides two sets of routines. One set is proper > > to set up a idmap which can adress total system RAM. That can simplify > > the asm code in head.S furtherly, also provide an create_idmap() API. > > > > > > *** Test *** > > > > This series can success booting with the following configuration on either Cavium > > ThunderX2 99xx or Qualcomm Falkor: > > PAGE_SIZE VA PA PGTABLE_LEVEL > > Qualcomm 4K 48 48 4 > > 4K 39 48 3 > > 16K 48 48 4 > > 16K 47 48 3 > > Cavium 64K 52 52 3 > > 64K 48 52 3 > > 64K 42 52 2 > > > > > > > > *** History *** > > > > RFC: > > https://lore.kernel.org/linux-arm-kernel/20210410095654.24102-1-kernelfans@gmail.com/ > > V2: > > https://lore.kernel.org/linux-arm-kernel/20210425141304.32721-1-kernelfans@gmail.com/ > > > > > > Cc: Catalin Marinas <catalin.marinas@arm.com> > > Cc: Will Deacon <will@kernel.org> > > Cc: Ard Biesheuvel <ardb@kernel.org> > > Cc: Marc Zyngier <maz@kernel.org> > > Cc: Kristina Martsenko <kristina.martsenko@arm.com> > > Cc: James Morse <james.morse@arm.com> > > Cc: Steven Price <steven.price@arm.com> > > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > > Cc: Pavel Tatashin <pasha.tatashin@soleen.com> > > Cc: Anshuman Khandual <anshuman.khandual@arm.com> > > Cc: Atish Patra <atish.patra@wdc.com> > > Cc: Mike Rapoport <rppt@kernel.org> > > Cc: Logan Gunthorpe <logang@deltatee.com> > > Cc: Mark Brown <broonie@kernel.org> > > To: linux-arm-kernel@lists.infradead.org > > > > Pingfan Liu (5): > > arm64/mm: introduce pgtable allocator for idmap_pg_dir and init_pg_dir > > arm64/mm: disable WARN_ON() and BUG_ON() in __create_pgd_mapping() if > > too early > > arm64/mm: unconditionally set IDMAP_PGTABLE_LEVELS to max pgtable > > level > > arm64/mm: make __create_pgd_mapping() capable to handle pgtable's > > paddr > > arm64/mm: use __create_pgd_mapping() to create pgtable for > > idmap_pg_dir and init_pg_dir > > > > arch/arm64/include/asm/kernel-pgtable.h | 33 +++-- > > arch/arm64/include/asm/pgalloc.h | 9 ++ > > arch/arm64/kernel/head.S | 164 +++++++----------------- > > arch/arm64/mm/mmu.c | 108 ++++++++++++---- > > 4 files changed, 153 insertions(+), 161 deletions(-) > > > > -- > > 2.29.2 > >
On Tue, Jun 01, 2021 at 05:25:49PM +0800, Pingfan Liu wrote: > On Tue, Jun 1, 2021 at 3:50 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote: > > > v2 -> v3: > > > -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL, > > > concentrate on sharing __create_pgd_mapping() in head.S as the first > > > step. > > > -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5]) > > > > > > rfc -> v2: > > > more debug and test > > > > > > *** Goal of this series *** > > > > > > __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) -> > > > paddr under the MMU-on situation. Since pgtable upper level holds the > > > paddr of the lower level, with a slight adaptation, > > > __create_pgd_mapping() can also set up the mapping under the MMU-off > > > situation. ([4/5]) > > > > > > After that, both idmap_pg_dir and init_pg_dir can be created by > > > __create_pgd_mapping(). And the counterpart asm code can be simplified. > > > > I understand the desire to simplify the page table construction code > > in head.S, but up until now, we have been very careful to avoid > > calling into C code with the MMU off. There are simply too many > > assumptions in the compiler about the context the generated code will > > execute in: one example is unaligned access, which must be disabled > > for source files that may be called with the MMU off, as otherwise, > > the compiler is permitted to emit loads and stores that are not > > allowed on device memory (which is the default memory type used for > > all memory with the MMU off) > > You are right. These C routines happen to use "unsigned long", which > can exclude this unaligned case. > To make an guarantee, is "-mno-unaligned-access" good enough? > > Besides unaligned-access, any further risk originating from compiler > assumption? (I think that the common optimization: reordering, > merging, reloading on this "device" memory has no bad effect) There's also instrumentation that needs disabling (kasan, ubsan, kcov, gcov). You can look at arch/arm64/kvm/hyp/nvhe/Makefile for various flags added or filtered out, though the KVM hyp code runs with the MMU on. I'm not sure what other flags are needed to guarantee the generated code can run with the MMU off but we can always ask the toolchain folk. However, I'm still not convinced about sharing __create_pgd_mapping() with the early head.S code. A better option would be a separate, stand-alone file where we have more control on what gets called or accessed (of course, if there's any value in going this route). > > Do you have a killer use case for this feature? Or is it just a nice cleanup? > > Yes, in the omitted part in v2, I had planned to provide an unified > page table manipulation routine, and provide an create_idmap() API. So > there can be an handy interface to create a whole RAM addressable > idmapX where needed. Where would this be needed?
On Tue, Jun 08, 2021 at 06:38:07PM +0100, Catalin Marinas wrote: > On Tue, Jun 01, 2021 at 05:25:49PM +0800, Pingfan Liu wrote: > > On Tue, Jun 1, 2021 at 3:50 AM Ard Biesheuvel <ardb@kernel.org> wrote: > > > On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote: > > > > v2 -> v3: > > > > -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL, > > > > concentrate on sharing __create_pgd_mapping() in head.S as the first > > > > step. > > > > -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5]) > > > > > > > > rfc -> v2: > > > > more debug and test > > > > > > > > *** Goal of this series *** > > > > > > > > __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) -> > > > > paddr under the MMU-on situation. Since pgtable upper level holds the > > > > paddr of the lower level, with a slight adaptation, > > > > __create_pgd_mapping() can also set up the mapping under the MMU-off > > > > situation. ([4/5]) > > > > > > > > After that, both idmap_pg_dir and init_pg_dir can be created by > > > > __create_pgd_mapping(). And the counterpart asm code can be simplified. > > > > > > I understand the desire to simplify the page table construction code > > > in head.S, but up until now, we have been very careful to avoid > > > calling into C code with the MMU off. There are simply too many > > > assumptions in the compiler about the context the generated code will > > > execute in: one example is unaligned access, which must be disabled > > > for source files that may be called with the MMU off, as otherwise, > > > the compiler is permitted to emit loads and stores that are not > > > allowed on device memory (which is the default memory type used for > > > all memory with the MMU off) > > > > You are right. These C routines happen to use "unsigned long", which > > can exclude this unaligned case. > > To make an guarantee, is "-mno-unaligned-access" good enough? > > Unaligned-access is the main challedge, and the unaligned-access should be caused from the programer. Compiler has enforced different default alignment, including global and local(stack) variable, function alignment. Otherwise, the atomity can not be assumped on some base data type. But some code may violate the alignment, which compiler has no method to prevent. As Documentation/core-api/unaligned-memory-access.rst 1. Casting variables to types of different lengths 2. Pointer arithmetic followed by access to at least 2 bytes of data So in order to prevent the unaligned-access, the involved codes should be checked carefully. > > Besides unaligned-access, any further risk originating from compiler > > assumption? (I think that the common optimization: reordering, > > merging, reloading on this "device" memory has no bad effect) > These should be not issue. Since MMU-off present a more strict memory model than MMU-on. So if a code can run on a more relaxed one, it can also run on the strict one. > There's also instrumentation that needs disabling (kasan, ubsan, kcov, > gcov). You can look at arch/arm64/kvm/hyp/nvhe/Makefile for various > flags added or filtered out, though the KVM hyp code runs with the MMU > on. I'm not sure what other flags are needed to guarantee the generated > code can run with the MMU off but we can always ask the toolchain folk. > Thanks for pointing out these issues. I will check. > However, I'm still not convinced about sharing __create_pgd_mapping() > with the early head.S code. A better option would be a separate, > stand-alone file where we have more control on what gets called or > accessed (of course, if there's any value in going this route). > > > > Do you have a killer use case for this feature? Or is it just a nice cleanup? > > > > Yes, in the omitted part in v2, I had planned to provide an unified > > page table manipulation routine, and provide an create_idmap() API. So > > there can be an handy interface to create a whole RAM addressable > > idmapX where needed. > > Where would this be needed? > It was planned for Pavel's series: [PATCH v13 00/18] arm64: MMU enabled kexec relocation, where Pavel uses linear mapping to copy the data. But that way should involve copy of "EL2 vectors" to tackle the transition to EL2. On the other hand, if using a global addressable idmap, the logic of transition to EL2 keeps untouched, and just enable MMU in SYM_CODE_START(arm64_relocate_new_kernel) The other call site is in trans_pgd_idmap_page() Thanks, Pingfan