mbox series

[PATCHv3,0/5] use __create_pgd_mapping() to implement idmap and unify codes

Message ID 20210531084540.78546-1-kernelfans@gmail.com (mailing list archive)
Headers show
Series use __create_pgd_mapping() to implement idmap and unify codes | expand

Message

Pingfan Liu May 31, 2021, 8:45 a.m. UTC
v2 -> v3:
  -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL,
concentrate on sharing __create_pgd_mapping() in head.S as the first
step.
  -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5])

rfc -> v2:
  more debug and test

*** Goal of this series ***

__create_pgd_mapping() sets up the pgtable for mapping __va(paddr) ->
paddr under the MMU-on situation.  Since pgtable upper level holds the
paddr of the lower level, with a slight adaptation,
__create_pgd_mapping() can also set up the mapping under the MMU-off
situation. ([4/5])

After that, both idmap_pg_dir and init_pg_dir can be created by
__create_pgd_mapping(). And the counterpart asm code can be simplified.

This series can be applied against commit 4284bdf9405a ("Linux 5.13-rc2").


*** Plan for the next ***

The omitted part in V2, which resorts to redefinition of
CONFIG_PGTABLE_LEVEL to provides two sets of routines. One set is proper
to set up a idmap which can adress total system RAM.  That can simplify
the asm code in head.S furtherly, also provide an create_idmap() API.


*** Test ***

This series can success booting with the following configuration on either Cavium
ThunderX2 99xx or Qualcomm Falkor:
            PAGE_SIZE  VA  PA  PGTABLE_LEVEL
Qualcomm    4K         48  48  4
            4K         39  48  3
            16K        48  48  4
            16K        47  48  3
Cavium      64K        52  52  3
            64K        48  52  3
            64K        42  52  2



*** History ***

RFC:
https://lore.kernel.org/linux-arm-kernel/20210410095654.24102-1-kernelfans@gmail.com/
V2:
https://lore.kernel.org/linux-arm-kernel/20210425141304.32721-1-kernelfans@gmail.com/


Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Kristina Martsenko <kristina.martsenko@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Steven Price <steven.price@arm.com>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Atish Patra <atish.patra@wdc.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Mark Brown <broonie@kernel.org>
To: linux-arm-kernel@lists.infradead.org

Pingfan Liu (5):
  arm64/mm: introduce pgtable allocator for idmap_pg_dir and init_pg_dir
  arm64/mm: disable WARN_ON() and BUG_ON() in __create_pgd_mapping() if
    too early
  arm64/mm: unconditionally set IDMAP_PGTABLE_LEVELS to max pgtable
    level
  arm64/mm: make __create_pgd_mapping() capable to handle pgtable's
    paddr
  arm64/mm: use __create_pgd_mapping() to create pgtable for
    idmap_pg_dir and init_pg_dir

 arch/arm64/include/asm/kernel-pgtable.h |  33 +++--
 arch/arm64/include/asm/pgalloc.h        |   9 ++
 arch/arm64/kernel/head.S                | 164 +++++++-----------------
 arch/arm64/mm/mmu.c                     | 108 ++++++++++++----
 4 files changed, 153 insertions(+), 161 deletions(-)

Comments

Ard Biesheuvel May 31, 2021, 7:50 p.m. UTC | #1
On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote:
>
> v2 -> v3:
>   -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL,
> concentrate on sharing __create_pgd_mapping() in head.S as the first
> step.
>   -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5])
>
> rfc -> v2:
>   more debug and test
>
> *** Goal of this series ***
>
> __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) ->
> paddr under the MMU-on situation.  Since pgtable upper level holds the
> paddr of the lower level, with a slight adaptation,
> __create_pgd_mapping() can also set up the mapping under the MMU-off
> situation. ([4/5])
>
> After that, both idmap_pg_dir and init_pg_dir can be created by
> __create_pgd_mapping(). And the counterpart asm code can be simplified.
>

I understand the desire to simplify the page table construction code
in head.S, but up until now, we have been very careful to avoid
calling into C code with the MMU off. There are simply too many
assumptions in the compiler about the context the generated code will
execute in: one example is unaligned access, which must be disabled
for source files that may be called with the MMU off, as otherwise,
the compiler is permitted to emit loads and stores that are not
allowed on device memory (which is the default memory type used for
all memory with the MMU off)

Do you have a killer use case for this feature? Or is it just a nice cleanup?

> This series can be applied against commit 4284bdf9405a ("Linux 5.13-rc2").
>
>
> *** Plan for the next ***
>
> The omitted part in V2, which resorts to redefinition of
> CONFIG_PGTABLE_LEVEL to provides two sets of routines. One set is proper
> to set up a idmap which can adress total system RAM.  That can simplify
> the asm code in head.S furtherly, also provide an create_idmap() API.
>
>
> *** Test ***
>
> This series can success booting with the following configuration on either Cavium
> ThunderX2 99xx or Qualcomm Falkor:
>             PAGE_SIZE  VA  PA  PGTABLE_LEVEL
> Qualcomm    4K         48  48  4
>             4K         39  48  3
>             16K        48  48  4
>             16K        47  48  3
> Cavium      64K        52  52  3
>             64K        48  52  3
>             64K        42  52  2
>
>
>
> *** History ***
>
> RFC:
> https://lore.kernel.org/linux-arm-kernel/20210410095654.24102-1-kernelfans@gmail.com/
> V2:
> https://lore.kernel.org/linux-arm-kernel/20210425141304.32721-1-kernelfans@gmail.com/
>
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will@kernel.org>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Marc Zyngier <maz@kernel.org>
> Cc: Kristina Martsenko <kristina.martsenko@arm.com>
> Cc: James Morse <james.morse@arm.com>
> Cc: Steven Price <steven.price@arm.com>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Atish Patra <atish.patra@wdc.com>
> Cc: Mike Rapoport <rppt@kernel.org>
> Cc: Logan Gunthorpe <logang@deltatee.com>
> Cc: Mark Brown <broonie@kernel.org>
> To: linux-arm-kernel@lists.infradead.org
>
> Pingfan Liu (5):
>   arm64/mm: introduce pgtable allocator for idmap_pg_dir and init_pg_dir
>   arm64/mm: disable WARN_ON() and BUG_ON() in __create_pgd_mapping() if
>     too early
>   arm64/mm: unconditionally set IDMAP_PGTABLE_LEVELS to max pgtable
>     level
>   arm64/mm: make __create_pgd_mapping() capable to handle pgtable's
>     paddr
>   arm64/mm: use __create_pgd_mapping() to create pgtable for
>     idmap_pg_dir and init_pg_dir
>
>  arch/arm64/include/asm/kernel-pgtable.h |  33 +++--
>  arch/arm64/include/asm/pgalloc.h        |   9 ++
>  arch/arm64/kernel/head.S                | 164 +++++++-----------------
>  arch/arm64/mm/mmu.c                     | 108 ++++++++++++----
>  4 files changed, 153 insertions(+), 161 deletions(-)
>
> --
> 2.29.2
>
Pingfan Liu June 1, 2021, 9:25 a.m. UTC | #2
Hi Ard,

Thanks for your reviewing.

On Tue, Jun 1, 2021 at 3:50 AM Ard Biesheuvel <ardb@kernel.org> wrote:
>
> On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote:
> >
> > v2 -> v3:
> >   -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL,
> > concentrate on sharing __create_pgd_mapping() in head.S as the first
> > step.
> >   -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5])
> >
> > rfc -> v2:
> >   more debug and test
> >
> > *** Goal of this series ***
> >
> > __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) ->
> > paddr under the MMU-on situation.  Since pgtable upper level holds the
> > paddr of the lower level, with a slight adaptation,
> > __create_pgd_mapping() can also set up the mapping under the MMU-off
> > situation. ([4/5])
> >
> > After that, both idmap_pg_dir and init_pg_dir can be created by
> > __create_pgd_mapping(). And the counterpart asm code can be simplified.
> >
>
> I understand the desire to simplify the page table construction code
> in head.S, but up until now, we have been very careful to avoid
> calling into C code with the MMU off. There are simply too many
> assumptions in the compiler about the context the generated code will
> execute in: one example is unaligned access, which must be disabled
> for source files that may be called with the MMU off, as otherwise,
> the compiler is permitted to emit loads and stores that are not
> allowed on device memory (which is the default memory type used for
> all memory with the MMU off)
>
You are right. These C routines happen to use "unsigned long", which
can exclude this unaligned case.
To make an guarantee, is "-mno-unaligned-access" good enough?

Besides unaligned-access, any further risk originating from compiler
assumption? (I think that the common optimization: reordering,
merging, reloading on this "device" memory has no bad effect)

> Do you have a killer use case for this feature? Or is it just a nice cleanup?
>
Yes, in the omitted part in v2, I had planned to provide an unified
page table manipulation routine, and provide an create_idmap() API. So
there can be an handy interface to create a whole RAM addressable
idmapX where needed.

And three place can share the API create_idmap(): head.S,
trans_pgd_idmap_page(), kexec boot with mmu-on. As a result, most of
mm/trans_pgd.c can be cleaned after hibernation records paddr.

Thanks,

Pingfan
> > This series can be applied against commit 4284bdf9405a ("Linux 5.13-rc2").
> >
> >
> > *** Plan for the next ***
> >
> > The omitted part in V2, which resorts to redefinition of
> > CONFIG_PGTABLE_LEVEL to provides two sets of routines. One set is proper
> > to set up a idmap which can adress total system RAM.  That can simplify
> > the asm code in head.S furtherly, also provide an create_idmap() API.
> >
> >
> > *** Test ***
> >
> > This series can success booting with the following configuration on either Cavium
> > ThunderX2 99xx or Qualcomm Falkor:
> >             PAGE_SIZE  VA  PA  PGTABLE_LEVEL
> > Qualcomm    4K         48  48  4
> >             4K         39  48  3
> >             16K        48  48  4
> >             16K        47  48  3
> > Cavium      64K        52  52  3
> >             64K        48  52  3
> >             64K        42  52  2
> >
> >
> >
> > *** History ***
> >
> > RFC:
> > https://lore.kernel.org/linux-arm-kernel/20210410095654.24102-1-kernelfans@gmail.com/
> > V2:
> > https://lore.kernel.org/linux-arm-kernel/20210425141304.32721-1-kernelfans@gmail.com/
> >
> >
> > Cc: Catalin Marinas <catalin.marinas@arm.com>
> > Cc: Will Deacon <will@kernel.org>
> > Cc: Ard Biesheuvel <ardb@kernel.org>
> > Cc: Marc Zyngier <maz@kernel.org>
> > Cc: Kristina Martsenko <kristina.martsenko@arm.com>
> > Cc: James Morse <james.morse@arm.com>
> > Cc: Steven Price <steven.price@arm.com>
> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> > Cc: Pavel Tatashin <pasha.tatashin@soleen.com>
> > Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> > Cc: Atish Patra <atish.patra@wdc.com>
> > Cc: Mike Rapoport <rppt@kernel.org>
> > Cc: Logan Gunthorpe <logang@deltatee.com>
> > Cc: Mark Brown <broonie@kernel.org>
> > To: linux-arm-kernel@lists.infradead.org
> >
> > Pingfan Liu (5):
> >   arm64/mm: introduce pgtable allocator for idmap_pg_dir and init_pg_dir
> >   arm64/mm: disable WARN_ON() and BUG_ON() in __create_pgd_mapping() if
> >     too early
> >   arm64/mm: unconditionally set IDMAP_PGTABLE_LEVELS to max pgtable
> >     level
> >   arm64/mm: make __create_pgd_mapping() capable to handle pgtable's
> >     paddr
> >   arm64/mm: use __create_pgd_mapping() to create pgtable for
> >     idmap_pg_dir and init_pg_dir
> >
> >  arch/arm64/include/asm/kernel-pgtable.h |  33 +++--
> >  arch/arm64/include/asm/pgalloc.h        |   9 ++
> >  arch/arm64/kernel/head.S                | 164 +++++++-----------------
> >  arch/arm64/mm/mmu.c                     | 108 ++++++++++++----
> >  4 files changed, 153 insertions(+), 161 deletions(-)
> >
> > --
> > 2.29.2
> >
Catalin Marinas June 8, 2021, 5:38 p.m. UTC | #3
On Tue, Jun 01, 2021 at 05:25:49PM +0800, Pingfan Liu wrote:
> On Tue, Jun 1, 2021 at 3:50 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote:
> > > v2 -> v3:
> > >   -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL,
> > > concentrate on sharing __create_pgd_mapping() in head.S as the first
> > > step.
> > >   -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5])
> > >
> > > rfc -> v2:
> > >   more debug and test
> > >
> > > *** Goal of this series ***
> > >
> > > __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) ->
> > > paddr under the MMU-on situation.  Since pgtable upper level holds the
> > > paddr of the lower level, with a slight adaptation,
> > > __create_pgd_mapping() can also set up the mapping under the MMU-off
> > > situation. ([4/5])
> > >
> > > After that, both idmap_pg_dir and init_pg_dir can be created by
> > > __create_pgd_mapping(). And the counterpart asm code can be simplified.
> >
> > I understand the desire to simplify the page table construction code
> > in head.S, but up until now, we have been very careful to avoid
> > calling into C code with the MMU off. There are simply too many
> > assumptions in the compiler about the context the generated code will
> > execute in: one example is unaligned access, which must be disabled
> > for source files that may be called with the MMU off, as otherwise,
> > the compiler is permitted to emit loads and stores that are not
> > allowed on device memory (which is the default memory type used for
> > all memory with the MMU off)
>
> You are right. These C routines happen to use "unsigned long", which
> can exclude this unaligned case.
> To make an guarantee, is "-mno-unaligned-access" good enough?
> 
> Besides unaligned-access, any further risk originating from compiler
> assumption? (I think that the common optimization: reordering,
> merging, reloading on this "device" memory has no bad effect)

There's also instrumentation that needs disabling (kasan, ubsan, kcov,
gcov). You can look at arch/arm64/kvm/hyp/nvhe/Makefile for various
flags added or filtered out, though the KVM hyp code runs with the MMU
on. I'm not sure what other flags are needed to guarantee the generated
code can run with the MMU off but we can always ask the toolchain folk.

However, I'm still not convinced about sharing __create_pgd_mapping()
with the early head.S code. A better option would be a separate,
stand-alone file where we have more control on what gets called or
accessed (of course, if there's any value in going this route).

> > Do you have a killer use case for this feature? Or is it just a nice cleanup?
>
> Yes, in the omitted part in v2, I had planned to provide an unified
> page table manipulation routine, and provide an create_idmap() API. So
> there can be an handy interface to create a whole RAM addressable
> idmapX where needed.

Where would this be needed?
Pingfan Liu June 9, 2021, 9:25 a.m. UTC | #4
On Tue, Jun 08, 2021 at 06:38:07PM +0100, Catalin Marinas wrote:
> On Tue, Jun 01, 2021 at 05:25:49PM +0800, Pingfan Liu wrote:
> > On Tue, Jun 1, 2021 at 3:50 AM Ard Biesheuvel <ardb@kernel.org> wrote:
> > > On Mon, 31 May 2021 at 10:46, Pingfan Liu <kernelfans@gmail.com> wrote:
> > > > v2 -> v3:
> > > >   -1. leave out the part of redefinition the CONFIG_PGTABLE_LEVEL,
> > > > concentrate on sharing __create_pgd_mapping() in head.S as the first
> > > > step.
> > > >   -2. make IDMAP_PGTABLE_LEVELS use the max value ([3/5])
> > > >
> > > > rfc -> v2:
> > > >   more debug and test
> > > >
> > > > *** Goal of this series ***
> > > >
> > > > __create_pgd_mapping() sets up the pgtable for mapping __va(paddr) ->
> > > > paddr under the MMU-on situation.  Since pgtable upper level holds the
> > > > paddr of the lower level, with a slight adaptation,
> > > > __create_pgd_mapping() can also set up the mapping under the MMU-off
> > > > situation. ([4/5])
> > > >
> > > > After that, both idmap_pg_dir and init_pg_dir can be created by
> > > > __create_pgd_mapping(). And the counterpart asm code can be simplified.
> > >
> > > I understand the desire to simplify the page table construction code
> > > in head.S, but up until now, we have been very careful to avoid
> > > calling into C code with the MMU off. There are simply too many
> > > assumptions in the compiler about the context the generated code will
> > > execute in: one example is unaligned access, which must be disabled
> > > for source files that may be called with the MMU off, as otherwise,
> > > the compiler is permitted to emit loads and stores that are not
> > > allowed on device memory (which is the default memory type used for
> > > all memory with the MMU off)
> >
> > You are right. These C routines happen to use "unsigned long", which
> > can exclude this unaligned case.
> > To make an guarantee, is "-mno-unaligned-access" good enough?
> > 

Unaligned-access is the main challedge, and the unaligned-access should be caused from the programer.
Compiler has enforced different default alignment, including global and local(stack) variable, function alignment.
Otherwise, the atomity can not be assumped on some base data type.

But some code may violate the alignment, which compiler has no method to prevent.
As Documentation/core-api/unaligned-memory-access.rst
  1. Casting variables to types of different lengths
  2. Pointer arithmetic followed by access to at least 2 bytes of data

So in order to prevent the unaligned-access, the involved codes should be checked carefully.

> > Besides unaligned-access, any further risk originating from compiler
> > assumption? (I think that the common optimization: reordering,
> > merging, reloading on this "device" memory has no bad effect)
> 

These should be not issue. Since MMU-off present a more strict memory model than MMU-on.
So if a code can run on a more relaxed one, it can also run on the strict one.

> There's also instrumentation that needs disabling (kasan, ubsan, kcov,
> gcov). You can look at arch/arm64/kvm/hyp/nvhe/Makefile for various
> flags added or filtered out, though the KVM hyp code runs with the MMU
> on. I'm not sure what other flags are needed to guarantee the generated
> code can run with the MMU off but we can always ask the toolchain folk.
> 

Thanks for pointing out these issues. I will check.

> However, I'm still not convinced about sharing __create_pgd_mapping()
> with the early head.S code. A better option would be a separate,
> stand-alone file where we have more control on what gets called or
> accessed (of course, if there's any value in going this route).
> 
> > > Do you have a killer use case for this feature? Or is it just a nice cleanup?
> >
> > Yes, in the omitted part in v2, I had planned to provide an unified
> > page table manipulation routine, and provide an create_idmap() API. So
> > there can be an handy interface to create a whole RAM addressable
> > idmapX where needed.
> 
> Where would this be needed?
> 

It was planned for Pavel's series: [PATCH v13 00/18] arm64: MMU enabled kexec relocation,
where Pavel uses linear mapping to copy the data. But that way should involve
copy of "EL2 vectors" to tackle the transition to EL2.  On the other hand, if
using a global addressable idmap, the logic of transition to EL2 keeps
untouched, and just enable MMU in SYM_CODE_START(arm64_relocate_new_kernel)

The other call site is in trans_pgd_idmap_page() 


Thanks,

Pingfan