Message ID | 20220421140339.1329019-2-ardb@kernel.org (mailing list archive)
---|---
State | New, archived
Series | arm64: apply G-to-nG conversion for KPTI with MMU enabled
Hi Ard,

This is a really nice cleanup!

On Thu, Apr 21, 2022 at 04:03:38PM +0200, Ard Biesheuvel wrote:
> Simplify the KPTI G-to-nG asm helper code by:
> - pulling the 'table bit' test into the get/put macros so we can combine
>   them and incorporate the entire loop;
> - moving the 'table bit' test after the update of bit #11 so we no
>   longer need separate next_xxx and skip_xxx labels;
> - redefining the pmd/pud register aliases and the next_pmd/next_pud
>   labels instead of branching to them if the number of configured page
>   table levels is less than 3 or 4, respectively;

All of this looks good to me.

> - folding the descriptor pointer increment into the LDR instructions.

I think this leads to issuing a CMO to the wrong address; explained below
with a proposed fixup.

> No functional change intended, except for the fact that we now descend
> into a next level table after setting bit #11 on its descriptor, but
> this should make no difference in practice.

Since we don't play games with recursive tables, I agree that should be
fine.

> While at it, switch to .L prefixed local labels so they don't clutter up
> the symbol tables, kallsyms, etc, and clean up the indentation for
> legibility.

This also sounds good to me.

> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>

If you're happy with my proposed fixup below:

Reviewed-by: Mark Rutland <mark.rutland@arm.com>

With that applied, I've tested this in VMs on ThunderX2 (with KPTI forced
on and kaslr_requires_kpti() hacked to false) in the following
configurations:

* 4K / 39-bit VAs / 3 levels
* 4K / 48-bit VAs / 4 levels
* 4K / 48-bit VAs / 4 levels + KASAN (to test shadow skipping)
* 64K / 42-bit VAs / 2 levels
* 64K / 48-bit VAs / 3 levels
* 64K / 52-bit VAs / 3 levels

... in each case checking the resulting kernel tables with ptdump.

In all cases that looked good, so (with the fixup):

Tested-by: Mark Rutland <mark.rutland@arm.com>

Beware when looking at the ptdump output that we have 3 KPTI trampoline
pages in the fixmap region which are (legitimately) executable and global;
I got very confused when I first spotted that!

> ---
>  arch/arm64/mm/proc.S | 97 +++++++-------------
>  1 file changed, 34 insertions(+), 63 deletions(-)
>
> diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> index 50bbed947bec..5619c00f8cd4 100644
> --- a/arch/arm64/mm/proc.S
> +++ b/arch/arm64/mm/proc.S
> @@ -202,19 +202,24 @@ SYM_FUNC_END(idmap_cpu_replace_ttbr1)
>  #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
>  	.pushsection ".idmap.text", "awx"
>
> -	.macro	__idmap_kpti_get_pgtable_ent, type
> +	.macro	kpti_mk_tbl_ng, type, num_entries
> +	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
> +.Ldo_\type:
>  	dc	cvac, cur_\()\type\()p		// Ensure any existing dirty

Could we clean up the (existing) extraneous `\()` here too? That'd make it
easier to spot the `cur_\type\()p` pattern consistently.

>  	dmb	sy				// lines are written back before
> -	ldr	\type, [cur_\()\type\()p]	// loading the entry
> -	tbz	\type, #0, skip_\()\type	// Skip invalid and
> -	tbnz	\type, #11, skip_\()\type	// non-global entries
> -	.endm
> -
> -	.macro	__idmap_kpti_put_pgtable_ent_ng, type
> +	ldr	\type, [cur_\type\()p], #8	// loading the entry

So now `cur_\type\()p` points at the *next* entry, which is a bit
confusing ...

> +	tbz	\type, #0, .Lnext_\type		// Skip invalid and
> +	tbnz	\type, #11, .Lnext_\type	// non-global entries
>  	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> -	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
> +	str	\type, [cur_\type\()p, #-8]	// Update the entry and ensure

... though we handle the offset correctly here ...

>  	dmb	sy				// that it is visible to all
>  	dc	civac, cur_\()\type\()p		// CPUs.

... but here we don't, and perform the CMO for the *next* entry. So for
the last entry in each cacheline we're missing an invalidate.

> +	.ifnc	\type, pte
> +	tbnz	\type, #1, .Lderef_\type
> +	.endif
> +.Lnext_\type:
> +	cmp	cur_\type\()p, end_\type\()p
> +	b.ne	.Ldo_\type
>  	.endm

I reckon it'd be slightly clearer and safer to have an explicit `ADD` at
the start of `.Lnext_\type`, i.e.

|	.macro	kpti_mk_tbl_ng, type, num_entries
|	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
| .Ldo_\type:
|	dc	cvac, cur_\type\()p		// Ensure any existing dirty
|	dmb	sy				// lines are written back before
|	ldr	\type, [cur_\type\()p]		// loading the entry
|	tbz	\type, #0, .Lnext_\type		// Skip invalid and
|	tbnz	\type, #11, .Lnext_\type	// non-global entries
|	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
|	str	\type, [cur_\type\()p]		// Update the entry and ensure
|	dmb	sy				// that it is visible to all
|	dc	civac, cur_\()\type\()p		// CPUs.
|	.ifnc	\type, pte
|	tbnz	\type, #1, .Lderef_\type
|	.endif
| .Lnext_\type:
|	add	cur_\type\()p, cur_\type\()p, #8
|	cmp	cur_\type\()p, end_\type\()p
|	b.ne	.Ldo_\type
|	.endm

Other than that, this looks good to me.

Thanks,
Mark.

>  	/*
> @@ -235,10 +240,8 @@ SYM_FUNC_START(idmap_kpti_install_ng_mappings)
>  	pgd		.req	x7
>  	cur_pudp	.req	x8
>  	end_pudp	.req	x9
> -	pud		.req	x10
>  	cur_pmdp	.req	x11
>  	end_pmdp	.req	x12
> -	pmd		.req	x13
>  	cur_ptep	.req	x14
>  	end_ptep	.req	x15
>  	pte		.req	x16
> @@ -265,16 +268,8 @@ SYM_FUNC_START(idmap_kpti_install_ng_mappings)
>
>  	/* Everybody is enjoying the idmap, so we can rewrite swapper. */
>  	/* PGD */
> -	mov	cur_pgdp, swapper_pa
> -	add	end_pgdp, cur_pgdp, #(PTRS_PER_PGD * 8)
> -do_pgd:	__idmap_kpti_get_pgtable_ent	pgd
> -	tbnz	pgd, #1, walk_puds
> -next_pgd:
> -	__idmap_kpti_put_pgtable_ent_ng	pgd
> -skip_pgd:
> -	add	cur_pgdp, cur_pgdp, #8
> -	cmp	cur_pgdp, end_pgdp
> -	b.ne	do_pgd
> +	mov	cur_pgdp, swapper_pa
> +	kpti_mk_tbl_ng	pgd, PTRS_PER_PGD
>
>  	/* Publish the updated tables and nuke all the TLBs */
>  	dsb	sy
> @@ -291,59 +286,35 @@ skip_pgd:
>  	str	wzr, [flag_ptr]
>  	ret
>
> +.Lderef_pgd:
>  	/* PUD */
> -walk_puds:
> -	.if CONFIG_PGTABLE_LEVELS > 3
> +	.if		CONFIG_PGTABLE_LEVELS > 3
> +	pud		.req	x10
>  	pte_to_phys	cur_pudp, pgd
> -	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
> -do_pud:	__idmap_kpti_get_pgtable_ent	pud
> -	tbnz	pud, #1, walk_pmds
> -next_pud:
> -	__idmap_kpti_put_pgtable_ent_ng	pud
> -skip_pud:
> -	add	cur_pudp, cur_pudp, 8
> -	cmp	cur_pudp, end_pudp
> -	b.ne	do_pud
> -	b	next_pgd
> -	.else /* CONFIG_PGTABLE_LEVELS <= 3 */
> -	mov	pud, pgd
> -	b	walk_pmds
> -next_pud:
> -	b	next_pgd
> +	kpti_mk_tbl_ng	pud, PTRS_PER_PUD
> +	b	.Lnext_pgd
> +	.else		/* CONFIG_PGTABLE_LEVELS <= 3 */
> +	pud		.req	pgd
> +	.set		.Lnext_pud, .Lnext_pgd
>  	.endif
>
> +.Lderef_pud:
>  	/* PMD */
> -walk_pmds:
> -	.if CONFIG_PGTABLE_LEVELS > 2
> +	.if		CONFIG_PGTABLE_LEVELS > 2
> +	pmd		.req	x13
>  	pte_to_phys	cur_pmdp, pud
> -	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
> -do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
> -	tbnz	pmd, #1, walk_ptes
> -next_pmd:
> -	__idmap_kpti_put_pgtable_ent_ng	pmd
> -skip_pmd:
> -	add	cur_pmdp, cur_pmdp, #8
> -	cmp	cur_pmdp, end_pmdp
> -	b.ne	do_pmd
> -	b	next_pud
> -	.else /* CONFIG_PGTABLE_LEVELS <= 2 */
> -	mov	pmd, pud
> -	b	walk_ptes
> -next_pmd:
> -	b	next_pud
> +	kpti_mk_tbl_ng	pmd, PTRS_PER_PMD
> +	b	.Lnext_pud
> +	.else		/* CONFIG_PGTABLE_LEVELS <= 2 */
> +	pmd		.req	pgd
> +	.set		.Lnext_pmd, .Lnext_pgd
>  	.endif
>
> +.Lderef_pmd:
>  	/* PTE */
> -walk_ptes:
>  	pte_to_phys	cur_ptep, pmd
> -	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
> -do_pte:	__idmap_kpti_get_pgtable_ent	pte
> -	__idmap_kpti_put_pgtable_ent_ng	pte
> -skip_pte:
> -	add	cur_ptep, cur_ptep, #8
> -	cmp	cur_ptep, end_ptep
> -	b.ne	do_pte
> -	b	next_pmd
> +	kpti_mk_tbl_ng	pte, PTRS_PER_PTE
> +	b	.Lnext_pmd
>
>  	.unreq	cpu
>  	.unreq	num_cpus
> --
> 2.30.2
>
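For reference, the addressing issue Mark describes can be seen in a minimal
stand-alone sketch of the folded-increment sequence (x0 and x1 here are
placeholder registers for illustration, not names used in the patch):

	ldr	x1, [x0], #8		// load entry N; x0 now points at entry N+1
	orr	x1, x1, #(1 << 11)	// set the nG bit (bit #11, i.e. PTE_NG)
	str	x1, [x0, #-8]		// the #-8 offset compensates: stores to entry N
	dmb	sy
	dc	civac, x0		// but this CMO uses entry N+1's address; when
					// entry N is the last one in its cacheline,
					// N's line is never cleaned/invalidated

With Mark's fixup, the pointer is only advanced at `.Lnext_\type`, after the
`dc civac`, so the load, store and CMO all refer to the same entry.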
On Thu, 19 May 2022 at 19:09, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Ard,
>
> This is a really nice cleanup!
>

Thanks!

> On Thu, Apr 21, 2022 at 04:03:38PM +0200, Ard Biesheuvel wrote:
> > Simplify the KPTI G-to-nG asm helper code by:
> > - pulling the 'table bit' test into the get/put macros so we can combine
> >   them and incorporate the entire loop;
> > - moving the 'table bit' test after the update of bit #11 so we no
> >   longer need separate next_xxx and skip_xxx labels;
> > - redefining the pmd/pud register aliases and the next_pmd/next_pud
> >   labels instead of branching to them if the number of configured page
> >   table levels is less than 3 or 4, respectively;
>
> All of this looks good to me.
>
> > - folding the descriptor pointer increment into the LDR instructions.
>
> I think this leads to issuing a CMO to the wrong address; explained below
> with a proposed fixup.
>
> > No functional change intended, except for the fact that we now descend
> > into a next level table after setting bit #11 on its descriptor, but
> > this should make no difference in practice.
>
> Since we don't play games with recursive tables, I agree that should be
> fine.
>
> > While at it, switch to .L prefixed local labels so they don't clutter up
> > the symbol tables, kallsyms, etc, and clean up the indentation for
> > legibility.
>
> This also sounds good to me.
>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
>
> If you're happy with my proposed fixup below:
>
> Reviewed-by: Mark Rutland <mark.rutland@arm.com>
>
> With that applied, I've tested this in VMs on ThunderX2 (with KPTI forced
> on and kaslr_requires_kpti() hacked to false) in the following
> configurations:
>
> * 4K / 39-bit VAs / 3 levels
> * 4K / 48-bit VAs / 4 levels
> * 4K / 48-bit VAs / 4 levels + KASAN (to test shadow skipping)
> * 64K / 42-bit VAs / 2 levels
> * 64K / 48-bit VAs / 3 levels
> * 64K / 52-bit VAs / 3 levels
>
> ... in each case checking the resulting kernel tables with ptdump.
>
> In all cases that looked good, so (with the fixup):
>
> Tested-by: Mark Rutland <mark.rutland@arm.com>
>
> Beware when looking at the ptdump output that we have 3 KPTI trampoline
> pages in the fixmap region which are (legitimately) executable and global;
> I got very confused when I first spotted that!
>
> > ---
> >  arch/arm64/mm/proc.S | 97 +++++++-------------
> >  1 file changed, 34 insertions(+), 63 deletions(-)
> >
> > diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
> > index 50bbed947bec..5619c00f8cd4 100644
> > --- a/arch/arm64/mm/proc.S
> > +++ b/arch/arm64/mm/proc.S
> > @@ -202,19 +202,24 @@ SYM_FUNC_END(idmap_cpu_replace_ttbr1)
> >  #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
> >  	.pushsection ".idmap.text", "awx"
> >
> > -	.macro	__idmap_kpti_get_pgtable_ent, type
> > +	.macro	kpti_mk_tbl_ng, type, num_entries
> > +	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
> > +.Ldo_\type:
> >  	dc	cvac, cur_\()\type\()p		// Ensure any existing dirty
>
> Could we clean up the (existing) extraneous `\()` here too? That'd make it
> easier to spot the `cur_\type\()p` pattern consistently.
>
> >  	dmb	sy				// lines are written back before
> > -	ldr	\type, [cur_\()\type\()p]	// loading the entry
> > -	tbz	\type, #0, skip_\()\type	// Skip invalid and
> > -	tbnz	\type, #11, skip_\()\type	// non-global entries
> > -	.endm
> > -
> > -	.macro	__idmap_kpti_put_pgtable_ent_ng, type
> > +	ldr	\type, [cur_\type\()p], #8	// loading the entry
>
> So now `cur_\type\()p` points at the *next* entry, which is a bit
> confusing ...
>
> > +	tbz	\type, #0, .Lnext_\type		// Skip invalid and
> > +	tbnz	\type, #11, .Lnext_\type	// non-global entries
> >  	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> > -	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
> > +	str	\type, [cur_\type\()p, #-8]	// Update the entry and ensure
>
> ... though we handle the offset correctly here ...
>
> >  	dmb	sy				// that it is visible to all
> >  	dc	civac, cur_\()\type\()p		// CPUs.
>
> ... but here we don't, and perform the CMO for the *next* entry. So for
> the last entry in each cacheline we're missing an invalidate.
>

Yeah, this is a consequence of splitting off these changes: the next patch
gets rid of the CMOs, and so it removes the problem as well.

So either we merge the patches again, or apply your fixup. I don't have a
strong preference either way.

> > +	.ifnc	\type, pte
> > +	tbnz	\type, #1, .Lderef_\type
> > +	.endif
> > +.Lnext_\type:
> > +	cmp	cur_\type\()p, end_\type\()p
> > +	b.ne	.Ldo_\type
> >  	.endm
>
> I reckon it'd be slightly clearer and safer to have an explicit `ADD` at
> the start of `.Lnext_\type`, i.e.
>
> |	.macro	kpti_mk_tbl_ng, type, num_entries
> |	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
> | .Ldo_\type:
> |	dc	cvac, cur_\type\()p		// Ensure any existing dirty
> |	dmb	sy				// lines are written back before
> |	ldr	\type, [cur_\type\()p]		// loading the entry
> |	tbz	\type, #0, .Lnext_\type		// Skip invalid and
> |	tbnz	\type, #11, .Lnext_\type	// non-global entries
> |	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
> |	str	\type, [cur_\type\()p]		// Update the entry and ensure
> |	dmb	sy				// that it is visible to all
> |	dc	civac, cur_\()\type\()p		// CPUs.
> |	.ifnc	\type, pte
> |	tbnz	\type, #1, .Lderef_\type
> |	.endif
> | .Lnext_\type:
> |	add	cur_\type\()p, cur_\type\()p, #8
> |	cmp	cur_\type\()p, end_\type\()p
> |	b.ne	.Ldo_\type
> |	.endm
>
> Other than that, this looks good to me.
>
> Thanks,
> Mark.
>
> >  	/*
> > @@ -235,10 +240,8 @@ SYM_FUNC_START(idmap_kpti_install_ng_mappings)
> >  	pgd		.req	x7
> >  	cur_pudp	.req	x8
> >  	end_pudp	.req	x9
> > -	pud		.req	x10
> >  	cur_pmdp	.req	x11
> >  	end_pmdp	.req	x12
> > -	pmd		.req	x13
> >  	cur_ptep	.req	x14
> >  	end_ptep	.req	x15
> >  	pte		.req	x16
> > @@ -265,16 +268,8 @@ SYM_FUNC_START(idmap_kpti_install_ng_mappings)
> >
> >  	/* Everybody is enjoying the idmap, so we can rewrite swapper. */
> >  	/* PGD */
> > -	mov	cur_pgdp, swapper_pa
> > -	add	end_pgdp, cur_pgdp, #(PTRS_PER_PGD * 8)
> > -do_pgd:	__idmap_kpti_get_pgtable_ent	pgd
> > -	tbnz	pgd, #1, walk_puds
> > -next_pgd:
> > -	__idmap_kpti_put_pgtable_ent_ng	pgd
> > -skip_pgd:
> > -	add	cur_pgdp, cur_pgdp, #8
> > -	cmp	cur_pgdp, end_pgdp
> > -	b.ne	do_pgd
> > +	mov	cur_pgdp, swapper_pa
> > +	kpti_mk_tbl_ng	pgd, PTRS_PER_PGD
> >
> >  	/* Publish the updated tables and nuke all the TLBs */
> >  	dsb	sy
> > @@ -291,59 +286,35 @@ skip_pgd:
> >  	str	wzr, [flag_ptr]
> >  	ret
> >
> > +.Lderef_pgd:
> >  	/* PUD */
> > -walk_puds:
> > -	.if CONFIG_PGTABLE_LEVELS > 3
> > +	.if		CONFIG_PGTABLE_LEVELS > 3
> > +	pud		.req	x10
> >  	pte_to_phys	cur_pudp, pgd
> > -	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
> > -do_pud:	__idmap_kpti_get_pgtable_ent	pud
> > -	tbnz	pud, #1, walk_pmds
> > -next_pud:
> > -	__idmap_kpti_put_pgtable_ent_ng	pud
> > -skip_pud:
> > -	add	cur_pudp, cur_pudp, 8
> > -	cmp	cur_pudp, end_pudp
> > -	b.ne	do_pud
> > -	b	next_pgd
> > -	.else /* CONFIG_PGTABLE_LEVELS <= 3 */
> > -	mov	pud, pgd
> > -	b	walk_pmds
> > -next_pud:
> > -	b	next_pgd
> > +	kpti_mk_tbl_ng	pud, PTRS_PER_PUD
> > +	b	.Lnext_pgd
> > +	.else		/* CONFIG_PGTABLE_LEVELS <= 3 */
> > +	pud		.req	pgd
> > +	.set		.Lnext_pud, .Lnext_pgd
> >  	.endif
> >
> > +.Lderef_pud:
> >  	/* PMD */
> > -walk_pmds:
> > -	.if CONFIG_PGTABLE_LEVELS > 2
> > +	.if		CONFIG_PGTABLE_LEVELS > 2
> > +	pmd		.req	x13
> >  	pte_to_phys	cur_pmdp, pud
> > -	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
> > -do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
> > -	tbnz	pmd, #1, walk_ptes
> > -next_pmd:
> > -	__idmap_kpti_put_pgtable_ent_ng	pmd
> > -skip_pmd:
> > -	add	cur_pmdp, cur_pmdp, #8
> > -	cmp	cur_pmdp, end_pmdp
> > -	b.ne	do_pmd
> > -	b	next_pud
> > -	.else /* CONFIG_PGTABLE_LEVELS <= 2 */
> > -	mov	pmd, pud
> > -	b	walk_ptes
> > -next_pmd:
> > -	b	next_pud
> > +	kpti_mk_tbl_ng	pmd, PTRS_PER_PMD
> > +	b	.Lnext_pud
> > +	.else		/* CONFIG_PGTABLE_LEVELS <= 2 */
> > +	pmd		.req	pgd
> > +	.set		.Lnext_pmd, .Lnext_pgd
> >  	.endif
> >
> > +.Lderef_pmd:
> >  	/* PTE */
> > -walk_ptes:
> >  	pte_to_phys	cur_ptep, pmd
> > -	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
> > -do_pte:	__idmap_kpti_get_pgtable_ent	pte
> > -	__idmap_kpti_put_pgtable_ent_ng	pte
> > -skip_pte:
> > -	add	cur_ptep, cur_ptep, #8
> > -	cmp	cur_ptep, end_ptep
> > -	b.ne	do_pte
> > -	b	next_pmd
> > +	kpti_mk_tbl_ng	pte, PTRS_PER_PTE
> > +	b	.Lnext_pmd
> >
> >  	.unreq	cpu
> >  	.unreq	num_cpus
> > --
> > 2.30.2
> >
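As background for the `.req`/`.set` trick in the patch's `.else` branches, a
tiny stand-alone sketch may help (the names foo, bar, .La and .Lb are made up
for this illustration and appear nowhere in the patch):

	foo	.req	x0		// 'foo' is now another name for x0
	bar	.req	foo		// an alias may itself be aliased, as in 'pud .req pgd'
.La:
	add	bar, bar, #1		// assembles exactly as: add x0, x0, #1
	.set	.Lb, .La		// '.Lb' resolves to the address of '.La'
	b	.Lb			// so this branches back to .La

This is why, with fewer page table levels configured, the pud (or pmd) walk
collapses into the pgd walk without emitting any extra instructions: both the
register alias and the label alias are resolved at assembly time and cost
nothing at run time.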
diff --git a/arch/arm64/mm/proc.S b/arch/arm64/mm/proc.S
index 50bbed947bec..5619c00f8cd4 100644
--- a/arch/arm64/mm/proc.S
+++ b/arch/arm64/mm/proc.S
@@ -202,19 +202,24 @@ SYM_FUNC_END(idmap_cpu_replace_ttbr1)
 #ifdef CONFIG_UNMAP_KERNEL_AT_EL0
 	.pushsection ".idmap.text", "awx"

-	.macro	__idmap_kpti_get_pgtable_ent, type
+	.macro	kpti_mk_tbl_ng, type, num_entries
+	add	end_\type\()p, cur_\type\()p, #\num_entries * 8
+.Ldo_\type:
 	dc	cvac, cur_\()\type\()p		// Ensure any existing dirty
 	dmb	sy				// lines are written back before
-	ldr	\type, [cur_\()\type\()p]	// loading the entry
-	tbz	\type, #0, skip_\()\type	// Skip invalid and
-	tbnz	\type, #11, skip_\()\type	// non-global entries
-	.endm
-
-	.macro	__idmap_kpti_put_pgtable_ent_ng, type
+	ldr	\type, [cur_\type\()p], #8	// loading the entry
+	tbz	\type, #0, .Lnext_\type		// Skip invalid and
+	tbnz	\type, #11, .Lnext_\type	// non-global entries
 	orr	\type, \type, #PTE_NG		// Same bit for blocks and pages
-	str	\type, [cur_\()\type\()p]	// Update the entry and ensure
+	str	\type, [cur_\type\()p, #-8]	// Update the entry and ensure
 	dmb	sy				// that it is visible to all
 	dc	civac, cur_\()\type\()p		// CPUs.
+	.ifnc	\type, pte
+	tbnz	\type, #1, .Lderef_\type
+	.endif
+.Lnext_\type:
+	cmp	cur_\type\()p, end_\type\()p
+	b.ne	.Ldo_\type
 	.endm

 	/*
@@ -235,10 +240,8 @@ SYM_FUNC_START(idmap_kpti_install_ng_mappings)
 	pgd		.req	x7
 	cur_pudp	.req	x8
 	end_pudp	.req	x9
-	pud		.req	x10
 	cur_pmdp	.req	x11
 	end_pmdp	.req	x12
-	pmd		.req	x13
 	cur_ptep	.req	x14
 	end_ptep	.req	x15
 	pte		.req	x16
@@ -265,16 +268,8 @@ SYM_FUNC_START(idmap_kpti_install_ng_mappings)

 	/* Everybody is enjoying the idmap, so we can rewrite swapper. */
 	/* PGD */
-	mov	cur_pgdp, swapper_pa
-	add	end_pgdp, cur_pgdp, #(PTRS_PER_PGD * 8)
-do_pgd:	__idmap_kpti_get_pgtable_ent	pgd
-	tbnz	pgd, #1, walk_puds
-next_pgd:
-	__idmap_kpti_put_pgtable_ent_ng	pgd
-skip_pgd:
-	add	cur_pgdp, cur_pgdp, #8
-	cmp	cur_pgdp, end_pgdp
-	b.ne	do_pgd
+	mov	cur_pgdp, swapper_pa
+	kpti_mk_tbl_ng	pgd, PTRS_PER_PGD

 	/* Publish the updated tables and nuke all the TLBs */
 	dsb	sy
@@ -291,59 +286,35 @@ skip_pgd:
 	str	wzr, [flag_ptr]
 	ret

+.Lderef_pgd:
 	/* PUD */
-walk_puds:
-	.if CONFIG_PGTABLE_LEVELS > 3
+	.if		CONFIG_PGTABLE_LEVELS > 3
+	pud		.req	x10
 	pte_to_phys	cur_pudp, pgd
-	add	end_pudp, cur_pudp, #(PTRS_PER_PUD * 8)
-do_pud:	__idmap_kpti_get_pgtable_ent	pud
-	tbnz	pud, #1, walk_pmds
-next_pud:
-	__idmap_kpti_put_pgtable_ent_ng	pud
-skip_pud:
-	add	cur_pudp, cur_pudp, 8
-	cmp	cur_pudp, end_pudp
-	b.ne	do_pud
-	b	next_pgd
-	.else /* CONFIG_PGTABLE_LEVELS <= 3 */
-	mov	pud, pgd
-	b	walk_pmds
-next_pud:
-	b	next_pgd
+	kpti_mk_tbl_ng	pud, PTRS_PER_PUD
+	b	.Lnext_pgd
+	.else		/* CONFIG_PGTABLE_LEVELS <= 3 */
+	pud		.req	pgd
+	.set		.Lnext_pud, .Lnext_pgd
 	.endif

+.Lderef_pud:
 	/* PMD */
-walk_pmds:
-	.if CONFIG_PGTABLE_LEVELS > 2
+	.if		CONFIG_PGTABLE_LEVELS > 2
+	pmd		.req	x13
 	pte_to_phys	cur_pmdp, pud
-	add	end_pmdp, cur_pmdp, #(PTRS_PER_PMD * 8)
-do_pmd:	__idmap_kpti_get_pgtable_ent	pmd
-	tbnz	pmd, #1, walk_ptes
-next_pmd:
-	__idmap_kpti_put_pgtable_ent_ng	pmd
-skip_pmd:
-	add	cur_pmdp, cur_pmdp, #8
-	cmp	cur_pmdp, end_pmdp
-	b.ne	do_pmd
-	b	next_pud
-	.else /* CONFIG_PGTABLE_LEVELS <= 2 */
-	mov	pmd, pud
-	b	walk_ptes
-next_pmd:
-	b	next_pud
+	kpti_mk_tbl_ng	pmd, PTRS_PER_PMD
+	b	.Lnext_pud
+	.else		/* CONFIG_PGTABLE_LEVELS <= 2 */
+	pmd		.req	pgd
+	.set		.Lnext_pmd, .Lnext_pgd
 	.endif

+.Lderef_pmd:
 	/* PTE */
-walk_ptes:
 	pte_to_phys	cur_ptep, pmd
-	add	end_ptep, cur_ptep, #(PTRS_PER_PTE * 8)
-do_pte:	__idmap_kpti_get_pgtable_ent	pte
-	__idmap_kpti_put_pgtable_ent_ng	pte
-skip_pte:
-	add	cur_ptep, cur_ptep, #8
-	cmp	cur_ptep, end_ptep
-	b.ne	do_pte
-	b	next_pmd
+	kpti_mk_tbl_ng	pte, PTRS_PER_PTE
+	b	.Lnext_pmd

 	.unreq	cpu
 	.unreq	num_cpus
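Since the macro is expanded once per level, it may help to see roughly what
`kpti_mk_tbl_ng pte, PTRS_PER_PTE` unrolls to; the `.ifnc \type, pte` guard
means the PTE-level copy has no deref branch. This is a hand expansion for
illustration, not extra code in the patch:

	add	end_ptep, cur_ptep, #PTRS_PER_PTE * 8
.Ldo_pte:
	dc	cvac, cur_ptep		// clean the line holding the entry
	dmb	sy
	ldr	pte, [cur_ptep], #8	// load the entry, post-incrementing the pointer
	tbz	pte, #0, .Lnext_pte	// skip invalid entries
	tbnz	pte, #11, .Lnext_pte	// skip entries that are already nG
	orr	pte, pte, #PTE_NG
	str	pte, [cur_ptep, #-8]	// write back to the entry just loaded
	dmb	sy
	dc	civac, cur_ptep		// the CMO Mark flags as hitting the next entry
.Lnext_pte:
	cmp	cur_ptep, end_ptep
	b.ne	.Ldo_pte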
Simplify the KPTI G-to-nG asm helper code by:
- pulling the 'table bit' test into the get/put macros so we can combine
  them and incorporate the entire loop;
- moving the 'table bit' test after the update of bit #11 so we no longer
  need separate next_xxx and skip_xxx labels;
- redefining the pmd/pud register aliases and the next_pmd/next_pud labels
  instead of branching to them if the number of configured page table
  levels is less than 3 or 4, respectively;
- folding the descriptor pointer increment into the LDR instructions.

No functional change intended, except for the fact that we now descend
into a next level table after setting bit #11 on its descriptor, but this
should make no difference in practice.

While at it, switch to .L prefixed local labels so they don't clutter up
the symbol tables, kallsyms, etc, and clean up the indentation for
legibility.

Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/mm/proc.S | 97 +++++++-------------
 1 file changed, 34 insertions(+), 63 deletions(-)
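On the `.L` prefix mentioned above: GNU as treats symbols starting with `.L`
as assembler-local on ELF targets, so they are resolved at assembly time and
never emitted into the object's symbol table (and therefore never show up in
kallsyms). A minimal illustration, with made-up labels that are not from the
patch:

visible_label:
	nop
.Lhidden_label:
	nop
	b	.Lhidden_label		// resolved within this object; no symbol emitted

Running nm on the assembled object lists visible_label but nothing for
.Lhidden_label, which is exactly why the patch's walk labels no longer
clutter the symbol tables.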