[2/2] arm64: mm: always map fixmap at page granularity

Message ID 20230314142125.502043-3-mark.rutland@arm.com (mailing list archive)
State New, archived
Series arm64: simplify fixmap code

Commit Message

Mark Rutland March 14, 2023, 2:21 p.m. UTC
Today the fixmap code largely maps elements at PAGE_SIZE granularity,
but we special-case the FDT mapping such that it can be mapped with 2M
block mappings when 4K pages are in use. The original rationale for this
was simplicity, but it has some unfortunate side-effects, and
complicates portions of the fixmap code (i.e. is not so simple after
all).

The FDT can be up to 2M in size but is only required to have 8-byte
alignment, and so it may straddle a 2M boundary. Thus when using 2M
block mappings we may map up to 4M of memory surrounding the FDT. This
is unfortunate as most of that memory will be unrelated to the FDT, and
any pages which happen to share a 2M block with the FDT will be mapped
with Normal Write-Back Cacheable attributes, which might not be what we
want elsewhere (e.g. for carve-outs using Non-Cacheable attributes).

The logic to handle mapping the FDT with 2M blocks requires some special
cases in the fixmap code, and ties it to the early page table
configuration by virtue of the SWAPPER_TABLE_SHIFT and
SWAPPER_BLOCK_SIZE constants used to determine the granularity used to
map the FDT.

This patch simplifies the FDT logic and removes the unnecessary mappings
of surrounding pages by always mapping the FDT at page granularity as
with all other fixmap mappings. To do so we statically reserve multiple
PTE tables to cover the fixmap VA range. Since the FDT can be at most
2M, for 4K pages we only need to allocate a single additional PTE table,
and for 16K and 64K pages the existing single PTE table is sufficient.

The PTE table allocation scales with the number of slots reserved in the
fixmap, and so this also makes it easier to add more fixmap entries if
we require those in future.

Our VA layout means that the fixmap will always fall within a single PMD
table (and consequently, within a single PUD/P4D/PGD entry), which we
can verify at compile time with a static_assert(). With that assert a
number of runtime warnings become impossible, and are removed.

I've boot-tested this patch with both 4K and 64K pages.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Anshuman Khandual <anshuman.khandual@arm.com>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Ryan Roberts <ryan.roberts@arm.com>
Cc: Will Deacon <will@kernel.org>
---
 arch/arm64/include/asm/fixmap.h         |  14 +--
 arch/arm64/include/asm/kernel-pgtable.h |   5 +-
 arch/arm64/mm/fixmap.c                  | 148 +++++++++++-------------
 3 files changed, 77 insertions(+), 90 deletions(-)

Comments

Ryan Roberts March 15, 2023, 10:23 a.m. UTC | #1
On 14/03/2023 14:21, Mark Rutland wrote:
> Today the fixmap code largely maps elements at PAGE_SIZE granularity,
> but we special-case the FDT mapping such that it can be mapped with 2M
> block mappings when 4K pages are in use. The original rationale for this
> was simplicity, but it has some unfortunate side-effects, and
> complicates portions of the fixmap code (i.e. is not so simple after
> all).
> 
> The FDT can be up to 2M in size but is only required to have 8-byte
> alignment, and so it may straddle a 2M boundary. Thus when using 2M
> block mappings we may map up to 4M of memory surrounding the FDT. This
> is unfortunate as most of that memory will be unrelated to the FDT, and
> any pages which happen to share a 2M block with the FDT will be mapped
> with Normal Write-Back Cacheable attributes, which might not be what we
> want elsewhere (e.g. for carve-outs using Non-Cacheable attributes).
> 
> The logic to handle mapping the FDT with 2M blocks requires some special
> cases in the fixmap code, and ties it to the early page table
> configuration by virtue of the SWAPPER_TABLE_SHIFT and
> SWAPPER_BLOCK_SIZE constants used to determine the granularity used to
> map the FDT.
> 
> This patch simplifies the FDT logic and removes the unnecessary mappings
> of surrounding pages by always mapping the FDT at page granularity as
> with all other fixmap mappings. To do so we statically reserve multiple
> PTE tables to cover the fixmap VA range. Since the FDT can be at most
> 2M, for 4K pages we only need to allocate a single additional PTE table,
> and for 16K and 64K pages the existing single PTE table is sufficient.
> 
> The PTE table allocation scales with the number of slots reserved in the
> fixmap, and so this also makes it easier to add more fixmap entries if
> we require those in future.
> 
> Our VA layout means that the fixmap will always fall within a single PMD
> table (and consequently, within a single PUD/P4D/PGD entry), which we
> can verify at compile time with a static_assert(). With that assert a
> number of runtime warnings become impossible, and are removed.
> 
> I've boot-tested this patch with both 4K and 64K pages.

I recently got caught up in the fixmap's tentacles, so your effort to improve it
is certainly welcomed by me! Review comments below.

Thanks,
Ryan


> 
> Signed-off-by: Mark Rutland <mark.rutland@arm.com>
> Cc: Anshuman Khandual <anshuman.khandual@arm.com>
> Cc: Ard Biesheuvel <ardb@kernel.org>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Ryan Roberts <ryan.roberts@arm.com>
> Cc: Will Deacon <will@kernel.org>
> ---
>  arch/arm64/include/asm/fixmap.h         |  14 +--
>  arch/arm64/include/asm/kernel-pgtable.h |   5 +-
>  arch/arm64/mm/fixmap.c                  | 148 +++++++++++-------------
>  3 files changed, 77 insertions(+), 90 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> index c153f069e9c9..c59d433c1801 100644
> --- a/arch/arm64/include/asm/fixmap.h
> +++ b/arch/arm64/include/asm/fixmap.h
> @@ -36,17 +36,13 @@ enum fixed_addresses {
>  	FIX_HOLE,
>  
>  	/*
> -	 * Reserve a virtual window for the FDT that is 2 MB larger than the
> -	 * maximum supported size, and put it at the top of the fixmap region.
> -	 * The additional space ensures that any FDT that does not exceed
> -	 * MAX_FDT_SIZE can be mapped regardless of whether it crosses any
> -	 * 2 MB alignment boundaries.
> -	 *
> -	 * Keep this at the top so it remains 2 MB aligned.
> +	 * Reserve a virtual window for the FDT that is a page bigger than the
> +	 * maximum supported size. The additional space ensures that any FDT
> +	 * that does not exceed MAX_FDT_SIZE can be mapped regardless of
> +	 * whether it crosses any page boundary.
>  	 */
> -#define FIX_FDT_SIZE		(MAX_FDT_SIZE + SZ_2M)
>  	FIX_FDT_END,
> -	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
> +	FIX_FDT = FIX_FDT_END + MAX_FDT_SIZE / PAGE_SIZE - 1,

I don't think this is consistent with your comment - needs a +1 for the extra
page? On a 4K system, FIX_FDT will be calculated as 512, but we need it to be
513 to deal with the case where the FDT is 2MB and not aligned to a page boundary.

>  
>  	FIX_EARLYCON_MEM_BASE,
>  	FIX_TEXT_POKE0,
> diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
> index fcd14197756f..186dd7f85b14 100644
> --- a/arch/arm64/include/asm/kernel-pgtable.h
> +++ b/arch/arm64/include/asm/kernel-pgtable.h
> @@ -59,8 +59,11 @@
>  #define EARLY_KASLR	(0)
>  #endif
>  
> +#define SPAN_NR_ENTRIES(vstart, vend, shift) \
> +	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
> +
>  #define EARLY_ENTRIES(vstart, vend, shift, add) \
> -	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1 + add)
> +	(SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
>  
>  #define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
>  
> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
> index 54e50552bfe3..d7a6a485f361 100644
> --- a/arch/arm64/mm/fixmap.c
> +++ b/arch/arm64/mm/fixmap.c
> @@ -16,34 +16,77 @@
>  #include <asm/pgalloc.h>
>  #include <asm/tlbflush.h>
>  
> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> +#define NR_BM_PTE_TABLES \
> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PMD_SHIFT)
> +#define NR_BM_PMD_TABLES \
> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PUD_SHIFT)

One of the bear traps I stepped in last week when I was wrestling with this is
that FIXADDR_START isn't actually the address of the first slot! It is
__end_of_permanent_fixed_addresses. But you then have all the FIX_BTMAP slots,
FIX_PTE/PMD/PUD/PGD before FIXADDR_START. So I think some refactoring is in
order to properly cover all the slots.

Honestly, the fact that the fixed_addresses enum is backwards is mind-bending
too. Any chance we can just define FIXADDR_START and FIXADDR_TOP to cover the
whole lot, then calculate the slot address as (FIXADDR_START + (<ENUM> <<
PAGE_SHIFT))? (plus reorder of start/end enum labels for FDT and BTMAP). Or if
we really care about reclaiming the virtual address space between
__end_of_permanent_fixed_addresses and __end_of_fixed_addresses, create a new
FIXADDR_BOOT_START or whatever.

> +
> +static_assert(NR_BM_PMD_TABLES == 1);
> +
> +#define __BM_TABLE_IDX(addr, shift) \
> +	(((addr) >> (shift)) - (FIXADDR_START >> (shift)))

Doesn't this run the risk of being negative for one of the slots below
FIXADDR_START? (e.g. FIX_PTE). I think with the standard configs today, those
boot-time-only slots all _happen_ to fall in the same PMD. But that wasn't the
case with my issue last week.

> +
> +#define BM_PTE_TABLE_IDX(addr)	__BM_TABLE_IDX(addr, PMD_SHIFT)
> +
> +static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
>  static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
>  static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
>  
> -static inline pud_t *fixmap_pud(unsigned long addr)
> +static inline pte_t *fixmap_pte(unsigned long addr)
>  {
> -	pgd_t *pgdp = pgd_offset_k(addr);
> -	p4d_t *p4dp = p4d_offset(pgdp, addr);
> -	p4d_t p4d = READ_ONCE(*p4dp);
> +	return &bm_pte[BM_PTE_TABLE_IDX(addr)][pte_index(addr)];
> +}
>  
> -	BUG_ON(p4d_none(p4d) || p4d_bad(p4d));
> +void __init early_fixmap_init_pte(pmd_t *pmdp, unsigned long addr)

static? - same comment for pmd/pud below.

> +{
> +	pmd_t pmd = READ_ONCE(*pmdp);
> +	pte_t *ptep;
>  
> -	return pud_offset_kimg(p4dp, addr);
> +	if (pmd_none(pmd)) {
> +		ptep = bm_pte[BM_PTE_TABLE_IDX(addr)];;

nit: remove one of the semi-colons.

> +		__pmd_populate(pmdp, __pa_symbol(ptep), PMD_TYPE_TABLE);
> +	}
>  }
>  
> -static inline pmd_t *fixmap_pmd(unsigned long addr)
> +void __init early_fixmap_init_pmd(pud_t *pudp, unsigned long addr,
> +				  unsigned long end)
>  {
> -	pud_t *pudp = fixmap_pud(addr);
> +	unsigned long next;
>  	pud_t pud = READ_ONCE(*pudp);
> +	pmd_t *pmdp;
>  
> -	BUG_ON(pud_none(pud) || pud_bad(pud));
> +	if (pud_none(pud))
> +		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
>  
> -	return pmd_offset_kimg(pudp, addr);
> +	pmdp = pmd_offset_kimg(pudp, addr);
> +	do {
> +		next = pmd_addr_end(addr, end);
> +		early_fixmap_init_pte(pmdp, addr);
> +	} while (pmdp++, addr = next, addr != end);
>  }
>  
> -static inline pte_t *fixmap_pte(unsigned long addr)
> +
> +void __init early_fixmap_init_pud(p4d_t *p4dp, unsigned long addr,
> +				  unsigned long end)
>  {
> -	return &bm_pte[pte_index(addr)];
> +	p4d_t p4d = READ_ONCE(*p4dp);
> +	pud_t *pudp;
> +
> +	if (CONFIG_PGTABLE_LEVELS > 3 && !p4d_none(p4d) &&
> +	    p4d_page_paddr(p4d) != __pa_symbol(bm_pud)) {
> +		/*
> +		 * We only end up here if the kernel mapping and the fixmap
> +		 * share the top level pgd entry, which should only happen on
> +		 * 16k/4 levels configurations.
> +		 */
> +		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> +	}
> +
> +	if (p4d_none(p4d))
> +		__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
> +
> +	pudp = pud_offset_kimg(p4dp, addr);
> +	early_fixmap_init_pmd(pudp, addr, end);
>  }
>  
>  /*
> @@ -54,55 +97,13 @@ static inline pte_t *fixmap_pte(unsigned long addr)
>   */
>  void __init early_fixmap_init(void)
>  {
> -	pgd_t *pgdp;
> -	p4d_t *p4dp, p4d;
> -	pud_t *pudp;
> -	pmd_t *pmdp;
>  	unsigned long addr = FIXADDR_START;
> +	unsigned long end = FIXADDR_TOP;
>  
> -	pgdp = pgd_offset_k(addr);
> -	p4dp = p4d_offset(pgdp, addr);
> -	p4d = READ_ONCE(*p4dp);
> -	if (CONFIG_PGTABLE_LEVELS > 3 &&
> -	    !(p4d_none(p4d) || p4d_page_paddr(p4d) == __pa_symbol(bm_pud))) {
> -		/*
> -		 * We only end up here if the kernel mapping and the fixmap
> -		 * share the top level pgd entry, which should only happen on
> -		 * 16k/4 levels configurations.
> -		 */
> -		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
> -		pudp = pud_offset_kimg(p4dp, addr);
> -	} else {
> -		if (p4d_none(p4d))
> -			__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
> -		pudp = fixmap_pud(addr);
> -	}
> -	if (pud_none(READ_ONCE(*pudp)))
> -		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
> -	pmdp = fixmap_pmd(addr);
> -	__pmd_populate(pmdp, __pa_symbol(bm_pte), PMD_TYPE_TABLE);
> +	pgd_t *pgdp = pgd_offset_k(addr);
> +	p4d_t *p4dp = p4d_offset(pgdp, addr);
>  
> -	/*
> -	 * The boot-ioremap range spans multiple pmds, for which
> -	 * we are not prepared:
> -	 */
> -	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
> -		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
> -
> -	if ((pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
> -	     || pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
> -		WARN_ON(1);
> -		pr_warn("pmdp %p != %p, %p\n",
> -			pmdp, fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)),
> -			fixmap_pmd(fix_to_virt(FIX_BTMAP_END)));
> -		pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
> -			fix_to_virt(FIX_BTMAP_BEGIN));
> -		pr_warn("fix_to_virt(FIX_BTMAP_END):   %08lx\n",
> -			fix_to_virt(FIX_BTMAP_END));
> -
> -		pr_warn("FIX_BTMAP_END:       %d\n", FIX_BTMAP_END);
> -		pr_warn("FIX_BTMAP_BEGIN:     %d\n", FIX_BTMAP_BEGIN);
> -	}
> +	early_fixmap_init_pud(p4dp, addr, end);
>  }
>  
>  /*
> @@ -130,6 +131,7 @@ void __set_fixmap(enum fixed_addresses idx,
>  void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  {
>  	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
> +	phys_addr_t dt_phys_base;
>  	int offset;
>  	void *dt_virt;
>  
> @@ -144,27 +146,12 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  	if (!dt_phys || dt_phys % MIN_FDT_ALIGN)
>  		return NULL;
>  
> -	/*
> -	 * Make sure that the FDT region can be mapped without the need to
> -	 * allocate additional translation table pages, so that it is safe
> -	 * to call create_mapping_noalloc() this early.
> -	 *
> -	 * On 64k pages, the FDT will be mapped using PTEs, so we need to
> -	 * be in the same PMD as the rest of the fixmap.
> -	 * On 4k pages, we'll use section mappings for the FDT so we only
> -	 * have to be in the same PUD.
> -	 */
> -	BUILD_BUG_ON(dt_virt_base % SZ_2M);
> -
> -	BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> SWAPPER_TABLE_SHIFT !=
> -		     __fix_to_virt(FIX_BTMAP_BEGIN) >> SWAPPER_TABLE_SHIFT);
> -
> -	offset = dt_phys % SWAPPER_BLOCK_SIZE;
> +	dt_phys_base = round_down(dt_phys, PAGE_SIZE);
> +	offset = dt_phys % PAGE_SIZE;
>  	dt_virt = (void *)dt_virt_base + offset;
>  
>  	/* map the first chunk so we can read the size from the header */
> -	create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE),
> -			dt_virt_base, SWAPPER_BLOCK_SIZE, prot);
> +	create_mapping_noalloc(dt_phys_base, dt_virt_base, PAGE_SIZE, prot);
>  
>  	if (fdt_magic(dt_virt) != FDT_MAGIC)
>  		return NULL;
> @@ -173,9 +160,10 @@ void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
>  	if (*size > MAX_FDT_SIZE)
>  		return NULL;
>  
> -	if (offset + *size > SWAPPER_BLOCK_SIZE)
> -		create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
> -			       round_up(offset + *size, SWAPPER_BLOCK_SIZE), prot);
> +	if (offset + *size > PAGE_SIZE) {
> +		create_mapping_noalloc(dt_phys_base, dt_virt_base,
> +				       offset + *size, prot);
> +	}
>  
>  	return dt_virt;
>  }
Mark Rutland March 21, 2023, 2:18 p.m. UTC | #2
On Wed, Mar 15, 2023 at 10:23:31AM +0000, Ryan Roberts wrote:
> On 14/03/2023 14:21, Mark Rutland wrote:

> > diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
> > index c153f069e9c9..c59d433c1801 100644
> > --- a/arch/arm64/include/asm/fixmap.h
> > +++ b/arch/arm64/include/asm/fixmap.h
> > @@ -36,17 +36,13 @@ enum fixed_addresses {
> >  	FIX_HOLE,
> >  
> >  	/*
> > -	 * Reserve a virtual window for the FDT that is 2 MB larger than the
> > -	 * maximum supported size, and put it at the top of the fixmap region.
> > -	 * The additional space ensures that any FDT that does not exceed
> > -	 * MAX_FDT_SIZE can be mapped regardless of whether it crosses any
> > -	 * 2 MB alignment boundaries.
> > -	 *
> > -	 * Keep this at the top so it remains 2 MB aligned.
> > +	 * Reserve a virtual window for the FDT that is a page bigger than the
> > +	 * maximum supported size. The additional space ensures that any FDT
> > +	 * that does not exceed MAX_FDT_SIZE can be mapped regardless of
> > +	 * whether it crosses any page boundary.
> >  	 */
> > -#define FIX_FDT_SIZE		(MAX_FDT_SIZE + SZ_2M)
> >  	FIX_FDT_END,
> > -	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
> > +	FIX_FDT = FIX_FDT_END + MAX_FDT_SIZE / PAGE_SIZE - 1,
> 
> I don't think this is consistent with your comment - needs a +1 for the extra
> page? On a 4K system, FIX_FDT will be calculated as 512, but we need it to be
> 513 to deal with the case where the FDT is 2MB and not aligned to a page boundary.

Indeed; the intent was to have 513 in that case, but I clearly either forgot to
fix the calculation or messed that up when rebasing. I've fixed that to:

	FIX_FDT_END + DIV_ROUND_UP(MAX_FDT_SIZE, PAGE_SIZE) + 1,

... which'll work even if MAX_FDT_SIZE weren't a multiple of PAGE_SIZE.

[...]

> > diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
> > index 54e50552bfe3..d7a6a485f361 100644
> > --- a/arch/arm64/mm/fixmap.c
> > +++ b/arch/arm64/mm/fixmap.c
> > @@ -16,34 +16,77 @@
> >  #include <asm/pgalloc.h>
> >  #include <asm/tlbflush.h>
> >  
> > -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> > +#define NR_BM_PTE_TABLES \
> > +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PMD_SHIFT)
> > +#define NR_BM_PMD_TABLES \
> > +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PUD_SHIFT)
> 
> One of the bear traps I stepped in last week when I was wrestling with this is
> that FIXADDR_START isn't actually the address of the first slot! It is
> __end_of_permanent_fixed_addresses. But you then have all the FIX_BTMAP slots,
> FIX_PTE/PMD/PUD/PGD before FIXADDR_START. So I think some refactoring is in
> order to properly cover all the slots.

Ugh, yes. This is a total mess.

> Honestly, the fact that the fixed_addresses enum is backwards is mind-bending
> too. 

That has been a long standing bugbear of mine, too.

> Any chance we can just define FIXADDR_START and FIXADDR_TOP to cover the
> whole lot, then calculate the slot address as (FIXADDR_START + (<ENUM> <<
> PAGE_SHIFT))?

Unfortunately, some code is (implicitly) relying on FIXADDR_START..FIXADDR_TOP
*not* including the FIX_BTMAP slots (e.g. virt_to_fix() and fix_to_virt() use
that to implicitly catch dodgy usage). Other code really wants FIXADDR_START to
be the base of the whole fixmap region (e.g. ptdump).

Cleaning that up is going to be a bit churny/painful. I'll take a look.

> (plus reorder of start/end enum labels for FDT and BTMAP). Or if
> we really care about reclaiming the virtual address space between
> __end_of_permanent_fixed_addresses and __end_of_fixed_addresses, create a new
> FIXADDR_BOOT_START or whatever.

I don't think we care about reclaiming the address space; it's a few pages at
best. I'll go take another look at how churny it'd be to clean this up...

> > +
> > +static_assert(NR_BM_PMD_TABLES == 1);
> > +
> > +#define __BM_TABLE_IDX(addr, shift) \
> > +	(((addr) >> (shift)) - (FIXADDR_START >> (shift)))
> 
> Doesn't this run the risk of being negative for one of the slots below
> FIXADDR_START?

Yes; I had erroneously thought FIXADDR_START was below all of those, and the
intent was to use the base of the entire fixmap. So as above we'll need to do
some more preparatory cleanup...

> > +void __init early_fixmap_init_pte(pmd_t *pmdp, unsigned long addr)
> 
> static? - same comment for pmd/pud below.

Yup; done.

> > +{
> > +	pmd_t pmd = READ_ONCE(*pmdp);
> > +	pte_t *ptep;
> >  
> > -	return pud_offset_kimg(p4dp, addr);
> > +	if (pmd_none(pmd)) {
> > +		ptep = bm_pte[BM_PTE_TABLE_IDX(addr)];;
> 
> nit: remove one of the semi-colons.

Whoops; rebase error; done.

Thanks,
Mark.
Ryan Roberts March 21, 2023, 2:34 p.m. UTC | #3
On 21/03/2023 14:18, Mark Rutland wrote:
> On Wed, Mar 15, 2023 at 10:23:31AM +0000, Ryan Roberts wrote:
>> On 14/03/2023 14:21, Mark Rutland wrote:
> 
>>> diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
>>> index c153f069e9c9..c59d433c1801 100644
>>> --- a/arch/arm64/include/asm/fixmap.h
>>> +++ b/arch/arm64/include/asm/fixmap.h
>>> @@ -36,17 +36,13 @@ enum fixed_addresses {
>>>  	FIX_HOLE,
>>>  
>>>  	/*
>>> -	 * Reserve a virtual window for the FDT that is 2 MB larger than the
>>> -	 * maximum supported size, and put it at the top of the fixmap region.
>>> -	 * The additional space ensures that any FDT that does not exceed
>>> -	 * MAX_FDT_SIZE can be mapped regardless of whether it crosses any
>>> -	 * 2 MB alignment boundaries.
>>> -	 *
>>> -	 * Keep this at the top so it remains 2 MB aligned.
>>> +	 * Reserve a virtual window for the FDT that is a page bigger than the
>>> +	 * maximum supported size. The additional space ensures that any FDT
>>> +	 * that does not exceed MAX_FDT_SIZE can be mapped regardless of
>>> +	 * whether it crosses any page boundary.
>>>  	 */
>>> -#define FIX_FDT_SIZE		(MAX_FDT_SIZE + SZ_2M)
>>>  	FIX_FDT_END,
>>> -	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
>>> +	FIX_FDT = FIX_FDT_END + MAX_FDT_SIZE / PAGE_SIZE - 1,
>>
>> I don't think this is consistent with your comment - needs a +1 for the extra
>> page? On a 4K system, FIX_FDT will be calculated as 512, but we need it to be
>> 513 to deal with the case where the FDT is 2MB and not aligned to a page boundary.
> 
> Indeed; the intent was to have 513 in that case, but I clearly either forgot to
> fix the calculation or messed that up when rebasing. I've fixed that to:
> 
> 	FIX_FDT_END + DIV_ROUND_UP(MAX_FDT_SIZE, PAGE_SIZE) + 1,
> 
> ... which'll work even if MAX_FDT_SIZE weren't a multiple of PAGE_SIZE.
> 
> [...]
> 
>>> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
>>> index 54e50552bfe3..d7a6a485f361 100644
>>> --- a/arch/arm64/mm/fixmap.c
>>> +++ b/arch/arm64/mm/fixmap.c
>>> @@ -16,34 +16,77 @@
>>>  #include <asm/pgalloc.h>
>>>  #include <asm/tlbflush.h>
>>>  
>>> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
>>> +#define NR_BM_PTE_TABLES \
>>> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PMD_SHIFT)
>>> +#define NR_BM_PMD_TABLES \
>>> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PUD_SHIFT)
>>
>> One of the bear traps I stepped in last week when I was wrestling with this is
>> that FIXADDR_START isn't actually the address of the first slot! It is
>> __end_of_permanent_fixed_addresses. But you then have all the FIX_BTMAP slots,
>> FIX_PTE/PMD/PUD/PGD before FIXADDR_START. So I think some refactoring is in
>> order to properly cover all the slots.
> 
> Ugh, yes. This is a total mess.
> 
>> Honestly, the fact that the fixed_addresses enum is backwards is mind-bending
>> too. 
> 
> That has been a long standing bugbear of mine, too.
> 
>> Any chance we can just define FIXADDR_START and FIXADDR_TOP to cover the
>> whole lot, then calculate the slot address as (FIXADDR_START + (<ENUM> <<
>> PAGE_SHIFT))?
> 
> Unfortunately, some code is (implicitly) relying on FIXADDR_START..FIXADDR_TOP
> *not* including the FIX_BTMAP slots (e.g. virt_to_fix() and fix_to_virt() use
> that to implicitly catch dodgy usage). Other code really wants FIXADDR_START to
> be the base of the whole fixmap region (e.g. ptdump).

So perhaps one approach would be to have 2 sets of macros:

#define ARCH_FIXADDR_START
#define ARCH_FIXADDR_TOP
#define FIXADDR_START
#define FIXADDR_TOP

static_assert(ARCH_FIXADDR_START <= FIXADDR_START);
static_assert(ARCH_FIXADDR_TOP >= FIXADDR_TOP);

ARCH_FIXADDR_* covers the entire region exactly. And FIXADDR_* describes the
sub-region that the generic code cares about. Then all the fixmap internals can
be done against ARCH_FIXADDR_* and the generic code can do its checks against
FIXADDR_* ?

> 
> Cleaning that up is going to be a bit churny/painful. I'll take a look.
> 
>> (plus reorder of start/end enum labels for FDT and BTMAP). Or if
>> we really care about reclaiming the virtual address space between
>> __end_of_permanent_fixed_addresses and __end_of_fixed_addresses, create a new
>> FIXADDR_BOOT_START or whatever.
> 
> I don't think we care about reclaiming the address space; it's a few pages at
>> best. I'll go take another look at how churny it'd be to clean this up...
> 
>>> +
>>> +static_assert(NR_BM_PMD_TABLES == 1);
>>> +
>>> +#define __BM_TABLE_IDX(addr, shift) \
>>> +	(((addr) >> (shift)) - (FIXADDR_START >> (shift)))
>>
>> Doesn't this run the risk of being negative for one of the slots below
>> FIXADDR_START?
> 
> Yes; I had erroneously thought FIXADDR_START was below all of those, and the
> intent was to use the base of the entire fixmap. So as above we'll need to do
> some more preparatory cleanup...
> 
>>> +void __init early_fixmap_init_pte(pmd_t *pmdp, unsigned long addr)
>>
>> static? - same comment for pmd/pud below.
> 
> Yup; done.
> 
>>> +{
>>> +	pmd_t pmd = READ_ONCE(*pmdp);
>>> +	pte_t *ptep;
>>>  
>>> -	return pud_offset_kimg(p4dp, addr);
>>> +	if (pmd_none(pmd)) {
>>> +		ptep = bm_pte[BM_PTE_TABLE_IDX(addr)];;
>>
>> nit: remove one of the semi-colons.
> 
> Whoops; rebase error; done.
> 
> Thanks,
> Mark.
Mark Rutland March 23, 2023, 3:24 p.m. UTC | #4
On Tue, Mar 21, 2023 at 02:34:15PM +0000, Ryan Roberts wrote:
> On 21/03/2023 14:18, Mark Rutland wrote:
> > On Wed, Mar 15, 2023 at 10:23:31AM +0000, Ryan Roberts wrote:
> >> On 14/03/2023 14:21, Mark Rutland wrote:

> >>> diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
> >>> index 54e50552bfe3..d7a6a485f361 100644
> >>> --- a/arch/arm64/mm/fixmap.c
> >>> +++ b/arch/arm64/mm/fixmap.c
> >>> @@ -16,34 +16,77 @@
> >>>  #include <asm/pgalloc.h>
> >>>  #include <asm/tlbflush.h>
> >>>  
> >>> -static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
> >>> +#define NR_BM_PTE_TABLES \
> >>> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PMD_SHIFT)
> >>> +#define NR_BM_PMD_TABLES \
> >>> +	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PUD_SHIFT)
> >>
> >> One of the bear traps I stepped in last week when I was wrestling with this is
> >> that FIXADDR_START isn't actually the address of the first slot! It is
> >> __end_of_permanent_fixed_addresses. But you then have all the FIX_BTMAP slots,
> >> FIX_PTE/PMD/PUD/PGD before FIXADDR_START. So I think some refactoring is in
> >> order to properly cover all the slots.
> > 
> > Ugh, yes. This is a total mess.
> > 
> >> Honestly, the fact that the fixed_addresses enum is backwards is mind-bending
> >> too. 
> > 
> > That has been a long standing bugbear of mine, too.
> > 
> >> Any chance we can just define FIXADDR_START and FIXADDR_TOP to cover the
> >> whole lot, then calculate the slot address as (FIXADDR_START + (<ENUM> <<
> >> PAGE_SHIFT))?
> > 
> > Unfortunately, some code is (implicitly) relying on FIXADDR_START..FIXADDR_TOP
> > *not* including the FIX_BTMAP slots (e.g. virt_to_fix() and fix_to_virt() use
> > that to implicitly catch dodgy usage). Other code really wants FIXADDR_START to
> > be the base of the whole fixmap region (e.g. ptdump).
> 
> So perhaps one approach would be to have 2 sets of macros:
> 
> #define ARCH_FIXADDR_START
> #define ARCH_FIXADDR_TOP
> #define FIXADDR_START
> #define FIXADDR_TOP
> 
> static_assert(ARCH_FIXADDR_START <= FIXADDR_START);
> static_assert(ARCH_FIXADDR_TOP >= FIXADDR_TOP);
> 
> ARCH_FIXADDR_* covers the entire region exactly. And FIXADDR_* describes the
> sub-region that the generic code cares about. Then all the fixmap internals can
> be done against ARCH_FIXADDR_* and the generic code can do its checks against
> FIXADDR_* ?

FWIW, x86 has:

| #define FIXADDR_SIZE            (__end_of_permanent_fixed_addresses << PAGE_SHIFT)
| #define FIXADDR_START           (FIXADDR_TOP - FIXADDR_SIZE)
| #define FIXADDR_TOT_SIZE        (__end_of_fixed_addresses << PAGE_SHIFT)
| #define FIXADDR_TOT_START       (FIXADDR_TOP - FIXADDR_TOT_SIZE)

... so I suspect we should do the same, in the hope that these can (eventually)
be unified.

Thanks,
Mark.
diff mbox series

Patch

diff --git a/arch/arm64/include/asm/fixmap.h b/arch/arm64/include/asm/fixmap.h
index c153f069e9c9..c59d433c1801 100644
--- a/arch/arm64/include/asm/fixmap.h
+++ b/arch/arm64/include/asm/fixmap.h
@@ -36,17 +36,13 @@  enum fixed_addresses {
 	FIX_HOLE,
 
 	/*
-	 * Reserve a virtual window for the FDT that is 2 MB larger than the
-	 * maximum supported size, and put it at the top of the fixmap region.
-	 * The additional space ensures that any FDT that does not exceed
-	 * MAX_FDT_SIZE can be mapped regardless of whether it crosses any
-	 * 2 MB alignment boundaries.
-	 *
-	 * Keep this at the top so it remains 2 MB aligned.
+	 * Reserve a virtual window for the FDT that is a page bigger than the
+	 * maximum supported size. The additional space ensures that any FDT
+	 * that does not exceed MAX_FDT_SIZE can be mapped regardless of
+	 * whether it crosses any page boundary.
 	 */
-#define FIX_FDT_SIZE		(MAX_FDT_SIZE + SZ_2M)
 	FIX_FDT_END,
-	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
+	FIX_FDT = FIX_FDT_END + MAX_FDT_SIZE / PAGE_SIZE - 1,
 
 	FIX_EARLYCON_MEM_BASE,
 	FIX_TEXT_POKE0,
diff --git a/arch/arm64/include/asm/kernel-pgtable.h b/arch/arm64/include/asm/kernel-pgtable.h
index fcd14197756f..186dd7f85b14 100644
--- a/arch/arm64/include/asm/kernel-pgtable.h
+++ b/arch/arm64/include/asm/kernel-pgtable.h
@@ -59,8 +59,11 @@ 
 #define EARLY_KASLR	(0)
 #endif
 
+#define SPAN_NR_ENTRIES(vstart, vend, shift) \
+	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1)
+
 #define EARLY_ENTRIES(vstart, vend, shift, add) \
-	((((vend) - 1) >> (shift)) - ((vstart) >> (shift)) + 1 + add)
+	(SPAN_NR_ENTRIES(vstart, vend, shift) + (add))
 
 #define EARLY_PGDS(vstart, vend, add) (EARLY_ENTRIES(vstart, vend, PGDIR_SHIFT, add))
 
diff --git a/arch/arm64/mm/fixmap.c b/arch/arm64/mm/fixmap.c
index 54e50552bfe3..d7a6a485f361 100644
--- a/arch/arm64/mm/fixmap.c
+++ b/arch/arm64/mm/fixmap.c
@@ -16,34 +16,77 @@ 
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-static pte_t bm_pte[PTRS_PER_PTE] __page_aligned_bss;
+#define NR_BM_PTE_TABLES \
+	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PMD_SHIFT)
+#define NR_BM_PMD_TABLES \
+	SPAN_NR_ENTRIES(FIXADDR_START, FIXADDR_TOP, PUD_SHIFT)
+
+static_assert(NR_BM_PMD_TABLES == 1);
+
+#define __BM_TABLE_IDX(addr, shift) \
+	(((addr) >> (shift)) - (FIXADDR_START >> (shift)))
+
+#define BM_PTE_TABLE_IDX(addr)	__BM_TABLE_IDX(addr, PMD_SHIFT)
+
+static pte_t bm_pte[NR_BM_PTE_TABLES][PTRS_PER_PTE] __page_aligned_bss;
 static pmd_t bm_pmd[PTRS_PER_PMD] __page_aligned_bss __maybe_unused;
 static pud_t bm_pud[PTRS_PER_PUD] __page_aligned_bss __maybe_unused;
 
-static inline pud_t *fixmap_pud(unsigned long addr)
+static inline pte_t *fixmap_pte(unsigned long addr)
 {
-	pgd_t *pgdp = pgd_offset_k(addr);
-	p4d_t *p4dp = p4d_offset(pgdp, addr);
-	p4d_t p4d = READ_ONCE(*p4dp);
+	return &bm_pte[BM_PTE_TABLE_IDX(addr)][pte_index(addr)];
+}
 
-	BUG_ON(p4d_none(p4d) || p4d_bad(p4d));
+void __init early_fixmap_init_pte(pmd_t *pmdp, unsigned long addr)
+{
+	pmd_t pmd = READ_ONCE(*pmdp);
+	pte_t *ptep;
 
-	return pud_offset_kimg(p4dp, addr);
+	if (pmd_none(pmd)) {
+		ptep = bm_pte[BM_PTE_TABLE_IDX(addr)];
+		__pmd_populate(pmdp, __pa_symbol(ptep), PMD_TYPE_TABLE);
+	}
 }
 
-static inline pmd_t *fixmap_pmd(unsigned long addr)
+void __init early_fixmap_init_pmd(pud_t *pudp, unsigned long addr,
+				  unsigned long end)
 {
-	pud_t *pudp = fixmap_pud(addr);
+	unsigned long next;
 	pud_t pud = READ_ONCE(*pudp);
+	pmd_t *pmdp;
 
-	BUG_ON(pud_none(pud) || pud_bad(pud));
+	if (pud_none(pud))
+		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
 
-	return pmd_offset_kimg(pudp, addr);
+	pmdp = pmd_offset_kimg(pudp, addr);
+	do {
+		next = pmd_addr_end(addr, end);
+		early_fixmap_init_pte(pmdp, addr);
+	} while (pmdp++, addr = next, addr != end);
 }
 
-static inline pte_t *fixmap_pte(unsigned long addr)
+
+void __init early_fixmap_init_pud(p4d_t *p4dp, unsigned long addr,
+				  unsigned long end)
 {
-	return &bm_pte[pte_index(addr)];
+	p4d_t p4d = READ_ONCE(*p4dp);
+	pud_t *pudp;
+
+	if (CONFIG_PGTABLE_LEVELS > 3 && !p4d_none(p4d) &&
+	    p4d_page_paddr(p4d) != __pa_symbol(bm_pud)) {
+		/*
+		 * We only end up here if the kernel mapping and the fixmap
+		 * share the top level pgd entry, which should only happen on
+		 * 16k/4 levels configurations.
+		 */
+		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
+	}
+
+	if (p4d_none(p4d))
+		__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
+
+	pudp = pud_offset_kimg(p4dp, addr);
+	early_fixmap_init_pmd(pudp, addr, end);
 }
 
 /*
@@ -54,55 +97,13 @@  static inline pte_t *fixmap_pte(unsigned long addr)
  */
 void __init early_fixmap_init(void)
 {
-	pgd_t *pgdp;
-	p4d_t *p4dp, p4d;
-	pud_t *pudp;
-	pmd_t *pmdp;
 	unsigned long addr = FIXADDR_START;
+	unsigned long end = FIXADDR_TOP;
 
-	pgdp = pgd_offset_k(addr);
-	p4dp = p4d_offset(pgdp, addr);
-	p4d = READ_ONCE(*p4dp);
-	if (CONFIG_PGTABLE_LEVELS > 3 &&
-	    !(p4d_none(p4d) || p4d_page_paddr(p4d) == __pa_symbol(bm_pud))) {
-		/*
-		 * We only end up here if the kernel mapping and the fixmap
-		 * share the top level pgd entry, which should only happen on
-		 * 16k/4 levels configurations.
-		 */
-		BUG_ON(!IS_ENABLED(CONFIG_ARM64_16K_PAGES));
-		pudp = pud_offset_kimg(p4dp, addr);
-	} else {
-		if (p4d_none(p4d))
-			__p4d_populate(p4dp, __pa_symbol(bm_pud), P4D_TYPE_TABLE);
-		pudp = fixmap_pud(addr);
-	}
-	if (pud_none(READ_ONCE(*pudp)))
-		__pud_populate(pudp, __pa_symbol(bm_pmd), PUD_TYPE_TABLE);
-	pmdp = fixmap_pmd(addr);
-	__pmd_populate(pmdp, __pa_symbol(bm_pte), PMD_TYPE_TABLE);
+	pgd_t *pgdp = pgd_offset_k(addr);
+	p4d_t *p4dp = p4d_offset(pgdp, addr);
 
-	/*
-	 * The boot-ioremap range spans multiple pmds, for which
-	 * we are not prepared:
-	 */
-	BUILD_BUG_ON((__fix_to_virt(FIX_BTMAP_BEGIN) >> PMD_SHIFT)
-		     != (__fix_to_virt(FIX_BTMAP_END) >> PMD_SHIFT));
-
-	if ((pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)))
-	     || pmdp != fixmap_pmd(fix_to_virt(FIX_BTMAP_END))) {
-		WARN_ON(1);
-		pr_warn("pmdp %p != %p, %p\n",
-			pmdp, fixmap_pmd(fix_to_virt(FIX_BTMAP_BEGIN)),
-			fixmap_pmd(fix_to_virt(FIX_BTMAP_END)));
-		pr_warn("fix_to_virt(FIX_BTMAP_BEGIN): %08lx\n",
-			fix_to_virt(FIX_BTMAP_BEGIN));
-		pr_warn("fix_to_virt(FIX_BTMAP_END):   %08lx\n",
-			fix_to_virt(FIX_BTMAP_END));
-
-		pr_warn("FIX_BTMAP_END:       %d\n", FIX_BTMAP_END);
-		pr_warn("FIX_BTMAP_BEGIN:     %d\n", FIX_BTMAP_BEGIN);
-	}
+	early_fixmap_init_pud(p4dp, addr, end);
 }
 
 /*
@@ -130,6 +131,7 @@  void __set_fixmap(enum fixed_addresses idx,
 void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 {
 	const u64 dt_virt_base = __fix_to_virt(FIX_FDT);
+	phys_addr_t dt_phys_base;
 	int offset;
 	void *dt_virt;
 
@@ -144,27 +146,12 @@  void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 	if (!dt_phys || dt_phys % MIN_FDT_ALIGN)
 		return NULL;
 
-	/*
-	 * Make sure that the FDT region can be mapped without the need to
-	 * allocate additional translation table pages, so that it is safe
-	 * to call create_mapping_noalloc() this early.
-	 *
-	 * On 64k pages, the FDT will be mapped using PTEs, so we need to
-	 * be in the same PMD as the rest of the fixmap.
-	 * On 4k pages, we'll use section mappings for the FDT so we only
-	 * have to be in the same PUD.
-	 */
-	BUILD_BUG_ON(dt_virt_base % SZ_2M);
-
-	BUILD_BUG_ON(__fix_to_virt(FIX_FDT_END) >> SWAPPER_TABLE_SHIFT !=
-		     __fix_to_virt(FIX_BTMAP_BEGIN) >> SWAPPER_TABLE_SHIFT);
-
-	offset = dt_phys % SWAPPER_BLOCK_SIZE;
+	dt_phys_base = round_down(dt_phys, PAGE_SIZE);
+	offset = dt_phys % PAGE_SIZE;
 	dt_virt = (void *)dt_virt_base + offset;
 
 	/* map the first chunk so we can read the size from the header */
-	create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE),
-			dt_virt_base, SWAPPER_BLOCK_SIZE, prot);
+	create_mapping_noalloc(dt_phys_base, dt_virt_base, PAGE_SIZE, prot);
 
 	if (fdt_magic(dt_virt) != FDT_MAGIC)
 		return NULL;
@@ -173,9 +160,10 @@  void *__init fixmap_remap_fdt(phys_addr_t dt_phys, int *size, pgprot_t prot)
 	if (*size > MAX_FDT_SIZE)
 		return NULL;
 
-	if (offset + *size > SWAPPER_BLOCK_SIZE)
-		create_mapping_noalloc(round_down(dt_phys, SWAPPER_BLOCK_SIZE), dt_virt_base,
-			       round_up(offset + *size, SWAPPER_BLOCK_SIZE), prot);
+	if (offset + *size > PAGE_SIZE) {
+		create_mapping_noalloc(dt_phys_base, dt_virt_base,
+				       offset + *size, prot);
+	}
 
 	return dt_virt;
 }