diff mbox series

[-fixes,1/2] riscv: Move early dtb mapping into the fixmap region

Message ID 20230323163347.182895-2-alexghiti@rivosinc.com (mailing list archive)
State Superseded
Headers show
Series Fixes for dtb mapping | expand

Checks

Context Check Description
conchuod/cover_letter success Series has a cover letter
conchuod/tree_selection success Guessed tree name to be fixes
conchuod/fixes_present success Fixes tag present in non-next series
conchuod/maintainers_pattern success MAINTAINERS pattern errors before the patch: 1 and now 1
conchuod/verify_signedoff success Signed-off-by tag matches author and committer
conchuod/kdoc success Errors and warnings before: 0 this patch: 0
conchuod/build_rv64_clang_allmodconfig success Errors and warnings before: 2275 this patch: 2275
conchuod/module_param success Was 0 now: 0
conchuod/build_rv64_gcc_allmodconfig success Errors and warnings before: 17609 this patch: 17609
conchuod/build_rv32_defconfig success Build OK
conchuod/dtb_warn_rv64 success Errors and warnings before: 3 this patch: 3
conchuod/header_inline success No static functions without inline keyword in header files
conchuod/checkpatch success total: 0 errors, 0 warnings, 0 checks, 177 lines checked
conchuod/source_inline success Was 0 now: 0
conchuod/build_rv64_nommu_k210_defconfig success Build OK
conchuod/verify_fixes success Fixes tag looks correct
conchuod/build_rv64_nommu_virt_defconfig success Build OK

Commit Message

Alexandre Ghiti March 23, 2023, 4:33 p.m. UTC
riscv establishes 2 virtual mappings:

- early_pg_dir maps the kernel which allows to discover the system
  memory
- swapper_pg_dir installs the final mapping (linear mapping included)

We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
mapping was not carried over in swapper_pg_dir. It happens that
early_init_fdt_scan_reserved_mem must be called before swapper_pg_dir is
setup otherwise we could allocate reserved memory defined in the dtb.
And this function initializes reserved_mem variable with addresses that
lie in the early_pg_dir dtb mapping: when those addresses are reused
with swapper_pg_dir, this mapping does not exist and then we trap.

So move the dtb mapping in the fixmap region which is established in
early_pg_dir and handed over to swapper_pg_dir.

Fixes: 50e63dd8ed92 ("riscv: fix reserved memory setup")
Reported-by: Conor Dooley <Conor.Dooley@microchip.com>
Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 Documentation/riscv/vm-layout.rst |  6 +--
 arch/riscv/include/asm/fixmap.h   |  8 ++++
 arch/riscv/include/asm/pgtable.h  |  8 +++-
 arch/riscv/kernel/setup.c         |  1 -
 arch/riscv/mm/init.c              | 61 +++++++++++++++++--------------
 5 files changed, 51 insertions(+), 33 deletions(-)

Comments

Conor Dooley March 27, 2023, 10:37 a.m. UTC | #1
Hey Alex!

On Thu, Mar 23, 2023 at 05:33:46PM +0100, Alexandre Ghiti wrote:
> riscv establishes 2 virtual mappings:
> 
> - early_pg_dir maps the kernel which allows to discover the system
>   memory
> - swapper_pg_dir installs the final mapping (linear mapping included)
> 
> We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
> mapping was not carried over in swapper_pg_dir. It happens that
> early_init_fdt_scan_reserved_mem must be called before swapper_pg_dir is
> setup otherwise we could allocate reserved memory defined in the dtb.
> And this function initializes reserved_mem variable with addresses that
> lie in the early_pg_dir dtb mapping: when those addresses are reused
> with swapper_pg_dir, this mapping does not exist and then we trap.
> 
> So move the dtb mapping in the fixmap region which is established in
> early_pg_dir and handed over to swapper_pg_dir.

So I think this does a good job of explaining why we had a problem with
reserved memory to begin with, but I think it would be good to explain
exactly where
> Fixes: 50e63dd8ed92 ("riscv: fix reserved memory setup")
gets things wrong. That being, as far as I can tell, that it violates
the requirement "...must be called before swapper_pg_dir is set up".
It would be good to mention that specifically I think, say:

	riscv establishes 2 virtual mappings:

	- early_pg_dir maps the kernel which allows to discover the system
	  memory
	- swapper_pg_dir installs the final mapping (linear mapping included)

	We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
	mapping was not carried over in swapper_pg_dir. This caused problems
	for reserved memory, as early_init_fdt_scan_reserved_mem() initialised
	reserved_mem variables with addresses that lie in the early_pg_dir dtb
	mapping. When those addresses are reused with swapper_pg_dir, this
	mapping does not exist and then we trap.
	The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
	must be called before swapper_pg_dir is set up otherwise we could
	allocate in reserved memory defined in the dtb.

	Move the dtb mapping in the fixmap region which is established in
	early_pg_dir and handed over to swapper_pg_dir.

You need this one too:
Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob")

As an aside, one thing I noticed while re-working that, would you mind
adding ()s to function names in commit messages? Makes differentiation
between functions and variables etc a lot easier :)

This is probably the least confident R-b I've given to date:
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
But I did also run this through the test the trigger the initial
problems and it looks good, so:
Tested-by: Conor Dooley <conor.dooley@microchip.com>

> Reported-by: Conor Dooley <Conor.Dooley@microchip.com>

btw, could you make the email all lowercase if you re-spin, exchange did
what it does to the original report cos it was sent from Thunderbird
rather than mutt.

Thanks for working on this,
Conor.

> Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  Documentation/riscv/vm-layout.rst |  6 +--
>  arch/riscv/include/asm/fixmap.h   |  8 ++++
>  arch/riscv/include/asm/pgtable.h  |  8 +++-
>  arch/riscv/kernel/setup.c         |  1 -
>  arch/riscv/mm/init.c              | 61 +++++++++++++++++--------------
>  5 files changed, 51 insertions(+), 33 deletions(-)
> 
> diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
> index 3be44e74ec5d..5462c84f4723 100644
> --- a/Documentation/riscv/vm-layout.rst
> +++ b/Documentation/riscv/vm-layout.rst
> @@ -47,7 +47,7 @@ RISC-V Linux Kernel SV39
>                                                                | Kernel-space virtual memory, shared between all processes:
>    ____________________________________________________________|___________________________________________________________
>                      |            |                  |         |
> -   ffffffc6fee00000 | -228    GB | ffffffc6feffffff |    2 MB | fixmap
> +   ffffffc6fea00000 | -228    GB | ffffffc6feffffff |    6 MB | fixmap
>     ffffffc6ff000000 | -228    GB | ffffffc6ffffffff |   16 MB | PCI io
>     ffffffc700000000 | -228    GB | ffffffc7ffffffff |    4 GB | vmemmap
>     ffffffc800000000 | -224    GB | ffffffd7ffffffff |   64 GB | vmalloc/ioremap space
> @@ -83,7 +83,7 @@ RISC-V Linux Kernel SV48
>                                                                | Kernel-space virtual memory, shared between all processes:
>    ____________________________________________________________|___________________________________________________________
>                      |            |                  |         |
> -   ffff8d7ffee00000 |  -114.5 TB | ffff8d7ffeffffff |    2 MB | fixmap
> +   ffff8d7ffea00000 |  -114.5 TB | ffff8d7ffeffffff |    6 MB | fixmap
>     ffff8d7fff000000 |  -114.5 TB | ffff8d7fffffffff |   16 MB | PCI io
>     ffff8d8000000000 |  -114.5 TB | ffff8f7fffffffff |    2 TB | vmemmap
>     ffff8f8000000000 |  -112.5 TB | ffffaf7fffffffff |   32 TB | vmalloc/ioremap space
> @@ -119,7 +119,7 @@ RISC-V Linux Kernel SV57
>                                                                | Kernel-space virtual memory, shared between all processes:
>    ____________________________________________________________|___________________________________________________________
>                      |            |                  |         |
> -   ff1bfffffee00000 | -57     PB | ff1bfffffeffffff |    2 MB | fixmap
> +   ff1bfffffea00000 | -57     PB | ff1bfffffeffffff |    6 MB | fixmap
>     ff1bffffff000000 | -57     PB | ff1bffffffffffff |   16 MB | PCI io
>     ff1c000000000000 | -57     PB | ff1fffffffffffff |    1 PB | vmemmap
>     ff20000000000000 | -56     PB | ff5fffffffffffff |   16 PB | vmalloc/ioremap space
> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
> index 5c3e7b97fcc6..0a55099bb734 100644
> --- a/arch/riscv/include/asm/fixmap.h
> +++ b/arch/riscv/include/asm/fixmap.h
> @@ -22,6 +22,14 @@
>   */
>  enum fixed_addresses {
>  	FIX_HOLE,
> +	/*
> +	 * The fdt fixmap mapping must be PMD aligned and will be mapped
> +	 * using PMD entries in fixmap_pmd in 64-bit and a PGD entry in 32-bit.
> +	 */
> +	FIX_FDT_END,
> +	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
> +
> +	/* Below fixmaps will be mapped using fixmap_pte */
>  	FIX_PTE,
>  	FIX_PMD,
>  	FIX_PUD,
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index ab05f892d317..f641837ccf31 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -87,9 +87,13 @@
>  
>  #define FIXADDR_TOP      PCI_IO_START
>  #ifdef CONFIG_64BIT
> -#define FIXADDR_SIZE     PMD_SIZE
> +#define MAX_FDT_SIZE	 PMD_SIZE
> +#define FIX_FDT_SIZE	 (MAX_FDT_SIZE + SZ_2M)
> +#define FIXADDR_SIZE     (PMD_SIZE + FIX_FDT_SIZE)
>  #else
> -#define FIXADDR_SIZE     PGDIR_SIZE
> +#define MAX_FDT_SIZE	 PGDIR_SIZE
> +#define FIX_FDT_SIZE	 MAX_FDT_SIZE
> +#define FIXADDR_SIZE     (PGDIR_SIZE + FIX_FDT_SIZE)
>  #endif
>  #define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
>  
> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
> index 376d2827e736..542eed85ad2c 100644
> --- a/arch/riscv/kernel/setup.c
> +++ b/arch/riscv/kernel/setup.c
> @@ -283,7 +283,6 @@ void __init setup_arch(char **cmdline_p)
>  	else
>  		pr_err("No DTB found in kernel mappings\n");
>  #endif
> -	early_init_fdt_scan_reserved_mem();
>  	misc_mem_init();
>  
>  	init_resources();
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 478d6763a01a..fb78d6bbabae 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -57,7 +57,6 @@ unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
>  EXPORT_SYMBOL(empty_zero_page);
>  
>  extern char _start[];
> -#define DTB_EARLY_BASE_VA      PGDIR_SIZE
>  void *_dtb_early_va __initdata;
>  uintptr_t _dtb_early_pa __initdata;
>  
> @@ -236,6 +235,14 @@ static void __init setup_bootmem(void)
>  	set_max_mapnr(max_low_pfn - ARCH_PFN_OFFSET);
>  
>  	reserve_initrd_mem();
> +
> +	/*
> +	 * No allocation should be done before reserving the memory as defined
> +	 * in the device tree, otherwise the allocation could end up in a
> +	 * reserved region.
> +	 */
> +	early_init_fdt_scan_reserved_mem();
> +
>  	/*
>  	 * If DTB is built in, no need to reserve its memblock.
>  	 * Otherwise, do reserve it but avoid using
> @@ -279,9 +286,6 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>  static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>  
>  pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
> -static p4d_t __maybe_unused early_dtb_p4d[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
> -static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
> -static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>  
>  #ifdef CONFIG_XIP_KERNEL
>  #define pt_ops			(*(struct pt_alloc_ops *)XIP_FIXUP(&pt_ops))
> @@ -626,9 +630,6 @@ static void __init create_p4d_mapping(p4d_t *p4dp,
>  #define trampoline_pgd_next	(pgtable_l5_enabled ?			\
>  		(uintptr_t)trampoline_p4d : (pgtable_l4_enabled ?	\
>  		(uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd))
> -#define early_dtb_pgd_next	(pgtable_l5_enabled ?			\
> -		(uintptr_t)early_dtb_p4d : (pgtable_l4_enabled ?	\
> -		(uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd))
>  #else
>  #define pgd_next_t		pte_t
>  #define alloc_pgd_next(__va)	pt_ops.alloc_pte(__va)
> @@ -636,7 +637,6 @@ static void __init create_p4d_mapping(p4d_t *p4dp,
>  #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
>  	create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
>  #define fixmap_pgd_next		((uintptr_t)fixmap_pte)
> -#define early_dtb_pgd_next	((uintptr_t)early_dtb_pmd)
>  #define create_p4d_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
>  #define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
>  #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
> @@ -860,32 +860,28 @@ static void __init create_kernel_page_table(pgd_t *pgdir, bool early)
>   * this means 2 PMD entries whereas for 32-bit kernel, this is only 1 PGDIR
>   * entry.
>   */
> -static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
> +static void __init create_fdt_early_page_table(pgd_t *pgdir,
> +					       uintptr_t fix_fdt_va,
> +					       uintptr_t dtb_pa)
>  {
> -#ifndef CONFIG_BUILTIN_DTB
>  	uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
>  
> -	create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
> -			   IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
> -			   PGDIR_SIZE,
> -			   IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
> -
> -	if (pgtable_l5_enabled)
> -		create_p4d_mapping(early_dtb_p4d, DTB_EARLY_BASE_VA,
> -				   (uintptr_t)early_dtb_pud, P4D_SIZE, PAGE_TABLE);
> -
> -	if (pgtable_l4_enabled)
> -		create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
> -				   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
> +#ifndef CONFIG_BUILTIN_DTB
> +	/* Make sure the fdt fixmap address is always aligned on PMD size */
> +	BUILD_BUG_ON(FIX_FDT % (PMD_SIZE / PAGE_SIZE));
>  
> -	if (IS_ENABLED(CONFIG_64BIT)) {
> -		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
> +	/* In 32-bit only, the fdt lies in its own PGD */
> +	if (!IS_ENABLED(CONFIG_64BIT)) {
> +		create_pgd_mapping(early_pg_dir, fix_fdt_va,
> +				   pa, MAX_FDT_SIZE, PAGE_KERNEL);
> +	} else {
> +		create_pmd_mapping(fixmap_pmd, fix_fdt_va,
>  				   pa, PMD_SIZE, PAGE_KERNEL);
> -		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA + PMD_SIZE,
> +		create_pmd_mapping(fixmap_pmd, fix_fdt_va + PMD_SIZE,
>  				   pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL);
>  	}
>  
> -	dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1));
> +	dtb_early_va = (void *)fix_fdt_va + (dtb_pa & (PMD_SIZE - 1));
>  #else
>  	/*
>  	 * For 64-bit kernel, __va can't be used since it would return a linear
> @@ -1055,7 +1051,8 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>  	create_kernel_page_table(early_pg_dir, true);
>  
>  	/* Setup early mapping for FDT early scan */
> -	create_fdt_early_page_table(early_pg_dir, dtb_pa);
> +	create_fdt_early_page_table(early_pg_dir,
> +				    __fix_to_virt(FIX_FDT), dtb_pa);
>  
>  	/*
>  	 * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
> @@ -1097,6 +1094,16 @@ static void __init setup_vm_final(void)
>  	u64 i;
>  
>  	/* Setup swapper PGD for fixmap */
> +#if !defined(CONFIG_64BIT)
> +	/*
> +	 * In 32-bit, the device tree lies in a pgd entry, so it must be copied
> +	 * directly in swapper_pg_dir in addition to the pgd entry that points
> +	 * to fixmap_pte.
> +	 */
> +	unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT));
> +
> +	set_pgd(&swapper_pg_dir[idx], early_pg_dir[idx]);
> +#endif
>  	create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
>  			   __pa_symbol(fixmap_pgd_next),
>  			   PGDIR_SIZE, PAGE_TABLE);
> -- 
> 2.37.2
> 
>
Alexandre Ghiti March 29, 2023, 7:36 a.m. UTC | #2
Hi Conor,

On 3/27/23 12:37, Conor Dooley wrote:
> Hey Alex!
>
> On Thu, Mar 23, 2023 at 05:33:46PM +0100, Alexandre Ghiti wrote:
>> riscv establishes 2 virtual mappings:
>>
>> - early_pg_dir maps the kernel which allows to discover the system
>>    memory
>> - swapper_pg_dir installs the final mapping (linear mapping included)
>>
>> We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
>> mapping was not carried over in swapper_pg_dir. It happens that
>> early_init_fdt_scan_reserved_mem must be called before swapper_pg_dir is
>> setup otherwise we could allocate reserved memory defined in the dtb.
>> And this function initializes reserved_mem variable with addresses that
>> lie in the early_pg_dir dtb mapping: when those addresses are reused
>> with swapper_pg_dir, this mapping does not exist and then we trap.
>>
>> So move the dtb mapping in the fixmap region which is established in
>> early_pg_dir and handed over to swapper_pg_dir.
> So I think this does a good job of explaining why we had a problem with
> reserved memory to begin with, but I think it would be good to explain
> exactly where
>> Fixes: 50e63dd8ed92 ("riscv: fix reserved memory setup")
> gets things wrong. That being, as far as I can tell, that it violates
> the requirement "...must be called before swapper_pg_dir is set up".


The fix is not wrong, it is just incomplete: it fixes the issue with the 
not-existing-anymore address but introduces the problem with the 
possible allocations in the reserved regions (which you explained 
clearly below, thanks).



> It would be good to mention that specifically I think, say:
>
> 	riscv establishes 2 virtual mappings:
>
> 	- early_pg_dir maps the kernel which allows to discover the system
> 	  memory
> 	- swapper_pg_dir installs the final mapping (linear mapping included)
>
> 	We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
> 	mapping was not carried over in swapper_pg_dir. This caused problems
> 	for reserved memory, as early_init_fdt_scan_reserved_mem() initialised
> 	reserved_mem variables with addresses that lie in the early_pg_dir dtb
> 	mapping. When those addresses are reused with swapper_pg_dir, this
> 	mapping does not exist and then we trap.
> 	The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
> 	must be called before swapper_pg_dir is set up otherwise we could
> 	allocate in reserved memory defined in the dtb.
>
> 	Move the dtb mapping in the fixmap region which is established in
> 	early_pg_dir and handed over to swapper_pg_dir.
>
> You need this one too:
> Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob")

Not sure this commit is related to this fix and it's hard to find *one* 
culprit: TBH I only mentioned this one as otherwise I think the right 
commit is commit 8f3a2b4a96dc ("RISC-V: Move DT mapping outof fixmap") 
but that's a long time ago and the patch won't apply easily, not sure 
what to do here.

>
> As an aside, one thing I noticed while re-working that, would you mind
> adding ()s to function names in commit messages? Makes differentiation
> between functions and variables etc a lot easier :)


Sure


>
> This is probably the least confident R-b I've given to date:
> Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
> But I did also run this through the test the trigger the initial
> problems and it looks good, so:
> Tested-by: Conor Dooley <conor.dooley@microchip.com>


Great, if that breaks something, we'll share responsibility then, thanks :)


>
>> Reported-by: Conor Dooley <Conor.Dooley@microchip.com>
> btw, could you make the email all lowercase if you re-spin, exchange did
> what it does to the original report cos it was sent from Thunderbird
> rather than mutt.


Sure


>
> Thanks for working on this,


You're welcome, that was fun!


Alex


> Conor.
>
>> Link: https://lore.kernel.org/all/f8e67f82-103d-156c-deb0-d6d6e2756f5e@microchip.com/
>> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
>> ---
>>   Documentation/riscv/vm-layout.rst |  6 +--
>>   arch/riscv/include/asm/fixmap.h   |  8 ++++
>>   arch/riscv/include/asm/pgtable.h  |  8 +++-
>>   arch/riscv/kernel/setup.c         |  1 -
>>   arch/riscv/mm/init.c              | 61 +++++++++++++++++--------------
>>   5 files changed, 51 insertions(+), 33 deletions(-)
>>
>> diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
>> index 3be44e74ec5d..5462c84f4723 100644
>> --- a/Documentation/riscv/vm-layout.rst
>> +++ b/Documentation/riscv/vm-layout.rst
>> @@ -47,7 +47,7 @@ RISC-V Linux Kernel SV39
>>                                                                 | Kernel-space virtual memory, shared between all processes:
>>     ____________________________________________________________|___________________________________________________________
>>                       |            |                  |         |
>> -   ffffffc6fee00000 | -228    GB | ffffffc6feffffff |    2 MB | fixmap
>> +   ffffffc6fea00000 | -228    GB | ffffffc6feffffff |    6 MB | fixmap
>>      ffffffc6ff000000 | -228    GB | ffffffc6ffffffff |   16 MB | PCI io
>>      ffffffc700000000 | -228    GB | ffffffc7ffffffff |    4 GB | vmemmap
>>      ffffffc800000000 | -224    GB | ffffffd7ffffffff |   64 GB | vmalloc/ioremap space
>> @@ -83,7 +83,7 @@ RISC-V Linux Kernel SV48
>>                                                                 | Kernel-space virtual memory, shared between all processes:
>>     ____________________________________________________________|___________________________________________________________
>>                       |            |                  |         |
>> -   ffff8d7ffee00000 |  -114.5 TB | ffff8d7ffeffffff |    2 MB | fixmap
>> +   ffff8d7ffea00000 |  -114.5 TB | ffff8d7ffeffffff |    6 MB | fixmap
>>      ffff8d7fff000000 |  -114.5 TB | ffff8d7fffffffff |   16 MB | PCI io
>>      ffff8d8000000000 |  -114.5 TB | ffff8f7fffffffff |    2 TB | vmemmap
>>      ffff8f8000000000 |  -112.5 TB | ffffaf7fffffffff |   32 TB | vmalloc/ioremap space
>> @@ -119,7 +119,7 @@ RISC-V Linux Kernel SV57
>>                                                                 | Kernel-space virtual memory, shared between all processes:
>>     ____________________________________________________________|___________________________________________________________
>>                       |            |                  |         |
>> -   ff1bfffffee00000 | -57     PB | ff1bfffffeffffff |    2 MB | fixmap
>> +   ff1bfffffea00000 | -57     PB | ff1bfffffeffffff |    6 MB | fixmap
>>      ff1bffffff000000 | -57     PB | ff1bffffffffffff |   16 MB | PCI io
>>      ff1c000000000000 | -57     PB | ff1fffffffffffff |    1 PB | vmemmap
>>      ff20000000000000 | -56     PB | ff5fffffffffffff |   16 PB | vmalloc/ioremap space
>> diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
>> index 5c3e7b97fcc6..0a55099bb734 100644
>> --- a/arch/riscv/include/asm/fixmap.h
>> +++ b/arch/riscv/include/asm/fixmap.h
>> @@ -22,6 +22,14 @@
>>    */
>>   enum fixed_addresses {
>>   	FIX_HOLE,
>> +	/*
>> +	 * The fdt fixmap mapping must be PMD aligned and will be mapped
>> +	 * using PMD entries in fixmap_pmd in 64-bit and a PGD entry in 32-bit.
>> +	 */
>> +	FIX_FDT_END,
>> +	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
>> +
>> +	/* Below fixmaps will be mapped using fixmap_pte */
>>   	FIX_PTE,
>>   	FIX_PMD,
>>   	FIX_PUD,
>> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
>> index ab05f892d317..f641837ccf31 100644
>> --- a/arch/riscv/include/asm/pgtable.h
>> +++ b/arch/riscv/include/asm/pgtable.h
>> @@ -87,9 +87,13 @@
>>   
>>   #define FIXADDR_TOP      PCI_IO_START
>>   #ifdef CONFIG_64BIT
>> -#define FIXADDR_SIZE     PMD_SIZE
>> +#define MAX_FDT_SIZE	 PMD_SIZE
>> +#define FIX_FDT_SIZE	 (MAX_FDT_SIZE + SZ_2M)
>> +#define FIXADDR_SIZE     (PMD_SIZE + FIX_FDT_SIZE)
>>   #else
>> -#define FIXADDR_SIZE     PGDIR_SIZE
>> +#define MAX_FDT_SIZE	 PGDIR_SIZE
>> +#define FIX_FDT_SIZE	 MAX_FDT_SIZE
>> +#define FIXADDR_SIZE     (PGDIR_SIZE + FIX_FDT_SIZE)
>>   #endif
>>   #define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
>>   
>> diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
>> index 376d2827e736..542eed85ad2c 100644
>> --- a/arch/riscv/kernel/setup.c
>> +++ b/arch/riscv/kernel/setup.c
>> @@ -283,7 +283,6 @@ void __init setup_arch(char **cmdline_p)
>>   	else
>>   		pr_err("No DTB found in kernel mappings\n");
>>   #endif
>> -	early_init_fdt_scan_reserved_mem();
>>   	misc_mem_init();
>>   
>>   	init_resources();
>> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
>> index 478d6763a01a..fb78d6bbabae 100644
>> --- a/arch/riscv/mm/init.c
>> +++ b/arch/riscv/mm/init.c
>> @@ -57,7 +57,6 @@ unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
>>   EXPORT_SYMBOL(empty_zero_page);
>>   
>>   extern char _start[];
>> -#define DTB_EARLY_BASE_VA      PGDIR_SIZE
>>   void *_dtb_early_va __initdata;
>>   uintptr_t _dtb_early_pa __initdata;
>>   
>> @@ -236,6 +235,14 @@ static void __init setup_bootmem(void)
>>   	set_max_mapnr(max_low_pfn - ARCH_PFN_OFFSET);
>>   
>>   	reserve_initrd_mem();
>> +
>> +	/*
>> +	 * No allocation should be done before reserving the memory as defined
>> +	 * in the device tree, otherwise the allocation could end up in a
>> +	 * reserved region.
>> +	 */
>> +	early_init_fdt_scan_reserved_mem();
>> +
>>   	/*
>>   	 * If DTB is built in, no need to reserve its memblock.
>>   	 * Otherwise, do reserve it but avoid using
>> @@ -279,9 +286,6 @@ pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
>>   static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
>>   
>>   pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
>> -static p4d_t __maybe_unused early_dtb_p4d[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
>> -static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
>> -static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
>>   
>>   #ifdef CONFIG_XIP_KERNEL
>>   #define pt_ops			(*(struct pt_alloc_ops *)XIP_FIXUP(&pt_ops))
>> @@ -626,9 +630,6 @@ static void __init create_p4d_mapping(p4d_t *p4dp,
>>   #define trampoline_pgd_next	(pgtable_l5_enabled ?			\
>>   		(uintptr_t)trampoline_p4d : (pgtable_l4_enabled ?	\
>>   		(uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd))
>> -#define early_dtb_pgd_next	(pgtable_l5_enabled ?			\
>> -		(uintptr_t)early_dtb_p4d : (pgtable_l4_enabled ?	\
>> -		(uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd))
>>   #else
>>   #define pgd_next_t		pte_t
>>   #define alloc_pgd_next(__va)	pt_ops.alloc_pte(__va)
>> @@ -636,7 +637,6 @@ static void __init create_p4d_mapping(p4d_t *p4dp,
>>   #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
>>   	create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
>>   #define fixmap_pgd_next		((uintptr_t)fixmap_pte)
>> -#define early_dtb_pgd_next	((uintptr_t)early_dtb_pmd)
>>   #define create_p4d_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
>>   #define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
>>   #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
>> @@ -860,32 +860,28 @@ static void __init create_kernel_page_table(pgd_t *pgdir, bool early)
>>    * this means 2 PMD entries whereas for 32-bit kernel, this is only 1 PGDIR
>>    * entry.
>>    */
>> -static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
>> +static void __init create_fdt_early_page_table(pgd_t *pgdir,
>> +					       uintptr_t fix_fdt_va,
>> +					       uintptr_t dtb_pa)
>>   {
>> -#ifndef CONFIG_BUILTIN_DTB
>>   	uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
>>   
>> -	create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
>> -			   IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
>> -			   PGDIR_SIZE,
>> -			   IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
>> -
>> -	if (pgtable_l5_enabled)
>> -		create_p4d_mapping(early_dtb_p4d, DTB_EARLY_BASE_VA,
>> -				   (uintptr_t)early_dtb_pud, P4D_SIZE, PAGE_TABLE);
>> -
>> -	if (pgtable_l4_enabled)
>> -		create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
>> -				   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
>> +#ifndef CONFIG_BUILTIN_DTB
>> +	/* Make sure the fdt fixmap address is always aligned on PMD size */
>> +	BUILD_BUG_ON(FIX_FDT % (PMD_SIZE / PAGE_SIZE));
>>   
>> -	if (IS_ENABLED(CONFIG_64BIT)) {
>> -		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
>> +	/* In 32-bit only, the fdt lies in its own PGD */
>> +	if (!IS_ENABLED(CONFIG_64BIT)) {
>> +		create_pgd_mapping(early_pg_dir, fix_fdt_va,
>> +				   pa, MAX_FDT_SIZE, PAGE_KERNEL);
>> +	} else {
>> +		create_pmd_mapping(fixmap_pmd, fix_fdt_va,
>>   				   pa, PMD_SIZE, PAGE_KERNEL);
>> -		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA + PMD_SIZE,
>> +		create_pmd_mapping(fixmap_pmd, fix_fdt_va + PMD_SIZE,
>>   				   pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL);
>>   	}
>>   
>> -	dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1));
>> +	dtb_early_va = (void *)fix_fdt_va + (dtb_pa & (PMD_SIZE - 1));
>>   #else
>>   	/*
>>   	 * For 64-bit kernel, __va can't be used since it would return a linear
>> @@ -1055,7 +1051,8 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>>   	create_kernel_page_table(early_pg_dir, true);
>>   
>>   	/* Setup early mapping for FDT early scan */
>> -	create_fdt_early_page_table(early_pg_dir, dtb_pa);
>> +	create_fdt_early_page_table(early_pg_dir,
>> +				    __fix_to_virt(FIX_FDT), dtb_pa);
>>   
>>   	/*
>>   	 * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
>> @@ -1097,6 +1094,16 @@ static void __init setup_vm_final(void)
>>   	u64 i;
>>   
>>   	/* Setup swapper PGD for fixmap */
>> +#if !defined(CONFIG_64BIT)
>> +	/*
>> +	 * In 32-bit, the device tree lies in a pgd entry, so it must be copied
>> +	 * directly in swapper_pg_dir in addition to the pgd entry that points
>> +	 * to fixmap_pte.
>> +	 */
>> +	unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT));
>> +
>> +	set_pgd(&swapper_pg_dir[idx], early_pg_dir[idx]);
>> +#endif
>>   	create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
>>   			   __pa_symbol(fixmap_pgd_next),
>>   			   PGDIR_SIZE, PAGE_TABLE);
>> -- 
>> 2.37.2
>>
>>
>>
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@lists.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
Conor Dooley March 29, 2023, 7:45 a.m. UTC | #3
On Wed, Mar 29, 2023 at 09:36:00AM +0200, Alexandre Ghiti wrote:
> On 3/27/23 12:37, Conor Dooley wrote:
> > On Thu, Mar 23, 2023 at 05:33:46PM +0100, Alexandre Ghiti wrote:

> The fix is not wrong, it is just incomplete: it fixes the issue with the
> not-existing-anymore address but introduces the problem with the possible
> allocations in the reserved regions (which you explained clearly below,
> thanks).

Which to me, makes it wrong ;) It's my own patch I'm dumping on so I
think I am best qualified to do that!

> > It would be good to mention that specifically I think, say:
> > 
> > 	riscv establishes 2 virtual mappings:
> > 
> > 	- early_pg_dir maps the kernel which allows to discover the system
> > 	  memory
> > 	- swapper_pg_dir installs the final mapping (linear mapping included)
> > 
> > 	We used to map the dtb in early_pg_dir using DTB_EARLY_BASE_VA, and this
> > 	mapping was not carried over in swapper_pg_dir. This caused problems
> > 	for reserved memory, as early_init_fdt_scan_reserved_mem() initialised
> > 	reserved_mem variables with addresses that lie in the early_pg_dir dtb
> > 	mapping. When those addresses are reused with swapper_pg_dir, this
> > 	mapping does not exist and then we trap.
> > 	The previous "fix" was incorrect as early_init_fdt_scan_reserved_mem()
> > 	must be called before swapper_pg_dir is set up otherwise we could
> > 	allocate in reserved memory defined in the dtb.
> > 
> > 	Move the dtb mapping in the fixmap region which is established in
> > 	early_pg_dir and handed over to swapper_pg_dir.
> > 
> > You need this one too:
> > Fixes: 922b0375fc93 ("riscv: Fix memblock reservation for device tree blob")
> 
> Not sure this commit is related to this fix and it's hard to find *one*
> culprit: TBH I only mentioned this one as otherwise I think the right commit
> is commit 8f3a2b4a96dc ("RISC-V: Move DT mapping outof fixmap") but that's a
> long time ago and the patch won't apply easily, not sure what to do here.

Yeah, it's hard to say.. I think the one I mentioned above should be
mentioned though, because that's what (I think) introduced the bug that
I was fixing in my commit, so if this patch is replacing my fix (which
it is) then I think it should have a super-set of the Fixes: tags in my
one.

> > Thanks for working on this,
> 
> 
> You're welcome, that was fun!

Sounds like masochism to me! ;)
diff mbox series

Patch

diff --git a/Documentation/riscv/vm-layout.rst b/Documentation/riscv/vm-layout.rst
index 3be44e74ec5d..5462c84f4723 100644
--- a/Documentation/riscv/vm-layout.rst
+++ b/Documentation/riscv/vm-layout.rst
@@ -47,7 +47,7 @@  RISC-V Linux Kernel SV39
                                                               | Kernel-space virtual memory, shared between all processes:
   ____________________________________________________________|___________________________________________________________
                     |            |                  |         |
-   ffffffc6fee00000 | -228    GB | ffffffc6feffffff |    2 MB | fixmap
+   ffffffc6fea00000 | -228    GB | ffffffc6feffffff |    6 MB | fixmap
    ffffffc6ff000000 | -228    GB | ffffffc6ffffffff |   16 MB | PCI io
    ffffffc700000000 | -228    GB | ffffffc7ffffffff |    4 GB | vmemmap
    ffffffc800000000 | -224    GB | ffffffd7ffffffff |   64 GB | vmalloc/ioremap space
@@ -83,7 +83,7 @@  RISC-V Linux Kernel SV48
                                                               | Kernel-space virtual memory, shared between all processes:
   ____________________________________________________________|___________________________________________________________
                     |            |                  |         |
-   ffff8d7ffee00000 |  -114.5 TB | ffff8d7ffeffffff |    2 MB | fixmap
+   ffff8d7ffea00000 |  -114.5 TB | ffff8d7ffeffffff |    6 MB | fixmap
    ffff8d7fff000000 |  -114.5 TB | ffff8d7fffffffff |   16 MB | PCI io
    ffff8d8000000000 |  -114.5 TB | ffff8f7fffffffff |    2 TB | vmemmap
    ffff8f8000000000 |  -112.5 TB | ffffaf7fffffffff |   32 TB | vmalloc/ioremap space
@@ -119,7 +119,7 @@  RISC-V Linux Kernel SV57
                                                               | Kernel-space virtual memory, shared between all processes:
   ____________________________________________________________|___________________________________________________________
                     |            |                  |         |
-   ff1bfffffee00000 | -57     PB | ff1bfffffeffffff |    2 MB | fixmap
+   ff1bfffffea00000 | -57     PB | ff1bfffffeffffff |    6 MB | fixmap
    ff1bffffff000000 | -57     PB | ff1bffffffffffff |   16 MB | PCI io
    ff1c000000000000 | -57     PB | ff1fffffffffffff |    1 PB | vmemmap
    ff20000000000000 | -56     PB | ff5fffffffffffff |   16 PB | vmalloc/ioremap space
diff --git a/arch/riscv/include/asm/fixmap.h b/arch/riscv/include/asm/fixmap.h
index 5c3e7b97fcc6..0a55099bb734 100644
--- a/arch/riscv/include/asm/fixmap.h
+++ b/arch/riscv/include/asm/fixmap.h
@@ -22,6 +22,14 @@ 
  */
 enum fixed_addresses {
 	FIX_HOLE,
+	/*
+	 * The fdt fixmap mapping must be PMD aligned and will be mapped
+	 * using PMD entries in fixmap_pmd in 64-bit and a PGD entry in 32-bit.
+	 */
+	FIX_FDT_END,
+	FIX_FDT = FIX_FDT_END + FIX_FDT_SIZE / PAGE_SIZE - 1,
+
+	/* Below fixmaps will be mapped using fixmap_pte */
 	FIX_PTE,
 	FIX_PMD,
 	FIX_PUD,
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index ab05f892d317..f641837ccf31 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -87,9 +87,13 @@ 
 
 #define FIXADDR_TOP      PCI_IO_START
 #ifdef CONFIG_64BIT
-#define FIXADDR_SIZE     PMD_SIZE
+#define MAX_FDT_SIZE	 PMD_SIZE
+#define FIX_FDT_SIZE	 (MAX_FDT_SIZE + SZ_2M)
+#define FIXADDR_SIZE     (PMD_SIZE + FIX_FDT_SIZE)
 #else
-#define FIXADDR_SIZE     PGDIR_SIZE
+#define MAX_FDT_SIZE	 PGDIR_SIZE
+#define FIX_FDT_SIZE	 MAX_FDT_SIZE
+#define FIXADDR_SIZE     (PGDIR_SIZE + FIX_FDT_SIZE)
 #endif
 #define FIXADDR_START    (FIXADDR_TOP - FIXADDR_SIZE)
 
diff --git a/arch/riscv/kernel/setup.c b/arch/riscv/kernel/setup.c
index 376d2827e736..542eed85ad2c 100644
--- a/arch/riscv/kernel/setup.c
+++ b/arch/riscv/kernel/setup.c
@@ -283,7 +283,6 @@  void __init setup_arch(char **cmdline_p)
 	else
 		pr_err("No DTB found in kernel mappings\n");
 #endif
-	early_init_fdt_scan_reserved_mem();
 	misc_mem_init();
 
 	init_resources();
diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
index 478d6763a01a..fb78d6bbabae 100644
--- a/arch/riscv/mm/init.c
+++ b/arch/riscv/mm/init.c
@@ -57,7 +57,6 @@  unsigned long empty_zero_page[PAGE_SIZE / sizeof(unsigned long)]
 EXPORT_SYMBOL(empty_zero_page);
 
 extern char _start[];
-#define DTB_EARLY_BASE_VA      PGDIR_SIZE
 void *_dtb_early_va __initdata;
 uintptr_t _dtb_early_pa __initdata;
 
@@ -236,6 +235,14 @@  static void __init setup_bootmem(void)
 	set_max_mapnr(max_low_pfn - ARCH_PFN_OFFSET);
 
 	reserve_initrd_mem();
+
+	/*
+	 * No allocation should be done before reserving the memory as defined
+	 * in the device tree, otherwise the allocation could end up in a
+	 * reserved region.
+	 */
+	early_init_fdt_scan_reserved_mem();
+
 	/*
 	 * If DTB is built in, no need to reserve its memblock.
 	 * Otherwise, do reserve it but avoid using
@@ -279,9 +286,6 @@  pgd_t trampoline_pg_dir[PTRS_PER_PGD] __page_aligned_bss;
 static pte_t fixmap_pte[PTRS_PER_PTE] __page_aligned_bss;
 
 pgd_t early_pg_dir[PTRS_PER_PGD] __initdata __aligned(PAGE_SIZE);
-static p4d_t __maybe_unused early_dtb_p4d[PTRS_PER_P4D] __initdata __aligned(PAGE_SIZE);
-static pud_t __maybe_unused early_dtb_pud[PTRS_PER_PUD] __initdata __aligned(PAGE_SIZE);
-static pmd_t __maybe_unused early_dtb_pmd[PTRS_PER_PMD] __initdata __aligned(PAGE_SIZE);
 
 #ifdef CONFIG_XIP_KERNEL
 #define pt_ops			(*(struct pt_alloc_ops *)XIP_FIXUP(&pt_ops))
@@ -626,9 +630,6 @@  static void __init create_p4d_mapping(p4d_t *p4dp,
 #define trampoline_pgd_next	(pgtable_l5_enabled ?			\
 		(uintptr_t)trampoline_p4d : (pgtable_l4_enabled ?	\
 		(uintptr_t)trampoline_pud : (uintptr_t)trampoline_pmd))
-#define early_dtb_pgd_next	(pgtable_l5_enabled ?			\
-		(uintptr_t)early_dtb_p4d : (pgtable_l4_enabled ?	\
-		(uintptr_t)early_dtb_pud : (uintptr_t)early_dtb_pmd))
 #else
 #define pgd_next_t		pte_t
 #define alloc_pgd_next(__va)	pt_ops.alloc_pte(__va)
@@ -636,7 +637,6 @@  static void __init create_p4d_mapping(p4d_t *p4dp,
 #define create_pgd_next_mapping(__nextp, __va, __pa, __sz, __prot)	\
 	create_pte_mapping(__nextp, __va, __pa, __sz, __prot)
 #define fixmap_pgd_next		((uintptr_t)fixmap_pte)
-#define early_dtb_pgd_next	((uintptr_t)early_dtb_pmd)
 #define create_p4d_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
 #define create_pud_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
 #define create_pmd_mapping(__pmdp, __va, __pa, __sz, __prot) do {} while(0)
@@ -860,32 +860,28 @@  static void __init create_kernel_page_table(pgd_t *pgdir, bool early)
  * this means 2 PMD entries whereas for 32-bit kernel, this is only 1 PGDIR
  * entry.
  */
-static void __init create_fdt_early_page_table(pgd_t *pgdir, uintptr_t dtb_pa)
+static void __init create_fdt_early_page_table(pgd_t *pgdir,
+					       uintptr_t fix_fdt_va,
+					       uintptr_t dtb_pa)
 {
-#ifndef CONFIG_BUILTIN_DTB
 	uintptr_t pa = dtb_pa & ~(PMD_SIZE - 1);
 
-	create_pgd_mapping(early_pg_dir, DTB_EARLY_BASE_VA,
-			   IS_ENABLED(CONFIG_64BIT) ? early_dtb_pgd_next : pa,
-			   PGDIR_SIZE,
-			   IS_ENABLED(CONFIG_64BIT) ? PAGE_TABLE : PAGE_KERNEL);
-
-	if (pgtable_l5_enabled)
-		create_p4d_mapping(early_dtb_p4d, DTB_EARLY_BASE_VA,
-				   (uintptr_t)early_dtb_pud, P4D_SIZE, PAGE_TABLE);
-
-	if (pgtable_l4_enabled)
-		create_pud_mapping(early_dtb_pud, DTB_EARLY_BASE_VA,
-				   (uintptr_t)early_dtb_pmd, PUD_SIZE, PAGE_TABLE);
+#ifndef CONFIG_BUILTIN_DTB
+	/* Make sure the fdt fixmap address is always aligned on PMD size */
+	BUILD_BUG_ON(FIX_FDT % (PMD_SIZE / PAGE_SIZE));
 
-	if (IS_ENABLED(CONFIG_64BIT)) {
-		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA,
+	/* In 32-bit only, the fdt lies in its own PGD */
+	if (!IS_ENABLED(CONFIG_64BIT)) {
+		create_pgd_mapping(early_pg_dir, fix_fdt_va,
+				   pa, MAX_FDT_SIZE, PAGE_KERNEL);
+	} else {
+		create_pmd_mapping(fixmap_pmd, fix_fdt_va,
 				   pa, PMD_SIZE, PAGE_KERNEL);
-		create_pmd_mapping(early_dtb_pmd, DTB_EARLY_BASE_VA + PMD_SIZE,
+		create_pmd_mapping(fixmap_pmd, fix_fdt_va + PMD_SIZE,
 				   pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL);
 	}
 
-	dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1));
+	dtb_early_va = (void *)fix_fdt_va + (dtb_pa & (PMD_SIZE - 1));
 #else
 	/*
 	 * For 64-bit kernel, __va can't be used since it would return a linear
@@ -1055,7 +1051,8 @@  asmlinkage void __init setup_vm(uintptr_t dtb_pa)
 	create_kernel_page_table(early_pg_dir, true);
 
 	/* Setup early mapping for FDT early scan */
-	create_fdt_early_page_table(early_pg_dir, dtb_pa);
+	create_fdt_early_page_table(early_pg_dir,
+				    __fix_to_virt(FIX_FDT), dtb_pa);
 
 	/*
 	 * Bootime fixmap only can handle PMD_SIZE mapping. Thus, boot-ioremap
@@ -1097,6 +1094,16 @@  static void __init setup_vm_final(void)
 	u64 i;
 
 	/* Setup swapper PGD for fixmap */
+#if !defined(CONFIG_64BIT)
+	/*
+	 * In 32-bit, the device tree lies in a pgd entry, so it must be copied
+	 * directly in swapper_pg_dir in addition to the pgd entry that points
+	 * to fixmap_pte.
+	 */
+	unsigned long idx = pgd_index(__fix_to_virt(FIX_FDT));
+
+	set_pgd(&swapper_pg_dir[idx], early_pg_dir[idx]);
+#endif
 	create_pgd_mapping(swapper_pg_dir, FIXADDR_START,
 			   __pa_symbol(fixmap_pgd_next),
 			   PGDIR_SIZE, PAGE_TABLE);