
[v3] arm64: Expose the end of the linear map in PHYSMEM_END

Message ID 20240903164532.3874988-1-scott@os.amperecomputing.com (mailing list archive)
State New
Series [v3] arm64: Expose the end of the linear map in PHYSMEM_END

Commit Message

D Scott Phillips Sept. 3, 2024, 4:45 p.m. UTC
The memory hot-plug and resource management code needs to know the
largest address which can fit in the linear map, so set
PHYSMEM_END for that purpose.

This fixes a crash[1] at boot when amdgpu tries to create
DEVICE_PRIVATE_MEMORY and is given a physical address by the
resource management code which is outside the range which can have
a `struct page`.

The Fixes: commit listed below isn't actually broken, but the
reorganization of vmemmap causes the improper DEVICE_PRIVATE_MEMORY address
to go from a warning to a crash.

[1]: Unable to handle kernel paging request at virtual address
     000001ffa6000034
     Mem abort info:
       ESR = 0x0000000096000044
       EC = 0x25: DABT (current EL), IL = 32 bits
       SET = 0, FnV = 0
       EA = 0, S1PTW = 0
       FSC = 0x04: level 0 translation fault
     Data abort info:
       ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
       CM = 0, WnR = 1, TnD = 0, TagAccess = 0
       GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
     user pgtable: 4k pages, 48-bit VAs, pgdp=000008000287c000
     [000001ffa6000034] pgd=0000000000000000, p4d=0000000000000000
     Call trace:
      __init_zone_device_page.constprop.0+0x2c/0xa8
      memmap_init_zone_device+0xf0/0x210
      pagemap_range+0x1e0/0x410
      memremap_pages+0x18c/0x2e0
      devm_memremap_pages+0x30/0x90
      kgd2kfd_init_zone_device+0xf0/0x200 [amdgpu]
      amdgpu_device_ip_init+0x674/0x888 [amdgpu]
      amdgpu_device_init+0x7a4/0xea0 [amdgpu]
      amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
      amdgpu_pci_probe+0x1a0/0x560 [amdgpu]
      local_pci_probe+0x48/0xb8
      work_for_cpu_fn+0x24/0x40
      process_one_work+0x170/0x3e0
      worker_thread+0x2ac/0x3e0
      kthread+0xf4/0x108
      ret_from_fork+0x10/0x20

Fixes: 32697ff38287 ("arm64: vmemmap: Avoid base2 order of struct page size to dimension region")
Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com>
Cc: stable@vger.kernel.org

---
Link to v2: https://lore.kernel.org/all/20240709002757.2431399-1-scott@os.amperecomputing.com/
Changes since v2:
 - Change approach again to defining the newly created PHYSMEM_END in
   arch/arm64/include/asm/memory.h

Link to v1: https://lore.kernel.org/all/20240703210707.1986816-1-scott@os.amperecomputing.com/
Changes since v1:
 - Change from fiddling the architecture's MAX_PHYSMEM_BITS to checking
   arch_get_mappable_range().

 arch/arm64/include/asm/memory.h | 2 ++
 1 file changed, 2 insertions(+)

Comments

Andy Shevchenko Sept. 3, 2024, 5:03 p.m. UTC | #1
On Tue, Sep 03, 2024 at 09:45:32AM -0700, D Scott Phillips wrote:
> The memory hot-plug and resource management code needs to know the
> largest address which can fit in the linear map, so set
> PHYSMEM_END for that purpose.
> 
> This fixes a crash[1] at boot when amdgpu tries to create
> DEVICE_PRIVATE_MEMORY and is given a physical address by the
> resource management code which is outside the range which can have
> a `struct page`
> 
> The Fixes: commit listed below isn't actually broken, but the
> reorganization of vmemmap causes the improper DEVICE_PRIVATE_MEMORY address
> to go from a warning to a crash.
> 
> [1]: Unable to handle kernel paging request at virtual address

No need to have [1]: prefix here and also read this
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces-in-commit-messages
and amend commit message accordingly.

>      000001ffa6000034
>      Mem abort info:
>        ESR = 0x0000000096000044
>        EC = 0x25: DABT (current EL), IL = 32 bits
>        SET = 0, FnV = 0
>        EA = 0, S1PTW = 0
>        FSC = 0x04: level 0 translation fault
>      Data abort info:
>        ISV = 0, ISS = 0x00000044, ISS2 = 0x00000000
>        CM = 0, WnR = 1, TnD = 0, TagAccess = 0
>        GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
>      user pgtable: 4k pages, 48-bit VAs, pgdp=000008000287c000
>      [000001ffa6000034] pgd=0000000000000000, p4d=0000000000000000
>      Call trace:
>       __init_zone_device_page.constprop.0+0x2c/0xa8
>       memmap_init_zone_device+0xf0/0x210
>       pagemap_range+0x1e0/0x410
>       memremap_pages+0x18c/0x2e0
>       devm_memremap_pages+0x30/0x90
>       kgd2kfd_init_zone_device+0xf0/0x200 [amdgpu]
>       amdgpu_device_ip_init+0x674/0x888 [amdgpu]
>       amdgpu_device_init+0x7a4/0xea0 [amdgpu]
>       amdgpu_driver_load_kms+0x28/0x1c0 [amdgpu]
>       amdgpu_pci_probe+0x1a0/0x560 [amdgpu]
>       local_pci_probe+0x48/0xb8
>       work_for_cpu_fn+0x24/0x40
>       process_one_work+0x170/0x3e0
>       worker_thread+0x2ac/0x3e0
>       kthread+0xf4/0x108
>       ret_from_fork+0x10/0x20

Will Deacon Sept. 4, 2024, 4:12 p.m. UTC | #2
On Tue, 03 Sep 2024 09:45:32 -0700, D Scott Phillips wrote:
> The memory hot-plug and resource management code needs to know the
> largest address which can fit in the linear map, so set
> PHYSMEM_END for that purpose.
> 
> This fixes a crash[1] at boot when amdgpu tries to create
> DEVICE_PRIVATE_MEMORY and is given a physical address by the
> resource management code which is outside the range which can have
> a `struct page`
> 
> [...]

Applied to arm64 (for-next/mm), thanks!

I dropped the cc: stable, however, as PHYSMEM_END looks like it only
exists in linux-next.

[1/1] arm64: Expose the end of the linear map in PHYSMEM_END
      https://git.kernel.org/arm64/c/eeb8fdfcf090

Cheers,

Patch

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 54fb014eba05..0480c61dbb4f 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -110,6 +110,8 @@ 
 #define PAGE_END		(_PAGE_END(VA_BITS_MIN))
 #endif /* CONFIG_KASAN */
 
+#define PHYSMEM_END		__pa(PAGE_END - 1)
+
 #define MIN_THREAD_SHIFT	(14 + KASAN_THREAD_SHIFT)
 
 /*
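A minimal user-space sketch of what the new `PHYSMEM_END` evaluates to. It assumes the non-KASAN branch shown in the hunk (where `PAGE_END` is `_PAGE_END(VA_BITS_MIN)`), a linear map spanning `[PAGE_OFFSET, PAGE_END)` with `PAGE_OFFSET = -(1 << VA_BITS)` and `PAGE_END = -(1 << (VA_BITS_MIN - 1))`, and `__pa()` of a linear-map address being its offset from `PAGE_OFFSET` plus `PHYS_OFFSET`. All `model_*` names are hypothetical, the `PHYS_OFFSET` values are made-up examples, and `model_desc_start()` is only an illustrative top-down placement, not the kernel's resource allocator.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical model of the arm64 linear-map arithmetic behind
 * PHYSMEM_END = __pa(PAGE_END - 1). Unsigned negation wraps modulo 2^64,
 * which mirrors the kernel's -(UL(1) << n) style constants.
 */
static uint64_t model_page_offset(unsigned int va_bits)
{
	return -(UINT64_C(1) << va_bits);	/* start of the linear map */
}

static uint64_t model_page_end(unsigned int va_bits_min)
{
	return -(UINT64_C(1) << (va_bits_min - 1));	/* end of the linear map */
}

/* Linear-map virtual address -> physical address (assumed __pa() shape). */
static uint64_t model_pa(uint64_t va, unsigned int va_bits, uint64_t phys_offset)
{
	return (va - model_page_offset(va_bits)) + phys_offset;
}

/* Last physical address with a linear-map alias, i.e. one that can have a struct page. */
static uint64_t model_physmem_end(unsigned int va_bits, unsigned int va_bits_min,
				  uint64_t phys_offset)
{
	return model_pa(model_page_end(va_bits_min) - 1, va_bits, phys_offset);
}

/*
 * Hypothetical top-down placement: put a region of @size bytes so that it
 * ends exactly at @limit. With @limit = PHYSMEM_END the region stays inside
 * the linear map; with a larger limit it may land above it.
 */
static uint64_t model_desc_start(uint64_t size, uint64_t limit)
{
	assert(size > 0 && size - 1 <= limit);	/* region must fit below @limit */
	return limit - size + 1;
}
```

For example, with `VA_BITS = VA_BITS_MIN = 48` and a hypothetical `PHYS_OFFSET` of `0x80000000`, `model_physmem_end(48, 48, 0x80000000)` gives `0x80007fffffff`: 128 TiB of linear map starting at the base of RAM. A 1 GiB region placed top-down under that limit starts at `0x800040000000`, whereas placing it under a 48-bit physical limit (`0xffffffffffff`) would start at `0xffffc0000000`, above the last address that can have a `struct page`.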