
[v7,5/7] arm64: vmemmap: Avoid base2 order of struct page size to dimension region

Message ID 20231213084024.2367360-14-ardb@google.com (mailing list archive)
State New, archived
Series arm64: Reorganize kernel VA space

Commit Message

Ard Biesheuvel Dec. 13, 2023, 8:40 a.m. UTC
From: Ard Biesheuvel <ardb@kernel.org>

The placement and size of the vmemmap region in the kernel virtual
address space is currently derived from the base2 order of the size of a
struct page. This makes for nicely aligned constants with lots of
leading 0xf and trailing 0x0 digits, but given that the actual struct
pages are indexed as an ordinary array, this resulting region is
severely overdimensioned when the size of a struct page is just over a
power of 2.

This doesn't matter today, but once we enable 52-bit virtual addressing
for 4k pages configurations, the vmemmap region may take up almost half
of the upper VA region with the current struct page upper bound at 64
bytes. And once we enable KMSAN or other features that push the size of
a struct page over 64 bytes, we will run out of VMALLOC space entirely.

So instead, let's derive the region size from the actual size of a
struct page, and place the entire region 1 GB from the top of the VA
space, where it still doesn't share any lower level translation table
entries with the fixmap.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
---
 arch/arm64/include/asm/memory.h | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)
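
For a rough sense of the numbers, here is a minimal standalone sketch (plain
userspace C, not kernel code) that mirrors the old and new sizing formulas.
It assumes 4K pages, VA_BITS = 52, VA_BITS_MIN = 48 and a hypothetical
72-byte struct page (e.g. with KMSAN enabled); all of these values are
assumptions chosen for illustration. Under those assumptions the old
power-of-2 scheme rounds 72 bytes up to 128 and claims 124 of the 128 TiB
upper VA region, while sizing by the actual struct page needs roughly 70 TiB.

#include <stdio.h>
#include <stdint.h>

int main(void)
{
	const unsigned page_shift = 12;                      /* 4K pages */
	const uint64_t span  = (1ULL << 52) - (1ULL << 47);  /* PAGE_OFFSET .. _PAGE_END(VA_BITS_MIN) */
	const uint64_t upper = 1ULL << 47;                   /* kernel VA above _PAGE_END(VA_BITS_MIN) */
	const uint64_t struct_page_size = 72;                /* hypothetical: just over 64 bytes */

	/* Old scheme: sizeof(struct page) rounded up to the next power of 2 (128). */
	const unsigned struct_page_max_shift = 7;            /* order_base_2(72) */
	uint64_t old_size = span >> (page_shift - struct_page_max_shift);

	/* New scheme: number of pages multiplied by the actual struct page size. */
	uint64_t new_size = (span >> page_shift) * struct_page_size;

	printf("upper VA region: %llu TiB\n", (unsigned long long)(upper >> 40));
	printf("old vmemmap:     %llu TiB\n", (unsigned long long)(old_size >> 40));
	printf("new vmemmap:     ~%llu TiB\n", (unsigned long long)(new_size >> 40));
	return 0;
}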

Comments

Mark Rutland Dec. 13, 2023, 1:49 p.m. UTC | #1
Hi Ard,

On Wed, Dec 13, 2023 at 09:40:30AM +0100, Ard Biesheuvel wrote:
> From: Ard Biesheuvel <ardb@kernel.org>
> 
> The placement and size of the vmemmap region in the kernel virtual
> address space is currently derived from the base2 order of the size of a
> struct page. This makes for nicely aligned constants with lots of
> leading 0xf and trailing 0x0 digits, but given that the actual struct
> pages are indexed as an ordinary array, this resulting region is
> severely overdimensioned when the size of a struct page is just over a
> power of 2.
> 
> This doesn't matter today, but once we enable 52-bit virtual addressing
> for 4k pages configurations, the vmemmap region may take up almost half
> of the upper VA region with the current struct page upper bound at 64
> bytes. And once we enable KMSAN or other features that push the size of
> a struct page over 64 bytes, we will run out of VMALLOC space entirely.
> 
> So instead, let's derive the region size from the actual size of a
> struct page, and place the entire region 1 GB from the top of the VA
> space, where it still doesn't share any lower level translation table
> entries with the fixmap.
> 
> Acked-by: Mark Rutland <mark.rutland@arm.com>
> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> ---
>  arch/arm64/include/asm/memory.h | 8 ++++----
>  1 file changed, 4 insertions(+), 4 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 2745bed8ae5b..b49575a92afc 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -30,8 +30,8 @@
>   * keep a constant PAGE_OFFSET and "fallback" to using the higher end
>   * of the VMEMMAP where 52-bit support is not available in hardware.
>   */
> -#define VMEMMAP_SHIFT	(PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)
> -#define VMEMMAP_SIZE	((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) >> VMEMMAP_SHIFT)
> +#define VMEMMAP_RANGE	(_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET)
> +#define VMEMMAP_SIZE	((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))
>  
>  /*
>   * PAGE_OFFSET - the virtual address of the start of the linear map, at the
> @@ -47,8 +47,8 @@
>  #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
>  #define MODULES_VADDR		(_PAGE_END(VA_BITS_MIN))
>  #define MODULES_VSIZE		(SZ_2G)
> -#define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> -#define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
> +#define VMEMMAP_START		(VMEMMAP_END - VMEMMAP_SIZE)
> +#define VMEMMAP_END		(-UL(SZ_1G))
>  #define PCI_IO_START		(VMEMMAP_END + SZ_8M)
>  #define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
>  #define FIXADDR_TOP		(-UL(SZ_8M))

I realise I've acked this already, but my big concern here is still that it's
hard to see why these don't overlap (though the assert in fixmap.c will save
us). Usually we try to make that clear by construction, and I think we can do
that here with something like:

| #define GUARD_VA_SIZE          (UL(SZ_8M))
| 
| #define FIXADDR_TOP            (-GUARD_VA_SIZE)
| #define FIXADDR_SIZE_MAX       SZ_8M
| #define FIXADDR_START_MIN      (FIXADDR_TOP - FIXADDR_SIZE_MAX)
| 
| #define PCI_IO_END             (FIXADDR_START_MIN - GUARD_VA_SIZE)
| #define PCI_IO_START           (PCI_IO_END - PCI_IO_SIZE)
| 
| #define VMEMMAP_END            (ALIGN_DOWN(PCI_IO_START - GUARD_VA_SIZE, SZ_1G))
| #define VMEMMAP_START          (VMEMMAP_END - VMEMMAP_SIZE)

... and in fixmap.h have:

/* Ensure the estimate in memory.h was big enough */
static_assert(FIXADDR_SIZE_MAX > FIXADDR_SIZE);

I might be missing some reason why we can't do that; I locally tried the above
atop this series with defconfig+4K and defconfig+64K, and both build and boot
without issue.
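
As a quick check of what that construction evaluates to, a minimal
standalone sketch (assuming LP64, i.e. 64-bit unsigned long, and
PCI_IO_SIZE = SZ_16M; both are assumptions for illustration) shows the
chain lands VMEMMAP_END on the same -SZ_1G boundary that the patch uses,
i.e. it changes how the layout is expressed rather than the layout itself:

#include <stdio.h>

#define SZ_8M			(8UL << 20)
#define SZ_16M			(16UL << 20)
#define SZ_1G			(1UL << 30)
#define ALIGN_DOWN(x, a)	((x) & ~((a) - 1))

int main(void)
{
	unsigned long guard         = SZ_8M;
	unsigned long fixaddr_top   = -guard;
	unsigned long fixaddr_start = fixaddr_top - SZ_8M;      /* FIXADDR_SIZE_MAX */
	unsigned long pci_io_end    = fixaddr_start - guard;
	unsigned long pci_io_start  = pci_io_end - SZ_16M;      /* assumed PCI_IO_SIZE */
	unsigned long vmemmap_end   = ALIGN_DOWN(pci_io_start - guard, SZ_1G);

	printf("VMEMMAP_END = -0x%lx (matches -SZ_1G: %s)\n",
	       -vmemmap_end, vmemmap_end == -SZ_1G ? "yes" : "no");
	return 0;
}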

Other than that, the series looks good to me.

If you're happy with the above I can go spin that as a patch to apply atop.

Mark.
Ard Biesheuvel Dec. 13, 2023, 2:09 p.m. UTC | #2
On Wed, 13 Dec 2023 at 14:49, Mark Rutland <mark.rutland@arm.com> wrote:
>
> Hi Ard,
>
> On Wed, Dec 13, 2023 at 09:40:30AM +0100, Ard Biesheuvel wrote:
> > From: Ard Biesheuvel <ardb@kernel.org>
> >
> > The placement and size of the vmemmap region in the kernel virtual
> > address space is currently derived from the base2 order of the size of a
> > struct page. This makes for nicely aligned constants with lots of
> > leading 0xf and trailing 0x0 digits, but given that the actual struct
> > pages are indexed as an ordinary array, this resulting region is
> > severely overdimensioned when the size of a struct page is just over a
> > power of 2.
> >
> > This doesn't matter today, but once we enable 52-bit virtual addressing
> > for 4k pages configurations, the vmemmap region may take up almost half
> > of the upper VA region with the current struct page upper bound at 64
> > bytes. And once we enable KMSAN or other features that push the size of
> > a struct page over 64 bytes, we will run out of VMALLOC space entirely.
> >
> > So instead, let's derive the region size from the actual size of a
> > struct page, and place the entire region 1 GB from the top of the VA
> > space, where it still doesn't share any lower level translation table
> > entries with the fixmap.
> >
> > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > ---
> >  arch/arm64/include/asm/memory.h | 8 ++++----
> >  1 file changed, 4 insertions(+), 4 deletions(-)
> >
> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index 2745bed8ae5b..b49575a92afc 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -30,8 +30,8 @@
> >   * keep a constant PAGE_OFFSET and "fallback" to using the higher end
> >   * of the VMEMMAP where 52-bit support is not available in hardware.
> >   */
> > -#define VMEMMAP_SHIFT        (PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)
> > -#define VMEMMAP_SIZE ((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) >> VMEMMAP_SHIFT)
> > +#define VMEMMAP_RANGE        (_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET)
> > +#define VMEMMAP_SIZE ((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))
> >
> >  /*
> >   * PAGE_OFFSET - the virtual address of the start of the linear map, at the
> > @@ -47,8 +47,8 @@
> >  #define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
> >  #define MODULES_VADDR                (_PAGE_END(VA_BITS_MIN))
> >  #define MODULES_VSIZE                (SZ_2G)
> > -#define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> > -#define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> > +#define VMEMMAP_START                (VMEMMAP_END - VMEMMAP_SIZE)
> > +#define VMEMMAP_END          (-UL(SZ_1G))
> >  #define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> >  #define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> >  #define FIXADDR_TOP          (-UL(SZ_8M))
>
> I realise I've acked this already, but my big concern here is still that it's
> hard to see why these don't overlap (though the assert in fixmap.c will save
> us). Usually we try to make that clear by construction, and I think we can do
> that here with something like:
>
> | #define GUARD_VA_SIZE          (UL(SZ_8M))
> |
> | #define FIXADDR_TOP            (-GUARD_VA_SIZE)
> | #define FIXADDR_SIZE_MAX       SZ_8M
> | #define FIXADDR_START_MIN      (FIXADDR_TOP - FIXADDR_SIZE_MAX)
> |
> | #define PCI_IO_END             (FIXADDR_START_MIN - GUARD_VA_SIZE)
> | #define PCI_IO_START           (PCI_IO_END - PCI_IO_SIZE)
> |
> | #define VMEMMAP_END            (ALIGN_DOWN(PCI_IO_START - GUARD_VA_SIZE, SZ_1G))
> | #define VMEMMAP_START          (VMEMMAP_END - VMEMMAP_SIZE)
>
> ... and in fixmap.h have:
>
> /* Ensure the estimate in memory.h was big enough */
> static_assert(FIXADDR_SIZE_MAX > FIXADDR_SIZE);
>
> I might be missing some reason why we can't do that; I locally tried the above
> atop this series with defconfig+4K and defconfig+64K, and both build and boot
> without issue.
>

I am really struggling to understand what the issue is that you are
trying to solve here.

What I am proposing is

#define VMEMMAP_END          (-UL(SZ_1G))
#define PCI_IO_START         (VMEMMAP_END + SZ_8M)
#define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
#define FIXADDR_TOP          (-UL(SZ_8M))

and in fixmap.c

static_assert(FIXADDR_TOT_START > PCI_IO_END);

(which sadly has to live outside of asm/memory.h for reasons of #include hell).

This leaves the top 1G for PCI I/O plus fixmap, and ensures that the
latter does not collide with the former.
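
As a sanity sketch of that argument: a freestanding C file (not kernel
code) that models addresses as distances below the top of the VA space,
assuming PCI_IO_SIZE = SZ_16M and a 6 MiB fixmap footprint purely for
illustration. Under those assumptions the fixmap occupies only the top
~14 MiB while PCI I/O ends 1000 MiB below the top, so the assert holds
with plenty of slack.

#include <assert.h>

#define SZ_8M		(8UL << 20)
#define SZ_16M		(16UL << 20)
#define SZ_1G		(1UL << 30)

/* distance below the top of the VA space (larger offset = lower address) */
#define VMEMMAP_END_OFF		SZ_1G
#define PCI_IO_START_OFF	(VMEMMAP_END_OFF - SZ_8M)	/* VMEMMAP_END + SZ_8M */
#define PCI_IO_END_OFF		(PCI_IO_START_OFF - SZ_16M)	/* + PCI_IO_SIZE (assumed) */
#define FIXADDR_TOP_OFF		SZ_8M				/* -SZ_8M */
#define FIXADDR_TOT_SIZE	(6UL << 20)			/* assumed fixmap footprint */
#define FIXADDR_TOT_START_OFF	(FIXADDR_TOP_OFF + FIXADDR_TOT_SIZE)

/* mirrors static_assert(FIXADDR_TOT_START > PCI_IO_END) in fixmap.c */
static_assert(FIXADDR_TOT_START_OFF < PCI_IO_END_OFF,
	      "fixmap would collide with the PCI I/O window");

int main(void) { return 0; }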

> Other than that, the series looks good to me.
>

Thanks.

> If you're happy with the above I can go spin that as a patch to apply atop.
>

I won't object to that but I can't say I am convinced of the need.

Thanks,
Mark Rutland Dec. 13, 2023, 2:39 p.m. UTC | #3
On Wed, Dec 13, 2023 at 03:09:54PM +0100, Ard Biesheuvel wrote:
> On Wed, 13 Dec 2023 at 14:49, Mark Rutland <mark.rutland@arm.com> wrote:
> >
> > Hi Ard,
> >
> > On Wed, Dec 13, 2023 at 09:40:30AM +0100, Ard Biesheuvel wrote:
> > > From: Ard Biesheuvel <ardb@kernel.org>
> > >
> > > The placement and size of the vmemmap region in the kernel virtual
> > > address space is currently derived from the base2 order of the size of a
> > > struct page. This makes for nicely aligned constants with lots of
> > > leading 0xf and trailing 0x0 digits, but given that the actual struct
> > > pages are indexed as an ordinary array, this resulting region is
> > > severely overdimensioned when the size of a struct page is just over a
> > > power of 2.
> > >
> > > This doesn't matter today, but once we enable 52-bit virtual addressing
> > > for 4k pages configurations, the vmemmap region may take up almost half
> > > of the upper VA region with the current struct page upper bound at 64
> > > bytes. And once we enable KMSAN or other features that push the size of
> > > a struct page over 64 bytes, we will run out of VMALLOC space entirely.
> > >
> > > So instead, let's derive the region size from the actual size of a
> > > struct page, and place the entire region 1 GB from the top of the VA
> > > space, where it still doesn't share any lower level translation table
> > > entries with the fixmap.
> > >
> > > Acked-by: Mark Rutland <mark.rutland@arm.com>
> > > Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
> > > ---
> > >  arch/arm64/include/asm/memory.h | 8 ++++----
> > >  1 file changed, 4 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > > index 2745bed8ae5b..b49575a92afc 100644
> > > --- a/arch/arm64/include/asm/memory.h
> > > +++ b/arch/arm64/include/asm/memory.h
> > > @@ -30,8 +30,8 @@
> > >   * keep a constant PAGE_OFFSET and "fallback" to using the higher end
> > >   * of the VMEMMAP where 52-bit support is not available in hardware.
> > >   */
> > > -#define VMEMMAP_SHIFT        (PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)
> > > -#define VMEMMAP_SIZE ((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) >> VMEMMAP_SHIFT)
> > > +#define VMEMMAP_RANGE        (_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET)
> > > +#define VMEMMAP_SIZE ((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))
> > >
> > >  /*
> > >   * PAGE_OFFSET - the virtual address of the start of the linear map, at the
> > > @@ -47,8 +47,8 @@
> > >  #define MODULES_END          (MODULES_VADDR + MODULES_VSIZE)
> > >  #define MODULES_VADDR                (_PAGE_END(VA_BITS_MIN))
> > >  #define MODULES_VSIZE                (SZ_2G)
> > > -#define VMEMMAP_START                (-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
> > > -#define VMEMMAP_END          (VMEMMAP_START + VMEMMAP_SIZE)
> > > +#define VMEMMAP_START                (VMEMMAP_END - VMEMMAP_SIZE)
> > > +#define VMEMMAP_END          (-UL(SZ_1G))
> > >  #define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> > >  #define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> > >  #define FIXADDR_TOP          (-UL(SZ_8M))
> >
> > I realise I've acked this already, but my big concern here is still that it's
> > hard to see why these don't overlap (though the assert in fixmap.c will save
> > us). Usually we try to make that clear by construction, and I think we can do
> > that here with something like:
> >
> > | #define GUARD_VA_SIZE          (UL(SZ_8M))
> > |
> > | #define FIXADDR_TOP            (-GUARD_VA_SIZE)
> > | #define FIXADDR_SIZE_MAX       SZ_8M
> > | #define FIXADDR_START_MIN      (FIXADDR_TOP - FIXADDR_SIZE_MAX)
> > |
> > | #define PCI_IO_END             (FIXADDR_START_MIN - GUARD_VA_SIZE)
> > | #define PCI_IO_START           (PCI_IO_END - PCI_IO_SIZE)
> > |
> > | #define VMEMMAP_END            (ALIGN_DOWN(PCI_IO_START - GUARD_VA_SIZE, SZ_1G))
> > | #define VMEMMAP_START          (VMEMMAP_END - VMEMMAP_SIZE)
> >
> > ... and in fixmap.h have:
> >
> > /* Ensure the estimate in memory.h was big enough */
> > static_assert(FIXADDR_SIZE_MAX > FIXADDR_SIZE);
> >
> > I might be missing some reason why we can't do that; I locally tried the above
> > atop this series with defconfig+4K and defconfig+64K, and both build and boot
> > without issue.
> >
> 
> I am really struggling to understand what the issue is that you are
> trying to solve here.
> 
> What I am proposing is
> 
> #define VMEMMAP_END          (-UL(SZ_1G))
> #define PCI_IO_START         (VMEMMAP_END + SZ_8M)
> #define PCI_IO_END           (PCI_IO_START + PCI_IO_SIZE)
> #define FIXADDR_TOP          (-UL(SZ_8M))
> 
> and in fixmap.c
> 
> static_assert(FIXADDR_TOT_START > PCI_IO_END);
> 
> (which sadly has to live outside of asm/memory.h for reasons of #include hell).
> 
> This leaves the top 1G for PCI I/O plus fixmap, and ensures that the
> latter does not collide with the former.

Sure, I get that (and that #include hell is why I have FIXADDR_SIZE_MAX above).

What I'm trying to do is make the relationship between those regions clear *in
one place*, so that this is easier to follow, as that was one of the things I
found painful during review. That said, it's not clear cut, and I'll happily
defer to the judgement of Will and Catalin.

For the series as-is:

Acked-by: Mark Rutland <mark.rutland@arm.com>

Will/Catalin, please let me know your preference on the above.

Thanks,
Mark.
Catalin Marinas Feb. 9, 2024, 5:33 p.m. UTC | #4
On Wed, Dec 13, 2023 at 02:39:30PM +0000, Mark Rutland wrote:
> > On Wed, 13 Dec 2023 at 14:49, Mark Rutland <mark.rutland@arm.com> wrote:
> > > | #define GUARD_VA_SIZE          (UL(SZ_8M))
> > > |
> > > | #define FIXADDR_TOP            (-GUARD_VA_SIZE)
> > > | #define FIXADDR_SIZE_MAX       SZ_8M
> > > | #define FIXADDR_START_MIN      (FIXADDR_TOP - FIXADDR_SIZE_MAX)
> > > |
> > > | #define PCI_IO_END             (FIXADDR_START_MIN - GUARD_VA_SIZE)
> > > | #define PCI_IO_START           (PCI_IO_END - PCI_IO_SIZE)
> > > |
> > > | #define VMEMMAP_END            (ALIGN_DOWN(PCI_IO_START - GUARD_VA_SIZE, SZ_1G))
> > > | #define VMEMMAP_START          (VMEMMAP_END - VMEMMAP_SIZE)
> > >
> > > ... and in fixmap.h have:
> > >
> > > /* Ensure the estimate in memory.h was big enough */
> > > static_assert(FIXADDR_SIZE_MAX > FIXADDR_SIZE);
[...]
> What I'm trying to do is make the relationship between those regions clear *in
> one place*, so that this is easier to follow, as that was one of the things I
> found painful during review. That said, it's not clear cut, and I'll happily
> defer to the judgement of Will and Catalin.

I think Mark's proposal is slightly cleaner. There's no actual layout
change, just making the maximum fixaddr size explicit in memory.h.

I'm queuing this series, so Mark please send a patch on top (I'll kick
off b4 ty soon).

Thanks.

Patch

diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 2745bed8ae5b..b49575a92afc 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -30,8 +30,8 @@ 
  * keep a constant PAGE_OFFSET and "fallback" to using the higher end
  * of the VMEMMAP where 52-bit support is not available in hardware.
  */
-#define VMEMMAP_SHIFT	(PAGE_SHIFT - STRUCT_PAGE_MAX_SHIFT)
-#define VMEMMAP_SIZE	((_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET) >> VMEMMAP_SHIFT)
+#define VMEMMAP_RANGE	(_PAGE_END(VA_BITS_MIN) - PAGE_OFFSET)
+#define VMEMMAP_SIZE	((VMEMMAP_RANGE >> PAGE_SHIFT) * sizeof(struct page))
 
 /*
  * PAGE_OFFSET - the virtual address of the start of the linear map, at the
@@ -47,8 +47,8 @@ 
 #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
 #define MODULES_VADDR		(_PAGE_END(VA_BITS_MIN))
 #define MODULES_VSIZE		(SZ_2G)
-#define VMEMMAP_START		(-(UL(1) << (VA_BITS - VMEMMAP_SHIFT)))
-#define VMEMMAP_END		(VMEMMAP_START + VMEMMAP_SIZE)
+#define VMEMMAP_START		(VMEMMAP_END - VMEMMAP_SIZE)
+#define VMEMMAP_END		(-UL(SZ_1G))
 #define PCI_IO_START		(VMEMMAP_END + SZ_8M)
 #define PCI_IO_END		(PCI_IO_START + PCI_IO_SIZE)
 #define FIXADDR_TOP		(-UL(SZ_8M))