diff mbox series

[3/7] RISC-V: Rework kernel's virtual address space mapping

Message ID 20190327213643.23789-4-logang@deltatee.com (mailing list archive)
State New, archived
Headers show
Series RISC-V: Sparsmem, Memory Hotplug and pte_devmap for P2P | expand

Commit Message

Logan Gunthorpe March 27, 2019, 9:36 p.m. UTC
The motivation for this is to support P2P transactions. P2P requires
having struct pages for IO memory which means the linear mapping must
be able to cover all of the IO regions. Unfortunately with Sv39 we are
not able to cover all the IO regions available on existing hardware,
but we can do better than what we currently do (which only cover's
physical memory).

To this end, we restructure the kernel's virtual address space region.
We position the vmemmap at the beginning of the region (based on how
many virtual address bits we have) and the VMALLOC region comes
immediately after. The linear mapping then takes up the remaining space.
PAGE_OFFSET will need to be within the linear mapping but may not be
the start of the mapping seeing many machines don't have RAM at address
zero and we may still want to access lower addresses through the
linear mapping.

With these changes, on a 64-bit system the virtual memory map (with
sparsemem enabled) will be:

32-bit:

00000000 - 7fffffff   user space, different per mm (2G)
80000000 - 81ffffff   virtual memory map (32MB)
82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
c0000000 - ffffffff   direct mapping of all phys. memory (1GB)

64-bit, Sv39:

0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
hole caused by [38:63] sign extension
ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap spac (64GB)
ffffffd100000000 - ffffffffffffffff  linear mapping of phys. space (188GB)

On the Sifive hardware this allows us to provide struct pages for
the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas
and 172GB of 240GB of the high I/O TileLink region. Once we progress to
Sv48 we should be able to cover all the available memory regions..

For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
2GB of address space, so we cannot cover the any of the I/O regions
that are higher than it but we do cover the lower I/O TileLink range.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Cc: Palmer Dabbelt <palmer@sifive.com>
Cc: Albert Ou <aou@eecs.berkeley.edu>
Cc: Antony Pavlov <antonynpavlov@gmail.com>
Cc: "Stefan O'Rear" <sorear2@gmail.com>
Cc: Anup Patel <anup.patel@wdc.com>
---
 arch/riscv/Kconfig               |  2 +-
 arch/riscv/include/asm/page.h    |  2 --
 arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
 3 files changed, 19 insertions(+), 12 deletions(-)

Comments

Palmer Dabbelt March 28, 2019, 5:39 a.m. UTC | #1
On Wed, 27 Mar 2019 14:36:39 PDT (-0700), logang@deltatee.com wrote:
> The motivation for this is to support P2P transactions. P2P requires
> having struct pages for IO memory which means the linear mapping must
> be able to cover all of the IO regions. Unfortunately with Sv39 we are
> not able to cover all the IO regions available on existing hardware,
> but we can do better than what we currently do (which only cover's
> physical memory).
>
> To this end, we restructure the kernel's virtual address space region.
> We position the vmemmap at the beginning of the region (based on how
> many virtual address bits we have) and the VMALLOC region comes
> immediately after. The linear mapping then takes up the remaining space.
> PAGE_OFFSET will need to be within the linear mapping but may not be
> the start of the mapping seeing many machines don't have RAM at address
> zero and we may still want to access lower addresses through the
> linear mapping.
>
> With these changes, on a 64-bit system the virtual memory map (with
> sparsemem enabled) will be:
>
> 32-bit:
>
> 00000000 - 7fffffff   user space, different per mm (2G)
> 80000000 - 81ffffff   virtual memory map (32MB)
> 82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
> c0000000 - ffffffff   direct mapping of all phys. memory (1GB)
>
> 64-bit, Sv39:
>
> 0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
> hole caused by [38:63] sign extension
> ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
> ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap spac (64GB)
> ffffffd100000000 - ffffffffffffffff  linear mapping of phys. space (188GB)
>
> On the Sifive hardware this allows us to provide struct pages for
> the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas
> and 172GB of 240GB of the high I/O TileLink region. Once we progress to
> Sv48 we should be able to cover all the available memory regions..
>
> For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
> 2GB of address space, so we cannot cover the any of the I/O regions
> that are higher than it but we do cover the lower I/O TileLink range.

IIRC there was another patch floating around to fix an issue with overlapping 
regions in the 32-bit port, did you also fix that issue?  It's somewhere in my 
email queue...

> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Cc: Palmer Dabbelt <palmer@sifive.com>
> Cc: Albert Ou <aou@eecs.berkeley.edu>
> Cc: Antony Pavlov <antonynpavlov@gmail.com>
> Cc: "Stefan O'Rear" <sorear2@gmail.com>
> Cc: Anup Patel <anup.patel@wdc.com>
> ---
>  arch/riscv/Kconfig               |  2 +-
>  arch/riscv/include/asm/page.h    |  2 --
>  arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
>  3 files changed, 19 insertions(+), 12 deletions(-)
>
> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> index 76fc340ae38e..d21e6a12e8b6 100644
> --- a/arch/riscv/Kconfig
> +++ b/arch/riscv/Kconfig
> @@ -71,7 +71,7 @@ config PAGE_OFFSET
>  	hex
>  	default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
>  	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
> -	default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
> +	default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
>
>  config ARCH_FLATMEM_ENABLE
>  	def_bool y
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 2a546a52f02a..fa0b8058a246 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -31,8 +31,6 @@
>   */
>  #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
>
> -#define KERN_VIRT_SIZE (-PAGE_OFFSET)
> -
>  #ifndef __ASSEMBLY__
>
>  #define PAGE_UP(addr)	(((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 5a9fea00ba09..2a5070540996 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
>  #define __S110	PAGE_SHARED_EXEC
>  #define __S111	PAGE_SHARED_EXEC
>
> -#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
> -#define VMALLOC_END      (PAGE_OFFSET - 1)
> -#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
> +#define KERN_SPACE_START	(-1UL << (CONFIG_VA_BITS - 1))
>
>  /*
>   * Roughly size the vmemmap space to be large enough to fit enough
>   * struct pages to map half the virtual address space. Then
>   * position vmemmap directly below the VMALLOC region.
>   */
> -#define VMEMMAP_SHIFT \
> -	(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> -#define VMEMMAP_SIZE	(1UL << VMEMMAP_SHIFT)
> -#define VMEMMAP_END	(VMALLOC_START - 1)
> -#define VMEMMAP_START	(VMALLOC_START - VMEMMAP_SIZE)
> -
> +#ifdef CONFIG_SPARSEMEM
> +#define VMEMMAP_SIZE	(UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
> +				   STRUCT_PAGE_MAX_SHIFT))
> +#define VMEMMAP_START	(KERN_SPACE_START)
> +#define VMEMMAP_END	(VMEMMAP_START + VMEMMAP_SIZE - 1)
>  #define vmemmap		((struct page *)VMEMMAP_START)
> +#else
> +#define VMEMMAP_END	KERN_SPACE_START
> +#endif
> +
> +#ifdef CONFIG_32BIT
> +#define VMALLOC_SIZE	((1UL << 30) - VMEMMAP_SIZE)
> +#else
> +#define VMALLOC_SIZE	(64UL << 30)
> +#endif
> +
> +#define VMALLOC_START	(VMEMMAP_END + 1)
> +#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE - 1)
>
>  /*
>   * ZERO_PAGE is a global shared page that is always zero,
Anup Patel March 28, 2019, 6:28 a.m. UTC | #2
On Thu, Mar 28, 2019 at 11:09 AM Palmer Dabbelt <palmer@sifive.com> wrote:
>
> On Wed, 27 Mar 2019 14:36:39 PDT (-0700), logang@deltatee.com wrote:
> > The motivation for this is to support P2P transactions. P2P requires
> > having struct pages for IO memory which means the linear mapping must
> > be able to cover all of the IO regions. Unfortunately with Sv39 we are
> > not able to cover all the IO regions available on existing hardware,
> > but we can do better than what we currently do (which only cover's
> > physical memory).
> >
> > To this end, we restructure the kernel's virtual address space region.
> > We position the vmemmap at the beginning of the region (based on how
> > many virtual address bits we have) and the VMALLOC region comes
> > immediately after. The linear mapping then takes up the remaining space.
> > PAGE_OFFSET will need to be within the linear mapping but may not be
> > the start of the mapping seeing many machines don't have RAM at address
> > zero and we may still want to access lower addresses through the
> > linear mapping.
> >
> > With these changes, on a 64-bit system the virtual memory map (with
> > sparsemem enabled) will be:
> >
> > 32-bit:
> >
> > 00000000 - 7fffffff   user space, different per mm (2G)
> > 80000000 - 81ffffff   virtual memory map (32MB)
> > 82000000 - bfffffff   vmalloc/ioremap space (1GB - 32MB)
> > c0000000 - ffffffff   direct mapping of all phys. memory (1GB)
> >
> > 64-bit, Sv39:
> >
> > 0000000000000000 - 0000003fffffffff  user space, different per mm (256GB)
> > hole caused by [38:63] sign extension
> > ffffffc000000000 - ffffffc0ffffffff  virtual memory map (4GB)
> > ffffffc100000000 - ffffffd0ffffffff  vmalloc/ioremap spac (64GB)
> > ffffffd100000000 - ffffffffffffffff  linear mapping of phys. space (188GB)
> >
> > On the Sifive hardware this allows us to provide struct pages for
> > the lower I/O TileLink address ranges, the 32-bit and 34-bit DRAM areas
> > and 172GB of 240GB of the high I/O TileLink region. Once we progress to
> > Sv48 we should be able to cover all the available memory regions..
> >
> > For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
> > 2GB of address space, so we cannot cover the any of the I/O regions
> > that are higher than it but we do cover the lower I/O TileLink range.
>
> IIRC there was another patch floating around to fix an issue with overlapping
> regions in the 32-bit port, did you also fix that issue?  It's somewhere in my
> email queue...

That was a patch I submitted to fix overlapping FIXMAP and VMALLOC
regions.

This patch does not consider FIXMAP region.

I suggest we introduce asm/memory.h where we have all critical defines
related to virtual memory layout. Also, this header should have detailed
comments about virtual memory layout.

>
> > Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> > Cc: Palmer Dabbelt <palmer@sifive.com>
> > Cc: Albert Ou <aou@eecs.berkeley.edu>
> > Cc: Antony Pavlov <antonynpavlov@gmail.com>
> > Cc: "Stefan O'Rear" <sorear2@gmail.com>
> > Cc: Anup Patel <anup.patel@wdc.com>
> > ---
> >  arch/riscv/Kconfig               |  2 +-
> >  arch/riscv/include/asm/page.h    |  2 --
> >  arch/riscv/include/asm/pgtable.h | 27 ++++++++++++++++++---------
> >  3 files changed, 19 insertions(+), 12 deletions(-)
> >
> > diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
> > index 76fc340ae38e..d21e6a12e8b6 100644
> > --- a/arch/riscv/Kconfig
> > +++ b/arch/riscv/Kconfig
> > @@ -71,7 +71,7 @@ config PAGE_OFFSET
> >       hex
> >       default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
> >       default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
> > -     default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
> > +     default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
> >
> >  config ARCH_FLATMEM_ENABLE
> >       def_bool y
> > diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> > index 2a546a52f02a..fa0b8058a246 100644
> > --- a/arch/riscv/include/asm/page.h
> > +++ b/arch/riscv/include/asm/page.h
> > @@ -31,8 +31,6 @@
> >   */
> >  #define PAGE_OFFSET          _AC(CONFIG_PAGE_OFFSET, UL)
> >
> > -#define KERN_VIRT_SIZE (-PAGE_OFFSET)
> > -
> >  #ifndef __ASSEMBLY__
> >
> >  #define PAGE_UP(addr)        (((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
> > diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> > index 5a9fea00ba09..2a5070540996 100644
> > --- a/arch/riscv/include/asm/pgtable.h
> > +++ b/arch/riscv/include/asm/pgtable.h
> > @@ -89,22 +89,31 @@ extern pgd_t swapper_pg_dir[];
> >  #define __S110       PAGE_SHARED_EXEC
> >  #define __S111       PAGE_SHARED_EXEC
> >
> > -#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
> > -#define VMALLOC_END      (PAGE_OFFSET - 1)
> > -#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
> > +#define KERN_SPACE_START     (-1UL << (CONFIG_VA_BITS - 1))
> >
> >  /*
> >   * Roughly size the vmemmap space to be large enough to fit enough
> >   * struct pages to map half the virtual address space. Then
> >   * position vmemmap directly below the VMALLOC region.
> >   */
> > -#define VMEMMAP_SHIFT \
> > -     (CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
> > -#define VMEMMAP_SIZE (1UL << VMEMMAP_SHIFT)
> > -#define VMEMMAP_END  (VMALLOC_START - 1)
> > -#define VMEMMAP_START        (VMALLOC_START - VMEMMAP_SIZE)
> > -
> > +#ifdef CONFIG_SPARSEMEM
> > +#define VMEMMAP_SIZE (UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
> > +                                STRUCT_PAGE_MAX_SHIFT))
> > +#define VMEMMAP_START        (KERN_SPACE_START)
> > +#define VMEMMAP_END  (VMEMMAP_START + VMEMMAP_SIZE - 1)
> >  #define vmemmap              ((struct page *)VMEMMAP_START)
> > +#else
> > +#define VMEMMAP_END  KERN_SPACE_START
> > +#endif
> > +
> > +#ifdef CONFIG_32BIT
> > +#define VMALLOC_SIZE ((1UL << 30) - VMEMMAP_SIZE)
> > +#else
> > +#define VMALLOC_SIZE (64UL << 30)
> > +#endif
> > +
> > +#define VMALLOC_START        (VMEMMAP_END + 1)
> > +#define VMALLOC_END  (VMALLOC_START + VMALLOC_SIZE - 1)
> >
> >  /*
> >   * ZERO_PAGE is a global shared page that is always zero,

Regards,
Anup
Logan Gunthorpe March 28, 2019, 3:54 p.m. UTC | #3
On 2019-03-28 12:28 a.m., Anup Patel wrote:
>>> For the MAXPHYSMEM_2GB case, the physical memory must be in the highest
>>> 2GB of address space, so we cannot cover the any of the I/O regions
>>> that are higher than it but we do cover the lower I/O TileLink range.
>>
>> IIRC there was another patch floating around to fix an issue with overlapping
>> regions in the 32-bit port, did you also fix that issue?  It's somewhere in my
>> email queue...
> 
> That was a patch I submitted to fix overlapping FIXMAP and VMALLOC
> regions.
> 
> This patch does not consider FIXMAP region.

Correct.

> I suggest we introduce asm/memory.h where we have all critical defines
> related to virtual memory layout. Also, this header should have detailed
> comments about virtual memory layout.

Seems like a sensible cleanup. It did seem like the defines for this
stuff were all over the place. I'm not really clear on all the stuff
that would belong in asm/memory.h so I think I'll leave such a cleanup
to you.

The second patch in this series added documentation to describe the
virtual memory layout which matches how it was done in x86.

Logan
diff mbox series

Patch

diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig
index 76fc340ae38e..d21e6a12e8b6 100644
--- a/arch/riscv/Kconfig
+++ b/arch/riscv/Kconfig
@@ -71,7 +71,7 @@  config PAGE_OFFSET
 	hex
 	default 0xC0000000 if 32BIT && MAXPHYSMEM_2GB
 	default 0xffffffff80000000 if 64BIT && MAXPHYSMEM_2GB
-	default 0xffffffe000000000 if 64BIT && MAXPHYSMEM_128GB
+	default 0xffffffd200000000 if 64BIT && MAXPHYSMEM_128GB
 
 config ARCH_FLATMEM_ENABLE
 	def_bool y
diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
index 2a546a52f02a..fa0b8058a246 100644
--- a/arch/riscv/include/asm/page.h
+++ b/arch/riscv/include/asm/page.h
@@ -31,8 +31,6 @@ 
  */
 #define PAGE_OFFSET		_AC(CONFIG_PAGE_OFFSET, UL)
 
-#define KERN_VIRT_SIZE (-PAGE_OFFSET)
-
 #ifndef __ASSEMBLY__
 
 #define PAGE_UP(addr)	(((addr)+((PAGE_SIZE)-1))&(~((PAGE_SIZE)-1)))
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 5a9fea00ba09..2a5070540996 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -89,22 +89,31 @@  extern pgd_t swapper_pg_dir[];
 #define __S110	PAGE_SHARED_EXEC
 #define __S111	PAGE_SHARED_EXEC
 
-#define VMALLOC_SIZE     (KERN_VIRT_SIZE >> 1)
-#define VMALLOC_END      (PAGE_OFFSET - 1)
-#define VMALLOC_START    (PAGE_OFFSET - VMALLOC_SIZE)
+#define KERN_SPACE_START	(-1UL << (CONFIG_VA_BITS - 1))
 
 /*
  * Roughly size the vmemmap space to be large enough to fit enough
  * struct pages to map half the virtual address space. Then
  * position vmemmap directly below the VMALLOC region.
  */
-#define VMEMMAP_SHIFT \
-	(CONFIG_VA_BITS - PAGE_SHIFT - 1 + STRUCT_PAGE_MAX_SHIFT)
-#define VMEMMAP_SIZE	(1UL << VMEMMAP_SHIFT)
-#define VMEMMAP_END	(VMALLOC_START - 1)
-#define VMEMMAP_START	(VMALLOC_START - VMEMMAP_SIZE)
-
+#ifdef CONFIG_SPARSEMEM
+#define VMEMMAP_SIZE	(UL(1) << (CONFIG_VA_BITS - PAGE_SHIFT - 1 + \
+				   STRUCT_PAGE_MAX_SHIFT))
+#define VMEMMAP_START	(KERN_SPACE_START)
+#define VMEMMAP_END	(VMEMMAP_START + VMEMMAP_SIZE - 1)
 #define vmemmap		((struct page *)VMEMMAP_START)
+#else
+#define VMEMMAP_END	KERN_SPACE_START
+#endif
+
+#ifdef CONFIG_32BIT
+#define VMALLOC_SIZE	((1UL << 30) - VMEMMAP_SIZE)
+#else
+#define VMALLOC_SIZE	(64UL << 30)
+#endif
+
+#define VMALLOC_START	(VMEMMAP_END + 1)
+#define VMALLOC_END	(VMALLOC_START + VMALLOC_SIZE - 1)
 
 /*
  * ZERO_PAGE is a global shared page that is always zero,