
[v3,01/10] arm64: mm: Flip kernel VA space

Message ID 20190612172658.28522-2-steve.capper@arm.com (mailing list archive)
State New, archived
Series 52-bit kernel + user VAs

Commit Message

Steve Capper June 12, 2019, 5:26 p.m. UTC
Put the direct linear map in the lower addresses of the kernel VA range
and everything else in the higher ranges.

This allows us to make room for an inline KASAN shadow that operates
under both 48 and 52 bit kernel VA sizes. For example with a 52-bit VA,
if KASAN_SHADOW_END < 0xFFF8000000000000 (it is in the lower addresses
of the kernel VA range), this will be below the start of the minimum
48-bit kernel VA address of 0xFFFF000000000000.

We need to adjust:
 *) KASAN shadow region placement logic,
 *) KASAN_SHADOW_OFFSET computation logic,
 *) virt_to_phys, phys_to_virt checks,
 *) page table dumper.

These are all small changes, that need to take place atomically, so they
are bundled into this commit.

Signed-off-by: Steve Capper <steve.capper@arm.com>
---
 arch/arm64/Makefile              | 2 +-
 arch/arm64/include/asm/memory.h  | 8 ++++----
 arch/arm64/include/asm/pgtable.h | 2 +-
 arch/arm64/kernel/hibernate.c    | 2 +-
 arch/arm64/mm/dump.c             | 8 ++++----
 arch/arm64/mm/init.c             | 9 +--------
 arch/arm64/mm/kasan_init.c       | 6 +++---
 arch/arm64/mm/mmu.c              | 4 ++--
 8 files changed, 17 insertions(+), 24 deletions(-)
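
As a quick orientation, here is a minimal stand-alone sketch (not part of the
patch; the show() helper and the user-space build are purely illustrative)
that evaluates the flipped VA_START and PAGE_OFFSET expressions from the
memory.h hunk for 48-bit and 52-bit VAs:

/*
 * Illustration only: the boundary values produced by the flip, computed
 * with the same expressions the patched memory.h uses.
 */
#include <stdio.h>

static void show(unsigned int va_bits)
{
	unsigned long va_start    = 0xffffffffffffffffUL - (1UL << (va_bits - 1)) + 1;
	unsigned long page_offset = 0xffffffffffffffffUL - (1UL << va_bits) + 1;

	printf("VA_BITS=%u: PAGE_OFFSET (linear map start) = 0x%016lx\n",
	       va_bits, page_offset);
	printf("VA_BITS=%u: VA_START    (vmalloc side)     = 0x%016lx\n",
	       va_bits, va_start);
}

int main(void)
{
	show(48);	/* 0xffff000000000000 and 0xffff800000000000 */
	show(52);	/* 0xfff0000000000000 and 0xfff8000000000000 */
	return 0;
}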

Comments

Anshuman Khandual June 14, 2019, 12:17 p.m. UTC | #1
Hello Steve,

On 06/12/2019 10:56 PM, Steve Capper wrote:
> Put the direct linear map in the lower addresses of the kernel VA range
> and everything else in the higher ranges.

The reason for this flip has to be more clear in the commit message.

> 
> This allows us to make room for an inline KASAN shadow that operates
> under both 48 and 52 bit kernel VA sizes. For example with a 52-bit VA,
> if KASAN_SHADOW_END < 0xFFF8000000000000 (it is in the lower addresses
> of the kernel VA range), this will be below the start of the minimum
> 48-bit kernel VA address of 0xFFFF000000000000.

Though this is true, it does not convey why the flip is required. As you
had mentioned previously, KASAN_SHADOW_END is fixed because it needs to
support the highest possible VA (~0UL) in both 48 and 52 bits.

Hence KASAN_SHADOW_START will have to be variable in order to accommodate
both 48-bit and 52-bit effective virtual addresses, so I am not sure what
the above example based on KASAN_SHADOW_END is trying to convey.

KASAN_SHADOW_END cannot be in the (52-bit space - 48-bit space) exclusion VA
range as it would not work for the 48-bit case, but KASAN_SHADOW_START can
very well be in that space for the 52-bit case.

The current definition is:

#define KASAN_SHADOW_START      (VA_START)
#define KASAN_SHADOW_END        (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)

This won't work in the new scheme because VA_START is different for 48 bits
and 52 bits, which would make KASAN_SHADOW_END variable as well. Hence we
need to change this arrangement.
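
To make that concrete, a small stand-alone sketch (assuming the pre-flip
VA_START definition and generic KASAN's scale shift of 3; the kasan_end()
helper is invented for illustration) shows that pinning KASAN_SHADOW_START
to VA_START yields a different KASAN_SHADOW_END for 48-bit and 52-bit VAs:

#include <stdio.h>

static void kasan_end(unsigned int va_bits)
{
	/* pre-flip VA_START and KASAN_SHADOW_SIZE = 1 << (VA_BITS - 3) */
	unsigned long va_start    = 0xffffffffffffffffUL - (1UL << va_bits) + 1;
	unsigned long shadow_size = 1UL << (va_bits - 3);

	printf("VA_BITS=%u: KASAN_SHADOW_START=0x%016lx KASAN_SHADOW_END=0x%016lx\n",
	       va_bits, va_start, va_start + shadow_size);
}

int main(void)
{
	kasan_end(48);	/* END = 0xffff200000000000 */
	kasan_end(52);	/* END = 0xfff2000000000000 */
	return 0;
}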

What the commit message does not establish is that there is no alternate
arrangement where KASAN_SHADOW_END remains fixed (and is also accessible in
the 48-bit scheme) and KASAN_SHADOW_START is variable, accommodating both
the 48-bit and 52-bit cases, i.e. that flipping the kernel VA space is the
only viable option.

Ahh I see this in the cover letter.

" In order to allow for a KASAN shadow that changes size at boot time, one
must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
start address. Also, it is highly desirable to maintain the same
function addresses in the kernel .text between VA sizes. Both of these
requirements necessitate us to flip the kernel address space halves s.t.
the direct linear map occupies the lower addresses."

This is better, though (please add it to the commit message).

Were there no alternate arrangements to achieve the above two objectives
without flipping the VA space? The reasoning in the commit message is not
convincing enough, even with the above cover letter extract.

> 
> We need to adjust:
>  *) KASAN shadow region placement logic,
>  *) KASAN_SHADOW_OFFSET computation logic,
>  *) virt_to_phys, phys_to_virt checks,
>  *) page table dumper.
> 
> These are all small changes, that need to take place atomically, so they
> are bundled into this commit.

It would be great to add before-patch and after-patch views of the kernel
virtual address space, listing the different sections, to make it apparent
what has changed in the layout and how.
Anshuman Khandual June 14, 2019, 1 p.m. UTC | #2
On 06/12/2019 10:56 PM, Steve Capper wrote:
> Put the direct linear map in the lower addresses of the kernel VA range
> and everything else in the higher ranges.
> 
> This allows us to make room for an inline KASAN shadow that operates
> under both 48 and 52 bit kernel VA sizes. For example with a 52-bit VA,
> if KASAN_SHADOW_END < 0xFFF8000000000000 (it is in the lower addresses
> of the kernel VA range), this will be below the start of the minimum
> 48-bit kernel VA address of 0xFFFF000000000000.
> 
> We need to adjust:
>  *) KASAN shadow region placement logic,
>  *) KASAN_SHADOW_OFFSET computation logic,
>  *) virt_to_phys, phys_to_virt checks,
>  *) page table dumper.
> 
> These are all small changes, that need to take place atomically, so they
> are bundled into this commit.
> 
> Signed-off-by: Steve Capper <steve.capper@arm.com>
> ---
>  arch/arm64/Makefile              | 2 +-
>  arch/arm64/include/asm/memory.h  | 8 ++++----
>  arch/arm64/include/asm/pgtable.h | 2 +-
>  arch/arm64/kernel/hibernate.c    | 2 +-
>  arch/arm64/mm/dump.c             | 8 ++++----
>  arch/arm64/mm/init.c             | 9 +--------
>  arch/arm64/mm/kasan_init.c       | 6 +++---
>  arch/arm64/mm/mmu.c              | 4 ++--
>  8 files changed, 17 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
> index b025304bde46..2dad2ae6b181 100644
> --- a/arch/arm64/Makefile
> +++ b/arch/arm64/Makefile
> @@ -115,7 +115,7 @@ KBUILD_AFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT)
>  #				 - (1 << (64 - KASAN_SHADOW_SCALE_SHIFT))
>  # in 32-bit arithmetic
>  KASAN_SHADOW_OFFSET := $(shell printf "0x%08x00000000\n" $$(( \
> -	(0xffffffff & (-1 << ($(CONFIG_ARM64_VA_BITS) - 32))) \
> +	(0xffffffff & (-1 << ($(CONFIG_ARM64_VA_BITS) - 1 - 32))) \
>  	+ (1 << ($(CONFIG_ARM64_VA_BITS) - 32 - $(KASAN_SHADOW_SCALE_SHIFT))) \
>  	- (1 << (64 - 32 - $(KASAN_SHADOW_SCALE_SHIFT))) )) )
>  
> diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> index 8ffcf5a512bb..5cd2eb8cb424 100644
> --- a/arch/arm64/include/asm/memory.h
> +++ b/arch/arm64/include/asm/memory.h
> @@ -49,9 +49,9 @@
>   */
>  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
>  #define VA_START		(UL(0xffffffffffffffff) - \
> -	(UL(1) << VA_BITS) + 1)
> -#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
>  	(UL(1) << (VA_BITS - 1)) + 1)
> +#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
> +	(UL(1) << VA_BITS) + 1)

PAGE_OFFSET and VA_START swapped their positions.

There are many places with UL(0xffffffffffffffff). Time to define
it as a constant ? Something like [KERNEL|TTBR1]_MAX_VADDR.
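
A compile-checkable sketch of that suggestion (the TTBR1_MAX_VADDR name is
only the proposal above, not something the patch defines; VA_BITS is pinned
to 48 here purely for the static asserts):

/* Hypothetical stand-alone C11 translation unit; nothing here is kernel code. */
#define UL(x)			x##UL
#define VA_BITS			48
#define TTBR1_MAX_VADDR		UL(0xffffffffffffffff)

#define VA_START	(TTBR1_MAX_VADDR - (UL(1) << (VA_BITS - 1)) + 1)
#define PAGE_OFFSET	(TTBR1_MAX_VADDR - (UL(1) << VA_BITS) + 1)

_Static_assert(VA_START == UL(0xffff800000000000), "flipped VA_START");
_Static_assert(PAGE_OFFSET == UL(0xffff000000000000), "flipped PAGE_OFFSET");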

>  #define KIMAGE_VADDR		(MODULES_END)
>  #define BPF_JIT_REGION_START	(VA_START + KASAN_SHADOW_SIZE)
>  #define BPF_JIT_REGION_SIZE	(SZ_128M)
> @@ -59,7 +59,7 @@
>  #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
>  #define MODULES_VADDR		(BPF_JIT_REGION_END)
>  #define MODULES_VSIZE		(SZ_128M)
> -#define VMEMMAP_START		(PAGE_OFFSET - VMEMMAP_SIZE)
> +#define VMEMMAP_START		(-VMEMMAP_SIZE)
>  #define PCI_IO_END		(VMEMMAP_START - SZ_2M)
>  #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
>  #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
> @@ -238,7 +238,7 @@ extern u64			vabits_user;
>   * space. Testing the top bit for the start of the region is a
>   * sufficient check.
>   */
> -#define __is_lm_address(addr)	(!!((addr) & BIT(VA_BITS - 1)))
> +#define __is_lm_address(addr)	(!((addr) & BIT(VA_BITS - 1)))

Should it be (!!((addr) & BIT(VA_BITS - 2))) instead for a positive validation
for addresses in the lower half ?

>  
>  #define __lm_to_phys(addr)	(((addr) & ~PAGE_OFFSET) + PHYS_OFFSET)
>  #define __kimg_to_phys(addr)	((addr) - kimage_voffset)
> diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> index 2c41b04708fe..d0ab784304e9 100644
> --- a/arch/arm64/include/asm/pgtable.h
> +++ b/arch/arm64/include/asm/pgtable.h
> @@ -32,7 +32,7 @@
>   *	and fixed mappings
>   */
>  #define VMALLOC_START		(MODULES_END)
> -#define VMALLOC_END		(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
> +#define VMALLOC_END		(- PUD_SIZE - VMEMMAP_SIZE - SZ_64K)

(-VMEMMAP_SIZE) and (- PUD_SIZE - VMEMMAP_SIZE - SZ_64K) depend on implicit sign
inversion. IMHO it might be better to add [KERNEL|TTBR1]_MAX_VADDR to the equation.
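
The implicit sign inversion in question is just unsigned wrap-around:
negating an unsigned long counts back from the top of the 64-bit address
space. A tiny sketch, using a made-up 1 GiB size rather than the kernel's
real VMEMMAP_SIZE:

#include <stdio.h>

int main(void)
{
	unsigned long vmemmap_size  = 1UL << 30;	/* placeholder size */
	unsigned long vmemmap_start = -vmemmap_size;	/* 2^64 - size */

	printf("VMEMMAP_START = 0x%016lx\n", vmemmap_start);	/* 0xffffffffc0000000 */
	return 0;
}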

>  
>  #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
>  
> diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> index 9859e1178e6b..6ffcc32f35dd 100644
> --- a/arch/arm64/kernel/hibernate.c
> +++ b/arch/arm64/kernel/hibernate.c
> @@ -497,7 +497,7 @@ int swsusp_arch_resume(void)
>  		rc = -ENOMEM;
>  		goto out;
>  	}
> -	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0);
> +	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, VA_START);
>  	if (rc)
>  		goto out;
>  
> diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
> index 14fe23cd5932..ee4e5bea8944 100644
> --- a/arch/arm64/mm/dump.c
> +++ b/arch/arm64/mm/dump.c
> @@ -30,6 +30,8 @@
>  #include <asm/ptdump.h>
>  
>  static const struct addr_marker address_markers[] = {
> +	{ PAGE_OFFSET,			"Linear Mapping start" },
> +	{ VA_START,			"Linear Mapping end" },
>  #ifdef CONFIG_KASAN
>  	{ KASAN_SHADOW_START,		"Kasan shadow start" },
>  	{ KASAN_SHADOW_END,		"Kasan shadow end" },
> @@ -43,10 +45,8 @@ static const struct addr_marker address_markers[] = {
>  	{ PCI_IO_START,			"PCI I/O start" },
>  	{ PCI_IO_END,			"PCI I/O end" },
>  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> -	{ VMEMMAP_START,		"vmemmap start" },
> -	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
> +	{ VMEMMAP_START,		"vmemmap" },

Vmemmap end got dropped ?

>  #endif
> -	{ PAGE_OFFSET,			"Linear mapping" },
>  	{ -1,				NULL },
>  };
>  
> @@ -380,7 +380,7 @@ static void ptdump_initialize(void)
>  static struct ptdump_info kernel_ptdump_info = {
>  	.mm		= &init_mm,
>  	.markers	= address_markers,
> -	.base_addr	= VA_START,
> +	.base_addr	= PAGE_OFFSET,
>  };
>  
>  void ptdump_check_wx(void)
> diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> index d2adffb81b5d..574ed1d4be19 100644
> --- a/arch/arm64/mm/init.c
> +++ b/arch/arm64/mm/init.c
> @@ -311,7 +311,7 @@ static void __init fdt_enforce_memory_region(void)
>  
>  void __init arm64_memblock_init(void)
>  {
> -	const s64 linear_region_size = -(s64)PAGE_OFFSET;
> +	const s64 linear_region_size = BIT(VA_BITS - 1);
>  
>  	/* Handle linux,usable-memory-range property */
>  	fdt_enforce_memory_region();
> @@ -319,13 +319,6 @@ void __init arm64_memblock_init(void)
>  	/* Remove memory above our supported physical address size */
>  	memblock_remove(1ULL << PHYS_MASK_SHIFT, ULLONG_MAX);
>  
> -	/*
> -	 * Ensure that the linear region takes up exactly half of the kernel
> -	 * virtual address space. This way, we can distinguish a linear address
> -	 * from a kernel/module/vmalloc address by testing a single bit.
> -	 */
> -	BUILD_BUG_ON(linear_region_size != BIT(VA_BITS - 1));
> -
>  	/*
>  	 * Select a suitable value for the base of physical memory.
>  	 */
> diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> index 296de39ddee5..8066621052db 100644
> --- a/arch/arm64/mm/kasan_init.c
> +++ b/arch/arm64/mm/kasan_init.c
> @@ -229,10 +229,10 @@ void __init kasan_init(void)
>  	kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
>  			   early_pfn_to_nid(virt_to_pfn(lm_alias(_text))));
>  
> -	kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
> -				    (void *)mod_shadow_start);
> +	kasan_populate_early_shadow(kasan_mem_to_shadow((void *) VA_START),
> +				   (void *)mod_shadow_start);
>  	kasan_populate_early_shadow((void *)kimg_shadow_end,
> -				    kasan_mem_to_shadow((void *)PAGE_OFFSET));
> +				   (void *)KASAN_SHADOW_END);
>  
>  	if (kimg_shadow_start > mod_shadow_end)
>  		kasan_populate_early_shadow((void *)mod_shadow_end,
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index a1bfc4413982..16063ff10c6d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -409,7 +409,7 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
>  static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
>  				  phys_addr_t size, pgprot_t prot)
>  {
> -	if (virt < VMALLOC_START) {
> +	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
>  		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
>  			&phys, virt);
>  		return;
> @@ -436,7 +436,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
>  static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
>  				phys_addr_t size, pgprot_t prot)
>  {
> -	if (virt < VMALLOC_START) {
> +	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
>  		pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
>  			&phys, virt);
>  		return;
> 

Seems like adding (virt >= VA_START) is a semantics change here. In the previous
scheme, (virt < VMALLOC_START) included undefined addresses below VA_START as
well, which will be omitted here. Shouldn't we add (virt < PAGE_OFFSET) to check
those?
Steve Capper June 17, 2019, 4:08 p.m. UTC | #3
Hi Anshuman,

On Fri, Jun 14, 2019 at 06:30:21PM +0530, Anshuman Khandual wrote:
> On 06/12/2019 10:56 PM, Steve Capper wrote:
> > Put the direct linear map in the lower addresses of the kernel VA range
> > and everything else in the higher ranges.
> > 

[...]

> > diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
> > index 8ffcf5a512bb..5cd2eb8cb424 100644
> > --- a/arch/arm64/include/asm/memory.h
> > +++ b/arch/arm64/include/asm/memory.h
> > @@ -49,9 +49,9 @@
> >   */
> >  #define VA_BITS			(CONFIG_ARM64_VA_BITS)
> >  #define VA_START		(UL(0xffffffffffffffff) - \
> > -	(UL(1) << VA_BITS) + 1)
> > -#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
> >  	(UL(1) << (VA_BITS - 1)) + 1)
> > +#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
> > +	(UL(1) << VA_BITS) + 1)
> 
> PAGE_OFFSET and VA_START swapped their positions.
>

In the file I think they haven't moved; would you like them to be
swapped? (Their values have changed.)

> There are many places with UL(0xffffffffffffffff). Time to define
> it as a constant ? Something like [KERNEL|TTBR1]_MAX_VADDR.
>

Could do, I am also tempted to do something like UL(~0) for the sake of
brevity.

> >  #define KIMAGE_VADDR		(MODULES_END)
> >  #define BPF_JIT_REGION_START	(VA_START + KASAN_SHADOW_SIZE)
> >  #define BPF_JIT_REGION_SIZE	(SZ_128M)
> > @@ -59,7 +59,7 @@
> >  #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
> >  #define MODULES_VADDR		(BPF_JIT_REGION_END)
> >  #define MODULES_VSIZE		(SZ_128M)
> > -#define VMEMMAP_START		(PAGE_OFFSET - VMEMMAP_SIZE)
> > +#define VMEMMAP_START		(-VMEMMAP_SIZE)
> >  #define PCI_IO_END		(VMEMMAP_START - SZ_2M)
> >  #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
> >  #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
> > @@ -238,7 +238,7 @@ extern u64			vabits_user;
> >   * space. Testing the top bit for the start of the region is a
> >   * sufficient check.
> >   */
> > -#define __is_lm_address(addr)	(!!((addr) & BIT(VA_BITS - 1)))
> > +#define __is_lm_address(addr)	(!((addr) & BIT(VA_BITS - 1)))
> 
> Should it be (!!((addr) & BIT(VA_BITS - 2))) instead for a positive validation
> for addresses in the lower half ?
> 

I don't think so; we basically want to test which half of the address
space our address is in, so we test BIT(VA_BITS - 1). If one were to
test BIT(VA_BITS - 2), that would be testing which half of one of the
halves instead.
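
A minimal sketch of that argument (VA_BITS assumed to be 48; the
is_lm_address() helper simply mirrors the patched macro outside the kernel):

#include <stdio.h>
#include <stdbool.h>

#define VA_BITS	48
#define BIT(n)	(1UL << (n))

static bool is_lm_address(unsigned long addr)
{
	return !(addr & BIT(VA_BITS - 1));	/* bit 47 clear => linear map */
}

int main(void)
{
	printf("%d\n", is_lm_address(0xffff000000000000UL));	/* 1: start of linear map */
	printf("%d\n", is_lm_address(0xffff7fffffffffffUL));	/* 1: last linear map byte */
	printf("%d\n", is_lm_address(0xffff800000000000UL));	/* 0: VA_START, other half */
	return 0;
}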

> >  
> >  #define __lm_to_phys(addr)	(((addr) & ~PAGE_OFFSET) + PHYS_OFFSET)
> >  #define __kimg_to_phys(addr)	((addr) - kimage_voffset)
> > diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
> > index 2c41b04708fe..d0ab784304e9 100644
> > --- a/arch/arm64/include/asm/pgtable.h
> > +++ b/arch/arm64/include/asm/pgtable.h
> > @@ -32,7 +32,7 @@
> >   *	and fixed mappings
> >   */
> >  #define VMALLOC_START		(MODULES_END)
> > -#define VMALLOC_END		(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
> > +#define VMALLOC_END		(- PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
> 
> (-VMEMMAP_SIZE) and (- PUD_SIZE - VMEMMAP_SIZE - SZ_64K) depend on implicit sign
> inversion. IMHO it might be better to add [KERNEL|TTBR1]_MAX_VADDR to the equation.
>

Hmmm, okay, I see what you mean. I will play with this.

> >  
> >  #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
> >  
> > diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
> > index 9859e1178e6b..6ffcc32f35dd 100644
> > --- a/arch/arm64/kernel/hibernate.c
> > +++ b/arch/arm64/kernel/hibernate.c
> > @@ -497,7 +497,7 @@ int swsusp_arch_resume(void)
> >  		rc = -ENOMEM;
> >  		goto out;
> >  	}
> > -	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0);
> > +	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, VA_START);
> >  	if (rc)
> >  		goto out;
> >  
> > diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
> > index 14fe23cd5932..ee4e5bea8944 100644
> > --- a/arch/arm64/mm/dump.c
> > +++ b/arch/arm64/mm/dump.c
> > @@ -30,6 +30,8 @@
> >  #include <asm/ptdump.h>
> >  
> >  static const struct addr_marker address_markers[] = {
> > +	{ PAGE_OFFSET,			"Linear Mapping start" },
> > +	{ VA_START,			"Linear Mapping end" },
> >  #ifdef CONFIG_KASAN
> >  	{ KASAN_SHADOW_START,		"Kasan shadow start" },
> >  	{ KASAN_SHADOW_END,		"Kasan shadow end" },
> > @@ -43,10 +45,8 @@ static const struct addr_marker address_markers[] = {
> >  	{ PCI_IO_START,			"PCI I/O start" },
> >  	{ PCI_IO_END,			"PCI I/O end" },
> >  #ifdef CONFIG_SPARSEMEM_VMEMMAP
> > -	{ VMEMMAP_START,		"vmemmap start" },
> > -	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
> > +	{ VMEMMAP_START,		"vmemmap" },
> 
> Vmemmap end got dropped ?
> 

Yeah, IIRC this was because the marker never got printed; I will re-test
this.

> >  #endif
> > -	{ PAGE_OFFSET,			"Linear mapping" },
> >  	{ -1,				NULL },
> >  };
> >  
> > @@ -380,7 +380,7 @@ static void ptdump_initialize(void)
> >  static struct ptdump_info kernel_ptdump_info = {
> >  	.mm		= &init_mm,
> >  	.markers	= address_markers,
> > -	.base_addr	= VA_START,
> > +	.base_addr	= PAGE_OFFSET,
> >  };
> >  
> >  void ptdump_check_wx(void)
> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
> > index d2adffb81b5d..574ed1d4be19 100644
> > --- a/arch/arm64/mm/init.c
> > +++ b/arch/arm64/mm/init.c
> > @@ -311,7 +311,7 @@ static void __init fdt_enforce_memory_region(void)
> >  
> >  void __init arm64_memblock_init(void)
> >  {
> > -	const s64 linear_region_size = -(s64)PAGE_OFFSET;
> > +	const s64 linear_region_size = BIT(VA_BITS - 1);
> >  
> >  	/* Handle linux,usable-memory-range property */
> >  	fdt_enforce_memory_region();
> > @@ -319,13 +319,6 @@ void __init arm64_memblock_init(void)
> >  	/* Remove memory above our supported physical address size */
> >  	memblock_remove(1ULL << PHYS_MASK_SHIFT, ULLONG_MAX);
> >  
> > -	/*
> > -	 * Ensure that the linear region takes up exactly half of the kernel
> > -	 * virtual address space. This way, we can distinguish a linear address
> > -	 * from a kernel/module/vmalloc address by testing a single bit.
> > -	 */
> > -	BUILD_BUG_ON(linear_region_size != BIT(VA_BITS - 1));
> > -
> >  	/*
> >  	 * Select a suitable value for the base of physical memory.
> >  	 */
> > diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
> > index 296de39ddee5..8066621052db 100644
> > --- a/arch/arm64/mm/kasan_init.c
> > +++ b/arch/arm64/mm/kasan_init.c
> > @@ -229,10 +229,10 @@ void __init kasan_init(void)
> >  	kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
> >  			   early_pfn_to_nid(virt_to_pfn(lm_alias(_text))));
> >  
> > -	kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
> > -				    (void *)mod_shadow_start);
> > +	kasan_populate_early_shadow(kasan_mem_to_shadow((void *) VA_START),
> > +				   (void *)mod_shadow_start);
> >  	kasan_populate_early_shadow((void *)kimg_shadow_end,
> > -				    kasan_mem_to_shadow((void *)PAGE_OFFSET));
> > +				   (void *)KASAN_SHADOW_END);
> >  
> >  	if (kimg_shadow_start > mod_shadow_end)
> >  		kasan_populate_early_shadow((void *)mod_shadow_end,
> > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> > index a1bfc4413982..16063ff10c6d 100644
> > --- a/arch/arm64/mm/mmu.c
> > +++ b/arch/arm64/mm/mmu.c
> > @@ -409,7 +409,7 @@ static phys_addr_t pgd_pgtable_alloc(int shift)
> >  static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
> >  				  phys_addr_t size, pgprot_t prot)
> >  {
> > -	if (virt < VMALLOC_START) {
> > +	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
> >  		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
> >  			&phys, virt);
> >  		return;
> > @@ -436,7 +436,7 @@ void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
> >  static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
> >  				phys_addr_t size, pgprot_t prot)
> >  {
> > -	if (virt < VMALLOC_START) {
> > +	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
> >  		pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
> >  			&phys, virt);
> >  		return;
> > 
> 
> Seems like adding (virt >= VA_START) is a semantics change here. In the previous
> scheme, (virt < VMALLOC_START) included undefined addresses below VA_START as
> well, which will be omitted here. Shouldn't we add (virt < PAGE_OFFSET) to check
> those?

Not quite; the original code would not warn if we passed an address in
the direct linear mapping. This conditional logic has been modified to
reflect the fact that the direct linear mapping has moved.

If the original behaviour was a bug, then I'm happy to fix it :-).
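
A sketch of the behaviour described above (VA_BITS assumed 48; the
VMALLOC_START offset and the should_warn() helper are placeholders, since
the real VMALLOC_START depends on the KASAN and module region sizes):

#include <stdio.h>
#include <stdbool.h>

#define VA_BITS		48
#define PAGE_OFFSET	(0xffffffffffffffffUL - (1UL << VA_BITS) + 1)
#define VA_START	(0xffffffffffffffffUL - (1UL << (VA_BITS - 1)) + 1)
#define VMALLOC_START	(VA_START + 0x20000000UL)	/* placeholder, not the real layout */

/* Mirrors the patched check in create_mapping_noalloc()/update_mapping_prot() */
static bool should_warn(unsigned long virt)
{
	return virt >= VA_START && virt < VMALLOC_START;
}

int main(void)
{
	printf("%d\n", should_warn(PAGE_OFFSET));	/* 0: linear map address, no warning */
	printf("%d\n", should_warn(VA_START));		/* 1: gap below VMALLOC_START, warns */
	printf("%d\n", should_warn(VMALLOC_START));	/* 0: vmalloc space, no warning */
	return 0;
}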

Cheers,
Steve Capper June 17, 2019, 4:09 p.m. UTC | #4
Hi Anshuman,

Thanks for taking a look at this.

On Fri, Jun 14, 2019 at 05:47:55PM +0530, Anshuman Khandual wrote:
> Hello Steve,
> 
> On 06/12/2019 10:56 PM, Steve Capper wrote:
> > Put the direct linear map in the lower addresses of the kernel VA range
> > and everything else in the higher ranges.
> 
> The reason for this flip has to be more clear in the commit message.
> 

Agreed, that's what I failed to do below ;-).

> > 
> > This allows us to make room for an inline KASAN shadow that operates
> > under both 48 and 52 bit kernel VA sizes. For example with a 52-bit VA,
> > if KASAN_SHADOW_END < 0xFFF8000000000000 (it is in the lower addresses
> > of the kernel VA range), this will be below the start of the minimum
> > 48-bit kernel VA address of 0xFFFF000000000000.
> 
> Though this is true, it does not convey why the flip is required. As you
> had mentioned previously, KASAN_SHADOW_END is fixed because it needs to
> support the highest possible VA (~0UL) in both 48 and 52 bits.
> 
> Hence KASAN_SHADOW_START will have to be variable in order to accommodate
> both 48-bit and 52-bit effective virtual addresses, so I am not sure what
> the above example based on KASAN_SHADOW_END is trying to convey.
> 
> KASAN_SHADOW_END cannot be in the (52-bit space - 48-bit space) exclusion VA
> range as it would not work for the 48-bit case, but KASAN_SHADOW_START can
> very well be in that space for the 52-bit case.
> 
> The current definition is:
> 
> #define KASAN_SHADOW_START      (VA_START)
> #define KASAN_SHADOW_END        (KASAN_SHADOW_START + KASAN_SHADOW_SIZE)
> 
> This won't work in the new scheme because VA_START is different for 48 bits
> and 52 bits, which would make KASAN_SHADOW_END variable as well. Hence we
> need to change this arrangement.

Yes, in a future patch? VA_BITS is still constant here.
> 
> What the commit message does not establish is that there is no alternate
> arrangement where KASAN_SHADOW_END remains fixed (and is also accessible in
> the 48-bit scheme) and KASAN_SHADOW_START is variable, accommodating both
> the 48-bit and 52-bit cases, i.e. that flipping the kernel VA space is the
> only viable option.
> 
> Ahh I see this in the cover letter.
> 
> " In order to allow for a KASAN shadow that changes size at boot time, one
> must fix the KASAN_SHADOW_END for both 48 & 52-bit VAs and "grow" the
> start address. Also, it is highly desirable to maintain the same
> function addresses in the kernel .text between VA sizes. Both of these
> requirements necessitate us to flip the kernel address space halves s.t.
> the direct linear map occupies the lower addresses."
> 
> This is better, though (please add it to the commit message).
> 

Sure :-)

> Were there no alternate arrangements to achieve the above two objectives
> without flipping the VA space? The reasoning in the commit message is not
> convincing enough, even with the above cover letter extract.
> 
> > 
> > We need to adjust:
> >  *) KASAN shadow region placement logic,
> >  *) KASAN_SHADOW_OFFSET computation logic,
> >  *) virt_to_phys, phys_to_virt checks,
> >  *) page table dumper.
> > 
> > These are all small changes, that need to take place atomically, so they
> > are bundled into this commit.
> 
> It would be great to add before-patch and after-patch views of the kernel
> virtual address space, listing the different sections, to make it apparent
> what has changed in the layout and how.

Hmm, pondering this. I can introduce a documentation file in a new
first patch and then modify it as the series changes?

Catalin, would that work for you?

Cheers,
Catalin Marinas June 26, 2019, 10:56 a.m. UTC | #5
On Mon, Jun 17, 2019 at 05:09:11PM +0100, Steve Capper wrote:
> On Fri, Jun 14, 2019 at 05:47:55PM +0530, Anshuman Khandual wrote:
> > On 06/12/2019 10:56 PM, Steve Capper wrote:
> > > We need to adjust:
> > >  *) KASAN shadow region placement logic,
> > >  *) KASAN_SHADOW_OFFSET computation logic,
> > >  *) virt_to_phys, phys_to_virt checks,
> > >  *) page table dumper.
> > > 
> > > These are all small changes, that need to take place atomically, so they
> > > are bundled into this commit.
> > 
> > It would be great to add before-patch and after-patch views of the kernel
> > virtual address space, listing the different sections, to make it apparent
> > what has changed in the layout and how.
> 
> Hmm, pondering this. I can introduce a documentation file in a new
> first patch and then modify it as the series changes?
> 
> Catalin, would that work for you?

If you want to. Otherwise just a patch documenting the final layout
would work for me (possibly mentioning the commit which removed the
original document as a reference).

Patch

diff --git a/arch/arm64/Makefile b/arch/arm64/Makefile
index b025304bde46..2dad2ae6b181 100644
--- a/arch/arm64/Makefile
+++ b/arch/arm64/Makefile
@@ -115,7 +115,7 @@  KBUILD_AFLAGS += -DKASAN_SHADOW_SCALE_SHIFT=$(KASAN_SHADOW_SCALE_SHIFT)
 #				 - (1 << (64 - KASAN_SHADOW_SCALE_SHIFT))
 # in 32-bit arithmetic
 KASAN_SHADOW_OFFSET := $(shell printf "0x%08x00000000\n" $$(( \
-	(0xffffffff & (-1 << ($(CONFIG_ARM64_VA_BITS) - 32))) \
+	(0xffffffff & (-1 << ($(CONFIG_ARM64_VA_BITS) - 1 - 32))) \
 	+ (1 << ($(CONFIG_ARM64_VA_BITS) - 32 - $(KASAN_SHADOW_SCALE_SHIFT))) \
 	- (1 << (64 - 32 - $(KASAN_SHADOW_SCALE_SHIFT))) )) )
 
diff --git a/arch/arm64/include/asm/memory.h b/arch/arm64/include/asm/memory.h
index 8ffcf5a512bb..5cd2eb8cb424 100644
--- a/arch/arm64/include/asm/memory.h
+++ b/arch/arm64/include/asm/memory.h
@@ -49,9 +49,9 @@ 
  */
 #define VA_BITS			(CONFIG_ARM64_VA_BITS)
 #define VA_START		(UL(0xffffffffffffffff) - \
-	(UL(1) << VA_BITS) + 1)
-#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
 	(UL(1) << (VA_BITS - 1)) + 1)
+#define PAGE_OFFSET		(UL(0xffffffffffffffff) - \
+	(UL(1) << VA_BITS) + 1)
 #define KIMAGE_VADDR		(MODULES_END)
 #define BPF_JIT_REGION_START	(VA_START + KASAN_SHADOW_SIZE)
 #define BPF_JIT_REGION_SIZE	(SZ_128M)
@@ -59,7 +59,7 @@ 
 #define MODULES_END		(MODULES_VADDR + MODULES_VSIZE)
 #define MODULES_VADDR		(BPF_JIT_REGION_END)
 #define MODULES_VSIZE		(SZ_128M)
-#define VMEMMAP_START		(PAGE_OFFSET - VMEMMAP_SIZE)
+#define VMEMMAP_START		(-VMEMMAP_SIZE)
 #define PCI_IO_END		(VMEMMAP_START - SZ_2M)
 #define PCI_IO_START		(PCI_IO_END - PCI_IO_SIZE)
 #define FIXADDR_TOP		(PCI_IO_START - SZ_2M)
@@ -238,7 +238,7 @@  extern u64			vabits_user;
  * space. Testing the top bit for the start of the region is a
  * sufficient check.
  */
-#define __is_lm_address(addr)	(!!((addr) & BIT(VA_BITS - 1)))
+#define __is_lm_address(addr)	(!((addr) & BIT(VA_BITS - 1)))
 
 #define __lm_to_phys(addr)	(((addr) & ~PAGE_OFFSET) + PHYS_OFFSET)
 #define __kimg_to_phys(addr)	((addr) - kimage_voffset)
diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index 2c41b04708fe..d0ab784304e9 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -32,7 +32,7 @@ 
  *	and fixed mappings
  */
 #define VMALLOC_START		(MODULES_END)
-#define VMALLOC_END		(PAGE_OFFSET - PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
+#define VMALLOC_END		(- PUD_SIZE - VMEMMAP_SIZE - SZ_64K)
 
 #define vmemmap			((struct page *)VMEMMAP_START - (memstart_addr >> PAGE_SHIFT))
 
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 9859e1178e6b..6ffcc32f35dd 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -497,7 +497,7 @@  int swsusp_arch_resume(void)
 		rc = -ENOMEM;
 		goto out;
 	}
-	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, 0);
+	rc = copy_page_tables(tmp_pg_dir, PAGE_OFFSET, VA_START);
 	if (rc)
 		goto out;
 
diff --git a/arch/arm64/mm/dump.c b/arch/arm64/mm/dump.c
index 14fe23cd5932..ee4e5bea8944 100644
--- a/arch/arm64/mm/dump.c
+++ b/arch/arm64/mm/dump.c
@@ -30,6 +30,8 @@ 
 #include <asm/ptdump.h>
 
 static const struct addr_marker address_markers[] = {
+	{ PAGE_OFFSET,			"Linear Mapping start" },
+	{ VA_START,			"Linear Mapping end" },
 #ifdef CONFIG_KASAN
 	{ KASAN_SHADOW_START,		"Kasan shadow start" },
 	{ KASAN_SHADOW_END,		"Kasan shadow end" },
@@ -43,10 +45,8 @@  static const struct addr_marker address_markers[] = {
 	{ PCI_IO_START,			"PCI I/O start" },
 	{ PCI_IO_END,			"PCI I/O end" },
 #ifdef CONFIG_SPARSEMEM_VMEMMAP
-	{ VMEMMAP_START,		"vmemmap start" },
-	{ VMEMMAP_START + VMEMMAP_SIZE,	"vmemmap end" },
+	{ VMEMMAP_START,		"vmemmap" },
 #endif
-	{ PAGE_OFFSET,			"Linear mapping" },
 	{ -1,				NULL },
 };
 
@@ -380,7 +380,7 @@  static void ptdump_initialize(void)
 static struct ptdump_info kernel_ptdump_info = {
 	.mm		= &init_mm,
 	.markers	= address_markers,
-	.base_addr	= VA_START,
+	.base_addr	= PAGE_OFFSET,
 };
 
 void ptdump_check_wx(void)
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c
index d2adffb81b5d..574ed1d4be19 100644
--- a/arch/arm64/mm/init.c
+++ b/arch/arm64/mm/init.c
@@ -311,7 +311,7 @@  static void __init fdt_enforce_memory_region(void)
 
 void __init arm64_memblock_init(void)
 {
-	const s64 linear_region_size = -(s64)PAGE_OFFSET;
+	const s64 linear_region_size = BIT(VA_BITS - 1);
 
 	/* Handle linux,usable-memory-range property */
 	fdt_enforce_memory_region();
@@ -319,13 +319,6 @@  void __init arm64_memblock_init(void)
 	/* Remove memory above our supported physical address size */
 	memblock_remove(1ULL << PHYS_MASK_SHIFT, ULLONG_MAX);
 
-	/*
-	 * Ensure that the linear region takes up exactly half of the kernel
-	 * virtual address space. This way, we can distinguish a linear address
-	 * from a kernel/module/vmalloc address by testing a single bit.
-	 */
-	BUILD_BUG_ON(linear_region_size != BIT(VA_BITS - 1));
-
 	/*
 	 * Select a suitable value for the base of physical memory.
 	 */
diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 296de39ddee5..8066621052db 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -229,10 +229,10 @@  void __init kasan_init(void)
 	kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
 			   early_pfn_to_nid(virt_to_pfn(lm_alias(_text))));
 
-	kasan_populate_early_shadow((void *)KASAN_SHADOW_START,
-				    (void *)mod_shadow_start);
+	kasan_populate_early_shadow(kasan_mem_to_shadow((void *) VA_START),
+				   (void *)mod_shadow_start);
 	kasan_populate_early_shadow((void *)kimg_shadow_end,
-				    kasan_mem_to_shadow((void *)PAGE_OFFSET));
+				   (void *)KASAN_SHADOW_END);
 
 	if (kimg_shadow_start > mod_shadow_end)
 		kasan_populate_early_shadow((void *)mod_shadow_end,
diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index a1bfc4413982..16063ff10c6d 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -409,7 +409,7 @@  static phys_addr_t pgd_pgtable_alloc(int shift)
 static void __init create_mapping_noalloc(phys_addr_t phys, unsigned long virt,
 				  phys_addr_t size, pgprot_t prot)
 {
-	if (virt < VMALLOC_START) {
+	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
 		pr_warn("BUG: not creating mapping for %pa at 0x%016lx - outside kernel range\n",
 			&phys, virt);
 		return;
@@ -436,7 +436,7 @@  void __init create_pgd_mapping(struct mm_struct *mm, phys_addr_t phys,
 static void update_mapping_prot(phys_addr_t phys, unsigned long virt,
 				phys_addr_t size, pgprot_t prot)
 {
-	if (virt < VMALLOC_START) {
+	if ((virt >= VA_START) && (virt < VMALLOC_START)) {
 		pr_warn("BUG: not updating mapping for %pa at 0x%016lx - outside kernel range\n",
 			&phys, virt);
 		return;