[v9,07/18] arm64: trans_pgd: hibernate: idmap the single page that holds the copy page routines

Message ID 20200326032420.27220-8-pasha.tatashin@soleen.com
State New, archived
Series arm64: MMU enabled kexec relocation

Commit Message

Pasha Tatashin March 26, 2020, 3:24 a.m. UTC
From: James Morse <james.morse@arm.com>

To resume from hibernate, the contents of memory are restored from
the swap image. This may overwrite any page, including the running
kernel and its page tables.

Hibernate copies the code it uses to do the restore into a single
page that it knows won't be overwritten, and maps it with page tables
built from pages that won't be overwritten.

Today the address it uses for this mapping is arbitrary, but to allow
kexec to reuse this code, it needs to be idmapped. To idmap the page
we must avoid the kernel helpers that have VA_BITS baked in.

Convert create_single_mapping() to take a single PA, and idmap it.
The page tables are built in the reverse order to normal using
pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
to cover 48 bits, or 52 if the copy code has bits 52:48 in its PA.
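
Concretely, the arithmetic works out as follows (a sketch, relying on
TCR_T0SZ(va) expanding to 64 - va on arm64, per pgtable-hwdef.h):

	max_msb = (dst_addr & GENMASK(52, 48)) ? 51 : 47;
	t0sz    = TCR_T0SZ(max_msb + 1);	/* 64 - 48 = 16, or 64 - 52 = 12 */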

Pasha: The original patch from James
inux-arm-kernel/20200115143322.214247-4-james.morse@arm.com
Adapted it to trans_pgd, so it can be commonly used by both Kexec
and Hibernate. Some minor clean-ups.

Signed-off-by: James Morse <james.morse@arm.com>
Signed-off-by: Pavel Tatashin <pasha.tatashin@soleen.com>
---
 arch/arm64/include/asm/trans_pgd.h |  3 ++
 arch/arm64/kernel/hibernate.c      | 32 +++++++------------
 arch/arm64/mm/trans_pgd.c          | 49 ++++++++++++++++++++++++++++++
 3 files changed, 63 insertions(+), 21 deletions(-)

Comments

James Morse April 29, 2020, 5:01 p.m. UTC | #1
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> From: James Morse <james.morse@arm.com>
> 
> To resume from hibernate, the contents of memory are restored from
> the swap image. This may overwrite any page, including the running
> kernel and its page tables.
> 
> Hibernate copies the code it uses to do the restore into a single
> page that it knows won't be overwritten, and maps it with page tables
> built from pages that won't be overwritten.
> 
> Today the address it uses for this mapping is arbitrary, but to allow
> kexec to reuse this code, it needs to be idmapped. To idmap the page
> we must avoid the kernel helpers that have VA_BITS baked in.
> 
> Convert create_single_mapping() to take a single PA, and idmap it.
> The page tables are built in the reverse order to normal using
> pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
> to cover 48 bits, or 52 if the copy code has bits 52:48 in its PA.
> 
> Pasha: The original patch from James
> inux-arm-kernel/20200115143322.214247-4-james.morse@arm.com

-EBROKENLINK

The convention is to use a 'Link:' tag in the signed-off area.
e.g. 5a3577039cbe
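
i.e. something like this above the sign-offs (illustrative, built from the
message-id quoted above):

  Link: https://lore.kernel.org/linux-arm-kernel/20200115143322.214247-4-james.morse@arm.com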

> Adapted it to trans_pgd, so it can be commonly used by both Kexec
> and Hibernate. Some minor clean-ups.

Please describe your changes just before your SoB. This means each author signs off on
the stuff above their SoB, and it's obvious who made which changes.

Search for 'Lucky K Maintainer' in process/submitting-patches.rst for an example.


> diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
> index 97a7ea73b289..4912d3caf0ca 100644
> --- a/arch/arm64/include/asm/trans_pgd.h
> +++ b/arch/arm64/include/asm/trans_pgd.h
> @@ -32,4 +32,7 @@ int trans_pgd_create_copy(struct trans_pgd_info *info, pgd_t **trans_pgd,
>  int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>  		       void *page, unsigned long dst_addr, pgprot_t pgprot);

This trans_pgd_map_page() used to be create_single_mapping(), which is where the original
patch made its changes.

You should only need one of these, not both.


> +int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t *trans_ttbr0,
> +			 unsigned long *t0sz, void *page);
> +
>  #endif /* _ASM_TRANS_TABLE_H */

> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index 37d7d1c60f65..c2517d1af2af 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -242,3 +242,52 @@ int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>  
>  	return 0;
>  }
> +
> +/*
> + * The page we want to idmap may be outside the range covered by VA_BITS that
> + * can be built using the kernel's p?d_populate() helpers. As a one off, for a
> + * single page, we build these page tables bottom up and just assume that will
> + * need the maximum T0SZ.
> + *
> + * Returns 0 on success, and -ENOMEM on failure.
> + * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to

> + * maxumum T0SZ for this page.

maxumum

> + */


Thanks,

James
Pasha Tatashin Jan. 23, 2021, 12:35 a.m. UTC | #2
On Wed, Apr 29, 2020 at 1:01 PM James Morse <james.morse@arm.com> wrote:
>
> Hi Pavel,
>
> On 26/03/2020 03:24, Pavel Tatashin wrote:
> > From: James Morse <james.morse@arm.com>
> >
> > To resume from hibernate, the contents of memory are restored from
> > the swap image. This may overwrite any page, including the running
> > kernel and its page tables.
> >
> > Hibernate copies the code it uses to do the restore into a single
> > page that it knows won't be overwritten, and maps it with page tables
> > built from pages that won't be overwritten.
> >
> > Today the address it uses for this mapping is arbitrary, but to allow
> > kexec to reuse this code, it needs to be idmapped. To idmap the page
> > we must avoid the kernel helpers that have VA_BITS baked in.
> >
> > Convert create_single_mapping() to take a single PA, and idmap it.
> > The page tables are built in the reverse order to normal using
> > pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
> > to cover 48 bits, or 52 if the copy code has bits 52:48 in its PA.
> >
> > Pasha: The original patch from James
> > inux-arm-kernel/20200115143322.214247-4-james.morse@arm.com
>
> -EBROKENLINK
>
> The convention is to use a 'Link:' tag in the signed-off area.
> e.g. 5a3577039cbe

OK Fixed.

>
> > Adapted it to trans_pgd, so it can be commonly used by both Kexec
> > and Hibernate. Some minor clean-ups.
>
> Please describe your changes just before your SoB. This means each author signs off on
> the stuff above their SoB, and it's obvious who made which changes.
>
> Search for 'Lucky K Maintainer' in process/submitting-patches.rst for an example.

OK, Fixed.

> > + * single page, we build these page tables bottom up and just assume that will
> > + * need the maximum T0SZ.
> > + *
> > + * Returns 0 on success, and -ENOMEM on failure.
> > + * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to
>
> > + * maxumum T0SZ for this page.
>
> maxumum
>

Ok.

Patch

diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
index 97a7ea73b289..4912d3caf0ca 100644
--- a/arch/arm64/include/asm/trans_pgd.h
+++ b/arch/arm64/include/asm/trans_pgd.h
@@ -32,4 +32,7 @@  int trans_pgd_create_copy(struct trans_pgd_info *info, pgd_t **trans_pgd,
 int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
 		       void *page, unsigned long dst_addr, pgprot_t pgprot);
 
+int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t *trans_ttbr0,
+			 unsigned long *t0sz, void *page);
+
 #endif /* _ASM_TRANS_TABLE_H */
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 95e00536aa67..784aa01bb4bd 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -197,7 +197,6 @@  static void *hibernate_page_alloc(void *arg)
  * page system.
  */
 static int create_safe_exec_page(void *src_start, size_t length,
-				 unsigned long dst_addr,
 				 phys_addr_t *phys_dst_addr)
 {
 	struct trans_pgd_info trans_info = {
@@ -206,7 +205,8 @@  static int create_safe_exec_page(void *src_start, size_t length,
 	};
 
 	void *page = (void *)get_safe_page(GFP_ATOMIC);
-	pgd_t *trans_pgd;
+	phys_addr_t trans_ttbr0;
+	unsigned long t0sz;
 	int rc;
 
 	if (!page)
@@ -214,13 +214,7 @@  static int create_safe_exec_page(void *src_start, size_t length,
 
 	memcpy(page, src_start, length);
 	__flush_icache_range((unsigned long)page, (unsigned long)page + length);
-
-	trans_pgd = (void *)get_safe_page(GFP_ATOMIC);
-	if (!trans_pgd)
-		return -ENOMEM;
-
-	rc = trans_pgd_map_page(&trans_info, trans_pgd, page, dst_addr,
-				PAGE_KERNEL_EXEC);
+	rc = trans_pgd_idmap_page(&trans_info, &trans_ttbr0, &t0sz, page);
 	if (rc)
 		return rc;
 
@@ -233,12 +227,15 @@  static int create_safe_exec_page(void *src_start, size_t length,
 	 * page, but TLBs may contain stale ASID-tagged entries (e.g. for EFI
 	 * runtime services), while for a userspace-driven test_resume cycle it
 	 * points to userspace page tables (and we must point it at a zero page
-	 * ourselves). Elsewhere we only (un)install the idmap with preemption
-	 * disabled, so T0SZ should be as required regardless.
+	 * ourselves).
+	 *
+	 * We change T0SZ as part of installing the idmap. This is undone by
+	 * cpu_uninstall_idmap() in __cpu_suspend_exit().
 	 */
 	cpu_set_reserved_ttbr0();
 	local_flush_tlb_all();
-	write_sysreg(phys_to_ttbr(virt_to_phys(trans_pgd)), ttbr0_el1);
+	__cpu_set_tcr_t0sz(t0sz);
+	write_sysreg(trans_ttbr0, ttbr0_el1);
 	isb();
 
 	*phys_dst_addr = virt_to_phys(page);
@@ -319,7 +316,6 @@  int swsusp_arch_resume(void)
 	void *zero_page;
 	size_t exit_size;
 	pgd_t *tmp_pg_dir;
-	phys_addr_t phys_hibernate_exit;
 	void __noreturn (*hibernate_exit)(phys_addr_t, phys_addr_t, void *,
 					  void *, phys_addr_t, phys_addr_t);
 	struct trans_pgd_info trans_info = {
@@ -347,19 +343,13 @@  int swsusp_arch_resume(void)
 		return -ENOMEM;
 	}
 
-	/*
-	 * Locate the exit code in the bottom-but-one page, so that *NULL
-	 * still has disastrous affects.
-	 */
-	hibernate_exit = (void *)PAGE_SIZE;
 	exit_size = __hibernate_exit_text_end - __hibernate_exit_text_start;
 	/*
 	 * Copy swsusp_arch_suspend_exit() to a safe page. This will generate
 	 * a new set of ttbr0 page tables and load them.
 	 */
 	rc = create_safe_exec_page(__hibernate_exit_text_start, exit_size,
-				   (unsigned long)hibernate_exit,
-				   &phys_hibernate_exit);
+				   (phys_addr_t *)&hibernate_exit);
 	if (rc) {
 		pr_err("Failed to create safe executable page for hibernate_exit code.\n");
 		return rc;
@@ -378,7 +368,7 @@  int swsusp_arch_resume(void)
 	 * We can skip this step if we booted at EL1, or are running with VHE.
 	 */
 	if (el2_reset_needed()) {
-		phys_addr_t el2_vectors = phys_hibernate_exit;  /* base */
+		phys_addr_t el2_vectors = (phys_addr_t)hibernate_exit;
 		el2_vectors += hibernate_el2_vectors -
 			       __hibernate_exit_text_start;     /* offset */
 
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 37d7d1c60f65..c2517d1af2af 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -242,3 +242,52 @@  int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
 
 	return 0;
 }
+
+/*
+ * The page we want to idmap may be outside the range covered by VA_BITS that
+ * can be built using the kernel's p?d_populate() helpers. As a one off, for a
+ * single page, we build these page tables bottom up and just assume that will
+ * need the maximum T0SZ.
+ *
+ * Returns 0 on success, and -ENOMEM on failure.
+ * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to
+ * maxumum T0SZ for this page.
+ */
+int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t *trans_ttbr0,
+			 unsigned long *t0sz, void *page)
+{
+	phys_addr_t dst_addr = virt_to_phys(page);
+	unsigned long pfn = __phys_to_pfn(dst_addr);
+	int max_msb = (dst_addr & GENMASK(52, 48)) ? 51 : 47;
+	int bits_mapped = PAGE_SHIFT - 4;
+	unsigned long level_mask, prev_level_entry, *levels[4];
+	int this_level, index, level_lsb, level_msb;
+
+	dst_addr &= PAGE_MASK;
+	prev_level_entry = pte_val(pfn_pte(pfn, PAGE_KERNEL_EXEC));
+
+	for (this_level = 3; this_level >= 0; this_level--) {
+		levels[this_level] = trans_alloc(info);
+		if (!levels[this_level])
+			return -ENOMEM;
+
+		level_lsb = ARM64_HW_PGTABLE_LEVEL_SHIFT(this_level);
+		level_msb = min(level_lsb + bits_mapped, max_msb);
+		level_mask = GENMASK_ULL(level_msb, level_lsb);
+
+		index = (dst_addr & level_mask) >> level_lsb;
+		*(levels[this_level] + index) = prev_level_entry;
+
+		pfn = virt_to_pfn(levels[this_level]);
+		prev_level_entry = pte_val(pfn_pte(pfn,
+						   __pgprot(PMD_TYPE_TABLE)));
+
+		if (level_msb == max_msb)
+			break;
+	}
+
+	*trans_ttbr0 = phys_to_ttbr(__pfn_to_phys(pfn));
+	*t0sz = TCR_T0SZ(max_msb + 1);
+
+	return 0;
+}
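
For reference, a worked example of the level walk above with 4K pages
(PAGE_SHIFT = 12); the numbers below are derived by hand from the code, so
treat this as a sketch rather than part of the patch:

/*
 * bits_mapped = PAGE_SHIFT - 4 = 8, so each level's mask
 * GENMASK(level_msb, level_lsb) spans 9 bits: one 512-entry table.
 * ARM64_HW_PGTABLE_LEVEL_SHIFT() yields 12, 21, 30, 39 for levels 3..0:
 *
 *   this_level = 3: level_lsb = 12, level_msb = min(20, 47) = 20
 *   this_level = 2: level_lsb = 21, level_msb = min(29, 47) = 29
 *   this_level = 1: level_lsb = 30, level_msb = min(38, 47) = 38
 *   this_level = 0: level_lsb = 39, level_msb = min(47, 47) = 47  -> break
 *
 * Four table pages are chained bottom-up, pfn is left pointing at the
 * level-0 table, and *t0sz = TCR_T0SZ(48) = 16.
 */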