Message ID | 20200326032420.27220-8-pasha.tatashin@soleen.com (mailing list archive)
---|---
State | New, archived
Series | arm64: MMU enabled kexec relocation
Hi Pavel,

On 26/03/2020 03:24, Pavel Tatashin wrote:
> From: James Morse <james.morse@arm.com>
>
> To resume from hibernate, the contents of memory are restored from
> the swap image. This may overwrite any page, including the running
> kernel and its page tables.
>
> Hibernate copies the code it uses to do the restore into a single
> page that it knows won't be overwritten, and maps it with page tables
> built from pages that won't be overwritten.
>
> Today the address it uses for this mapping is arbitrary, but to allow
> kexec to reuse this code, it needs to be idmapped. To idmap the page
> we must avoid the kernel helpers that have VA_BITS baked in.
>
> Convert create_single_mapping() to take a single PA, and idmap it.
> The page tables are built in the reverse order to normal using
> pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
> to cover 48 bits, or 52 if the copy code has bits 52:48 in its PA.
>
> Pasha: The original patch from James
> inux-arm-kernel/20200115143322.214247-4-james.morse@arm.com

-EBROKENLINK

The convention is to use a 'Link:' tag in the signed-off area.
e.g. 5a3577039cbe

> Adapted it to trans_pgd, so it can be commonly used by both Kexec
> and Hibernate. Some minor clean-ups.

Please describe your changes just before your SoB. This means each author
signs off on the stuff above their SoB, and it's obvious who made which
changes.

Search for 'Lucky K Maintainer' in process/submitting-patches.rst for an
example.

> diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
> index 97a7ea73b289..4912d3caf0ca 100644
> --- a/arch/arm64/include/asm/trans_pgd.h
> +++ b/arch/arm64/include/asm/trans_pgd.h
> @@ -32,4 +32,7 @@ int trans_pgd_create_copy(struct trans_pgd_info *info, pgd_t **trans_pgd,
>  int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>                         void *page, unsigned long dst_addr, pgprot_t pgprot);

This trans_pgd_map_page() used to be create_single_mapping(), which is
where the original patch made its changes. You should only need one of
these, not both.

> +int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t *trans_ttbr0,
> +                         unsigned long *t0sz, void *page);
> +
>  #endif /* _ASM_TRANS_TABLE_H */

> diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
> index 37d7d1c60f65..c2517d1af2af 100644
> --- a/arch/arm64/mm/trans_pgd.c
> +++ b/arch/arm64/mm/trans_pgd.c
> @@ -242,3 +242,52 @@ int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
>
>         return 0;
>  }
> +
> +/*
> + * The page we want to idmap may be outside the range covered by VA_BITS that
> + * can be built using the kernel's p?d_populate() helpers. As a one off, for a
> + * single page, we build these page tables bottom up and just assume that will
> + * need the maximum T0SZ.
> + *
> + * Returns 0 on success, and -ENOMEM on failure.
> + * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to
> + * maxumum T0SZ for this page.

maxumum

> + */

Thanks,

James
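The trailer layout this refers to keeps a bracketed one-line note of each
person's changes directly above their Signed-off-by, with the 'Link:' tag in
the same block, so the signed-off area reads roughly like this (the two names
are the placeholder example from process/submitting-patches.rst, and the
lore.kernel.org URL shape is the usual form of a 'Link:' tag, not a link
taken from this thread):

    Link: https://lore.kernel.org/r/<message-id-of-the-original-posting>
    Signed-off-by: Random J Developer <random@developer.example.org>
    [lucky@maintainer.example.org: struct foo moved from foo.c to foo.h]
    Signed-off-by: Lucky K Maintainer <lucky@maintainer.example.org>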
On Wed, Apr 29, 2020 at 1:01 PM James Morse <james.morse@arm.com> wrote:
>
> Hi Pavel,
>
> On 26/03/2020 03:24, Pavel Tatashin wrote:
> > From: James Morse <james.morse@arm.com>
> >
> > To resume from hibernate, the contents of memory are restored from
> > the swap image. This may overwrite any page, including the running
> > kernel and its page tables.
> >
> > Hibernate copies the code it uses to do the restore into a single
> > page that it knows won't be overwritten, and maps it with page tables
> > built from pages that won't be overwritten.
> >
> > Today the address it uses for this mapping is arbitrary, but to allow
> > kexec to reuse this code, it needs to be idmapped. To idmap the page
> > we must avoid the kernel helpers that have VA_BITS baked in.
> >
> > Convert create_single_mapping() to take a single PA, and idmap it.
> > The page tables are built in the reverse order to normal using
> > pfn_pte() to stir in any bits between 52:48. T0SZ is always increased
> > to cover 48 bits, or 52 if the copy code has bits 52:48 in its PA.
> >
> > Pasha: The original patch from James
> > inux-arm-kernel/20200115143322.214247-4-james.morse@arm.com
>
> -EBROKENLINK
>
> The convention is to use a 'Link:' tag in the signed-off area.
> e.g. 5a3577039cbe

OK Fixed.

> > Adapted it to trans_pgd, so it can be commonly used by both Kexec
> > and Hibernate. Some minor clean-ups.
>
> Please describe your changes just before your SoB. This means each author
> signs off on the stuff above their SoB, and it's obvious who made which
> changes.
>
> Search for 'Lucky K Maintainer' in process/submitting-patches.rst for an
> example.

OK, Fixed.

> > + * need the maximum T0SZ.
> > + *
> > + * Returns 0 on success, and -ENOMEM on failure.
> > + * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to
> > + * maxumum T0SZ for this page.
>
> maxumum

Ok.
diff --git a/arch/arm64/include/asm/trans_pgd.h b/arch/arm64/include/asm/trans_pgd.h
index 97a7ea73b289..4912d3caf0ca 100644
--- a/arch/arm64/include/asm/trans_pgd.h
+++ b/arch/arm64/include/asm/trans_pgd.h
@@ -32,4 +32,7 @@ int trans_pgd_create_copy(struct trans_pgd_info *info, pgd_t **trans_pgd,
 int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
                        void *page, unsigned long dst_addr, pgprot_t pgprot);
 
+int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t *trans_ttbr0,
+                         unsigned long *t0sz, void *page);
+
 #endif /* _ASM_TRANS_TABLE_H */
diff --git a/arch/arm64/kernel/hibernate.c b/arch/arm64/kernel/hibernate.c
index 95e00536aa67..784aa01bb4bd 100644
--- a/arch/arm64/kernel/hibernate.c
+++ b/arch/arm64/kernel/hibernate.c
@@ -197,7 +197,6 @@ static void *hibernate_page_alloc(void *arg)
  * page system.
  */
 static int create_safe_exec_page(void *src_start, size_t length,
-                                 unsigned long dst_addr,
                                  phys_addr_t *phys_dst_addr)
 {
         struct trans_pgd_info trans_info = {
@@ -206,7 +205,8 @@ static int create_safe_exec_page(void *src_start, size_t length,
         };
 
         void *page = (void *)get_safe_page(GFP_ATOMIC);
-        pgd_t *trans_pgd;
+        phys_addr_t trans_ttbr0;
+        unsigned long t0sz;
         int rc;
 
         if (!page)
@@ -214,13 +214,7 @@ static int create_safe_exec_page(void *src_start, size_t length,
 
         memcpy(page, src_start, length);
         __flush_icache_range((unsigned long)page, (unsigned long)page + length);
-
-        trans_pgd = (void *)get_safe_page(GFP_ATOMIC);
-        if (!trans_pgd)
-                return -ENOMEM;
-
-        rc = trans_pgd_map_page(&trans_info, trans_pgd, page, dst_addr,
-                                PAGE_KERNEL_EXEC);
+        rc = trans_pgd_idmap_page(&trans_info, &trans_ttbr0, &t0sz, page);
         if (rc)
                 return rc;
 
@@ -233,12 +227,15 @@ static int create_safe_exec_page(void *src_start, size_t length,
          * page, but TLBs may contain stale ASID-tagged entries (e.g. for EFI
          * runtime services), while for a userspace-driven test_resume cycle it
          * points to userspace page tables (and we must point it at a zero page
-         * ourselves). Elsewhere we only (un)install the idmap with preemption
-         * disabled, so T0SZ should be as required regardless.
+         * ourselves).
+         *
+         * We change T0SZ as part of installing the idmap. This is undone by
+         * cpu_uninstall_idmap() in __cpu_suspend_exit().
          */
         cpu_set_reserved_ttbr0();
         local_flush_tlb_all();
-        write_sysreg(phys_to_ttbr(virt_to_phys(trans_pgd)), ttbr0_el1);
+        __cpu_set_tcr_t0sz(t0sz);
+        write_sysreg(trans_ttbr0, ttbr0_el1);
         isb();
 
         *phys_dst_addr = virt_to_phys(page);
@@ -319,7 +316,6 @@ int swsusp_arch_resume(void)
         void *zero_page;
         size_t exit_size;
         pgd_t *tmp_pg_dir;
-        phys_addr_t phys_hibernate_exit;
         void __noreturn (*hibernate_exit)(phys_addr_t, phys_addr_t, void *,
                                           void *, phys_addr_t, phys_addr_t);
         struct trans_pgd_info trans_info = {
@@ -347,19 +343,13 @@ int swsusp_arch_resume(void)
                 return -ENOMEM;
         }
 
-        /*
-         * Locate the exit code in the bottom-but-one page, so that *NULL
-         * still has disastrous affects.
-         */
-        hibernate_exit = (void *)PAGE_SIZE;
         exit_size = __hibernate_exit_text_end - __hibernate_exit_text_start;
         /*
          * Copy swsusp_arch_suspend_exit() to a safe page. This will generate
          * a new set of ttbr0 page tables and load them.
          */
         rc = create_safe_exec_page(__hibernate_exit_text_start, exit_size,
-                                   (unsigned long)hibernate_exit,
-                                   &phys_hibernate_exit);
+                                   (phys_addr_t *)&hibernate_exit);
         if (rc) {
                 pr_err("Failed to create safe executable page for hibernate_exit code.\n");
                 return rc;
         }
@@ -378,7 +368,7 @@ int swsusp_arch_resume(void)
          * We can skip this step if we booted at EL1, or are running with VHE.
          */
         if (el2_reset_needed()) {
-                phys_addr_t el2_vectors = phys_hibernate_exit;  /* base */
+                phys_addr_t el2_vectors = (phys_addr_t)hibernate_exit;
                 el2_vectors += hibernate_el2_vectors -
                                __hibernate_exit_text_start;     /* offset */
 
diff --git a/arch/arm64/mm/trans_pgd.c b/arch/arm64/mm/trans_pgd.c
index 37d7d1c60f65..c2517d1af2af 100644
--- a/arch/arm64/mm/trans_pgd.c
+++ b/arch/arm64/mm/trans_pgd.c
@@ -242,3 +242,52 @@ int trans_pgd_map_page(struct trans_pgd_info *info, pgd_t *trans_pgd,
 
         return 0;
 }
+
+/*
+ * The page we want to idmap may be outside the range covered by VA_BITS that
+ * can be built using the kernel's p?d_populate() helpers. As a one off, for a
+ * single page, we build these page tables bottom up and just assume that will
+ * need the maximum T0SZ.
+ *
+ * Returns 0 on success, and -ENOMEM on failure.
+ * On success trans_ttbr0 contains page table with idmapped page, t0sz is set to
+ * maxumum T0SZ for this page.
+ */
+int trans_pgd_idmap_page(struct trans_pgd_info *info, phys_addr_t *trans_ttbr0,
+                         unsigned long *t0sz, void *page)
+{
+        phys_addr_t dst_addr = virt_to_phys(page);
+        unsigned long pfn = __phys_to_pfn(dst_addr);
+        int max_msb = (dst_addr & GENMASK(52, 48)) ? 51 : 47;
+        int bits_mapped = PAGE_SHIFT - 4;
+        unsigned long level_mask, prev_level_entry, *levels[4];
+        int this_level, index, level_lsb, level_msb;
+
+        dst_addr &= PAGE_MASK;
+        prev_level_entry = pte_val(pfn_pte(pfn, PAGE_KERNEL_EXEC));
+
+        for (this_level = 3; this_level >= 0; this_level--) {
+                levels[this_level] = trans_alloc(info);
+                if (!levels[this_level])
+                        return -ENOMEM;
+
+                level_lsb = ARM64_HW_PGTABLE_LEVEL_SHIFT(this_level);
+                level_msb = min(level_lsb + bits_mapped, max_msb);
+                level_mask = GENMASK_ULL(level_msb, level_lsb);
+
+                index = (dst_addr & level_mask) >> level_lsb;
+                *(levels[this_level] + index) = prev_level_entry;
+
+                pfn = virt_to_pfn(levels[this_level]);
+                prev_level_entry = pte_val(pfn_pte(pfn,
+                                                   __pgprot(PMD_TYPE_TABLE)));
+
+                if (level_msb == max_msb)
+                        break;
+        }
+
+        *trans_ttbr0 = phys_to_ttbr(__pfn_to_phys(pfn));
+        *t0sz = TCR_T0SZ(max_msb + 1);
+
+        return 0;
+}
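To make the per-level arithmetic in trans_pgd_idmap_page() concrete, here is
a minimal standalone sketch of just the index computation, assuming 4K pages
(PAGE_SHIFT == 12) and a hypothetical page-aligned PA with bit 39 set. The
two helper macros are re-derived inline so it compiles in userspace, and
unlike the real function it only prints the indices instead of allocating
and linking tables:

/* Sketch of trans_pgd_idmap_page()'s per-level arithmetic (not kernel code). */
#include <stdio.h>
#include <stdint.h>

#define PAGE_SHIFT      12              /* assumed: 4K pages */
#define GENMASK_ULL(h, l) \
        ((~0ULL << (l)) & (~0ULL >> (63 - (h))))
/* Matches ARM64_HW_PGTABLE_LEVEL_SHIFT(n) for the assumed PAGE_SHIFT */
#define LEVEL_SHIFT(n)  (((PAGE_SHIFT - 3) * (4 - (n))) + 3)

int main(void)
{
        uint64_t dst_addr = 0x8000fff000ULL;    /* hypothetical PA, bit 39 set */
        int max_msb = (dst_addr & GENMASK_ULL(52, 48)) ? 51 : 47;
        int bits_mapped = PAGE_SHIFT - 4;       /* 9 index bits per level, msb inclusive */
        int this_level;

        for (this_level = 3; this_level >= 0; this_level--) {
                int level_lsb = LEVEL_SHIFT(this_level);
                int level_msb = level_lsb + bits_mapped;

                if (level_msb > max_msb)
                        level_msb = max_msb;

                int index = (dst_addr & GENMASK_ULL(level_msb, level_lsb)) >> level_lsb;
                printf("level %d uses bits %d:%d, index %d\n",
                       this_level, level_msb, level_lsb, index);

                /* The level whose msb reaches max_msb becomes the root table. */
                if (level_msb == max_msb)
                        break;
        }
        return 0;
}

For this address the loop picks index 511 at level 3 (bits 20:12), 7 at
level 2 (29:21), 0 at level 1 (38:30) and 1 at level 0 (47:39). Level 0's
msb reaches max_msb, so in the real function the level-0 table's pfn ends up
in *trans_ttbr0 and *t0sz is TCR_T0SZ(48), i.e. 64 - 48 = 16, covering the
full 48-bit range (52-bit if bits 52:48 of the PA are set).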