| Message ID | 20180430175925.2657-3-toshi.kani@hpe.com (mailing list archive) |
|---|---|
| State | New, archived |
On Mon, Apr 30, 2018 at 11:59:24AM -0600, Toshi Kani wrote:
>  int pud_free_pmd_page(pud_t *pud, unsigned long addr)
>  {
> -	pmd_t *pmd;
> +	pmd_t *pmd, *pmd_sv;
> +	pte_t *pte;
>  	int i;
>
>  	if (pud_none(*pud))
>  		return 1;
>
>  	pmd = (pmd_t *)pud_page_vaddr(*pud);
> +	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);

So you need to allocate a page to free a page? It is better to put the
pages into a list with a list_head on the stack.

I am still in favour of just reverting the broken commit and doing a
correct and working fix for the/a merge window.

	Joerg
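For readers following the thread, a minimal sketch of the list_head-on-the-stack
idea, assuming the to-be-freed pte pages are chained through their struct page
lru members and released after the flush; the function below is illustrative
only, not a submitted patch:

/*
 * Sketch only: remember the pte pages on a stack-allocated list instead of
 * allocating a scratch page, then free them once the pud entry has been
 * cleared and the paging-structure caches flushed.
 */
int pud_free_pmd_page(pud_t *pud, unsigned long addr)
{
	LIST_HEAD(free_list);		/* list head lives on the stack */
	struct page *page, *next;
	pmd_t *pmd;
	int i;

	if (pud_none(*pud))
		return 1;

	pmd = (pmd_t *)pud_page_vaddr(*pud);

	for (i = 0; i < PTRS_PER_PMD; i++) {
		if (pmd_none(pmd[i]))
			continue;
		/* remember the pte page, then clear the pmd entry */
		list_add(&pmd_page(pmd[i])->lru, &free_list);
		pmd_clear(&pmd[i]);
	}

	pud_clear(pud);

	/* flush paging-structure caches before freeing the page tables */
	flush_tlb_kernel_range(addr, addr + PAGE_SIZE - 1);

	list_for_each_entry_safe(page, next, &free_list, lru) {
		list_del(&page->lru);
		__free_page(page);
	}

	free_page((unsigned long)pmd);

	return 1;
}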
On Tue, 2018-05-15 at 16:05 +0200, Joerg Roedel wrote:
> On Mon, Apr 30, 2018 at 11:59:24AM -0600, Toshi Kani wrote:
> >  int pud_free_pmd_page(pud_t *pud, unsigned long addr)
> >  {
> > -	pmd_t *pmd;
> > +	pmd_t *pmd, *pmd_sv;
> > +	pte_t *pte;
> >  	int i;
> >
> >  	if (pud_none(*pud))
> >  		return 1;
> >
> >  	pmd = (pmd_t *)pud_page_vaddr(*pud);
> > +	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
>
> So you need to allocate a page to free a page? It is better to put the
> pages into a list with a list_head on the stack.

The code should have checked whether pmd_sv is NULL... I will update the
patch.

As for performance, I do not think this page allocation is a problem.
Unlike pmd_free_pte_page(), pud_free_pmd_page() covers an extremely rare
case. Since a pud mapping requires 1GB alignment, pud and pmd/pte mappings
do not share the same ranges within the vmalloc space. I had to instrument
the kernel to force them to share the same ranges in order to test this
patch.

> I am still in favour of just reverting the broken commit and doing a
> correct and working fix for the/a merge window.

I will reorder the patch series and change patch 3/3 to 1/3, so that we
can take it first to fix the BUG_ON on PAE. This revert will disable 2MB
ioremap on PAE in some cases, but I do not think that is important on PAE
anyway.

I do not think a revert on x86-64 is necessary, and I am more worried
about disabling 2MB ioremap in some cases, which can be seen as a
regression. Patch 2/3 fixes a possible page-directory cache issue that I
could not hit even though I put ioremap()/iounmap() with various sizes
into a tight loop for a day.

Thanks,
-Toshi
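For context on why the pud case is so rare, here is a simplified sketch of the
alignment condition under which ioremap() even attempts a 1GB pud mapping; it
is not the exact lib/ioremap.c code, and the helper name is illustrative:

/*
 * Simplified sketch, not the exact lib/ioremap.c logic: a pud mapping is
 * only attempted when the virtual address, the physical address and the
 * remaining size all satisfy 1GB (PUD_SIZE) alignment, so a range that
 * was previously pmd/pte-mapped rarely comes back as a pud mapping.
 */
static bool ioremap_can_use_pud(unsigned long addr, unsigned long end,
				phys_addr_t phys_addr)
{
	if ((end - addr) < PUD_SIZE)
		return false;
	if (!IS_ALIGNED(addr, PUD_SIZE) || !IS_ALIGNED(phys_addr, PUD_SIZE))
		return false;
	return true;
}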
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index 37e3cbac59b9..816fd41ee854 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -720,24 +720,40 @@ int pmd_clear_huge(pmd_t *pmd)
  * @pud: Pointer to a PUD.
  * @addr: Virtual address associated with pud.
  *
- * Context: The pud range has been unmaped and TLB purged.
+ * Context: The pud range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
-	pmd_t *pmd;
+	pmd_t *pmd, *pmd_sv;
+	pte_t *pte;
 	int i;
 
 	if (pud_none(*pud))
 		return 1;
 
 	pmd = (pmd_t *)pud_page_vaddr(*pud);
+	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
 
-	for (i = 0; i < PTRS_PER_PMD; i++)
-		if (!pmd_free_pte_page(&pmd[i], addr + (i * PMD_SIZE)))
-			return 0;
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		pmd_sv[i] = pmd[i];
+		if (!pmd_none(pmd[i]))
+			pmd_clear(&pmd[i]);
+	}
 
 	pud_clear(pud);
+
+	/* INVLPG to clear all paging-structure caches */
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
+	for (i = 0; i < PTRS_PER_PMD; i++) {
+		if (!pmd_none(pmd_sv[i])) {
+			pte = (pte_t *)pmd_page_vaddr(pmd_sv[i]);
+			free_page((unsigned long)pte);
+		}
+	}
+
+	free_page((unsigned long)pmd_sv);
 	free_page((unsigned long)pmd);
 
 	return 1;
@@ -748,7 +764,7 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
  * @pmd: Pointer to a PMD.
  * @addr: Virtual address associated with pmd.
  *
- * Context: The pmd range has been unmaped and TLB purged.
+ * Context: The pmd range has been unmapped and TLB purged.
  * Return: 1 if clearing the entry succeeded. 0 otherwise.
  */
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
@@ -760,6 +776,10 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 
 	pte = (pte_t *)pmd_page_vaddr(*pmd);
 	pmd_clear(pmd);
+
+	/* INVLPG to clear all paging-structure caches */
+	flush_tlb_kernel_range(addr, addr + PAGE_SIZE-1);
+
 	free_page((unsigned long)pte);
 
 	return 1;
ioremap() calls pud_free_pmd_page() / pmd_free_pte_page() when it creates
a pud / pmd map. The following preconditions are met at their entry.
 - All pte entries for a target pud/pmd address range have been cleared.
 - System-wide TLB purges have been performed for a target pud/pmd address
   range.

The preconditions assure that there is no stale TLB entry for the range.
Speculation may not cache TLB entries since it requires all levels of page
entries, including ptes, to have P & A-bits set for an associated address.
However, speculation may cache pud/pmd entries (paging-structure caches)
when they have P-bit set.

Add a system-wide TLB purge (INVLPG) to a single page after clearing the
pud/pmd entry's P-bit.

SDM 4.10.4.1, Operations that Invalidate TLBs and Paging-Structure Caches,
states that:

  INVLPG invalidates all paging-structure caches associated with the
  current PCID regardless of the linear addresses to which they correspond.

Fixes: 28ee90fe6048 ("x86/mm: implement free pmd/pte page interfaces")
Signed-off-by: Toshi Kani <toshi.kani@hpe.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Joerg Roedel <joro@8bytes.org>
Cc: <stable@vger.kernel.org>
---
 arch/x86/mm/pgtable.c | 32 ++++++++++++++++++++++++++------
 1 file changed, 26 insertions(+), 6 deletions(-)
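For reference, the reason a single-page flush_tlb_kernel_range() is enough
here: for a range this small, the x86 kernel-range flush typically ends up as
an INVLPG per page on each CPU, roughly along the lines of the simplified
sketch below (not the kernel's actual implementation; the helper name is
illustrative), and per SDM 4.10.4.1 one INVLPG also drops the
paging-structure-cache entries for the current PCID:

/*
 * Simplified sketch, not the kernel's actual implementation: flush one
 * kernel page with INVLPG. Per SDM 4.10.4.1, a single INVLPG also
 * invalidates all paging-structure-cache entries for the current PCID,
 * regardless of the linear address it names.
 */
static inline void flush_one_kernel_page(unsigned long addr)
{
	asm volatile("invlpg (%0)" : : "r" (addr) : "memory");
}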