mbox series

[GIT,PULL] arm64: fix for -rc3

Message ID 20180907154516.GG12788@arm.com (mailing list archive)
State New, archived
Headers show
Series [GIT,PULL] arm64: fix for -rc3 | expand

Pull-request

git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes

Message

Will Deacon Sept. 7, 2018, 3:45 p.m. UTC
Hi Linus,

Just one small fix here, preventing a VM_WARN_ON when a !present PMD/PUD
is "freed" as part of a huge ioremap() operation. The correct behaviour
is to skip the free silently in this case, which is a little weird (the
function is a bit of a misnomer), but it follows the x86 implementation.

I also renewed the expiry dates on my gpg subkeys this week, so you might
need to re-fetch them if you can find a working keyserver.

Cheers,

Will

--->8

The following changes since commit 57361846b52bc686112da6ca5368d11210796804:

  Linux 4.19-rc2 (2018-09-02 14:37:30 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git tags/arm64-fixes

for you to fetch changes up to fac880c7d074fdfca874114b5c47b36aa034e4ee:

  arm64: fix erroneous warnings in page freeing functions (2018-09-06 18:01:13 +0100)

----------------------------------------------------------------
arm64 fix

- Remove accidental VM_WARN_ON

----------------------------------------------------------------
Mark Rutland (1):
      arm64: fix erroneous warnings in page freeing functions

 arch/arm64/mm/mmu.c | 10 ++++++----
 1 file changed, 6 insertions(+), 4 deletions(-)

Comments

Linus Torvalds Sept. 7, 2018, 5:51 p.m. UTC | #1
On Fri, Sep 7, 2018 at 8:45 AM Will Deacon <will.deacon@arm.com> wrote:
>
> Just one small fix here, preventing a VM_WARN_ON when a !present PMD/PUD
> is "freed" as part of a huge ioremap() operation. The correct behaviour
> is to skip the free silently in this case, which is a little weird (the
> function is a bit of a misnomer), but it follows the x86 implementation.

Hmm. I've obviously pulled it, but it does strike me that maybe the
confusing semantics could be fixed?

There's only one call site, and only two implementations (x86 and arm64).

Maybe the whole "read pmd, check for present" could just be in the
caller, avoiding that oddity wrt name-vs-behavior.

I'm not entirely sure why that code has that special case to begin
with. I guess this is the only case of a kernel mapping of a hugepage,
and people wanted to avoid polluting the generic code with stuff that
was only relevant for architectures that implemented it? But we
already have that ioremap_pmd_enabled() guard, so architectures that
don't do this wouldn't actually have any extra code.

Anyway, I'll let it be, but I think it could be a good idea to simply
change the semantics, and in the process also make it a lot clearer
what that thing actually is expected to do.

                 Linus
Will Deacon Sept. 10, 2018, 5:45 p.m. UTC | #2
Hi Linus,

On Fri, Sep 07, 2018 at 10:51:19AM -0700, Linus Torvalds wrote:
> On Fri, Sep 7, 2018 at 8:45 AM Will Deacon <will.deacon@arm.com> wrote:
> >
> > Just one small fix here, preventing a VM_WARN_ON when a !present PMD/PUD
> > is "freed" as part of a huge ioremap() operation. The correct behaviour
> > is to skip the free silently in this case, which is a little weird (the
> > function is a bit of a misnomer), but it follows the x86 implementation.
> 
> Hmm. I've obviously pulled it, but it does strike me that maybe the
> confusing semantics could be fixed?
> 
> There's only one call site, and only two implementations (x86 and arm64).
> 
> Maybe the whole "read pmd, check for present" could just be in the
> caller, avoiding that oddity wrt name-vs-behavior.
> 
> I'm not entirely sure why that code has that special case to begin
> with. I guess this is the only case of a kernel mapping of a hugepage,
> and people wanted to avoid polluting the generic code with stuff that
> was only relevant for architectures that implemented it? But we
> already have that ioremap_pmd_enabled() guard, so architectures that
> don't do this wouldn't actually have any extra code.
> 
> Anyway, I'll let it be, but I think it could be a good idea to simply
> change the semantics, and in the process also make it a lot clearer
> what that thing actually is expected to do.

Yeah, I agree, and it's better to fix it before it grows lots of other
users. I had a hack at it (see below), but note:

  * I ended up reworking the phys_addr calculations to avoid the misnomer
    there as well (namely that phys_addr isn't a physical address!).

  * The huge p4d code is only half implemented in mainline and unused by
    any architecture. I've added some of the missing bits, or we could
    just rip it out.

  * The shape of the code is duplicated, but I couldn't figure out a
    cleaner way to do it and maintain the type-checking for the different
    levels.

Happy to send as a proper patch if you think this is the right sort of
idea.

Will

--->8

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index 8080c9f489c3..58776b90dd2a 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -985,10 +985,8 @@ int pmd_free_pte_page(pmd_t *pmdp, unsigned long addr)
 
 	pmd = READ_ONCE(*pmdp);
 
-	if (!pmd_present(pmd))
-		return 1;
 	if (!pmd_table(pmd)) {
-		VM_WARN_ON(!pmd_table(pmd));
+		VM_WARN_ON(1);
 		return 1;
 	}
 
@@ -1008,10 +1006,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
 
 	pud = READ_ONCE(*pudp);
 
-	if (!pud_present(pud))
-		return 1;
 	if (!pud_table(pud)) {
-		VM_WARN_ON(!pud_table(pud));
+		VM_WARN_ON(1);
 		return 1;
 	}
 
@@ -1028,3 +1024,8 @@ int pud_free_pmd_page(pud_t *pudp, unsigned long addr)
 	pmd_free(NULL, table);
 	return 1;
 }
+
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+	return 0;	/* Don't attempt a block mapping */
+}
diff --git a/arch/x86/mm/pgtable.c b/arch/x86/mm/pgtable.c
index ae394552fb94..c6094997d060 100644
--- a/arch/x86/mm/pgtable.c
+++ b/arch/x86/mm/pgtable.c
@@ -779,6 +779,14 @@ int pmd_clear_huge(pmd_t *pmd)
 	return 0;
 }
 
+/*
+ * Until we support 512GB pages, skip them in the vmap area.
+ */
+int p4d_free_pud_page(p4d_t *p4d, unsigned long addr)
+{
+	return 0;
+}
+
 #ifdef CONFIG_X86_64
 /**
  * pud_free_pmd_page - Clear pud entry and free pmd page.
@@ -796,9 +804,6 @@ int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 	pte_t *pte;
 	int i;
 
-	if (pud_none(*pud))
-		return 1;
-
 	pmd = (pmd_t *)pud_page_vaddr(*pud);
 	pmd_sv = (pmd_t *)__get_free_page(GFP_KERNEL);
 	if (!pmd_sv)
@@ -840,9 +845,6 @@ int pmd_free_pte_page(pmd_t *pmd, unsigned long addr)
 {
 	pte_t *pte;
 
-	if (pmd_none(*pmd))
-		return 1;
-
 	pte = (pte_t *)pmd_page_vaddr(*pmd);
 	pmd_clear(pmd);
 
diff --git a/include/asm-generic/pgtable.h b/include/asm-generic/pgtable.h
index 88ebc6102c7c..9ac982ea6c4a 100644
--- a/include/asm-generic/pgtable.h
+++ b/include/asm-generic/pgtable.h
@@ -1019,6 +1019,7 @@ int pud_set_huge(pud_t *pud, phys_addr_t addr, pgprot_t prot);
 int pmd_set_huge(pmd_t *pmd, phys_addr_t addr, pgprot_t prot);
 int pud_clear_huge(pud_t *pud);
 int pmd_clear_huge(pmd_t *pmd);
+int p4d_free_pud_page(p4d_t *pmd, unsigned long addr);
 int pud_free_pmd_page(pud_t *pud, unsigned long addr);
 int pmd_free_pte_page(pmd_t *pmd, unsigned long addr);
 #else	/* !CONFIG_HAVE_ARCH_HUGE_VMAP */
@@ -1046,6 +1047,10 @@ static inline int pmd_clear_huge(pmd_t *pmd)
 {
 	return 0;
 }
+static inline int p4d_free_pud_page(p4d_t *pmd, unsigned long addr)
+{
+	return 0;
+}
 static inline int pud_free_pmd_page(pud_t *pud, unsigned long addr)
 {
 	return 0;
diff --git a/lib/ioremap.c b/lib/ioremap.c
index 517f5853ffed..3d04f0e1d79d 100644
--- a/lib/ioremap.c
+++ b/lib/ioremap.c
@@ -76,83 +76,123 @@ static int ioremap_pte_range(pmd_t *pmd, unsigned long addr,
 	return 0;
 }
 
+static int ioremap_try_huge_pmd(pmd_t *pmd, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_pmd_enabled())
+		return 0;
+
+	if ((end - addr) != PMD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, PMD_SIZE))
+		return 0;
+
+	if (pmd_present(*pmd) && !pmd_free_pte_page(pmd, addr))
+		return 0;
+
+	return pmd_set_huge(pmd, phys_addr, prot);
+}
+
 static inline int ioremap_pmd_range(pud_t *pud, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
 	pmd_t *pmd;
 	unsigned long next;
 
-	phys_addr -= addr;
 	pmd = pmd_alloc(&init_mm, pud, addr);
 	if (!pmd)
 		return -ENOMEM;
 	do {
 		next = pmd_addr_end(addr, end);
 
-		if (ioremap_pmd_enabled() &&
-		    ((next - addr) == PMD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PMD_SIZE) &&
-		    pmd_free_pte_page(pmd, addr)) {
-			if (pmd_set_huge(pmd, phys_addr + addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_pmd(pmd, addr, next, phys_addr, prot))
+			continue;
 
-		if (ioremap_pte_range(pmd, addr, next, phys_addr + addr, prot))
+		if (ioremap_pte_range(pmd, addr, next, phys_addr, prot))
 			return -ENOMEM;
-	} while (pmd++, addr = next, addr != end);
+	} while (pmd++, addr = next, phys_addr += PMD_SIZE, addr != end);
 	return 0;
 }
 
+static int ioremap_try_huge_pud(pud_t *pud, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_pud_enabled())
+		return 0;
+
+	if ((end - addr) != PUD_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, PUD_SIZE))
+		return 0;
+
+	if (pud_present(*pud) && !pud_free_pmd_page(pud, addr))
+		return 0;
+
+	return pud_set_huge(pud, phys_addr, prot);
+}
+
 static inline int ioremap_pud_range(p4d_t *p4d, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
 	pud_t *pud;
 	unsigned long next;
 
-	phys_addr -= addr;
 	pud = pud_alloc(&init_mm, p4d, addr);
 	if (!pud)
 		return -ENOMEM;
 	do {
 		next = pud_addr_end(addr, end);
 
-		if (ioremap_pud_enabled() &&
-		    ((next - addr) == PUD_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, PUD_SIZE) &&
-		    pud_free_pmd_page(pud, addr)) {
-			if (pud_set_huge(pud, phys_addr + addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_pud(pud, addr, next, phys_addr, prot))
+			continue;
 
-		if (ioremap_pmd_range(pud, addr, next, phys_addr + addr, prot))
+		if (ioremap_pmd_range(pud, addr, next, phys_addr, prot))
 			return -ENOMEM;
-	} while (pud++, addr = next, addr != end);
+	} while (pud++, addr = next, phys_addr += PUD_SIZE, addr != end);
 	return 0;
 }
 
+static int ioremap_try_huge_p4d(p4d_t *p4d, unsigned long addr,
+				unsigned long end, phys_addr_t phys_addr,
+				pgprot_t prot)
+{
+	if (!ioremap_p4d_enabled())
+		return 0;
+
+	if ((end - addr) != P4D_SIZE)
+		return 0;
+
+	if (!IS_ALIGNED(phys_addr, P4D_SIZE))
+		return 0;
+
+	if (p4d_present(*p4d) && !p4d_free_pud_page(p4d, addr))
+		return 0;
+
+	return p4d_set_huge(p4d, phys_addr, prot);
+}
+
 static inline int ioremap_p4d_range(pgd_t *pgd, unsigned long addr,
 		unsigned long end, phys_addr_t phys_addr, pgprot_t prot)
 {
 	p4d_t *p4d;
 	unsigned long next;
 
-	phys_addr -= addr;
 	p4d = p4d_alloc(&init_mm, pgd, addr);
 	if (!p4d)
 		return -ENOMEM;
 	do {
 		next = p4d_addr_end(addr, end);
 
-		if (ioremap_p4d_enabled() &&
-		    ((next - addr) == P4D_SIZE) &&
-		    IS_ALIGNED(phys_addr + addr, P4D_SIZE)) {
-			if (p4d_set_huge(p4d, phys_addr + addr, prot))
-				continue;
-		}
+		if (ioremap_try_huge_p4d(p4d, addr, next, phys_addr, prot))
+			continue;
 
-		if (ioremap_pud_range(p4d, addr, next, phys_addr + addr, prot))
+		if (ioremap_pud_range(p4d, addr, next, phys_addr, prot))
 			return -ENOMEM;
-	} while (p4d++, addr = next, addr != end);
+	} while (p4d++, addr = next, phys_addr += P4D_SIZE, addr != end);
 	return 0;
 }
 
@@ -168,14 +208,13 @@ int ioremap_page_range(unsigned long addr,
 	BUG_ON(addr >= end);
 
 	start = addr;
-	phys_addr -= addr;
 	pgd = pgd_offset_k(addr);
 	do {
 		next = pgd_addr_end(addr, end);
-		err = ioremap_p4d_range(pgd, addr, next, phys_addr+addr, prot);
+		err = ioremap_p4d_range(pgd, addr, next, phys_addr, prot);
 		if (err)
 			break;
-	} while (pgd++, addr = next, addr != end);
+	} while (pgd++, addr = next, phys_addr += addr, addr != end);
 
 	flush_cache_vmap(start, end);
Linus Torvalds Sept. 11, 2018, 6:45 p.m. UTC | #3
On Mon, Sep 10, 2018 at 7:45 AM Will Deacon <will.deacon@arm.com> wrote:
>
> Happy to send as a proper patch if you think this is the right sort of
> idea.

Looks sane to me,

            Linus