[1/2] arm64: hugetlb: remove the wrong pmd check in find_num_contig()

Message ID 1478140059-13829-2-git-send-email-shijie.huang@arm.com (mailing list archive)
State New, archived

Commit Message

Huang Shijie Nov. 3, 2016, 2:27 a.m. UTC
find_num_contig() returns 1 when the pmd is not present.
This causes a kernel dead loop in the following scenario:

   1.) The pmd entry is not present.

   2.) A page fault occurs:
       ... hugetlb_fault() --> hugetlb_no_page() --> set_huge_pte_at()

   3.) set_huge_pte_at() sets only the first PMD entry, since
       find_num_contig() just returns 1 in this case. So all the PMD
       entries are empty except the first one.

   4.) When the kernel accesses the address mapped by the second PMD entry,
       a new page fault occurs:
       ... hugetlb_fault() --> huge_ptep_set_access_flags()

       The second PMD entry is still empty after this fault is handled.

   5.) When the kernel returns, the access causes a page fault again,
       and the kernel repeats step 4.) above.
       From this point on the kernel is stuck in a dead loop.

The dead loop was caught with the 32M hugetlb page (2M PMD + Contiguous bit).

This patch removes the wrong pmd check and fixes the dead loop.

Acked-by: Steve Capper <steve.capper@arm.com>
Signed-off-by: Huang Shijie <shijie.huang@arm.com>
---
 arch/arm64/mm/hugetlbpage.c | 4 ----
 1 file changed, 4 deletions(-)

Comments

Catalin Marinas Nov. 4, 2016, 12:16 a.m. UTC | #1
On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> index 2e49bd2..4811ef1 100644
> --- a/arch/arm64/mm/hugetlbpage.c
> +++ b/arch/arm64/mm/hugetlbpage.c
> @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
>  		return 1;
>  	}
>  	pmd = pmd_offset(pud, addr);
> -	if (!pmd_present(*pmd)) {
> -		VM_BUG_ON(!pmd_present(*pmd));
> -		return 1;
> -	}
>  	if ((pte_t *)pmd == ptep) {
>  		*pgsize = PMD_SIZE;
>  		return CONT_PMDS;

BTW, for the !pud_present() and !pgd_present() cases, shouldn't
find_num_contig() actually return 0? These are more likely real bugs, so
no point in setting the huge pte.

Huang Shijie Nov. 4, 2016, 2:52 a.m. UTC | #2
On Thu, Nov 03, 2016 at 06:16:16PM -0600, Catalin Marinas wrote:
> On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > index 2e49bd2..4811ef1 100644
> > --- a/arch/arm64/mm/hugetlbpage.c
> > +++ b/arch/arm64/mm/hugetlbpage.c
> > @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> >  		return 1;
> >  	}
> >  	pmd = pmd_offset(pud, addr);
> > -	if (!pmd_present(*pmd)) {
> > -		VM_BUG_ON(!pmd_present(*pmd));
> > -		return 1;
> > -	}
> >  	if ((pte_t *)pmd == ptep) {
> >  		*pgsize = PMD_SIZE;
> >  		return CONT_PMDS;
> 
> BTW, for the !pud_present() and !pgd_present() cases, shouldn't
> find_num_contig() actually return 0? These are more likely real bugs, so
> no point in setting the huge pte.

The kernel will not call find_num_contig() if the PGD/PUD is empty.
Please see the code in hugetlb_fault().

   ------------------------------------------------------
	ptep = huge_pte_offset(mm, address);
	if (ptep) {
	    ...............................
	} else {
		ptep = huge_pte_alloc(mm, address, huge_page_size(h));
		if (!ptep)
			return VM_FAULT_OOM;
	}
   ------------------------------------------------------

Thanks
Huang Shijie

Catalin Marinas Nov. 4, 2016, 3:48 p.m. UTC | #3
On Fri, Nov 04, 2016 at 10:52:17AM +0800, Huang Shijie wrote:
> On Thu, Nov 03, 2016 at 06:16:16PM -0600, Catalin Marinas wrote:
> > On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> > > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > > index 2e49bd2..4811ef1 100644
> > > --- a/arch/arm64/mm/hugetlbpage.c
> > > +++ b/arch/arm64/mm/hugetlbpage.c
> > > @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> > >  		return 1;
> > >  	}
> > >  	pmd = pmd_offset(pud, addr);
> > > -	if (!pmd_present(*pmd)) {
> > > -		VM_BUG_ON(!pmd_present(*pmd));
> > > -		return 1;
> > > -	}
> > >  	if ((pte_t *)pmd == ptep) {
> > >  		*pgsize = PMD_SIZE;
> > >  		return CONT_PMDS;
> > 
> > BTW, for the !pud_present() and !pgd_present() cases, shouldn't
> > find_num_contig() actually return 0? These are more likely real bugs, so
> > no point in setting the huge pte.
> 
> The kernel will not call find_num_contig() if the PGD/PUD is empty.
> Please see the code in hugetlb_fault().
> 
>    ------------------------------------------------------
> 	ptep = huge_pte_offset(mm, address);
> 	if (ptep) {
> 	    ...............................
> 	} else {
> 		ptep = huge_pte_alloc(mm, address, huge_page_size(h));
> 		if (!ptep)
> 			return VM_FAULT_OOM;
> 	}
>    ------------------------------------------------------

Exactly. So what is the reason for returning 1 if !pgd_present()? Would
removing the checks entirely or adding BUG() be a better option?

Huang Shijie Nov. 8, 2016, 2:25 a.m. UTC | #4
On Fri, Nov 04, 2016 at 09:48:14AM -0600, Catalin Marinas wrote:
> On Fri, Nov 04, 2016 at 10:52:17AM +0800, Huang Shijie wrote:
> > On Thu, Nov 03, 2016 at 06:16:16PM -0600, Catalin Marinas wrote:
> > > On Thu, Nov 03, 2016 at 10:27:38AM +0800, Huang Shijie wrote:
> > > > diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
> > > > index 2e49bd2..4811ef1 100644
> > > > --- a/arch/arm64/mm/hugetlbpage.c
> > > > +++ b/arch/arm64/mm/hugetlbpage.c
> > > > @@ -61,10 +61,6 @@ static int find_num_contig(struct mm_struct *mm, unsigned long addr,
> > > >  		return 1;
> > > >  	}
> > > >  	pmd = pmd_offset(pud, addr);
> > > > -	if (!pmd_present(*pmd)) {
> > > > -		VM_BUG_ON(!pmd_present(*pmd));
> > > > -		return 1;
> > > > -	}
> > > >  	if ((pte_t *)pmd == ptep) {
> > > >  		*pgsize = PMD_SIZE;
> > > >  		return CONT_PMDS;
> > > 
> > > BTW, for the !pud_present() and !pgd_present() cases, shouldn't
> > > find_num_contig() actually return 0? These are more likely real bugs, so
> > > no point in setting the huge pte.
> > 
> > The kernel will not call find_num_contig() if the PGD/PUD is empty.
> > Please see the code in hugetlb_fault().
> > 
> >    ------------------------------------------------------
> > 	ptep = huge_pte_offset(mm, address);
> > 	if (ptep) {
> > 	    ...............................
> > 	} else {
> > 		ptep = huge_pte_alloc(mm, address, huge_page_size(h));
> > 		if (!ptep)
> > 			return VM_FAULT_OOM;
> > 	}
> >    ------------------------------------------------------
> 
> Exactly. So what is the reason for returning 1 if !pgd_present()? Would
I think the author was being overly cautious in returning 1 if !pgd_present().
:)
> removing the checks entirely or adding BUG() be a better option?
I will remove the checks in the next version.

Thanks
Huang Shijie

Patch

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 2e49bd2..4811ef1 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -61,10 +61,6 @@  static int find_num_contig(struct mm_struct *mm, unsigned long addr,
 		return 1;
 	}
 	pmd = pmd_offset(pud, addr);
-	if (!pmd_present(*pmd)) {
-		VM_BUG_ON(!pmd_present(*pmd));
-		return 1;
-	}
 	if ((pte_t *)pmd == ptep) {
 		*pgsize = PMD_SIZE;
 		return CONT_PMDS;