diff mbox series

mm/hugetlb: avoid unnecessary check on pud and pmd entry in huge_pte_offset

Message ID 1587646154-26276-1-git-send-email-lixinhai.lxh@gmail.com (mailing list archive)
State New, archived
Headers show
Series mm/hugetlb: avoid unnecessary check on pud and pmd entry in huge_pte_offset | expand

Commit Message

Li Xinhai April 23, 2020, 12:49 p.m. UTC
When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
or PMD_SIZE.
If sz is PUD_SIZE and code can reach pud, then *pud must be none, or
normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb
entry, and we can directly return pud.
When sz is PMD_SIZE, pud must be none or present, and if code can reach
pmd, we can directly return pmd.

So, after this patch, the code is simplified by first check on the
parameter sz, and avoid unnecessary checks in current code.

Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
---
 mm/hugetlb.c | 24 +++++++++---------------
 1 file changed, 9 insertions(+), 15 deletions(-)

Comments

Li Xinhai April 23, 2020, 1:12 p.m. UTC | #1
On 2020-04-23 at 20:49 Li Xinhai wrote:
>When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
>or PMD_SIZE.
>If sz is PUD_SIZE and code can reach pud, then *pud must be none, or
>normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb
>entry, and we can directly return pud.
>When sz is PMD_SIZE, pud must be none or present, and if code can reach
>pmd, we can directly return pmd.
>
>So, after this patch, the code is simplified by first check on the
>parameter sz, and avoid unnecessary checks in current code.
>
>Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>Cc: Mike Kravetz <mike.kravetz@oracle.com>
>Cc: Andrew Morton <akpm@linux-foundation.org> 

Since this huge_pte_offset() is under CONFIG_ARCH_WANT_GENERAL_HUGETLB, 
I only have chance to test with x86_64, but the logical should hold for all other
cases which using this general implementation.

The exsiting code path was introduced by commit 9b19df292c6 (mm/hugetlb.c:
make huge_pte_offset() consistent and document behaviour), and the sematics
is maintained after current patch applied.
Mike Kravetz April 23, 2020, 6:14 p.m. UTC | #2
Cc a few people who have looked at huge_pte_offset() recently.

On 4/23/20 5:49 AM, Li Xinhai wrote:
> When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
> or PMD_SIZE.
> If sz is PUD_SIZE and code can reach pud, then *pud must be none, or
> normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb
> entry, and we can directly return pud.
> When sz is PMD_SIZE, pud must be none or present, and if code can reach
> pmd, we can directly return pmd.
> 
> So, after this patch, the code is simplified by first check on the
> parameter sz, and avoid unnecessary checks in current code.
> 
> Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> Cc: Mike Kravetz <mike.kravetz@oracle.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> ---
>  mm/hugetlb.c | 24 +++++++++---------------
>  1 file changed, 9 insertions(+), 15 deletions(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index bcabbe0..e1424f5 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5365,8 +5365,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>  {
>  	pgd_t *pgd;
>  	p4d_t *p4d;
> -	pud_t *pud, pud_entry;
> -	pmd_t *pmd, pmd_entry;
> +	pud_t *pud;
> +	pmd_t *pmd;
>  
>  	pgd = pgd_offset(mm, addr);
>  	if (!pgd_present(*pgd))
> @@ -5376,22 +5376,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>  		return NULL;
>  
>  	pud = pud_offset(p4d, addr);
> -	pud_entry = READ_ONCE(*pud);
> -	if (sz != PUD_SIZE && pud_none(pud_entry))
> -		return NULL;
> -	/* hugepage or swap? */
> -	if (pud_huge(pud_entry) || !pud_present(pud_entry))
> +	if (sz == PUD_SIZE)
> +		/* must be pud_huge or pud_none */
>  		return (pte_t *)pud;
> -
> -	pmd = pmd_offset(pud, addr);
> -	pmd_entry = READ_ONCE(*pmd);
> -	if (sz != PMD_SIZE && pmd_none(pmd_entry))
> +	if (!pud_present(*pud))
>  		return NULL;
> -	/* hugepage or swap? */
> -	if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
> -		return (pte_t *)pmd;
> +	/* must have a valid entry and size to go further */
>  
> -	return NULL;
> +	pmd = pmd_offset(pud, addr);

Can we get here with sz = PMD_SIZE and pud_none(*pud)?  Would that be
an issue for the pmd_offset() call?
Jason Gunthorpe April 23, 2020, 6:38 p.m. UTC | #3
On Thu, Apr 23, 2020 at 11:14:28AM -0700, Mike Kravetz wrote:
> Cc a few people who have looked at huge_pte_offset() recently.
> 
> On 4/23/20 5:49 AM, Li Xinhai wrote:
> > When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
> > or PMD_SIZE.
> > If sz is PUD_SIZE and code can reach pud, then *pud must be none, or
> > normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb
> > entry, and we can directly return pud.
> > When sz is PMD_SIZE, pud must be none or present, and if code can reach
> > pmd, we can directly return pmd.
> > 
> > So, after this patch, the code is simplified by first check on the
> > parameter sz, and avoid unnecessary checks in current code.
> > 
> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> >  mm/hugetlb.c | 24 +++++++++---------------
> >  1 file changed, 9 insertions(+), 15 deletions(-)
> > 
> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> > index bcabbe0..e1424f5 100644
> > +++ b/mm/hugetlb.c
> > @@ -5365,8 +5365,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> >  {
> >  	pgd_t *pgd;
> >  	p4d_t *p4d;
> > -	pud_t *pud, pud_entry;
> > -	pmd_t *pmd, pmd_entry;
> > +	pud_t *pud;
> > +	pmd_t *pmd;
> >  
> >  	pgd = pgd_offset(mm, addr);
> >  	if (!pgd_present(*pgd))
> > @@ -5376,22 +5376,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> >  		return NULL;
> >  
> >  	pud = pud_offset(p4d, addr);
> > -	pud_entry = READ_ONCE(*pud);
> > -	if (sz != PUD_SIZE && pud_none(pud_entry))
> > -		return NULL;
> > -	/* hugepage or swap? */
> > -	if (pud_huge(pud_entry) || !pud_present(pud_entry))
> > +	if (sz == PUD_SIZE)
> > +		/* must be pud_huge or pud_none */
> >  		return (pte_t *)pud;
> > -
> > -	pmd = pmd_offset(pud, addr);
> > -	pmd_entry = READ_ONCE(*pmd);
> > -	if (sz != PMD_SIZE && pmd_none(pmd_entry))
> > +	if (!pud_present(*pud))
> >  		return NULL;
> > -	/* hugepage or swap? */
> > -	if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
> > -		return (pte_t *)pmd;
> > +	/* must have a valid entry and size to go further */
> >  
> > -	return NULL;
> > +	pmd = pmd_offset(pud, addr);
> 
> Can we get here with sz = PMD_SIZE and pud_none(*pud)?  Would that be
> an issue for the pmd_offset() call?

Certainly pmd_offset() must only be called if the PUD entry is
pointing at a pmd level.

AFAIK this means it should not be called on pud_none(), pud_huge() or
!pud_present() cases.

Jason
Li Xinhai April 24, 2020, 4:07 a.m. UTC | #4
On 2020-04-24 at 02:38 Jason Gunthorpe wrote:
>On Thu, Apr 23, 2020 at 11:14:28AM -0700, Mike Kravetz wrote:
>> Cc a few people who have looked at huge_pte_offset() recently.
>>
>> On 4/23/20 5:49 AM, Li Xinhai wrote:
>> > When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
>> > or PMD_SIZE.
>> > If sz is PUD_SIZE and code can reach pud, then *pud must be none, or
>> > normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb
>> > entry, and we can directly return pud.
>> > When sz is PMD_SIZE, pud must be none or present, and if code can reach
>> > pmd, we can directly return pmd.
>> >
>> > So, after this patch, the code is simplified by first check on the
>> > parameter sz, and avoid unnecessary checks in current code.
>> >
>> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> > Cc: Andrew Morton <akpm@linux-foundation.org>
>> >  mm/hugetlb.c | 24 +++++++++---------------
>> >  1 file changed, 9 insertions(+), 15 deletions(-)
>> >
>> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> > index bcabbe0..e1424f5 100644
>> > +++ b/mm/hugetlb.c
>> > @@ -5365,8 +5365,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>> >  {
>> >  pgd_t *pgd;
>> >  p4d_t *p4d;
>> > -	pud_t *pud, pud_entry;
>> > -	pmd_t *pmd, pmd_entry;
>> > +	pud_t *pud;
>> > +	pmd_t *pmd;
>> > 
>> >  pgd = pgd_offset(mm, addr);
>> >  if (!pgd_present(*pgd))
>> > @@ -5376,22 +5376,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>> >  return NULL;
>> > 
>> >  pud = pud_offset(p4d, addr);
>> > -	pud_entry = READ_ONCE(*pud);
>> > -	if (sz != PUD_SIZE && pud_none(pud_entry))
>> > -	return NULL;
>> > -	/* hugepage or swap? */
>> > -	if (pud_huge(pud_entry) || !pud_present(pud_entry))
>> > +	if (sz == PUD_SIZE)
>> > +	/* must be pud_huge or pud_none */
>> >  return (pte_t *)pud;
>> > -
>> > -	pmd = pmd_offset(pud, addr);
>> > -	pmd_entry = READ_ONCE(*pmd);
>> > -	if (sz != PMD_SIZE && pmd_none(pmd_entry))
>> > +	if (!pud_present(*pud))
>> >  return NULL;
>> > -	/* hugepage or swap? */
>> > -	if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
>> > -	return (pte_t *)pmd;
>> > +	/* must have a valid entry and size to go further */
>> > 
>> > -	return NULL;
>> > +	pmd = pmd_offset(pud, addr);
>>
>> Can we get here with sz = PMD_SIZE and pud_none(*pud)?  Would that be
>> an issue for the pmd_offset() call?
>
>Certainly pmd_offset() must only be called if the PUD entry is
>pointing at a pmd level.
>
>AFAIK this means it should not be called on pud_none(), pud_huge() or
>!pud_present() cases. 

The test of !pud_present(*pud) also block pud_none(*pud), so when sz ==
PMD_SIZE, pmd_offset() only called with a valid PUD entry which point to PMD
page table.

>
>Jason
Jason Gunthorpe April 24, 2020, 12:57 p.m. UTC | #5
On Fri, Apr 24, 2020 at 12:07:50PM +0800, Li Xinhai wrote:
> On 2020-04-24 at 02:38 Jason Gunthorpe wrote:
> >On Thu, Apr 23, 2020 at 11:14:28AM -0700, Mike Kravetz wrote:
> >> Cc a few people who have looked at huge_pte_offset() recently.
> >>
> >> On 4/23/20 5:49 AM, Li Xinhai wrote:
> >> > When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
> >> > or PMD_SIZE.
> >> > If sz is PUD_SIZE and code can reach pud, then *pud must be none, or
> >> > normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb
> >> > entry, and we can directly return pud.
> >> > When sz is PMD_SIZE, pud must be none or present, and if code can reach
> >> > pmd, we can directly return pmd.
> >> >
> >> > So, after this patch, the code is simplified by first check on the
> >> > parameter sz, and avoid unnecessary checks in current code.
> >> >
> >> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
> >> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
> >> > Cc: Andrew Morton <akpm@linux-foundation.org>
> >> >  mm/hugetlb.c | 24 +++++++++---------------
> >> >  1 file changed, 9 insertions(+), 15 deletions(-)
> >> >
> >> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> >> > index bcabbe0..e1424f5 100644
> >> > +++ b/mm/hugetlb.c
> >> > @@ -5365,8 +5365,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> >> >  {
> >> >  pgd_t *pgd;
> >> >  p4d_t *p4d;
> >> > -	pud_t *pud, pud_entry;
> >> > -	pmd_t *pmd, pmd_entry;
> >> > +	pud_t *pud;
> >> > +	pmd_t *pmd;
> >> > 
> >> >  pgd = pgd_offset(mm, addr);
> >> >  if (!pgd_present(*pgd))
> >> > @@ -5376,22 +5376,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
> >> >  return NULL;
> >> > 
> >> >  pud = pud_offset(p4d, addr);
> >> > -	pud_entry = READ_ONCE(*pud);
> >> > -	if (sz != PUD_SIZE && pud_none(pud_entry))
> >> > -	return NULL;
> >> > -	/* hugepage or swap? */
> >> > -	if (pud_huge(pud_entry) || !pud_present(pud_entry))
> >> > +	if (sz == PUD_SIZE)
> >> > +	/* must be pud_huge or pud_none */
> >> >  return (pte_t *)pud;
> >> > -
> >> > -	pmd = pmd_offset(pud, addr);
> >> > -	pmd_entry = READ_ONCE(*pmd);
> >> > -	if (sz != PMD_SIZE && pmd_none(pmd_entry))
> >> > +	if (!pud_present(*pud))
> >> >  return NULL;
> >> > -	/* hugepage or swap? */
> >> > -	if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
> >> > -	return (pte_t *)pmd;
> >> > +	/* must have a valid entry and size to go further */
> >> > 
> >> > -	return NULL;
> >> > +	pmd = pmd_offset(pud, addr);
> >>
> >> Can we get here with sz = PMD_SIZE and pud_none(*pud)?  Would that be
> >> an issue for the pmd_offset() call?
> >
> >Certainly pmd_offset() must only be called if the PUD entry is
> >pointing at a pmd level.
> >
> >AFAIK this means it should not be called on pud_none(), pud_huge() or
> >!pud_present() cases. 
> 
> The test of !pud_present(*pud) also block pud_none(*pud)

Sure

> , so when sz == PMD_SIZE, pmd_offset() only called with a valid PUD
> entry which point to PMD page table.

But what prevents pud_huge?

This API seems kind of strange to be honest.. Should it be two
functions instead of a sz parameter?

huge_pud_offset() and huge_pmd_offset() ?

Jason
Li Xinhai April 24, 2020, 1:33 p.m. UTC | #6
On 2020-04-24 at 20:57 Jason Gunthorpe wrote:
>On Fri, Apr 24, 2020 at 12:07:50PM +0800, Li Xinhai wrote:
>> On 2020-04-24 at 02:38 Jason Gunthorpe wrote:
>> >On Thu, Apr 23, 2020 at 11:14:28AM -0700, Mike Kravetz wrote:
>> >> Cc a few people who have looked at huge_pte_offset() recently.
>> >>
>> >> On 4/23/20 5:49 AM, Li Xinhai wrote:
>> >> > When huge_pte_offset() is called, the parameter sz can only be PUD_SIZE
>> >> > or PMD_SIZE.
>> >> > If sz is PUD_SIZE and code can reach pud, then *pud must be none, or
>> >> > normal hugetlb entry, or non-present (migration or hwpoisoned) hugetlb
>> >> > entry, and we can directly return pud.
>> >> > When sz is PMD_SIZE, pud must be none or present, and if code can reach
>> >> > pmd, we can directly return pmd.
>> >> >
>> >> > So, after this patch, the code is simplified by first check on the
>> >> > parameter sz, and avoid unnecessary checks in current code.
>> >> >
>> >> > Signed-off-by: Li Xinhai <lixinhai.lxh@gmail.com>
>> >> > Cc: Mike Kravetz <mike.kravetz@oracle.com>
>> >> > Cc: Andrew Morton <akpm@linux-foundation.org>
>> >> >  mm/hugetlb.c | 24 +++++++++---------------
>> >> >  1 file changed, 9 insertions(+), 15 deletions(-)
>> >> >
>> >> > diff --git a/mm/hugetlb.c b/mm/hugetlb.c
>> >> > index bcabbe0..e1424f5 100644
>> >> > +++ b/mm/hugetlb.c
>> >> > @@ -5365,8 +5365,8 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>> >> >  {
>> >> >  pgd_t *pgd;
>> >> >  p4d_t *p4d;
>> >> > -	pud_t *pud, pud_entry;
>> >> > -	pmd_t *pmd, pmd_entry;
>> >> > +	pud_t *pud;
>> >> > +	pmd_t *pmd;
>> >> > 
>> >> >  pgd = pgd_offset(mm, addr);
>> >> >  if (!pgd_present(*pgd))
>> >> > @@ -5376,22 +5376,16 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
>> >> >  return NULL;
>> >> > 
>> >> >  pud = pud_offset(p4d, addr);
>> >> > -	pud_entry = READ_ONCE(*pud);
>> >> > -	if (sz != PUD_SIZE && pud_none(pud_entry))
>> >> > -	return NULL;
>> >> > -	/* hugepage or swap? */
>> >> > -	if (pud_huge(pud_entry) || !pud_present(pud_entry))
>> >> > +	if (sz == PUD_SIZE)
>> >> > +	/* must be pud_huge or pud_none */
>> >> >  return (pte_t *)pud;
>> >> > -
>> >> > -	pmd = pmd_offset(pud, addr);
>> >> > -	pmd_entry = READ_ONCE(*pmd);
>> >> > -	if (sz != PMD_SIZE && pmd_none(pmd_entry))
>> >> > +	if (!pud_present(*pud))
>> >> >  return NULL;
>> >> > -	/* hugepage or swap? */
>> >> > -	if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
>> >> > -	return (pte_t *)pmd;
>> >> > +	/* must have a valid entry and size to go further */
>> >> > 
>> >> > -	return NULL;
>> >> > +	pmd = pmd_offset(pud, addr);
>> >>
>> >> Can we get here with sz = PMD_SIZE and pud_none(*pud)?  Would that be
>> >> an issue for the pmd_offset() call?
>> >
>> >Certainly pmd_offset() must only be called if the PUD entry is
>> >pointing at a pmd level.
>> >
>> >AFAIK this means it should not be called on pud_none(), pud_huge() or
>> >!pud_present() cases.
>>
>> The test of !pud_present(*pud) also block pud_none(*pud)
>
>Sure
>
>> , so when sz == PMD_SIZE, pmd_offset() only called with a valid PUD
>> entry which point to PMD page table.
>
>But what prevents pud_huge?
> 
if sz == PUD_SIZE, the 'return (pte_t*)pud' alrady end the function, which cover
pud_huge() and pud_none(), because we the mapping is for PUD_SIZE huge page.

So, there is no possibility for pmd_offset() been called with invalid pud entry.
Below is the code I used for test which has BUG_ON, that should give more
clear idea about the semantics of code path:

...
	pud = pud_offset(p4d, addr);
	if (sz == PUD_SIZE) {
		/* must be pud_huge or pud_none */
		BUG_ON(!pud_huge(*pud) && !pud_none(*pud));
		return (pte_t *)pud; // note that return valid pointer for pud_none() case,
				// instead of NULL, that is same semantics as existing code.
	}
	if (!pud_present(*pud))
		return NULL; // note that only return NULL in case pud not present,
								// same sematics as existing code.
	/* must have a valid entry and size to go further */
	BUG_ON(sz != PMD_SIZE);
	
	pmd = pmd_offset(pud, addr);
	/* must be pmd_huge or pmd_none */
	BUG_ON(!pmd_huge(*pmd) && !pmd_none(*pmd));
	return (pte_t *)pmd; // note that return valid pointer for pmd_none() case,
			// instead of NULL, that is same semantics as existing code.
...

>This API seems kind of strange to be honest.. Should it be two
>functions instead of a sz parameter?
>
>huge_pud_offset() and huge_pmd_offset() ? 

I think checking huge size then call to one of these two functions at caller
site will involve many redundant code  do branch work in one function is
better.

>
>Jason
Jason Gunthorpe April 24, 2020, 1:42 p.m. UTC | #7
On Fri, Apr 24, 2020 at 09:33:40PM +0800, Li Xinhai wrote:
> >
> >> , so when sz == PMD_SIZE, pmd_offset() only called with a valid PUD
> >> entry which point to PMD page table.
> >
> >But what prevents pud_huge?
> > 
> if sz == PUD_SIZE, the 'return (pte_t*)pud' alrady end the function, which cover
> pud_huge() and pud_none(), because we the mapping is for PUD_SIZE huge page.
> 
> So, there is no possibility for pmd_offset() been called with invalid pud entry.
> Below is the code I used for test which has BUG_ON, that should give more
> clear idea about the semantics of code path:
> 
> ...
> 	pud = pud_offset(p4d, addr);
> 	if (sz == PUD_SIZE) {
> 		/* must be pud_huge or pud_none */
> 		BUG_ON(!pud_huge(*pud) && !pud_none(*pud));
> 		return (pte_t *)pud; // note that return valid pointer for pud_none() case,
> 				// instead of NULL, that is same semantics as existing code.
> 	}
> 	if (!pud_present(*pud))
> 		return NULL; // note that only return NULL in case pud not present,
> 								// same sematics as existing code.
> 	/* must have a valid entry and size to go further */
> 	BUG_ON(sz != PMD_SIZE);
> 	
> 	pmd = pmd_offset(pud, addr);
> 	/* must be pmd_huge or pmd_none */
> 	BUG_ON(!pmd_huge(*pmd) && !pmd_none(*pmd));

But why is !pmd_huge() ? The prior code returned null here, is that
dead code? Your commit message should explain all of this..

Jason
Li Xinhai April 24, 2020, 2:07 p.m. UTC | #8
On 2020-04-24 at 21:42 Jason Gunthorpe wrote:
>On Fri, Apr 24, 2020 at 09:33:40PM +0800, Li Xinhai wrote:
>> >
>> >> , so when sz == PMD_SIZE, pmd_offset() only called with a valid PUD
>> >> entry which point to PMD page table.
>> >
>> >But what prevents pud_huge?
>> >
>> if sz == PUD_SIZE, the 'return (pte_t*)pud' alrady end the function, which cover
>> pud_huge() and pud_none(), because we the mapping is for PUD_SIZE huge page.
>>
>> So, there is no possibility for pmd_offset() been called with invalid pud entry.
>> Below is the code I used for test which has BUG_ON, that should give more
>> clear idea about the semantics of code path:
>>
>> ...
>> pud = pud_offset(p4d, addr);
>> if (sz == PUD_SIZE) {
>> /* must be pud_huge or pud_none */
>> BUG_ON(!pud_huge(*pud) && !pud_none(*pud));
>> return (pte_t *)pud; // note that return valid pointer for pud_none() case,
>> // instead of NULL, that is same semantics as existing code.
>> }
>> if (!pud_present(*pud))
>> return NULL; // note that only return NULL in case pud not present,
>> // same sematics as existing code.
>> /* must have a valid entry and size to go further */
>> BUG_ON(sz != PMD_SIZE);
>>
>> pmd = pmd_offset(pud, addr);
>> /* must be pmd_huge or pmd_none */
>> BUG_ON(!pmd_huge(*pmd) && !pmd_none(*pmd));
>
>But why is !pmd_huge() ? The prior code returned null here, is that
>dead code? Your commit message should explain all of this..
> 
let's see exising code for pmd part, the reason are in comments:
...
        pmd = pmd_offset(pud, addr);
        if (sz != PMD_SIZE && pmd_none(*pmd))
                return NULL; // dead code, must sz == PMD_SIZE
        /* hugepage or swap? */
        if (pmd_huge(*pmd) || !pmd_present(*pmd)) // !pmd_present() also cover pmd_none(),
                return (pte_t *)pmd; // so, all possible and valid value in pmd entry will reach here.

	return NULL; // dead code; can we have (!pmd_huge() && pmd_present()) and reach here? 
		// no, because this is a hugetlb mapping. otherwise, there is invalid value in pmd entry.
...

>Jason
Jason Gunthorpe April 24, 2020, 2:10 p.m. UTC | #9
On Fri, Apr 24, 2020 at 10:07:19PM +0800, Li Xinhai wrote:
> On 2020-04-24 at 21:42 Jason Gunthorpe wrote:
> >On Fri, Apr 24, 2020 at 09:33:40PM +0800, Li Xinhai wrote:
> >> >
> >> >> , so when sz == PMD_SIZE, pmd_offset() only called with a valid PUD
> >> >> entry which point to PMD page table.
> >> >
> >> >But what prevents pud_huge?
> >> >
> >> if sz == PUD_SIZE, the 'return (pte_t*)pud' alrady end the function, which cover
> >> pud_huge() and pud_none(), because we the mapping is for PUD_SIZE huge page.
> >>
> >> So, there is no possibility for pmd_offset() been called with invalid pud entry.
> >> Below is the code I used for test which has BUG_ON, that should give more
> >> clear idea about the semantics of code path:
> >>
> >> ...
> >> pud = pud_offset(p4d, addr);
> >> if (sz == PUD_SIZE) {
> >> /* must be pud_huge or pud_none */
> >> BUG_ON(!pud_huge(*pud) && !pud_none(*pud));
> >> return (pte_t *)pud; // note that return valid pointer for pud_none() case,
> >> // instead of NULL, that is same semantics as existing code.
> >> }
> >> if (!pud_present(*pud))
> >> return NULL; // note that only return NULL in case pud not present,
> >> // same sematics as existing code.
> >> /* must have a valid entry and size to go further */
> >> BUG_ON(sz != PMD_SIZE);
> >>
> >> pmd = pmd_offset(pud, addr);
> >> /* must be pmd_huge or pmd_none */
> >> BUG_ON(!pmd_huge(*pmd) && !pmd_none(*pmd));
> >
> >But why is !pmd_huge() ? The prior code returned null here, is that
> >dead code? Your commit message should explain all of this..
> > 
> let's see exising code for pmd part, the reason are in comments:
> ...
>         pmd = pmd_offset(pud, addr);
>         if (sz != PMD_SIZE && pmd_none(*pmd))
>                 return NULL; // dead code, must sz == PMD_SIZE
>         /* hugepage or swap? */
>         if (pmd_huge(*pmd) || !pmd_present(*pmd)) // !pmd_present() also cover pmd_none(),
>                 return (pte_t *)pmd; // so, all possible and valid value in pmd entry will reach here.
> 
> 	return NULL; // dead code; can we have (!pmd_huge() && pmd_present()) and reach here? 
> 		// no, because this is a hugetlb mapping. otherwise, there is invalid value in pmd entry.
> ...

well if you are relying on the caller to not call this in wrong cases
it would make sense to have a

if (WARN_ON(!pmd_huge(*pmd)))
    return NULL

To document the assertion

Jason
Li Xinhai April 24, 2020, 2:53 p.m. UTC | #10
On 2020-04-24 at 22:10 Jason Gunthorpe wrote:
>On Fri, Apr 24, 2020 at 10:07:19PM +0800, Li Xinhai wrote:
>> On 2020-04-24 at 21:42 Jason Gunthorpe wrote:
>> >On Fri, Apr 24, 2020 at 09:33:40PM +0800, Li Xinhai wrote:
>> >> >
>> >> >> , so when sz == PMD_SIZE, pmd_offset() only called with a valid PUD
>> >> >> entry which point to PMD page table.
>> >> >
>> >> >But what prevents pud_huge?
>> >> >
>> >> if sz == PUD_SIZE, the 'return (pte_t*)pud' alrady end the function, which cover
>> >> pud_huge() and pud_none(), because we the mapping is for PUD_SIZE huge page.
>> >>
>> >> So, there is no possibility for pmd_offset() been called with invalid pud entry.
>> >> Below is the code I used for test which has BUG_ON, that should give more
>> >> clear idea about the semantics of code path:
>> >>
>> >> ...
>> >> pud = pud_offset(p4d, addr);
>> >> if (sz == PUD_SIZE) {
>> >> /* must be pud_huge or pud_none */
>> >> BUG_ON(!pud_huge(*pud) && !pud_none(*pud));
>> >> return (pte_t *)pud; // note that return valid pointer for pud_none() case,
>> >> // instead of NULL, that is same semantics as existing code.
>> >> }
>> >> if (!pud_present(*pud))
>> >> return NULL; // note that only return NULL in case pud not present,
>> >> // same sematics as existing code.
>> >> /* must have a valid entry and size to go further */
>> >> BUG_ON(sz != PMD_SIZE);
>> >>
>> >> pmd = pmd_offset(pud, addr);
>> >> /* must be pmd_huge or pmd_none */
>> >> BUG_ON(!pmd_huge(*pmd) && !pmd_none(*pmd));
>> >
>> >But why is !pmd_huge() ? The prior code returned null here, is that
>> >dead code? Your commit message should explain all of this..
>> >
>> let's see exising code for pmd part, the reason are in comments:
>> ...
>>         pmd = pmd_offset(pud, addr);
>>         if (sz != PMD_SIZE && pmd_none(*pmd))
>>                 return NULL; // dead code, must sz == PMD_SIZE
>>         /* hugepage or swap? */
>>         if (pmd_huge(*pmd) || !pmd_present(*pmd)) // !pmd_present() also cover pmd_none(),
>>                 return (pte_t *)pmd; // so, all possible and valid value in pmd entry will reach here.
>>
>> return NULL; // dead code; can we have (!pmd_huge() && pmd_present()) and reach here? 
>> // no, because this is a hugetlb mapping. otherwise, there is invalid value in pmd entry.
>> ...
>
>well if you are relying on the caller to not call this in wrong cases
>it would make sense to have a
>
>if (WARN_ON(!pmd_huge(*pmd)))
>    return NULL
>
>To document the assertion
> 
right, if this WARN_ON occurs, which means huge_pte_offset() been called
for a normal 4K mapping, that is a bug of caller. After inspected current code
of callers, no one tries to call it on 4K mapping.

>Jason
diff mbox series

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bcabbe0..e1424f5 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5365,8 +5365,8 @@  pte_t *huge_pte_offset(struct mm_struct *mm,
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
-	pud_t *pud, pud_entry;
-	pmd_t *pmd, pmd_entry;
+	pud_t *pud;
+	pmd_t *pmd;
 
 	pgd = pgd_offset(mm, addr);
 	if (!pgd_present(*pgd))
@@ -5376,22 +5376,16 @@  pte_t *huge_pte_offset(struct mm_struct *mm,
 		return NULL;
 
 	pud = pud_offset(p4d, addr);
-	pud_entry = READ_ONCE(*pud);
-	if (sz != PUD_SIZE && pud_none(pud_entry))
-		return NULL;
-	/* hugepage or swap? */
-	if (pud_huge(pud_entry) || !pud_present(pud_entry))
+	if (sz == PUD_SIZE)
+		/* must be pud_huge or pud_none */
 		return (pte_t *)pud;
-
-	pmd = pmd_offset(pud, addr);
-	pmd_entry = READ_ONCE(*pmd);
-	if (sz != PMD_SIZE && pmd_none(pmd_entry))
+	if (!pud_present(*pud))
 		return NULL;
-	/* hugepage or swap? */
-	if (pmd_huge(pmd_entry) || !pmd_present(pmd_entry))
-		return (pte_t *)pmd;
+	/* must have a valid entry and size to go further */
 
-	return NULL;
+	pmd = pmd_offset(pud, addr);
+	/* must be pmd_huge or pmd_none */
+	return (pte_t *)pmd;
 }
 
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */