
[v2] mm: pvmw: check possible huge PMD map by transhuge_vma_suitable()

Message ID: 20220513191705.457775-1-shy828301@gmail.com
State: New

Commit Message

Yang Shi May 13, 2022, 7:17 p.m. UTC
IIUC PVMW checks whether the VMA is possibly huge PMD mapped with
transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".

Actually pvmw->nr_pages is returned by compound_nr() or
folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
>= HPAGE_PMD_NR", and it is guaranteed that a THP is allocated for a
suitable VMA in the first place.  But the THP may not be PMD mapped if
the VMA is a file VMA that is not properly aligned.
transhuge_vma_suitable() does exactly this check, so replace
transparent_hugepage_active(), which is heavier than necessary, with it.

Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Yang Shi <shy828301@gmail.com>
---
v2: * Fixed build error for !CONFIG_TRANSPARENT_HUGEPAGE
    * Removed the Fixes: tag per Willy

 include/linux/huge_mm.h | 8 ++++++--
 mm/page_vma_mapped.c    | 2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)
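
The key fact the patch leans on can be restated concretely: a THP in a
file mapping is only PMD-mappable if the VMA's virtual start and file
offset agree modulo HPAGE_PMD_NR pages, and the PMD-aligned address
must fall entirely inside the VMA.  A stand-alone sketch of that check
(not kernel code; 4KB pages and a 2MB PMD, as on x86-64, are assumed):

#include <stdbool.h>
#include <stdio.h>

#define PAGE_SHIFT	12
#define HPAGE_PMD_SHIFT	21
#define HPAGE_PMD_SIZE	(1UL << HPAGE_PMD_SHIFT)
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))
#define HPAGE_PMD_NR	(1UL << (HPAGE_PMD_SHIFT - PAGE_SHIFT))

/* Illustrative stand-in for the few fields the real check reads. */
struct toy_vma {
	unsigned long vm_start, vm_end;	/* byte addresses */
	unsigned long vm_pgoff;		/* file offset, in pages */
	bool anonymous;
};

static bool toy_transhuge_vma_suitable(const struct toy_vma *vma,
				       unsigned long addr)
{
	unsigned long haddr = addr & HPAGE_PMD_MASK;

	/*
	 * File VMAs: virtual start and file offset must line up
	 * modulo HPAGE_PMD_NR pages, else no PMD can map the folio.
	 */
	if (!vma->anonymous &&
	    ((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff) % HPAGE_PMD_NR)
		return false;

	return haddr >= vma->vm_start && haddr + HPAGE_PMD_SIZE <= vma->vm_end;
}

int main(void)
{
	struct toy_vma aligned    = { 0x200000, 0x600000, 0, false };
	struct toy_vma misaligned = { 0x201000, 0x601000, 0, false };

	/* Prints 1 then 0: shifting the start by one page breaks it. */
	printf("%d\n", toy_transhuge_vma_suitable(&aligned, 0x300000));
	printf("%d\n", toy_transhuge_vma_suitable(&misaligned, 0x300000));
	return 0;
}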

Comments

Andrew Morton May 18, 2022, 12:05 a.m. UTC | #1
On Fri, 13 May 2022 12:17:05 -0700 Yang Shi <shy828301@gmail.com> wrote:

> IIUC PVMW checks whether the VMA is possibly huge PMD mapped with
> transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> 
> Actually pvmw->nr_pages is returned by compound_nr() or
> folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
> >= HPAGE_PMD_NR", and it is guaranteed that a THP is allocated for a
> suitable VMA in the first place.  But the THP may not be PMD mapped if
> the VMA is a file VMA that is not properly aligned.
> transhuge_vma_suitable() does exactly this check, so replace
> transparent_hugepage_active(), which is heavier than necessary, with it.

I messed with the changelog a bit.  The function is called
page_vma_mapped_walk(), so let's call it that.

This patch has been in the trees since May 12, which isn't terribly
long.  Does anyone feel up to a reviewed-by?

Thanks.

From: Yang Shi <shy828301@gmail.com>
Subject: mm/page_vma_mapped.c: check possible huge PMD map with transhuge_vma_suitable()
Date: Fri, 13 May 2022 12:17:05 -0700

IIUC page_vma_mapped_walk() checks whether the VMA is possibly huge PMD
mapped with transparent_hugepage_active() and "pvmw->nr_pages >=
HPAGE_PMD_NR".

Actually pvmw->nr_pages is returned by compound_nr() or
folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
>= HPAGE_PMD_NR", and it is guaranteed that a THP is allocated for a
suitable VMA in the first place.  But the THP may not be PMD mapped if
the VMA is a file VMA that is not properly aligned.
transhuge_vma_suitable() does exactly this check, so replace
transparent_hugepage_active(), which is heavier than necessary, with it.

Link: https://lkml.kernel.org/r/20220513191705.457775-1-shy828301@gmail.com
Signed-off-by: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/huge_mm.h |    8 ++++++--
 mm/page_vma_mapped.c    |    2 +-
 2 files changed, 7 insertions(+), 3 deletions(-)

--- a/include/linux/huge_mm.h~mm-pvmw-check-possible-huge-pmd-map-by-transhuge_vma_suitable
+++ a/include/linux/huge_mm.h
@@ -117,8 +117,10 @@ extern struct kobj_attribute shmem_enabl
 extern unsigned long transparent_hugepage_flags;
 
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
-		unsigned long haddr)
+		unsigned long addr)
 {
+	unsigned long haddr;
+
 	/* Don't have to check pgoff for anonymous vma */
 	if (!vma_is_anonymous(vma)) {
 		if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
@@ -126,6 +128,8 @@ static inline bool transhuge_vma_suitabl
 			return false;
 	}
 
+	haddr = addr & HPAGE_PMD_MASK;
+
 	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
 		return false;
 	return true;
@@ -342,7 +346,7 @@ static inline bool transparent_hugepage_
 }
 
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
-		unsigned long haddr)
+		unsigned long addr)
 {
 	return false;
 }
--- a/mm/page_vma_mapped.c~mm-pvmw-check-possible-huge-pmd-map-by-transhuge_vma_suitable
+++ a/mm/page_vma_mapped.c
@@ -243,7 +243,7 @@ restart:
 			 * cleared *pmd but not decremented compound_mapcount().
 			 */
 			if ((pvmw->flags & PVMW_SYNC) &&
-			    transparent_hugepage_active(vma) &&
+			    transhuge_vma_suitable(vma, pvmw->address) &&
 			    (pvmw->nr_pages >= HPAGE_PMD_NR)) {
 				spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
Muchun Song May 18, 2022, 5:31 a.m. UTC | #2
On Fri, May 13, 2022 at 12:17:05PM -0700, Yang Shi wrote:
> IIUC PVMW checks whether the VMA is possibly huge PMD mapped with
> transparent_hugepage_active() and "pvmw->nr_pages >= HPAGE_PMD_NR".
> 
> Actually pvmw->nr_pages is returned by compound_nr() or
> folio_nr_pages(), so the page must be a THP as long as "pvmw->nr_pages
> >= HPAGE_PMD_NR", and it is guaranteed that a THP is allocated for a
> suitable VMA in the first place.  But the THP may not be PMD mapped if
> the VMA is a file VMA that is not properly aligned.
> transhuge_vma_suitable() does exactly this check, so replace
> transparent_hugepage_active(), which is heavier than necessary, with it.
> 
> [...]
> 
> diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> index c10f839fc410..e971a467fcdf 100644
> --- a/mm/page_vma_mapped.c
> +++ b/mm/page_vma_mapped.c
> @@ -243,7 +243,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
>  			 * cleared *pmd but not decremented compound_mapcount().
>  			 */
>  			if ((pvmw->flags & PVMW_SYNC) &&
> -			    transparent_hugepage_active(vma) &&
> +			    transhuge_vma_suitable(vma, pvmw->address) &&

How about the following diff?  Then we do not need to change
transhuge_vma_suitable().  All the users of transhuge_vma_suitable()
already do the alignment by themselves.

Thanks.

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..0aed5ca60c67 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -243,7 +243,8 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
                         * cleared *pmd but not decremented compound_mapcount().
                         */
                        if ((pvmw->flags & PVMW_SYNC) &&
-                           transparent_hugepage_active(vma) &&
+                           IS_ENABLED(CONFIG_TRANSPARENT_HUGEPAGE) &&
+                           transhuge_vma_suitable(vma, pvmw->address & HPAGE_PMD_MASK) &&
                            (pvmw->nr_pages >= HPAGE_PMD_NR)) {
                                spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);
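
A side note on why the IS_ENABLED() guard is needed at all when a !THP
stub of transhuge_vma_suitable() already exists: if I read huge_mm.h
right, HPAGE_PMD_MASK and friends expand to BUILD_BUG() on
!CONFIG_TRANSPARENT_HUGEPAGE builds, so they may only appear behind a
condition the optimizer can prove false, whether that is IS_ENABLED()
here or the constant-false stub in the v2 patch.  A toy demonstration
of the mechanism (not kernel code; build with optimization, e.g.
"cc -O2 demo.c"):

#include <stdio.h>

#define TOY_IS_ENABLED_THP 0	/* stand-in for IS_ENABLED(CONFIG_...) */

int never_defined(void);	/* deliberately has no definition anywhere */

int main(void)
{
	/*
	 * The optimizer folds the constant-false guard and deletes the
	 * dead arm, so the call to never_defined() vanishes before the
	 * linker could complain about it.
	 */
	if (TOY_IS_ENABLED_THP && never_defined())
		puts("huge PMD path");
	else
		puts("fallback path");
	return 0;
}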

Yang Shi May 18, 2022, 6:45 p.m. UTC | #3
On Tue, May 17, 2022 at 10:31 PM Muchun Song <songmuchun@bytedance.com> wrote:
>
> On Fri, May 13, 2022 at 12:17:05PM -0700, Yang Shi wrote:
> > [...]
> >
> > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > index c10f839fc410..e971a467fcdf 100644
> > --- a/mm/page_vma_mapped.c
> > +++ b/mm/page_vma_mapped.c
> > @@ -243,7 +243,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> >                        * cleared *pmd but not decremented compound_mapcount().
> >                        */
> >                       if ((pvmw->flags & PVMW_SYNC) &&
> > -                         transparent_hugepage_active(vma) &&
> > +                         transhuge_vma_suitable(vma, pvmw->address) &&
>
> How about the following diff?  Then we do not need to change
> transhuge_vma_suitable().  All the users of transhuge_vma_suitable()
> already do the alignment by themselves.

Thanks for the suggestion. But TBH I don't think this is a better way.
I did think about this before proposing v2, but I'd rather not pollute
the code with IS_ENABLED(CONFIG_xxx) since the definition of
transhuge_vma_suitable() is already protected by #ifdef.  Rounding the
address inside transhuge_vma_suitable() seems neater and more readable
to me.

Some callers of transhuge_vma_suitable() do round the address before
calling it, but those callers also use the rounded address for other
purposes.
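
To make the hoisted rounding concrete, a minimal sketch (not kernel
code; 2MB PMD size assumed, as on x86-64):

#include <stdio.h>

#define HPAGE_PMD_SIZE	(1UL << 21)		/* 2MB */
#define HPAGE_PMD_MASK	(~(HPAGE_PMD_SIZE - 1))

int main(void)
{
	unsigned long addr = 0x7f12345a3000UL;	/* arbitrary page address */
	unsigned long haddr = addr & HPAGE_PMD_MASK;

	/* Prints haddr=0x7f1234400000, the enclosing 2MB boundary. */
	printf("addr=%#lx haddr=%#lx\n", addr, haddr);
	return 0;
}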

Muchun Song May 19, 2022, 7:38 a.m. UTC | #4
On Wed, May 18, 2022 at 11:45:14AM -0700, Yang Shi wrote:
> On Tue, May 17, 2022 at 10:31 PM Muchun Song <songmuchun@bytedance.com> wrote:
> >
> > On Fri, May 13, 2022 at 12:17:05PM -0700, Yang Shi wrote:
> > > [...]
> > >
> > > diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
> > > index c10f839fc410..e971a467fcdf 100644
> > > --- a/mm/page_vma_mapped.c
> > > +++ b/mm/page_vma_mapped.c
> > > @@ -243,7 +243,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
> > >                        * cleared *pmd but not decremented compound_mapcount().
> > >                        */
> > >                       if ((pvmw->flags & PVMW_SYNC) &&
> > > -                         transparent_hugepage_active(vma) &&
> > > +                         transhuge_vma_suitable(vma, pvmw->address) &&
> >
> > How about the following diff?  Then we do not need to change
> > transhuge_vma_suitable().  All the users of transhuge_vma_suitable()
> > already do the alignment by themselves.
> 
> Thanks for the suggestion. But TBH I don't think this is a better way.
> I did think about this before proposing v2, but I'd rather not pollute
> the code with IS_ENABLED(CONFIG_xxx) since the definition of
> transhuge_vma_suitable() is already protected by #ifdef.  Rounding the
> address inside transhuge_vma_suitable() seems neater and more readable
> to me.
> 
> Some callers of transhuge_vma_suitable() do round the address before
> calling it, but those callers also use the rounded address for other
> purposes.
>

All right.

Reviewed-by: Muchun Song <songmuchun@bytedance.com>

Thanks.

Patch

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fbf36bb1be22..c2826b1f4069 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -117,8 +117,10 @@  extern struct kobj_attribute shmem_enabled_attr;
 extern unsigned long transparent_hugepage_flags;
 
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
-		unsigned long haddr)
+		unsigned long addr)
 {
+	unsigned long haddr;
+
 	/* Don't have to check pgoff for anonymous vma */
 	if (!vma_is_anonymous(vma)) {
 		if (!IS_ALIGNED((vma->vm_start >> PAGE_SHIFT) - vma->vm_pgoff,
@@ -126,6 +128,8 @@  static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
 			return false;
 	}
 
+	haddr = addr & HPAGE_PMD_MASK;
+
 	if (haddr < vma->vm_start || haddr + HPAGE_PMD_SIZE > vma->vm_end)
 		return false;
 	return true;
@@ -328,7 +332,7 @@  static inline bool transparent_hugepage_active(struct vm_area_struct *vma)
 }
 
 static inline bool transhuge_vma_suitable(struct vm_area_struct *vma,
-		unsigned long haddr)
+		unsigned long addr)
 {
 	return false;
 }
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index c10f839fc410..e971a467fcdf 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -243,7 +243,7 @@  bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			 * cleared *pmd but not decremented compound_mapcount().
 			 */
 			if ((pvmw->flags & PVMW_SYNC) &&
-			    transparent_hugepage_active(vma) &&
+			    transhuge_vma_suitable(vma, pvmw->address) &&
 			    (pvmw->nr_pages >= HPAGE_PMD_NR)) {
 				spinlock_t *ptl = pmd_lock(mm, pvmw->pmd);