[v2,1/2] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

Message ID 20230818211533.2523697-1-zokeefe@google.com (mailing list archive)
State New

Commit Message

Zach O'Keefe Aug. 18, 2023, 9:15 p.m. UTC
The 6.0 commits:

commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")

merged "can we have THPs in this VMA?" logic that was previously done
separately by fault-path, khugepaged, and smaps "THPeligible" checks.

During the process, the semantics of the fault path check changed in two
ways:

1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
   handler that could satisfy the fault.  Previously, this check had been
   done in create_huge_pud() and create_huge_pmd() routines, but after
   the changes, we never reach those routines.

During the review of the above commits, it was determined that in-tree
users weren't affected by the change; most notably, since the only relevant
user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
explicitly approved early in approval logic.  However, there is at least
one occurrence where an out-of-tree driver that used
VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
any ->huge_fault handler a chance to handle the fault.  Note that we
don't validate the file mode or mapping alignment, which is consistent
with the behavior before the aforementioned commits.
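
As a rough illustration of the semantic change for the reported case, here is
a userspace model reduced to the two checks this patch touches. It assumes a
non-anonymous, non-DAX VMA and ignores every other check in
hugepage_vma_check() (sysfs settings, alignment, and so on). The names
`struct vma_model`, `check_before()` and `check_after()` are hypothetical and
do not exist in the kernel; the booleans stand in for the real flag and
pointer tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a non-anonymous, non-DAX VMA. */
struct vma_model {
	bool no_khugepaged;     /* models vm_flags & VM_NO_KHUGEPAGED */
	bool has_huge_fault;    /* models vma->vm_ops->huge_fault != NULL */
	bool file_thp_enabled;  /* models file_thp_enabled(vma) */
};

/* Behaviour before this patch, reduced to the two checks it touches. */
bool check_before(const struct vma_model *vma, bool in_pf)
{
	assert(vma != NULL);
	if (vma->no_khugepaged)
		return false;
	if (!in_pf && vma->file_thp_enabled)
		return true;
	return false;	/* non-anonymous memory is rejected */
}

/* Behaviour with this patch applied, same reduction. */
bool check_after(const struct vma_model *vma, bool in_pf, bool smaps)
{
	assert(vma != NULL);
	if (!in_pf && !smaps && vma->no_khugepaged)
		return false;
	return in_pf ? vma->has_huge_fault : vma->file_thp_enabled;
}
```

In this model, a VM_MIXEDMAP mapping with a ->huge_fault handler is rejected
by check_before() in the fault path and accepted by check_after(), while the
collapse path still rejects it.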

Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Yang Shi <shy828301@gmail.com>
---
Changed from v1[1]:
	- [Saurabh] Allow ->huge_fault handler to handle fault, if it exists

[1] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/

---
 mm/huge_memory.c | 17 ++++++++++-------
 1 file changed, 10 insertions(+), 7 deletions(-)

Comments

Yang Shi Aug. 21, 2023, 10:53 p.m. UTC | #1
On Fri, Aug 18, 2023 at 2:15 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> The 6.0 commits:
>
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
>
> During the process, the semantics of the fault path check changed in two
> ways:
>
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>    handler that could satisfy the fault.  Previously, this check had been
>    done in create_huge_pud() and create_huge_pmd() routines, but after
>    the changes, we never reach those routines.
>
> During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic.  However, there is at least
> one occurrence where an out-of-tree driver that used
> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
>
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we
> don't validate the file mode or mapping alignment, which is consistent
> with the behavior before the aforementioned commits.
>
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> ---
> Changed from v1[1]:
>         - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists
>
> [1] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/

Thanks, Zach. The patch looks correct to me. You can add
Reviewed-by: Yang Shi <shy828301@gmail.com>.

A further comment below...

>
> ---
>  mm/huge_memory.c | 17 ++++++++++-------
>  1 file changed, 10 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index eb3678360b97..cd379b2c077b 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                 return in_pf;
>
>         /*
> -        * Special VMA and hugetlb VMA.
> +        * khugepaged special VMA and hugetlb VMA.
>          * Must be checked after dax since some dax mappings may have
>          * VM_MIXEDMAP set.
>          */
> -       if (vm_flags & VM_NO_KHUGEPAGED)
> +       if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))

I'm wondering whether we should remove VM_MIXEDMAP from
VM_NO_KHUGEPAGED if that kind of VMA can use huge pages in some use
cases. The downside may be some CPU time wasted on VM_MIXEDMAP areas
that have PFNs instead of struct pages, but that should be OK.
Did I miss anything else? I'm just back from a long vacation, so my
brain is not running at full speed yet.
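
For reference, a minimal userspace sketch of how the mask is composed. In
include/linux/mm.h, VM_NO_KHUGEPAGED is (VM_SPECIAL | VM_HUGETLB) and
VM_SPECIAL includes VM_MIXEDMAP. The bit values below are made up for
illustration and are not the kernel's, and VM_NO_KHUGEPAGED_RELAXED and
`khugepaged_may_scan()` are hypothetical names used only to show the idea.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative bit values only; the kernel's real values differ. */
#define VM_IO          0x01UL
#define VM_DONTEXPAND  0x02UL
#define VM_PFNMAP      0x04UL
#define VM_MIXEDMAP    0x08UL
#define VM_HUGETLB     0x10UL

/* Same composition as in include/linux/mm.h. */
#define VM_SPECIAL        (VM_IO | VM_DONTEXPAND | VM_PFNMAP | VM_MIXEDMAP)
#define VM_NO_KHUGEPAGED  (VM_SPECIAL | VM_HUGETLB)

/* Hypothetical variant with VM_MIXEDMAP dropped from the mask. */
#define VM_NO_KHUGEPAGED_RELAXED  (VM_NO_KHUGEPAGED & ~VM_MIXEDMAP)

/*
 * Hypothetical helper: true if khugepaged would consider a VMA with
 * these flags under the given exclusion mask.
 */
bool khugepaged_may_scan(unsigned long vm_flags, unsigned long exclude_mask)
{
	assert(exclude_mask != 0);
	return !(vm_flags & exclude_mask);
}
```

With the relaxed mask, a VMA that only has VM_MIXEDMAP set would be scanned,
while VM_PFNMAP and VM_HUGETLB VMAs would still be skipped.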

>                 return false;
>
>         /*
> @@ -128,12 +128,15 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                                            !hugepage_flags_always())))
>                 return false;
>
> -       /* Only regular file is valid */
> -       if (!in_pf && file_thp_enabled(vma))
> -               return true;
> -
>         if (!vma_is_anonymous(vma))
> -               return false;
> +               return in_pf ?
> +                       /*
> +                        * Trust that ->huge_fault() handlers know
> +                        * what they are doing in fault path.
> +                        */
> +                       !!vma->vm_ops->huge_fault :
> +                       /* Only regular file is valid in collapse path */
> +                       file_thp_enabled(vma);
>
>         if (vma_is_temporary_stack(vma))
>                 return false;
> --
> 2.42.0.rc1.204.g551eb34607-goog
>
Zach O'Keefe Aug. 21, 2023, 11:34 p.m. UTC | #2
On Mon, Aug 21, 2023 at 3:53 PM Yang Shi <shy828301@gmail.com> wrote:
>
> On Fri, Aug 18, 2023 at 2:15 PM Zach O'Keefe <zokeefe@google.com> wrote:
> >
> > The 6.0 commits:
> >
> > commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> > commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> >
> > merged "can we have THPs in this VMA?" logic that was previously done
> > separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> >
> > During the process, the semantics of the fault path check changed in two
> > ways:
> >
> > 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> > 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
> >    handler that could satisfy the fault.  Previously, this check had been
> >    done in create_huge_pud() and create_huge_pmd() routines, but after
> >    the changes, we never reach those routines.
> >
> > During the review of the above commits, it was determined that in-tree
> > users weren't affected by the change; most notably, since the only relevant
> > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > explicitly approved early in approval logic.  However, there is at least
> > one occurrence where an out-of-tree driver that used
> > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> >
> > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > any ->huge_fault handler a chance to handle the fault.  Note that we
> > don't validate the file mode or mapping alignment, which is consistent
> > with the behavior before the aforementioned commits.
> >
> > Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> > Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > Cc: Yang Shi <shy828301@gmail.com>
> > ---
> > Changed from v1[1]:
> >         - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists
> >
> > [1] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/
>
> Thanks, Zach. The patch looks correct to me. You can add
> Reviewed-by: Yang Shi <shy828301@gmail.com>.


Hey Yang, thanks for taking the time to review .. and ..  welcome back :)

Sorry to do this to you, but while responding to you on another thread
I realized an issue below:

> A further comment below...
>
> >
> > ---
> >  mm/huge_memory.c | 17 ++++++++++-------
> >  1 file changed, 10 insertions(+), 7 deletions(-)
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index eb3678360b97..cd379b2c077b 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> >                 return in_pf;
> >
> >         /*
> > -        * Special VMA and hugetlb VMA.
> > +        * khugepaged special VMA and hugetlb VMA.
> >          * Must be checked after dax since some dax mappings may have
> >          * VM_MIXEDMAP set.
> >          */
> > -       if (vm_flags & VM_NO_KHUGEPAGED)
> > +       if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>
> I'm wondering whether we should remove VM_MIXEDMAP from
> VM_NO_KHUGEPAGED if that kind of VMA can use huge pages in some use
> cases. The downside may be some CPU time wasted on VM_MIXEDMAP areas
> that have PFNs instead of struct pages, but that should be OK.
> Did I miss anything else? I'm just back from a long vacation, so my
> brain is not running at full speed yet.

I was thinking about the same thing, and had originally intended to
raise that question here -- but thought it better to stick with the
immediate issue. Ironically, we've gone off on both a THPeligible
tangent and another about faulting file-backed memory.

But ya, AFAIU, there is no technical reason why collapse can't act on
VM_MIXEDMAP, as long as all the pages it finds are vm_normal() pages.
I don't know enough about the possible use cases here though, and
whether this is the best memory to be allocating precious hugepages
to. You also raise a good point about cpu usage, since there may be a
greater chance of failing late in scan due to finding a PFN-only mapping.
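
To illustrate the CPU-usage concern, here is roughly how I picture it, as a
userspace toy rather than the real khugepaged scan. `struct pte_model` and
`scan_range()` are hypothetical, and the single boolean stands in for
whether a normal struct page can be found for an entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one PTE: is the PFN backed by a struct page? */
struct pte_model {
	bool has_struct_page;
};

/*
 * Hypothetical scan loop: walks n entries and stops at the first
 * PFN-only one. Returns true if every entry is backed by a struct
 * page; *scanned is set to the number of entries examined either way.
 */
bool scan_range(const struct pte_model *ptes, size_t n, size_t *scanned)
{
	size_t i;

	assert(ptes != NULL && scanned != NULL);
	for (i = 0; i < n; i++) {
		if (!ptes[i].has_struct_page) {
			*scanned = i + 1;
			return false;
		}
	}
	*scanned = n;
	return true;
}
```

If the only PFN-only entry is the last one in the range, every entry is
examined before the scan fails, which is the wasted work in question.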

> >                 return false;
> >
> >         /*
> > @@ -128,12 +128,15 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
> >                                            !hugepage_flags_always())))
> >                 return false;
> >
> > -       /* Only regular file is valid */
> > -       if (!in_pf && file_thp_enabled(vma))
> > -               return true;
> > -
> >         if (!vma_is_anonymous(vma))
> > -               return false;
> > +               return in_pf ?
> > +                       /*
> > +                        * Trust that ->huge_fault() handlers know
> > +                        * what they are doing in fault path.
> > +                        */
> > +                       !!vma->vm_ops->huge_fault :
> > +                       /* Only regular file is valid in collapse path */
> > +                       file_thp_enabled(vma);

This works for fault and collapse paths, but what about smaps? I think
we should be doing both checks, and returning "true" if either is
true. This also raises the question of how hugepage_vma_check() is set
up, and how we've been using "in_pf" and "smaps". Today, these mean,
"am I in fault path?" and "am I in smaps path?", whereas I think they
ought to be, "should I check fault path, else check collapse path",
and "am I in smaps path?". smaps path should then use
hugepage_vma_check(in_pf) || hugepage_vma_check(!in_pf). It all depends
on how pedantic we want to be about THPeligible, but I've found a few
corner cases where the distinction matters.
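
To make the suggestion concrete, a rough userspace model of the non-anonymous
branch only. `struct vma_model`, `nonanon_check()` and `smaps_eligible()` are
made-up names, and the two booleans stand in for vma->vm_ops->huge_fault and
file_thp_enabled(vma); this is a sketch of the idea, not proposed kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the parts of a non-anonymous VMA that matter here. */
struct vma_model {
	bool has_huge_fault;    /* models vma->vm_ops->huge_fault != NULL */
	bool file_thp_enabled;  /* models file_thp_enabled(vma) */
};

/*
 * Models the non-anonymous branch from the patch: check_fault selects
 * the fault-path test, otherwise the collapse-path test is used.
 */
bool nonanon_check(const struct vma_model *vma, bool check_fault)
{
	assert(vma != NULL);
	return check_fault ? vma->has_huge_fault : vma->file_thp_enabled;
}

/* smaps reports eligibility if either path could install a THP. */
bool smaps_eligible(const struct vma_model *vma)
{
	return nonanon_check(vma, true) || nonanon_check(vma, false);
}
```

Under this model, a driver mapping with only a ->huge_fault handler and a
regular file mapping with only file THP enabled would both show up as
eligible, and a VMA with neither would not.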

What I think I'll do is send off an embarrassing 3rd revision of this
simple patch -- removing Patch 2 that was previously included in v2 --
just so we have a shot of getting the fix for Saurabh into 6.6. We can
worry about any other refactorings / fixes later..

Thanks,
Zach


> >         if (vma_is_temporary_stack(vma))
> >                 return false;
> > --
> > 2.42.0.rc1.204.g551eb34607-goog
> >
Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..cd379b2c077b 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 		return in_pf;
 
 	/*
-	 * Special VMA and hugetlb VMA.
+	 * khugepaged special VMA and hugetlb VMA.
 	 * Must be checked after dax since some dax mappings may have
 	 * VM_MIXEDMAP set.
 	 */
-	if (vm_flags & VM_NO_KHUGEPAGED)
+	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
 		return false;
 
 	/*
@@ -128,12 +128,15 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 					   !hugepage_flags_always())))
 		return false;
 
-	/* Only regular file is valid */
-	if (!in_pf && file_thp_enabled(vma))
-		return true;
-
 	if (!vma_is_anonymous(vma))
-		return false;
+		return in_pf ?
+			/*
+			 * Trust that ->huge_fault() handlers know
+			 * what they are doing in fault path.
+			 */
+			!!vma->vm_ops->huge_fault :
+			/* Only regular file is valid in collapse path */
+			file_thp_enabled(vma);
 
 	if (vma_is_temporary_stack(vma))
 		return false;