diff mbox series

[v4] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

Message ID 20230925200110.1979606-1-zokeefe@google.com (mailing list archive)
State New
Headers show
Series [v4] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" | expand

Commit Message

Zach O'Keefe Sept. 25, 2023, 8:01 p.m. UTC
The 6.0 commits:

commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")

merged "can we have THPs in this VMA?" logic that was previously done
separately by fault-path, khugepaged, and smaps "THPeligible" checks.

During the process, the semantics of the fault path check changed in two
ways:

1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
   handler that could satisfy the fault.  Previously, this check had been
   done in create_huge_pud() and create_huge_pmd() routines, but after
   the changes, we never reach those routines.

During the review of the above commits, it was determined that in-tree
users weren't affected by the change; most notably, since the only relevant
user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
explicitly approved early in approval logic. However, this was a bad
assumption to make as it assumes the only reason to support ->huge_fault
was for DAX (which is not true in general).

Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
any ->huge_fault handler a chance to handle the fault.  Note that we
don't validate the file mode or mapping alignment, which is consistent
with the behavior before the aforementioned commits.

Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: David Hildenbrand <david@redhat.com>
---
I've updated the changelog to reflect discussions in [1] -- leaving
ack to David / Matthew on whether to take the patch.

Changed from v3[1]:
	- [akpm / David H. / M. Wilcox] Updated log to capture email discussion
Changed from v2[2]:
	- Fixed false negative in smaps check when !dax && ->huge_fault
Changed from v1[3]:
	- [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists

[1] https://lore.kernel.org/linux-mm/20230821234844.699818-1-zokeefe@google.com/
[2] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@google.com/
[3] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/

---
 mm/huge_memory.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

Comments

Yang Shi Sept. 26, 2023, 9:39 p.m. UTC | #1
On Mon, Sep 25, 2023 at 1:01 PM Zach O'Keefe <zokeefe@google.com> wrote:
>
> The 6.0 commits:
>
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
>
> During the process, the semantics of the fault path check changed in two
> ways:
>
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>    handler that could satisfy the fault.  Previously, this check had been
>    done in create_huge_pud() and create_huge_pmd() routines, but after
>    the changes, we never reach those routines.
>
> During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic. However, this was a bad
> assumption to make as it assumes the only reason to support ->huge_fault
> was for DAX (which is not true in general).
>
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we
> don't validate the file mode or mapping alignment, which is consistent
> with the behavior before the aforementioned commits.

Looks good to me. Reviewed-by: Yang Shi <shy828301@gmail.com>

>
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> ---
> I've updated the changelog to reflect discussions in [1] -- leaving
> ack to David / Matthew on whether to take the patch.
>
> Changed from v3[1]:
>         - [akpm / David H. / M. Wilcox] Updated log to capture email discussion
> Changed from v2[2]:
>         - Fixed false negative in smaps check when !dax && ->huge_fault
> Changed from v1[3]:
>         - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists
>
> [1] https://lore.kernel.org/linux-mm/20230821234844.699818-1-zokeefe@google.com/
> [2] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@google.com/
> [3] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/
>
> ---
>  mm/huge_memory.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index 0f93a73115f73..797fe617e51ab 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -100,11 +100,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                 return in_pf;
>
>         /*
> -        * Special VMA and hugetlb VMA.
> +        * khugepaged special VMA and hugetlb VMA.
>          * Must be checked after dax since some dax mappings may have
>          * VM_MIXEDMAP set.
>          */
> -       if (vm_flags & VM_NO_KHUGEPAGED)
> +       if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>                 return false;
>
>         /*
> @@ -132,12 +132,18 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>                                            !hugepage_flags_always())))
>                 return false;
>
> -       /* Only regular file is valid */
> -       if (!in_pf && file_thp_enabled(vma))
> -               return true;
> -
> -       if (!vma_is_anonymous(vma))
> +       if (!vma_is_anonymous(vma)) {
> +               /*
> +                * Trust that ->huge_fault() handlers know what they are doing
> +                * in fault path.
> +                */
> +               if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> +                       return true;
> +               /* Only regular file is valid in collapse path */
> +               if (((!in_pf || smaps)) && file_thp_enabled(vma))
> +                       return true;
>                 return false;
> +       }
>
>         if (vma_is_temporary_stack(vma))
>                 return false;
> --
> 2.42.0.515.g380fc7ccd1-goog
>
Andrew Morton Oct. 6, 2023, 5:50 p.m. UTC | #2
On Mon, 25 Sep 2023 13:01:10 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:

> The 6.0 commits:
> 
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> 
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> 
> During the process, the semantics of the fault path check changed in two
> ways:
> 
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>    handler that could satisfy the fault.  Previously, this check had been
>    done in create_huge_pud() and create_huge_pmd() routines, but after
>    the changes, we never reach those routines.
> 
> During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic. However, this was a bad
> assumption to make as it assumes the only reason to support ->huge_fault
> was for DAX (which is not true in general).
> 
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we
> don't validate the file mode or mapping alignment, which is consistent
> with the behavior before the aforementioned commits.

It's unclear what are the userspace visible impacts of this change. 
Which makes it hard for others to determine whether -stable kernels
should be patched.

> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>

It's nice to include a Closes: link after a Reported-by:.  Then readers
are better able to answer the above question.

> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
Andrew Morton Oct. 6, 2023, 5:58 p.m. UTC | #3
On Mon, 25 Sep 2023 13:01:10 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:

> The 6.0 commits:
> 
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> 
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> 
> During the process, the semantics of the fault path check changed in two
> ways:
> 
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>    handler that could satisfy the fault.  Previously, this check had been
>    done in create_huge_pud() and create_huge_pmd() routines, but after
>    the changes, we never reach those routines.
> 
> During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic. However, this was a bad
> assumption to make as it assumes the only reason to support ->huge_fault
> was for DAX (which is not true in general).
> 
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we
> don't validate the file mode or mapping alignment, which is consistent
> with the behavior before the aforementioned commits.
>
> ...
>
> @@ -100,11 +100,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>  		return in_pf;
>  

Ryan's "mm: thp: introduce anon_orders and anon_always_mask sysfs
files" changes hugepage_vma_check() to return an unsigned int, so this
patch will need some rework to fit in after that.

However Ryan's overall series "variable-order, large folios for
anonymous memory" is in early days and might not make it.

And as I don't know what is the urgency of this patch ("mm/thp: fix
"mm: thp: kill __transhuge_page_enabled()"), I'm unable to decide which
patch needs to come first (thus requiring rework of the other patch).

Please discuss!
David Hildenbrand Oct. 6, 2023, 6:52 p.m. UTC | #4
On 06.10.23 19:58, Andrew Morton wrote:
> On Mon, 25 Sep 2023 13:01:10 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:
> 
>> The 6.0 commits:
>>
>> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
>> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>>
>> merged "can we have THPs in this VMA?" logic that was previously done
>> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
>>
>> During the process, the semantics of the fault path check changed in two
>> ways:
>>
>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
>> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>>     handler that could satisfy the fault.  Previously, this check had been
>>     done in create_huge_pud() and create_huge_pmd() routines, but after
>>     the changes, we never reach those routines.
>>
>> During the review of the above commits, it was determined that in-tree
>> users weren't affected by the change; most notably, since the only relevant
>> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
>> explicitly approved early in approval logic. However, this was a bad
>> assumption to make as it assumes the only reason to support ->huge_fault
>> was for DAX (which is not true in general).
>>
>> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
>> any ->huge_fault handler a chance to handle the fault.  Note that we
>> don't validate the file mode or mapping alignment, which is consistent
>> with the behavior before the aforementioned commits.
>>
>> ...
>>
>> @@ -100,11 +100,11 @@ bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
>>   		return in_pf;
>>   
> 
> Ryan's "mm: thp: introduce anon_orders and anon_always_mask sysfs
> files" changes hugepage_vma_check() to return an unsigned int, so this
> patch will need some rework to fit in after that.
> 
> However Ryan's overall series "variable-order, large folios for
> anonymous memory" is in early days and might not make it.
> 
> And as I don't know what is the urgency of this patch ("mm/thp: fix
> "mm: thp: kill __transhuge_page_enabled()"), I'm unable to decide which
> patch needs to come first (thus requiring rework of the other patch).
> 
> Please discuss!

IMHO clearly this one.
David Hildenbrand Oct. 6, 2023, 6:53 p.m. UTC | #5
On 25.09.23 22:01, Zach O'Keefe wrote:
> The 6.0 commits:
> 
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> 
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> 
> During the process, the semantics of the fault path check changed in two
> ways:
> 
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>     handler that could satisfy the fault.  Previously, this check had been
>     done in create_huge_pud() and create_huge_pmd() routines, but after
>     the changes, we never reach those routines.
> 
> During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic. However, this was a bad
> assumption to make as it assumes the only reason to support ->huge_fault
> was for DAX (which is not true in general).
> 
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we
> don't validate the file mode or mapping alignment, which is consistent
> with the behavior before the aforementioned commits.
> 
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> Cc: Matthew Wilcox <willy@infradead.org>
> Cc: David Hildenbrand <david@redhat.com>
> ---
> I've updated the changelog to reflect discussions in [1] -- leaving
> ack to David / Matthew on whether to take the patch.

Works for me.

Acked-by: David Hildenbrand <david@redhat.com>
Andrew Morton Oct. 6, 2023, 7:11 p.m. UTC | #6
On Fri, 6 Oct 2023 20:52:30 +0200 David Hildenbrand <david@redhat.com> wrote:

> > And as I don't know what is the urgency of this patch ("mm/thp: fix
> > "mm: thp: kill __transhuge_page_enabled()"), I'm unable to decide which
> > patch needs to come first (thus requiring rework of the other patch).
> > 
> > Please discuss!
> 
> IMHO clearly this one.

OK.  I'll drop the "variable-order, large folios for anonymous memory" v6
series for now.
Ryan Roberts Oct. 6, 2023, 9:28 p.m. UTC | #7
On 06/10/2023 20:11, Andrew Morton wrote:
> On Fri, 6 Oct 2023 20:52:30 +0200 David Hildenbrand <david@redhat.com> wrote:
> 
>>> And as I don't know what is the urgency of this patch ("mm/thp: fix
>>> "mm: thp: kill __transhuge_page_enabled()"), I'm unable to decide which
>>> patch needs to come first (thus requiring rework of the other patch).
>>>
>>> Please discuss!
>>
>> IMHO clearly this one.
> 
> OK.  I'll drop the "variable-order, large folios for anonymous memory" v6
> series for now.

Yep, agreed!
Zach O'Keefe Oct. 9, 2023, 1:22 p.m. UTC | #8
On Fri, Oct 6, 2023 at 10:50 AM Andrew Morton <akpm@linux-foundation.org> wrote:
>
> On Mon, 25 Sep 2023 13:01:10 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:
>
> > The 6.0 commits:
> >
> > commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> > commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> >
> > merged "can we have THPs in this VMA?" logic that was previously done
> > separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> >
> > During the process, the semantics of the fault path check changed in two
> > ways:
> >
> > 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> > 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
> >    handler that could satisfy the fault.  Previously, this check had been
> >    done in create_huge_pud() and create_huge_pmd() routines, but after
> >    the changes, we never reach those routines.
> >
> > During the review of the above commits, it was determined that in-tree
> > users weren't affected by the change; most notably, since the only relevant
> > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > explicitly approved early in approval logic. However, this was a bad
> > assumption to make as it assumes the only reason to support ->huge_fault
> > was for DAX (which is not true in general).
> >
> > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > any ->huge_fault handler a chance to handle the fault.  Note that we
> > don't validate the file mode or mapping alignment, which is consistent
> > with the behavior before the aforementioned commits.
>
> It's unclear what are the userspace visible impacts of this change.
> Which makes it hard for others to determine whether -stable kernels
> should be patched.

IMO, I don't think this change is suitable for -stable; the only users
that would have been affected are those that maintain out-of-tree
drivers / code that hooked into ->huge_fault() or used VM_MIXEDMAP +
THP. No users of the in-tree kernel would have been affected. It's
still a good "fix" to make going forward (and certainly happy to be
able to help Saurabh out).

+ greg k-h for vis / to confirm.

> > Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> > Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
>
> It's nice to include a Closes: link after a Reported-by:.  Then readers
> are better able to answer the above question.

Ah, apologies, Andrew; I didn't know such a tag existed -- I'll be
sure to include it in the future.

> > Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> > Cc: Yang Shi <shy828301@gmail.com>
> > Cc: Matthew Wilcox <willy@infradead.org>
> > Cc: David Hildenbrand <david@redhat.com>
>
Zach O'Keefe Oct. 9, 2023, 1:23 p.m. UTC | #9
On Fri, Oct 6, 2023 at 2:28 PM Ryan Roberts <ryan.roberts@arm.com> wrote:
>
> On 06/10/2023 20:11, Andrew Morton wrote:
> > On Fri, 6 Oct 2023 20:52:30 +0200 David Hildenbrand <david@redhat.com> wrote:
> >
> >>> And as I don't know what is the urgency of this patch ("mm/thp: fix
> >>> "mm: thp: kill __transhuge_page_enabled()"), I'm unable to decide which
> >>> patch needs to come first (thus requiring rework of the other patch).
> >>>
> >>> Please discuss!
> >>
> >> IMHO clearly this one.
> >
> > OK.  I'll drop the "variable-order, large folios for anonymous memory" v6
> > series for now.
>
> Yep, agreed!

Thank all and sorry for the late response ; have been buried as of late.

Best,
Zach
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 0f93a73115f73..797fe617e51ab 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -100,11 +100,11 @@  bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 		return in_pf;
 
 	/*
-	 * Special VMA and hugetlb VMA.
+	 * khugepaged special VMA and hugetlb VMA.
 	 * Must be checked after dax since some dax mappings may have
 	 * VM_MIXEDMAP set.
 	 */
-	if (vm_flags & VM_NO_KHUGEPAGED)
+	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
 		return false;
 
 	/*
@@ -132,12 +132,18 @@  bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 					   !hugepage_flags_always())))
 		return false;
 
-	/* Only regular file is valid */
-	if (!in_pf && file_thp_enabled(vma))
-		return true;
-
-	if (!vma_is_anonymous(vma))
+	if (!vma_is_anonymous(vma)) {
+		/*
+		 * Trust that ->huge_fault() handlers know what they are doing
+		 * in fault path.
+		 */
+		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
+			return true;
+		/* Only regular file is valid in collapse path */
+		if (((!in_pf || smaps)) && file_thp_enabled(vma))
+			return true;
 		return false;
+	}
 
 	if (vma_is_temporary_stack(vma))
 		return false;