diff mbox series

[v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()"

Message ID 20230821234844.699818-1-zokeefe@google.com (mailing list archive)
State New
Headers show
Series [v3] mm/thp: fix "mm: thp: kill __transhuge_page_enabled()" | expand

Commit Message

Zach O'Keefe Aug. 21, 2023, 11:48 p.m. UTC
The 6.0 commits:

commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")

merged "can we have THPs in this VMA?" logic that was previously done
separately by fault-path, khugepaged, and smaps "THPeligible" checks.

During the process, the semantics of the fault path check changed in two
ways:

1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
   handler that could satisfy the fault.  Previously, this check had been
   done in create_huge_pud() and create_huge_pmd() routines, but after
   the changes, we never reach those routines.

During the review of the above commits, it was determined that in-tree
users weren't affected by the change; most notably, since the only relevant
user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
explicitly approved early in approval logic.  However, there is at least
one occurrence where an out-of-tree driver that used
VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
any ->huge_fault handler a chance to handle the fault.  Note that we
don't validate the file mode or mapping alignment, which is consistent
with the behavior before the aforementioned commits.

Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
Cc: Yang Shi <shy828301@gmail.com>
---
Changed from v2[1]:
	- Fixed false negative in smaps check when !dax && ->huge_fault
Changed from v1[2]:
	- [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists

There are some logical holes in smaps' THPeligible checks here, but those
are best dealt with in follow-up patches.  For now, just make sure the
fault path is dealt with.

[1] https://lore.kernel.org/linux-mm/20230818211533.2523697-1-zokeefe@google.com/
[2] https://lore.kernel.org/linux-mm/CAAa6QmQw+F=o6htOn=6ADD6mwvMO=Ow_67f3ifBv3GpXx9Xg_g@mail.gmail.com/

---
 mm/huge_memory.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

Comments

Saurabh Singh Sengar Aug. 22, 2023, 5:20 a.m. UTC | #1
> -----Original Message-----
> From: Zach O'Keefe <zokeefe@google.com>
> Sent: Tuesday, August 22, 2023 5:19 AM
> To: linux-mm@kvack.org; Yang Shi <shy828301@gmail.com>
> Cc: linux-kernel@vger.kernel.org; Zach O'Keefe <zokeefe@google.com>;
> Saurabh Singh Sengar <ssengar@microsoft.com>
> Subject: [EXTERNAL] [PATCH v3] mm/thp: fix "mm: thp: kill
> __transhuge_page_enabled()"
>
> [You don't often get email from zokeefe@google.com. Learn why this is
> important at https://aka.ms/LearnAboutSenderIdentification ]
>
> The 6.0 commits:
>
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()") commit
> 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
>
> During the process, the semantics of the fault path check changed in two
> ways:
>
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps
> path).
> 2) We no longer checked if non-anonymous memory had a vm_ops-
> >huge_fault
>    handler that could satisfy the fault.  Previously, this check had been
>    done in create_huge_pud() and create_huge_pmd() routines, but after
>    the changes, we never reach those routines.
>
> During the review of the above commits, it was determined that in-tree users
> weren't affected by the change; most notably, since the only relevant user (in
> terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is explicitly
> approved early in approval logic.  However, there is at least one occurrence
> where an out-of-tree driver that used VM_HUGEPAGE|VM_MIXEDMAP with a
> vm_ops->huge_fault handler, was broken.
>
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we don't
> validate the file mode or mapping alignment, which is consistent with the
> behavior before the aforementioned commits.
>
> Fixes: 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> Reported-by: Saurabh Singh Sengar <ssengar@microsoft.com>
> Signed-off-by: Zach O'Keefe <zokeefe@google.com>
> Cc: Yang Shi <shy828301@gmail.com>
> ---
> Changed from v2[1]:
>         - Fixed false negative in smaps check when !dax && ->huge_fault
> Changed from v1[2]:
>         - [Saurabhi] Allow ->huge_fault handler to handle fault, if it exists
>
> There are some logical holes in smaps' THPeligible checks here, but those are
> best dealt with in follow-up patches.  For now, just make sure the fault path is
> dealt with.
>
> [1]
> https://lore.k/
> ernel.org%2Flinux-mm%2F20230818211533.2523697-1-
> zokeefe%40google.com%2F&data=05%7C01%7Cssengar%40microsoft.com%7
> Ce782558e7bce4f9d060608dba2a12b58%7C72f988bf86f141af91ab2d7cd011
> db47%7C1%7C0%7C638282585367952964%7CUnknown%7CTWFpbGZsb3d8e
> yJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D
> %7C3000%7C%7C%7C&sdata=t%2FwAGlOyKmKp%2FnDPGv9cl2j3h%2F3xVuV
> Y%2BQqeu3A4HHk%3D&reserved=0
> [2]
> https://lore.k/
> ernel.org%2Flinux-
> mm%2FCAAa6QmQw%2BF%3Do6htOn%3D6ADD6mwvMO%3DOw_67f3ifBv3
> GpXx9Xg_g%40mail.gmail.com%2F&data=05%7C01%7Cssengar%40microsoft.
> com%7Ce782558e7bce4f9d060608dba2a12b58%7C72f988bf86f141af91ab2d
> 7cd011db47%7C1%7C0%7C638282585367952964%7CUnknown%7CTWFpbGZ
> sb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6M
> n0%3D%7C3000%7C%7C%7C&sdata=lT6ZqrOBoVIcPbOH%2BHto5pPTmpC6pk
> QMu58gnKG7aLo%3D&reserved=0
>
> ---
>  mm/huge_memory.c | 20 +++++++++++++-------
>  1 file changed, 13 insertions(+), 7 deletions(-)
>
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c index
> eb3678360b97..901dcf8db8d2 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -96,11 +96,11 @@ bool hugepage_vma_check(struct vm_area_struct
> *vma, unsigned long vm_flags,
>                 return in_pf;
>
>         /*
> -        * Special VMA and hugetlb VMA.
> +        * khugepaged special VMA and hugetlb VMA.
>          * Must be checked after dax since some dax mappings may have
>          * VM_MIXEDMAP set.
>          */
> -       if (vm_flags & VM_NO_KHUGEPAGED)
> +       if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
>                 return false;
>
>         /*
> @@ -128,12 +128,18 @@ bool hugepage_vma_check(struct vm_area_struct
> *vma, unsigned long vm_flags,
>                                            !hugepage_flags_always())))
>                 return false;
>
> -       /* Only regular file is valid */
> -       if (!in_pf && file_thp_enabled(vma))
> -               return true;
> -
> -       if (!vma_is_anonymous(vma))
> +       if (!vma_is_anonymous(vma)) {
> +               /*
> +                * Trust that ->huge_fault() handlers know what they are doing
> +                * in fault path.
> +                */
> +               if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
> +                       return true;

Thanks for the patch. I have tested it, looks good to me.

- Saurabh

> +               /* Only regular file is valid in collapse path */
> +               if (((!in_pf || smaps)) && file_thp_enabled(vma))
> +                       return true;
>                 return false;
> +       }
>
>         if (vma_is_temporary_stack(vma))
>                 return false;
> --
> 2.42.0.rc1.204.g551eb34607-goog
David Hildenbrand Aug. 24, 2023, 7:39 a.m. UTC | #2
On 22.08.23 01:48, Zach O'Keefe wrote:
> The 6.0 commits:
> 
> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> 
> merged "can we have THPs in this VMA?" logic that was previously done
> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> 
> During the process, the semantics of the fault path check changed in two
> ways:
> 
> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>     handler that could satisfy the fault.  Previously, this check had been
>     done in create_huge_pud() and create_huge_pmd() routines, but after
>     the changes, we never reach those routines.
> 
> During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic.  However, there is at least
> one occurrence where an out-of-tree driver that used
> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

... so all we did is break an arbitrary out-of-tree driver? Sorry to 
say, but why should we care?

Is there any in-tree code affected and needs a "Fixes:" ?
Zach O'Keefe Aug. 24, 2023, 1:59 p.m. UTC | #3
On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 22.08.23 01:48, Zach O'Keefe wrote:
> > The 6.0 commits:
> >
> > commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> > commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> >
> > merged "can we have THPs in this VMA?" logic that was previously done
> > separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> >
> > During the process, the semantics of the fault path check changed in two
> > ways:
> >
> > 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> > 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
> >     handler that could satisfy the fault.  Previously, this check had been
> >     done in create_huge_pud() and create_huge_pmd() routines, but after
> >     the changes, we never reach those routines.
> >
> > During the review of the above commits, it was determined that in-tree
> > users weren't affected by the change; most notably, since the only relevant
> > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > explicitly approved early in approval logic.  However, there is at least
> > one occurrence where an out-of-tree driver that used
> > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
>
> ... so all we did is break an arbitrary out-of-tree driver? Sorry to
> say, but why should we care?
>
> Is there any in-tree code affected and needs a "Fixes:" ?

The in-tree code was taken care of during the rework .. but I didn't
know about the possibility of a driver hooking in here.

I don't know what the normal policy / stance here is, but I figured
the change was simple enough that it was worth helping out.

For both VM_MIXEDMAP and !DAX ->huge_fault, there is some argument to
be made that they are unnecessarily restrictive anyways.

> --
> Cheers,
>
> David / dhildenb
>
David Hildenbrand Aug. 24, 2023, 2:05 p.m. UTC | #4
On 24.08.23 15:59, Zach O'Keefe wrote:
> On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand <david@redhat.com> wrote:
>>
>> On 22.08.23 01:48, Zach O'Keefe wrote:
>>> The 6.0 commits:
>>>
>>> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
>>> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
>>>
>>> merged "can we have THPs in this VMA?" logic that was previously done
>>> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
>>>
>>> During the process, the semantics of the fault path check changed in two
>>> ways:
>>>
>>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
>>> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
>>>      handler that could satisfy the fault.  Previously, this check had been
>>>      done in create_huge_pud() and create_huge_pmd() routines, but after
>>>      the changes, we never reach those routines.
>>>
>>> During the review of the above commits, it was determined that in-tree
>>> users weren't affected by the change; most notably, since the only relevant
>>> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
>>> explicitly approved early in approval logic.  However, there is at least
>>> one occurrence where an out-of-tree driver that used
>>> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
>>
>> ... so all we did is break an arbitrary out-of-tree driver? Sorry to
>> say, but why should we care?
>>
>> Is there any in-tree code affected and needs a "Fixes:" ?
> 
> The in-tree code was taken care of during the rework .. but I didn't
> know about the possibility of a driver hooking in here.

And that's the problem of the driver, no? It's not the job of the kernel 
developers to be aware of each out-of-tree driver to not accidentally 
break something in there.

> 
> I don't know what the normal policy / stance here is, but I figured
> the change was simple enough that it was worth helping out.

If you decide to be out-of-tree, then you have be prepared to only 
support tested kernels and fix your driver when something changes 
upstream -- like upstream developers would do for you when it would be 
in-tree.

So why can't the out-of-tree driver be fixed, similarly to how we would 
have fixed it if it would be in-tree?
Zach O'Keefe Aug. 24, 2023, 2:47 p.m. UTC | #5
On Thu, Aug 24, 2023 at 7:05 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 24.08.23 15:59, Zach O'Keefe wrote:
> > On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand <david@redhat.com> wrote:
> >>
> >> On 22.08.23 01:48, Zach O'Keefe wrote:
> >>> The 6.0 commits:
> >>>
> >>> commit 9fec51689ff6 ("mm: thp: kill transparent_hugepage_active()")
> >>> commit 7da4e2cb8b1f ("mm: thp: kill __transhuge_page_enabled()")
> >>>
> >>> merged "can we have THPs in this VMA?" logic that was previously done
> >>> separately by fault-path, khugepaged, and smaps "THPeligible" checks.
> >>>
> >>> During the process, the semantics of the fault path check changed in two
> >>> ways:
> >>>
> >>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps path).
> >>> 2) We no longer checked if non-anonymous memory had a vm_ops->huge_fault
> >>>      handler that could satisfy the fault.  Previously, this check had been
> >>>      done in create_huge_pud() and create_huge_pmd() routines, but after
> >>>      the changes, we never reach those routines.
> >>>
> >>> During the review of the above commits, it was determined that in-tree
> >>> users weren't affected by the change; most notably, since the only relevant
> >>> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> >>> explicitly approved early in approval logic.  However, there is at least
> >>> one occurrence where an out-of-tree driver that used
> >>> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> >>
> >> ... so all we did is break an arbitrary out-of-tree driver? Sorry to
> >> say, but why should we care?
> >>
> >> Is there any in-tree code affected and needs a "Fixes:" ?
> >
> > The in-tree code was taken care of during the rework .. but I didn't
> > know about the possibility of a driver hooking in here.
>
> And that's the problem of the driver, no? It's not the job of the kernel
> developers to be aware of each out-of-tree driver to not accidentally
> break something in there.
>
> >
> > I don't know what the normal policy / stance here is, but I figured
> > the change was simple enough that it was worth helping out.
>
> If you decide to be out-of-tree, then you have be prepared to only
> support tested kernels and fix your driver when something changes
> upstream -- like upstream developers would do for you when it would be
> in-tree.
>
> So why can't the out-of-tree driver be fixed, similarly to how we would
> have fixed it if it would be in-tree?

I don't know much about driver development, but perhaps they are /
need to use a pristine upstream kernel, with their driver as a
loadable kernel module. Saurabh can comment on this, I don't know.

But your point is very valid otherwise.


> --
> Cheers,
>
> David / dhildenb
>
Saurabh Singh Sengar Aug. 24, 2023, 3:39 p.m. UTC | #6
> On Thu, Aug 24, 2023 at 7:05 AM David Hildenbrand <david@redhat.com>
> wrote:
> >
> > On 24.08.23 15:59, Zach O'Keefe wrote:
> > > On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand
> <david@redhat.com> wrote:
> > >>
> > >> On 22.08.23 01:48, Zach O'Keefe wrote:
> > >>> The 6.0 commits:
> > >>>
> > >>> commit 9fec51689ff6 ("mm: thp: kill
> > >>> transparent_hugepage_active()") commit 7da4e2cb8b1f ("mm: thp:
> > >>> kill __transhuge_page_enabled()")
> > >>>
> > >>> merged "can we have THPs in this VMA?" logic that was previously
> > >>> done separately by fault-path, khugepaged, and smaps "THPeligible"
> checks.
> > >>>
> > >>> During the process, the semantics of the fault path check changed
> > >>> in two
> > >>> ways:
> > >>>
> > >>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps
> path).
> > >>> 2) We no longer checked if non-anonymous memory had a vm_ops-
> >huge_fault
> > >>>      handler that could satisfy the fault.  Previously, this check had been
> > >>>      done in create_huge_pud() and create_huge_pmd() routines, but
> after
> > >>>      the changes, we never reach those routines.
> > >>>
> > >>> During the review of the above commits, it was determined that
> > >>> in-tree users weren't affected by the change; most notably, since
> > >>> the only relevant user (in terms of THP) of VM_MIXEDMAP or
> > >>> ->huge_fault is DAX, which is explicitly approved early in
> > >>> approval logic.  However, there is at least one occurrence where
> > >>> an out-of-tree driver that used VM_HUGEPAGE|VM_MIXEDMAP with a
> vm_ops->huge_fault handler, was broken.
> > >>
> > >> ... so all we did is break an arbitrary out-of-tree driver? Sorry
> > >> to say, but why should we care?
> > >>
> > >> Is there any in-tree code affected and needs a "Fixes:" ?
> > >
> > > The in-tree code was taken care of during the rework .. but I didn't
> > > know about the possibility of a driver hooking in here.
> >
> > And that's the problem of the driver, no? It's not the job of the
> > kernel developers to be aware of each out-of-tree driver to not
> > accidentally break something in there.
> >
> > >
> > > I don't know what the normal policy / stance here is, but I figured
> > > the change was simple enough that it was worth helping out.
> >
> > If you decide to be out-of-tree, then you have be prepared to only
> > support tested kernels and fix your driver when something changes
> > upstream -- like upstream developers would do for you when it would be
> > in-tree.
> >
> > So why can't the out-of-tree driver be fixed, similarly to how we
> > would have fixed it if it would be in-tree?
> 
> I don't know much about driver development, but perhaps they are / need to
> use a pristine upstream kernel, with their driver as a loadable kernel module.
> Saurabh can comment on this, I don't know.

You are correct Zach. The "out-of-tree" driver had been seamlessly operational
on version 5.19, leveraging the kernel's capability to handle huge faults for
non-anonymous vma. However, the transition to kernel version 6.1 inadvertently
led to the removal of this feature. It's important to note that this removal wasn't
intentional, and I am optimistic about the potential restoration of this feature.

Hello David,

Given the context, I am currently exploring potential ways to address the issue
with the "out-of-tree" driver. I have recognized a challenge posed by the kernel's
memory management framework, which now imposes restrictions on huge faults
for non-anonymous vma. My inclination to contribute this driver to the mainline
kernel remains strong. If there's a possibility of engaging in discussions or
collaborative efforts to align this driver with the present framework, I'm fully
committed to the process. Your suggestions would be greatly appreciated, as I
am eager to ensure the compatibility of the "out-of-tree" driver with the kernel's
evolving framework.

- Saurabh

> 
> But your point is very valid otherwise.
> 
> 
> > --
> > Cheers,
> >
> > David / dhildenb
> >
David Hildenbrand Aug. 25, 2023, 7:59 a.m. UTC | #7
On 24.08.23 17:39, Saurabh Singh Sengar wrote:
>> On Thu, Aug 24, 2023 at 7:05 AM David Hildenbrand <david@redhat.com>
>> wrote:
>>>
>>> On 24.08.23 15:59, Zach O'Keefe wrote:
>>>> On Thu, Aug 24, 2023 at 12:39 AM David Hildenbrand
>> <david@redhat.com> wrote:
>>>>>
>>>>> On 22.08.23 01:48, Zach O'Keefe wrote:
>>>>>> The 6.0 commits:
>>>>>>
>>>>>> commit 9fec51689ff6 ("mm: thp: kill
>>>>>> transparent_hugepage_active()") commit 7da4e2cb8b1f ("mm: thp:
>>>>>> kill __transhuge_page_enabled()")
>>>>>>
>>>>>> merged "can we have THPs in this VMA?" logic that was previously
>>>>>> done separately by fault-path, khugepaged, and smaps "THPeligible"
>> checks.
>>>>>>
>>>>>> During the process, the semantics of the fault path check changed
>>>>>> in two
>>>>>> ways:
>>>>>>
>>>>>> 1) A VM_NO_KHUGEPAGED check was introduced (also added to smaps
>> path).
>>>>>> 2) We no longer checked if non-anonymous memory had a vm_ops-
>>> huge_fault
>>>>>>       handler that could satisfy the fault.  Previously, this check had been
>>>>>>       done in create_huge_pud() and create_huge_pmd() routines, but
>> after
>>>>>>       the changes, we never reach those routines.
>>>>>>
>>>>>> During the review of the above commits, it was determined that
>>>>>> in-tree users weren't affected by the change; most notably, since
>>>>>> the only relevant user (in terms of THP) of VM_MIXEDMAP or
>>>>>> ->huge_fault is DAX, which is explicitly approved early in
>>>>>> approval logic.  However, there is at least one occurrence where
>>>>>> an out-of-tree driver that used VM_HUGEPAGE|VM_MIXEDMAP with a
>> vm_ops->huge_fault handler, was broken.
>>>>>
>>>>> ... so all we did is break an arbitrary out-of-tree driver? Sorry
>>>>> to say, but why should we care?
>>>>>
>>>>> Is there any in-tree code affected and needs a "Fixes:" ?
>>>>
>>>> The in-tree code was taken care of during the rework .. but I didn't
>>>> know about the possibility of a driver hooking in here.
>>>
>>> And that's the problem of the driver, no? It's not the job of the
>>> kernel developers to be aware of each out-of-tree driver to not
>>> accidentally break something in there.
>>>
>>>>
>>>> I don't know what the normal policy / stance here is, but I figured
>>>> the change was simple enough that it was worth helping out.
>>>
>>> If you decide to be out-of-tree, then you have be prepared to only
>>> support tested kernels and fix your driver when something changes
>>> upstream -- like upstream developers would do for you when it would be
>>> in-tree.
>>>
>>> So why can't the out-of-tree driver be fixed, similarly to how we
>>> would have fixed it if it would be in-tree?
>>
>> I don't know much about driver development, but perhaps they are / need to
>> use a pristine upstream kernel, with their driver as a loadable kernel module.
>> Saurabh can comment on this, I don't know.
> 

Hi!

> You are correct Zach. The "out-of-tree" driver had been seamlessly operational
> on version 5.19, leveraging the kernel's capability to handle huge faults for
> non-anonymous vma. However, the transition to kernel version 6.1 inadvertently
> led to the removal of this feature. It's important to note that this removal wasn't
> intentional, and I am optimistic about the potential restoration of this feature.

We are currently creating 6.5, and are being told that a patch in 6.0 
(released almost one year ago!) broke an out-of-tree driver.

Being that back-level, you cannot possibly expect that the upstream 
community can seriously care about not breaking your OOT driver in each 
release.

Especially, we do have bigger ->huge_fault changes coming up:

https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org

If the driver is not in the tree, people don't care.

You really should try upstreaming that driver.


So this patch here adds complexity (which I don't like) in order to keep 
an OOT driver working -- possibly for a short time. I'm tempted to say 
"please fix your driver to not use huge faults in that scenario, it is 
no longer supported".

But I'm just about to vanish for 1.5 week into vacation :)

@Willy, what are your thoughts?

In any case, I think we should drop the "Fixes" tag. This does not fix 
any kernel BUG -- it restores compatibility with an OOT driver -- and 
already confused people building distributions and asking about this fix ;)


> 
> Hello David,
> 
> Given the context, I am currently exploring potential ways to address the issue
> with the "out-of-tree" driver. I have recognized a challenge posed by the kernel's
> memory management framework, which now imposes restrictions on huge faults
> for non-anonymous vma.

You should try upstreaming your driver possibly without huge fault 
support, and then separately try re-adding huge fault support and see if 
kernel people want to support that or not.

And if your driver *really* requires huge faults, then supporting that 
would be part of your series when upstreaming that driver.
Matthew Wilcox Aug. 25, 2023, 12:49 p.m. UTC | #8
On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> Especially, we do have bigger ->huge_fault changes coming up:
> 
> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> 
> If the driver is not in the tree, people don't care.
> 
> You really should try upstreaming that driver.
> 
> 
> So this patch here adds complexity (which I don't like) in order to keep an
> OOT driver working -- possibly for a short time. I'm tempted to say "please
> fix your driver to not use huge faults in that scenario, it is no longer
> supported".
> 
> But I'm just about to vanish for 1.5 week into vacation :)
> 
> @Willy, what are your thoughts?

Fundamentally there was a bad assumption with the original patch --
it assumed that the only reason to support ->huge_fault was for DAX,
and that's not true.  It's just that the only drivers in-tree which
support ->huge_fault do so in order to support DAX.

Keeping a driver out of tree is always a risky and costly proposition.
It will continue to be broken by core kernel changes, particularly
if/when it does unusual things.

I think the complexity is entirely on us.  I think there's a simpler way
to handle the problem, but I'd start by turning all of this "admin and
app get to control when THP are used" nonsense into no-ops.
David Hildenbrand Aug. 25, 2023, 12:58 p.m. UTC | #9
On 25.08.23 14:49, Matthew Wilcox wrote:
> On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
>> Especially, we do have bigger ->huge_fault changes coming up:
>>
>> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
>>
>> If the driver is not in the tree, people don't care.
>>
>> You really should try upstreaming that driver.
>>
>>
>> So this patch here adds complexity (which I don't like) in order to keep an
>> OOT driver working -- possibly for a short time. I'm tempted to say "please
>> fix your driver to not use huge faults in that scenario, it is no longer
>> supported".
>>
>> But I'm just about to vanish for 1.5 week into vacation :)
>>
>> @Willy, what are your thoughts?
> 
> Fundamentally there was a bad assumption with the original patch --
> it assumed that the only reason to support ->huge_fault was for DAX,
> and that's not true.  It's just that the only drivers in-tree which
> support ->huge_fault do so in order to support DAX.

Okay, and we are willing to continue supporting that then and it's 
nothing we want to stop OOT drivers from doing.

Fine with me; we should probably reflect that in the patch description.

> 
> Keeping a driver out of tree is always a risky and costly proposition.
> It will continue to be broken by core kernel changes, particularly
> if/when it does unusual things.
> 

Yes.

> I think the complexity is entirely on us.  I think there's a simpler way
> to handle the problem, but I'd start by turning all of this "admin and
> app get to control when THP are used" nonsense into no-ops.

Well, simpler, yes, but also more controversial :)
Zach O'Keefe Aug. 25, 2023, 3:09 p.m. UTC | #10
On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
>
> On 25.08.23 14:49, Matthew Wilcox wrote:
> > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> >> Especially, we do have bigger ->huge_fault changes coming up:
> >>
> >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org

FWIW, one of those patches updates the docs to read,

"->huge_fault() is called when there is no PUD or PMD entry present.  This
gives the filesystem the opportunity to install a PUD or PMD sized page.
Filesystems can also use the ->fault method to return a PMD sized page,
so implementing this function may not be necessary.  In particular,
filesystems should not call filemap_fault() from ->huge_fault(). [..]"

Which won't work (in the general case) without this patch (well, at
least the ->huge_fault() check part).

So, if we're advertising this is the way it works, maybe that gives a
stronger argument for addressing it sooner vs when the first in-tree
user depends on it?

> >> If the driver is not in the tree, people don't care.
> >>
> >> You really should try upstreaming that driver.
> >>
> >>
> >> So this patch here adds complexity (which I don't like) in order to keep an
> >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> >> fix your driver to not use huge faults in that scenario, it is no longer
> >> supported".
> >>
> >> But I'm just about to vanish for 1.5 week into vacation :)
> >>
> >> @Willy, what are your thoughts?
> >
> > Fundamentally there was a bad assumption with the original patch --
> > it assumed that the only reason to support ->huge_fault was for DAX,
> > and that's not true.  It's just that the only drivers in-tree which
> > support ->huge_fault do so in order to support DAX.
>
> Okay, and we are willing to continue supporting that then and it's
> nothing we want to stop OOT drivers from doing.
>
> Fine with me; we should probably reflect that in the patch description.

I can change these paragraphs,

"During the review of the above commits, it was determined that in-tree
users weren't affected by the change; most notably, since the only relevant
user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
explicitly approved early in approval logic.  However, there is at least
one occurrence where an out-of-tree driver that used
VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.

Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
any ->huge_fault handler a chance to handle the fault.  Note that we
don't validate the file mode or mapping alignment, which is consistent
with the behavior before the aforementioned commits."

To read,

"The above commits, however, overfit the existing in-tree use cases,
and assume that
the only reason to support ->huge_fault was for DAX (which is
explicitly approved early in the approval logic).
This is a bad assumption to make and unnecessarily prevents general
support of ->huge_fault by filesystems. Allow returning "true" if such
a handler exists, giving the fault path an opportunity to exercise it.

Similarly, the rationale for including the VM_NO_KHUGEPAGED check
along the fault path was that it didn't alter any in-tree users, but
was likewise similarly unnecessarily restrictive (and reads odd).
Remove the check from the fault path."

> >
> > Keeping a driver out of tree is always a risky and costly proposition.
> > It will continue to be broken by core kernel changes, particularly
> > if/when it does unusual things.
> >
>
> Yes.
>
> > I think the complexity is entirely on us.  I think there's a simpler way
> > to handle the problem, but I'd start by turning all of this "admin and
> > app get to control when THP are used" nonsense into no-ops.
>
> Well, simpler, yes, but also more controversial :)
>
> --
> Cheers,
>
> David / dhildenb
>
Saurabh Singh Sengar Sept. 6, 2023, 6:58 a.m. UTC | #11
On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> >
> > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > >> Especially, we do have bigger ->huge_fault changes coming up:
> > >>
> > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> 
> FWIW, one of those patches updates the docs to read,
> 
> "->huge_fault() is called when there is no PUD or PMD entry present.  This
> gives the filesystem the opportunity to install a PUD or PMD sized page.
> Filesystems can also use the ->fault method to return a PMD sized page,
> so implementing this function may not be necessary.  In particular,
> filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> 
> Which won't work (in the general case) without this patch (well, at
> least the ->huge_fault() check part).
> 
> So, if we're advertising this is the way it works, maybe that gives a
> stronger argument for addressing it sooner vs when the first in-tree
> user depends on it?
> 
> > >> If the driver is not in the tree, people don't care.
> > >>
> > >> You really should try upstreaming that driver.
> > >>
> > >>
> > >> So this patch here adds complexity (which I don't like) in order to keep an
> > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > >> fix your driver to not use huge faults in that scenario, it is no longer
> > >> supported".
> > >>
> > >> But I'm just about to vanish for 1.5 week into vacation :)
> > >>
> > >> @Willy, what are your thoughts?
> > >
> > > Fundamentally there was a bad assumption with the original patch --
> > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > and that's not true.  It's just that the only drivers in-tree which
> > > support ->huge_fault do so in order to support DAX.
> >
> > Okay, and we are willing to continue supporting that then and it's
> > nothing we want to stop OOT drivers from doing.
> >
> > Fine with me; we should probably reflect that in the patch description.
> 
> I can change these paragraphs,
> 
> "During the review of the above commits, it was determined that in-tree
> users weren't affected by the change; most notably, since the only relevant
> user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> explicitly approved early in approval logic.  However, there is at least
> one occurrence where an out-of-tree driver that used
> VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> 
> Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> any ->huge_fault handler a chance to handle the fault.  Note that we
> don't validate the file mode or mapping alignment, which is consistent
> with the behavior before the aforementioned commits."
> 
> To read,
> 
> "The above commits, however, overfit the existing in-tree use cases,
> and assume that
> the only reason to support ->huge_fault was for DAX (which is
> explicitly approved early in the approval logic).
> This is a bad assumption to make and unnecessarily prevents general
> support of ->huge_fault by filesystems. Allow returning "true" if such
> a handler exists, giving the fault path an opportunity to exercise it.
> 
> Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> along the fault path was that it didn't alter any in-tree users, but
> was likewise similarly unnecessarily restrictive (and reads odd).
> Remove the check from the fault path."
>


Any chance this can make it to 6.6 kernel ?

- Saurabh
Saurabh Singh Sengar Sept. 20, 2023, 5:44 a.m. UTC | #12
On Tue, Sep 05, 2023 at 11:58:17PM -0700, Saurabh Singh Sengar wrote:
> On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> > On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> > >
> > > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > > >> Especially, we do have bigger ->huge_fault changes coming up:
> > > >>
> > > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> > 
> > FWIW, one of those patches updates the docs to read,
> > 
> > "->huge_fault() is called when there is no PUD or PMD entry present.  This
> > gives the filesystem the opportunity to install a PUD or PMD sized page.
> > Filesystems can also use the ->fault method to return a PMD sized page,
> > so implementing this function may not be necessary.  In particular,
> > filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> > 
> > Which won't work (in the general case) without this patch (well, at
> > least the ->huge_fault() check part).
> > 
> > So, if we're advertising this is the way it works, maybe that gives a
> > stronger argument for addressing it sooner vs when the first in-tree
> > user depends on it?
> > 
> > > >> If the driver is not in the tree, people don't care.
> > > >>
> > > >> You really should try upstreaming that driver.
> > > >>
> > > >>
> > > >> So this patch here adds complexity (which I don't like) in order to keep an
> > > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > > >> fix your driver to not use huge faults in that scenario, it is no longer
> > > >> supported".
> > > >>
> > > >> But I'm just about to vanish for 1.5 week into vacation :)
> > > >>
> > > >> @Willy, what are your thoughts?
> > > >
> > > > Fundamentally there was a bad assumption with the original patch --
> > > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > > and that's not true.  It's just that the only drivers in-tree which
> > > > support ->huge_fault do so in order to support DAX.
> > >
> > > Okay, and we are willing to continue supporting that then and it's
> > > nothing we want to stop OOT drivers from doing.
> > >
> > > Fine with me; we should probably reflect that in the patch description.
> > 
> > I can change these paragraphs,
> > 
> > "During the review of the above commits, it was determined that in-tree
> > users weren't affected by the change; most notably, since the only relevant
> > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > explicitly approved early in approval logic.  However, there is at least
> > one occurrence where an out-of-tree driver that used
> > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> > 
> > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > any ->huge_fault handler a chance to handle the fault.  Note that we
> > don't validate the file mode or mapping alignment, which is consistent
> > with the behavior before the aforementioned commits."
> > 
> > To read,
> > 
> > "The above commits, however, overfit the existing in-tree use cases,
> > and assume that
> > the only reason to support ->huge_fault was for DAX (which is
> > explicitly approved early in the approval logic).
> > This is a bad assumption to make and unnecessarily prevents general
> > support of ->huge_fault by filesystems. Allow returning "true" if such
> > a handler exists, giving the fault path an opportunity to exercise it.
> > 
> > Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> > along the fault path was that it didn't alter any in-tree users, but
> > was likewise similarly unnecessarily restrictive (and reads odd).
> > Remove the check from the fault path."
> >
> 
> 
> Any chance this can make it to 6.6 kernel ?

ping

> 
> - Saurabh
Yang Shi Sept. 22, 2023, 4:54 p.m. UTC | #13
On Tue, Sep 19, 2023 at 10:44 PM Saurabh Singh Sengar
<ssengar@linux.microsoft.com> wrote:
>
> On Tue, Sep 05, 2023 at 11:58:17PM -0700, Saurabh Singh Sengar wrote:
> > On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> > > On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> > > >
> > > > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > > > >> Especially, we do have bigger ->huge_fault changes coming up:
> > > > >>
> > > > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> > >
> > > FWIW, one of those patches updates the docs to read,
> > >
> > > "->huge_fault() is called when there is no PUD or PMD entry present.  This
> > > gives the filesystem the opportunity to install a PUD or PMD sized page.
> > > Filesystems can also use the ->fault method to return a PMD sized page,
> > > so implementing this function may not be necessary.  In particular,
> > > filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> > >
> > > Which won't work (in the general case) without this patch (well, at
> > > least the ->huge_fault() check part).
> > >
> > > So, if we're advertising this is the way it works, maybe that gives a
> > > stronger argument for addressing it sooner vs when the first in-tree
> > > user depends on it?
> > >
> > > > >> If the driver is not in the tree, people don't care.
> > > > >>
> > > > >> You really should try upstreaming that driver.
> > > > >>
> > > > >>
> > > > >> So this patch here adds complexity (which I don't like) in order to keep an
> > > > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > > > >> fix your driver to not use huge faults in that scenario, it is no longer
> > > > >> supported".
> > > > >>
> > > > >> But I'm just about to vanish for 1.5 week into vacation :)
> > > > >>
> > > > >> @Willy, what are your thoughts?
> > > > >
> > > > > Fundamentally there was a bad assumption with the original patch --
> > > > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > > > and that's not true.  It's just that the only drivers in-tree which
> > > > > support ->huge_fault do so in order to support DAX.
> > > >
> > > > Okay, and we are willing to continue supporting that then and it's
> > > > nothing we want to stop OOT drivers from doing.
> > > >
> > > > Fine with me; we should probably reflect that in the patch description.
> > >
> > > I can change these paragraphs,
> > >
> > > "During the review of the above commits, it was determined that in-tree
> > > users weren't affected by the change; most notably, since the only relevant
> > > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > > explicitly approved early in approval logic.  However, there is at least
> > > one occurrence where an out-of-tree driver that used
> > > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> > >
> > > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > > any ->huge_fault handler a chance to handle the fault.  Note that we
> > > don't validate the file mode or mapping alignment, which is consistent
> > > with the behavior before the aforementioned commits."
> > >
> > > To read,
> > >
> > > "The above commits, however, overfit the existing in-tree use cases,
> > > and assume that
> > > the only reason to support ->huge_fault was for DAX (which is
> > > explicitly approved early in the approval logic).
> > > This is a bad assumption to make and unnecessarily prevents general
> > > support of ->huge_fault by filesystems. Allow returning "true" if such
> > > a handler exists, giving the fault path an opportunity to exercise it.
> > >
> > > Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> > > along the fault path was that it didn't alter any in-tree users, but
> > > was likewise similarly unnecessarily restrictive (and reads odd).
> > > Remove the check from the fault path."
> > >
> >
> >
> > Any chance this can make it to 6.6 kernel ?
>
> ping

I think we tend to merge this patch, but anyway it is Andrew's call.
Included Andrew in this loop.

>
> >
> > - Saurabh
Zach O'Keefe Sept. 22, 2023, 4:56 p.m. UTC | #14
On Fri, Sep 22, 2023 at 9:54 AM Yang Shi <shy828301@gmail.com> wrote:
>
> On Tue, Sep 19, 2023 at 10:44 PM Saurabh Singh Sengar
> <ssengar@linux.microsoft.com> wrote:
> >
> > On Tue, Sep 05, 2023 at 11:58:17PM -0700, Saurabh Singh Sengar wrote:
> > > On Fri, Aug 25, 2023 at 08:09:07AM -0700, Zach O'Keefe wrote:
> > > > On Fri, Aug 25, 2023 at 5:58 AM David Hildenbrand <david@redhat.com> wrote:
> > > > >
> > > > > On 25.08.23 14:49, Matthew Wilcox wrote:
> > > > > > On Fri, Aug 25, 2023 at 09:59:23AM +0200, David Hildenbrand wrote:
> > > > > >> Especially, we do have bigger ->huge_fault changes coming up:
> > > > > >>
> > > > > >> https://lkml.kernel.org/r/20230818202335.2739663-1-willy@infradead.org
> > > >
> > > > FWIW, one of those patches updates the docs to read,
> > > >
> > > > "->huge_fault() is called when there is no PUD or PMD entry present.  This
> > > > gives the filesystem the opportunity to install a PUD or PMD sized page.
> > > > Filesystems can also use the ->fault method to return a PMD sized page,
> > > > so implementing this function may not be necessary.  In particular,
> > > > filesystems should not call filemap_fault() from ->huge_fault(). [..]"
> > > >
> > > > Which won't work (in the general case) without this patch (well, at
> > > > least the ->huge_fault() check part).
> > > >
> > > > So, if we're advertising this is the way it works, maybe that gives a
> > > > stronger argument for addressing it sooner vs when the first in-tree
> > > > user depends on it?
> > > >
> > > > > >> If the driver is not in the tree, people don't care.
> > > > > >>
> > > > > >> You really should try upstreaming that driver.
> > > > > >>
> > > > > >>
> > > > > >> So this patch here adds complexity (which I don't like) in order to keep an
> > > > > >> OOT driver working -- possibly for a short time. I'm tempted to say "please
> > > > > >> fix your driver to not use huge faults in that scenario, it is no longer
> > > > > >> supported".
> > > > > >>
> > > > > >> But I'm just about to vanish for 1.5 week into vacation :)
> > > > > >>
> > > > > >> @Willy, what are your thoughts?
> > > > > >
> > > > > > Fundamentally there was a bad assumption with the original patch --
> > > > > > it assumed that the only reason to support ->huge_fault was for DAX,
> > > > > > and that's not true.  It's just that the only drivers in-tree which
> > > > > > support ->huge_fault do so in order to support DAX.
> > > > >
> > > > > Okay, and we are willing to continue supporting that then and it's
> > > > > nothing we want to stop OOT drivers from doing.
> > > > >
> > > > > Fine with me; we should probably reflect that in the patch description.
> > > >
> > > > I can change these paragraphs,
> > > >
> > > > "During the review of the above commits, it was determined that in-tree
> > > > users weren't affected by the change; most notably, since the only relevant
> > > > user (in terms of THP) of VM_MIXEDMAP or ->huge_fault is DAX, which is
> > > > explicitly approved early in approval logic.  However, there is at least
> > > > one occurrence where an out-of-tree driver that used
> > > > VM_HUGEPAGE|VM_MIXEDMAP with a vm_ops->huge_fault handler, was broken.
> > > >
> > > > Remove the VM_NO_KHUGEPAGED check when not in collapse path and give
> > > > any ->huge_fault handler a chance to handle the fault.  Note that we
> > > > don't validate the file mode or mapping alignment, which is consistent
> > > > with the behavior before the aforementioned commits."
> > > >
> > > > To read,
> > > >
> > > > "The above commits, however, overfit the existing in-tree use cases,
> > > > and assume that
> > > > the only reason to support ->huge_fault was for DAX (which is
> > > > explicitly approved early in the approval logic).
> > > > This is a bad assumption to make and unnecessarily prevents general
> > > > support of ->huge_fault by filesystems. Allow returning "true" if such
> > > > a handler exists, giving the fault path an opportunity to exercise it.
> > > >
> > > > Similarly, the rationale for including the VM_NO_KHUGEPAGED check
> > > > along the fault path was that it didn't alter any in-tree users, but
> > > > was likewise similarly unnecessarily restrictive (and reads odd).
> > > > Remove the check from the fault path."
> > > >
> > >
> > >
> > > Any chance this can make it to 6.6 kernel ?
> >
> > ping
>
> I think we tend to merge this patch, but anyway it is Andrew's call.
> Included Andrew in this loop.

Sorry for delay -- just back from (another) OOO,

From this back/forth with David/Matthew, seems like we're OK saying,
"this was a mistake", and that we can take the patch (need some form
of Ack or Reviewed-by from them first, to confirm)

> > Fundamentally there was a bad assumption with the original patch --
> > it assumed that the only reason to support ->huge_fault was for DAX,
> > and that's not true.  It's just that the only drivers in-tree which
> > support ->huge_fault do so in order to support DAX.
>
> Okay, and we are willing to continue supporting that then and it's
> nothing we want to stop OOT drivers from doing.
>
> Fine with me; we should probably reflect that in the patch description.

But, I don't know about timing. We are in 6.6-rc2, and this hasn't
been exposed in Andrew's trees yet. 6.6 is looking like it could be a
LTS candidate, in which case this patch could flow backwards from
-stable (which would also land in 6.1-y) .. but I don't know if that
path is suitable for this.

Otherwise, perhaps you could include this fix
when you're ready to upstream your driver?

> >
> > >
> > > - Saurabh
Andrew Morton Sept. 22, 2023, 5:20 p.m. UTC | #15
On Fri, 22 Sep 2023 09:56:21 -0700 "Zach O'Keefe" <zokeefe@google.com> wrote:

> >From this back/forth with David/Matthew, seems like we're OK saying,
> "this was a mistake", and that we can take the patch (need some form
> of Ack or Reviewed-by from them first, to confirm)

Yup.  And please let's update the changelog to reflect the details
which have been discussed thus far.

If the change *makes sense* for the current kernel then let's proceed,
regardless of the broken driver issue.

But adding a cc:stable would require extra argumentation, which I will
be interested to read ;)
diff mbox series

Patch

diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index eb3678360b97..901dcf8db8d2 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -96,11 +96,11 @@  bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 		return in_pf;
 
 	/*
-	 * Special VMA and hugetlb VMA.
+	 * khugepaged special VMA and hugetlb VMA.
 	 * Must be checked after dax since some dax mappings may have
 	 * VM_MIXEDMAP set.
 	 */
-	if (vm_flags & VM_NO_KHUGEPAGED)
+	if (!in_pf && !smaps && (vm_flags & VM_NO_KHUGEPAGED))
 		return false;
 
 	/*
@@ -128,12 +128,18 @@  bool hugepage_vma_check(struct vm_area_struct *vma, unsigned long vm_flags,
 					   !hugepage_flags_always())))
 		return false;
 
-	/* Only regular file is valid */
-	if (!in_pf && file_thp_enabled(vma))
-		return true;
-
-	if (!vma_is_anonymous(vma))
+	if (!vma_is_anonymous(vma)) {
+		/*
+		 * Trust that ->huge_fault() handlers know what they are doing
+		 * in fault path.
+		 */
+		if (((in_pf || smaps)) && vma->vm_ops->huge_fault)
+			return true;
+		/* Only regular file is valid in collapse path */
+		if (((!in_pf || smaps)) && file_thp_enabled(vma))
+			return true;
 		return false;
+	}
 
 	if (vma_is_temporary_stack(vma))
 		return false;