diff mbox series

[v1,1/2] mm/hugetlb: fix hugetlb not supporting write-notify

Message ID 20220805110329.80540-2-david@redhat.com (mailing list archive)
State New
Headers show
Series mm/hugetlb: fix write-fault handling for shared mappings | expand

Commit Message

David Hildenbrand Aug. 5, 2022, 11:03 a.m. UTC
Staring at hugetlb_wp(), one might wonder where all the logic for shared
mappings is when stumbling over a write-protected page in a shared
mapping. In fact, there is none, and so far we thought we could get
away with that because e.g., mprotect() should always do the right thing
and map all pages directly writable.

Looks like we were wrong:

--------------------------------------------------------------------------
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>
 #include <errno.h>
 #include <sys/mman.h>

 #define HUGETLB_SIZE (2 * 1024 * 1024u)

 static void clear_softdirty(void)
 {
         int fd = open("/proc/self/clear_refs", O_WRONLY);
         const char *ctrl = "4";
         int ret;

         if (fd < 0) {
                 fprintf(stderr, "open(clear_refs) failed\n");
                 exit(1);
         }
         ret = write(fd, ctrl, strlen(ctrl));
         if (ret != strlen(ctrl)) {
                 fprintf(stderr, "write(clear_refs) failed\n");
                 exit(1);
         }
         close(fd);
 }

 int main(int argc, char **argv)
 {
         char *map;
         int fd;

         fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT);
         if (!fd) {
                 fprintf(stderr, "open() failed\n");
                 return -errno;
         }
         if (ftruncate(fd, HUGETLB_SIZE)) {
                 fprintf(stderr, "ftruncate() failed\n");
                 return -errno;
         }

         map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0);
         if (map == MAP_FAILED) {
                 fprintf(stderr, "mmap() failed\n");
                 return -errno;
         }

         *map = 0;

         if (mprotect(map, HUGETLB_SIZE, PROT_READ)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         clear_softdirty();

         if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) {
                 fprintf(stderr, "mmprotect() failed\n");
                 return -errno;
         }

         *map = 0;

         return 0;
 }
--------------------------------------------------------------------------

Above test fails with SIGBUS when there is only a single free hugetlb page.
 # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 Bus error (core dumped)

And worse, with sufficient free hugetlb pages it will map an anonymous page
into a shared mapping, for example, messing up accounting during unmap
and breaking MAP_SHARED semantics:
 # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages
 # ./test
 # cat /proc/meminfo | grep HugePages_
 HugePages_Total:       2
 HugePages_Free:        1
 HugePages_Rsvd:    18446744073709551615
 HugePages_Surp:        0

Reason in this particular case is that vma_wants_writenotify() will
return "true", removing VM_SHARED in vma_set_page_prot() to map pages
write-protected. Let's teach vma_wants_writenotify() that hugetlb does not
support write-notify, including softdirty tracking.

Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared")
Cc: <stable@vger.kernel.org> # v3.18+
Signed-off-by: David Hildenbrand <david@redhat.com>
---
 mm/mmap.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Peter Xu Aug. 5, 2022, 6:14 p.m. UTC | #1
On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> diff --git a/mm/mmap.c b/mm/mmap.c
> index 61e6135c54ef..462a6b0344ac 100644
> --- a/mm/mmap.c
> +++ b/mm/mmap.c
> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>  		return 0;
>  
> +	/*
> +	 * Hugetlb does not require/support writenotify; especially, it does not
> +	 * support softdirty tracking.
> +	 */
> +	if (is_vm_hugetlb_page(vma))
> +		return 0;

I'm kind of confused here..  you seems to be fixing up soft-dirty for
hugetlb but here it's explicitly forbidden.

Could you explain a bit more on why this patch is needed if (assume
there'll be a working) patch 2 being provided?

Thanks,
David Hildenbrand Aug. 5, 2022, 6:22 p.m. UTC | #2
On 05.08.22 20:14, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>> diff --git a/mm/mmap.c b/mm/mmap.c
>> index 61e6135c54ef..462a6b0344ac 100644
>> --- a/mm/mmap.c
>> +++ b/mm/mmap.c
>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>  		return 0;
>>  
>> +	/*
>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>> +	 * support softdirty tracking.
>> +	 */
>> +	if (is_vm_hugetlb_page(vma))
>> +		return 0;
> 
> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> hugetlb but here it's explicitly forbidden.
> 
> Could you explain a bit more on why this patch is needed if (assume
> there'll be a working) patch 2 being provided?

I want something simple to backport. And even with patch #2 in place, as
long as we don't support softdirty tracking, it doesn't make sense to
enable it here as of now.
Mike Kravetz Aug. 5, 2022, 6:23 p.m. UTC | #3
On 08/05/22 14:14, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> > diff --git a/mm/mmap.c b/mm/mmap.c
> > index 61e6135c54ef..462a6b0344ac 100644
> > --- a/mm/mmap.c
> > +++ b/mm/mmap.c
> > @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> >  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> >  		return 0;
> >  
> > +	/*
> > +	 * Hugetlb does not require/support writenotify; especially, it does not
> > +	 * support softdirty tracking.
> > +	 */
> > +	if (is_vm_hugetlb_page(vma))
> > +		return 0;
> 
> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> hugetlb but here it's explicitly forbidden.
> 
> Could you explain a bit more on why this patch is needed if (assume
> there'll be a working) patch 2 being provided?
> 

No comments on the patch, but ...

Since it required little thought, I ran the test program on next-20220802 and
was surprised that the issue did not recreate.  Even added a simple printk
to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
We were.

I can easily recreate with an older Fedora kernel.

Will be trying to understand why this is the case.
David Hildenbrand Aug. 5, 2022, 6:25 p.m. UTC | #4
On 05.08.22 20:23, Mike Kravetz wrote:
> On 08/05/22 14:14, Peter Xu wrote:
>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>> index 61e6135c54ef..462a6b0344ac 100644
>>> --- a/mm/mmap.c
>>> +++ b/mm/mmap.c
>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>>  		return 0;
>>>  
>>> +	/*
>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>>> +	 * support softdirty tracking.
>>> +	 */
>>> +	if (is_vm_hugetlb_page(vma))
>>> +		return 0;
>>
>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
>> hugetlb but here it's explicitly forbidden.
>>
>> Could you explain a bit more on why this patch is needed if (assume
>> there'll be a working) patch 2 being provided?
>>
> 
> No comments on the patch, but ...
> 
> Since it required little thought, I ran the test program on next-20220802 and
> was surprised that the issue did not recreate.  Even added a simple printk
> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> We were.


... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
Mike Kravetz Aug. 5, 2022, 6:33 p.m. UTC | #5
On 08/05/22 20:25, David Hildenbrand wrote:
> On 05.08.22 20:23, Mike Kravetz wrote:
> > On 08/05/22 14:14, Peter Xu wrote:
> >> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> >>> diff --git a/mm/mmap.c b/mm/mmap.c
> >>> index 61e6135c54ef..462a6b0344ac 100644
> >>> --- a/mm/mmap.c
> >>> +++ b/mm/mmap.c
> >>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> >>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> >>>  		return 0;
> >>>  
> >>> +	/*
> >>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> >>> +	 * support softdirty tracking.
> >>> +	 */
> >>> +	if (is_vm_hugetlb_page(vma))
> >>> +		return 0;
> >>
> >> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> >> hugetlb but here it's explicitly forbidden.
> >>
> >> Could you explain a bit more on why this patch is needed if (assume
> >> there'll be a working) patch 2 being provided?
> >>
> > 
> > No comments on the patch, but ...
> > 
> > Since it required little thought, I ran the test program on next-20220802 and
> > was surprised that the issue did not recreate.  Even added a simple printk
> > to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> > We were.
> 
> 
> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> 

No, Duh!

FYI - Some time back, I started looking at adding soft dirty support for
hugetlb mappings.  I did not finish that work.  But, I seem to recall
places where code was operating on hugetlb mappings when perhaps it should
not.

Perhaps, it would also be good to just disable soft dirty for hugetlb at
the source?
David Hildenbrand Aug. 5, 2022, 6:57 p.m. UTC | #6
On 05.08.22 20:33, Mike Kravetz wrote:
> On 08/05/22 20:25, David Hildenbrand wrote:
>> On 05.08.22 20:23, Mike Kravetz wrote:
>>> On 08/05/22 14:14, Peter Xu wrote:
>>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>> index 61e6135c54ef..462a6b0344ac 100644
>>>>> --- a/mm/mmap.c
>>>>> +++ b/mm/mmap.c
>>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>>>>  		return 0;
>>>>>  
>>>>> +	/*
>>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>>>>> +	 * support softdirty tracking.
>>>>> +	 */
>>>>> +	if (is_vm_hugetlb_page(vma))
>>>>> +		return 0;
>>>>
>>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
>>>> hugetlb but here it's explicitly forbidden.
>>>>
>>>> Could you explain a bit more on why this patch is needed if (assume
>>>> there'll be a working) patch 2 being provided?
>>>>
>>>
>>> No comments on the patch, but ...
>>>
>>> Since it required little thought, I ran the test program on next-20220802 and
>>> was surprised that the issue did not recreate.  Even added a simple printk
>>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
>>> We were.
>>
>>
>> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
>>
> 
> No, Duh!
> 
> FYI - Some time back, I started looking at adding soft dirty support for
> hugetlb mappings.  I did not finish that work.  But, I seem to recall
> places where code was operating on hugetlb mappings when perhaps it should
> not.
> 
> Perhaps, it would also be good to just disable soft dirty for hugetlb at
> the source?

I thought about that as well. But I came to the conclusion that without
patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
no need to bother in vma_wants_writenotify() at all.

The "root" would be places where we clear VM_SOFTDIRTY. That should only
be fs/proc/task_mmu.c:clear_refs_write() IIRC.

So I don't particularly care, I consider this patch a bit cleaner and
more generic, but I can adjust clear_refs_write() instead of there is a
preference.
Mike Kravetz Aug. 5, 2022, 8:48 p.m. UTC | #7
On 08/05/22 20:57, David Hildenbrand wrote:
> On 05.08.22 20:33, Mike Kravetz wrote:
> > On 08/05/22 20:25, David Hildenbrand wrote:
> >> On 05.08.22 20:23, Mike Kravetz wrote:
> >>> On 08/05/22 14:14, Peter Xu wrote:
> >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> >>>>> diff --git a/mm/mmap.c b/mm/mmap.c
> >>>>> index 61e6135c54ef..462a6b0344ac 100644
> >>>>> --- a/mm/mmap.c
> >>>>> +++ b/mm/mmap.c
> >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> >>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> >>>>>  		return 0;
> >>>>>  
> >>>>> +	/*
> >>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> >>>>> +	 * support softdirty tracking.
> >>>>> +	 */
> >>>>> +	if (is_vm_hugetlb_page(vma))
> >>>>> +		return 0;
> >>>>
> >>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> >>>> hugetlb but here it's explicitly forbidden.
> >>>>
> >>>> Could you explain a bit more on why this patch is needed if (assume
> >>>> there'll be a working) patch 2 being provided?
> >>>>
> >>>
> >>> No comments on the patch, but ...
> >>>
> >>> Since it required little thought, I ran the test program on next-20220802 and
> >>> was surprised that the issue did not recreate.  Even added a simple printk
> >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> >>> We were.
> >>
> >>
> >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> >>
> > 
> > No, Duh!
> > 
> > FYI - Some time back, I started looking at adding soft dirty support for
> > hugetlb mappings.  I did not finish that work.  But, I seem to recall
> > places where code was operating on hugetlb mappings when perhaps it should
> > not.
> > 
> > Perhaps, it would also be good to just disable soft dirty for hugetlb at
> > the source?
> 
> I thought about that as well. But I came to the conclusion that without
> patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
> no need to bother in vma_wants_writenotify() at all.
> 
> The "root" would be places where we clear VM_SOFTDIRTY. That should only
> be fs/proc/task_mmu.c:clear_refs_write() IIRC.
> 
> So I don't particularly care, I consider this patch a bit cleaner and
> more generic, but I can adjust clear_refs_write() instead of there is a
> preference.
> 

After a closer look, I agree that this may be the simplest/cleanest way to
proceed.  I was going to suggest that you note hugetlb does not support
softdirty, but see you did in the comment.

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
Peter Xu Aug. 5, 2022, 11:13 p.m. UTC | #8
On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote:
> On 08/05/22 20:57, David Hildenbrand wrote:
> > On 05.08.22 20:33, Mike Kravetz wrote:
> > > On 08/05/22 20:25, David Hildenbrand wrote:
> > >> On 05.08.22 20:23, Mike Kravetz wrote:
> > >>> On 08/05/22 14:14, Peter Xu wrote:
> > >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> > >>>>> diff --git a/mm/mmap.c b/mm/mmap.c
> > >>>>> index 61e6135c54ef..462a6b0344ac 100644
> > >>>>> --- a/mm/mmap.c
> > >>>>> +++ b/mm/mmap.c
> > >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> > >>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> > >>>>>  		return 0;
> > >>>>>  
> > >>>>> +	/*
> > >>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> > >>>>> +	 * support softdirty tracking.
> > >>>>> +	 */
> > >>>>> +	if (is_vm_hugetlb_page(vma))
> > >>>>> +		return 0;
> > >>>>
> > >>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> > >>>> hugetlb but here it's explicitly forbidden.
> > >>>>
> > >>>> Could you explain a bit more on why this patch is needed if (assume
> > >>>> there'll be a working) patch 2 being provided?
> > >>>>
> > >>>
> > >>> No comments on the patch, but ...
> > >>>
> > >>> Since it required little thought, I ran the test program on next-20220802 and
> > >>> was surprised that the issue did not recreate.  Even added a simple printk
> > >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> > >>> We were.
> > >>
> > >>
> > >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> > >>
> > > 
> > > No, Duh!
> > > 
> > > FYI - Some time back, I started looking at adding soft dirty support for
> > > hugetlb mappings.  I did not finish that work.  But, I seem to recall
> > > places where code was operating on hugetlb mappings when perhaps it should
> > > not.
> > > 
> > > Perhaps, it would also be good to just disable soft dirty for hugetlb at
> > > the source?
> > 
> > I thought about that as well. But I came to the conclusion that without
> > patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
> > no need to bother in vma_wants_writenotify() at all.
> > 
> > The "root" would be places where we clear VM_SOFTDIRTY. That should only
> > be fs/proc/task_mmu.c:clear_refs_write() IIRC.
> > 
> > So I don't particularly care, I consider this patch a bit cleaner and
> > more generic, but I can adjust clear_refs_write() instead of there is a
> > preference.
> > 
> 
> After a closer look, I agree that this may be the simplest/cleanest way to
> proceed.  I was going to suggest that you note hugetlb does not support
> softdirty, but see you did in the comment.
> 
> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>

Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to
follow to me, since it's not clear why hugetlbfs never wants writenotify.

If it's only about soft-dirty, we could have added the hugetlbfs check into
vma_soft_dirty_enabled(), then I think it'll achieve the same thing and
much clearer - with the soft-dirty check constantly returning false for it,
hugetlbfs shared vmas should have vma_wants_writenotify() naturally return
0 already.

For the long term - shouldn't we just enable soft-dirty for hugetlbfs?  I
remember Mike used to have that in todo.  Since we've got patch 2 already,
I feel like that's really much close (is the only missing piece the clear
refs write part? or maybe some more that I didn't notice).

Then patch 1 (or IMHO equivalant check in vma_soft_dirty_enabled(), but
maybe in stable trees we don't have vma_soft_dirty_enabled then it's
exactly patch 1) can be a stable-only backport just to avoid the bug from
triggering.

Thanks,
Mike Kravetz Aug. 5, 2022, 11:33 p.m. UTC | #9
On 08/05/22 19:13, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote:
> > On 08/05/22 20:57, David Hildenbrand wrote:
> > > On 05.08.22 20:33, Mike Kravetz wrote:
> > > > On 08/05/22 20:25, David Hildenbrand wrote:
> > > >> On 05.08.22 20:23, Mike Kravetz wrote:
> > > >>> On 08/05/22 14:14, Peter Xu wrote:
> > > >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
> > > >>>>> diff --git a/mm/mmap.c b/mm/mmap.c
> > > >>>>> index 61e6135c54ef..462a6b0344ac 100644
> > > >>>>> --- a/mm/mmap.c
> > > >>>>> +++ b/mm/mmap.c
> > > >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
> > > >>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
> > > >>>>>  		return 0;
> > > >>>>>  
> > > >>>>> +	/*
> > > >>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
> > > >>>>> +	 * support softdirty tracking.
> > > >>>>> +	 */
> > > >>>>> +	if (is_vm_hugetlb_page(vma))
> > > >>>>> +		return 0;
> > > >>>>
> > > >>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
> > > >>>> hugetlb but here it's explicitly forbidden.
> > > >>>>
> > > >>>> Could you explain a bit more on why this patch is needed if (assume
> > > >>>> there'll be a working) patch 2 being provided?
> > > >>>>
> > > >>>
> > > >>> No comments on the patch, but ...
> > > >>>
> > > >>> Since it required little thought, I ran the test program on next-20220802 and
> > > >>> was surprised that the issue did not recreate.  Even added a simple printk
> > > >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
> > > >>> We were.
> > > >>
> > > >>
> > > >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
> > > >>
> > > > 
> > > > No, Duh!
> > > > 
> > > > FYI - Some time back, I started looking at adding soft dirty support for
> > > > hugetlb mappings.  I did not finish that work.  But, I seem to recall
> > > > places where code was operating on hugetlb mappings when perhaps it should
> > > > not.
> > > > 
> > > > Perhaps, it would also be good to just disable soft dirty for hugetlb at
> > > > the source?
> > > 
> > > I thought about that as well. But I came to the conclusion that without
> > > patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
> > > no need to bother in vma_wants_writenotify() at all.
> > > 
> > > The "root" would be places where we clear VM_SOFTDIRTY. That should only
> > > be fs/proc/task_mmu.c:clear_refs_write() IIRC.
> > > 
> > > So I don't particularly care, I consider this patch a bit cleaner and
> > > more generic, but I can adjust clear_refs_write() instead of there is a
> > > preference.
> > > 
> > 
> > After a closer look, I agree that this may be the simplest/cleanest way to
> > proceed.  I was going to suggest that you note hugetlb does not support
> > softdirty, but see you did in the comment.
> > 
> > Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to
> follow to me, since it's not clear why hugetlbfs never wants writenotify.
> 
> If it's only about soft-dirty, we could have added the hugetlbfs check into
> vma_soft_dirty_enabled(), then I think it'll achieve the same thing and
> much clearer - with the soft-dirty check constantly returning false for it,
> hugetlbfs shared vmas should have vma_wants_writenotify() naturally return
> 0 already.
> 
> For the long term - shouldn't we just enable soft-dirty for hugetlbfs?  I
> remember Mike used to have that in todo.  Since we've got patch 2 already,
> I feel like that's really much close (is the only missing piece the clear
> refs write part? or maybe some more that I didn't notice).
> 
> Then patch 1 (or IMHO equivalant check in vma_soft_dirty_enabled(), but
> maybe in stable trees we don't have vma_soft_dirty_enabled then it's
> exactly patch 1) can be a stable-only backport just to avoid the bug from
> triggering.

It looks like vma_soft_dirty_enabled is recent and not in any stable
trees (or even 5.19).

Yes, I did start working on hugetlb softdirty support in the past.
https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/

Unfortunately, it got preempted by other things.  I will try to move it up
the priority list.
Peter Xu Aug. 8, 2022, 4:10 p.m. UTC | #10
On Fri, Aug 05, 2022 at 04:33:56PM -0700, Mike Kravetz wrote:
> It looks like vma_soft_dirty_enabled is recent and not in any stable
> trees (or even 5.19).
> 
> Yes, I did start working on hugetlb softdirty support in the past.
> https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/
> 
> Unfortunately, it got preempted by other things.  I will try to move it up
> the priority list.

Thanks, Mike.

It'll also makes sense to forbid it if it may take time to finish, so we
don't need to push ourselves.
David Hildenbrand Aug. 8, 2022, 4:36 p.m. UTC | #11
On 06.08.22 01:13, Peter Xu wrote:
> On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote:
>> On 08/05/22 20:57, David Hildenbrand wrote:
>>> On 05.08.22 20:33, Mike Kravetz wrote:
>>>> On 08/05/22 20:25, David Hildenbrand wrote:
>>>>> On 05.08.22 20:23, Mike Kravetz wrote:
>>>>>> On 08/05/22 14:14, Peter Xu wrote:
>>>>>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote:
>>>>>>>> diff --git a/mm/mmap.c b/mm/mmap.c
>>>>>>>> index 61e6135c54ef..462a6b0344ac 100644
>>>>>>>> --- a/mm/mmap.c
>>>>>>>> +++ b/mm/mmap.c
>>>>>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
>>>>>>>>  	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
>>>>>>>>  		return 0;
>>>>>>>>  
>>>>>>>> +	/*
>>>>>>>> +	 * Hugetlb does not require/support writenotify; especially, it does not
>>>>>>>> +	 * support softdirty tracking.
>>>>>>>> +	 */
>>>>>>>> +	if (is_vm_hugetlb_page(vma))
>>>>>>>> +		return 0;
>>>>>>>
>>>>>>> I'm kind of confused here..  you seems to be fixing up soft-dirty for
>>>>>>> hugetlb but here it's explicitly forbidden.
>>>>>>>
>>>>>>> Could you explain a bit more on why this patch is needed if (assume
>>>>>>> there'll be a working) patch 2 being provided?
>>>>>>>
>>>>>>
>>>>>> No comments on the patch, but ...
>>>>>>
>>>>>> Since it required little thought, I ran the test program on next-20220802 and
>>>>>> was surprised that the issue did not recreate.  Even added a simple printk
>>>>>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma.
>>>>>> We were.
>>>>>
>>>>>
>>>>> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
>>>>>
>>>>
>>>> No, Duh!
>>>>
>>>> FYI - Some time back, I started looking at adding soft dirty support for
>>>> hugetlb mappings.  I did not finish that work.  But, I seem to recall
>>>> places where code was operating on hugetlb mappings when perhaps it should
>>>> not.
>>>>
>>>> Perhaps, it would also be good to just disable soft dirty for hugetlb at
>>>> the source?
>>>
>>> I thought about that as well. But I came to the conclusion that without
>>> patch #2, hugetlb VMAs cannot possibly support write-notify, so there is
>>> no need to bother in vma_wants_writenotify() at all.
>>>
>>> The "root" would be places where we clear VM_SOFTDIRTY. That should only
>>> be fs/proc/task_mmu.c:clear_refs_write() IIRC.
>>>
>>> So I don't particularly care, I consider this patch a bit cleaner and
>>> more generic, but I can adjust clear_refs_write() instead of there is a
>>> preference.
>>>
>>
>> After a closer look, I agree that this may be the simplest/cleanest way to
>> proceed.  I was going to suggest that you note hugetlb does not support
>> softdirty, but see you did in the comment.
>>
>> Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
> 
> Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to
> follow to me, since it's not clear why hugetlbfs never wants writenotify.

Well, because the write-fault handler as is cannot deal with
write-faults in shared mappings. It cannot possibly work or ever have
worked.

> 
> If it's only about soft-dirty, we could have added the hugetlbfs check into
> vma_soft_dirty_enabled(), then I think it'll achieve the same thing and
> much clearer - with the soft-dirty check constantly returning false for it,
> hugetlbfs shared vmas should have vma_wants_writenotify() naturally return
> 0 already.

I considered that an option, but went with this approach for simplicity
and because I don't see a need to check for hugetlb in other code paths
(especially, in the !hugetlb mprotect variant).

I mean, with patch #2 we can remove it anytime we do support softdirty
tracking -- or if there is need for write-notify we can move it into the
softdirty check only.

> 
> For the long term - shouldn't we just enable soft-dirty for hugetlbfs?  I
> remember Mike used to have that in todo.  Since we've got patch 2 already,
> I feel like that's really much close (is the only missing piece the clear
> refs write part? or maybe some more that I didn't notice).

My gut feeling is that there isn't too much missing to have it working.
Define a PTE bit, implement hugetlb variant of clearing, and set it when
setting the PTE dirty.

Thanks!
Peter Xu Aug. 8, 2022, 7:28 p.m. UTC | #12
On Mon, Aug 08, 2022 at 06:36:58PM +0200, David Hildenbrand wrote:
> Well, because the write-fault handler as is cannot deal with
> write-faults in shared mappings. It cannot possibly work or ever have
> worked.

Trivially - maybe drop the word "require" in "Hugetlb does not
require/support writenotify"?
David Hildenbrand Aug. 10, 2022, 9:29 a.m. UTC | #13
On 08.08.22 21:28, Peter Xu wrote:
> On Mon, Aug 08, 2022 at 06:36:58PM +0200, David Hildenbrand wrote:
>> Well, because the write-fault handler as is cannot deal with
>> write-faults in shared mappings. It cannot possibly work or ever have
>> worked.
> 
> Trivially - maybe drop the word "require" in "Hugetlb does not
> require/support writenotify"?
> 

Sure, can do.
diff mbox series

Patch

diff --git a/mm/mmap.c b/mm/mmap.c
index 61e6135c54ef..462a6b0344ac 100644
--- a/mm/mmap.c
+++ b/mm/mmap.c
@@ -1683,6 +1683,13 @@  int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot)
 	if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED)))
 		return 0;
 
+	/*
+	 * Hugetlb does not require/support writenotify; especially, it does not
+	 * support softdirty tracking.
+	 */
+	if (is_vm_hugetlb_page(vma))
+		return 0;
+
 	/* The backer wishes to know when pages are first written to? */
 	if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite))
 		return 1;