Message ID | 20220805110329.80540-2-david@redhat.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/hugetlb: fix write-fault handling for shared mappings | expand |
On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: > diff --git a/mm/mmap.c b/mm/mmap.c > index 61e6135c54ef..462a6b0344ac 100644 > --- a/mm/mmap.c > +++ b/mm/mmap.c > @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) > if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) > return 0; > > + /* > + * Hugetlb does not require/support writenotify; especially, it does not > + * support softdirty tracking. > + */ > + if (is_vm_hugetlb_page(vma)) > + return 0; I'm kind of confused here.. you seems to be fixing up soft-dirty for hugetlb but here it's explicitly forbidden. Could you explain a bit more on why this patch is needed if (assume there'll be a working) patch 2 being provided? Thanks,
On 05.08.22 20:14, Peter Xu wrote: > On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: >> diff --git a/mm/mmap.c b/mm/mmap.c >> index 61e6135c54ef..462a6b0344ac 100644 >> --- a/mm/mmap.c >> +++ b/mm/mmap.c >> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) >> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) >> return 0; >> >> + /* >> + * Hugetlb does not require/support writenotify; especially, it does not >> + * support softdirty tracking. >> + */ >> + if (is_vm_hugetlb_page(vma)) >> + return 0; > > I'm kind of confused here.. you seems to be fixing up soft-dirty for > hugetlb but here it's explicitly forbidden. > > Could you explain a bit more on why this patch is needed if (assume > there'll be a working) patch 2 being provided? I want something simple to backport. And even with patch #2 in place, as long as we don't support softdirty tracking, it doesn't make sense to enable it here as of now.
On 08/05/22 14:14, Peter Xu wrote: > On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: > > diff --git a/mm/mmap.c b/mm/mmap.c > > index 61e6135c54ef..462a6b0344ac 100644 > > --- a/mm/mmap.c > > +++ b/mm/mmap.c > > @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) > > if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) > > return 0; > > > > + /* > > + * Hugetlb does not require/support writenotify; especially, it does not > > + * support softdirty tracking. > > + */ > > + if (is_vm_hugetlb_page(vma)) > > + return 0; > > I'm kind of confused here.. you seems to be fixing up soft-dirty for > hugetlb but here it's explicitly forbidden. > > Could you explain a bit more on why this patch is needed if (assume > there'll be a working) patch 2 being provided? > No comments on the patch, but ... Since it required little thought, I ran the test program on next-20220802 and was surprised that the issue did not recreate. Even added a simple printk to make sure we were getting into vma_wants_writenotify with a hugetlb vma. We were. I can easily recreate with an older Fedora kernel. Will be trying to understand why this is the case.
On 05.08.22 20:23, Mike Kravetz wrote: > On 08/05/22 14:14, Peter Xu wrote: >> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: >>> diff --git a/mm/mmap.c b/mm/mmap.c >>> index 61e6135c54ef..462a6b0344ac 100644 >>> --- a/mm/mmap.c >>> +++ b/mm/mmap.c >>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) >>> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) >>> return 0; >>> >>> + /* >>> + * Hugetlb does not require/support writenotify; especially, it does not >>> + * support softdirty tracking. >>> + */ >>> + if (is_vm_hugetlb_page(vma)) >>> + return 0; >> >> I'm kind of confused here.. you seems to be fixing up soft-dirty for >> hugetlb but here it's explicitly forbidden. >> >> Could you explain a bit more on why this patch is needed if (assume >> there'll be a working) patch 2 being provided? >> > > No comments on the patch, but ... > > Since it required little thought, I ran the test program on next-20220802 and > was surprised that the issue did not recreate. Even added a simple printk > to make sure we were getting into vma_wants_writenotify with a hugetlb vma. > We were. ... does your config have CONFIG_MEM_SOFT_DIRTY enabled?
On 08/05/22 20:25, David Hildenbrand wrote: > On 05.08.22 20:23, Mike Kravetz wrote: > > On 08/05/22 14:14, Peter Xu wrote: > >> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: > >>> diff --git a/mm/mmap.c b/mm/mmap.c > >>> index 61e6135c54ef..462a6b0344ac 100644 > >>> --- a/mm/mmap.c > >>> +++ b/mm/mmap.c > >>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) > >>> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) > >>> return 0; > >>> > >>> + /* > >>> + * Hugetlb does not require/support writenotify; especially, it does not > >>> + * support softdirty tracking. > >>> + */ > >>> + if (is_vm_hugetlb_page(vma)) > >>> + return 0; > >> > >> I'm kind of confused here.. you seems to be fixing up soft-dirty for > >> hugetlb but here it's explicitly forbidden. > >> > >> Could you explain a bit more on why this patch is needed if (assume > >> there'll be a working) patch 2 being provided? > >> > > > > No comments on the patch, but ... > > > > Since it required little thought, I ran the test program on next-20220802 and > > was surprised that the issue did not recreate. Even added a simple printk > > to make sure we were getting into vma_wants_writenotify with a hugetlb vma. > > We were. > > > ... does your config have CONFIG_MEM_SOFT_DIRTY enabled? > No, Duh! FYI - Some time back, I started looking at adding soft dirty support for hugetlb mappings. I did not finish that work. But, I seem to recall places where code was operating on hugetlb mappings when perhaps it should not. Perhaps, it would also be good to just disable soft dirty for hugetlb at the source?
On 05.08.22 20:33, Mike Kravetz wrote: > On 08/05/22 20:25, David Hildenbrand wrote: >> On 05.08.22 20:23, Mike Kravetz wrote: >>> On 08/05/22 14:14, Peter Xu wrote: >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: >>>>> diff --git a/mm/mmap.c b/mm/mmap.c >>>>> index 61e6135c54ef..462a6b0344ac 100644 >>>>> --- a/mm/mmap.c >>>>> +++ b/mm/mmap.c >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) >>>>> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) >>>>> return 0; >>>>> >>>>> + /* >>>>> + * Hugetlb does not require/support writenotify; especially, it does not >>>>> + * support softdirty tracking. >>>>> + */ >>>>> + if (is_vm_hugetlb_page(vma)) >>>>> + return 0; >>>> >>>> I'm kind of confused here.. you seems to be fixing up soft-dirty for >>>> hugetlb but here it's explicitly forbidden. >>>> >>>> Could you explain a bit more on why this patch is needed if (assume >>>> there'll be a working) patch 2 being provided? >>>> >>> >>> No comments on the patch, but ... >>> >>> Since it required little thought, I ran the test program on next-20220802 and >>> was surprised that the issue did not recreate. Even added a simple printk >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma. >>> We were. >> >> >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled? >> > > No, Duh! > > FYI - Some time back, I started looking at adding soft dirty support for > hugetlb mappings. I did not finish that work. But, I seem to recall > places where code was operating on hugetlb mappings when perhaps it should > not. > > Perhaps, it would also be good to just disable soft dirty for hugetlb at > the source? I thought about that as well. But I came to the conclusion that without patch #2, hugetlb VMAs cannot possibly support write-notify, so there is no need to bother in vma_wants_writenotify() at all. The "root" would be places where we clear VM_SOFTDIRTY. That should only be fs/proc/task_mmu.c:clear_refs_write() IIRC. So I don't particularly care, I consider this patch a bit cleaner and more generic, but I can adjust clear_refs_write() instead of there is a preference.
On 08/05/22 20:57, David Hildenbrand wrote: > On 05.08.22 20:33, Mike Kravetz wrote: > > On 08/05/22 20:25, David Hildenbrand wrote: > >> On 05.08.22 20:23, Mike Kravetz wrote: > >>> On 08/05/22 14:14, Peter Xu wrote: > >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: > >>>>> diff --git a/mm/mmap.c b/mm/mmap.c > >>>>> index 61e6135c54ef..462a6b0344ac 100644 > >>>>> --- a/mm/mmap.c > >>>>> +++ b/mm/mmap.c > >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) > >>>>> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) > >>>>> return 0; > >>>>> > >>>>> + /* > >>>>> + * Hugetlb does not require/support writenotify; especially, it does not > >>>>> + * support softdirty tracking. > >>>>> + */ > >>>>> + if (is_vm_hugetlb_page(vma)) > >>>>> + return 0; > >>>> > >>>> I'm kind of confused here.. you seems to be fixing up soft-dirty for > >>>> hugetlb but here it's explicitly forbidden. > >>>> > >>>> Could you explain a bit more on why this patch is needed if (assume > >>>> there'll be a working) patch 2 being provided? > >>>> > >>> > >>> No comments on the patch, but ... > >>> > >>> Since it required little thought, I ran the test program on next-20220802 and > >>> was surprised that the issue did not recreate. Even added a simple printk > >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma. > >>> We were. > >> > >> > >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled? > >> > > > > No, Duh! > > > > FYI - Some time back, I started looking at adding soft dirty support for > > hugetlb mappings. I did not finish that work. But, I seem to recall > > places where code was operating on hugetlb mappings when perhaps it should > > not. > > > > Perhaps, it would also be good to just disable soft dirty for hugetlb at > > the source? > > I thought about that as well. But I came to the conclusion that without > patch #2, hugetlb VMAs cannot possibly support write-notify, so there is > no need to bother in vma_wants_writenotify() at all. > > The "root" would be places where we clear VM_SOFTDIRTY. That should only > be fs/proc/task_mmu.c:clear_refs_write() IIRC. > > So I don't particularly care, I consider this patch a bit cleaner and > more generic, but I can adjust clear_refs_write() instead of there is a > preference. > After a closer look, I agree that this may be the simplest/cleanest way to proceed. I was going to suggest that you note hugetlb does not support softdirty, but see you did in the comment. Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote: > On 08/05/22 20:57, David Hildenbrand wrote: > > On 05.08.22 20:33, Mike Kravetz wrote: > > > On 08/05/22 20:25, David Hildenbrand wrote: > > >> On 05.08.22 20:23, Mike Kravetz wrote: > > >>> On 08/05/22 14:14, Peter Xu wrote: > > >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: > > >>>>> diff --git a/mm/mmap.c b/mm/mmap.c > > >>>>> index 61e6135c54ef..462a6b0344ac 100644 > > >>>>> --- a/mm/mmap.c > > >>>>> +++ b/mm/mmap.c > > >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) > > >>>>> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) > > >>>>> return 0; > > >>>>> > > >>>>> + /* > > >>>>> + * Hugetlb does not require/support writenotify; especially, it does not > > >>>>> + * support softdirty tracking. > > >>>>> + */ > > >>>>> + if (is_vm_hugetlb_page(vma)) > > >>>>> + return 0; > > >>>> > > >>>> I'm kind of confused here.. you seems to be fixing up soft-dirty for > > >>>> hugetlb but here it's explicitly forbidden. > > >>>> > > >>>> Could you explain a bit more on why this patch is needed if (assume > > >>>> there'll be a working) patch 2 being provided? > > >>>> > > >>> > > >>> No comments on the patch, but ... > > >>> > > >>> Since it required little thought, I ran the test program on next-20220802 and > > >>> was surprised that the issue did not recreate. Even added a simple printk > > >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma. > > >>> We were. > > >> > > >> > > >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled? > > >> > > > > > > No, Duh! > > > > > > FYI - Some time back, I started looking at adding soft dirty support for > > > hugetlb mappings. I did not finish that work. But, I seem to recall > > > places where code was operating on hugetlb mappings when perhaps it should > > > not. > > > > > > Perhaps, it would also be good to just disable soft dirty for hugetlb at > > > the source? > > > > I thought about that as well. But I came to the conclusion that without > > patch #2, hugetlb VMAs cannot possibly support write-notify, so there is > > no need to bother in vma_wants_writenotify() at all. > > > > The "root" would be places where we clear VM_SOFTDIRTY. That should only > > be fs/proc/task_mmu.c:clear_refs_write() IIRC. > > > > So I don't particularly care, I consider this patch a bit cleaner and > > more generic, but I can adjust clear_refs_write() instead of there is a > > preference. > > > > After a closer look, I agree that this may be the simplest/cleanest way to > proceed. I was going to suggest that you note hugetlb does not support > softdirty, but see you did in the comment. > > Acked-by: Mike Kravetz <mike.kravetz@oracle.com> Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to follow to me, since it's not clear why hugetlbfs never wants writenotify. If it's only about soft-dirty, we could have added the hugetlbfs check into vma_soft_dirty_enabled(), then I think it'll achieve the same thing and much clearer - with the soft-dirty check constantly returning false for it, hugetlbfs shared vmas should have vma_wants_writenotify() naturally return 0 already. For the long term - shouldn't we just enable soft-dirty for hugetlbfs? I remember Mike used to have that in todo. Since we've got patch 2 already, I feel like that's really much close (is the only missing piece the clear refs write part? or maybe some more that I didn't notice). Then patch 1 (or IMHO equivalant check in vma_soft_dirty_enabled(), but maybe in stable trees we don't have vma_soft_dirty_enabled then it's exactly patch 1) can be a stable-only backport just to avoid the bug from triggering. Thanks,
On 08/05/22 19:13, Peter Xu wrote: > On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote: > > On 08/05/22 20:57, David Hildenbrand wrote: > > > On 05.08.22 20:33, Mike Kravetz wrote: > > > > On 08/05/22 20:25, David Hildenbrand wrote: > > > >> On 05.08.22 20:23, Mike Kravetz wrote: > > > >>> On 08/05/22 14:14, Peter Xu wrote: > > > >>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: > > > >>>>> diff --git a/mm/mmap.c b/mm/mmap.c > > > >>>>> index 61e6135c54ef..462a6b0344ac 100644 > > > >>>>> --- a/mm/mmap.c > > > >>>>> +++ b/mm/mmap.c > > > >>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) > > > >>>>> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) > > > >>>>> return 0; > > > >>>>> > > > >>>>> + /* > > > >>>>> + * Hugetlb does not require/support writenotify; especially, it does not > > > >>>>> + * support softdirty tracking. > > > >>>>> + */ > > > >>>>> + if (is_vm_hugetlb_page(vma)) > > > >>>>> + return 0; > > > >>>> > > > >>>> I'm kind of confused here.. you seems to be fixing up soft-dirty for > > > >>>> hugetlb but here it's explicitly forbidden. > > > >>>> > > > >>>> Could you explain a bit more on why this patch is needed if (assume > > > >>>> there'll be a working) patch 2 being provided? > > > >>>> > > > >>> > > > >>> No comments on the patch, but ... > > > >>> > > > >>> Since it required little thought, I ran the test program on next-20220802 and > > > >>> was surprised that the issue did not recreate. Even added a simple printk > > > >>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma. > > > >>> We were. > > > >> > > > >> > > > >> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled? > > > >> > > > > > > > > No, Duh! > > > > > > > > FYI - Some time back, I started looking at adding soft dirty support for > > > > hugetlb mappings. I did not finish that work. But, I seem to recall > > > > places where code was operating on hugetlb mappings when perhaps it should > > > > not. > > > > > > > > Perhaps, it would also be good to just disable soft dirty for hugetlb at > > > > the source? > > > > > > I thought about that as well. But I came to the conclusion that without > > > patch #2, hugetlb VMAs cannot possibly support write-notify, so there is > > > no need to bother in vma_wants_writenotify() at all. > > > > > > The "root" would be places where we clear VM_SOFTDIRTY. That should only > > > be fs/proc/task_mmu.c:clear_refs_write() IIRC. > > > > > > So I don't particularly care, I consider this patch a bit cleaner and > > > more generic, but I can adjust clear_refs_write() instead of there is a > > > preference. > > > > > > > After a closer look, I agree that this may be the simplest/cleanest way to > > proceed. I was going to suggest that you note hugetlb does not support > > softdirty, but see you did in the comment. > > > > Acked-by: Mike Kravetz <mike.kravetz@oracle.com> > > Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to > follow to me, since it's not clear why hugetlbfs never wants writenotify. > > If it's only about soft-dirty, we could have added the hugetlbfs check into > vma_soft_dirty_enabled(), then I think it'll achieve the same thing and > much clearer - with the soft-dirty check constantly returning false for it, > hugetlbfs shared vmas should have vma_wants_writenotify() naturally return > 0 already. > > For the long term - shouldn't we just enable soft-dirty for hugetlbfs? I > remember Mike used to have that in todo. Since we've got patch 2 already, > I feel like that's really much close (is the only missing piece the clear > refs write part? or maybe some more that I didn't notice). > > Then patch 1 (or IMHO equivalant check in vma_soft_dirty_enabled(), but > maybe in stable trees we don't have vma_soft_dirty_enabled then it's > exactly patch 1) can be a stable-only backport just to avoid the bug from > triggering. It looks like vma_soft_dirty_enabled is recent and not in any stable trees (or even 5.19). Yes, I did start working on hugetlb softdirty support in the past. https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/ Unfortunately, it got preempted by other things. I will try to move it up the priority list.
On Fri, Aug 05, 2022 at 04:33:56PM -0700, Mike Kravetz wrote: > It looks like vma_soft_dirty_enabled is recent and not in any stable > trees (or even 5.19). > > Yes, I did start working on hugetlb softdirty support in the past. > https://lore.kernel.org/lkml/20210211000322.159437-1-mike.kravetz@oracle.com/ > > Unfortunately, it got preempted by other things. I will try to move it up > the priority list. Thanks, Mike. It'll also makes sense to forbid it if it may take time to finish, so we don't need to push ourselves.
On 06.08.22 01:13, Peter Xu wrote: > On Fri, Aug 05, 2022 at 01:48:35PM -0700, Mike Kravetz wrote: >> On 08/05/22 20:57, David Hildenbrand wrote: >>> On 05.08.22 20:33, Mike Kravetz wrote: >>>> On 08/05/22 20:25, David Hildenbrand wrote: >>>>> On 05.08.22 20:23, Mike Kravetz wrote: >>>>>> On 08/05/22 14:14, Peter Xu wrote: >>>>>>> On Fri, Aug 05, 2022 at 01:03:28PM +0200, David Hildenbrand wrote: >>>>>>>> diff --git a/mm/mmap.c b/mm/mmap.c >>>>>>>> index 61e6135c54ef..462a6b0344ac 100644 >>>>>>>> --- a/mm/mmap.c >>>>>>>> +++ b/mm/mmap.c >>>>>>>> @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) >>>>>>>> if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) >>>>>>>> return 0; >>>>>>>> >>>>>>>> + /* >>>>>>>> + * Hugetlb does not require/support writenotify; especially, it does not >>>>>>>> + * support softdirty tracking. >>>>>>>> + */ >>>>>>>> + if (is_vm_hugetlb_page(vma)) >>>>>>>> + return 0; >>>>>>> >>>>>>> I'm kind of confused here.. you seems to be fixing up soft-dirty for >>>>>>> hugetlb but here it's explicitly forbidden. >>>>>>> >>>>>>> Could you explain a bit more on why this patch is needed if (assume >>>>>>> there'll be a working) patch 2 being provided? >>>>>>> >>>>>> >>>>>> No comments on the patch, but ... >>>>>> >>>>>> Since it required little thought, I ran the test program on next-20220802 and >>>>>> was surprised that the issue did not recreate. Even added a simple printk >>>>>> to make sure we were getting into vma_wants_writenotify with a hugetlb vma. >>>>>> We were. >>>>> >>>>> >>>>> ... does your config have CONFIG_MEM_SOFT_DIRTY enabled? >>>>> >>>> >>>> No, Duh! >>>> >>>> FYI - Some time back, I started looking at adding soft dirty support for >>>> hugetlb mappings. I did not finish that work. But, I seem to recall >>>> places where code was operating on hugetlb mappings when perhaps it should >>>> not. >>>> >>>> Perhaps, it would also be good to just disable soft dirty for hugetlb at >>>> the source? >>> >>> I thought about that as well. But I came to the conclusion that without >>> patch #2, hugetlb VMAs cannot possibly support write-notify, so there is >>> no need to bother in vma_wants_writenotify() at all. >>> >>> The "root" would be places where we clear VM_SOFTDIRTY. That should only >>> be fs/proc/task_mmu.c:clear_refs_write() IIRC. >>> >>> So I don't particularly care, I consider this patch a bit cleaner and >>> more generic, but I can adjust clear_refs_write() instead of there is a >>> preference. >>> >> >> After a closer look, I agree that this may be the simplest/cleanest way to >> proceed. I was going to suggest that you note hugetlb does not support >> softdirty, but see you did in the comment. >> >> Acked-by: Mike Kravetz <mike.kravetz@oracle.com> > > Filtering out hugetlbfs in vma_wants_writenotify() is still a bit hard to > follow to me, since it's not clear why hugetlbfs never wants writenotify. Well, because the write-fault handler as is cannot deal with write-faults in shared mappings. It cannot possibly work or ever have worked. > > If it's only about soft-dirty, we could have added the hugetlbfs check into > vma_soft_dirty_enabled(), then I think it'll achieve the same thing and > much clearer - with the soft-dirty check constantly returning false for it, > hugetlbfs shared vmas should have vma_wants_writenotify() naturally return > 0 already. I considered that an option, but went with this approach for simplicity and because I don't see a need to check for hugetlb in other code paths (especially, in the !hugetlb mprotect variant). I mean, with patch #2 we can remove it anytime we do support softdirty tracking -- or if there is need for write-notify we can move it into the softdirty check only. > > For the long term - shouldn't we just enable soft-dirty for hugetlbfs? I > remember Mike used to have that in todo. Since we've got patch 2 already, > I feel like that's really much close (is the only missing piece the clear > refs write part? or maybe some more that I didn't notice). My gut feeling is that there isn't too much missing to have it working. Define a PTE bit, implement hugetlb variant of clearing, and set it when setting the PTE dirty. Thanks!
On Mon, Aug 08, 2022 at 06:36:58PM +0200, David Hildenbrand wrote: > Well, because the write-fault handler as is cannot deal with > write-faults in shared mappings. It cannot possibly work or ever have > worked. Trivially - maybe drop the word "require" in "Hugetlb does not require/support writenotify"?
On 08.08.22 21:28, Peter Xu wrote: > On Mon, Aug 08, 2022 at 06:36:58PM +0200, David Hildenbrand wrote: >> Well, because the write-fault handler as is cannot deal with >> write-faults in shared mappings. It cannot possibly work or ever have >> worked. > > Trivially - maybe drop the word "require" in "Hugetlb does not > require/support writenotify"? > Sure, can do.
diff --git a/mm/mmap.c b/mm/mmap.c index 61e6135c54ef..462a6b0344ac 100644 --- a/mm/mmap.c +++ b/mm/mmap.c @@ -1683,6 +1683,13 @@ int vma_wants_writenotify(struct vm_area_struct *vma, pgprot_t vm_page_prot) if ((vm_flags & (VM_WRITE|VM_SHARED)) != ((VM_WRITE|VM_SHARED))) return 0; + /* + * Hugetlb does not require/support writenotify; especially, it does not + * support softdirty tracking. + */ + if (is_vm_hugetlb_page(vma)) + return 0; + /* The backer wishes to know when pages are first written to? */ if (vm_ops && (vm_ops->page_mkwrite || vm_ops->pfn_mkwrite)) return 1;
Staring at hugetlb_wp(), one might wonder where all the logic for shared mappings is when stumbling over a write-protected page in a shared mapping. In fact, there is none, and so far we thought we could get away with that because e.g., mprotect() should always do the right thing and map all pages directly writable. Looks like we were wrong: -------------------------------------------------------------------------- #include <stdio.h> #include <stdlib.h> #include <string.h> #include <fcntl.h> #include <unistd.h> #include <errno.h> #include <sys/mman.h> #define HUGETLB_SIZE (2 * 1024 * 1024u) static void clear_softdirty(void) { int fd = open("/proc/self/clear_refs", O_WRONLY); const char *ctrl = "4"; int ret; if (fd < 0) { fprintf(stderr, "open(clear_refs) failed\n"); exit(1); } ret = write(fd, ctrl, strlen(ctrl)); if (ret != strlen(ctrl)) { fprintf(stderr, "write(clear_refs) failed\n"); exit(1); } close(fd); } int main(int argc, char **argv) { char *map; int fd; fd = open("/dev/hugepages/tmp", O_RDWR | O_CREAT); if (!fd) { fprintf(stderr, "open() failed\n"); return -errno; } if (ftruncate(fd, HUGETLB_SIZE)) { fprintf(stderr, "ftruncate() failed\n"); return -errno; } map = mmap(NULL, HUGETLB_SIZE, PROT_READ|PROT_WRITE, MAP_SHARED, fd, 0); if (map == MAP_FAILED) { fprintf(stderr, "mmap() failed\n"); return -errno; } *map = 0; if (mprotect(map, HUGETLB_SIZE, PROT_READ)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; } clear_softdirty(); if (mprotect(map, HUGETLB_SIZE, PROT_READ|PROT_WRITE)) { fprintf(stderr, "mmprotect() failed\n"); return -errno; } *map = 0; return 0; } -------------------------------------------------------------------------- Above test fails with SIGBUS when there is only a single free hugetlb page. # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test Bus error (core dumped) And worse, with sufficient free hugetlb pages it will map an anonymous page into a shared mapping, for example, messing up accounting during unmap and breaking MAP_SHARED semantics: # echo 2 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages # ./test # cat /proc/meminfo | grep HugePages_ HugePages_Total: 2 HugePages_Free: 1 HugePages_Rsvd: 18446744073709551615 HugePages_Surp: 0 Reason in this particular case is that vma_wants_writenotify() will return "true", removing VM_SHARED in vma_set_page_prot() to map pages write-protected. Let's teach vma_wants_writenotify() that hugetlb does not support write-notify, including softdirty tracking. Fixes: 64e455079e1b ("mm: softdirty: enable write notifications on VMAs after VM_SOFTDIRTY cleared") Cc: <stable@vger.kernel.org> # v3.18+ Signed-off-by: David Hildenbrand <david@redhat.com> --- mm/mmap.c | 7 +++++++ 1 file changed, 7 insertions(+)