Message ID | 20190212025632.28946-21-peterx@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | userfaultfd: write protection support | expand |
On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote: > From: Shaohua Li <shli@fb.com> > > Add API to enable/disable writeprotect a vma range. Unlike mprotect, > this doesn't split/merge vmas. > > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: Kirill A. Shutemov <kirill@shutemov.name> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Hugh Dickins <hughd@google.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Signed-off-by: Shaohua Li <shli@fb.com> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > [peterx: > - use the helper to find VMA; > - return -ENOENT if not found to match mcopy case; > - use the new MM_CP_UFFD_WP* flags for change_protection > - check against mmap_changing for failures] > Signed-off-by: Peter Xu <peterx@redhat.com> I have a question see below but anyway: Reviewed-by: Jérôme Glisse <jglisse@redhat.com> > --- > include/linux/userfaultfd_k.h | 3 ++ > mm/userfaultfd.c | 54 +++++++++++++++++++++++++++++++++++ > 2 files changed, 57 insertions(+) > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > index 765ce884cec0..8f6e6ed544fb 100644 > --- a/include/linux/userfaultfd_k.h > +++ b/include/linux/userfaultfd_k.h > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > unsigned long dst_start, > unsigned long len, > bool *mmap_changing); > +extern int mwriteprotect_range(struct mm_struct *dst_mm, > + unsigned long start, unsigned long len, > + bool enable_wp, bool *mmap_changing); > > /* mm helpers */ > static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index fefa81c301b7..529d180bb4d7 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > { > return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); > } > + > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > + unsigned long len, bool enable_wp, bool *mmap_changing) > +{ > + struct vm_area_struct *dst_vma; > + pgprot_t newprot; > + int err; > + > + /* > + * Sanitize the command parameters: > + */ > + BUG_ON(start & ~PAGE_MASK); > + BUG_ON(len & ~PAGE_MASK); > + > + /* Does the address range wrap, or is the span zero-sized? */ > + BUG_ON(start + len <= start); > + > + down_read(&dst_mm->mmap_sem); > + > + /* > + * If memory mappings are changing because of non-cooperative > + * operation (e.g. mremap) running in parallel, bail out and > + * request the user to retry later > + */ > + err = -EAGAIN; > + if (mmap_changing && READ_ONCE(*mmap_changing)) > + goto out_unlock; > + > + err = -ENOENT; > + dst_vma = vma_find_uffd(dst_mm, start, len); > + /* > + * Make sure the vma is not shared, that the dst range is > + * both valid and fully within a single existing vma. > + */ > + if (!dst_vma || (dst_vma->vm_flags & VM_SHARED)) > + goto out_unlock; > + if (!userfaultfd_wp(dst_vma)) > + goto out_unlock; > + if (!vma_is_anonymous(dst_vma)) > + goto out_unlock; Don't you want to distinguish between no VMA ie ENOENT and vma that can not be write protected (VM_SHARED, not userfaultfd, not anonymous) ?
On Thu, Feb 21, 2019 at 01:23:59PM -0500, Jerome Glisse wrote: > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote: > > From: Shaohua Li <shli@fb.com> > > > > Add API to enable/disable writeprotect a vma range. Unlike mprotect, > > this doesn't split/merge vmas. > > > > Cc: Andrea Arcangeli <aarcange@redhat.com> > > Cc: Rik van Riel <riel@redhat.com> > > Cc: Kirill A. Shutemov <kirill@shutemov.name> > > Cc: Mel Gorman <mgorman@suse.de> > > Cc: Hugh Dickins <hughd@google.com> > > Cc: Johannes Weiner <hannes@cmpxchg.org> > > Signed-off-by: Shaohua Li <shli@fb.com> > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > [peterx: > > - use the helper to find VMA; > > - return -ENOENT if not found to match mcopy case; > > - use the new MM_CP_UFFD_WP* flags for change_protection > > - check against mmap_changing for failures] > > Signed-off-by: Peter Xu <peterx@redhat.com> > > I have a question see below but anyway: > > Reviewed-by: Jérôme Glisse <jglisse@redhat.com> Thanks! > > > --- > > include/linux/userfaultfd_k.h | 3 ++ > > mm/userfaultfd.c | 54 +++++++++++++++++++++++++++++++++++ > > 2 files changed, 57 insertions(+) > > > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > > index 765ce884cec0..8f6e6ed544fb 100644 > > --- a/include/linux/userfaultfd_k.h > > +++ b/include/linux/userfaultfd_k.h > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > > unsigned long dst_start, > > unsigned long len, > > bool *mmap_changing); > > +extern int mwriteprotect_range(struct mm_struct *dst_mm, > > + unsigned long start, unsigned long len, > > + bool enable_wp, bool *mmap_changing); > > > > /* mm helpers */ > > static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > index fefa81c301b7..529d180bb4d7 100644 > > --- a/mm/userfaultfd.c > > +++ b/mm/userfaultfd.c > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > > { > > return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); > > } > > + > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > > + unsigned long len, bool enable_wp, bool *mmap_changing) > > +{ > > + struct vm_area_struct *dst_vma; > > + pgprot_t newprot; > > + int err; > > + > > + /* > > + * Sanitize the command parameters: > > + */ > > + BUG_ON(start & ~PAGE_MASK); > > + BUG_ON(len & ~PAGE_MASK); > > + > > + /* Does the address range wrap, or is the span zero-sized? */ > > + BUG_ON(start + len <= start); > > + > > + down_read(&dst_mm->mmap_sem); > > + > > + /* > > + * If memory mappings are changing because of non-cooperative > > + * operation (e.g. mremap) running in parallel, bail out and > > + * request the user to retry later > > + */ > > + err = -EAGAIN; > > + if (mmap_changing && READ_ONCE(*mmap_changing)) > > + goto out_unlock; > > + > > + err = -ENOENT; > > + dst_vma = vma_find_uffd(dst_mm, start, len); > > + /* > > + * Make sure the vma is not shared, that the dst range is > > + * both valid and fully within a single existing vma. > > + */ > > + if (!dst_vma || (dst_vma->vm_flags & VM_SHARED)) > > + goto out_unlock; > > + if (!userfaultfd_wp(dst_vma)) > > + goto out_unlock; > > + if (!vma_is_anonymous(dst_vma)) > > + goto out_unlock; > > Don't you want to distinguish between no VMA ie ENOENT and vma that > can not be write protected (VM_SHARED, not userfaultfd, not anonymous) ? Here we'll return ENOENT for all these errors which is actually trying to follow existing MISSING codes. Mike noticed some errno issues during reviewing the first version and suggested that we'd better follow the old rules which makes perfect sense to me. E.g., in __mcopy_atomic() we'll return ENOENT for either (1) VMA not found, (2) not UFFD VMA, (3) range check failures. Checking against anonymous and VM_SHARED are special for uffd-wp but I'm simply using this same errno since after all all these errors will stop us from finding a valid VMA before going anywhere further. Thanks,
On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote: > From: Shaohua Li <shli@fb.com> > > Add API to enable/disable writeprotect a vma range. Unlike mprotect, > this doesn't split/merge vmas. > > Cc: Andrea Arcangeli <aarcange@redhat.com> > Cc: Rik van Riel <riel@redhat.com> > Cc: Kirill A. Shutemov <kirill@shutemov.name> > Cc: Mel Gorman <mgorman@suse.de> > Cc: Hugh Dickins <hughd@google.com> > Cc: Johannes Weiner <hannes@cmpxchg.org> > Signed-off-by: Shaohua Li <shli@fb.com> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > [peterx: > - use the helper to find VMA; > - return -ENOENT if not found to match mcopy case; > - use the new MM_CP_UFFD_WP* flags for change_protection > - check against mmap_changing for failures] > Signed-off-by: Peter Xu <peterx@redhat.com> > --- > include/linux/userfaultfd_k.h | 3 ++ > mm/userfaultfd.c | 54 +++++++++++++++++++++++++++++++++++ > 2 files changed, 57 insertions(+) > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > index 765ce884cec0..8f6e6ed544fb 100644 > --- a/include/linux/userfaultfd_k.h > +++ b/include/linux/userfaultfd_k.h > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > unsigned long dst_start, > unsigned long len, > bool *mmap_changing); > +extern int mwriteprotect_range(struct mm_struct *dst_mm, > + unsigned long start, unsigned long len, > + bool enable_wp, bool *mmap_changing); > > /* mm helpers */ > static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > index fefa81c301b7..529d180bb4d7 100644 > --- a/mm/userfaultfd.c > +++ b/mm/userfaultfd.c > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > { > return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); > } > + > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > + unsigned long len, bool enable_wp, bool *mmap_changing) > +{ > + struct vm_area_struct *dst_vma; > + pgprot_t newprot; > + int err; > + > + /* > + * Sanitize the command parameters: > + */ > + BUG_ON(start & ~PAGE_MASK); > + BUG_ON(len & ~PAGE_MASK); > + > + /* Does the address range wrap, or is the span zero-sized? */ > + BUG_ON(start + len <= start); I'd replace these BUG_ON()s with if (WARN_ON()) return -EINVAL; > + > + down_read(&dst_mm->mmap_sem); > + > + /* > + * If memory mappings are changing because of non-cooperative > + * operation (e.g. mremap) running in parallel, bail out and > + * request the user to retry later > + */ > + err = -EAGAIN; > + if (mmap_changing && READ_ONCE(*mmap_changing)) > + goto out_unlock; > + > + err = -ENOENT; > + dst_vma = vma_find_uffd(dst_mm, start, len); > + /* > + * Make sure the vma is not shared, that the dst range is > + * both valid and fully within a single existing vma. > + */ > + if (!dst_vma || (dst_vma->vm_flags & VM_SHARED)) > + goto out_unlock; > + if (!userfaultfd_wp(dst_vma)) > + goto out_unlock; > + if (!vma_is_anonymous(dst_vma)) > + goto out_unlock; > + > + if (enable_wp) > + newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE)); > + else > + newprot = vm_get_page_prot(dst_vma->vm_flags); > + > + change_protection(dst_vma, start, start + len, newprot, > + enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE); > + > + err = 0; > +out_unlock: > + up_read(&dst_mm->mmap_sem); > + return err; > +} > -- > 2.17.1 >
On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote: > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote: > > From: Shaohua Li <shli@fb.com> > > > > Add API to enable/disable writeprotect a vma range. Unlike mprotect, > > this doesn't split/merge vmas. > > > > Cc: Andrea Arcangeli <aarcange@redhat.com> > > Cc: Rik van Riel <riel@redhat.com> > > Cc: Kirill A. Shutemov <kirill@shutemov.name> > > Cc: Mel Gorman <mgorman@suse.de> > > Cc: Hugh Dickins <hughd@google.com> > > Cc: Johannes Weiner <hannes@cmpxchg.org> > > Signed-off-by: Shaohua Li <shli@fb.com> > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > [peterx: > > - use the helper to find VMA; > > - return -ENOENT if not found to match mcopy case; > > - use the new MM_CP_UFFD_WP* flags for change_protection > > - check against mmap_changing for failures] > > Signed-off-by: Peter Xu <peterx@redhat.com> > > --- > > include/linux/userfaultfd_k.h | 3 ++ > > mm/userfaultfd.c | 54 +++++++++++++++++++++++++++++++++++ > > 2 files changed, 57 insertions(+) > > > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > > index 765ce884cec0..8f6e6ed544fb 100644 > > --- a/include/linux/userfaultfd_k.h > > +++ b/include/linux/userfaultfd_k.h > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > > unsigned long dst_start, > > unsigned long len, > > bool *mmap_changing); > > +extern int mwriteprotect_range(struct mm_struct *dst_mm, > > + unsigned long start, unsigned long len, > > + bool enable_wp, bool *mmap_changing); > > > > /* mm helpers */ > > static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > index fefa81c301b7..529d180bb4d7 100644 > > --- a/mm/userfaultfd.c > > +++ b/mm/userfaultfd.c > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > > { > > return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); > > } > > + > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > > + unsigned long len, bool enable_wp, bool *mmap_changing) > > +{ > > + struct vm_area_struct *dst_vma; > > + pgprot_t newprot; > > + int err; > > + > > + /* > > + * Sanitize the command parameters: > > + */ > > + BUG_ON(start & ~PAGE_MASK); > > + BUG_ON(len & ~PAGE_MASK); > > + > > + /* Does the address range wrap, or is the span zero-sized? */ > > + BUG_ON(start + len <= start); > > I'd replace these BUG_ON()s with > > if (WARN_ON()) > return -EINVAL; I believe BUG_ON() is used because these parameters should have been checked in userfaultfd_writeprotect() already by the common validate_range() even before calling mwriteprotect_range(). So I'm fine with the WARN_ON() approach but I'd slightly prefer to simply keep the patch as is to keep Jerome's r-b if you won't disagree. :) Thanks,
On Tue, Feb 26, 2019 at 02:06:27PM +0800, Peter Xu wrote: > On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote: > > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote: > > > From: Shaohua Li <shli@fb.com> > > > > > > Add API to enable/disable writeprotect a vma range. Unlike mprotect, > > > this doesn't split/merge vmas. > > > > > > Cc: Andrea Arcangeli <aarcange@redhat.com> > > > Cc: Rik van Riel <riel@redhat.com> > > > Cc: Kirill A. Shutemov <kirill@shutemov.name> > > > Cc: Mel Gorman <mgorman@suse.de> > > > Cc: Hugh Dickins <hughd@google.com> > > > Cc: Johannes Weiner <hannes@cmpxchg.org> > > > Signed-off-by: Shaohua Li <shli@fb.com> > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > > [peterx: > > > - use the helper to find VMA; > > > - return -ENOENT if not found to match mcopy case; > > > - use the new MM_CP_UFFD_WP* flags for change_protection > > > - check against mmap_changing for failures] > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > --- > > > include/linux/userfaultfd_k.h | 3 ++ > > > mm/userfaultfd.c | 54 +++++++++++++++++++++++++++++++++++ > > > 2 files changed, 57 insertions(+) > > > > > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > > > index 765ce884cec0..8f6e6ed544fb 100644 > > > --- a/include/linux/userfaultfd_k.h > > > +++ b/include/linux/userfaultfd_k.h > > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > > > unsigned long dst_start, > > > unsigned long len, > > > bool *mmap_changing); > > > +extern int mwriteprotect_range(struct mm_struct *dst_mm, > > > + unsigned long start, unsigned long len, > > > + bool enable_wp, bool *mmap_changing); > > > > > > /* mm helpers */ > > > static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > > index fefa81c301b7..529d180bb4d7 100644 > > > --- a/mm/userfaultfd.c > > > +++ b/mm/userfaultfd.c > > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > > > { > > > return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); > > > } > > > + > > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > > > + unsigned long len, bool enable_wp, bool *mmap_changing) > > > +{ > > > + struct vm_area_struct *dst_vma; > > > + pgprot_t newprot; > > > + int err; > > > + > > > + /* > > > + * Sanitize the command parameters: > > > + */ > > > + BUG_ON(start & ~PAGE_MASK); > > > + BUG_ON(len & ~PAGE_MASK); > > > + > > > + /* Does the address range wrap, or is the span zero-sized? */ > > > + BUG_ON(start + len <= start); > > > > I'd replace these BUG_ON()s with > > > > if (WARN_ON()) > > return -EINVAL; > > I believe BUG_ON() is used because these parameters should have been > checked in userfaultfd_writeprotect() already by the common > validate_range() even before calling mwriteprotect_range(). So I'm > fine with the WARN_ON() approach but I'd slightly prefer to simply > keep the patch as is to keep Jerome's r-b if you won't disagree. :) Right, userfaultfd_writeprotect() should check these parameters and if it didn't it was a bug indeed. But still, it's not severe enough to crash the kernel. I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-) With this change you can also add Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> > Thanks, > > -- > Peter Xu >
On Tue, Feb 26, 2019 at 08:43:47AM +0200, Mike Rapoport wrote: > On Tue, Feb 26, 2019 at 02:06:27PM +0800, Peter Xu wrote: > > On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote: > > > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote: > > > > From: Shaohua Li <shli@fb.com> > > > > > > > > Add API to enable/disable writeprotect a vma range. Unlike mprotect, > > > > this doesn't split/merge vmas. > > > > > > > > Cc: Andrea Arcangeli <aarcange@redhat.com> > > > > Cc: Rik van Riel <riel@redhat.com> > > > > Cc: Kirill A. Shutemov <kirill@shutemov.name> > > > > Cc: Mel Gorman <mgorman@suse.de> > > > > Cc: Hugh Dickins <hughd@google.com> > > > > Cc: Johannes Weiner <hannes@cmpxchg.org> > > > > Signed-off-by: Shaohua Li <shli@fb.com> > > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > > > [peterx: > > > > - use the helper to find VMA; > > > > - return -ENOENT if not found to match mcopy case; > > > > - use the new MM_CP_UFFD_WP* flags for change_protection > > > > - check against mmap_changing for failures] > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > --- > > > > include/linux/userfaultfd_k.h | 3 ++ > > > > mm/userfaultfd.c | 54 +++++++++++++++++++++++++++++++++++ > > > > 2 files changed, 57 insertions(+) > > > > > > > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > > > > index 765ce884cec0..8f6e6ed544fb 100644 > > > > --- a/include/linux/userfaultfd_k.h > > > > +++ b/include/linux/userfaultfd_k.h > > > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > > > > unsigned long dst_start, > > > > unsigned long len, > > > > bool *mmap_changing); > > > > +extern int mwriteprotect_range(struct mm_struct *dst_mm, > > > > + unsigned long start, unsigned long len, > > > > + bool enable_wp, bool *mmap_changing); > > > > > > > > /* mm helpers */ > > > > static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > > > index fefa81c301b7..529d180bb4d7 100644 > > > > --- a/mm/userfaultfd.c > > > > +++ b/mm/userfaultfd.c > > > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > > > > { > > > > return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); > > > > } > > > > + > > > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > > > > + unsigned long len, bool enable_wp, bool *mmap_changing) > > > > +{ > > > > + struct vm_area_struct *dst_vma; > > > > + pgprot_t newprot; > > > > + int err; > > > > + > > > > + /* > > > > + * Sanitize the command parameters: > > > > + */ > > > > + BUG_ON(start & ~PAGE_MASK); > > > > + BUG_ON(len & ~PAGE_MASK); > > > > + > > > > + /* Does the address range wrap, or is the span zero-sized? */ > > > > + BUG_ON(start + len <= start); > > > > > > I'd replace these BUG_ON()s with > > > > > > if (WARN_ON()) > > > return -EINVAL; > > > > I believe BUG_ON() is used because these parameters should have been > > checked in userfaultfd_writeprotect() already by the common > > validate_range() even before calling mwriteprotect_range(). So I'm > > fine with the WARN_ON() approach but I'd slightly prefer to simply > > keep the patch as is to keep Jerome's r-b if you won't disagree. :) > > Right, userfaultfd_writeprotect() should check these parameters and if it > didn't it was a bug indeed. But still, it's not severe enough to crash the > kernel. > > I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-) > > With this change you can also add > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> Thanks! Though before I change anything... please note that the BUG_ON()s are really what we've done in existing MISSING code. One example is userfaultfd_copy() which did validate_range() first, then in __mcopy_atomic() we've used BUG_ON()s. They make sense to me becauase userspace should never be able to trigger it. And if we really want to change the BUG_ON()s in this patch, IMHO we probably want to change the other BUG_ON()s as well, then that can be a standalone patch or patchset to address another issue... (and if we really want to use WARN_ON, I would prefer WARN_ON_ONCE, or directly return the errors to avoid DOS). I'll see how you'd prefer to see how I should move on with this patch. Thanks,
On Tue, Feb 26, 2019 at 03:20:28PM +0800, Peter Xu wrote: > On Tue, Feb 26, 2019 at 08:43:47AM +0200, Mike Rapoport wrote: > > On Tue, Feb 26, 2019 at 02:06:27PM +0800, Peter Xu wrote: > > > On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote: > > > > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote: > > > > > From: Shaohua Li <shli@fb.com> > > > > > > > > > > Add API to enable/disable writeprotect a vma range. Unlike mprotect, > > > > > this doesn't split/merge vmas. > > > > > > > > > > Cc: Andrea Arcangeli <aarcange@redhat.com> > > > > > Cc: Rik van Riel <riel@redhat.com> > > > > > Cc: Kirill A. Shutemov <kirill@shutemov.name> > > > > > Cc: Mel Gorman <mgorman@suse.de> > > > > > Cc: Hugh Dickins <hughd@google.com> > > > > > Cc: Johannes Weiner <hannes@cmpxchg.org> > > > > > Signed-off-by: Shaohua Li <shli@fb.com> > > > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com> > > > > > [peterx: > > > > > - use the helper to find VMA; > > > > > - return -ENOENT if not found to match mcopy case; > > > > > - use the new MM_CP_UFFD_WP* flags for change_protection > > > > > - check against mmap_changing for failures] > > > > > Signed-off-by: Peter Xu <peterx@redhat.com> > > > > > --- > > > > > include/linux/userfaultfd_k.h | 3 ++ > > > > > mm/userfaultfd.c | 54 +++++++++++++++++++++++++++++++++++ > > > > > 2 files changed, 57 insertions(+) > > > > > > > > > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h > > > > > index 765ce884cec0..8f6e6ed544fb 100644 > > > > > --- a/include/linux/userfaultfd_k.h > > > > > +++ b/include/linux/userfaultfd_k.h > > > > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, > > > > > unsigned long dst_start, > > > > > unsigned long len, > > > > > bool *mmap_changing); > > > > > +extern int mwriteprotect_range(struct mm_struct *dst_mm, > > > > > + unsigned long start, unsigned long len, > > > > > + bool enable_wp, bool *mmap_changing); > > > > > > > > > > /* mm helpers */ > > > > > static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, > > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c > > > > > index fefa81c301b7..529d180bb4d7 100644 > > > > > --- a/mm/userfaultfd.c > > > > > +++ b/mm/userfaultfd.c > > > > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, > > > > > { > > > > > return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); > > > > > } > > > > > + > > > > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > > > > > + unsigned long len, bool enable_wp, bool *mmap_changing) > > > > > +{ > > > > > + struct vm_area_struct *dst_vma; > > > > > + pgprot_t newprot; > > > > > + int err; > > > > > + > > > > > + /* > > > > > + * Sanitize the command parameters: > > > > > + */ > > > > > + BUG_ON(start & ~PAGE_MASK); > > > > > + BUG_ON(len & ~PAGE_MASK); > > > > > + > > > > > + /* Does the address range wrap, or is the span zero-sized? */ > > > > > + BUG_ON(start + len <= start); > > > > > > > > I'd replace these BUG_ON()s with > > > > > > > > if (WARN_ON()) > > > > return -EINVAL; > > > > > > I believe BUG_ON() is used because these parameters should have been > > > checked in userfaultfd_writeprotect() already by the common > > > validate_range() even before calling mwriteprotect_range(). So I'm > > > fine with the WARN_ON() approach but I'd slightly prefer to simply > > > keep the patch as is to keep Jerome's r-b if you won't disagree. :) > > > > Right, userfaultfd_writeprotect() should check these parameters and if it > > didn't it was a bug indeed. But still, it's not severe enough to crash the > > kernel. > > > > I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-) > > > > With this change you can also add > > > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> > > Thanks! Though before I change anything... please note that the > BUG_ON()s are really what we've done in existing MISSING code. One > example is userfaultfd_copy() which did validate_range() first, then > in __mcopy_atomic() we've used BUG_ON()s. They make sense to me > becauase userspace should never be able to trigger it. And if we > really want to change the BUG_ON()s in this patch, IMHO we probably > want to change the other BUG_ON()s as well, then that can be a > standalone patch or patchset to address another issue... Yeah, we have quite a lot of them, so doing the replacement in a separate patch makes perfect sense. > (and if we really want to use WARN_ON, I would prefer WARN_ON_ONCE, or > directly return the errors to avoid DOS). Agree. > I'll see how you'd prefer to see how I should move on with this patch. Let's keep this patch as is and make the replacement on top of the WP series. Feel free to add r-b. > Thanks, > > -- > Peter Xu >
On Tue, Feb 26, 2019 at 09:46:12AM +0200, Mike Rapoport wrote: [...] > > > > > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, > > > > > > + unsigned long len, bool enable_wp, bool *mmap_changing) > > > > > > +{ > > > > > > + struct vm_area_struct *dst_vma; > > > > > > + pgprot_t newprot; > > > > > > + int err; > > > > > > + > > > > > > + /* > > > > > > + * Sanitize the command parameters: > > > > > > + */ > > > > > > + BUG_ON(start & ~PAGE_MASK); > > > > > > + BUG_ON(len & ~PAGE_MASK); > > > > > > + > > > > > > + /* Does the address range wrap, or is the span zero-sized? */ > > > > > > + BUG_ON(start + len <= start); > > > > > > > > > > I'd replace these BUG_ON()s with > > > > > > > > > > if (WARN_ON()) > > > > > return -EINVAL; > > > > > > > > I believe BUG_ON() is used because these parameters should have been > > > > checked in userfaultfd_writeprotect() already by the common > > > > validate_range() even before calling mwriteprotect_range(). So I'm > > > > fine with the WARN_ON() approach but I'd slightly prefer to simply > > > > keep the patch as is to keep Jerome's r-b if you won't disagree. :) > > > > > > Right, userfaultfd_writeprotect() should check these parameters and if it > > > didn't it was a bug indeed. But still, it's not severe enough to crash the > > > kernel. > > > > > > I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-) > > > > > > With this change you can also add > > > > > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com> > > > > Thanks! Though before I change anything... please note that the > > BUG_ON()s are really what we've done in existing MISSING code. One > > example is userfaultfd_copy() which did validate_range() first, then > > in __mcopy_atomic() we've used BUG_ON()s. They make sense to me > > becauase userspace should never be able to trigger it. And if we > > really want to change the BUG_ON()s in this patch, IMHO we probably > > want to change the other BUG_ON()s as well, then that can be a > > standalone patch or patchset to address another issue... > > Yeah, we have quite a lot of them, so doing the replacement in a separate > patch makes perfect sense. > > > (and if we really want to use WARN_ON, I would prefer WARN_ON_ONCE, or > > directly return the errors to avoid DOS). > > Agree. > > > I'll see how you'd prefer to see how I should move on with this patch. > > Let's keep this patch as is and make the replacement on top of the WP > series. Feel free to add r-b. Great! I'll do. Thanks,
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 765ce884cec0..8f6e6ed544fb 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long len, bool *mmap_changing); +extern int mwriteprotect_range(struct mm_struct *dst_mm, + unsigned long start, unsigned long len, + bool enable_wp, bool *mmap_changing); /* mm helpers */ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index fefa81c301b7..529d180bb4d7 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, { return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); } + +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, + unsigned long len, bool enable_wp, bool *mmap_changing) +{ + struct vm_area_struct *dst_vma; + pgprot_t newprot; + int err; + + /* + * Sanitize the command parameters: + */ + BUG_ON(start & ~PAGE_MASK); + BUG_ON(len & ~PAGE_MASK); + + /* Does the address range wrap, or is the span zero-sized? */ + BUG_ON(start + len <= start); + + down_read(&dst_mm->mmap_sem); + + /* + * If memory mappings are changing because of non-cooperative + * operation (e.g. mremap) running in parallel, bail out and + * request the user to retry later + */ + err = -EAGAIN; + if (mmap_changing && READ_ONCE(*mmap_changing)) + goto out_unlock; + + err = -ENOENT; + dst_vma = vma_find_uffd(dst_mm, start, len); + /* + * Make sure the vma is not shared, that the dst range is + * both valid and fully within a single existing vma. + */ + if (!dst_vma || (dst_vma->vm_flags & VM_SHARED)) + goto out_unlock; + if (!userfaultfd_wp(dst_vma)) + goto out_unlock; + if (!vma_is_anonymous(dst_vma)) + goto out_unlock; + + if (enable_wp) + newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE)); + else + newprot = vm_get_page_prot(dst_vma->vm_flags); + + change_protection(dst_vma, start, start + len, newprot, + enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE); + + err = 0; +out_unlock: + up_read(&dst_mm->mmap_sem); + return err; +}