diff mbox series

[v2,20/26] userfaultfd: wp: support write protection for userfault vma range

Message ID 20190212025632.28946-21-peterx@redhat.com (mailing list archive)
State New, archived
Headers show
Series userfaultfd: write protection support | expand

Commit Message

Peter Xu Feb. 12, 2019, 2:56 a.m. UTC
From: Shaohua Li <shli@fb.com>

Add API to enable/disable writeprotect a vma range. Unlike mprotect,
this doesn't split/merge vmas.

Cc: Andrea Arcangeli <aarcange@redhat.com>
Cc: Rik van Riel <riel@redhat.com>
Cc: Kirill A. Shutemov <kirill@shutemov.name>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Hugh Dickins <hughd@google.com>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Shaohua Li <shli@fb.com>
Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
[peterx:
 - use the helper to find VMA;
 - return -ENOENT if not found to match mcopy case;
 - use the new MM_CP_UFFD_WP* flags for change_protection
 - check against mmap_changing for failures]
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 include/linux/userfaultfd_k.h |  3 ++
 mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
 2 files changed, 57 insertions(+)

Comments

Jerome Glisse Feb. 21, 2019, 6:23 p.m. UTC | #1
On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote:
> From: Shaohua Li <shli@fb.com>
> 
> Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> this doesn't split/merge vmas.
> 
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> [peterx:
>  - use the helper to find VMA;
>  - return -ENOENT if not found to match mcopy case;
>  - use the new MM_CP_UFFD_WP* flags for change_protection
>  - check against mmap_changing for failures]
> Signed-off-by: Peter Xu <peterx@redhat.com>

I have a question see below but anyway:

Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

> ---
>  include/linux/userfaultfd_k.h |  3 ++
>  mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 765ce884cec0..8f6e6ed544fb 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
>  			      unsigned long dst_start,
>  			      unsigned long len,
>  			      bool *mmap_changing);
> +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> +			       unsigned long start, unsigned long len,
> +			       bool enable_wp, bool *mmap_changing);
>  
>  /* mm helpers */
>  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index fefa81c301b7..529d180bb4d7 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
>  {
>  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
>  }
> +
> +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> +			unsigned long len, bool enable_wp, bool *mmap_changing)
> +{
> +	struct vm_area_struct *dst_vma;
> +	pgprot_t newprot;
> +	int err;
> +
> +	/*
> +	 * Sanitize the command parameters:
> +	 */
> +	BUG_ON(start & ~PAGE_MASK);
> +	BUG_ON(len & ~PAGE_MASK);
> +
> +	/* Does the address range wrap, or is the span zero-sized? */
> +	BUG_ON(start + len <= start);
> +
> +	down_read(&dst_mm->mmap_sem);
> +
> +	/*
> +	 * If memory mappings are changing because of non-cooperative
> +	 * operation (e.g. mremap) running in parallel, bail out and
> +	 * request the user to retry later
> +	 */
> +	err = -EAGAIN;
> +	if (mmap_changing && READ_ONCE(*mmap_changing))
> +		goto out_unlock;
> +
> +	err = -ENOENT;
> +	dst_vma = vma_find_uffd(dst_mm, start, len);
> +	/*
> +	 * Make sure the vma is not shared, that the dst range is
> +	 * both valid and fully within a single existing vma.
> +	 */
> +	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
> +		goto out_unlock;
> +	if (!userfaultfd_wp(dst_vma))
> +		goto out_unlock;
> +	if (!vma_is_anonymous(dst_vma))
> +		goto out_unlock;

Don't you want to distinguish between no VMA ie ENOENT and vma that
can not be write protected (VM_SHARED, not userfaultfd, not anonymous) ?
Peter Xu Feb. 25, 2019, 8:16 a.m. UTC | #2
On Thu, Feb 21, 2019 at 01:23:59PM -0500, Jerome Glisse wrote:
> On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote:
> > From: Shaohua Li <shli@fb.com>
> > 
> > Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> > this doesn't split/merge vmas.
> > 
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > [peterx:
> >  - use the helper to find VMA;
> >  - return -ENOENT if not found to match mcopy case;
> >  - use the new MM_CP_UFFD_WP* flags for change_protection
> >  - check against mmap_changing for failures]
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> 
> I have a question see below but anyway:
> 
> Reviewed-by: Jérôme Glisse <jglisse@redhat.com>

Thanks!

> 
> > ---
> >  include/linux/userfaultfd_k.h |  3 ++
> >  mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 57 insertions(+)
> > 
> > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > index 765ce884cec0..8f6e6ed544fb 100644
> > --- a/include/linux/userfaultfd_k.h
> > +++ b/include/linux/userfaultfd_k.h
> > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
> >  			      unsigned long dst_start,
> >  			      unsigned long len,
> >  			      bool *mmap_changing);
> > +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> > +			       unsigned long start, unsigned long len,
> > +			       bool enable_wp, bool *mmap_changing);
> >  
> >  /* mm helpers */
> >  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index fefa81c301b7..529d180bb4d7 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
> >  {
> >  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
> >  }
> > +
> > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> > +			unsigned long len, bool enable_wp, bool *mmap_changing)
> > +{
> > +	struct vm_area_struct *dst_vma;
> > +	pgprot_t newprot;
> > +	int err;
> > +
> > +	/*
> > +	 * Sanitize the command parameters:
> > +	 */
> > +	BUG_ON(start & ~PAGE_MASK);
> > +	BUG_ON(len & ~PAGE_MASK);
> > +
> > +	/* Does the address range wrap, or is the span zero-sized? */
> > +	BUG_ON(start + len <= start);
> > +
> > +	down_read(&dst_mm->mmap_sem);
> > +
> > +	/*
> > +	 * If memory mappings are changing because of non-cooperative
> > +	 * operation (e.g. mremap) running in parallel, bail out and
> > +	 * request the user to retry later
> > +	 */
> > +	err = -EAGAIN;
> > +	if (mmap_changing && READ_ONCE(*mmap_changing))
> > +		goto out_unlock;
> > +
> > +	err = -ENOENT;
> > +	dst_vma = vma_find_uffd(dst_mm, start, len);
> > +	/*
> > +	 * Make sure the vma is not shared, that the dst range is
> > +	 * both valid and fully within a single existing vma.
> > +	 */
> > +	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
> > +		goto out_unlock;
> > +	if (!userfaultfd_wp(dst_vma))
> > +		goto out_unlock;
> > +	if (!vma_is_anonymous(dst_vma))
> > +		goto out_unlock;
> 
> Don't you want to distinguish between no VMA ie ENOENT and vma that
> can not be write protected (VM_SHARED, not userfaultfd, not anonymous) ?

Here we'll return ENOENT for all these errors which is actually trying
to follow existing MISSING codes.  Mike noticed some errno issues
during reviewing the first version and suggested that we'd better
follow the old rules which makes perfect sense to me.  E.g., in
__mcopy_atomic() we'll return ENOENT for either (1) VMA not found, (2)
not UFFD VMA, (3) range check failures.  Checking against anonymous
and VM_SHARED are special for uffd-wp but I'm simply using this same
errno since after all all these errors will stop us from finding a
valid VMA before going anywhere further.

Thanks,
Mike Rapoport Feb. 25, 2019, 8:52 p.m. UTC | #3
On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote:
> From: Shaohua Li <shli@fb.com>
> 
> Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> this doesn't split/merge vmas.
> 
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Cc: Rik van Riel <riel@redhat.com>
> Cc: Kirill A. Shutemov <kirill@shutemov.name>
> Cc: Mel Gorman <mgorman@suse.de>
> Cc: Hugh Dickins <hughd@google.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Signed-off-by: Shaohua Li <shli@fb.com>
> Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> [peterx:
>  - use the helper to find VMA;
>  - return -ENOENT if not found to match mcopy case;
>  - use the new MM_CP_UFFD_WP* flags for change_protection
>  - check against mmap_changing for failures]
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
>  include/linux/userfaultfd_k.h |  3 ++
>  mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
>  2 files changed, 57 insertions(+)
> 
> diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> index 765ce884cec0..8f6e6ed544fb 100644
> --- a/include/linux/userfaultfd_k.h
> +++ b/include/linux/userfaultfd_k.h
> @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
>  			      unsigned long dst_start,
>  			      unsigned long len,
>  			      bool *mmap_changing);
> +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> +			       unsigned long start, unsigned long len,
> +			       bool enable_wp, bool *mmap_changing);
> 
>  /* mm helpers */
>  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index fefa81c301b7..529d180bb4d7 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
>  {
>  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
>  }
> +
> +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> +			unsigned long len, bool enable_wp, bool *mmap_changing)
> +{
> +	struct vm_area_struct *dst_vma;
> +	pgprot_t newprot;
> +	int err;
> +
> +	/*
> +	 * Sanitize the command parameters:
> +	 */
> +	BUG_ON(start & ~PAGE_MASK);
> +	BUG_ON(len & ~PAGE_MASK);
> +
> +	/* Does the address range wrap, or is the span zero-sized? */
> +	BUG_ON(start + len <= start);

I'd replace these BUG_ON()s with

	if (WARN_ON())
		 return -EINVAL;

> +
> +	down_read(&dst_mm->mmap_sem);
> +
> +	/*
> +	 * If memory mappings are changing because of non-cooperative
> +	 * operation (e.g. mremap) running in parallel, bail out and
> +	 * request the user to retry later
> +	 */
> +	err = -EAGAIN;
> +	if (mmap_changing && READ_ONCE(*mmap_changing))
> +		goto out_unlock;
> +
> +	err = -ENOENT;
> +	dst_vma = vma_find_uffd(dst_mm, start, len);
> +	/*
> +	 * Make sure the vma is not shared, that the dst range is
> +	 * both valid and fully within a single existing vma.
> +	 */
> +	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
> +		goto out_unlock;
> +	if (!userfaultfd_wp(dst_vma))
> +		goto out_unlock;
> +	if (!vma_is_anonymous(dst_vma))
> +		goto out_unlock;
> +
> +	if (enable_wp)
> +		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
> +	else
> +		newprot = vm_get_page_prot(dst_vma->vm_flags);
> +
> +	change_protection(dst_vma, start, start + len, newprot,
> +			  enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
> +
> +	err = 0;
> +out_unlock:
> +	up_read(&dst_mm->mmap_sem);
> +	return err;
> +}
> -- 
> 2.17.1
>
Peter Xu Feb. 26, 2019, 6:06 a.m. UTC | #4
On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote:
> On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote:
> > From: Shaohua Li <shli@fb.com>
> > 
> > Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> > this doesn't split/merge vmas.
> > 
> > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > Cc: Rik van Riel <riel@redhat.com>
> > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > Cc: Mel Gorman <mgorman@suse.de>
> > Cc: Hugh Dickins <hughd@google.com>
> > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > [peterx:
> >  - use the helper to find VMA;
> >  - return -ENOENT if not found to match mcopy case;
> >  - use the new MM_CP_UFFD_WP* flags for change_protection
> >  - check against mmap_changing for failures]
> > Signed-off-by: Peter Xu <peterx@redhat.com>
> > ---
> >  include/linux/userfaultfd_k.h |  3 ++
> >  mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
> >  2 files changed, 57 insertions(+)
> > 
> > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > index 765ce884cec0..8f6e6ed544fb 100644
> > --- a/include/linux/userfaultfd_k.h
> > +++ b/include/linux/userfaultfd_k.h
> > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
> >  			      unsigned long dst_start,
> >  			      unsigned long len,
> >  			      bool *mmap_changing);
> > +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> > +			       unsigned long start, unsigned long len,
> > +			       bool enable_wp, bool *mmap_changing);
> > 
> >  /* mm helpers */
> >  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > index fefa81c301b7..529d180bb4d7 100644
> > --- a/mm/userfaultfd.c
> > +++ b/mm/userfaultfd.c
> > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
> >  {
> >  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
> >  }
> > +
> > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> > +			unsigned long len, bool enable_wp, bool *mmap_changing)
> > +{
> > +	struct vm_area_struct *dst_vma;
> > +	pgprot_t newprot;
> > +	int err;
> > +
> > +	/*
> > +	 * Sanitize the command parameters:
> > +	 */
> > +	BUG_ON(start & ~PAGE_MASK);
> > +	BUG_ON(len & ~PAGE_MASK);
> > +
> > +	/* Does the address range wrap, or is the span zero-sized? */
> > +	BUG_ON(start + len <= start);
> 
> I'd replace these BUG_ON()s with
> 
> 	if (WARN_ON())
> 		 return -EINVAL;

I believe BUG_ON() is used because these parameters should have been
checked in userfaultfd_writeprotect() already by the common
validate_range() even before calling mwriteprotect_range().  So I'm
fine with the WARN_ON() approach but I'd slightly prefer to simply
keep the patch as is to keep Jerome's r-b if you won't disagree. :)

Thanks,
Mike Rapoport Feb. 26, 2019, 6:43 a.m. UTC | #5
On Tue, Feb 26, 2019 at 02:06:27PM +0800, Peter Xu wrote:
> On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote:
> > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote:
> > > From: Shaohua Li <shli@fb.com>
> > > 
> > > Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> > > this doesn't split/merge vmas.
> > > 
> > > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > > Cc: Rik van Riel <riel@redhat.com>
> > > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > > Cc: Mel Gorman <mgorman@suse.de>
> > > Cc: Hugh Dickins <hughd@google.com>
> > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > Signed-off-by: Shaohua Li <shli@fb.com>
> > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > > [peterx:
> > >  - use the helper to find VMA;
> > >  - return -ENOENT if not found to match mcopy case;
> > >  - use the new MM_CP_UFFD_WP* flags for change_protection
> > >  - check against mmap_changing for failures]
> > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > ---
> > >  include/linux/userfaultfd_k.h |  3 ++
> > >  mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
> > >  2 files changed, 57 insertions(+)
> > > 
> > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > > index 765ce884cec0..8f6e6ed544fb 100644
> > > --- a/include/linux/userfaultfd_k.h
> > > +++ b/include/linux/userfaultfd_k.h
> > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
> > >  			      unsigned long dst_start,
> > >  			      unsigned long len,
> > >  			      bool *mmap_changing);
> > > +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> > > +			       unsigned long start, unsigned long len,
> > > +			       bool enable_wp, bool *mmap_changing);
> > > 
> > >  /* mm helpers */
> > >  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > index fefa81c301b7..529d180bb4d7 100644
> > > --- a/mm/userfaultfd.c
> > > +++ b/mm/userfaultfd.c
> > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
> > >  {
> > >  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
> > >  }
> > > +
> > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> > > +			unsigned long len, bool enable_wp, bool *mmap_changing)
> > > +{
> > > +	struct vm_area_struct *dst_vma;
> > > +	pgprot_t newprot;
> > > +	int err;
> > > +
> > > +	/*
> > > +	 * Sanitize the command parameters:
> > > +	 */
> > > +	BUG_ON(start & ~PAGE_MASK);
> > > +	BUG_ON(len & ~PAGE_MASK);
> > > +
> > > +	/* Does the address range wrap, or is the span zero-sized? */
> > > +	BUG_ON(start + len <= start);
> > 
> > I'd replace these BUG_ON()s with
> > 
> > 	if (WARN_ON())
> > 		 return -EINVAL;
> 
> I believe BUG_ON() is used because these parameters should have been
> checked in userfaultfd_writeprotect() already by the common
> validate_range() even before calling mwriteprotect_range().  So I'm
> fine with the WARN_ON() approach but I'd slightly prefer to simply
> keep the patch as is to keep Jerome's r-b if you won't disagree. :)

Right, userfaultfd_writeprotect() should check these parameters and if it
didn't it was a bug indeed. But still, it's not severe enough to crash the
kernel.

I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-)

With this change you can also add 

Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
 
> Thanks,
> 
> -- 
> Peter Xu
>
Peter Xu Feb. 26, 2019, 7:20 a.m. UTC | #6
On Tue, Feb 26, 2019 at 08:43:47AM +0200, Mike Rapoport wrote:
> On Tue, Feb 26, 2019 at 02:06:27PM +0800, Peter Xu wrote:
> > On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote:
> > > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote:
> > > > From: Shaohua Li <shli@fb.com>
> > > > 
> > > > Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> > > > this doesn't split/merge vmas.
> > > > 
> > > > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > > > Cc: Rik van Riel <riel@redhat.com>
> > > > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > > > Cc: Mel Gorman <mgorman@suse.de>
> > > > Cc: Hugh Dickins <hughd@google.com>
> > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > Signed-off-by: Shaohua Li <shli@fb.com>
> > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > > > [peterx:
> > > >  - use the helper to find VMA;
> > > >  - return -ENOENT if not found to match mcopy case;
> > > >  - use the new MM_CP_UFFD_WP* flags for change_protection
> > > >  - check against mmap_changing for failures]
> > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > ---
> > > >  include/linux/userfaultfd_k.h |  3 ++
> > > >  mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
> > > >  2 files changed, 57 insertions(+)
> > > > 
> > > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > > > index 765ce884cec0..8f6e6ed544fb 100644
> > > > --- a/include/linux/userfaultfd_k.h
> > > > +++ b/include/linux/userfaultfd_k.h
> > > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
> > > >  			      unsigned long dst_start,
> > > >  			      unsigned long len,
> > > >  			      bool *mmap_changing);
> > > > +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> > > > +			       unsigned long start, unsigned long len,
> > > > +			       bool enable_wp, bool *mmap_changing);
> > > > 
> > > >  /* mm helpers */
> > > >  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > > index fefa81c301b7..529d180bb4d7 100644
> > > > --- a/mm/userfaultfd.c
> > > > +++ b/mm/userfaultfd.c
> > > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
> > > >  {
> > > >  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
> > > >  }
> > > > +
> > > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> > > > +			unsigned long len, bool enable_wp, bool *mmap_changing)
> > > > +{
> > > > +	struct vm_area_struct *dst_vma;
> > > > +	pgprot_t newprot;
> > > > +	int err;
> > > > +
> > > > +	/*
> > > > +	 * Sanitize the command parameters:
> > > > +	 */
> > > > +	BUG_ON(start & ~PAGE_MASK);
> > > > +	BUG_ON(len & ~PAGE_MASK);
> > > > +
> > > > +	/* Does the address range wrap, or is the span zero-sized? */
> > > > +	BUG_ON(start + len <= start);
> > > 
> > > I'd replace these BUG_ON()s with
> > > 
> > > 	if (WARN_ON())
> > > 		 return -EINVAL;
> > 
> > I believe BUG_ON() is used because these parameters should have been
> > checked in userfaultfd_writeprotect() already by the common
> > validate_range() even before calling mwriteprotect_range().  So I'm
> > fine with the WARN_ON() approach but I'd slightly prefer to simply
> > keep the patch as is to keep Jerome's r-b if you won't disagree. :)
> 
> Right, userfaultfd_writeprotect() should check these parameters and if it
> didn't it was a bug indeed. But still, it's not severe enough to crash the
> kernel.
> 
> I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-)
> 
> With this change you can also add 
> 
> Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>

Thanks!  Though before I change anything... please note that the
BUG_ON()s are really what we've done in existing MISSING code.  One
example is userfaultfd_copy() which did validate_range() first, then
in __mcopy_atomic() we've used BUG_ON()s.  They make sense to me
becauase userspace should never be able to trigger it.  And if we
really want to change the BUG_ON()s in this patch, IMHO we probably
want to change the other BUG_ON()s as well, then that can be a
standalone patch or patchset to address another issue...

(and if we really want to use WARN_ON, I would prefer WARN_ON_ONCE, or
 directly return the errors to avoid DOS).

I'll see how you'd prefer to see how I should move on with this patch.

Thanks,
Mike Rapoport Feb. 26, 2019, 7:46 a.m. UTC | #7
On Tue, Feb 26, 2019 at 03:20:28PM +0800, Peter Xu wrote:
> On Tue, Feb 26, 2019 at 08:43:47AM +0200, Mike Rapoport wrote:
> > On Tue, Feb 26, 2019 at 02:06:27PM +0800, Peter Xu wrote:
> > > On Mon, Feb 25, 2019 at 10:52:34PM +0200, Mike Rapoport wrote:
> > > > On Tue, Feb 12, 2019 at 10:56:26AM +0800, Peter Xu wrote:
> > > > > From: Shaohua Li <shli@fb.com>
> > > > > 
> > > > > Add API to enable/disable writeprotect a vma range. Unlike mprotect,
> > > > > this doesn't split/merge vmas.
> > > > > 
> > > > > Cc: Andrea Arcangeli <aarcange@redhat.com>
> > > > > Cc: Rik van Riel <riel@redhat.com>
> > > > > Cc: Kirill A. Shutemov <kirill@shutemov.name>
> > > > > Cc: Mel Gorman <mgorman@suse.de>
> > > > > Cc: Hugh Dickins <hughd@google.com>
> > > > > Cc: Johannes Weiner <hannes@cmpxchg.org>
> > > > > Signed-off-by: Shaohua Li <shli@fb.com>
> > > > > Signed-off-by: Andrea Arcangeli <aarcange@redhat.com>
> > > > > [peterx:
> > > > >  - use the helper to find VMA;
> > > > >  - return -ENOENT if not found to match mcopy case;
> > > > >  - use the new MM_CP_UFFD_WP* flags for change_protection
> > > > >  - check against mmap_changing for failures]
> > > > > Signed-off-by: Peter Xu <peterx@redhat.com>
> > > > > ---
> > > > >  include/linux/userfaultfd_k.h |  3 ++
> > > > >  mm/userfaultfd.c              | 54 +++++++++++++++++++++++++++++++++++
> > > > >  2 files changed, 57 insertions(+)
> > > > > 
> > > > > diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
> > > > > index 765ce884cec0..8f6e6ed544fb 100644
> > > > > --- a/include/linux/userfaultfd_k.h
> > > > > +++ b/include/linux/userfaultfd_k.h
> > > > > @@ -39,6 +39,9 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
> > > > >  			      unsigned long dst_start,
> > > > >  			      unsigned long len,
> > > > >  			      bool *mmap_changing);
> > > > > +extern int mwriteprotect_range(struct mm_struct *dst_mm,
> > > > > +			       unsigned long start, unsigned long len,
> > > > > +			       bool enable_wp, bool *mmap_changing);
> > > > > 
> > > > >  /* mm helpers */
> > > > >  static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
> > > > > diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> > > > > index fefa81c301b7..529d180bb4d7 100644
> > > > > --- a/mm/userfaultfd.c
> > > > > +++ b/mm/userfaultfd.c
> > > > > @@ -639,3 +639,57 @@ ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
> > > > >  {
> > > > >  	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
> > > > >  }
> > > > > +
> > > > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> > > > > +			unsigned long len, bool enable_wp, bool *mmap_changing)
> > > > > +{
> > > > > +	struct vm_area_struct *dst_vma;
> > > > > +	pgprot_t newprot;
> > > > > +	int err;
> > > > > +
> > > > > +	/*
> > > > > +	 * Sanitize the command parameters:
> > > > > +	 */
> > > > > +	BUG_ON(start & ~PAGE_MASK);
> > > > > +	BUG_ON(len & ~PAGE_MASK);
> > > > > +
> > > > > +	/* Does the address range wrap, or is the span zero-sized? */
> > > > > +	BUG_ON(start + len <= start);
> > > > 
> > > > I'd replace these BUG_ON()s with
> > > > 
> > > > 	if (WARN_ON())
> > > > 		 return -EINVAL;
> > > 
> > > I believe BUG_ON() is used because these parameters should have been
> > > checked in userfaultfd_writeprotect() already by the common
> > > validate_range() even before calling mwriteprotect_range().  So I'm
> > > fine with the WARN_ON() approach but I'd slightly prefer to simply
> > > keep the patch as is to keep Jerome's r-b if you won't disagree. :)
> > 
> > Right, userfaultfd_writeprotect() should check these parameters and if it
> > didn't it was a bug indeed. But still, it's not severe enough to crash the
> > kernel.
> > 
> > I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-)
> > 
> > With this change you can also add 
> > 
> > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> 
> Thanks!  Though before I change anything... please note that the
> BUG_ON()s are really what we've done in existing MISSING code.  One
> example is userfaultfd_copy() which did validate_range() first, then
> in __mcopy_atomic() we've used BUG_ON()s.  They make sense to me
> becauase userspace should never be able to trigger it.  And if we
> really want to change the BUG_ON()s in this patch, IMHO we probably
> want to change the other BUG_ON()s as well, then that can be a
> standalone patch or patchset to address another issue...

Yeah, we have quite a lot of them, so doing the replacement in a separate
patch makes perfect sense.
 
> (and if we really want to use WARN_ON, I would prefer WARN_ON_ONCE, or
>  directly return the errors to avoid DOS).

Agree.

> I'll see how you'd prefer to see how I should move on with this patch.

Let's keep this patch as is and make the replacement on top of the WP
series. Feel free to add r-b.
 
> Thanks,
> 
> -- 
> Peter Xu
>
Peter Xu Feb. 26, 2019, 7:54 a.m. UTC | #8
On Tue, Feb 26, 2019 at 09:46:12AM +0200, Mike Rapoport wrote:

[...]

> > > > > > +int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
> > > > > > +			unsigned long len, bool enable_wp, bool *mmap_changing)
> > > > > > +{
> > > > > > +	struct vm_area_struct *dst_vma;
> > > > > > +	pgprot_t newprot;
> > > > > > +	int err;
> > > > > > +
> > > > > > +	/*
> > > > > > +	 * Sanitize the command parameters:
> > > > > > +	 */
> > > > > > +	BUG_ON(start & ~PAGE_MASK);
> > > > > > +	BUG_ON(len & ~PAGE_MASK);
> > > > > > +
> > > > > > +	/* Does the address range wrap, or is the span zero-sized? */
> > > > > > +	BUG_ON(start + len <= start);
> > > > > 
> > > > > I'd replace these BUG_ON()s with
> > > > > 
> > > > > 	if (WARN_ON())
> > > > > 		 return -EINVAL;
> > > > 
> > > > I believe BUG_ON() is used because these parameters should have been
> > > > checked in userfaultfd_writeprotect() already by the common
> > > > validate_range() even before calling mwriteprotect_range().  So I'm
> > > > fine with the WARN_ON() approach but I'd slightly prefer to simply
> > > > keep the patch as is to keep Jerome's r-b if you won't disagree. :)
> > > 
> > > Right, userfaultfd_writeprotect() should check these parameters and if it
> > > didn't it was a bug indeed. But still, it's not severe enough to crash the
> > > kernel.
> > > 
> > > I hope Jerome wouldn't mind to keep his r-b with s/BUG_ON/WARN_ON ;-)
> > > 
> > > With this change you can also add 
> > > 
> > > Reviewed-by: Mike Rapoport <rppt@linux.ibm.com>
> > 
> > Thanks!  Though before I change anything... please note that the
> > BUG_ON()s are really what we've done in existing MISSING code.  One
> > example is userfaultfd_copy() which did validate_range() first, then
> > in __mcopy_atomic() we've used BUG_ON()s.  They make sense to me
> > becauase userspace should never be able to trigger it.  And if we
> > really want to change the BUG_ON()s in this patch, IMHO we probably
> > want to change the other BUG_ON()s as well, then that can be a
> > standalone patch or patchset to address another issue...
> 
> Yeah, we have quite a lot of them, so doing the replacement in a separate
> patch makes perfect sense.
>  
> > (and if we really want to use WARN_ON, I would prefer WARN_ON_ONCE, or
> >  directly return the errors to avoid DOS).
> 
> Agree.
> 
> > I'll see how you'd prefer to see how I should move on with this patch.
> 
> Let's keep this patch as is and make the replacement on top of the WP
> series. Feel free to add r-b.

Great!  I'll do.  Thanks,
diff mbox series

Patch

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 765ce884cec0..8f6e6ed544fb 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -39,6 +39,9 @@  extern ssize_t mfill_zeropage(struct mm_struct *dst_mm,
 			      unsigned long dst_start,
 			      unsigned long len,
 			      bool *mmap_changing);
+extern int mwriteprotect_range(struct mm_struct *dst_mm,
+			       unsigned long start, unsigned long len,
+			       bool enable_wp, bool *mmap_changing);
 
 /* mm helpers */
 static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index fefa81c301b7..529d180bb4d7 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -639,3 +639,57 @@  ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start,
 {
 	return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0);
 }
+
+int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
+			unsigned long len, bool enable_wp, bool *mmap_changing)
+{
+	struct vm_area_struct *dst_vma;
+	pgprot_t newprot;
+	int err;
+
+	/*
+	 * Sanitize the command parameters:
+	 */
+	BUG_ON(start & ~PAGE_MASK);
+	BUG_ON(len & ~PAGE_MASK);
+
+	/* Does the address range wrap, or is the span zero-sized? */
+	BUG_ON(start + len <= start);
+
+	down_read(&dst_mm->mmap_sem);
+
+	/*
+	 * If memory mappings are changing because of non-cooperative
+	 * operation (e.g. mremap) running in parallel, bail out and
+	 * request the user to retry later
+	 */
+	err = -EAGAIN;
+	if (mmap_changing && READ_ONCE(*mmap_changing))
+		goto out_unlock;
+
+	err = -ENOENT;
+	dst_vma = vma_find_uffd(dst_mm, start, len);
+	/*
+	 * Make sure the vma is not shared, that the dst range is
+	 * both valid and fully within a single existing vma.
+	 */
+	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+		goto out_unlock;
+	if (!userfaultfd_wp(dst_vma))
+		goto out_unlock;
+	if (!vma_is_anonymous(dst_vma))
+		goto out_unlock;
+
+	if (enable_wp)
+		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
+	else
+		newprot = vm_get_page_prot(dst_vma->vm_flags);
+
+	change_protection(dst_vma, start, start + len, newprot,
+			  enable_wp ? MM_CP_UFFD_WP : MM_CP_UFFD_WP_RESOLVE);
+
+	err = 0;
+out_unlock:
+	up_read(&dst_mm->mmap_sem);
+	return err;
+}