[v4,08/23] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM
diff mbox series

Message ID 20191113042710.3997854-9-jhubbard@nvidia.com
State New
Headers show
Series
  • mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM
Related show

Commit Message

John Hubbard Nov. 13, 2019, 4:26 a.m. UTC
As it says in the updated comment in gup.c: current FOLL_LONGTERM
behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
FS DAX check requirement on vmas.

However, the corresponding restriction in get_user_pages_remote() was
slightly stricter than is actually required: it forbade all
FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
that do not set the "locked" arg.

Update the code and comments accordingly, and update the VFIO caller
to take advantage of this, fixing a bug as a result: the VFIO caller
is logically a FOLL_LONGTERM user.

Also, remove an unnessary pair of calls that were releasing and
reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
just in order to call page_to_pfn().

Also, move the DAX check ("if a VMA is DAX, don't allow long term
pinning") from the VFIO call site, all the way into the internals
of get_user_pages_remote() and __gup_longterm_locked(). That is:
get_user_pages_remote() calls __gup_longterm_locked(), which in turn
calls check_dax_vmas(). It's lightly explained in the comments as well.

Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
and to Dan Williams for helping clarify the DAX refactoring.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
 mm/gup.c                        | 27 ++++++++++++++++++++++-----
 2 files changed, 24 insertions(+), 28 deletions(-)

Comments

Jason Gunthorpe Nov. 13, 2019, 1:02 p.m. UTC | #1
On Tue, Nov 12, 2019 at 08:26:55PM -0800, John Hubbard wrote:
> As it says in the updated comment in gup.c: current FOLL_LONGTERM
> behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> FS DAX check requirement on vmas.
> 
> However, the corresponding restriction in get_user_pages_remote() was
> slightly stricter than is actually required: it forbade all
> FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> that do not set the "locked" arg.
> 
> Update the code and comments accordingly, and update the VFIO caller
> to take advantage of this, fixing a bug as a result: the VFIO caller
> is logically a FOLL_LONGTERM user.
> 
> Also, remove an unnessary pair of calls that were releasing and
> reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
> just in order to call page_to_pfn().
> 
> Also, move the DAX check ("if a VMA is DAX, don't allow long term
> pinning") from the VFIO call site, all the way into the internals
> of get_user_pages_remote() and __gup_longterm_locked(). That is:
> get_user_pages_remote() calls __gup_longterm_locked(), which in turn
> calls check_dax_vmas(). It's lightly explained in the comments as well.
> 
> Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
> and to Dan Williams for helping clarify the DAX refactoring.
> 
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>  drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
>  mm/gup.c                        | 27 ++++++++++++++++++++++-----
>  2 files changed, 24 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d864277ea16f..7301b710c9a4 100644
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  {
>  	struct page *page[1];
>  	struct vm_area_struct *vma;
> -	struct vm_area_struct *vmas[1];
>  	unsigned int flags = 0;
>  	int ret;
>  
> @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  		flags |= FOLL_WRITE;
>  
>  	down_read(&mm->mmap_sem);
> -	if (mm == current->mm) {
> -		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> -				     vmas);
> -	} else {
> -		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> -					    vmas, NULL);
> -		/*
> -		 * The lifetime of a vaddr_get_pfn() page pin is
> -		 * userspace-controlled. In the fs-dax case this could
> -		 * lead to indefinite stalls in filesystem operations.
> -		 * Disallow attempts to pin fs-dax pages via this
> -		 * interface.
> -		 */
> -		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> -			ret = -EOPNOTSUPP;
> -			put_page(page[0]);
> -		}
> -	}
> -	up_read(&mm->mmap_sem);
> -
> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> +				    page, NULL, NULL);
>  	if (ret == 1) {
>  		*pfn = page_to_pfn(page[0]);
>  		return 0;

Mind the return with the lock held this needs some goto unwind

Jason
Weiny, Ira Nov. 13, 2019, 6:47 p.m. UTC | #2
On Tue, Nov 12, 2019 at 08:26:55PM -0800, John Hubbard wrote:
> As it says in the updated comment in gup.c: current FOLL_LONGTERM
> behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> FS DAX check requirement on vmas.
> 
> However, the corresponding restriction in get_user_pages_remote() was
> slightly stricter than is actually required: it forbade all
> FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> that do not set the "locked" arg.
> 
> Update the code and comments accordingly, and update the VFIO caller
> to take advantage of this, fixing a bug as a result: the VFIO caller
> is logically a FOLL_LONGTERM user.
> 
> Also, remove an unnessary pair of calls that were releasing and
> reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
> just in order to call page_to_pfn().
> 
> Also, move the DAX check ("if a VMA is DAX, don't allow long term
> pinning") from the VFIO call site, all the way into the internals
> of get_user_pages_remote() and __gup_longterm_locked(). That is:
> get_user_pages_remote() calls __gup_longterm_locked(), which in turn
> calls check_dax_vmas(). It's lightly explained in the comments as well.
> 
> Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
> and to Dan Williams for helping clarify the DAX refactoring.
> 
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
>  mm/gup.c                        | 27 ++++++++++++++++++++++-----
>  2 files changed, 24 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d864277ea16f..7301b710c9a4 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  {
>  	struct page *page[1];
>  	struct vm_area_struct *vma;
> -	struct vm_area_struct *vmas[1];
>  	unsigned int flags = 0;
>  	int ret;
>  
> @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  		flags |= FOLL_WRITE;
>  
>  	down_read(&mm->mmap_sem);
> -	if (mm == current->mm) {
> -		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> -				     vmas);
> -	} else {
> -		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> -					    vmas, NULL);
> -		/*
> -		 * The lifetime of a vaddr_get_pfn() page pin is
> -		 * userspace-controlled. In the fs-dax case this could
> -		 * lead to indefinite stalls in filesystem operations.
> -		 * Disallow attempts to pin fs-dax pages via this
> -		 * interface.
> -		 */
> -		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> -			ret = -EOPNOTSUPP;
> -			put_page(page[0]);
> -		}
> -	}
> -	up_read(&mm->mmap_sem);
> -
> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> +				    page, NULL, NULL);
>  	if (ret == 1) {
>  		*pfn = page_to_pfn(page[0]);
>  		return 0;
>  	}
>  
> -	down_read(&mm->mmap_sem);
> -
>  	vaddr = untagged_addr(vaddr);
>  
>  	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
> diff --git a/mm/gup.c b/mm/gup.c
> index 933524de6249..83702b2e86c8 100644
> --- a/mm/gup.c
> +++ b/mm/gup.c
> @@ -29,6 +29,13 @@ struct follow_page_context {
>  	unsigned int page_mask;
>  };
>  
> +static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
> +						  struct mm_struct *mm,
> +						  unsigned long start,
> +						  unsigned long nr_pages,
> +						  struct page **pages,
> +						  struct vm_area_struct **vmas,
> +						  unsigned int flags);
>  /*
>   * Return the compound head page with ref appropriately incremented,
>   * or NULL if that failed.
> @@ -1167,13 +1174,23 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
>  		struct vm_area_struct **vmas, int *locked)
>  {
>  	/*
> -	 * FIXME: Current FOLL_LONGTERM behavior is incompatible with
> +	 * Parts of FOLL_LONGTERM behavior are incompatible with
>  	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
> -	 * vmas.  As there are no users of this flag in this call we simply
> -	 * disallow this option for now.
> +	 * vmas. However, this only comes up if locked is set, and there are
> +	 * callers that do request FOLL_LONGTERM, but do not set locked. So,
> +	 * allow what we can.
>  	 */
> -	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
> -		return -EINVAL;
> +	if (gup_flags & FOLL_LONGTERM) {
> +		if (WARN_ON_ONCE(locked))
> +			return -EINVAL;
> +		/*
> +		 * This will check the vmas (even if our vmas arg is NULL)
> +		 * and return -ENOTSUPP if DAX isn't allowed in this case:
> +		 */
> +		return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
> +					     vmas, gup_flags | FOLL_TOUCH |
> +					     FOLL_REMOTE);
> +	}
>  
>  	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
>  				       locked,
> -- 
> 2.24.0
>
Weiny, Ira Nov. 13, 2019, 7:17 p.m. UTC | #3
On Wed, Nov 13, 2019 at 09:02:02AM -0400, Jason Gunthorpe wrote:
> On Tue, Nov 12, 2019 at 08:26:55PM -0800, John Hubbard wrote:
> > As it says in the updated comment in gup.c: current FOLL_LONGTERM
> > behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> > FS DAX check requirement on vmas.
> > 
> > However, the corresponding restriction in get_user_pages_remote() was
> > slightly stricter than is actually required: it forbade all
> > FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> > that do not set the "locked" arg.
> > 
> > Update the code and comments accordingly, and update the VFIO caller
> > to take advantage of this, fixing a bug as a result: the VFIO caller
> > is logically a FOLL_LONGTERM user.
> > 
> > Also, remove an unnessary pair of calls that were releasing and
> > reacquiring the mmap_sem. There is no need to avoid holding mmap_sem
> > just in order to call page_to_pfn().
> > 
> > Also, move the DAX check ("if a VMA is DAX, don't allow long term
> > pinning") from the VFIO call site, all the way into the internals
> > of get_user_pages_remote() and __gup_longterm_locked(). That is:
> > get_user_pages_remote() calls __gup_longterm_locked(), which in turn
> > calls check_dax_vmas(). It's lightly explained in the comments as well.
> > 
> > Thanks to Jason Gunthorpe for pointing out a clean way to fix this,
> > and to Dan Williams for helping clarify the DAX refactoring.
> > 
> > Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Jerome Glisse <jglisse@redhat.com>
> > Cc: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> >  drivers/vfio/vfio_iommu_type1.c | 25 ++-----------------------
> >  mm/gup.c                        | 27 ++++++++++++++++++++++-----
> >  2 files changed, 24 insertions(+), 28 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> > index d864277ea16f..7301b710c9a4 100644
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -340,7 +340,6 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> >  {
> >  	struct page *page[1];
> >  	struct vm_area_struct *vma;
> > -	struct vm_area_struct *vmas[1];
> >  	unsigned int flags = 0;
> >  	int ret;
> >  
> > @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> >  		flags |= FOLL_WRITE;
> >  
> >  	down_read(&mm->mmap_sem);
> > -	if (mm == current->mm) {
> > -		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> > -				     vmas);
> > -	} else {
> > -		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> > -					    vmas, NULL);
> > -		/*
> > -		 * The lifetime of a vaddr_get_pfn() page pin is
> > -		 * userspace-controlled. In the fs-dax case this could
> > -		 * lead to indefinite stalls in filesystem operations.
> > -		 * Disallow attempts to pin fs-dax pages via this
> > -		 * interface.
> > -		 */
> > -		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> > -			ret = -EOPNOTSUPP;
> > -			put_page(page[0]);
> > -		}
> > -	}
> > -	up_read(&mm->mmap_sem);
> > -
> > +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> > +				    page, NULL, NULL);
> >  	if (ret == 1) {
> >  		*pfn = page_to_pfn(page[0]);
> >  		return 0;
> 
> Mind the return with the lock held this needs some goto unwind

Ah yea...  retract my reviewed by...  :-(

Ira
John Hubbard Nov. 13, 2019, 8:19 p.m. UTC | #4
On 11/13/19 11:17 AM, Ira Weiny wrote:
...
>>> @@ -348,33 +347,13 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>>>   		flags |= FOLL_WRITE;
>>>   
>>>   	down_read(&mm->mmap_sem);
>>> -	if (mm == current->mm) {
>>> -		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
>>> -				     vmas);
>>> -	} else {
>>> -		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
>>> -					    vmas, NULL);
>>> -		/*
>>> -		 * The lifetime of a vaddr_get_pfn() page pin is
>>> -		 * userspace-controlled. In the fs-dax case this could
>>> -		 * lead to indefinite stalls in filesystem operations.
>>> -		 * Disallow attempts to pin fs-dax pages via this
>>> -		 * interface.
>>> -		 */
>>> -		if (ret > 0 && vma_is_fsdax(vmas[0])) {
>>> -			ret = -EOPNOTSUPP;
>>> -			put_page(page[0]);
>>> -		}
>>> -	}
>>> -	up_read(&mm->mmap_sem);
>>> -
>>> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
>>> +				    page, NULL, NULL);
>>>   	if (ret == 1) {
>>>   		*pfn = page_to_pfn(page[0]);
>>>   		return 0;
>>
>> Mind the return with the lock held this needs some goto unwind
> 
> Ah yea...  retract my reviewed by...  :-(
> 

ooops, embarrassed that I missed that, good catch. Will repost with it fixed.



thanks,

Patch
diff mbox series

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d864277ea16f..7301b710c9a4 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -340,7 +340,6 @@  static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 {
 	struct page *page[1];
 	struct vm_area_struct *vma;
-	struct vm_area_struct *vmas[1];
 	unsigned int flags = 0;
 	int ret;
 
@@ -348,33 +347,13 @@  static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 		flags |= FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
-	if (mm == current->mm) {
-		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
-				     vmas);
-	} else {
-		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-					    vmas, NULL);
-		/*
-		 * The lifetime of a vaddr_get_pfn() page pin is
-		 * userspace-controlled. In the fs-dax case this could
-		 * lead to indefinite stalls in filesystem operations.
-		 * Disallow attempts to pin fs-dax pages via this
-		 * interface.
-		 */
-		if (ret > 0 && vma_is_fsdax(vmas[0])) {
-			ret = -EOPNOTSUPP;
-			put_page(page[0]);
-		}
-	}
-	up_read(&mm->mmap_sem);
-
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+				    page, NULL, NULL);
 	if (ret == 1) {
 		*pfn = page_to_pfn(page[0]);
 		return 0;
 	}
 
-	down_read(&mm->mmap_sem);
-
 	vaddr = untagged_addr(vaddr);
 
 	vma = find_vma_intersection(mm, vaddr, vaddr + 1);
diff --git a/mm/gup.c b/mm/gup.c
index 933524de6249..83702b2e86c8 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -29,6 +29,13 @@  struct follow_page_context {
 	unsigned int page_mask;
 };
 
+static __always_inline long __gup_longterm_locked(struct task_struct *tsk,
+						  struct mm_struct *mm,
+						  unsigned long start,
+						  unsigned long nr_pages,
+						  struct page **pages,
+						  struct vm_area_struct **vmas,
+						  unsigned int flags);
 /*
  * Return the compound head page with ref appropriately incremented,
  * or NULL if that failed.
@@ -1167,13 +1174,23 @@  long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 		struct vm_area_struct **vmas, int *locked)
 {
 	/*
-	 * FIXME: Current FOLL_LONGTERM behavior is incompatible with
+	 * Parts of FOLL_LONGTERM behavior are incompatible with
 	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
-	 * vmas.  As there are no users of this flag in this call we simply
-	 * disallow this option for now.
+	 * vmas. However, this only comes up if locked is set, and there are
+	 * callers that do request FOLL_LONGTERM, but do not set locked. So,
+	 * allow what we can.
 	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
-		return -EINVAL;
+	if (gup_flags & FOLL_LONGTERM) {
+		if (WARN_ON_ONCE(locked))
+			return -EINVAL;
+		/*
+		 * This will check the vmas (even if our vmas arg is NULL)
+		 * and return -ENOTSUPP if DAX isn't allowed in this case:
+		 */
+		return __gup_longterm_locked(tsk, mm, start, nr_pages, pages,
+					     vmas, gup_flags | FOLL_TOUCH |
+					     FOLL_REMOTE);
+	}
 
 	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
 				       locked,