diff mbox series

[v3,08/23] vfio, mm: fix get_user_pages_remote() and FOLL_LONGTERM

Message ID 20191112000700.3455038-9-jhubbard@nvidia.com (mailing list archive)
State New, archived
Headers show
Series mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM | expand

Commit Message

John Hubbard Nov. 12, 2019, 12:06 a.m. UTC
As it says in the updated comment in gup.c: current FOLL_LONGTERM
behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
FS DAX check requirement on vmas.

However, the corresponding restriction in get_user_pages_remote() was
slightly stricter than is actually required: it forbade all
FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
that do not set the "locked" arg.

Update the code and comments accordingly, and update the VFIO caller
to take advantage of this, fixing a bug as a result: the VFIO caller
is logically a FOLL_LONGTERM user.

Thanks to Jason Gunthorpe for pointing out a clean way to fix this.

Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Jerome Glisse <jglisse@redhat.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++-----------------
 mm/gup.c                        | 13 ++++++++-----
 2 files changed, 21 insertions(+), 22 deletions(-)

Comments

Jason Gunthorpe Nov. 12, 2019, 8:43 p.m. UTC | #1
On Mon, Nov 11, 2019 at 04:06:45PM -0800, John Hubbard wrote:
> As it says in the updated comment in gup.c: current FOLL_LONGTERM
> behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> FS DAX check requirement on vmas.
> 
> However, the corresponding restriction in get_user_pages_remote() was
> slightly stricter than is actually required: it forbade all
> FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> that do not set the "locked" arg.
> 
> Update the code and comments accordingly, and update the VFIO caller
> to take advantage of this, fixing a bug as a result: the VFIO caller
> is logically a FOLL_LONGTERM user.
> 
> Thanks to Jason Gunthorpe for pointing out a clean way to fix this.
> 
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++-----------------
>  mm/gup.c                        | 13 ++++++++-----
>  2 files changed, 21 insertions(+), 22 deletions(-)

This matches what I thought, but I think DanW should check it too, and
the vfio users should test..

> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d864277ea16f..017689b7c32b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>  		flags |= FOLL_WRITE;
>  
>  	down_read(&mm->mmap_sem);
> -	if (mm == current->mm) {
> -		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> -				     vmas);
> -	} else {
> -		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> -					    vmas, NULL);
> -		/*
> -		 * The lifetime of a vaddr_get_pfn() page pin is
> -		 * userspace-controlled. In the fs-dax case this could
> -		 * lead to indefinite stalls in filesystem operations.
> -		 * Disallow attempts to pin fs-dax pages via this
> -		 * interface.
> -		 */
> -		if (ret > 0 && vma_is_fsdax(vmas[0])) {
> -			ret = -EOPNOTSUPP;
> -			put_page(page[0]);
> -		}
> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> +				    page, vmas, NULL);
> +	/*
> +	 * The lifetime of a vaddr_get_pfn() page pin is
> +	 * userspace-controlled. In the fs-dax case this could
> +	 * lead to indefinite stalls in filesystem operations.
> +	 * Disallow attempts to pin fs-dax pages via this
> +	 * interface.
> +	 */
> +	if (ret > 0 && vma_is_fsdax(vmas[0])) {
> +		ret = -EOPNOTSUPP;
> +		put_page(page[0]);
>  	}

AFAIK this chunk is redundant now as it is some hack to emulate
FOLL_LONGTERM? So vmas can be deleted too.

Also unclear why this function has this:

        up_read(&mm->mmap_sem);

        if (ret == 1) {
                *pfn = page_to_pfn(page[0]);
                return 0;
        }

        down_read(&mm->mmap_sem);

Jason
Dan Williams Nov. 12, 2019, 9:57 p.m. UTC | #2
On Mon, Nov 11, 2019 at 4:07 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> As it says in the updated comment in gup.c: current FOLL_LONGTERM
> behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the
> FS DAX check requirement on vmas.
>
> However, the corresponding restriction in get_user_pages_remote() was
> slightly stricter than is actually required: it forbade all
> FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers
> that do not set the "locked" arg.
>
> Update the code and comments accordingly, and update the VFIO caller
> to take advantage of this, fixing a bug as a result: the VFIO caller
> is logically a FOLL_LONGTERM user.
>
> Thanks to Jason Gunthorpe for pointing out a clean way to fix this.
>
> Suggested-by: Jason Gunthorpe <jgg@ziepe.ca>
> Cc: Jerome Glisse <jglisse@redhat.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
> ---
>  drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++-----------------
>  mm/gup.c                        | 13 ++++++++-----
>  2 files changed, 21 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index d864277ea16f..017689b7c32b 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>                 flags |= FOLL_WRITE;
>
>         down_read(&mm->mmap_sem);
> -       if (mm == current->mm) {
> -               ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> -                                    vmas);
> -       } else {
> -               ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> -                                           vmas, NULL);
> -               /*
> -                * The lifetime of a vaddr_get_pfn() page pin is
> -                * userspace-controlled. In the fs-dax case this could
> -                * lead to indefinite stalls in filesystem operations.
> -                * Disallow attempts to pin fs-dax pages via this
> -                * interface.
> -                */
> -               if (ret > 0 && vma_is_fsdax(vmas[0])) {
> -                       ret = -EOPNOTSUPP;
> -                       put_page(page[0]);
> -               }
> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> +                                   page, vmas, NULL);

Hmm, what's the point of passing FOLL_LONGTERM to
get_user_pages_remote() if get_user_pages_remote() is not going to
check the vma? I think we got to this code state because the
get_user_pages() vs get_user_pages_remote() split predated the
introduction of FOLL_LONGTERM.

I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) &&
vma_is_fsdax()) check and that would also remove the need for
__gup_longterm_locked.
John Hubbard Nov. 12, 2019, 10:24 p.m. UTC | #3
On 11/12/19 1:57 PM, Dan Williams wrote:
...
>> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
>> index d864277ea16f..017689b7c32b 100644
>> --- a/drivers/vfio/vfio_iommu_type1.c
>> +++ b/drivers/vfio/vfio_iommu_type1.c
>> @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
>>                 flags |= FOLL_WRITE;
>>
>>         down_read(&mm->mmap_sem);
>> -       if (mm == current->mm) {
>> -               ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
>> -                                    vmas);
>> -       } else {
>> -               ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
>> -                                           vmas, NULL);
>> -               /*
>> -                * The lifetime of a vaddr_get_pfn() page pin is
>> -                * userspace-controlled. In the fs-dax case this could
>> -                * lead to indefinite stalls in filesystem operations.
>> -                * Disallow attempts to pin fs-dax pages via this
>> -                * interface.
>> -                */
>> -               if (ret > 0 && vma_is_fsdax(vmas[0])) {
>> -                       ret = -EOPNOTSUPP;
>> -                       put_page(page[0]);
>> -               }
>> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
>> +                                   page, vmas, NULL);
> 
> Hmm, what's the point of passing FOLL_LONGTERM to
> get_user_pages_remote() if get_user_pages_remote() is not going to
> check the vma? I think we got to this code state because the

FOLL_LONGTERM is short-lived in this location, because patch 23 
("mm/gup: remove support for gup(FOLL_LONGTERM)") removes it, after
callers are changed over to pin_longterm_pages*().

So FOLL_LONGTERM is not doing much now, but it is basically a marker for
"change gup(FOLL_LONGTERM) to pin_longterm_pages()", and patch 18
actually makes that change.

And then pin_longterm_pages*() is, in turn, a way to mark all the 
places that need file system and/or user space interactions (layout
leases, etc), as per "Case 2: RDMA" in the new 
Documentation/vm/pin_user_pages.rst.

> get_user_pages() vs get_user_pages_remote() split predated the
> introduction of FOLL_LONGTERM.

Yes. And I do want clean this up as I go, so we don't end up with
stale concepts lingering in gup.c...

> 
> I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) &&
> vma_is_fsdax()) check and that would also remove the need for
> __gup_longterm_locked.
> 

Good idea, but there is still the call to check_and_migrate_cma_pages(), 
inside __gup_longterm_locked().  So it's a little more involved and
we can't trivially delete __gup_longterm_locked() yet, right?


thanks,
John Hubbard Nov. 12, 2019, 10:42 p.m. UTC | #4
On 11/12/19 12:43 PM, Jason Gunthorpe wrote:
...
>> -		}
>> +	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
>> +				    page, vmas, NULL);
>> +	/*
>> +	 * The lifetime of a vaddr_get_pfn() page pin is
>> +	 * userspace-controlled. In the fs-dax case this could
>> +	 * lead to indefinite stalls in filesystem operations.
>> +	 * Disallow attempts to pin fs-dax pages via this
>> +	 * interface.
>> +	 */
>> +	if (ret > 0 && vma_is_fsdax(vmas[0])) {
>> +		ret = -EOPNOTSUPP;
>> +		put_page(page[0]);
>>  	}
> 
> AFAIK this chunk is redundant now as it is some hack to emulate
> FOLL_LONGTERM? So vmas can be deleted too.

Let me first make sure I understand what Dan has in mind for the vma
checking, in the other thread...

> 
> Also unclear why this function has this:
> 
>         up_read(&mm->mmap_sem);
> 
>         if (ret == 1) {
>                 *pfn = page_to_pfn(page[0]);
>                 return 0;
>         }
> 
>         down_read(&mm->mmap_sem);
> 

Yes, that's really odd. It's not good to release and retake the lock
anyway in general (without re-checking things), and  certainly it is
not required to release mmap_sem in order to call page_to_pfn().

I've removed that up_read()/down_read() pair, for v4.


thanks,
Dan Williams Nov. 12, 2019, 10:43 p.m. UTC | #5
On Tue, Nov 12, 2019 at 2:24 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 11/12/19 1:57 PM, Dan Williams wrote:
> ...
> >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> >> index d864277ea16f..017689b7c32b 100644
> >> --- a/drivers/vfio/vfio_iommu_type1.c
> >> +++ b/drivers/vfio/vfio_iommu_type1.c
> >> @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
> >>                 flags |= FOLL_WRITE;
> >>
> >>         down_read(&mm->mmap_sem);
> >> -       if (mm == current->mm) {
> >> -               ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
> >> -                                    vmas);
> >> -       } else {
> >> -               ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
> >> -                                           vmas, NULL);
> >> -               /*
> >> -                * The lifetime of a vaddr_get_pfn() page pin is
> >> -                * userspace-controlled. In the fs-dax case this could
> >> -                * lead to indefinite stalls in filesystem operations.
> >> -                * Disallow attempts to pin fs-dax pages via this
> >> -                * interface.
> >> -                */
> >> -               if (ret > 0 && vma_is_fsdax(vmas[0])) {
> >> -                       ret = -EOPNOTSUPP;
> >> -                       put_page(page[0]);
> >> -               }
> >> +       ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> >> +                                   page, vmas, NULL);
> >
> > Hmm, what's the point of passing FOLL_LONGTERM to
> > get_user_pages_remote() if get_user_pages_remote() is not going to
> > check the vma? I think we got to this code state because the
>
> FOLL_LONGTERM is short-lived in this location, because patch 23
> ("mm/gup: remove support for gup(FOLL_LONGTERM)") removes it, after
> callers are changed over to pin_longterm_pages*().
>
> So FOLL_LONGTERM is not doing much now, but it is basically a marker for
> "change gup(FOLL_LONGTERM) to pin_longterm_pages()", and patch 18
> actually makes that change.
>
> And then pin_longterm_pages*() is, in turn, a way to mark all the
> places that need file system and/or user space interactions (layout
> leases, etc), as per "Case 2: RDMA" in the new
> Documentation/vm/pin_user_pages.rst.

Ah, sorry. This was the first time I had looked at this series and
jumped in without reading the background.

Your patch as is looks ok, I assume you've removed the FOLL_LONGTERM
warning in get_user_pages_remote in another patch?

>
> > get_user_pages() vs get_user_pages_remote() split predated the
> > introduction of FOLL_LONGTERM.
>
> Yes. And I do want clean this up as I go, so we don't end up with
> stale concepts lingering in gup.c...
>
> >
> > I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) &&
> > vma_is_fsdax()) check and that would also remove the need for
> > __gup_longterm_locked.
> >
>
> Good idea, but there is still the call to check_and_migrate_cma_pages(),
> inside __gup_longterm_locked().  So it's a little more involved and
> we can't trivially delete __gup_longterm_locked() yet, right?

[ add Aneesh ]

Yes, you're right. I had overlooked that had snuck in there. That to
me similarly needs to be pushed down into the core with its own FOLL
flag, or it needs to be an explicit fixup that each caller does after
get_user_pages. The fact that migration silently happens as a side
effect of gup is too magical for my taste.
Dan Williams Nov. 12, 2019, 10:45 p.m. UTC | #6
On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 11/12/19 12:43 PM, Jason Gunthorpe wrote:
> ...
> >> -            }
> >> +    ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> >> +                                page, vmas, NULL);
> >> +    /*
> >> +     * The lifetime of a vaddr_get_pfn() page pin is
> >> +     * userspace-controlled. In the fs-dax case this could
> >> +     * lead to indefinite stalls in filesystem operations.
> >> +     * Disallow attempts to pin fs-dax pages via this
> >> +     * interface.
> >> +     */
> >> +    if (ret > 0 && vma_is_fsdax(vmas[0])) {
> >> +            ret = -EOPNOTSUPP;
> >> +            put_page(page[0]);
> >>      }
> >
> > AFAIK this chunk is redundant now as it is some hack to emulate
> > FOLL_LONGTERM? So vmas can be deleted too.
>
> Let me first make sure I understand what Dan has in mind for the vma
> checking, in the other thread...

It's not redundant relative to upstream which does not do anything the
FOLL_LONGTERM in the gup-slow path... but I have not looked at patches
1-7 to see if something there made it redundant.
John Hubbard Nov. 12, 2019, 11:08 p.m. UTC | #7
On 11/12/19 2:43 PM, Dan Williams wrote:
... 
> Ah, sorry. This was the first time I had looked at this series and
> jumped in without reading the background.
> 
> Your patch as is looks ok, I assume you've removed the FOLL_LONGTERM
> warning in get_user_pages_remote in another patch?
> 

Actually, I haven't gone quite that far. Actually this patch is the last
change to that function. Therefore, at the end of this patchset, 
get_user_pages_remote() ends up with this check in it which
is a less-restrictive version of the warning:

	/*
	 * Current FOLL_LONGTERM behavior is incompatible with
	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
	 * vmas. However, this only comes up if locked is set, and there are
	 * callers that do request FOLL_LONGTERM, but do not set locked. So,
	 * allow what we can.
	 */
	if (gup_flags & FOLL_LONGTERM) {
		if (WARN_ON_ONCE(locked))
			return -EINVAL;
	}

Is that OK, or did you want to go further (possibly in a follow-up
patchset, as I'm hoping to get this one in soon)?  

...
>>> I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) &&
>>> vma_is_fsdax()) check and that would also remove the need for
>>> __gup_longterm_locked.
>>>
>>
>> Good idea, but there is still the call to check_and_migrate_cma_pages(),
>> inside __gup_longterm_locked().  So it's a little more involved and
>> we can't trivially delete __gup_longterm_locked() yet, right?
> 
> [ add Aneesh ]
> 
> Yes, you're right. I had overlooked that had snuck in there. That to
> me similarly needs to be pushed down into the core with its own FOLL
> flag, or it needs to be an explicit fixup that each caller does after
> get_user_pages. The fact that migration silently happens as a side
> effect of gup is too magical for my taste.
> 

Yes. It's an intrusive side effect that is surprising, and not in a 
"happy surprise" way. :) .   Fixing up the CMA pages by splitting that
functionality into separate function calls sounds like an improvement
worth exploring.


thanks,
Dan Williams Nov. 12, 2019, 11:14 p.m. UTC | #8
On Tue, Nov 12, 2019 at 3:08 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 11/12/19 2:43 PM, Dan Williams wrote:
> ...
> > Ah, sorry. This was the first time I had looked at this series and
> > jumped in without reading the background.
> >
> > Your patch as is looks ok, I assume you've removed the FOLL_LONGTERM
> > warning in get_user_pages_remote in another patch?
> >
>
> Actually, I haven't gone quite that far. Actually this patch is the last
> change to that function. Therefore, at the end of this patchset,
> get_user_pages_remote() ends up with this check in it which
> is a less-restrictive version of the warning:
>
>         /*
>          * Current FOLL_LONGTERM behavior is incompatible with
>          * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
>          * vmas. However, this only comes up if locked is set, and there are
>          * callers that do request FOLL_LONGTERM, but do not set locked. So,
>          * allow what we can.
>          */
>         if (gup_flags & FOLL_LONGTERM) {
>                 if (WARN_ON_ONCE(locked))
>                         return -EINVAL;
>         }
>
> Is that OK, or did you want to go further (possibly in a follow-up
> patchset, as I'm hoping to get this one in soon)?

That looks ok. Something to maybe push down into the core in a future
cleanup, but not something that needs to be done now.

> ...
> >>> I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) &&
> >>> vma_is_fsdax()) check and that would also remove the need for
> >>> __gup_longterm_locked.
> >>>
> >>
> >> Good idea, but there is still the call to check_and_migrate_cma_pages(),
> >> inside __gup_longterm_locked().  So it's a little more involved and
> >> we can't trivially delete __gup_longterm_locked() yet, right?
> >
> > [ add Aneesh ]
> >
> > Yes, you're right. I had overlooked that had snuck in there. That to
> > me similarly needs to be pushed down into the core with its own FOLL
> > flag, or it needs to be an explicit fixup that each caller does after
> > get_user_pages. The fact that migration silently happens as a side
> > effect of gup is too magical for my taste.
> >
>
> Yes. It's an intrusive side effect that is surprising, and not in a
> "happy surprise" way. :) .   Fixing up the CMA pages by splitting that
> functionality into separate function calls sounds like an improvement
> worth exploring.

Right, future work.
John Hubbard Nov. 12, 2019, 11:17 p.m. UTC | #9
On 11/12/19 2:45 PM, Dan Williams wrote:
> On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 11/12/19 12:43 PM, Jason Gunthorpe wrote:
>> ...
>>>> -            }
>>>> +    ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
>>>> +                                page, vmas, NULL);
>>>> +    /*
>>>> +     * The lifetime of a vaddr_get_pfn() page pin is
>>>> +     * userspace-controlled. In the fs-dax case this could
>>>> +     * lead to indefinite stalls in filesystem operations.
>>>> +     * Disallow attempts to pin fs-dax pages via this
>>>> +     * interface.
>>>> +     */
>>>> +    if (ret > 0 && vma_is_fsdax(vmas[0])) {
>>>> +            ret = -EOPNOTSUPP;
>>>> +            put_page(page[0]);
>>>>      }
>>>
>>> AFAIK this chunk is redundant now as it is some hack to emulate
>>> FOLL_LONGTERM? So vmas can be deleted too.
>>
>> Let me first make sure I understand what Dan has in mind for the vma
>> checking, in the other thread...
> 
> It's not redundant relative to upstream which does not do anything the
> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches
> 1-7 to see if something there made it redundant.
> 

There is nothing in patches 1-7 that would make it redundant. 

About the only thing that you might find interesting in that subset is
patch 4 ("mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages"),
for devmap and ZONE_DEVICE interest. But it doesn't affect this
discussion directly.


thanks,
John Hubbard Nov. 12, 2019, 11:29 p.m. UTC | #10
On 11/12/19 3:14 PM, Dan Williams wrote:
...
>>
>> Is that OK, or did you want to go further (possibly in a follow-up
>> patchset, as I'm hoping to get this one in soon)?
> 
> That looks ok. Something to maybe push down into the core in a future


Great! I'll post a cleaned up v4 (with the extraneous up_read()/down_read()
removed), then.


> cleanup, but not something that needs to be done now.
> 

Yes. I've put the FOLL_LONGTERM cleanup items on my list now, in case
they don't get done as part of something else. There's a lot more
change coming in this area.


thanks,
Jason Gunthorpe Nov. 12, 2019, 11:42 p.m. UTC | #11
On Tue, Nov 12, 2019 at 02:45:51PM -0800, Dan Williams wrote:
> On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote:
> >
> > On 11/12/19 12:43 PM, Jason Gunthorpe wrote:
> > ...
> > >> -            }
> > >> +    ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> > >> +                                page, vmas, NULL);
> > >> +    /*
> > >> +     * The lifetime of a vaddr_get_pfn() page pin is
> > >> +     * userspace-controlled. In the fs-dax case this could
> > >> +     * lead to indefinite stalls in filesystem operations.
> > >> +     * Disallow attempts to pin fs-dax pages via this
> > >> +     * interface.
> > >> +     */
> > >> +    if (ret > 0 && vma_is_fsdax(vmas[0])) {
> > >> +            ret = -EOPNOTSUPP;
> > >> +            put_page(page[0]);
> > >>      }
> > >
> > > AFAIK this chunk is redundant now as it is some hack to emulate
> > > FOLL_LONGTERM? So vmas can be deleted too.
> >
> > Let me first make sure I understand what Dan has in mind for the vma
> > checking, in the other thread...
> 
> It's not redundant relative to upstream which does not do anything the
> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches
> 1-7 to see if something there made it redundant.

Oh, the hunk John had below for get_user_pages_remote() also needs to
call __gup_longterm_locked() when FOLL_LONGTERM is specified, then
that calls check_dax_vmas() which duplicates the vma_is_fsdax() check
above.

Certainly no caller of FOLL_LONGTERM should have to do dax specific
VMA checking.

Jason
Dan Williams Nov. 13, 2019, 12:58 a.m. UTC | #12
On Tue, Nov 12, 2019 at 3:43 PM Jason Gunthorpe <jgg@ziepe.ca> wrote:
>
> On Tue, Nov 12, 2019 at 02:45:51PM -0800, Dan Williams wrote:
> > On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote:
> > >
> > > On 11/12/19 12:43 PM, Jason Gunthorpe wrote:
> > > ...
> > > >> -            }
> > > >> +    ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
> > > >> +                                page, vmas, NULL);
> > > >> +    /*
> > > >> +     * The lifetime of a vaddr_get_pfn() page pin is
> > > >> +     * userspace-controlled. In the fs-dax case this could
> > > >> +     * lead to indefinite stalls in filesystem operations.
> > > >> +     * Disallow attempts to pin fs-dax pages via this
> > > >> +     * interface.
> > > >> +     */
> > > >> +    if (ret > 0 && vma_is_fsdax(vmas[0])) {
> > > >> +            ret = -EOPNOTSUPP;
> > > >> +            put_page(page[0]);
> > > >>      }
> > > >
> > > > AFAIK this chunk is redundant now as it is some hack to emulate
> > > > FOLL_LONGTERM? So vmas can be deleted too.
> > >
> > > Let me first make sure I understand what Dan has in mind for the vma
> > > checking, in the other thread...
> >
> > It's not redundant relative to upstream which does not do anything the
> > FOLL_LONGTERM in the gup-slow path... but I have not looked at patches
> > 1-7 to see if something there made it redundant.
>
> Oh, the hunk John had below for get_user_pages_remote() also needs to
> call __gup_longterm_locked() when FOLL_LONGTERM is specified, then
> that calls check_dax_vmas() which duplicates the vma_is_fsdax() check
> above.

Oh true, good eye. It is redundant if it does additionally call
__gup_longterm_locked(), and it needs to do that otherwises it undoes
the CMA migration magic that Aneesh added.

>
> Certainly no caller of FOLL_LONGTERM should have to do dax specific
> VMA checking.

Agree, that was my comment about cleaning up the vma_is_fsdax() check
to be internal to the gup core.
John Hubbard Nov. 13, 2019, 1:08 a.m. UTC | #13
On 11/12/19 4:58 PM, Dan Williams wrote:
...
>>> It's not redundant relative to upstream which does not do anything the
>>> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches
>>> 1-7 to see if something there made it redundant.
>>
>> Oh, the hunk John had below for get_user_pages_remote() also needs to
>> call __gup_longterm_locked() when FOLL_LONGTERM is specified, then
>> that calls check_dax_vmas() which duplicates the vma_is_fsdax() check
>> above.
> 
> Oh true, good eye. It is redundant if it does additionally call
> __gup_longterm_locked(), and it needs to do that otherwises it undoes
> the CMA migration magic that Aneesh added.
> 

OK. So just to be clear, I'll be removing this from the patch:

	/*
	 * The lifetime of a vaddr_get_pfn() page pin is
	 * userspace-controlled. In the fs-dax case this could
	 * lead to indefinite stalls in filesystem operations.
	 * Disallow attempts to pin fs-dax pages via this
	 * interface.
	 */
	if (ret > 0 && vma_is_fsdax(vmas[0])) {
		ret = -EOPNOTSUPP;
		put_page(page[0]);
  	}

(and the declaration of "vmas", as well).

thanks,
Dan Williams Nov. 13, 2019, 1:35 a.m. UTC | #14
On Tue, Nov 12, 2019 at 5:08 PM John Hubbard <jhubbard@nvidia.com> wrote:
>
> On 11/12/19 4:58 PM, Dan Williams wrote:
> ...
> >>> It's not redundant relative to upstream which does not do anything the
> >>> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches
> >>> 1-7 to see if something there made it redundant.
> >>
> >> Oh, the hunk John had below for get_user_pages_remote() also needs to
> >> call __gup_longterm_locked() when FOLL_LONGTERM is specified, then
> >> that calls check_dax_vmas() which duplicates the vma_is_fsdax() check
> >> above.
> >
> > Oh true, good eye. It is redundant if it does additionally call
> > __gup_longterm_locked(), and it needs to do that otherwises it undoes
> > the CMA migration magic that Aneesh added.
> >
>
> OK. So just to be clear, I'll be removing this from the patch:
>
>         /*
>          * The lifetime of a vaddr_get_pfn() page pin is
>          * userspace-controlled. In the fs-dax case this could
>          * lead to indefinite stalls in filesystem operations.
>          * Disallow attempts to pin fs-dax pages via this
>          * interface.
>          */
>         if (ret > 0 && vma_is_fsdax(vmas[0])) {
>                 ret = -EOPNOTSUPP;
>                 put_page(page[0]);
>         }
>
> (and the declaration of "vmas", as well).

...and add a call to __gup_longterm_locked internal to
get_user_pages_remote(), right?
John Hubbard Nov. 13, 2019, 2:09 a.m. UTC | #15
On 11/12/19 5:35 PM, Dan Williams wrote:
> On Tue, Nov 12, 2019 at 5:08 PM John Hubbard <jhubbard@nvidia.com> wrote:
>>
>> On 11/12/19 4:58 PM, Dan Williams wrote:
>> ...
>>>>> It's not redundant relative to upstream which does not do anything the
>>>>> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches
>>>>> 1-7 to see if something there made it redundant.
>>>>
>>>> Oh, the hunk John had below for get_user_pages_remote() also needs to
>>>> call __gup_longterm_locked() when FOLL_LONGTERM is specified, then
>>>> that calls check_dax_vmas() which duplicates the vma_is_fsdax() check
>>>> above.
>>>
>>> Oh true, good eye. It is redundant if it does additionally call
>>> __gup_longterm_locked(), and it needs to do that otherwises it undoes
>>> the CMA migration magic that Aneesh added.
>>>
>>
>> OK. So just to be clear, I'll be removing this from the patch:
>>
>>         /*
>>          * The lifetime of a vaddr_get_pfn() page pin is
>>          * userspace-controlled. In the fs-dax case this could
>>          * lead to indefinite stalls in filesystem operations.
>>          * Disallow attempts to pin fs-dax pages via this
>>          * interface.
>>          */
>>         if (ret > 0 && vma_is_fsdax(vmas[0])) {
>>                 ret = -EOPNOTSUPP;
>>                 put_page(page[0]);
>>         }
>>
>> (and the declaration of "vmas", as well).
> 
> ...and add a call to __gup_longterm_locked internal to
> get_user_pages_remote(), right?
> 

Yes, and thanks for double-checking. I think I got a little dizzy following
the call stack there. :)  And now I see that this also affects the
implementation of pin_longterm_pages_remote(), because that will need the
same logic that get_user_pages_remote() has. 



thanks,
diff mbox series

Patch

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index d864277ea16f..017689b7c32b 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -348,24 +348,20 @@  static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr,
 		flags |= FOLL_WRITE;
 
 	down_read(&mm->mmap_sem);
-	if (mm == current->mm) {
-		ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page,
-				     vmas);
-	} else {
-		ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page,
-					    vmas, NULL);
-		/*
-		 * The lifetime of a vaddr_get_pfn() page pin is
-		 * userspace-controlled. In the fs-dax case this could
-		 * lead to indefinite stalls in filesystem operations.
-		 * Disallow attempts to pin fs-dax pages via this
-		 * interface.
-		 */
-		if (ret > 0 && vma_is_fsdax(vmas[0])) {
-			ret = -EOPNOTSUPP;
-			put_page(page[0]);
-		}
+	ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM,
+				    page, vmas, NULL);
+	/*
+	 * The lifetime of a vaddr_get_pfn() page pin is
+	 * userspace-controlled. In the fs-dax case this could
+	 * lead to indefinite stalls in filesystem operations.
+	 * Disallow attempts to pin fs-dax pages via this
+	 * interface.
+	 */
+	if (ret > 0 && vma_is_fsdax(vmas[0])) {
+		ret = -EOPNOTSUPP;
+		put_page(page[0]);
 	}
+
 	up_read(&mm->mmap_sem);
 
 	if (ret == 1) {
diff --git a/mm/gup.c b/mm/gup.c
index 933524de6249..cfe6dc5fc343 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1167,13 +1167,16 @@  long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm,
 		struct vm_area_struct **vmas, int *locked)
 {
 	/*
-	 * FIXME: Current FOLL_LONGTERM behavior is incompatible with
+	 * Current FOLL_LONGTERM behavior is incompatible with
 	 * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on
-	 * vmas.  As there are no users of this flag in this call we simply
-	 * disallow this option for now.
+	 * vmas. However, this only comes up if locked is set, and there are
+	 * callers that do request FOLL_LONGTERM, but do not set locked. So,
+	 * allow what we can.
 	 */
-	if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM))
-		return -EINVAL;
+	if (gup_flags & FOLL_LONGTERM) {
+		if (WARN_ON_ONCE(locked))
+			return -EINVAL;
+	}
 
 	return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas,
 				       locked,