Message ID | 20191112000700.3455038-9-jhubbard@nvidia.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM | expand |
On Mon, Nov 11, 2019 at 04:06:45PM -0800, John Hubbard wrote: > As it says in the updated comment in gup.c: current FOLL_LONGTERM > behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the > FS DAX check requirement on vmas. > > However, the corresponding restriction in get_user_pages_remote() was > slightly stricter than is actually required: it forbade all > FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers > that do not set the "locked" arg. > > Update the code and comments accordingly, and update the VFIO caller > to take advantage of this, fixing a bug as a result: the VFIO caller > is logically a FOLL_LONGTERM user. > > Thanks to Jason Gunthorpe for pointing out a clean way to fix this. > > Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Jerome Glisse <jglisse@redhat.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: John Hubbard <jhubbard@nvidia.com> > --- > drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++----------------- > mm/gup.c | 13 ++++++++----- > 2 files changed, 21 insertions(+), 22 deletions(-) This matches what I thought, but I think DanW should check it too, and the vfio users should test.. > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index d864277ea16f..017689b7c32b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, > flags |= FOLL_WRITE; > > down_read(&mm->mmap_sem); > - if (mm == current->mm) { > - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, > - vmas); > - } else { > - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, > - vmas, NULL); > - /* > - * The lifetime of a vaddr_get_pfn() page pin is > - * userspace-controlled. In the fs-dax case this could > - * lead to indefinite stalls in filesystem operations. > - * Disallow attempts to pin fs-dax pages via this > - * interface. > - */ > - if (ret > 0 && vma_is_fsdax(vmas[0])) { > - ret = -EOPNOTSUPP; > - put_page(page[0]); > - } > + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, > + page, vmas, NULL); > + /* > + * The lifetime of a vaddr_get_pfn() page pin is > + * userspace-controlled. In the fs-dax case this could > + * lead to indefinite stalls in filesystem operations. > + * Disallow attempts to pin fs-dax pages via this > + * interface. > + */ > + if (ret > 0 && vma_is_fsdax(vmas[0])) { > + ret = -EOPNOTSUPP; > + put_page(page[0]); > } AFAIK this chunk is redundant now as it is some hack to emulate FOLL_LONGTERM? So vmas can be deleted too. Also unclear why this function has this: up_read(&mm->mmap_sem); if (ret == 1) { *pfn = page_to_pfn(page[0]); return 0; } down_read(&mm->mmap_sem); Jason
On Mon, Nov 11, 2019 at 4:07 PM John Hubbard <jhubbard@nvidia.com> wrote: > > As it says in the updated comment in gup.c: current FOLL_LONGTERM > behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the > FS DAX check requirement on vmas. > > However, the corresponding restriction in get_user_pages_remote() was > slightly stricter than is actually required: it forbade all > FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers > that do not set the "locked" arg. > > Update the code and comments accordingly, and update the VFIO caller > to take advantage of this, fixing a bug as a result: the VFIO caller > is logically a FOLL_LONGTERM user. > > Thanks to Jason Gunthorpe for pointing out a clean way to fix this. > > Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> > Cc: Jerome Glisse <jglisse@redhat.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: John Hubbard <jhubbard@nvidia.com> > --- > drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++----------------- > mm/gup.c | 13 ++++++++----- > 2 files changed, 21 insertions(+), 22 deletions(-) > > diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > index d864277ea16f..017689b7c32b 100644 > --- a/drivers/vfio/vfio_iommu_type1.c > +++ b/drivers/vfio/vfio_iommu_type1.c > @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, > flags |= FOLL_WRITE; > > down_read(&mm->mmap_sem); > - if (mm == current->mm) { > - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, > - vmas); > - } else { > - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, > - vmas, NULL); > - /* > - * The lifetime of a vaddr_get_pfn() page pin is > - * userspace-controlled. In the fs-dax case this could > - * lead to indefinite stalls in filesystem operations. > - * Disallow attempts to pin fs-dax pages via this > - * interface. > - */ > - if (ret > 0 && vma_is_fsdax(vmas[0])) { > - ret = -EOPNOTSUPP; > - put_page(page[0]); > - } > + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, > + page, vmas, NULL); Hmm, what's the point of passing FOLL_LONGTERM to get_user_pages_remote() if get_user_pages_remote() is not going to check the vma? I think we got to this code state because the get_user_pages() vs get_user_pages_remote() split predated the introduction of FOLL_LONGTERM. I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) && vma_is_fsdax()) check and that would also remove the need for __gup_longterm_locked.
On 11/12/19 1:57 PM, Dan Williams wrote: ... >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c >> index d864277ea16f..017689b7c32b 100644 >> --- a/drivers/vfio/vfio_iommu_type1.c >> +++ b/drivers/vfio/vfio_iommu_type1.c >> @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, >> flags |= FOLL_WRITE; >> >> down_read(&mm->mmap_sem); >> - if (mm == current->mm) { >> - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, >> - vmas); >> - } else { >> - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, >> - vmas, NULL); >> - /* >> - * The lifetime of a vaddr_get_pfn() page pin is >> - * userspace-controlled. In the fs-dax case this could >> - * lead to indefinite stalls in filesystem operations. >> - * Disallow attempts to pin fs-dax pages via this >> - * interface. >> - */ >> - if (ret > 0 && vma_is_fsdax(vmas[0])) { >> - ret = -EOPNOTSUPP; >> - put_page(page[0]); >> - } >> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, >> + page, vmas, NULL); > > Hmm, what's the point of passing FOLL_LONGTERM to > get_user_pages_remote() if get_user_pages_remote() is not going to > check the vma? I think we got to this code state because the FOLL_LONGTERM is short-lived in this location, because patch 23 ("mm/gup: remove support for gup(FOLL_LONGTERM)") removes it, after callers are changed over to pin_longterm_pages*(). So FOLL_LONGTERM is not doing much now, but it is basically a marker for "change gup(FOLL_LONGTERM) to pin_longterm_pages()", and patch 18 actually makes that change. And then pin_longterm_pages*() is, in turn, a way to mark all the places that need file system and/or user space interactions (layout leases, etc), as per "Case 2: RDMA" in the new Documentation/vm/pin_user_pages.rst. > get_user_pages() vs get_user_pages_remote() split predated the > introduction of FOLL_LONGTERM. Yes. And I do want clean this up as I go, so we don't end up with stale concepts lingering in gup.c... > > I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) && > vma_is_fsdax()) check and that would also remove the need for > __gup_longterm_locked. > Good idea, but there is still the call to check_and_migrate_cma_pages(), inside __gup_longterm_locked(). So it's a little more involved and we can't trivially delete __gup_longterm_locked() yet, right? thanks,
On 11/12/19 12:43 PM, Jason Gunthorpe wrote: ... >> - } >> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, >> + page, vmas, NULL); >> + /* >> + * The lifetime of a vaddr_get_pfn() page pin is >> + * userspace-controlled. In the fs-dax case this could >> + * lead to indefinite stalls in filesystem operations. >> + * Disallow attempts to pin fs-dax pages via this >> + * interface. >> + */ >> + if (ret > 0 && vma_is_fsdax(vmas[0])) { >> + ret = -EOPNOTSUPP; >> + put_page(page[0]); >> } > > AFAIK this chunk is redundant now as it is some hack to emulate > FOLL_LONGTERM? So vmas can be deleted too. Let me first make sure I understand what Dan has in mind for the vma checking, in the other thread... > > Also unclear why this function has this: > > up_read(&mm->mmap_sem); > > if (ret == 1) { > *pfn = page_to_pfn(page[0]); > return 0; > } > > down_read(&mm->mmap_sem); > Yes, that's really odd. It's not good to release and retake the lock anyway in general (without re-checking things), and certainly it is not required to release mmap_sem in order to call page_to_pfn(). I've removed that up_read()/down_read() pair, for v4. thanks,
On Tue, Nov 12, 2019 at 2:24 PM John Hubbard <jhubbard@nvidia.com> wrote: > > On 11/12/19 1:57 PM, Dan Williams wrote: > ... > >> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c > >> index d864277ea16f..017689b7c32b 100644 > >> --- a/drivers/vfio/vfio_iommu_type1.c > >> +++ b/drivers/vfio/vfio_iommu_type1.c > >> @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, > >> flags |= FOLL_WRITE; > >> > >> down_read(&mm->mmap_sem); > >> - if (mm == current->mm) { > >> - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, > >> - vmas); > >> - } else { > >> - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, > >> - vmas, NULL); > >> - /* > >> - * The lifetime of a vaddr_get_pfn() page pin is > >> - * userspace-controlled. In the fs-dax case this could > >> - * lead to indefinite stalls in filesystem operations. > >> - * Disallow attempts to pin fs-dax pages via this > >> - * interface. > >> - */ > >> - if (ret > 0 && vma_is_fsdax(vmas[0])) { > >> - ret = -EOPNOTSUPP; > >> - put_page(page[0]); > >> - } > >> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, > >> + page, vmas, NULL); > > > > Hmm, what's the point of passing FOLL_LONGTERM to > > get_user_pages_remote() if get_user_pages_remote() is not going to > > check the vma? I think we got to this code state because the > > FOLL_LONGTERM is short-lived in this location, because patch 23 > ("mm/gup: remove support for gup(FOLL_LONGTERM)") removes it, after > callers are changed over to pin_longterm_pages*(). > > So FOLL_LONGTERM is not doing much now, but it is basically a marker for > "change gup(FOLL_LONGTERM) to pin_longterm_pages()", and patch 18 > actually makes that change. > > And then pin_longterm_pages*() is, in turn, a way to mark all the > places that need file system and/or user space interactions (layout > leases, etc), as per "Case 2: RDMA" in the new > Documentation/vm/pin_user_pages.rst. Ah, sorry. This was the first time I had looked at this series and jumped in without reading the background. Your patch as is looks ok, I assume you've removed the FOLL_LONGTERM warning in get_user_pages_remote in another patch? > > > get_user_pages() vs get_user_pages_remote() split predated the > > introduction of FOLL_LONGTERM. > > Yes. And I do want clean this up as I go, so we don't end up with > stale concepts lingering in gup.c... > > > > > I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) && > > vma_is_fsdax()) check and that would also remove the need for > > __gup_longterm_locked. > > > > Good idea, but there is still the call to check_and_migrate_cma_pages(), > inside __gup_longterm_locked(). So it's a little more involved and > we can't trivially delete __gup_longterm_locked() yet, right? [ add Aneesh ] Yes, you're right. I had overlooked that had snuck in there. That to me similarly needs to be pushed down into the core with its own FOLL flag, or it needs to be an explicit fixup that each caller does after get_user_pages. The fact that migration silently happens as a side effect of gup is too magical for my taste.
On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote: > > On 11/12/19 12:43 PM, Jason Gunthorpe wrote: > ... > >> - } > >> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, > >> + page, vmas, NULL); > >> + /* > >> + * The lifetime of a vaddr_get_pfn() page pin is > >> + * userspace-controlled. In the fs-dax case this could > >> + * lead to indefinite stalls in filesystem operations. > >> + * Disallow attempts to pin fs-dax pages via this > >> + * interface. > >> + */ > >> + if (ret > 0 && vma_is_fsdax(vmas[0])) { > >> + ret = -EOPNOTSUPP; > >> + put_page(page[0]); > >> } > > > > AFAIK this chunk is redundant now as it is some hack to emulate > > FOLL_LONGTERM? So vmas can be deleted too. > > Let me first make sure I understand what Dan has in mind for the vma > checking, in the other thread... It's not redundant relative to upstream which does not do anything the FOLL_LONGTERM in the gup-slow path... but I have not looked at patches 1-7 to see if something there made it redundant.
On 11/12/19 2:43 PM, Dan Williams wrote: ... > Ah, sorry. This was the first time I had looked at this series and > jumped in without reading the background. > > Your patch as is looks ok, I assume you've removed the FOLL_LONGTERM > warning in get_user_pages_remote in another patch? > Actually, I haven't gone quite that far. Actually this patch is the last change to that function. Therefore, at the end of this patchset, get_user_pages_remote() ends up with this check in it which is a less-restrictive version of the warning: /* * Current FOLL_LONGTERM behavior is incompatible with * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on * vmas. However, this only comes up if locked is set, and there are * callers that do request FOLL_LONGTERM, but do not set locked. So, * allow what we can. */ if (gup_flags & FOLL_LONGTERM) { if (WARN_ON_ONCE(locked)) return -EINVAL; } Is that OK, or did you want to go further (possibly in a follow-up patchset, as I'm hoping to get this one in soon)? ... >>> I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) && >>> vma_is_fsdax()) check and that would also remove the need for >>> __gup_longterm_locked. >>> >> >> Good idea, but there is still the call to check_and_migrate_cma_pages(), >> inside __gup_longterm_locked(). So it's a little more involved and >> we can't trivially delete __gup_longterm_locked() yet, right? > > [ add Aneesh ] > > Yes, you're right. I had overlooked that had snuck in there. That to > me similarly needs to be pushed down into the core with its own FOLL > flag, or it needs to be an explicit fixup that each caller does after > get_user_pages. The fact that migration silently happens as a side > effect of gup is too magical for my taste. > Yes. It's an intrusive side effect that is surprising, and not in a "happy surprise" way. :) . Fixing up the CMA pages by splitting that functionality into separate function calls sounds like an improvement worth exploring. thanks,
On Tue, Nov 12, 2019 at 3:08 PM John Hubbard <jhubbard@nvidia.com> wrote: > > On 11/12/19 2:43 PM, Dan Williams wrote: > ... > > Ah, sorry. This was the first time I had looked at this series and > > jumped in without reading the background. > > > > Your patch as is looks ok, I assume you've removed the FOLL_LONGTERM > > warning in get_user_pages_remote in another patch? > > > > Actually, I haven't gone quite that far. Actually this patch is the last > change to that function. Therefore, at the end of this patchset, > get_user_pages_remote() ends up with this check in it which > is a less-restrictive version of the warning: > > /* > * Current FOLL_LONGTERM behavior is incompatible with > * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on > * vmas. However, this only comes up if locked is set, and there are > * callers that do request FOLL_LONGTERM, but do not set locked. So, > * allow what we can. > */ > if (gup_flags & FOLL_LONGTERM) { > if (WARN_ON_ONCE(locked)) > return -EINVAL; > } > > Is that OK, or did you want to go further (possibly in a follow-up > patchset, as I'm hoping to get this one in soon)? That looks ok. Something to maybe push down into the core in a future cleanup, but not something that needs to be done now. > ... > >>> I think check_vma_flags() should do the ((FOLL_LONGTERM | FOLL_GET) && > >>> vma_is_fsdax()) check and that would also remove the need for > >>> __gup_longterm_locked. > >>> > >> > >> Good idea, but there is still the call to check_and_migrate_cma_pages(), > >> inside __gup_longterm_locked(). So it's a little more involved and > >> we can't trivially delete __gup_longterm_locked() yet, right? > > > > [ add Aneesh ] > > > > Yes, you're right. I had overlooked that had snuck in there. That to > > me similarly needs to be pushed down into the core with its own FOLL > > flag, or it needs to be an explicit fixup that each caller does after > > get_user_pages. The fact that migration silently happens as a side > > effect of gup is too magical for my taste. > > > > Yes. It's an intrusive side effect that is surprising, and not in a > "happy surprise" way. :) . Fixing up the CMA pages by splitting that > functionality into separate function calls sounds like an improvement > worth exploring. Right, future work.
On 11/12/19 2:45 PM, Dan Williams wrote: > On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote: >> >> On 11/12/19 12:43 PM, Jason Gunthorpe wrote: >> ... >>>> - } >>>> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, >>>> + page, vmas, NULL); >>>> + /* >>>> + * The lifetime of a vaddr_get_pfn() page pin is >>>> + * userspace-controlled. In the fs-dax case this could >>>> + * lead to indefinite stalls in filesystem operations. >>>> + * Disallow attempts to pin fs-dax pages via this >>>> + * interface. >>>> + */ >>>> + if (ret > 0 && vma_is_fsdax(vmas[0])) { >>>> + ret = -EOPNOTSUPP; >>>> + put_page(page[0]); >>>> } >>> >>> AFAIK this chunk is redundant now as it is some hack to emulate >>> FOLL_LONGTERM? So vmas can be deleted too. >> >> Let me first make sure I understand what Dan has in mind for the vma >> checking, in the other thread... > > It's not redundant relative to upstream which does not do anything the > FOLL_LONGTERM in the gup-slow path... but I have not looked at patches > 1-7 to see if something there made it redundant. > There is nothing in patches 1-7 that would make it redundant. About the only thing that you might find interesting in that subset is patch 4 ("mm: devmap: refactor 1-based refcounting for ZONE_DEVICE pages"), for devmap and ZONE_DEVICE interest. But it doesn't affect this discussion directly. thanks,
On 11/12/19 3:14 PM, Dan Williams wrote: ... >> >> Is that OK, or did you want to go further (possibly in a follow-up >> patchset, as I'm hoping to get this one in soon)? > > That looks ok. Something to maybe push down into the core in a future Great! I'll post a cleaned up v4 (with the extraneous up_read()/down_read() removed), then. > cleanup, but not something that needs to be done now. > Yes. I've put the FOLL_LONGTERM cleanup items on my list now, in case they don't get done as part of something else. There's a lot more change coming in this area. thanks,
On Tue, Nov 12, 2019 at 02:45:51PM -0800, Dan Williams wrote: > On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote: > > > > On 11/12/19 12:43 PM, Jason Gunthorpe wrote: > > ... > > >> - } > > >> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, > > >> + page, vmas, NULL); > > >> + /* > > >> + * The lifetime of a vaddr_get_pfn() page pin is > > >> + * userspace-controlled. In the fs-dax case this could > > >> + * lead to indefinite stalls in filesystem operations. > > >> + * Disallow attempts to pin fs-dax pages via this > > >> + * interface. > > >> + */ > > >> + if (ret > 0 && vma_is_fsdax(vmas[0])) { > > >> + ret = -EOPNOTSUPP; > > >> + put_page(page[0]); > > >> } > > > > > > AFAIK this chunk is redundant now as it is some hack to emulate > > > FOLL_LONGTERM? So vmas can be deleted too. > > > > Let me first make sure I understand what Dan has in mind for the vma > > checking, in the other thread... > > It's not redundant relative to upstream which does not do anything the > FOLL_LONGTERM in the gup-slow path... but I have not looked at patches > 1-7 to see if something there made it redundant. Oh, the hunk John had below for get_user_pages_remote() also needs to call __gup_longterm_locked() when FOLL_LONGTERM is specified, then that calls check_dax_vmas() which duplicates the vma_is_fsdax() check above. Certainly no caller of FOLL_LONGTERM should have to do dax specific VMA checking. Jason
On Tue, Nov 12, 2019 at 3:43 PM Jason Gunthorpe <jgg@ziepe.ca> wrote: > > On Tue, Nov 12, 2019 at 02:45:51PM -0800, Dan Williams wrote: > > On Tue, Nov 12, 2019 at 2:43 PM John Hubbard <jhubbard@nvidia.com> wrote: > > > > > > On 11/12/19 12:43 PM, Jason Gunthorpe wrote: > > > ... > > > >> - } > > > >> + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, > > > >> + page, vmas, NULL); > > > >> + /* > > > >> + * The lifetime of a vaddr_get_pfn() page pin is > > > >> + * userspace-controlled. In the fs-dax case this could > > > >> + * lead to indefinite stalls in filesystem operations. > > > >> + * Disallow attempts to pin fs-dax pages via this > > > >> + * interface. > > > >> + */ > > > >> + if (ret > 0 && vma_is_fsdax(vmas[0])) { > > > >> + ret = -EOPNOTSUPP; > > > >> + put_page(page[0]); > > > >> } > > > > > > > > AFAIK this chunk is redundant now as it is some hack to emulate > > > > FOLL_LONGTERM? So vmas can be deleted too. > > > > > > Let me first make sure I understand what Dan has in mind for the vma > > > checking, in the other thread... > > > > It's not redundant relative to upstream which does not do anything the > > FOLL_LONGTERM in the gup-slow path... but I have not looked at patches > > 1-7 to see if something there made it redundant. > > Oh, the hunk John had below for get_user_pages_remote() also needs to > call __gup_longterm_locked() when FOLL_LONGTERM is specified, then > that calls check_dax_vmas() which duplicates the vma_is_fsdax() check > above. Oh true, good eye. It is redundant if it does additionally call __gup_longterm_locked(), and it needs to do that otherwises it undoes the CMA migration magic that Aneesh added. > > Certainly no caller of FOLL_LONGTERM should have to do dax specific > VMA checking. Agree, that was my comment about cleaning up the vma_is_fsdax() check to be internal to the gup core.
On 11/12/19 4:58 PM, Dan Williams wrote: ... >>> It's not redundant relative to upstream which does not do anything the >>> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches >>> 1-7 to see if something there made it redundant. >> >> Oh, the hunk John had below for get_user_pages_remote() also needs to >> call __gup_longterm_locked() when FOLL_LONGTERM is specified, then >> that calls check_dax_vmas() which duplicates the vma_is_fsdax() check >> above. > > Oh true, good eye. It is redundant if it does additionally call > __gup_longterm_locked(), and it needs to do that otherwises it undoes > the CMA migration magic that Aneesh added. > OK. So just to be clear, I'll be removing this from the patch: /* * The lifetime of a vaddr_get_pfn() page pin is * userspace-controlled. In the fs-dax case this could * lead to indefinite stalls in filesystem operations. * Disallow attempts to pin fs-dax pages via this * interface. */ if (ret > 0 && vma_is_fsdax(vmas[0])) { ret = -EOPNOTSUPP; put_page(page[0]); } (and the declaration of "vmas", as well). thanks,
On Tue, Nov 12, 2019 at 5:08 PM John Hubbard <jhubbard@nvidia.com> wrote: > > On 11/12/19 4:58 PM, Dan Williams wrote: > ... > >>> It's not redundant relative to upstream which does not do anything the > >>> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches > >>> 1-7 to see if something there made it redundant. > >> > >> Oh, the hunk John had below for get_user_pages_remote() also needs to > >> call __gup_longterm_locked() when FOLL_LONGTERM is specified, then > >> that calls check_dax_vmas() which duplicates the vma_is_fsdax() check > >> above. > > > > Oh true, good eye. It is redundant if it does additionally call > > __gup_longterm_locked(), and it needs to do that otherwises it undoes > > the CMA migration magic that Aneesh added. > > > > OK. So just to be clear, I'll be removing this from the patch: > > /* > * The lifetime of a vaddr_get_pfn() page pin is > * userspace-controlled. In the fs-dax case this could > * lead to indefinite stalls in filesystem operations. > * Disallow attempts to pin fs-dax pages via this > * interface. > */ > if (ret > 0 && vma_is_fsdax(vmas[0])) { > ret = -EOPNOTSUPP; > put_page(page[0]); > } > > (and the declaration of "vmas", as well). ...and add a call to __gup_longterm_locked internal to get_user_pages_remote(), right?
On 11/12/19 5:35 PM, Dan Williams wrote: > On Tue, Nov 12, 2019 at 5:08 PM John Hubbard <jhubbard@nvidia.com> wrote: >> >> On 11/12/19 4:58 PM, Dan Williams wrote: >> ... >>>>> It's not redundant relative to upstream which does not do anything the >>>>> FOLL_LONGTERM in the gup-slow path... but I have not looked at patches >>>>> 1-7 to see if something there made it redundant. >>>> >>>> Oh, the hunk John had below for get_user_pages_remote() also needs to >>>> call __gup_longterm_locked() when FOLL_LONGTERM is specified, then >>>> that calls check_dax_vmas() which duplicates the vma_is_fsdax() check >>>> above. >>> >>> Oh true, good eye. It is redundant if it does additionally call >>> __gup_longterm_locked(), and it needs to do that otherwises it undoes >>> the CMA migration magic that Aneesh added. >>> >> >> OK. So just to be clear, I'll be removing this from the patch: >> >> /* >> * The lifetime of a vaddr_get_pfn() page pin is >> * userspace-controlled. In the fs-dax case this could >> * lead to indefinite stalls in filesystem operations. >> * Disallow attempts to pin fs-dax pages via this >> * interface. >> */ >> if (ret > 0 && vma_is_fsdax(vmas[0])) { >> ret = -EOPNOTSUPP; >> put_page(page[0]); >> } >> >> (and the declaration of "vmas", as well). > > ...and add a call to __gup_longterm_locked internal to > get_user_pages_remote(), right? > Yes, and thanks for double-checking. I think I got a little dizzy following the call stack there. :) And now I see that this also affects the implementation of pin_longterm_pages_remote(), because that will need the same logic that get_user_pages_remote() has. thanks,
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c index d864277ea16f..017689b7c32b 100644 --- a/drivers/vfio/vfio_iommu_type1.c +++ b/drivers/vfio/vfio_iommu_type1.c @@ -348,24 +348,20 @@ static int vaddr_get_pfn(struct mm_struct *mm, unsigned long vaddr, flags |= FOLL_WRITE; down_read(&mm->mmap_sem); - if (mm == current->mm) { - ret = get_user_pages(vaddr, 1, flags | FOLL_LONGTERM, page, - vmas); - } else { - ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags, page, - vmas, NULL); - /* - * The lifetime of a vaddr_get_pfn() page pin is - * userspace-controlled. In the fs-dax case this could - * lead to indefinite stalls in filesystem operations. - * Disallow attempts to pin fs-dax pages via this - * interface. - */ - if (ret > 0 && vma_is_fsdax(vmas[0])) { - ret = -EOPNOTSUPP; - put_page(page[0]); - } + ret = get_user_pages_remote(NULL, mm, vaddr, 1, flags | FOLL_LONGTERM, + page, vmas, NULL); + /* + * The lifetime of a vaddr_get_pfn() page pin is + * userspace-controlled. In the fs-dax case this could + * lead to indefinite stalls in filesystem operations. + * Disallow attempts to pin fs-dax pages via this + * interface. + */ + if (ret > 0 && vma_is_fsdax(vmas[0])) { + ret = -EOPNOTSUPP; + put_page(page[0]); } + up_read(&mm->mmap_sem); if (ret == 1) { diff --git a/mm/gup.c b/mm/gup.c index 933524de6249..cfe6dc5fc343 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1167,13 +1167,16 @@ long get_user_pages_remote(struct task_struct *tsk, struct mm_struct *mm, struct vm_area_struct **vmas, int *locked) { /* - * FIXME: Current FOLL_LONGTERM behavior is incompatible with + * Current FOLL_LONGTERM behavior is incompatible with * FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on - * vmas. As there are no users of this flag in this call we simply - * disallow this option for now. + * vmas. However, this only comes up if locked is set, and there are + * callers that do request FOLL_LONGTERM, but do not set locked. So, + * allow what we can. */ - if (WARN_ON_ONCE(gup_flags & FOLL_LONGTERM)) - return -EINVAL; + if (gup_flags & FOLL_LONGTERM) { + if (WARN_ON_ONCE(locked)) + return -EINVAL; + } return __get_user_pages_locked(tsk, mm, start, nr_pages, pages, vmas, locked,
As it says in the updated comment in gup.c: current FOLL_LONGTERM behavior is incompatible with FAULT_FLAG_ALLOW_RETRY because of the FS DAX check requirement on vmas. However, the corresponding restriction in get_user_pages_remote() was slightly stricter than is actually required: it forbade all FOLL_LONGTERM callers, but we can actually allow FOLL_LONGTERM callers that do not set the "locked" arg. Update the code and comments accordingly, and update the VFIO caller to take advantage of this, fixing a bug as a result: the VFIO caller is logically a FOLL_LONGTERM user. Thanks to Jason Gunthorpe for pointing out a clean way to fix this. Suggested-by: Jason Gunthorpe <jgg@ziepe.ca> Cc: Jerome Glisse <jglisse@redhat.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: John Hubbard <jhubbard@nvidia.com> --- drivers/vfio/vfio_iommu_type1.c | 30 +++++++++++++----------------- mm/gup.c | 13 ++++++++----- 2 files changed, 21 insertions(+), 22 deletions(-)