diff mbox series

[v2,07/18] infiniband: set FOLL_PIN, FOLL_LONGTERM via pin_longterm_pages*()

Message ID 20191103211813.213227-8-jhubbard@nvidia.com (mailing list archive)
State Superseded
Headers show
Series mm/gup: track dma-pinned pages: FOLL_PIN, FOLL_LONGTERM | expand

Commit Message

John Hubbard Nov. 3, 2019, 9:18 p.m. UTC
Convert infiniband to use the new wrapper calls, and stop
explicitly setting FOLL_LONGTERM at the call sites.

The new pin_longterm_*() calls replace get_user_pages*()
calls, and set both FOLL_LONGTERM and a new FOLL_PIN
flag. The FOLL_PIN flag requires that the caller must
return the pages via put_user_page*() calls, but
infiniband was already doing that as part of an earlier
commit.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: John Hubbard <jhubbard@nvidia.com>
---
 drivers/infiniband/core/umem.c              |  5 ++---
 drivers/infiniband/core/umem_odp.c          | 10 +++++-----
 drivers/infiniband/hw/hfi1/user_pages.c     |  4 ++--
 drivers/infiniband/hw/mthca/mthca_memfree.c |  3 +--
 drivers/infiniband/hw/qib/qib_user_pages.c  |  8 ++++----
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  2 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c    |  9 ++++-----
 drivers/infiniband/sw/siw/siw_mem.c         |  5 ++---
 8 files changed, 21 insertions(+), 25 deletions(-)

Comments

Jason Gunthorpe Nov. 4, 2019, 8:33 p.m. UTC | #1
On Sun, Nov 03, 2019 at 01:18:02PM -0800, John Hubbard wrote:
> Convert infiniband to use the new wrapper calls, and stop
> explicitly setting FOLL_LONGTERM at the call sites.
> 
> The new pin_longterm_*() calls replace get_user_pages*()
> calls, and set both FOLL_LONGTERM and a new FOLL_PIN
> flag. The FOLL_PIN flag requires that the caller must
> return the pages via put_user_page*() calls, but
> infiniband was already doing that as part of an earlier
> commit.
> 
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: John Hubbard <jhubbard@nvidia.com>
>  drivers/infiniband/core/umem.c              |  5 ++---
>  drivers/infiniband/core/umem_odp.c          | 10 +++++-----
>  drivers/infiniband/hw/hfi1/user_pages.c     |  4 ++--
>  drivers/infiniband/hw/mthca/mthca_memfree.c |  3 +--
>  drivers/infiniband/hw/qib/qib_user_pages.c  |  8 ++++----
>  drivers/infiniband/hw/qib/qib_user_sdma.c   |  2 +-
>  drivers/infiniband/hw/usnic/usnic_uiom.c    |  9 ++++-----
>  drivers/infiniband/sw/siw/siw_mem.c         |  5 ++---
>  8 files changed, 21 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> index 24244a2f68cc..c5a78d3e674b 100644
> +++ b/drivers/infiniband/core/umem.c
> @@ -272,11 +272,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
>  
>  	while (npages) {
>  		down_read(&mm->mmap_sem);
> -		ret = get_user_pages(cur_base,
> +		ret = pin_longterm_pages(cur_base,
>  				     min_t(unsigned long, npages,
>  					   PAGE_SIZE / sizeof (struct page *)),
> -				     gup_flags | FOLL_LONGTERM,
> -				     page_list, NULL);
> +				     gup_flags, page_list, NULL);

FWIW, this one should be converted to fast as well, I think we finally
got rid of all the blockers for that?

Jason
John Hubbard Nov. 4, 2019, 8:48 p.m. UTC | #2
On 11/4/19 12:33 PM, Jason Gunthorpe wrote:
...
>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>> index 24244a2f68cc..c5a78d3e674b 100644
>> +++ b/drivers/infiniband/core/umem.c
>> @@ -272,11 +272,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
>>  
>>  	while (npages) {
>>  		down_read(&mm->mmap_sem);
>> -		ret = get_user_pages(cur_base,
>> +		ret = pin_longterm_pages(cur_base,
>>  				     min_t(unsigned long, npages,
>>  					   PAGE_SIZE / sizeof (struct page *)),
>> -				     gup_flags | FOLL_LONGTERM,
>> -				     page_list, NULL);
>> +				     gup_flags, page_list, NULL);
> 
> FWIW, this one should be converted to fast as well, I think we finally
> got rid of all the blockers for that?
> 

I'm not aware of any blockers on the gup.c end, anyway. The only broken thing we
have there is "gup remote + FOLL_LONGTERM". But we can do "gup fast + LONGTERM". 

Unless I'm really missing something, in which case several other call sites
would need changes.

I'll change it to pin_longterm_pages_fast().

thanks,

John Hubbard
NVIDIA
Jason Gunthorpe Nov. 4, 2019, 8:57 p.m. UTC | #3
On Mon, Nov 04, 2019 at 12:48:13PM -0800, John Hubbard wrote:
> On 11/4/19 12:33 PM, Jason Gunthorpe wrote:
> ...
> >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> >> index 24244a2f68cc..c5a78d3e674b 100644
> >> +++ b/drivers/infiniband/core/umem.c
> >> @@ -272,11 +272,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
> >>  
> >>  	while (npages) {
> >>  		down_read(&mm->mmap_sem);
> >> -		ret = get_user_pages(cur_base,
> >> +		ret = pin_longterm_pages(cur_base,
> >>  				     min_t(unsigned long, npages,
> >>  					   PAGE_SIZE / sizeof (struct page *)),
> >> -				     gup_flags | FOLL_LONGTERM,
> >> -				     page_list, NULL);
> >> +				     gup_flags, page_list, NULL);
> > 
> > FWIW, this one should be converted to fast as well, I think we finally
> > got rid of all the blockers for that?
> > 
> 
> I'm not aware of any blockers on the gup.c end, anyway. The only broken thing we
> have there is "gup remote + FOLL_LONGTERM". But we can do "gup fast + LONGTERM". 

I mean the use of the mmap_sem here is finally in a way where we can
just delete the mmap_sem and use _fast
 
ie, AFAIK there is no need for the mmap_sem to be held during
ib_umem_add_sg_table()

This should probably be a standalone patch however

Jason
John Hubbard Nov. 4, 2019, 10:03 p.m. UTC | #4
On 11/4/19 12:57 PM, Jason Gunthorpe wrote:
> On Mon, Nov 04, 2019 at 12:48:13PM -0800, John Hubbard wrote:
>> On 11/4/19 12:33 PM, Jason Gunthorpe wrote:
>> ...
>>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
>>>> index 24244a2f68cc..c5a78d3e674b 100644
>>>> +++ b/drivers/infiniband/core/umem.c
>>>> @@ -272,11 +272,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
>>>>  
>>>>  	while (npages) {
>>>>  		down_read(&mm->mmap_sem);
>>>> -		ret = get_user_pages(cur_base,
>>>> +		ret = pin_longterm_pages(cur_base,
>>>>  				     min_t(unsigned long, npages,
>>>>  					   PAGE_SIZE / sizeof (struct page *)),
>>>> -				     gup_flags | FOLL_LONGTERM,
>>>> -				     page_list, NULL);
>>>> +				     gup_flags, page_list, NULL);
>>>
>>> FWIW, this one should be converted to fast as well, I think we finally
>>> got rid of all the blockers for that?
>>>
>>
>> I'm not aware of any blockers on the gup.c end, anyway. The only broken thing we
>> have there is "gup remote + FOLL_LONGTERM". But we can do "gup fast + LONGTERM". 
> 
> I mean the use of the mmap_sem here is finally in a way where we can
> just delete the mmap_sem and use _fast
>  
> ie, AFAIK there is no need for the mmap_sem to be held during
> ib_umem_add_sg_table()
> 
> This should probably be a standalone patch however
> 

Yes. Oh, actually I guess the patch flow should be: change to 
get_user_pages_fast() and remove the mmap_sem calls, as one patch. And then change 
to pin_longterm_pages_fast() as the next patch. Otherwise, the internal fallback
from _fast to slow gup would attempt to take the mmap_sem (again) in the same
thread, which is not good. :)

Or just defer the change until after this series. Either way is fine, let me
know if you prefer one over the other.

The patch itself is trivial, but runtime testing to gain confidence that
it's solid is much harder. Is there a stress test you would recommend for that?
(I'm not promising I can quickly run it yet--my local IB setup is still nascent 
at best.)


thanks,
Jason Gunthorpe Nov. 5, 2019, 2:32 a.m. UTC | #5
On Mon, Nov 04, 2019 at 02:03:43PM -0800, John Hubbard wrote:
> On 11/4/19 12:57 PM, Jason Gunthorpe wrote:
> > On Mon, Nov 04, 2019 at 12:48:13PM -0800, John Hubbard wrote:
> >> On 11/4/19 12:33 PM, Jason Gunthorpe wrote:
> >> ...
> >>>> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> >>>> index 24244a2f68cc..c5a78d3e674b 100644
> >>>> +++ b/drivers/infiniband/core/umem.c
> >>>> @@ -272,11 +272,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
> >>>>  
> >>>>  	while (npages) {
> >>>>  		down_read(&mm->mmap_sem);
> >>>> -		ret = get_user_pages(cur_base,
> >>>> +		ret = pin_longterm_pages(cur_base,
> >>>>  				     min_t(unsigned long, npages,
> >>>>  					   PAGE_SIZE / sizeof (struct page *)),
> >>>> -				     gup_flags | FOLL_LONGTERM,
> >>>> -				     page_list, NULL);
> >>>> +				     gup_flags, page_list, NULL);
> >>>
> >>> FWIW, this one should be converted to fast as well, I think we finally
> >>> got rid of all the blockers for that?
> >>>
> >>
> >> I'm not aware of any blockers on the gup.c end, anyway. The only broken thing we
> >> have there is "gup remote + FOLL_LONGTERM". But we can do "gup fast + LONGTERM". 
> > 
> > I mean the use of the mmap_sem here is finally in a way where we can
> > just delete the mmap_sem and use _fast
> >  
> > ie, AFAIK there is no need for the mmap_sem to be held during
> > ib_umem_add_sg_table()
> > 
> > This should probably be a standalone patch however
> > 
> 
> Yes. Oh, actually I guess the patch flow should be: change to 
> get_user_pages_fast() and remove the mmap_sem calls, as one patch. And then change 
> to pin_longterm_pages_fast() as the next patch. Otherwise, the internal fallback
> from _fast to slow gup would attempt to take the mmap_sem (again) in the same
> thread, which is not good. :)
> 
> Or just defer the change until after this series. Either way is fine, let me
> know if you prefer one over the other.
> 
> The patch itself is trivial, but runtime testing to gain confidence that
> it's solid is much harder. Is there a stress test you would recommend for that?
> (I'm not promising I can quickly run it yet--my local IB setup is still nascent 
> at best.)

If you make a patch we can probably get it tested, it is something
we should do I keep forgetting about.

Jason
Ira Weiny Nov. 7, 2019, 2:26 a.m. UTC | #6
On Mon, Nov 04, 2019 at 04:57:38PM -0400, Jason Gunthorpe wrote:
> On Mon, Nov 04, 2019 at 12:48:13PM -0800, John Hubbard wrote:
> > On 11/4/19 12:33 PM, Jason Gunthorpe wrote:
> > ...
> > >> diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
> > >> index 24244a2f68cc..c5a78d3e674b 100644
> > >> +++ b/drivers/infiniband/core/umem.c
> > >> @@ -272,11 +272,10 @@ struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
> > >>  
> > >>  	while (npages) {
> > >>  		down_read(&mm->mmap_sem);
> > >> -		ret = get_user_pages(cur_base,
> > >> +		ret = pin_longterm_pages(cur_base,
> > >>  				     min_t(unsigned long, npages,
> > >>  					   PAGE_SIZE / sizeof (struct page *)),
> > >> -				     gup_flags | FOLL_LONGTERM,
> > >> -				     page_list, NULL);
> > >> +				     gup_flags, page_list, NULL);
> > > 
> > > FWIW, this one should be converted to fast as well, I think we finally
> > > got rid of all the blockers for that?
> > > 
> > 
> > I'm not aware of any blockers on the gup.c end, anyway. The only broken thing we
> > have there is "gup remote + FOLL_LONGTERM". But we can do "gup fast + LONGTERM". 
> 
> I mean the use of the mmap_sem here is finally in a way where we can
> just delete the mmap_sem and use _fast

Yay!  I agree if we can do this we should.

Thanks,
Ira

>  
> ie, AFAIK there is no need for the mmap_sem to be held during
> ib_umem_add_sg_table()
> 
> This should probably be a standalone patch however
> 
> Jason
diff mbox series

Patch

diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c
index 24244a2f68cc..c5a78d3e674b 100644
--- a/drivers/infiniband/core/umem.c
+++ b/drivers/infiniband/core/umem.c
@@ -272,11 +272,10 @@  struct ib_umem *ib_umem_get(struct ib_udata *udata, unsigned long addr,
 
 	while (npages) {
 		down_read(&mm->mmap_sem);
-		ret = get_user_pages(cur_base,
+		ret = pin_longterm_pages(cur_base,
 				     min_t(unsigned long, npages,
 					   PAGE_SIZE / sizeof (struct page *)),
-				     gup_flags | FOLL_LONGTERM,
-				     page_list, NULL);
+				     gup_flags, page_list, NULL);
 		if (ret < 0) {
 			up_read(&mm->mmap_sem);
 			goto umem_release;
diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c
index 163ff7ba92b7..a38b67b83db5 100644
--- a/drivers/infiniband/core/umem_odp.c
+++ b/drivers/infiniband/core/umem_odp.c
@@ -534,7 +534,7 @@  static int ib_umem_odp_map_dma_single_page(
 	} else if (umem_odp->page_list[page_index] == page) {
 		umem_odp->dma_list[page_index] |= access_mask;
 	} else {
-		pr_err("error: got different pages in IB device and from get_user_pages. IB device page: %p, gup page: %p\n",
+		pr_err("error: got different pages in IB device and from pin_longterm_pages. IB device page: %p, gup page: %p\n",
 		       umem_odp->page_list[page_index], page);
 		/* Better remove the mapping now, to prevent any further
 		 * damage. */
@@ -639,11 +639,11 @@  int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt,
 		/*
 		 * Note: this might result in redundent page getting. We can
 		 * avoid this by checking dma_list to be 0 before calling
-		 * get_user_pages. However, this make the code much more
-		 * complex (and doesn't gain us much performance in most use
-		 * cases).
+		 * pin_longterm_pages. However, this makes the code much
+		 * more complex (and doesn't gain us much performance in most
+		 * use cases).
 		 */
-		npages = get_user_pages_remote(owning_process, owning_mm,
+		npages = pin_longterm_pages_remote(owning_process, owning_mm,
 				user_virt, gup_num_pages,
 				flags, local_page_list, NULL, NULL);
 		up_read(&owning_mm->mmap_sem);
diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c
index 469acb961fbd..9b55b0a73e29 100644
--- a/drivers/infiniband/hw/hfi1/user_pages.c
+++ b/drivers/infiniband/hw/hfi1/user_pages.c
@@ -104,9 +104,9 @@  int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np
 			    bool writable, struct page **pages)
 {
 	int ret;
-	unsigned int gup_flags = FOLL_LONGTERM | (writable ? FOLL_WRITE : 0);
+	unsigned int gup_flags = (writable ? FOLL_WRITE : 0);
 
-	ret = get_user_pages_fast(vaddr, npages, gup_flags, pages);
+	ret = pin_longterm_pages_fast(vaddr, npages, gup_flags, pages);
 	if (ret < 0)
 		return ret;
 
diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c
index edccfd6e178f..beec7e4b8a96 100644
--- a/drivers/infiniband/hw/mthca/mthca_memfree.c
+++ b/drivers/infiniband/hw/mthca/mthca_memfree.c
@@ -472,8 +472,7 @@  int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar,
 		goto out;
 	}
 
-	ret = get_user_pages_fast(uaddr & PAGE_MASK, 1,
-				  FOLL_WRITE | FOLL_LONGTERM, pages);
+	ret = pin_longterm_pages_fast(uaddr & PAGE_MASK, 1, FOLL_WRITE, pages);
 	if (ret < 0)
 		goto out;
 
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 6bf764e41891..684a14e14d9b 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -108,10 +108,10 @@  int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 
 	down_read(&current->mm->mmap_sem);
 	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(start_page + got * PAGE_SIZE,
-				     num_pages - got,
-				     FOLL_LONGTERM | FOLL_WRITE | FOLL_FORCE,
-				     p + got, NULL);
+		ret = pin_longterm_pages(start_page + got * PAGE_SIZE,
+					 num_pages - got,
+					 FOLL_WRITE | FOLL_FORCE,
+					 p + got, NULL);
 		if (ret < 0) {
 			up_read(&current->mm->mmap_sem);
 			goto bail_release;
diff --git a/drivers/infiniband/hw/qib/qib_user_sdma.c b/drivers/infiniband/hw/qib/qib_user_sdma.c
index 05190edc2611..fd86a9d19370 100644
--- a/drivers/infiniband/hw/qib/qib_user_sdma.c
+++ b/drivers/infiniband/hw/qib/qib_user_sdma.c
@@ -670,7 +670,7 @@  static int qib_user_sdma_pin_pages(const struct qib_devdata *dd,
 		else
 			j = npages;
 
-		ret = get_user_pages_fast(addr, j, FOLL_LONGTERM, pages);
+		ret = pin_longterm_pages_fast(addr, j, 0, pages);
 		if (ret != j) {
 			i = 0;
 			j = ret;
diff --git a/drivers/infiniband/hw/usnic/usnic_uiom.c b/drivers/infiniband/hw/usnic/usnic_uiom.c
index 62e6ffa9ad78..6b90ca1c3771 100644
--- a/drivers/infiniband/hw/usnic/usnic_uiom.c
+++ b/drivers/infiniband/hw/usnic/usnic_uiom.c
@@ -141,11 +141,10 @@  static int usnic_uiom_get_pages(unsigned long addr, size_t size, int writable,
 	ret = 0;
 
 	while (npages) {
-		ret = get_user_pages(cur_base,
-				     min_t(unsigned long, npages,
-				     PAGE_SIZE / sizeof(struct page *)),
-				     gup_flags | FOLL_LONGTERM,
-				     page_list, NULL);
+		ret = pin_longterm_pages(cur_base,
+					 min_t(unsigned long, npages,
+					     PAGE_SIZE / sizeof(struct page *)),
+					 gup_flags, page_list, NULL);
 
 		if (ret < 0)
 			goto out;
diff --git a/drivers/infiniband/sw/siw/siw_mem.c b/drivers/infiniband/sw/siw/siw_mem.c
index e99983f07663..20e663d7ada8 100644
--- a/drivers/infiniband/sw/siw/siw_mem.c
+++ b/drivers/infiniband/sw/siw/siw_mem.c
@@ -426,9 +426,8 @@  struct siw_umem *siw_umem_get(u64 start, u64 len, bool writable)
 		while (nents) {
 			struct page **plist = &umem->page_chunk[i].plist[got];
 
-			rv = get_user_pages(first_page_va, nents,
-					    foll_flags | FOLL_LONGTERM,
-					    plist, NULL);
+			rv = pin_longterm_pages(first_page_va, nents,
+						foll_flags, plist, NULL);
 			if (rv < 0)
 				goto out_sem_up;