
[3/6] drivers/IB,qib: do not use mmap_sem

Message ID 20190121174220.10583-4-dave@stgolabs.net (mailing list archive)
State Superseded
Delegated to: Jason Gunthorpe
Series mm: make pinned_vm atomic and simplify users

Commit Message

Davidlohr Bueso Jan. 21, 2019, 5:42 p.m. UTC
The driver uses mmap_sem for both pinned_vm accounting and
get_user_pages(). By using gup_fast() and letting the mm handle
the lock if needed, we no longer need the semaphore and can
simplify the whole thing, as the pinning is decoupled from the lock.

This also fixes a bug where __qib_get_user_pages() was not taking
the current value of pinned_vm into account.

Cc: dennis.dalessandro@intel.com
Cc: mike.marciniszyn@intel.com
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
---
 drivers/infiniband/hw/qib/qib_user_pages.c | 67 ++++++++++--------------------
 1 file changed, 22 insertions(+), 45 deletions(-)
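
The core of the change is the accounting pattern: charge pinned_vm up
front with an atomic add, compare the running total (rather than only the
size of the current request) against RLIMIT_MEMLOCK, and roll the charge
back on any failure. A minimal sketch of that pattern as this patch
applies it (the helper name is illustrative and the page-release path is
elided, so this is not the driver's actual code; the full diff is below):

static int pin_user_buffer(unsigned long start_page, size_t num_pages,
                           struct page **p)
{
        unsigned long locked, lock_limit;
        size_t got;
        int ret;

        lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;

        /* Charge first, so concurrent pinners see each other's pages. */
        locked = atomic64_add_return(num_pages, &current->mm->pinned_vm);
        if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
                ret = -ENOMEM;
                goto bail;
        }

        for (got = 0; got < num_pages; got += ret) {
                ret = get_user_pages_fast(start_page + got * PAGE_SIZE,
                                          num_pages - got,
                                          FOLL_WRITE | FOLL_FORCE,
                                          p + got);
                if (ret < 0)
                        goto bail_release;
        }
        return 0;

bail_release:
        /* illustrative: release the 'got' pages pinned so far */
bail:
        atomic64_sub(num_pages, &current->mm->pinned_vm);
        return ret;
}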

Comments

Jason Gunthorpe Jan. 28, 2019, 11:31 p.m. UTC | #1
On Mon, Jan 21, 2019 at 09:42:17AM -0800, Davidlohr Bueso wrote:
> The driver uses mmap_sem for both pinned_vm accounting and
> get_user_pages(). By using gup_fast() and letting the mm handle
> the lock if needed, we no longer need the semaphore and can
> simplify the whole thing, as the pinning is decoupled from the lock.
> 
> This also fixes a bug where __qib_get_user_pages() was not taking
> the current value of pinned_vm into account.
> 
> Cc: dennis.dalessandro@intel.com
> Cc: mike.marciniszyn@intel.com
> Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> ---
>  drivers/infiniband/hw/qib/qib_user_pages.c | 67 ++++++++++--------------------
>  1 file changed, 22 insertions(+), 45 deletions(-)

I need you to respin this patch/series against the latest rdma tree:

git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

branch for-next

> diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> -				struct page **p)
> -{
> -	unsigned long lock_limit;
> -	size_t got;
> -	int ret;
> -
> -	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> -
> -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> -		ret = -ENOMEM;
> -		goto bail;
> -	}
> -
> -	for (got = 0; got < num_pages; got += ret) {
> -		ret = get_user_pages(start_page + got * PAGE_SIZE,
> -				     num_pages - got,
> -				     FOLL_WRITE | FOLL_FORCE,
> -				     p + got, NULL);

This has since been rightly changed to get_user_pages_longterm, and I
think the right way to resolve the conflict is to discard some of this
patch?

Since Andrew is OK with this I can move this ahead once this is
resolved, thanks.

Jason
Jason Gunthorpe Jan. 29, 2019, 4:46 a.m. UTC | #2
On Mon, Jan 28, 2019 at 04:31:40PM -0700, Jason Gunthorpe wrote:
> On Mon, Jan 21, 2019 at 09:42:17AM -0800, Davidlohr Bueso wrote:
> > The driver uses mmap_sem for both pinned_vm accounting and
> > get_user_pages(). By using gup_fast() and letting the mm handle
> > the lock if needed, we no longer need the semaphore and can
> > simplify the whole thing, as the pinning is decoupled from the lock.
> > 
> > This also fixes a bug where __qib_get_user_pages() was not taking
> > the current value of pinned_vm into account.
> > 
> > Cc: dennis.dalessandro@intel.com
> > Cc: mike.marciniszyn@intel.com
> > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> >  drivers/infiniband/hw/qib/qib_user_pages.c | 67 ++++++++++--------------------
> >  1 file changed, 22 insertions(+), 45 deletions(-)
> 
> I need you to respin this patch/series against the latest rdma tree:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
> 
> branch for-next
> 
> > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> > -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > -				struct page **p)
> > -{
> > -	unsigned long lock_limit;
> > -	size_t got;
> > -	int ret;
> > -
> > -	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > -
> > -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> > -		ret = -ENOMEM;
> > -		goto bail;
> > -	}
> > -
> > -	for (got = 0; got < num_pages; got += ret) {
> > -		ret = get_user_pages(start_page + got * PAGE_SIZE,
> > -				     num_pages - got,
> > -				     FOLL_WRITE | FOLL_FORCE,
> > -				     p + got, NULL);
> 
> This has since been rightly changed to get_user_pages_longterm, and I
> think the right way to resolve the conflict is to discard some of this
> patch?

.. and I'm looking at some of the other conversions here.. *most
likely* any caller that is manipulating rlimit for get_user_pages
should really be calling get_user_pages_longterm, so they should not
be converted to use _fast?

Jason
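
A rough sketch of the shape Jason is describing, assuming the
get_user_pages_longterm() interface of this era (unlike the _fast
variant, it must be called with mmap_sem held for read); the function
name and the simplified error handling are illustrative only:

/* Illustrative only; not taken from any specific driver. */
static int longterm_pin(unsigned long start, unsigned long num_pages,
                        struct page **pages)
{
        unsigned long locked, lock_limit;
        long pinned;

        /* the rlimit/pinned_vm accounting can still be done atomically */
        lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
        locked = atomic64_add_return(num_pages, &current->mm->pinned_vm);
        if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
                atomic64_sub(num_pages, &current->mm->pinned_vm);
                return -ENOMEM;
        }

        /* ...but the long-term pin itself still needs mmap_sem for read */
        down_read(&current->mm->mmap_sem);
        pinned = get_user_pages_longterm(start, num_pages, FOLL_WRITE,
                                         pages, NULL);
        up_read(&current->mm->mmap_sem);

        if (pinned < 0) {
                atomic64_sub(num_pages, &current->mm->pinned_vm);
                return pinned;
        }
        /* a real caller would also handle a short pin (pinned < num_pages) */
        return 0;
}
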
Davidlohr Bueso Jan. 29, 2019, 2:14 p.m. UTC | #3
On Mon, 28 Jan 2019, Jason Gunthorpe wrote:

>.. and I'm looking at some of the other conversions here.. *most
>likely* any caller that is manipulating rlimit for get_user_pages
>should really be calling get_user_pages_longterm, so they should not
>be converted to use _fast?

Yeah, this was something I thought might come up when I first sent
the series. I'll send a v3 where I leave the gup() calls alone for
the rdma drivers; we can at least convert the mmap_sem usage to a
read lock.

Thanks,
Davidlohr
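
Concretely, converting the mmap_sem usage to a read lock would look
something like the sketch below (illustrative only, not the actual v3
code): get_user_pages() stays, but since pinned_vm is now an atomic
counter the write lock is no longer needed for the accounting, and gup
itself only requires mmap_sem held for read.

/* Illustrative sketch; the caller still charges/uncharges pinned_vm. */
static int pin_pages_with_read_lock(unsigned long start_page,
                                    size_t num_pages, struct page **p)
{
        size_t got;
        int ret = 0;

        down_read(&current->mm->mmap_sem);
        for (got = 0; got < num_pages; got += ret) {
                ret = get_user_pages(start_page + got * PAGE_SIZE,
                                     num_pages - got,
                                     FOLL_WRITE | FOLL_FORCE,
                                     p + got, NULL);
                if (ret < 0)
                        break;
        }
        up_read(&current->mm->mmap_sem);

        /* on error the caller releases the 'got' pages already pinned */
        return ret < 0 ? ret : 0;
}
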
Ira Weiny Jan. 29, 2019, 6:50 p.m. UTC | #4
On Mon, Jan 28, 2019 at 09:46:07PM -0700, Jason Gunthorpe wrote:
> On Mon, Jan 28, 2019 at 04:31:40PM -0700, Jason Gunthorpe wrote:
> > On Mon, Jan 21, 2019 at 09:42:17AM -0800, Davidlohr Bueso wrote:
> > > The driver uses mmap_sem for both pinned_vm accounting and
> > > get_user_pages(). By using gup_fast() and letting the mm handle
> > > the lock if needed, we no longer need the semaphore and can
> > > simplify the whole thing, as the pinning is decoupled from the lock.
> > > 
> > > This also fixes a bug where __qib_get_user_pages() was not taking
> > > the current value of pinned_vm into account.
> > > 
> > > Cc: dennis.dalessandro@intel.com
> > > Cc: mike.marciniszyn@intel.com
> > > Reviewed-by: Ira Weiny <ira.weiny@intel.com>
> > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
> > >  drivers/infiniband/hw/qib/qib_user_pages.c | 67 ++++++++++--------------------
> > >  1 file changed, 22 insertions(+), 45 deletions(-)
> > 
> > I need you to respin this patch/series against the latest rdma tree:
> > 
> > git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git
> > 
> > branch for-next
> > 
> > > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
> > > -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
> > > -				struct page **p)
> > > -{
> > > -	unsigned long lock_limit;
> > > -	size_t got;
> > > -	int ret;
> > > -
> > > -	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> > > -
> > > -	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
> > > -		ret = -ENOMEM;
> > > -		goto bail;
> > > -	}
> > > -
> > > -	for (got = 0; got < num_pages; got += ret) {
> > > -		ret = get_user_pages(start_page + got * PAGE_SIZE,
> > > -				     num_pages - got,
> > > -				     FOLL_WRITE | FOLL_FORCE,
> > > -				     p + got, NULL);
> > 
> > This has since been rightly changed to get_user_pages_longterm, and I
> > think the right way to resolve the conflict is to discard some of this
> > patch?
> 
> .. and I'm looking at some of the other conversions here.. *most
> likely* any caller that is manipulating rlimit for get_user_pages
> should really be calling get_user_pages_longterm, so they should not
> be converted to use _fast?

Is this a question?  I'm not sure I understand the meaning here?

Ira

> 
> Jason
Jason Gunthorpe Jan. 29, 2019, 11:19 p.m. UTC | #5
On Tue, Jan 29, 2019 at 10:50:05AM -0800, Ira Weiny wrote:
> > .. and I'm looking at some of the other conversions here.. *most
> > likely* any caller that is manipulating rlimit for get_user_pages
> > should really be calling get_user_pages_longterm, so they should not
> > be converted to use _fast?
> 
> Is this a question?  I'm not sure I understand the meaning here?

More an invitation to disprove the statement

Jason
Ira Weiny Jan. 30, 2019, 6:01 p.m. UTC | #6
> 
> On Tue, Jan 29, 2019 at 10:50:05AM -0800, Ira Weiny wrote:
> > > .. and I'm looking at some of the other conversions here.. *most
> > > likely* any caller that is manipulating rlimit for get_user_pages
> > > should really be calling get_user_pages_longterm, so they should not
> > > be converted to use _fast?
> >
> > Is this a question?  I'm not sure I understand the meaning here?
> 
> More an invitation to disprove the statement

Generally I agree.  But it would be best if we could get fast GUP for performance.  I have not worked out if that will be possible with the final "longterm" solutions.

Ira

> 
> Jason
Jan Kara Jan. 31, 2019, 10:04 a.m. UTC | #7
On Wed 30-01-19 18:01:33, Weiny, Ira wrote:
> > 
> > On Tue, Jan 29, 2019 at 10:50:05AM -0800, Ira Weiny wrote:
> > > > .. and I'm looking at some of the other conversions here.. *most
> > > > likely* any caller that is manipulating rlimit for get_user_pages
> > > > should really be calling get_user_pages_longterm, so they should not
> > > > be converted to use _fast?
> > >
> > > Is this a question?  I'm not sure I understand the meaning here?
> > 
> > More an invitation to disprove the statement
> 
> Generally I agree.  But it would be best if we could get fast GUP for
> performance.  I have not worked out if that will be possible with the
> final "longterm" solutions.

Initially probably not; longer term it might be added if there are
performance data supporting that (i.e., showing a real workload that would
benefit). In principle there's nothing that would prevent gup_fast-like
functionality for long-term pins, but I expect there will always be
additional overhead (compared to plain gup_fast()) of establishing
something like leases to identify long-term pins. But we haven't figured
out the details yet. For now we are concentrating on fixing short-term
pins and the issues with those.

								Honza

Patch

diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c
index 602387bf98e7..e4114aad4a2f 100644
--- a/drivers/infiniband/hw/qib/qib_user_pages.c
+++ b/drivers/infiniband/hw/qib/qib_user_pages.c
@@ -49,43 +49,6 @@  static void __qib_release_user_pages(struct page **p, size_t num_pages,
 	}
 }
 
-/*
- * Call with current->mm->mmap_sem held.
- */
-static int __qib_get_user_pages(unsigned long start_page, size_t num_pages,
-				struct page **p)
-{
-	unsigned long lock_limit;
-	size_t got;
-	int ret;
-
-	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
-
-	if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) {
-		ret = -ENOMEM;
-		goto bail;
-	}
-
-	for (got = 0; got < num_pages; got += ret) {
-		ret = get_user_pages(start_page + got * PAGE_SIZE,
-				     num_pages - got,
-				     FOLL_WRITE | FOLL_FORCE,
-				     p + got, NULL);
-		if (ret < 0)
-			goto bail_release;
-	}
-
-	atomic64_add(num_pages, &current->mm->pinned_vm);
-
-	ret = 0;
-	goto bail;
-
-bail_release:
-	__qib_release_user_pages(p, got, 0);
-bail:
-	return ret;
-}
-
 /**
  * qib_map_page - a safety wrapper around pci_map_page()
  *
@@ -137,26 +100,40 @@  int qib_map_page(struct pci_dev *hwdev, struct page *page, dma_addr_t *daddr)
 int qib_get_user_pages(unsigned long start_page, size_t num_pages,
 		       struct page **p)
 {
+	unsigned long locked, lock_limit;
+	size_t got;
 	int ret;
 
-	down_write(&current->mm->mmap_sem);
+	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
+	locked = atomic64_add_return(num_pages, &current->mm->pinned_vm);
 
-	ret = __qib_get_user_pages(start_page, num_pages, p);
+	if (locked > lock_limit && !capable(CAP_IPC_LOCK)) {
+		ret = -ENOMEM;
+		goto bail;
+	}
 
-	up_write(&current->mm->mmap_sem);
+	for (got = 0; got < num_pages; got += ret) {
+		ret = get_user_pages_fast(start_page + got * PAGE_SIZE,
+				     num_pages - got,
+				     FOLL_WRITE | FOLL_FORCE,
+				     p + got);
+		if (ret < 0)
+			goto bail_release;
+	}
 
+	return 0;
+bail_release:
+	__qib_release_user_pages(p, got, 0);
+bail:
+	atomic64_sub(num_pages, &current->mm->pinned_vm);
 	return ret;
 }
 
 void qib_release_user_pages(struct page **p, size_t num_pages)
 {
-	if (current->mm) /* during close after signal, mm can be NULL */
-		down_write(&current->mm->mmap_sem);
-
 	__qib_release_user_pages(p, num_pages, 1);
 
-	if (current->mm) {
+	if (current->mm) { /* during close after signal, mm can be NULL */
 		atomic64_sub(num_pages, &current->mm->pinned_vm);
-		up_write(&current->mm->mmap_sem);
 	}
 }