Message ID | 20190121174220.10583-4-dave@stgolabs.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: make pinned_vm atomic and simplify users | expand |
On Mon, Jan 21, 2019 at 09:42:17AM -0800, Davidlohr Bueso wrote: > The driver uses mmap_sem for both pinned_vm accounting and > get_user_pages(). By using gup_fast() and letting the mm handle > the lock if needed, we can no longer rely on the semaphore and > simplify the whole thing as the pinning is decoupled from the lock. > > This also fixes a bug that __qib_get_user_pages was not taking into > account the current value of pinned_vm. > > Cc: dennis.dalessandro@intel.com > Cc: mike.marciniszyn@intel.com > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> > --- > drivers/infiniband/hw/qib/qib_user_pages.c | 67 ++++++++++-------------------- > 1 file changed, 22 insertions(+), 45 deletions(-) I need you to respin this patch/series against the latest rdma tree: git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git branch for-next > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c > -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages, > - struct page **p) > -{ > - unsigned long lock_limit; > - size_t got; > - int ret; > - > - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > - > - if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) { > - ret = -ENOMEM; > - goto bail; > - } > - > - for (got = 0; got < num_pages; got += ret) { > - ret = get_user_pages(start_page + got * PAGE_SIZE, > - num_pages - got, > - FOLL_WRITE | FOLL_FORCE, > - p + got, NULL); As this has been rightly changed to get_user_pages_longterm, and I think the right answer to solve the conflict is to discard some of this patch? Since Andrew is OK with this I can move this ahead once this is resolved, thanks. Jason
On Mon, Jan 28, 2019 at 04:31:40PM -0700, Jason Gunthorpe wrote: > On Mon, Jan 21, 2019 at 09:42:17AM -0800, Davidlohr Bueso wrote: > > The driver uses mmap_sem for both pinned_vm accounting and > > get_user_pages(). By using gup_fast() and letting the mm handle > > the lock if needed, we can no longer rely on the semaphore and > > simplify the whole thing as the pinning is decoupled from the lock. > > > > This also fixes a bug that __qib_get_user_pages was not taking into > > account the current value of pinned_vm. > > > > Cc: dennis.dalessandro@intel.com > > Cc: mike.marciniszyn@intel.com > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> > > drivers/infiniband/hw/qib/qib_user_pages.c | 67 ++++++++++-------------------- > > 1 file changed, 22 insertions(+), 45 deletions(-) > > I need you to respin this patch/series against the latest rdma tree: > > git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git > > branch for-next > > > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c > > -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages, > > - struct page **p) > > -{ > > - unsigned long lock_limit; > > - size_t got; > > - int ret; > > - > > - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > > - > > - if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) { > > - ret = -ENOMEM; > > - goto bail; > > - } > > - > > - for (got = 0; got < num_pages; got += ret) { > > - ret = get_user_pages(start_page + got * PAGE_SIZE, > > - num_pages - got, > > - FOLL_WRITE | FOLL_FORCE, > > - p + got, NULL); > > As this has been rightly changed to get_user_pages_longterm, and I > think the right answer to solve the conflict is to discard some of > this patch? .. and I'm looking at some of the other conversions here.. *most likely* any caller that is manipulating rlimit for get_user_pages should really be calling get_user_pages_longterm, so they should not be converted to use _fast? Jason
On Mon, 28 Jan 2019, Jason Gunthorpe wrote: >.. and I'm looking at some of the other conversions here.. *most >likely* any caller that is manipulating rlimit for get_user_pages >should really be calling get_user_pages_longterm, so they should not >be converted to use _fast? Yeah this was something that crossed my mind could come up when I first sent the series. I'll send a v3 where I leave the gup() calls alone for rdma drivers; and we can at least convert the mmap_sem to reader. Thanks, Davidlohr
On Mon, Jan 28, 2019 at 09:46:07PM -0700, Jason Gunthorpe wrote: > On Mon, Jan 28, 2019 at 04:31:40PM -0700, Jason Gunthorpe wrote: > > On Mon, Jan 21, 2019 at 09:42:17AM -0800, Davidlohr Bueso wrote: > > > The driver uses mmap_sem for both pinned_vm accounting and > > > get_user_pages(). By using gup_fast() and letting the mm handle > > > the lock if needed, we can no longer rely on the semaphore and > > > simplify the whole thing as the pinning is decoupled from the lock. > > > > > > This also fixes a bug that __qib_get_user_pages was not taking into > > > account the current value of pinned_vm. > > > > > > Cc: dennis.dalessandro@intel.com > > > Cc: mike.marciniszyn@intel.com > > > Reviewed-by: Ira Weiny <ira.weiny@intel.com> > > > Signed-off-by: Davidlohr Bueso <dbueso@suse.de> > > > drivers/infiniband/hw/qib/qib_user_pages.c | 67 ++++++++++-------------------- > > > 1 file changed, 22 insertions(+), 45 deletions(-) > > > > I need you to respin this patch/series against the latest rdma tree: > > > > git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git > > > > branch for-next > > > > > diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c > > > -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages, > > > - struct page **p) > > > -{ > > > - unsigned long lock_limit; > > > - size_t got; > > > - int ret; > > > - > > > - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; > > > - > > > - if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) { > > > - ret = -ENOMEM; > > > - goto bail; > > > - } > > > - > > > - for (got = 0; got < num_pages; got += ret) { > > > - ret = get_user_pages(start_page + got * PAGE_SIZE, > > > - num_pages - got, > > > - FOLL_WRITE | FOLL_FORCE, > > > - p + got, NULL); > > > > As this has been rightly changed to get_user_pages_longterm, and I > > think the right answer to solve the conflict is to discard some of > > this patch? > > .. and I'm looking at some of the other conversions here.. *most > likely* any caller that is manipulating rlimit for get_user_pages > should really be calling get_user_pages_longterm, so they should not > be converted to use _fast? Is this a question? I'm not sure I understand the meaning here? Ira > > Jason
On Tue, Jan 29, 2019 at 10:50:05AM -0800, Ira Weiny wrote: > > .. and I'm looking at some of the other conversions here.. *most > > likely* any caller that is manipulating rlimit for get_user_pages > > should really be calling get_user_pages_longterm, so they should not > > be converted to use _fast? > > Is this a question? I'm not sure I understand the meaning here? More an invitation to disprove the statement Jason
> > On Tue, Jan 29, 2019 at 10:50:05AM -0800, Ira Weiny wrote: > > > .. and I'm looking at some of the other conversions here.. *most > > > likely* any caller that is manipulating rlimit for get_user_pages > > > should really be calling get_user_pages_longterm, so they should not > > > be converted to use _fast? > > > > Is this a question? I'm not sure I understand the meaning here? > > More an invitation to disprove the statement Generally I agree. But would be best if we could get fast GUP for performance. I have not worked out if that will be possible with the final "longterm" solutions. IRa > > Jason
On Wed 30-01-19 18:01:33, Weiny, Ira wrote: > > > > On Tue, Jan 29, 2019 at 10:50:05AM -0800, Ira Weiny wrote: > > > > .. and I'm looking at some of the other conversions here.. *most > > > > likely* any caller that is manipulating rlimit for get_user_pages > > > > should really be calling get_user_pages_longterm, so they should not > > > > be converted to use _fast? > > > > > > Is this a question? I'm not sure I understand the meaning here? > > > > More an invitation to disprove the statement > > Generally I agree. But would be best if we could get fast GUP for > performance. I have not worked out if that will be possible with the > final "longterm" solutions. Initially probably not, longer-term it might be added if there are performance data supporting that (i.e., showing real workload that would benefit). In principle there's nothing that would prevent gup_fast like functionality for long-term pins but I expect there will be always additional overhead (compared to plain gup_fast()) of establishing something like leases to identify long-term pins. But we haven't figured out the details yet. For now we concentrate on fixing short-term pins and issues with those. Honza
diff --git a/drivers/infiniband/hw/qib/qib_user_pages.c b/drivers/infiniband/hw/qib/qib_user_pages.c index 602387bf98e7..e4114aad4a2f 100644 --- a/drivers/infiniband/hw/qib/qib_user_pages.c +++ b/drivers/infiniband/hw/qib/qib_user_pages.c @@ -49,43 +49,6 @@ static void __qib_release_user_pages(struct page **p, size_t num_pages, } } -/* - * Call with current->mm->mmap_sem held. - */ -static int __qib_get_user_pages(unsigned long start_page, size_t num_pages, - struct page **p) -{ - unsigned long lock_limit; - size_t got; - int ret; - - lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; - - if (num_pages > lock_limit && !capable(CAP_IPC_LOCK)) { - ret = -ENOMEM; - goto bail; - } - - for (got = 0; got < num_pages; got += ret) { - ret = get_user_pages(start_page + got * PAGE_SIZE, - num_pages - got, - FOLL_WRITE | FOLL_FORCE, - p + got, NULL); - if (ret < 0) - goto bail_release; - } - - atomic64_add(num_pages, ¤t->mm->pinned_vm); - - ret = 0; - goto bail; - -bail_release: - __qib_release_user_pages(p, got, 0); -bail: - return ret; -} - /** * qib_map_page - a safety wrapper around pci_map_page() * @@ -137,26 +100,40 @@ int qib_map_page(struct pci_dev *hwdev, struct page *page, dma_addr_t *daddr) int qib_get_user_pages(unsigned long start_page, size_t num_pages, struct page **p) { + unsigned long locked, lock_limit; + size_t got; int ret; - down_write(¤t->mm->mmap_sem); + lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT; + locked = atomic64_add_return(num_pages, ¤t->mm->pinned_vm); - ret = __qib_get_user_pages(start_page, num_pages, p); + if (locked > lock_limit && !capable(CAP_IPC_LOCK)) { + ret = -ENOMEM; + goto bail; + } - up_write(¤t->mm->mmap_sem); + for (got = 0; got < num_pages; got += ret) { + ret = get_user_pages_fast(start_page + got * PAGE_SIZE, + num_pages - got, + FOLL_WRITE | FOLL_FORCE, + p + got); + if (ret < 0) + goto bail_release; + } + return 0; +bail_release: + __qib_release_user_pages(p, got, 0); +bail: + atomic64_sub(num_pages, ¤t->mm->pinned_vm); return ret; } void qib_release_user_pages(struct page **p, size_t num_pages) { - if (current->mm) /* during close after signal, mm can be NULL */ - down_write(¤t->mm->mmap_sem); - __qib_release_user_pages(p, num_pages, 1); - if (current->mm) { + if (current->mm) { /* during close after signal, mm can be NULL */ atomic64_sub(num_pages, ¤t->mm->pinned_vm); - up_write(¤t->mm->mmap_sem); } }