[-next,0/6] mm: make pinned_vm atomic and simplify users

Message ID 20190115181300.27547-1-dave@stgolabs.net (mailing list archive)

Davidlohr Bueso Jan. 15, 2019, 6:12 p.m. UTC
Hi,

The following patches aim to provide cleanups to users that pin pages
(mostly infiniband) by converting the counter to atomic -- note that
Daniel Jordan also has patches[1] for the locked_vm counterpart and vfio.

Apart from removing a source of mmap_sem write acquisitions, we benefit in
that we can get rid of a lot of code that defers work when the lock cannot
be acquired, and drivers can avoid mmap_sem altogether by also
converting gup to gup_fast() and letting the mm handle it. Users
that do gup_longterm() of course remain under at least reader mmap_sem.
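
To illustrate the deferral point, here is a minimal userspace sketch using
C11 atomics. It is not kernel code: struct mm_model, unpin_trylock() and
unpin_atomic() are hypothetical names, and an atomic_bool stands in for
mmap_sem held for write. It only shows why the release path can fail under
the locked scheme and cannot fail once the counter is atomic.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for the relevant parts of mm_struct. */
struct mm_model {
	atomic_bool write_locked;  /* models mmap_sem being held for write */
	long pinned_vm_locked;     /* old scheme: only touched under the lock */
	atomic_long pinned_vm;     /* new scheme: atomic counter */
};

static void mm_model_init(struct mm_model *mm, long pinned)
{
	atomic_init(&mm->write_locked, false);
	mm->pinned_vm_locked = pinned;
	atomic_init(&mm->pinned_vm, pinned);
}

/*
 * Old scheme: the decrement needs the lock. If the lock is busy the
 * function returns false and the caller has to defer the work.
 */
static bool unpin_trylock(struct mm_model *mm, long nr_pages)
{
	if (atomic_exchange(&mm->write_locked, true))
		return false;  /* lock already held elsewhere */
	mm->pinned_vm_locked -= nr_pages;
	atomic_store(&mm->write_locked, false);
	return true;
}

/* New scheme: a single atomic subtraction, no lock and no failure path. */
static void unpin_atomic(struct mm_model *mm, long nr_pages)
{
	atomic_fetch_sub(&mm->pinned_vm, nr_pages);
}
```

With the lock held by another context, unpin_trylock() leaves the counter
untouched and reports failure, while unpin_atomic() completes regardless.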

Everything has been compile-tested _only_ so I hope I didn't do anything
too stupid. Please consider for v5.1.

On a similar topic and as a potential follow up, it would be nice to resurrect
Peter's VM_PINNED idea, since the broken semantics introduced by
bc3e53f682 ("mm: distinguish between mlocked and pinned pages") are still
present. Encapsulating internal mm logic via mm[un]pin(), instead of
drivers having to know about internals, and playing nice with compaction
are also wins.

Thanks!

[1] https://lkml.org/lkml/2018/11/5/854

Davidlohr Bueso (6):
  mm: make mm->pinned_vm an atomic counter
  mic/scif: do not use mmap_sem
  drivers/IB,qib: do not use mmap_sem
  drivers/IB,hfi1: do not use mmap_sem
  drivers/IB,usnic: reduce scope of mmap_sem
  drivers/IB,core: reduce scope of mmap_sem

 drivers/infiniband/core/umem.c              | 47 +++-----------------
 drivers/infiniband/hw/hfi1/user_pages.c     | 12 ++---
 drivers/infiniband/hw/qib/qib_user_pages.c  | 69 ++++++++++-------------------
 drivers/infiniband/hw/usnic/usnic_ib_main.c |  2 -
 drivers/infiniband/hw/usnic/usnic_uiom.c    | 56 +++--------------------
 drivers/infiniband/hw/usnic/usnic_uiom.h    |  1 -
 drivers/misc/mic/scif/scif_rma.c            | 38 +++++-----------
 fs/proc/task_mmu.c                          |  2 +-
 include/linux/mm_types.h                    |  2 +-
 kernel/events/core.c                        |  8 ++--
 kernel/fork.c                               |  2 +-
 mm/debug.c                                  |  3 +-
 12 files changed, 57 insertions(+), 185 deletions(-)

Comments

Davidlohr Bueso Jan. 15, 2019, 6:18 p.m. UTC | #1
Also Ccing lkml, sorry.

Ira Weiny Jan. 15, 2019, 8:28 p.m. UTC | #2
On Tue, Jan 15, 2019 at 10:12:56AM -0800, Davidlohr Bueso wrote:
> The driver uses mmap_sem for both pinned_vm accounting and
> get_user_pages(). By using gup_fast() and letting the mm handle
> the lock if needed, we no longer need to rely on the semaphore and
> can simplify the whole thing.
> 
> Cc: sudeep.dutt@intel.com
> Cc: ashutosh.dixit@intel.com
> Signed-off-by: Davidlohr Bueso <dbueso@suse.de>

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
>  drivers/misc/mic/scif/scif_rma.c | 36 +++++++++++-------------------------
>  1 file changed, 11 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/misc/mic/scif/scif_rma.c b/drivers/misc/mic/scif/scif_rma.c
> index a92b4d6f099c..445529ce2ad7 100644
> --- a/drivers/misc/mic/scif/scif_rma.c
> +++ b/drivers/misc/mic/scif/scif_rma.c
> @@ -272,21 +272,12 @@ static inline void __scif_release_mm(struct mm_struct *mm)
>  
>  static inline int
>  __scif_dec_pinned_vm_lock(struct mm_struct *mm,
> -			  int nr_pages, bool try_lock)
> +			  int nr_pages)
>  {
>  	if (!mm || !nr_pages || !scif_ulimit_check)
>  		return 0;
> -	if (try_lock) {
> -		if (!down_write_trylock(&mm->mmap_sem)) {
> -			dev_err(scif_info.mdev.this_device,
> -				"%s %d err\n", __func__, __LINE__);
> -			return -1;
> -		}
> -	} else {
> -		down_write(&mm->mmap_sem);
> -	}
> +
>  	atomic_long_sub(nr_pages, &mm->pinned_vm);
> -	up_write(&mm->mmap_sem);
>  	return 0;
>  }
>  
> @@ -298,16 +289,16 @@ static inline int __scif_check_inc_pinned_vm(struct mm_struct *mm,
>  	if (!mm || !nr_pages || !scif_ulimit_check)
>  		return 0;
>  
> -	locked = nr_pages;
> -	locked += atomic_long_read(&mm->pinned_vm);
>  	lock_limit = rlimit(RLIMIT_MEMLOCK) >> PAGE_SHIFT;
> +	locked = atomic_long_add_return(nr_pages, &mm->pinned_vm);
> +
>  	if ((locked > lock_limit) && !capable(CAP_IPC_LOCK)) {
> +		atomic_long_sub(nr_pages, &mm->pinned_vm);
>  		dev_err(scif_info.mdev.this_device,
>  			"locked(%lu) > lock_limit(%lu)\n",
>  			locked, lock_limit);
>  		return -ENOMEM;
>  	}
> -	atomic_long_set(&mm->pinned_vm, locked);
>  	return 0;
>  }
>  
> @@ -326,7 +317,7 @@ int scif_destroy_window(struct scif_endpt *ep, struct scif_window *window)
>  
>  	might_sleep();
>  	if (!window->temp && window->mm) {
> -		__scif_dec_pinned_vm_lock(window->mm, window->nr_pages, 0);
> +		__scif_dec_pinned_vm_lock(window->mm, window->nr_pages);
>  		__scif_release_mm(window->mm);
>  		window->mm = NULL;
>  	}
> @@ -737,7 +728,7 @@ int scif_unregister_window(struct scif_window *window)
>  					    ep->rma_info.dma_chan);
>  		} else {
>  			if (!__scif_dec_pinned_vm_lock(window->mm,
> -						       window->nr_pages, 1)) {
> +						       window->nr_pages)) {
>  				__scif_release_mm(window->mm);
>  				window->mm = NULL;
>  			}
> @@ -1385,28 +1376,23 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
>  		prot |= SCIF_PROT_WRITE;
>  retry:
>  		mm = current->mm;
> -		down_write(&mm->mmap_sem);
>  		if (ulimit) {
>  			err = __scif_check_inc_pinned_vm(mm, nr_pages);
>  			if (err) {
> -				up_write(&mm->mmap_sem);
>  				pinned_pages->nr_pages = 0;
>  				goto error_unmap;
>  			}
>  		}
>  
> -		pinned_pages->nr_pages = get_user_pages(
> +		pinned_pages->nr_pages = get_user_pages_fast(
>  				(u64)addr,
>  				nr_pages,
>  				(prot & SCIF_PROT_WRITE) ? FOLL_WRITE : 0,
> -				pinned_pages->pages,
> -				NULL);
> -		up_write(&mm->mmap_sem);
> +				pinned_pages->pages);
>  		if (nr_pages != pinned_pages->nr_pages) {
>  			if (try_upgrade) {
>  				if (ulimit)
> -					__scif_dec_pinned_vm_lock(mm,
> -								  nr_pages, 0);
> +					__scif_dec_pinned_vm_lock(mm, nr_pages);
>  				/* Roll back any pinned pages */
>  				for (i = 0; i < pinned_pages->nr_pages; i++) {
>  					if (pinned_pages->pages[i])
> @@ -1433,7 +1419,7 @@ int __scif_pin_pages(void *addr, size_t len, int *out_prot,
>  	return err;
>  dec_pinned:
>  	if (ulimit)
> -		__scif_dec_pinned_vm_lock(mm, nr_pages, 0);
> +		__scif_dec_pinned_vm_lock(mm, nr_pages);
>  	/* Something went wrong! Rollback */
>  error_unmap:
>  	pinned_pages->nr_pages = nr_pages;
> -- 
> 2.16.4
>
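
The accounting change in __scif_check_inc_pinned_vm() above follows an
"add first, roll back on failure" pattern. A minimal userspace sketch of
that pattern with C11 atomics is below. struct pin_account, pin_charge()
and pin_uncharge() are hypothetical names, the limit and the privilege
flag are passed in as plain parameters instead of rlimit() and capable(),
and atomic_fetch_add() plus nr_pages stands in for atomic_long_add_return().

```c
#include <errno.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the pinned_vm field of mm_struct. */
struct pin_account {
	atomic_long pinned_vm;  /* pages currently charged */
};

/*
 * Charge nr_pages optimistically, then undo the charge if the new total
 * exceeds the limit and the caller is not privileged.
 * Returns 0 on success and -ENOMEM when the limit would be exceeded.
 */
static int pin_charge(struct pin_account *acct, long nr_pages,
		      long limit, bool privileged)
{
	/* fetch_add returns the old value; add nr_pages for the new total */
	long locked = atomic_fetch_add(&acct->pinned_vm, nr_pages) + nr_pages;

	if (locked > limit && !privileged) {
		atomic_fetch_sub(&acct->pinned_vm, nr_pages);  /* roll back */
		return -ENOMEM;
	}
	return 0;
}

/* Release a previous charge; this path has no failure case. */
static void pin_uncharge(struct pin_account *acct, long nr_pages)
{
	atomic_fetch_sub(&acct->pinned_vm, nr_pages);
}
```

One consequence of this design is that, between the add and the rollback,
a concurrent reader can briefly observe a total above the limit, and two
racing chargers that would each fit alone can both be rejected. The sketch
accepts that in exchange for not taking a lock around the check.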