mbox series

[v2,00/15] Consolidate the mmu notifier interval_tree and locking

Message ID 20191028201032.6352-1-jgg@ziepe.ca (mailing list archive)
Headers show
Series Consolidate the mmu notifier interval_tree and locking | expand

Message

Jason Gunthorpe Oct. 28, 2019, 8:10 p.m. UTC
From: Jason Gunthorpe <jgg@mellanox.com>

8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1,
scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
they only use invalidate_range_start/end and immediately check the
invalidating range against some driver data structure to tell if the
driver is interested. Half of them use an interval_tree, the others are
simple linear search lists.

Of the ones I checked they largely seem to have various kinds of races,
bugs and poor implementation. This is a result of the complexity in how
the notifier interacts with get_user_pages(). It is extremely difficult to
use it correctly.

Consolidate all of this code together into the core mmu_notifier and
provide a locking scheme similar to hmm_mirror that allows the user to
safely use get_user_pages() and reliably know if the page list still
matches the mm.

This new arrangment plays nicely with the !blockable mode for
OOM. Scanning the interval tree is done such that the intersection test
will always succeed, and since there is no invalidate_range_end exposed to
drivers the scheme safely allows multiple drivers to be subscribed.

Four places are converted as an example of how the new API is used.
Four are left for future patches:
 - i915_gem has complex locking around destruction of a registration,
   needs more study
 - hfi1 (2nd user) needs access to the rbtree
 - scif_dma has a complicated logic flow
 - vhost's mmu notifiers are already being rewritten

This series, and the other code it depends on is available on my github:

https://github.com/jgunthorpe/linux/commits/mmu_notifier

v2 changes:
- Add mmu_range_set_seq() to set the mrn sequence number under the driver
  lock and make the locking more understandable
- Add some additional comments around locking/READ_ONCe
- Make the WARN_ON flow in mn_itree_invalidate a bit easier to follow
- Fix wrong WARN_ON

Jason Gunthorpe (15):
  mm/mmu_notifier: define the header pre-processor parts even if
    disabled
  mm/mmu_notifier: add an interval tree notifier
  mm/hmm: allow hmm_range to be used with a mmu_range_notifier or
    hmm_mirror
  mm/hmm: define the pre-processor related parts of hmm.h even if
    disabled
  RDMA/odp: Use mmu_range_notifier_insert()
  RDMA/hfi1: Use mmu_range_notifier_inset for user_exp_rcv
  drm/radeon: use mmu_range_notifier_insert
  xen/gntdev: Use select for DMA_SHARED_BUFFER
  xen/gntdev: use mmu_range_notifier_insert
  nouveau: use mmu_notifier directly for invalidate_range_start
  nouveau: use mmu_range_notifier instead of hmm_mirror
  drm/amdgpu: Call find_vma under mmap_sem
  drm/amdgpu: Use mmu_range_insert instead of hmm_mirror
  drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror
  mm/hmm: remove hmm_mirror and related

 Documentation/vm/hmm.rst                      | 105 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c        | 457 +++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h        |  53 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  13 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 111 ++--
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 231 +++++---
 drivers/gpu/drm/radeon/radeon.h               |   9 +-
 drivers/gpu/drm/radeon/radeon_mn.c            | 219 ++-----
 drivers/infiniband/core/device.c              |   1 -
 drivers/infiniband/core/umem_odp.c            | 288 +--------
 drivers/infiniband/hw/hfi1/file_ops.c         |   2 +-
 drivers/infiniband/hw/hfi1/hfi.h              |   2 +-
 drivers/infiniband/hw/hfi1/user_exp_rcv.c     | 146 ++---
 drivers/infiniband/hw/hfi1/user_exp_rcv.h     |   3 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   7 +-
 drivers/infiniband/hw/mlx5/mr.c               |   3 +-
 drivers/infiniband/hw/mlx5/odp.c              |  50 +-
 drivers/xen/Kconfig                           |   3 +-
 drivers/xen/gntdev-common.h                   |   8 +-
 drivers/xen/gntdev.c                          | 180 ++----
 include/linux/hmm.h                           | 195 +------
 include/linux/mmu_notifier.h                  | 144 ++++-
 include/rdma/ib_umem_odp.h                    |  65 +--
 include/rdma/ib_verbs.h                       |   2 -
 kernel/fork.c                                 |   1 -
 mm/Kconfig                                    |   2 +-
 mm/hmm.c                                      | 275 +--------
 mm/mmu_notifier.c                             | 546 +++++++++++++++++-
 32 files changed, 1225 insertions(+), 1922 deletions(-)

Comments

Dennis Dalessandro Oct. 29, 2019, 12:19 p.m. UTC | #1
On 10/28/2019 4:10 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> This converts one of the two users of mmu_notifiers to use the new API.
> The conversion is fairly straightforward, however the existing use of
> notifiers here seems to be racey.
> 
> Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
> Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

I tested v1, and replied to it [1]. I can re-test with this version if 
you like as well.

[1] https://marc.info/?l=linux-rdma&m=157235130606412&w=2

-Denny
Jason Gunthorpe Oct. 29, 2019, 12:51 p.m. UTC | #2
On Tue, Oct 29, 2019 at 08:19:20AM -0400, Dennis Dalessandro wrote:
> On 10/28/2019 4:10 PM, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > 
> > This converts one of the two users of mmu_notifiers to use the new API.
> > The conversion is fairly straightforward, however the existing use of
> > notifiers here seems to be racey.
> > 
> > Cc: Mike Marciniszyn <mike.marciniszyn@intel.com>
> > Cc: Dennis Dalessandro <dennis.dalessandro@intel.com>
> > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> 
> I tested v1, and replied to it [1]. I can re-test with this version if you
> like as well.
> 
> [1] https://marc.info/?l=linux-rdma&m=157235130606412&w=2

I think it is fine, nothing really changed in v2, thanks

Jason
Boris Ostrovsky Oct. 30, 2019, 4:55 p.m. UTC | #3
On 10/28/19 4:10 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> gntdev simply wants to monitor a specific VMA for any notifier events,
> this can be done straightforwardly using mmu_range_notifier_insert() over
> the VMA's VA range.
>
> The notifier should be attached until the original VMA is destroyed.
>
> It is unclear if any of this is even sane, but at least a lot of duplicate
> code is removed.

I didn't have a chance to look at the patch itself yet but as a heads-up
--- it crashes dom0.

-boris


>
> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: xen-devel@lists.xenproject.org
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> ---
>  drivers/xen/gntdev-common.h |   8 +-
>  drivers/xen/gntdev.c        | 180 ++++++++++--------------------------
>  2 files changed, 49 insertions(+), 139 deletions(-)
>
> diff --git a/drivers/xen/gntdev-common.h b/drivers/xen/gntdev-common.h
> index 2f8b949c3eeb14..b201fdd20b667b 100644
> --- a/drivers/xen/gntdev-common.h
> +++ b/drivers/xen/gntdev-common.h
> @@ -21,15 +21,8 @@ struct gntdev_dmabuf_priv;
>  struct gntdev_priv {
>  	/* Maps with visible offsets in the file descriptor. */
>  	struct list_head maps;
> -	/*
> -	 * Maps that are not visible; will be freed on munmap.
> -	 * Only populated if populate_freeable_maps == 1
> -	 */
> -	struct list_head freeable_maps;
>  	/* lock protects maps and freeable_maps. */
>  	struct mutex lock;
> -	struct mm_struct *mm;
> -	struct mmu_notifier mn;
>  
>  #ifdef CONFIG_XEN_GRANT_DMA_ALLOC
>  	/* Device for which DMA memory is allocated. */
> @@ -49,6 +42,7 @@ struct gntdev_unmap_notify {
>  };
>  
>  struct gntdev_grant_map {
> +	struct mmu_range_notifier notifier;
>  	struct list_head next;
>  	struct vm_area_struct *vma;
>  	int index;
> diff --git a/drivers/xen/gntdev.c b/drivers/xen/gntdev.c
> index a446a7221e13e9..12d626670bebbc 100644
> --- a/drivers/xen/gntdev.c
> +++ b/drivers/xen/gntdev.c
> @@ -65,7 +65,6 @@ MODULE_PARM_DESC(limit, "Maximum number of grants that may be mapped by "
>  static atomic_t pages_mapped = ATOMIC_INIT(0);
>  
>  static int use_ptemod;
> -#define populate_freeable_maps use_ptemod
>  
>  static int unmap_grant_pages(struct gntdev_grant_map *map,
>  			     int offset, int pages);
> @@ -251,12 +250,6 @@ void gntdev_put_map(struct gntdev_priv *priv, struct gntdev_grant_map *map)
>  		evtchn_put(map->notify.event);
>  	}
>  
> -	if (populate_freeable_maps && priv) {
> -		mutex_lock(&priv->lock);
> -		list_del(&map->next);
> -		mutex_unlock(&priv->lock);
> -	}
> -
>  	if (map->pages && !use_ptemod)
>  		unmap_grant_pages(map, 0, map->count);
>  	gntdev_free_map(map);
> @@ -445,17 +438,9 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
>  	struct gntdev_priv *priv = file->private_data;
>  
>  	pr_debug("gntdev_vma_close %p\n", vma);
> -	if (use_ptemod) {
> -		/* It is possible that an mmu notifier could be running
> -		 * concurrently, so take priv->lock to ensure that the vma won't
> -		 * vanishing during the unmap_grant_pages call, since we will
> -		 * spin here until that completes. Such a concurrent call will
> -		 * not do any unmapping, since that has been done prior to
> -		 * closing the vma, but it may still iterate the unmap_ops list.
> -		 */
> -		mutex_lock(&priv->lock);
> +	if (use_ptemod && map->vma == vma) {
> +		mmu_range_notifier_remove(&map->notifier);
>  		map->vma = NULL;
> -		mutex_unlock(&priv->lock);
>  	}
>  	vma->vm_private_data = NULL;
>  	gntdev_put_map(priv, map);
> @@ -477,109 +462,44 @@ static const struct vm_operations_struct gntdev_vmops = {
>  
>  /* ------------------------------------------------------------------ */
>  
> -static bool in_range(struct gntdev_grant_map *map,
> -			      unsigned long start, unsigned long end)
> -{
> -	if (!map->vma)
> -		return false;
> -	if (map->vma->vm_start >= end)
> -		return false;
> -	if (map->vma->vm_end <= start)
> -		return false;
> -
> -	return true;
> -}
> -
> -static int unmap_if_in_range(struct gntdev_grant_map *map,
> -			      unsigned long start, unsigned long end,
> -			      bool blockable)
> +static bool gntdev_invalidate(struct mmu_range_notifier *mn,
> +			      const struct mmu_notifier_range *range,
> +			      unsigned long cur_seq)
>  {
> +	struct gntdev_grant_map *map =
> +		container_of(mn, struct gntdev_grant_map, notifier);
>  	unsigned long mstart, mend;
>  	int err;
>  
> -	if (!in_range(map, start, end))
> -		return 0;
> +	if (!mmu_notifier_range_blockable(range))
> +		return false;
>  
> -	if (!blockable)
> -		return -EAGAIN;
> +	/*
> +	 * If the VMA is split or otherwise changed the notifier is not
> +	 * updated, but we don't want to process VA's outside the modified
> +	 * VMA. FIXME: It would be much more understandable to just prevent
> +	 * modifying the VMA in the first place.
> +	 */
> +	if (map->vma->vm_start >= range->end ||
> +	    map->vma->vm_end <= range->start)
> +		return true;
>  
> -	mstart = max(start, map->vma->vm_start);
> -	mend   = min(end,   map->vma->vm_end);
> +	mstart = max(range->start, map->vma->vm_start);
> +	mend = min(range->end, map->vma->vm_end);
>  	pr_debug("map %d+%d (%lx %lx), range %lx %lx, mrange %lx %lx\n",
>  			map->index, map->count,
>  			map->vma->vm_start, map->vma->vm_end,
> -			start, end, mstart, mend);
> +			range->start, range->end, mstart, mend);
>  	err = unmap_grant_pages(map,
>  				(mstart - map->vma->vm_start) >> PAGE_SHIFT,
>  				(mend - mstart) >> PAGE_SHIFT);
>  	WARN_ON(err);
>  
> -	return 0;
> -}
> -
> -static int mn_invl_range_start(struct mmu_notifier *mn,
> -			       const struct mmu_notifier_range *range)
> -{
> -	struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
> -	struct gntdev_grant_map *map;
> -	int ret = 0;
> -
> -	if (mmu_notifier_range_blockable(range))
> -		mutex_lock(&priv->lock);
> -	else if (!mutex_trylock(&priv->lock))
> -		return -EAGAIN;
> -
> -	list_for_each_entry(map, &priv->maps, next) {
> -		ret = unmap_if_in_range(map, range->start, range->end,
> -					mmu_notifier_range_blockable(range));
> -		if (ret)
> -			goto out_unlock;
> -	}
> -	list_for_each_entry(map, &priv->freeable_maps, next) {
> -		ret = unmap_if_in_range(map, range->start, range->end,
> -					mmu_notifier_range_blockable(range));
> -		if (ret)
> -			goto out_unlock;
> -	}
> -
> -out_unlock:
> -	mutex_unlock(&priv->lock);
> -
> -	return ret;
> -}
> -
> -static void mn_release(struct mmu_notifier *mn,
> -		       struct mm_struct *mm)
> -{
> -	struct gntdev_priv *priv = container_of(mn, struct gntdev_priv, mn);
> -	struct gntdev_grant_map *map;
> -	int err;
> -
> -	mutex_lock(&priv->lock);
> -	list_for_each_entry(map, &priv->maps, next) {
> -		if (!map->vma)
> -			continue;
> -		pr_debug("map %d+%d (%lx %lx)\n",
> -				map->index, map->count,
> -				map->vma->vm_start, map->vma->vm_end);
> -		err = unmap_grant_pages(map, /* offset */ 0, map->count);
> -		WARN_ON(err);
> -	}
> -	list_for_each_entry(map, &priv->freeable_maps, next) {
> -		if (!map->vma)
> -			continue;
> -		pr_debug("map %d+%d (%lx %lx)\n",
> -				map->index, map->count,
> -				map->vma->vm_start, map->vma->vm_end);
> -		err = unmap_grant_pages(map, /* offset */ 0, map->count);
> -		WARN_ON(err);
> -	}
> -	mutex_unlock(&priv->lock);
> +	return true;
>  }
>  
> -static const struct mmu_notifier_ops gntdev_mmu_ops = {
> -	.release                = mn_release,
> -	.invalidate_range_start = mn_invl_range_start,
> +static const struct mmu_range_notifier_ops gntdev_mmu_ops = {
> +	.invalidate = gntdev_invalidate,
>  };
>  
>  /* ------------------------------------------------------------------ */
> @@ -594,7 +514,6 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>  		return -ENOMEM;
>  
>  	INIT_LIST_HEAD(&priv->maps);
> -	INIT_LIST_HEAD(&priv->freeable_maps);
>  	mutex_init(&priv->lock);
>  
>  #ifdef CONFIG_XEN_GNTDEV_DMABUF
> @@ -606,17 +525,6 @@ static int gntdev_open(struct inode *inode, struct file *flip)
>  	}
>  #endif
>  
> -	if (use_ptemod) {
> -		priv->mm = get_task_mm(current);
> -		if (!priv->mm) {
> -			kfree(priv);
> -			return -ENOMEM;
> -		}
> -		priv->mn.ops = &gntdev_mmu_ops;
> -		ret = mmu_notifier_register(&priv->mn, priv->mm);
> -		mmput(priv->mm);
> -	}
> -
>  	if (ret) {
>  		kfree(priv);
>  		return ret;
> @@ -653,16 +561,12 @@ static int gntdev_release(struct inode *inode, struct file *flip)
>  		list_del(&map->next);
>  		gntdev_put_map(NULL /* already removed */, map);
>  	}
> -	WARN_ON(!list_empty(&priv->freeable_maps));
>  	mutex_unlock(&priv->lock);
>  
>  #ifdef CONFIG_XEN_GNTDEV_DMABUF
>  	gntdev_dmabuf_fini(priv->dmabuf_priv);
>  #endif
>  
> -	if (use_ptemod)
> -		mmu_notifier_unregister(&priv->mn, priv->mm);
> -
>  	kfree(priv);
>  	return 0;
>  }
> @@ -723,8 +627,6 @@ static long gntdev_ioctl_unmap_grant_ref(struct gntdev_priv *priv,
>  	map = gntdev_find_map_index(priv, op.index >> PAGE_SHIFT, op.count);
>  	if (map) {
>  		list_del(&map->next);
> -		if (populate_freeable_maps)
> -			list_add_tail(&map->next, &priv->freeable_maps);
>  		err = 0;
>  	}
>  	mutex_unlock(&priv->lock);
> @@ -1096,11 +998,6 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  		goto unlock_out;
>  	if (use_ptemod && map->vma)
>  		goto unlock_out;
> -	if (use_ptemod && priv->mm != vma->vm_mm) {
> -		pr_warn("Huh? Other mm?\n");
> -		goto unlock_out;
> -	}
> -
>  	refcount_inc(&map->users);
>  
>  	vma->vm_ops = &gntdev_vmops;
> @@ -1111,10 +1008,6 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  		vma->vm_flags |= VM_DONTCOPY;
>  
>  	vma->vm_private_data = map;
> -
> -	if (use_ptemod)
> -		map->vma = vma;
> -
>  	if (map->flags) {
>  		if ((vma->vm_flags & VM_WRITE) &&
>  				(map->flags & GNTMAP_readonly))
> @@ -1125,8 +1018,28 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  			map->flags |= GNTMAP_readonly;
>  	}
>  
> +	if (use_ptemod) {
> +		map->vma = vma;
> +		err = mmu_range_notifier_insert_locked(
> +			&map->notifier, vma->vm_start,
> +			vma->vm_end - vma->vm_start, vma->vm_mm);
> +		if (err)
> +			goto out_unlock_put;
> +	}
>  	mutex_unlock(&priv->lock);
>  
> +	/*
> +	 * gntdev takes the address of the PTE in find_grant_ptes() and passes
> +	 * it to the hypervisor in gntdev_map_grant_pages(). The purpose of
> +	 * the notifier is to prevent the hypervisor pointer to the PTE from
> +	 * going stale.
> +	 *
> +	 * Since this vma's mappings can't be touched without the mmap_sem,
> +	 * and we are holding it now, there is no need for the notifier_range
> +	 * locking pattern.
> +	 */
> +	mmu_range_read_begin(&map->notifier);
> +
>  	if (use_ptemod) {
>  		map->pages_vm_start = vma->vm_start;
>  		err = apply_to_page_range(vma->vm_mm, vma->vm_start,
> @@ -1175,8 +1088,11 @@ static int gntdev_mmap(struct file *flip, struct vm_area_struct *vma)
>  	mutex_unlock(&priv->lock);
>  out_put_map:
>  	if (use_ptemod) {
> -		map->vma = NULL;
>  		unmap_grant_pages(map, 0, map->count);
> +		if (map->vma) {
> +			mmu_range_notifier_remove(&map->notifier);
> +			map->vma = NULL;
> +		}
>  	}
>  	gntdev_put_map(priv, map);
>  	return err;
Jason Gunthorpe Nov. 1, 2019, 7:54 p.m. UTC | #4
On Mon, Oct 28, 2019 at 05:10:17PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1,
> scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
> they only use invalidate_range_start/end and immediately check the
> invalidating range against some driver data structure to tell if the
> driver is interested. Half of them use an interval_tree, the others are
> simple linear search lists.

Now that we have the most of the driver changes tested and reviewed
I'm going to move this series into linux-next via the hmm tree, minus
the xen gntdev patches as they are not working yet.

I will keep collecting acks and any additional changes.

Thanks,
Jason
Ralph Campbell Nov. 1, 2019, 8:54 p.m. UTC | #5
On 10/28/19 1:10 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1,
> scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
> they only use invalidate_range_start/end and immediately check the
> invalidating range against some driver data structure to tell if the
> driver is interested. Half of them use an interval_tree, the others are
> simple linear search lists.
> 
> Of the ones I checked they largely seem to have various kinds of races,
> bugs and poor implementation. This is a result of the complexity in how
> the notifier interacts with get_user_pages(). It is extremely difficult to
> use it correctly.
> 
> Consolidate all of this code together into the core mmu_notifier and
> provide a locking scheme similar to hmm_mirror that allows the user to
> safely use get_user_pages() and reliably know if the page list still
> matches the mm.
> 
> This new arrangment plays nicely with the !blockable mode for
> OOM. Scanning the interval tree is done such that the intersection test
> will always succeed, and since there is no invalidate_range_end exposed to
> drivers the scheme safely allows multiple drivers to be subscribed.
> 
> Four places are converted as an example of how the new API is used.
> Four are left for future patches:
>   - i915_gem has complex locking around destruction of a registration,
>     needs more study
>   - hfi1 (2nd user) needs access to the rbtree
>   - scif_dma has a complicated logic flow
>   - vhost's mmu notifiers are already being rewritten
> 
> This series, and the other code it depends on is available on my github:
> 
> https://github.com/jgunthorpe/linux/commits/mmu_notifier
> 
> v2 changes:
> - Add mmu_range_set_seq() to set the mrn sequence number under the driver
>    lock and make the locking more understandable
> - Add some additional comments around locking/READ_ONCe
> - Make the WARN_ON flow in mn_itree_invalidate a bit easier to follow
> - Fix wrong WARN_ON
> 
> Jason Gunthorpe (15):
>    mm/mmu_notifier: define the header pre-processor parts even if
>      disabled
>    mm/mmu_notifier: add an interval tree notifier
>    mm/hmm: allow hmm_range to be used with a mmu_range_notifier or
>      hmm_mirror
>    mm/hmm: define the pre-processor related parts of hmm.h even if
>      disabled
>    RDMA/odp: Use mmu_range_notifier_insert()
>    RDMA/hfi1: Use mmu_range_notifier_inset for user_exp_rcv
>    drm/radeon: use mmu_range_notifier_insert
>    xen/gntdev: Use select for DMA_SHARED_BUFFER
>    xen/gntdev: use mmu_range_notifier_insert
>    nouveau: use mmu_notifier directly for invalidate_range_start
>    nouveau: use mmu_range_notifier instead of hmm_mirror
>    drm/amdgpu: Call find_vma under mmap_sem
>    drm/amdgpu: Use mmu_range_insert instead of hmm_mirror
>    drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror
>    mm/hmm: remove hmm_mirror and related
> 
>   Documentation/vm/hmm.rst                      | 105 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   9 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  14 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c        | 457 +++------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h        |  53 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  13 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 111 ++--
>   drivers/gpu/drm/nouveau/nouveau_svm.c         | 231 +++++---
>   drivers/gpu/drm/radeon/radeon.h               |   9 +-
>   drivers/gpu/drm/radeon/radeon_mn.c            | 219 ++-----
>   drivers/infiniband/core/device.c              |   1 -
>   drivers/infiniband/core/umem_odp.c            | 288 +--------
>   drivers/infiniband/hw/hfi1/file_ops.c         |   2 +-
>   drivers/infiniband/hw/hfi1/hfi.h              |   2 +-
>   drivers/infiniband/hw/hfi1/user_exp_rcv.c     | 146 ++---
>   drivers/infiniband/hw/hfi1/user_exp_rcv.h     |   3 +-
>   drivers/infiniband/hw/mlx5/mlx5_ib.h          |   7 +-
>   drivers/infiniband/hw/mlx5/mr.c               |   3 +-
>   drivers/infiniband/hw/mlx5/odp.c              |  50 +-
>   drivers/xen/Kconfig                           |   3 +-
>   drivers/xen/gntdev-common.h                   |   8 +-
>   drivers/xen/gntdev.c                          | 180 ++----
>   include/linux/hmm.h                           | 195 +------
>   include/linux/mmu_notifier.h                  | 144 ++++-
>   include/rdma/ib_umem_odp.h                    |  65 +--
>   include/rdma/ib_verbs.h                       |   2 -
>   kernel/fork.c                                 |   1 -
>   mm/Kconfig                                    |   2 +-
>   mm/hmm.c                                      | 275 +--------
>   mm/mmu_notifier.c                             | 546 +++++++++++++++++-
>   32 files changed, 1225 insertions(+), 1922 deletions(-)
> 

You can add my Tested-by for the mm and nouveau changes.
IOW, patches 1-4, 10-11, and 15.

Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Jürgen Groß Nov. 5, 2019, 2:44 p.m. UTC | #6
On 01.11.19 19:26, Jason Gunthorpe wrote:
> On Mon, Oct 28, 2019 at 05:10:25PM -0300, Jason Gunthorpe wrote:
>> From: Jason Gunthorpe <jgg@mellanox.com>
>>
>> DMA_SHARED_BUFFER can not be enabled by the user (it represents a library
>> set in the kernel). The kconfig convention is to use select for such
>> symbols so they are turned on implicitly when the user enables a kconfig
>> that needs them.
>>
>> Otherwise the XEN_GNTDEV_DMABUF kconfig is overly difficult to enable.
>>
>> Fixes: 932d6562179e ("xen/gntdev: Add initial support for dma-buf UAPI")
>> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
>> Cc: xen-devel@lists.xenproject.org
>> Cc: Juergen Gross <jgross@suse.com>
>> Cc: Stefano Stabellini <sstabellini@kernel.org>
>> Reviewed-by: Juergen Gross <jgross@suse.com>
>> Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
>> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
>> ---
>>   drivers/xen/Kconfig | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> Juergen/Oleksandr/Xen Maintainers:
> 
> Would you take this patch through a xen related tree? The only reason
> I had in this series is to make it easier to compile-test the gntdev
> changes.

Yes, I can take it for 5.5.


Juergen
Boris Ostrovsky Nov. 5, 2019, 3:16 p.m. UTC | #7
On 11/4/19 9:31 PM, Jason Gunthorpe wrote:
> On Mon, Nov 04, 2019 at 05:03:31PM -0500, Boris Ostrovsky wrote:
>> On 10/28/19 4:10 PM, Jason Gunthorpe wrote:
>>> @@ -445,17 +438,9 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
>>>  	struct gntdev_priv *priv = file->private_data;
>>>  
>>>  	pr_debug("gntdev_vma_close %p\n", vma);
>>> -	if (use_ptemod) {
>>> -		/* It is possible that an mmu notifier could be running
>>> -		 * concurrently, so take priv->lock to ensure that the vma won't
>>> -		 * vanishing during the unmap_grant_pages call, since we will
>>> -		 * spin here until that completes. Such a concurrent call will
>>> -		 * not do any unmapping, since that has been done prior to
>>> -		 * closing the vma, but it may still iterate the unmap_ops list.
>>> -		 */
>>> -		mutex_lock(&priv->lock);
>>> +	if (use_ptemod && map->vma == vma) {
>>
>> Is it possible for map->vma not to be equal to vma?
> It could be NULL at least if use_ptemod is not set.
>
> Otherwise, I'm not sure, the confusing bit is that the map comes from
> here:
>
>         map = gntdev_find_map_index(priv, index, count);
>
> It looks like the intent is that the map->vma is always set to the
> only vma that has the map as private_data.

I am not sure how this can work otherwise. We stash map pointer in vm's
vm_private_data and vice versa (for use_ptemod) gntdev_mmap() so if they
have to match.

That's why I was asking you to see if you had something particular in
mind when you added this test.

> So, I suppose it can be relaxed to a null test and a WARN_ON that it
> hasn't changed?

You mean

if (use_ptemod) {
        WARN_ON(map->vma != vma);
        ...


Yes, that sounds good.


-boris
Jürgen Groß Nov. 7, 2019, 9:39 a.m. UTC | #8
On 28.10.19 21:10, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> DMA_SHARED_BUFFER can not be enabled by the user (it represents a library
> set in the kernel). The kconfig convention is to use select for such
> symbols so they are turned on implicitly when the user enables a kconfig
> that needs them.
> 
> Otherwise the XEN_GNTDEV_DMABUF kconfig is overly difficult to enable.
> 
> Fixes: 932d6562179e ("xen/gntdev: Add initial support for dma-buf UAPI")
> Cc: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com>
> Cc: xen-devel@lists.xenproject.org
> Cc: Juergen Gross <jgross@suse.com>
> Cc: Stefano Stabellini <sstabellini@kernel.org>
> Reviewed-by: Juergen Gross <jgross@suse.com>
> Reviewed-by: Oleksandr Andrushchenko <oleksandr_andrushchenko@epam.com>
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Applied to xen/tip.git for-linus-5.5a


Juergen
Jason Gunthorpe Nov. 7, 2019, 8:36 p.m. UTC | #9
On Tue, Nov 05, 2019 at 10:16:46AM -0500, Boris Ostrovsky wrote:

> > So, I suppose it can be relaxed to a null test and a WARN_ON that it
> > hasn't changed?
> 
> You mean
> 
> if (use_ptemod) {
>         WARN_ON(map->vma != vma);
>         ...
> 
> 
> Yes, that sounds good.

I amended my copy of the patch with the above, has this rework shown
signs of working?

@@ -436,7 +436,8 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
        struct gntdev_priv *priv = file->private_data;
 
        pr_debug("gntdev_vma_close %p\n", vma);
-       if (use_ptemod && map->vma == vma) {
+       if (use_ptemod) {
+               WARN_ON(map->vma != vma);
                mmu_range_notifier_remove(&map->notifier);
                map->vma = NULL;
        }

Jason
Boris Ostrovsky Nov. 7, 2019, 10:54 p.m. UTC | #10
On 11/7/19 3:36 PM, Jason Gunthorpe wrote:
> On Tue, Nov 05, 2019 at 10:16:46AM -0500, Boris Ostrovsky wrote:
>
>>> So, I suppose it can be relaxed to a null test and a WARN_ON that it
>>> hasn't changed?
>> You mean
>>
>> if (use_ptemod) {
>>         WARN_ON(map->vma != vma);
>>         ...
>>
>>
>> Yes, that sounds good.
> I amended my copy of the patch with the above, has this rework shown
> signs of working?

Yes, it works fine.

But please don't forget notifier ops initialization.

With those two changes,

Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>

>
> @@ -436,7 +436,8 @@ static void gntdev_vma_close(struct vm_area_struct *vma)
>         struct gntdev_priv *priv = file->private_data;
>  
>         pr_debug("gntdev_vma_close %p\n", vma);
> -       if (use_ptemod && map->vma == vma) {
> +       if (use_ptemod) {
> +               WARN_ON(map->vma != vma);
>                 mmu_range_notifier_remove(&map->notifier);
>                 map->vma = NULL;
>         }
>
> Jason