
[v2,00/15] Consolidate the mmu notifier interval_tree and locking

Message ID 20191028201032.6352-1-jgg@ziepe.ca (mailing list archive)

Message

Jason Gunthorpe Oct. 28, 2019, 8:10 p.m. UTC
From: Jason Gunthorpe <jgg@mellanox.com>

8 of the drivers using mmu_notifier (i915_gem, radeon_mn, umem_odp, hfi1,
scif_dma, vhost, gntdev, hmm) share a common pattern: they only use
invalidate_range_start/end and immediately check the invalidating range
against some driver data structure to tell if the driver is interested.
Half of them use an interval_tree, the others use simple linear search
lists.
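
The shared pattern described above can be sketched roughly as follows. This
is a userspace model for illustration only, not code from the series:
`struct tracked_range`, `ranges_overlap()` and `driver_is_interested()` are
hypothetical names, and ranges are assumed to be half-open `[start, end)`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical driver-side record of a user address range it mirrors. */
struct tracked_range {
	unsigned long start; /* inclusive */
	unsigned long end;   /* exclusive */
};

/* Two half-open ranges intersect iff each one starts before the other ends. */
static bool ranges_overlap(unsigned long a_start, unsigned long a_end,
			   unsigned long b_start, unsigned long b_end)
{
	return a_start < b_end && b_start < a_end;
}

/*
 * Linear-list variant of the check a driver would run when an
 * invalidation for [inv_start, inv_end) arrives. An interval tree
 * answers the same question without visiting every entry.
 */
static bool driver_is_interested(const struct tracked_range *ranges, size_t n,
				 unsigned long inv_start, unsigned long inv_end)
{
	for (size_t i = 0; i < n; i++) {
		if (ranges_overlap(ranges[i].start, ranges[i].end,
				   inv_start, inv_end))
			return true;
	}
	return false;
}
```

The linear scan costs O(n) per invalidation, which is presumably why half of
the drivers listed above already use an interval tree.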

Of the ones I checked, most seem to have various kinds of races, bugs and
poor implementation. This is a result of the complexity in how the
notifier interacts with get_user_pages(), which makes it extremely
difficult to use correctly.

Consolidate all of this code together into the core mmu_notifier and
provide a locking scheme similar to hmm_mirror that allows the user to
safely use get_user_pages() and reliably know if the page list still
matches the mm.
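
The idea of "reliably knowing if the page list still matches the mm" can be
illustrated with a sequence-count retry loop. This is a single-threaded
userspace model and an assumption about the general shape of the scheme, not
the kernel implementation: every `model_*` name is hypothetical, and the real
code additionally deals with locking, memory ordering and the interval tree.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for one registered range. */
struct model_notifier {
	unsigned long invalidate_seq; /* bumped by each invalidation */
};

/* Reader side: sample the sequence before collecting pages. */
static unsigned long model_read_begin(const struct model_notifier *mn)
{
	return mn->invalidate_seq;
}

/* Invalidation side: anything collected before this point is stale. */
static void model_invalidate(struct model_notifier *mn)
{
	mn->invalidate_seq++;
}

/* Reader side, under the driver lock: true means "start over". */
static bool model_read_retry(const struct model_notifier *mn,
			     unsigned long seq)
{
	return mn->invalidate_seq != seq;
}

/*
 * Collect-then-validate loop. `racing_invalidations` simulates how many
 * invalidations land while pages are being collected. Returns the
 * number of attempts that were needed.
 */
static int model_populate(struct model_notifier *mn, int racing_invalidations)
{
	int attempts = 0;
	unsigned long seq;

	do {
		seq = model_read_begin(mn);
		attempts++;
		/* get_user_pages() would run here, without the driver lock */
		if (racing_invalidations > 0) {
			model_invalidate(mn);
			racing_invalidations--;
		}
		/* the driver lock would be taken here, before validating */
	} while (model_read_retry(mn, seq));

	return attempts;
}
```

With no racing invalidation the loop finishes in one pass; each simulated
invalidation forces one more attempt.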

This new arrangement plays nicely with the !blockable mode for
OOM. Scanning the interval tree is done such that the intersection test
will always succeed, and since there is no invalidate_range_end exposed to
drivers the scheme safely allows multiple drivers to be subscribed.

Four places are converted as an example of how the new API is used.
Four are left for future patches:
 - i915_gem has complex locking around destruction of a registration,
   needs more study
 - hfi1 (2nd user) needs access to the rbtree
 - scif_dma has a complicated logic flow
 - vhost's mmu notifiers are already being rewritten

This series, and the other code it depends on, is available on my github:

https://github.com/jgunthorpe/linux/commits/mmu_notifier

v2 changes:
- Add mmu_range_set_seq() to set the mrn sequence number under the driver
  lock and make the locking more understandable
- Add some additional comments around locking/READ_ONCE
- Make the WARN_ON flow in mn_itree_invalidate a bit easier to follow
- Fix wrong WARN_ON

Jason Gunthorpe (15):
  mm/mmu_notifier: define the header pre-processor parts even if
    disabled
  mm/mmu_notifier: add an interval tree notifier
  mm/hmm: allow hmm_range to be used with a mmu_range_notifier or
    hmm_mirror
  mm/hmm: define the pre-processor related parts of hmm.h even if
    disabled
  RDMA/odp: Use mmu_range_notifier_insert()
  RDMA/hfi1: Use mmu_range_notifier_insert for user_exp_rcv
  drm/radeon: use mmu_range_notifier_insert
  xen/gntdev: Use select for DMA_SHARED_BUFFER
  xen/gntdev: use mmu_range_notifier_insert
  nouveau: use mmu_notifier directly for invalidate_range_start
  nouveau: use mmu_range_notifier instead of hmm_mirror
  drm/amdgpu: Call find_vma under mmap_sem
  drm/amdgpu: Use mmu_range_insert instead of hmm_mirror
  drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror
  mm/hmm: remove hmm_mirror and related

 Documentation/vm/hmm.rst                      | 105 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   9 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  14 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c        | 457 +++------------
 drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h        |  53 --
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  13 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 111 ++--
 drivers/gpu/drm/nouveau/nouveau_svm.c         | 231 +++++---
 drivers/gpu/drm/radeon/radeon.h               |   9 +-
 drivers/gpu/drm/radeon/radeon_mn.c            | 219 ++-----
 drivers/infiniband/core/device.c              |   1 -
 drivers/infiniband/core/umem_odp.c            | 288 +--------
 drivers/infiniband/hw/hfi1/file_ops.c         |   2 +-
 drivers/infiniband/hw/hfi1/hfi.h              |   2 +-
 drivers/infiniband/hw/hfi1/user_exp_rcv.c     | 146 ++---
 drivers/infiniband/hw/hfi1/user_exp_rcv.h     |   3 +-
 drivers/infiniband/hw/mlx5/mlx5_ib.h          |   7 +-
 drivers/infiniband/hw/mlx5/mr.c               |   3 +-
 drivers/infiniband/hw/mlx5/odp.c              |  50 +-
 drivers/xen/Kconfig                           |   3 +-
 drivers/xen/gntdev-common.h                   |   8 +-
 drivers/xen/gntdev.c                          | 180 ++----
 include/linux/hmm.h                           | 195 +------
 include/linux/mmu_notifier.h                  | 144 ++++-
 include/rdma/ib_umem_odp.h                    |  65 +--
 include/rdma/ib_verbs.h                       |   2 -
 kernel/fork.c                                 |   1 -
 mm/Kconfig                                    |   2 +-
 mm/hmm.c                                      | 275 +--------
 mm/mmu_notifier.c                             | 546 +++++++++++++++++-
 32 files changed, 1225 insertions(+), 1922 deletions(-)

Comments

Christian König Oct. 29, 2019, 7:49 a.m. UTC | #1
Am 28.10.19 um 21:10 schrieb Jason Gunthorpe:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> find_vma() must be called under the mmap_sem, reorganize this code to
> do the vma check after entering the lock.
>
> Further, fix the unlocked use of struct task_struct's mm, instead use
> the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
> must be converted to a mm_get before acquiring mmap_sem or calling
> find_vma().
>
> Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
> Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads")
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

Acked-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 37 ++++++++++++++-----------
>   1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index dff41d0a85fe96..c0e41f1f0c2365 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -35,6 +35,7 @@
>   #include <linux/hmm.h>
>   #include <linux/pagemap.h>
>   #include <linux/sched/task.h>
> +#include <linux/sched/mm.h>
>   #include <linux/seq_file.h>
>   #include <linux/slab.h>
>   #include <linux/swap.h>
> @@ -788,7 +789,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	struct hmm_mirror *mirror = bo->mn ? &bo->mn->mirror : NULL;
>   	struct ttm_tt *ttm = bo->tbo.ttm;
>   	struct amdgpu_ttm_tt *gtt = (void *)ttm;
> -	struct mm_struct *mm = gtt->usertask->mm;
> +	struct mm_struct *mm;
>   	unsigned long start = gtt->userptr;
>   	struct vm_area_struct *vma;
>   	struct hmm_range *range;
> @@ -796,25 +797,14 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	uint64_t *pfns;
>   	int r = 0;
>   
> -	if (!mm) /* Happens during process shutdown */
> -		return -ESRCH;
> -
>   	if (unlikely(!mirror)) {
>   		DRM_DEBUG_DRIVER("Failed to get hmm_mirror\n");
> -		r = -EFAULT;
> -		goto out;
> +		return -EFAULT;
>   	}
>   
> -	vma = find_vma(mm, start);
> -	if (unlikely(!vma || start < vma->vm_start)) {
> -		r = -EFAULT;
> -		goto out;
> -	}
> -	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> -		vma->vm_file)) {
> -		r = -EPERM;
> -		goto out;
> -	}
> +	mm = mirror->hmm->mmu_notifier.mm;
> +	if (!mmget_not_zero(mm)) /* Happens during process shutdown */
> +		return -ESRCH;
>   
>   	range = kzalloc(sizeof(*range), GFP_KERNEL);
>   	if (unlikely(!range)) {
> @@ -847,6 +837,17 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT);
>   
>   	down_read(&mm->mmap_sem);
> +	vma = find_vma(mm, start);
> +	if (unlikely(!vma || start < vma->vm_start)) {
> +		r = -EFAULT;
> +		goto out_unlock;
> +	}
> +	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> +		vma->vm_file)) {
> +		r = -EPERM;
> +		goto out_unlock;
> +	}
> +
>   	r = hmm_range_fault(range, 0);
>   	up_read(&mm->mmap_sem);
>   
> @@ -865,15 +866,19 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	}
>   
>   	gtt->range = range;
> +	mmput(mm);
>   
>   	return 0;
>   
> +out_unlock:
> +	up_read(&mm->mmap_sem);
>   out_free_pfns:
>   	hmm_range_unregister(range);
>   	kvfree(pfns);
>   out_free_ranges:
>   	kfree(range);
>   out:
> +	mmput(mm);
>   	return r;
>   }
>
Christian König Oct. 29, 2019, 1:07 p.m. UTC | #2
Am 29.10.19 um 17:28 schrieb Kuehling, Felix:
> On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote:
>> From: Jason Gunthorpe <jgg@mellanox.com>
>>
>> find_vma() must be called under the mmap_sem, reorganize this code to
>> do the vma check after entering the lock.
>>
>> Further, fix the unlocked use of struct task_struct's mm, instead use
>> the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
>> must be converted to a mm_get before acquiring mmap_sem or calling
>> find_vma().
>>
>> Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
>> Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads")
>> Cc: Alex Deucher <alexander.deucher@amd.com>
>> Cc: Christian König <christian.koenig@amd.com>
>> Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
>> Cc: amd-gfx@lists.freedesktop.org
>> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> One question inline to confirm my understanding. Otherwise this patch is
>
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>
>
>
>> ---
>>    drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 37 ++++++++++++++-----------
>>    1 file changed, 21 insertions(+), 16 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> index dff41d0a85fe96..c0e41f1f0c2365 100644
>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
>> @@ -35,6 +35,7 @@
>>    #include <linux/hmm.h>
>>    #include <linux/pagemap.h>
>>    #include <linux/sched/task.h>
>> +#include <linux/sched/mm.h>
>>    #include <linux/seq_file.h>
>>    #include <linux/slab.h>
>>    #include <linux/swap.h>
>> @@ -788,7 +789,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>>    	struct hmm_mirror *mirror = bo->mn ? &bo->mn->mirror : NULL;
>>    	struct ttm_tt *ttm = bo->tbo.ttm;
>>    	struct amdgpu_ttm_tt *gtt = (void *)ttm;
>> -	struct mm_struct *mm = gtt->usertask->mm;
>> +	struct mm_struct *mm;
>>    	unsigned long start = gtt->userptr;
>>    	struct vm_area_struct *vma;
>>    	struct hmm_range *range;
>> @@ -796,25 +797,14 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>>    	uint64_t *pfns;
>>    	int r = 0;
>>    
>> -	if (!mm) /* Happens during process shutdown */
>> -		return -ESRCH;
>> -
>>    	if (unlikely(!mirror)) {
>>    		DRM_DEBUG_DRIVER("Failed to get hmm_mirror\n");
>> -		r = -EFAULT;
>> -		goto out;
>> +		return -EFAULT;
>>    	}
>>    
>> -	vma = find_vma(mm, start);
>> -	if (unlikely(!vma || start < vma->vm_start)) {
>> -		r = -EFAULT;
>> -		goto out;
>> -	}
>> -	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
>> -		vma->vm_file)) {
>> -		r = -EPERM;
>> -		goto out;
>> -	}
>> +	mm = mirror->hmm->mmu_notifier.mm;
>> +	if (!mmget_not_zero(mm)) /* Happens during process shutdown */
> This works because mirror->hmm->mmu_notifier holds an mmgrab reference
> to the mm? So the MM will not just go away, but if the mmget refcount is
> 0, it means the mm is marked for destruction and shouldn't be used any more.

Yes, exactly. That is a rather common pattern, one reference count for 
the functionality and one for the structure.

When the functionality is gone the structure might still be alive for 
some reason. TTM and a couple of other structures use the same approach.
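
A minimal userspace sketch of that two-counter pattern, loosely modelled on
how mm_users and mm_count relate. All `model_*` names are hypothetical, and
the flags stand in for the real teardown and free steps:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_mm {
	atomic_int users; /* "functionality" references (mmget/mmput) */
	atomic_int count; /* "structure" references (mmgrab/mmdrop) */
	bool torn_down;   /* contents released */
	bool freed;       /* structure itself released */
};

static void model_mm_init(struct model_mm *mm)
{
	atomic_init(&mm->users, 1);
	atomic_init(&mm->count, 1); /* held on behalf of all users */
	mm->torn_down = false;
	mm->freed = false;
}

static void model_mmgrab(struct model_mm *mm)
{
	atomic_fetch_add(&mm->count, 1);
}

static void model_mmdrop(struct model_mm *mm)
{
	if (atomic_fetch_sub(&mm->count, 1) == 1)
		mm->freed = true; /* real code would free the memory here */
}

/* Take a users reference only if the functionality is still alive. */
static bool model_mmget_not_zero(struct model_mm *mm)
{
	int old = atomic_load(&mm->users);

	while (old != 0) {
		/* on failure, `old` is reloaded with the current value */
		if (atomic_compare_exchange_weak(&mm->users, &old, old + 1))
			return true;
	}
	return false;
}

static void model_mmput(struct model_mm *mm)
{
	if (atomic_fetch_sub(&mm->users, 1) == 1) {
		mm->torn_down = true; /* last user gone: tear down contents */
		model_mmdrop(mm);     /* drop the reference the users shared */
	}
}
```

A holder of only a structure reference can always dereference the pointer,
but must succeed at `model_mmget_not_zero()` before touching the contents.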

Christian.

>
>
>> +		return -ESRCH;
>>    
>>    	range = kzalloc(sizeof(*range), GFP_KERNEL);
>>    	if (unlikely(!range)) {
>> @@ -847,6 +837,17 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>>    	hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT);
>>    
>>    	down_read(&mm->mmap_sem);
>> +	vma = find_vma(mm, start);
>> +	if (unlikely(!vma || start < vma->vm_start)) {
>> +		r = -EFAULT;
>> +		goto out_unlock;
>> +	}
>> +	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
>> +		vma->vm_file)) {
>> +		r = -EPERM;
>> +		goto out_unlock;
>> +	}
>> +
>>    	r = hmm_range_fault(range, 0);
>>    	up_read(&mm->mmap_sem);
>>    
>> @@ -865,15 +866,19 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>>    	}
>>    
>>    	gtt->range = range;
>> +	mmput(mm);
>>    
>>    	return 0;
>>    
>> +out_unlock:
>> +	up_read(&mm->mmap_sem);
>>    out_free_pfns:
>>    	hmm_range_unregister(range);
>>    	kvfree(pfns);
>>    out_free_ranges:
>>    	kfree(range);
>>    out:
>> +	mmput(mm);
>>    	return r;
>>    }
>>    
Felix Kuehling Oct. 29, 2019, 4:28 p.m. UTC | #3
On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
>
> find_vma() must be called under the mmap_sem, reorganize this code to
> do the vma check after entering the lock.
>
> Further, fix the unlocked use of struct task_struct's mm, instead use
> the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
> must be converted to a mm_get before acquiring mmap_sem or calling
> find_vma().
>
> Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
> Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads")
> Cc: Alex Deucher <alexander.deucher@amd.com>
> Cc: Christian König <christian.koenig@amd.com>
> Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
> Cc: amd-gfx@lists.freedesktop.org
> Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>

One question inline to confirm my understanding. Otherwise this patch is

Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>


> ---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c | 37 ++++++++++++++-----------
>   1 file changed, 21 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> index dff41d0a85fe96..c0e41f1f0c2365 100644
> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c
> @@ -35,6 +35,7 @@
>   #include <linux/hmm.h>
>   #include <linux/pagemap.h>
>   #include <linux/sched/task.h>
> +#include <linux/sched/mm.h>
>   #include <linux/seq_file.h>
>   #include <linux/slab.h>
>   #include <linux/swap.h>
> @@ -788,7 +789,7 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	struct hmm_mirror *mirror = bo->mn ? &bo->mn->mirror : NULL;
>   	struct ttm_tt *ttm = bo->tbo.ttm;
>   	struct amdgpu_ttm_tt *gtt = (void *)ttm;
> -	struct mm_struct *mm = gtt->usertask->mm;
> +	struct mm_struct *mm;
>   	unsigned long start = gtt->userptr;
>   	struct vm_area_struct *vma;
>   	struct hmm_range *range;
> @@ -796,25 +797,14 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	uint64_t *pfns;
>   	int r = 0;
>   
> -	if (!mm) /* Happens during process shutdown */
> -		return -ESRCH;
> -
>   	if (unlikely(!mirror)) {
>   		DRM_DEBUG_DRIVER("Failed to get hmm_mirror\n");
> -		r = -EFAULT;
> -		goto out;
> +		return -EFAULT;
>   	}
>   
> -	vma = find_vma(mm, start);
> -	if (unlikely(!vma || start < vma->vm_start)) {
> -		r = -EFAULT;
> -		goto out;
> -	}
> -	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> -		vma->vm_file)) {
> -		r = -EPERM;
> -		goto out;
> -	}
> +	mm = mirror->hmm->mmu_notifier.mm;
> +	if (!mmget_not_zero(mm)) /* Happens during process shutdown */

This works because mirror->hmm->mmu_notifier holds an mmgrab reference 
to the mm? So the MM will not just go away, but if the mmget refcount is 
0, it means the mm is marked for destruction and shouldn't be used any more.


> +		return -ESRCH;
>   
>   	range = kzalloc(sizeof(*range), GFP_KERNEL);
>   	if (unlikely(!range)) {
> @@ -847,6 +837,17 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	hmm_range_wait_until_valid(range, HMM_RANGE_DEFAULT_TIMEOUT);
>   
>   	down_read(&mm->mmap_sem);
> +	vma = find_vma(mm, start);
> +	if (unlikely(!vma || start < vma->vm_start)) {
> +		r = -EFAULT;
> +		goto out_unlock;
> +	}
> +	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> +		vma->vm_file)) {
> +		r = -EPERM;
> +		goto out_unlock;
> +	}
> +
>   	r = hmm_range_fault(range, 0);
>   	up_read(&mm->mmap_sem);
>   
> @@ -865,15 +866,19 @@ int amdgpu_ttm_tt_get_user_pages(struct amdgpu_bo *bo, struct page **pages)
>   	}
>   
>   	gtt->range = range;
> +	mmput(mm);
>   
>   	return 0;
>   
> +out_unlock:
> +	up_read(&mm->mmap_sem);
>   out_free_pfns:
>   	hmm_range_unregister(range);
>   	kvfree(pfns);
>   out_free_ranges:
>   	kfree(range);
>   out:
> +	mmput(mm);
>   	return r;
>   }
>
Jason Gunthorpe Oct. 29, 2019, 5:19 p.m. UTC | #4
On Tue, Oct 29, 2019 at 04:28:43PM +0000, Kuehling, Felix wrote:
> On 2019-10-28 4:10 p.m., Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> >
> > find_vma() must be called under the mmap_sem, reorganize this code to
> > do the vma check after entering the lock.
> >
> > Further, fix the unlocked use of struct task_struct's mm, instead use
> > the mm from hmm_mirror which has an active mm_grab. Also the mm_grab
> > must be converted to a mm_get before acquiring mmap_sem or calling
> > find_vma().
> >
> > Fixes: 66c45500bfdc ("drm/amdgpu: use new HMM APIs and helpers")
> > Fixes: 0919195f2b0d ("drm/amdgpu: Enable amdgpu_ttm_tt_get_user_pages in worker threads")
> > Cc: Alex Deucher <alexander.deucher@amd.com>
> > Cc: Christian König <christian.koenig@amd.com>
> > Cc: David (ChunMing) Zhou <David1.Zhou@amd.com>
> > Cc: amd-gfx@lists.freedesktop.org
> > Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
> 
> One question inline to confirm my understanding. Otherwise this patch is
> 
> Reviewed-by: Felix Kuehling <Felix.Kuehling@amd.com>

Thanks

> > -	if (unlikely((gtt->userflags & AMDGPU_GEM_USERPTR_ANONONLY) &&
> > -		vma->vm_file)) {
> > -		r = -EPERM;
> > -		goto out;
> > -	}
> > +	mm = mirror->hmm->mmu_notifier.mm;
> > +	if (!mmget_not_zero(mm)) /* Happens during process shutdown */
> 
> This works because mirror->hmm->mmu_notifier holds an mmgrab reference 
> to the mm?

Yes, this makes sure the mm pointer remains valid

> So the MM will not just go away, but if the mmget refcount is 0, it
> means the mm is marked for destruction and shouldn't be used any
> more.

Not just marked for destruction: another thread is progressing through,
or has already finished, release().

The other detail here is that in general you can't get the mmap_sem
without also having a mmget, as exit_mmap() does not lock the mmap_sem
in some places where it alters the data structures, i.e. racing
find_vma() with exit_mmap() is not allowed.

This means we have to hold the mmget across the hmm_range_fault(), but
we can drop the mmget and then test mmu_range_read_retry() under the
driver lock. It will return true if the mmget refcount has gone to
zero in the meantime.

But I think this is probably a poor driver design; a driver should
just hold the mmget() until it has completed establishing the shadow
PTEs, as it is hard to see a reason not to.
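
As a rough illustration of that ordering (take the mmget first, hold it
across the locked section, drop it last), here is a single-threaded
userspace sketch. The `model_*` names and the `vma_ok` parameter are
hypothetical stand-ins; the plain `int` counter is not atomic and only
models the control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_mm {
	int users;        /* models the mmget refcount */
	bool mmap_locked; /* models holding mmap_sem for read */
};

static bool model_mmget_not_zero(struct model_mm *mm)
{
	if (mm->users == 0)
		return false;
	mm->users++;
	return true;
}

static void model_mmput(struct model_mm *mm)
{
	mm->users--;
}

/* `vma_ok` simulates the outcome of the find_vma() checks. */
static int model_get_user_pages(struct model_mm *mm, bool vma_ok)
{
	int r = 0;

	if (!model_mmget_not_zero(mm)) /* address space already going away */
		return -ESRCH;

	mm->mmap_locked = true;        /* down_read(&mm->mmap_sem) */
	if (!vma_ok) {
		r = -EFAULT;
		goto out_unlock;
	}
	/* fault the pages and establish the device mapping here */

out_unlock:
	mm->mmap_locked = false;       /* up_read(&mm->mmap_sem) */
	model_mmput(mm);               /* dropped only after the work is done */
	return r;
}
```

Every exit path after a successful get releases both the lock and the
reference, so the counter is balanced regardless of the outcome.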

Jason
Jason Gunthorpe Nov. 1, 2019, 7:54 p.m. UTC | #5
On Mon, Oct 28, 2019 at 05:10:17PM -0300, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1,
> scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
> they only use invalidate_range_start/end and immediately check the
> invalidating range against some driver data structure to tell if the
> driver is interested. Half of them use an interval_tree, the others are
> simple linear search lists.

Now that we have most of the driver changes tested and reviewed,
I'm going to move this series into linux-next via the hmm tree, minus
the xen gntdev patches as they are not working yet.

I will keep collecting acks and any additional changes.

Thanks,
Jason
Ralph Campbell Nov. 1, 2019, 8:54 p.m. UTC | #6
On 10/28/19 1:10 PM, Jason Gunthorpe wrote:
> From: Jason Gunthorpe <jgg@mellanox.com>
> 
> 8 of the mmu_notifier using drivers (i915_gem, radeon_mn, umem_odp, hfi1,
> scif_dma, vhost, gntdev, hmm) drivers are using a common pattern where
> they only use invalidate_range_start/end and immediately check the
> invalidating range against some driver data structure to tell if the
> driver is interested. Half of them use an interval_tree, the others are
> simple linear search lists.
> 
> Of the ones I checked they largely seem to have various kinds of races,
> bugs and poor implementation. This is a result of the complexity in how
> the notifier interacts with get_user_pages(). It is extremely difficult to
> use it correctly.
> 
> Consolidate all of this code together into the core mmu_notifier and
> provide a locking scheme similar to hmm_mirror that allows the user to
> safely use get_user_pages() and reliably know if the page list still
> matches the mm.
> 
> This new arrangement plays nicely with the !blockable mode for
> OOM. Scanning the interval tree is done such that the intersection test
> will always succeed, and since there is no invalidate_range_end exposed to
> drivers the scheme safely allows multiple drivers to be subscribed.
> 
> Four places are converted as an example of how the new API is used.
> Four are left for future patches:
>   - i915_gem has complex locking around destruction of a registration,
>     needs more study
>   - hfi1 (2nd user) needs access to the rbtree
>   - scif_dma has a complicated logic flow
>   - vhost's mmu notifiers are already being rewritten
> 
> This series, and the other code it depends on is available on my github:
> 
> https://github.com/jgunthorpe/linux/commits/mmu_notifier
> 
> v2 changes:
> - Add mmu_range_set_seq() to set the mrn sequence number under the driver
>    lock and make the locking more understandable
> - Add some additional comments around locking/READ_ONCE
> - Make the WARN_ON flow in mn_itree_invalidate a bit easier to follow
> - Fix wrong WARN_ON
> 
> Jason Gunthorpe (15):
>    mm/mmu_notifier: define the header pre-processor parts even if
>      disabled
>    mm/mmu_notifier: add an interval tree notifier
>    mm/hmm: allow hmm_range to be used with a mmu_range_notifier or
>      hmm_mirror
>    mm/hmm: define the pre-processor related parts of hmm.h even if
>      disabled
>    RDMA/odp: Use mmu_range_notifier_insert()
>    RDMA/hfi1: Use mmu_range_notifier_inset for user_exp_rcv
>    drm/radeon: use mmu_range_notifier_insert
>    xen/gntdev: Use select for DMA_SHARED_BUFFER
>    xen/gntdev: use mmu_range_notifier_insert
>    nouveau: use mmu_notifier directly for invalidate_range_start
>    nouveau: use mmu_range_notifier instead of hmm_mirror
>    drm/amdgpu: Call find_vma under mmap_sem
>    drm/amdgpu: Use mmu_range_insert instead of hmm_mirror
>    drm/amdgpu: Use mmu_range_notifier instead of hmm_mirror
>    mm/hmm: remove hmm_mirror and related
> 
>   Documentation/vm/hmm.rst                      | 105 +---
>   drivers/gpu/drm/amd/amdgpu/amdgpu.h           |   2 +
>   .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   9 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c        |  14 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c    |   1 +
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.c        | 457 +++------------
>   drivers/gpu/drm/amd/amdgpu/amdgpu_mn.h        |  53 --
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h    |  13 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c       | 111 ++--
>   drivers/gpu/drm/nouveau/nouveau_svm.c         | 231 +++++---
>   drivers/gpu/drm/radeon/radeon.h               |   9 +-
>   drivers/gpu/drm/radeon/radeon_mn.c            | 219 ++-----
>   drivers/infiniband/core/device.c              |   1 -
>   drivers/infiniband/core/umem_odp.c            | 288 +--------
>   drivers/infiniband/hw/hfi1/file_ops.c         |   2 +-
>   drivers/infiniband/hw/hfi1/hfi.h              |   2 +-
>   drivers/infiniband/hw/hfi1/user_exp_rcv.c     | 146 ++---
>   drivers/infiniband/hw/hfi1/user_exp_rcv.h     |   3 +-
>   drivers/infiniband/hw/mlx5/mlx5_ib.h          |   7 +-
>   drivers/infiniband/hw/mlx5/mr.c               |   3 +-
>   drivers/infiniband/hw/mlx5/odp.c              |  50 +-
>   drivers/xen/Kconfig                           |   3 +-
>   drivers/xen/gntdev-common.h                   |   8 +-
>   drivers/xen/gntdev.c                          | 180 ++----
>   include/linux/hmm.h                           | 195 +------
>   include/linux/mmu_notifier.h                  | 144 ++++-
>   include/rdma/ib_umem_odp.h                    |  65 +--
>   include/rdma/ib_verbs.h                       |   2 -
>   kernel/fork.c                                 |   1 -
>   mm/Kconfig                                    |   2 +-
>   mm/hmm.c                                      | 275 +--------
>   mm/mmu_notifier.c                             | 546 +++++++++++++++++-
>   32 files changed, 1225 insertions(+), 1922 deletions(-)
> 

You can add my Tested-by for the mm and nouveau changes.
IOW, patches 1-4, 10-11, and 15.

Tested-by: Ralph Campbell <rcampbell@nvidia.com>
Jason Gunthorpe Nov. 4, 2019, 8:40 p.m. UTC | #7
On Fri, Nov 01, 2019 at 01:54:45PM -0700, Ralph Campbell wrote:
> You can add my Tested-by for the mm and nouveau changes.
> IOW, patches 1-4, 10-11, and 15.
> 
> Tested-by: Ralph Campbell <rcampbell@nvidia.com>

Got it, thanks

Jason