
[v3,hmm,11/12] mm/hmm: Remove confusing comment and logic from hmm_release

Message ID 20190614004450.20252-12-jgg@ziepe.ca
State New, archived
Series mm/hmm: Various revisions from a locking/code review

Commit Message

Jason Gunthorpe June 14, 2019, 12:44 a.m. UTC
From: Jason Gunthorpe <jgg@mellanox.com>

hmm_release() is called exactly once per hmm. ops->release() cannot
accidentally trigger any action that would recurse back onto
hmm->mirrors_sem.

This fixes a use after-free race of the form:

       CPU0                                   CPU1
                                           hmm_release()
                                             up_write(&hmm->mirrors_sem);
 hmm_mirror_unregister(mirror)
  down_write(&hmm->mirrors_sem);
  up_write(&hmm->mirrors_sem);
  kfree(mirror)
                                             mirror->ops->release(mirror)

The only user we have today for ops->release is an empty function, so this
is unambiguously safe.

As a consequence of plugging this race drivers are not allowed to
register/unregister mirrors from within a release op.
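
For illustration, a driver that still needs to tear its mirror down in
response to ops->release can defer the hmm_mirror_unregister() to a
workqueue instead of calling it from the callback (this is the pattern
amdgpu uses, see the discussion below). A minimal sketch; the my_mirror
names are hypothetical, not existing driver code:

#include <linux/hmm.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct my_mirror {
	struct hmm_mirror mirror;
	struct work_struct work;
};

static void my_mirror_destroy(struct work_struct *work)
{
	struct my_mirror *m = container_of(work, struct my_mirror, work);

	/* Safe here: we are no longer inside the ->release() call chain. */
	hmm_mirror_unregister(&m->mirror);
	kfree(m);
}

static void my_mirror_release(struct hmm_mirror *mirror)
{
	struct my_mirror *m = container_of(mirror, struct my_mirror, mirror);

	/* Unregistering from this thread would deadlock on mirrors_sem. */
	INIT_WORK(&m->work, my_mirror_destroy);
	schedule_work(&m->work);
}

static const struct hmm_mirror_ops my_mirror_ops = {
	/* .sync_cpu_device_pagetables etc. omitted from this sketch */
	.release = my_mirror_release,
};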

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Tested-by: Philip Yang <Philip.Yang@amd.com>
---
 mm/hmm.c | 28 +++++++++-------------------
 1 file changed, 9 insertions(+), 19 deletions(-)

Comments

Jason Gunthorpe June 18, 2019, 12:45 a.m. UTC | #1
On Sat, Jun 15, 2019 at 07:21:06AM -0700, Christoph Hellwig wrote:
> On Thu, Jun 13, 2019 at 09:44:49PM -0300, Jason Gunthorpe wrote:
> > From: Jason Gunthorpe <jgg@mellanox.com>
> > 
> > hmm_release() is called exactly once per hmm. ops->release() cannot
> > accidentally trigger any action that would recurse back onto
> > hmm->mirrors_sem.
> 
> In linux-next amdgpu actually calls hmm_mirror_unregister from its
> release function.  That whole release function looks rather sketchy,
> but we probably need to sort that out first.

Does it? I see this:

static void amdgpu_hmm_mirror_release(struct hmm_mirror *mirror)
{
        struct amdgpu_mn *amn = container_of(mirror, struct amdgpu_mn, mirror);

        /* defers the teardown (including the unregister) to a workqueue */
        INIT_WORK(&amn->work, amdgpu_mn_destroy);
        schedule_work(&amn->work);
}

static struct hmm_mirror_ops amdgpu_hmm_mirror_ops[] = {
        [AMDGPU_MN_TYPE_GFX] = {
                .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_gfx,
                .release = amdgpu_hmm_mirror_release
        },
        [AMDGPU_MN_TYPE_HSA] = {
                .sync_cpu_device_pagetables = amdgpu_mn_sync_pagetables_hsa,
                .release = amdgpu_hmm_mirror_release
        },
};


Am I looking at the wrong thing? Looks like it calls it through a work
queue, so it should be OK..

Though very strange that amdgpu only destroys the mirror via release,
that cannot be right.

Jason
Felix Kuehling June 19, 2019, 12:53 a.m. UTC | #2
On 2019-06-18 1:37, Christoph Hellwig wrote:
> On Mon, Jun 17, 2019 at 09:45:09PM -0300, Jason Gunthorpe wrote:
>> Am I looking at the wrong thing? Looks like it calls it through a work
>> queue, so it should be OK..
> Yes, it calls it through a work queue.  I guess that is fine because
> it needs to take the lock again.
>
>> Though very strange that amdgpu only destroys the mirror via release,
>> that cannot be right.
> As said, the whole thing looks rather odd to me.

This code is derived from our old MMU notifier code. Before HMM we used 
to register a single MMU notifier per mm_struct and look up virtual 
address ranges that had been registered for mirroring via driver API 
calls. The idea was to reuse a single MMU notifier for the lifetime of 
the process. It would remain registered until we got a notifier_release.

hmm_mirror took the place of that when we converted the code to HMM.

I suppose we could destroy the mirror earlier, when we have no more 
registered virtual address ranges, and create a new one if needed later.

Regards,
   Felix
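
For reference, the lifetime scheme Felix describes might look roughly
like the following (a sketch only; my_mn and its helpers are
hypothetical, nothing like this exists in amdgpu):

#include <linux/hmm.h>
#include <linux/mutex.h>

struct my_mn {
	struct mutex lock;
	unsigned int nranges;	/* number of registered VA ranges */
	struct hmm_mirror mirror;
	struct mm_struct *mm;
};

/* Create the mirror together with the first range ... */
static int my_range_register(struct my_mn *mn)
{
	int ret = 0;

	mutex_lock(&mn->lock);
	if (mn->nranges++ == 0) {
		ret = hmm_mirror_register(&mn->mirror, mn->mm);
		if (ret)
			mn->nranges--;
	}
	mutex_unlock(&mn->lock);
	return ret;
}

/* ... and destroy it again when the last range goes away. */
static void my_range_unregister(struct my_mn *mn)
{
	mutex_lock(&mn->lock);
	if (--mn->nranges == 0)
		hmm_mirror_unregister(&mn->mirror);
	mutex_unlock(&mn->lock);
}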
Jason Gunthorpe June 19, 2019, 11:56 a.m. UTC | #3
On Wed, Jun 19, 2019 at 01:07:05AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2019 at 12:53:55AM +0000, Kuehling, Felix wrote:
> > This code is derived from our old MMU notifier code. Before HMM we used 
> > to register a single MMU notifier per mm_struct and look up virtual 
> > address ranges that had been registered for mirroring via driver API 
> > calls. The idea was to reuse a single MMU notifier for the lifetime of 
> > the process. It would remain registered until we got a notifier_release.
> > 
> > hmm_mirror took the place of that when we converted the code to HMM.
> > 
> > I suppose we could destroy the mirror earlier, when we have no more 
> > registered virtual address ranges, and create a new one if needed later.
> 
> I didn't write the code, but if you look at hmm_mirror it already is
> a multiplexer over the mmu notifier, and the intent clearly seems that
> you register one per range that you want to mirror, and not multiplex
> it once again.  In other words - I think each amdgpu_mn_node should
> probably have its own hmm_mirror.  And while the amdgpu_mn_node objects
> are currently stored in an interval tree it seems like they are only
> linearly iterated anyway, so a list actually seems pretty suitable.  If
> not we need to improve the core data structures instead of working
> around them.

This looks a lot like the ODP code (amdgpu_mn_node == ib_umem_odp)

The interval tree is to quickly find the driver object(s) that have
the virtual pages during invalidation:

static int amdgpu_mn_sync_pagetables_gfx(struct hmm_mirror *mirror,
                        const struct hmm_update *update)
{
        /* walk every registered object whose range overlaps [start, end] */
        it = interval_tree_iter_first(&amn->objects, start, end);
        while (it) {
                [..]
                amdgpu_mn_invalidate_node(node, start, end);

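The elided loop body visits every overlapping object; with the kernel's
generic interval tree helpers the full pattern is roughly the following
(a sketch, with my_node standing in for amdgpu_mn_node; note the
interval tree's second bound is inclusive):

#include <linux/interval_tree.h>

struct my_node {
	struct interval_tree_node it;	/* it.start/it.last is the VA range */
	/* per-range driver state ... */
};

static void my_invalidate_node(struct my_node *node,
			       unsigned long start, unsigned long last)
{
	/* driver-specific invalidation of [start, last] goes here */
}

static void my_invalidate_range(struct rb_root_cached *objects,
				unsigned long start, unsigned long last)
{
	struct interval_tree_node *it;

	for (it = interval_tree_iter_first(objects, start, last); it;
	     it = interval_tree_iter_next(it, start, last)) {
		struct my_node *node = container_of(it, struct my_node, it);

		my_invalidate_node(node, start, last);
	}
}
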
And following the ODP model there should be a single hmm_mirror per-mm
(user can fork and stuff, this is something I want to have core code
help with). 

The hmm_mirror can either exist so long as objects exist, or it can
exist until the chardev is closed - but never longer than the
chardev's lifetime.

Maybe we should be considering providing a mmu notifier & interval
tree & lock abstraction since ODP & AMD are very similar here..

Jason
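
Such an abstraction might expose an interface along these lines (purely
illustrative; none of these names existed at the time of this thread,
though in broad strokes this is the direction the kernel later took
with the mmu_interval_notifier API):

#include <linux/interval_tree.h>
#include <linux/mm_types.h>

struct mirror_range;

struct mirror_range_ops {
	/*
	 * Called, under the abstraction's lock, for each registered
	 * range that overlaps an invalidation.
	 */
	void (*invalidate)(struct mirror_range *mrn,
			   unsigned long start, unsigned long last);
};

struct mirror_range {
	struct interval_tree_node it;
	const struct mirror_range_ops *ops;
};

int mirror_range_insert(struct mirror_range *mrn, struct mm_struct *mm,
			unsigned long start, unsigned long length,
			const struct mirror_range_ops *ops);
void mirror_range_remove(struct mirror_range *mrn);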

Patch

diff --git a/mm/hmm.c b/mm/hmm.c
index 26af511cbdd075..c0d43302fd6b2f 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -137,26 +137,16 @@  static void hmm_release(struct mmu_notifier *mn, struct mm_struct *mm)
 	WARN_ON(!list_empty(&hmm->ranges));
 	mutex_unlock(&hmm->lock);
 
-	down_write(&hmm->mirrors_sem);
-	mirror = list_first_entry_or_null(&hmm->mirrors, struct hmm_mirror,
-					  list);
-	while (mirror) {
-		list_del_init(&mirror->list);
-		if (mirror->ops->release) {
-			/*
-			 * Drop mirrors_sem so the release callback can wait
-			 * on any pending work that might itself trigger a
-			 * mmu_notifier callback and thus would deadlock with
-			 * us.
-			 */
-			up_write(&hmm->mirrors_sem);
+	down_read(&hmm->mirrors_sem);
+	list_for_each_entry(mirror, &hmm->mirrors, list) {
+		/*
+		 * Note: The driver is not allowed to trigger
+		 * hmm_mirror_unregister() from this thread.
+		 */
+		if (mirror->ops->release)
 			mirror->ops->release(mirror);
-			down_write(&hmm->mirrors_sem);
-		}
-		mirror = list_first_entry_or_null(&hmm->mirrors,
-						  struct hmm_mirror, list);
 	}
-	up_write(&hmm->mirrors_sem);
+	up_read(&hmm->mirrors_sem);
 
 	hmm_put(hmm);
 }
@@ -286,7 +276,7 @@  void hmm_mirror_unregister(struct hmm_mirror *mirror)
 	struct hmm *hmm = mirror->hmm;
 
 	down_write(&hmm->mirrors_sem);
-	list_del_init(&mirror->list);
+	list_del(&mirror->list);
 	up_write(&hmm->mirrors_sem);
 	hmm_put(hmm);
 	memset(&mirror->hmm, POISON_INUSE, sizeof(mirror->hmm));
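
For readability, here is the hmm_release() loop as it reads after this
patch, assembled from the first hunk above:

	down_read(&hmm->mirrors_sem);
	list_for_each_entry(mirror, &hmm->mirrors, list) {
		/*
		 * Note: The driver is not allowed to trigger
		 * hmm_mirror_unregister() from this thread.
		 */
		if (mirror->ops->release)
			mirror->ops->release(mirror);
	}
	up_read(&hmm->mirrors_sem);

	hmm_put(hmm);

Taking mirrors_sem for read is sufficient here because hmm_release()
runs exactly once per hmm and any concurrent hmm_mirror_unregister()
takes the semaphore for write, so the list cannot change under the
iteration.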