drm/i915: Enforce TYPESAFE_BY_RCU vs refcount mb on reinitialisation

Message ID 20180804095236.24584-1-chris@chris-wilson.co.uk (mailing list archive)
Series drm/i915: Enforce TYPESAFE_BY_RCU vs refcount mb on reinitialisation

Commit Message

Chris Wilson Aug. 4, 2018, 9:52 a.m. UTC
By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
underneath us, even when using rcu_read_lock(). We use a strong barrier
on acquiring the refcount during lookup, but this needs to be paired
with a barrier on re-initialising it. Currently we call dma_fence_init,
which ultimately does a plain atomic_set(1) on the refcount, not
providing any memory barriers. As we inspect some state before even
acquiring the refcount in the lookup (by arguing that we can detect
inconsistent requests), that state should be initialised before the
refcount.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 7 +++++++
 1 file changed, 7 insertions(+)
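
For reference, a condensed sketch of the pairing under discussion in the
comments below. This is not the literal driver code: the lookup side is
abbreviated from __i915_gem_active_get_rcu()/dma_fence_get_rcu(), and the
reinit side shows i915_request_alloc() with this patch applied.

	/*
	 * Reinit side: publish the state that the RCU lookup prechecks
	 * before the refcount comes back to life inside dma_fence_init().
	 */
	rq->engine = engine;		/* state inspected under RCU */
	rq->timeline = ce->ring->timeline;
	smp_wmb();			/* pairs with the mb on the lookup side */
	dma_fence_init(&rq->fence, ...);	/* plain atomic_set(refcount, 1) */

	/*
	 * Lookup side: the refcount acquisition supplies the matching
	 * barrier, so once kref_get_unless_zero() succeeds, every store
	 * ordered before the smp_wmb() above is visible.
	 */
	rcu_read_lock();
	rq = rcu_dereference(active->request);
	if (rq &&
	    !i915_request_completed(rq) &&	/* precheck of (possibly stale) state */
	    kref_get_unless_zero(&rq->fence.refcount))
		/* rq and its engine are now stable while the ref is held */;
	rcu_read_unlock();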

Comments

Mika Kuoppala Aug. 6, 2018, 11:12 a.m. UTC | #1
Chris Wilson <chris@chris-wilson.co.uk> writes:

> By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
> underneath us, even when using rcu_read_lock(). We use a strong barrier
> on acquiring the refcount during lookup, but this needs to be paired
> with a barrier on re-initialising it. Currently we call dma_fence_init,
> which ultimately does a plain atomic_set(1) on the refcount, not
> providing any memory barriers. As we inspect some state before even
> acquiring the refcount in the lookup (by arguing that we can detect
> inconsistent requests), that state should be initialised before the
> refcount.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5c2c93cbab12..04a0b8e75533 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>  	rq->timeline = ce->ring->timeline;
>  	GEM_BUG_ON(rq->timeline == &engine->timeline);
>  
> +	/*
> +	 * In order to coordinate with our RCU lookup,
> +	 * __i915_gem_active_get_rcu(), we need to ensure that the change
> +	 * to rq->engine is visible before acquiring the refcount in the lookup.
> +	 */
> +	smp_wmb();
> +

There is quite a lot going on here as we try to get a reference
to a shapeshifting request.

By looking at the code acquiring it, dma_fence_get_rcu
and dma_fence_init and then the precheck of the request,
should the memory barrier be:

smp_mb__before_atomic()?

Admittedly that would be uglier as fence_init hides the atomic_set,
but it is the atomic we are serializing on, especially
as there is no atomic in sight at the call site.

Further, as engine and the kref are tightly bound,
should we initialize everything not related first, then
do engine init, wmb, fence init in tight proximity?

Thanks,
-Mika

>  	spin_lock_init(&rq->lock);
>  	dma_fence_init(&rq->fence,
>  		       &i915_fence_ops,
> -- 
> 2.18.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Chris Wilson Aug. 6, 2018, 11:41 a.m. UTC | #2
Quoting Mika Kuoppala (2018-08-06 12:12:15)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
> > underneath us, even when using rcu_read_lock(). We use a strong barrier
> > on acquiring the refcount during lookup, but this needs to be paired
> > with a barrier on re-initialising it. Currently we call dma_fence_init,
> > which ultimately does a plain atomic_set(1) on the refcount, not
> > providing any memory barriers. As we inspect some state before even
> > acquiring the refcount in the lookup (by arguing that we can detect
> > inconsistent requests), that state should be initialised before the
> > refcount.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 5c2c93cbab12..04a0b8e75533 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >       rq->timeline = ce->ring->timeline;
> >       GEM_BUG_ON(rq->timeline == &engine->timeline);
> >  
> > +     /*
> > +      * In order to coordinate with our RCU lookup,
> > +      * __i915_gem_active_get_rcu(), we need to ensure that the change
> > +      * to rq->engine is visible before acquiring the refcount in the lookup.
> > +      */
> > +     smp_wmb();
> > +
> 
> There is quite a lot going on here as we try to get a reference
> to a shapeshifting request.
> 
> By looking at the code acquiring it, dma_fence_get_rcu
> and dma_fence_init and then the precheck of the request,
> should the memory barrier be:
> 
> smp_mb__before_atomic()?

No. The code does have a mb; smp_mb__before_atomic() is only for atomics
that don't themselves enforce a mb and so need a bit of extra
weight. On x86, it's not even a mb, just a compiler barrier.

> Admittedly that would be uglier as fence_init hides the atomic_set,
> but it is the atomic we are serializing on, especially
> as there is no atomic in sight at the call site.

Right, the suggestion in the thread was to use atomic_set_release(), but
that requires a lot of deconstruction merely to do the same: it adds
smp_mb() before the atomic_set.

> Further, as engine and the kref are tightly bound,
> should we initialize everything not related first, then
> do engine init, wmb, fence init in tight proximity?

As we do. The existing order is sufficient for our needs. Everything
that needs to be initialised before the kref, is -- though I think it's
overkill as our argument about checking stale state is still correct and
safe. So what this nails down is the stability of a fully referenced
request -- which is less worrisome as it will only be exposed to the rcu
onlookers much later; we don't have the same danger of immediate
exposure to rcu walkers.

What I do think is useful overall is that it gives the companion mb to
the one referenced by __i915_gem_active_get_rcu, and dma_fence_get_rcu
generally.
-Chris
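
For reference, a sketch of the orderings discussed above; the
atomic_set_release() variant is hypothetical, as it would require
deconstructing dma_fence_init() to reach the underlying atomic_t:

	/* As the patch does it: explicit write barrier ahead of the
	 * plain atomic_set(1) hidden inside dma_fence_init().
	 */
	smp_wmb();
	dma_fence_init(&rq->fence, ...);

	/* Hypothetical equivalent: a release store directly on the
	 * refcount (kref -> refcount_t -> atomic_t), giving the same
	 * publish ordering without the bare smp_wmb().
	 */
	atomic_set_release(&rq->fence.refcount.refcount.refs, 1);

	/* smp_mb__before_atomic() would not fit here: it is meant to
	 * strengthen value-less atomic RMW ops (e.g. atomic_inc()),
	 * whereas, per the discussion above, the lookup's
	 * kref_get_unless_zero() already provides the barrier on the
	 * acquiring side.
	 */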
Mika Kuoppala Aug. 6, 2018, 11:55 a.m. UTC | #3
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-08-06 12:12:15)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>> 
>> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
>> > underneath us, even when using rcu_read_lock(). We use a strong barrier
>> > on acquiring the refcount during lookup, but this needs to be paired
>> > with a barrier on re-initialising it. Currently we call dma_fence_init,
>> > which ultimately does a plain atomic_set(1) on the refcount, not
>> > providing any memory barriers. As we inspect some state before even
>> > acquiring the refcount in the lookup (by arguing that we can detect
>> > inconsistent requests), that state should be initialised before the
>> > refcount.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
>> >  1 file changed, 7 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> > index 5c2c93cbab12..04a0b8e75533 100644
>> > --- a/drivers/gpu/drm/i915/i915_request.c
>> > +++ b/drivers/gpu/drm/i915/i915_request.c
>> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>> >       rq->timeline = ce->ring->timeline;
>> >       GEM_BUG_ON(rq->timeline == &engine->timeline);
>> >  
>> > +     /*
>> > +      * In order to coordinate with our RCU lookup,
>> > +      * __i915_gem_active_get_rcu(), we need to ensure that the change
>> > +      * to rq->engine is visible before acquiring the refcount in the lookup.
>> > +      */
>> > +     smp_wmb();
>> > +
>> 
>> There is quite a lot going on here as we try to get a reference
>> to a shapeshifting request.
>> 
>> By looking at the code acquiring it, dma_fence_get_rcu
>> and dma_fence_init and then the precheck of the request,
>> should the memory barrier be:
>> 
>> smp_mb__before_atomic()?
>
> No. The code does have a mb; smp_mb__before_atomic() is only for atomics
> that don't themselves enforce a mb and so need a bit of extra
> weight. On x86, it's not even a mb, just a compiler barrier.
>
>> Admittedly that would be uglier as fence_init hides the atomic_set,
>> but it is the atomic we are serializing on, especially
>> as there is no atomic in sight at the call site.
>
> Right, the suggestion in the thread was to use atomic_set_release(), but
> that requires a lot of deconstruction merely to do the same: it adds
> smp_mb() before the atomic_set.
>
>> Further, as engine and the kref are tightly bound,
>> should we initialize everything not related first, then
>> do engine init, wmb, fence init in tight proximity?
>
> As we do. The existing order is sufficient for our needs. Everything
> that needs to be initialised before the kref, is -- though I think it's
> overkill as our argument about checking stale state is still correct and
> safe. So what this nails down is the stability of a fully referenced
> request -- which is less worrisome as it will only be exposed to the rcu
> onlookers much later; we don't have the same danger of immediate
> exposure to rcu walkers.
>
> What I do think is useful overall is that it gives the companion mb to
> the one referenced by __i915_gem_active_get_rcu, and dma_fence_get_rcu
> generally.

Agreed.

I tried to think of how to improve the comment pairing, but
the rabbit hole here is deep. Mentioning the refcount
should guide the reader to the right spots, though.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> -Chris

Patch

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5c2c93cbab12..04a0b8e75533 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -768,6 +768,13 @@  i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->timeline = ce->ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
 
+	/*
+	 * In order to coordinate with our RCU lookup,
+	 * __i915_gem_active_get_rcu(), we need to ensure that the change
+	 * to rq->engine is visible before acquiring the refcount in the lookup.
+	 */
+	smp_wmb();
+
 	spin_lock_init(&rq->lock);
 	dma_fence_init(&rq->fence,
 		       &i915_fence_ops,