| Message ID | 20180804095236.24584-1-chris@chris-wilson.co.uk (mailing list archive) |
|---|---|
| State | New, archived |
| Series | drm/i915: Enforce TYPESAFE_BY_RCU vs refcount mb on reinitialisation |
Chris Wilson <chris@chris-wilson.co.uk> writes:

> By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
> underneath us, even when using rcu_read_lock(). We use a strong barrier
> on acquiring the refcount during lookup, but this needs to be paired
> with a barrier on re-initialising it. Currently we call dma_fence_init,
> which ultimately does a plain atomic_set(1) on the refcount, not
> providing any memory barriers. As we inspect some state before even
> acquiring the refcount in the lookup (by arguing that we can detect
> inconsistent requests), that state should be initialised before the
> refcount.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
>  1 file changed, 7 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5c2c93cbab12..04a0b8e75533 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>  	rq->timeline = ce->ring->timeline;
>  	GEM_BUG_ON(rq->timeline == &engine->timeline);
>
> +	/*
> +	 * In order to coordinate with our RCU lookup,
> +	 * __i915_gem_active_get_rcu(), we need to ensure that the change
> +	 * to rq->engine is visible before acquring the refcount in the lookup.
> +	 */
> +	smp_wmb();
> +

There is quite a lot going on here as we try to get a reference
into a shapeshifting request.

Looking at the code acquiring it, dma_fence_get_rcu and
dma_fence_init, and then the precheck of the request, should
the memory barrier be:

smp_mb__before_atomic()?

Admittedly that would be uglier, as fence_init hides the atomic_set,
but it is the atomic we are serializing on. Especially as there is
no atomic in sight at the call site.

Further, as the engine and the kref are tightly bound, should we
initialize everything unrelated first, and then do the engine init,
wmb and fence init in tight proximity?

Thanks,
-Mika

>  	spin_lock_init(&rq->lock);
>  	dma_fence_init(&rq->fence,
> 		       &i915_fence_ops,
> --
> 2.18.0
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
Quoting Mika Kuoppala (2018-08-06 12:12:15)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
> > underneath us, even when using rcu_read_lock(). We use a strong barrier
> > on acquiring the refcount during lookup, but this needs to be paired
> > with a barrier on re-initialising it. Currently we call dma_fence_init,
> > which ultimately does a plain atomic_set(1) on the refcount, not
> > providing any memory barriers. As we inspect some state before even
> > acquiring the refcount in the lookup (by arguing that we can detect
> > inconsistent requests), that state should be initialised before the
> > refcount.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
> >  1 file changed, 7 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 5c2c93cbab12..04a0b8e75533 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> >  	rq->timeline = ce->ring->timeline;
> >  	GEM_BUG_ON(rq->timeline == &engine->timeline);
> >
> > +	/*
> > +	 * In order to coordinate with our RCU lookup,
> > +	 * __i915_gem_active_get_rcu(), we need to ensure that the change
> > +	 * to rq->engine is visible before acquring the refcount in the lookup.
> > +	 */
> > +	smp_wmb();
> > +
>
> There is quite a lot going on here as we try to get a reference
> into a shapeshifting request.
>
> Looking at the code acquiring it, dma_fence_get_rcu and
> dma_fence_init, and then the precheck of the request, should
> the memory barrier be:
>
> smp_mb__before_atomic()?

No. The code does have a mb; smp_mb__before_atomic() is only for
atomics that don't themselves enforce a mb and so need a bit of extra
weight. On x86, it's not even a mb, just a compiler barrier.

> Admittedly that would be uglier, as fence_init hides the atomic_set,
> but it is the atomic we are serializing on. Especially as there is
> no atomic in sight at the call site.

Right, the suggestion in the thread was to use atomic_set_release(), but
that requires a lot of deconstruction merely to do the same: it adds
smp_mb() before the atomic_set.

> Further, as the engine and the kref are tightly bound, should we
> initialize everything unrelated first, and then do the engine init,
> wmb and fence init in tight proximity?

As we do. The existing order is sufficient for our needs. Everything
that needs to be initialised before the kref is -- though I think it's
overkill, as our argument about checking stale state is still correct
and safe. So what this nails down is the stability of a fully
referenced request -- which is less worrisome, as it will only be
exposed to the rcu onlookers much later; we don't have the same danger
of immediate exposure to rcu walkers.

What I do think is useful overall is that it gives the companion mb to
the one referenced by __i915_gem_active_get_rcu, and dma_fence_get_rcu
generally.
-Chris
Chris Wilson <chris@chris-wilson.co.uk> writes:

> Quoting Mika Kuoppala (2018-08-06 12:12:15)
>> Chris Wilson <chris@chris-wilson.co.uk> writes:
>>
>> > By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
>> > underneath us, even when using rcu_read_lock(). We use a strong barrier
>> > on acquiring the refcount during lookup, but this needs to be paired
>> > with a barrier on re-initialising it. Currently we call dma_fence_init,
>> > which ultimately does a plain atomic_set(1) on the refcount, not
>> > providing any memory barriers. As we inspect some state before even
>> > acquiring the refcount in the lookup (by arguing that we can detect
>> > inconsistent requests), that state should be initialised before the
>> > refcount.
>> >
>> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>> > ---
>> >  drivers/gpu/drm/i915/i915_request.c | 7 +++++++
>> >  1 file changed, 7 insertions(+)
>> >
>> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
>> > index 5c2c93cbab12..04a0b8e75533 100644
>> > --- a/drivers/gpu/drm/i915/i915_request.c
>> > +++ b/drivers/gpu/drm/i915/i915_request.c
>> > @@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>> >  	rq->timeline = ce->ring->timeline;
>> >  	GEM_BUG_ON(rq->timeline == &engine->timeline);
>> >
>> > +	/*
>> > +	 * In order to coordinate with our RCU lookup,
>> > +	 * __i915_gem_active_get_rcu(), we need to ensure that the change
>> > +	 * to rq->engine is visible before acquring the refcount in the lookup.
>> > +	 */
>> > +	smp_wmb();
>> > +
>>
>> There is quite a lot going on here as we try to get a reference
>> into a shapeshifting request.
>>
>> Looking at the code acquiring it, dma_fence_get_rcu and
>> dma_fence_init, and then the precheck of the request, should
>> the memory barrier be:
>>
>> smp_mb__before_atomic()?
>
> No. The code does have a mb; smp_mb__before_atomic() is only for
> atomics that don't themselves enforce a mb and so need a bit of extra
> weight. On x86, it's not even a mb, just a compiler barrier.
>
>> Admittedly that would be uglier, as fence_init hides the atomic_set,
>> but it is the atomic we are serializing on. Especially as there is
>> no atomic in sight at the call site.
>
> Right, the suggestion in the thread was to use atomic_set_release(), but
> that requires a lot of deconstruction merely to do the same: it adds
> smp_mb() before the atomic_set.
>
>> Further, as the engine and the kref are tightly bound, should we
>> initialize everything unrelated first, and then do the engine init,
>> wmb and fence init in tight proximity?
>
> As we do. The existing order is sufficient for our needs. Everything
> that needs to be initialised before the kref is -- though I think it's
> overkill, as our argument about checking stale state is still correct
> and safe. So what this nails down is the stability of a fully
> referenced request -- which is less worrisome, as it will only be
> exposed to the rcu onlookers much later; we don't have the same danger
> of immediate exposure to rcu walkers.
>
> What I do think is useful overall is that it gives the companion mb to
> the one referenced by __i915_gem_active_get_rcu, and dma_fence_get_rcu
> generally.

Agreed. I tried to think how to improve the comment pairing, but the
rabbit hole is deep in here. Mentioning the refcount should guide the
reader into the right spots, though.

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> -Chris
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5c2c93cbab12..04a0b8e75533 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -768,6 +768,13 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->timeline = ce->ring->timeline;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
 
+	/*
+	 * In order to coordinate with our RCU lookup,
+	 * __i915_gem_active_get_rcu(), we need to ensure that the change
+	 * to rq->engine is visible before acquring the refcount in the lookup.
+	 */
+	smp_wmb();
+
 	spin_lock_init(&rq->lock);
 	dma_fence_init(&rq->fence,
 		       &i915_fence_ops,
By using TYPESAFE_BY_RCU, we accept that requests may be swapped out from
underneath us, even when using rcu_read_lock(). We use a strong barrier
on acquiring the refcount during lookup, but this needs to be paired
with a barrier on re-initialising it. Currently we call dma_fence_init,
which ultimately does a plain atomic_set(1) on the refcount, not
providing any memory barriers. As we inspect some state before even
acquiring the refcount in the lookup (by arguing that we can detect
inconsistent requests), that state should be initialised before the
refcount.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 7 +++++++
 1 file changed, 7 insertions(+)