diff mbox

[24/51] drm/i915: Update deferred context creation to do explicit request management

Message ID 1423828140-10653-25-git-send-email-John.C.Harrison@Intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

John Harrison Feb. 13, 2015, 11:48 a.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

In execlist mode, context initialisation is deferred until first use of the
given context. This is because execlist mode has many more contexts than legacy
mode and many are never actually used. Previously, the initialisation commands
were written to the ring and tagged with some random request structure via the
OLR. This seemed to be causing a null pointer deference bug under certain
circumstances (BZ:40112).

This patch adds explicit request creation and submission to the deferred
initialisation code path. Thus removing any reliance on or randomness caused by
the OLR.

For: VIZ-5115
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c |   17 ++++++++++++++++-
 1 file changed, 16 insertions(+), 1 deletion(-)

Comments

Chris Wilson Feb. 13, 2015, 12:15 p.m. UTC | #1
On Fri, Feb 13, 2015 at 11:48:33AM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> In execlist mode, context initialisation is deferred until first use of the
> given context. This is because execlist mode has many more contexts than legacy
> mode and many are never actually used.

That's not correct. There are no more contexts in execlists than legacy.
There are more ringbuffers, or rather the contexts have an extra state
object associated with them.

> Previously, the initialisation commands
> were written to the ring and tagged with some random request structure via the
> OLR. This seemed to be causing a null pointer deference bug under certain
> circumstances (BZ:40112).
> 
> This patch adds explicit request creation and submission to the deferred
> initialisation code path. Thus removing any reliance on or randomness caused by
> the OLR.

This is upside down though. The request should be referencing the
context (thus instantiating it on demand) and nothing in the context
allocation requires the request. The initialisation here should be during
i915_request_switch_context(), since it can be entirely shared with
legacy.
-Chris
John Harrison Feb. 18, 2015, 3:27 p.m. UTC | #2
On 13/02/2015 12:15, Chris Wilson wrote:
> On Fri, Feb 13, 2015 at 11:48:33AM +0000, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> In execlist mode, context initialisation is deferred until first use of the
>> given context. This is because execlist mode has many more contexts than legacy
>> mode and many are never actually used.
> That's not correct. There are no more contexts in execlists than legacy.
> There are more ringbuffers, or rather the contexts have an extra state
> object associated with them.
Okay, I should have said sub-contexts. Or context state objects. Or 
something.


>> Previously, the initialisation commands
>> were written to the ring and tagged with some random request structure via the
>> OLR. This seemed to be causing a null pointer deference bug under certain
>> circumstances (BZ:40112).
>>
>> This patch adds explicit request creation and submission to the deferred
>> initialisation code path. Thus removing any reliance on or randomness caused by
>> the OLR.
> This is upside down though. The request should be referencing the
> context (thus instantiating it on demand) and nothing in the context
> allocation requires the request. The initialisation here should be during
> i915_request_switch_context(), since it can be entirely shared with
> legacy.
> -Chris

The request does reference the context - the alloc_reques() function 
takes a context object as a parameter. Thus it is impossible for the 
request to be used/supplied/required during context creation. The issue 
here is the lazy initialisation of the per ring context state which 
requires sending commands to the ring on first usage of the given 
context object on the given ring.

One problem is that the initialisation request and the batch buffer 
request cannot be merged at the moment. They both use request->batch_obj 
for tracking the command object. Thus this patch only works due to the 
deferred intialisation occurring during the i915_gem_validate_context() 
call very early on in execbuffer() rather than as part of the context 
switch within the batch buffer execution which is much later.

I'm not sure what you mean by i915_request_switch_context(). The 
existing i915_switch_context() does now take just a request structure 
rather than a ring/ringbuf/context mixture. However, it is not really a 
good idea to do the context switch automatically as part of creating the 
request. The request creation and request execution could be quite 
separated in time, especially with a scheduler.

It should be possible to move the deferred initialisation within the 
context switch if the object tracking can be resolved. Thus they could 
share the same request and there would not be effectively two separate 
execution calls at the hardware level. Again, that's potentially work 
that could be done as a follow up task of improving the context 
management independent of the current task of removing the OLR.

John.
Daniel Vetter Feb. 25, 2015, 9:15 p.m. UTC | #3
On Wed, Feb 18, 2015 at 03:27:38PM +0000, John Harrison wrote:
> On 13/02/2015 12:15, Chris Wilson wrote:
> >On Fri, Feb 13, 2015 at 11:48:33AM +0000, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>In execlist mode, context initialisation is deferred until first use of the
> >>given context. This is because execlist mode has many more contexts than legacy
> >>mode and many are never actually used.
> >That's not correct. There are no more contexts in execlists than legacy.
> >There are more ringbuffers, or rather the contexts have an extra state
> >object associated with them.
> Okay, I should have said sub-contexts. Or context state objects. Or
> something.

per-engine ctx state? Naming stuff is hard ;-)

>> >>Previously, the initialisation commands
> >>were written to the ring and tagged with some random request structure via the
> >>OLR. This seemed to be causing a null pointer deference bug under certain
> >>circumstances (BZ:40112).
> >>
> >>This patch adds explicit request creation and submission to the deferred
> >>initialisation code path. Thus removing any reliance on or randomness caused by
> >>the OLR.
> >This is upside down though. The request should be referencing the
> >context (thus instantiating it on demand) and nothing in the context
> >allocation requires the request. The initialisation here should be during
> >i915_request_switch_context(), since it can be entirely shared with
> >legacy.
> >-Chris
> 
> The request does reference the context - the alloc_reques() function takes a
> context object as a parameter. Thus it is impossible for the request to be
> used/supplied/required during context creation. The issue here is the lazy
> initialisation of the per ring context state which requires sending commands
> to the ring on first usage of the given context object on the given ring.
> 
> One problem is that the initialisation request and the batch buffer request
> cannot be merged at the moment. They both use request->batch_obj for
> tracking the command object. Thus this patch only works due to the deferred
> intialisation occurring during the i915_gem_validate_context() call very
> early on in execbuffer() rather than as part of the context switch within
> the batch buffer execution which is much later.

My request struct doesn't have a batch_obj pointer. Where is that from and
why do we need it? Atm just chasing Chris' comments, haven't read the full
series yet.

> I'm not sure what you mean by i915_request_switch_context(). The existing
> i915_switch_context() does now take just a request structure rather than a
> ring/ringbuf/context mixture. However, it is not really a good idea to do
> the context switch automatically as part of creating the request. The
> request creation and request execution could be quite separated in time,
> especially with a scheduler.
> 
> It should be possible to move the deferred initialisation within the context
> switch if the object tracking can be resolved. Thus they could share the
> same request and there would not be effectively two separate execution calls
> at the hardware level. Again, that's potentially work that could be done as
> a follow up task of improving the context management independent of the
> current task of removing the OLR.

I think the biggest risk with adding a separate request for the lrc
deferred init is in accidentally nesting request when someone moves around
the lrc validation. Atm it's at the top of execbuf but we tend to shuffle
things around a lot.

Is there some simple WARN_ON we could smash into the alloc function to
make sure this never happens? ring->olr would be it, but since we want to
kill that that's not great. Or do I see risks which aren't really there?
-Daniel
John Harrison Feb. 27, 2015, 12:45 p.m. UTC | #4
On 25/02/2015 21:15, Daniel Vetter wrote:
> On Wed, Feb 18, 2015 at 03:27:38PM +0000, John Harrison wrote:
>> On 13/02/2015 12:15, Chris Wilson wrote:
>>> On Fri, Feb 13, 2015 at 11:48:33AM +0000, John.C.Harrison@Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> In execlist mode, context initialisation is deferred until first use of the
>>>> given context. This is because execlist mode has many more contexts than legacy
>>>> mode and many are never actually used.
>>> That's not correct. There are no more contexts in execlists than legacy.
>>> There are more ringbuffers, or rather the contexts have an extra state
>>> object associated with them.
>> Okay, I should have said sub-contexts. Or context state objects. Or
>> something.
> per-engine ctx state? Naming stuff is hard ;-)
>
>>>>> Previously, the initialisation commands
>>>> were written to the ring and tagged with some random request structure via the
>>>> OLR. This seemed to be causing a null pointer deference bug under certain
>>>> circumstances (BZ:40112).
>>>>
>>>> This patch adds explicit request creation and submission to the deferred
>>>> initialisation code path. Thus removing any reliance on or randomness caused by
>>>> the OLR.
>>> This is upside down though. The request should be referencing the
>>> context (thus instantiating it on demand) and nothing in the context
>>> allocation requires the request. The initialisation here should be during
>>> i915_request_switch_context(), since it can be entirely shared with
>>> legacy.
>>> -Chris
>> The request does reference the context - the alloc_reques() function takes a
>> context object as a parameter. Thus it is impossible for the request to be
>> used/supplied/required during context creation. The issue here is the lazy
>> initialisation of the per ring context state which requires sending commands
>> to the ring on first usage of the given context object on the given ring.
>>
>> One problem is that the initialisation request and the batch buffer request
>> cannot be merged at the moment. They both use request->batch_obj for
>> tracking the command object. Thus this patch only works due to the deferred
>> intialisation occurring during the i915_gem_validate_context() call very
>> early on in execbuffer() rather than as part of the context switch within
>> the batch buffer execution which is much later.
> My request struct doesn't have a batch_obj pointer. Where is that from and
> why do we need it? Atm just chasing Chris' comments, haven't read the full
> series yet.
It should do! It was added way back in June 2013 by Mika in 'drm/i915: 
add batch bo to i915_add_request()'. So unless someone has removed it 
again since I last fetched a tree, you should definitely have it.

>> I'm not sure what you mean by i915_request_switch_context(). The existing
>> i915_switch_context() does now take just a request structure rather than a
>> ring/ringbuf/context mixture. However, it is not really a good idea to do
>> the context switch automatically as part of creating the request. The
>> request creation and request execution could be quite separated in time,
>> especially with a scheduler.
>>
>> It should be possible to move the deferred initialisation within the context
>> switch if the object tracking can be resolved. Thus they could share the
>> same request and there would not be effectively two separate execution calls
>> at the hardware level. Again, that's potentially work that could be done as
>> a follow up task of improving the context management independent of the
>> current task of removing the OLR.
> I think the biggest risk with adding a separate request for the lrc
> deferred init is in accidentally nesting request when someone moves around
> the lrc validation. Atm it's at the top of execbuf but we tend to shuffle
> things around a lot.
>
> Is there some simple WARN_ON we could smash into the alloc function to
> make sure this never happens? ring->olr would be it, but since we want to
> kill that that's not great. Or do I see risks which aren't really there?
> -Daniel

There is a WARN_ON that the batch object in the request structure does 
not get overwritten. Although, that could only happen if the lazy setup 
and the batch buffer execution were sharing the same request. Nesting 
multiple requests shouldn't really be a problem. Without the OLR, there 
is no globally shared state. Having multiple requests being created in 
parallel is fine. The lazy context setup must be completed before the 
execbuffer is executed otherwise Bad Things are going to happen 
irrespective of request usage. So even if the execbuffer request is 
created first, the lazy setup will still happen at the right time 
without breaking the execbuffer's request.
Daniel Vetter Feb. 27, 2015, 1:50 p.m. UTC | #5
On Fri, Feb 27, 2015 at 12:45:19PM +0000, John Harrison wrote:
> On 25/02/2015 21:15, Daniel Vetter wrote:
> >On Wed, Feb 18, 2015 at 03:27:38PM +0000, John Harrison wrote:
> >>On 13/02/2015 12:15, Chris Wilson wrote:
> >>>On Fri, Feb 13, 2015 at 11:48:33AM +0000, John.C.Harrison@Intel.com wrote:
> >>>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>>
> >>>>In execlist mode, context initialisation is deferred until first use of the
> >>>>given context. This is because execlist mode has many more contexts than legacy
> >>>>mode and many are never actually used.
> >>>That's not correct. There are no more contexts in execlists than legacy.
> >>>There are more ringbuffers, or rather the contexts have an extra state
> >>>object associated with them.
> >>Okay, I should have said sub-contexts. Or context state objects. Or
> >>something.
> >per-engine ctx state? Naming stuff is hard ;-)
> >
> >>>>>Previously, the initialisation commands
> >>>>were written to the ring and tagged with some random request structure via the
> >>>>OLR. This seemed to be causing a null pointer deference bug under certain
> >>>>circumstances (BZ:40112).
> >>>>
> >>>>This patch adds explicit request creation and submission to the deferred
> >>>>initialisation code path. Thus removing any reliance on or randomness caused by
> >>>>the OLR.
> >>>This is upside down though. The request should be referencing the
> >>>context (thus instantiating it on demand) and nothing in the context
> >>>allocation requires the request. The initialisation here should be during
> >>>i915_request_switch_context(), since it can be entirely shared with
> >>>legacy.
> >>>-Chris
> >>The request does reference the context - the alloc_reques() function takes a
> >>context object as a parameter. Thus it is impossible for the request to be
> >>used/supplied/required during context creation. The issue here is the lazy
> >>initialisation of the per ring context state which requires sending commands
> >>to the ring on first usage of the given context object on the given ring.
> >>
> >>One problem is that the initialisation request and the batch buffer request
> >>cannot be merged at the moment. They both use request->batch_obj for
> >>tracking the command object. Thus this patch only works due to the deferred
> >>intialisation occurring during the i915_gem_validate_context() call very
> >>early on in execbuffer() rather than as part of the context switch within
> >>the batch buffer execution which is much later.
> >My request struct doesn't have a batch_obj pointer. Where is that from and
> >why do we need it? Atm just chasing Chris' comments, haven't read the full
> >series yet.
> It should do! It was added way back in June 2013 by Mika in 'drm/i915: add
> batch bo to i915_add_request()'. So unless someone has removed it again
> since I last fetched a tree, you should definitely have it.

I must have been blind. Now I've found both the batch_obj and how it's
used in the renderstate. And that usage looks wrong since
request->batch_obj is used to assign blame to userspace batches for
arb_robustness gpu hang stats. The render state init batch is created by
the kernel, per definition userspace can't be responsible for hangs in
there.

Imo we should replace the batch obj argument in i915_gem_render_state_init
with NULL.

Would this allow you to resolve the inversion Chris brought up? Or is
there something else which would prevent us from putting the deferred
render ring init into the same request as the first execbuf?

> >>I'm not sure what you mean by i915_request_switch_context(). The existing
> >>i915_switch_context() does now take just a request structure rather than a
> >>ring/ringbuf/context mixture. However, it is not really a good idea to do
> >>the context switch automatically as part of creating the request. The
> >>request creation and request execution could be quite separated in time,
> >>especially with a scheduler.
> >>
> >>It should be possible to move the deferred initialisation within the context
> >>switch if the object tracking can be resolved. Thus they could share the
> >>same request and there would not be effectively two separate execution calls
> >>at the hardware level. Again, that's potentially work that could be done as
> >>a follow up task of improving the context management independent of the
> >>current task of removing the OLR.
> >I think the biggest risk with adding a separate request for the lrc
> >deferred init is in accidentally nesting request when someone moves around
> >the lrc validation. Atm it's at the top of execbuf but we tend to shuffle
> >things around a lot.
> >
> >Is there some simple WARN_ON we could smash into the alloc function to
> >make sure this never happens? ring->olr would be it, but since we want to
> >kill that that's not great. Or do I see risks which aren't really there?
> >-Daniel
> 
> There is a WARN_ON that the batch object in the request structure does not
> get overwritten. Although, that could only happen if the lazy setup and the
> batch buffer execution were sharing the same request. Nesting multiple
> requests shouldn't really be a problem. Without the OLR, there is no
> globally shared state. Having multiple requests being created in parallel is
> fine. The lazy context setup must be completed before the execbuffer is
> executed otherwise Bad Things are going to happen irrespective of request
> usage. So even if the execbuffer request is created first, the lazy setup
> will still happen at the right time without breaking the execbuffer's
> request.

Ok, sounds like we'd be covered. I just wanted to make sure we're not
ignoring potential issues here, but if requests are free-standing already
as-is (well after this series) we're hopefully fine.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index de01cae..2882d3f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1822,6 +1822,7 @@  static void lrc_setup_hardware_status_page(struct intel_engine_cs *ring,
 int intel_lr_context_deferred_create(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	const bool is_global_default_ctx = (ctx == ring->default_context);
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_gem_object *ctx_obj;
@@ -1902,13 +1903,27 @@  int intel_lr_context_deferred_create(struct intel_context *ctx,
 		lrc_setup_hardware_status_page(ring, ctx_obj);
 	else if (ring->id == RCS && !ctx->rcs_initialized) {
 		if (ring->init_context) {
-			ret = ring->init_context(ring, ctx);
+			struct drm_i915_gem_request *req;
+
+			ret = dev_priv->gt.alloc_request(ring, ctx, &req);
+			if (ret)
+				return ret;
+
+			ret = ring->init_context(req->ring, ctx);
 			if (ret) {
 				DRM_ERROR("ring init context: %d\n", ret);
+				i915_gem_request_unreference(req);
 				ctx->engine[ring->id].ringbuf = NULL;
 				ctx->engine[ring->id].state = NULL;
 				goto error;
 			}
+
+			ret = i915_add_request_no_flush(req->ring);
+			if (ret) {
+				DRM_ERROR("ring init context: %d\n", ret);
+				i915_gem_request_unreference(req);
+				goto error;
+			}
 		}
 
 		ctx->rcs_initialized = true;