diff mbox

[v2] drm/i915: add schedule out notification of preempted but completed request

Message ID 1519441170-4354-1-git-send-email-weinan.z.li@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Li, Weinan Z Feb. 24, 2018, 2:59 a.m. UTC
There is one corner case missing schedule out notification of the preempted
request. The preempted request is just completed when preemption happen,
then it will be canceled and won't be resubmitted later, GVT-g will lost
the schedule out notification.

Here add schedule out notification if found the preempted request has been
completed.

v2:
- refine description, add completed check and notification in
  execlists_cancel_port_requests. (Chris)

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Weinan Li <weinan.z.li@intel.com>
Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Li, Weinan Z March 5, 2018, 3:12 a.m. UTC | #1
Any comments? This issue impact the preemption enabling in host, without the schedule out notification, vgpu will run into wrong state.
Thanks
-Weinan
> -----Original Message-----
> From: Li, Weinan Z
> Sent: Saturday, February 24, 2018 11:00 AM
> To: intel-gfx@lists.freedesktop.org
> Cc: intel-gvt-dev@lists.freedesktop.org; Li, Weinan Z
> <weinan.z.li@intel.com>; Chris Wilson <chris@chris-wilson.co.uk>; Zhenyu
> Wang <zhenyuw@linux.intel.com>
> Subject: [PATCH v2] drm/i915: add schedule out notification of preempted
> but completed request
> 
> There is one corner case missing schedule out notification of the preempted
> request. The preempted request is just completed when preemption
> happen, then it will be canceled and won't be resubmitted later, GVT-g will
> lost the schedule out notification.
> 
> Here add schedule out notification if found the preempted request has been
> completed.
> 
> v2:
> - refine description, add completed check and notification in
>   execlists_cancel_port_requests. (Chris)
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Weinan Li <weinan.z.li@intel.com>
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/intel_lrc.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> b/drivers/gpu/drm/i915/intel_lrc.c
> index e781c91..24a6e68 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -657,10 +657,16 @@ static void execlists_dequeue(struct
> intel_engine_cs *engine)
> 
>  	while (num_ports-- && port_isset(port)) {
>  		struct i915_request *rq = port_request(port);
> +		unsigned int notify;
> 
>  		GEM_BUG_ON(!execlists->active);
>  		intel_engine_context_out(rq->engine);
> -		execlists_context_status_change(rq,
> INTEL_CONTEXT_SCHEDULE_PREEMPTED);
> +
> +		notify = INTEL_CONTEXT_SCHEDULE_PREEMPTED;
> +		if (i915_request_completed(rq))
> +			notify = INTEL_CONTEXT_SCHEDULE_OUT;
> +		execlists_context_status_change(rq, notify);
> +
>  		i915_request_put(rq);
> 
>  		memset(port, 0, sizeof(*port));
> --
> 1.9.1
Tvrtko Ursulin March 5, 2018, 11:06 a.m. UTC | #2
On 24/02/2018 02:59, Weinan Li wrote:
> There is one corner case missing schedule out notification of the preempted
> request. The preempted request is just completed when preemption happen,
> then it will be canceled and won't be resubmitted later, GVT-g will lost
> the schedule out notification.
> 
> Here add schedule out notification if found the preempted request has been
> completed.
> 
> v2:
> - refine description, add completed check and notification in
>    execlists_cancel_port_requests. (Chris)
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Weinan Li <weinan.z.li@intel.com>
> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/intel_lrc.c | 8 +++++++-
>   1 file changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e781c91..24a6e68 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -657,10 +657,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   
>   	while (num_ports-- && port_isset(port)) {
>   		struct i915_request *rq = port_request(port);
> +		unsigned int notify;
>   
>   		GEM_BUG_ON(!execlists->active);
>   		intel_engine_context_out(rq->engine);
> -		execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_PREEMPTED);
> +
> +		notify = INTEL_CONTEXT_SCHEDULE_PREEMPTED;
> +		if (i915_request_completed(rq))
> +			notify = INTEL_CONTEXT_SCHEDULE_OUT;
> +		execlists_context_status_change(rq, notify);
> +
>   		i915_request_put(rq);
>   
>   		memset(port, 0, sizeof(*port));
> 

I hope seqno in HWS cannot change between execlists_cancel_port_requests 
and __unwind_incomplete_requests? Some sort of delay in memory 
transaction vs the interrupt? No idea, could be talking nonsense.

 From style point of view I would probably simplify with an inline 
ternary conditional in execlists_context_status_change.

Regards,

Tvrtko
Chris Wilson March 5, 2018, 11:18 a.m. UTC | #3
Quoting Tvrtko Ursulin (2018-03-05 11:06:23)
> 
> On 24/02/2018 02:59, Weinan Li wrote:
> > There is one corner case missing schedule out notification of the preempted
> > request. The preempted request is just completed when preemption happen,
> > then it will be canceled and won't be resubmitted later, GVT-g will lost
> > the schedule out notification.
> > 
> > Here add schedule out notification if found the preempted request has been
> > completed.
> > 
> > v2:
> > - refine description, add completed check and notification in
> >    execlists_cancel_port_requests. (Chris)
> > 
> > Cc: Chris Wilson <chris@chris-wilson.co.uk>
> > Signed-off-by: Weinan Li <weinan.z.li@intel.com>
> > Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
> > ---
> >   drivers/gpu/drm/i915/intel_lrc.c | 8 +++++++-
> >   1 file changed, 7 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index e781c91..24a6e68 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -657,10 +657,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >   
> >       while (num_ports-- && port_isset(port)) {
> >               struct i915_request *rq = port_request(port);
> > +             unsigned int notify;
> >   
> >               GEM_BUG_ON(!execlists->active);
> >               intel_engine_context_out(rq->engine);
> > -             execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_PREEMPTED);
> > +
> > +             notify = INTEL_CONTEXT_SCHEDULE_PREEMPTED;
> > +             if (i915_request_completed(rq))
> > +                     notify = INTEL_CONTEXT_SCHEDULE_OUT;
> > +             execlists_context_status_change(rq, notify);
> > +
> >               i915_request_put(rq);
> >   
> >               memset(port, 0, sizeof(*port));
> > 
> 
> I hope seqno in HWS cannot change between execlists_cancel_port_requests 
> and __unwind_incomplete_requests? Some sort of delay in memory 
> transaction vs the interrupt? No idea, could be talking nonsense.

Not nonsense, just the type of nightmare I have (irq_seqno_barrier ahem).
We have an assert following the completion event to try and detect this
type of error, which thankfully hasn't fired during normal usage (so far,
it has only tripped over our programming bugs). I fear iommu in this
regard as well as that introduces random latency around memory
transactions that we know breaks coherency wrt interrupts.

As far as this goes, we already rely on this being called under stable
HWS, or else preemption/reset-recovery is fubar.
-Chris
Tvrtko Ursulin March 5, 2018, 12:19 p.m. UTC | #4
On 05/03/2018 11:18, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2018-03-05 11:06:23)
>>
>> On 24/02/2018 02:59, Weinan Li wrote:
>>> There is one corner case missing schedule out notification of the preempted
>>> request. The preempted request is just completed when preemption happen,
>>> then it will be canceled and won't be resubmitted later, GVT-g will lost
>>> the schedule out notification.
>>>
>>> Here add schedule out notification if found the preempted request has been
>>> completed.
>>>
>>> v2:
>>> - refine description, add completed check and notification in
>>>     execlists_cancel_port_requests. (Chris)
>>>
>>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>>> Signed-off-by: Weinan Li <weinan.z.li@intel.com>
>>> Signed-off-by: Zhenyu Wang <zhenyuw@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/intel_lrc.c | 8 +++++++-
>>>    1 file changed, 7 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index e781c91..24a6e68 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -657,10 +657,16 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>    
>>>        while (num_ports-- && port_isset(port)) {
>>>                struct i915_request *rq = port_request(port);
>>> +             unsigned int notify;
>>>    
>>>                GEM_BUG_ON(!execlists->active);
>>>                intel_engine_context_out(rq->engine);
>>> -             execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_PREEMPTED);
>>> +
>>> +             notify = INTEL_CONTEXT_SCHEDULE_PREEMPTED;
>>> +             if (i915_request_completed(rq))
>>> +                     notify = INTEL_CONTEXT_SCHEDULE_OUT;
>>> +             execlists_context_status_change(rq, notify);
>>> +
>>>                i915_request_put(rq);
>>>    
>>>                memset(port, 0, sizeof(*port));
>>>
>>
>> I hope seqno in HWS cannot change between execlists_cancel_port_requests
>> and __unwind_incomplete_requests? Some sort of delay in memory
>> transaction vs the interrupt? No idea, could be talking nonsense.
> 
> Not nonsense, just the type of nightmare I have (irq_seqno_barrier ahem).
> We have an assert following the completion event to try and detect this
> type of error, which thankfully hasn't fired during normal usage (so far,
> it has only tripped over our programming bugs). I fear iommu in this
> regard as well as that introduces random latency around memory
> transactions that we know breaks coherency wrt interrupts.
> 
> As far as this goes, we already rely on this being called under stable
> HWS, or else preemption/reset-recovery is fubar.

Yes I agree, much deeper issues in that case. So not relevant for GVT.

I wanted to suggest one possible alternative, that 
execlists_cancel_port_requests could always emit 
INTEL_CONTEXT_SCHEDULE_PREEMPTED, and then __unwind_incomplete_requests 
could emit INTEL_CONTEXT_SCHEDULE_OUT, but that would probably 
complicate things for GVT by having to handle duplicate notifications 
with different status.

So with the local variable dropped in favour of a ternary conditional:

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Or maybe even execlists_context_status_cancelled(rq) and the magic 
inside id?

Regards,

Tvrtko
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e781c91..24a6e68 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -657,10 +657,16 @@  static void execlists_dequeue(struct intel_engine_cs *engine)
 
 	while (num_ports-- && port_isset(port)) {
 		struct i915_request *rq = port_request(port);
+		unsigned int notify;
 
 		GEM_BUG_ON(!execlists->active);
 		intel_engine_context_out(rq->engine);
-		execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_PREEMPTED);
+
+		notify = INTEL_CONTEXT_SCHEDULE_PREEMPTED;
+		if (i915_request_completed(rq))
+			notify = INTEL_CONTEXT_SCHEDULE_OUT;
+		execlists_context_status_change(rq, notify);
+
 		i915_request_put(rq);
 
 		memset(port, 0, sizeof(*port));