diff mbox

[14/56] drm/i915: Make retire condition check for requests not objects

Message ID 1433442194-16501-1-git-send-email-John.C.Harrison@Intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

John Harrison June 4, 2015, 6:23 p.m. UTC
From: John Harrison <John.C.Harrison@Intel.com>

A previous patch (read-read optimisation) changed the early exit
condition in i915_gem_retire_requests_ring() from checking the request
list to checking the active list. This assumes that all requests have
objects associated with them which are placed on the active list. The
removal of the OLR means that non-batch buffer work is no longer
tagged onto the nearest batch buffer submission and thus there are
requests going through the system which do not have objects associated
with them. This can therefore lead to the situation where an
outstanding request never gets retired.

This change reverts the early exit condition to check for requests.
Given that the purpose of the function is to retire requests, this
does seem to make much more sense.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

John Harrison June 4, 2015, 6:24 p.m. UTC | #1
Note that this is a new patch to the series. The issue was found when 
debugging a problem with the conversion to struct fence that is still in 
progress. Basically, it was possible to get continuous TDR timeouts on 
the ECS ring because the start of day initialisation request never got 
retired and the TDR got confused by this.

This patch should therefore be included in the anti-OLR series as a new 
patch #14 (immediately before 'update i915_gpu_idle() to ...'). As that 
is the point at which non-batch buffer requests start to appear in the 
system.


On 04/06/2015 19:23, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> A previous patch (read-read optimisation) changed the early exit
> condition in i915_gem_retire_requests_ring() from checking the request
> list to checking the active list. This assumes that all requests have
> objects associated with them which are placed on the active list. The
> removal of the OLR means that non-batch buffer work is no longer
> tagged onto the nearest batch buffer submission and thus there are
> requests going through the system which do not have objects associated
> with them. This can therefore lead to the situation where an
> outstanding request never gets retired.
>
> This change reverts the early exit condition to check for requests.
> Given that the purpose of the function is to retire requests, this
> does seem to make much more sense.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7117659..4c5a6cd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2859,7 +2859,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   {
>   	WARN_ON(i915_verify_lists(ring->dev));
>   
> -	if (list_empty(&ring->active_list))
> +	if (list_empty(&ring->request_list))
>   		return;
>   
>   	/* Retire requests first as we use it above for the early return.
Tomas Elf June 9, 2015, 3:56 p.m. UTC | #2
On 04/06/2015 19:23, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> A previous patch (read-read optimisation) changed the early exit
> condition in i915_gem_retire_requests_ring() from checking the request
> list to checking the active list. This assumes that all requests have
> objects associated with them which are placed on the active list. The
> removal of the OLR means that non-batch buffer work is no longer
> tagged onto the nearest batch buffer submission and thus there are
> requests going through the system which do not have objects associated
> with them. This can therefore lead to the situation where an
> outstanding request never gets retired.
>
> This change reverts the early exit condition to check for requests.
> Given that the purpose of the function is to retire requests, this
> does seem to make much more sense.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.c |    2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7117659..4c5a6cd 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2859,7 +2859,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   {
>   	WARN_ON(i915_verify_lists(ring->dev));
>
> -	if (list_empty(&ring->active_list))
> +	if (list_empty(&ring->request_list))
>   		return;
>
>   	/* Retire requests first as we use it above for the early return.
>

Note to whoever is integrating this patch: This patch can either be 
applied or we could drop the request_list check entirely. This according 
to Chris Wilson in the following conversation:

3:26:09 PM - ickle: we just kill the check
3:26:25 PM - ickle: the final function is just request_list + trace_request
3:26:37 PM - ickle: adding a test to save one isn't a great tradeoff
3:26:58 PM - siglesias has left the room (Quit: Ping timeout: 256 seconds).
3:27:28 PM - twnqx has left the room (Quit: Ping timeout: 272 seconds).
3:28:04 PM - tomas_elf: fine
3:28:20 PM - tomas_elf: anyway, good to know
3:29:32 PM - ickle: there's actually one more, it's also where the 
execlists_retire should be
3:30:48 PM - tomas_elf: maybe you can just submit a patch (unless you've 
already done so) that removes all of the references
3:31:09 PM - JohnHarrison Joryn
3:31:23 PM - ickle: there's like a backlog of 50 patches before we even 
get to that point
3:31:29 PM - tomas_elf: ok, cool
3:31:51 PM - tomas_elf: at any rate, JohnHarrison's patch can be 
accepted either with the request_list check or no check at all
3:33:14 PM - ickle: I thought vsyrjala sent the patch to completely kill 
it along with a bug citation
3:34:50 PM - doome_ [~doome@82.150.48.146] entered the room.
3:35:25 PM - ickle: 0aedb1626566efd72b369c01992ee7413c82a0c5
3:35:39 PM - darkbasic_ [~quassel@niko.linuxsystems.it] entered the room.
3:36:13 PM - darkbasic has left the room (Quit: Read error: Connection 
reset by peer).
3:39:05 PM - tomas_elf: has it been merged?
3:39:43 PM - ickle: it is in drm-intel-fixes
3:40:02 PM - tomas_elf: ah, ok

As long as the active_list check is removed since it breaks things.

Reviewed-by: Tomas Elf <tomas.elf@intel.com>

Thanks,
Tomas
Daniel Vetter June 17, 2015, 3:01 p.m. UTC | #3
On Tue, Jun 09, 2015 at 04:56:01PM +0100, Tomas Elf wrote:
> On 04/06/2015 19:23, John.C.Harrison@Intel.com wrote:
> >From: John Harrison <John.C.Harrison@Intel.com>
> >
> >A previous patch (read-read optimisation) changed the early exit
> >condition in i915_gem_retire_requests_ring() from checking the request
> >list to checking the active list. This assumes that all requests have
> >objects associated with them which are placed on the active list. The
> >removal of the OLR means that non-batch buffer work is no longer
> >tagged onto the nearest batch buffer submission and thus there are
> >requests going through the system which do not have objects associated
> >with them. This can therefore lead to the situation where an
> >outstanding request never gets retired.
> >
> >This change reverts the early exit condition to check for requests.
> >Given that the purpose of the function is to retire requests, this
> >does seem to make much more sense.
> >
> >For: VIZ-5190
> >Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >---
> >  drivers/gpu/drm/i915/i915_gem.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> >diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >index 7117659..4c5a6cd 100644
> >--- a/drivers/gpu/drm/i915/i915_gem.c
> >+++ b/drivers/gpu/drm/i915/i915_gem.c
> >@@ -2859,7 +2859,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
> >  {
> >  	WARN_ON(i915_verify_lists(ring->dev));
> >
> >-	if (list_empty(&ring->active_list))
> >+	if (list_empty(&ring->request_list))
> >  		return;
> >
> >  	/* Retire requests first as we use it above for the early return.
> >
> 
> Note to whoever is integrating this patch: This patch can either be applied
> or we could drop the request_list check entirely. This according to Chris
> Wilson in the following conversation:
> 
> 3:26:09 PM - ickle: we just kill the check
> 3:26:25 PM - ickle: the final function is just request_list + trace_request
> 3:26:37 PM - ickle: adding a test to save one isn't a great tradeoff
> 3:26:58 PM - siglesias has left the room (Quit: Ping timeout: 256 seconds).
> 3:27:28 PM - twnqx has left the room (Quit: Ping timeout: 272 seconds).
> 3:28:04 PM - tomas_elf: fine
> 3:28:20 PM - tomas_elf: anyway, good to know
> 3:29:32 PM - ickle: there's actually one more, it's also where the
> execlists_retire should be
> 3:30:48 PM - tomas_elf: maybe you can just submit a patch (unless you've
> already done so) that removes all of the references
> 3:31:09 PM - JohnHarrison Joryn
> 3:31:23 PM - ickle: there's like a backlog of 50 patches before we even get
> to that point
> 3:31:29 PM - tomas_elf: ok, cool
> 3:31:51 PM - tomas_elf: at any rate, JohnHarrison's patch can be accepted
> either with the request_list check or no check at all
> 3:33:14 PM - ickle: I thought vsyrjala sent the patch to completely kill it
> along with a bug citation
> 3:34:50 PM - doome_ [~doome@82.150.48.146] entered the room.
> 3:35:25 PM - ickle: 0aedb1626566efd72b369c01992ee7413c82a0c5
> 3:35:39 PM - darkbasic_ [~quassel@niko.linuxsystems.it] entered the room.
> 3:36:13 PM - darkbasic has left the room (Quit: Read error: Connection reset
> by peer).
> 3:39:05 PM - tomas_elf: has it been merged?
> 3:39:43 PM - ickle: it is in drm-intel-fixes
> 3:40:02 PM - tomas_elf: ah, ok
> 
> As long as the active_list check is removed since it breaks things.
> 
> Reviewed-by: Tomas Elf <tomas.elf@intel.com>

Jani already picked up Ville's version of this for dinf:

commit 11ee9615f9bbc9c0c2dbd9f5eb275459b76f032a
Author: Ville Syrjälä <ville.syrjala@linux.intel.com>
Date:   Thu May 28 18:32:36 2015 +0300

    drm/i915: Don't skip request retirement if the active list is empty

We should be covered here I think.
-Daniel
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7117659..4c5a6cd 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2859,7 +2859,7 @@  i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
 	WARN_ON(i915_verify_lists(ring->dev));
 
-	if (list_empty(&ring->active_list))
+	if (list_empty(&ring->request_list))
 		return;
 
 	/* Retire requests first as we use it above for the early return.