diff mbox

drm/i915: Fix an incorrect free rather than derefence issue.

Message ID 1423744161-21554-1-git-send-email-nicholas.hoath@intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Nick Hoath Feb. 12, 2015, 12:29 p.m. UTC
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652

Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c | 2 +-
 drivers/gpu/drm/i915/i915_gem.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

Comments

Daniel Vetter Feb. 13, 2015, 9:32 a.m. UTC | #1
On Thu, Feb 12, 2015 at 12:29:21PM +0000, Nick Hoath wrote:
> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
> 
> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>

Commit message is missing the absolutely crucial detail about which patch
introduced this regression:

commit 6d3d8274bc45de4babb62d64562d92af984dd238
Author:     Nick Hoath <nicholas.hoath@intel.com>
AuthorDate: Thu Jan 15 13:10:39 2015 +0000

    drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request

Another thing I've noticed is that we explicitly drop the context
reference for the request before dropping the request reference. Without
clearing the req->ctx pointer. That has a very high chance to leading to
tears, imo the context unreferenceing should be pushed into
i915_gem_request_free.

Except that it's there already, which means we have a double unref now?

Also this patch is for the legacy ringbuffer code, but the referenced bug
is for gen8+ execlists. We're definitely not running this code here I
think.

Imo step one is to drop all the explicit ctx refcounting for req->ctx and
always rely on the implicit reference. Then see what happens.

Cheers, Daniel


> ---
>  drivers/gpu/drm/i915/i915_drv.c | 2 +-
>  drivers/gpu/drm/i915/i915_gem.c | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1765989..dc10d86 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2661,7 +2661,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>  			intel_lr_context_unpin(ring, submit_req->ctx);
>  
>  		i915_gem_context_unreference(submit_req->ctx);
> -		kfree(submit_req);
> +		i915_gem_request_unreference(submit_req);
>  	}
>  
>  	/*
> -- 
> 2.1.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
Nick Hoath Feb. 13, 2015, 9:58 a.m. UTC | #2
On 13/02/2015 09:32, Daniel Vetter wrote:
> On Thu, Feb 12, 2015 at 12:29:21PM +0000, Nick Hoath wrote:
>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
>>
>> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
>
> Commit message is missing the absolutely crucial detail about which patch
> introduced this regression:
>
> commit 6d3d8274bc45de4babb62d64562d92af984dd238
> Author:     Nick Hoath <nicholas.hoath@intel.com>
> AuthorDate: Thu Jan 15 13:10:39 2015 +0000
>
>      drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
>
> Another thing I've noticed is that we explicitly drop the context
> reference for the request before dropping the request reference. Without
> clearing the req->ctx pointer. That has a very high chance to leading to
> tears, imo the context unreferenceing should be pushed into
> i915_gem_request_free.
>
> Except that it's there already, which means we have a double unref now?

Looking at the code, it looks like that's the case.

>
> Also this patch is for the legacy ringbuffer code, but the referenced bug
> is for gen8+ execlists. We're definitely not running this code here I
> think.

i915_gem_reset_ring_cleanup is used in execlists in the hang recovery case.

>
> Imo step one is to drop all the explicit ctx refcounting for req->ctx and
> always rely on the implicit reference. Then see what happens.

I agree that the refcounting needs re-evaluating after the merge of 
execlist queue entries & requests, however I think the cleanup of the 
double unref/removing the refcounting should be done in another 
patchset. This patch is purely to fix the issue raised in 88652. Depends 
on the relative priorities.

>
> Cheers, Daniel
>
>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.c | 2 +-
>>   drivers/gpu/drm/i915/i915_gem.c | 2 +-
>>   2 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 1765989..dc10d86 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2661,7 +2661,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>>   			intel_lr_context_unpin(ring, submit_req->ctx);
>>
>>   		i915_gem_context_unreference(submit_req->ctx);
>> -		kfree(submit_req);
>> +		i915_gem_request_unreference(submit_req);
>>   	}
>>
>>   	/*
>> --
>> 2.1.1
>>
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>
Daniel Vetter Feb. 13, 2015, 12:04 p.m. UTC | #3
On Fri, Feb 13, 2015 at 10:58 AM, Nick Hoath <nicholas.hoath@intel.com> wrote:
> On 13/02/2015 09:32, Daniel Vetter wrote:
>>
>> On Thu, Feb 12, 2015 at 12:29:21PM +0000, Nick Hoath wrote:
>>>
>>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652
>>>
>>> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
>>
>>
>> Commit message is missing the absolutely crucial detail about which patch
>> introduced this regression:
>>
>> commit 6d3d8274bc45de4babb62d64562d92af984dd238
>> Author:     Nick Hoath <nicholas.hoath@intel.com>
>> AuthorDate: Thu Jan 15 13:10:39 2015 +0000
>>
>>      drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request
>>
>> Another thing I've noticed is that we explicitly drop the context
>> reference for the request before dropping the request reference. Without
>> clearing the req->ctx pointer. That has a very high chance to leading to
>> tears, imo the context unreferenceing should be pushed into
>> i915_gem_request_free.
>>
>> Except that it's there already, which means we have a double unref now?
>
>
> Looking at the code, it looks like that's the case.
>
>>
>> Also this patch is for the legacy ringbuffer code, but the referenced bug
>> is for gen8+ execlists. We're definitely not running this code here I
>> think.
>
>
> i915_gem_reset_ring_cleanup is used in execlists in the hang recovery case.

Oops I've missed that, somehow I've thought intel_lrc.c has it's own
reset recovery code. But I've just mixed up function names a bit.

>> Imo step one is to drop all the explicit ctx refcounting for req->ctx and
>> always rely on the implicit reference. Then see what happens.
>
>
> I agree that the refcounting needs re-evaluating after the merge of execlist
> queue entries & requests, however I think the cleanup of the double
> unref/removing the refcounting should be done in another patchset. This
> patch is purely to fix the issue raised in 88652. Depends on the relative
> priorities.

Well this patch here won't work because there's now a double unref.
And I spotted that one because intel_execlists_retire_requests seems
to have the exact same double unref already. Hence why I think we
should fix up that first and then reasses what's left.

The bug here (before your patch) is just a use-after-free, if there's
some other reference to the request. And it will also be fixed with
the redone req->ctx refcounting.
-Daniel
Shuang He Feb. 13, 2015, 12:28 p.m. UTC | #4
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5767
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
PNV                 -1              282/282              281/282
ILK                                  313/313              313/313
SNB                                  309/323              309/323
IVB                                  380/380              380/380
BYT                                  296/296              296/296
HSW                 -2              425/425              423/425
BDW                 -1              318/318              317/318
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*PNV  igt_gen3_render_linear_blits      PASS(5)      CRASH(1)PASS(1)
 HSW  igt_kms_flip_plain-flip-fb-recreate      TIMEOUT(2)PASS(1)      TIMEOUT(1)PASS(1)
 HSW  igt_kms_flip_plain-flip-fb-recreate-interruptible      TIMEOUT(2)PASS(2)      TIMEOUT(1)PASS(1)
*BDW  igt_gem_gtt_hog      PASS(8)      DMESG_WARN(1)PASS(1)
Note: You need to pay more attention to line start with '*'
diff mbox

Patch

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1765989..dc10d86 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2661,7 +2661,7 @@  static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 			intel_lr_context_unpin(ring, submit_req->ctx);
 
 		i915_gem_context_unreference(submit_req->ctx);
-		kfree(submit_req);
+		i915_gem_request_unreference(submit_req);
 	}
 
 	/*