Message ID | 1423744161-21554-1-git-send-email-nicholas.hoath@intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Thu, Feb 12, 2015 at 12:29:21PM +0000, Nick Hoath wrote: > Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652 > > Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> Commit message is missing the absolutely crucial detail about which patch introduced this regression: commit 6d3d8274bc45de4babb62d64562d92af984dd238 Author: Nick Hoath <nicholas.hoath@intel.com> AuthorDate: Thu Jan 15 13:10:39 2015 +0000 drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request Another thing I've noticed is that we explicitly drop the context reference for the request before dropping the request reference. Without clearing the req->ctx pointer. That has a very high chance to leading to tears, imo the context unreferenceing should be pushed into i915_gem_request_free. Except that it's there already, which means we have a double unref now? Also this patch is for the legacy ringbuffer code, but the referenced bug is for gen8+ execlists. We're definitely not running this code here I think. Imo step one is to drop all the explicit ctx refcounting for req->ctx and always rely on the implicit reference. Then see what happens. Cheers, Daniel > --- > drivers/gpu/drm/i915/i915_drv.c | 2 +- > drivers/gpu/drm/i915/i915_gem.c | 2 +- > 2 files changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c > index 1765989..dc10d86 100644 > --- a/drivers/gpu/drm/i915/i915_gem.c > +++ b/drivers/gpu/drm/i915/i915_gem.c > @@ -2661,7 +2661,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, > intel_lr_context_unpin(ring, submit_req->ctx); > > i915_gem_context_unreference(submit_req->ctx); > - kfree(submit_req); > + i915_gem_request_unreference(submit_req); > } > > /* > -- > 2.1.1 > > _______________________________________________ > Intel-gfx mailing list > Intel-gfx@lists.freedesktop.org > http://lists.freedesktop.org/mailman/listinfo/intel-gfx
On 13/02/2015 09:32, Daniel Vetter wrote: > On Thu, Feb 12, 2015 at 12:29:21PM +0000, Nick Hoath wrote: >> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652 >> >> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> > > Commit message is missing the absolutely crucial detail about which patch > introduced this regression: > > commit 6d3d8274bc45de4babb62d64562d92af984dd238 > Author: Nick Hoath <nicholas.hoath@intel.com> > AuthorDate: Thu Jan 15 13:10:39 2015 +0000 > > drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request > > Another thing I've noticed is that we explicitly drop the context > reference for the request before dropping the request reference. Without > clearing the req->ctx pointer. That has a very high chance to leading to > tears, imo the context unreferenceing should be pushed into > i915_gem_request_free. > > Except that it's there already, which means we have a double unref now? Looking at the code, it looks like that's the case. > > Also this patch is for the legacy ringbuffer code, but the referenced bug > is for gen8+ execlists. We're definitely not running this code here I > think. i915_gem_reset_ring_cleanup is used in execlists in the hang recovery case. > > Imo step one is to drop all the explicit ctx refcounting for req->ctx and > always rely on the implicit reference. Then see what happens. I agree that the refcounting needs re-evaluating after the merge of execlist queue entries & requests, however I think the cleanup of the double unref/removing the refcounting should be done in another patchset. This patch is purely to fix the issue raised in 88652. Depends on the relative priorities. > > Cheers, Daniel > > >> --- >> drivers/gpu/drm/i915/i915_drv.c | 2 +- >> drivers/gpu/drm/i915/i915_gem.c | 2 +- >> 2 files changed, 2 insertions(+), 2 deletions(-) >> >> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c >> index 1765989..dc10d86 100644 >> --- a/drivers/gpu/drm/i915/i915_gem.c >> +++ b/drivers/gpu/drm/i915/i915_gem.c >> @@ -2661,7 +2661,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, >> intel_lr_context_unpin(ring, submit_req->ctx); >> >> i915_gem_context_unreference(submit_req->ctx); >> - kfree(submit_req); >> + i915_gem_request_unreference(submit_req); >> } >> >> /* >> -- >> 2.1.1 >> >> _______________________________________________ >> Intel-gfx mailing list >> Intel-gfx@lists.freedesktop.org >> http://lists.freedesktop.org/mailman/listinfo/intel-gfx >
On Fri, Feb 13, 2015 at 10:58 AM, Nick Hoath <nicholas.hoath@intel.com> wrote: > On 13/02/2015 09:32, Daniel Vetter wrote: >> >> On Thu, Feb 12, 2015 at 12:29:21PM +0000, Nick Hoath wrote: >>> >>> Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652 >>> >>> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> >> >> >> Commit message is missing the absolutely crucial detail about which patch >> introduced this regression: >> >> commit 6d3d8274bc45de4babb62d64562d92af984dd238 >> Author: Nick Hoath <nicholas.hoath@intel.com> >> AuthorDate: Thu Jan 15 13:10:39 2015 +0000 >> >> drm/i915: Subsume intel_ctx_submit_request in to drm_i915_gem_request >> >> Another thing I've noticed is that we explicitly drop the context >> reference for the request before dropping the request reference. Without >> clearing the req->ctx pointer. That has a very high chance to leading to >> tears, imo the context unreferenceing should be pushed into >> i915_gem_request_free. >> >> Except that it's there already, which means we have a double unref now? > > > Looking at the code, it looks like that's the case. > >> >> Also this patch is for the legacy ringbuffer code, but the referenced bug >> is for gen8+ execlists. We're definitely not running this code here I >> think. > > > i915_gem_reset_ring_cleanup is used in execlists in the hang recovery case. Oops I've missed that, somehow I've thought intel_lrc.c has it's own reset recovery code. But I've just mixed up function names a bit. >> Imo step one is to drop all the explicit ctx refcounting for req->ctx and >> always rely on the implicit reference. Then see what happens. > > > I agree that the refcounting needs re-evaluating after the merge of execlist > queue entries & requests, however I think the cleanup of the double > unref/removing the refcounting should be done in another patchset. This > patch is purely to fix the issue raised in 88652. Depends on the relative > priorities. Well this patch here won't work because there's now a double unref. And I spotted that one because intel_execlists_retire_requests seems to have the exact same double unref already. Hence why I think we should fix up that first and then reasses what's left. The bug here (before your patch) is just a use-after-free, if there's some other reference to the request. And it will also be fixed with the redone req->ctx refcounting. -Daniel
Tested-By: PRC QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 5767
-------------------------------------Summary-------------------------------------
Platform Delta drm-intel-nightly Series Applied
PNV -1 282/282 281/282
ILK 313/313 313/313
SNB 309/323 309/323
IVB 380/380 380/380
BYT 296/296 296/296
HSW -2 425/425 423/425
BDW -1 318/318 317/318
-------------------------------------Detailed-------------------------------------
Platform Test drm-intel-nightly Series Applied
*PNV igt_gen3_render_linear_blits PASS(5) CRASH(1)PASS(1)
HSW igt_kms_flip_plain-flip-fb-recreate TIMEOUT(2)PASS(1) TIMEOUT(1)PASS(1)
HSW igt_kms_flip_plain-flip-fb-recreate-interruptible TIMEOUT(2)PASS(2) TIMEOUT(1)PASS(1)
*BDW igt_gem_gtt_hog PASS(8) DMESG_WARN(1)PASS(1)
Note: You need to pay more attention to line start with '*'
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c index 1765989..dc10d86 100644 --- a/drivers/gpu/drm/i915/i915_gem.c +++ b/drivers/gpu/drm/i915/i915_gem.c @@ -2661,7 +2661,7 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv, intel_lr_context_unpin(ring, submit_req->ctx); i915_gem_context_unreference(submit_req->ctx); - kfree(submit_req); + i915_gem_request_unreference(submit_req); } /*
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=88652 Signed-off-by: Nick Hoath <nicholas.hoath@intel.com> --- drivers/gpu/drm/i915/i915_drv.c | 2 +- drivers/gpu/drm/i915/i915_gem.c | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-)