Message ID | 20240529113809.145084-2-janusz.krzysztofik@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Series | drm/i915/selftest_hangcheck: Fix potential UAF after HW fence revoke |
-----Original Message-----
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Sent: Wednesday, May 29, 2024 4:37 AM
To: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Jani Nikula <jani.nikula@linux.intel.com>; Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Tvrtko Ursulin <tursulin@ursulin.net>; Andi Shyti <andi.shyti@linux.intel.com>; Cavitt, Jonathan <jonathan.cavitt@intel.com>; Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Subject: [PATCH] drm/i915/selftest_hangcheck: Fix potential UAF after HW fence revoke
>
> CI is sporadically reporting the following issue triggered by
> igt@i915_selftest@live@hangcheck test case:
>
> <6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
> ...
> <6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
> <6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
> <3> [414.070354] Unable to pin Y-tiled fence; err:-4
> <3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
> ...
> <4>[ 609.603992] ------------[ cut here ]------------
> <2>[ 609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
> <4>[ 609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4>[ 609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G U W 6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
> <4>[ 609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
> <4>[ 609.604010] Workqueue: i915 __i915_gem_free_work [i915]
> <4>[ 609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
> ...
> <4>[ 609.604271] Call Trace:
> <4>[ 609.604273] <TASK>
> ...
> <4>[ 609.604716] __i915_vma_evict+0x2e9/0x550 [i915]
> <4>[ 609.604852] __i915_vma_unbind+0x7c/0x160 [i915]
> <4>[ 609.604977] force_unbind+0x24/0xa0 [i915]
> <4>[ 609.605098] i915_vma_destroy+0x2f/0xa0 [i915]
> <4>[ 609.605210] __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
> <4>[ 609.605330] __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
> <4>[ 609.605440] process_scheduled_works+0x351/0x690
>
> Since no other tests nor users report that issue, I believe it is specific
> to that test case, which should just wait after reset it triggers for
> actual completion of a request that it forced to claim using a hardware
> fence before it releases allocated resources. Fix it.
>
+ Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10021
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt

> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 9ce8ff1c04fe5..b47c99f38a525 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -1568,6 +1568,8 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>
>  out_rq:
>  	i915_request_put(rq);
> +	if (flags & EXEC_OBJECT_NEEDS_FENCE)
> +		i915_active_wait(&arg.vma->fence->active);
>  out_obj:
>  	i915_gem_object_put(obj);
>  fini:
> --
> 2.45.1

On Wednesday, 29 May 2024 13:37:23 GMT+2 Janusz Krzysztofik wrote:
> CI is sporadically reporting the following issue triggered by
> igt@i915_selftest@live@hangcheck test case:
>
> <6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
> ...
> <6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
> <6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
> <3> [414.070354] Unable to pin Y-tiled fence; err:-4
> <3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
> ...
> <4>[ 609.603992] ------------[ cut here ]------------
> <2>[ 609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
> <4>[ 609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4>[ 609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G U W 6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
> <4>[ 609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
> <4>[ 609.604010] Workqueue: i915 __i915_gem_free_work [i915]
> <4>[ 609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
> ...
> <4>[ 609.604271] Call Trace:
> <4>[ 609.604273] <TASK>
> ...
> <4>[ 609.604716] __i915_vma_evict+0x2e9/0x550 [i915]
> <4>[ 609.604852] __i915_vma_unbind+0x7c/0x160 [i915]
> <4>[ 609.604977] force_unbind+0x24/0xa0 [i915]
> <4>[ 609.605098] i915_vma_destroy+0x2f/0xa0 [i915]
> <4>[ 609.605210] __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
> <4>[ 609.605330] __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
> <4>[ 609.605440] process_scheduled_works+0x351/0x690
>
> Since no other tests nor users report that issue,

I was wrong: there were similar CI reports from other tests, not within the
last 3 months but still looking the same. Please ignore this patch; I need to
try again to identify a common root cause of all those occurrences.

Thanks,
Janusz

> I believe it is specific
> to that test case, which should just wait after reset it triggers for
> actual completion of a request that it forced to claim using a hardware
> fence before it releases allocated resources. Fix it.
>
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 9ce8ff1c04fe5..b47c99f38a525 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -1568,6 +1568,8 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>
>  out_rq:
>  	i915_request_put(rq);
> +	if (flags & EXEC_OBJECT_NEEDS_FENCE)
> +		i915_active_wait(&arg.vma->fence->active);
>  out_obj:
>  	i915_gem_object_put(obj);
>  fini:

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 9ce8ff1c04fe5..b47c99f38a525 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1568,6 +1568,8 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,

 out_rq:
 	i915_request_put(rq);
+	if (flags & EXEC_OBJECT_NEEDS_FENCE)
+		i915_active_wait(&arg.vma->fence->active);
 out_obj:
 	i915_gem_object_put(obj);
 fini:

CI is sporadically reporting the following issue triggered by
igt@i915_selftest@live@hangcheck test case:

<6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
...
<6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
<6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
<3> [414.070354] Unable to pin Y-tiled fence; err:-4
<3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
...
<4>[ 609.603992] ------------[ cut here ]------------
<2>[ 609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
<4>[ 609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4>[ 609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G U W 6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
<4>[ 609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
<4>[ 609.604010] Workqueue: i915 __i915_gem_free_work [i915]
<4>[ 609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
...
<4>[ 609.604271] Call Trace:
<4>[ 609.604273] <TASK>
...
<4>[ 609.604716] __i915_vma_evict+0x2e9/0x550 [i915]
<4>[ 609.604852] __i915_vma_unbind+0x7c/0x160 [i915]
<4>[ 609.604977] force_unbind+0x24/0xa0 [i915]
<4>[ 609.605098] i915_vma_destroy+0x2f/0xa0 [i915]
<4>[ 609.605210] __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
<4>[ 609.605330] __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
<4>[ 609.605440] process_scheduled_works+0x351/0x690

Since no other tests nor users report that issue, I believe it is specific
to that test case, which should just wait after reset it triggers for
actual completion of a request that it forced to claim using a hardware
fence before it releases allocated resources. Fix it.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 ++
 1 file changed, 2 insertions(+)