drm/i915/selftest_hangcheck: Fix potential UAF after HW fence revoke

Message ID 20240529113809.145084-2-janusz.krzysztofik@linux.intel.com (mailing list archive)
State New, archived
Series drm/i915/selftest_hangcheck: Fix potential UAF after HW fence revoke

Commit Message

Janusz Krzysztofik May 29, 2024, 11:37 a.m. UTC
CI is sporadically reporting the following issue triggered by
igt@i915_selftest@live@hangcheck test case:

<6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
...
<6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
<6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
<3> [414.070354] Unable to pin Y-tiled fence; err:-4
<3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
...
<4>[  609.603992] ------------[ cut here ]------------
<2>[  609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
<4>[  609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
<4>[  609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G     U  W          6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
<4>[  609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
<4>[  609.604010] Workqueue: i915 __i915_gem_free_work [i915]
<4>[  609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
...
<4>[  609.604271] Call Trace:
<4>[  609.604273]  <TASK>
...
<4>[  609.604716]  __i915_vma_evict+0x2e9/0x550 [i915]
<4>[  609.604852]  __i915_vma_unbind+0x7c/0x160 [i915]
<4>[  609.604977]  force_unbind+0x24/0xa0 [i915]
<4>[  609.605098]  i915_vma_destroy+0x2f/0xa0 [i915]
<4>[  609.605210]  __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
<4>[  609.605330]  __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
<4>[  609.605440]  process_scheduled_works+0x351/0x690

Since no other tests or users report this issue, I believe it is specific
to this test case.  After the reset it triggers, the test should wait for
actual completion of the request that it forced to use a hardware fence
before it releases allocated resources.  Fix it.

Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 ++
 1 file changed, 2 insertions(+)
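
The ordering described in the commit message can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not driver code: `active_model`, `fence_model`, `active_model_wait()`, `fence_model_revoke()` and `teardown_model()` are hypothetical names invented for illustration, and "waiting" is simulated in a single thread by retiring the pending request. A return value of -1 marks the spot where the real driver would hit GEM_BUG_ON(!i915_active_is_idle(&fence->active)).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the activity tracker behind fence->active. */
struct active_model {
	int pending;	/* requests still using the fence */
};

/* Hypothetical stand-in for a fence register attached to a VMA. */
struct fence_model {
	struct active_model active;
	bool revoked;
};

static bool active_model_is_idle(const struct active_model *a)
{
	return a->pending == 0;
}

/*
 * Models waiting for outstanding requests. In this single-threaded
 * sketch, "waiting" simply retires every pending request.
 */
static void active_model_wait(struct active_model *a)
{
	while (a->pending > 0)
		a->pending--;
}

/*
 * Models the revoke path: returns -1 where the real driver would
 * trip the idle assertion, 0 when the fence was idle and got revoked.
 */
static int fence_model_revoke(struct fence_model *f)
{
	if (!active_model_is_idle(&f->active))
		return -1;

	f->revoked = true;
	return 0;
}

/* Runs the modelled teardown with or without the added wait. */
static int teardown_model(bool wait_before_release)
{
	struct fence_model f = {
		.active = { .pending = 1 },	/* one request still in flight */
		.revoked = false,
	};

	if (wait_before_release)
		active_model_wait(&f.active);

	/* Releasing the object eventually leads to the fence revoke. */
	return fence_model_revoke(&f);
}
```

In this model, teardown_model(false) corresponds to the selftest before the patch (revoke reached while the tracker is still busy), and teardown_model(true) corresponds to the patched flow, where the wait lets the tracker go idle first.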

Comments

Cavitt, Jonathan May 29, 2024, 2:13 p.m. UTC | #1
-----Original Message-----
From: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com> 
Sent: Wednesday, May 29, 2024 4:37 AM
To: intel-gfx@lists.freedesktop.org
Cc: dri-devel@lists.freedesktop.org; Jani Nikula <jani.nikula@linux.intel.com>; Joonas Lahtinen <joonas.lahtinen@linux.intel.com>; Vivi, Rodrigo <rodrigo.vivi@intel.com>; Tvrtko Ursulin <tursulin@ursulin.net>; Andi Shyti <andi.shyti@linux.intel.com>; Cavitt, Jonathan <jonathan.cavitt@intel.com>; Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
Subject: [PATCH] drm/i915/selftest_hangcheck: Fix potential UAF after HW fence revoke
> 
> CI is sporadically reporting the following issue triggered by
> igt@i915_selftest@live@hangcheck test case:
> 
> <6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
> ...
> <6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
> <6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
> <3> [414.070354] Unable to pin Y-tiled fence; err:-4
> <3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
> ...
> <4>[  609.603992] ------------[ cut here ]------------
> <2>[  609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
> <4>[  609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4>[  609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G     U  W          6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
> <4>[  609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
> <4>[  609.604010] Workqueue: i915 __i915_gem_free_work [i915]
> <4>[  609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
> ...
> <4>[  609.604271] Call Trace:
> <4>[  609.604273]  <TASK>
> ...
> <4>[  609.604716]  __i915_vma_evict+0x2e9/0x550 [i915]
> <4>[  609.604852]  __i915_vma_unbind+0x7c/0x160 [i915]
> <4>[  609.604977]  force_unbind+0x24/0xa0 [i915]
> <4>[  609.605098]  i915_vma_destroy+0x2f/0xa0 [i915]
> <4>[  609.605210]  __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
> <4>[  609.605330]  __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
> <4>[  609.605440]  process_scheduled_works+0x351/0x690
> 
> Since no other tests nor users report that issue, I believe it is specific
> to that test case, which should just wait after reset it triggers for
> actual completion of a request that it forced to claim using a hardware
> fence before it releases allocated resources.  Fix it.
> 

+ Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/10021

> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>

Reviewed-by: Jonathan Cavitt <jonathan.cavitt@intel.com>
-Jonathan Cavitt

> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 9ce8ff1c04fe5..b47c99f38a525 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -1568,6 +1568,8 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>  
>  out_rq:
>  	i915_request_put(rq);
> +	if (flags & EXEC_OBJECT_NEEDS_FENCE)
> +		i915_active_wait(&arg.vma->fence->active);
>  out_obj:
>  	i915_gem_object_put(obj);
>  fini:
> -- 
> 2.45.1
> 
>
Janusz Krzysztofik May 29, 2024, 2:49 p.m. UTC | #2
On Wednesday, 29 May 2024 13:37:23 GMT+2 Janusz Krzysztofik wrote:
> CI is sporadically reporting the following issue triggered by
> igt@i915_selftest@live@hangcheck test case:
> 
> <6> [414.049203] i915: Running intel_hangcheck_live_selftests/igt_reset_evict_fence
> ...
> <6> [414.068804] i915 0000:00:02.0: [drm] GT0: GUC: submission enabled
> <6> [414.068812] i915 0000:00:02.0: [drm] GT0: GUC: SLPC enabled
> <3> [414.070354] Unable to pin Y-tiled fence; err:-4
> <3> [414.071282] i915_vma_revoke_fence:301 GEM_BUG_ON(!i915_active_is_idle(&fence->active))
> ...
> <4>[  609.603992] ------------[ cut here ]------------
> <2>[  609.603995] kernel BUG at drivers/gpu/drm/i915/gt/intel_ggtt_fencing.c:301!
> <4>[  609.604003] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI
> <4>[  609.604006] CPU: 0 PID: 268 Comm: kworker/u64:3 Tainted: G     U  W          6.9.0-CI_DRM_14785-g1ba62f8cea9c+ #1
> <4>[  609.604008] Hardware name: Intel Corporation Alder Lake Client Platform/AlderLake-P DDR4 RVP, BIOS RPLPFWI1.R00.4035.A00.2301200723 01/20/2023
> <4>[  609.604010] Workqueue: i915 __i915_gem_free_work [i915]
> <4>[  609.604149] RIP: 0010:i915_vma_revoke_fence+0x187/0x1f0 [i915]
> ...
> <4>[  609.604271] Call Trace:
> <4>[  609.604273]  <TASK>
> ...
> <4>[  609.604716]  __i915_vma_evict+0x2e9/0x550 [i915]
> <4>[  609.604852]  __i915_vma_unbind+0x7c/0x160 [i915]
> <4>[  609.604977]  force_unbind+0x24/0xa0 [i915]
> <4>[  609.605098]  i915_vma_destroy+0x2f/0xa0 [i915]
> <4>[  609.605210]  __i915_gem_object_pages_fini+0x51/0x2f0 [i915]
> <4>[  609.605330]  __i915_gem_free_objects.isra.0+0x6a/0xc0 [i915]
> <4>[  609.605440]  process_scheduled_works+0x351/0x690
> 
> Since no other tests nor users report that issue, 

I was wrong: there were similar CI reports from other tests, not within the
last 3 months, but they look the same.  Please ignore this patch; I need to
try again to identify a common root cause of all those occurrences.

Thanks,
Janusz

> I believe it is specific
> to that test case, which should just wait after reset it triggers for
> actual completion of a request that it forced to claim using a hardware
> fence before it releases allocated resources.  Fix it.
> 
> Signed-off-by: Janusz Krzysztofik <janusz.krzysztofik@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 9ce8ff1c04fe5..b47c99f38a525 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -1568,6 +1568,8 @@ static int __igt_reset_evict_vma(struct intel_gt *gt,
>  
>  out_rq:
>  	i915_request_put(rq);
> +	if (flags & EXEC_OBJECT_NEEDS_FENCE)
> +		i915_active_wait(&arg.vma->fence->active);
>  out_obj:
>  	i915_gem_object_put(obj);
>  fini:
>

Patch

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 9ce8ff1c04fe5..b47c99f38a525 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1568,6 +1568,8 @@  static int __igt_reset_evict_vma(struct intel_gt *gt,
 
 out_rq:
 	i915_request_put(rq);
+	if (flags & EXEC_OBJECT_NEEDS_FENCE)
+		i915_active_wait(&arg.vma->fence->active);
 out_obj:
 	i915_gem_object_put(obj);
 fini: