diff mbox series

[v6] drm/i915: Fix NULL pointer dereference in capture_engine

Message ID xmsgfynkhycw3cf56akp4he2ffg44vuratocsysaowbsnhutzi@augnqbm777at (mailing list archive)
State New
Headers show
Series [v6] drm/i915: Fix NULL pointer dereference in capture_engine | expand

Commit Message

Eugene Kobyak Dec. 3, 2024, 2:54 p.m. UTC
When the intel_context structure contains NULL,
it raises a NULL pointer dereference error in drm_info().

Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
Cc: <stable@vger.kernel.org> # v6.3+
Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>
---
v2:
  - return drm_info to separate condition
v3:
  - create separate condition which generate string if intel_context exist
v4:
  - rollback and add check intel_context in log condition
v5:
  - create separate string with guc_id if intel_context exist
v6:
  - print changed log if intel_context exist

 drivers/gpu/drm/i915/i915_gpu_error.c | 18 +++++++++++++++---
 1 file changed, 15 insertions(+), 3 deletions(-)

Comments

Michal Wajdeczko Dec. 3, 2024, 5:06 p.m. UTC | #1
On 03.12.2024 15:54, Eugene Kobyak wrote:
> When the intel_context structure contains NULL,
> it raises a NULL pointer dereference error in drm_info().
> 
> Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: <stable@vger.kernel.org> # v6.3+
> Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>
> ---
> v2:
>   - return drm_info to separate condition
> v3:
>   - create separate condition which generate string if intel_context exist
> v4:
>   - rollback and add check intel_context in log condition
> v5:
>   - create separate string with guc_id if intel_context exist
> v6:
>   - print changed log if intel_context exist
> 
>  drivers/gpu/drm/i915/i915_gpu_error.c | 18 +++++++++++++++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 135ded17334e..d88cefb889c3 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -1643,9 +1643,21 @@ capture_engine(struct intel_engine_cs *engine,
>  		return NULL;
>  
>  	intel_engine_get_hung_entity(engine, &ce, &rq);
> -	if (rq && !i915_request_started(rq))
> -		drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
> -			 engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
> +	if (rq && !i915_request_started(rq)) {
> +		/*
> +		* We want to know also what is the gcu_id of the context,

typo: guc_id

> +		* but if we don't have the context reference, then skip
> +		* printing it.
> +		*/

but IMO this comment is redundant as it's quite obvious that without
context pointer you can't print guc_id member

> +		if (ce)
> +			drm_info(&engine->gt->i915->drm,
> +				"Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
> +				engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
> +		else
> +			drm_info(&engine->gt->i915->drm,
> +				"Got hung context on %s with active request %lld:%lld not yet started\n",
> +				engine->name, rq->fence.context, rq->fence.seqno);

since you are touching drm_info() where we use engine->gt then maybe
it's good time to switch to gt_info() to get better per-GT message?

> +	}
>  
>  	if (rq) {
>  		capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);
Andi Shyti Dec. 4, 2024, 10:32 a.m. UTC | #2
Hi Michal,

> > +	if (rq && !i915_request_started(rq)) {
> > +		/*
> > +		* We want to know also what is the gcu_id of the context,
> 
> typo: guc_id
> 
> > +		* but if we don't have the context reference, then skip
> > +		* printing it.
> > +		*/
> 
> but IMO this comment is redundant as it's quite obvious that without
> context pointer you can't print guc_id member

I recommended to add a comment because there is some code
redundancy that, I think, needs some explanation; someone might
not spot immediately the need for ce, unless goes a the end of
the drm_info parameter's list.

> > +		if (ce)
> > +			drm_info(&engine->gt->i915->drm,
> > +				"Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
> > +				engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
> > +		else
> > +			drm_info(&engine->gt->i915->drm,
> > +				"Got hung context on %s with active request %lld:%lld not yet started\n",
> > +				engine->name, rq->fence.context, rq->fence.seqno);
> 
> since you are touching drm_info() where we use engine->gt then maybe
> it's good time to switch to gt_info() to get better per-GT message?

I think the original reason for using drm_info is because we are
outside the GT. But, because the engine belongs to the GT, it
makes also sense to use gt_info(), I don't oppose.

It would make more sense to move this function completely into
gt/.

Thanks,
Andi
Jani Nikula Dec. 4, 2024, 10:51 a.m. UTC | #3
On Wed, 04 Dec 2024, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> Hi Michal,
>
>> > +	if (rq && !i915_request_started(rq)) {
>> > +		/*
>> > +		* We want to know also what is the gcu_id of the context,
>> 
>> typo: guc_id
>> 
>> > +		* but if we don't have the context reference, then skip
>> > +		* printing it.
>> > +		*/
>> 
>> but IMO this comment is redundant as it's quite obvious that without
>> context pointer you can't print guc_id member
>
> I recommended to add a comment because there is some code
> redundancy that, I think, needs some explanation; someone might
> not spot immediately the need for ce, unless goes a the end of
> the drm_info parameter's list.
>
>> > +		if (ce)
>> > +			drm_info(&engine->gt->i915->drm,
>> > +				"Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
>> > +				engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
>> > +		else
>> > +			drm_info(&engine->gt->i915->drm,
>> > +				"Got hung context on %s with active request %lld:%lld not yet started\n",
>> > +				engine->name, rq->fence.context, rq->fence.seqno);
>> 
>> since you are touching drm_info() where we use engine->gt then maybe
>> it's good time to switch to gt_info() to get better per-GT message?
>
> I think the original reason for using drm_info is because we are
> outside the GT. But, because the engine belongs to the GT, it
> makes also sense to use gt_info(), I don't oppose.
>
> It would make more sense to move this function completely into
> gt/.

Can we converge on the patch instead of diverge, please?

It's a Cc: stable null pointer dereference fix.

It's already been iterated for two weeks to reach v6.

Fix the comment typo while applying, but there's nothing inherently
wrong here AFAICT. Merge it and move on.


BR,
Jani.
Andi Shyti Dec. 4, 2024, 11:04 a.m. UTC | #4
Hi Jani,

On Wed, Dec 04, 2024 at 12:51:56PM +0200, Jani Nikula wrote:
> On Wed, 04 Dec 2024, Andi Shyti <andi.shyti@linux.intel.com> wrote:
> > Hi Michal,
> >
> >> > +	if (rq && !i915_request_started(rq)) {
> >> > +		/*
> >> > +		* We want to know also what is the gcu_id of the context,
> >> 
> >> typo: guc_id
> >> 
> >> > +		* but if we don't have the context reference, then skip
> >> > +		* printing it.
> >> > +		*/
> >> 
> >> but IMO this comment is redundant as it's quite obvious that without
> >> context pointer you can't print guc_id member
> >
> > I recommended to add a comment because there is some code
> > redundancy that, I think, needs some explanation; someone might
> > not spot immediately the need for ce, unless goes a the end of
> > the drm_info parameter's list.
> >
> >> > +		if (ce)
> >> > +			drm_info(&engine->gt->i915->drm,
> >> > +				"Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
> >> > +				engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
> >> > +		else
> >> > +			drm_info(&engine->gt->i915->drm,
> >> > +				"Got hung context on %s with active request %lld:%lld not yet started\n",
> >> > +				engine->name, rq->fence.context, rq->fence.seqno);
> >> 
> >> since you are touching drm_info() where we use engine->gt then maybe
> >> it's good time to switch to gt_info() to get better per-GT message?
> >
> > I think the original reason for using drm_info is because we are
> > outside the GT. But, because the engine belongs to the GT, it
> > makes also sense to use gt_info(), I don't oppose.
> >
> > It would make more sense to move this function completely into
> > gt/.
> 
> Can we converge on the patch instead of diverge, please?
> 
> It's a Cc: stable null pointer dereference fix.
> 
> It's already been iterated for two weeks to reach v6.
> 
> Fix the comment typo while applying, but there's nothing inherently
> wrong here AFAICT. Merge it and move on.

Thanks for the feedback, will go ahead and merge.

All other gt/ adjustments can be done later.

Andi
Andi Shyti Dec. 4, 2024, 2:43 p.m. UTC | #5
Hi Eugene,

On Tue, Dec 03, 2024 at 02:54:06PM +0000, Eugene Kobyak wrote:
> When the intel_context structure contains NULL,
> it raises a NULL pointer dereference error in drm_info().
> 
> Fixes: e8a3319c31a1 ("drm/i915: Allow error capture without a request")
> Closes: https://gitlab.freedesktop.org/drm/i915/kernel/-/issues/12309
> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com>
> Cc: John Harrison <John.C.Harrison@Intel.com>
> Cc: <stable@vger.kernel.org> # v6.3+
> Signed-off-by: Eugene Kobyak <eugene.kobyak@intel.com>

merged to drm-intel-next.

Thank you,
Andi
diff mbox series

Patch

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 135ded17334e..d88cefb889c3 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1643,9 +1643,21 @@  capture_engine(struct intel_engine_cs *engine,
 		return NULL;
 
 	intel_engine_get_hung_entity(engine, &ce, &rq);
-	if (rq && !i915_request_started(rq))
-		drm_info(&engine->gt->i915->drm, "Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
-			 engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
+	if (rq && !i915_request_started(rq)) {
+		/*
+		* We want to know also what is the gcu_id of the context,
+		* but if we don't have the context reference, then skip
+		* printing it.
+		*/
+		if (ce)
+			drm_info(&engine->gt->i915->drm,
+				"Got hung context on %s with active request %lld:%lld [0x%04X] not yet started\n",
+				engine->name, rq->fence.context, rq->fence.seqno, ce->guc_id.id);
+		else
+			drm_info(&engine->gt->i915->drm,
+				"Got hung context on %s with active request %lld:%lld not yet started\n",
+				engine->name, rq->fence.context, rq->fence.seqno);
+	}
 
 	if (rq) {
 		capture = intel_engine_coredump_add_request(ee, rq, ATOMIC_MAYFAIL);