drm/sched: Fix UB pointer dereference

Message ID 20240827074521.12828-2-pstanner@redhat.com (mailing list archive)
State New, archived
Series drm/sched: Fix UB pointer dereference

Commit Message

Philipp Stanner Aug. 27, 2024, 7:45 a.m. UTC
In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert the
GPU scheduler to variable number of run-queues") implemented a call to
drm_err(), which uses the job's scheduler pointer as a parameter.
job->sched, however, is not yet valid as it gets set by
drm_sched_job_arm(), which is always called after drm_sched_job_init().

Since the scheduler code has no control over how the API-User has
allocated or set 'job', the pointer's dereference is undefined behavior.

Fix the UB by replacing drm_err() with pr_err().

Cc: <stable@vger.kernel.org>	# 6.7+
Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
Reported-by: Danilo Krummrich <dakr@redhat.com>
Closes: https://lore.kernel.org/lkml/20231108022716.15250-1-dakr@redhat.com/
Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
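
The ordering problem described in the commit message can be modelled in
plain userspace C. This is only a sketch under stated assumptions:
struct sched, struct entity, struct job, job_init() and job_arm() are
hypothetical stand-ins for the scheduler's types and for
drm_sched_job_init() / drm_sched_job_arm(); they are not the kernel
definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel structures (assumption). */
struct sched  { const char *name; };
struct entity { struct sched *sched; int has_rq; };
struct job    { struct sched *sched; struct entity *entity; };

/*
 * Models drm_sched_job_init(): the caller allocated 'job' however it
 * liked, so job->sched may hold anything here and must not be read.
 */
static int job_init(struct job *job, struct entity *entity)
{
	if (!entity->has_rq) {
		/* Log without touching job->sched, as pr_err() does in the fix. */
		fprintf(stderr, "%s: entity has no rq!\n", __func__);
		return -ENOENT;
	}

	job->entity = entity;
	return 0;
}

/* Models drm_sched_job_arm(): the first point where job->sched is set. */
static void job_arm(struct job *job)
{
	assert(job->entity != NULL);
	job->sched = job->entity->sched;
}
```

In this model, any logging helper that needs job->sched can only be used
safely after job_arm() has run, which is why the error path in job_init()
falls back to a device-independent print.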

Comments

Christian König Aug. 27, 2024, 7:48 a.m. UTC | #1
On 27.08.24 at 09:45, Philipp Stanner wrote:
> In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert the
> GPU scheduler to variable number of run-queues") implemented a call to
> drm_err(), which uses the job's scheduler pointer as a parameter.
> job->sched, however, is not yet valid as it gets set by
> drm_sched_job_arm(), which is always called after drm_sched_job_init().
>
> Since the scheduler code has no control over how the API-User has
> allocated or set 'job', the pointer's dereference is undefined behavior.
>
> Fix the UB by replacing drm_err() with pr_err().
>
> Cc: <stable@vger.kernel.org>	# 6.7+
> Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
> Reported-by: Danilo Krummrich <dakr@redhat.com>
> Closes: https://lore.kernel.org/lkml/20231108022716.15250-1-dakr@redhat.com/
> Signed-off-by: Philipp Stanner <pstanner@redhat.com>

Good catch, Reviewed-by: Christian König <christian.koenig@amd.com>

> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 7e90c9f95611..356c30fa24a8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -797,7 +797,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		 * or worse--a blank screen--leave a trail in the
>   		 * logs, so this can be debugged easier.
>   		 */
> -		drm_err(job->sched, "%s: entity has no rq!\n", __func__);
> +		pr_err("*ERROR* %s: entity has no rq!\n", __func__);
>   		return -ENOENT;
>   	}
>
Danilo Krummrich Aug. 27, 2024, 8:39 a.m. UTC | #2
On 8/27/24 9:45 AM, Philipp Stanner wrote:
> In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert the
> GPU scheduler to variable number of run-queues") implemented a call to
> drm_err(), which uses the job's scheduler pointer as a parameter.
> job->sched, however, is not yet valid as it gets set by
> drm_sched_job_arm(), which is always called after drm_sched_job_init().
> 
> Since the scheduler code has no control over how the API-User has
> allocated or set 'job', the pointer's dereference is undefined behavior.
> 
> Fix the UB by replacing drm_err() with pr_err().
> 
> Cc: <stable@vger.kernel.org>	# 6.7+
> Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
> Reported-by: Danilo Krummrich <dakr@redhat.com>
> Closes: https://lore.kernel.org/lkml/20231108022716.15250-1-dakr@redhat.com/
> Signed-off-by: Philipp Stanner <pstanner@redhat.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 7e90c9f95611..356c30fa24a8 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -797,7 +797,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>   		 * or worse--a blank screen--leave a trail in the
>   		 * logs, so this can be debugged easier.
>   		 */
> -		drm_err(job->sched, "%s: entity has no rq!\n", __func__);
> +		pr_err("*ERROR* %s: entity has no rq!\n", __func__);

I don't think the "*ERROR*" string is necessary, it's pr_err() already.

Otherwise,

Acked-by: Danilo Krummrich <dakr@kernel.org>

>   		return -ENOENT;
>   	}
>
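
The remark that pr_err() already marks the message as an error can be
illustrated with a simplified userspace model. Everything here is an
assumption made for illustration: model_pr_err() and MODEL_KERN_ERR are
hypothetical names, and the "<3>" marker is only a stand-in for the
kernel's error log level, not the real printk encoding.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Simplified stand-in for the kernel's error log level (assumption). */
#define MODEL_KERN_ERR "<3>"

/*
 * Hypothetical model of pr_err(): the severity travels as a level
 * prefix, so the format string itself needs no "*ERROR*" text.
 * Returns 0 on success, -1 if the buffer is too small for the prefix
 * or formatting fails. Truncation of the message is not reported.
 */
static int model_pr_err(char *buf, size_t len, const char *fmt, ...)
{
	size_t off = strlen(MODEL_KERN_ERR);
	va_list ap;
	int n;

	assert(buf != NULL);
	if (off >= len)
		return -1;
	memcpy(buf, MODEL_KERN_ERR, off);

	va_start(ap, fmt);
	n = vsnprintf(buf + off, len - off, fmt, ap);
	va_end(ap);

	return n < 0 ? -1 : 0;
}
```

Calling model_pr_err(buf, sizeof(buf), "%s: entity has no rq!\n",
"drm_sched_job_init") leaves the level marker in front of the message,
which is the information the literal "*ERROR*" string would duplicate.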
Christian König Aug. 27, 2024, 9 a.m. UTC | #3
On 27.08.24 at 10:39, Danilo Krummrich wrote:
> On 8/27/24 9:45 AM, Philipp Stanner wrote:
>> In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert the
>> GPU scheduler to variable number of run-queues") implemented a call to
>> drm_err(), which uses the job's scheduler pointer as a parameter.
>> job->sched, however, is not yet valid as it gets set by
>> drm_sched_job_arm(), which is always called after drm_sched_job_init().
>>
>> Since the scheduler code has no control over how the API-User has
>> allocated or set 'job', the pointer's dereference is undefined behavior.
>>
>> Fix the UB by replacing drm_err() with pr_err().
>>
>> Cc: <stable@vger.kernel.org>    # 6.7+
>> Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
>> Reported-by: Danilo Krummrich <dakr@redhat.com>
>> Closes: https://lore.kernel.org/lkml/20231108022716.15250-1-dakr@redhat.com/
>> Signed-off-by: Philipp Stanner <pstanner@redhat.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 2 +-
>>   1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 7e90c9f95611..356c30fa24a8 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -797,7 +797,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
>>            * or worse--a blank screen--leave a trail in the
>>            * logs, so this can be debugged easier.
>>            */
>> -        drm_err(job->sched, "%s: entity has no rq!\n", __func__);
>> +        pr_err("*ERROR* %s: entity has no rq!\n", __func__);
>
> I don't think the "*ERROR*" string is necessary, it's pr_err() already.

Good point. I will remove that and also add a comment why drm_err won't 
work here before pushing it to drm-misc-fixes.

Thanks,
Christian.

>
> Otherwise,
>
> Acked-by: Danilo Krummrich <dakr@kernel.org>
>
>>           return -ENOENT;
>>       }
Philipp Stanner Aug. 27, 2024, 9:59 a.m. UTC | #4
On Tue, 2024-08-27 at 11:00 +0200, Christian König wrote:
> On 27.08.24 at 10:39, Danilo Krummrich wrote:
> > On 8/27/24 9:45 AM, Philipp Stanner wrote:
> > > In drm_sched_job_init(), commit 56e449603f0a ("drm/sched: Convert the
> > > GPU scheduler to variable number of run-queues") implemented a call to
> > > drm_err(), which uses the job's scheduler pointer as a parameter.
> > > job->sched, however, is not yet valid as it gets set by
> > > drm_sched_job_arm(), which is always called after drm_sched_job_init().
> > > 
> > > Since the scheduler code has no control over how the API-User has
> > > allocated or set 'job', the pointer's dereference is undefined behavior.
> > > 
> > > Fix the UB by replacing drm_err() with pr_err().
> > > 
> > > Cc: <stable@vger.kernel.org>    # 6.7+
> > > Fixes: 56e449603f0a ("drm/sched: Convert the GPU scheduler to variable number of run-queues")
> > > Reported-by: Danilo Krummrich <dakr@redhat.com>
> > > Closes: https://lore.kernel.org/lkml/20231108022716.15250-1-dakr@redhat.com/
> > > Signed-off-by: Philipp Stanner <pstanner@redhat.com>
> > > ---
> > >   drivers/gpu/drm/scheduler/sched_main.c | 2 +-
> > >   1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > > index 7e90c9f95611..356c30fa24a8 100644
> > > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > > @@ -797,7 +797,7 @@ int drm_sched_job_init(struct drm_sched_job *job,
> > >            * or worse--a blank screen--leave a trail in the
> > >            * logs, so this can be debugged easier.
> > >            */
> > > -        drm_err(job->sched, "%s: entity has no rq!\n", __func__);
> > > +        pr_err("*ERROR* %s: entity has no rq!\n", __func__);
> > 
> > I don't think the "*ERROR*" string is necessary, it's pr_err() already.
> 
> Good point. I will remove that and also add a comment why drm_err won't
> work here before pushing it to drm-misc-fixes.

Well, while we're at it, I want to point out that the exact same pattern
appears just a few lines below, which is where I shamelessly copied it from:

if (unlikely(!credits)) {
	pr_err("*ERROR* %s: credits cannot be 0!\n", __func__);


P.

> 
> Thanks,
> Christian.
> 
> > 
> > Otherwise,
> > 
> > Acked-by: Danilo Krummrich <dakr@kernel.org>
> > 
> > >           return -ENOENT;
> > >       }
>

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 7e90c9f95611..356c30fa24a8 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -797,7 +797,7 @@  int drm_sched_job_init(struct drm_sched_job *job,
 		 * or worse--a blank screen--leave a trail in the
 		 * logs, so this can be debugged easier.
 		 */
-		drm_err(job->sched, "%s: entity has no rq!\n", __func__);
+		pr_err("*ERROR* %s: entity has no rq!\n", __func__);
 		return -ENOENT;
 	}