diff mbox series

[v2,2/2] drm/sched: warn about drm_sched_job_init()'s partial init

Message ID 20240828094133.17402-4-pstanner@redhat.com (mailing list archive)
State New, archived
Headers show
Series [v2,1/2] drm/sched: memset() 'job' in drm_sched_job_init() | expand

Commit Message

Philipp Stanner Aug. 28, 2024, 9:41 a.m. UTC
drm_sched_job_init()'s name suggests that after the function succeeded,
parameter "job" will be fully initialized. This is not the case; some
members are only later set, notably "job->sched" by drm_sched_job_arm().

Document that drm_sched_job_init() does not set all struct members.

Document that job->sched in particular is uninitialized before
drm_sched_job_arm().

Signed-off-by: Philipp Stanner <pstanner@redhat.com>
---
Changes in v2:
  - Change grammar in the new comments a bit.
---
 drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
 include/drm/gpu_scheduler.h            | 7 +++++++
 2 files changed, 11 insertions(+)

Comments

Tvrtko Ursulin Sept. 13, 2024, 12:21 p.m. UTC | #1
On 28/08/2024 10:41, Philipp Stanner wrote:
> drm_sched_job_init()'s name suggests that after the function succeeded,
> parameter "job" will be fully initialized. This is not the case; some
> members are only later set, notably "job->sched" by drm_sched_job_arm().
> 
> Document that drm_sched_job_init() does not set all struct members.
> 
> Document that job->sched in particular is uninitialized before
> drm_sched_job_arm().
> 
> Signed-off-by: Philipp Stanner <pstanner@redhat.com>
> ---
> Changes in v2:
>    - Change grammar in the new comments a bit.
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 4 ++++
>   include/drm/gpu_scheduler.h            | 7 +++++++
>   2 files changed, 11 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index b0c8ad10b419..721373938c1e 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -781,6 +781,10 @@ EXPORT_SYMBOL(drm_sched_resubmit_jobs);
>    * Drivers must make sure drm_sched_job_cleanup() if this function returns
>    * successfully, even when @job is aborted before drm_sched_job_arm() is called.
>    *
> + * Note that this function does not assign a valid value to each struct member
> + * of struct drm_sched_job. Take a look at that struct's documentation to see
> + * who sets which struct member with what lifetime.

First sentence is fine, but the second I don't see the those details in 
struct drm_sched_job. (And I am not saying that they must be listed. IMO 
at some point it is better to have a high level overview than describe 
the lifetime rules with individual members.)

> + *
>    * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
>    * has died, which can mean that there's no valid runqueue for a @entity.
>    * This function returns -ENOENT in this case (which probably should be -EIO as
> diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
> index 5acc64954a88..04a268cd22f1 100644
> --- a/include/drm/gpu_scheduler.h
> +++ b/include/drm/gpu_scheduler.h
> @@ -337,6 +337,13 @@ struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
>   struct drm_sched_job {
>   	struct spsc_node		queue_node;
>   	struct list_head		list;
> +
> +	/*
> +	 * The scheduler this job is or will be scheduled on.
> +	 *
> +	 * Gets set by drm_sched_arm(). Valid until the scheduler's backend_ops
> +	 * callback "free_job()" has been called.

This is interesting - I was not sure where lifetime for job->sched is 
defined and couldn't find it browsing around. Where did you find the 
clues to tie it to the free_job() callback?

Regards,

Tvrtko

> +	 */
>   	struct drm_gpu_scheduler	*sched;
>   	struct drm_sched_fence		*s_fence;
>
diff mbox series

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index b0c8ad10b419..721373938c1e 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -781,6 +781,10 @@  EXPORT_SYMBOL(drm_sched_resubmit_jobs);
  * Drivers must make sure drm_sched_job_cleanup() if this function returns
  * successfully, even when @job is aborted before drm_sched_job_arm() is called.
  *
+ * Note that this function does not assign a valid value to each struct member
+ * of struct drm_sched_job. Take a look at that struct's documentation to see
+ * who sets which struct member with what lifetime.
+ *
  * WARNING: amdgpu abuses &drm_sched.ready to signal when the hardware
  * has died, which can mean that there's no valid runqueue for a @entity.
  * This function returns -ENOENT in this case (which probably should be -EIO as
diff --git a/include/drm/gpu_scheduler.h b/include/drm/gpu_scheduler.h
index 5acc64954a88..04a268cd22f1 100644
--- a/include/drm/gpu_scheduler.h
+++ b/include/drm/gpu_scheduler.h
@@ -337,6 +337,13 @@  struct drm_sched_fence *to_drm_sched_fence(struct dma_fence *f);
 struct drm_sched_job {
 	struct spsc_node		queue_node;
 	struct list_head		list;
+
+	/*
+	 * The scheduler this job is or will be scheduled on.
+	 *
+	 * Gets set by drm_sched_arm(). Valid until the scheduler's backend_ops
+	 * callback "free_job()" has been called.
+	 */
 	struct drm_gpu_scheduler	*sched;
 	struct drm_sched_fence		*s_fence;