
[v2] drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path

Message ID 20201002065518.1186013-1-boris.brezillon@collabora.com
State New, archived

Commit Message

Boris Brezillon Oct. 2, 2020, 6:55 a.m. UTC
If we don't initialize the entity to idle and the entity is never
scheduled before being destroyed, we end up with an infinite wait in
the destroy path.

v2:
- Add Steven's R-b

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
Reviewed-by: Steven Price <steven.price@arm.com>
---
This is something I noticed while debugging another issue on panfrost
that left the scheduler in a weird state where new entities were no
longer scheduled. That caused all userspace threads trying to close
their DRM fd to block in kernel space, waiting for this "entity is
idle" event. I don't know whether this fix is legitimate (now that the
other bug is fixed we no longer seem to end up in that state), but I
thought I'd share it anyway.
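
For reference, here's a heavily trimmed sketch of the two sides of the
entity_idle handshake (simplified for illustration, not a verbatim
excerpt; the real code lives in sched_main.c and sched_entity.c):

	/* Scheduler thread: entity_idle is only completed once the
	 * entity has been selected at least once.
	 */
	static int drm_sched_main(void *param)
	{
		...
		entity = drm_sched_select_entity(sched);
		sched_job = drm_sched_entity_pop_job(entity);
		/* Never reached for an entity that is destroyed
		 * before the scheduler ever picks it.
		 */
		complete(&entity->entity_idle);
		...
	}

	/* Destroy path: blocks forever if entity_idle was never
	 * completed.
	 */
	void drm_sched_entity_fini(struct drm_sched_entity *entity)
	{
		...
		wait_for_completion(&entity->entity_idle);
		...
	}

Since completions are counting, signaling entity_idle once at init time
is safe: the first wait_for_completion() in the destroy path returns
immediately for a never-scheduled entity, and each later job pop still
re-signals the completion as before.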
---
 drivers/gpu/drm/scheduler/sched_entity.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Christian König Oct. 2, 2020, 8:31 a.m. UTC | #1
Am 02.10.20 um 08:55 schrieb Boris Brezillon:
> If we don't initialize the entity to idle and the entity is never
> scheduled before being destroyed, we end up with an infinite wait in
> the destroy path.

Good catch, offhand I would say that this is valid.

>
> v2:
> - Add Steven's R-b
>
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> Reviewed-by: Steven Price <steven.price@arm.com>

Reviewed-by: Christian König <christian.koenig@amd.com>

Should I pick it up for drm-misc-next? (Or maybe even -fixes?).

Thanks,
Christian.

> ---
> This is something I noticed while debugging another issue on panfrost
> that left the scheduler in a weird state where new entities were no
> longer scheduled. That caused all userspace threads trying to close
> their DRM fd to block in kernel space, waiting for this "entity is
> idle" event. I don't know whether this fix is legitimate (now that the
> other bug is fixed we no longer seem to end up in that state), but I
> thought I'd share it anyway.
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c | 3 +++
>   1 file changed, 3 insertions(+)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 146380118962..f8ec277a6aa8 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -73,6 +73,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   
>   	init_completion(&entity->entity_idle);
>   
> +	/* We start in an idle state. */
> +	complete(&entity->entity_idle);
> +
>   	spin_lock_init(&entity->rq_lock);
>   	spsc_queue_init(&entity->job_queue);
>
Boris Brezillon Oct. 2, 2020, 12:28 p.m. UTC | #2
On Fri, 2 Oct 2020 10:31:31 +0200
Christian König <christian.koenig@amd.com> wrote:

> Am 02.10.20 um 08:55 schrieb Boris Brezillon:
> > If we don't initialize the entity to idle and the entity is never
> > scheduled before being destroyed, we end up with an infinite wait in
> > the destroy path.
> 
> Good catch, offhand I would say that this is valid.
> 
> >
> > v2:
> > - Add Steven's R-b
> >
> > Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
> > Reviewed-by: Steven Price <steven.price@arm.com>  
> 
> Reviewed-by: Christian König <christian.koenig@amd.com>
> 
> Should I pick it up for drm-misc-next? (Or maybe even -fixes?).

Sure.

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 146380118962..f8ec277a6aa8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,9 @@  int drm_sched_entity_init(struct drm_sched_entity *entity,
 
 	init_completion(&entity->entity_idle);
 
+	/* We start in an idle state. */
+	complete(&entity->entity_idle);
+
 	spin_lock_init(&entity->rq_lock);
 	spsc_queue_init(&entity->job_queue);