drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path

Message ID 20201001141253.1066836-1-boris.brezillon@collabora.com (mailing list archive)
State New, archived
Series drm/sched: Avoid infinite waits in the drm_sched_entity_destroy() path

Commit Message

Boris Brezillon Oct. 1, 2020, 2:12 p.m. UTC
If we don't initialize the entity to the idle state and the entity is
never scheduled before being destroyed, we end up with an infinite wait
in the destroy path.

Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
---
This is something I noticed while debugging another issue on panfrost
that caused the scheduler to end up in a weird state where new entities
were no longer scheduled. This was causing all userspace threads trying
to close their DRM fd to block in kernel space, waiting for this "entity
is idle" event. I don't know if this fix is legitimate (now that the
other bug is fixed we no longer seem to end up in that state), but I
thought I'd share it anyway.
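
For context, here is a rough sketch of the interaction that hangs. This
is paraphrased from the scheduler code with the call sites heavily
simplified, so don't read it as a verbatim quote:

	/* drm_sched_rq_select_entity(): the completion is only re-armed
	 * when the scheduler thread actually picks the entity...
	 */
	reinit_completion(&entity->entity_idle);

	/* drm_sched_main(): ...and only signalled once a job has been
	 * popped from it.
	 */
	complete(&entity->entity_idle);

	/* drm_sched_entity_destroy() path: with the completion starting
	 * out as "not done", an entity that is destroyed before ever
	 * being selected makes this wait block forever.
	 */
	wait_for_completion(&entity->entity_idle);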
---
 drivers/gpu/drm/scheduler/sched_entity.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Steven Price Oct. 1, 2020, 3:07 p.m. UTC | #1
On 01/10/2020 15:12, Boris Brezillon wrote:
> If we don't initialize the entity to the idle state and the entity is
> never scheduled before being destroyed, we end up with an infinite wait
> in the destroy path.
> 
> Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>

This seems reasonable to me - it looks like, in theory, if you open, 
submit a job, and close quickly enough, you could trigger this (i.e. if 
drm_sched_main() never actually enters the while loop).
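
Something like this, I mean (a hypothetical userspace sequence just to 
illustrate the window; submit_job() stands in for whatever ioctl queues 
work on the entity):

	int fd = open("/dev/dri/renderD128", O_RDWR); /* entity created, idle never signalled */
	submit_job(fd); /* job queued, but the scheduler thread hasn't run yet */
	close(fd);      /* destroy path waits on entity_idle forever */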

You should CC some other folk as this doesn't just affect Panfrost.

Reviewed-by: Steven Price <steven.price@arm.com>

> ---
> This is something I noticed while debugging another issue on panfrost
> that caused the scheduler to end up in a weird state where new entities
> were no longer scheduled. This was causing all userspace threads trying
> to close their DRM fd to block in kernel space, waiting for this "entity
> is idle" event. I don't know if this fix is legitimate (now that the
> other bug is fixed we no longer seem to end up in that state), but I
> thought I'd share it anyway.
> ---
>   drivers/gpu/drm/scheduler/sched_entity.c | 3 +++
>   1 file changed, 3 insertions(+)
> 
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
> index 146380118962..f8ec277a6aa8 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -73,6 +73,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
>   
>   	init_completion(&entity->entity_idle);
>   
> +	/* We start in an idle state. */
> +	complete(&entity->entity_idle);
> +
>   	spin_lock_init(&entity->rq_lock);
>   	spsc_queue_init(&entity->job_queue);
>   
>

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_entity.c b/drivers/gpu/drm/scheduler/sched_entity.c
index 146380118962..f8ec277a6aa8 100644
--- a/drivers/gpu/drm/scheduler/sched_entity.c
+++ b/drivers/gpu/drm/scheduler/sched_entity.c
@@ -73,6 +73,9 @@ int drm_sched_entity_init(struct drm_sched_entity *entity,
 
 	init_completion(&entity->entity_idle);
 
+	/* We start in an idle state. */
+	complete(&entity->entity_idle);
+
 	spin_lock_init(&entity->rq_lock);
 	spsc_queue_init(&entity->job_queue);
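
Note on why the one-liner is safe: completions count, so the complete()
here just increments the internal "done" counter, wait_for_completion()
in the destroy path consumes one count (returning immediately if it is
already non-zero), and the scheduler re-arms the completion with
reinit_completion() whenever it picks the entity. A minimal illustration
of that semantic (generic completion API usage, not scheduler code):

	struct completion c;

	init_completion(&c);     /* c.done == 0 */
	complete(&c);            /* c.done == 1 */
	wait_for_completion(&c); /* returns immediately, consuming the count */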