
blocking ops in drm_sched_cleanup_jobs()

Message ID 6ed945cb-619a-b26d-21b0-9bdaeaa69582@arm.com (mailing list archive)
State New, archived
Series blocking ops in drm_sched_cleanup_jobs()

Commit Message

Steven Price Sept. 13, 2019, 2:50 p.m. UTC
Hi,

I hit the below splat randomly with panfrost. From what I can tell this
is a more general issue which would affect other drivers.

----8<-----
[58604.913130] ------------[ cut here ]------------
[58604.918590] WARNING: CPU: 1 PID: 1758 at kernel/sched/core.c:6556 __might_sleep+0x74/0x98
[58604.927965] do not call blocking ops when !TASK_RUNNING; state=1 set at [<0c590494>] prepare_to_wait_event+0x104/0x164
[58604.940047] Modules linked in: panfrost gpu_sched
[58604.945370] CPU: 1 PID: 1758 Comm: pan_js Not tainted 5.3.0-rc1+ #13
[58604.952500] Hardware name: Rockchip (Device Tree)
[58604.957815] [<c0111150>] (unwind_backtrace) from [<c010c99c>] (show_stack+0x10/0x14)
[58604.966521] [<c010c99c>] (show_stack) from [<c07adbb4>] (dump_stack+0x9c/0xd4)
[58604.974639] [<c07adbb4>] (dump_stack) from [<c0121da8>] (__warn+0xe8/0x104)
[58604.982462] [<c0121da8>] (__warn) from [<c0121e08>] (warn_slowpath_fmt+0x44/0x6c)
[58604.990867] [<c0121e08>] (warn_slowpath_fmt) from [<c014eccc>] (__might_sleep+0x74/0x98)
[58604.999973] [<c014eccc>] (__might_sleep) from [<c07c73d8>] (__mutex_lock+0x38/0x948)
[58605.008690] [<c07c73d8>] (__mutex_lock) from [<c07c7d00>] (mutex_lock_nested+0x18/0x20)
[58605.017841] [<c07c7d00>] (mutex_lock_nested) from [<bf00b54c>] (panfrost_gem_free_object+0x60/0x10c [panfrost])
[58605.029430] [<bf00b54c>] (panfrost_gem_free_object [panfrost]) from [<bf00cecc>] (panfrost_job_put+0x138/0x150 [panfrost])
[58605.042076] [<bf00cecc>] (panfrost_job_put [panfrost]) from [<bf00121c>] (drm_sched_cleanup_jobs+0xc8/0xe0 [gpu_sched])
[58605.054417] [<bf00121c>] (drm_sched_cleanup_jobs [gpu_sched]) from [<bf001300>] (drm_sched_main+0xcc/0x26c [gpu_sched])
[58605.066620] [<bf001300>] (drm_sched_main [gpu_sched]) from [<c0146cfc>] (kthread+0x13c/0x154)
[58605.076226] [<c0146cfc>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[58605.084346] Exception stack(0xe959bfb0 to 0xe959bff8)
[58605.090046] bfa0:                                     00000000 00000000 00000000 00000000
[58605.099250] bfc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[58605.108480] bfe0: 00000000 00000000 00000000 00000000 00000013 00000000
[58605.116210] irq event stamp: 179
[58605.119955] hardirqs last  enabled at (187): [<c017f7e4>] console_unlock+0x564/0x5c4
[58605.128935] hardirqs last disabled at (202): [<c017f308>] console_unlock+0x88/0x5c4
[58605.137788] softirqs last  enabled at (216): [<c0102334>] __do_softirq+0x18c/0x548
[58605.146543] softirqs last disabled at (227): [<c0129528>] irq_exit+0xc4/0x10c
[58605.154618] ---[ end trace f65bdbd9ea9adfc0 ]---
----8<-----

The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as
part of the condition of wait_event_interruptible:

> 		wait_event_interruptible(sched->wake_up_worker,
> 					 (drm_sched_cleanup_jobs(sched),
> 					 (!drm_sched_blocked(sched) &&
> 					  (entity = drm_sched_select_entity(sched))) ||
> 					 kthread_should_stop()));

When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after
prepare_to_wait_event() has been called), then any might_sleep() will
moan loudly about it. This doesn't seem to happen often (I've only
triggered it once) because usually drm_sched_cleanup_jobs() either
doesn't sleep or does the sleeping during the first call that
wait_event_interruptible() makes (which is before the task state is set).
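
For context, wait_event_interruptible() expands to roughly the following
(a simplified sketch of the macros in include/linux/wait.h, not the literal
expansion; wq_head/wq_entry/condition stand in for the macro arguments):

	if (condition)			/* first check: task still TASK_RUNNING */
		return 0;

	init_wait_entry(&wq_entry, 0);
	for (;;) {
		long intr = prepare_to_wait_event(&wq_head, &wq_entry,
						  TASK_INTERRUPTIBLE);
		if (condition)		/* re-checked with !TASK_RUNNING, so any
					 * sleep in here triggers the warning */
			break;
		if (intr)		/* signal pending */
			break;
		schedule();
	}
	finish_wait(&wq_head, &wq_entry);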

I don't really understand why drm_sched_cleanup_jobs() needs to be
called here, a simple change like below 'fixes' it. But I presume
there's some reason for the call being part of the
wait_event_interruptible condition. Can anyone shed light on this?

The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction")

Steve

----8<-----

Comments

Christian König Sept. 16, 2019, 8:11 a.m. UTC | #1
Hi Steven,

the problem seems to be that panfrost is trying to sleep while freeing a
job, e.g. it tries to take a mutex.

That is not allowed any more since we need to free the jobs from atomic 
and even interrupt context.

Your suggestion wouldn't work because this way jobs are not freed when 
there isn't a new one to be scheduled.

Regards,
Christian.

Am 13.09.19 um 16:50 schrieb Steven Price:
> Hi,
>
> I hit the below splat randomly with panfrost. From what I can tell this
> is a more general issue which would affect other drivers.
>
> [snip splat]
>
> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as
> part of the condition of wait_event_interruptible:
>
>> 		wait_event_interruptible(sched->wake_up_worker,
>> 					 (drm_sched_cleanup_jobs(sched),
>> 					 (!drm_sched_blocked(sched) &&
>> 					  (entity = drm_sched_select_entity(sched))) ||
>> 					 kthread_should_stop()));
> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after
> prepare_to_wait_event() has been called), then any might_sleep() will
> moan loudly about it. This doesn't seem to happen often (I've only
> triggered it once) because usually drm_sched_cleanup_jobs() either
> doesn't sleep or does the sleeping during the first call that
> wait_event_interruptible() makes (which is before the task state is set).
>
> I don't really understand why drm_sched_cleanup_jobs() needs to be
> called here, a simple change like below 'fixes' it. But I presume
> there's some reason for the call being part of the
> wait_event_interruptible condition. Can anyone shed light on this?
>
> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction")
>
> Steve
>
> ----8<-----
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 9a0ee74d82dc..528f295e3a31 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param)
>   		struct drm_sched_job *sched_job;
>   		struct dma_fence *fence;
>   
> +		drm_sched_cleanup_jobs(sched);
> +
>   		wait_event_interruptible(sched->wake_up_worker,
> -					 (drm_sched_cleanup_jobs(sched),
>   					 (!drm_sched_blocked(sched) &&
>   					  (entity = drm_sched_select_entity(sched))) ||
> -					 kthread_should_stop()));
> +					 kthread_should_stop());
>   
>   		if (!entity)
>   			continue;
Daniel Vetter Sept. 16, 2019, 2:24 p.m. UTC | #2
On Mon, Sep 16, 2019 at 10:11 AM Koenig, Christian
<Christian.Koenig@amd.com> wrote:
>
> Hi Steven,
>
> the problem seems to be that panfrost is trying to sleep while freeing a
> job, e.g. it tries to take a mutex.
>
> That is not allowed any more since we need to free the jobs from atomic
> and even interrupt context.
>
> Your suggestion wouldn't work because this way jobs are not freed when
> there isn't a new one to be scheduled.

One fix would be to make sure that any calls to
drm_sched_cleanup_jobs() are atomic, by putting preempt_disable/enable
or local_irq_disable/enable in there, at least when lockdep or sleep
debugging is enabled. That should help catch these reliably, instead
of just once in a blue moon.
-Daniel
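
An untested sketch of that suggestion, placed at the top and bottom of
drm_sched_cleanup_jobs() (only meant to illustrate the idea):

	/* sketch: mark the cleanup section atomic under sleep debugging so
	 * that any blocking call underneath warns on every invocation */
	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
		preempt_disable();

	/* existing job cleanup / free_job() calls */

	if (IS_ENABLED(CONFIG_DEBUG_ATOMIC_SLEEP))
		preempt_enable();

With that in place the panfrost mutex_lock() would be caught by
might_sleep() on every cleanup, not only when it races with the
prepared wait.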

>
> Regards,
> Christian.
>
> Am 13.09.19 um 16:50 schrieb Steven Price:
> > Hi,
> >
> > I hit the below splat randomly with panfrost. From what I can tell this
> > is a more general issue which would affect other drivers.
> >
> > [snip splat]
> >
> > The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as
> > part of the condition of wait_event_interruptible:
> >
> >>              wait_event_interruptible(sched->wake_up_worker,
> >>                                       (drm_sched_cleanup_jobs(sched),
> >>                                       (!drm_sched_blocked(sched) &&
> >>                                        (entity = drm_sched_select_entity(sched))) ||
> >>                                       kthread_should_stop()));
> > When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after
> > prepare_to_wait_event() has been called), then any might_sleep() will
> > moan loudly about it. This doesn't seem to happen often (I've only
> > triggered it once) because usually drm_sched_cleanup_jobs() either
> > doesn't sleep or does the sleeping during the first call that
> > wait_event_interruptible() makes (which is before the task state is set).
> >
> > I don't really understand why drm_sched_cleanup_jobs() needs to be
> > called here, a simple change like below 'fixes' it. But I presume
> > there's some reason for the call being part of the
> > wait_event_interruptible condition. Can anyone shed light on this?
> >
> > The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction")
> >
> > Steve
> >
> > ----8<-----
> > diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> > index 9a0ee74d82dc..528f295e3a31 100644
> > --- a/drivers/gpu/drm/scheduler/sched_main.c
> > +++ b/drivers/gpu/drm/scheduler/sched_main.c
> > @@ -699,11 +699,12 @@ static int drm_sched_main(void *param)
> >               struct drm_sched_job *sched_job;
> >               struct dma_fence *fence;
> >
> > +             drm_sched_cleanup_jobs(sched);
> > +
> >               wait_event_interruptible(sched->wake_up_worker,
> > -                                      (drm_sched_cleanup_jobs(sched),
> >                                        (!drm_sched_blocked(sched) &&
> >                                         (entity = drm_sched_select_entity(sched))) ||
> > -                                      kthread_should_stop()));
> > +                                      kthread_should_stop());
> >
> >               if (!entity)
> >                       continue;
>
Christian König Sept. 16, 2019, 2:44 p.m. UTC | #3
Am 16.09.19 um 16:24 schrieb Daniel Vetter:
> On Mon, Sep 16, 2019 at 10:11 AM Koenig, Christian
> <Christian.Koenig@amd.com> wrote:
>> Hi Steven,
>>
>> the problem seems to be that panfrost is trying to sleep while freeing a
>> job, e.g. it tries to take a mutex.
>>
>> That is not allowed any more since we need to free the jobs from atomic
>> and even interrupt context.
>>
>> Your suggestion wouldn't work because this way jobs are not freed when
>> there isn't a new one to be scheduled.
> One fix would be to make sure that any calls to
> drm_sched_cleanup_jobs() are atomic, by putting preempt_disable/enable
> or local_irq_disable/enable in there, at least when lockdep or sleep
> debugging is enabled. That should help catch these reliably, instead
> of just once in a blue moon.

Yeah, thought about that as well. But I would also prefer a solution 
where drm_sched_cleanup_jobs() is never called in an atomic context.

The problem is that I don't see how that can be possible without 
delaying freeing of jobs.

Regards,
Christian.

> -Daniel
>
>> Regards,
>> Christian.
>>
>> Am 13.09.19 um 16:50 schrieb Steven Price:
>>> Hi,
>>>
>>> I hit the below splat randomly with panfrost. From what I can tell this
>>> is a more general issue which would affect other drivers.
>>>
>>> [snip splat]
>>>
>>> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as
>>> part of the condition of wait_event_interruptible:
>>>
>>>>               wait_event_interruptible(sched->wake_up_worker,
>>>>                                        (drm_sched_cleanup_jobs(sched),
>>>>                                        (!drm_sched_blocked(sched) &&
>>>>                                         (entity = drm_sched_select_entity(sched))) ||
>>>>                                        kthread_should_stop()));
>>> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after
>>> prepare_to_wait_event() has been called), then any might_sleep() will
>>> moan loudly about it. This doesn't seem to happen often (I've only
>>> triggered it once) because usually drm_sched_cleanup_jobs() either
>>> doesn't sleep or does the sleeping during the first call that
>>> wait_event_interruptible() makes (which is before the task state is set).
>>>
>>> I don't really understand why drm_sched_cleanup_jobs() needs to be
>>> called here, a simple change like below 'fixes' it. But I presume
>>> there's some reason for the call being part of the
>>> wait_event_interruptible condition. Can anyone shed light on this?
>>>
>>> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction")
>>>
>>> Steve
>>>
>>> ----8<-----
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..528f295e3a31 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param)
>>>                struct drm_sched_job *sched_job;
>>>                struct dma_fence *fence;
>>>
>>> +             drm_sched_cleanup_jobs(sched);
>>> +
>>>                wait_event_interruptible(sched->wake_up_worker,
>>> -                                      (drm_sched_cleanup_jobs(sched),
>>>                                         (!drm_sched_blocked(sched) &&
>>>                                          (entity = drm_sched_select_entity(sched))) ||
>>> -                                      kthread_should_stop()));
>>> +                                      kthread_should_stop());
>>>
>>>                if (!entity)
>>>                        continue;
Christian König Sept. 17, 2019, 8:42 a.m. UTC | #4
Hi Steven,

thought about that issue a bit more and I think I came up with a solution.

What you could do is to split up drm_sched_cleanup_jobs() into two 
functions.

One that checks if jobs to be cleaned up are present and one which does 
the actual cleanup.

This way we could call drm_sched_cleanup_jobs() outside of the 
wait_event_interruptible().

Regards,
Christian.

Am 13.09.19 um 16:50 schrieb Steven Price:
> Hi,
>
> I hit the below splat randomly with panfrost. From what I can tell this
> is a more general issue which would affect other drivers.
>
> [snip splat]
>
> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as
> part of the condition of wait_event_interruptible:
>
>> 		wait_event_interruptible(sched->wake_up_worker,
>> 					 (drm_sched_cleanup_jobs(sched),
>> 					 (!drm_sched_blocked(sched) &&
>> 					  (entity = drm_sched_select_entity(sched))) ||
>> 					 kthread_should_stop()));
> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after
> prepare_to_wait_event() has been called), then any might_sleep() will
> moan loudly about it. This doesn't seem to happen often (I've only
> triggered it once) because usually drm_sched_cleanup_jobs() either
> doesn't sleep or does the sleeping during the first call that
> wait_event_interruptible() makes (which is before the task state is set).
>
> I don't really understand why drm_sched_cleanup_jobs() needs to be
> called here, a simple change like below 'fixes' it. But I presume
> there's some reason for the call being part of the
> wait_event_interruptible condition. Can anyone shed light on this?
>
> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction")
>
> Steve
>
> ----8<-----
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 9a0ee74d82dc..528f295e3a31 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param)
>   		struct drm_sched_job *sched_job;
>   		struct dma_fence *fence;
>   
> +		drm_sched_cleanup_jobs(sched);
> +
>   		wait_event_interruptible(sched->wake_up_worker,
> -					 (drm_sched_cleanup_jobs(sched),
>   					 (!drm_sched_blocked(sched) &&
>   					  (entity = drm_sched_select_entity(sched))) ||
> -					 kthread_should_stop()));
> +					 kthread_should_stop());
>   
>   		if (!entity)
>   			continue;
Steven Price Sept. 17, 2019, 12:46 p.m. UTC | #5
On 17/09/2019 09:42, Koenig, Christian wrote:
> Hi Steven,
> 
> thought about that issue a bit more and I think I came up with a solution.
> 
> What you could do is to split up drm_sched_cleanup_jobs() into two
> functions.
> 
> One that checks if jobs to be cleaned up are present and one which does
> the actual cleanup.
> 
> This way we could call drm_sched_cleanup_jobs() outside of the
> wait_event_interruptible().

Yes that seems like a good solution - there doesn't seem to be a good 
reason why the actual job cleanup needs to be done within the 
wait_event_interruptible() condition. I did briefly attempt that before, 
but I couldn't work out exactly what the condition is which should cause 
the wake (my initial attempt caused continuous wake-ups).

I'm not back in the office until Monday, but I'll have another go then 
at splitting the checking and the actual freeing of the jobs.

Thanks,

Steve

> 
> Regards,
> Christian.
> 
> Am 13.09.19 um 16:50 schrieb Steven Price:
>> Hi,
>>
>> I hit the below splat randomly with panfrost. From what I can tell this
>> is a more general issue which would affect other drivers.
>>
>> [snip splat]
>>
>> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as
>> part of the condition of wait_event_interruptible:
>>
>>> 		wait_event_interruptible(sched->wake_up_worker,
>>> 					 (drm_sched_cleanup_jobs(sched),
>>> 					 (!drm_sched_blocked(sched) &&
>>> 					  (entity = drm_sched_select_entity(sched))) ||
>>> 					 kthread_should_stop()));
>> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after
>> prepare_to_wait_event() has been called), then any might_sleep() will
>> moan loudly about it. This doesn't seem to happen often (I've only
>> triggered it once) because usually drm_sched_cleanup_jobs() either
>> doesn't sleep or does the sleeping during the first call that
>> wait_event_interruptible() makes (which is before the task state is set).
>>
>> I don't really understand why drm_sched_cleanup_jobs() needs to be
>> called here, a simple change like below 'fixes' it. But I presume
>> there's some reason for the call being part of the
>> wait_event_interruptible condition. Can anyone shed light on this?
>>
>> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: rework job destruction")
>>
>> Steve
>>
>> ----8<-----
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 9a0ee74d82dc..528f295e3a31 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param)
>>    		struct drm_sched_job *sched_job;
>>    		struct dma_fence *fence;
>>    
>> +		drm_sched_cleanup_jobs(sched);
>> +
>>    		wait_event_interruptible(sched->wake_up_worker,
>> -					 (drm_sched_cleanup_jobs(sched),
>>    					 (!drm_sched_blocked(sched) &&
>>    					  (entity = drm_sched_select_entity(sched))) ||
>> -					 kthread_should_stop()));
>> +					 kthread_should_stop());
>>    
>>    		if (!entity)
>>    			continue;
> 
Christian König Sept. 24, 2019, 9:55 a.m. UTC | #6
Sorry for the delayed response, have been busy on other stuff last week.

Am 17.09.19 um 14:46 schrieb Steven Price:
> On 17/09/2019 09:42, Koenig, Christian wrote:
>> Hi Steven,
>>
>> thought about that issue a bit more and I think I came up with a 
>> solution.
>>
>> What you could do is to split up drm_sched_cleanup_jobs() into two
>> functions.
>>
>> One that checks if jobs to be cleaned up are present and one which does
>> the actual cleanup.
>>
>> This way we could call drm_sched_cleanup_jobs() outside of the
>> wait_event_interruptible().
>
> Yes that seems like a good solution - there doesn't seem to be a good 
> reason why the actual job cleanup needs to be done within the 
> wait_event_interruptible() condition. I did briefly attempt that 
> before, but I couldn't work out exactly what the condition is which 
> should cause the wake (my initial attempt caused continuous wake-ups).

Basically you need something like the following:

1. Test if the timeout worker is running:

if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
     !cancel_delayed_work(&sched->work_tdr))
         return false;

2. Test if there is any job ready to be cleaned up.

job = list_first_entry_or_null(&sched->ring_mirror_list,
                               struct drm_sched_job, node);
if (!job || !dma_fence_is_signaled(&job->s_fence->finished))
     return false;

That should basically do it.
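
Combining those two tests, an untested sketch of such a check-only
helper (hypothetical name) could look like the following; it would be
used as the wait_event_interruptible() condition, with the actual
sched->ops->free_job() call done outside of it:

	/* untested sketch: only checks, never frees, so it cannot sleep */
	static bool drm_sched_cleanup_needed(struct drm_gpu_scheduler *sched)
	{
		struct drm_sched_job *job;

		/* don't destroy jobs while the timeout worker is running */
		if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
		    !cancel_delayed_work(&sched->work_tdr))
			return false;

		job = list_first_entry_or_null(&sched->ring_mirror_list,
					       struct drm_sched_job, node);

		return job && dma_fence_is_signaled(&job->s_fence->finished);
	}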

Regards,
Christian.

>
> I'm not back in the office until Monday, but I'll have another go then 
> at splitting the checking and the actual freeing of the jobs.
>
> Thanks,
>
> Steve
>
>>
>> Regards,
>> Christian.
>>
>> Am 13.09.19 um 16:50 schrieb Steven Price:
>>> Hi,
>>>
>>> I hit the below splat randomly with panfrost. From what I can tell this
>>> is a more general issue which would affect other drivers.
>>>
>>> [snip splat]
>>>
>>> The problem is that drm_sched_main() calls drm_sched_cleanup_jobs() as
>>> part of the condition of wait_event_interruptible:
>>>
>>>> wait_event_interruptible(sched->wake_up_worker,
>>>>                      (drm_sched_cleanup_jobs(sched),
>>>>                      (!drm_sched_blocked(sched) &&
>>>>                       (entity = drm_sched_select_entity(sched))) ||
>>>>                      kthread_should_stop()));
>>> When drm_sched_cleanup_jobs() is called *after* a wait (i.e. after
>>> prepare_to_wait_event() has been called), then any might_sleep() will
>>> moan loudly about it. This doesn't seem to happen often (I've only
>>> triggered it once) because usually drm_sched_cleanup_jobs() either
>>> doesn't sleep or does the sleeping during the first call that
>>> wait_event_interruptible() makes (which is before the task state is 
>>> set).
>>>
>>> I don't really understand why drm_sched_cleanup_jobs() needs to be
>>> called here, a simple change like below 'fixes' it. But I presume
>>> there's some reason for the call being part of the
>>> wait_event_interruptible condition. Can anyone shed light on this?
>>>
>>> The code was introduced in commit 5918045c4ed4 ("drm/scheduler: 
>>> rework job destruction")
>>>
>>> Steve
>>>
>>> ----8<-----
>>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c 
>>> b/drivers/gpu/drm/scheduler/sched_main.c
>>> index 9a0ee74d82dc..528f295e3a31 100644
>>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>>> @@ -699,11 +699,12 @@ static int drm_sched_main(void *param)
>>>            struct drm_sched_job *sched_job;
>>>            struct dma_fence *fence;
>>>    +        drm_sched_cleanup_jobs(sched);
>>> +
>>>            wait_event_interruptible(sched->wake_up_worker,
>>> -                     (drm_sched_cleanup_jobs(sched),
>>>                         (!drm_sched_blocked(sched) &&
>>>                          (entity = drm_sched_select_entity(sched))) ||
>>> -                     kthread_should_stop()));
>>> +                     kthread_should_stop());
>>>               if (!entity)
>>>                continue;
>>
Steven Price Sept. 25, 2019, 2:06 p.m. UTC | #7
On 24/09/2019 10:55, Koenig, Christian wrote:
> Sorry for the delayed response, have been busy on other stuff last week.
> 
> Am 17.09.19 um 14:46 schrieb Steven Price:
>> On 17/09/2019 09:42, Koenig, Christian wrote:
>>> Hi Steven,
>>>
>>> thought about that issue a bit more and I think I came up with a 
>>> solution.
>>>
>>> What you could do is to split up drm_sched_cleanup_jobs() into two
>>> functions.
>>>
>>> One that checks if jobs to be cleaned up are present and one which does
>>> the actual cleanup.
>>>
>>> This way we could call drm_sched_cleanup_jobs() outside of the
>>> wait_event_interruptible().
>>
>> Yes that seems like a good solution - there doesn't seem to be a good 
>> reason why the actual job cleanup needs to be done within the 
>> wait_event_interruptible() condition. I did briefly attempt that 
>> before, but I couldn't work out exactly what the condition is which 
>> should cause the wake (my initial attempt caused continuous wake-ups).
> 
> Basically you need something like the following:
> 
> 1. Test if the timeout worker is running:
> 
> if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>      !cancel_delayed_work(&sched->work_tdr))
>          return false;
> 
> 2. Test if there is any job ready to be cleaned up.
> 
> job = list_first_entry_or_null(&sched->ring_mirror_list, struct 
> drm_sched_job, node);
> if (!job || !dma_fence_is_signaled(&job->s_fence->finished))
>      return false;
> 
> That should basically do it.

Thanks for the pointers. I wasn't sure if the "queue timeout for next
job" part was necessary or not if step 2 above returns false.

I've been testing the following patch which simply pulls the
sched->ops->free_job() out of the wait_event_interruptible().

I'll try with just the tests you've described.

----8<-----
From 873c1816394beee72904e64aa2ee0f169e768d76 Mon Sep 17 00:00:00 2001
From: Steven Price <steven.price@arm.com>
Date: Mon, 23 Sep 2019 11:08:50 +0100
Subject: [PATCH] drm: Don't free jobs in wait_event_interruptible()

drm_sched_cleanup_jobs() attempts to free finished jobs, however because
it is called as the condition of wait_event_interruptible() it must not
sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.

Instead let's rename drm_sched_cleanup_jobs() to
drm_sched_get_cleanup_job() and simply return a job for processing if
there is one. The caller can then call the free_job() callback outside
the wait_event_interruptible() where sleeping is possible before
re-checking and returning to sleep if necessary.

Signed-off-by: Steven Price <steven.price@arm.com>
---
 drivers/gpu/drm/scheduler/sched_main.c | 22 +++++++++++++++-------
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0ee74d82dc..bf9b4931ddfd 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -622,20 +622,21 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
 }
 
 /**
- * drm_sched_cleanup_jobs - destroy finished jobs
+ * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
  *
  * @sched: scheduler instance
  *
- * Remove all finished jobs from the mirror list and destroy them.
+ * Returns the next finished job from the mirror list (if there is one)
+ * ready for it to be destroyed.
  */
-static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
+static struct drm_sched_job *drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
 	unsigned long flags;
 
 	/* Don't destroy jobs while the timeout worker is running */
 	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
 	    !cancel_delayed_work(&sched->work_tdr))
-		return;
+		return NULL;
 
 
 	while (!list_empty(&sched->ring_mirror_list)) {
@@ -651,7 +652,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
 		list_del_init(&job->node);
 		spin_unlock_irqrestore(&sched->job_list_lock, flags);
 
-		sched->ops->free_job(job);
+		return job;
 	}
 
 	/* queue timeout for next job */
@@ -659,6 +660,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
 	drm_sched_start_timeout(sched);
 	spin_unlock_irqrestore(&sched->job_list_lock, flags);
 
+	return NULL;
 }
 
 /**
@@ -698,12 +700,18 @@ static int drm_sched_main(void *param)
 		struct drm_sched_fence *s_fence;
 		struct drm_sched_job *sched_job;
 		struct dma_fence *fence;
+		struct drm_sched_job *cleanup_job = NULL;
 
 		wait_event_interruptible(sched->wake_up_worker,
-					 (drm_sched_cleanup_jobs(sched),
+					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
 					 (!drm_sched_blocked(sched) &&
 					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop()));
+					 kthread_should_stop());
+
+		while (cleanup_job) {
+			sched->ops->free_job(cleanup_job);
+			cleanup_job = drm_sched_get_cleanup_job(sched);
+		}
 
 		if (!entity)
 			continue;
Christian König Sept. 25, 2019, 2:12 p.m. UTC | #8
Am 25.09.19 um 16:06 schrieb Steven Price:
> On 24/09/2019 10:55, Koenig, Christian wrote:
>> Sorry for the delayed response, have been busy on other stuff last week.
>>
>> Am 17.09.19 um 14:46 schrieb Steven Price:
>>> On 17/09/2019 09:42, Koenig, Christian wrote:
>>>> Hi Steven,
>>>>
>>>> thought about that issue a bit more and I think I came up with a
>>>> solution.
>>>>
>>>> What you could do is to split up drm_sched_cleanup_jobs() into two
>>>> functions.
>>>>
>>>> One that checks if jobs to be cleaned up are present and one which does
>>>> the actual cleanup.
>>>>
>>>> This way we could call drm_sched_cleanup_jobs() outside of the
>>>> wait_event_interruptible().
>>> Yes that seems like a good solution - there doesn't seem to be a good
>>> reason why the actual job cleanup needs to be done within the
>>> wait_event_interruptible() condition. I did briefly attempt that
>>> before, but I couldn't work out exactly what the condition is which
>>> should cause the wake (my initial attempt caused continuous wake-ups).
>> Basically you need something like the following:
>>
>> 1. Test if the timeout worker is running:
>>
>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>       !cancel_delayed_work(&sched->work_tdr))
>>           return false;
>>
>> 2. Test if there is any job ready to be cleaned up.
>>
>> job = list_first_entry_or_null(&sched->ring_mirror_list, struct
>> drm_sched_job, node);
>> if (!job || !dma_fence_is_signaled(&job->s_fence->finished))
>>       return false;
>>
>> That should basically do it.
> Thanks for the pointers. I wasn't sure if the "queue timeout for next
> job" part was necessary or not if step 2 above returns false.
>
> I've been testing the following patch which simply pulls the
> sched->ops->free_job() out of the wait_event_interruptible().
>
> I'll try with just the tests you've described.
>
> ----8<-----
>  From 873c1816394beee72904e64aa2ee0f169e768d76 Mon Sep 17 00:00:00 2001
> From: Steven Price <steven.price@arm.com>
> Date: Mon, 23 Sep 2019 11:08:50 +0100
> Subject: [PATCH] drm: Don't free jobs in wait_event_interruptible()
>
> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
> it is called as the condition of wait_event_interruptible() it must not
> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>
> Instead let's rename drm_sched_cleanup_jobs() to
> drm_sched_get_cleanup_job() and simply return a job for processing if
> there is one. The caller can then call the free_job() callback outside
> the wait_event_interruptible() where sleeping is possible before
> re-checking and returning to sleep if necessary.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
>   drivers/gpu/drm/scheduler/sched_main.c | 22 +++++++++++++++-------
>   1 file changed, 15 insertions(+), 7 deletions(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
> index 9a0ee74d82dc..bf9b4931ddfd 100644
> --- a/drivers/gpu/drm/scheduler/sched_main.c
> +++ b/drivers/gpu/drm/scheduler/sched_main.c
> @@ -622,20 +622,21 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>   }
>   
>   /**
> - * drm_sched_cleanup_jobs - destroy finished jobs
> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>    *
>    * @sched: scheduler instance
>    *
> - * Remove all finished jobs from the mirror list and destroy them.
> + * Returns the next finished job from the mirror list (if there is one)
> + * ready for it to be destroyed.
>    */
> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> +static struct drm_sched_job *drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>   {
>   	unsigned long flags;
>   
>   	/* Don't destroy jobs while the timeout worker is running */
>   	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>   	    !cancel_delayed_work(&sched->work_tdr))
> -		return;
> +		return NULL;
>   
>   
>   	while (!list_empty(&sched->ring_mirror_list)) {

Yeah, you should probably clean that up a bit here, but apart from that 
this should work as well.

Regards,
Christian.

> @@ -651,7 +652,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>   		list_del_init(&job->node);
>   		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   
> -		sched->ops->free_job(job);
> +		return job;
>   	}
>   
>   	/* queue timeout for next job */
> @@ -659,6 +660,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>   	drm_sched_start_timeout(sched);
>   	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>   
> +	return NULL;
>   }
>   
>   /**
> @@ -698,12 +700,18 @@ static int drm_sched_main(void *param)
>   		struct drm_sched_fence *s_fence;
>   		struct drm_sched_job *sched_job;
>   		struct dma_fence *fence;
> +		struct drm_sched_job *cleanup_job = NULL;
>   
>   		wait_event_interruptible(sched->wake_up_worker,
> -					 (drm_sched_cleanup_jobs(sched),
> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>   					 (!drm_sched_blocked(sched) &&
>   					  (entity = drm_sched_select_entity(sched))) ||
> -					 kthread_should_stop()));
> +					 kthread_should_stop());
> +
> +		while (cleanup_job) {
> +			sched->ops->free_job(cleanup_job);
> +			cleanup_job = drm_sched_get_cleanup_job(sched);
> +		}
>   
>   		if (!entity)
>   			continue;
Steven Price Sept. 25, 2019, 2:40 p.m. UTC | #9
On 25/09/2019 15:12, Koenig, Christian wrote:
> Am 25.09.19 um 16:06 schrieb Steven Price:
>> On 24/09/2019 10:55, Koenig, Christian wrote:
>>> Sorry for the delayed response, have been busy on other stuff last week.
>>>
>>> Am 17.09.19 um 14:46 schrieb Steven Price:
>>>> On 17/09/2019 09:42, Koenig, Christian wrote:
>>>>> Hi Steven,
>>>>>
>>>>> thought about that issue a bit more and I think I came up with a
>>>>> solution.
>>>>>
>>>>> What you could do is to split up drm_sched_cleanup_jobs() into two
>>>>> functions.
>>>>>
>>>>> One that checks if jobs to be cleaned up are present and one which does
>>>>> the actual cleanup.
>>>>>
>>>>> This way we could call drm_sched_cleanup_jobs() outside of the
>>>>> wait_event_interruptible().
>>>> Yes that seems like a good solution - there doesn't seem to be a good
>>>> reason why the actual job cleanup needs to be done within the
>>>> wait_event_interruptible() condition. I did briefly attempt that
>>>> before, but I couldn't work out exactly what the condition is which
>>>> should cause the wake (my initial attempt caused continuous wake-ups).
>>> Basically you need something like the following:
>>>
>>> 1. Test if the timeout worker is running:
>>>
>>> if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>>       !cancel_delayed_work(&sched->work_tdr))
>>>           return false;
>>>
>>> 2. Test if there is any job ready to be cleaned up.
>>>
>>> job = list_first_entry_or_null(&sched->ring_mirror_list, struct
>>> drm_sched_job, node);
>>> if (!job || !dma_fence_is_signaled(&job->s_fence->finished))
>>>       return false;
>>>
>>> That should basically do it.
>> Thanks for the pointers. I wasn't sure if the "queue timeout for next
>> job" part was necessary or not if step 2 above returns false.
>>
>> I've been testing the following patch which simply pulls the
>> sched->ops->free_job() out of the wait_event_interruptible().
>>
>> I'll try with just the tests you've described.

It looks like it is necessary to queue the timeout for the next job even
if there isn't a job to be cleaned up (i.e. even if we wouldn't exit
from wait_event_interruptible()). So I'll go with a patch like below.

>> ----8<-----
>>  From 873c1816394beee72904e64aa2ee0f169e768d76 Mon Sep 17 00:00:00 2001
>> From: Steven Price <steven.price@arm.com>
>> Date: Mon, 23 Sep 2019 11:08:50 +0100
>> Subject: [PATCH] drm: Don't free jobs in wait_event_interruptible()
>>
>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> Instead let's rename drm_sched_cleanup_jobs() to
>> drm_sched_get_cleanup_job() and simply return a job for processing if
>> there is one. The caller can then call the free_job() callback outside
>> the wait_event_interruptible() where sleeping is possible before
>> re-checking and returning to sleep if necessary.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>> ---
>>   drivers/gpu/drm/scheduler/sched_main.c | 22 +++++++++++++++-------
>>   1 file changed, 15 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
>> index 9a0ee74d82dc..bf9b4931ddfd 100644
>> --- a/drivers/gpu/drm/scheduler/sched_main.c
>> +++ b/drivers/gpu/drm/scheduler/sched_main.c
>> @@ -622,20 +622,21 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
>>   }
>>   
>>   /**
>> - * drm_sched_cleanup_jobs - destroy finished jobs
>> + * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
>>    *
>>    * @sched: scheduler instance
>>    *
>> - * Remove all finished jobs from the mirror list and destroy them.
>> + * Returns the next finished job from the mirror list (if there is one)
>> + * ready for it to be destroyed.
>>    */
>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>> +static struct drm_sched_job *drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>   {
>>   	unsigned long flags;
>>   
>>   	/* Don't destroy jobs while the timeout worker is running */
>>   	if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
>>   	    !cancel_delayed_work(&sched->work_tdr))
>> -		return;
>> +		return NULL;
>>   
>>   
>>   	while (!list_empty(&sched->ring_mirror_list)) {
> 
> Yeah, you should probably clean that up a bit here, but apart from that 
> this should work as well.

Good point - this function can be tidied up quite a bit, I'll post a new
patch shortly.

Thanks,

Steve

> Regards,
> Christian.
> 
>> @@ -651,7 +652,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>   		list_del_init(&job->node);
>>   		spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>   
>> -		sched->ops->free_job(job);
>> +		return job;
>>   	}
>>   
>>   	/* queue timeout for next job */
>> @@ -659,6 +660,7 @@ static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>>   	drm_sched_start_timeout(sched);
>>   	spin_unlock_irqrestore(&sched->job_list_lock, flags);
>>   
>> +	return NULL;
>>   }
>>   
>>   /**
>> @@ -698,12 +700,18 @@ static int drm_sched_main(void *param)
>>   		struct drm_sched_fence *s_fence;
>>   		struct drm_sched_job *sched_job;
>>   		struct dma_fence *fence;
>> +		struct drm_sched_job *cleanup_job = NULL;
>>   
>>   		wait_event_interruptible(sched->wake_up_worker,
>> -					 (drm_sched_cleanup_jobs(sched),
>> +					 (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
>>   					 (!drm_sched_blocked(sched) &&
>>   					  (entity = drm_sched_select_entity(sched))) ||
>> -					 kthread_should_stop()));
>> +					 kthread_should_stop());
>> +
>> +		while (cleanup_job) {
>> +			sched->ops->free_job(cleanup_job);
>> +			cleanup_job = drm_sched_get_cleanup_job(sched);
>> +		}
>>   
>>   		if (!entity)
>>   			continue;
> 

Patch

diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0ee74d82dc..528f295e3a31 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -699,11 +699,12 @@  static int drm_sched_main(void *param)
 		struct drm_sched_job *sched_job;
 		struct dma_fence *fence;
 
+		drm_sched_cleanup_jobs(sched);
+
 		wait_event_interruptible(sched->wake_up_worker,
-					 (drm_sched_cleanup_jobs(sched),
 					 (!drm_sched_blocked(sched) &&
 					  (entity = drm_sched_select_entity(sched))) ||
-					 kthread_should_stop()));
+					 kthread_should_stop());
 
 		if (!entity)
 			continue;