Message ID | 20191024162424.38548-1-steven.price@arm.com (mailing list archive)
---|---
State | New, archived
Series | [RESEND,v4] drm: Don't free jobs in wait_event_interruptible()
On Thu, 24 Oct 2019 at 18:25, Steven Price <steven.price@arm.com> wrote:
>
> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
> it is called as the condition of wait_event_interruptible() it must not
> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>
> Instead let's rename drm_sched_cleanup_jobs() to
> drm_sched_get_cleanup_job() and simply return a job for processing if
> there is one. The caller can then call the free_job() callback outside
> the wait_event_interruptible() where sleeping is possible before
> re-checking and returning to sleep if necessary.
>
> Signed-off-by: Steven Price <steven.price@arm.com>

Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>

Without this patch I get the following warning:

[ 242.935254] ------------[ cut here ]------------
[ 242.940044] WARNING: CPU: 2 PID: 109 at kernel/sched/core.c:6731 __might_sleep+0x94/0xa8
[ 242.948242] do not call blocking ops when !TASK_RUNNING; state=1 set at [<38751e36>] prepare_to_wait_event+0xa8/0x180
[ 242.958923] Modules linked in:
[ 242.962010] CPU: 2 PID: 109 Comm: 130000.gpu Not tainted 5.4.0-rc4 #10
[ 242.968551] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 242.975112] [<c0113160>] (unwind_backtrace) from [<c010cf34>] (show_stack+0x10/0x14)
[ 242.982879] [<c010cf34>] (show_stack) from [<c0c065ec>] (dump_stack+0xd8/0x110)
[ 242.990213] [<c0c065ec>] (dump_stack) from [<c0128adc>] (__warn+0xc0/0x10c)
[ 242.997194] [<c0128adc>] (__warn) from [<c0128f10>] (warn_slowpath_fmt+0x8c/0xb8)
[ 243.004697] [<c0128f10>] (warn_slowpath_fmt) from [<c01598bc>] (__might_sleep+0x94/0xa8)
[ 243.012810] [<c01598bc>] (__might_sleep) from [<c0c246e4>] (__mutex_lock+0x38/0xa1c)
[ 243.020571] [<c0c246e4>] (__mutex_lock) from [<c0c250e4>] (mutex_lock_nested+0x1c/0x24)
[ 243.028600] [<c0c250e4>] (mutex_lock_nested) from [<c064f020>] (etnaviv_cmdbuf_free+0x40/0x8c)
[ 243.037233] [<c064f020>] (etnaviv_cmdbuf_free) from [<c06503a0>] (etnaviv_submit_put+0x38/0x1c8)
[ 243.046042] [<c06503a0>] (etnaviv_submit_put) from [<c064177c>] (drm_sched_cleanup_jobs+0xc8/0xec)
[ 243.055021] [<c064177c>] (drm_sched_cleanup_jobs) from [<c06419b4>] (drm_sched_main+0x214/0x298)
[ 243.063826] [<c06419b4>] (drm_sched_main) from [<c0152890>] (kthread+0x140/0x158)
[ 243.071329] [<c0152890>] (kthread) from [<c01010b4>] (ret_from_fork+0x14/0x20)
[ 243.078563] Exception stack(0xec691fb0 to 0xec691ff8)
[ 243.083630] 1fa0: 00000000 00000000 00000000 00000000
[ 243.091822] 1fc0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
[ 243.100013] 1fe0: 00000000 00000000 00000000 00000000 00000013 00000000
[ 243.106795] irq event stamp: 321
[ 243.110098] hardirqs last enabled at (339): [<c0193854>] console_unlock+0x430/0x620
[ 243.117864] hardirqs last disabled at (346): [<c01934cc>] console_unlock+0xa8/0x620
[ 243.125592] softirqs last enabled at (362): [<c01024e0>] __do_softirq+0x2c0/0x590
[ 243.133232] softirqs last disabled at (373): [<c0130ed0>] irq_exit+0x100/0x18c
[ 243.140517] ---[ end trace 8afcd79e9e2725b2 ]---
On 24.10.19 18:24, Steven Price wrote:
> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
> it is called as the condition of wait_event_interruptible() it must not
> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>
> Instead let's rename drm_sched_cleanup_jobs() to
> drm_sched_get_cleanup_job() and simply return a job for processing if
> there is one. The caller can then call the free_job() callback outside
> the wait_event_interruptible() where sleeping is possible before
> re-checking and returning to sleep if necessary.
>
> Signed-off-by: Steven Price <steven.price@arm.com>
> ---
> Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/
>
>  drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
>  1 file changed, 26 insertions(+), 19 deletions(-)
>
> [...]
>
> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
> +static struct drm_sched_job *
> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>  {
> +        struct drm_sched_job *job = NULL;

Please don't initialize job here.

> [...]
>
> +                while (cleanup_job) {

Better make this just "if (cleanup_job)"... to make sure that we stop
immediately when the thread should stop.

Apart from that looks good to me now,
Christian.
On 25/10/2019 10:43, Christian Gmeiner wrote:
> On Thu, 24 Oct 2019 at 18:25, Steven Price <steven.price@arm.com> wrote:
>>
>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> Instead let's rename drm_sched_cleanup_jobs() to
>> drm_sched_get_cleanup_job() and simply return a job for processing if
>> there is one. The caller can then call the free_job() callback outside
>> the wait_event_interruptible() where sleeping is possible before
>> re-checking and returning to sleep if necessary.
>>
>> Signed-off-by: Steven Price <steven.price@arm.com>
>
> Tested-by: Christian Gmeiner <christian.gmeiner@gmail.com>
>
> Without this patch I get the following warning:

Thanks, if you've got an (easily) reproducible case, can you check which
commit this fixes. I *think*:

Fixes: 5918045c4ed4 ("drm/scheduler: rework job destruction")

But I haven't got a reliable way of reproducing this (with Panfrost).

Thanks,

Steve

> [ 242.935254] ------------[ cut here ]------------
> [...]
> [ 243.140517] ---[ end trace 8afcd79e9e2725b2 ]---
On 25/10/2019 10:49, Koenig, Christian wrote:
> On 24.10.19 18:24, Steven Price wrote:
>> drm_sched_cleanup_jobs() attempts to free finished jobs, however because
>> it is called as the condition of wait_event_interruptible() it must not
>> sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.
>>
>> [...]
>>
>> -static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
>> +static struct drm_sched_job *
>> +drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
>>  {
>> +        struct drm_sched_job *job = NULL;
>
> Please don't initialize job here.

Good spot, will fix.

>> [...]
>>
>> +                while (cleanup_job) {
>
> Better make this just "if (cleanup_job)"... to make sure that we stop
> immediately when the thread should stop.

Ok, no problem. Note that this is a change in behaviour (previously
drm_sched_cleanup_jobs() was called *before* checking
kthread_should_stop()). But I can't see the harm.

Steve

> Apart from that looks good to me now,
> Christian.
On 25.10.19 12:26, Steven Price wrote:
> On 25/10/2019 10:49, Koenig, Christian wrote:
>> [...]
>>
>> Better make this just "if (cleanup_job)"... to make sure that we stop
>> immediately when the thread should stop.
>
> Ok, no problem. Note that this is a change in behaviour (previously
> drm_sched_cleanup_jobs() was called *before* checking
> kthread_should_stop()). But I can't see the harm.

Yeah, but this is actually a rather nice improvement. When we say that
the thread should stop then that should happen immediately and not
cleanup the jobs first.

Christian.
diff --git a/drivers/gpu/drm/scheduler/sched_main.c b/drivers/gpu/drm/scheduler/sched_main.c
index 9a0ee74d82dc..148468447ba9 100644
--- a/drivers/gpu/drm/scheduler/sched_main.c
+++ b/drivers/gpu/drm/scheduler/sched_main.c
@@ -622,43 +622,41 @@ static void drm_sched_process_job(struct dma_fence *f, struct dma_fence_cb *cb)
 }
 
 /**
- * drm_sched_cleanup_jobs - destroy finished jobs
+ * drm_sched_get_cleanup_job - fetch the next finished job to be destroyed
  *
  * @sched: scheduler instance
  *
- * Remove all finished jobs from the mirror list and destroy them.
+ * Returns the next finished job from the mirror list (if there is one)
+ * ready for it to be destroyed.
  */
-static void drm_sched_cleanup_jobs(struct drm_gpu_scheduler *sched)
+static struct drm_sched_job *
+drm_sched_get_cleanup_job(struct drm_gpu_scheduler *sched)
 {
+        struct drm_sched_job *job = NULL;
         unsigned long flags;
 
         /* Don't destroy jobs while the timeout worker is running */
         if (sched->timeout != MAX_SCHEDULE_TIMEOUT &&
             !cancel_delayed_work(&sched->work_tdr))
-                return;
-
+                return NULL;
 
-        while (!list_empty(&sched->ring_mirror_list)) {
-                struct drm_sched_job *job;
+        spin_lock_irqsave(&sched->job_list_lock, flags);
 
-                job = list_first_entry(&sched->ring_mirror_list,
+        job = list_first_entry_or_null(&sched->ring_mirror_list,
                                        struct drm_sched_job, node);
-                if (!dma_fence_is_signaled(&job->s_fence->finished))
-                        break;
 
-                spin_lock_irqsave(&sched->job_list_lock, flags);
+        if (job && dma_fence_is_signaled(&job->s_fence->finished)) {
                 /* remove job from ring_mirror_list */
                 list_del_init(&job->node);
-                spin_unlock_irqrestore(&sched->job_list_lock, flags);
-
-                sched->ops->free_job(job);
+        } else {
+                job = NULL;
+                /* queue timeout for next job */
+                drm_sched_start_timeout(sched);
         }
 
-        /* queue timeout for next job */
-        spin_lock_irqsave(&sched->job_list_lock, flags);
-        drm_sched_start_timeout(sched);
         spin_unlock_irqrestore(&sched->job_list_lock, flags);
 
+        return job;
 }
 
 /**
@@ -698,12 +696,21 @@ static int drm_sched_main(void *param)
                 struct drm_sched_fence *s_fence;
                 struct drm_sched_job *sched_job;
                 struct dma_fence *fence;
+                struct drm_sched_job *cleanup_job = NULL;
 
                 wait_event_interruptible(sched->wake_up_worker,
-                                         (drm_sched_cleanup_jobs(sched),
+                                         (cleanup_job = drm_sched_get_cleanup_job(sched)) ||
                                          (!drm_sched_blocked(sched) &&
                                           (entity = drm_sched_select_entity(sched))) ||
-                                         kthread_should_stop()));
+                                         kthread_should_stop());
+
+                while (cleanup_job) {
+                        sched->ops->free_job(cleanup_job);
+                        /* queue timeout for next job */
+                        drm_sched_start_timeout(sched);
+
+                        cleanup_job = drm_sched_get_cleanup_job(sched);
+                }
 
                 if (!entity)
                         continue;
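Stripped of the fence and locking machinery, the contract of the new drm_sched_get_cleanup_job() — pop the head of the mirror list only if it has finished, otherwise return NULL — can be sketched in plain userspace C. All names below (struct job, get_cleanup_job, demo_drain) are invented for this sketch; a plain `finished` flag stands in for dma_fence_is_signaled():

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct drm_sched_job: a finished flag
 * plus a singly-linked list link. */
struct job {
        bool finished;
        struct job *next;
};

static struct job *mirror_list; /* head of the pending-job list */

/* Analogue of drm_sched_get_cleanup_job(): detach and return the head
 * job only if it has finished; otherwise return NULL.  Crucially this
 * never sleeps, so it is safe to call from a wait condition. */
static struct job *get_cleanup_job(void)
{
        struct job *job = mirror_list;

        if (job && job->finished) {
                mirror_list = job->next;
                return job;
        }
        return NULL;
}

/* The caller frees jobs *outside* the wait condition, where sleeping
 * (e.g. taking a mutex inside free_job) is allowed.  Returns the
 * number of jobs freed. */
static int drain_finished_jobs(void)
{
        struct job *job;
        int freed = 0;

        while ((job = get_cleanup_job())) {
                free(job); /* may sleep in a real free_job() callback */
                freed++;
        }
        return freed;
}

/* Build a list of two finished jobs in front of one unfinished job,
 * then drain: only the two finished jobs at the head are reclaimed. */
int demo_drain(void)
{
        for (int i = 0; i < 3; i++) {
                struct job *j = malloc(sizeof(*j));
                j->finished = (i != 0); /* head insertion reverses order */
                j->next = mirror_list;
                mirror_list = j;
        }
        return drain_finished_jobs();
}
```

Because the fetch itself never blocks, the real helper stays safe to call from the wait_event_interruptible() condition; only the free of the returned job may sleep.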
drm_sched_cleanup_jobs() attempts to free finished jobs, however because
it is called as the condition of wait_event_interruptible() it must not
sleep. Unfortunately some free callbacks (notably for Panfrost) do sleep.

Instead let's rename drm_sched_cleanup_jobs() to
drm_sched_get_cleanup_job() and simply return a job for processing if
there is one. The caller can then call the free_job() callback outside
the wait_event_interruptible() where sleeping is possible before
re-checking and returning to sleep if necessary.

Signed-off-by: Steven Price <steven.price@arm.com>
---
Previous posting: https://lore.kernel.org/lkml/20190926141630.14258-1-steven.price@arm.com/

 drivers/gpu/drm/scheduler/sched_main.c | 45 +++++++++++++++-----------
 1 file changed, 26 insertions(+), 19 deletions(-)
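The caller-side shape of the fix can be illustrated with a rough userspace analogue built on POSIX condition variables. This is a hedged sketch, not the scheduler's real code: all names (sched_main, get_cleanup_job, run_demo) are invented, pthread_cond_wait plays the role of wait_event_interruptible(), and the wait predicate only *fetches* a finished job — the potentially sleeping free happens after the wait returns:

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>

struct job {
        struct job *next;
};

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t wake = PTHREAD_COND_INITIALIZER;
static struct job *finished;  /* jobs ready to be freed */
static bool should_stop;
static int freed_count;

/* Analogue of drm_sched_get_cleanup_job(): pop one finished job or
 * return NULL.  Called with the lock held; never blocks elsewhere. */
static struct job *get_cleanup_job(void)
{
        struct job *job = finished;

        if (job)
                finished = job->next;
        return job;
}

/* Analogue of the reworked drm_sched_main() loop. */
static void *sched_main(void *arg)
{
        (void)arg;
        for (;;) {
                struct job *cleanup_job;

                pthread_mutex_lock(&lock);
                /* wait_event_interruptible() analogue: sleep until a job
                 * needs cleanup or a stop is requested.  The predicate
                 * itself does no blocking work. */
                while (!(cleanup_job = get_cleanup_job()) && !should_stop)
                        pthread_cond_wait(&wake, &lock);
                pthread_mutex_unlock(&lock);

                if (cleanup_job) {
                        /* Outside the wait predicate: sleeping is fine
                         * here, like the driver's free_job() callback. */
                        free(cleanup_job);
                        freed_count++;
                        continue;
                }
                return NULL; /* stop requested, nothing left to clean */
        }
}

/* Queue njobs finished jobs, request a stop, and report how many the
 * worker freed before exiting. */
int run_demo(int njobs)
{
        pthread_t thread;

        pthread_create(&thread, NULL, sched_main, NULL);

        pthread_mutex_lock(&lock);
        for (int i = 0; i < njobs; i++) {
                struct job *j = malloc(sizeof(*j));
                j->next = finished;
                finished = j;
        }
        should_stop = true;
        pthread_cond_broadcast(&wake);
        pthread_mutex_unlock(&lock);

        pthread_join(thread, NULL);
        return freed_count;
}
```

The key property mirrored here is the one the patch establishes: the predicate evaluated while the thread is in its sleeping state does nothing that could itself sleep, which is exactly what the __might_sleep warning in this thread was complaining about.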