Message ID | 20220106181449.696988-2-robdclark@gmail.com |
---|---|
State | New, archived |
Series | drm/msm/gpu: System suspend fixes |
On Thu 06 Jan 10:14 PST 2022, Rob Clark wrote:

> From: Rob Clark <robdclark@chromium.org>
>
> System suspend uses pm_runtime_force_suspend(), which cheekily bypasses
> the runpm reference counts. This doesn't actually work so well when the
> GPU is active. So add a reasonable delay waiting for the GPU to become
> idle.
>
> Alternatively we could just return -EBUSY in this case, but that has the
> disadvantage of causing system suspend to fail.
>

Reviewed-by: Bjorn Andersson <bjorn.andersson@linaro.org>

Regards,
Bjorn

> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
> [...]
On 06/01/22 19:14, Rob Clark wrote:
> From: Rob Clark <robdclark@chromium.org>
>
> System suspend uses pm_runtime_force_suspend(), which cheekily bypasses
> the runpm reference counts. This doesn't actually work so well when the
> GPU is active. So add a reasonable delay waiting for the GPU to become
> idle.
>
> Alternatively we could just return -EBUSY in this case, but that has the
> disadvantage of causing system suspend to fail.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>
Quoting Rob Clark (2022-01-06 10:14:46)
> From: Rob Clark <robdclark@chromium.org>
>
> System suspend uses pm_runtime_force_suspend(), which cheekily bypasses
> the runpm reference counts. This doesn't actually work so well when the
> GPU is active. So add a reasonable delay waiting for the GPU to become
> idle.

Maybe also say:

Failure to wait during system-wide suspend leads to GPU hangs seen on
resume.

>
> Alternatively we could just return -EBUSY in this case, but that has the
> disadvantage of causing system suspend to fail.
>
> Signed-off-by: Rob Clark <robdclark@chromium.org>
> ---
>  drivers/gpu/drm/msm/adreno/adreno_device.c | 9 +++++++++
>  drivers/gpu/drm/msm/msm_gpu.c              | 3 +++
>  drivers/gpu/drm/msm/msm_gpu.h              | 3 +++
>  3 files changed, 15 insertions(+)
>
> diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> index 93005839b5da..b677ca3fd75e 100644
> --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> @@ -611,6 +611,15 @@ static int adreno_resume(struct device *dev)
>  static int adreno_suspend(struct device *dev)
>  {
>  	struct msm_gpu *gpu = dev_to_gpu(dev);
> +	int ret = 0;

Please don't assign and then immediately overwrite.

> +
> +	ret = wait_event_timeout(gpu->retire_event,
> +				 !msm_gpu_active(gpu),
> +				 msecs_to_jiffies(1000));
> +	if (ret == 0) {

The usual pattern is

	long timeleft;

	timeleft = wait_event_timeout(...);
	if (!timeleft) {
		/* no time left; timed out */

Can it be the same pattern here? It helps because people sometimes
forget that wait_event_timeout() returns the time that is left and not
an error code when it times out.

> +		dev_err(dev, "Timeout waiting for GPU to suspend\n");
> +		return -EBUSY;
> +	}
>
>  	return gpu->funcs->pm_suspend(gpu);
>  }
On Fri, Jan 7, 2022 at 4:27 PM Stephen Boyd <swboyd@chromium.org> wrote:
>
> Quoting Rob Clark (2022-01-06 10:14:46)
> > From: Rob Clark <robdclark@chromium.org>
> >
> > System suspend uses pm_runtime_force_suspend(), which cheekily bypasses
> > the runpm reference counts. This doesn't actually work so well when the
> > GPU is active. So add a reasonable delay waiting for the GPU to become
> > idle.
>
> Maybe also say:
>
> Failure to wait during system wide suspend leads to GPU hangs seen on
> resume.

The fallout can actually be a lot more than just GPU hangs.. that is
just the case that is easy (for us) to observe because the crash
logging captures them. But sync/async external aborts are also
possible.. and I think even just undefined behavior (ie. I think if
the timing works out right, it can survive but just "lose" rendering
that hadn't completed yet)

> >
> > Alternatively we could just return -EBUSY in this case, but that has the
> > disadvantage of causing system suspend to fail.
> >
> > Signed-off-by: Rob Clark <robdclark@chromium.org>
> > ---
> >  drivers/gpu/drm/msm/adreno/adreno_device.c | 9 +++++++++
> >  drivers/gpu/drm/msm/msm_gpu.c              | 3 +++
> >  drivers/gpu/drm/msm/msm_gpu.h              | 3 +++
> >  3 files changed, 15 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > index 93005839b5da..b677ca3fd75e 100644
> > --- a/drivers/gpu/drm/msm/adreno/adreno_device.c
> > +++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
> > @@ -611,6 +611,15 @@ static int adreno_resume(struct device *dev)
> >  static int adreno_suspend(struct device *dev)
> >  {
> >  	struct msm_gpu *gpu = dev_to_gpu(dev);
> > +	int ret = 0;
>
> Please don't assign and then immediately overwrite.
>
> > +
> > +	ret = wait_event_timeout(gpu->retire_event,
> > +				 !msm_gpu_active(gpu),
> > +				 msecs_to_jiffies(1000));
> > +	if (ret == 0) {
>
> The usual pattern is
>
> 	long timeleft;
>
> 	timeleft = wait_event_timeout(...)
> 	if (!timeleft) {
> 		/* no time left; timed out */
>
> Can it be the same pattern here? It helps because people sometimes
> forget that wait_event_timeout() returns the time that is left and not
> an error code when it times out.

ok, I'll update in v2..

BR,
-R

> > +		dev_err(dev, "Timeout waiting for GPU to suspend\n");
> > +		return -EBUSY;
> > +	}
> >
> >  	return gpu->funcs->pm_suspend(gpu);
> >  }
diff --git a/drivers/gpu/drm/msm/adreno/adreno_device.c b/drivers/gpu/drm/msm/adreno/adreno_device.c
index 93005839b5da..b677ca3fd75e 100644
--- a/drivers/gpu/drm/msm/adreno/adreno_device.c
+++ b/drivers/gpu/drm/msm/adreno/adreno_device.c
@@ -611,6 +611,15 @@ static int adreno_resume(struct device *dev)
 static int adreno_suspend(struct device *dev)
 {
 	struct msm_gpu *gpu = dev_to_gpu(dev);
+	int ret = 0;
+
+	ret = wait_event_timeout(gpu->retire_event,
+				 !msm_gpu_active(gpu),
+				 msecs_to_jiffies(1000));
+	if (ret == 0) {
+		dev_err(dev, "Timeout waiting for GPU to suspend\n");
+		return -EBUSY;
+	}
 
 	return gpu->funcs->pm_suspend(gpu);
 }
diff --git a/drivers/gpu/drm/msm/msm_gpu.c b/drivers/gpu/drm/msm/msm_gpu.c
index 0f78c2615272..2c1049c0ea14 100644
--- a/drivers/gpu/drm/msm/msm_gpu.c
+++ b/drivers/gpu/drm/msm/msm_gpu.c
@@ -703,6 +703,8 @@ static void retire_submits(struct msm_gpu *gpu)
 			}
 		}
 	}
+
+	wake_up_all(&gpu->retire_event);
 }
 
 static void retire_worker(struct kthread_work *work)
@@ -848,6 +850,7 @@ int msm_gpu_init(struct drm_device *drm, struct platform_device *pdev,
 	INIT_LIST_HEAD(&gpu->active_list);
 	mutex_init(&gpu->active_lock);
 	mutex_init(&gpu->lock);
+	init_waitqueue_head(&gpu->retire_event);
 	kthread_init_work(&gpu->retire_work, retire_worker);
 	kthread_init_work(&gpu->recover_work, recover_worker);
 	kthread_init_work(&gpu->fault_work, fault_worker);
diff --git a/drivers/gpu/drm/msm/msm_gpu.h b/drivers/gpu/drm/msm/msm_gpu.h
index 445c6bfd4b6b..92aa1e9196c6 100644
--- a/drivers/gpu/drm/msm/msm_gpu.h
+++ b/drivers/gpu/drm/msm/msm_gpu.h
@@ -230,6 +230,9 @@ struct msm_gpu {
 	/* work for handling GPU recovery: */
 	struct kthread_work recover_work;
 
+	/** retire_event: notified when submits are retired: */
+	wait_queue_head_t retire_event;
+
 	/* work for handling active-list retiring: */
 	struct kthread_work retire_work;
 