Message ID | 20240612152752.v2.8.I27914059cc822b52db9bf72b4013b525b60e06fd@changeid
---|---
State | New, archived
Series | drm: make leftover drivers call drm_atomic_helper_shutdown() at the right times
On Wed, Jun 12, 2024 at 6:37 PM Douglas Anderson <dianders@chromium.org> wrote:
>
> Based on grepping through the source code, this driver appears to be
> missing a call to drm_atomic_helper_shutdown() at system shutdown
> time. Among other things, this means that if a panel is in use it
> won't be cleanly powered off at system shutdown time.
>
> The fact that we should call drm_atomic_helper_shutdown() in the case
> of OS shutdown/restart comes straight out of the kernel doc "driver
> instance overview" in drm_drv.c.
>
> [...]
>
> This commit is only compile-time tested.
>
> ...and further, I'd say that this patch is more of a plea for help
> than a patch I think is actually right. I'm _fairly_ certain that
> drm/amdgpu needs this call at shutdown time, but the logic is a bit
> hard for me to follow. I'd appreciate it if anyone who actually knows
> what this should look like could enlighten me, or perhaps even just
> post a patch themselves!

I'm not sure whether this patch makes sense or not. The driver doesn't
really do a formal teardown in its shutdown routine; it just quiesces
the hardware. What are the actual requirements of the shutdown
function? In the past, when we did a full driver teardown in shutdown,
it delayed the shutdown sequence and users complained.

Alex

> [patch body snipped; the full diff is shown at the end of this page]
Hi,

On Mon, Jun 17, 2024 at 8:01 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> [...]
> I'm not sure whether this patch makes sense or not. The driver doesn't
> really do a formal teardown in its shutdown routine; it just quiesces
> the hardware. What are the actual requirements of the shutdown
> function? In the past, when we did a full driver teardown in shutdown,
> it delayed the shutdown sequence and users complained.

The "inspiration" for this patch is to handle panels properly.
Specifically, panels often have several power/enable signals going to
them and often have requirements that these signals are powered off in
the proper order with the proper delays between them. While we can't
always do so when the system crashes / reboots in an uncontrolled way,
panel manufacturers / HW engineers get upset if we don't power things
off properly during an orderly shutdown/reboot. When panels are powered
off badly it can cause garbage on the screen and, so I've been told,
can even cause long-term damage to the panels over time.

In Linux, some panel drivers have tried to ensure a proper poweroff of
the panel by handling the shutdown() call themselves. However, this is
ugly and panel maintainers want panel drivers to stop doing it. We have
removed the code doing this from most panels now [1]. Instead, the
assumption is that the DRM modeset drivers should be calling
drm_atomic_helper_shutdown(), which will make sure panels get an
orderly shutdown.

For a lot more details, see the cover letter [2], which then contains
links to even more discussions about the topic.

[1] https://lore.kernel.org/r/20240605002401.2848541-1-dianders@chromium.org
[2] https://lore.kernel.org/r/20240612222435.3188234-1-dianders@chromium.org
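For readers landing on this thread without a DRM background, the pattern this series asks drivers to follow is roughly the sketch below. drm_atomic_helper_shutdown() and the pci_driver .shutdown hook are real kernel APIs; the foo_* names are placeholders for illustration and this is not amdgpu's actual code:

#include <linux/pci.h>
#include <drm/drm_atomic_helper.h>

/*
 * Sketch only: a typical modeset driver wiring drm_atomic_helper_shutdown()
 * into its PCI shutdown callback.  "foo_" names are hypothetical.
 */
static void foo_pci_shutdown(struct pci_dev *pdev)
{
	struct drm_device *drm = pci_get_drvdata(pdev);

	/*
	 * Commits a state with every CRTC disabled, so encoders, bridges
	 * and panels run their normal disable/post-disable hooks and any
	 * panel power rails are dropped in the documented order.
	 */
	drm_atomic_helper_shutdown(drm);
}

static struct pci_driver foo_pci_driver = {
	.name     = "foo",
	.shutdown = foo_pci_shutdown,
	/* .probe, .remove, .id_table, etc. omitted in this sketch */
};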
On Tue, Jun 18, 2024 at 5:40 PM Doug Anderson <dianders@chromium.org> wrote:
> [...]
> The "inspiration" for this patch is to handle panels properly.
> Specifically, panels often have several power/enable signals going to
> them and often have requirements that these signals are powered off in
> the proper order with the proper delays between them. [...] Instead,
> the assumption is that the DRM modeset drivers should be calling
> drm_atomic_helper_shutdown(), which will make sure panels get an
> orderly shutdown.

I don't think it's an issue. We quiesce the hardware as if we were
about to suspend the system (e.g., S3). For the display hardware we
call drm_atomic_helper_suspend() as part of that sequence.

Alex
Hi,

On Tue, Jun 18, 2024 at 3:00 PM Alex Deucher <alexdeucher@gmail.com> wrote:
> [...]
> I don't think it's an issue. We quiesce the hardware as if we were
> about to suspend the system (e.g., S3). For the display hardware we
> call drm_atomic_helper_suspend() as part of that sequence.

OK. It's no skin off my teeth and we can drop this patch if you're
convinced it's not needed. From the point of view of someone who has no
experience with this driver it seems weird to me that it would use
drm_atomic_helper_suspend() at shutdown time instead of the documented
drm_atomic_helper_shutdown(), but if it works for everyone then I'm not
gonna complain.

-Doug
On Tue, Jun 18, 2024 at 7:53 PM Doug Anderson <dianders@chromium.org> wrote:
> [...]
> OK. It's no skin off my teeth and we can drop this patch if you're
> convinced it's not needed. From the point of view of someone who has no
> experience with this driver it seems weird to me that it would use
> drm_atomic_helper_suspend() at shutdown time instead of the documented
> drm_atomic_helper_shutdown(), but if it works for everyone then I'm not
> gonna complain.

I think the problem is that it is not clear exactly what the
expectations are around the PCI shutdown callback. The documentation
says:

"Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any
idling DMA operations. Useful for enabling wake-on-lan (NIC) or
changing the power state of a device before reboot. e.g.
drivers/net/e100.c."

We tried a full driver teardown in the shutdown callback and it added a
lot of latency that really wasn't needed since the system was just
going into a reboot or power down. The best middle ground was to just
leverage our HW-level suspend code to quiesce the hardware. Adding
complexity to choose between drm_atomic_helper_suspend() and
drm_atomic_helper_shutdown() doesn't seem worth it since the functions
do pretty much the same thing (both call
drm_atomic_helper_disable_all()). Maybe it's better to update the
documentation to recommend drm_atomic_helper_suspend() if drivers want
to leverage their suspend code?

Alex
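As an aside for readers comparing the two helpers Alex mentions, a rough sketch of the difference is below. The drm_* calls are the real helpers; the bar_* wrappers and the choice to free the saved state are illustrative assumptions, not what amdgpu actually does:

#include <linux/err.h>
#include <drm/drm_atomic.h>
#include <drm/drm_atomic_helper.h>

/* Both paths end up in drm_atomic_helper_disable_all(). */
static void bar_quiesce_display_suspend_style(struct drm_device *drm)
{
	/*
	 * Suspend-style: duplicates the current atomic state, disables
	 * everything, and returns the saved state so it can later be
	 * replayed with drm_atomic_helper_resume().
	 */
	struct drm_atomic_state *state = drm_atomic_helper_suspend(drm);

	if (IS_ERR(state))
		return;

	/*
	 * At shutdown/reboot there is no resume, so the saved state is
	 * simply dropped rather than restored.
	 */
	drm_atomic_state_put(state);
}

static void bar_quiesce_display_shutdown_style(struct drm_device *drm)
{
	/* Shutdown-style: disables everything; nothing to save or free. */
	drm_atomic_helper_shutdown(drm);
}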
On Wed, Jun 19, 2024 at 9:50 AM Alex Deucher <alexdeucher@gmail.com> wrote:
> [...]
> I think the problem is that it is not clear exactly what the
> expectations are around the PCI shutdown callback. The documentation
> says:
>
> "Hook into reboot_notifier_list (kernel/sys.c). Intended to stop any
> idling DMA operations. Useful for enabling wake-on-lan (NIC) or
> changing the power state of a device before reboot. e.g.
> drivers/net/e100.c."

Arguably, there is no requirement to even touch the display hardware at
all. In theory you could just leave the display hardware as-is in its
current state. The system will either be rebooting or powering down
anyway.

Alex
Hi,

On Wed, Jun 19, 2024 at 09:53:12AM GMT, Alex Deucher wrote:
> [...]
> Arguably, there is no requirement to even touch the display hardware at
> all. In theory you could just leave the display hardware as-is in its
> current state. The system will either be rebooting or powering down
> anyway.

I think it mostly boils down to a cultural mismatch :)

Doug works on panels for ARM systems, where devices need (and need to
handle) many more resources than what's typical on a system with an AMD
GPU.

So, for the kind of hardware Doug usually deals with, we definitely need
the shutdown hook to make sure the regulators, GPIOs, etc. supplying the
panel are properly shut down.

And panels usually tied to AMD GPUs probably don't need any of that.

Maxime
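To make the "cultural mismatch" concrete, below is the sort of panel power-down sequencing Maxime is referring to. This is a hypothetical panel driver sketch: the my_panel structure, rail names and delay value are made up, while the drm_panel/regulator/GPIO APIs are the real kernel ones. Calling drm_atomic_helper_shutdown() in the display driver is what guarantees an unprepare hook like this actually runs at shutdown:

#include <linux/container_of.h>
#include <linux/delay.h>
#include <linux/gpio/consumer.h>
#include <linux/regulator/consumer.h>
#include <drm/drm_panel.h>

struct my_panel {
	struct drm_panel base;
	struct gpio_desc *enable_gpio;
	struct regulator *avdd;
};

static int my_panel_unprepare(struct drm_panel *panel)
{
	struct my_panel *p = container_of(panel, struct my_panel, base);

	/*
	 * Datasheets typically require: de-assert enable, wait, then cut
	 * the supply.  If the shutdown path never disables the pipeline,
	 * these steps never run on poweroff/reboot.
	 */
	gpiod_set_value_cansleep(p->enable_gpio, 0);
	msleep(20);			/* hypothetical T_off delay */
	regulator_disable(p->avdd);

	return 0;
}

static const struct drm_panel_funcs my_panel_funcs = {
	.unprepare = my_panel_unprepare,
	/* .prepare, .enable, .disable, .get_modes omitted in this sketch */
};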
On Thu, Jun 20, 2024 at 3:10 AM Maxime Ripard <mripard@kernel.org> wrote:
> [...]
> I think it mostly boils down to a cultural mismatch :)
>
> Doug works on panels for ARM systems, where devices need (and need to
> handle) many more resources than what's typical on a system with an AMD
> GPU.
>
> So, for the kind of hardware Doug usually deals with, we definitely need
> the shutdown hook to make sure the regulators, GPIOs, etc. supplying the
> panel are properly shut down.
>
> And panels usually tied to AMD GPUs probably don't need any of that.

Makes sense. I think drm_atomic_helper_suspend() is a viable alternative
if drivers want to leverage their existing suspend code. I could write
up a doc patch unless there is reason to prefer the shutdown variant.

Alex
On Thu, Jun 20, 2024 at 09:00:23AM GMT, Alex Deucher wrote:
> [...]
> Makes sense. I think drm_atomic_helper_suspend() is a viable alternative
> if drivers want to leverage their existing suspend code. I could write
> up a doc patch unless there is reason to prefer the shutdown variant.

It could be worth adding that mention in the _shutdown() doc indeed, but
I'm not sure we need to enforce anything at this point. I'd rather have
either, at the driver authors' discretion.

Maxime
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu.h b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
index f87d53e183c3..c202a1d5ff5f 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu.h
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu.h
@@ -1197,6 +1197,7 @@ static inline struct amdgpu_device *amdgpu_ttm_adev(struct ttm_device *bdev)
 int amdgpu_device_init(struct amdgpu_device *adev,
		       uint32_t flags);
 void amdgpu_device_fini_hw(struct amdgpu_device *adev);
+void amdgpu_device_shutdown_hw(struct amdgpu_device *adev);
 void amdgpu_device_fini_sw(struct amdgpu_device *adev);
 
 int amdgpu_gpu_wait_for_idle(struct amdgpu_device *adev);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
index 861ccff78af9..a8c4b8412e04 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
@@ -4531,6 +4531,16 @@ void amdgpu_device_fini_hw(struct amdgpu_device *adev)
 
 }
 
+void amdgpu_device_shutdown_hw(struct amdgpu_device *adev)
+{
+	if (adev->mode_info.mode_config_initialized) {
+		if (!drm_drv_uses_atomic_modeset(adev_to_drm(adev)))
+			drm_helper_force_disable_all(adev_to_drm(adev));
+		else
+			drm_atomic_helper_shutdown(adev_to_drm(adev));
+	}
+}
+
 void amdgpu_device_fini_sw(struct amdgpu_device *adev)
 {
 	int idx;
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
index ea14f1c8f430..b34bf9259d5c 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c
@@ -2409,6 +2409,8 @@ amdgpu_pci_shutdown(struct pci_dev *pdev)
 	struct drm_device *dev = pci_get_drvdata(pdev);
 	struct amdgpu_device *adev = drm_to_adev(dev);
 
+	amdgpu_device_shutdown_hw(adev);
+
 	if (amdgpu_ras_intr_triggered())
 		return;
Based on grepping through the source code, this driver appears to be
missing a call to drm_atomic_helper_shutdown() at system shutdown time.
Among other things, this means that if a panel is in use it won't be
cleanly powered off at system shutdown time.

The fact that we should call drm_atomic_helper_shutdown() in the case
of OS shutdown/restart comes straight out of the kernel doc "driver
instance overview" in drm_drv.c.

Suggested-by: Maxime Ripard <mripard@kernel.org>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Xinhui Pan <Xinhui.Pan@amd.com>
Signed-off-by: Douglas Anderson <dianders@chromium.org>
---
This commit is only compile-time tested.

...and further, I'd say that this patch is more of a plea for help than
a patch I think is actually right. I'm _fairly_ certain that drm/amdgpu
needs this call at shutdown time, but the logic is a bit hard for me to
follow. I'd appreciate it if anyone who actually knows what this should
look like could enlighten me, or perhaps even just post a patch
themselves!

(no changes since v1)

 drivers/gpu/drm/amd/amdgpu/amdgpu.h        |  1 +
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 10 ++++++++++
 drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c    |  2 ++
 3 files changed, 13 insertions(+)