Message ID | 0bf3266a7c6e42e5e19ed2040e6a8feb88202703.1736098238.git.mail@maciej.szmigiero.name (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Bjorn Helgaas |
Series | [v2] net: wwan: iosm: Fix hibernation by re-binding the driver around it |
Hi Maciej,

On 05.01.2025 19:39, Maciej S. Szmigiero wrote:
> Currently, the driver is seriously broken with respect to the
> hibernation (S4): after image restore the device is back into
> IPC_MEM_EXEC_STAGE_BOOT (which AFAIK means bootloader stage) and needs
> full re-launch of the rest of its firmware, but the driver restore
> handler treats the device as merely sleeping and just sends it a
> wake-up command.
>
> This wake-up command times out but device nodes (/dev/wwan*) remain
> accessible.
> However attempting to use them causes the bootloader to crash and
> enter IPC_MEM_EXEC_STAGE_CD_READY stage (which apparently means "a crash
> dump is ready").
>
> It seems that the device cannot be re-initialized from this crashed
> stage without toggling some reset pin (on my test platform that's
> apparently what the device _RST ACPI method does).
>
> While it would theoretically be possible to rewrite the driver to tear
> down the whole MUX / IPC layers on hibernation (so the bootloader does
> not crash from improper access) and then re-launch the device on
> restore this would require significant refactoring of the driver
> (believe me, I've tried), since there are quite a few assumptions
> hard-coded in the driver about the device never being partially
> de-initialized (like channels other than devlink cannot be closed,
> for example).
> Probably this would also need some programming guide for this hardware.
>
> Considering that the driver seems orphaned [1] and other people are
> hitting this issue too [2] fix it by simply unbinding the PCI driver
> before hibernation and re-binding it after restore, much like
> USB_QUIRK_RESET_RESUME does for USB devices that exhibit a similar
> problem.
>
> Tested on XMM7360 in HP EliteBook 855 G7 both with s2idle (which uses
> the existing suspend / resume handlers) and S4 (which uses the new code).
>
> [1]: https://lore.kernel.org/all/c248f0b4-2114-4c61-905f-466a786bdebb@leemhuis.info/
> [2]:
> https://github.com/xmm7360/xmm7360-pci/issues/211#issuecomment-1804139413
>
> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>

Generally looks good to me. Let's wait for approval from PCI maintainers
to be sure that there are no unexpected side effects.

Reviewed-by: Sergey Ryazanov <ryazanov.s.a@gmail.com>

There are a few nitpicks, please find them below.

> ---
>
> Changes from v1:
> * Un-register the PM-notifier and PCI driver in iosm_ipc_driver_exit()
> in the reverse order of their registration in iosm_ipc_driver_init().
>
> * CC the PCI supporter and PCI mailing list in case there's some better
> way to fix/implement all of this.
>
>  drivers/net/wwan/iosm/iosm_ipc_pcie.c | 57 ++++++++++++++++++++++++++-
>  1 file changed, 56 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wwan/iosm/iosm_ipc_pcie.c b/drivers/net/wwan/iosm/iosm_ipc_pcie.c
> index 04517bd3325a..3ca81864a2fd 100644
> --- a/drivers/net/wwan/iosm/iosm_ipc_pcie.c
> +++ b/drivers/net/wwan/iosm/iosm_ipc_pcie.c
> @@ -6,6 +6,7 @@
>  #include <linux/acpi.h>
>  #include <linux/bitfield.h>
>  #include <linux/module.h>
> +#include <linux/suspend.h>
>  #include <net/rtnetlink.h>
>
>  #include "iosm_ipc_imem.h"
> @@ -448,7 +449,61 @@ static struct pci_driver iosm_ipc_driver = {
>  	},
>  	.id_table = iosm_ipc_ids,
>  };
> -module_pci_driver(iosm_ipc_driver);
> +
> +static bool pci_registered;

Nit: global variables are usually placed at the beginning of a source
file so that future code changes can access them easily. Please move it
next to wwan_acpi_guid.

> +
> +static int pm_notify(struct notifier_block *nb, unsigned long mode, void *_unused)
> +{
> +	if (mode == PM_HIBERNATION_PREPARE || mode == PM_RESTORE_PREPARE) {
> +		if (pci_registered) {
> +			pci_unregister_driver(&iosm_ipc_driver);
> +			pci_registered = false;
> +		}
> +	} else if (mode == PM_POST_HIBERNATION || mode == PM_POST_RESTORE) {
> +		if (!pci_registered) {
> +			int ret;
> +
> +			ret = pci_register_driver(&iosm_ipc_driver);
> +			if (ret) {
> +				pr_err(KBUILD_MODNAME ": unable to re-register PCI driver: %d\n",
> +				       ret);
> +			} else {
> +				pci_registered = true;
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static struct notifier_block pm_notifier = {
> +	.notifier_call = pm_notify,
> +};
> +
> +static int __init iosm_ipc_driver_init(void)
> +{
> +	int ret;
> +
> +	ret = pci_register_driver(&iosm_ipc_driver);
> +	if (ret)
> +		return ret;
> +
> +	pci_registered = true;
> +
> +	register_pm_notifier(&pm_notifier);
> +
> +	return 0;
> +}
> +module_init(iosm_ipc_driver_init);
> +
> +static void __exit iosm_ipc_driver_exit(void)
> +{
> +	unregister_pm_notifier(&pm_notifier);
> +
> +	if (pci_registered)
> +		pci_unregister_driver(&iosm_ipc_driver);
> +}
> +module_exit(iosm_ipc_driver_exit);

Another nit. Unlike global variables, module initialization and
deinitialization handlers are usually placed at the end of a source
file, for the same reason: to ease access to other entities. Nobody
calls the module init function, but the module init function may need
to call other functions in the file later on.

If you do not have a strong reason to keep
iosm_ipc_driver_init/iosm_ipc_driver_exit here, please move them
together to the end of the file.

>  int ipc_pcie_addr_map(struct iosm_pcie *ipc_pcie, unsigned char *data,
>  		      size_t size, dma_addr_t *mapping, int direction)

--
Sergey

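The state transitions of the PM notifier in the patch above can be exercised outside the kernel. The following is a minimal user-space sketch of that logic, not kernel code: the `EV_*` events, `fake_register_driver()`, `fake_unregister_driver()` and the `model_*` functions are hypothetical stand-ins for the `PM_*` notifier events, `pci_register_driver()`, `pci_unregister_driver()` and the patch's own functions.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the kernel's PM notifier events. The real
 * constants are defined in <linux/suspend.h> and have other values. */
enum pm_event {
	EV_HIBERNATION_PREPARE,
	EV_POST_HIBERNATION,
	EV_RESTORE_PREPARE,
	EV_POST_RESTORE,
	EV_SUSPEND_PREPARE,
	EV_POST_SUSPEND,
};

static bool driver_registered;
static int register_calls;
static int unregister_calls;

/* Stub replacing pci_register_driver(); always succeeds in this model. */
static int fake_register_driver(void)
{
	register_calls++;
	return 0;
}

/* Stub replacing pci_unregister_driver(). */
static void fake_unregister_driver(void)
{
	unregister_calls++;
}

/* Same decision logic as pm_notify() in the patch: unbind before the
 * hibernation image is created or restored, re-bind afterwards, and
 * leave ordinary suspend/resume events alone. */
static int model_pm_notify(enum pm_event ev)
{
	switch (ev) {
	case EV_HIBERNATION_PREPARE:
	case EV_RESTORE_PREPARE:
		if (driver_registered) {
			fake_unregister_driver();
			driver_registered = false;
		}
		break;
	case EV_POST_HIBERNATION:
	case EV_POST_RESTORE:
		if (!driver_registered) {
			int ret = fake_register_driver();

			if (ret)
				fprintf(stderr, "unable to re-register driver: %d\n", ret);
			else
				driver_registered = true;
		}
		break;
	default:
		/* s2idle and other sleep states keep using the existing handlers. */
		break;
	}

	return 0;
}

/* Mirrors iosm_ipc_driver_init(): register first, then track the state. */
static int model_init(void)
{
	int ret = fake_register_driver();

	if (ret)
		return ret;

	driver_registered = true;
	return 0;
}
```

Calling `model_init()` followed by `model_pm_notify(EV_HIBERNATION_PREPARE)` and `model_pm_notify(EV_POST_HIBERNATION)` walks through one hibernation cycle. The `driver_registered` flag is what keeps a repeated or unmatched event from unregistering or registering the driver twice.
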
On Wed, Jan 08, 2025 at 01:13:41AM +0200, Sergey Ryazanov wrote:
> Generally looks good to me. Let's wait for approval from PCI maintainers
> to be sure that there are no unexpected side effects.

I have nothing useful to contribute here. Seems like kind of a mess.
But Intel claims to maintain this, so it would be nice if they would
step up and make this work nicely.

Bjorn

Hi Bjorn,

On 08.01.2025 01:45, Bjorn Helgaas wrote:
> On Wed, Jan 08, 2025 at 01:13:41AM +0200, Sergey Ryazanov wrote:
>> Generally looks good to me. Let's wait for approval from PCI maintainers
>> to be sure that there are no unexpected side effects.
>
> I have nothing useful to contribute here. Seems like kind of a mess.
> But Intel claims to maintain this, so it would be nice if they would
> step up and make this work nicely.

Suddenly, Intel lost interest in the modem market and, as Maciej
mentioned, the driver has been abandoned for quite some time now. The
author no longer works for Intel. You will see the bounce.

Bjorn, could you suggest an easy way to deal with a device that is
incapable of seamlessly recovering from hibernation? I am totally
hopeless regarding the PM topic. Or is a deep driver rework the only
option?

--
Sergey

[+cc Rafael, linux-pm because they *are* PM experts :)]

On Wed, Jan 08, 2025 at 02:15:28AM +0200, Sergey Ryazanov wrote:
> On 08.01.2025 01:45, Bjorn Helgaas wrote:
> > I have nothing useful to contribute here. Seems like kind of a
> > mess. But Intel claims to maintain this, so it would be nice if
> > they would step up and make this work nicely.
>
> Suddenly, Intel lost interest in the modem market and, as Maciej
> mentioned, the driver has been abandoned for quite some time now. The
> author no longer works for Intel. You will see the bounce.

Well, that's unfortunate :) Maybe step 0 is to remove the Intel
entry from MAINTAINERS for this driver.

> Bjorn, could you suggest an easy way to deal with a device that is
> incapable of seamlessly recovering from hibernation? I am totally
> hopeless regarding the PM topic. Or is a deep driver rework the only
> option?

I'm pretty PM-illiterate myself. Based on
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/pm/sleep-states.rst?id=v6.12#n109,
I assume that when we resume after hibernate, devices are in the same
state as after a fresh boot, i.e., the state driver .probe() methods
see.

So I assume that some combination of dev_pm_ops methods must be able
to do basically the same as .probe() to get the device usable again
after it was completely powered off and back on.

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/pm/devices.rst?id=v6.12#n506
mentions .freeze(), .thaw(), .restore(), etc, but the fact that few
drivers set those pointers and all the nice macros for setting pm ops
(SYSTEM_SLEEP_PM_OPS, NOIRQ_SYSTEM_SLEEP_PM_OPS, etc) only take
suspend and resume functions makes me think most drivers must handle
hibernation in the same .suspend() and .resume() functions they use
for non-hibernate transitions.

Since all drivers have to cope with devices needing to be
reinitialized after hibernate, I would look around to see how other
drivers do it and see if you can do it similarly.

Sorry this is still really a non-answer.

Bjorn

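The observation about the pm ops macros can be made concrete. To my understanding, SYSTEM_SLEEP_PM_OPS assigns the one suspend function to .suspend, .freeze and .poweroff, and the one resume function to .resume, .thaw and .restore. The following user-space sketch models only that fan-out: `struct model_pm_ops`, `MODEL_SYSTEM_SLEEP_PM_OPS` and the `model_*` callbacks are hypothetical names, and the real callbacks take a `struct device *` argument.

```c
/* Hypothetical user-space model of the six system-sleep callbacks in the
 * kernel's struct dev_pm_ops. The real structure has more members. */
struct model_pm_ops {
	int (*suspend)(void);
	int (*resume)(void);
	int (*freeze)(void);
	int (*thaw)(void);
	int (*poweroff)(void);
	int (*restore)(void);
};

/* Models the fan-out of the kernel helper macro: one suspend function
 * and one resume function cover all six callbacks. */
#define MODEL_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
	.suspend = suspend_fn, \
	.resume = resume_fn, \
	.freeze = suspend_fn, \
	.thaw = resume_fn, \
	.poweroff = suspend_fn, \
	.restore = resume_fn,

static int suspend_calls;
static int resume_calls;

/* Stand-in for a driver's suspend handler. */
static int model_suspend(void)
{
	suspend_calls++;
	return 0;
}

/* Stand-in for a driver's resume handler, e.g. one that only sends a
 * wake-up command to the device. */
static int model_resume(void)
{
	resume_calls++;
	return 0;
}

static const struct model_pm_ops model_ops = {
	MODEL_SYSTEM_SLEEP_PM_OPS(model_suspend, model_resume)
};
```

With this layout, invoking `model_ops.restore()` after a hibernation image is loaded runs the very same handler as a resume from suspend. A driver whose resume handler assumes the device was merely sleeping would therefore behave as described in the patch description unless it sets a separate .restore callback.
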
On 8.01.2025 20:51, Bjorn Helgaas wrote:
> I'm pretty PM-illiterate myself. Based on
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/pm/sleep-states.rst?id=v6.12#n109,
> I assume that when we resume after hibernate, devices are in the same
> state as after a fresh boot, i.e., the state driver .probe() methods
> see.
>
> So I assume that some combination of dev_pm_ops methods must be able
> to do basically the same as .probe() to get the device usable again
> after it was completely powered off and back on.

You are right that it should be theoretically possible to fix this issue
by re-initializing the driver in the hibernation restore/thaw callbacks,
and I even tried to do so at first.

But as I wrote in the patch description, doing so would need significant
refactoring of the driver, as it is not currently capable of being
partially de-initialized and re-initialized.

Hence this patch's approach of simply re-binding the driver, which also
seemed safer in the absence of any real programming docs for this hardware.

Thanks,
Maciej

On Wed, Jan 8, 2025 at 8:51 PM Bjorn Helgaas <helgaas@kernel.org> wrote:
> I'm pretty PM-illiterate myself. Based on
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/pm/sleep-states.rst?id=v6.12#n109,
> I assume that when we resume after hibernate, devices are in the same
> state as after a fresh boot, i.e., the state driver .probe() methods
> see.

Well, yes and no.

There are two kernels involved in resume from hibernation: the restore
kernel, which runs just like after a fresh boot except that at one point
in the boot process it attempts to load a hibernation image and (if
loading the image succeeds) jumps to the other kernel instance, the one
included in the image, referred to as the image kernel.

For the restore kernel, devices are in the same state as after a fresh
boot (generally speaking, with some rare exceptions that are irrelevant
here IMV), but for the image kernel, they are in whatever state the
restore kernel has put them into.

> So I assume that some combination of dev_pm_ops methods must be able
> to do basically the same as .probe() to get the device usable again
> after it was completely powered off and back on.

Yes, but if the restore kernel has a driver for the device in question,
it may well have initialized that device already. In that case, the
image kernel has much less to do to get the device to work again. The
caveat is that the image kernel doesn't really know whether or not this
has been the case.

> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/driver-api/pm/devices.rst?id=v6.12#n506
> mentions .freeze(), .thaw(), .restore(), etc, but the fact that few
> drivers set those pointers and all the nice macros for setting pm ops
> (SYSTEM_SLEEP_PM_OPS, NOIRQ_SYSTEM_SLEEP_PM_OPS, etc) only take
> suspend and resume functions makes me think most drivers must handle
> hibernation in the same .suspend() and .resume() functions they use
> for non-hibernate transitions.
>
> Since all drivers have to cope with devices needing to be
> reinitialized after hibernate, I would look around to see how other
> drivers do it and see if you can do it similarly.

This generally is good advice, but if the platform AML is defective, it
may not be sufficient.

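Since the image kernel cannot know what the restore kernel did to the device, one way a restore callback could cope is to query the device state first and choose a recovery path from it. The sketch below is a user-space illustration of that idea under stated assumptions, not the iosm driver's code: the `MODEL_STAGE_*` values loosely follow the execution stages named in the patch description, and `MODEL_STAGE_RUN`, the `MODEL_ACTION_*` values and `model_restore()` are hypothetical.

```c
/* Hypothetical model of the device execution stages. BOOT and CD_READY
 * follow the stages named in the patch description; RUN is an assumed
 * name for "firmware fully running". */
enum model_exec_stage {
	MODEL_STAGE_RUN,
	MODEL_STAGE_BOOT,
	MODEL_STAGE_CD_READY,
};

/* Hypothetical recovery paths a restore callback could choose from. */
enum model_restore_action {
	MODEL_ACTION_WAKE,	/* device is up, a wake-up command suffices */
	MODEL_ACTION_RELAUNCH,	/* device is in its bootloader, reload firmware */
	MODEL_ACTION_RESET,	/* device crashed, a hardware reset is needed */
};

/* Pick the recovery path from the observed stage instead of assuming
 * that the device was merely sleeping. */
static enum model_restore_action model_restore(enum model_exec_stage stage)
{
	switch (stage) {
	case MODEL_STAGE_RUN:
		/* The restore kernel had a driver and initialized the device. */
		return MODEL_ACTION_WAKE;
	case MODEL_STAGE_BOOT:
		/* Fresh power-on state: full re-launch required. */
		return MODEL_ACTION_RELAUNCH;
	case MODEL_STAGE_CD_READY:
	default:
		/* Crashed or unknown state: treat as needing a reset. */
		return MODEL_ACTION_RESET;
	}
}
```

The patch under discussion avoids having to implement the re-launch and reset paths inside the driver by unbinding it before hibernation and letting a fresh probe handle whatever state the device is in afterwards.
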
On Wed, Jan 8, 2025 at 9:04 PM Maciej S. Szmigiero
<mail@maciej.szmigiero.name> wrote:
> You are right that it should be theoretically possible to fix this issue
> by re-initializing the driver in the hibernation restore/thaw callbacks,
> and I even tried to do so at first.
>
> But as I wrote in the patch description, doing so would need significant
> refactoring of the driver, as it is not currently capable of being
> partially de-initialized and re-initialized.
>
> Hence this patch's approach of simply re-binding the driver, which also
> seemed safer in the absence of any real programming docs for this hardware.

While this may not be elegant, it may actually get the job done.

Can you please resend the patch with a CC to linux-pm@vger.kernel.org?

On 8.01.2025 21:17, Rafael J. Wysocki wrote:
> On Wed, Jan 8, 2025 at 9:04 PM Maciej S. Szmigiero
> <mail@maciej.szmigiero.name> wrote:
>>
>> On 8.01.2025 20:51, Bjorn Helgaas wrote:
>>> [+cc Rafael, linux-pm because they *are* PM experts :)]
>>>
>>> On Wed, Jan 08, 2025 at 02:15:28AM +0200, Sergey Ryazanov wrote:
>>>> On 08.01.2025 01:45, Bjorn Helgaas wrote:
>>>>> On Wed, Jan 08, 2025 at 01:13:41AM +0200, Sergey Ryazanov wrote:
>>>>>> On 05.01.2025 19:39, Maciej S. Szmigiero wrote:
>>>>>>> Currently, the driver is seriously broken with respect to the
>>>>>>> hibernation (S4): after image restore the device is back into
>>>>>>> IPC_MEM_EXEC_STAGE_BOOT (which AFAIK means bootloader stage) and needs
>>>>>>> full re-launch of the rest of its firmware, but the driver restore
>>>>>>> handler treats the device as merely sleeping and just sends it a
>>>>>>> wake-up command.
>>>>>>>
>>>>>>> This wake-up command times out but device nodes (/dev/wwan*) remain
>>>>>>> accessible.
>>>>>>> However attempting to use them causes the bootloader to crash and
>>>>>>> enter IPC_MEM_EXEC_STAGE_CD_READY stage (which apparently means "a crash
>>>>>>> dump is ready").
>>>>>>>
>>>>>>> It seems that the device cannot be re-initialized from this crashed
>>>>>>> stage without toggling some reset pin (on my test platform that's
>>>>>>> apparently what the device _RST ACPI method does).
>>>>>>>
>>>>>>> While it would theoretically be possible to rewrite the driver to tear
>>>>>>> down the whole MUX / IPC layers on hibernation (so the bootloader does
>>>>>>> not crash from improper access) and then re-launch the device on
>>>>>>> restore this would require significant refactoring of the driver
>>>>>>> (believe me, I've tried), since there are quite a few assumptions
>>>>>>> hard-coded in the driver about the device never being partially
>>>>>>> de-initialized (like channels other than devlink cannot be closed,
>>>>>>> for example).
>>>>>>> Probably this would also need some programming guide for this hardware.
>>>>>>>
>>>>>>> Considering that the driver seems orphaned [1] and other people are
>>>>>>> hitting this issue too [2] fix it by simply unbinding the PCI driver
>>>>>>> before hibernation and re-binding it after restore, much like
>>>>>>> USB_QUIRK_RESET_RESUME does for USB devices that exhibit a similar
>>>>>>> problem.
>>>>>>>
>>>>>>> Tested on XMM7360 in HP EliteBook 855 G7 both with s2idle (which uses
>>>>>>> the existing suspend / resume handlers) and S4 (which uses the new code).
>>>>>>>
>>>>>>> [1]: https://lore.kernel.org/all/c248f0b4-2114-4c61-905f-466a786bdebb@leemhuis.info/
>>>>>>> [2]:
>>>>>>> https://github.com/xmm7360/xmm7360-pci/issues/211#issuecomment-1804139413
>>>>>>>
>>>>>>> Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
>>>>>>
>>>>>> Generally looks good to me. Lets wait for approval from PCI
>>>>>> maintainers to be sure that there no unexpected side effects.
>>>>>
>>>>> I have nothing useful to contribute here. Seems like kind of a
>>>>> mess. But Intel claims to maintain this, so it would be nice if
>>>>> they would step up and make this work nicely.
>>>>
>>>> Suddenly, Intel lost their interest in the modems market and, as
>>>> Maciej mentioned, the driver was abandon for a quite time now. The
>>>> author no more works for Intel. You will see the bounce.
>>>
>>> Well, that's unfortunate :) Maybe step 0 is to remove the Intel
>>> entry from MAINTAINERS for this driver.
>>>
>>>> Bjorn, could you suggest how to deal easily with the device that is
>>>> incapable to seamlessly recover from hibernation? I am totally
>>>> hopeless regarding the PM topic. Or is the deep driver rework the
>>>> only option?
>>>
>>> I'm pretty PM-illiterate myself. Based on
>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/Documentation/admin-guide/pm/sleep-states.rst?id=v6.12#n109,
>>> I assume that when we resume after hibernate, devices are in the same
>>> state as after a fresh boot, i.e., the state driver .probe() methods
>>> see.
>>>
>>> So I assume that some combination of dev_pm_ops methods must be able
>>> to do basically the same as .probe() to get the device usable again
>>> after it was completely powered off and back on.
>>
>> You are right that it should be theoretically possible to fix this issue
>> by re-initializing the driver in the hibernation restore/thaw callbacks
>> and I even have tried to do so in the beginning.
>>
>> But as I wrote in this patch description, doing so would need significant
>> refactoring of the driver as it is not currently capable of being
>> de-initialized and re-initialized partially.
>>
>> Hence this patch approach of simply re-binding the driver which also
>> seemed safer in the absence of any real programming docs for this hardware.
>
> While this may not be elegant, it may actually get the job done.
>
> Can you please resend the patch with a CC to linux-pm@vger.kernel.org?

Will do.

Thanks,
Maciej
diff --git a/drivers/net/wwan/iosm/iosm_ipc_pcie.c b/drivers/net/wwan/iosm/iosm_ipc_pcie.c
index 04517bd3325a..3ca81864a2fd 100644
--- a/drivers/net/wwan/iosm/iosm_ipc_pcie.c
+++ b/drivers/net/wwan/iosm/iosm_ipc_pcie.c
@@ -6,6 +6,7 @@
 #include <linux/acpi.h>
 #include <linux/bitfield.h>
 #include <linux/module.h>
+#include <linux/suspend.h>
 #include <net/rtnetlink.h>
 
 #include "iosm_ipc_imem.h"
@@ -448,7 +449,61 @@ static struct pci_driver iosm_ipc_driver = {
 	},
 	.id_table = iosm_ipc_ids,
 };
-module_pci_driver(iosm_ipc_driver);
+
+static bool pci_registered;
+
+static int pm_notify(struct notifier_block *nb, unsigned long mode, void *_unused)
+{
+	if (mode == PM_HIBERNATION_PREPARE || mode == PM_RESTORE_PREPARE) {
+		if (pci_registered) {
+			pci_unregister_driver(&iosm_ipc_driver);
+			pci_registered = false;
+		}
+	} else if (mode == PM_POST_HIBERNATION || mode == PM_POST_RESTORE) {
+		if (!pci_registered) {
+			int ret;
+
+			ret = pci_register_driver(&iosm_ipc_driver);
+			if (ret) {
+				pr_err(KBUILD_MODNAME ": unable to re-register PCI driver: %d\n",
+				       ret);
+			} else {
+				pci_registered = true;
+			}
+		}
+	}
+
+	return 0;
+}
+
+static struct notifier_block pm_notifier = {
+	.notifier_call = pm_notify,
+};
+
+static int __init iosm_ipc_driver_init(void)
+{
+	int ret;
+
+	ret = pci_register_driver(&iosm_ipc_driver);
+	if (ret)
+		return ret;
+
+	pci_registered = true;
+
+	register_pm_notifier(&pm_notifier);
+
+	return 0;
+}
+module_init(iosm_ipc_driver_init);
+
+static void __exit iosm_ipc_driver_exit(void)
+{
+	unregister_pm_notifier(&pm_notifier);
+
+	if (pci_registered)
+		pci_unregister_driver(&iosm_ipc_driver);
+}
+module_exit(iosm_ipc_driver_exit);
 
 int ipc_pcie_addr_map(struct iosm_pcie *ipc_pcie, unsigned char *data,
 		      size_t size, dma_addr_t *mapping, int direction)
Currently, the driver is seriously broken with respect to the
hibernation (S4): after image restore the device is back into
IPC_MEM_EXEC_STAGE_BOOT (which AFAIK means bootloader stage) and needs
full re-launch of the rest of its firmware, but the driver restore
handler treats the device as merely sleeping and just sends it a
wake-up command.

This wake-up command times out but device nodes (/dev/wwan*) remain
accessible.
However attempting to use them causes the bootloader to crash and
enter IPC_MEM_EXEC_STAGE_CD_READY stage (which apparently means "a crash
dump is ready").

It seems that the device cannot be re-initialized from this crashed
stage without toggling some reset pin (on my test platform that's
apparently what the device _RST ACPI method does).

While it would theoretically be possible to rewrite the driver to tear
down the whole MUX / IPC layers on hibernation (so the bootloader does
not crash from improper access) and then re-launch the device on
restore this would require significant refactoring of the driver
(believe me, I've tried), since there are quite a few assumptions
hard-coded in the driver about the device never being partially
de-initialized (like channels other than devlink cannot be closed,
for example).
Probably this would also need some programming guide for this hardware.

Considering that the driver seems orphaned [1] and other people are
hitting this issue too [2] fix it by simply unbinding the PCI driver
before hibernation and re-binding it after restore, much like
USB_QUIRK_RESET_RESUME does for USB devices that exhibit a similar
problem.

Tested on XMM7360 in HP EliteBook 855 G7 both with s2idle (which uses
the existing suspend / resume handlers) and S4 (which uses the new code).

[1]: https://lore.kernel.org/all/c248f0b4-2114-4c61-905f-466a786bdebb@leemhuis.info/
[2]: https://github.com/xmm7360/xmm7360-pci/issues/211#issuecomment-1804139413

Signed-off-by: Maciej S. Szmigiero <mail@maciej.szmigiero.name>
---

Changes from v1:
* Un-register the PM-notifier and PCI driver in iosm_ipc_driver_exit()
  in the reverse order of their registration in iosm_ipc_driver_init().

* CC the PCI supporter and PCI mailing list in case there's some better
  way to fix/implement all of this.

 drivers/net/wwan/iosm/iosm_ipc_pcie.c | 57 ++++++++++++++++++++++++++-
 1 file changed, 56 insertions(+), 1 deletion(-)
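
For readers who want to follow the re-binding logic without a kernel tree at hand, here is a minimal userspace sketch of the state machine that the pm_notify() callback in this patch implements. It is only a model under stated assumptions: the enum values are arbitrary stand-ins for the PM notifier events from <linux/suspend.h>, and mock_pci_register_driver(), mock_pci_unregister_driver(), model_init() and the two counters are hypothetical helpers that exist only in this sketch. The asserts in the mocks would catch a double bind or double unbind, which is what the pci_registered flag is there to prevent.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Stand-ins for the kernel PM notifier events; the values are arbitrary. */
enum pm_event {
	PM_HIBERNATION_PREPARE,
	PM_POST_HIBERNATION,
	PM_SUSPEND_PREPARE,
	PM_POST_SUSPEND,
	PM_RESTORE_PREPARE,
	PM_POST_RESTORE,
};

static bool driver_bound;	/* state of the mocked PCI core */
static bool pci_registered;	/* mirrors the flag used in the patch */
static int bind_count;		/* how often the mock driver was bound */
static int unbind_count;	/* how often the mock driver was unbound */

/* Hypothetical mock of pci_register_driver(); always succeeds. */
static int mock_pci_register_driver(void)
{
	assert(!driver_bound);	/* binding twice would be a bug */
	driver_bound = true;
	bind_count++;
	return 0;
}

/* Hypothetical mock of pci_unregister_driver(). */
static void mock_pci_unregister_driver(void)
{
	assert(driver_bound);	/* unbinding twice would be a bug */
	driver_bound = false;
	unbind_count++;
}

/* Same branching as pm_notify() in the patch, minus the notifier_block. */
static int pm_notify(enum pm_event mode)
{
	if (mode == PM_HIBERNATION_PREPARE || mode == PM_RESTORE_PREPARE) {
		if (pci_registered) {
			mock_pci_unregister_driver();
			pci_registered = false;
		}
	} else if (mode == PM_POST_HIBERNATION || mode == PM_POST_RESTORE) {
		if (!pci_registered) {
			int ret = mock_pci_register_driver();

			if (ret)
				fprintf(stderr, "unable to re-register: %d\n", ret);
			else
				pci_registered = true;
		}
	}

	/* Suspend events (s2idle path) fall through untouched. */
	return 0;
}

/* Models iosm_ipc_driver_init(): reset the model, then bind once. */
static int model_init(void)
{
	int ret;

	driver_bound = false;
	pci_registered = false;
	bind_count = 0;
	unbind_count = 0;

	ret = mock_pci_register_driver();
	if (ret)
		return ret;

	pci_registered = true;
	return 0;
}
```

Calling model_init() and then feeding pm_notify() a PM_HIBERNATION_PREPARE followed by a PM_POST_HIBERNATION event unbinds and re-binds the mock driver exactly once, while PM_SUSPEND_PREPARE and PM_POST_SUSPEND leave it bound, matching the split between the new S4 path and the existing suspend / resume handlers described in the commit message.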