Message ID | 20240523162207.1.I2395e66cf70c6e67d774c56943825c289b9c13e4@changeid (mailing list archive) |
---|---|
State | Not Applicable |
Headers | show |
Series | serial: Fix problems when serial transfer is happening at suspend time | expand |
Quoting Douglas Anderson (2024-05-23 16:22:12) > Recently, suspend testing on sc7180-trogdor based devices has started > to sometimes fail with messages like this: > > port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0 > port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16 > port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs > port a88000.serial:0.0: PM: failed to suspend: error -16 > > I could reproduce these problem by logging in via an agetty on the > debug serial port (which was _not_ used for kernel console) and > running: > cat /var/log/messages > ...and then (via an SSH session) forcing a few suspend/resume cycles. > > Tracing through the code and doing some printf debugging shows that > the -16 (-EBUSY) comes from the recently added > serial_port_runtime_suspend(). > > The idea of the serial_port_runtime_suspend() function is to prevent > the port from being _runtime_ suspended if it still has bytes left to > transmit. Having bytes left to transmit isn't a reason to block > _system_ suspend, though. Can you elaborate? I paused to think that maybe we would want to make sure that everything that was transmitted had been transmitted but that doesn't seem right because it's a problem for higher layers to solve, e.g. serdev would want to make sure some sleep command sent over the wire actually got sent. > The DEFINE_RUNTIME_DEV_PM_OPS() used by the > serial_port code means that the system suspend function will be > pm_runtime_force_suspend(). In pm_runtime_force_suspend() we can see > that before calling the runtime suspend function we'll call > pm_runtime_disable(). This should be a reliable way to detect that > we're called from system suspend and that we shouldn't look for > busyness. > > Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") > Signed-off-by: Douglas Anderson <dianders@chromium.org> > --- > > drivers/tty/serial/serial_port.c | 10 ++++++++++ > 1 file changed, 10 insertions(+) > > diff --git a/drivers/tty/serial/serial_port.c b/drivers/tty/serial/serial_port.c > index 91a338d3cb34..b781227cc996 100644 > --- a/drivers/tty/serial/serial_port.c > +++ b/drivers/tty/serial/serial_port.c > @@ -64,6 +64,16 @@ static int serial_port_runtime_suspend(struct device *dev) > if (port->flags & UPF_DEAD) > return 0; > > + /* > + * We only want to check the busyness of the port if PM Runtime is > + * enabled. Specifically PM Runtime will be disabled by > + * pm_runtime_force_suspend() during system suspend and we don't want > + * to block system suspend even if there is data still left to > + * transmit. We only want to block regulator PM Runtime transitions. s/regulator/regular/ Is this a typo? Also, why is "runtime" capitalized? > + */ > + if (!pm_runtime_enabled(dev)) > + return 0;
Hi, On Thu, May 23, 2024 at 5:25 PM Stephen Boyd <swboyd@chromium.org> wrote: > > Quoting Douglas Anderson (2024-05-23 16:22:12) > > Recently, suspend testing on sc7180-trogdor based devices has started > > to sometimes fail with messages like this: > > > > port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0 > > port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16 > > port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs > > port a88000.serial:0.0: PM: failed to suspend: error -16 > > > > I could reproduce these problem by logging in via an agetty on the > > debug serial port (which was _not_ used for kernel console) and > > running: > > cat /var/log/messages > > ...and then (via an SSH session) forcing a few suspend/resume cycles. > > > > Tracing through the code and doing some printf debugging shows that > > the -16 (-EBUSY) comes from the recently added > > serial_port_runtime_suspend(). > > > > The idea of the serial_port_runtime_suspend() function is to prevent > > the port from being _runtime_ suspended if it still has bytes left to > > transmit. Having bytes left to transmit isn't a reason to block > > _system_ suspend, though. > > Can you elaborate? I paused to think that maybe we would want to make > sure that everything that was transmitted had been transmitted but that > doesn't seem right because it's a problem for higher layers to solve, > e.g. serdev would want to make sure some sleep command sent over the > wire actually got sent. I don't have a ton of intuition about how the new "serial_port" code is designed to work. The patch that added it mentioned that it was based on suggestions by a whole bunch of people but there were no links to the previous discussions so it wasn't easy to get tons of context. I would certainly love it if people who have been involved could chime in and say whether my patch is correct. If anyone has any suggestions for something better to say in the commit message I'm happy to use different wording as well. In any case, looking at it I guess a serdev device would use serdev_device_wait_until_sent() to ensure its command was sent. That eventually seems to trickle down to the UART function tx_empty(). If a serdev device needs to block suspend while waiting then that would be up to the serdev device. This could be implicit if the serdev_device_wait_until_sent() was being called as part of the serdev device's suspend() function or perhaps could be through some sort of locking. ...so really I think the case we're running into is if userspace has a whole bunch of bytes queued up to write out the UART. That shouldn't block suspend. If userspace needs to block suspend they should use another method. > > The DEFINE_RUNTIME_DEV_PM_OPS() used by the > > serial_port code means that the system suspend function will be > > pm_runtime_force_suspend(). In pm_runtime_force_suspend() we can see > > that before calling the runtime suspend function we'll call > > pm_runtime_disable(). This should be a reliable way to detect that > > we're called from system suspend and that we shouldn't look for > > busyness. > > > > Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") > > Signed-off-by: Douglas Anderson <dianders@chromium.org> > > --- > > > > drivers/tty/serial/serial_port.c | 10 ++++++++++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/drivers/tty/serial/serial_port.c b/drivers/tty/serial/serial_port.c > > index 91a338d3cb34..b781227cc996 100644 > > --- a/drivers/tty/serial/serial_port.c > > +++ b/drivers/tty/serial/serial_port.c > > @@ -64,6 +64,16 @@ static int serial_port_runtime_suspend(struct device *dev) > > if (port->flags & UPF_DEAD) > > return 0; > > > > + /* > > + * We only want to check the busyness of the port if PM Runtime is > > + * enabled. Specifically PM Runtime will be disabled by > > + * pm_runtime_force_suspend() during system suspend and we don't want > > + * to block system suspend even if there is data still left to > > + * transmit. We only want to block regulator PM Runtime transitions. > > s/regulator/regular/ > > Is this a typo? Yeah, I'll fix that to "regular". I'll wait (probably till Tuesday) before sending a v2. > Also, why is "runtime" capitalized? Somehow I thought it was supposed to be, but you're right that it looks weird. I guess the docs call it "runtime PM" so I'll change all instances to that in v2. -Doug
On Thu, May 23, 2024 at 04:22:12PM -0700, Douglas Anderson wrote: > Recently, suspend testing on sc7180-trogdor based devices has started > to sometimes fail with messages like this: > > port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0 > port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16 > port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs > port a88000.serial:0.0: PM: failed to suspend: error -16 > > I could reproduce these problem by logging in via an agetty on the > debug serial port (which was _not_ used for kernel console) and > running: > cat /var/log/messages > ...and then (via an SSH session) forcing a few suspend/resume cycles. > > Tracing through the code and doing some printf debugging shows that printf() ...or... printf()-based > the -16 (-EBUSY) comes from the recently added > serial_port_runtime_suspend(). > > The idea of the serial_port_runtime_suspend() function is to prevent > the port from being _runtime_ suspended if it still has bytes left to > transmit. Having bytes left to transmit isn't a reason to block > _system_ suspend, though. The DEFINE_RUNTIME_DEV_PM_OPS() used by the > serial_port code means that the system suspend function will be > pm_runtime_force_suspend(). In pm_runtime_force_suspend() we can see > that before calling the runtime suspend function we'll call > pm_runtime_disable(). This should be a reliable way to detect that > we're called from system suspend and that we shouldn't look for > busyness. ... > + /* > + * We only want to check the busyness of the port if PM Runtime is > + * enabled. Specifically PM Runtime will be disabled by > + * pm_runtime_force_suspend() during system suspend and we don't want > + * to block system suspend even if there is data still left to > + * transmit. We only want to block regulator PM Runtime transitions. regular > + */ > + if (!pm_runtime_enabled(dev)) > + return 0;
diff --git a/drivers/tty/serial/serial_port.c b/drivers/tty/serial/serial_port.c index 91a338d3cb34..b781227cc996 100644 --- a/drivers/tty/serial/serial_port.c +++ b/drivers/tty/serial/serial_port.c @@ -64,6 +64,16 @@ static int serial_port_runtime_suspend(struct device *dev) if (port->flags & UPF_DEAD) return 0; + /* + * We only want to check the busyness of the port if PM Runtime is + * enabled. Specifically PM Runtime will be disabled by + * pm_runtime_force_suspend() during system suspend and we don't want + * to block system suspend even if there is data still left to + * transmit. We only want to block regulator PM Runtime transitions. + */ + if (!pm_runtime_enabled(dev)) + return 0; + uart_port_lock_irqsave(port, &flags); if (!port_dev->tx_enabled) { uart_port_unlock_irqrestore(port, flags);
Recently, suspend testing on sc7180-trogdor based devices has started to sometimes fail with messages like this: port a88000.serial:0.0: PM: calling pm_runtime_force_suspend+0x0/0xf8 @ 28934, parent: a88000.serial:0 port a88000.serial:0.0: PM: dpm_run_callback(): pm_runtime_force_suspend+0x0/0xf8 returns -16 port a88000.serial:0.0: PM: pm_runtime_force_suspend+0x0/0xf8 returned -16 after 33 usecs port a88000.serial:0.0: PM: failed to suspend: error -16 I could reproduce these problem by logging in via an agetty on the debug serial port (which was _not_ used for kernel console) and running: cat /var/log/messages ...and then (via an SSH session) forcing a few suspend/resume cycles. Tracing through the code and doing some printf debugging shows that the -16 (-EBUSY) comes from the recently added serial_port_runtime_suspend(). The idea of the serial_port_runtime_suspend() function is to prevent the port from being _runtime_ suspended if it still has bytes left to transmit. Having bytes left to transmit isn't a reason to block _system_ suspend, though. The DEFINE_RUNTIME_DEV_PM_OPS() used by the serial_port code means that the system suspend function will be pm_runtime_force_suspend(). In pm_runtime_force_suspend() we can see that before calling the runtime suspend function we'll call pm_runtime_disable(). This should be a reliable way to detect that we're called from system suspend and that we shouldn't look for busyness. Fixes: 43066e32227e ("serial: port: Don't suspend if the port is still busy") Signed-off-by: Douglas Anderson <dianders@chromium.org> --- drivers/tty/serial/serial_port.c | 10 ++++++++++ 1 file changed, 10 insertions(+)