diff mbox series

[v9] platform/chrome: cros_ec_lpc: Move host command to prepare/complete

Message ID 20230515142552.1.I17cae37888be3a8683911991602f18e482e7a621@changeid (mailing list archive)
State Accepted
Commit 4b9abbc132b86e2ce65090798186836f48f33382
Headers show
Series [v9] platform/chrome: cros_ec_lpc: Move host command to prepare/complete | expand

Commit Message

Tim Van Patten May 15, 2023, 8:25 p.m. UTC
Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM
.prepare() and cros_ec_lpc_complete() during .complete(). This moves the
host command that the AP sends and allows the EC to log entry/exit of
AP's suspend/resume more accurately.

Changes in v9:
- Remove log statements.
- Ignore return value from cros_ec_resume().

Changes in v8:
- Resend with chromium.org account.
- No code changes.

Changes in v7:
- Rename "host event" to "host command" in title/description.

Changes in v6:
- Fully restore fixes from v3.

Changes in v5:
- Restore fixes from v3.

Changes in v4:
- Update title and description.

Changes in v3:
- Update cros_ec_lpc_suspend() to cros_ec_lpc_prepare()
- Update cros_ec_lpc_resume() to cros_ec_lpc_complete()

Changes in v2:
- Include cros_ec_resume() return value in dev_info() output.
- Guard setting .prepare/.complete with #ifdef CONFIG_PM_SLEEP.

Reviewed-by: Raul E Rangel <rrangel@chromium.org>
Signed-off-by: Tim Van Patten <timvp@chromium.org>
---

 drivers/platform/chrome/cros_ec_lpc.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

Comments

Tzung-Bi Shih May 17, 2023, 2:35 a.m. UTC | #1
On Mon, May 15, 2023 at 02:25:52PM -0600, Tim Van Patten wrote:
> Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM
> .prepare() and cros_ec_lpc_complete() during .complete(). This moves the
> host command that the AP sends and allows the EC to log entry/exit of
> AP's suspend/resume more accurately.

I can understand the patch wants to notify EC earlier/later when the system
suspend/resume.  But what is the issue addressed?  What happens if the
measurement of suspend/resume duration is not that accurate?

Copied from my previous mail:
* Should it move the callbacks?
  * Is it appropriate to call cros_ec_suspend() when PM is still in prepare
    phase and call cros_ec_resume() when PM is already in complete phase?

It seems prepare() is a more general callback.  It could be followed by
suspend(), freeze(), or poweroff()[1].  Do we expect the change?  For example,
the system is going to power off but EC gets notification about the system
should be going to suspend.  Same as complete().

Moreover, cros_ec_suspend() and cros_ec_resume() do more than just notify EC.
E.g. [2].

What about other interfaces (i2c, spi, uart)?  Do they also need to change
the callbacks?

[1]: https://elixir.bootlin.com/linux/v6.4-rc1/source/include/linux/pm.h#L74
[2]: https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/platform/chrome/cros_ec.c#L351

> Changes in v9:
> - Remove log statements.
> - Ignore return value from cros_ec_resume().

The change logs are not part of commit message.  They should put after "---".
Tim Van Patten May 17, 2023, 3:56 p.m. UTC | #2
Hi,

> I can understand the patch wants to notify EC earlier/later when the system
suspend/resume.  But what is the issue addressed?  What happens if the
measurement of suspend/resume duration is not that accurate?

Please see the following:
- b/206860487
- https://docs.google.com/document/d/1AgaZmG70bAKhZb-ZMbZT-TyY49zPoKuDDbD61dDBSTQ/edit?disco=AAAAws1enlw&usp_dm=false

The issue is that we need the EC aware of the AP being in the process
of suspend/resume from start to finish, so we can accurately
determine:
- How long the process took to better gauge we're meeting ChromeOS requirements.
- When the AP failed to complete the process, so we can collect data
and perform error recovery.

If the EC isn't monitoring the entire process, then:
- We get an inaccurate representation of when the suspend/resume
started/finished.
- The AP can fail during the gaps in monitoring and the EC will be
unable to detect and recover from it.

> It seems prepare() is a more general callback.  It could be followed by
suspend(), freeze(), or poweroff()[1].  Do we expect the change?  For example,
the system is going to power off but EC gets notification about the system
should be going to suspend.  Same as complete().

Please reference the implementation of SET_LATE_SYSTEM_SLEEP_PM_OPS
and see that we were already calling it in the poweroff path:

#define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
.suspend_late = suspend_fn, \
.resume_early = resume_fn, \
.freeze_late = suspend_fn, \
.thaw_early = resume_fn, \
.poweroff_late = suspend_fn, \  <<---- here
.restore_early = resume_fn,

* @poweroff_late: Continue operations started by @poweroff(). Analogous to
* @suspend_late(), but it need not save the device's settings in memory.

So, there is unlikely to be any functional difference with this change
in terms of poweroff.

> What about other interfaces (i2c, spi, uart)?  Do they also need to change
the callbacks?

We aren't concerned about those devices, because they aren't being
used on the devices we're seeing issues with. If devices using those
ECs want this change, they can pick it up as well, but we don't have
any way to test changes on those devices (whatever they may be).

On Tue, May 16, 2023 at 8:35 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Mon, May 15, 2023 at 02:25:52PM -0600, Tim Van Patten wrote:
> > Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM
> > .prepare() and cros_ec_lpc_complete() during .complete(). This moves the
> > host command that the AP sends and allows the EC to log entry/exit of
> > AP's suspend/resume more accurately.
>
> I can understand the patch wants to notify EC earlier/later when the system
> suspend/resume.  But what is the issue addressed?  What happens if the
> measurement of suspend/resume duration is not that accurate?
>
> Copied from my previous mail:
> * Should it move the callbacks?
>   * Is it appropriate to call cros_ec_suspend() when PM is still in prepare
>     phase and call cros_ec_resume() when PM is already in complete phase?
>
> It seems prepare() is a more general callback.  It could be followed by
> suspend(), freeze(), or poweroff()[1].  Do we expect the change?  For example,
> the system is going to power off but EC gets notification about the system
> should be going to suspend.  Same as complete().
>
> Moreover, cros_ec_suspend() and cros_ec_resume() do more than just notify EC.
> E.g. [2].
>
> What about other interfaces (i2c, spi, uart)?  Do they also need to change
> the callbacks?
>
> [1]: https://elixir.bootlin.com/linux/v6.4-rc1/source/include/linux/pm.h#L74
> [2]: https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/platform/chrome/cros_ec.c#L351
>
> > Changes in v9:
> > - Remove log statements.
> > - Ignore return value from cros_ec_resume().
>
> The change logs are not part of commit message.  They should put after "---".
Tzung-Bi Shih May 18, 2023, 1:38 a.m. UTC | #3
On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote:
> > I can understand the patch wants to notify EC earlier/later when the system
> suspend/resume.  But what is the issue addressed?  What happens if the
> measurement of suspend/resume duration is not that accurate?
> 
> Please see the following:
> - b/206860487
> - https://docs.google.com/document/d/1AgaZmG70bAKhZb-ZMbZT-TyY49zPoKuDDbD61dDBSTQ/edit?disco=AAAAws1enlw&usp_dm=false

I have no permission to access the doc.  Please put the context in the commit
message.  It's usually helpful if you could put the corresponding EC FW
changes.

> The issue is that we need the EC aware of the AP being in the process
> of suspend/resume from start to finish, so we can accurately
> determine:
> - How long the process took to better gauge we're meeting ChromeOS requirements.
> - When the AP failed to complete the process, so we can collect data
> and perform error recovery.

Is it a new feature?  How could the *error* recovery do?

> > It seems prepare() is a more general callback.  It could be followed by
> suspend(), freeze(), or poweroff()[1].  Do we expect the change?  For example,
> the system is going to power off but EC gets notification about the system
> should be going to suspend.  Same as complete().
> 
> Please reference the implementation of SET_LATE_SYSTEM_SLEEP_PM_OPS
> and see that we were already calling it in the poweroff path:
> 
> #define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \
> .suspend_late = suspend_fn, \
> .resume_early = resume_fn, \
> .freeze_late = suspend_fn, \
> .thaw_early = resume_fn, \
> .poweroff_late = suspend_fn, \  <<---- here
> .restore_early = resume_fn,
> 
> * @poweroff_late: Continue operations started by @poweroff(). Analogous to
> * @suspend_late(), but it need not save the device's settings in memory.
> 
> So, there is unlikely to be any functional difference with this change
> in terms of poweroff.

I see.  There is still slightly change in disabling/enabling IRQ[2].  But I
think it would be fine as long as they are symmetric.

[2]: https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/platform/chrome/cros_ec.c#L351

> > What about other interfaces (i2c, spi, uart)?  Do they also need to change
> the callbacks?
> 
> We aren't concerned about those devices, because they aren't being
> used on the devices we're seeing issues with. If devices using those
> ECs want this change, they can pick it up as well, but we don't have
> any way to test changes on those devices (whatever they may be).

This doesn't sound good.  As I would suppose you are adding some new EC FW
features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the
existing systems too.

What happens if a system uses older kernel (without this patch) to
communicate with new EC FW via LPC?

How about adding a new EC host command for your purpose so that it won't
affect the existing systems?  I knew this was discussed in some older series
but I didn't follow the thread.

> On Tue, May 16, 2023 at 8:35 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> >
> > On Mon, May 15, 2023 at 02:25:52PM -0600, Tim Van Patten wrote:
> > > Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM
> > > .prepare() and cros_ec_lpc_complete() during .complete(). This moves the
> > > host command that the AP sends and allows the EC to log entry/exit of
> > > AP's suspend/resume more accurately.
> >
> > I can understand the patch wants to notify EC earlier/later when the system
[...]

Please don't top-posting next time.
Tim Van Patten May 18, 2023, 4:47 p.m. UTC | #4
On Wed, May 17, 2023 at 7:38 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote:
> > > I can understand the patch wants to notify EC earlier/later when the system
> > suspend/resume.  But what is the issue addressed?  What happens if the
> > measurement of suspend/resume duration is not that accurate?
> >
> > Please see the following:
> > - b/206860487
> > - https://docs.google.com/document/d/1AgaZmG70bAKhZb-ZMbZT-TyY49zPoKuDDbD61dDBSTQ/edit?disco=AAAAws1enlw&usp_dm=false
>
> I have no permission to access the doc.

Since you work at Google, please request permission to access it, so
the owners can grant it.

> Please put the context in the commit
> message.

Done in the next patchset.

> It's usually helpful if you could put the corresponding EC FW
> changes.

Why are you assuming there is EC FW with this change? Any and all of
the EC changes related to this have (obviously) landed long ago.

> > The issue is that we need the EC aware of the AP being in the process
> > of suspend/resume from start to finish, so we can accurately
> > determine:
> > - How long the process took to better gauge we're meeting ChromeOS requirements.
> > - When the AP failed to complete the process, so we can collect data
> > and perform error recovery.
>
> Is it a new feature?

No, it's not.

> How could the *error* recovery do?

I don't understand what this is asking.

> > > What about other interfaces (i2c, spi, uart)?  Do they also need to change
> > the callbacks?
> >
> > We aren't concerned about those devices, because they aren't being
> > used on the devices we're seeing issues with. If devices using those
> > ECs want this change, they can pick it up as well, but we don't have
> > any way to test changes on those devices (whatever they may be).
>
> This doesn't sound good.  As I would suppose you are adding some new EC FW
> features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the
> existing systems too.

Again, why are you assuming there is new EC FW for this? This is only
changing when an already-existing host command is being sent. Nothing
is being added or removed.

> What happens if a system uses older kernel (without this patch) to
> communicate with new EC FW via LPC?

The message is being sent regardless of whether this patch lands or
not. This patch just changes when they are sent, so there is no risk
from that perspective.

> How about adding a new EC host command for your purpose so that it won't
> affect the existing systems?  I knew this was discussed in some older series
> but I didn't follow the thread.
>

No. The necessary host command already exists and is being sent. There
is no additional command being sent with this change. It is only
changing when the command is being sent.
Tzung-Bi Shih May 19, 2023, 1:55 a.m. UTC | #5
On Thu, May 18, 2023 at 10:47:23AM -0600, Tim Van Patten wrote:
> On Wed, May 17, 2023 at 7:38 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote:
> > > The issue is that we need the EC aware of the AP being in the process
> > > of suspend/resume from start to finish, so we can accurately
> > > determine:
> > > - How long the process took to better gauge we're meeting ChromeOS requirements.
> > > - When the AP failed to complete the process, so we can collect data
> > > and perform error recovery.
[...]
> > How could the *error* recovery do?
> 
> I don't understand what this is asking.

Given that you said "we can collect data and perform error recovery" if the
suspend-resume takes more/less time than expected.  I'm trying to understand
what does "error recovery" mean.  What recovery it could take?

> > > > What about other interfaces (i2c, spi, uart)?  Do they also need to change
> > > the callbacks?
> > >
> > > We aren't concerned about those devices, because they aren't being
> > > used on the devices we're seeing issues with. If devices using those
> > > ECs want this change, they can pick it up as well, but we don't have
> > > any way to test changes on those devices (whatever they may be).
> >
> > This doesn't sound good.  As I would suppose you are adding some new EC FW
> > features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the
> > existing systems too.
> 
> Again, why are you assuming there is new EC FW for this? This is only
> changing when an already-existing host command is being sent. Nothing
> is being added or removed.

I see.  There is no EC changes.

Specifically, do you see any crashes, or premature events, or mal-functions
regarding to the measurement is not that accurate?

Also, we wouldn't want it to be LPC-specialized.  Please consider other
interfaces.
Tim Van Patten May 19, 2023, 3:32 p.m. UTC | #6
On Thu, May 18, 2023 at 7:55 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Thu, May 18, 2023 at 10:47:23AM -0600, Tim Van Patten wrote:
> > On Wed, May 17, 2023 at 7:38 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > > On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote:
> > > > The issue is that we need the EC aware of the AP being in the process
> > > > of suspend/resume from start to finish, so we can accurately
> > > > determine:
> > > > - How long the process took to better gauge we're meeting ChromeOS requirements.
> > > > - When the AP failed to complete the process, so we can collect data
> > > > and perform error recovery.
> [...]
> > > How could the *error* recovery do?
> >
> > I don't understand what this is asking.
>
> Given that you said "we can collect data and perform error recovery" if the
> suspend-resume takes more/less time than expected.  I'm trying to understand
> what does "error recovery" mean.  What recovery it could take?

Currently, for AMD, the EC will trigger data collection and either
reset the AP or send a host event to attempt to trigger the AP into
it's own recovery. Intel is looking into adding error recovery as
well.

> > > > > What about other interfaces (i2c, spi, uart)?  Do they also need to change
> > > > the callbacks?
> > > >
> > > > We aren't concerned about those devices, because they aren't being
> > > > used on the devices we're seeing issues with. If devices using those
> > > > ECs want this change, they can pick it up as well, but we don't have
> > > > any way to test changes on those devices (whatever they may be).
> > >
> > > This doesn't sound good.  As I would suppose you are adding some new EC FW
> > > features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the
> > > existing systems too.
> >
> > Again, why are you assuming there is new EC FW for this? This is only
> > changing when an already-existing host command is being sent. Nothing
> > is being added or removed.
>
> I see.  There is no EC changes.
>
> Specifically, do you see any crashes, or premature events, or mal-functions
> regarding to the measurement is not that accurate?

No, we don't see any issues with the current or new timings of sending
these commands.

> Also, we wouldn't want it to be LPC-specialized.  Please consider other
> interfaces.

I'm intentionally excluding those interfaces because we don't have any
way to test/validate this change on those devices. Additionally, we
don't have any devices that use this suspend/resume tracking logic
using those interfaces, so there's no reason to introduce any risk by
changing devices that don't need it.
patchwork-bot+chrome-platform@kernel.org May 22, 2023, 2 a.m. UTC | #7
Hello:

This patch was applied to chrome-platform/linux.git (for-kernelci)
by Tzung-Bi Shih <tzungbi@kernel.org>:

On Mon, 15 May 2023 14:25:52 -0600 you wrote:
> Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM
> .prepare() and cros_ec_lpc_complete() during .complete(). This moves the
> host command that the AP sends and allows the EC to log entry/exit of
> AP's suspend/resume more accurately.
> 
> Changes in v9:
> - Remove log statements.
> - Ignore return value from cros_ec_resume().
> 
> [...]

Here is the summary with links:
  - [v9] platform/chrome: cros_ec_lpc: Move host command to prepare/complete
    https://git.kernel.org/chrome-platform/c/4b9abbc132b8

You are awesome, thank you!
patchwork-bot+chrome-platform@kernel.org May 22, 2023, 8:10 a.m. UTC | #8
Hello:

This patch was applied to chrome-platform/linux.git (for-next)
by Tzung-Bi Shih <tzungbi@kernel.org>:

On Mon, 15 May 2023 14:25:52 -0600 you wrote:
> Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM
> .prepare() and cros_ec_lpc_complete() during .complete(). This moves the
> host command that the AP sends and allows the EC to log entry/exit of
> AP's suspend/resume more accurately.
> 
> Changes in v9:
> - Remove log statements.
> - Ignore return value from cros_ec_resume().
> 
> [...]

Here is the summary with links:
  - [v9] platform/chrome: cros_ec_lpc: Move host command to prepare/complete
    https://git.kernel.org/chrome-platform/c/4b9abbc132b8

You are awesome, thank you!
diff mbox series

Patch

diff --git a/drivers/platform/chrome/cros_ec_lpc.c b/drivers/platform/chrome/cros_ec_lpc.c
index 68bba0fcafab3..2318eccd9c6ad 100644
--- a/drivers/platform/chrome/cros_ec_lpc.c
+++ b/drivers/platform/chrome/cros_ec_lpc.c
@@ -543,23 +543,25 @@  static const struct dmi_system_id cros_ec_lpc_dmi_table[] __initconst = {
 MODULE_DEVICE_TABLE(dmi, cros_ec_lpc_dmi_table);
 
 #ifdef CONFIG_PM_SLEEP
-static int cros_ec_lpc_suspend(struct device *dev)
+static int cros_ec_lpc_prepare(struct device *dev)
 {
 	struct cros_ec_device *ec_dev = dev_get_drvdata(dev);
 
 	return cros_ec_suspend(ec_dev);
 }
 
-static int cros_ec_lpc_resume(struct device *dev)
+static void cros_ec_lpc_complete(struct device *dev)
 {
 	struct cros_ec_device *ec_dev = dev_get_drvdata(dev);
-
-	return cros_ec_resume(ec_dev);
+	cros_ec_resume(ec_dev);
 }
 #endif
 
 static const struct dev_pm_ops cros_ec_lpc_pm_ops = {
-	SET_LATE_SYSTEM_SLEEP_PM_OPS(cros_ec_lpc_suspend, cros_ec_lpc_resume)
+#ifdef CONFIG_PM_SLEEP
+	.prepare = cros_ec_lpc_prepare,
+	.complete = cros_ec_lpc_complete
+#endif
 };
 
 static struct platform_driver cros_ec_lpc_driver = {