Message ID | 20230515142552.1.I17cae37888be3a8683911991602f18e482e7a621@changeid (mailing list archive) |
---|---|
State | Accepted |
Commit | 4b9abbc132b86e2ce65090798186836f48f33382 |
Headers | show |
Series | [v9] platform/chrome: cros_ec_lpc: Move host command to prepare/complete | expand |
On Mon, May 15, 2023 at 02:25:52PM -0600, Tim Van Patten wrote: > Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM > .prepare() and cros_ec_lpc_complete() during .complete(). This moves the > host command that the AP sends and allows the EC to log entry/exit of > AP's suspend/resume more accurately. I can understand the patch wants to notify EC earlier/later when the system suspend/resume. But what is the issue addressed? What happens if the measurement of suspend/resume duration is not that accurate? Copied from my previous mail: * Should it move the callbacks? * Is it appropriate to call cros_ec_suspend() when PM is still in prepare phase and call cros_ec_resume() when PM is already in complete phase? It seems prepare() is a more general callback. It could be followed by suspend(), freeze(), or poweroff()[1]. Do we expect the change? For example, the system is going to power off but EC gets notification about the system should be going to suspend. Same as complete(). Moreover, cros_ec_suspend() and cros_ec_resume() do more than just notify EC. E.g. [2]. What about other interfaces (i2c, spi, uart)? Do they also need to change the callbacks? [1]: https://elixir.bootlin.com/linux/v6.4-rc1/source/include/linux/pm.h#L74 [2]: https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/platform/chrome/cros_ec.c#L351 > Changes in v9: > - Remove log statements. > - Ignore return value from cros_ec_resume(). The change logs are not part of commit message. They should put after "---".
Hi, > I can understand the patch wants to notify EC earlier/later when the system suspend/resume. But what is the issue addressed? What happens if the measurement of suspend/resume duration is not that accurate? Please see the following: - b/206860487 - https://docs.google.com/document/d/1AgaZmG70bAKhZb-ZMbZT-TyY49zPoKuDDbD61dDBSTQ/edit?disco=AAAAws1enlw&usp_dm=false The issue is that we need the EC aware of the AP being in the process of suspend/resume from start to finish, so we can accurately determine: - How long the process took to better gauge we're meeting ChromeOS requirements. - When the AP failed to complete the process, so we can collect data and perform error recovery. If the EC isn't monitoring the entire process, then: - We get an inaccurate representation of when the suspend/resume started/finished. - The AP can fail during the gaps in monitoring and the EC will be unable to detect and recover from it. > It seems prepare() is a more general callback. It could be followed by suspend(), freeze(), or poweroff()[1]. Do we expect the change? For example, the system is going to power off but EC gets notification about the system should be going to suspend. Same as complete(). Please reference the implementation of SET_LATE_SYSTEM_SLEEP_PM_OPS and see that we were already calling it in the poweroff path: #define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \ .suspend_late = suspend_fn, \ .resume_early = resume_fn, \ .freeze_late = suspend_fn, \ .thaw_early = resume_fn, \ .poweroff_late = suspend_fn, \ <<---- here .restore_early = resume_fn, * @poweroff_late: Continue operations started by @poweroff(). Analogous to * @suspend_late(), but it need not save the device's settings in memory. So, there is unlikely to be any functional difference with this change in terms of poweroff. > What about other interfaces (i2c, spi, uart)? Do they also need to change the callbacks? We aren't concerned about those devices, because they aren't being used on the devices we're seeing issues with. If devices using those ECs want this change, they can pick it up as well, but we don't have any way to test changes on those devices (whatever they may be). On Tue, May 16, 2023 at 8:35 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Mon, May 15, 2023 at 02:25:52PM -0600, Tim Van Patten wrote: > > Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM > > .prepare() and cros_ec_lpc_complete() during .complete(). This moves the > > host command that the AP sends and allows the EC to log entry/exit of > > AP's suspend/resume more accurately. > > I can understand the patch wants to notify EC earlier/later when the system > suspend/resume. But what is the issue addressed? What happens if the > measurement of suspend/resume duration is not that accurate? > > Copied from my previous mail: > * Should it move the callbacks? > * Is it appropriate to call cros_ec_suspend() when PM is still in prepare > phase and call cros_ec_resume() when PM is already in complete phase? > > It seems prepare() is a more general callback. It could be followed by > suspend(), freeze(), or poweroff()[1]. Do we expect the change? For example, > the system is going to power off but EC gets notification about the system > should be going to suspend. Same as complete(). > > Moreover, cros_ec_suspend() and cros_ec_resume() do more than just notify EC. > E.g. [2]. > > What about other interfaces (i2c, spi, uart)? Do they also need to change > the callbacks? > > [1]: https://elixir.bootlin.com/linux/v6.4-rc1/source/include/linux/pm.h#L74 > [2]: https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/platform/chrome/cros_ec.c#L351 > > > Changes in v9: > > - Remove log statements. > > - Ignore return value from cros_ec_resume(). > > The change logs are not part of commit message. They should put after "---".
On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote: > > I can understand the patch wants to notify EC earlier/later when the system > suspend/resume. But what is the issue addressed? What happens if the > measurement of suspend/resume duration is not that accurate? > > Please see the following: > - b/206860487 > - https://docs.google.com/document/d/1AgaZmG70bAKhZb-ZMbZT-TyY49zPoKuDDbD61dDBSTQ/edit?disco=AAAAws1enlw&usp_dm=false I have no permission to access the doc. Please put the context in the commit message. It's usually helpful if you could put the corresponding EC FW changes. > The issue is that we need the EC aware of the AP being in the process > of suspend/resume from start to finish, so we can accurately > determine: > - How long the process took to better gauge we're meeting ChromeOS requirements. > - When the AP failed to complete the process, so we can collect data > and perform error recovery. Is it a new feature? How could the *error* recovery do? > > It seems prepare() is a more general callback. It could be followed by > suspend(), freeze(), or poweroff()[1]. Do we expect the change? For example, > the system is going to power off but EC gets notification about the system > should be going to suspend. Same as complete(). > > Please reference the implementation of SET_LATE_SYSTEM_SLEEP_PM_OPS > and see that we were already calling it in the poweroff path: > > #define SET_LATE_SYSTEM_SLEEP_PM_OPS(suspend_fn, resume_fn) \ > .suspend_late = suspend_fn, \ > .resume_early = resume_fn, \ > .freeze_late = suspend_fn, \ > .thaw_early = resume_fn, \ > .poweroff_late = suspend_fn, \ <<---- here > .restore_early = resume_fn, > > * @poweroff_late: Continue operations started by @poweroff(). Analogous to > * @suspend_late(), but it need not save the device's settings in memory. > > So, there is unlikely to be any functional difference with this change > in terms of poweroff. I see. There is still slightly change in disabling/enabling IRQ[2]. But I think it would be fine as long as they are symmetric. [2]: https://elixir.bootlin.com/linux/v6.4-rc1/source/drivers/platform/chrome/cros_ec.c#L351 > > What about other interfaces (i2c, spi, uart)? Do they also need to change > the callbacks? > > We aren't concerned about those devices, because they aren't being > used on the devices we're seeing issues with. If devices using those > ECs want this change, they can pick it up as well, but we don't have > any way to test changes on those devices (whatever they may be). This doesn't sound good. As I would suppose you are adding some new EC FW features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the existing systems too. What happens if a system uses older kernel (without this patch) to communicate with new EC FW via LPC? How about adding a new EC host command for your purpose so that it won't affect the existing systems? I knew this was discussed in some older series but I didn't follow the thread. > On Tue, May 16, 2023 at 8:35 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > > On Mon, May 15, 2023 at 02:25:52PM -0600, Tim Van Patten wrote: > > > Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM > > > .prepare() and cros_ec_lpc_complete() during .complete(). This moves the > > > host command that the AP sends and allows the EC to log entry/exit of > > > AP's suspend/resume more accurately. > > > > I can understand the patch wants to notify EC earlier/later when the system [...] Please don't top-posting next time.
On Wed, May 17, 2023 at 7:38 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote: > > > I can understand the patch wants to notify EC earlier/later when the system > > suspend/resume. But what is the issue addressed? What happens if the > > measurement of suspend/resume duration is not that accurate? > > > > Please see the following: > > - b/206860487 > > - https://docs.google.com/document/d/1AgaZmG70bAKhZb-ZMbZT-TyY49zPoKuDDbD61dDBSTQ/edit?disco=AAAAws1enlw&usp_dm=false > > I have no permission to access the doc. Since you work at Google, please request permission to access it, so the owners can grant it. > Please put the context in the commit > message. Done in the next patchset. > It's usually helpful if you could put the corresponding EC FW > changes. Why are you assuming there is EC FW with this change? Any and all of the EC changes related to this have (obviously) landed long ago. > > The issue is that we need the EC aware of the AP being in the process > > of suspend/resume from start to finish, so we can accurately > > determine: > > - How long the process took to better gauge we're meeting ChromeOS requirements. > > - When the AP failed to complete the process, so we can collect data > > and perform error recovery. > > Is it a new feature? No, it's not. > How could the *error* recovery do? I don't understand what this is asking. > > > What about other interfaces (i2c, spi, uart)? Do they also need to change > > the callbacks? > > > > We aren't concerned about those devices, because they aren't being > > used on the devices we're seeing issues with. If devices using those > > ECs want this change, they can pick it up as well, but we don't have > > any way to test changes on those devices (whatever they may be). > > This doesn't sound good. As I would suppose you are adding some new EC FW > features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the > existing systems too. Again, why are you assuming there is new EC FW for this? This is only changing when an already-existing host command is being sent. Nothing is being added or removed. > What happens if a system uses older kernel (without this patch) to > communicate with new EC FW via LPC? The message is being sent regardless of whether this patch lands or not. This patch just changes when they are sent, so there is no risk from that perspective. > How about adding a new EC host command for your purpose so that it won't > affect the existing systems? I knew this was discussed in some older series > but I didn't follow the thread. > No. The necessary host command already exists and is being sent. There is no additional command being sent with this change. It is only changing when the command is being sent.
On Thu, May 18, 2023 at 10:47:23AM -0600, Tim Van Patten wrote: > On Wed, May 17, 2023 at 7:38 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote: > > > The issue is that we need the EC aware of the AP being in the process > > > of suspend/resume from start to finish, so we can accurately > > > determine: > > > - How long the process took to better gauge we're meeting ChromeOS requirements. > > > - When the AP failed to complete the process, so we can collect data > > > and perform error recovery. [...] > > How could the *error* recovery do? > > I don't understand what this is asking. Given that you said "we can collect data and perform error recovery" if the suspend-resume takes more/less time than expected. I'm trying to understand what does "error recovery" mean. What recovery it could take? > > > > What about other interfaces (i2c, spi, uart)? Do they also need to change > > > the callbacks? > > > > > > We aren't concerned about those devices, because they aren't being > > > used on the devices we're seeing issues with. If devices using those > > > ECs want this change, they can pick it up as well, but we don't have > > > any way to test changes on those devices (whatever they may be). > > > > This doesn't sound good. As I would suppose you are adding some new EC FW > > features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the > > existing systems too. > > Again, why are you assuming there is new EC FW for this? This is only > changing when an already-existing host command is being sent. Nothing > is being added or removed. I see. There is no EC changes. Specifically, do you see any crashes, or premature events, or mal-functions regarding to the measurement is not that accurate? Also, we wouldn't want it to be LPC-specialized. Please consider other interfaces.
On Thu, May 18, 2023 at 7:55 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > On Thu, May 18, 2023 at 10:47:23AM -0600, Tim Van Patten wrote: > > On Wed, May 17, 2023 at 7:38 PM Tzung-Bi Shih <tzungbi@kernel.org> wrote: > > > On Wed, May 17, 2023 at 09:56:59AM -0600, Tim Van Patten wrote: > > > > The issue is that we need the EC aware of the AP being in the process > > > > of suspend/resume from start to finish, so we can accurately > > > > determine: > > > > - How long the process took to better gauge we're meeting ChromeOS requirements. > > > > - When the AP failed to complete the process, so we can collect data > > > > and perform error recovery. > [...] > > > How could the *error* recovery do? > > > > I don't understand what this is asking. > > Given that you said "we can collect data and perform error recovery" if the > suspend-resume takes more/less time than expected. I'm trying to understand > what does "error recovery" mean. What recovery it could take? Currently, for AMD, the EC will trigger data collection and either reset the AP or send a host event to attempt to trigger the AP into it's own recovery. Intel is looking into adding error recovery as well. > > > > > What about other interfaces (i2c, spi, uart)? Do they also need to change > > > > the callbacks? > > > > > > > > We aren't concerned about those devices, because they aren't being > > > > used on the devices we're seeing issues with. If devices using those > > > > ECs want this change, they can pick it up as well, but we don't have > > > > any way to test changes on those devices (whatever they may be). > > > > > > This doesn't sound good. As I would suppose you are adding some new EC FW > > > features regarding to EC_CMD_HOST_SLEEP_EVENT, you should consider the > > > existing systems too. > > > > Again, why are you assuming there is new EC FW for this? This is only > > changing when an already-existing host command is being sent. Nothing > > is being added or removed. > > I see. There is no EC changes. > > Specifically, do you see any crashes, or premature events, or mal-functions > regarding to the measurement is not that accurate? No, we don't see any issues with the current or new timings of sending these commands. > Also, we wouldn't want it to be LPC-specialized. Please consider other > interfaces. I'm intentionally excluding those interfaces because we don't have any way to test/validate this change on those devices. Additionally, we don't have any devices that use this suspend/resume tracking logic using those interfaces, so there's no reason to introduce any risk by changing devices that don't need it.
Hello: This patch was applied to chrome-platform/linux.git (for-kernelci) by Tzung-Bi Shih <tzungbi@kernel.org>: On Mon, 15 May 2023 14:25:52 -0600 you wrote: > Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM > .prepare() and cros_ec_lpc_complete() during .complete(). This moves the > host command that the AP sends and allows the EC to log entry/exit of > AP's suspend/resume more accurately. > > Changes in v9: > - Remove log statements. > - Ignore return value from cros_ec_resume(). > > [...] Here is the summary with links: - [v9] platform/chrome: cros_ec_lpc: Move host command to prepare/complete https://git.kernel.org/chrome-platform/c/4b9abbc132b8 You are awesome, thank you!
Hello: This patch was applied to chrome-platform/linux.git (for-next) by Tzung-Bi Shih <tzungbi@kernel.org>: On Mon, 15 May 2023 14:25:52 -0600 you wrote: > Update cros_ec_lpc_pm_ops to call cros_ec_lpc_prepare() during PM > .prepare() and cros_ec_lpc_complete() during .complete(). This moves the > host command that the AP sends and allows the EC to log entry/exit of > AP's suspend/resume more accurately. > > Changes in v9: > - Remove log statements. > - Ignore return value from cros_ec_resume(). > > [...] Here is the summary with links: - [v9] platform/chrome: cros_ec_lpc: Move host command to prepare/complete https://git.kernel.org/chrome-platform/c/4b9abbc132b8 You are awesome, thank you!
diff --git a/drivers/platform/chrome/cros_ec_lpc.c b/drivers/platform/chrome/cros_ec_lpc.c index 68bba0fcafab3..2318eccd9c6ad 100644 --- a/drivers/platform/chrome/cros_ec_lpc.c +++ b/drivers/platform/chrome/cros_ec_lpc.c @@ -543,23 +543,25 @@ static const struct dmi_system_id cros_ec_lpc_dmi_table[] __initconst = { MODULE_DEVICE_TABLE(dmi, cros_ec_lpc_dmi_table); #ifdef CONFIG_PM_SLEEP -static int cros_ec_lpc_suspend(struct device *dev) +static int cros_ec_lpc_prepare(struct device *dev) { struct cros_ec_device *ec_dev = dev_get_drvdata(dev); return cros_ec_suspend(ec_dev); } -static int cros_ec_lpc_resume(struct device *dev) +static void cros_ec_lpc_complete(struct device *dev) { struct cros_ec_device *ec_dev = dev_get_drvdata(dev); - - return cros_ec_resume(ec_dev); + cros_ec_resume(ec_dev); } #endif static const struct dev_pm_ops cros_ec_lpc_pm_ops = { - SET_LATE_SYSTEM_SLEEP_PM_OPS(cros_ec_lpc_suspend, cros_ec_lpc_resume) +#ifdef CONFIG_PM_SLEEP + .prepare = cros_ec_lpc_prepare, + .complete = cros_ec_lpc_complete +#endif }; static struct platform_driver cros_ec_lpc_driver = {