diff mbox series

drivers/cros_ec: Reduce log polling period to 2s

Message ID 20230921141935.3973069-1-robbarnes@google.com (mailing list archive)
State New, archived
Headers show
Series drivers/cros_ec: Reduce log polling period to 2s | expand

Commit Message

Rob Barnes Sept. 21, 2023, 2:19 p.m. UTC
Lowering the log polling interval to 2 seconds provides two advantages:
1. Minimizes the risk of EC internal log buffer overfilling and truncating.
2. Yields more recent logs prior to a crash, facilitating easier debugging.

Signed-off-by: Rob Barnes <robbarnes@google.com>
---
 drivers/platform/chrome/cros_ec_debugfs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Tzung-Bi Shih Sept. 23, 2023, 8:41 a.m. UTC | #1
On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote:
> 2. Yields more recent logs prior to a crash, facilitating easier debugging.

The approach depends on multiple parties before a system reboot: EC reports
on panic, AP sends EC commands for reading the logs, userland programs
(e.g. timberslide) write the logs to filesystem, and filesystem
synchronization.  If anyone in the path didn't work in time, the logs
disappear.

An random idea: could we save the EC console logs to pstore when AP gets
notification on EC panic?

> ---
>  drivers/platform/chrome/cros_ec_debugfs.c | 2 +-

If you have chance to send next version, please refer `git log` on the file
to see a good candidate of title prefixes.
Rob Barnes Sept. 26, 2023, 6:57 p.m. UTC | #2
On Sat, Sep 23, 2023 at 2:42 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote:
> > 2. Yields more recent logs prior to a crash, facilitating easier debugging.
>
> The approach depends on multiple parties before a system reboot: EC reports
> on panic, AP sends EC commands for reading the logs, userland programs
> (e.g. timberslide) write the logs to filesystem, and filesystem
> synchronization.  If anyone in the path didn't work in time, the logs
> disappear.

This change is just affecting the continuous polling period of the
cros_ec driver.
Syncing the log immediately after a panic is a separate path and will only work
when EC supports system safe mode recovery, most don't.

The hope with shortening the period is that cros_ec.previous will contain more
relevant logs after a crash, even when safe mode isn't enabled.

>
> An random idea: could we save the EC console logs to pstore when AP gets
> notification on EC panic?

This could work. It would be a separate effort.

>
> > ---
> >  drivers/platform/chrome/cros_ec_debugfs.c | 2 +-
>
> If you have chance to send next version, please refer `git log` on the file
> to see a good candidate of title prefixes.
Ack.
Tzung-Bi Shih Sept. 28, 2023, 7:07 a.m. UTC | #3
On Tue, Sep 26, 2023 at 12:57:22PM -0600, Rob Barnes wrote:
> On Sat, Sep 23, 2023 at 2:42 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> >
> > On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote:
> > > 2. Yields more recent logs prior to a crash, facilitating easier debugging.
> >
> > The approach depends on multiple parties before a system reboot: EC reports
> > on panic, AP sends EC commands for reading the logs, userland programs
> > (e.g. timberslide) write the logs to filesystem, and filesystem
> > synchronization.  If anyone in the path didn't work in time, the logs
> > disappear.
> 
> This change is just affecting the continuous polling period of the
> cros_ec driver.
> Syncing the log immediately after a panic is a separate path and will only work
> when EC supports system safe mode recovery, most don't.

I guess I misunderstood.  Is the understanding "EC can't respond to further
host commands from AP after EC crashed unless some more special supports
(e.g. system safe mode)" correct?

If yes, how about after EC reported a panic but before EC crashed?  Can EC
respond to host commands during the period?

> The hope with shortening the period is that cros_ec.previous will contain more
> relevant logs after a crash, even when safe mode isn't enabled.

What is current approach for getting EC crash dumps/logs from AP?  If the
system didn't save them before system reset, are they available after next
boot?
Rob Barnes May 2, 2024, 5:13 p.m. UTC | #4
Sorry for dropping this conversation. I'm pursuing a different
approach for setting the ec log period. See
https://chromium-review.googlesource.com/c/chromiumos/third_party/kernel/+/5509884
for the draft. I will send the patch to this list after internal
review.

Also answering your questions below.

On Thu, Sep 28, 2023 at 1:07 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
>
> On Tue, Sep 26, 2023 at 12:57:22PM -0600, Rob Barnes wrote:
> > On Sat, Sep 23, 2023 at 2:42 AM Tzung-Bi Shih <tzungbi@kernel.org> wrote:
> > >
> > > On Thu, Sep 21, 2023 at 02:19:35PM +0000, Rob Barnes wrote:
> > > > 2. Yields more recent logs prior to a crash, facilitating easier debugging.
> > >
> > > The approach depends on multiple parties before a system reboot: EC reports
> > > on panic, AP sends EC commands for reading the logs, userland programs
> > > (e.g. timberslide) write the logs to filesystem, and filesystem
> > > synchronization.  If anyone in the path didn't work in time, the logs
> > > disappear.
> >
> > This change is just affecting the continuous polling period of the
> > cros_ec driver.
> > Syncing the log immediately after a panic is a separate path and will only work
> > when EC supports system safe mode recovery, most don't.
>
> I guess I misunderstood.  Is the understanding "EC can't respond to further
> host commands from AP after EC crashed unless some more special supports
> (e.g. system safe mode)" correct?
>
> If yes, how about after EC reported a panic but before EC crashed?  Can EC
> respond to host commands during the period?

System safe mode allows the EC to temporarily recover from some types
of panics. The EC is able to respond to host commands in this
scenario.

>
>
> > The hope with shortening the period is that cros_ec.previous will contain more
> > relevant logs after a crash, even when safe mode isn't enabled.
>
> What is current approach for getting EC crash dumps/logs from AP?  If the
> system didn't save them before system reset, are they available after next
>
> boot?

The EC does save a small panicinfo struct in uninitialized RAM that is
preserved across reset. This struct does not include logs, so any EC
logs that are not sync'd to the OS before reset are lost.
diff mbox series

Patch

diff --git a/drivers/platform/chrome/cros_ec_debugfs.c b/drivers/platform/chrome/cros_ec_debugfs.c
index 9044c6bab770c..c96493504127d 100644
--- a/drivers/platform/chrome/cros_ec_debugfs.c
+++ b/drivers/platform/chrome/cros_ec_debugfs.c
@@ -21,7 +21,7 @@ 
 
 #define LOG_SHIFT		14
 #define LOG_SIZE		(1 << LOG_SHIFT)
-#define LOG_POLL_SEC		10
+#define LOG_POLL_SEC		2
 
 #define CIRC_ADD(idx, size, value)	(((idx) + (value)) & ((size) - 1))