Message ID | 20170421122150.76cce2cfrt767glv@pd.tnic (mailing list archive) |
---|---|
State | Not Applicable, archived |
Headers | show |
On 4/21/2017 6:21 AM, Borislav Petkov wrote: > On Tue, Apr 18, 2017 at 05:05:15PM -0600, Tyler Baicar wrote: >> The ACPI 6.1 spec added a timestamp to the HEST generic data > HEST? > > I see the timestamp in > > Table 18-343 Generic Error Data Entry > > where those things are "One or more Generic Error Data Entry structures > may be recorded in the Generic Error Data Entries field of the Generic > Error Status Block structure." > > And those are part of the "18.3.2.7 Generic Hardware Error Source", > i.e., GHES. So why do you say "HEST" above? Ah yes, this should be GHES no HEST. There are too many acronyms involved here :) > >> structure. Print the timestamp out when printing out the error >> status information. >> >> Signed-off-by: Tyler Baicar <tbaicar@codeaurora.org> >> CC: Jonathan (Zhixiong) Zhang <zjzhang@codeaurora.org> >> Reviewed-by: James Morse <james.morse@arm.com> >> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> > Remove those Reviewed-by:s > >> --- >> drivers/firmware/efi/cper.c | 28 ++++++++++++++++++++++++++++ >> 1 file changed, 28 insertions(+) >> >> diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c >> index 8328a6f..46585f9 100644 >> --- a/drivers/firmware/efi/cper.c >> +++ b/drivers/firmware/efi/cper.c >> @@ -32,6 +32,8 @@ >> #include <linux/acpi.h> >> #include <linux/pci.h> >> #include <linux/aer.h> >> +#include <linux/printk.h> >> +#include <linux/bcd.h> >> #include <acpi/ghes.h> >> >> #define INDENT_SP " " >> @@ -387,6 +389,29 @@ static void cper_print_pcie(const char *pfx, const struct cper_sec_pcie *pcie, >> pfx, pcie->bridge.secondary_status, pcie->bridge.control); >> } >> >> +static void cper_estatus_timestamp(const char *pfx, > cper_print_tstamp() > >> + struct acpi_hest_generic_data_v300 *gdata) >> +{ >> + __u8 hour, min, sec, day, mon, year, century, *timestamp; >> + >> + if (gdata->validation_bits & ACPI_HEST_GEN_VALID_TIMESTAMP) { >> + timestamp = (__u8 *)&(gdata->time_stamp); >> + sec = bcd2bin(timestamp[0]); >> + min = bcd2bin(timestamp[1]); >> + hour = bcd2bin(timestamp[2]); >> + day = bcd2bin(timestamp[4]); >> + mon = bcd2bin(timestamp[5]); >> + year = bcd2bin(timestamp[6]); >> + century = bcd2bin(timestamp[7]); >> + >> + if (*(timestamp + 3) & 0x1) >> + printk("%stimestamp is precise\n", pfx); >> + >> + printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx, >> + century, year, mon, day, hour, min, sec); > Yeah, about the precise tstamp, you can do something like this: > > --- > diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c > index 46585f92b741..a649884e2264 100644 > --- a/drivers/firmware/efi/cper.c > +++ b/drivers/firmware/efi/cper.c > @@ -404,10 +404,8 @@ static void cper_estatus_timestamp(const char *pfx, > year = bcd2bin(timestamp[6]); > century = bcd2bin(timestamp[7]); > > - if (*(timestamp + 3) & 0x1) > - printk("%stimestamp is precise\n", pfx); > - > - printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx, > + printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx, > + (timestamp[3] & 0x1 ? "precise " : ""), > century, year, mon, day, hour, min, sec); > } > } This is basically what I already had in v14...you asked to move it into a different if-statement? https://lkml.org/lkml/2017/4/12/397 Thanks, Tyler
On Fri, Apr 21, 2017 at 10:04:35AM -0600, Baicar, Tyler wrote: > This is basically what I already had in v14...you asked to move it into a > different if-statement? https://lkml.org/lkml/2017/4/12/397 Well, clearly I've been smoking some nasty potent sh*t. :-\ /me goes and looks at the spec: "Bit 0 – Timestamp is precise if this bit is set and correlates to the time of the error event." So why are we even printing the timestamp when !precise? IOW, I think we should do: if (!(timestamp[3] & 0x1)) printk("%stimestamp imprecise\n", pfx); else { sec = .. min = ... ... } and print the actual values only when the timestamp is precise. Otherwise it has *some* values which could just as well be completely random. And it's not like we're reporting the error tomorrow - it is mostly a couple of seconds from logging to the fw pushing it out...
On 4/21/2017 11:26 AM, Borislav Petkov wrote: > On Fri, Apr 21, 2017 at 10:04:35AM -0600, Baicar, Tyler wrote: >> This is basically what I already had in v14...you asked to move it into a >> different if-statement? https://lkml.org/lkml/2017/4/12/397 > Well, clearly I've been smoking some nasty potent sh*t. :-\ > > /me goes and looks at the spec: > > "Bit 0 – Timestamp is precise if this bit is set and correlates to the > time of the error event." > > So why are we even printing the timestamp when !precise? > > IOW, I think we should do: > > if (!(timestamp[3] & 0x1)) > printk("%stimestamp imprecise\n", pfx); > else { > sec = .. > min = ... > > ... > } > > and print the actual values only when the timestamp is precise. > Otherwise it has *some* values which could just as well be completely > random. And it's not like we're reporting the error tomorrow - it is > mostly a couple of seconds from logging to the fw pushing it out... The timestamp may still be useful when it is imprecise. In the polling case, you may only poll every minute or so, so the time may be useful. Also, I imagine there could be interrupt based errors happening much faster than the FW/OS handshake can happen. Maybe we can just use what I had before but also specify imprecise so that it is clear: printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx, (timestamp[3] & 0x1 ? "precise " : "imprecise "), century, year, mon, day, hour, min, sec); Thanks, Tyler
On Fri, Apr 21, 2017 at 12:08:43PM -0600, Baicar, Tyler wrote: > The timestamp may still be useful when it is imprecise. In the polling case, > you may only poll every minute or so, so the time may be useful. Well, what is in the timestamp when !precise? Some random time or some timestamp from a couple of seconds ago? How do you differentiate what timestamp is bollocks and what is from a while ago? Is the imprecise tstamp really close to the time the error happened or pointing at 1970 - the beginning of unix time? :-) I'm sure you've picked up by now that we don't trust the firmware one bit. > Also, I imagine there could be interrupt based errors happening much faster than the > FW/OS handshake can happen. Maybe we can just use what I had before but also > specify imprecise so that it is clear: > > printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx, > (timestamp[3] & 0x1 ? "precise " : "imprecise "), > century, year, mon, day, hour, min, sec); I guess.
diff --git a/drivers/firmware/efi/cper.c b/drivers/firmware/efi/cper.c index 46585f92b741..a649884e2264 100644 --- a/drivers/firmware/efi/cper.c +++ b/drivers/firmware/efi/cper.c @@ -404,10 +404,8 @@ static void cper_estatus_timestamp(const char *pfx, year = bcd2bin(timestamp[6]); century = bcd2bin(timestamp[7]); - if (*(timestamp + 3) & 0x1) - printk("%stimestamp is precise\n", pfx); - - printk("%stime: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx, + printk("%s%ststamp: %02d%02d-%02d-%02d %02d:%02d:%02d\n", pfx, + (timestamp[3] & 0x1 ? "precise " : ""), century, year, mon, day, hour, min, sec); } }