diff mbox series

[RFC,v2,3/3] ACPI: extlog: Make print_extlog_rcd() log unconditionally

Message ID 20240510112740.667445-4-fabio.m.de.francesco@linux.intel.com (mailing list archive)
State New, archived
Headers show
Series Make ELOG log and trace consistently with GHES | expand

Commit Message

Fabio M. De Francesco May 10, 2024, 11:21 a.m. UTC
Make extlog_print_rcd() log unconditionally to report errors even if
userspace is not consuming trace events. Remove ras_userspace_consumers()
because it is not anymore used.

This change makes extlog_print() (ELOG) log consistently with ghes_proc()
(GHES).

Cc: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
---
 drivers/acpi/acpi_extlog.c | 6 +-----
 drivers/ras/debugfs.c      | 6 ------
 include/linux/ras.h        | 2 --
 3 files changed, 1 insertion(+), 13 deletions(-)

Comments

Borislav Petkov May 10, 2024, 12:52 p.m. UTC | #1
On Fri, May 10, 2024 at 01:21:47PM +0200, Fabio M. De Francesco wrote:
> Make extlog_print_rcd() log unconditionally to report errors even if
> userspace is not consuming trace events. Remove ras_userspace_consumers()
> because it is not anymore used.

Did you do any git archeology before that?

d6cae935ec5b ("trace, eMCA: Add a knob to adjust where to save event log")

I can't find in this commit message why this is needed... Do share pls.

> This change makes extlog_print() (ELOG) log consistently with ghes_proc()
> (GHES).

Avoid having "This patch" or "This commit" in the commit message. It is
tautologically useless.

Also, do

$ git grep 'This patch' Documentation/process

for more details.

Pls have a look at our documentation and check all your patches.

Thx.
Fabio M. De Francesco May 10, 2024, 7 p.m. UTC | #2
On Friday, May 10, 2024 2:52:14 PM GMT+2 Borislav Petkov wrote:
> On Fri, May 10, 2024 at 01:21:47PM +0200, Fabio M. De Francesco wrote:
> > Make extlog_print_rcd() log unconditionally to report errors even if
> > userspace is not consuming trace events. Remove ras_userspace_consumers()
> > because it is not anymore used.
> 
> Did you do any git archeology before that?
> 
> d6cae935ec5b ("trace, eMCA: Add a knob to adjust where to save event log")
> 
> I can't find in this commit message why this is needed... Do share pls. 
> 
> > This change makes extlog_print() (ELOG) log consistently with ghes_proc()
> > (GHES).

I thought that ELOG and GHES should be modeled consistently. ghes_proc() 
prints to the console while ghes_do_proc() also uses ftrace. I made this short 
series an RFC mainly to ask / receive comments on this change (3/3).

If we want to make ELOG and GHES act similarly, this patch is needed. 
Otherwise, things should be left the way they currently are. 

I'll make a v3  with a more clear explanation of why I think we should prefer 
to make ELOG act similarly to GHES in how kernel logs are handled.

> Avoid having "This patch" or "This commit" in the commit message. It is
> tautologically useless.
>
> Also, do
> 
> $ git grep 'This patch' Documentation/process
> 
> for more details.
> 
> Pls have a look at our documentation and check all your patches.
> 
> Thx.

Please note that I was introducing the "why" part of the message. I never 
refer to this patch for the "what", and I always use an imperative tone only 
in the "what" part. 

However, I see why this commit message was poor. And probably also the other 
two were low quality. Therefore, I'll rework this and also the other two 
messages.

Thanks for your comments,

Fabio
Borislav Petkov May 10, 2024, 7:25 p.m. UTC | #3
On Fri, May 10, 2024 at 09:00:34PM +0200, Fabio M. De Francesco wrote:
> I thought that ELOG and GHES should be modeled consistently. ghes_proc() 
> prints to the console while ghes_do_proc() also uses ftrace.

ghes_proc() calls ghes_do_proc(). I have no clue what you mean here.

> If we want to make ELOG and GHES act similarly, this patch is needed.

I still don't know what the problem is.

> Please note that I was introducing the "why" part of the message. I never 
> refer to this patch for the "what", and I always use an imperative tone only 
> in the "what" part.

No, this is not what I meant. I mean, don't say "This patch does" or "This
commit does" or "This change ... " or whatnot as that is not necessary.

IOW:

"Make extlog_print() (ELOG) log consistently with ghes_proc() (GHES)
because... " and this is where you come in.

So let's start with the problem: what is it and why do you think it is
a problem?
Fabio M. De Francesco May 10, 2024, 8:54 p.m. UTC | #4
On Friday, May 10, 2024 9:25:56 PM GMT+2 Borislav Petkov wrote:
> On Fri, May 10, 2024 at 09:00:34PM +0200, Fabio M. De Francesco wrote:
> > I thought that ELOG and GHES should be modeled consistently. ghes_proc()
> > prints to the console while ghes_do_proc() also uses ftrace.
> 
> ghes_proc() calls ghes_do_proc(). I have no clue what you mean here.
>

My understanding is that ghes_proc() logs to the console and ghes_do_proc() 
calls the tracers. 

Therefore, GHES at the same time always reports the errors via two different 
means.

Instead ELOG depends on the check on ras_userspace_consumers() to decide 
whether to call print_extlog_rcd() to print the logs. And if it print to the 
kernel logs, it jumps to "out" and skips the tracers.

Why is it different with respect to how error reporting is made in GHES? 

I thought that ELOG should be modeled similarly to GHES and so it should print 
to the kernel logs always unconditionally and then it should also use ftrace 
(no goto "out" and skip tracers).

(1) Is my understanding of logging and tracing in ELOG and GHES correct?
(2) If it is, does it make sense for ELOG to print to the kernel log, 
unconditionally, and then call the tracers like ghes_proc() + ghes_do_proc() 
do?

Fabio
Dan Williams May 10, 2024, 10:12 p.m. UTC | #5
Fabio M. De Francesco wrote:
> On Friday, May 10, 2024 9:25:56 PM GMT+2 Borislav Petkov wrote:
> > On Fri, May 10, 2024 at 09:00:34PM +0200, Fabio M. De Francesco wrote:
> > > I thought that ELOG and GHES should be modeled consistently. ghes_proc()
> > > prints to the console while ghes_do_proc() also uses ftrace.
> > 
> > ghes_proc() calls ghes_do_proc(). I have no clue what you mean here.
> >
> 
> My understanding is that ghes_proc() logs to the console and ghes_do_proc() 
> calls the tracers. 
> 
> Therefore, GHES at the same time always reports the errors via two different 
> means.
> 
> Instead ELOG depends on the check on ras_userspace_consumers() to decide 
> whether to call print_extlog_rcd() to print the logs. And if it print to the 
> kernel logs, it jumps to "out" and skips the tracers.
> 
> Why is it different with respect to how error reporting is made in GHES? 
> 
> I thought that ELOG should be modeled similarly to GHES and so it should print 
> to the kernel logs always unconditionally and then it should also use ftrace 
> (no goto "out" and skip tracers).
> 
> (1) Is my understanding of logging and tracing in ELOG and GHES correct?
> (2) If it is, does it make sense for ELOG to print to the kernel log, 
> unconditionally, and then call the tracers like ghes_proc() + ghes_do_proc() 
> do?

I had asked Fabio to take a look at whether it made sense to continue
with the concept of ras_userspace_consumers() especially since it seems
limited to the EXTLOG case.

In general I am finding that between OS Native and Firmware First error
reporting the logging approaches are inconsistent.

As far I can see rasdaemon would not even notice is the "daemon_active"
debugfs file went away [1], and it should be the case that the
tracepoints always fire whether daemon_active is open or not.

So I was expecting this removal to be a conversation starter on the
wider topic of error reporting consistency.

[1]: https://github.com/mchehab/rasdaemon/blob/master/ras-events.c#L992
Borislav Petkov May 11, 2024, 1:08 p.m. UTC | #6
On Fri, May 10, 2024 at 03:12:36PM -0700, Dan Williams wrote:
> I had asked Fabio to take a look at whether it made sense to continue
> with the concept of ras_userspace_consumers() especially since it seems
> limited to the EXTLOG case.
> 
> In general I am finding that between OS Native and Firmware First error
> reporting the logging approaches are inconsistent.
> 
> As far I can see rasdaemon would not even notice is the "daemon_active"
> debugfs file went away [1],

It tells the kernel that it is consuming the error info from the
tracepoints.

> and it should be the case that the tracepoints always fire whether
> daemon_active is open or not.
>
> So I was expecting this removal to be a conversation starter on the
> wider topic of error reporting consistency.

Yeah, and then they'll come and say: ew, we're getting error duplicates
- once logged in dmesg and once through the tracepoints.

So just like with the other thread, we have to figure out what our
scheme will be wrt hw error logging, agree on it and then make it
consistent.

Do we want to have both? Should it be configurable? Probably...

Anything else...?

Thx.
Dan Williams May 12, 2024, 11:45 p.m. UTC | #7
Borislav Petkov wrote:
> On Fri, May 10, 2024 at 03:12:36PM -0700, Dan Williams wrote:
> > I had asked Fabio to take a look at whether it made sense to continue
> > with the concept of ras_userspace_consumers() especially since it seems
> > limited to the EXTLOG case.
> > 
> > In general I am finding that between OS Native and Firmware First error
> > reporting the logging approaches are inconsistent.
> > 
> > As far I can see rasdaemon would not even notice is the "daemon_active"
> > debugfs file went away [1],
> 
> It tells the kernel that it is consuming the error info from the
> tracepoints.

Yes, my point though was that if it got deleted I doubt anyone would
notice. rasdaemon explicitly does not check the return from
open("daemon_active").

I am also curious about the history here. This "daemon_active" scheme is
an awkward way to detect that something is consuming the tracepoint. It
was added on v4.0, but Steven had added "tracepoint_enabled()" back in
v3.17:

7c65bbc7dcfa tracing: Add trace_<tracepoint>_enabled() function

So even if non-rasdaemon userspace was watching the extlog tracepoints
they would not fire because ras_userspace_consumers() prevents it.

I am finding it difficult to see why ras_userspace_consumers() needs to
continue to be maintained.

> > and it should be the case that the tracepoints always fire whether
> > daemon_active is open or not.
> >
> > So I was expecting this removal to be a conversation starter on the
> > wider topic of error reporting consistency.
> 
> Yeah, and then they'll come and say: ew, we're getting error duplicates
> - once logged in dmesg and once through the tracepoints.

That would be odd since there is no ras_userspace_consumers() in the
ACPI GHES path, so it is already the case that you can get duplicate
error information depending on which path triggers the error.

Tracepoints are individually configurable. 

> So just like with the other thread, we have to figure out what our
> scheme will be wrt hw error logging, agree on it and then make it
> consistent.

From my perspective I want alignement between "firmware first" and "OS
Native" events and I think any movement away from kernel log messages as
a hardware error mechanism towards tracepoints is a good thing.

Recall that tracepoints can also be configured to emit to the kernel
log, so that might be a way to keep legacy kernel log message parsing
environments happy.

> Do we want to have both? Should it be configurable? Probably...

Would be great to hear from folks that have a reasons for kernel log
message error reporting to continue.

> Anything else...?

Uniformity of error response to "fatal" events, but that is mainly a
PCIe error handling concern not  CPU errors.
Borislav Petkov May 16, 2024, 9:57 a.m. UTC | #8
On Sun, May 12, 2024 at 04:45:08PM -0700, Dan Williams wrote:
> Yes, my point though was that if it got deleted I doubt anyone would
> notice. rasdaemon explicitly does not check the return from
> open("daemon_active").

The intent was for userspace to open it and thus it'll increment
trace_count which then ras_userspace_consumers() reads...

> I am also curious about the history here. This "daemon_active" scheme is
> an awkward way to detect that something is consuming the tracepoint. It
> was added on v4.0, but Steven had added "tracepoint_enabled()" back in
> v3.17:
> 
> 7c65bbc7dcfa tracing: Add trace_<tracepoint>_enabled() function

Ha, I usually talk to Rostedt for all things tracepoint when wondering
how we could use them for RAS purposes but I haven't this time, it
seems.

> So even if non-rasdaemon userspace was watching the extlog tracepoints
> they would not fire because ras_userspace_consumers() prevents it.
>
> I am finding it difficult to see why ras_userspace_consumers() needs to
> continue to be maintained.

Well, you still need some functionality which tests whether a userspace
daemon consumes RAS events. Whether it is something cludgy like now or
something which checks whether all RAS tracepoints have been enabled,
something's gotta be there.

> That would be odd since there is no ras_userspace_consumers() in the
> ACPI GHES path,

Probably because no one's using RAS daemon with GHES. I at least haven't
heard of anyone complaining about this yet...

> so it is already the case that you can get duplicate error information
> depending on which path triggers the error.
>
> Tracepoints are individually configurable.

Sure.

> From my perspective I want alignement between "firmware first" and "OS
> Native" events and I think any movement away from kernel log messages as
> a hardware error mechanism towards tracepoints is a good thing.

That has been the goal for a while now, yap.

Anyone who parses the kernel log for anything serious has been living
under a rock in the last decade at least. :)

> Recall that tracepoints can also be configured to emit to the kernel
> log, so that might be a way to keep legacy kernel log message parsing
> environments happy.

Ok.

> Would be great to hear from folks that have a reasons for kernel log
> message error reporting to continue.

Right, from my experience so far, you never hear anything. :-\

So if we do anything, it should be something simple and which works for
almost everyone.

With RAS, everyone does their own thing. And then there's the firmware
which claims that it can do better RAS but then f*cks up on basic things
like *actually* shipping a working EINJ or whatever implementation.

So in the end of the day it is, oh, we need our drivers in the OS
because we can't fix firmware. It is harder to fix it than *hardware*
:-P

> Uniformity of error response to "fatal" events, but that is mainly a
> PCIe error handling concern not  CPU errors.

Sure, just make sure to keep it simple and generic.

Thx.
Dan Williams May 16, 2024, 6:56 p.m. UTC | #9
[ add Steven for a "deprecate emitting hardware errors with explicit printk" discussion ]

Borislav Petkov wrote:
> On Sun, May 12, 2024 at 04:45:08PM -0700, Dan Williams wrote:
> > Yes, my point though was that if it got deleted I doubt anyone would
> > notice. rasdaemon explicitly does not check the return from
> > open("daemon_active").
> 
> The intent was for userspace to open it and thus it'll increment
> trace_count which then ras_userspace_consumers() reads...
> 
> > I am also curious about the history here. This "daemon_active" scheme is
> > an awkward way to detect that something is consuming the tracepoint. It
> > was added on v4.0, but Steven had added "tracepoint_enabled()" back in
> > v3.17:
> > 
> > 7c65bbc7dcfa tracing: Add trace_<tracepoint>_enabled() function
> 
> Ha, I usually talk to Rostedt for all things tracepoint when wondering
> how we could use them for RAS purposes but I haven't this time, it
> seems.
> 
> > So even if non-rasdaemon userspace was watching the extlog tracepoints
> > they would not fire because ras_userspace_consumers() prevents it.
> >
> > I am finding it difficult to see why ras_userspace_consumers() needs to
> > continue to be maintained.
> 
> Well, you still need some functionality which tests whether a userspace
> daemon consumes RAS events. Whether it is something cludgy like now or
> something which checks whether all RAS tracepoints have been enabled,
> something's gotta be there.

Honest question, why? Lets deprecate that path. As an example, this
extlog path has bit-rotted and not kept pace with the same level of
error reporting that is available via ACPI GHES or OS native
tracepoints.

Given that is has not kept pace the next question is whether the kernel
should bother to maintain the contract => "if nothing is watching
tracepoints some subset (all?) hardware error messages will be reflected
to the kernel log".

I would point to tp_printk as a way to get tracepoints into the kernel
log. If that is too coarse-grained a replacement for print_extlog_rcd()
I would advocate spending time making tp_printk control more
fine-grained rather than perpetuate this duplicated emission path for
error info.

Something like a new annotation for tracepoints marking them as "emit to
kernel log if no-one is watching this high-priority trace event"?

> > That would be odd since there is no ras_userspace_consumers() in the
> > ACPI GHES path,
> 
> Probably because no one's using RAS daemon with GHES. I at least haven't
> heard of anyone complaining about this yet...

Well no, there is little to complain about in that path because that
path does not play "is anybody watching" games. It always emits to the
kernel log (subject to rate limiting) and then follows up with always
emitting a tracepoint (subject to standard trace filtering).

For CXL I asked that its events deprecate the kernel log path with the
hope of not growing new userspace dependencies on kernel log parsing for
newfangled CXL errors.

[..]
> > Would be great to hear from folks that have a reasons for kernel log
> > message error reporting to continue.
> 
> Right, from my experience so far, you never hear anything. :-\
> 
> So if we do anything, it should be something simple and which works for
> almost everyone.
> 
> With RAS, everyone does their own thing. And then there's the firmware
> which claims that it can do better RAS but then f*cks up on basic things
> like *actually* shipping a working EINJ or whatever implementation.

*sad nod*

> So in the end of the day it is, oh, we need our drivers in the OS
> because we can't fix firmware. It is harder to fix it than *hardware*
> :-P

At least when the firmware gets out of the way Linux has a chance to
solve user issues.

> > Uniformity of error response to "fatal" events, but that is mainly a
> > PCIe error handling concern not  CPU errors.
> 
> Sure, just make sure to keep it simple and generic.

Yes, tracepoints feel simple and generic to me while kernel log messages
with rate-limiting is already a lossy proposition.
Tony Luck May 16, 2024, 8:03 p.m. UTC | #10
> Given that is has not kept pace the next question is whether the kernel
> should bother to maintain the contract => "if nothing is watching
> tracepoints some subset (all?) hardware error messages will be reflected
> to the kernel log".

I'm with you for corrected & recoverable errors. The console is a terrible
place for those logs.

But it's handy to have fatal errors go to the console. They may make it out to a
serial port, or be stashed in pstore for later retrieval. Trace event logs that
are handled by some user level daemon are just going to disappear in a
puff of reset smoke.

-Tony
Dan Williams May 17, 2024, 9:43 p.m. UTC | #11
Luck, Tony wrote:
> > Given that is has not kept pace the next question is whether the kernel
> > should bother to maintain the contract => "if nothing is watching
> > tracepoints some subset (all?) hardware error messages will be reflected
> > to the kernel log".
> 
> I'm with you for corrected & recoverable errors. The console is a terrible
> place for those logs.
> 
> But it's handy to have fatal errors go to the console. They may make it out to a
> serial port, or be stashed in pstore for later retrieval. Trace event logs that
> are handled by some user level daemon are just going to disappear in a
> puff of reset smoke.

That is true, and arguably that fatal event logging to the kernel log should be
independent of whether userspace is watching. I.e. ras_userpsace_consumers()
gating kernel logging is actively getting in the way when the error severity
implies userspace may not get a chance to see or process the event.

I would need help from Steven on whether ftrace would be amenable to have a
localized tracepoint_printk context to trigger emission of fatal events to the
kernel log in addition to the trace buffer.
Borislav Petkov May 21, 2024, 6:39 p.m. UTC | #12
On Thu, May 16, 2024 at 11:56:15AM -0700, Dan Williams wrote:
> Something like a new annotation for tracepoints marking them as "emit to
> kernel log if no-one is watching this high-priority trace event"?

I'm not sure anymore what we want here now... :)

We want for the kernel to not pay attention to whether userspace is
consuming error info from the tracepoints and to issue errors into the
kernel log too.

So you have them in the kernel log *and* in the tracepoints.

Right?

Or only through the tracepoints?

> Well no, there is little to complain about in that path because that
> path does not play "is anybody watching" games. It always emits to the
> kernel log (subject to rate limiting) and then follows up with always
> emitting a tracepoint (subject to standard trace filtering).
> 
> For CXL I asked that its events deprecate the kernel log path with the
> hope of not growing new userspace dependencies on kernel log parsing for

Yeah, kernel log string format is not an ABI so no problem.

> newfangled CXL errors.

So shuffling hw error info solely through the tracepoints is probably ok
but I am unable to rule out *all* possible cases where it would still
make sense to dump to the kernel log, be it there's no userspace, be it
it is a critical error and you want to dump it immediately or whatnot.

It should probably be configurable.

As in: by default all hw error info goes only through tracepoints with
the exception of fatal errors - they go both to the kernel log and
tracepoints.

And then perhaps add "ras=dump_all_error_info_to_kernel_log_too" or so,
cmdline param.

> At least when the firmware gets out of the way Linux has a chance to
> solve user issues.

Yeap.

> Yes, tracepoints feel simple and generic to me while kernel log messages
> with rate-limiting is already a lossy proposition.

Right. Btw, do bear in mind that ratelimiting of hw errors can be bad
sometimes: imagine it ratelimits exactly that one fatal error which you
absolutely should've seen... :-\

Thx.
diff mbox series

Patch

diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c
index fb7b0c73f86a..41ee66306234 100644
--- a/drivers/acpi/acpi_extlog.c
+++ b/drivers/acpi/acpi_extlog.c
@@ -189,10 +189,7 @@  static int extlog_print(struct notifier_block *nb, unsigned long val,
 
 	tmp = (struct acpi_hest_generic_status *)elog_buf;
 
-	if (!ras_userspace_consumers()) {
-		print_extlog_rcd(NULL, tmp, cpu);
-		goto out;
-	}
+	print_extlog_rcd(NULL, tmp, cpu);
 
 	/* log event via trace */
 	err_seq++;
@@ -225,7 +222,6 @@  static int extlog_print(struct notifier_block *nb, unsigned long val,
 		}
 	}
 
-out:
 	mce->kflags |= MCE_HANDLED_EXTLOG;
 	return NOTIFY_OK;
 }
diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c
index 42afd3de68b2..6633844acdd6 100644
--- a/drivers/ras/debugfs.c
+++ b/drivers/ras/debugfs.c
@@ -13,12 +13,6 @@  struct dentry *ras_get_debugfs_root(void)
 }
 EXPORT_SYMBOL_GPL(ras_get_debugfs_root);
 
-int ras_userspace_consumers(void)
-{
-	return atomic_read(&trace_count);
-}
-EXPORT_SYMBOL_GPL(ras_userspace_consumers);
-
 static int trace_show(struct seq_file *m, void *v)
 {
 	return 0;
diff --git a/include/linux/ras.h b/include/linux/ras.h
index a64182bc72ad..98840f16b697 100644
--- a/include/linux/ras.h
+++ b/include/linux/ras.h
@@ -7,11 +7,9 @@ 
 #include <linux/cper.h>
 
 #ifdef CONFIG_DEBUG_FS
-int ras_userspace_consumers(void);
 void ras_debugfs_init(void);
 int ras_add_daemon_trace(void);
 #else
-static inline int ras_userspace_consumers(void) { return 0; }
 static inline void ras_debugfs_init(void) { }
 static inline int ras_add_daemon_trace(void) { return 0; }
 #endif