Message ID | 20240510112740.667445-4-fabio.m.de.francesco@linux.intel.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Make ELOG log and trace consistently with GHES | expand |
On Fri, May 10, 2024 at 01:21:47PM +0200, Fabio M. De Francesco wrote: > Make extlog_print_rcd() log unconditionally to report errors even if > userspace is not consuming trace events. Remove ras_userspace_consumers() > because it is not anymore used. Did you do any git archeology before that? d6cae935ec5b ("trace, eMCA: Add a knob to adjust where to save event log") I can't find in this commit message why this is needed... Do share pls. > This change makes extlog_print() (ELOG) log consistently with ghes_proc() > (GHES). Avoid having "This patch" or "This commit" in the commit message. It is tautologically useless. Also, do $ git grep 'This patch' Documentation/process for more details. Pls have a look at our documentation and check all your patches. Thx.
On Friday, May 10, 2024 2:52:14 PM GMT+2 Borislav Petkov wrote: > On Fri, May 10, 2024 at 01:21:47PM +0200, Fabio M. De Francesco wrote: > > Make extlog_print_rcd() log unconditionally to report errors even if > > userspace is not consuming trace events. Remove ras_userspace_consumers() > > because it is not anymore used. > > Did you do any git archeology before that? > > d6cae935ec5b ("trace, eMCA: Add a knob to adjust where to save event log") > > I can't find in this commit message why this is needed... Do share pls. > > > This change makes extlog_print() (ELOG) log consistently with ghes_proc() > > (GHES). I thought that ELOG and GHES should be modeled consistently. ghes_proc() prints to the console while ghes_do_proc() also uses ftrace. I made this short series an RFC mainly to ask / receive comments on this change (3/3). If we want to make ELOG and GHES act similarly, this patch is needed. Otherwise, things should be left the way they currently are. I'll make a v3 with a more clear explanation of why I think we should prefer to make ELOG act similarly to GHES in how kernel logs are handled. > Avoid having "This patch" or "This commit" in the commit message. It is > tautologically useless. > > Also, do > > $ git grep 'This patch' Documentation/process > > for more details. > > Pls have a look at our documentation and check all your patches. > > Thx. Please note that I was introducing the "why" part of the message. I never refer to this patch for the "what", and I always use an imperative tone only in the "what" part. However, I see why this commit message was poor. And probably also the other two were low quality. Therefore, I'll rework this and also the other two messages. Thanks for your comments, Fabio
On Fri, May 10, 2024 at 09:00:34PM +0200, Fabio M. De Francesco wrote: > I thought that ELOG and GHES should be modeled consistently. ghes_proc() > prints to the console while ghes_do_proc() also uses ftrace. ghes_proc() calls ghes_do_proc(). I have no clue what you mean here. > If we want to make ELOG and GHES act similarly, this patch is needed. I still don't know what the problem is. > Please note that I was introducing the "why" part of the message. I never > refer to this patch for the "what", and I always use an imperative tone only > in the "what" part. No, this is not what I meant. I mean, don't say "This patch does" or "This commit does" or "This change ... " or whatnot as that is not necessary. IOW: "Make extlog_print() (ELOG) log consistently with ghes_proc() (GHES) because... " and this is where you come in. So let's start with the problem: what is it and why do you think it is a problem?
On Friday, May 10, 2024 9:25:56 PM GMT+2 Borislav Petkov wrote: > On Fri, May 10, 2024 at 09:00:34PM +0200, Fabio M. De Francesco wrote: > > I thought that ELOG and GHES should be modeled consistently. ghes_proc() > > prints to the console while ghes_do_proc() also uses ftrace. > > ghes_proc() calls ghes_do_proc(). I have no clue what you mean here. > My understanding is that ghes_proc() logs to the console and ghes_do_proc() calls the tracers. Therefore, GHES at the same time always reports the errors via two different means. Instead ELOG depends on the check on ras_userspace_consumers() to decide whether to call print_extlog_rcd() to print the logs. And if it print to the kernel logs, it jumps to "out" and skips the tracers. Why is it different with respect to how error reporting is made in GHES? I thought that ELOG should be modeled similarly to GHES and so it should print to the kernel logs always unconditionally and then it should also use ftrace (no goto "out" and skip tracers). (1) Is my understanding of logging and tracing in ELOG and GHES correct? (2) If it is, does it make sense for ELOG to print to the kernel log, unconditionally, and then call the tracers like ghes_proc() + ghes_do_proc() do? Fabio
Fabio M. De Francesco wrote: > On Friday, May 10, 2024 9:25:56 PM GMT+2 Borislav Petkov wrote: > > On Fri, May 10, 2024 at 09:00:34PM +0200, Fabio M. De Francesco wrote: > > > I thought that ELOG and GHES should be modeled consistently. ghes_proc() > > > prints to the console while ghes_do_proc() also uses ftrace. > > > > ghes_proc() calls ghes_do_proc(). I have no clue what you mean here. > > > > My understanding is that ghes_proc() logs to the console and ghes_do_proc() > calls the tracers. > > Therefore, GHES at the same time always reports the errors via two different > means. > > Instead ELOG depends on the check on ras_userspace_consumers() to decide > whether to call print_extlog_rcd() to print the logs. And if it print to the > kernel logs, it jumps to "out" and skips the tracers. > > Why is it different with respect to how error reporting is made in GHES? > > I thought that ELOG should be modeled similarly to GHES and so it should print > to the kernel logs always unconditionally and then it should also use ftrace > (no goto "out" and skip tracers). > > (1) Is my understanding of logging and tracing in ELOG and GHES correct? > (2) If it is, does it make sense for ELOG to print to the kernel log, > unconditionally, and then call the tracers like ghes_proc() + ghes_do_proc() > do? I had asked Fabio to take a look at whether it made sense to continue with the concept of ras_userspace_consumers() especially since it seems limited to the EXTLOG case. In general I am finding that between OS Native and Firmware First error reporting the logging approaches are inconsistent. As far I can see rasdaemon would not even notice is the "daemon_active" debugfs file went away [1], and it should be the case that the tracepoints always fire whether daemon_active is open or not. So I was expecting this removal to be a conversation starter on the wider topic of error reporting consistency. [1]: https://github.com/mchehab/rasdaemon/blob/master/ras-events.c#L992
On Fri, May 10, 2024 at 03:12:36PM -0700, Dan Williams wrote: > I had asked Fabio to take a look at whether it made sense to continue > with the concept of ras_userspace_consumers() especially since it seems > limited to the EXTLOG case. > > In general I am finding that between OS Native and Firmware First error > reporting the logging approaches are inconsistent. > > As far I can see rasdaemon would not even notice is the "daemon_active" > debugfs file went away [1], It tells the kernel that it is consuming the error info from the tracepoints. > and it should be the case that the tracepoints always fire whether > daemon_active is open or not. > > So I was expecting this removal to be a conversation starter on the > wider topic of error reporting consistency. Yeah, and then they'll come and say: ew, we're getting error duplicates - once logged in dmesg and once through the tracepoints. So just like with the other thread, we have to figure out what our scheme will be wrt hw error logging, agree on it and then make it consistent. Do we want to have both? Should it be configurable? Probably... Anything else...? Thx.
Borislav Petkov wrote: > On Fri, May 10, 2024 at 03:12:36PM -0700, Dan Williams wrote: > > I had asked Fabio to take a look at whether it made sense to continue > > with the concept of ras_userspace_consumers() especially since it seems > > limited to the EXTLOG case. > > > > In general I am finding that between OS Native and Firmware First error > > reporting the logging approaches are inconsistent. > > > > As far I can see rasdaemon would not even notice is the "daemon_active" > > debugfs file went away [1], > > It tells the kernel that it is consuming the error info from the > tracepoints. Yes, my point though was that if it got deleted I doubt anyone would notice. rasdaemon explicitly does not check the return from open("daemon_active"). I am also curious about the history here. This "daemon_active" scheme is an awkward way to detect that something is consuming the tracepoint. It was added on v4.0, but Steven had added "tracepoint_enabled()" back in v3.17: 7c65bbc7dcfa tracing: Add trace_<tracepoint>_enabled() function So even if non-rasdaemon userspace was watching the extlog tracepoints they would not fire because ras_userspace_consumers() prevents it. I am finding it difficult to see why ras_userspace_consumers() needs to continue to be maintained. > > and it should be the case that the tracepoints always fire whether > > daemon_active is open or not. > > > > So I was expecting this removal to be a conversation starter on the > > wider topic of error reporting consistency. > > Yeah, and then they'll come and say: ew, we're getting error duplicates > - once logged in dmesg and once through the tracepoints. That would be odd since there is no ras_userspace_consumers() in the ACPI GHES path, so it is already the case that you can get duplicate error information depending on which path triggers the error. Tracepoints are individually configurable. > So just like with the other thread, we have to figure out what our > scheme will be wrt hw error logging, agree on it and then make it > consistent. From my perspective I want alignement between "firmware first" and "OS Native" events and I think any movement away from kernel log messages as a hardware error mechanism towards tracepoints is a good thing. Recall that tracepoints can also be configured to emit to the kernel log, so that might be a way to keep legacy kernel log message parsing environments happy. > Do we want to have both? Should it be configurable? Probably... Would be great to hear from folks that have a reasons for kernel log message error reporting to continue. > Anything else...? Uniformity of error response to "fatal" events, but that is mainly a PCIe error handling concern not CPU errors.
On Sun, May 12, 2024 at 04:45:08PM -0700, Dan Williams wrote: > Yes, my point though was that if it got deleted I doubt anyone would > notice. rasdaemon explicitly does not check the return from > open("daemon_active"). The intent was for userspace to open it and thus it'll increment trace_count which then ras_userspace_consumers() reads... > I am also curious about the history here. This "daemon_active" scheme is > an awkward way to detect that something is consuming the tracepoint. It > was added on v4.0, but Steven had added "tracepoint_enabled()" back in > v3.17: > > 7c65bbc7dcfa tracing: Add trace_<tracepoint>_enabled() function Ha, I usually talk to Rostedt for all things tracepoint when wondering how we could use them for RAS purposes but I haven't this time, it seems. > So even if non-rasdaemon userspace was watching the extlog tracepoints > they would not fire because ras_userspace_consumers() prevents it. > > I am finding it difficult to see why ras_userspace_consumers() needs to > continue to be maintained. Well, you still need some functionality which tests whether a userspace daemon consumes RAS events. Whether it is something cludgy like now or something which checks whether all RAS tracepoints have been enabled, something's gotta be there. > That would be odd since there is no ras_userspace_consumers() in the > ACPI GHES path, Probably because no one's using RAS daemon with GHES. I at least haven't heard of anyone complaining about this yet... > so it is already the case that you can get duplicate error information > depending on which path triggers the error. > > Tracepoints are individually configurable. Sure. > From my perspective I want alignement between "firmware first" and "OS > Native" events and I think any movement away from kernel log messages as > a hardware error mechanism towards tracepoints is a good thing. That has been the goal for a while now, yap. Anyone who parses the kernel log for anything serious has been living under a rock in the last decade at least. :) > Recall that tracepoints can also be configured to emit to the kernel > log, so that might be a way to keep legacy kernel log message parsing > environments happy. Ok. > Would be great to hear from folks that have a reasons for kernel log > message error reporting to continue. Right, from my experience so far, you never hear anything. :-\ So if we do anything, it should be something simple and which works for almost everyone. With RAS, everyone does their own thing. And then there's the firmware which claims that it can do better RAS but then f*cks up on basic things like *actually* shipping a working EINJ or whatever implementation. So in the end of the day it is, oh, we need our drivers in the OS because we can't fix firmware. It is harder to fix it than *hardware* :-P > Uniformity of error response to "fatal" events, but that is mainly a > PCIe error handling concern not CPU errors. Sure, just make sure to keep it simple and generic. Thx.
[ add Steven for a "deprecate emitting hardware errors with explicit printk" discussion ] Borislav Petkov wrote: > On Sun, May 12, 2024 at 04:45:08PM -0700, Dan Williams wrote: > > Yes, my point though was that if it got deleted I doubt anyone would > > notice. rasdaemon explicitly does not check the return from > > open("daemon_active"). > > The intent was for userspace to open it and thus it'll increment > trace_count which then ras_userspace_consumers() reads... > > > I am also curious about the history here. This "daemon_active" scheme is > > an awkward way to detect that something is consuming the tracepoint. It > > was added on v4.0, but Steven had added "tracepoint_enabled()" back in > > v3.17: > > > > 7c65bbc7dcfa tracing: Add trace_<tracepoint>_enabled() function > > Ha, I usually talk to Rostedt for all things tracepoint when wondering > how we could use them for RAS purposes but I haven't this time, it > seems. > > > So even if non-rasdaemon userspace was watching the extlog tracepoints > > they would not fire because ras_userspace_consumers() prevents it. > > > > I am finding it difficult to see why ras_userspace_consumers() needs to > > continue to be maintained. > > Well, you still need some functionality which tests whether a userspace > daemon consumes RAS events. Whether it is something cludgy like now or > something which checks whether all RAS tracepoints have been enabled, > something's gotta be there. Honest question, why? Lets deprecate that path. As an example, this extlog path has bit-rotted and not kept pace with the same level of error reporting that is available via ACPI GHES or OS native tracepoints. Given that is has not kept pace the next question is whether the kernel should bother to maintain the contract => "if nothing is watching tracepoints some subset (all?) hardware error messages will be reflected to the kernel log". I would point to tp_printk as a way to get tracepoints into the kernel log. If that is too coarse-grained a replacement for print_extlog_rcd() I would advocate spending time making tp_printk control more fine-grained rather than perpetuate this duplicated emission path for error info. Something like a new annotation for tracepoints marking them as "emit to kernel log if no-one is watching this high-priority trace event"? > > That would be odd since there is no ras_userspace_consumers() in the > > ACPI GHES path, > > Probably because no one's using RAS daemon with GHES. I at least haven't > heard of anyone complaining about this yet... Well no, there is little to complain about in that path because that path does not play "is anybody watching" games. It always emits to the kernel log (subject to rate limiting) and then follows up with always emitting a tracepoint (subject to standard trace filtering). For CXL I asked that its events deprecate the kernel log path with the hope of not growing new userspace dependencies on kernel log parsing for newfangled CXL errors. [..] > > Would be great to hear from folks that have a reasons for kernel log > > message error reporting to continue. > > Right, from my experience so far, you never hear anything. :-\ > > So if we do anything, it should be something simple and which works for > almost everyone. > > With RAS, everyone does their own thing. And then there's the firmware > which claims that it can do better RAS but then f*cks up on basic things > like *actually* shipping a working EINJ or whatever implementation. *sad nod* > So in the end of the day it is, oh, we need our drivers in the OS > because we can't fix firmware. It is harder to fix it than *hardware* > :-P At least when the firmware gets out of the way Linux has a chance to solve user issues. > > Uniformity of error response to "fatal" events, but that is mainly a > > PCIe error handling concern not CPU errors. > > Sure, just make sure to keep it simple and generic. Yes, tracepoints feel simple and generic to me while kernel log messages with rate-limiting is already a lossy proposition.
> Given that is has not kept pace the next question is whether the kernel > should bother to maintain the contract => "if nothing is watching > tracepoints some subset (all?) hardware error messages will be reflected > to the kernel log". I'm with you for corrected & recoverable errors. The console is a terrible place for those logs. But it's handy to have fatal errors go to the console. They may make it out to a serial port, or be stashed in pstore for later retrieval. Trace event logs that are handled by some user level daemon are just going to disappear in a puff of reset smoke. -Tony
Luck, Tony wrote: > > Given that is has not kept pace the next question is whether the kernel > > should bother to maintain the contract => "if nothing is watching > > tracepoints some subset (all?) hardware error messages will be reflected > > to the kernel log". > > I'm with you for corrected & recoverable errors. The console is a terrible > place for those logs. > > But it's handy to have fatal errors go to the console. They may make it out to a > serial port, or be stashed in pstore for later retrieval. Trace event logs that > are handled by some user level daemon are just going to disappear in a > puff of reset smoke. That is true, and arguably that fatal event logging to the kernel log should be independent of whether userspace is watching. I.e. ras_userpsace_consumers() gating kernel logging is actively getting in the way when the error severity implies userspace may not get a chance to see or process the event. I would need help from Steven on whether ftrace would be amenable to have a localized tracepoint_printk context to trigger emission of fatal events to the kernel log in addition to the trace buffer.
On Thu, May 16, 2024 at 11:56:15AM -0700, Dan Williams wrote: > Something like a new annotation for tracepoints marking them as "emit to > kernel log if no-one is watching this high-priority trace event"? I'm not sure anymore what we want here now... :) We want for the kernel to not pay attention to whether userspace is consuming error info from the tracepoints and to issue errors into the kernel log too. So you have them in the kernel log *and* in the tracepoints. Right? Or only through the tracepoints? > Well no, there is little to complain about in that path because that > path does not play "is anybody watching" games. It always emits to the > kernel log (subject to rate limiting) and then follows up with always > emitting a tracepoint (subject to standard trace filtering). > > For CXL I asked that its events deprecate the kernel log path with the > hope of not growing new userspace dependencies on kernel log parsing for Yeah, kernel log string format is not an ABI so no problem. > newfangled CXL errors. So shuffling hw error info solely through the tracepoints is probably ok but I am unable to rule out *all* possible cases where it would still make sense to dump to the kernel log, be it there's no userspace, be it it is a critical error and you want to dump it immediately or whatnot. It should probably be configurable. As in: by default all hw error info goes only through tracepoints with the exception of fatal errors - they go both to the kernel log and tracepoints. And then perhaps add "ras=dump_all_error_info_to_kernel_log_too" or so, cmdline param. > At least when the firmware gets out of the way Linux has a chance to > solve user issues. Yeap. > Yes, tracepoints feel simple and generic to me while kernel log messages > with rate-limiting is already a lossy proposition. Right. Btw, do bear in mind that ratelimiting of hw errors can be bad sometimes: imagine it ratelimits exactly that one fatal error which you absolutely should've seen... :-\ Thx.
diff --git a/drivers/acpi/acpi_extlog.c b/drivers/acpi/acpi_extlog.c index fb7b0c73f86a..41ee66306234 100644 --- a/drivers/acpi/acpi_extlog.c +++ b/drivers/acpi/acpi_extlog.c @@ -189,10 +189,7 @@ static int extlog_print(struct notifier_block *nb, unsigned long val, tmp = (struct acpi_hest_generic_status *)elog_buf; - if (!ras_userspace_consumers()) { - print_extlog_rcd(NULL, tmp, cpu); - goto out; - } + print_extlog_rcd(NULL, tmp, cpu); /* log event via trace */ err_seq++; @@ -225,7 +222,6 @@ static int extlog_print(struct notifier_block *nb, unsigned long val, } } -out: mce->kflags |= MCE_HANDLED_EXTLOG; return NOTIFY_OK; } diff --git a/drivers/ras/debugfs.c b/drivers/ras/debugfs.c index 42afd3de68b2..6633844acdd6 100644 --- a/drivers/ras/debugfs.c +++ b/drivers/ras/debugfs.c @@ -13,12 +13,6 @@ struct dentry *ras_get_debugfs_root(void) } EXPORT_SYMBOL_GPL(ras_get_debugfs_root); -int ras_userspace_consumers(void) -{ - return atomic_read(&trace_count); -} -EXPORT_SYMBOL_GPL(ras_userspace_consumers); - static int trace_show(struct seq_file *m, void *v) { return 0; diff --git a/include/linux/ras.h b/include/linux/ras.h index a64182bc72ad..98840f16b697 100644 --- a/include/linux/ras.h +++ b/include/linux/ras.h @@ -7,11 +7,9 @@ #include <linux/cper.h> #ifdef CONFIG_DEBUG_FS -int ras_userspace_consumers(void); void ras_debugfs_init(void); int ras_add_daemon_trace(void); #else -static inline int ras_userspace_consumers(void) { return 0; } static inline void ras_debugfs_init(void) { } static inline int ras_add_daemon_trace(void) { return 0; } #endif
Make extlog_print_rcd() log unconditionally to report errors even if userspace is not consuming trace events. Remove ras_userspace_consumers() because it is not anymore used. This change makes extlog_print() (ELOG) log consistently with ghes_proc() (GHES). Cc: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com> --- drivers/acpi/acpi_extlog.c | 6 +----- drivers/ras/debugfs.c | 6 ------ include/linux/ras.h | 2 -- 3 files changed, 1 insertion(+), 13 deletions(-)