diff mbox series

[v1,1/2] thunderbolt: add tracefs support to tb_* logging helpers

Message ID 20230727131326.2282301-1-lb@semihalf.com (mailing list archive)
State New, archived
Headers show
Series [v1,1/2] thunderbolt: add tracefs support to tb_* logging helpers | expand

Commit Message

Łukasz Bartosik July 27, 2023, 1:13 p.m. UTC
Thunderbolt tracing is a lightweight alternative to traditional
thunderbolt debug logging. While thunderbolt debug logging is quite
convenient when reproducing a specific issue, it doesn't help when
something goes wrong unexpectedly. There are a couple of reasons why
one does not want to enable thunderbolt debug logging at all times:

1. We don't want to overwhelm kernel log with thunderbolt spam, others
   want to use it too
2. Console logging is slow

Thunderbolt tracing aims to solve both these problems. Use of
the thunderbolt tracefs instance allows to enable thunderbolt
logging in production without impacting performance or spamming
the system logs.

To use thunderbolt tracing, set the thunderbolt.trace module parameter
(via cmdline or sysfs) to true:
::
  eg: echo true > /sys/module/thunderbolt/parameters/trace

Once active, all log messages will be written to the thunderbolt trace.
Once at capacity, the trace will overwrite old messages with new ones.
At any point, one can read the trace file to extract the previous
thunderbolt messages:
::
  eg: cat /sys/kernel/tracing/instances/thunderbolt/trace

The thunderbolt trace instance is subsystem wide, so if you have multiple
devices active, they will be adding logs to the same trace.

Signed-off-by: Łukasz Bartosik <lb@semihalf.com>
---
 drivers/thunderbolt/nhi.c | 81 +++++++++++++++++++++++++++++++++++++++
 drivers/thunderbolt/tb.h  | 60 ++++++++++++++++++++++++++---
 2 files changed, 136 insertions(+), 5 deletions(-)

Comments

Greg KH July 27, 2023, 2:46 p.m. UTC | #1
On Thu, Jul 27, 2023 at 03:13:25PM +0200, Łukasz Bartosik wrote:
> Thunderbolt tracing is a lightweight alternative to traditional
> thunderbolt debug logging. While thunderbolt debug logging is quite
> convenient when reproducing a specific issue, it doesn't help when
> something goes wrong unexpectedly. There are a couple of reasons why
> one does not want to enable thunderbolt debug logging at all times:
> 
> 1. We don't want to overwhelm kernel log with thunderbolt spam, others
>    want to use it too

But doesn't the dynamic debug infrastructure allow this today?

> 2. Console logging is slow

Slow how?

> Thunderbolt tracing aims to solve both these problems. Use of
> the thunderbolt tracefs instance allows to enable thunderbolt
> logging in production without impacting performance or spamming
> the system logs.
> 
> To use thunderbolt tracing, set the thunderbolt.trace module parameter
> (via cmdline or sysfs) to true:
> ::
>   eg: echo true > /sys/module/thunderbolt/parameters/trace

Why yet-another module parameter?  Why is this required?

> Once active, all log messages will be written to the thunderbolt trace.
> Once at capacity, the trace will overwrite old messages with new ones.
> At any point, one can read the trace file to extract the previous
> thunderbolt messages:
> ::
>   eg: cat /sys/kernel/tracing/instances/thunderbolt/trace
> 
> The thunderbolt trace instance is subsystem wide, so if you have multiple
> devices active, they will be adding logs to the same trace.

This just uses the existing logging functionality and ties it into the
trace functionality, right?

If so, why not do this for all printk messages, why should this be
unique to thunderbolt?

Normally with tracing, you enable specific tracepoints that you add, not
just "all dev_*() messages", are you sure this will actually help?

thanks,

greg k-h
Łukasz Bartosik July 28, 2023, 2:50 p.m. UTC | #2
czw., 27 lip 2023 o 16:47 Greg KH <gregkh@linuxfoundation.org> napisał(a):
>
> On Thu, Jul 27, 2023 at 03:13:25PM +0200, Łukasz Bartosik wrote:
> > Thunderbolt tracing is a lightweight alternative to traditional
> > thunderbolt debug logging. While thunderbolt debug logging is quite
> > convenient when reproducing a specific issue, it doesn't help when
> > something goes wrong unexpectedly. There are a couple of reasons why
> > one does not want to enable thunderbolt debug logging at all times:
> >
> > 1. We don't want to overwhelm kernel log with thunderbolt spam, others
> >    want to use it too
>
> But doesn't the dynamic debug infrastructure allow this today?
>

Dynamic debug allows only partially what we would like to achieve.

Our goal is to be able to gather thunderbolt debug logs from ChromeOS
end users when an issue happens. Especially that would be very useful
for us in case of issues which reproduction is difficult. We would
write thunderbolt debug logs to a separate wrap around buffer provided
by trace subsystem. When an issue happens and a user sends a feedback
then thunderbolt logs would be attached to the feedback providing
more insight into what happened.

Dynamic debug sends all debug messages to dmesg and with significant
number of dynamic dev_dbg prints enabled dmesg may be quickly overwritten
and other important logs lost. Also enabling dynamic debug for the
entire kernel might
not be appropriate for production kernels due to impact on kernel size and perf.

> > 2. Console logging is slow
>
> Slow how?
>

I meant slow compared to writing messages to trace array instance in memory.

> > Thunderbolt tracing aims to solve both these problems. Use of
> > the thunderbolt tracefs instance allows to enable thunderbolt
> > logging in production without impacting performance or spamming
> > the system logs.
> >
> > To use thunderbolt tracing, set the thunderbolt.trace module parameter
> > (via cmdline or sysfs) to true:
> > ::
> >   eg: echo true > /sys/module/thunderbolt/parameters/trace
>
> Why yet-another module parameter?  Why is this required?
>

This is to enable/disable write of thunderbolt debug logs to thunderbolt
trace array instance.


> > Once active, all log messages will be written to the thunderbolt trace.
> > Once at capacity, the trace will overwrite old messages with new ones.
> > At any point, one can read the trace file to extract the previous
> > thunderbolt messages:
> > ::
> >   eg: cat /sys/kernel/tracing/instances/thunderbolt/trace
> >
> > The thunderbolt trace instance is subsystem wide, so if you have multiple
> > devices active, they will be adding logs to the same trace.
>
> This just uses the existing logging functionality and ties it into the
> trace functionality, right?
>

Yes, it writes log messages (including debug level) to memory buffer provided
by trace framework.

> If so, why not do this for all printk messages, why should this be
> unique to thunderbolt?
>

I agree it would be better to come up with a general solution that can be
used by any part/subsystem of a kernel. I think it makes sense for me to look
more into dynamic debug and see how it could be extended to cover our use case.

> Normally with tracing, you enable specific tracepoints that you add, not
> just "all dev_*() messages", are you sure this will actually help?
>

Yes, it would be helpful, here is an example snippet from logs
captured into thunderbolt trace array instance:
 <...>-177     [000] .....   217.635100: tb_trace_printf: thunderbolt
0000:00:0d.3: 0: resuming switch
 <...>-177     [000] .....   217.635101: tb_trace_printf: thunderbolt
0000:00:0d.3: restoring Switch at 0x0 (depth: 0, up port: 7)
 <...>-177     [000] .....   217.635400: tb_trace_printf: thunderbolt
0000:00:0d.3: 0:1: USB4 wake: no, connection wake: no, disconnection
wake: yes

Thanks,
Lukasz

> thanks,
>
> greg k-h
Greg KH July 28, 2023, 3:32 p.m. UTC | #3
On Fri, Jul 28, 2023 at 04:50:12PM +0200, Łukasz Bartosik wrote:
> czw., 27 lip 2023 o 16:47 Greg KH <gregkh@linuxfoundation.org> napisał(a):
> >
> > On Thu, Jul 27, 2023 at 03:13:25PM +0200, Łukasz Bartosik wrote:
> > > Thunderbolt tracing is a lightweight alternative to traditional
> > > thunderbolt debug logging. While thunderbolt debug logging is quite
> > > convenient when reproducing a specific issue, it doesn't help when
> > > something goes wrong unexpectedly. There are a couple of reasons why
> > > one does not want to enable thunderbolt debug logging at all times:
> > >
> > > 1. We don't want to overwhelm kernel log with thunderbolt spam, others
> > >    want to use it too
> >
> > But doesn't the dynamic debug infrastructure allow this today?
> >
> 
> Dynamic debug allows only partially what we would like to achieve.
> 
> Our goal is to be able to gather thunderbolt debug logs from ChromeOS
> end users when an issue happens. Especially that would be very useful
> for us in case of issues which reproduction is difficult. We would
> write thunderbolt debug logs to a separate wrap around buffer provided
> by trace subsystem. When an issue happens and a user sends a feedback
> then thunderbolt logs would be attached to the feedback providing
> more insight into what happened.

The problem is, you don't want 100 different debug log formats and tools
for the 100 different driver subsystems.

That is why we unified everything on the dev_* format, and the tracing
tools.  Use them, don't create something new.

> Dynamic debug sends all debug messages to dmesg and with significant
> number of dynamic dev_dbg prints enabled dmesg may be quickly overwritten
> and other important logs lost. Also enabling dynamic debug for the
> entire kernel might
> not be appropriate for production kernels due to impact on kernel size and perf.

If you look at the typec code, they have their own type of ring-buffer
for logging things.  Perhaps look at making that more generic like what
you need here as well, and adding that to the tracing core for everyone
to use and unify with?

I don't want one-off solutions like this, sorry, that way lies madness,
madness that we used to have before we unified everything.  Let's learn
from our past mistakes and not make them again please.

thanks,

greg k-h
Łukasz Bartosik July 28, 2023, 3:49 p.m. UTC | #4
pt., 28 lip 2023 o 17:32 Greg KH <gregkh@linuxfoundation.org> napisał(a):
>
> On Fri, Jul 28, 2023 at 04:50:12PM +0200, Łukasz Bartosik wrote:
> > czw., 27 lip 2023 o 16:47 Greg KH <gregkh@linuxfoundation.org> napisał(a):
> > >
> > > On Thu, Jul 27, 2023 at 03:13:25PM +0200, Łukasz Bartosik wrote:
> > > > Thunderbolt tracing is a lightweight alternative to traditional
> > > > thunderbolt debug logging. While thunderbolt debug logging is quite
> > > > convenient when reproducing a specific issue, it doesn't help when
> > > > something goes wrong unexpectedly. There are a couple of reasons why
> > > > one does not want to enable thunderbolt debug logging at all times:
> > > >
> > > > 1. We don't want to overwhelm kernel log with thunderbolt spam, others
> > > >    want to use it too
> > >
> > > But doesn't the dynamic debug infrastructure allow this today?
> > >
> >
> > Dynamic debug allows only partially what we would like to achieve.
> >
> > Our goal is to be able to gather thunderbolt debug logs from ChromeOS
> > end users when an issue happens. Especially that would be very useful
> > for us in case of issues which reproduction is difficult. We would
> > write thunderbolt debug logs to a separate wrap around buffer provided
> > by trace subsystem. When an issue happens and a user sends a feedback
> > then thunderbolt logs would be attached to the feedback providing
> > more insight into what happened.
>
> The problem is, you don't want 100 different debug log formats and tools
> for the 100 different driver subsystems.
>
> That is why we unified everything on the dev_* format, and the tracing
> tools.  Use them, don't create something new.
>
> > Dynamic debug sends all debug messages to dmesg and with significant
> > number of dynamic dev_dbg prints enabled dmesg may be quickly overwritten
> > and other important logs lost. Also enabling dynamic debug for the
> > entire kernel might
> > not be appropriate for production kernels due to impact on kernel size and perf.
>
> If you look at the typec code, they have their own type of ring-buffer
> for logging things.  Perhaps look at making that more generic like what
> you need here as well, and adding that to the tracing core for everyone
> to use and unify with?
>

Thanks for the comments and pointing me to the typec code for the reference.

Lukasz

> I don't want one-off solutions like this, sorry, that way lies madness,
> madness that we used to have before we unified everything.  Let's learn
> from our past mistakes and not make them again please.
>
> thanks,
>
> greg k-h
diff mbox series

Patch

diff --git a/drivers/thunderbolt/nhi.c b/drivers/thunderbolt/nhi.c
index 4b7bec74e89f..3ff89817e421 100644
--- a/drivers/thunderbolt/nhi.c
+++ b/drivers/thunderbolt/nhi.c
@@ -20,6 +20,7 @@ 
 #include <linux/delay.h>
 #include <linux/property.h>
 #include <linux/string_helpers.h>
+#include <linux/trace.h>
 
 #include "nhi.h"
 #include "nhi_regs.h"
@@ -50,6 +51,19 @@  static bool host_reset = true;
 module_param(host_reset, bool, 0444);
 MODULE_PARM_DESC(host_reset, "reset USBv2 host router (default: true)");
 
+#ifdef CONFIG_TRACING
+/*
+ * __tb_debug_trace - enable debug logs to the thunderbolt tracefs instance
+ */
+bool __tb_debug_trace;
+EXPORT_SYMBOL(__tb_debug_trace);
+
+MODULE_PARM_DESC(trace, "enable debug logs to the thunderbolt tracefs instance (default: false) (bool)");
+module_param_named(trace, __tb_debug_trace, bool, 0600);
+
+static struct trace_array *trace_arr;
+#endif
+
 static int ring_interrupt_index(const struct tb_ring *ring)
 {
 	int bit = ring->hop;
@@ -1539,6 +1553,71 @@  static struct pci_driver nhi_driver = {
 	.driver.pm = &nhi_pm_ops,
 };
 
+#ifdef CONFIG_TRACING
+
+/**
+ * tb_trace_init - initializes the thunderbolt trace array
+ *
+ * This function fetches (or creates) the thunderbolt trace array.
+ * This should be called once on thunderbolt subsystem creation
+ * and matched with tb_trace_cleanup().
+ */
+void tb_trace_init(void)
+{
+	int ret;
+
+	trace_arr = trace_array_get_by_name("thunderbolt");
+	if (!trace_arr)
+		return;
+
+	ret = trace_array_init_printk(trace_arr);
+	if (ret)
+		tb_trace_cleanup();
+}
+
+/**
+ * tb_trace_printf - adds an entry to the thunderbolt tracefs instance
+ * @format: printf format of the message to add to the trace
+ *
+ * This function adds a new entry in the thunderbolt tracefs instance
+ */
+void tb_trace_printf(const struct device *dev, const char *format, ...)
+{
+	struct va_format vaf;
+	va_list args;
+
+	va_start(args, format);
+	vaf.fmt = format;
+	vaf.va = &args;
+
+	if (dev)
+		trace_array_printk(trace_arr, _THIS_IP_, "%s %s: %pV",
+				   dev_driver_string(dev), dev_name(dev),
+				   &vaf);
+	else
+		trace_array_printk(trace_arr, _THIS_IP_,
+				   "(NULL device *): %pV", &vaf);
+	va_end(args);
+}
+
+/**
+ * tb_trace_cleanup - destroys the thunderbolt trace array
+ *
+ * This function destroys the thunderbolt trace array created
+ * with tb_trace_init. This should be called once on thunderbolt
+ * subsystem close and matched with tb_trace_init().
+ */
+void tb_trace_cleanup(void)
+{
+	if (trace_arr) {
+		trace_array_put(trace_arr);
+		trace_array_destroy(trace_arr);
+		trace_arr = NULL;
+	}
+}
+
+#endif
+
 static int __init nhi_init(void)
 {
 	int ret;
@@ -1549,11 +1628,13 @@  static int __init nhi_init(void)
 	ret = pci_register_driver(&nhi_driver);
 	if (ret)
 		tb_domain_exit();
+	tb_trace_init();
 	return ret;
 }
 
 static void __exit nhi_unload(void)
 {
+	tb_trace_cleanup();
 	pci_unregister_driver(&nhi_driver);
 	tb_domain_exit();
 }
diff --git a/drivers/thunderbolt/tb.h b/drivers/thunderbolt/tb.h
index 57a9b272cb94..3d874182b996 100644
--- a/drivers/thunderbolt/tb.h
+++ b/drivers/thunderbolt/tb.h
@@ -14,6 +14,7 @@ 
 #include <linux/thunderbolt.h>
 #include <linux/uuid.h>
 #include <linux/bitfield.h>
+#include <linux/kern_levels.h>
 
 #include "tb_regs.h"
 #include "ctl.h"
@@ -685,11 +686,60 @@  static inline int tb_port_write(struct tb_port *port, const void *buffer,
 			    length);
 }
 
-#define tb_err(tb, fmt, arg...) dev_err(&(tb)->nhi->pdev->dev, fmt, ## arg)
-#define tb_WARN(tb, fmt, arg...) dev_WARN(&(tb)->nhi->pdev->dev, fmt, ## arg)
-#define tb_warn(tb, fmt, arg...) dev_warn(&(tb)->nhi->pdev->dev, fmt, ## arg)
-#define tb_info(tb, fmt, arg...) dev_info(&(tb)->nhi->pdev->dev, fmt, ## arg)
-#define tb_dbg(tb, fmt, arg...) dev_dbg(&(tb)->nhi->pdev->dev, fmt, ## arg)
+#ifdef CONFIG_TRACING
+extern bool __tb_debug_trace;
+
+void tb_trace_init(void);
+__printf(2, 3)
+void tb_trace_printf(const struct device *dev, const char *format, ...);
+void tb_trace_cleanup(void);
+
+static inline bool tb_dbg_trace_enabled(void)
+{
+	return unlikely(__tb_debug_trace);
+}
+
+#else
+static inline void tb_trace_init(void)
+{
+}
+
+__printf(2, 3)
+static inline void tb_trace_printf(const struct device *dev,
+				   const char *format, ...)
+{
+}
+
+static inline void tb_trace_cleanup(void)
+{
+}
+
+static inline bool tb_dbg_trace_enabled(void)
+{
+	return 0;
+}
+#endif
+
+#define __LOG_TRACE_PRINT(log_func, tb, fmt, arg...)			\
+	do {								\
+		const struct device *dev = &(tb)->nhi->pdev->dev;	\
+									\
+		log_func(dev, fmt,  ## arg);				\
+									\
+		if (tb_dbg_trace_enabled())				\
+			tb_trace_printf(dev, fmt, ## arg);		\
+	} while (0)
+
+#define tb_err(tb, fmt, arg...) \
+	__LOG_TRACE_PRINT(dev_err, tb, fmt, ## arg)
+#define tb_WARN(tb, fmt, arg...) \
+	dev_WARN(&(tb)->nhi->pdev->dev, fmt, ## arg)
+#define tb_warn(tb, fmt, arg...) \
+	__LOG_TRACE_PRINT(dev_warn, tb, fmt, ## arg)
+#define tb_info(tb, fmt, arg...) \
+	__LOG_TRACE_PRINT(dev_info, tb, fmt, ## arg)
+#define tb_dbg(tb, fmt, arg...) \
+	__LOG_TRACE_PRINT(dev_dbg, tb, fmt, ## arg)
 
 #define __TB_SW_PRINT(level, sw, fmt, arg...)           \
 	do {                                            \