diff mbox

[1/3] aerdrv: Trace Event for AER

Message ID 20121129215443.5483.43364.stgit@grignak.americas.hpqcorp.net (mailing list archive)
State New, archived
Delegated to: Bjorn Helgaas
Headers show

Commit Message

Lance Ortiz Nov. 29, 2012, 9:54 p.m. UTC
This header file will define a new trace event that will be triggered when
a AER event occurs.  The following data will be provided to the trace 
event.

char * name -	String containing the device path

u32 status - 	Either the correctable or uncorrectable register 
		indicating what error or errors have been see.

u8 severity - 	error severity 0:NONFATAL 1:FATAL 2:CORRECTED

The trace event will also provide a trace string that may look like:

"0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
TLP"

Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
---

 include/ras/aer_event.h |   77 +++++++++++++++++++++++++++++++++++++++++++++++
 1 files changed, 77 insertions(+), 0 deletions(-)
 create mode 100644 include/ras/aer_event.h


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Steven Rostedt Nov. 30, 2012, 1:51 a.m. UTC | #1
On Thu, 2012-11-29 at 14:54 -0700, Lance Ortiz wrote:
> This header file will define a new trace event that will be triggered when
> a AER event occurs.  The following data will be provided to the trace 
> event.
> 
> char * name -	String containing the device path
> 
> u32 status - 	Either the correctable or uncorrectable register 
> 		indicating what error or errors have been see.
> 
> u8 severity - 	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> 
> The trace event will also provide a trace string that may look like:
> 
> "0000:05:00.0 PCIe Bus Error:severity=Uncorrected (Non-Fatal), Poisoned
> TLP"
> 
> Signed-off-by: Lance Ortiz <lance.ortiz@hp.com>
> ---
> 
>  include/ras/aer_event.h |   77 +++++++++++++++++++++++++++++++++++++++++++++++

Is there a reason this header is here? Egad, I never noticed the
ras_event.h that is there. This include/ras directory was created for
the sole purpose of trace events! This is not the way to do this.

Please look at the sample in samples/trace_events/

The proper way is to keep the header by the driver. Then you can simply
include the header with "aer_event.h".

But to have the macro magic work, you need to modify the Makefile to
have something like:

CFLAGS_aerdrv_errprint.o = -I$(src)

and it will be able to find your headers without a problem.
The ras_event.h needs to be fixed too. I may just send a patch myself.

-- Steve


>  1 files changed, 77 insertions(+), 0 deletions(-)
>  create mode 100644 include/ras/aer_event.h
> 
> diff --git a/include/ras/aer_event.h b/include/ras/aer_event.h
> new file mode 100644
> index 0000000..735c973
> --- /dev/null
> +++ b/include/ras/aer_event.h
> @@ -0,0 +1,77 @@
> +#undef TRACE_SYSTEM
> +#define TRACE_SYSTEM aer
> +#define TRACE_INCLUDE_FILE aer_event
> +
> +#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
> +#define _TRACE_AER_H
> +
> +#include <linux/tracepoint.h>
> +#include <linux/edac.h>
> +
> +
> +/*
> + * Anhance Error Reporting (AER) PCIE Report Error
> + *
> + * These events are generated when hardware detects a corrected or
> + * uncorrected event on a pci express device and reports
> + * errors.  The event reports the following data.
> + *
> + * char * dev_name -	String containing the device identification
> + * u32 status -		Either the correctable or uncorrectable register
> + *			indicating what error or errors have been seen
> + * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
> + */
> +
> +#define correctable_error_string			\
> +	{BIT(0),	"Receiver Error"},		\
> +	{BIT(6),	"Bad TLP"},			\
> +	{BIT(7),	"Bad DLLP"},			\
> +	{BIT(8),	"RELAY_NUM Rollover"},		\
> +	{BIT(12),	"Replay Timer Timeout"},	\
> +	{BIT(13),	"Advisory Non-Fatal"}
> +
> +#define uncorrectable_error_string			\
> +	{BIT(4),	"Data Link Protocol"},		\
> +	{BIT(12),	"Poisoned TLP"},		\
> +	{BIT(13),	"Flow Control Protocol"},	\
> +	{BIT(14),	"Completion Timeout"},		\
> +	{BIT(15),	"Completer Abort"},		\
> +	{BIT(16),	"Unexpected Completion"},	\
> +	{BIT(17),	"Receiver Overflow"},		\
> +	{BIT(18),	"Malformed TLP"},		\
> +	{BIT(19),	"ECRC"},			\
> +	{BIT(20),	"Unsupported Request"}
> +
> +TRACE_EVENT(aer_event,
> +	TP_PROTO(const char *dev_name,
> +		 const u32 status,
> +		 const u8 severity),
> +
> +	TP_ARGS(dev_name, status, severity),
> +
> +	TP_STRUCT__entry(
> +		__string(	dev_name,	dev_name	)
> +		__field(	u32,		status		)
> +		__field(	u8,		severity	)
> +	),
> +
> +	TP_fast_assign(
> +		__assign_str(dev_name, dev_name);
> +		__entry->status		= status;
> +		__entry->severity	= severity;
> +	),
> +
> +	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
> +		__get_str(dev_name),
> +		(__entry->severity == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
> +			((__entry->severity == HW_EVENT_ERR_FATAL) ?
> +			"Fatal" : "Uncorrected"),
> +		__entry->severity == HW_EVENT_ERR_CORRECTED ?
> +		__print_flags(__entry->status, "|", correctable_error_string) :
> +		__print_flags(__entry->status, "|", uncorrectable_error_string))
> +);
> +
> +#endif /* _TRACE_AER_H */
> +
> +/* This part must be outside protection */
> +#include <trace/define_trace.h>


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Nov. 30, 2012, 10:53 a.m. UTC | #2
On Thu, Nov 29, 2012 at 08:51:19PM -0500, Steven Rostedt wrote:
> >  include/ras/aer_event.h |   77 +++++++++++++++++++++++++++++++++++++++++++++++
> 
> Is there a reason this header is here? Egad, I never noticed the
> ras_event.h that is there. This include/ras directory was created for
> the sole purpose of trace events! This is not the way to do this.

Well, the idea for the ras event was to be able to use it in multiple
places. It is currently used only by EDAC but it could be that memory
errors could be reported by other agents which would reuse that TP.

> Please look at the sample in samples/trace_events/
> 
> The proper way is to keep the header by the driver. Then you can simply
> include the header with "aer_event.h".
> 
> But to have the macro magic work, you need to modify the Makefile to
> have something like:
> 
> CFLAGS_aerdrv_errprint.o = -I$(src)

So I'm guessing that every .c file including the TP should also -I
include the TP definition header wherever it is. Is that agreeable?

Thanks.
Steven Rostedt Nov. 30, 2012, 11:39 a.m. UTC | #3
On Fri, 2012-11-30 at 11:53 +0100, Borislav Petkov wrote:
> On Thu, Nov 29, 2012 at 08:51:19PM -0500, Steven Rostedt wrote:
> > >  include/ras/aer_event.h |   77 +++++++++++++++++++++++++++++++++++++++++++++++
> > 
> > Is there a reason this header is here? Egad, I never noticed the
> > ras_event.h that is there. This include/ras directory was created for
> > the sole purpose of trace events! This is not the way to do this.
> 
> Well, the idea for the ras event was to be able to use it in multiple
> places. It is currently used only by EDAC but it could be that memory
> errors could be reported by other agents which would reuse that TP.
> 

If it's generic, then place it into the include/trace/events directory,
the you don't need to have the TRACE_INCLUDE_PATH as that is the default
path the macros will use.

> > Please look at the sample in samples/trace_events/
> > 
> > The proper way is to keep the header by the driver. Then you can simply
> > include the header with "aer_event.h".
> > 
> > But to have the macro magic work, you need to modify the Makefile to
> > have something like:
> > 
> > CFLAGS_aerdrv_errprint.o = -I$(src)
> 
> So I'm guessing that every .c file including the TP should also -I
> include the TP definition header wherever it is. Is that agreeable?

You only need the -I$(src) for the .c file that uses the
CREATE_TRACE_POINTS macro, as the trace point macro magic headers
require it to find the TP header. Other files just need "foo.h", or
<trace/events/foo.h> if it's in the generic location.

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Nov. 30, 2012, 1:42 p.m. UTC | #4
On Fri, Nov 30, 2012 at 06:39:11AM -0500, Steven Rostedt wrote:
> On Fri, 2012-11-30 at 11:53 +0100, Borislav Petkov wrote:
> > On Thu, Nov 29, 2012 at 08:51:19PM -0500, Steven Rostedt wrote:
> > > >  include/ras/aer_event.h |   77 +++++++++++++++++++++++++++++++++++++++++++++++
> > > 
> > > Is there a reason this header is here? Egad, I never noticed the
> > > ras_event.h that is there. This include/ras directory was created for
> > > the sole purpose of trace events! This is not the way to do this.
> > 
> > Well, the idea for the ras event was to be able to use it in multiple
> > places. It is currently used only by EDAC but it could be that memory
> > errors could be reported by other agents which would reuse that TP.
> > 
> 
> If it's generic, then place it into the include/trace/events directory,
> the you don't need to have the TRACE_INCLUDE_PATH as that is the default
> path the macros will use.

Hmm, so I'm thinking maybe we should add a ras.h header there which
contains all RAS TPs.

> > > Please look at the sample in samples/trace_events/
> > > 
> > > The proper way is to keep the header by the driver. Then you can simply
> > > include the header with "aer_event.h".
> > > 
> > > But to have the macro magic work, you need to modify the Makefile to
> > > have something like:
> > > 
> > > CFLAGS_aerdrv_errprint.o = -I$(src)
> > 
> > So I'm guessing that every .c file including the TP should also -I
> > include the TP definition header wherever it is. Is that agreeable?
> 
> You only need the -I$(src) for the .c file that uses the
> CREATE_TRACE_POINTS macro, as the trace point macro magic headers
> require it to find the TP header.

This is done like this from EDAC's single usage site:

#define CREATE_TRACE_POINTS
#define TRACE_INCLUDE_PATH ../../include/ras
#include <ras/ras_event.h>

This is in <drivers/edac/edac_mc.c> and it doesn't to "-I$(src)" in the
edac Makefile.

> Other files just need "foo.h", or <trace/events/foo.h> if it's in the
> generic location.

So, it sounds to me like we should we move all RAS-specific tracepoints
to <trace/events/ras.h> and then in each usage site do:

#define CREATE_TRACE_POINTS
#include <trace/events/ras.h>

Correct?

FWIW, it looks neat and clean to me that way.

Thanks.
Steven Rostedt Nov. 30, 2012, 1:53 p.m. UTC | #5
On Fri, 2012-11-30 at 14:42 +0100, Borislav Petkov wrote:

> So, it sounds to me like we should we move all RAS-specific tracepoints
> to <trace/events/ras.h> and then in each usage site do:

Note, the CREATE_TRACE_POINTS must only be done in one location. Not
every place. It creates the code that does the work to make the
tracepoints show up in /debug/tracing/events/* as well as the callback
code and other such things. If you define it in more than one .c file,
then you will have linker issues due to the functions being created more
than once.

> 
> #define CREATE_TRACE_POINTS
> #include <trace/events/ras.h>
> 
> Correct?

That's the default way to do things.

> 
> FWIW, it looks neat and clean to me that way.

Yep, that's why it's default ;-)

-- Steve


--
To unsubscribe from this list: send the line "unsubscribe linux-pci" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Borislav Petkov Nov. 30, 2012, 5:36 p.m. UTC | #6
On Fri, Nov 30, 2012 at 08:53:51AM -0500, Steven Rostedt wrote:
> On Fri, 2012-11-30 at 14:42 +0100, Borislav Petkov wrote:
> 
> > So, it sounds to me like we should we move all RAS-specific tracepoints
> > to <trace/events/ras.h> and then in each usage site do:
> 
> Note, the CREATE_TRACE_POINTS must only be done in one location. Not
> every place. It creates the code that does the work to make the
> tracepoints show up in /debug/tracing/events/* as well as the callback
> code and other such things. If you define it in more than one .c file,
> then you will have linker issues due to the functions being created more
> than once.
> 
> > 
> > #define CREATE_TRACE_POINTS
> > #include <trace/events/ras.h>
> > 
> > Correct?
> 
> That's the default way to do things.

Ok, cool.

Lance, care to move the TP to a new header called
include/trace/events/ras.h in your next iteration of the patches?

I'll later move the mc_event TP there too and drop the RAS-specific TP
header.

Thanks.
Ortiz, Lance E Nov. 30, 2012, 6:18 p.m. UTC | #7
> Ok, cool.

> 

> Lance, care to move the TP to a new header called

> include/trace/events/ras.h in your next iteration of the patches?


Will do.

Lance
diff mbox

Patch

diff --git a/include/ras/aer_event.h b/include/ras/aer_event.h
new file mode 100644
index 0000000..735c973
--- /dev/null
+++ b/include/ras/aer_event.h
@@ -0,0 +1,77 @@ 
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM aer
+#define TRACE_INCLUDE_FILE aer_event
+
+#if !defined(_TRACE_AER_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_AER_H
+
+#include <linux/tracepoint.h>
+#include <linux/edac.h>
+
+
+/*
+ * Anhance Error Reporting (AER) PCIE Report Error
+ *
+ * These events are generated when hardware detects a corrected or
+ * uncorrected event on a pci express device and reports
+ * errors.  The event reports the following data.
+ *
+ * char * dev_name -	String containing the device identification
+ * u32 status -		Either the correctable or uncorrectable register
+ *			indicating what error or errors have been seen
+ * u8 severity -	error severity 0:NONFATAL 1:FATAL 2:CORRECTED
+ */
+
+#define correctable_error_string			\
+	{BIT(0),	"Receiver Error"},		\
+	{BIT(6),	"Bad TLP"},			\
+	{BIT(7),	"Bad DLLP"},			\
+	{BIT(8),	"RELAY_NUM Rollover"},		\
+	{BIT(12),	"Replay Timer Timeout"},	\
+	{BIT(13),	"Advisory Non-Fatal"}
+
+#define uncorrectable_error_string			\
+	{BIT(4),	"Data Link Protocol"},		\
+	{BIT(12),	"Poisoned TLP"},		\
+	{BIT(13),	"Flow Control Protocol"},	\
+	{BIT(14),	"Completion Timeout"},		\
+	{BIT(15),	"Completer Abort"},		\
+	{BIT(16),	"Unexpected Completion"},	\
+	{BIT(17),	"Receiver Overflow"},		\
+	{BIT(18),	"Malformed TLP"},		\
+	{BIT(19),	"ECRC"},			\
+	{BIT(20),	"Unsupported Request"}
+
+TRACE_EVENT(aer_event,
+	TP_PROTO(const char *dev_name,
+		 const u32 status,
+		 const u8 severity),
+
+	TP_ARGS(dev_name, status, severity),
+
+	TP_STRUCT__entry(
+		__string(	dev_name,	dev_name	)
+		__field(	u32,		status		)
+		__field(	u8,		severity	)
+	),
+
+	TP_fast_assign(
+		__assign_str(dev_name, dev_name);
+		__entry->status		= status;
+		__entry->severity	= severity;
+	),
+
+	TP_printk("%s PCIe Bus Error: severity=%s, %s\n",
+		__get_str(dev_name),
+		(__entry->severity == HW_EVENT_ERR_CORRECTED) ? "Corrected" :
+			((__entry->severity == HW_EVENT_ERR_FATAL) ?
+			"Fatal" : "Uncorrected"),
+		__entry->severity == HW_EVENT_ERR_CORRECTED ?
+		__print_flags(__entry->status, "|", correctable_error_string) :
+		__print_flags(__entry->status, "|", uncorrectable_error_string))
+);
+
+#endif /* _TRACE_AER_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>