[RFC,6/9] cxl/pci: Add trace logging for CXL PCIe port RAS errors

Message ID 20240617200411.1426554-7-terry.bowman@amd.com
State New
Series Add RAS support for CXL root ports, CXL downstream switch ports, and CXL upstream switch ports

Commit Message

Terry Bowman June 17, 2024, 8:04 p.m. UTC
The cxl_pci driver uses kernel trace functions to log RAS errors for
endpoints and RCH downstream ports. The same is needed for CXL root ports,
CXL downstream switch ports, and CXL upstream switch ports.

Add RAS correctable and RAS uncorrectable trace logging functions for
CXL PCIe ports.

Signed-off-by: Terry Bowman <terry.bowman@amd.com>
---
 drivers/cxl/core/trace.h | 34 ++++++++++++++++++++++++++++++++++
 1 file changed, 34 insertions(+)

Comments

Jonathan Cameron June 20, 2024, 12:53 p.m. UTC | #1
On Mon, 17 Jun 2024 15:04:08 -0500
Terry Bowman <terry.bowman@amd.com> wrote:

> The cxl_pci driver uses kernel trace functions to log RAS errors for
> endpoints and RCH downstream ports. The same is needed for CXL root ports,
> CXL downstream switch ports, and CXL upstream switch ports.
> 
> Add RAS correctable and RAS uncorrectable trace logging functions for
> CXL PCIE ports.
> 
> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> ---
>  drivers/cxl/core/trace.h | 34 ++++++++++++++++++++++++++++++++++
>  1 file changed, 34 insertions(+)
> 
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index e5f13260fc52..5cfd9952d88a 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -48,6 +48,23 @@
>  	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
>  )
>  
> +TRACE_EVENT(cxl_port_aer_uncorrectable_error,
> +	TP_PROTO(struct device *dev, u32 status),

By comparison with existing code, why no fe or header
log?  Don't exist for ports for some reason?
Serial number of the port might also be useful.

> +	TP_ARGS(dev, status),
> +	TP_STRUCT__entry(
> +		__string(devname, dev_name(dev))
> +		__field(u32, status)
> +	),
> +	TP_fast_assign(
> +		__assign_str(devname, dev_name(dev));
> +		__entry->status = status;
> +	),
> +	TP_printk("device=%s status='%s'",
> +		  __get_str(devname),
> +		  show_uc_errs(__entry->status)
> +	)
> +);
> +
>  TRACE_EVENT(cxl_aer_uncorrectable_error,
>  	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
>  	TP_ARGS(cxlmd, status, fe, hl),
> @@ -96,6 +113,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>  	{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" }	\
>  )
>  
> +TRACE_EVENT(cxl_port_aer_correctable_error,
> +	TP_PROTO(struct device *dev, u32 status),
> +	TP_ARGS(dev, status),
> +	TP_STRUCT__entry(
> +		__string(devname, dev_name(dev))
> +		__field(u32, status)
> +	),
> +	TP_fast_assign(
> +		__assign_str(devname, dev_name(dev));
> +		__entry->status = status;
> +	),
> +	TP_printk("device=%s status='%s'",
> +		  __get_str(devname),
> +		  show_ce_errs(__entry->status)
> +	)
> +);
> +
>  TRACE_EVENT(cxl_aer_correctable_error,
>  	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
>  	TP_ARGS(cxlmd, status),
Terry Bowman June 24, 2024, 3:53 p.m. UTC | #2
Hi Jonathan,

I added responses inline below.

On 6/20/24 07:53, Jonathan Cameron wrote:
> On Mon, 17 Jun 2024 15:04:08 -0500
> Terry Bowman <terry.bowman@amd.com> wrote:
> 
>> The cxl_pci driver uses kernel trace functions to log RAS errors for
>> endpoints and RCH downstream ports. The same is needed for CXL root ports,
>> CXL downstream switch ports, and CXL upstream switch ports.
>>
>> Add RAS correctable and RAS uncorrectable trace logging functions for
>> CXL PCIE ports.
>>
>> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
>> ---
>>  drivers/cxl/core/trace.h | 34 ++++++++++++++++++++++++++++++++++
>>  1 file changed, 34 insertions(+)
>>
>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
>> index e5f13260fc52..5cfd9952d88a 100644
>> --- a/drivers/cxl/core/trace.h
>> +++ b/drivers/cxl/core/trace.h
>> @@ -48,6 +48,23 @@
>>  	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
>>  )
>>  
>> +TRACE_EVENT(cxl_port_aer_uncorrectable_error,
>> +	TP_PROTO(struct device *dev, u32 status),
> 
> By comparison with existing code, why no fe or header
> log?  Don't exist for ports for some reason?
> Serial number of the port might also be useful.
> 

The AER FE and header are the same for ports and the logging 
needs to be added here.

There is no serial number for the ports.

Regards,
Terry

>> +	TP_ARGS(dev, status),
>> +	TP_STRUCT__entry(
>> +		__string(devname, dev_name(dev))
>> +		__field(u32, status)
>> +	),
>> +	TP_fast_assign(
>> +		__assign_str(devname, dev_name(dev));
>> +		__entry->status = status;
>> +	),
>> +	TP_printk("device=%s status='%s'",
>> +		  __get_str(devname),
>> +		  show_uc_errs(__entry->status)
>> +	)
>> +);
>> +
>>  TRACE_EVENT(cxl_aer_uncorrectable_error,
>>  	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
>>  	TP_ARGS(cxlmd, status, fe, hl),
>> @@ -96,6 +113,23 @@ TRACE_EVENT(cxl_aer_uncorrectable_error,
>>  	{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" }	\
>>  )
>>  
>> +TRACE_EVENT(cxl_port_aer_correctable_error,
>> +	TP_PROTO(struct device *dev, u32 status),
>> +	TP_ARGS(dev, status),
>> +	TP_STRUCT__entry(
>> +		__string(devname, dev_name(dev))
>> +		__field(u32, status)
>> +	),
>> +	TP_fast_assign(
>> +		__assign_str(devname, dev_name(dev));
>> +		__entry->status = status;
>> +	),
>> +	TP_printk("device=%s status='%s'",
>> +		  __get_str(devname),
>> +		  show_ce_errs(__entry->status)
>> +	)
>> +);
>> +
>>  TRACE_EVENT(cxl_aer_correctable_error,
>>  	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
>>  	TP_ARGS(cxlmd, status),
>
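
For context on the "fe" argument discussed above: the existing endpoint
event takes (cxlmd, status, fe, hl), and the endpoint handler is understood
to derive fe from the RAS capability control register when more than one
status bit is set, and to use the status itself otherwise. The following is
a minimal userspace sketch of that selection, not the driver code. It
assumes a 6-bit First Error Pointer field; first_error(), popcount32() and
FE_POINTER_MASK are illustrative names, not kernel symbols.

```c
#include <stdint.h>

/* Assumption: the First Error Pointer occupies the low 6 bits. */
#define FE_POINTER_MASK 0x3fu

/* Count set bits; stands in for the kernel's hweight32(). */
static unsigned int popcount32(uint32_t v)
{
	unsigned int n = 0;

	for (; v; v &= v - 1)
		n++;
	return n;
}

/*
 * Pick the "first error" bit to report alongside the status.
 * With a single status bit set, that bit is the first error.
 * With several set, the pointer field names the bit position.
 */
static uint32_t first_error(uint32_t status, uint32_t cap_control)
{
	uint32_t ptr = cap_control & FE_POINTER_MASK;

	if (popcount32(status) <= 1)
		return status;

	/* Guard the shift: a pointer past bit 31 cannot name a status bit. */
	if (ptr >= 32)
		return 0;

	return UINT32_C(1) << ptr;
}
```

A port variant of the uncorrectable event could pass the result as an
extra fe argument next to status, mirroring the endpoint event's prototype.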
Jonathan Cameron July 2, 2024, 3:53 p.m. UTC | #3
On Mon, 24 Jun 2024 10:53:51 -0500
Terry Bowman <Terry.Bowman@amd.com> wrote:

> Hi Jonathan,
> 
> I added responses inline below.
> 
> On 6/20/24 07:53, Jonathan Cameron wrote:
> > On Mon, 17 Jun 2024 15:04:08 -0500
> > Terry Bowman <terry.bowman@amd.com> wrote:
> >   
> >> The cxl_pci driver uses kernel trace functions to log RAS errors for
> >> endpoints and RCH downstream ports. The same is needed for CXL root ports,
> >> CXL downstream switch ports, and CXL upstream switch ports.
> >>
> >> Add RAS correctable and RAS uncorrectable trace logging functions for
> >> CXL PCIE ports.
> >>
> >> Signed-off-by: Terry Bowman <terry.bowman@amd.com>
> >> ---
> >>  drivers/cxl/core/trace.h | 34 ++++++++++++++++++++++++++++++++++
> >>  1 file changed, 34 insertions(+)
> >>
> >> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> >> index e5f13260fc52..5cfd9952d88a 100644
> >> --- a/drivers/cxl/core/trace.h
> >> +++ b/drivers/cxl/core/trace.h
> >> @@ -48,6 +48,23 @@
> >>  	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
> >>  )
> >>  
> >> +TRACE_EVENT(cxl_port_aer_uncorrectable_error,
> >> +	TP_PROTO(struct device *dev, u32 status),  
> > 
> > By comparison with existing code, why no fe or header
> > log?  Don't exist for ports for some reason?
> > Serial number of the port might also be useful.
> >   
> 
> The AER FE and header are the same for ports and the logging 
> needs to be added here.
> 
> There is no serial number for the ports.
Why not? At least for a switch USP there might be one (actually,
I believe there can be one for pretty much anything, but there
are rules on them matching across switch functions).

J

> 
> Regards,
> Terry

Patch

diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index e5f13260fc52..5cfd9952d88a 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -48,6 +48,23 @@ 
 	{ CXL_RAS_UC_IDE_RX_ERR, "IDE Rx Error" }			  \
 )
 
+TRACE_EVENT(cxl_port_aer_uncorrectable_error,
+	TP_PROTO(struct device *dev, u32 status),
+	TP_ARGS(dev, status),
+	TP_STRUCT__entry(
+		__string(devname, dev_name(dev))
+		__field(u32, status)
+	),
+	TP_fast_assign(
+		__assign_str(devname, dev_name(dev));
+		__entry->status = status;
+	),
+	TP_printk("device=%s status='%s'",
+		  __get_str(devname),
+		  show_uc_errs(__entry->status)
+	)
+);
+
 TRACE_EVENT(cxl_aer_uncorrectable_error,
 	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status, u32 fe, u32 *hl),
 	TP_ARGS(cxlmd, status, fe, hl),
@@ -96,6 +113,23 @@  TRACE_EVENT(cxl_aer_uncorrectable_error,
 	{ CXL_RAS_CE_PHYS_LAYER_ERR, "Received Error From Physical Layer" }	\
 )
 
+TRACE_EVENT(cxl_port_aer_correctable_error,
+	TP_PROTO(struct device *dev, u32 status),
+	TP_ARGS(dev, status),
+	TP_STRUCT__entry(
+		__string(devname, dev_name(dev))
+		__field(u32, status)
+	),
+	TP_fast_assign(
+		__assign_str(devname, dev_name(dev));
+		__entry->status = status;
+	),
+	TP_printk("device=%s status='%s'",
+		  __get_str(devname),
+		  show_ce_errs(__entry->status)
+	)
+);
+
 TRACE_EVENT(cxl_aer_correctable_error,
 	TP_PROTO(const struct cxl_memdev *cxlmd, u32 status),
 	TP_ARGS(cxlmd, status),
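
Both new events print the status through show_uc_errs() or show_ce_errs(),
which turn a status bitmask into readable names. The sketch below shows the
same idea in plain userspace C, to illustrate what the status='%s' field
would contain. It is not the kernel's implementation: format_flags() and
struct flag_name are hypothetical, the " | " separator is an assumption, and
the bit positions in the table are placeholders rather than the values
defined in drivers/cxl/core/trace.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

struct flag_name {
	uint32_t mask;
	const char *name;
};

/* Placeholder bit positions, for illustration only. */
static const struct flag_name uc_errs[] = {
	{ UINT32_C(1) << 15, "IDE Tx Error" },
	{ UINT32_C(1) << 16, "IDE Rx Error" },
};

/*
 * Write the names of all bits set in status into buf, separated by " | ".
 * Bits with no table entry are appended as one hex value.
 * Returns 0 on success, -1 if buf is too small.
 */
static int format_flags(uint32_t status, const struct flag_name *tbl,
			size_t n, char *buf, size_t len)
{
	size_t used = 0;
	int w;

	if (len == 0)
		return -1;
	buf[0] = '\0';

	for (size_t i = 0; i < n; i++) {
		if (!(status & tbl[i].mask))
			continue;
		w = snprintf(buf + used, len - used, "%s%s",
			     used ? " | " : "", tbl[i].name);
		if (w < 0 || (size_t)w >= len - used)
			return -1;
		used += (size_t)w;
		status &= ~tbl[i].mask;
	}

	if (status) {
		w = snprintf(buf + used, len - used, "%s0x%x",
			     used ? " | " : "", (unsigned int)status);
		if (w < 0 || (size_t)w >= len - used)
			return -1;
	}
	return 0;
}
```

With the placeholder table, a status that has both table bits set is
rendered as "IDE Tx Error | IDE Rx Error", which is the kind of string the
events would place between the quotes in status='...'.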