diff mbox series

cxl/trace: Initialize cxl_poison region name to NULL

Message ID 20240314044301.2108650-1-alison.schofield@intel.com
State New, archived
Headers show
Series cxl/trace: Initialize cxl_poison region name to NULL | expand

Commit Message

Alison Schofield March 14, 2024, 4:43 a.m. UTC
From: Alison Schofield <alison.schofield@intel.com>

The TP_STRUCT__entry that gets assigned the region name, or an
empty string if no region is present, is erroneously initialized
to the cxl_region pointer. It needs to be initialized to NULL
otherwise its length is wrong and garbage chars can appear in
the kernel trace output: /sys/kernel/tracing/trace

Impact is that tooling depending on that trace data can miss
picking up a valid event when searching by region name. The
TP_printk() output, if enabled, does emit the correct region
names in the dmesg log.

This was found during testing of the cxl-list option to report
media-errors for a region.

Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
Signed-off-by: Alison Schofield <alison.schofield@intel.com>
---
 drivers/cxl/core/trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


base-commit: e8f897f4afef0031fe618a8e94127a0934896aba

Comments

Ira Weiny March 14, 2024, 1:53 p.m. UTC | #1
alison.schofield@ wrote:
> From: Alison Schofield <alison.schofield@intel.com>
> 
> The TP_STRUCT__entry that gets assigned the region name, or an
> empty string if no region is present, is erroneously initialized
> to the cxl_region pointer. It needs to be initialized to NULL
> otherwise its length is wrong and garbage chars can appear in
> the kernel trace output: /sys/kernel/tracing/trace
> 
> Impact is that tooling depending on that trace data can miss
> picking up a valid event when searching by region name. The
> TP_printk() output, if enabled, does emit the correct region
> names in the dmesg log.
> 
> This was found during testing of the cxl-list option to report
> media-errors for a region.
> 
> Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
> Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> ---
>  drivers/cxl/core/trace.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index bdf117a33744..bc5ca4d530d1 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -657,7 +657,7 @@ TRACE_EVENT(cxl_poison,
>  		__string(host, dev_name(cxlmd->dev.parent))
>  		__field(u64, serial)
>  		__field(u8, trace_type)
> -		__string(region, region)
> +		__string(region, NULL)

Couldn't this be "" instead of NULL then remove the __assign_str() if
region is NULL?

Ira

>  		__field(u64, overflow_ts)
>  		__field(u64, hpa)
>  		__field(u64, dpa)
> 
> base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
> -- 
> 2.37.3
> 

k
Alison Schofield March 14, 2024, 5:04 p.m. UTC | #2
On Thu, Mar 14, 2024 at 06:53:47AM -0700, Ira Weiny wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > 
> > The TP_STRUCT__entry that gets assigned the region name, or an
> > empty string if no region is present, is erroneously initialized
> > to the cxl_region pointer. It needs to be initialized to NULL
> > otherwise its length is wrong and garbage chars can appear in
> > the kernel trace output: /sys/kernel/tracing/trace
> > 
> > Impact is that tooling depending on that trace data can miss
> > picking up a valid event when searching by region name. The
> > TP_printk() output, if enabled, does emit the correct region
> > names in the dmesg log.
> > 
> > This was found during testing of the cxl-list option to report
> > media-errors for a region.
> > 
> > Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
> > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > ---
> >  drivers/cxl/core/trace.h | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > index bdf117a33744..bc5ca4d530d1 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h
> > @@ -657,7 +657,7 @@ TRACE_EVENT(cxl_poison,
> >  		__string(host, dev_name(cxlmd->dev.parent))
> >  		__field(u64, serial)
> >  		__field(u8, trace_type)
> > -		__string(region, region)
> > +		__string(region, NULL)
> 
> Couldn't this be "" instead of NULL then remove the __assign_str() if
> region is NULL?
> 
> Ira

Thanks for the review Ira.

That doesn't work because it inits with too short of a length and the
region names all get truncated.

Adding Steve to this thread. He has upcoming changes to this space, but
I also should note that I think this fix may want to be applied
backwards, ie to stable, so we cannot count on Steves upcoming changes.

> 
> >  		__field(u64, overflow_ts)
> >  		__field(u64, hpa)
> >  		__field(u64, dpa)
> > 
> > base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
> > -- 
> > 2.37.3
> > 
> 
> k
Ira Weiny March 14, 2024, 5:24 p.m. UTC | #3
Ira Weiny wrote:
> alison.schofield@ wrote:
> > From: Alison Schofield <alison.schofield@intel.com>
> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h

[snip]

> > index bdf117a33744..bc5ca4d530d1 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h
> > @@ -657,7 +657,7 @@ TRACE_EVENT(cxl_poison,
> >  		__string(host, dev_name(cxlmd->dev.parent))
> >  		__field(u64, serial)
> >  		__field(u8, trace_type)
> > -		__string(region, region)
> > +		__string(region, NULL)
> 
> Couldn't this be "" instead of NULL then remove the __assign_str() if
> region is NULL?
> 
> Ira
> 
> >  		__field(u64, overflow_ts)
> >  		__field(u64, hpa)
> >  		__field(u64, dpa)
> > 
> > base-commit: e8f897f4afef0031fe618a8e94127a0934896aba
> > -- 
> > 2.37.3
> > 
> 
> k

Apologies...  This is just a lost 'k' looking for it's home.  (emacs users
don't hate on me please...  :-)

Ira
Ira Weiny March 14, 2024, 5:25 p.m. UTC | #4
Alison Schofield wrote:
> On Thu, Mar 14, 2024 at 06:53:47AM -0700, Ira Weiny wrote:
> > alison.schofield@ wrote:
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > The TP_STRUCT__entry that gets assigned the region name, or an
> > > empty string if no region is present, is erroneously initialized
> > > to the cxl_region pointer. It needs to be initialized to NULL
> > > otherwise its length is wrong and garbage chars can appear in
> > > the kernel trace output: /sys/kernel/tracing/trace
> > > 
> > > Impact is that tooling depending on that trace data can miss
> > > picking up a valid event when searching by region name. The
> > > TP_printk() output, if enabled, does emit the correct region
> > > names in the dmesg log.
> > > 
> > > This was found during testing of the cxl-list option to report
> > > media-errors for a region.
> > > 
> > > Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  drivers/cxl/core/trace.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > > index bdf117a33744..bc5ca4d530d1 100644
> > > --- a/drivers/cxl/core/trace.h
> > > +++ b/drivers/cxl/core/trace.h
> > > @@ -657,7 +657,7 @@ TRACE_EVENT(cxl_poison,
> > >  		__string(host, dev_name(cxlmd->dev.parent))
> > >  		__field(u64, serial)
> > >  		__field(u8, trace_type)
> > > -		__string(region, region)
> > > +		__string(region, NULL)
> > 
> > Couldn't this be "" instead of NULL then remove the __assign_str() if
> > region is NULL?
> > 
> > Ira
> 
> Thanks for the review Ira.
> 
> That doesn't work because it inits with too short of a length and the
> region names all get truncated.

Ah ok.  With that added detail.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> 
> Adding Steve to this thread. He has upcoming changes to this space, but
> I also should note that I think this fix may want to be applied
> backwards, ie to stable, so we cannot count on Steves upcoming changes.
>
Steven Rostedt March 14, 2024, 6:44 p.m. UTC | #5
On Thu, 14 Mar 2024 10:04:14 -0700
Alison Schofield <alison.schofield@intel.com> wrote:

> On Thu, Mar 14, 2024 at 06:53:47AM -0700, Ira Weiny wrote:
> > alison.schofield@ wrote:  
> > > From: Alison Schofield <alison.schofield@intel.com>
> > > 
> > > The TP_STRUCT__entry that gets assigned the region name, or an
> > > empty string if no region is present, is erroneously initialized
> > > to the cxl_region pointer. It needs to be initialized to NULL
> > > otherwise its length is wrong and garbage chars can appear in
> > > the kernel trace output: /sys/kernel/tracing/trace
> > > 
> > > Impact is that tooling depending on that trace data can miss
> > > picking up a valid event when searching by region name. The
> > > TP_printk() output, if enabled, does emit the correct region
> > > names in the dmesg log.
> > > 
> > > This was found during testing of the cxl-list option to report
> > > media-errors for a region.
> > > 
> > > Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
> > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > ---
> > >  drivers/cxl/core/trace.h | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > > index bdf117a33744..bc5ca4d530d1 100644
> > > --- a/drivers/cxl/core/trace.h
> > > +++ b/drivers/cxl/core/trace.h
> > > @@ -657,7 +657,7 @@ TRACE_EVENT(cxl_poison,
> > >  		__string(host, dev_name(cxlmd->dev.parent))
> > >  		__field(u64, serial)
> > >  		__field(u8, trace_type)
> > > -		__string(region, region)
> > > +		__string(region, NULL)  
> > 
> > Couldn't this be "" instead of NULL then remove the __assign_str() if
> > region is NULL?

That will not work either.

	__string() allocates the space on the ring buffer.

Getting rid of __assign_str() still leaves the buffer allocated with garbage.

As I said in the other email thread [1], __string() and __assign_str() are
tightly coupled and are the equivalent of:

  #define __string(buf, str)		len = strlen(str); buf = malloc(len)
  #define __assign_str(buf, str)	strcpy(buf, str)

I also mentioned that __string() does much more. It can handle NULL
pointers, so it's more like:

  #define __string(buf, str)     len = str ? strlen(str) : strlen("null"); buf = malloc(len)
  #define __assign_str(buf, str)  if (str) strcpy(buf, str); else strcpy(buf, "null")

So if the string is NULL, then you want to copy it. That said, looking at
this trace-event, it's still broken, because region isn't a string, so you
are basically doing:

	len = strlen(region);

Which doesn't make sense. What you want is:

	__string(region, region ? dev_name(&region->dev) : NULL)

and then you can add the __assign_str(region, NULL) to the else path.

-- Steve


[1] https://lore.kernel.org/all/20240314143406.6289a060@gandalf.local.home/
Steven Rostedt March 14, 2024, 6:47 p.m. UTC | #6
On Thu, 14 Mar 2024 14:44:05 -0400
Steven Rostedt <rostedt@goodmis.org> wrote:

>   #define __string(buf, str)		len = strlen(str); buf = malloc(len)
>   #define __assign_str(buf, str)	strcpy(buf, str)
> 
> I also mentioned that __string() does much more. It can handle NULL
> pointers, so it's more like:
> 
>   #define __string(buf, str)     len = str ? strlen(str) : strlen("null"); buf = malloc(len)
>   #define __assign_str(buf, str)  if (str) strcpy(buf, str); else strcpy(buf, "null")

BTW, the above is just pseudo code, I'm ignoring the bug of the missing "+ 1" in malloc ;-)

-- Steve
Alison Schofield March 14, 2024, 8:08 p.m. UTC | #7
On Thu, Mar 14, 2024 at 02:44:05PM -0400, Steven Rostedt wrote:
> On Thu, 14 Mar 2024 10:04:14 -0700
> Alison Schofield <alison.schofield@intel.com> wrote:
> 
> > On Thu, Mar 14, 2024 at 06:53:47AM -0700, Ira Weiny wrote:
> > > alison.schofield@ wrote:  
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > > 
> > > > The TP_STRUCT__entry that gets assigned the region name, or an
> > > > empty string if no region is present, is erroneously initialized
> > > > to the cxl_region pointer. It needs to be initialized to NULL
> > > > otherwise its length is wrong and garbage chars can appear in
> > > > the kernel trace output: /sys/kernel/tracing/trace
> > > > 
> > > > Impact is that tooling depending on that trace data can miss
> > > > picking up a valid event when searching by region name. The
> > > > TP_printk() output, if enabled, does emit the correct region
> > > > names in the dmesg log.
> > > > 
> > > > This was found during testing of the cxl-list option to report
> > > > media-errors for a region.
> > > > 
> > > > Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > ---
> > > >  drivers/cxl/core/trace.h | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > > > index bdf117a33744..bc5ca4d530d1 100644
> > > > --- a/drivers/cxl/core/trace.h
> > > > +++ b/drivers/cxl/core/trace.h
> > > > @@ -657,7 +657,7 @@ TRACE_EVENT(cxl_poison,
> > > >  		__string(host, dev_name(cxlmd->dev.parent))
> > > >  		__field(u64, serial)
> > > >  		__field(u8, trace_type)
> > > > -		__string(region, region)
> > > > +		__string(region, NULL)  
> > > 
> > > Couldn't this be "" instead of NULL then remove the __assign_str() if
> > > region is NULL?
> 
> That will not work either.
> 
> 	__string() allocates the space on the ring buffer.
> 
> Getting rid of __assign_str() still leaves the buffer allocated with garbage.
> 
> As I said in the other email thread [1], __string() and __assign_str() are
> tightly coupled and are the equivalent of:
> 
>   #define __string(buf, str)		len = strlen(str); buf = malloc(len)
>   #define __assign_str(buf, str)	strcpy(buf, str)
> 
> I also mentioned that __string() does much more. It can handle NULL
> pointers, so it's more like:
> 
>   #define __string(buf, str)     len = str ? strlen(str) : strlen("null"); buf = malloc(len)
>   #define __assign_str(buf, str)  if (str) strcpy(buf, str); else strcpy(buf, "null")
> 
> So if the string is NULL, then you want to copy it. That said, looking at
> this trace-event, it's still broken, because region isn't a string, so you
> are basically doing:
> 
> 	len = strlen(region);
> 

Thanks Steve - 

> Which doesn't make sense. What you want is:
> 
> 	__string(region, region ? dev_name(&region->dev) : NULL)

Compiler didn't like that NULL. 
 warning: argument 1 null where non-null expected [-Wnonnull]
   50 |                     strlen((src) ? (const char *)(src) : "(null)") + 1)

Compiler was fine with, and it works with ""

> 
> and then you can add the __assign_str(region, NULL) to the else path.
The compiler was fine w NULL here but gave me garbage in the output when
region was NULL. "" worked.

I'm posting a v2 that functions. Please take another look.


- Alison

> 
> -- Steve
> 
> 
> [1] https://lore.kernel.org/all/20240314143406.6289a060@gandalf.local.home/
>
Steven Rostedt March 14, 2024, 8:34 p.m. UTC | #8
On Thu, 14 Mar 2024 13:08:02 -0700
Alison Schofield <alison.schofield@intel.com> wrote:

> > Which doesn't make sense. What you want is:
> > 
> > 	__string(region, region ? dev_name(&region->dev) : NULL)  
> 
> Compiler didn't like that NULL. 
>  warning: argument 1 null where non-null expected [-Wnonnull]
>    50 |                     strlen((src) ? (const char *)(src) : "(null)") + 1)
> 
> Compiler was fine with, and it works with ""

So the above should have been:

	strlen((region ? dev_name(&dev->dev) : NULL) ? (const char *)(region ? dev_name(&dev->dev) : NULL) : "(null)") + 1

And the compiler didn't like that?

What compiler are you using? Clang? I believe the above is valid, and
shouldn't give a warning, as how does NULL get returned unless the compiler
got confused.

> 
> > 
> > and then you can add the __assign_str(region, NULL) to the else path.  
> The compiler was fine w NULL here but gave me garbage in the output when
> region was NULL. "" worked.
> 
> I'm posting a v2 that functions. Please take another look.

I will.

-- Steve
Ira Weiny March 14, 2024, 8:49 p.m. UTC | #9
Steven Rostedt wrote:
> On Thu, 14 Mar 2024 10:04:14 -0700
> Alison Schofield <alison.schofield@intel.com> wrote:
> 
> > On Thu, Mar 14, 2024 at 06:53:47AM -0700, Ira Weiny wrote:
> > > alison.schofield@ wrote:  
> > > > From: Alison Schofield <alison.schofield@intel.com>
> > > > 
> > > > The TP_STRUCT__entry that gets assigned the region name, or an
> > > > empty string if no region is present, is erroneously initialized
> > > > to the cxl_region pointer. It needs to be initialized to NULL
> > > > otherwise its length is wrong and garbage chars can appear in
> > > > the kernel trace output: /sys/kernel/tracing/trace
> > > > 
> > > > Impact is that tooling depending on that trace data can miss
> > > > picking up a valid event when searching by region name. The
> > > > TP_printk() output, if enabled, does emit the correct region
> > > > names in the dmesg log.
> > > > 
> > > > This was found during testing of the cxl-list option to report
> > > > media-errors for a region.
> > > > 
> > > > Fixes: ddf49d57b841 ("cxl/trace: Add TRACE support for CXL media-error records")
> > > > Signed-off-by: Alison Schofield <alison.schofield@intel.com>
> > > > ---
> > > >  drivers/cxl/core/trace.h | 2 +-
> > > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > > > 
> > > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > > > index bdf117a33744..bc5ca4d530d1 100644
> > > > --- a/drivers/cxl/core/trace.h
> > > > +++ b/drivers/cxl/core/trace.h
> > > > @@ -657,7 +657,7 @@ TRACE_EVENT(cxl_poison,
> > > >  		__string(host, dev_name(cxlmd->dev.parent))
> > > >  		__field(u64, serial)
> > > >  		__field(u8, trace_type)
> > > > -		__string(region, region)
> > > > +		__string(region, NULL)  
> > > 
> > > Couldn't this be "" instead of NULL then remove the __assign_str() if
> > > region is NULL?
> 
> That will not work either.
> 
> 	__string() allocates the space on the ring buffer.

Ah!  Thanks, good to know.

> 
> Getting rid of __assign_str() still leaves the buffer allocated with garbage.
> 
> As I said in the other email thread [1], __string() and __assign_str() are
> tightly coupled and are the equivalent of:
> 
>   #define __string(buf, str)		len = strlen(str); buf = malloc(len)
>   #define __assign_str(buf, str)	strcpy(buf, str)
> 
> I also mentioned that __string() does much more. It can handle NULL
> pointers, so it's more like:
> 
>   #define __string(buf, str)     len = str ? strlen(str) : strlen("null"); buf = malloc(len)
>   #define __assign_str(buf, str)  if (str) strcpy(buf, str); else strcpy(buf, "null")
> 
> So if the string is NULL, then you want to copy it. That said, looking at
> this trace-event, it's still broken, because region isn't a string, so you
> are basically doing:
> 
> 	len = strlen(region);
> 
> Which doesn't make sense. What you want is:
> 
> 	__string(region, region ? dev_name(&region->dev) : NULL)

Yea this looks much cleaner.

Thanks!
Ira

> 
> and then you can add the __assign_str(region, NULL) to the else path.
> 
> -- Steve
> 
> 
> [1] https://lore.kernel.org/all/20240314143406.6289a060@gandalf.local.home/
>
diff mbox series

Patch

diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index bdf117a33744..bc5ca4d530d1 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -657,7 +657,7 @@  TRACE_EVENT(cxl_poison,
 		__string(host, dev_name(cxlmd->dev.parent))
 		__field(u64, serial)
 		__field(u8, trace_type)
-		__string(region, region)
+		__string(region, NULL)
 		__field(u64, overflow_ts)
 		__field(u64, hpa)
 		__field(u64, dpa)