diff mbox series

[v3,1/2] cxl/core: correct length of DPA field masks

Message ID 20240417075053.3273543-2-ruansy.fnst@fujitsu.com (mailing list archive)
State New
Headers show
Series cxl: add poison creation event handler | expand

Commit Message

Shiyang Ruan April 17, 2024, 7:50 a.m. UTC
The length of Physical Address in General Media Event Record/DRAM Event
Record is 64-bit, so the field mask should be defined as such length.
Otherwise, this causes cxl_general_media and cxl_dram tracepoints to
mask off the upper-32-bits of DPA addresses. The cxl_poison event is
unaffected.

If userspace was doing its own DPA-to-HPA translation this could lead to
incorrect page retirement decisions, but there is no known consumer
(like rasdaemon) of this event today.

Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record")
Cc: <stable@vger.kernel.org>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
---
 drivers/cxl/core/trace.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Ira Weiny April 23, 2024, 5:35 p.m. UTC | #1
Shiyang Ruan wrote:
> The length of Physical Address in General Media Event Record/DRAM Event
> Record is 64-bit, so the field mask should be defined as such length.
> Otherwise, this causes cxl_general_media and cxl_dram tracepoints to
> mask off the upper-32-bits of DPA addresses. The cxl_poison event is
> unaffected.
> 
> If userspace was doing its own DPA-to-HPA translation this could lead to
> incorrect page retirement decisions, but there is no known consumer
> (like rasdaemon) of this event today.
> 
> Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record")
> Cc: <stable@vger.kernel.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Ira Weiny <ira.weiny@intel.com>

Apologies I thought I saw this go in before.  But perhaps it was a
different mask.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  drivers/cxl/core/trace.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index e5f13260fc52..cdfce932d5b1 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event,
>   * DRAM Event Record
>   * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
>   */
> -#define CXL_DPA_FLAGS_MASK			0x3F
> +#define CXL_DPA_FLAGS_MASK			0x3FULL
>  #define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
>  
>  #define CXL_DPA_VOLATILE			BIT(0)
> -- 
> 2.34.1
>
Dan Williams April 23, 2024, 5:35 p.m. UTC | #2
Shiyang Ruan wrote:
> The length of Physical Address in General Media Event Record/DRAM Event
> Record is 64-bit, so the field mask should be defined as such length.
> Otherwise, this causes cxl_general_media and cxl_dram tracepoints to
> mask off the upper-32-bits of DPA addresses. The cxl_poison event is
> unaffected.
> 
> If userspace was doing its own DPA-to-HPA translation this could lead to
> incorrect page retirement decisions, but there is no known consumer
> (like rasdaemon) of this event today.
> 
> Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record")
> Cc: <stable@vger.kernel.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  drivers/cxl/core/trace.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index e5f13260fc52..cdfce932d5b1 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event,
>   * DRAM Event Record
>   * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
>   */
> -#define CXL_DPA_FLAGS_MASK			0x3F
> +#define CXL_DPA_FLAGS_MASK			0x3FULL
>  #define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
>  
>  #define CXL_DPA_VOLATILE			BIT(0)

Looks good,

Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Alison Schofield April 23, 2024, 5:42 p.m. UTC | #3
On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote:
> The length of Physical Address in General Media Event Record/DRAM Event
> Record is 64-bit, so the field mask should be defined as such length.
> Otherwise, this causes cxl_general_media and cxl_dram tracepoints to
> mask off the upper-32-bits of DPA addresses. The cxl_poison event is
> unaffected.
> 
> If userspace was doing its own DPA-to-HPA translation this could lead to
> incorrect page retirement decisions, but there is no known consumer
> (like rasdaemon) of this event today.
> 

So, an invalid DPA is emitted in the trace event log and that could
lead to 'incorrect page retirement decisions...'

> Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record")
> Cc: <stable@vger.kernel.org>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Davidlohr Bueso <dave@stgolabs.net>
> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
> Cc: Ira Weiny <ira.weiny@intel.com>
> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com>
> ---
>  drivers/cxl/core/trace.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> index e5f13260fc52..cdfce932d5b1 100644
> --- a/drivers/cxl/core/trace.h
> +++ b/drivers/cxl/core/trace.h
> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event,
>   * DRAM Event Record
>   * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
>   */
> -#define CXL_DPA_FLAGS_MASK			0x3F
> +#define CXL_DPA_FLAGS_MASK			0x3FULL
>  #define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
>  
>  #define CXL_DPA_VOLATILE			BIT(0)

This works but I'm thinking this is the time to convene on one 
CXL_EVENT_DPA_MASK for both all CXL events, rather than having
cxl_poison event be different.

I prefer how poison defines it:

cxlmem.h:#define CXL_POISON_START_MASK          GENMASK_ULL(63, 6)

Can we rename that CXL_EVENT_DPA_MASK and use for all events?

--Alison

> -- 
> 2.34.1
>
Ira Weiny April 23, 2024, 9:04 p.m. UTC | #4
Alison Schofield wrote:
> On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote:

[snip]

> > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> > index e5f13260fc52..cdfce932d5b1 100644
> > --- a/drivers/cxl/core/trace.h
> > +++ b/drivers/cxl/core/trace.h
> > @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event,
> >   * DRAM Event Record
> >   * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> >   */
> > -#define CXL_DPA_FLAGS_MASK			0x3F
> > +#define CXL_DPA_FLAGS_MASK			0x3FULL
> >  #define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
> >  
> >  #define CXL_DPA_VOLATILE			BIT(0)
> 
> This works but I'm thinking this is the time to convene on one 
> CXL_EVENT_DPA_MASK for both all CXL events, rather than having
> cxl_poison event be different.
> 
> I prefer how poison defines it:
> 
> cxlmem.h:#define CXL_POISON_START_MASK          GENMASK_ULL(63, 6)
> 
> Can we rename that CXL_EVENT_DPA_MASK and use for all events?

Ah!  Great catch.  I dont' know why I only masked off the 2 used bits.
That was short sighted of me.

Yes we should consolidate these.
Ira
Shiyang Ruan April 25, 2024, 10:05 a.m. UTC | #5
在 2024/4/24 5:04, Ira Weiny 写道:
> Alison Schofield wrote:
>> On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote:
> 
> [snip]
> 
>>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
>>> index e5f13260fc52..cdfce932d5b1 100644
>>> --- a/drivers/cxl/core/trace.h
>>> +++ b/drivers/cxl/core/trace.h
>>> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event,
>>>    * DRAM Event Record
>>>    * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
>>>    */
>>> -#define CXL_DPA_FLAGS_MASK			0x3F
>>> +#define CXL_DPA_FLAGS_MASK			0x3FULL
>>>   #define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
>>>   
>>>   #define CXL_DPA_VOLATILE			BIT(0)
>>
>> This works but I'm thinking this is the time to convene on one
>> CXL_EVENT_DPA_MASK for both all CXL events, rather than having
>> cxl_poison event be different.
>>
>> I prefer how poison defines it:
>>
>> cxlmem.h:#define CXL_POISON_START_MASK          GENMASK_ULL(63, 6)
>>
>> Can we rename that CXL_EVENT_DPA_MASK and use for all events?

cxlmem.h:CXL_POISON_START_MASK is defined for MBOX commands (poison 
record, the lower 3 bits indicate "Error Source"), but CXL_DPA_MASK here 
is for events.  They have different meaning though their values are 
same.  So, I don't think we should consolidate them.

> 
> Ah!  Great catch.  I dont' know why I only masked off the 2 used bits.

Per spec, the lowest 2 bits of CXL event's DPA are defined for "Volatile 
or not" and "not repairable".  So there is no mistake here.

> That was short sighted of me.
> 
> Yes we should consolidate these.
> Ira

--
Thanks,
Ruan.
Ira Weiny April 25, 2024, 4:04 p.m. UTC | #6
Shiyang Ruan wrote:
> 
> 
> 在 2024/4/24 5:04, Ira Weiny 写道:
> > Alison Schofield wrote:
> >> On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote:
> > 
> > [snip]
> > 
> >>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
> >>> index e5f13260fc52..cdfce932d5b1 100644
> >>> --- a/drivers/cxl/core/trace.h
> >>> +++ b/drivers/cxl/core/trace.h
> >>> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event,
> >>>    * DRAM Event Record
> >>>    * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
> >>>    */
> >>> -#define CXL_DPA_FLAGS_MASK			0x3F
> >>> +#define CXL_DPA_FLAGS_MASK			0x3FULL
> >>>   #define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
> >>>   
> >>>   #define CXL_DPA_VOLATILE			BIT(0)
> >>
> >> This works but I'm thinking this is the time to convene on one
> >> CXL_EVENT_DPA_MASK for both all CXL events, rather than having
> >> cxl_poison event be different.
> >>
> >> I prefer how poison defines it:
> >>
> >> cxlmem.h:#define CXL_POISON_START_MASK          GENMASK_ULL(63, 6)
> >>
> >> Can we rename that CXL_EVENT_DPA_MASK and use for all events?
> 
> cxlmem.h:CXL_POISON_START_MASK is defined for MBOX commands (poison 
> record, the lower 3 bits indicate "Error Source"), but CXL_DPA_MASK here 
> is for events.  They have different meaning though their values are 
> same.  So, I don't think we should consolidate them.

By definition the DPA in all these payloads can't use the lower 6 bits.
We are defining a mask to get the DPA.  This has nothing to do with what
may be stored in the other 6 bits.

Defining a common DPA mask is correct per both sections of the spec.

> 
> > 
> > Ah!  Great catch.  I dont' know why I only masked off the 2 used bits.
> 
> Per spec, the lowest 2 bits of CXL event's DPA are defined for "Volatile 
> or not" and "not repairable".  So there is no mistake here.

I appreciate your not calling out my code as a bug.  :-D

However, bits [5:2] are also Reserved.  So it is incorrect to mask off
only the lower 2 bits.  Even though the reserved bits must be 0 per the
spec, it is still better to properly mask all reserved bits because they
are by definition not part of the DPA.

Could you create a common macro for the next version of the patch?

Thanks,
Ira

> 
> > That was short sighted of me.
> > 
> > Yes we should consolidate these.
> > Ira
> 
> --
> Thanks,
> Ruan.
>
diff mbox series

Patch

diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h
index e5f13260fc52..cdfce932d5b1 100644
--- a/drivers/cxl/core/trace.h
+++ b/drivers/cxl/core/trace.h
@@ -253,7 +253,7 @@  TRACE_EVENT(cxl_generic_event,
  * DRAM Event Record
  * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44
  */
-#define CXL_DPA_FLAGS_MASK			0x3F
+#define CXL_DPA_FLAGS_MASK			0x3FULL
 #define CXL_DPA_MASK				(~CXL_DPA_FLAGS_MASK)
 
 #define CXL_DPA_VOLATILE			BIT(0)