Message ID | 20240417075053.3273543-2-ruansy.fnst@fujitsu.com |
---|---|
State | New, archived |
Headers | show |
Series | cxl: add poison creation event handler | expand |
Shiyang Ruan wrote: > The length of Physical Address in General Media Event Record/DRAM Event > Record is 64-bit, so the field mask should be defined as such length. > Otherwise, this causes cxl_general_media and cxl_dram tracepoints to > mask off the upper-32-bits of DPA addresses. The cxl_poison event is > unaffected. > > If userspace was doing its own DPA-to-HPA translation this could lead to > incorrect page retirement decisions, but there is no known consumer > (like rasdaemon) of this event today. > > Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") > Cc: <stable@vger.kernel.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Davidlohr Bueso <dave@stgolabs.net> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Cc: Ira Weiny <ira.weiny@intel.com> Apologies I thought I saw this go in before. But perhaps it was a different mask. Reviewed-by: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> > --- > drivers/cxl/core/trace.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h > index e5f13260fc52..cdfce932d5b1 100644 > --- a/drivers/cxl/core/trace.h > +++ b/drivers/cxl/core/trace.h > @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, > * DRAM Event Record > * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 > */ > -#define CXL_DPA_FLAGS_MASK 0x3F > +#define CXL_DPA_FLAGS_MASK 0x3FULL > #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) > > #define CXL_DPA_VOLATILE BIT(0) > -- > 2.34.1 >
Shiyang Ruan wrote: > The length of Physical Address in General Media Event Record/DRAM Event > Record is 64-bit, so the field mask should be defined as such length. > Otherwise, this causes cxl_general_media and cxl_dram tracepoints to > mask off the upper-32-bits of DPA addresses. The cxl_poison event is > unaffected. > > If userspace was doing its own DPA-to-HPA translation this could lead to > incorrect page retirement decisions, but there is no known consumer > (like rasdaemon) of this event today. > > Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") > Cc: <stable@vger.kernel.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Davidlohr Bueso <dave@stgolabs.net> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> > --- > drivers/cxl/core/trace.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h > index e5f13260fc52..cdfce932d5b1 100644 > --- a/drivers/cxl/core/trace.h > +++ b/drivers/cxl/core/trace.h > @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, > * DRAM Event Record > * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 > */ > -#define CXL_DPA_FLAGS_MASK 0x3F > +#define CXL_DPA_FLAGS_MASK 0x3FULL > #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) > > #define CXL_DPA_VOLATILE BIT(0) Looks good, Reviewed-by: Dan Williams <dan.j.williams@intel.com>
On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote: > The length of Physical Address in General Media Event Record/DRAM Event > Record is 64-bit, so the field mask should be defined as such length. > Otherwise, this causes cxl_general_media and cxl_dram tracepoints to > mask off the upper-32-bits of DPA addresses. The cxl_poison event is > unaffected. > > If userspace was doing its own DPA-to-HPA translation this could lead to > incorrect page retirement decisions, but there is no known consumer > (like rasdaemon) of this event today. > So, an invalid DPA is emitted in the trace event log and that could lead to 'incorrect page retirement decisions...' > Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") > Cc: <stable@vger.kernel.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Davidlohr Bueso <dave@stgolabs.net> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> > --- > drivers/cxl/core/trace.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h > index e5f13260fc52..cdfce932d5b1 100644 > --- a/drivers/cxl/core/trace.h > +++ b/drivers/cxl/core/trace.h > @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, > * DRAM Event Record > * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 > */ > -#define CXL_DPA_FLAGS_MASK 0x3F > +#define CXL_DPA_FLAGS_MASK 0x3FULL > #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) > > #define CXL_DPA_VOLATILE BIT(0) This works but I'm thinking this is the time to convene on one CXL_EVENT_DPA_MASK for both all CXL events, rather than having cxl_poison event be different. I prefer how poison defines it: cxlmem.h:#define CXL_POISON_START_MASK GENMASK_ULL(63, 6) Can we rename that CXL_EVENT_DPA_MASK and use for all events? --Alison > -- > 2.34.1 >
Alison Schofield wrote: > On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote: [snip] > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h > > index e5f13260fc52..cdfce932d5b1 100644 > > --- a/drivers/cxl/core/trace.h > > +++ b/drivers/cxl/core/trace.h > > @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, > > * DRAM Event Record > > * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 > > */ > > -#define CXL_DPA_FLAGS_MASK 0x3F > > +#define CXL_DPA_FLAGS_MASK 0x3FULL > > #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) > > > > #define CXL_DPA_VOLATILE BIT(0) > > This works but I'm thinking this is the time to convene on one > CXL_EVENT_DPA_MASK for both all CXL events, rather than having > cxl_poison event be different. > > I prefer how poison defines it: > > cxlmem.h:#define CXL_POISON_START_MASK GENMASK_ULL(63, 6) > > Can we rename that CXL_EVENT_DPA_MASK and use for all events? Ah! Great catch. I dont' know why I only masked off the 2 used bits. That was short sighted of me. Yes we should consolidate these. Ira
在 2024/4/24 5:04, Ira Weiny 写道: > Alison Schofield wrote: >> On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote: > > [snip] > >>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h >>> index e5f13260fc52..cdfce932d5b1 100644 >>> --- a/drivers/cxl/core/trace.h >>> +++ b/drivers/cxl/core/trace.h >>> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, >>> * DRAM Event Record >>> * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 >>> */ >>> -#define CXL_DPA_FLAGS_MASK 0x3F >>> +#define CXL_DPA_FLAGS_MASK 0x3FULL >>> #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) >>> >>> #define CXL_DPA_VOLATILE BIT(0) >> >> This works but I'm thinking this is the time to convene on one >> CXL_EVENT_DPA_MASK for both all CXL events, rather than having >> cxl_poison event be different. >> >> I prefer how poison defines it: >> >> cxlmem.h:#define CXL_POISON_START_MASK GENMASK_ULL(63, 6) >> >> Can we rename that CXL_EVENT_DPA_MASK and use for all events? cxlmem.h:CXL_POISON_START_MASK is defined for MBOX commands (poison record, the lower 3 bits indicate "Error Source"), but CXL_DPA_MASK here is for events. They have different meaning though their values are same. So, I don't think we should consolidate them. > > Ah! Great catch. I dont' know why I only masked off the 2 used bits. Per spec, the lowest 2 bits of CXL event's DPA are defined for "Volatile or not" and "not repairable". So there is no mistake here. > That was short sighted of me. > > Yes we should consolidate these. > Ira -- Thanks, Ruan.
Shiyang Ruan wrote: > > > 在 2024/4/24 5:04, Ira Weiny 写道: > > Alison Schofield wrote: > >> On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote: > > > > [snip] > > > >>> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h > >>> index e5f13260fc52..cdfce932d5b1 100644 > >>> --- a/drivers/cxl/core/trace.h > >>> +++ b/drivers/cxl/core/trace.h > >>> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, > >>> * DRAM Event Record > >>> * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 > >>> */ > >>> -#define CXL_DPA_FLAGS_MASK 0x3F > >>> +#define CXL_DPA_FLAGS_MASK 0x3FULL > >>> #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) > >>> > >>> #define CXL_DPA_VOLATILE BIT(0) > >> > >> This works but I'm thinking this is the time to convene on one > >> CXL_EVENT_DPA_MASK for both all CXL events, rather than having > >> cxl_poison event be different. > >> > >> I prefer how poison defines it: > >> > >> cxlmem.h:#define CXL_POISON_START_MASK GENMASK_ULL(63, 6) > >> > >> Can we rename that CXL_EVENT_DPA_MASK and use for all events? > > cxlmem.h:CXL_POISON_START_MASK is defined for MBOX commands (poison > record, the lower 3 bits indicate "Error Source"), but CXL_DPA_MASK here > is for events. They have different meaning though their values are > same. So, I don't think we should consolidate them. By definition the DPA in all these payloads can't use the lower 6 bits. We are defining a mask to get the DPA. This has nothing to do with what may be stored in the other 6 bits. Defining a common DPA mask is correct per both sections of the spec. > > > > > Ah! Great catch. I dont' know why I only masked off the 2 used bits. > > Per spec, the lowest 2 bits of CXL event's DPA are defined for "Volatile > or not" and "not repairable". So there is no mistake here. I appreciate your not calling out my code as a bug. :-D However, bits [5:2] are also Reserved. So it is incorrect to mask off only the lower 2 bits. Even though the reserved bits must be 0 per the spec, it is still better to properly mask all reserved bits because they are by definition not part of the DPA. Could you create a common macro for the next version of the patch? Thanks, Ira > > > That was short sighted of me. > > > > Yes we should consolidate these. > > Ira > > -- > Thanks, > Ruan. >
On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote: > The length of Physical Address in General Media Event Record/DRAM Event > Record is 64-bit, so the field mask should be defined as such length. > Otherwise, this causes cxl_general_media and cxl_dram tracepoints to > mask off the upper-32-bits of DPA addresses. The cxl_poison event is > unaffected. > > If userspace was doing its own DPA-to-HPA translation this could lead to > incorrect page retirement decisions, but there is no known consumer > (like rasdaemon) of this event today. > > Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") > Cc: <stable@vger.kernel.org> > Cc: Dan Williams <dan.j.williams@intel.com> > Cc: Davidlohr Bueso <dave@stgolabs.net> > Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> > Cc: Ira Weiny <ira.weiny@intel.com> > Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Hi Ruan, This fixup is important for the Event DPA->HPA translation work, so I grabbed it, updated it with most* of the review comments, and posted with that set. I expect you saw that in your mailbox. DaveJ queued it in a topic branch for 6.10 here: https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.10/dpa-to-hpa *I did not create a common mask for events and poison because I wanted to limit the changes. If you'd like to make that change it would be welcomed. -- Alison > --- > drivers/cxl/core/trace.h | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h > index e5f13260fc52..cdfce932d5b1 100644 > --- a/drivers/cxl/core/trace.h > +++ b/drivers/cxl/core/trace.h > @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, > * DRAM Event Record > * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 > */ > -#define CXL_DPA_FLAGS_MASK 0x3F > +#define CXL_DPA_FLAGS_MASK 0x3FULL > #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) > > #define CXL_DPA_VOLATILE BIT(0) > -- > 2.34.1 >
在 2024/5/1 5:00, Alison Schofield 写道: > On Wed, Apr 17, 2024 at 03:50:52PM +0800, Shiyang Ruan wrote: >> The length of Physical Address in General Media Event Record/DRAM Event >> Record is 64-bit, so the field mask should be defined as such length. >> Otherwise, this causes cxl_general_media and cxl_dram tracepoints to >> mask off the upper-32-bits of DPA addresses. The cxl_poison event is >> unaffected. >> >> If userspace was doing its own DPA-to-HPA translation this could lead to >> incorrect page retirement decisions, but there is no known consumer >> (like rasdaemon) of this event today. >> >> Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") >> Cc: <stable@vger.kernel.org> >> Cc: Dan Williams <dan.j.williams@intel.com> >> Cc: Davidlohr Bueso <dave@stgolabs.net> >> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> >> Cc: Ira Weiny <ira.weiny@intel.com> >> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> > > Hi Ruan, > > This fixup is important for the Event DPA->HPA translation work, so I > grabbed it, updated it with most* of the review comments, and posted > with that set. I expect you saw that in your mailbox. Yes, I saw that. Nice fix! > > DaveJ queued it in a topic branch for 6.10 here: > https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/log/?h=for-6.10/dpa-to-hpa > That's good! > *I did not create a common mask for events and poison because I wanted to > limit the changes. If you'd like to make that change it would be welcomed. Ok~ -- Thanks, Ruan. > > -- Alison > >> --- >> drivers/cxl/core/trace.h | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h >> index e5f13260fc52..cdfce932d5b1 100644 >> --- a/drivers/cxl/core/trace.h >> +++ b/drivers/cxl/core/trace.h >> @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, >> * DRAM Event Record >> * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 >> */ >> -#define CXL_DPA_FLAGS_MASK 0x3F >> +#define CXL_DPA_FLAGS_MASK 0x3FULL >> #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) >> >> #define CXL_DPA_VOLATILE BIT(0) >> -- >> 2.34.1 >>
diff --git a/drivers/cxl/core/trace.h b/drivers/cxl/core/trace.h index e5f13260fc52..cdfce932d5b1 100644 --- a/drivers/cxl/core/trace.h +++ b/drivers/cxl/core/trace.h @@ -253,7 +253,7 @@ TRACE_EVENT(cxl_generic_event, * DRAM Event Record * CXL rev 3.0 section 8.2.9.2.1.2; Table 8-44 */ -#define CXL_DPA_FLAGS_MASK 0x3F +#define CXL_DPA_FLAGS_MASK 0x3FULL #define CXL_DPA_MASK (~CXL_DPA_FLAGS_MASK) #define CXL_DPA_VOLATILE BIT(0)
The length of Physical Address in General Media Event Record/DRAM Event Record is 64-bit, so the field mask should be defined as such length. Otherwise, this causes cxl_general_media and cxl_dram tracepoints to mask off the upper-32-bits of DPA addresses. The cxl_poison event is unaffected. If userspace was doing its own DPA-to-HPA translation this could lead to incorrect page retirement decisions, but there is no known consumer (like rasdaemon) of this event today. Fixes: d54a531a430b ("cxl/mem: Trace General Media Event Record") Cc: <stable@vger.kernel.org> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Davidlohr Bueso <dave@stgolabs.net> Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com> Cc: Ira Weiny <ira.weiny@intel.com> Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> --- drivers/cxl/core/trace.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-)