diff mbox series

[v2,1/1] cxl/core: Fix initialization of mbox_cmd.size_out in get event

Message ID 20240402081404.1106-2-kwangjin.ko@sk.com
State Accepted
Commit f7c52345ccc96343c0a05bdea3121c8ac7b67d5f
Headers show
Series cxl/core: Fix initialization of mbox_cmd.size_out in get event | expand

Commit Message

Kwangjin Ko April 2, 2024, 8:14 a.m. UTC
Since mbox_cmd.size_out is overwritten with the actual output size in
the function below, it needs to be initialized every time.

cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd

Problem scenario:

1) The size_out variable is initially set to the size of the mailbox.
2) Read an event.
   - size_out is set to 160 bytes(header 32B + one event 128B).
   - Two event are created while reading.
3) Read the new *two* events.
   - size_out is still set to 160 bytes.
   - Although the value of out_len is 288 bytes, only 160 bytes are
     copied from the mailbox register to the local variable.
   - record_count is set to 2.
   - Accessing records[1] will result in reading incorrect data.

Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
---
 drivers/cxl/core/mbox.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Jonathan Cameron April 2, 2024, 2:38 p.m. UTC | #1
On Tue, 2 Apr 2024 17:14:03 +0900
Kwangjin Ko <kwangjin.ko@sk.com> wrote:

> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
> 
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> 
> Problem scenario:
> 
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
>    - size_out is set to 160 bytes(header 32B + one event 128B).
>    - Two event are created while reading.
> 3) Read the new *two* events.
>    - size_out is still set to 160 bytes.
>    - Although the value of out_len is 288 bytes, only 160 bytes are
>      copied from the mailbox register to the local variable.
>    - record_count is set to 2.
>    - Accessing records[1] will result in reading incorrect data.
> 
> Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>

Looks correct to me.

Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>

> ---
>  drivers/cxl/core/mbox.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>  		.payload_in = &log_type,
>  		.size_in = sizeof(log_type),
>  		.payload_out = payload,
> -		.size_out = mds->payload_size,
>  		.min_out = struct_size(payload, records, 0),
>  	};
>  
>  	do {
>  		int rc, i;
>  
> +		mbox_cmd.size_out = mds->payload_size;
> +
>  		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>  		if (rc) {
>  			dev_err_ratelimited(dev,
Ira Weiny April 2, 2024, 4:25 p.m. UTC | #2
Kwangjin Ko wrote:
> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
> 
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> 
> Problem scenario:
> 
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
>    - size_out is set to 160 bytes(header 32B + one event 128B).
>    - Two event are created while reading.
> 3) Read the new *two* events.
>    - size_out is still set to 160 bytes.
>    - Although the value of out_len is 288 bytes, only 160 bytes are
>      copied from the mailbox register to the local variable.
>    - record_count is set to 2.
>    - Accessing records[1] will result in reading incorrect data.
> 
> Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>


Tested-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>

> ---
>  drivers/cxl/core/mbox.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>  		.payload_in = &log_type,
>  		.size_in = sizeof(log_type),
>  		.payload_out = payload,
> -		.size_out = mds->payload_size,
>  		.min_out = struct_size(payload, records, 0),
>  	};
>  
>  	do {
>  		int rc, i;
>  
> +		mbox_cmd.size_out = mds->payload_size;
> +
>  		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>  		if (rc) {
>  			dev_err_ratelimited(dev,
> -- 
> 2.34.1
>
Dan Williams April 3, 2024, 2:53 p.m. UTC | #3
Kwangjin Ko wrote:
> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
> 
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> 
> Problem scenario:
> 
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
>    - size_out is set to 160 bytes(header 32B + one event 128B).
>    - Two event are created while reading.
> 3) Read the new *two* events.
>    - size_out is still set to 160 bytes.
>    - Although the value of out_len is 288 bytes, only 160 bytes are
>      copied from the mailbox register to the local variable.
>    - record_count is set to 2.
>    - Accessing records[1] will result in reading incorrect data.
> 
> Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
> ---
>  drivers/cxl/core/mbox.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>  		.payload_in = &log_type,
>  		.size_in = sizeof(log_type),
>  		.payload_out = payload,
> -		.size_out = mds->payload_size,
>  		.min_out = struct_size(payload, records, 0),
>  	};
>  
>  	do {
>  		int rc, i;
>  
> +		mbox_cmd.size_out = mds->payload_size;
> +

Fix looks correct, but I am concerned it is a band-aid for a more
general problem. For example, if I am not mistaken, we have a similar
bug in cxl_mem_get_poison().

So perhaps a convention to always define @mbox_cmd immediately before
cxl_internal_send_cmd() like this:

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..5d44b5c095b7 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -946,24 +946,22 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
        struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
        struct device *dev = mds->cxlds.dev;
        struct cxl_get_event_payload *payload;
-       struct cxl_mbox_cmd mbox_cmd;
        u8 log_type = type;
        u16 nr_rec;
 
        mutex_lock(&mds->event.log_lock);
        payload = mds->event.buf;
 
-       mbox_cmd = (struct cxl_mbox_cmd) {
-               .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
-               .payload_in = &log_type,
-               .size_in = sizeof(log_type),
-               .payload_out = payload,
-               .size_out = mds->payload_size,
-               .min_out = struct_size(payload, records, 0),
-       };
-
        do {
                int rc, i;
+               struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
+                       .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
+                       .payload_in = &log_type,
+                       .size_in = sizeof(log_type),
+                       .payload_out = payload,
+                       .size_out = mds->payload_size,
+                       .min_out = struct_size(payload, records, 0),
+               };
 
                rc = cxl_internal_send_cmd(mds, &mbox_cmd);
                if (rc) {
@@ -1296,7 +1294,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
        struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
        struct cxl_mbox_poison_out *po;
        struct cxl_mbox_poison_in pi;
-       struct cxl_mbox_cmd mbox_cmd;
        int nr_records = 0;
        int rc;
 
@@ -1308,16 +1305,16 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
        pi.offset = cpu_to_le64(offset);
        pi.length = cpu_to_le64(len / CXL_POISON_LEN_MULT);
 
-       mbox_cmd = (struct cxl_mbox_cmd) {
-               .opcode = CXL_MBOX_OP_GET_POISON,
-               .size_in = sizeof(pi),
-               .payload_in = &pi,
-               .size_out = mds->payload_size,
-               .payload_out = po,
-               .min_out = struct_size(po, record, 0),
-       };
-
        do {
+               struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
+                       .opcode = CXL_MBOX_OP_GET_POISON,
+                       .size_in = sizeof(pi),
+                       .payload_in = &pi,
+                       .size_out = mds->payload_size,
+                       .payload_out = po,
+                       .min_out = struct_size(po, record, 0),
+               };
+
                rc = cxl_internal_send_cmd(mds, &mbox_cmd);
                if (rc)
                        break;
Jonathan Cameron April 4, 2024, 1:30 p.m. UTC | #4
On Wed, 3 Apr 2024 07:53:24 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> Kwangjin Ko wrote:
> > Since mbox_cmd.size_out is overwritten with the actual output size in
> > the function below, it needs to be initialized every time.
> > 
> > cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> > 
> > Problem scenario:
> > 
> > 1) The size_out variable is initially set to the size of the mailbox.
> > 2) Read an event.
> >    - size_out is set to 160 bytes(header 32B + one event 128B).
> >    - Two event are created while reading.
> > 3) Read the new *two* events.
> >    - size_out is still set to 160 bytes.
> >    - Although the value of out_len is 288 bytes, only 160 bytes are
> >      copied from the mailbox register to the local variable.
> >    - record_count is set to 2.
> >    - Accessing records[1] will result in reading incorrect data.
> > 
> > Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
> > ---
> >  drivers/cxl/core/mbox.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 9adda4795eb7..a38531a055c8 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> >  		.payload_in = &log_type,
> >  		.size_in = sizeof(log_type),
> >  		.payload_out = payload,
> > -		.size_out = mds->payload_size,
> >  		.min_out = struct_size(payload, records, 0),
> >  	};
> >  
> >  	do {
> >  		int rc, i;
> >  
> > +		mbox_cmd.size_out = mds->payload_size;
> > +  
> 
> Fix looks correct, but I am concerned it is a band-aid for a more
> general problem. For example, if I am not mistaken, we have a similar
> bug in cxl_mem_get_poison().
> 
> So perhaps a convention to always define @mbox_cmd immediately before
> cxl_internal_send_cmd() like this:

Makes sense to me.  These aren't hot paths, so safe code is worth the
possible extra writes.

> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..5d44b5c095b7 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -946,24 +946,22 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>         struct cxl_memdev *cxlmd = mds->cxlds.cxlmd;
>         struct device *dev = mds->cxlds.dev;
>         struct cxl_get_event_payload *payload;
> -       struct cxl_mbox_cmd mbox_cmd;
>         u8 log_type = type;
>         u16 nr_rec;
>  
>         mutex_lock(&mds->event.log_lock);
>         payload = mds->event.buf;
>  
> -       mbox_cmd = (struct cxl_mbox_cmd) {
> -               .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
> -               .payload_in = &log_type,
> -               .size_in = sizeof(log_type),
> -               .payload_out = payload,
> -               .size_out = mds->payload_size,
> -               .min_out = struct_size(payload, records, 0),
> -       };
> -
>         do {
>                 int rc, i;
> +               struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
> +                       .opcode = CXL_MBOX_OP_GET_EVENT_RECORD,
> +                       .payload_in = &log_type,
> +                       .size_in = sizeof(log_type),
> +                       .payload_out = payload,
> +                       .size_out = mds->payload_size,
> +                       .min_out = struct_size(payload, records, 0),
> +               };
>  
>                 rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>                 if (rc) {
> @@ -1296,7 +1294,6 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>         struct cxl_memdev_state *mds = to_cxl_memdev_state(cxlmd->cxlds);
>         struct cxl_mbox_poison_out *po;
>         struct cxl_mbox_poison_in pi;
> -       struct cxl_mbox_cmd mbox_cmd;
>         int nr_records = 0;
>         int rc;
>  
> @@ -1308,16 +1305,16 @@ int cxl_mem_get_poison(struct cxl_memdev *cxlmd, u64 offset, u64 len,
>         pi.offset = cpu_to_le64(offset);
>         pi.length = cpu_to_le64(len / CXL_POISON_LEN_MULT);
>  
> -       mbox_cmd = (struct cxl_mbox_cmd) {
> -               .opcode = CXL_MBOX_OP_GET_POISON,
> -               .size_in = sizeof(pi),
> -               .payload_in = &pi,
> -               .size_out = mds->payload_size,
> -               .payload_out = po,
> -               .min_out = struct_size(po, record, 0),
> -       };
> -
>         do {
> +               struct cxl_mbox_cmd mbox_cmd = (struct cxl_mbox_cmd){
> +                       .opcode = CXL_MBOX_OP_GET_POISON,
> +                       .size_in = sizeof(pi),
> +                       .payload_in = &pi,
> +                       .size_out = mds->payload_size,
> +                       .payload_out = po,
> +                       .min_out = struct_size(po, record, 0),
> +               };
> +
>                 rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>                 if (rc)
>                         break;
>
Dan Williams April 4, 2024, 5:35 p.m. UTC | #5
Jonathan Cameron wrote:
[..]
> > 
> > Fix looks correct, but I am concerned it is a band-aid for a more
> > general problem. For example, if I am not mistaken, we have a similar
> > bug in cxl_mem_get_poison().
> > 
> > So perhaps a convention to always define @mbox_cmd immediately before
> > cxl_internal_send_cmd() like this:
> 
> Makes sense to me.  These aren't hot paths, so safe code is worth the
> possible extra writes.

Yeah, the before and after size wise is small:

   text	   data	    bss	    dec	    hex	filename
  15407	   2129	     49	  17585	   44b1	drivers/cxl/core/mbox.o

   text	   data	    bss	    dec	    hex	filename
  15461	   2129	     49	  17639	   44e7	drivers/cxl/core/mbox.o

...which I think is worth the peace of mind, and it matches what is
currently done in cxl_xfer_log()

I will send a follow on with this since this patch is already in
cxl-fixes.
Ira Weiny April 4, 2024, 8:20 p.m. UTC | #6
Dan Williams wrote:
> Jonathan Cameron wrote:
> [..]
> > > 
> > > Fix looks correct, but I am concerned it is a band-aid for a more
> > > general problem. For example, if I am not mistaken, we have a similar
> > > bug in cxl_mem_get_poison().
> > > 
> > > So perhaps a convention to always define @mbox_cmd immediately before
> > > cxl_internal_send_cmd() like this:
> > 
> > Makes sense to me.  These aren't hot paths, so safe code is worth the
> > possible extra writes.
> 
> Yeah, the before and after size wise is small:
> 
>    text	   data	    bss	    dec	    hex	filename
>   15407	   2129	     49	  17585	   44b1	drivers/cxl/core/mbox.o
> 
>    text	   data	    bss	    dec	    hex	filename
>   15461	   2129	     49	  17639	   44e7	drivers/cxl/core/mbox.o
> 
> ...which I think is worth the peace of mind, and it matches what is
> currently done in cxl_xfer_log()
> 
> I will send a follow on with this since this patch is already in
> cxl-fixes.

Thanks Dan.
Ira
Alison Schofield April 5, 2024, 4:04 p.m. UTC | #7
On Tue, Apr 02, 2024 at 05:14:03PM +0900, Kwangjin Ko wrote:
> Since mbox_cmd.size_out is overwritten with the actual output size in
> the function below, it needs to be initialized every time.
> 
> cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> 
> Problem scenario:
> 
> 1) The size_out variable is initially set to the size of the mailbox.
> 2) Read an event.
>    - size_out is set to 160 bytes(header 32B + one event 128B).
>    - Two event are created while reading.
> 3) Read the new *two* events.
>    - size_out is still set to 160 bytes.
>    - Although the value of out_len is 288 bytes, only 160 bytes are
>      copied from the mailbox register to the local variable.
>    - record_count is set to 2.
>    - Accessing records[1] will result in reading incorrect data.

Agree with the other comments on need to set .out_size when doing
cxl_internal_send_cmd() in a loop. Poison list retrieval can hit
this case if the MORE flag is set and a follow on read of the list
delivers more records than the previous read. ie. device gives one
record, sets the _MORE flag, then gives 5.

2 other things appeared to me while looking at this:

First, it seems that there is another cleanup wrt accessing records
with invalid data.  Still focusing on get_events and get_poison
since those loop through output data based on a device supplied
record count. The min_out check means the driver at least gets a
count of records to expect. That's good. The problem occurs::

if (mbox.size_out != struct_size(payload, records, 'record_count'))

The driver will log garbage trace events, and that could lead to
bad actions based on bad data. (like a needless scan of device based
on a false overflow flag). So, checking that size.out is the proper
multiple of record_count protects driver from bad device behavior.

I think that can be combined w the patch Dan is suggesting to
reset mbox.size_out on each loop.

Second thing is the pci-driver quiet handling of PAYLOAD LENGTH
values reported by the device. It seems like at a minimum the
pci-driver could emit an info or debug message when the device
is reporting payload lengths that exceed what the driver can
copy in. I'm referring to the mbox.size_out adjustment in
__cxl_pci_mbox_send_cmd(). Or, if it's not the pci-drivers job
to judge, pass that actual payload length value back in the
mbox structure (new field) so that the cxl-driver can use it.
The pci driver would still do it's "#8 Sanitize the copy" work,
but it would allow the cxl-driver to clearly see why it got the
.size_out it got, and squawk about it if needed.

--Alison

> 
> Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
> ---
>  drivers/cxl/core/mbox.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> index 9adda4795eb7..a38531a055c8 100644
> --- a/drivers/cxl/core/mbox.c
> +++ b/drivers/cxl/core/mbox.c
> @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
>  		.payload_in = &log_type,
>  		.size_in = sizeof(log_type),
>  		.payload_out = payload,
> -		.size_out = mds->payload_size,
>  		.min_out = struct_size(payload, records, 0),
>  	};
>  
>  	do {
>  		int rc, i;
>  
> +		mbox_cmd.size_out = mds->payload_size;
> +
>  		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
>  		if (rc) {
>  			dev_err_ratelimited(dev,
> -- 
> 2.34.1
>
Jonathan Cameron April 5, 2024, 4:40 p.m. UTC | #8
On Fri, 5 Apr 2024 09:04:16 -0700
Alison Schofield <alison.schofield@intel.com> wrote:

> On Tue, Apr 02, 2024 at 05:14:03PM +0900, Kwangjin Ko wrote:
> > Since mbox_cmd.size_out is overwritten with the actual output size in
> > the function below, it needs to be initialized every time.
> > 
> > cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> > 
> > Problem scenario:
> > 
> > 1) The size_out variable is initially set to the size of the mailbox.
> > 2) Read an event.
> >    - size_out is set to 160 bytes(header 32B + one event 128B).
> >    - Two event are created while reading.
> > 3) Read the new *two* events.
> >    - size_out is still set to 160 bytes.
> >    - Although the value of out_len is 288 bytes, only 160 bytes are
> >      copied from the mailbox register to the local variable.
> >    - record_count is set to 2.
> >    - Accessing records[1] will result in reading incorrect data.  
> 
> Agree with the other comments on need to set .out_size when doing
> cxl_internal_send_cmd() in a loop. Poison list retrieval can hit
> this case if the MORE flag is set and a follow on read of the list
> delivers more records than the previous read. ie. device gives one
> record, sets the _MORE flag, then gives 5.
> 
> 2 other things appeared to me while looking at this:
> 
> First, it seems that there is another cleanup wrt accessing records
> with invalid data.  Still focusing on get_events and get_poison
> since those loop through output data based on a device supplied
> record count. The min_out check means the driver at least gets a
> count of records to expect. That's good. The problem occurs::
> 
> if (mbox.size_out != struct_size(payload, records, 'record_count'))
> 
> The driver will log garbage trace events, and that could lead to
> bad actions based on bad data. (like a needless scan of device based
> on a false overflow flag). So, checking that size.out is the proper
> multiple of record_count protects driver from bad device behavior.
> 
> I think that can be combined w the patch Dan is suggesting to
> reset mbox.size_out on each loop.
Hi Alison,

I'd split it.  Dan's one is a bug fix, this is hardening against
a device bug. Good to have but not really backport material unless
we think there are devices like this out there.

> 
> Second thing is the pci-driver quiet handling of PAYLOAD LENGTH
> values reported by the device. It seems like at a minimum the
> pci-driver could emit an info or debug message when the device
> is reporting payload lengths that exceed what the driver can
> copy in.

When does this happen?
1. New fields on end of a fixed length message.
   Correct to silently eat it as the spec is buggy if we don't
    have backwards compatibility.
    I don't think the spec has had that particular type of bug yet,
    but maybe I'm forgetting one.
2. Device bug.  Can't tell that is different from 1.

So maybe dev_dbg(). I'm not sure why the cxl-driver would ever want to
know.

> I'm referring to the mbox.size_out adjustment in
> __cxl_pci_mbox_send_cmd(). Or, if it's not the pci-drivers job
> to judge, pass that actual payload length value back in the
> mbox structure (new field) so that the cxl-driver can use it.
> The pci driver would still do it's "#8 Sanitize the copy" work,
> but it would allow the cxl-driver to clearly see why it got the
> .size_out it got, and squawk about it if needed.
> 
> --Alison
> 
> > 
> > Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
> > ---
> >  drivers/cxl/core/mbox.c | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > index 9adda4795eb7..a38531a055c8 100644
> > --- a/drivers/cxl/core/mbox.c
> > +++ b/drivers/cxl/core/mbox.c
> > @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> >  		.payload_in = &log_type,
> >  		.size_in = sizeof(log_type),
> >  		.payload_out = payload,
> > -		.size_out = mds->payload_size,
> >  		.min_out = struct_size(payload, records, 0),
> >  	};
> >  
> >  	do {
> >  		int rc, i;
> >  
> > +		mbox_cmd.size_out = mds->payload_size;
> > +
> >  		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> >  		if (rc) {
> >  			dev_err_ratelimited(dev,
> > -- 
> > 2.34.1
> >
Alison Schofield April 5, 2024, 5:37 p.m. UTC | #9
On Fri, Apr 05, 2024 at 05:40:56PM +0100, Jonathan Cameron wrote:
> On Fri, 5 Apr 2024 09:04:16 -0700
> Alison Schofield <alison.schofield@intel.com> wrote:
> 
> > On Tue, Apr 02, 2024 at 05:14:03PM +0900, Kwangjin Ko wrote:
> > > Since mbox_cmd.size_out is overwritten with the actual output size in
> > > the function below, it needs to be initialized every time.
> > > 
> > > cxl_internal_send_cmd -> __cxl_pci_mbox_send_cmd
> > > 
> > > Problem scenario:
> > > 
> > > 1) The size_out variable is initially set to the size of the mailbox.
> > > 2) Read an event.
> > >    - size_out is set to 160 bytes(header 32B + one event 128B).
> > >    - Two event are created while reading.
> > > 3) Read the new *two* events.
> > >    - size_out is still set to 160 bytes.
> > >    - Although the value of out_len is 288 bytes, only 160 bytes are
> > >      copied from the mailbox register to the local variable.
> > >    - record_count is set to 2.
> > >    - Accessing records[1] will result in reading incorrect data.  
> > 
> > Agree with the other comments on need to set .out_size when doing
> > cxl_internal_send_cmd() in a loop. Poison list retrieval can hit
> > this case if the MORE flag is set and a follow on read of the list
> > delivers more records than the previous read. ie. device gives one
> > record, sets the _MORE flag, then gives 5.
> > 
> > 2 other things appeared to me while looking at this:
> > 
> > First, it seems that there is another cleanup wrt accessing records
> > with invalid data.  Still focusing on get_events and get_poison
> > since those loop through output data based on a device supplied
> > record count. The min_out check means the driver at least gets a
> > count of records to expect. That's good. The problem occurs::
> > 
> > if (mbox.size_out != struct_size(payload, records, 'record_count'))
> > 
> > The driver will log garbage trace events, and that could lead to
> > bad actions based on bad data. (like a needless scan of device based
> > on a false overflow flag). So, checking that size.out is the proper
> > multiple of record_count protects driver from bad device behavior.
> > 
> > I think that can be combined w the patch Dan is suggesting to
> > reset mbox.size_out on each loop.
> Hi Alison,
> 
> I'd split it.  Dan's one is a bug fix, this is hardening against
> a device bug. Good to have but not really backport material unless
> we think there are devices like this out there.

Agree, not aware of actual device behavior.

> 
> > 
> > Second thing is the pci-driver quiet handling of PAYLOAD LENGTH
> > values reported by the device. It seems like at a minimum the
> > pci-driver could emit an info or debug message when the device
> > is reporting payload lengths that exceed what the driver can
> > copy in.
> 
> When does this happen?
> 1. New fields on end of a fixed length message.
>    Correct to silently eat it as the spec is buggy if we don't
>     have backwards compatibility.
>     I don't think the spec has had that particular type of bug yet,
>     but maybe I'm forgetting one.
> 2. Device bug.  Can't tell that is different from 1.
> 

My thought was 2) device bug. Bad device is returning payload length
that exceeds what pci/cxl-driver can consume, so gets ignored. Am I
worrying about debugging the hardware needlessly? 

--Alison

> So maybe dev_dbg(). I'm not sure why the cxl-driver would ever want to
> know.
> 
> > I'm referring to the mbox.size_out adjustment in
> > __cxl_pci_mbox_send_cmd(). Or, if it's not the pci-drivers job
> > to judge, pass that actual payload length value back in the
> > mbox structure (new field) so that the cxl-driver can use it.
> > The pci driver would still do it's "#8 Sanitize the copy" work,
> > but it would allow the cxl-driver to clearly see why it got the
> > .size_out it got, and squawk about it if needed.
> > 
> > --Alison
> > 
> > > 
> > > Signed-off-by: Kwangjin Ko <kwangjin.ko@sk.com>
> > > ---
> > >  drivers/cxl/core/mbox.c | 3 ++-
> > >  1 file changed, 2 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
> > > index 9adda4795eb7..a38531a055c8 100644
> > > --- a/drivers/cxl/core/mbox.c
> > > +++ b/drivers/cxl/core/mbox.c
> > > @@ -958,13 +958,14 @@ static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
> > >  		.payload_in = &log_type,
> > >  		.size_in = sizeof(log_type),
> > >  		.payload_out = payload,
> > > -		.size_out = mds->payload_size,
> > >  		.min_out = struct_size(payload, records, 0),
> > >  	};
> > >  
> > >  	do {
> > >  		int rc, i;
> > >  
> > > +		mbox_cmd.size_out = mds->payload_size;
> > > +
> > >  		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
> > >  		if (rc) {
> > >  			dev_err_ratelimited(dev,
> > > -- 
> > > 2.34.1
> > >   
>
Dan Williams April 5, 2024, 5:45 p.m. UTC | #10
Alison Schofield wrote:
[..]
> My thought was 2) device bug. Bad device is returning payload length
> that exceeds what pci/cxl-driver can consume, so gets ignored. Am I
> worrying about debugging the hardware needlessly? 

I would not go so far as to say "needlessly", but the number of fun and
interesting ways that hardware can violate software expectations is
myriad, so it will always be game of after-the-fact quirks and fixups.

A payload truncation would seem to fail safely in the sense of no buffer
overrun and no stale data exposure. Still a bug, but no riskier than all
the other potential hardware bugs / spec violations.
diff mbox series

Patch

diff --git a/drivers/cxl/core/mbox.c b/drivers/cxl/core/mbox.c
index 9adda4795eb7..a38531a055c8 100644
--- a/drivers/cxl/core/mbox.c
+++ b/drivers/cxl/core/mbox.c
@@ -958,13 +958,14 @@  static void cxl_mem_get_records_log(struct cxl_memdev_state *mds,
 		.payload_in = &log_type,
 		.size_in = sizeof(log_type),
 		.payload_out = payload,
-		.size_out = mds->payload_size,
 		.min_out = struct_size(payload, records, 0),
 	};
 
 	do {
 		int rc, i;
 
+		mbox_cmd.size_out = mds->payload_size;
+
 		rc = cxl_internal_send_cmd(mds, &mbox_cmd);
 		if (rc) {
 			dev_err_ratelimited(dev,