[v2,1/2] cxl/cdat: Handle cdat table build errors

Message ID 20231117-fix-cdat-cs-v2-1-715399976d4d@intel.com
State New, archived
Series cxl/cdat: Fixes for CXL CDAT processing

Commit Message

Ira Weiny Nov. 30, 2023, 1:33 a.m. UTC
The callback for building CDAT tables may return negative error codes.
This was previously unhandled and could result in potentially huge
allocations later on in ct3_build_cdat().

Detect the negative error code and defer CDAT building.

Fixes: f5ee7413d592 ("hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange")
Cc: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Signed-off-by: Ira Weiny <ira.weiny@intel.com>
---
 hw/cxl/cxl-cdat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

fan Dec. 19, 2023, 5:44 p.m. UTC | #1
On Wed, Nov 29, 2023 at 05:33:03PM -0800, Ira Weiny wrote:
> The callback for building CDAT tables may return negative error codes.
> This was previously unhandled and will result in potentially huge
> allocations later on in ct3_build_cdat()
> 
> Detect the negative error code and defer cdat building.
> 
> Fixes: f5ee7413d592 ("hw/mem/cxl-type3: Add CXL CDAT Data Object Exchange")
> Cc: Huai-Cheng Kuo <hchkuo@avery-design.com.tw>
> Reviewed-by: Dave Jiang <dave.jiang@intel.com>
> Signed-off-by: Ira Weiny <ira.weiny@intel.com>
> ---
>  hw/cxl/cxl-cdat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
> index 639a2db3e17b..24829cf2428d 100644
> --- a/hw/cxl/cxl-cdat.c
> +++ b/hw/cxl/cxl-cdat.c
> @@ -63,7 +63,7 @@ static void ct3_build_cdat(CDATObject *cdat, Error **errp)
>      cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf,
>                                                   cdat->private);
>  
> -    if (!cdat->built_buf_len) {
> +    if (cdat->built_buf_len <= 0) {
>          /* Build later as not all data available yet */
>          cdat->to_update = true;
>          return;
> 

The fix looks good to me. I'm just curious how the CDAT table would actually
get rebuilt when an error occurs, for example, when a memory allocation fails.

Fan
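On Fan's question: the patch does not retry on the spot; it sets `to_update` so that a later caller can attempt the build again. A minimal sketch of that deferred-retry pattern follows, assuming some caller checks the flag before the table is consumed. The names `cdat_model`, `flaky_build()`, `try_build()` and `ensure_built()` are hypothetical and are not QEMU's.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical model of the deferred-build state; not QEMU's CDATObject. */
struct cdat_model {
    int built_len;   /* number of entries, valid only when !to_update */
    bool to_update;  /* true: build failed or data not ready, retry later */
    int attempts;    /* how many times the build callback has run */
};

/* Hypothetical build callback: fails on the first call (e.g. a backend is
 * not ready yet) and succeeds with 4 entries on later calls. */
static int flaky_build(struct cdat_model *m)
{
    return m->attempts == 1 ? -EINVAL : 4;
}

/* Try to build; on an error or an empty result, defer via to_update. */
static void try_build(struct cdat_model *m)
{
    m->attempts++;
    int len = flaky_build(m);
    if (len <= 0) {
        m->to_update = true;  /* same condition as the patched check */
        return;
    }
    m->built_len = len;
    m->to_update = false;
}

/* Run before the table is consumed: retry only if a build is pending. */
static void ensure_built(struct cdat_model *m)
{
    if (m->to_update) {
        try_build(m);
    }
}
```

Deferral only helps when the failure is transient. A persistent error leaves the table unbuilt on every retry, which is exactly the open question raised above.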
Ira Weiny Dec. 20, 2023, 7:55 p.m. UTC | #2
fan wrote:
> On Wed, Nov 29, 2023 at 05:33:03PM -0800, Ira Weiny wrote:
> > [snip]
> 
> The fix looks good to me. Just curious how to really build cdat table
> again when an error occurs, for example, the memory allocation fails.

I did not go that far, as I am unsure as well.

Ira
Jonathan Cameron Jan. 8, 2024, 3:03 p.m. UTC | #3
On Wed, 20 Dec 2023 11:55:33 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> fan wrote:
> > On Wed, Nov 29, 2023 at 05:33:03PM -0800, Ira Weiny wrote:  
> > > [snip]
> > 
> > The fix looks good to me. Just curious how to really build cdat table
> > again when an error occurs, for example, the memory allocation fails.  
> 
> I did not go that far as I am unsure as well.
Memory allocations in QEMU don't fail (well, if they do, QEMU aborts).
That's a side effect of using glib, which makes for simpler code.
https://docs.gtk.org/glib/func.malloc.html

There shouldn't even be any checks :(  I'll fix that up at some point
across all the CXL emulation. Sometimes reviewers noticed and
we dropped them at earlier stages, but clearly we didn't catch them all.

Which, come to think of it, is why this error condition is not actually
buggy in practice: the code will never manage to return -ENOMEM, and
I don't think there are other error codes.

Jonathan

> 
> Ira
>
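glib documents that if g_malloc(), g_new0() and related calls fail, the application is terminated, so a NULL check or an -ENOMEM return after them is dead code. A minimal plain-C model of that contract is below. It uses a hypothetical `xmalloc()` wrapper rather than glib itself, so it only illustrates the calling convention.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical wrapper mimicking glib's documented behaviour: on failure it
 * terminates the program instead of returning NULL. Unlike g_malloc(), which
 * returns NULL for a zero-byte request, this always allocates >= 1 byte. */
static void *xmalloc(size_t n)
{
    void *p = malloc(n ? n : 1);
    if (p == NULL) {
        fprintf(stderr, "xmalloc: failed to allocate %zu bytes\n", n);
        abort();
    }
    return p;
}

/* Caller written against that contract: no NULL check, no -ENOMEM path. */
static int *make_table(size_t entries)
{
    int *table = xmalloc(entries * sizeof(*table));
    memset(table, 0, entries * sizeof(*table));
    return table;
}
```

Because the only failure mode is process termination, a caller like `make_table()` has no error to propagate, which is why -ENOMEM never reaches the CDAT build path.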
Ira Weiny Jan. 8, 2024, 4:06 p.m. UTC | #4
Jonathan Cameron wrote:
> On Wed, 20 Dec 2023 11:55:33 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > fan wrote:
> > > On Wed, Nov 29, 2023 at 05:33:03PM -0800, Ira Weiny wrote:  
> > > > [snip]
> > > 
> > > The fix looks good to me. Just curious how to really build cdat table
> > > again when an error occurs, for example, the memory allocation fails.  
> > 
> > I did not go that far as I am unsure as well.
> Memory allocations in qemu don't fail (well if they do it crashes)
> Side effect of using glib which makes for simpler cases.
> https://docs.gtk.org/glib/func.malloc.html
> 
> There shouldn't even be any checks :(  I'll fix that up at somepoint
> across all the CXL emulation.  Sometimes reviewers noticed and
> we dropped it at earlier stages, but clearly didn't catch them all.
> 
> Which come to think of it is why this error condition is in practice
> not actually buggy as the code won't ever manage to return -ENOMEM and
> I don't think there are other error codes.

Ah. OK, but in that case I would say that build_cdat_table() should never
return < 0, so that it is clear at this level what can happen.

Would you like a patch for that? (/me assumes you dropped this patch)

Ira
Jonathan Cameron Jan. 8, 2024, 6 p.m. UTC | #5
On Mon, 8 Jan 2024 08:06:32 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Jonathan Cameron wrote:
> > On Wed, 20 Dec 2023 11:55:33 -0800
> > Ira Weiny <ira.weiny@intel.com> wrote:
> >   
> > > fan wrote:  
> > > > On Wed, Nov 29, 2023 at 05:33:03PM -0800, Ira Weiny wrote:    
> > > > > [snip]
> > > > 
> > > > The fix looks good to me. Just curious how to really build cdat table
> > > > again when an error occurs, for example, the memory allocation fails.    
> > > 
> > > I did not go that far as I am unsure as well.  
> > Memory allocations in qemu don't fail (well if they do it crashes)
> > Side effect of using glib which makes for simpler cases.
> > https://docs.gtk.org/glib/func.malloc.html
> > 
> > There shouldn't even be any checks :(  I'll fix that up at somepoint
> > across all the CXL emulation.  Sometimes reviewers noticed and
> > we dropped it at earlier stages, but clearly didn't catch them all.
> > 
> > Which come to think of it is why this error condition is in practice
> > not actually buggy as the code won't ever manage to return -ENOMEM and
> > I don't think there are other error codes.  
> 
> Ah.  Ok but in that case I would say that build_cdat_table() should never
> return < 0 to be clear at this level what can happen.
> 
> Would you like a patch for that?  (/me assumes you dropped this patch)

Probably needs to first rip out all the -ENOMEM returns that got into
the CXL code in general, then tidy up the return type to be unsigned.

If you want to do that it would be welcome!

Jonathan


> 
> Ira
>
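Making the return type unsigned only works if the callback truly cannot fail, and the follow-up below notes that it can. One way to keep an unsigned count while still reporting failures is to separate status from length. The sketch below shows that idea under stated assumptions: `build_table_fn`, `example_build()` and `entries_or_defer()` are hypothetical and do not match QEMU's current callback typedef.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical alternative contract (not QEMU's current typedef): the return
 * value says whether the build succeeded, and the entry count is an unsigned
 * out-parameter, so an errno can never be read as a length. */
typedef bool (*build_table_fn)(void *priv, size_t *n_entries, int *err);

/* Example callback: reports -EINVAL when its private data is missing. */
static bool example_build(void *priv, size_t *n_entries, int *err)
{
    if (priv == NULL) {
        *err = -EINVAL;
        return false;
    }
    *n_entries = 3;  /* arbitrary entry count for illustration */
    return true;
}

/* Caller: only trusts n when the build reported success. */
static size_t entries_or_defer(build_table_fn fn, void *priv, bool *to_update)
{
    size_t n = 0;
    int err = 0;

    if (!fn(priv, &n, &err) || n == 0) {
        *to_update = true;  /* error or nothing to build yet: retry later */
        return 0;
    }
    *to_update = false;
    return n;
}
```

The trade-off is a wider signature in exchange for making a negative length unrepresentable at the call site.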
Jonathan Cameron Jan. 8, 2024, 6:05 p.m. UTC | #6
On Mon, 8 Jan 2024 18:00:42 +0000
Jonathan Cameron <Jonathan.Cameron@Huawei.com> wrote:

> On Mon, 8 Jan 2024 08:06:32 -0800
> Ira Weiny <ira.weiny@intel.com> wrote:
> 
> > Jonathan Cameron wrote:  
> > > On Wed, 20 Dec 2023 11:55:33 -0800
> > > Ira Weiny <ira.weiny@intel.com> wrote:
> > >     
> > > > fan wrote:    
> > > > > On Wed, Nov 29, 2023 at 05:33:03PM -0800, Ira Weiny wrote:      
> > > > > > [snip]
> > > > > 
> > > > > The fix looks good to me. Just curious how to really build cdat table
> > > > > again when an error occurs, for example, the memory allocation fails.      
> > > > 
> > > > I did not go that far as I am unsure as well.    
> > > Memory allocations in qemu don't fail (well if they do it crashes)
> > > Side effect of using glib which makes for simpler cases.
> > > https://docs.gtk.org/glib/func.malloc.html
> > > 
> > > There shouldn't even be any checks :(  I'll fix that up at somepoint
> > > across all the CXL emulation.  Sometimes reviewers noticed and
> > > we dropped it at earlier stages, but clearly didn't catch them all.
> > > 
> > > Which come to think of it is why this error condition is in practice
> > > not actually buggy as the code won't ever manage to return -ENOMEM and
> > > I don't think there are other error codes.    
> > 
> > Ah.  Ok but in that case I would say that build_cdat_table() should never
> > return < 0 to be clear at this level what can happen.
> > 
> > Would you like a patch for that?  (/me assumes you dropped this patch)  
> 
> Probably needs to first rip out all the -ENOMEM returns that got into
> the CXL code in general, then tidy up the return type to be unsigned.
> 
> If you want to do that it would be welcome!
Actually, build_cdat_table() can return errors, just not for this reason.

For example, host_memory_backend_get_memory() can fail. So the original patch
is good as is; it's just that the discussion of memory allocation failure
threw me off, and that should be cleaned up separately.

Jonathan

> 
> Jonathan
> 
> 
> > 
> > Ira
> >   
>
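The non-allocation error path Jonathan points to can be modelled in a few lines. This sketch assumes the build callback maps a missing backend memory region to -EINVAL. The structs and functions are hypothetical stand-ins, not QEMU's `HostMemoryBackend` or `MemoryRegion` types.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-ins for a host memory backend and its memory region. */
struct mem_region { size_t size; };
struct mem_backend { struct mem_region *mr; };

/* Stand-in for host_memory_backend_get_memory(): may return NULL. */
static struct mem_region *backend_get_memory(struct mem_backend *b)
{
    return b ? b->mr : NULL;
}

/* Stand-in for a CDAT build callback: returns an entry count, or a negative
 * errno when a configured backend has no memory region. */
static int model_build_cdat_table(struct mem_backend *vmem)
{
    int entries = 0;

    if (vmem) {
        struct mem_region *mr = backend_get_memory(vmem);
        if (!mr) {
            return -EINVAL;  /* a failure unrelated to allocation */
        }
        entries += 4;  /* arbitrary per-region entry count for illustration */
    }
    return entries;
}
```

With no backend configured the model returns 0, which both the old and the patched check already treat as "build later". Only the -EINVAL case distinguishes them.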
Ira Weiny Jan. 9, 2024, 2:48 a.m. UTC | #7
Jonathan Cameron wrote:

[snip]

> > > > > 
> > > > > I did not go that far as I am unsure as well.    
> > > > Memory allocations in qemu don't fail (well if they do it crashes)
> > > > Side effect of using glib which makes for simpler cases.
> > > > https://docs.gtk.org/glib/func.malloc.html
> > > > 
> > > > There shouldn't even be any checks :(  I'll fix that up at somepoint
> > > > across all the CXL emulation.  Sometimes reviewers noticed and
> > > > we dropped it at earlier stages, but clearly didn't catch them all.
> > > > 
> > > > Which come to think of it is why this error condition is in practice
> > > > not actually buggy as the code won't ever manage to return -ENOMEM and
> > > > I don't think there are other error codes.    
> > > 
> > > Ah.  Ok but in that case I would say that build_cdat_table() should never
> > > return < 0 to be clear at this level what can happen.
> > > 
> > > Would you like a patch for that?  (/me assumes you dropped this patch)  
> > 
> > Probably needs to first rip out all the -ENOMEM returns that got into
> > the CXL code in general, then tidy up the return type to be unsigned.
> > 
> > If you want to do that it would be welcome!
> Actually.  Build_cdat_table() can return errors just not for this reason.
> 
> host_memory_backend_get_memory() can fail for example.

I must be on a different version because I don't see that.

>
> So original patch is good
> as is, just that the discussion of memory allocation failure threw me
> off and should be cleaned up separately.
> 

I did this testing on Fan's DCD version... :-/ ... which is probably very out
of date.

Fan, do you have a newer version than your 2023-11-16 branch?

Ira
Jonathan Cameron Jan. 9, 2024, 3:34 p.m. UTC | #8
On Mon, 8 Jan 2024 18:48:48 -0800
Ira Weiny <ira.weiny@intel.com> wrote:

> Jonathan Cameron wrote:
> 
> [snip]
> 
> > > > > > 
> > > > > > I did not go that far as I am unsure as well.      
> > > > > Memory allocations in qemu don't fail (well if they do it crashes)
> > > > > Side effect of using glib which makes for simpler cases.
> > > > > https://docs.gtk.org/glib/func.malloc.html
> > > > > 
> > > > > There shouldn't even be any checks :(  I'll fix that up at somepoint
> > > > > across all the CXL emulation.  Sometimes reviewers noticed and
> > > > > we dropped it at earlier stages, but clearly didn't catch them all.
> > > > > 
> > > > > Which come to think of it is why this error condition is in practice
> > > > > not actually buggy as the code won't ever manage to return -ENOMEM and
> > > > > I don't think there are other error codes.      
> > > > 
> > > > Ah.  Ok but in that case I would say that build_cdat_table() should never
> > > > return < 0 to be clear at this level what can happen.
> > > > 
> > > > Would you like a patch for that?  (/me assumes you dropped this patch)    
> > > 
> > > Probably needs to first rip out all the -ENOMEM returns that got into
> > > the CXL code in general, then tidy up the return type to be unsigned.
> > > 
> > > If you want to do that it would be welcome!  
> > Actually.  Build_cdat_table() can return errors just not for this reason.
> > 
> > host_memory_backend_get_memory() can fail for example.  
> 
> I must be on a different version because I don't see that.
> 
> >
> > So original patch is good
> > as is, just that the discussion of memory allocation failure threw me
> > off and should be cleaned up separately.
> >   
> 
> I did this testing on Fan's DCD version...  :-/  ... probably very out of
> date.

https://elixir.bootlin.com/qemu/latest/source/hw/mem/cxl_type3.c#L183
https://elixir.bootlin.com/qemu/v8.1.0/source/hw/mem/cxl_type3.c#L171
It's been there a while, but meh, too many branches floating around :)

> 
> Fan do you have a newer version than your 2023-11-16 branch?
> 


> Ira
>
Patch

diff --git a/hw/cxl/cxl-cdat.c b/hw/cxl/cxl-cdat.c
index 639a2db3e17b..24829cf2428d 100644
--- a/hw/cxl/cxl-cdat.c
+++ b/hw/cxl/cxl-cdat.c
@@ -63,7 +63,7 @@ static void ct3_build_cdat(CDATObject *cdat, Error **errp)
     cdat->built_buf_len = cdat->build_cdat_table(&cdat->built_buf,
                                                  cdat->private);
 
-    if (!cdat->built_buf_len) {
+    if (cdat->built_buf_len <= 0) {
         /* Build later as not all data available yet */
         cdat->to_update = true;
         return;
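The old `!cdat->built_buf_len` test was not enough because a negative errno is non-zero, so it passed the check. Once a negative count reaches an allocation size, it converts to a huge unsigned value. The minimal model below assumes the length is later used as `len + 1` entries. The helper names are hypothetical, and this is not QEMU code.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Entries an allocation would request for a given build result, assuming
 * one extra header entry (hypothetical layout). */
static size_t alloc_count(int built_len)
{
    /* int -> size_t conversion: negative values wrap to huge sizes */
    return (size_t)(built_len + 1);
}

/* Old check: only a zero length defers the build. */
static bool old_defers(int built_len) { return !built_len; }

/* Patched check: zero or any negative errno defers the build. */
static bool new_defers(int built_len) { return built_len <= 0; }
```

A length of -1 would happen to wrap to zero entries. Any other negative errno, such as -EINVAL, produces an enormous request, which is the allocation the commit message warns about.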