diff mbox series

cxl/acpi: Release device after dev_err

Message ID 20230707161616.3554167-1-leitao@debian.org
State New, archived
Headers show
Series cxl/acpi: Release device after dev_err | expand

Commit Message

Breno Leitao July 7, 2023, 4:16 p.m. UTC
Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
fails. Kfence drops this message, after the following:

  BUG: KFENCE: use-after-free read in resource_string

This is happening in cxl_parse_cfmws(), and here is a simplified flow
that is coming from Kfence.

Use-after-free:
	_dev_err
	cxl_parse_cfmws
	acpi_table_parse_entries_array
	acpi_table_parse_cedt
	cxl_acpi_probe

Free:
	cxl_decoder_release
	device_release
	kobject_put
	cxl_parse_cfmws
	acpi_table_parse_entries_array
	acpi_table_parse_cedt
	cxl_acpi_probe

Alloc:
	cxl_decoder_alloc
	cxl_parse_cfmws
	acpi_table_parse_entries_array
	acpi_table_parse_cedt
	cxl_acpi_probe
	platform_probe

From my reading of the issue, the device struct being used by
dev_err() was removed in the put_device() before.

Put the device just after the message is printed.

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 drivers/cxl/acpi.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

Comments

Dave Jiang July 7, 2023, 4:50 p.m. UTC | #1
On 7/7/23 09:16, Breno Leitao wrote:
> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> fails. Kfence drops this message, after the following:
> 
>    BUG: KFENCE: use-after-free read in resource_string
> 
> This is happening in cxl_parse_cfmws(), and here is a simplified flow
> that is coming from Kfence.
> 
> Use-after-free:
> 	_dev_err
> 	cxl_parse_cfmws
> 	acpi_table_parse_entries_array
> 	acpi_table_parse_cedt
> 	cxl_acpi_probe
> 
> Free:
> 	cxl_decoder_release
> 	device_release
> 	kobject_put
> 	cxl_parse_cfmws
> 	acpi_table_parse_entries_array
> 	acpi_table_parse_cedt
> 	cxl_acpi_probe
> 
> Alloc:
> 	cxl_decoder_alloc
> 	cxl_parse_cfmws
> 	acpi_table_parse_entries_array
> 	acpi_table_parse_cedt
> 	cxl_acpi_probe
> 	platform_probe
> 
>  From my reading of the issue, the device struct being used by
> dev_err() was removed in the put_device() before.
> 
> Put the device just after the message is printed.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>   drivers/cxl/acpi.c | 7 +++----
>   1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 658e6b84a769..5179bf4211d8 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>   	}
>   	rc = cxl_decoder_add(cxld, target_map);
>   err_xormap:
> -	if (rc)
> -		put_device(&cxld->dev);
> -	else
> -		rc = cxl_decoder_autoremove(dev, cxld);
>   	if (rc) {
>   		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>   			cxld->hpa_range.start, cxld->hpa_range.end);
> +		put_device(&cxld->dev);
>   		return 0;

I think you will want to change this to 'return rc;' in order to reflect 
the error.

> +	} else {
> +		rc = cxl_decoder_autoremove(dev, cxld);
>   	}
>   	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>   		dev_name(&cxld->dev),

You also need to change the 'return 0' that follows to 'return rc', to 
catch the error from cxl_decoder_autoremove().
Alison Schofield July 7, 2023, 10:17 p.m. UTC | #2
On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> fails. Kfence drops this message, after the following:
> 
>   BUG: KFENCE: use-after-free read in resource_string
> 
> This is happening in cxl_parse_cfmws(), and here is a simplified flow
> that is coming from Kfence.
> 
> Use-after-free:
> 	_dev_err
> 	cxl_parse_cfmws
> 	acpi_table_parse_entries_array
> 	acpi_table_parse_cedt
> 	cxl_acpi_probe
> 
> Free:
> 	cxl_decoder_release
> 	device_release
> 	kobject_put
> 	cxl_parse_cfmws
> 	acpi_table_parse_entries_array
> 	acpi_table_parse_cedt
> 	cxl_acpi_probe
> 
> Alloc:
> 	cxl_decoder_alloc
> 	cxl_parse_cfmws
> 	acpi_table_parse_entries_array
> 	acpi_table_parse_cedt
> 	cxl_acpi_probe
> 	platform_probe
> 
> From my reading of the issue, the device struct being used by
> dev_err() was removed in the put_device() before.

Hi Breno, 

I'm not familiar w kfence, but I don't follow what it finds
suspect here. Does kfence point to exact offensive lines of
code, or ???

The put_device() removed &cxld->dev and the dev_err() that
this patch moves the put after, was using 'dev', which was
assigned from ctx.dev. It is not the same as &cxld->dev. I
wonder if Kfence thinks we can get to the next dev_dbg()
statement and misuse &cxld->dev.

More below...


> 
> Put the device just after the message is printed.
> 
> Signed-off-by: Breno Leitao <leitao@debian.org>
> ---
>  drivers/cxl/acpi.c | 7 +++----
>  1 file changed, 3 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 658e6b84a769..5179bf4211d8 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>  	}
>  	rc = cxl_decoder_add(cxld, target_map);
>  err_xormap:
> -	if (rc)
> -		put_device(&cxld->dev);
> -	else
> -		rc = cxl_decoder_autoremove(dev, cxld);
>  	if (rc) {
>  		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>  			cxld->hpa_range.start, cxld->hpa_range.end);
> +		put_device(&cxld->dev);
>  		return 0;
> +	} else {
> +		rc = cxl_decoder_autoremove(dev, cxld);
>  	}
>  	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>  		dev_name(&cxld->dev),
> -- 
> 2.34.1
> 

(pulled in fresh code snippet to get the dev_dbg() in view.)

>	}
>	rc = cxl_decoder_add(cxld, target_map);
>err_xormap:
>	if (rc)
>		put_device(&cxld->dev);
>

This puts &cxld->dev, not dev.

>
>	else
>		rc = cxl_decoder_autoremove(dev, cxld);
>	if (rc) {
>		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>			cxld->hpa_range.start, cxld->hpa_range.end);
>		return 0;

This return avoids getting to the next dev_dbg() statement after
put_device(). We cannot get to the next dev_dbg() statement when
rc is non zero, but it seems kfence thinks we can.

>	}
>	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>		dev_name(&cxld->dev),

If we could get here, after the put_device(), that would be bad.

>		phys_to_target_node(cxld->hpa_range.start),
>		cxld->hpa_range.start, cxld->hpa_range.end);
>
>	return 0;
Dave Jiang July 8, 2023, 12:33 a.m. UTC | #3
On 7/7/23 15:17, Alison Schofield wrote:
> On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
>> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
>> fails. Kfence drops this message, after the following:
>>
>>    BUG: KFENCE: use-after-free read in resource_string
>>
>> This is happening in cxl_parse_cfmws(), and here is a simplified flow
>> that is coming from Kfence.
>>
>> Use-after-free:
>> 	_dev_err
>> 	cxl_parse_cfmws
>> 	acpi_table_parse_entries_array
>> 	acpi_table_parse_cedt
>> 	cxl_acpi_probe
>>
>> Free:
>> 	cxl_decoder_release
>> 	device_release
>> 	kobject_put
>> 	cxl_parse_cfmws
>> 	acpi_table_parse_entries_array
>> 	acpi_table_parse_cedt
>> 	cxl_acpi_probe
>>
>> Alloc:
>> 	cxl_decoder_alloc
>> 	cxl_parse_cfmws
>> 	acpi_table_parse_entries_array
>> 	acpi_table_parse_cedt
>> 	cxl_acpi_probe
>> 	platform_probe
>>
>>  From my reading of the issue, the device struct being used by
>> dev_err() was removed in the put_device() before.
> 
> Hi Breno,
> 
> I'm not familiar w kfence, but I don't follow what it finds
> suspect here. Does kfence point to exact offensive lines of
> code, or ???
> 
> The put_device() removed &cxld->dev and the dev_err() that
> this patch moves the put after, was using 'dev', which was
> assigned from ctx.dev. It is not the same as &cxld->dev. I
> wonder if Kfence thinks we can get to the next dev_dbg()
> statement and misuse &cxld->dev.
> 
> More below...
> 
> 
>>
>> Put the device just after the message is printed.
>>
>> Signed-off-by: Breno Leitao <leitao@debian.org>
>> ---
>>   drivers/cxl/acpi.c | 7 +++----
>>   1 file changed, 3 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 658e6b84a769..5179bf4211d8 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>>   	}
>>   	rc = cxl_decoder_add(cxld, target_map);
>>   err_xormap:
>> -	if (rc)
>> -		put_device(&cxld->dev);
>> -	else
>> -		rc = cxl_decoder_autoremove(dev, cxld);
>>   	if (rc) {
>>   		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>>   			cxld->hpa_range.start, cxld->hpa_range.end);
>> +		put_device(&cxld->dev);
>>   		return 0;
>> +	} else {
>> +		rc = cxl_decoder_autoremove(dev, cxld);
>>   	}
>>   	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>>   		dev_name(&cxld->dev),
>> -- 
>> 2.34.1
>>
> 
> (pulled in fresh code snippet to get the dev_dbg() in view.)
> 
>> 	}
>> 	rc = cxl_decoder_add(cxld, target_map);
>> err_xormap:
>> 	if (rc)
>> 		put_device(&cxld->dev);
>>
> 
> This puts &cxld->dev, not dev.
> 
>>
>> 	else
>> 		rc = cxl_decoder_autoremove(dev, cxld);
>> 	if (rc) {
>> 		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>> 			cxld->hpa_range.start, cxld->hpa_range.end);
>> 		return 0;
> 
> This return avoids getting to the next dev_dbg() statement after
> put_device(). We cannot get to the next dev_dbg() statement when
> rc is non zero, but it seems kfence thinks we can.
> 
>> 	}
>> 	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>> 		dev_name(&cxld->dev),
> 
> If we could get here, after the put_device(), that would be bad.
> 
>> 		phys_to_target_node(cxld->hpa_range.start),
>> 		cxld->hpa_range.start, cxld->hpa_range.end);
>>
>> 	return 0;
> 

Seems like the code can be cleaned up this way. The double check of if 
(rc) is kinda weird anyhow.

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 7e1765b09e04..0573b476d29c 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union 
acpi_subtable_headers *header, void *arg,
         }
         rc = cxl_decoder_add(cxld, target_map);
  err_xormap:
-       if (rc)
-               put_device(&cxld->dev);
-       else
-               rc = cxl_decoder_autoremove(dev, cxld);
         if (rc) {
                 dev_err(dev, "Failed to add decode range [%#llx - 
%#llx]\n",
                         cxld->hpa_range.start, cxld->hpa_range.end);
-               return 0;
+               put_device(&cxld->dev);
+               return rc;
         }
+
+       rc = cxl_decoder_autoremove(dev, cxld);
+
         dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
                 dev_name(&cxld->dev),
                 phys_to_target_node(cxld->hpa_range.start),
                 cxld->hpa_range.start, cxld->hpa_range.end);

-       return 0;
+       return rc;

  err_insert:
         kfree(res->name);
Alison Schofield July 8, 2023, 1:07 a.m. UTC | #4
On Fri, Jul 07, 2023 at 05:33:02PM -0700, Dave Jiang wrote:
> 
> 
> On 7/7/23 15:17, Alison Schofield wrote:
> > On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
> > > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> > > fails. Kfence drops this message, after the following:
> > > 
> > >    BUG: KFENCE: use-after-free read in resource_string
> > > 
> > > This is happening in cxl_parse_cfmws(), and here is a simplified flow
> > > that is coming from Kfence.
> > > 
> > > Use-after-free:
> > > 	_dev_err
> > > 	cxl_parse_cfmws
> > > 	acpi_table_parse_entries_array
> > > 	acpi_table_parse_cedt
> > > 	cxl_acpi_probe
> > > 
> > > Free:
> > > 	cxl_decoder_release
> > > 	device_release
> > > 	kobject_put
> > > 	cxl_parse_cfmws
> > > 	acpi_table_parse_entries_array
> > > 	acpi_table_parse_cedt
> > > 	cxl_acpi_probe
> > > 
> > > Alloc:
> > > 	cxl_decoder_alloc
> > > 	cxl_parse_cfmws
> > > 	acpi_table_parse_entries_array
> > > 	acpi_table_parse_cedt
> > > 	cxl_acpi_probe
> > > 	platform_probe
> > > 
> > >  From my reading of the issue, the device struct being used by
> > > dev_err() was removed in the put_device() before.
> > 
> > Hi Breno,
> > 
> > I'm not familiar w kfence, but I don't follow what it finds
> > suspect here. Does kfence point to exact offensive lines of
> > code, or ???
> > 
> > The put_device() removed &cxld->dev and the dev_err() that
> > this patch moves the put after, was using 'dev', which was
> > assigned from ctx.dev. It is not the same as &cxld->dev. I
> > wonder if Kfence thinks we can get to the next dev_dbg()
> > statement and misuse &cxld->dev.
> > 
> > More below...
> > 
> > 
> > > 
> > > Put the device just after the message is printed.
> > > 
> > > Signed-off-by: Breno Leitao <leitao@debian.org>
> > > ---
> > >   drivers/cxl/acpi.c | 7 +++----
> > >   1 file changed, 3 insertions(+), 4 deletions(-)
> > > 
> > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > > index 658e6b84a769..5179bf4211d8 100644
> > > --- a/drivers/cxl/acpi.c
> > > +++ b/drivers/cxl/acpi.c
> > > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> > >   	}
> > >   	rc = cxl_decoder_add(cxld, target_map);
> > >   err_xormap:
> > > -	if (rc)
> > > -		put_device(&cxld->dev);
> > > -	else
> > > -		rc = cxl_decoder_autoremove(dev, cxld);
> > >   	if (rc) {
> > >   		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> > >   			cxld->hpa_range.start, cxld->hpa_range.end);
> > > +		put_device(&cxld->dev);
> > >   		return 0;
> > > +	} else {
> > > +		rc = cxl_decoder_autoremove(dev, cxld);
> > >   	}
> > >   	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> > >   		dev_name(&cxld->dev),
> > > -- 
> > > 2.34.1
> > > 
> > 
> > (pulled in fresh code snippet to get the dev_dbg() in view.)
> > 
> > > 	}
> > > 	rc = cxl_decoder_add(cxld, target_map);
> > > err_xormap:
> > > 	if (rc)
> > > 		put_device(&cxld->dev);
> > > 
> > 
> > This puts &cxld->dev, not dev.
> > 
> > > 
> > > 	else
> > > 		rc = cxl_decoder_autoremove(dev, cxld);
> > > 	if (rc) {
> > > 		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> > > 			cxld->hpa_range.start, cxld->hpa_range.end);
> > > 		return 0;
> > 
> > This return avoids getting to the next dev_dbg() statement after
> > put_device(). We cannot get to the next dev_dbg() statement when
> > rc is non zero, but it seems kfence thinks we can.
> > 
> > > 	}
> > > 	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> > > 		dev_name(&cxld->dev),
> > 
> > If we could get here, after the put_device(), that would be bad.
> > 
> > > 		phys_to_target_node(cxld->hpa_range.start),
> > > 		cxld->hpa_range.start, cxld->hpa_range.end);
> > > 
> > > 	return 0;
> > 
> 
> Seems like the code can be cleaned up this way. The double check of if (rc)
> is kinda weird anyhow.
> 
> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> index 7e1765b09e04..0573b476d29c 100644
> --- a/drivers/cxl/acpi.c
> +++ b/drivers/cxl/acpi.c
> @@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union acpi_subtable_headers
> *header, void *arg,
>         }
>         rc = cxl_decoder_add(cxld, target_map);
>  err_xormap:
> -       if (rc)
> -               put_device(&cxld->dev);
> -       else
> -               rc = cxl_decoder_autoremove(dev, cxld);
>         if (rc) {
>                 dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>                         cxld->hpa_range.start, cxld->hpa_range.end);
> -               return 0;
> +               put_device(&cxld->dev);

Why do you think the put_device() is making deref of cxld bad here?
That cxld is still present.

Maybe the unrelated bug here is that we are leaking the res. 
Rather that returning directly, take the kfree(res) path out.


> +               return rc;
>         }
> +
> +       rc = cxl_decoder_autoremove(dev, cxld);
> +

Now this rc is not examined.


>         dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>                 dev_name(&cxld->dev),
>                 phys_to_target_node(cxld->hpa_range.start),
>                 cxld->hpa_range.start, cxld->hpa_range.end);
> 
> -       return 0;
> +       return rc;
> 
>  err_insert:
>         kfree(res->name);
Breno Leitao July 10, 2023, 10:41 a.m. UTC | #5
On Fri, Jul 07, 2023 at 03:17:58PM -0700, Alison Schofield wrote:
> On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
> > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> > fails. Kfence drops this message, after the following:
> > 
> >   BUG: KFENCE: use-after-free read in resource_string
> > 
> > This is happening in cxl_parse_cfmws(), and here is a simplified flow
> > that is coming from Kfence.
> > 
> > Use-after-free:
> > 	_dev_err
> > 	cxl_parse_cfmws
> > 	acpi_table_parse_entries_array
> > 	acpi_table_parse_cedt
> > 	cxl_acpi_probe
> > 
> > Free:
> > 	cxl_decoder_release
> > 	device_release
> > 	kobject_put
> > 	cxl_parse_cfmws
> > 	acpi_table_parse_entries_array
> > 	acpi_table_parse_cedt
> > 	cxl_acpi_probe
> > 
> > Alloc:
> > 	cxl_decoder_alloc
> > 	cxl_parse_cfmws
> > 	acpi_table_parse_entries_array
> > 	acpi_table_parse_cedt
> > 	cxl_acpi_probe
> > 	platform_probe
> > 
> > From my reading of the issue, the device struct being used by
> > dev_err() was removed in the put_device() before.
> 
> Hi Breno, 
> 
> I'm not familiar w kfence, but I don't follow what it finds
> suspect here. Does kfence point to exact offensive lines of
> code, or ???

Unfortunately I do not lines that match anything public. Collecting them
might be hard also, since kfence problems during failure mode are not
easy to reproduce.

> The put_device() removed &cxld->dev and the dev_err() that
> this patch moves the put after, was using 'dev', which was
> assigned from ctx.dev. It is not the same as &cxld->dev. I
> wonder if Kfence thinks we can get to the next dev_dbg()
> statement and misuse &cxld->dev.

Any chance that "struct device_type->release"() might be touching
ctx->dev? This is because "struct device_type->release"() is calling
during the put operation, which seems to be the one de-allocating the
resource.

> More below...
> 
> 
> > 
> > Put the device just after the message is printed.
> > 
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >  drivers/cxl/acpi.c | 7 +++----
> >  1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 658e6b84a769..5179bf4211d8 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >  	}
> >  	rc = cxl_decoder_add(cxld, target_map);
> >  err_xormap:
> > -	if (rc)
> > -		put_device(&cxld->dev);
> > -	else
> > -		rc = cxl_decoder_autoremove(dev, cxld);
> >  	if (rc) {
> >  		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> >  			cxld->hpa_range.start, cxld->hpa_range.end);
> > +		put_device(&cxld->dev);
> >  		return 0;
> > +	} else {
> > +		rc = cxl_decoder_autoremove(dev, cxld);
> >  	}
> >  	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
> >  		dev_name(&cxld->dev),
> > -- 
> > 2.34.1
> > 
> 
> (pulled in fresh code snippet to get the dev_dbg() in view.)
> 
> >	}
> >	rc = cxl_decoder_add(cxld, target_map);
> >err_xormap:
> >	if (rc)
> >		put_device(&cxld->dev);
> >
> 
> This puts &cxld->dev, not dev.
> 
> >
> >	else
> >		rc = cxl_decoder_autoremove(dev, cxld);
> >	if (rc) {
> >		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> >			cxld->hpa_range.start, cxld->hpa_range.end);
> >		return 0;
> 
> This return avoids getting to the next dev_dbg() statement after
> put_device(). We cannot get to the next dev_dbg() statement when
> rc is non zero, but it seems kfence thinks we can.

Oh, the problems is on dev_err() not on dev_dbg(). Basically on the
return path.

Here is what dmesg says:

	cxl root0: Failed to populate active decoder targets
	cxl_acpi ACPI0017:00: Failed to add decoder for [mem 0x4080000000-0x2baffffffff flags 0x200]
	cxl root0: Failed to populate active decoder targets
	cxl_acpi ACPI0017:00: Failed to add decoder for [mem 0x2bb00000000-0x5357fffffff flags 0x200]
	cxl root0: Failed to populate active decoder targets
	==================================================================
	BUG: KFENCE: use-after-free read in resource_string+0x80/0x570\x0a
	Use-after-free read at 0x(____ptrval____) (in kfence-#111):
	 resource_string+0x80/0x570
	 pointer+0x389/0x3c0
	 vsnprintf+0x214/0x670
	 pointer+0x1b2/0x3c0
	 vsnprintf+0x214/0x670
	 vprintk_store+0x102/0x450
	 vprintk_emit+0x6f/0x1b0
	 dev_vprintk_emit+0x117/0x163
	 dev_printk_emit+0x51/0x6b
	 _dev_err+0x6e/0x88
	 cxl_parse_cfmws+0x2a0/0x2d
	 acpi_table_parse_entries_array+0x1fc/0x330
	 acpi_table_parse_cedt+0x4f/0x70
	 cxl_acpi_probe+0xd6/0x150
	 platform_probe+0x2f/0x60
	 really_probe+0x1f5/0x340
	 driver_probe_device+0x1e/0x80
	 __driver_attach+0xfc/0x190
	 bus_for_each_dev+0x76/0xb0
	 bus_add_driver+0x1bb/0x230
	 driver_register+0x85/0x120
	 do_one_initcall+0xbe/0x240
	 kernel_init_freeable+0x1cc/0x2d2
	 kernel_init+0x16/0x1a0
	 ret_from_fork+0x1f/0x30
Breno Leitao July 10, 2023, 10:49 a.m. UTC | #6
On Fri, Jul 07, 2023 at 09:50:09AM -0700, Dave Jiang wrote:
> 
> 
> On 7/7/23 09:16, Breno Leitao wrote:
> > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
> > fails. Kfence drops this message, after the following:
> > 
> >    BUG: KFENCE: use-after-free read in resource_string
> > 
> > This is happening in cxl_parse_cfmws(), and here is a simplified flow
> > that is coming from Kfence.
> > 
> > Use-after-free:
> > 	_dev_err
> > 	cxl_parse_cfmws
> > 	acpi_table_parse_entries_array
> > 	acpi_table_parse_cedt
> > 	cxl_acpi_probe
> > 
> > Free:
> > 	cxl_decoder_release
> > 	device_release
> > 	kobject_put
> > 	cxl_parse_cfmws
> > 	acpi_table_parse_entries_array
> > 	acpi_table_parse_cedt
> > 	cxl_acpi_probe
> > 
> > Alloc:
> > 	cxl_decoder_alloc
> > 	cxl_parse_cfmws
> > 	acpi_table_parse_entries_array
> > 	acpi_table_parse_cedt
> > 	cxl_acpi_probe
> > 	platform_probe
> > 
> >  From my reading of the issue, the device struct being used by
> > dev_err() was removed in the put_device() before.
> > 
> > Put the device just after the message is printed.
> > 
> > Signed-off-by: Breno Leitao <leitao@debian.org>
> > ---
> >   drivers/cxl/acpi.c | 7 +++----
> >   1 file changed, 3 insertions(+), 4 deletions(-)
> > 
> > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
> > index 658e6b84a769..5179bf4211d8 100644
> > --- a/drivers/cxl/acpi.c
> > +++ b/drivers/cxl/acpi.c
> > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
> >   	}
> >   	rc = cxl_decoder_add(cxld, target_map);
> >   err_xormap:
> > -	if (rc)
> > -		put_device(&cxld->dev);
> > -	else
> > -		rc = cxl_decoder_autoremove(dev, cxld);
> >   	if (rc) {
> >   		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
> >   			cxld->hpa_range.start, cxld->hpa_range.end);
> > +		put_device(&cxld->dev);
> >   		return 0;
> 
> I think you will want to change this to 'return rc;' in order to reflect the
> error.

This is a good point also, and I can change it in v2, if there is an
agreement in this patch.
Dave Jiang July 10, 2023, 3:55 p.m. UTC | #7
On 7/7/23 18:07, Alison Schofield wrote:
> On Fri, Jul 07, 2023 at 05:33:02PM -0700, Dave Jiang wrote:
>>
>>
>> On 7/7/23 15:17, Alison Schofield wrote:
>>> On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote:
>>>> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add()
>>>> fails. Kfence drops this message, after the following:
>>>>
>>>>     BUG: KFENCE: use-after-free read in resource_string
>>>>
>>>> This is happening in cxl_parse_cfmws(), and here is a simplified flow
>>>> that is coming from Kfence.
>>>>
>>>> Use-after-free:
>>>> 	_dev_err
>>>> 	cxl_parse_cfmws
>>>> 	acpi_table_parse_entries_array
>>>> 	acpi_table_parse_cedt
>>>> 	cxl_acpi_probe
>>>>
>>>> Free:
>>>> 	cxl_decoder_release
>>>> 	device_release
>>>> 	kobject_put
>>>> 	cxl_parse_cfmws
>>>> 	acpi_table_parse_entries_array
>>>> 	acpi_table_parse_cedt
>>>> 	cxl_acpi_probe
>>>>
>>>> Alloc:
>>>> 	cxl_decoder_alloc
>>>> 	cxl_parse_cfmws
>>>> 	acpi_table_parse_entries_array
>>>> 	acpi_table_parse_cedt
>>>> 	cxl_acpi_probe
>>>> 	platform_probe
>>>>
>>>>   From my reading of the issue, the device struct being used by
>>>> dev_err() was removed in the put_device() before.
>>>
>>> Hi Breno,
>>>
>>> I'm not familiar w kfence, but I don't follow what it finds
>>> suspect here. Does kfence point to exact offensive lines of
>>> code, or ???
>>>
>>> The put_device() removed &cxld->dev and the dev_err() that
>>> this patch moves the put after, was using 'dev', which was
>>> assigned from ctx.dev. It is not the same as &cxld->dev. I
>>> wonder if Kfence thinks we can get to the next dev_dbg()
>>> statement and misuse &cxld->dev.
>>>
>>> More below...
>>>
>>>
>>>>
>>>> Put the device just after the message is printed.
>>>>
>>>> Signed-off-by: Breno Leitao <leitao@debian.org>
>>>> ---
>>>>    drivers/cxl/acpi.c | 7 +++----
>>>>    1 file changed, 3 insertions(+), 4 deletions(-)
>>>>
>>>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>>>> index 658e6b84a769..5179bf4211d8 100644
>>>> --- a/drivers/cxl/acpi.c
>>>> +++ b/drivers/cxl/acpi.c
>>>> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
>>>>    	}
>>>>    	rc = cxl_decoder_add(cxld, target_map);
>>>>    err_xormap:
>>>> -	if (rc)
>>>> -		put_device(&cxld->dev);
>>>> -	else
>>>> -		rc = cxl_decoder_autoremove(dev, cxld);
>>>>    	if (rc) {
>>>>    		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>>>>    			cxld->hpa_range.start, cxld->hpa_range.end);
>>>> +		put_device(&cxld->dev);
>>>>    		return 0;
>>>> +	} else {
>>>> +		rc = cxl_decoder_autoremove(dev, cxld);
>>>>    	}
>>>>    	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>>>>    		dev_name(&cxld->dev),
>>>> -- 
>>>> 2.34.1
>>>>
>>>
>>> (pulled in fresh code snippet to get the dev_dbg() in view.)
>>>
>>>> 	}
>>>> 	rc = cxl_decoder_add(cxld, target_map);
>>>> err_xormap:
>>>> 	if (rc)
>>>> 		put_device(&cxld->dev);
>>>>
>>>
>>> This puts &cxld->dev, not dev.
>>>
>>>>
>>>> 	else
>>>> 		rc = cxl_decoder_autoremove(dev, cxld);
>>>> 	if (rc) {
>>>> 		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>>>> 			cxld->hpa_range.start, cxld->hpa_range.end);
>>>> 		return 0;
>>>
>>> This return avoids getting to the next dev_dbg() statement after
>>> put_device(). We cannot get to the next dev_dbg() statement when
>>> rc is non zero, but it seems kfence thinks we can.
>>>
>>>> 	}
>>>> 	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>>>> 		dev_name(&cxld->dev),
>>>
>>> If we could get here, after the put_device(), that would be bad.
>>>
>>>> 		phys_to_target_node(cxld->hpa_range.start),
>>>> 		cxld->hpa_range.start, cxld->hpa_range.end);
>>>>
>>>> 	return 0;
>>>
>>
>> Seems like the code can be cleaned up this way. The double check of if (rc)
>> is kinda weird anyhow.
>>
>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
>> index 7e1765b09e04..0573b476d29c 100644
>> --- a/drivers/cxl/acpi.c
>> +++ b/drivers/cxl/acpi.c
>> @@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union acpi_subtable_headers
>> *header, void *arg,
>>          }
>>          rc = cxl_decoder_add(cxld, target_map);
>>   err_xormap:
>> -       if (rc)
>> -               put_device(&cxld->dev);
>> -       else
>> -               rc = cxl_decoder_autoremove(dev, cxld);
>>          if (rc) {
>>                  dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
>>                          cxld->hpa_range.start, cxld->hpa_range.end);
>> -               return 0;
>> +               put_device(&cxld->dev);
> 
> Why do you think the put_device() is making deref of cxld bad here?
> That cxld is still present.

The put_device() can trigger the ->release() of the cxld->dev and 
therefore cxld may no longer be present.

> 
> Maybe the unrelated bug here is that we are leaking the res.
> Rather that returning directly, take the kfree(res) path out.
> 
> 
>> +               return rc;
>>          }
>> +
>> +       rc = cxl_decoder_autoremove(dev, cxld);
>> +
> 
> Now this rc is not examined.

Yeah, should check here and return rc.

> 
> 
>>          dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
>>                  dev_name(&cxld->dev),
>>                  phys_to_target_node(cxld->hpa_range.start),
>>                  cxld->hpa_range.start, cxld->hpa_range.end);
>>
>> -       return 0;
>> +       return rc;
>>
>>   err_insert:
>>          kfree(res->name);
diff mbox series

Patch

diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c
index 658e6b84a769..5179bf4211d8 100644
--- a/drivers/cxl/acpi.c
+++ b/drivers/cxl/acpi.c
@@ -291,14 +291,13 @@  static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg,
 	}
 	rc = cxl_decoder_add(cxld, target_map);
 err_xormap:
-	if (rc)
-		put_device(&cxld->dev);
-	else
-		rc = cxl_decoder_autoremove(dev, cxld);
 	if (rc) {
 		dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n",
 			cxld->hpa_range.start, cxld->hpa_range.end);
+		put_device(&cxld->dev);
 		return 0;
+	} else {
+		rc = cxl_decoder_autoremove(dev, cxld);
 	}
 	dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n",
 		dev_name(&cxld->dev),