Message ID | 20230707161616.3554167-1-leitao@debian.org |
---|---|
State | New, archived |
Headers | show |
Series | cxl/acpi: Release device after dev_err | expand |
On 7/7/23 09:16, Breno Leitao wrote: > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() > fails. Kfence drops this message, after the following: > > BUG: KFENCE: use-after-free read in resource_string > > This is happening in cxl_parse_cfmws(), and here is a simplified flow > that is coming from Kfence. > > Use-after-free: > _dev_err > cxl_parse_cfmws > acpi_table_parse_entries_array > acpi_table_parse_cedt > cxl_acpi_probe > > Free: > cxl_decoder_release > device_release > kobject_put > cxl_parse_cfmws > acpi_table_parse_entries_array > acpi_table_parse_cedt > cxl_acpi_probe > > Alloc: > cxl_decoder_alloc > cxl_parse_cfmws > acpi_table_parse_entries_array > acpi_table_parse_cedt > cxl_acpi_probe > platform_probe > > From my reading of the issue, the device struct being used by > dev_err() was removed in the put_device() before. > > Put the device just after the message is printed. > > Signed-off-by: Breno Leitao <leitao@debian.org> > --- > drivers/cxl/acpi.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index 658e6b84a769..5179bf4211d8 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, > } > rc = cxl_decoder_add(cxld, target_map); > err_xormap: > - if (rc) > - put_device(&cxld->dev); > - else > - rc = cxl_decoder_autoremove(dev, cxld); > if (rc) { > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > cxld->hpa_range.start, cxld->hpa_range.end); > + put_device(&cxld->dev); > return 0; I think you will want to change this to 'return rc;' in order to reflect the error. > + } else { > + rc = cxl_decoder_autoremove(dev, cxld); > } > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", > dev_name(&cxld->dev), You also need to change the 'return 0' that follows to 'return rc', to catch the error from cxl_decoder_autoremove().
On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote: > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() > fails. Kfence drops this message, after the following: > > BUG: KFENCE: use-after-free read in resource_string > > This is happening in cxl_parse_cfmws(), and here is a simplified flow > that is coming from Kfence. > > Use-after-free: > _dev_err > cxl_parse_cfmws > acpi_table_parse_entries_array > acpi_table_parse_cedt > cxl_acpi_probe > > Free: > cxl_decoder_release > device_release > kobject_put > cxl_parse_cfmws > acpi_table_parse_entries_array > acpi_table_parse_cedt > cxl_acpi_probe > > Alloc: > cxl_decoder_alloc > cxl_parse_cfmws > acpi_table_parse_entries_array > acpi_table_parse_cedt > cxl_acpi_probe > platform_probe > > From my reading of the issue, the device struct being used by > dev_err() was removed in the put_device() before. Hi Breno, I'm not familiar w kfence, but I don't follow what it finds suspect here. Does kfence point to exact offensive lines of code, or ??? The put_device() removed &cxld->dev and the dev_err() that this patch moves the put after, was using 'dev', which was assigned from ctx.dev. It is not the same as &cxld->dev. I wonder if Kfence thinks we can get to the next dev_dbg() statement and misuse &cxld->dev. More below... > > Put the device just after the message is printed. > > Signed-off-by: Breno Leitao <leitao@debian.org> > --- > drivers/cxl/acpi.c | 7 +++---- > 1 file changed, 3 insertions(+), 4 deletions(-) > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index 658e6b84a769..5179bf4211d8 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, > } > rc = cxl_decoder_add(cxld, target_map); > err_xormap: > - if (rc) > - put_device(&cxld->dev); > - else > - rc = cxl_decoder_autoremove(dev, cxld); > if (rc) { > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > cxld->hpa_range.start, cxld->hpa_range.end); > + put_device(&cxld->dev); > return 0; > + } else { > + rc = cxl_decoder_autoremove(dev, cxld); > } > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", > dev_name(&cxld->dev), > -- > 2.34.1 > (pulled in fresh code snippet to get the dev_dbg() in view.) > } > rc = cxl_decoder_add(cxld, target_map); >err_xormap: > if (rc) > put_device(&cxld->dev); > This puts &cxld->dev, not dev. > > else > rc = cxl_decoder_autoremove(dev, cxld); > if (rc) { > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > cxld->hpa_range.start, cxld->hpa_range.end); > return 0; This return avoids getting to the next dev_dbg() statement after put_device(). We cannot get to the next dev_dbg() statement when rc is non zero, but it seems kfence thinks we can. > } > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", > dev_name(&cxld->dev), If we could get here, after the put_device(), that would be bad. > phys_to_target_node(cxld->hpa_range.start), > cxld->hpa_range.start, cxld->hpa_range.end); > > return 0;
On 7/7/23 15:17, Alison Schofield wrote: > On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote: >> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() >> fails. Kfence drops this message, after the following: >> >> BUG: KFENCE: use-after-free read in resource_string >> >> This is happening in cxl_parse_cfmws(), and here is a simplified flow >> that is coming from Kfence. >> >> Use-after-free: >> _dev_err >> cxl_parse_cfmws >> acpi_table_parse_entries_array >> acpi_table_parse_cedt >> cxl_acpi_probe >> >> Free: >> cxl_decoder_release >> device_release >> kobject_put >> cxl_parse_cfmws >> acpi_table_parse_entries_array >> acpi_table_parse_cedt >> cxl_acpi_probe >> >> Alloc: >> cxl_decoder_alloc >> cxl_parse_cfmws >> acpi_table_parse_entries_array >> acpi_table_parse_cedt >> cxl_acpi_probe >> platform_probe >> >> From my reading of the issue, the device struct being used by >> dev_err() was removed in the put_device() before. > > Hi Breno, > > I'm not familiar w kfence, but I don't follow what it finds > suspect here. Does kfence point to exact offensive lines of > code, or ??? > > The put_device() removed &cxld->dev and the dev_err() that > this patch moves the put after, was using 'dev', which was > assigned from ctx.dev. It is not the same as &cxld->dev. I > wonder if Kfence thinks we can get to the next dev_dbg() > statement and misuse &cxld->dev. > > More below... > > >> >> Put the device just after the message is printed. >> >> Signed-off-by: Breno Leitao <leitao@debian.org> >> --- >> drivers/cxl/acpi.c | 7 +++---- >> 1 file changed, 3 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c >> index 658e6b84a769..5179bf4211d8 100644 >> --- a/drivers/cxl/acpi.c >> +++ b/drivers/cxl/acpi.c >> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, >> } >> rc = cxl_decoder_add(cxld, target_map); >> err_xormap: >> - if (rc) >> - put_device(&cxld->dev); >> - else >> - rc = cxl_decoder_autoremove(dev, cxld); >> if (rc) { >> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", >> cxld->hpa_range.start, cxld->hpa_range.end); >> + put_device(&cxld->dev); >> return 0; >> + } else { >> + rc = cxl_decoder_autoremove(dev, cxld); >> } >> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", >> dev_name(&cxld->dev), >> -- >> 2.34.1 >> > > (pulled in fresh code snippet to get the dev_dbg() in view.) > >> } >> rc = cxl_decoder_add(cxld, target_map); >> err_xormap: >> if (rc) >> put_device(&cxld->dev); >> > > This puts &cxld->dev, not dev. > >> >> else >> rc = cxl_decoder_autoremove(dev, cxld); >> if (rc) { >> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", >> cxld->hpa_range.start, cxld->hpa_range.end); >> return 0; > > This return avoids getting to the next dev_dbg() statement after > put_device(). We cannot get to the next dev_dbg() statement when > rc is non zero, but it seems kfence thinks we can. > >> } >> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", >> dev_name(&cxld->dev), > > If we could get here, after the put_device(), that would be bad. > >> phys_to_target_node(cxld->hpa_range.start), >> cxld->hpa_range.start, cxld->hpa_range.end); >> >> return 0; > Seems like the code can be cleaned up this way. The double check of if (rc) is kinda weird anyhow. diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 7e1765b09e04..0573b476d29c 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, } rc = cxl_decoder_add(cxld, target_map); err_xormap: - if (rc) - put_device(&cxld->dev); - else - rc = cxl_decoder_autoremove(dev, cxld); if (rc) { dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", cxld->hpa_range.start, cxld->hpa_range.end); - return 0; + put_device(&cxld->dev); + return rc; } + + rc = cxl_decoder_autoremove(dev, cxld); + dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", dev_name(&cxld->dev), phys_to_target_node(cxld->hpa_range.start), cxld->hpa_range.start, cxld->hpa_range.end); - return 0; + return rc; err_insert: kfree(res->name);
On Fri, Jul 07, 2023 at 05:33:02PM -0700, Dave Jiang wrote: > > > On 7/7/23 15:17, Alison Schofield wrote: > > On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote: > > > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() > > > fails. Kfence drops this message, after the following: > > > > > > BUG: KFENCE: use-after-free read in resource_string > > > > > > This is happening in cxl_parse_cfmws(), and here is a simplified flow > > > that is coming from Kfence. > > > > > > Use-after-free: > > > _dev_err > > > cxl_parse_cfmws > > > acpi_table_parse_entries_array > > > acpi_table_parse_cedt > > > cxl_acpi_probe > > > > > > Free: > > > cxl_decoder_release > > > device_release > > > kobject_put > > > cxl_parse_cfmws > > > acpi_table_parse_entries_array > > > acpi_table_parse_cedt > > > cxl_acpi_probe > > > > > > Alloc: > > > cxl_decoder_alloc > > > cxl_parse_cfmws > > > acpi_table_parse_entries_array > > > acpi_table_parse_cedt > > > cxl_acpi_probe > > > platform_probe > > > > > > From my reading of the issue, the device struct being used by > > > dev_err() was removed in the put_device() before. > > > > Hi Breno, > > > > I'm not familiar w kfence, but I don't follow what it finds > > suspect here. Does kfence point to exact offensive lines of > > code, or ??? > > > > The put_device() removed &cxld->dev and the dev_err() that > > this patch moves the put after, was using 'dev', which was > > assigned from ctx.dev. It is not the same as &cxld->dev. I > > wonder if Kfence thinks we can get to the next dev_dbg() > > statement and misuse &cxld->dev. > > > > More below... > > > > > > > > > > Put the device just after the message is printed. > > > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > > --- > > > drivers/cxl/acpi.c | 7 +++---- > > > 1 file changed, 3 insertions(+), 4 deletions(-) > > > > > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > > > index 658e6b84a769..5179bf4211d8 100644 > > > --- a/drivers/cxl/acpi.c > > > +++ b/drivers/cxl/acpi.c > > > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, > > > } > > > rc = cxl_decoder_add(cxld, target_map); > > > err_xormap: > > > - if (rc) > > > - put_device(&cxld->dev); > > > - else > > > - rc = cxl_decoder_autoremove(dev, cxld); > > > if (rc) { > > > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > > > cxld->hpa_range.start, cxld->hpa_range.end); > > > + put_device(&cxld->dev); > > > return 0; > > > + } else { > > > + rc = cxl_decoder_autoremove(dev, cxld); > > > } > > > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", > > > dev_name(&cxld->dev), > > > -- > > > 2.34.1 > > > > > > > (pulled in fresh code snippet to get the dev_dbg() in view.) > > > > > } > > > rc = cxl_decoder_add(cxld, target_map); > > > err_xormap: > > > if (rc) > > > put_device(&cxld->dev); > > > > > > > This puts &cxld->dev, not dev. > > > > > > > > else > > > rc = cxl_decoder_autoremove(dev, cxld); > > > if (rc) { > > > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > > > cxld->hpa_range.start, cxld->hpa_range.end); > > > return 0; > > > > This return avoids getting to the next dev_dbg() statement after > > put_device(). We cannot get to the next dev_dbg() statement when > > rc is non zero, but it seems kfence thinks we can. > > > > > } > > > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", > > > dev_name(&cxld->dev), > > > > If we could get here, after the put_device(), that would be bad. > > > > > phys_to_target_node(cxld->hpa_range.start), > > > cxld->hpa_range.start, cxld->hpa_range.end); > > > > > > return 0; > > > > Seems like the code can be cleaned up this way. The double check of if (rc) > is kinda weird anyhow. > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > index 7e1765b09e04..0573b476d29c 100644 > --- a/drivers/cxl/acpi.c > +++ b/drivers/cxl/acpi.c > @@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union acpi_subtable_headers > *header, void *arg, > } > rc = cxl_decoder_add(cxld, target_map); > err_xormap: > - if (rc) > - put_device(&cxld->dev); > - else > - rc = cxl_decoder_autoremove(dev, cxld); > if (rc) { > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > cxld->hpa_range.start, cxld->hpa_range.end); > - return 0; > + put_device(&cxld->dev); Why do you think the put_device() is making deref of cxld bad here? That cxld is still present. Maybe the unrelated bug here is that we are leaking the res. Rather that returning directly, take the kfree(res) path out. > + return rc; > } > + > + rc = cxl_decoder_autoremove(dev, cxld); > + Now this rc is not examined. > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", > dev_name(&cxld->dev), > phys_to_target_node(cxld->hpa_range.start), > cxld->hpa_range.start, cxld->hpa_range.end); > > - return 0; > + return rc; > > err_insert: > kfree(res->name);
On Fri, Jul 07, 2023 at 03:17:58PM -0700, Alison Schofield wrote: > On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote: > > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() > > fails. Kfence drops this message, after the following: > > > > BUG: KFENCE: use-after-free read in resource_string > > > > This is happening in cxl_parse_cfmws(), and here is a simplified flow > > that is coming from Kfence. > > > > Use-after-free: > > _dev_err > > cxl_parse_cfmws > > acpi_table_parse_entries_array > > acpi_table_parse_cedt > > cxl_acpi_probe > > > > Free: > > cxl_decoder_release > > device_release > > kobject_put > > cxl_parse_cfmws > > acpi_table_parse_entries_array > > acpi_table_parse_cedt > > cxl_acpi_probe > > > > Alloc: > > cxl_decoder_alloc > > cxl_parse_cfmws > > acpi_table_parse_entries_array > > acpi_table_parse_cedt > > cxl_acpi_probe > > platform_probe > > > > From my reading of the issue, the device struct being used by > > dev_err() was removed in the put_device() before. > > Hi Breno, > > I'm not familiar w kfence, but I don't follow what it finds > suspect here. Does kfence point to exact offensive lines of > code, or ??? Unfortunately I do not lines that match anything public. Collecting them might be hard also, since kfence problems during failure mode are not easy to reproduce. > The put_device() removed &cxld->dev and the dev_err() that > this patch moves the put after, was using 'dev', which was > assigned from ctx.dev. It is not the same as &cxld->dev. I > wonder if Kfence thinks we can get to the next dev_dbg() > statement and misuse &cxld->dev. Any chance that "struct device_type->release"() might be touching ctx->dev? This is because "struct device_type->release"() is calling during the put operation, which seems to be the one de-allocating the resource. > More below... > > > > > > Put the device just after the message is printed. > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > --- > > drivers/cxl/acpi.c | 7 +++---- > > 1 file changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > > index 658e6b84a769..5179bf4211d8 100644 > > --- a/drivers/cxl/acpi.c > > +++ b/drivers/cxl/acpi.c > > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, > > } > > rc = cxl_decoder_add(cxld, target_map); > > err_xormap: > > - if (rc) > > - put_device(&cxld->dev); > > - else > > - rc = cxl_decoder_autoremove(dev, cxld); > > if (rc) { > > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > > cxld->hpa_range.start, cxld->hpa_range.end); > > + put_device(&cxld->dev); > > return 0; > > + } else { > > + rc = cxl_decoder_autoremove(dev, cxld); > > } > > dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", > > dev_name(&cxld->dev), > > -- > > 2.34.1 > > > > (pulled in fresh code snippet to get the dev_dbg() in view.) > > > } > > rc = cxl_decoder_add(cxld, target_map); > >err_xormap: > > if (rc) > > put_device(&cxld->dev); > > > > This puts &cxld->dev, not dev. > > > > > else > > rc = cxl_decoder_autoremove(dev, cxld); > > if (rc) { > > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > > cxld->hpa_range.start, cxld->hpa_range.end); > > return 0; > > This return avoids getting to the next dev_dbg() statement after > put_device(). We cannot get to the next dev_dbg() statement when > rc is non zero, but it seems kfence thinks we can. Oh, the problems is on dev_err() not on dev_dbg(). Basically on the return path. Here is what dmesg says: cxl root0: Failed to populate active decoder targets cxl_acpi ACPI0017:00: Failed to add decoder for [mem 0x4080000000-0x2baffffffff flags 0x200] cxl root0: Failed to populate active decoder targets cxl_acpi ACPI0017:00: Failed to add decoder for [mem 0x2bb00000000-0x5357fffffff flags 0x200] cxl root0: Failed to populate active decoder targets ================================================================== BUG: KFENCE: use-after-free read in resource_string+0x80/0x570\x0a Use-after-free read at 0x(____ptrval____) (in kfence-#111): resource_string+0x80/0x570 pointer+0x389/0x3c0 vsnprintf+0x214/0x670 pointer+0x1b2/0x3c0 vsnprintf+0x214/0x670 vprintk_store+0x102/0x450 vprintk_emit+0x6f/0x1b0 dev_vprintk_emit+0x117/0x163 dev_printk_emit+0x51/0x6b _dev_err+0x6e/0x88 cxl_parse_cfmws+0x2a0/0x2d acpi_table_parse_entries_array+0x1fc/0x330 acpi_table_parse_cedt+0x4f/0x70 cxl_acpi_probe+0xd6/0x150 platform_probe+0x2f/0x60 really_probe+0x1f5/0x340 driver_probe_device+0x1e/0x80 __driver_attach+0xfc/0x190 bus_for_each_dev+0x76/0xb0 bus_add_driver+0x1bb/0x230 driver_register+0x85/0x120 do_one_initcall+0xbe/0x240 kernel_init_freeable+0x1cc/0x2d2 kernel_init+0x16/0x1a0 ret_from_fork+0x1f/0x30
On Fri, Jul 07, 2023 at 09:50:09AM -0700, Dave Jiang wrote: > > > On 7/7/23 09:16, Breno Leitao wrote: > > Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() > > fails. Kfence drops this message, after the following: > > > > BUG: KFENCE: use-after-free read in resource_string > > > > This is happening in cxl_parse_cfmws(), and here is a simplified flow > > that is coming from Kfence. > > > > Use-after-free: > > _dev_err > > cxl_parse_cfmws > > acpi_table_parse_entries_array > > acpi_table_parse_cedt > > cxl_acpi_probe > > > > Free: > > cxl_decoder_release > > device_release > > kobject_put > > cxl_parse_cfmws > > acpi_table_parse_entries_array > > acpi_table_parse_cedt > > cxl_acpi_probe > > > > Alloc: > > cxl_decoder_alloc > > cxl_parse_cfmws > > acpi_table_parse_entries_array > > acpi_table_parse_cedt > > cxl_acpi_probe > > platform_probe > > > > From my reading of the issue, the device struct being used by > > dev_err() was removed in the put_device() before. > > > > Put the device just after the message is printed. > > > > Signed-off-by: Breno Leitao <leitao@debian.org> > > --- > > drivers/cxl/acpi.c | 7 +++---- > > 1 file changed, 3 insertions(+), 4 deletions(-) > > > > diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c > > index 658e6b84a769..5179bf4211d8 100644 > > --- a/drivers/cxl/acpi.c > > +++ b/drivers/cxl/acpi.c > > @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, > > } > > rc = cxl_decoder_add(cxld, target_map); > > err_xormap: > > - if (rc) > > - put_device(&cxld->dev); > > - else > > - rc = cxl_decoder_autoremove(dev, cxld); > > if (rc) { > > dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", > > cxld->hpa_range.start, cxld->hpa_range.end); > > + put_device(&cxld->dev); > > return 0; > > I think you will want to change this to 'return rc;' in order to reflect the > error. This is a good point also, and I can change it in v2, if there is an agreement in this patch.
On 7/7/23 18:07, Alison Schofield wrote: > On Fri, Jul 07, 2023 at 05:33:02PM -0700, Dave Jiang wrote: >> >> >> On 7/7/23 15:17, Alison Schofield wrote: >>> On Fri, Jul 07, 2023 at 09:16:16AM -0700, Breno Leitao wrote: >>>> Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() >>>> fails. Kfence drops this message, after the following: >>>> >>>> BUG: KFENCE: use-after-free read in resource_string >>>> >>>> This is happening in cxl_parse_cfmws(), and here is a simplified flow >>>> that is coming from Kfence. >>>> >>>> Use-after-free: >>>> _dev_err >>>> cxl_parse_cfmws >>>> acpi_table_parse_entries_array >>>> acpi_table_parse_cedt >>>> cxl_acpi_probe >>>> >>>> Free: >>>> cxl_decoder_release >>>> device_release >>>> kobject_put >>>> cxl_parse_cfmws >>>> acpi_table_parse_entries_array >>>> acpi_table_parse_cedt >>>> cxl_acpi_probe >>>> >>>> Alloc: >>>> cxl_decoder_alloc >>>> cxl_parse_cfmws >>>> acpi_table_parse_entries_array >>>> acpi_table_parse_cedt >>>> cxl_acpi_probe >>>> platform_probe >>>> >>>> From my reading of the issue, the device struct being used by >>>> dev_err() was removed in the put_device() before. >>> >>> Hi Breno, >>> >>> I'm not familiar w kfence, but I don't follow what it finds >>> suspect here. Does kfence point to exact offensive lines of >>> code, or ??? >>> >>> The put_device() removed &cxld->dev and the dev_err() that >>> this patch moves the put after, was using 'dev', which was >>> assigned from ctx.dev. It is not the same as &cxld->dev. I >>> wonder if Kfence thinks we can get to the next dev_dbg() >>> statement and misuse &cxld->dev. >>> >>> More below... >>> >>> >>>> >>>> Put the device just after the message is printed. >>>> >>>> Signed-off-by: Breno Leitao <leitao@debian.org> >>>> --- >>>> drivers/cxl/acpi.c | 7 +++---- >>>> 1 file changed, 3 insertions(+), 4 deletions(-) >>>> >>>> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c >>>> index 658e6b84a769..5179bf4211d8 100644 >>>> --- a/drivers/cxl/acpi.c >>>> +++ b/drivers/cxl/acpi.c >>>> @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, >>>> } >>>> rc = cxl_decoder_add(cxld, target_map); >>>> err_xormap: >>>> - if (rc) >>>> - put_device(&cxld->dev); >>>> - else >>>> - rc = cxl_decoder_autoremove(dev, cxld); >>>> if (rc) { >>>> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", >>>> cxld->hpa_range.start, cxld->hpa_range.end); >>>> + put_device(&cxld->dev); >>>> return 0; >>>> + } else { >>>> + rc = cxl_decoder_autoremove(dev, cxld); >>>> } >>>> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", >>>> dev_name(&cxld->dev), >>>> -- >>>> 2.34.1 >>>> >>> >>> (pulled in fresh code snippet to get the dev_dbg() in view.) >>> >>>> } >>>> rc = cxl_decoder_add(cxld, target_map); >>>> err_xormap: >>>> if (rc) >>>> put_device(&cxld->dev); >>>> >>> >>> This puts &cxld->dev, not dev. >>> >>>> >>>> else >>>> rc = cxl_decoder_autoremove(dev, cxld); >>>> if (rc) { >>>> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", >>>> cxld->hpa_range.start, cxld->hpa_range.end); >>>> return 0; >>> >>> This return avoids getting to the next dev_dbg() statement after >>> put_device(). We cannot get to the next dev_dbg() statement when >>> rc is non zero, but it seems kfence thinks we can. >>> >>>> } >>>> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", >>>> dev_name(&cxld->dev), >>> >>> If we could get here, after the put_device(), that would be bad. >>> >>>> phys_to_target_node(cxld->hpa_range.start), >>>> cxld->hpa_range.start, cxld->hpa_range.end); >>>> >>>> return 0; >>> >> >> Seems like the code can be cleaned up this way. The double check of if (rc) >> is kinda weird anyhow. >> >> diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c >> index 7e1765b09e04..0573b476d29c 100644 >> --- a/drivers/cxl/acpi.c >> +++ b/drivers/cxl/acpi.c >> @@ -291,21 +291,21 @@ static int cxl_parse_cfmws(union acpi_subtable_headers >> *header, void *arg, >> } >> rc = cxl_decoder_add(cxld, target_map); >> err_xormap: >> - if (rc) >> - put_device(&cxld->dev); >> - else >> - rc = cxl_decoder_autoremove(dev, cxld); >> if (rc) { >> dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", >> cxld->hpa_range.start, cxld->hpa_range.end); >> - return 0; >> + put_device(&cxld->dev); > > Why do you think the put_device() is making deref of cxld bad here? > That cxld is still present. The put_device() can trigger the ->release() of the cxld->dev and therefore cxld may no longer be present. > > Maybe the unrelated bug here is that we are leaking the res. > Rather that returning directly, take the kfree(res) path out. > > >> + return rc; >> } >> + >> + rc = cxl_decoder_autoremove(dev, cxld); >> + > > Now this rc is not examined. Yeah, should check here and return rc. > > >> dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", >> dev_name(&cxld->dev), >> phys_to_target_node(cxld->hpa_range.start), >> cxld->hpa_range.start, cxld->hpa_range.end); >> >> - return 0; >> + return rc; >> >> err_insert: >> kfree(res->name);
diff --git a/drivers/cxl/acpi.c b/drivers/cxl/acpi.c index 658e6b84a769..5179bf4211d8 100644 --- a/drivers/cxl/acpi.c +++ b/drivers/cxl/acpi.c @@ -291,14 +291,13 @@ static int cxl_parse_cfmws(union acpi_subtable_headers *header, void *arg, } rc = cxl_decoder_add(cxld, target_map); err_xormap: - if (rc) - put_device(&cxld->dev); - else - rc = cxl_decoder_autoremove(dev, cxld); if (rc) { dev_err(dev, "Failed to add decode range [%#llx - %#llx]\n", cxld->hpa_range.start, cxld->hpa_range.end); + put_device(&cxld->dev); return 0; + } else { + rc = cxl_decoder_autoremove(dev, cxld); } dev_dbg(dev, "add: %s node: %d range [%#llx - %#llx]\n", dev_name(&cxld->dev),
Kfence is detecting a user-after-free in the CXL, when cxl_decoder_add() fails. Kfence drops this message, after the following: BUG: KFENCE: use-after-free read in resource_string This is happening in cxl_parse_cfmws(), and here is a simplified flow that is coming from Kfence. Use-after-free: _dev_err cxl_parse_cfmws acpi_table_parse_entries_array acpi_table_parse_cedt cxl_acpi_probe Free: cxl_decoder_release device_release kobject_put cxl_parse_cfmws acpi_table_parse_entries_array acpi_table_parse_cedt cxl_acpi_probe Alloc: cxl_decoder_alloc cxl_parse_cfmws acpi_table_parse_entries_array acpi_table_parse_cedt cxl_acpi_probe platform_probe From my reading of the issue, the device struct being used by dev_err() was removed in the put_device() before. Put the device just after the message is printed. Signed-off-by: Breno Leitao <leitao@debian.org> --- drivers/cxl/acpi.c | 7 +++---- 1 file changed, 3 insertions(+), 4 deletions(-)