[0/2] driver core: platform: avoid use-after-free on device name

Message ID 20250218-pdev-uaf-v1-0-5ea1a0d3aba0@bootlin.com (mailing list archive)

Message

Théo Lebrun Feb. 18, 2025, 11 a.m. UTC
The use-after-free bug appears when:
 - A platform device is created from OF, by of_device_add();
 - The same device's name is changed afterwards using dev_set_name(),
   by its probe for example.

Out of the 37 drivers that deal with platform devices and do a
dev_set_name() call, only one might be affected. That driver is
loongson-i2s-plat [0]. All other dev_set_name() calls are on children
devices created on the spot. The issue was found on downstream kernels
and we don't have what it takes to test loongson-i2s-plat.

Note: loongson-i2s-plat maintainers are CCed.

   ⟩ # Finding potential trouble-makers:
   ⟩ git grep -l 'struct platform_device' | xargs grep -l dev_set_name

The solution proposed is to add a flag to platform_device that tells if
it is responsible for freeing its name. We can then duplicate the
device name inside of_device_add() instead of copying the pointer.
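
The ownership idea can be illustrated outside the kernel. The following is a minimal userspace sketch, not the patch itself: every `fake_*` name, the `owns_name` field and the `dup_string()` helper are hypothetical stand-ins for the kernel structures and helpers discussed above.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-ins; these are not the kernel structures. */
struct fake_device {
	char *kobj_name;        /* models dev->kobj.name, owned by the device */
};

struct fake_pdev {
	struct fake_device dev;
	const char *name;       /* models pdev->name */
	bool owns_name;         /* hypothetical flag: pdev must free its name */
};

/* Small helper so the sketch only relies on standard C. */
static char *dup_string(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Models dev_set_name(): install the new name, then free the old one. */
static int fake_dev_set_name(struct fake_device *dev, const char *name)
{
	char *copy = dup_string(name);

	if (!copy)
		return -1;
	free(dev->kobj_name);
	dev->kobj_name = copy;
	return 0;
}

/* Models the proposed of_device_add(): duplicate instead of aliasing. */
static int fake_of_device_add(struct fake_pdev *pdev)
{
	char *copy = dup_string(pdev->dev.kobj_name);

	if (!copy)
		return -1;
	/* The buggy variant would store pdev->dev.kobj_name here directly. */
	pdev->name = copy;
	pdev->owns_name = true;
	return 0;
}

/* Models the release path: free the name only if this pdev owns it. */
static void fake_pdev_release(struct fake_pdev *pdev)
{
	if (pdev->owns_name)
		free((char *)pdev->name);
	free(pdev->dev.kobj_name);
	pdev->name = NULL;
	pdev->dev.kobj_name = NULL;
	pdev->owns_name = false;
}
```

With the duplicate in place, a later `fake_dev_set_name()` frees only the device's own copy, so `pdev->name` keeps pointing at valid memory.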

What is done elsewhere?
 - Platform bus code does a copy of the argument name that is stored
   alongside the struct platform_device; see platform_device_alloc()[1].
 - Other busses duplicate the device name; either through a dynamic
   allocation [2] or through an array embedded inside devices [3].
 - Some busses don't have a separate name; when they want a name they
   take it from the device [4].

[0]: https://elixir.bootlin.com/linux/v6.13.2/source/sound/soc/loongson/loongson_i2s_plat.c#L155
[1]: https://elixir.bootlin.com/linux/v6.13.2/source/drivers/base/platform.c#L581
[2]: https://elixir.bootlin.com/linux/v6.13.2/source/drivers/gpu/drm/drm_drv.c#L679
[3]: https://elixir.bootlin.com/linux/v6.13.2/source/include/linux/i2c.h#L343
[4]: https://elixir.bootlin.com/linux/v6.13.2/source/include/linux/pci.h#L2150

This can be reproduced using Buildroot's qemu_aarch64_virt_defconfig
with CONFIG_KASAN=y and a dev_set_name() inside the probe of:
drivers/pci/controller/pci-host-common.c

The splat below appears at boot. It happens whenever something tries to
access pdev->name; one big consumer of this field is platform_match(),
which falls back to name matching.

   ==================================================================
   BUG: KASAN: slab-use-after-free in strcmp+0x2c/0x78
   Read of size 1 at addr ffffff80c0300160 by task swapper/0/1

   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 6.6.32 #1
   Hardware name: linux,dummy-virt (DT)
   Call trace:
    dump_backtrace+0x90/0xe8
    show_stack+0x18/0x24
    dump_stack_lvl+0x48/0x60
    print_report+0xf8/0x5d8
    kasan_report+0x90/0xcc
    __asan_load1+0x60/0x6c
    strcmp+0x2c/0x78
    platform_match+0xd0/0x140
    __driver_attach+0x44/0x240
    bus_for_each_dev+0xe4/0x160
    driver_attach+0x34/0x44
    bus_add_driver+0x134/0x270
    driver_register+0xa4/0x1e4
    __platform_driver_register+0x44/0x54
    ged_driver_init+0x1c/0x28
    do_one_initcall+0xdc/0x260
    kernel_init_freeable+0x314/0x448
    kernel_init+0x2c/0x1e0
    ret_from_fork+0x10/0x20

   Allocated by task 1:
    kasan_save_stack+0x3c/0x64
    kasan_set_track+0x2c/0x40
    kasan_save_alloc_info+0x24/0x34
    __kasan_kmalloc+0xb8/0xbc
    __kmalloc_node_track_caller+0x64/0xa4
    kvasprintf+0xcc/0x16c
    kvasprintf_const+0xe8/0x180
    kobject_set_name_vargs+0x54/0xd4
    dev_set_name+0xa8/0xe4
    of_device_make_bus_id+0x298/0x2b0
    of_device_alloc+0x1ec/0x204
    of_platform_device_create_pdata+0x60/0x168
    of_platform_bus_create+0x20c/0x4a0
    of_platform_populate+0x50/0x10c
    of_platform_default_populate_init+0xe0/0x100
    do_one_initcall+0xdc/0x260
    kernel_init_freeable+0x314/0x448
    kernel_init+0x2c/0x1e0
    ret_from_fork+0x10/0x20

   Freed by task 1:
    kasan_save_stack+0x3c/0x64
    kasan_set_track+0x2c/0x40
    kasan_save_free_info+0x38/0x60
    __kasan_slab_free+0xe4/0x150
    __kmem_cache_free+0x134/0x26c
    kfree+0x54/0x6c
    kfree_const+0x34/0x40
    kobject_set_name_vargs+0xa8/0xd4
    dev_set_name+0xa8/0xe4
    pci_host_common_probe+0x9c/0x294
    platform_probe+0x90/0x100
    really_probe+0x100/0x3cc
    __driver_probe_device+0xb8/0x18c
    driver_probe_device+0x108/0x1d8
    __driver_attach+0xc8/0x240
    bus_for_each_dev+0xe4/0x160
    driver_attach+0x34/0x44
    bus_add_driver+0x134/0x270
    driver_register+0xa4/0x1e4
    __platform_driver_register+0x44/0x54
    gen_pci_driver_init+0x1c/0x28
    do_one_initcall+0xdc/0x260
    kernel_init_freeable+0x314/0x448
    kernel_init+0x2c/0x1e0
    ret_from_fork+0x10/0x20

   The buggy address belongs to the object at ffffff80c0300160
    which belongs to the cache kmalloc-16 of size 16
   The buggy address is located 0 bytes inside of
    freed 16-byte region [ffffff80c0300160, ffffff80c0300170)

   The buggy address belongs to the physical page:
   page:0000000099fe29a0 refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x100300
   flags: 0x8000000000000800(slab|zone=2)
   page_type: 0xffffffff()
   raw: 8000000000000800 ffffff80c00013c0 dead000000000122 0000000000000000
   raw: 0000000000000000 0000000080800080 00000001ffffffff 0000000000000000
   page dumped because: kasan: bad access detected

   Memory state around the buggy address:
    ffffff80c0300000: fa fb fc fc fa fb fc fc 00 07 fc fc 00 07 fc fc
    ffffff80c0300080: 00 07 fc fc 00 02 fc fc 00 02 fc fc 00 02 fc fc
   >ffffff80c0300100: 00 06 fc fc 00 06 fc fc 00 06 fc fc fa fb fc fc
                                                          ^
    ffffff80c0300180: 00 00 fc fc 00 00 fc fc 00 06 fc fc 00 06 fc fc
    ffffff80c0300200: 00 06 fc fc 00 06 fc fc 00 06 fc fc 00 06 fc fc
   ==================================================================
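
The strcmp() frame under platform_match() in the trace above comes from name matching. As a rough illustration only, here is a trimmed-down userspace model of that fallback; the `fake_*` types are hypothetical, and the OF and ACPI matching steps that the real function tries first are left out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-ins for the kernel structures. */
struct fake_id {
	const char *name;                /* NULL terminates the table */
};

struct fake_pdrv {
	const char *name;
	const struct fake_id *id_table;  /* may be NULL */
};

struct fake_pdev {
	const char *name;                /* models pdev->name */
};

/* Returns true when the device should bind to the driver. */
static bool fake_platform_match(const struct fake_pdev *pdev,
				const struct fake_pdrv *drv)
{
	/* The real function attempts OF and ACPI matching first; omitted. */

	/* If the driver has an ID table, only the table decides. */
	if (drv->id_table) {
		const struct fake_id *id;

		for (id = drv->id_table; id->name; id++)
			if (strcmp(pdev->name, id->name) == 0)
				return true;
		return false;
	}

	/* Fallback: compare the device name with the driver name. */
	return strcmp(pdev->name, drv->name) == 0;
}
```

Every path in this model dereferences `pdev->name`, which is why a dangling pointer there is hit as soon as another driver registers.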

Signed-off-by: Théo Lebrun <theo.lebrun@bootlin.com>
---
Théo Lebrun (2):
      driver core: platform: turn pdev->id_auto into pdev->flags
      driver core: platform: avoid use-after-free on pdev->name

 drivers/base/platform.c         |  8 +++++---
 drivers/of/platform.c           | 12 +++++++++++-
 include/linux/platform_device.h |  4 +++-
 3 files changed, 19 insertions(+), 5 deletions(-)
---
base-commit: 0ad2507d5d93f39619fc42372c347d6006b64319
change-id: 20250217-pdev-uaf-1a779a98d81b

Best regards,

Comments

Greg KH Feb. 20, 2025, 12:41 p.m. UTC | #1
On Tue, Feb 18, 2025 at 12:00:11PM +0100, Théo Lebrun wrote:
> The use-after-free bug appears when:
>  - A platform device is created from OF, by of_device_add();
>  - The same device's name is changed afterwards using dev_set_name(),
>    by its probe for example.
> 
> Out of the 37 drivers that deal with platform devices and do a
> dev_set_name() call, only one might be affected. That driver is
> loongson-i2s-plat [0]. All other dev_set_name() calls are on children
> devices created on the spot. The issue was found on downstream kernels
> and we don't have what it takes to test loongson-i2s-plat.
> 
> Note: loongson-i2s-plat maintainers are CCed.
> 
>    ⟩ # Finding potential trouble-makers:
>    ⟩ git grep -l 'struct platform_device' | xargs grep -l dev_set_name
> 
> The solution proposed is to add a flag to platform_device that tells if
> it is responsible for freeing its name. We can then duplicate the
> device name inside of_device_add() instead of copying the pointer.

Ick.

> What is done elsewhere?
>  - Platform bus code does a copy of the argument name that is stored
>    alongside the struct platform_device; see platform_device_alloc()[1].
>  - Other busses duplicate the device name; either through a dynamic
>    allocation [2] or through an array embedded inside devices [3].
>  - Some busses don't have a separate name; when they want a name they
>    take it from the device [4].

Really ick.

Let's do the right thing here and just get rid of the name pointer
entirely in struct platform_device please.  Isn't that the correct
thing that way the driver core logic will work properly for all of this.

thanks,

greg k-h

Théo Lebrun Feb. 20, 2025, 1:31 p.m. UTC | #2
Hello Greg,

On Thu Feb 20, 2025 at 1:41 PM CET, Greg Kroah-Hartman wrote:
> On Tue, Feb 18, 2025 at 12:00:11PM +0100, Théo Lebrun wrote:
>> The use-after-free bug appears when:
>>  - A platform device is created from OF, by of_device_add();
>>  - The same device's name is changed afterwards using dev_set_name(),
>>    by its probe for example.
>> 
>> Out of the 37 drivers that deal with platform devices and do a
>> dev_set_name() call, only one might be affected. That driver is
>> loongson-i2s-plat [0]. All other dev_set_name() calls are on children
>> devices created on the spot. The issue was found on downstream kernels
>> and we don't have what it takes to test loongson-i2s-plat.
>> 
>> Note: loongson-i2s-plat maintainers are CCed.
>> 
>>    ⟩ # Finding potential trouble-makers:
>>    ⟩ git grep -l 'struct platform_device' | xargs grep -l dev_set_name
>> 
>> The solution proposed is to add a flag to platform_device that tells if
>> it is responsible for freeing its name. We can then duplicate the
>> device name inside of_device_add() instead of copying the pointer.
>
> Ick.
>
>> What is done elsewhere?
>>  - Platform bus code does a copy of the argument name that is stored
>>    alongside the struct platform_device; see platform_device_alloc()[1].
>>  - Other busses duplicate the device name; either through a dynamic
>>    allocation [2] or through an array embedded inside devices [3].
>>  - Some busses don't have a separate name; when they want a name they
>>    take it from the device [4].
>
> Really ick.
>
> Let's do the right thing here and just get rid of the name pointer
> entirely in struct platform_device please.  Isn't that the correct
> thing that way the driver core logic will work properly for all of this.

I would agree, if it wasn't for this consideration that is found in the
commit message [0]:

> It is important to duplicate! pdev->name must not change to make sure
> the platform_match() return value is stable over time. If we updated
> pdev->name alongside dev->name, once a device probes and changes its
> name then the platform_match() return value would change.

I'd be fine sending a V2 that removes the field *and the fallback* [1],
but I don't have the full scope in mind to know what would become broken.

[0]: https://lore.kernel.org/lkml/20250218-pdev-uaf-v1-2-5ea1a0d3aba0@bootlin.com/
[1]: https://elixir.bootlin.com/linux/v6.13.3/source/drivers/base/platform.c#L1357

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Greg KH Feb. 20, 2025, 2:06 p.m. UTC | #3
On Thu, Feb 20, 2025 at 02:31:29PM +0100, Théo Lebrun wrote:
> Hello Greg,
> 
> On Thu Feb 20, 2025 at 1:41 PM CET, Greg Kroah-Hartman wrote:
> > On Tue, Feb 18, 2025 at 12:00:11PM +0100, Théo Lebrun wrote:
> >> The use-after-free bug appears when:
> >>  - A platform device is created from OF, by of_device_add();
> >>  - The same device's name is changed afterwards using dev_set_name(),
> >>    by its probe for example.
> >> 
> >> Out of the 37 drivers that deal with platform devices and do a
> >> dev_set_name() call, only one might be affected. That driver is
> >> loongson-i2s-plat [0]. All other dev_set_name() calls are on children
> >> devices created on the spot. The issue was found on downstream kernels
> >> and we don't have what it takes to test loongson-i2s-plat.
> >> 
> >> Note: loongson-i2s-plat maintainers are CCed.
> >> 
> >>    ⟩ # Finding potential trouble-makers:
> >>    ⟩ git grep -l 'struct platform_device' | xargs grep -l dev_set_name
> >> 
> >> The solution proposed is to add a flag to platform_device that tells if
> >> it is responsible for freeing its name. We can then duplicate the
> >> device name inside of_device_add() instead of copying the pointer.
> >
> > Ick.
> >
> >> What is done elsewhere?
> >>  - Platform bus code does a copy of the argument name that is stored
> >>    alongside the struct platform_device; see platform_device_alloc()[1].
> >>  - Other busses duplicate the device name; either through a dynamic
> >>    allocation [2] or through an array embedded inside devices [3].
> >>  - Some busses don't have a separate name; when they want a name they
> >>    take it from the device [4].
> >
> > Really ick.
> >
> > Let's do the right thing here and just get rid of the name pointer
> > entirely in struct platform_device please.  Isn't that the correct
> > thing that way the driver core logic will work properly for all of this.
> 
> I would agree, if it wasn't for this consideration that is found in the
> commit message [0]:

What, that the of code is broken?  Then it should be fixed, why does it
need a pointer to a name at all anyway?  It shouldn't be needed there
either.

> > It is important to duplicate! pdev->name must not change to make sure
> > the platform_match() return value is stable over time. If we updated
> > pdev->name alongside dev->name, once a device probes and changes its
> > name then the platform_match() return value would change.
> 
> I'd be fine sending a V2 that removes the field *and the fallback* [1],
> but I don't have the full scope in mind to know what would become broken.
> 
> [0]: https://lore.kernel.org/lkml/20250218-pdev-uaf-v1-2-5ea1a0d3aba0@bootlin.com/
> [1]: https://elixir.bootlin.com/linux/v6.13.3/source/drivers/base/platform.c#L1357

The fallback will not need to be removed, properly point to the name of
the device and it should work correctly.

thanks,

greg k-h

Théo Lebrun Feb. 20, 2025, 3:46 p.m. UTC | #4
On Thu Feb 20, 2025 at 3:06 PM CET, Greg Kroah-Hartman wrote:
> On Thu, Feb 20, 2025 at 02:31:29PM +0100, Théo Lebrun wrote:
>> On Thu Feb 20, 2025 at 1:41 PM CET, Greg Kroah-Hartman wrote:
>> > On Tue, Feb 18, 2025 at 12:00:11PM +0100, Théo Lebrun wrote:
>> >> The solution proposed is to add a flag to platform_device that tells if
>> >> it is responsible for freeing its name. We can then duplicate the
>> >> device name inside of_device_add() instead of copying the pointer.
>> >
>> > Ick.
>> >
>> >> What is done elsewhere?
>> >>  - Platform bus code does a copy of the argument name that is stored
>> >>    alongside the struct platform_device; see platform_device_alloc()[1].
>> >>  - Other busses duplicate the device name; either through a dynamic
>> >>    allocation [2] or through an array embedded inside devices [3].
>> >>  - Some busses don't have a separate name; when they want a name they
>> >>    take it from the device [4].
>> >
>> > Really ick.
>> >
>> > Let's do the right thing here and just get rid of the name pointer
>> > entirely in struct platform_device please.  Isn't that the correct
>> > thing that way the driver core logic will work properly for all of this.
>> 
>> I would agree, if it wasn't for this consideration that is found in the
>> commit message [0]:
>
> What, that the of code is broken?  Then it should be fixed, why does it
> need a pointer to a name at all anyway?  It shouldn't be needed there
> either.

I cannot guess why it originally has a separate pdev->name field.
All I can tell you is a good reason to have one, as quoted below.

>> > It is important to duplicate! pdev->name must not change to make sure
>> > the platform_match() return value is stable over time. If we updated
>> > pdev->name alongside dev->name, once a device probes and changes its
>> > name then the platform_match() return value would change.
>> 
>> I'd be fine sending a V2 that removes the field *and the fallback* [1],
>> but I don't have the full scope in mind to know what would become broken.
>> 
>> [0]: https://lore.kernel.org/lkml/20250218-pdev-uaf-v1-2-5ea1a0d3aba0@bootlin.com/
>> [1]: https://elixir.bootlin.com/linux/v6.13.3/source/drivers/base/platform.c#L1357
>
> The fallback will not need to be removed, properly point to the name of
> the device and it should work correctly.

No, it will not work correctly, as the above quote indicates.

Let's assume we remove the field, this situation would be broken:
 - OF allocates platform devices and gives them names.
 - A device matches with a driver, which gets probed.
 - During the probe, driver does a dev_set_name().
 - Afterwards, the upcoming platform_match() calls against other drivers
   are made with another device name.

We should be safe as there are guardrails against probing a device twice,
see __driver_probe_device() that checks dev->driver is NULL. But it
isn't a situation we should be in.

Another broken situation:
 - OF allocates platform devices and gives them names.
 - A device matches with a driver, which gets probed based on its name.
 - During the probe, driver does a dev_set_name().
 - Module is removed.
 - Module is re-added, the (driver, device) pair doesn't end up matching
   again because the device name changed.

I might be missing other edge-cases.

Conclusion: we need a constant name for platform devices as we want the
return value of platform_match() to stay stable across time.
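
The module remove and re-add scenario can be modelled in a few lines. This is a userspace sketch under stated assumptions: the `fake_pdev` layout, the fixed-size buffers and both match helpers are hypothetical, and real matching involves more than a string comparison.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define FAKE_NAME_LEN 32

/* Hypothetical model: one mutable name and one copy fixed at creation. */
struct fake_pdev {
	char dev_name[FAKE_NAME_LEN];     /* models dev_name(); may be changed */
	char stable_name[FAKE_NAME_LEN];  /* models a constant pdev->name */
};

/* Models device creation: both names start out identical. */
static void fake_pdev_create(struct fake_pdev *pdev, const char *name)
{
	snprintf(pdev->dev_name, sizeof(pdev->dev_name), "%s", name);
	snprintf(pdev->stable_name, sizeof(pdev->stable_name), "%s", name);
}

/* Models a driver calling dev_set_name() during probe. */
static void fake_rename(struct fake_pdev *pdev, const char *name)
{
	snprintf(pdev->dev_name, sizeof(pdev->dev_name), "%s", name);
}

/* Name matching if the separate field were removed. */
static bool match_on_dev_name(const struct fake_pdev *pdev,
			      const char *drv_name)
{
	return strcmp(pdev->dev_name, drv_name) == 0;
}

/* Name matching against a name that never changes. */
static bool match_on_stable_name(const struct fake_pdev *pdev,
				 const char *drv_name)
{
	return strcmp(pdev->stable_name, drv_name) == 0;
}
```

After `fake_rename()`, `match_on_dev_name()` stops matching the original driver name while `match_on_stable_name()` still does, which is the instability described in the two scenarios.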

Regards,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Greg KH Feb. 20, 2025, 4:19 p.m. UTC | #5
On Thu, Feb 20, 2025 at 04:46:59PM +0100, Théo Lebrun wrote:
> On Thu Feb 20, 2025 at 3:06 PM CET, Greg Kroah-Hartman wrote:
> > On Thu, Feb 20, 2025 at 02:31:29PM +0100, Théo Lebrun wrote:
> >> On Thu Feb 20, 2025 at 1:41 PM CET, Greg Kroah-Hartman wrote:
> >> > On Tue, Feb 18, 2025 at 12:00:11PM +0100, Théo Lebrun wrote:
> >> >> The solution proposed is to add a flag to platform_device that tells if
> >> >> it is responsible for freeing its name. We can then duplicate the
> >> >> device name inside of_device_add() instead of copying the pointer.
> >> >
> >> > Ick.
> >> >
> >> >> What is done elsewhere?
> >> >>  - Platform bus code does a copy of the argument name that is stored
> >> >>    alongside the struct platform_device; see platform_device_alloc()[1].
> >> >>  - Other busses duplicate the device name; either through a dynamic
> >> >>    allocation [2] or through an array embedded inside devices [3].
> >> >>  - Some busses don't have a separate name; when they want a name they
> >> >>    take it from the device [4].
> >> >
> >> > Really ick.
> >> >
> >> > Let's do the right thing here and just get rid of the name pointer
> >> > entirely in struct platform_device please.  Isn't that the correct
> >> > thing that way the driver core logic will work properly for all of this.
> >> 
> >> I would agree, if it wasn't for this consideration that is found in the
> >> commit message [0]:
> >
> > What, that the of code is broken?  Then it should be fixed, why does it
> > need a pointer to a name at all anyway?  It shouldn't be needed there
> > either.
> 
> I cannot guess why it originally has a separate pdev->name field.

Many people got this wrong when we designed busses, it's not unique.
But we should learn from our mistakes where we can :)

> >> > It is important to duplicate! pdev->name must not change to make sure
> >> > the platform_match() return value is stable over time. If we updated
> >> > pdev->name alongside dev->name, once a device probes and changes its
> >> > name then the platform_match() return value would change.
> >> 
> >> I'd be fine sending a V2 that removes the field *and the fallback* [1],
> >> but I don't have the full scope in mind to know what would become broken.
> >> 
> >> [0]: https://lore.kernel.org/lkml/20250218-pdev-uaf-v1-2-5ea1a0d3aba0@bootlin.com/
> >> [1]: https://elixir.bootlin.com/linux/v6.13.3/source/drivers/base/platform.c#L1357
> >
> > The fallback will not need to be removed, properly point to the name of
> > the device and it should work correctly.
> 
> No, it will not work correctly, as the above quote indicates.

I don't know which quote, sorry.

> Let's assume we remove the field, this situation would be broken:
>  - OF allocates platform devices and gives them names.
>  - A device matches with a driver, which gets probed.
>  - During the probe, driver does a dev_set_name().
>  - Afterwards, the upcoming platform_match() calls against other drivers
>    are made with another device name.
> 
> We should be safe as there are guardrails against probing a device twice,
> see __driver_probe_device() that checks dev->driver is NULL. But it
> isn't a situation we should be in.

The fragility of attempting to match a driver to a device purely by a
name is a very weak part of using platform devices.

Why would a driver change the device name?  It's been given to the
driver to "bind to" not to change its name.  That shouldn't be ok, fix
those drivers.

> Another broken situation:
>  - OF allocates platform devices and gives them names.
>  - A device matches with a driver, which gets probed based on its name.
>  - During the probe, driver does a dev_set_name().

Again, don't do that.  That's the breaking part.

>  - Module is removed.
>  - Module is re-added, the (driver, device) pair doesn't end up matching
>    again because the device name changed.

Sure, that was a bug in the driver.  It shouldn't be changing the name,
the name is set/owned by the bus, not the driver.

Do we have examples today of platform drivers that like to rename
devices?  I did a quick search and couldn't find any in-tree, but I
might have missed some.

Again, the bus controls the name when the device is created, changing it
after the fact is generally not a good idea.

> I might be missing other edge-cases.
> 
> Conclusion: we need a constant name for platform devices as we want the
> return value of platform_match() to stay stable across time.

No, let's just not rename devices in platform drivers.

Or if this really is an issue, let's fix OF to not use the platform bus
and have its own bus for stuff like this.

thanks,

greg k-h

Théo Lebrun Feb. 20, 2025, 6:26 p.m. UTC | #6
On Thu Feb 20, 2025 at 5:19 PM CET, Greg Kroah-Hartman wrote:
> On Thu, Feb 20, 2025 at 04:46:59PM +0100, Théo Lebrun wrote:
>> On Thu Feb 20, 2025 at 3:06 PM CET, Greg Kroah-Hartman wrote:
>> > On Thu, Feb 20, 2025 at 02:31:29PM +0100, Théo Lebrun wrote:
>> >> On Thu Feb 20, 2025 at 1:41 PM CET, Greg Kroah-Hartman wrote:
>> >> > On Tue, Feb 18, 2025 at 12:00:11PM +0100, Théo Lebrun wrote:
>> >> >> The solution proposed is to add a flag to platform_device that tells if
>> >> >> it is responsible for freeing its name. We can then duplicate the
>> >> >> device name inside of_device_add() instead of copying the pointer.
>> >> >
>> >> > Ick.
>> >> >
>> >> >> What is done elsewhere?
>> >> >>  - Platform bus code does a copy of the argument name that is stored
>> >> >>    alongside the struct platform_device; see platform_device_alloc()[1].
>> >> >>  - Other busses duplicate the device name; either through a dynamic
>> >> >>    allocation [2] or through an array embedded inside devices [3].
>> >> >>  - Some busses don't have a separate name; when they want a name they
>> >> >>    take it from the device [4].
>> >> >
>> >> > Really ick.
>> >> >
>> >> > Let's do the right thing here and just get rid of the name pointer
>> >> > entirely in struct platform_device please.  Isn't that the correct
>> >> > thing that way the driver core logic will work properly for all of this.
>> >> 
>> >> I would agree, if it wasn't for this consideration that is found in the
>> >> commit message [0]:
>> >
>> > What, that the of code is broken?  Then it should be fixed, why does it
>> > need a pointer to a name at all anyway?  It shouldn't be needed there
>> > either.
>> 
>> I cannot guess why it originally has a separate pdev->name field.
>
> Many people got this wrong when we designed busses, it's not unique.
> But we should learn from our mistakes where we can :)
>
>> >> > It is important to duplicate! pdev->name must not change to make sure
>> >> > the platform_match() return value is stable over time. If we updated
>> >> > pdev->name alongside dev->name, once a device probes and changes its
>> >> > name then the platform_match() return value would change.
>> >> 
>> >> I'd be fine sending a V2 that removes the field *and the fallback* [1],
>> >> but I don't have the full scope in mind to know what would become broken.
>> >> 
>> >> [0]: https://lore.kernel.org/lkml/20250218-pdev-uaf-v1-2-5ea1a0d3aba0@bootlin.com/
>> >> [1]: https://elixir.bootlin.com/linux/v6.13.3/source/drivers/base/platform.c#L1357
>> >
>> > The fallback will not need to be removed, properly point to the name of
>> > the device and it should work correctly.
>> 
>> No, it will not work correctly, as the above quote indicates.
>
> I don't know which quote, sorry.
>
>> Let's assume we remove the field, this situation would be broken:
>>  - OF allocates platform devices and gives them names.
>>  - A device matches with a driver, which gets probed.
>>  - During the probe, driver does a dev_set_name().
>>  - Afterwards, the upcoming platform_match() calls against other drivers
>>    are made with another device name.
>> 
>> We should be safe as there are guardrails against probing a device twice,
>> see __driver_probe_device() that checks dev->driver is NULL. But it
>> isn't a situation we should be in.
>
> The fragility of attempting to match a driver to a device purely by a
> name is a very weak part of using platform devices.

I never said the opposite, and I agree.
However the mechanism exists and I was focused on not breaking it.

> Why would a driver change the device name?  It's been given to the
> driver to "bind to" not to change its name.  That shouldn't be ok, fix
> those drivers.

I do get the argument that devices shouldn't change device names. I'll
play devil's advocate and give at least one argument FOR allowing
changing names: prettier names, especially as device names leak into
userspace through pseudo filesystems.

If we agree that device names shouldn't be changed once a device is
matched with a driver, then (1) we can remove the pdev->name field and
(2) `dev_set_name()` should warn when used too late.

Turn the implicit explicit.


diff --git a/drivers/base/core.c b/drivers/base/core.c
index 5a1f05198114..3532b068e32d 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -3462,10 +3462,13 @@ static void device_remove_class_symlinks(struct device *dev)
 int dev_set_name(struct device *dev, const char *fmt, ...)
 {
        va_list vargs;
        int err;

+       if (dev_WARN_ONCE(dev, dev->driver, "device name is static once matched"))
+               return -EPERM;
+
        va_start(vargs, fmt);
        err = kobject_set_name_vargs(&dev->kobj, fmt, vargs);
        va_end(vargs);
        return err;
 }

(Unsure about the exact error code to return.)
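
The behaviour this guard aims for can be tried in userspace. The sketch below only models the idea: `fake_device`, its fixed-size name buffer and the plain `fprintf()` warning are hypothetical substitutes for the kernel's struct device and dev_WARN_ONCE(), and -EPERM is kept purely because the diff above uses it.

```c
#include <errno.h>
#include <stdio.h>

#define FAKE_NAME_LEN 32

/* Hypothetical stand-in for struct device. */
struct fake_device {
	char name[FAKE_NAME_LEN];
	const void *driver;      /* non-NULL once a driver is bound */
};

/* Models the guarded dev_set_name(): refuse renames once bound. */
static int fake_dev_set_name(struct fake_device *dev, const char *name)
{
	if (dev->driver) {
		fprintf(stderr, "%s: device name is static once matched\n",
			dev->name);
		return -EPERM;
	}

	snprintf(dev->name, sizeof(dev->name), "%s", name);
	return 0;
}
```

In this model a rename before binding succeeds, and a rename after `driver` is set fails and leaves the name untouched.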

[...]

> Do we have examples today of platform drivers that like to rename
> devices?  I did a quick search and couldn't find any in-tree, but I
> might have missed some.

The cover letter expands on the quest for those drivers:

On Tue Feb 18, 2025 at 12:00 PM CET, Théo Lebrun wrote:
> Out of the 37 drivers that deal with platform devices and do a
> dev_set_name() call, only one might be affected. That driver is
> loongson-i2s-plat [0]. All other dev_set_name() calls are on children
> devices created on the spot. The issue was found on downstream kernels
> and we don't have what it takes to test loongson-i2s-plat.
[...]
>
>    ⟩ # Finding potential trouble-makers:
>    ⟩ git grep -l 'struct platform_device' | xargs grep -l dev_set_name
>
[...]
> [0]: https://elixir.bootlin.com/linux/v6.13.2/source/sound/soc/loongson/loongson_i2s_plat.c#L155

[...]

> Or if this really is an issue, let's fix OF to not use the platform bus
> and have its own bus for stuff like this.

That used to exist! I cannot see how it could be a good idea to
reintroduce the distinction though.

commit eca3930163ba8884060ce9d9ff5ef0d9b7c7b00f
Author: Grant Likely <grant.likely@secretlab.ca>
Date:   Tue Jun 8 07:48:21 2010 -0600

    of: Merge of_platform_bus_type with platform_bus_type

Thanks,

--
Théo Lebrun, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com

Greg KH Feb. 20, 2025, 6:55 p.m. UTC | #7
On Thu, Feb 20, 2025 at 07:26:41PM +0100, Théo Lebrun wrote:
> On Thu Feb 20, 2025 at 5:19 PM CET, Greg Kroah-Hartman wrote:
> > On Thu, Feb 20, 2025 at 04:46:59PM +0100, Théo Lebrun wrote:
> >> On Thu Feb 20, 2025 at 3:06 PM CET, Greg Kroah-Hartman wrote:
> >> > On Thu, Feb 20, 2025 at 02:31:29PM +0100, Théo Lebrun wrote:
> >> >> On Thu Feb 20, 2025 at 1:41 PM CET, Greg Kroah-Hartman wrote:
> >> >> > On Tue, Feb 18, 2025 at 12:00:11PM +0100, Théo Lebrun wrote:
> >> >> >> The solution proposed is to add a flag to platform_device that tells if
> >> >> >> it is responsible for freeing its name. We can then duplicate the
> >> >> >> device name inside of_device_add() instead of copying the pointer.
> >> >> >
> >> >> > Ick.
> >> >> >
> >> >> >> What is done elsewhere?
> >> >> >>  - Platform bus code does a copy of the argument name that is stored
> >> >> >>    alongside the struct platform_device; see platform_device_alloc()[1].
> >> >> >>  - Other busses duplicate the device name; either through a dynamic
> >> >> >>    allocation [2] or through an array embedded inside devices [3].
> >> >> >>  - Some busses don't have a separate name; when they want a name they
> >> >> >>    take it from the device [4].
> >> >> >
> >> >> > Really ick.
> >> >> >
> >> >> > Let's do the right thing here and just get rid of the name pointer
> >> >> > entirely in struct platform_device please.  Isn't that the correct
> >> >> > thing that way the driver core logic will work properly for all of this.
> >> >> 
> >> >> I would agree, if it wasn't for this consideration that is found in the
> >> >> commit message [0]:
> >> >
> >> > What, that the of code is broken?  Then it should be fixed, why does it
> >> > need a pointer to a name at all anyway?  It shouldn't be needed there
> >> > either.
> >> 
> >> I cannot guess why it originally has a separate pdev->name field.
> >
> > Many people got this wrong when we designed busses, it's not unique.
> > But we should learn from our mistakes where we can :)
> >
> >> >> > It is important to duplicate! pdev->name must not change to make sure
> >> >> > the platform_match() return value is stable over time. If we updated
> >> >> > pdev->name alongside dev->name, once a device probes and changes its
> >> >> > name then the platform_match() return value would change.
> >> >> 
> >> >> I'd be fine sending a V2 that removes the field *and the fallback* [1],
> >> >> but I don't have the full scope in mind to know what would become broken.
> >> >> 
> >> >> [0]: https://lore.kernel.org/lkml/20250218-pdev-uaf-v1-2-5ea1a0d3aba0@bootlin.com/
> >> >> [1]: https://elixir.bootlin.com/linux/v6.13.3/source/drivers/base/platform.c#L1357
> >> >
> >> > The fallback will not need to be removed, properly point to the name of
> >> > the device and it should work correctly.
> >> 
> >> No, it will not work correctly, as the above quote indicates.
> >
> > I don't know which quote, sorry.
> >
> >> Let's assume we remove the field, this situation would be broken:
> >>  - OF allocates platform devices and gives them names.
> >>  - A device matches with a driver, which gets probed.
> >>  - During the probe, driver does a dev_set_name().
> >>  - Afterwards, the upcoming platform_match() calls against other drivers
> >>    are made with another device name.
> >> 
> >> We should be safe as there are guardrails against probing a device twice,
> >> see __driver_probe_device() that checks dev->driver is NULL. But it
> >> isn't a situation we should be in.
> >
> > The fragility of attempting to match a driver to a device purely by a
> > name is a very weak part of using platform devices.
> 
> I never said the opposite, and I agree.
> However the mechanism exists and I was focused on not breaking it.
> 
> > Why would a driver change the device name?  It's been given to the
> > driver to "bind to" not to change its name.  That shouldn't be ok, fix
> > those drivers.
> 
> I do get the argument that devices shouldn't change device names. I'll
> play devil's advocate and give at least one argument FOR allowing
> changing names: prettier names, especially as device names leak into
> userspace through pseudo filesystems.

Then that same driver should have created a prettier name when it
created the device and sent it to the driver core :)

> If we agree that device names shouldn't be changed once a device is
> matched with a driver, then (1) we can remove the pdev->name field and
> (2) `dev_set_name()` should warn when used too late.
> 
> Turn the implicit explicit.
> 
> 
> diff --git a/drivers/base/core.c b/drivers/base/core.c
> index 5a1f05198114..3532b068e32d 100644
> --- a/drivers/base/core.c
> +++ b/drivers/base/core.c
> @@ -3462,10 +3462,13 @@ static void device_remove_class_symlinks(struct device *dev)
>  int dev_set_name(struct device *dev, const char *fmt, ...)
>  {
>         va_list vargs;
>         int err;
> 
> +       if (dev_WARN_ONCE(dev, dev->driver, "device name is static once matched"))
> +               return -EPERM;

What?  No, this is a platform driver thing, not a driver core thing.
Let's just remove the name pointer in the platform driver structure and
then we can handle the rest from there.

> +
>         va_start(vargs, fmt);
>         err = kobject_set_name_vargs(&dev->kobj, fmt, vargs);
>         va_end(vargs);
>         return err;
>  }
> 
> (Unsure about the exact error code to return.)
> 
> [...]
> 
> > Do we have examples today of platform drivers that like to rename
> > devices?  I did a quick search and couldn't find any in-tree, but I
> > might have missed some.
> 
> The cover letter expands on the quest for those drivers:
> 
> On Tue Feb 18, 2025 at 12:00 PM CET, Théo Lebrun wrote:
> > Out of the 37 drivers that deal with platform devices and do a
> > dev_set_name() call, only one might be affected. That driver is
> > loongson-i2s-plat [0]. All other dev_set_name() calls are on children
> > devices created on the spot. The issue was found on downstream kernels
> > and we don't have what it takes to test loongson-i2s-plat.

out-of-tree drivers don't matter to us :)


> [...]
> >
> >    ⟩ # Finding potential trouble-makers:
> >    ⟩ git grep -l 'struct platform_device' | xargs grep -l dev_set_name
> >
> [...]
> > [0]: https://elixir.bootlin.com/linux/v6.13.2/source/sound/soc/loongson/loongson_i2s_plat.c#L155
> 
> [...]
> 
> > Or if this really is an issue, let's fix OF to not use the platform bus
> > and have its own bus for stuff like this.
> 
> That used to exist! I cannot see how it could be a good idea to
> reintroduce the distinction though.
> 
> commit eca3930163ba8884060ce9d9ff5ef0d9b7c7b00f
> Author: Grant Likely <grant.likely@secretlab.ca>
> Date:   Tue Jun 8 07:48:21 2010 -0600
> 
>     of: Merge of_platform_bus_type with platform_bus_type

True, that was nice, but we shouldn't let one force bugs in the other :)

Anyway try removing the name pointer and let's see what falls out.

thanks,

greg k-h
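
The match-stability concern quoted above (pdev->name must not follow
dev->name once a driver renames the device) can be sketched in userspace C.
This is only a rough model under stated assumptions, not kernel code: every
identifier below (struct model_pdev, model_set_name(), model_match(), ...) is
hypothetical, and it models the "duplicate the name" approach from the series
rather than the "remove the name pointer" direction suggested in this reply.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical userspace stand-in for a platform device; not the kernel struct. */
struct model_pdev {
	char *dev_name;   /* models the device name that dev_set_name() replaces */
	char *match_name; /* models pdev->name, kept as a private duplicate */
};

/* Small strdup() replacement so the sketch only needs ISO C. */
static char *model_strdup(const char *s)
{
	size_t len = strlen(s) + 1;
	char *copy = malloc(len);

	if (copy)
		memcpy(copy, s, len);
	return copy;
}

/* Both names start out equal, but each one owns its own buffer. */
static int model_pdev_init(struct model_pdev *pdev, const char *name)
{
	pdev->dev_name = model_strdup(name);
	pdev->match_name = model_strdup(name);
	if (!pdev->dev_name || !pdev->match_name) {
		free(pdev->dev_name);
		free(pdev->match_name);
		pdev->dev_name = NULL;
		pdev->match_name = NULL;
		return -1;
	}
	return 0;
}

/*
 * Models a rename: the old device-name buffer is freed on success.
 * Had match_name merely aliased dev_name, it would now be dangling.
 */
static int model_set_name(struct model_pdev *pdev, const char *name)
{
	char *copy = model_strdup(name);

	if (!copy)
		return -1;
	free(pdev->dev_name);
	pdev->dev_name = copy;
	return 0;
}

/* Models a name-based fallback match: compares against the duplicate only. */
static int model_match(const struct model_pdev *pdev, const char *drv_name)
{
	assert(pdev->match_name != NULL);
	return strcmp(pdev->match_name, drv_name) == 0;
}

static void model_pdev_release(struct model_pdev *pdev)
{
	free(pdev->dev_name);
	free(pdev->match_name);
}
```

In this model, initialising a device as "soc:i2s" and then renaming it to
"pretty-i2s" leaves model_match() returning true for "soc:i2s" and false for
"pretty-i2s": the match result stays stable because the duplicate never
changes, and nothing reads the freed buffer.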
Thomas Petazzoni Feb. 21, 2025, 8:46 a.m. UTC | #8
On Thu, 20 Feb 2025 19:26:41 +0100
Théo Lebrun <theo.lebrun@bootlin.com> wrote:

> That used to exist! I cannot see how it could be a good idea to
> reintroduce the distinction though.
> 
> commit eca3930163ba8884060ce9d9ff5ef0d9b7c7b00f
> Author: Grant Likely <grant.likely@secretlab.ca>
> Date:   Tue Jun 8 07:48:21 2010 -0600
> 
>     of: Merge of_platform_bus_type with platform_bus_type

I don't really see how an of_platform bus would make sense. OF is not a
bus at all, it's a way of providing HW description to an operating
system.

What would IMO make a lot more sense is mmio_bus, for Memory-Mapped I/O
peripherals. mmio_device can be described through OF, through old-style
board.c, possibly through ACPI, or other means.

But in my eyes, the current platform bus is exactly this: the bus for
MMIO devices. It would have been clearer to name it mmio_bus, and that
would have probably prevented abuses of the platform bus for things
that aren't memory-mapped peripherals.

But clearly any bus that has "OF" in its name is wrong, as OF cannot be
a bus. Keep in mind that OF allows describing not only MMIO devices,
but also I2C devices, SPI devices, MMC/SDIO devices, PCI devices, USB
devices, etc. OF is a description of the HW, not a bus.

Best regards,

Thomas