Message ID | 18b3541e5372bc9b9fc733d422f4e698c089077c.1650177997.git.lukas@wunner.de (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: linkwatch: ignore events for unregistered netdevs | expand |
Context | Check | Description |
---|---|---|
netdev/tree_selection | success | Guessing tree name failed - patch did not apply, async |
On Sun, 2022-04-17 at 09:04 +0200, Lukas Wunner wrote: > Jann Horn reports a use-after-free on disconnect of a USB Ethernet > (ax88179_178a.c). Oleksij Rempel has witnessed the same issue with a > different driver (ax88172a.c). > > Jann's report (linked below) explains the root cause in great detail, > but the gist is that USB Ethernet drivers call linkwatch_fire_event() > between unregister_netdev() and free_netdev(). The asynchronous work > linkwatch_event() may thus access the netdev after it's been freed. > > USB Ethernet may not even be the only culprit. To address the problem > in the most general way, ignore link events once a netdev's state has > been set to NETREG_UNREGISTERED. > > That happens in netdev_run_todo() immediately before the call to > linkwatch_forget_dev(). Note that lweventlist_lock (and its implied > memory barrier) guarantees that a linkwatch_add_event() running after > linkwatch_forget_dev() will see the netdev's new state and bail out. > An unregistered netdev is therefore never added to link_watch_list > (but may have its __LINK_STATE_LINKWATCH_PENDING bit set, which should > not matter). That obviates the need to invoke linkwatch_run_queue() in > netdev_wait_allrefs(), so drop it. > > In a sense, the present commit is to *no longer* registered netdevs as > commit b47300168e77 ("net: Do not fire linkwatch events until the device > is registered.") is to *not yet* registered netdevs. > > Reported-by: Jann Horn <jannh@google.com> > Link: https://lore.kernel.org/netdev/CAG48ez0MHBbENX5gCdHAUXZ7h7s20LnepBF-pa5M=7Bi-jZrEA@mail.gmail.com/ > Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> > Link: https://lore.kernel.org/netdev/20220315113841.GA22337@pengutronix.de/ > Signed-off-by: Lukas Wunner <lukas@wunner.de> > Cc: stable@vger.kernel.org > Cc: Eric Dumazet <edumazet@google.com> > Cc: Oliver Neukum <oneukum@suse.com> > Cc: Andrew Lunn <andrew@lunn.ch> > --- > include/linux/netdevice.h | 2 -- > net/core/dev.c | 17 ----------------- > net/core/link_watch.c | 10 ++-------- > 3 files changed, 2 insertions(+), 27 deletions(-) > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index 59e27a2b7bf0..5d950b45b59d 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -4805,8 +4805,6 @@ extern const struct kobj_ns_type_operations net_ns_type_operations; > > const char *netdev_drivername(const struct net_device *dev); > > -void linkwatch_run_queue(void); > - > static inline netdev_features_t netdev_intersect_features(netdev_features_t f1, > netdev_features_t f2) > { > diff --git a/net/core/dev.c b/net/core/dev.c > index 8c6c08446556..0ee56965ff76 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -10140,23 +10140,6 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list) > list_for_each_entry(dev, list, todo_list) > call_netdevice_notifiers(NETDEV_UNREGISTER, dev); > > - __rtnl_unlock(); > - rcu_barrier(); > - rtnl_lock(); > - > - list_for_each_entry(dev, list, todo_list) > - if (test_bit(__LINK_STATE_LINKWATCH_PENDING, > - &dev->state)) { > - /* We must not have linkwatch events > - * pending on unregister. If this > - * happens, we simply run the queue > - * unscheduled, resulting in a noop > - * for this device. > - */ > - linkwatch_run_queue(); > - break; > - } > - > __rtnl_unlock(); > > rebroadcast_time = jiffies; > diff --git a/net/core/link_watch.c b/net/core/link_watch.c > index 95098d1a49bd..9a0ea7cd68e4 100644 > --- a/net/core/link_watch.c > +++ b/net/core/link_watch.c > @@ -107,7 +107,8 @@ static void linkwatch_add_event(struct net_device *dev) > unsigned long flags; > > spin_lock_irqsave(&lweventlist_lock, flags); > - if (list_empty(&dev->link_watch_list)) { > + if (list_empty(&dev->link_watch_list) && > + dev->reg_state < NETREG_UNREGISTERED) { > list_add_tail(&dev->link_watch_list, &lweventlist); > dev_hold_track(dev, &dev->linkwatch_dev_tracker, GFP_ATOMIC); > What about testing dev->reg_state in linkwatch_fire_event() before setting the __LINK_STATE_LINKWATCH_PENDING bit, so that we don't leave the device in an unexpected state? Other than that, it looks good to me, but potentially quite risky. Looking at the original report it looks like the issue could be resolved with a more usb-specific change: e.g. it looks like usbnet_defer_kevent() is not acquiring a dev reference as it should. Have you considered that path? Thanks, Paolo
On Thu, Apr 21, 2022 at 10:02:43AM +0200, Paolo Abeni wrote: > On Sun, 2022-04-17 at 09:04 +0200, Lukas Wunner wrote: > > --- a/net/core/link_watch.c > > +++ b/net/core/link_watch.c > > @@ -107,7 +107,8 @@ static void linkwatch_add_event(struct net_device *dev) > > unsigned long flags; > > > > spin_lock_irqsave(&lweventlist_lock, flags); > > - if (list_empty(&dev->link_watch_list)) { > > + if (list_empty(&dev->link_watch_list) && > > + dev->reg_state < NETREG_UNREGISTERED) { > > list_add_tail(&dev->link_watch_list, &lweventlist); > > dev_hold_track(dev, &dev->linkwatch_dev_tracker, GFP_ATOMIC); > > > > What about testing dev->reg_state in linkwatch_fire_event() before > setting the __LINK_STATE_LINKWATCH_PENDING bit, so that we don't leave > the device in an unexpected state? That would be racy because linkwatch_fire_event() may see a reg_state of REGISTERED or UNREGISTERING and then add the device to link_watch_list, even though reg_state may be changed to UNREGISTERED in-between. That race is avoided by performing the reg_state check under lweventlist_lock: Scenario 1: CPU 1: CPU 2: linkwatch_add_event(dev); dev->reg_state = NETREG_UNREGISTERED; linkwatch_forget_dev(dev); In this scenario, CPU 2 sees the old value of dev->reg_state and adds the device to link_watch_list, but CPU 1 will subsequently delete it from the list. Scenario 2: CPU 1: CPU 2: dev->reg_state = NETREG_UNREGISTERED; linkwatch_forget_dev(dev); linkwatch_add_event(dev); In this scenario, CPU 2 refrains from adding the device to link_watch_list. It is guaranteed to see the new reg_state due to the memory barriers implied by lweventlist_lock, which is taken both by linkwatch_forget_dev() and linkwatch_add_event(). Note that an unregistered netdev has been stopped, so the portion of linkwatch_do_dev() which is constrained to the netdev being IFF_UP is skipped. The only portion that's executed is rfc2863_policy(), which updates the operstate. I believe that operstate changes are irrelevant and unnecessary after the netdev has been unregistered. Same for the fact that __LINK_STATE_LINKWATCH_PENDING may be set even though the netdev is not on link_watch_list. That should be irrelevant for an unregistered netdev. > Other than that, it looks good to me, but potentially quite risky. To mitigate risk I suggest letting the patch bake in linux-next for a couple of weeks. However I would then have to respin it because the declaration of linkwatch_run_queue() was moved from include/linux/netdevice.h to net/core/dev.h by Jakub's net-next commit 6264f58ca0e5 ("net: extract a few internals from netdevice.h"). Let me know if you want me to respin the patch based on net-next. > Looking at the original report it looks like the issue could be > resolved with a more usb-specific change: e.g. it looks like > usbnet_defer_kevent() is not acquiring a dev reference as it should. > > Have you considered that path? First of all, the diffstat of the patch shows this is an opportunity to reduce LoC as well as simplify and speed up device teardown. Second, the approach you're proposing won't work if a driver calls netif_carrier_on/off() after unregister_netdev(). It seems prudent to prevent such a misbehavior in *any* driver, not just usbnet. usbnet may not be the only one doing it wrong. Jann pointed out that there are more syzbot reports related to a UAF in linkwatch: https://lore.kernel.org/netdev/?q=__linkwatch_run_queue+syzbot Third, I think an API which schedules work, invisibly to the driver, is dangerous and misguided. If it is illegal to call netif_carrier_on/off() for an unregistered but not yet freed netdev, catch that in core networking code and don't expect drivers to respect a rule which isn't even documented. Thanks, Lukas
On Sat, Apr 23, 2022 at 06:07:23PM +0200, Lukas Wunner wrote: > On Thu, Apr 21, 2022 at 10:02:43AM +0200, Paolo Abeni wrote: > > On Sun, 2022-04-17 at 09:04 +0200, Lukas Wunner wrote: > > > --- a/net/core/link_watch.c > > > +++ b/net/core/link_watch.c > > > @@ -107,7 +107,8 @@ static void linkwatch_add_event(struct net_device *dev) > > > unsigned long flags; > > > > > > spin_lock_irqsave(&lweventlist_lock, flags); > > > - if (list_empty(&dev->link_watch_list)) { > > > + if (list_empty(&dev->link_watch_list) && > > > + dev->reg_state < NETREG_UNREGISTERED) { > > > list_add_tail(&dev->link_watch_list, &lweventlist); > > > dev_hold_track(dev, &dev->linkwatch_dev_tracker, GFP_ATOMIC); > > > > > > > What about testing dev->reg_state in linkwatch_fire_event() before > > setting the __LINK_STATE_LINKWATCH_PENDING bit, so that we don't leave > > the device in an unexpected state? About __LINK_STATE_LINKWATCH_PENDING being set even though the netdev is not on link_watch_list: After this patch (which removes one user of __LINK_STATE_LINKWATCH_PENDING) the only purpose of the flag is a small speed-up of linkwatch_fire_event(): If the netdev is already on link_watch_list, the function skips acquiring lweventlist_lock. I don't think this is a hotpath, so the small speed-up is probably not worth it and the flag could be removed completely in a follow-up patch. There is a single other (somewhat oddball) user of the flag in bond_should_notify_peers() in drivers/net/bonding/bond_main.c. It would be possible to replace it with "!list_empty(&dev->link_watch_list)". I don't think acquiring lweventlist_lock is necessary for that because test_bit() is unordered (per Documentation/atomic_bitops.txt) and the check is racy anyway. Thanks, Lukas
On Sat, 23 Apr 2022 18:07:23 +0200 Lukas Wunner wrote: > > Looking at the original report it looks like the issue could be > > resolved with a more usb-specific change: e.g. it looks like > > usbnet_defer_kevent() is not acquiring a dev reference as it should. > > > > Have you considered that path? > > First of all, the diffstat of the patch shows this is an opportunity > to reduce LoC as well as simplify and speed up device teardown. > > Second, the approach you're proposing won't work if a driver calls > netif_carrier_on/off() after unregister_netdev(). > > It seems prudent to prevent such a misbehavior in *any* driver, > not just usbnet. usbnet may not be the only one doing it wrong. > Jann pointed out that there are more syzbot reports related > to a UAF in linkwatch: > > https://lore.kernel.org/netdev/?q=__linkwatch_run_queue+syzbot > > Third, I think an API which schedules work, invisibly to the driver, > is dangerous and misguided. If it is illegal to call > netif_carrier_on/off() for an unregistered but not yet freed netdev, > catch that in core networking code and don't expect drivers to respect > a rule which isn't even documented. Doesn't mean we should make it legal. We can add a warning to catch abuses.
On Mon, Apr 25, 2022 at 4:41 PM Jakub Kicinski <kuba@kernel.org> wrote: > On Sat, 23 Apr 2022 18:07:23 +0200 Lukas Wunner wrote: > > > Looking at the original report it looks like the issue could be > > > resolved with a more usb-specific change: e.g. it looks like > > > usbnet_defer_kevent() is not acquiring a dev reference as it should. > > > > > > Have you considered that path? > > > > First of all, the diffstat of the patch shows this is an opportunity > > to reduce LoC as well as simplify and speed up device teardown. > > > > Second, the approach you're proposing won't work if a driver calls > > netif_carrier_on/off() after unregister_netdev(). > > > > It seems prudent to prevent such a misbehavior in *any* driver, > > not just usbnet. usbnet may not be the only one doing it wrong. > > Jann pointed out that there are more syzbot reports related > > to a UAF in linkwatch: > > > > https://lore.kernel.org/netdev/?q=__linkwatch_run_queue+syzbot > > > > Third, I think an API which schedules work, invisibly to the driver, > > is dangerous and misguided. If it is illegal to call > > netif_carrier_on/off() for an unregistered but not yet freed netdev, > > catch that in core networking code and don't expect drivers to respect > > a rule which isn't even documented. > > Doesn't mean we should make it legal. We can add a warning to catch > abuses. That was the idea with https://lore.kernel.org/netdev/20220128014303.2334568-1-jannh@google.com/, but I didn't get any replies when I asked what the precise semantics of dev_hold() are supposed to be (https://lore.kernel.org/netdev/CAG48ez1-OyZETvrYAfaHicYW1LbrQUVp=C0EukSWqZrYMej73w@mail.gmail.com/), so I don't know how to proceed...
On Mon, 25 Apr 2022 16:49:34 +0200 Jann Horn wrote: > > Doesn't mean we should make it legal. We can add a warning to catch > > abuses. > > That was the idea with > https://lore.kernel.org/netdev/20220128014303.2334568-1-jannh@google.com/, > but I didn't get any replies when I asked what the precise semantics > of dev_hold() are supposed to be > (https://lore.kernel.org/netdev/CAG48ez1-OyZETvrYAfaHicYW1LbrQUVp=C0EukSWqZrYMej73w@mail.gmail.com/), > so I don't know how to proceed... Yeah, I think after you pointed out that the netdev per cpu refcounting is fundamentally broken everybody decided to hit themselves with the obliviate spell :S
On Mon, Apr 25, 2022 at 8:01 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 25 Apr 2022 16:49:34 +0200 Jann Horn wrote: > > > Doesn't mean we should make it legal. We can add a warning to catch > > > abuses. > > > > That was the idea with > > https://lore.kernel.org/netdev/20220128014303.2334568-1-jannh@google.com/, > > but I didn't get any replies when I asked what the precise semantics > > of dev_hold() are supposed to be > > (https://lore.kernel.org/netdev/CAG48ez1-OyZETvrYAfaHicYW1LbrQUVp=C0EukSWqZrYMej73w@mail.gmail.com/), > > so I don't know how to proceed... > > Yeah, I think after you pointed out that the netdev per cpu refcounting > is fundamentally broken everybody decided to hit themselves with the > obliviate spell :S dev_hold() has been an increment of a refcount, and dev_put() a decrement. Not sure why it is fundamentally broken. There are specific steps at device dismantles making sure no more users can dev_hold() It is a contract. Any buggy layer can overwrite any piece of memory, including a refcount_t. Traditionally we could not add a test in dev_hold() to prevent an increment if the device is in dismantle phase. Maybe the situation is better nowadays.
On Mon, Apr 25, 2022 at 5:13 PM Eric Dumazet <edumazet@google.com> wrote: > On Mon, Apr 25, 2022 at 8:01 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Mon, 25 Apr 2022 16:49:34 +0200 Jann Horn wrote: > > > > Doesn't mean we should make it legal. We can add a warning to catch > > > > abuses. > > > > > > That was the idea with > > > https://lore.kernel.org/netdev/20220128014303.2334568-1-jannh@google.com/, > > > but I didn't get any replies when I asked what the precise semantics > > > of dev_hold() are supposed to be > > > (https://lore.kernel.org/netdev/CAG48ez1-OyZETvrYAfaHicYW1LbrQUVp=C0EukSWqZrYMej73w@mail.gmail.com/), > > > so I don't know how to proceed... > > > > Yeah, I think after you pointed out that the netdev per cpu refcounting > > is fundamentally broken everybody decided to hit themselves with the > > obliviate spell :S > > dev_hold() has been an increment of a refcount, and dev_put() a decrement. > > Not sure why it is fundamentally broken. Well, it's not quite a refcount. It's a count that can be incremented and decremented but can't be read while the device is alive, and then at some point it turns into a count that can be read and decremented but can't be incremented (see https://lore.kernel.org/netdev/CAG48ez1-OyZETvrYAfaHicYW1LbrQUVp=C0EukSWqZrYMej73w@mail.gmail.com/). Normal refcounts allow anyone who is holding a reference to add another reference. > There are specific steps at device dismantles making sure no more > users can dev_hold() So you're saying it's intentional that even if you're already holding a dev_hold() reference, you may not be allowed to call dev_hold() again?
On Mon, Apr 25, 2022 at 8:19 AM Jann Horn <jannh@google.com> wrote: > > On Mon, Apr 25, 2022 at 5:13 PM Eric Dumazet <edumazet@google.com> wrote: > > On Mon, Apr 25, 2022 at 8:01 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > On Mon, 25 Apr 2022 16:49:34 +0200 Jann Horn wrote: > > > > > Doesn't mean we should make it legal. We can add a warning to catch > > > > > abuses. > > > > > > > > That was the idea with > > > > https://lore.kernel.org/netdev/20220128014303.2334568-1-jannh@google.com/, > > > > but I didn't get any replies when I asked what the precise semantics > > > > of dev_hold() are supposed to be > > > > (https://lore.kernel.org/netdev/CAG48ez1-OyZETvrYAfaHicYW1LbrQUVp=C0EukSWqZrYMej73w@mail.gmail.com/), > > > > so I don't know how to proceed... > > > > > > Yeah, I think after you pointed out that the netdev per cpu refcounting > > > is fundamentally broken everybody decided to hit themselves with the > > > obliviate spell :S > > > > dev_hold() has been an increment of a refcount, and dev_put() a decrement. > > > > Not sure why it is fundamentally broken. > > Well, it's not quite a refcount. It's a count that can be incremented > and decremented but can't be read while the device is alive, and then > at some point it turns into a count that can be read and decremented > but can't be incremented (see > https://lore.kernel.org/netdev/CAG48ez1-OyZETvrYAfaHicYW1LbrQUVp=C0EukSWqZrYMej73w@mail.gmail.com/). > Normal refcounts allow anyone who is holding a reference to add > another reference. On a live netdev nothing wants to read the 'current refcount'. We basically do not care. > > > There are specific steps at device dismantles making sure no more > > users can dev_hold() > > So you're saying it's intentional that even if you're already holding > a dev_hold() reference, you may not be allowed to call dev_hold() > again? I think you can/should not. We might add a test in dev_hold() and catch offenders. Then add a new api, (dev_hold() is void and can not propagate an error), and eventually fix offenders.
On Mon, 25 Apr 2022 08:13:40 -0700 Eric Dumazet wrote: > dev_hold() has been an increment of a refcount, and dev_put() a decrement. > > Not sure why it is fundamentally broken. Jann described a case where someone does CPU 0 CPU 1 CPU 2 dev_hold() ------ #unregister ------- dev_hold() dev_put() Our check for refcount == 0 goes over the CPUs one by one, so if it sums up CPUs 0 and 1 at the "unregister" point above and CPU2 after the CPU1 hold and CPU2 release it will "miss" one refcount. That's a problem unless doing a dev_hold() on a netdev we only have a reference on is illegal. > There are specific steps at device dismantles making sure no more > users can dev_hold() > > It is a contract. Any buggy layer can overwrite any piece of memory, > including a refcount_t. > > Traditionally we could not add a test in dev_hold() to prevent an > increment if the device is in dismantle phase. > Maybe the situation is better nowadays.
On Mon, Apr 25, 2022 at 8:28 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 25 Apr 2022 08:13:40 -0700 Eric Dumazet wrote: > > dev_hold() has been an increment of a refcount, and dev_put() a decrement. > > > > Not sure why it is fundamentally broken. > > Jann described a case where someone does > > CPU 0 CPU 1 CPU 2 > > dev_hold() > ------ #unregister ------- > dev_hold() > dev_put() > > Our check for refcount == 0 goes over the CPUs one by one, > so if it sums up CPUs 0 and 1 at the "unregister" point above > and CPU2 after the CPU1 hold and CPU2 release it will "miss" > one refcount. > > That's a problem unless doing a dev_hold() on a netdev we only have > a reference on is illegal. What is 'illegal' is trying to keep using the device after #unregister. We have barriers to prevent that. Somehow a layer does not care about the barriers and pretends the device is still good to use. It is of course perfectly fine to stack multiple dev_hold() from one path (if these do not leak, but this is a different issue) > > > There are specific steps at device dismantles making sure no more > > users can dev_hold() > > > > It is a contract. Any buggy layer can overwrite any piece of memory, > > including a refcount_t. > > > > Traditionally we could not add a test in dev_hold() to prevent an > > increment if the device is in dismantle phase. > > Maybe the situation is better nowadays. >
On Mon, 25 Apr 2022 08:31:23 -0700 Eric Dumazet wrote: > > Jann described a case where someone does > > > > CPU 0 CPU 1 CPU 2 > > > > dev_hold() > > ------ #unregister ------- > > dev_hold() > > dev_put() > > > > Our check for refcount == 0 goes over the CPUs one by one, > > so if it sums up CPUs 0 and 1 at the "unregister" point above > > and CPU2 after the CPU1 hold and CPU2 release it will "miss" > > one refcount. > > > > That's a problem unless doing a dev_hold() on a netdev we only have > > a reference on is illegal. > > What is 'illegal' is trying to keep using the device after #unregister. > > We have barriers to prevent that. > > Somehow a layer does not care about the barriers and pretends the > device is still good to use. > > It is of course perfectly fine to stack multiple dev_hold() from one > path (if these do not leak, but this is a different issue) So we'd need something like WARN_ON(dev->reg_state != NETREG_REGISTERED && !rtnl_held()) in dev_hold()?
On Mon, Apr 25, 2022 at 05:18:51PM +0200, Jann Horn wrote: > Well, it's not quite a refcount. It's a count that can be incremented > and decremented but can't be read while the device is alive, and then > at some point it turns into a count that can be read and decremented > but can't be incremented Pardon me for being dense, but most other subsystems use the refcounting built into struct device (or rather, its kobject) and tear it down when the refcount reaches zero (e.g. pci_release_dev(), spidev_release()). What's the rationale for struct net_device rolling its own refcounting? Historic artifact? I think a lot of these issues would solve themselves if that was done away with and replaced with the generic kobject refcounting. It's a pity that the tracking infrastructure is now netdev-specific and other subsystems cannot benefit from it. Thanks, Lukas
On Mon, Apr 25, 2022 at 10:20 AM Lukas Wunner <lukas@wunner.de> wrote: > > On Mon, Apr 25, 2022 at 05:18:51PM +0200, Jann Horn wrote: > > Well, it's not quite a refcount. It's a count that can be incremented > > and decremented but can't be read while the device is alive, and then > > at some point it turns into a count that can be read and decremented > > but can't be incremented > > Pardon me for being dense, but most other subsystems use the refcounting > built into struct device (or rather, its kobject) and tear it down > when the refcount reaches zero (e.g. pci_release_dev(), spidev_release()). > > What's the rationale for struct net_device rolling its own refcounting? > Historic artifact? Yes, probably. This was there way before new fancy mechanisms were invented. > > > I think a lot of these issues would solve themselves if that was done away > with and replaced with the generic kobject refcounting. It's a pity that > the tracking infrastructure is now netdev-specific and other subsystems > cannot benefit from it. Make sure that whatever replaces it, heavy dev_hold()/dev_put() users do not come to a crawl. af_packet is using this stuff. Some users want to send millions of packets per second, without having to bypass the kernel because it is suddenly too slow. > > Thanks, > > Lukas
On Mon, Apr 25, 2022 at 07:41:46AM -0700, Jakub Kicinski wrote: > On Sat, 23 Apr 2022 18:07:23 +0200 Lukas Wunner wrote: > > > Looking at the original report it looks like the issue could be > > > resolved with a more usb-specific change: e.g. it looks like > > > usbnet_defer_kevent() is not acquiring a dev reference as it should. > > > > > > Have you considered that path? > > > > First of all, the diffstat of the patch shows this is an opportunity > > to reduce LoC as well as simplify and speed up device teardown. > > > > Second, the approach you're proposing won't work if a driver calls > > netif_carrier_on/off() after unregister_netdev(). > > > > It seems prudent to prevent such a misbehavior in *any* driver, > > not just usbnet. usbnet may not be the only one doing it wrong. > > Jann pointed out that there are more syzbot reports related > > to a UAF in linkwatch: > > > > https://lore.kernel.org/netdev/?q=__linkwatch_run_queue+syzbot > > > > Third, I think an API which schedules work, invisibly to the driver, > > is dangerous and misguided. If it is illegal to call > > netif_carrier_on/off() for an unregistered but not yet freed netdev, > > catch that in core networking code and don't expect drivers to respect > > a rule which isn't even documented. > > Doesn't mean we should make it legal. We can add a warning to catch > abuses. That would be inconsequent, considering that netif_carrier_on/off() do not warn for a reg_state of NETREG_UNINITIALIZED. Thanks, Lukas
On Mon, Apr 25, 2022 at 2:18 PM Lukas Wunner <lukas@wunner.de> wrote: > > On Mon, Apr 25, 2022 at 07:41:46AM -0700, Jakub Kicinski wrote: > > On Sat, 23 Apr 2022 18:07:23 +0200 Lukas Wunner wrote: > > > > Looking at the original report it looks like the issue could be > > > > resolved with a more usb-specific change: e.g. it looks like > > > > usbnet_defer_kevent() is not acquiring a dev reference as it should. > > > > > > > > Have you considered that path? > > > > > > First of all, the diffstat of the patch shows this is an opportunity > > > to reduce LoC as well as simplify and speed up device teardown. > > > > > > Second, the approach you're proposing won't work if a driver calls > > > netif_carrier_on/off() after unregister_netdev(). > > > > > > It seems prudent to prevent such a misbehavior in *any* driver, > > > not just usbnet. usbnet may not be the only one doing it wrong. > > > Jann pointed out that there are more syzbot reports related > > > to a UAF in linkwatch: > > > > > > https://lore.kernel.org/netdev/?q=__linkwatch_run_queue+syzbot > > > > > > Third, I think an API which schedules work, invisibly to the driver, > > > is dangerous and misguided. If it is illegal to call > > > netif_carrier_on/off() for an unregistered but not yet freed netdev, > > > catch that in core networking code and don't expect drivers to respect > > > a rule which isn't even documented. > > > > Doesn't mean we should make it legal. We can add a warning to catch > > abuses. > > That would be inconsequent, considering that netif_carrier_on/off() > do not warn for a reg_state of NETREG_UNINITIALIZED. > Yes, only 1500 calls to audit ;) I guess we could start adding WARN_ON_ONCE(), then wait for a few syzbot/users reports to fix offenders... commit b47300168e770b60ab96c8924854c3b0eb4260eb Author: David S. Miller <davem@davemloft.net> Date: Wed Nov 19 15:33:54 2008 -0800 net: Do not fire linkwatch events until the device is registered. Several device drivers try to do things like netif_carrier_off() before register_netdev() is invoked. This is bogus, but too many drivers do this to fix them all up in one go. Reported-by: Folkert van Heusden <folkert@vanheusden.com> Signed-off-by: David S. Miller <davem@davemloft.net>
On Mon, Apr 25, 2022 at 07:41:46AM -0700, Jakub Kicinski wrote: > On Sat, 23 Apr 2022 18:07:23 +0200 Lukas Wunner wrote: > > > Looking at the original report it looks like the issue could be > > > resolved with a more usb-specific change: e.g. it looks like > > > usbnet_defer_kevent() is not acquiring a dev reference as it should. > > > > > > Have you considered that path? > > > > First of all, the diffstat of the patch shows this is an opportunity > > to reduce LoC as well as simplify and speed up device teardown. > > > > Second, the approach you're proposing won't work if a driver calls > > netif_carrier_on/off() after unregister_netdev(). > > > > It seems prudent to prevent such a misbehavior in *any* driver, > > not just usbnet. usbnet may not be the only one doing it wrong. > > Jann pointed out that there are more syzbot reports related > > to a UAF in linkwatch: > > > > https://lore.kernel.org/netdev/?q=__linkwatch_run_queue+syzbot > > > > Third, I think an API which schedules work, invisibly to the driver, > > is dangerous and misguided. If it is illegal to call > > netif_carrier_on/off() for an unregistered but not yet freed netdev, > > catch that in core networking code and don't expect drivers to respect > > a rule which isn't even documented. > > Doesn't mean we should make it legal. We can add a warning to catch > abuses. It turns out that no, we *cannot* add a warning to catch abuses. I've identified all the places in USB Ethernet drivers which are susceptible to calling linkwatch_fire_event() after unregister_netdev(), see patch below. I'm fixing each one like this: - netif_carrier_on(dev->net); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + netif_carrier_on(dev->net); + dev_put(dev->net); If this is called after unregister_netdev(), it becomes a no-op. However if it is called concurrently to unregister_netdev(), the reg_state may change to NETREG_UNREGISTERED after the if-clause has been evaluated and before netif_carrier_on() is called. Then a linkwatch event *will* be fired. There won't be a use-after-free because of the ref I'm acquiring here. (unregister_netdev() will spin in netdev_wait_allrefs_any() until the linkwatch event has been handled.) But this means that we may still call linkwatch_fire_event() after unregister_netdev()! So we cannot emit a WARN splat and we cannot catch use-after-frees outside of the USB Ethernet drivers I'm fixing in the below patch. It may thus very well happen that a use-after-free may still occur for such other drivers and we cannot even WARN about it. For this reason I would strongly prefer the $SUBJECT_PATCH ("net: linkwatch: ignore events for unregistered netdevs") instead of the patch below. I think you are wrong to stall the patch. It avoids UAFs in *any* driver, not just the USB Ethernet ones, it reduces LoC and speeds up netdev unregistration. What more do you want? Thanks, Lukas -- >8 -- diff --git a/drivers/net/usb/aqc111.c b/drivers/net/usb/aqc111.c index ea06d10..279a7ca 100644 --- a/drivers/net/usb/aqc111.c +++ b/drivers/net/usb/aqc111.c @@ -962,7 +962,11 @@ static int aqc111_link_reset(struct usbnet *dev) aqc111_write16_cmd(dev, AQ_ACCESS_MAC, SFR_RX_CTL, 2, &aqc111_data->rxctl); - netif_carrier_on(dev->net); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + netif_carrier_on(dev->net); + dev_put(dev->net); + } else { aqc111_read16_cmd(dev, AQ_ACCESS_MAC, SFR_MEDIUM_STATUS_MODE, 2, ®16); @@ -981,7 +985,10 @@ static int aqc111_link_reset(struct usbnet *dev) aqc111_write_cmd(dev, AQ_ACCESS_MAC, SFR_BULK_OUT_CTRL, 1, 1, ®8); - netif_carrier_off(dev->net); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + netif_carrier_off(dev->net); + dev_put(dev->net); } return 0; } diff --git a/drivers/net/usb/asix_devices.c b/drivers/net/usb/asix_devices.c index 0872ca12..1e97c0a 100644 --- a/drivers/net/usb/asix_devices.c +++ b/drivers/net/usb/asix_devices.c @@ -173,7 +173,11 @@ static int ax88172_link_reset(struct usbnet *dev) u8 mode; struct ethtool_cmd ecmd = { .cmd = ETHTOOL_GSET }; - mii_check_media(&dev->mii, 1, 1); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + mii_check_media(&dev->mii, 1, 1); + dev_put(dev->net); + mii_ethtool_gset(&dev->mii, &ecmd); mode = AX88172_MEDIUM_DEFAULT; @@ -1013,7 +1017,11 @@ static int ax88178_link_reset(struct usbnet *dev) netdev_dbg(dev->net, "ax88178_link_reset()\n"); - mii_check_media(&dev->mii, 1, 1); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + mii_check_media(&dev->mii, 1, 1); + dev_put(dev->net); + mii_ethtool_gset(&dev->mii, &ecmd); mode = AX88178_MEDIUM_DEFAULT; speed = ethtool_cmd_speed(&ecmd); diff --git a/drivers/net/usb/ax88179_178a.c b/drivers/net/usb/ax88179_178a.c index a310989..279ddf2 100644 --- a/drivers/net/usb/ax88179_178a.c +++ b/drivers/net/usb/ax88179_178a.c @@ -1632,7 +1632,10 @@ static int ax88179_link_reset(struct usbnet *dev) ax179_data->eee_enabled = ax88179_chk_eee(dev); - netif_carrier_on(dev->net); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + netif_carrier_on(dev->net); + dev_put(dev->net); return 0; } diff --git a/drivers/net/usb/ch9200.c b/drivers/net/usb/ch9200.c index f69d9b9..5c7904c 100644 --- a/drivers/net/usb/ch9200.c +++ b/drivers/net/usb/ch9200.c @@ -214,7 +214,11 @@ static int ch9200_link_reset(struct usbnet *dev) { struct ethtool_cmd ecmd; - mii_check_media(&dev->mii, 1, 1); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + mii_check_media(&dev->mii, 1, 1); + dev_put(dev->net); + mii_ethtool_gset(&dev->mii, &ecmd); netdev_dbg(dev->net, "%s() speed:%d duplex:%d\n", diff --git a/drivers/net/usb/sierra_net.c b/drivers/net/usb/sierra_net.c index bb4cbe8f..9ae9359 100644 --- a/drivers/net/usb/sierra_net.c +++ b/drivers/net/usb/sierra_net.c @@ -427,7 +427,11 @@ static void sierra_net_handle_lsi(struct usbnet *dev, char *data, } else { priv->link_up = 0; } - usbnet_link_change(dev, link_up, 0); + + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + usbnet_link_change(dev, link_up, 0); + dev_put(dev->net); } static void sierra_net_dosync(struct usbnet *dev) @@ -758,6 +762,8 @@ static void sierra_net_unbind(struct usbnet *dev, struct usb_interface *intf) dev_dbg(&dev->udev->dev, "%s", __func__); + usbnet_status_stop(dev); + /* kill the timer and work */ del_timer_sync(&priv->sync_timer); cancel_work_sync(&priv->sierra_net_kevent); @@ -769,8 +775,6 @@ static void sierra_net_unbind(struct usbnet *dev, struct usb_interface *intf) netdev_err(dev->net, "usb_control_msg failed, status %d\n", status); - usbnet_status_stop(dev); - sierra_net_set_private(dev, NULL); kfree(priv); } diff --git a/drivers/net/usb/smsc75xx.c b/drivers/net/usb/smsc75xx.c index 95de452..b7f608a 100644 --- a/drivers/net/usb/smsc75xx.c +++ b/drivers/net/usb/smsc75xx.c @@ -640,7 +640,11 @@ static int smsc75xx_link_reset(struct usbnet *dev) return ret; } - mii_check_media(mii, 1, 1); + dev_hold(dev->net); + if (dev->net->reg_state < NETREG_UNREGISTERED) + mii_check_media(mii, 1, 1); + dev_put(dev->net); + mii_ethtool_gset(&dev->mii, &ecmd); lcladv = smsc75xx_mdio_read(dev->net, mii->phy_id, MII_ADVERTISE); rmtadv = smsc75xx_mdio_read(dev->net, mii->phy_id, MII_LPA);
On Sat, Apr 30, 2022 at 12:05:41PM +0200, Lukas Wunner wrote: > But this means that we may still call linkwatch_fire_event() after > unregister_netdev()! I meant to say, "we may still call linkwatch_fire_event() after the state has changed to NETREG_UNREGISTERED".
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index 59e27a2b7bf0..5d950b45b59d 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -4805,8 +4805,6 @@ extern const struct kobj_ns_type_operations net_ns_type_operations; const char *netdev_drivername(const struct net_device *dev); -void linkwatch_run_queue(void); - static inline netdev_features_t netdev_intersect_features(netdev_features_t f1, netdev_features_t f2) { diff --git a/net/core/dev.c b/net/core/dev.c index 8c6c08446556..0ee56965ff76 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10140,23 +10140,6 @@ static struct net_device *netdev_wait_allrefs_any(struct list_head *list) list_for_each_entry(dev, list, todo_list) call_netdevice_notifiers(NETDEV_UNREGISTER, dev); - __rtnl_unlock(); - rcu_barrier(); - rtnl_lock(); - - list_for_each_entry(dev, list, todo_list) - if (test_bit(__LINK_STATE_LINKWATCH_PENDING, - &dev->state)) { - /* We must not have linkwatch events - * pending on unregister. If this - * happens, we simply run the queue - * unscheduled, resulting in a noop - * for this device. - */ - linkwatch_run_queue(); - break; - } - __rtnl_unlock(); rebroadcast_time = jiffies; diff --git a/net/core/link_watch.c b/net/core/link_watch.c index 95098d1a49bd..9a0ea7cd68e4 100644 --- a/net/core/link_watch.c +++ b/net/core/link_watch.c @@ -107,7 +107,8 @@ static void linkwatch_add_event(struct net_device *dev) unsigned long flags; spin_lock_irqsave(&lweventlist_lock, flags); - if (list_empty(&dev->link_watch_list)) { + if (list_empty(&dev->link_watch_list) && + dev->reg_state < NETREG_UNREGISTERED) { list_add_tail(&dev->link_watch_list, &lweventlist); dev_hold_track(dev, &dev->linkwatch_dev_tracker, GFP_ATOMIC); } @@ -250,13 +251,6 @@ void linkwatch_forget_dev(struct net_device *dev) } -/* Must be called with the rtnl semaphore held */ -void linkwatch_run_queue(void) -{ - __linkwatch_run_queue(0); -} - - static void linkwatch_event(struct work_struct *dummy) { rtnl_lock();
Jann Horn reports a use-after-free on disconnect of a USB Ethernet (ax88179_178a.c). Oleksij Rempel has witnessed the same issue with a different driver (ax88172a.c). Jann's report (linked below) explains the root cause in great detail, but the gist is that USB Ethernet drivers call linkwatch_fire_event() between unregister_netdev() and free_netdev(). The asynchronous work linkwatch_event() may thus access the netdev after it's been freed. USB Ethernet may not even be the only culprit. To address the problem in the most general way, ignore link events once a netdev's state has been set to NETREG_UNREGISTERED. That happens in netdev_run_todo() immediately before the call to linkwatch_forget_dev(). Note that lweventlist_lock (and its implied memory barrier) guarantees that a linkwatch_add_event() running after linkwatch_forget_dev() will see the netdev's new state and bail out. An unregistered netdev is therefore never added to link_watch_list (but may have its __LINK_STATE_LINKWATCH_PENDING bit set, which should not matter). That obviates the need to invoke linkwatch_run_queue() in netdev_wait_allrefs(), so drop it. In a sense, the present commit is to *no longer* registered netdevs as commit b47300168e77 ("net: Do not fire linkwatch events until the device is registered.") is to *not yet* registered netdevs. Reported-by: Jann Horn <jannh@google.com> Link: https://lore.kernel.org/netdev/CAG48ez0MHBbENX5gCdHAUXZ7h7s20LnepBF-pa5M=7Bi-jZrEA@mail.gmail.com/ Reported-by: Oleksij Rempel <o.rempel@pengutronix.de> Link: https://lore.kernel.org/netdev/20220315113841.GA22337@pengutronix.de/ Signed-off-by: Lukas Wunner <lukas@wunner.de> Cc: stable@vger.kernel.org Cc: Eric Dumazet <edumazet@google.com> Cc: Oliver Neukum <oneukum@suse.com> Cc: Andrew Lunn <andrew@lunn.ch> --- include/linux/netdevice.h | 2 -- net/core/dev.c | 17 ----------------- net/core/link_watch.c | 10 ++-------- 3 files changed, 2 insertions(+), 27 deletions(-)