Message ID | 20220128014303.2334568-1-jannh@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: dev: Detect dev_hold() after netdev_wait_allrefs() | expand |
On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote: > > I've run into a bug where dev_hold() was being called after > netdev_wait_allrefs(). But at that point, the device is already going > away, and dev_hold() can't stop that anymore. > > To make such problems easier to diagnose in the future: > > - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether > the net refcount has been elevated. If this is detected, WARN() and > leak the object (to prevent worse consequences from a > use-after-free). > - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero. > This signals to the generic refcount infrastructure that any attempt > to increment the refcount later is a bug. > > Signed-off-by: Jann Horn <jannh@google.com> > --- > net/core/dev.c | 18 +++++++++++++++++- > 1 file changed, 17 insertions(+), 1 deletion(-) > > diff --git a/net/core/dev.c b/net/core/dev.c > index 1baab07820f6..f7916c0d226d 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -9949,8 +9949,18 @@ void netdev_run_todo(void) > > netdev_wait_allrefs(dev); > > + /* Drop the netdev refcount (which should be 1 at this point) > + * to zero. If we're using the generic refcount code, this will > + * tell it that any dev_hold() after this point is a bug. > + */ > +#ifdef CONFIG_PCPU_DEV_REFCNT > + this_cpu_dec(*dev->pcpu_refcnt); > + BUG_ON(netdev_refcnt_read(dev) != 0); > +#else > + BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt)); > +#endif > + > /* paranoia */ > - BUG_ON(netdev_refcnt_read(dev) != 1); > BUG_ON(!list_empty(&dev->ptype_all)); > BUG_ON(!list_empty(&dev->ptype_specific)); > WARN_ON(rcu_access_pointer(dev->ip_ptr)); > @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev) > free_percpu(dev->xdp_bulkq); > dev->xdp_bulkq = NULL; > > + /* Recheck in case someone called dev_hold() between > + * netdev_wait_allrefs() and here. > + */ At this point, dev->pcpu_refcnt per-cpu data has been freed already (CONFIG_PCPU_DEV_REFCNT=y) So this should probably crash, or at least UAF ? > + if (WARN_ON(netdev_refcnt_read(dev) != 0)) > + return; /* leak memory, otherwise we might get UAF */ > + > /* Compatibility with error handling in drivers */ > if (dev->reg_state == NETREG_UNINITIALIZED) { > netdev_freemem(dev); > > base-commit: 23a46422c56144939c091c76cf389aa863ce9c18 > -- > 2.35.0.rc0.227.g00780c9af4-goog >
On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote: > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote: > > > > I've run into a bug where dev_hold() was being called after > > netdev_wait_allrefs(). But at that point, the device is already going > > away, and dev_hold() can't stop that anymore. > > > > To make such problems easier to diagnose in the future: > > > > - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether > > the net refcount has been elevated. If this is detected, WARN() and > > leak the object (to prevent worse consequences from a > > use-after-free). > > - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero. > > This signals to the generic refcount infrastructure that any attempt > > to increment the refcount later is a bug. > > > > Signed-off-by: Jann Horn <jannh@google.com> > > --- > > net/core/dev.c | 18 +++++++++++++++++- > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > index 1baab07820f6..f7916c0d226d 100644 > > --- a/net/core/dev.c > > +++ b/net/core/dev.c > > @@ -9949,8 +9949,18 @@ void netdev_run_todo(void) > > > > netdev_wait_allrefs(dev); > > > > + /* Drop the netdev refcount (which should be 1 at this point) > > + * to zero. If we're using the generic refcount code, this will > > + * tell it that any dev_hold() after this point is a bug. > > + */ > > +#ifdef CONFIG_PCPU_DEV_REFCNT > > + this_cpu_dec(*dev->pcpu_refcnt); > > + BUG_ON(netdev_refcnt_read(dev) != 0); > > +#else > > + BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt)); > > +#endif > > + > > /* paranoia */ > > - BUG_ON(netdev_refcnt_read(dev) != 1); > > BUG_ON(!list_empty(&dev->ptype_all)); > > BUG_ON(!list_empty(&dev->ptype_specific)); > > WARN_ON(rcu_access_pointer(dev->ip_ptr)); > > @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev) > > free_percpu(dev->xdp_bulkq); > > dev->xdp_bulkq = NULL; > > > > + /* Recheck in case someone called dev_hold() between > > + * netdev_wait_allrefs() and here. > > + */ > > At this point, dev->pcpu_refcnt per-cpu data has been freed already > (CONFIG_PCPU_DEV_REFCNT=y) > > So this should probably crash, or at least UAF ? Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT... I guess a better place to put the new check would be directly after checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this? if (dev->reg_state == NETREG_UNREGISTERING) { ASSERT_RTNL(); dev->needs_free_netdev = true; return; } /* Recheck in case someone called dev_hold() between * netdev_wait_allrefs() and here. */ if (WARN_ON(netdev_refcnt_read(dev) != 0)) return; /* leak memory, otherwise we might get UAF */ netif_free_tx_queues(dev); netif_free_rx_queues(dev);
On Fri, 28 Jan 2022 03:14:14 +0100 Jann Horn wrote: > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT... > > I guess a better place to put the new check would be directly after > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this? Possibly a very silly suggestion but perhaps we should set the pointer to NULL for the pcpu case and let it crash?
On Thu, Jan 27, 2022 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 28 Jan 2022 03:14:14 +0100 Jann Horn wrote: > > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT... > > > > I guess a better place to put the new check would be directly after > > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this? > > Possibly a very silly suggestion but perhaps we should set > the pointer to NULL for the pcpu case and let it crash? It is already set to 0
On Thu, Jan 27, 2022 at 6:14 PM Jann Horn <jannh@google.com> wrote: > > On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote: > > > > > > I've run into a bug where dev_hold() was being called after > > > netdev_wait_allrefs(). But at that point, the device is already going > > > away, and dev_hold() can't stop that anymore. > > > > > > To make such problems easier to diagnose in the future: > > > > > > - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether > > > the net refcount has been elevated. If this is detected, WARN() and > > > leak the object (to prevent worse consequences from a > > > use-after-free). > > > - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero. > > > This signals to the generic refcount infrastructure that any attempt > > > to increment the refcount later is a bug. > > > > > > Signed-off-by: Jann Horn <jannh@google.com> > > > --- > > > net/core/dev.c | 18 +++++++++++++++++- > > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > index 1baab07820f6..f7916c0d226d 100644 > > > --- a/net/core/dev.c > > > +++ b/net/core/dev.c > > > @@ -9949,8 +9949,18 @@ void netdev_run_todo(void) > > > > > > netdev_wait_allrefs(dev); > > > > > > + /* Drop the netdev refcount (which should be 1 at this point) > > > + * to zero. If we're using the generic refcount code, this will > > > + * tell it that any dev_hold() after this point is a bug. > > > + */ > > > +#ifdef CONFIG_PCPU_DEV_REFCNT > > > + this_cpu_dec(*dev->pcpu_refcnt); > > > + BUG_ON(netdev_refcnt_read(dev) != 0); > > > +#else > > > + BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt)); > > > +#endif > > > + > > > /* paranoia */ > > > - BUG_ON(netdev_refcnt_read(dev) != 1); > > > BUG_ON(!list_empty(&dev->ptype_all)); > > > BUG_ON(!list_empty(&dev->ptype_specific)); > > > WARN_ON(rcu_access_pointer(dev->ip_ptr)); > > > @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev) > > > free_percpu(dev->xdp_bulkq); > > > dev->xdp_bulkq = NULL; > > > > > > + /* Recheck in case someone called dev_hold() between > > > + * netdev_wait_allrefs() and here. > > > + */ > > > > At this point, dev->pcpu_refcnt per-cpu data has been freed already > > (CONFIG_PCPU_DEV_REFCNT=y) > > > > So this should probably crash, or at least UAF ? > > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT... > > I guess a better place to put the new check would be directly after > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this? > > if (dev->reg_state == NETREG_UNREGISTERING) { > ASSERT_RTNL(); > dev->needs_free_netdev = true; > return; > } > > /* Recheck in case someone called dev_hold() between > * netdev_wait_allrefs() and here. > */ > if (WARN_ON(netdev_refcnt_read(dev) != 0)) > return; /* leak memory, otherwise we might get UAF */ > > netif_free_tx_queues(dev); > netif_free_rx_queues(dev); Maybe another solution would be to leverage the recent dev_hold_track(). We could add a dead boolean to 'struct ref_tracker_dir ' (dev->refcnt_tracker) diff --git a/include/linux/ref_tracker.h b/include/linux/ref_tracker.h index c11c9db5825cf933acf529c83db441a818135f29..d907759b2fa1dd6b2ef22f883d55963c410dc71b 100644 --- a/include/linux/ref_tracker.h +++ b/include/linux/ref_tracker.h @@ -12,6 +12,7 @@ struct ref_tracker_dir { spinlock_t lock; unsigned int quarantine_avail; refcount_t untracked; + bool dead; struct list_head list; /* List of active trackers */ struct list_head quarantine; /* List of dead trackers */ #endif @@ -24,6 +25,7 @@ static inline void ref_tracker_dir_init(struct ref_tracker_dir *dir, INIT_LIST_HEAD(&dir->list); INIT_LIST_HEAD(&dir->quarantine); spin_lock_init(&dir->lock); + dir->dead = false; dir->quarantine_avail = quarantine_count; refcount_set(&dir->untracked, 1); } diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c index a6789c0c626b0f68ad67c264cd19177a63fb82d2..cbc798e5b97674a389ea3bbf17d9bb39fbf63328 100644 --- a/lib/ref_tracker.c +++ b/lib/ref_tracker.c @@ -20,6 +20,7 @@ void ref_tracker_dir_exit(struct ref_tracker_dir *dir) unsigned long flags; bool leak = false; + dir->dead = true; spin_lock_irqsave(&dir->lock, flags); list_for_each_entry_safe(tracker, n, &dir->quarantine, head) { list_del(&tracker->head); @@ -72,6 +73,8 @@ int ref_tracker_alloc(struct ref_tracker_dir *dir, gfp_t gfp_mask = gfp; unsigned long flags; + WARN_ON_ONCE(dir->dead); + if (gfp & __GFP_DIRECT_RECLAIM) gfp_mask |= __GFP_NOFAIL; *trackerp = tracker = kzalloc(sizeof(*tracker), gfp_mask);
On Fri, Jan 28, 2022 at 3:23 AM Eric Dumazet <edumazet@google.com> wrote: > > On Thu, Jan 27, 2022 at 6:22 PM Jakub Kicinski <kuba@kernel.org> wrote: > > > > On Fri, 28 Jan 2022 03:14:14 +0100 Jann Horn wrote: > > > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT... > > > > > > I guess a better place to put the new check would be directly after > > > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this? > > > > Possibly a very silly suggestion but perhaps we should set > > the pointer to NULL for the pcpu case and let it crash? I like that idea... but this_cpu_dec()/this_cpu_inc() use GS-relative addressing, at least on X86-64, so NULL might make things worse, I think? /proc/kallsyms on my machine starts with: 0000000000000000 A fixed_percpu_data 0000000000000000 A __per_cpu_start 0000000000001000 A cpu_debug_store 0000000000002000 A irq_stack_backing_store 0000000000006000 A cpu_tss_rw 000000000000b000 A gdt_page 000000000000c000 A exception_stacks 0000000000010000 A entry_stack_storage 0000000000011000 A espfix_waddr So we'd probably need some different placeholder instead of NULL to actually crash... > It is already set to 0 I think he meant do it already in netdev_run_todo(), not just in free_netdev()?
On Thu, Jan 27, 2022 at 6:27 PM Jann Horn <jannh@google.com> wrote: > > I like that idea... but this_cpu_dec()/this_cpu_inc() use GS-relative > addressing, at least on X86-64, so NULL might make things worse, I > think? /proc/kallsyms on my machine starts with: > > 0000000000000000 A fixed_percpu_data > 0000000000000000 A __per_cpu_start > 0000000000001000 A cpu_debug_store > 0000000000002000 A irq_stack_backing_store > 0000000000006000 A cpu_tss_rw > 000000000000b000 A gdt_page > 000000000000c000 A exception_stacks > 0000000000010000 A entry_stack_storage > 0000000000011000 A espfix_waddr > > So we'd probably need some different placeholder instead of NULL to > actually crash... Orthogonal problem, maybe we should make sure the first page of per-cpu data is un-mapped.
On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet <edumazet@google.com> wrote: > On Thu, Jan 27, 2022 at 6:14 PM Jann Horn <jannh@google.com> wrote: > > On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote: > > > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote: > > > > > > > > I've run into a bug where dev_hold() was being called after > > > > netdev_wait_allrefs(). But at that point, the device is already going > > > > away, and dev_hold() can't stop that anymore. > > > > > > > > To make such problems easier to diagnose in the future: > > > > > > > > - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether > > > > the net refcount has been elevated. If this is detected, WARN() and > > > > leak the object (to prevent worse consequences from a > > > > use-after-free). > > > > - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero. > > > > This signals to the generic refcount infrastructure that any attempt > > > > to increment the refcount later is a bug. > > > > > > > > Signed-off-by: Jann Horn <jannh@google.com> > > > > --- > > > > net/core/dev.c | 18 +++++++++++++++++- > > > > 1 file changed, 17 insertions(+), 1 deletion(-) > > > > > > > > diff --git a/net/core/dev.c b/net/core/dev.c > > > > index 1baab07820f6..f7916c0d226d 100644 > > > > --- a/net/core/dev.c > > > > +++ b/net/core/dev.c > > > > @@ -9949,8 +9949,18 @@ void netdev_run_todo(void) > > > > > > > > netdev_wait_allrefs(dev); > > > > > > > > + /* Drop the netdev refcount (which should be 1 at this point) > > > > + * to zero. If we're using the generic refcount code, this will > > > > + * tell it that any dev_hold() after this point is a bug. > > > > + */ > > > > +#ifdef CONFIG_PCPU_DEV_REFCNT > > > > + this_cpu_dec(*dev->pcpu_refcnt); > > > > + BUG_ON(netdev_refcnt_read(dev) != 0); > > > > +#else > > > > + BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt)); > > > > +#endif > > > > + > > > > /* paranoia */ > > > > - BUG_ON(netdev_refcnt_read(dev) != 1); > > > > BUG_ON(!list_empty(&dev->ptype_all)); > > > > BUG_ON(!list_empty(&dev->ptype_specific)); > > > > WARN_ON(rcu_access_pointer(dev->ip_ptr)); > > > > @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev) > > > > free_percpu(dev->xdp_bulkq); > > > > dev->xdp_bulkq = NULL; > > > > > > > > + /* Recheck in case someone called dev_hold() between > > > > + * netdev_wait_allrefs() and here. > > > > + */ > > > > > > At this point, dev->pcpu_refcnt per-cpu data has been freed already > > > (CONFIG_PCPU_DEV_REFCNT=y) > > > > > > So this should probably crash, or at least UAF ? > > > > Oh. Whoops. That's what I get for only testing without CONFIG_PCPU_DEV_REFCNT... > > > > I guess a better place to put the new check would be directly after > > checking for "dev->reg_state == NETREG_UNREGISTERING"? Like this? > > > > if (dev->reg_state == NETREG_UNREGISTERING) { > > ASSERT_RTNL(); > > dev->needs_free_netdev = true; > > return; > > } > > > > /* Recheck in case someone called dev_hold() between > > * netdev_wait_allrefs() and here. > > */ > > if (WARN_ON(netdev_refcnt_read(dev) != 0)) > > return; /* leak memory, otherwise we might get UAF */ > > > > netif_free_tx_queues(dev); > > netif_free_rx_queues(dev); > > Maybe another solution would be to leverage the recent dev_hold_track(). > > We could add a dead boolean to 'struct ref_tracker_dir ' (dev->refcnt_tracker) > [...] > @@ -72,6 +73,8 @@ int ref_tracker_alloc(struct ref_tracker_dir *dir, > gfp_t gfp_mask = gfp; > unsigned long flags; > > + WARN_ON_ONCE(dir->dead); When someone is using NET_DEV_REFCNT_TRACKER for slow debugging, they should also be able to take the performance hit of CONFIG_PCPU_DEV_REFCNT and rely on the normal increment-from-zero detection of the generic refcount code, right? (Maybe NET_DEV_REFCNT_TRACKER should depend on !CONFIG_PCPU_DEV_REFCNT?) My intent with the extra check in free_netdev() was to get some limited detection for production systems that don't use NET_DEV_REFCNT_TRACKER.
On Thu, Jan 27, 2022 at 6:48 PM Jann Horn <jannh@google.com> wrote: > When someone is using NET_DEV_REFCNT_TRACKER for slow debugging, they > should also be able to take the performance hit of > CONFIG_PCPU_DEV_REFCNT and rely on the normal increment-from-zero > detection of the generic refcount code, right? (Maybe > NET_DEV_REFCNT_TRACKER should depend on !CONFIG_PCPU_DEV_REFCNT?) NET_DEV_REFCNT_TRACKER is not slow, I think it has neglectable cost really. (I could not see any difference in my tests) Also getting a trap at the exact moment of the buggy dev_hold_track() is somewhat better than after-fact checking. In your case, linkwatch_add_event() already uses dev_hold_track() so my proposal would detect the issue right away. > > My intent with the extra check in free_netdev() was to get some > limited detection for production systems that don't use > NET_DEV_REFCNT_TRACKER. Understood
On Fri, Jan 28, 2022 at 3:53 AM Eric Dumazet <edumazet@google.com> wrote: > On Thu, Jan 27, 2022 at 6:48 PM Jann Horn <jannh@google.com> wrote: > > > When someone is using NET_DEV_REFCNT_TRACKER for slow debugging, they > > should also be able to take the performance hit of > > CONFIG_PCPU_DEV_REFCNT and rely on the normal increment-from-zero > > detection of the generic refcount code, right? (Maybe > > NET_DEV_REFCNT_TRACKER should depend on !CONFIG_PCPU_DEV_REFCNT?) > > NET_DEV_REFCNT_TRACKER is not slow, I think it has neglectable cost really. > (I could not see any difference in my tests) Ah, sorry. I misread the Kconfig text.
On Fri, Jan 28, 2022 at 3:25 AM Eric Dumazet <edumazet@google.com> wrote: > On Thu, Jan 27, 2022 at 6:14 PM Jann Horn <jannh@google.com> wrote: > > On Fri, Jan 28, 2022 at 3:09 AM Eric Dumazet <edumazet@google.com> wrote: > > > On Thu, Jan 27, 2022 at 5:43 PM Jann Horn <jannh@google.com> wrote: > > > > I've run into a bug where dev_hold() was being called after > > > > netdev_wait_allrefs(). But at that point, the device is already going > > > > away, and dev_hold() can't stop that anymore. > > > > > > > > To make such problems easier to diagnose in the future: > > > > > > > > - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether > > > > the net refcount has been elevated. If this is detected, WARN() and > > > > leak the object (to prevent worse consequences from a > > > > use-after-free). > > > > - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero. > > > > This signals to the generic refcount infrastructure that any attempt > > > > to increment the refcount later is a bug. [...] > > if (dev->reg_state == NETREG_UNREGISTERING) { > > ASSERT_RTNL(); > > dev->needs_free_netdev = true; > > return; > > } > > > > /* Recheck in case someone called dev_hold() between > > * netdev_wait_allrefs() and here. > > */ > > if (WARN_ON(netdev_refcnt_read(dev) != 0)) > > return; /* leak memory, otherwise we might get UAF */ > > > > netif_free_tx_queues(dev); > > netif_free_rx_queues(dev); > > Maybe another solution would be to leverage the recent dev_hold_track(). > > We could add a dead boolean to 'struct ref_tracker_dir ' (dev->refcnt_tracker) Hmm... actually, what even are the semantics of dev_hold()? Normal refcounts have the property that if you hold one reference, you're always allowed to add another reference. But from what I can tell, something like this: struct net_device *dev = dev_get_by_name(net, name); dev_hold(dev); dev_put(dev); dev_put(dev); would be buggy using the current CONFIG_PCPU_DEV_REFCNT implementation. Basically, if dev_hold() runs at the same time as netdev_refcnt_read(), it's a bug because netdev_refcnt_read() is non-atomic, and we could get the following race: task B: starts netdev_refcnt_read() task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 0) task A, on CPU 0: dev_hold(dev) increments *per_cpu_ptr(dev->pcpu_refcnt, 0) task A: migrates from CPU 0 to CPU 1 task A, on CPU 1: dev_put(dev) decrements *per_cpu_ptr(dev->pcpu_refcnt, 1) task B: reads *per_cpu_ptr(dev->pcpu_refcnt, 1) which would make task B miss one outstanding reference. (This is why the generic percpu refcounting code in lib/percpu-refcount.c has logic for switching the refcount to atomic mode with an RCU grace period.) If these are the intended semantics for dev_hold(), then I guess your approach of adding a new boolean flag somewhere is the right one - but we should be setting that flag *before* waiting for the refcount to drop to 1. Though maybe it shouldn't be in ref-tracker, since this is a peculiarity of the hand-rolled netdev refcount... Are these the intended semantics (and I should rewrite the patch to also catch dev_hold() racing with netdev_wait_allrefs()), or is this unintended (and the netdev refcount should be replaced)? This should probably be documented...
diff --git a/net/core/dev.c b/net/core/dev.c index 1baab07820f6..f7916c0d226d 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -9949,8 +9949,18 @@ void netdev_run_todo(void) netdev_wait_allrefs(dev); + /* Drop the netdev refcount (which should be 1 at this point) + * to zero. If we're using the generic refcount code, this will + * tell it that any dev_hold() after this point is a bug. + */ +#ifdef CONFIG_PCPU_DEV_REFCNT + this_cpu_dec(*dev->pcpu_refcnt); + BUG_ON(netdev_refcnt_read(dev) != 0); +#else + BUG_ON(!refcount_dec_and_test(&dev->dev_refcnt)); +#endif + /* paranoia */ - BUG_ON(netdev_refcnt_read(dev) != 1); BUG_ON(!list_empty(&dev->ptype_all)); BUG_ON(!list_empty(&dev->ptype_specific)); WARN_ON(rcu_access_pointer(dev->ip_ptr)); @@ -10293,6 +10303,12 @@ void free_netdev(struct net_device *dev) free_percpu(dev->xdp_bulkq); dev->xdp_bulkq = NULL; + /* Recheck in case someone called dev_hold() between + * netdev_wait_allrefs() and here. + */ + if (WARN_ON(netdev_refcnt_read(dev) != 0)) + return; /* leak memory, otherwise we might get UAF */ + /* Compatibility with error handling in drivers */ if (dev->reg_state == NETREG_UNINITIALIZED) { netdev_freemem(dev);
I've run into a bug where dev_hold() was being called after netdev_wait_allrefs(). But at that point, the device is already going away, and dev_hold() can't stop that anymore. To make such problems easier to diagnose in the future: - For CONFIG_PCPU_DEV_REFCNT builds: Recheck in free_netdev() whether the net refcount has been elevated. If this is detected, WARN() and leak the object (to prevent worse consequences from a use-after-free). - For builds without CONFIG_PCPU_DEV_REFCNT: Set the refcount to zero. This signals to the generic refcount infrastructure that any attempt to increment the refcount later is a bug. Signed-off-by: Jann Horn <jannh@google.com> --- net/core/dev.c | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) base-commit: 23a46422c56144939c091c76cf389aa863ce9c18