Message ID | 20211117192031.3906502-2-eric.dumazet@gmail.com (mailing list archive)
---|---
State | RFC
Series | refcount: add tracking infrastructure
On Wed, 17 Nov 2021 11:20:30 -0800 Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
>
> It can be hard to track where references are taken and released.
>
> In networking, we have annoying issues at device dismantle,
> and we had various proposals to ease root-causing them.
>
> This patch adds new infrastructure pairing refcount increases
> and decreases. This will make code self-documenting, because
> programmers will have to pair each increment with its decrement.
>
> This is controlled by CONFIG_REF_TRACKER, which can be selected
> by users of this feature.
>
> This adds both cpu and memory costs, and thus should be reserved
> for debug kernel builds, or be enabled on demand with a static key.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Looks great, this is what I had in mind when I said:

| In the future we can extend this structure to also catch those
| who fail to release the ref on unregistering notification.

I realized today we can get quite a lot of coverage by just plugging
in the object debug infra.

The main differences I see:
 - do we ever want to use this in prod? If not, why allocate the
   tracker itself dynamically? The double-pointer interface seems
   harder to compile out completely
 - whether one stored netdev ptr can hold multiple refs
 - do we want to wrap the pointer itself or have the "tracker" object
   be a separate entity
 - do we want to catch "use after free" when the ref is accessed after
   it was already released

No strong preference either way.
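To make the pairing concrete, here is a minimal caller-side sketch
against the API this patch adds (shown at the bottom of this page);
the "listener" container, its helpers, and the GFP choice are
hypothetical, not code from the series:

	#include <linux/netdevice.h>
	#include <linux/ref_tracker.h>

	struct listener {
		struct net_device *dev;
		struct ref_tracker *dev_tracker; /* ties this hold to its put */
	};

	static int listener_attach(struct listener *l, struct net_device *dev,
				   struct ref_tracker_dir *dir)
	{
		dev_hold(dev);
		l->dev = dev;
		/* record the stack that took this particular reference */
		return ref_tracker_alloc(dir, &l->dev_tracker, GFP_KERNEL);
	}

	static void listener_detach(struct listener *l, struct ref_tracker_dir *dir)
	{
		/* releasing through the same tracker proves the pairing */
		ref_tracker_free(dir, &l->dev_tracker);
		dev_put(l->dev);
		l->dev = NULL;
	}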
On Wed, Nov 17, 2021 at 12:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Wed, 17 Nov 2021 11:20:30 -0800 Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > It can be hard to track where references are taken and released.
> >
> > In networking, we have annoying issues at device dismantle,
> > and we had various proposals to ease root-causing them.
> >
> > This patch adds new infrastructure pairing refcount increases
> > and decreases. This will make code self-documenting, because
> > programmers will have to pair each increment with its decrement.
> >
> > This is controlled by CONFIG_REF_TRACKER, which can be selected
> > by users of this feature.
> >
> > This adds both cpu and memory costs, and thus should be reserved
> > for debug kernel builds, or be enabled on demand with a static key.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Looks great, this is what I had in mind when I said:
>
> | In the future we can extend this structure to also catch those
> | who fail to release the ref on unregistering notification.
>
> I realized today we can get quite a lot of coverage by just plugging
> in the object debug infra.
>
> The main differences I see:
>  - do we ever want to use this in prod? If not, why allocate the
>    tracker itself dynamically? The double-pointer interface seems
>    harder to compile out completely

I think that maintaining the tracking state in separate storage would
detect cases where the object has been freed, without the help of KASAN.

>  - whether one stored netdev ptr can hold multiple refs

For the same stack depot then?

Problem is that at the time of dev_hold(), we do not know if there is
one associated dev_put() or multiple ones (with different stack depots).

>  - do we want to wrap the pointer itself or have the "tracker" object
>    be a separate entity
>  - do we want to catch "use after free" when the ref is accessed after
>    it was already released
>
> No strong preference either way.

BTW my current suspicion about the reported leaks is in
rt6_uncached_list_flush_dev().

I was considering something like:

diff --git a/net/ipv6/route.c b/net/ipv6/route.c
index 5e8f2f15607db7e6589b8bdb984e62512ad30589..233931b7c547d852ed3adeaa15f0a48f437b6596 100644
--- a/net/ipv6/route.c
+++ b/net/ipv6/route.c
@@ -163,9 +163,6 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
 	struct net_device *loopback_dev = net->loopback_dev;
 	int cpu;
 
-	if (dev == loopback_dev)
-		return;
-
 	for_each_possible_cpu(cpu) {
 		struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
 		struct rt6_info *rt;
@@ -175,7 +172,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
 			struct inet6_dev *rt_idev = rt->rt6i_idev;
 			struct net_device *rt_dev = rt->dst.dev;
 
-			if (rt_idev->dev == dev) {
+			if (rt_idev->dev == dev && dev != loopback_dev) {
 				rt->rt6i_idev = in6_dev_get(loopback_dev);
 				in6_dev_put(rt_idev);
 			}
On Wed, 17 Nov 2021 12:16:15 -0800 Eric Dumazet wrote:
> On Wed, Nov 17, 2021 at 12:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > Looks great, this is what I had in mind when I said:
> >
> > | In the future we can extend this structure to also catch those
> > | who fail to release the ref on unregistering notification.
> >
> > I realized today we can get quite a lot of coverage by just plugging
> > in the object debug infra.
> >
> > The main differences I see:
> >  - do we ever want to use this in prod? If not, why allocate the
> >    tracker itself dynamically? The double-pointer interface seems
> >    harder to compile out completely
>
> I think that maintaining the tracking state in separate storage would
> detect cases where the object has been freed, without the help of KASAN.

Makes sense, I guess we can hang more of the information off a
secondary object?

Maybe I'm missing a trick on how to make the feature consume no space
when disabled via Kconfig.

> >  - whether one stored netdev ptr can hold multiple refs
>
> For the same stack depot then?

Not necessarily.

> Problem is that at the time of dev_hold(), we do not know if there is
> one associated dev_put() or multiple ones (with different stack depots).

Ack. My thinking was to hold all stacks until the tracker is completely
drained of refs. We'd have to collect both hold and put stacks in that
case, and if a ref leak happens, try to match them up manually later
(manually == human).

But if we can get away without allowing multiple refs with one tracker,
that makes life easier, and is probably a cleaner API, anyway.

> >  - do we want to wrap the pointer itself or have the "tracker" object
> >    be a separate entity
> >  - do we want to catch "use after free" when the ref is accessed after
> >    it was already released
> >
> > No strong preference either way.
>
> BTW my current suspicion about the reported leaks is in
> rt6_uncached_list_flush_dev().
>
> I was considering something like:
>
> diff --git a/net/ipv6/route.c b/net/ipv6/route.c
> index 5e8f2f15607db7e6589b8bdb984e62512ad30589..233931b7c547d852ed3adeaa15f0a48f437b6596 100644
> --- a/net/ipv6/route.c
> +++ b/net/ipv6/route.c
> @@ -163,9 +163,6 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
>  	struct net_device *loopback_dev = net->loopback_dev;
>  	int cpu;
>  
> -	if (dev == loopback_dev)
> -		return;
> -
>  	for_each_possible_cpu(cpu) {
>  		struct uncached_list *ul = per_cpu_ptr(&rt6_uncached_list, cpu);
>  		struct rt6_info *rt;
> @@ -175,7 +172,7 @@ static void rt6_uncached_list_flush_dev(struct net *net, struct net_device *dev)
>  			struct inet6_dev *rt_idev = rt->rt6i_idev;
>  			struct net_device *rt_dev = rt->dst.dev;
>  
> -			if (rt_idev->dev == dev) {
> +			if (rt_idev->dev == dev && dev != loopback_dev) {
>  				rt->rt6i_idev = in6_dev_get(loopback_dev);
>  				in6_dev_put(rt_idev);
>  			}

Interesting.
On 11/17/21 12:47 PM, Jakub Kicinski wrote:
> On Wed, 17 Nov 2021 12:16:15 -0800 Eric Dumazet wrote:
>> On Wed, Nov 17, 2021 at 12:03 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>> Looks great, this is what I had in mind when I said:
>>>
>>> | In the future we can extend this structure to also catch those
>>> | who fail to release the ref on unregistering notification.
>>>
>>> I realized today we can get quite a lot of coverage by just plugging
>>> in the object debug infra.
>>>
>>> The main differences I see:
>>>  - do we ever want to use this in prod? If not, why allocate the
>>>    tracker itself dynamically? The double-pointer interface seems
>>>    harder to compile out completely
>>
>> I think that maintaining the tracking state in separate storage would
>> detect cases where the object has been freed, without the help of KASAN.
>
> Makes sense, I guess we can hang more of the information off a
> secondary object?
>
> Maybe I'm missing a trick on how to make the feature consume no space
> when disabled via Kconfig.

If not enabled in Kconfig, the structures are empty, so they consume no
space. Basically this should be a nop.
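A sketch of why the disabled configuration is (nearly) free, assuming
GNU C, where a struct with no members has sizeof() == 0; the container
type below is made up. Note that a struct ref_tracker * handle itself
stays pointer-sized unless callers also hide it behind the ifdef,
which is the "harder to compile out completely" point above:

	#include <linux/ref_tracker.h>

	struct tracked_thing {				/* hypothetical container */
		struct ref_tracker_dir dir;	/* 0 bytes with CONFIG_REF_TRACKER=n */
		struct ref_tracker *tracker;	/* still one pointer, even when off */
	};

	static void tracked_thing_init(struct tracked_thing *t)
	{
		/* empty static inline when disabled: compiles to nothing */
		ref_tracker_dir_init(&t->dir, 8);
	}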
On Wed, 17 Nov 2021 14:43:24 -0800 Eric Dumazet wrote:
> On 11/17/21 12:47 PM, Jakub Kicinski wrote:
> > On Wed, 17 Nov 2021 12:16:15 -0800 Eric Dumazet wrote:
> >> I think that maintaining the tracking state in separate storage would
> >> detect cases where the object has been freed, without the help of KASAN.
> >
> > Makes sense, I guess we can hang more of the information off a
> > secondary object?
> >
> > Maybe I'm missing a trick on how to make the feature consume no space
> > when disabled via Kconfig.
>
> If not enabled in Kconfig, the structures are empty, so they consume no
> space. Basically this should be a nop.

Right, probably not worth going back and forth, example use will
clarify this.

I feel like the two approaches are somewhat complementary; object debug
can help us pinpoint where a ref got freed / lost. Could be useful if
there are many release paths for the same struct.

How do you feel about the struct netdev_ref wrapper I made? Do you
prefer to keep the tracking independent, or can we provide the sort of
API I had in mind as well as yours:

void netdev_hold(struct netdev_ref *ref, struct net_device *dev)
void netdev_put(struct netdev_ref *ref)
struct net_device *netdev_ref_ptr(const struct netdev_ref *ref)

(doing both your tracking and object debug behind the scenes)
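A sketch of what such a wrapper could look like when layered on the
ref_tracker API; struct netdev_ref and the per-device dir field
(dev->refcnt_tracker) are assumptions here, not code from this series:

	#include <linux/netdevice.h>
	#include <linux/ref_tracker.h>

	struct netdev_ref {
		struct net_device *dev;
		struct ref_tracker *tracker;	/* pointer and ref travel together */
	};

	static inline void netdev_hold(struct netdev_ref *ref,
				       struct net_device *dev)
	{
		ref->dev = dev;
		dev_hold(dev);
		/* record the stack that took this reference */
		ref_tracker_alloc(&dev->refcnt_tracker, &ref->tracker, GFP_ATOMIC);
	}

	static inline void netdev_put(struct netdev_ref *ref)
	{
		ref_tracker_free(&ref->dev->refcnt_tracker, &ref->tracker);
		dev_put(ref->dev);
		ref->dev = NULL;
	}

	static inline struct net_device *netdev_ref_ptr(const struct netdev_ref *ref)
	{
		return ref->dev;
	}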
Hi Eric, Jakub,

How strongly do you want to make this work w/o KASAN?

I am asking because KASAN will already memorize alloc/free stacks for
every heap object (+ pids + 2 aux stacks with kasan_record_aux_stack()).
So basically we just need to alloc a struct list_head, and won't need
quarantine/quarantine_avail in ref_tracker_dir.

If there are some refcount bugs, they may be due to a previous
use-after-free, so debugging a refcount bug w/o KASAN may be a waste
of time.
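A rough sketch of the simplification being suggested here, hypothetical
and not part of this patch: with KASAN compiled in, alloc/free stacks
for each tracker object are already recorded by KASAN itself, so the
tracker could shrink to a bare list node, the quarantine lists could go
away, and extra events could be attached via aux stacks:

	#include <linux/kasan.h>
	#include <linux/list.h>

	struct ref_tracker {
		struct list_head head;	/* KASAN keeps the alloc/free stacks */
	};

	static void ref_tracker_note_put(struct ref_tracker *tracker)
	{
		/* attach the caller's stack to the object as a KASAN aux stack */
		kasan_record_aux_stack(tracker);
	}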
Hi Eric,

Nice! Especially ref_tracker_dir_print() in netdev_wait_allrefs().

> +	*trackerp = tracker = kzalloc(sizeof(*tracker), gfp);

This may benefit from __GFP_NOFAIL. syzkaller will use fault injection
to fail this allocation, and I think that will do more harm than good.

We could also note this condition in dir, along the lines of:

	if (!tracker) {
		dir->failed = true;

to print on any errors, and to check in ref_tracker_free():

int ref_tracker_free(struct ref_tracker_dir *dir,
		     struct ref_tracker **trackerp)
{
	...
	if (!tracker) {
		WARN_ON(!dir->failed);
		return -EEXIST;
	}

This would be a bug, right?

Or:

	*trackerp = tracker = kzalloc(sizeof(*tracker), gfp);
	if (!tracker) {
		*trackerp = TRACKERP_ALLOC_FAILED;
		return -ENOMEM;
	}

and then check TRACKERP_ALLOC_FAILED in ref_tracker_free().
dev_hold_track() ignores the return value, so it would be useful to
note this condition.

> +	if (tracker->dead) {
> +		pr_err("reference already released.\n");

This and other custom prints won't be detected as bugs by syzkaller and
other testing systems; they only detect the standard BUG/WARNING
reports. Please use those.

ref_tracker_free() uses unnecessarily long critical sections. I
understand this is debugging code, but frequently debugging code is
written so pessimistically that nobody uses it. If we enable this on
syzbot, it will also slow down all fuzzing. I think with just a small
code shuffling the critical sections can be significantly reduced:

	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
	tracker->free_stack_handle = stack_depot_save(entries, nr_entries, GFP_ATOMIC);

	spin_lock_irqsave(&dir->lock, flags);
	if (tracker->dead)
		...
	tracker->dead = true;
	list_move_tail(&tracker->head, &dir->quarantine);
	if (!dir->quarantine_avail) {
		tracker = list_first_entry(&dir->quarantine,
					   struct ref_tracker, head);
		list_del(&tracker->head);
	} else {
		dir->quarantine_avail--;
		tracker = NULL;
	}
	spin_unlock_irqrestore(&dir->lock, flags);
	kfree(tracker);

> +#define REF_TRACKER_STACK_ENTRIES 16
> +	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
> +	tracker->alloc_stack_handle = stack_depot_save(entries, nr_entries, gfp);

The saved stacks can be longer, because they are de-duped. But stacks
inserted into the stack depot need to be trimmed with
filter_irq_stacks(). It seems that almost all current users got this
wrong. We are considering moving filter_irq_stacks() into
stack_depot_save(), but it's not done yet.
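For reference, the trimming described above would be a one-line
addition in ref_tracker_alloc() / ref_tracker_free();
filter_irq_stacks() caps the trace at the IRQ entry point, so that
whatever context happened to be interrupted does not make every stack
unique in the depot (a sketch):

	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
	nr_entries = filter_irq_stacks(entries, nr_entries);
	tracker->alloc_stack_handle = stack_depot_save(entries, nr_entries, gfp);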
On Tue, 30 Nov 2021 10:09:52 +0100 Dmitry Vyukov wrote:
> Hi Eric, Jakub,
>
> How strongly do you want to make this work w/o KASAN?
>
> I am asking because KASAN will already memorize alloc/free stacks for
> every heap object (+ pids + 2 aux stacks with kasan_record_aux_stack()).
> So basically we just need to alloc a struct list_head, and won't need
> quarantine/quarantine_avail in ref_tracker_dir.
>
> If there are some refcount bugs, they may be due to a previous
> use-after-free, so debugging a refcount bug w/o KASAN may be a waste
> of time.

I don't mind, I was primarily targeting syzbot instances, which will
have KASAN enabled AFAIU.
On Tue, Nov 30, 2021 at 1:09 AM Dmitry Vyukov <dvyukov@google.com> wrote:
>
> Hi Eric, Jakub,
>
> How strongly do you want to make this work w/o KASAN?
>
> I am asking because KASAN will already memorize alloc/free stacks for
> every heap object (+ pids + 2 aux stacks with kasan_record_aux_stack()).
> So basically we just need to alloc a struct list_head, and won't need
> quarantine/quarantine_avail in ref_tracker_dir.
>
> If there are some refcount bugs, they may be due to a previous
> use-after-free, so debugging a refcount bug w/o KASAN may be a waste
> of time.

No strong opinion; we could have the quarantine stuff enabled only if
KASAN is not compiled in.

I was trying to make something that could be used even in a production
environment, for seldom-modified refcounts.

As this tracking is optional, we do not have to use it in very small
sections of code, where the inc/dec happen in obviously correct,
short-lived pairs.
On Tue, 30 Nov 2021 at 16:08, Eric Dumazet <edumazet@google.com> wrote:
> > Hi Eric, Jakub,
> >
> > How strongly do you want to make this work w/o KASAN?
> >
> > I am asking because KASAN will already memorize alloc/free stacks for
> > every heap object (+ pids + 2 aux stacks with kasan_record_aux_stack()).
> > So basically we just need to alloc a struct list_head, and won't need
> > quarantine/quarantine_avail in ref_tracker_dir.
> >
> > If there are some refcount bugs, they may be due to a previous
> > use-after-free, so debugging a refcount bug w/o KASAN may be a waste
> > of time.
>
> No strong opinion; we could have the quarantine stuff enabled only if
> KASAN is not compiled in.
>
> I was trying to make something that could be used even in a production
> environment, for seldom-modified refcounts.
>
> As this tracking is optional, we do not have to use it in very small
> sections of code, where the inc/dec happen in obviously correct,
> short-lived pairs.

If it won't be used on very frequent paths, then it probably does not
matter much for syzbot either, and the additional ifdefs are not worth
it. Let's go with your current version, then.
diff --git a/include/linux/ref_tracker.h b/include/linux/ref_tracker.h
new file mode 100644
index 0000000000000000000000000000000000000000..1a2a3696682d40b38f9f1dd2b14663716e37d9d3
--- /dev/null
+++ b/include/linux/ref_tracker.h
@@ -0,0 +1,78 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#ifndef _LINUX_REF_TRACKER_H
+#define _LINUX_REF_TRACKER_H
+#include <linux/types.h>
+#include <linux/spinlock.h>
+#include <linux/stackdepot.h>
+
+struct ref_tracker {
+#ifdef CONFIG_REF_TRACKER
+	struct list_head	head;	/* anchor into dir->list or dir->quarantine */
+	bool			dead;
+	depot_stack_handle_t	alloc_stack_handle;
+	depot_stack_handle_t	free_stack_handle;
+#endif
+};
+
+struct ref_tracker_dir {
+#ifdef CONFIG_REF_TRACKER
+	spinlock_t		lock;
+	unsigned int		quarantine_avail;
+	struct list_head	list;		/* List of active trackers */
+	struct list_head	quarantine;	/* List of dead trackers */
+#endif
+};
+
+#ifdef CONFIG_REF_TRACKER
+static inline void ref_tracker_dir_init(struct ref_tracker_dir *dir,
+					unsigned int quarantine_count)
+{
+	INIT_LIST_HEAD(&dir->list);
+	INIT_LIST_HEAD(&dir->quarantine);
+	spin_lock_init(&dir->lock);
+	dir->quarantine_avail = quarantine_count;
+}
+
+void ref_tracker_dir_exit(struct ref_tracker_dir *dir);
+
+void ref_tracker_dir_print(struct ref_tracker_dir *dir,
+			   unsigned int display_limit);
+
+int ref_tracker_alloc(struct ref_tracker_dir *dir,
+		      struct ref_tracker **trackerp, gfp_t gfp);
+
+int ref_tracker_free(struct ref_tracker_dir *dir,
+		     struct ref_tracker **trackerp);
+
+#else /* CONFIG_REF_TRACKER */
+
+static inline void ref_tracker_dir_init(struct ref_tracker_dir *dir,
+					unsigned int quarantine_count)
+{
+}
+
+static inline void ref_tracker_dir_exit(struct ref_tracker_dir *dir)
+{
+}
+
+static inline void ref_tracker_dir_print(struct ref_tracker_dir *dir,
+					 unsigned int display_limit)
+{
+}
+
+static inline int ref_tracker_alloc(struct ref_tracker_dir *dir,
+				    struct ref_tracker **trackerp,
+				    gfp_t gfp)
+{
+	return 0;
+}
+
+static inline int ref_tracker_free(struct ref_tracker_dir *dir,
+				   struct ref_tracker **trackerp)
+{
+	return 0;
+}
+
+#endif
+
+#endif /* _LINUX_REF_TRACKER_H */
diff --git a/lib/Kconfig b/lib/Kconfig
index 5e7165e6a346c9bec878b78c8c8c3d175fc98dfd..d01be8e9593992a7d94a46bd1716460bc33c3ae1 100644
--- a/lib/Kconfig
+++ b/lib/Kconfig
@@ -680,6 +680,10 @@ config STACK_HASH_ORDER
 	  Select the hash size as a power of 2 for the stackdepot hash table.
 	  Choose a lower value to reduce the memory impact.
 
+config REF_TRACKER
+	bool
+	select STACKDEPOT
+
 config SBITMAP
 	bool
 
diff --git a/lib/Makefile b/lib/Makefile
index 364c23f1557816f73aebd8304c01224a4846ac6c..c1fd9243ddb9cc1ac5252d7eb8009f9290782c4a 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -270,6 +270,8 @@ obj-$(CONFIG_STACKDEPOT) += stackdepot.o
 KASAN_SANITIZE_stackdepot.o := n
 KCOV_INSTRUMENT_stackdepot.o := n
 
+obj-$(CONFIG_REF_TRACKER) += ref_tracker.o
+
 libfdt_files = fdt.o fdt_ro.o fdt_wip.o fdt_rw.o fdt_sw.o fdt_strerror.o \
 	       fdt_empty_tree.o fdt_addresses.o
 
 $(foreach file, $(libfdt_files), \
diff --git a/lib/ref_tracker.c b/lib/ref_tracker.c
new file mode 100644
index 0000000000000000000000000000000000000000..e907c58c31ed49719e31c6e46abd1715d9884924
--- /dev/null
+++ b/lib/ref_tracker.c
@@ -0,0 +1,116 @@
+// SPDX-License-Identifier: GPL-2.0-or-later
+#include <linux/export.h>
+#include <linux/ref_tracker.h>
+#include <linux/slab.h>
+#include <linux/stacktrace.h>
+
+#define REF_TRACKER_STACK_ENTRIES 16
+
+void ref_tracker_dir_exit(struct ref_tracker_dir *dir)
+{
+	struct ref_tracker *tracker, *n;
+	unsigned long flags;
+
+	spin_lock_irqsave(&dir->lock, flags);
+	list_for_each_entry_safe(tracker, n, &dir->quarantine, head) {
+		list_del(&tracker->head);
+		kfree(tracker);
+		dir->quarantine_avail++;
+	}
+	list_for_each_entry_safe(tracker, n, &dir->list, head) {
+		pr_err("leaked reference.\n");
+		if (tracker->alloc_stack_handle)
+			stack_depot_print(tracker->alloc_stack_handle);
+		list_del(&tracker->head);
+		kfree(tracker);
+	}
+	spin_unlock_irqrestore(&dir->lock, flags);
+}
+EXPORT_SYMBOL(ref_tracker_dir_exit);
+
+void ref_tracker_dir_print(struct ref_tracker_dir *dir,
+			   unsigned int display_limit)
+{
+	struct ref_tracker *tracker;
+	unsigned long flags;
+	unsigned int i = 0;
+
+	spin_lock_irqsave(&dir->lock, flags);
+	list_for_each_entry(tracker, &dir->list, head) {
+		tracker->dead = true;
+		if (i < display_limit) {
+			pr_err("leaked reference.\n");
+			if (tracker->alloc_stack_handle)
+				stack_depot_print(tracker->alloc_stack_handle);
+		}
+		i++;
+	}
+	spin_unlock_irqrestore(&dir->lock, flags);
+}
+EXPORT_SYMBOL(ref_tracker_dir_print);
+
+int ref_tracker_alloc(struct ref_tracker_dir *dir,
+		      struct ref_tracker **trackerp,
+		      gfp_t gfp)
+{
+	unsigned long entries[REF_TRACKER_STACK_ENTRIES];
+	struct ref_tracker *tracker;
+	unsigned int nr_entries;
+	unsigned long flags;
+
+	*trackerp = tracker = kzalloc(sizeof(*tracker), gfp);
+	if (!tracker) {
+		pr_err_once("memory allocation failure, unreliable refcount tracker.\n");
+		return -ENOMEM;
+	}
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
+	tracker->alloc_stack_handle = stack_depot_save(entries, nr_entries, gfp);
+
+	spin_lock_irqsave(&dir->lock, flags);
+	list_add(&tracker->head, &dir->list);
+	spin_unlock_irqrestore(&dir->lock, flags);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ref_tracker_alloc);
+
+int ref_tracker_free(struct ref_tracker_dir *dir,
+		     struct ref_tracker **trackerp)
+{
+	unsigned long entries[REF_TRACKER_STACK_ENTRIES];
+	struct ref_tracker *tracker = *trackerp;
+	unsigned int nr_entries;
+	unsigned long flags;
+
+	if (!tracker)
+		return -EEXIST;
+	spin_lock_irqsave(&dir->lock, flags);
+	if (tracker->dead) {
+		pr_err("reference already released.\n");
+		if (tracker->alloc_stack_handle) {
+			pr_err("allocated in:\n");
+			stack_depot_print(tracker->alloc_stack_handle);
+		}
+		if (tracker->free_stack_handle) {
+			pr_err("freed in:\n");
+			stack_depot_print(tracker->free_stack_handle);
+		}
+		spin_unlock_irqrestore(&dir->lock, flags);
+		return -EINVAL;
+	}
+	tracker->dead = true;
+
+	nr_entries = stack_trace_save(entries, ARRAY_SIZE(entries), 1);
+	tracker->free_stack_handle = stack_depot_save(entries, nr_entries, GFP_ATOMIC);
+
+	list_move_tail(&tracker->head, &dir->quarantine);
+	if (!dir->quarantine_avail) {
+		tracker = list_first_entry(&dir->quarantine, struct ref_tracker, head);
+		list_del(&tracker->head);
+		kfree(tracker);
+	} else {
+		dir->quarantine_avail--;
+	}
+	spin_unlock_irqrestore(&dir->lock, flags);
+	return 0;
+}
+EXPORT_SYMBOL_GPL(ref_tracker_free);
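For context, a sketch of how the dev_hold_track() / dev_put_track()
helpers mentioned in the review might wrap this API in the networking
patches of the series; the per-device dir field (dev->refcnt_tracker)
is an assumption here, not something this patch adds:

	#include <linux/netdevice.h>
	#include <linux/ref_tracker.h>

	static inline void dev_hold_track(struct net_device *dev,
					  struct ref_tracker **trackerp,
					  gfp_t gfp)
	{
		if (dev) {
			dev_hold(dev);
			/* best effort: on failure *trackerp is left NULL */
			ref_tracker_alloc(&dev->refcnt_tracker, trackerp, gfp);
		}
	}

	static inline void dev_put_track(struct net_device *dev,
					 struct ref_tracker **trackerp)
	{
		if (dev) {
			/* pairs this put with the hold recorded above */
			ref_tracker_free(&dev->refcnt_tracker, trackerp);
			dev_put(dev);
		}
	}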