Message ID | 20240803133608.2124-1-chenqiwu@xiaomi.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: kfence: print the age time for alloacted objectes to trace memleak | expand |
On Sat, 3 Aug 2024 at 15:36, Qiwu Chen <qiwuchen55@gmail.com> wrote: > > From: "qiwu.chen" <qiwu.chen@transsion.com> > > For a convienince of tracing slab object leak, print the age time for typo: convenience What do you mean by "object leak"? From what I see the additional info is only printed on out-of-bounds access. Or do you mean when you inspect /sys/kernel/debug/kfence/objects? If so, that information would be useful in the commit message. However, to detect leaks there are better tools than KFENCE. Have you tried KMEMLEAK? KFENCE is really not a good choice to manually look for old objects, which themselves are sampled, to find leaks. Have you been able to successfully debug a leak this way? > alloacted objectes in kfence_print_stack(). typo: allocated objects > Signed-off-by: qiwu.chen <qiwu.chen@transsion.com> > --- > mm/kfence/report.c | 14 ++++++++++++-- > 1 file changed, 12 insertions(+), 2 deletions(-) > > diff --git a/mm/kfence/report.c b/mm/kfence/report.c > index c509aed326ce..44c3f82b25a8 100644 > --- a/mm/kfence/report.c > +++ b/mm/kfence/report.c > @@ -16,6 +16,7 @@ > #include <linux/sprintf.h> > #include <linux/stacktrace.h> > #include <linux/string.h> > +#include <linux/sched/clock.h> > #include <trace/events/error_report.h> > > #include <asm/kfence.h> > @@ -110,9 +111,18 @@ static void kfence_print_stack(struct seq_file *seq, const struct kfence_metadat > unsigned long rem_nsec = do_div(ts_sec, NSEC_PER_SEC); > > /* Timestamp matches printk timestamp format. */ > - seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", > + if (meta->state == KFENCE_OBJECT_ALLOCATED) { In principle, the additonal info is convenient, but I'd like to generalize if possible. > + u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec; > + unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC); > + > + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (age: %lu.%06lus):\n", I've found myself trying to figure out the elapsed time since the allocation or free, based on the current timestamp. So something that would be more helpful is if you just change the printed line for all alloc and free stack infos to say something like: seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago):\n", So rather than saying this info is the "age", we just say the elapsed time. That generalizes this bit of info, and it'll be available for both alloc and free stacks. Does that work for you? > show_alloc ? "allocated" : "freed", track->pid, > - track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); > + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000, > + (unsigned long)interval_nsec, rem_interval_nsec / 1000); > + } else Add braces {} even though it's a single statement - it spans several lines and the above is also {}-enclosed, so it looks balanced. But if you follow my suggestion, you won't have the else branch anymore. > + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", > + show_alloc ? "allocated" : "freed", track->pid, > + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); > > if (track->num_stack_entries) { > /* Skip allocation/free internals stack. */
On Sat, Aug 03, 2024 at 04:51:45PM +0200, Marco Elver wrote: > > typo: convenience > > What do you mean by "object leak"? > It means an allocated object of slab memory which is considered orphan, perhaps it's more clear to say "For a convenience of tracing memory leaks by kfence", what do you think? > From what I see the additional info is only printed on out-of-bounds access. > > Or do you mean when you inspect /sys/kernel/debug/kfence/objects? If > so, that information would be useful in the commit message. > The extra elapsed time of current allocated object would be useful to figure out memory leaks when inspect /sys/kernel/debug/kfence/objects. > However, to detect leaks there are better tools than KFENCE. Have you > tried KMEMLEAK? KFENCE is really not a good choice to manually look > for old objects, which themselves are sampled, to find leaks. > Have you been able to successfully debug a leak this way? > The kmemleak tool has limitations and drawbacks which cannot be used in productive environment directly. KFENCE is a good choice to find leaks in productive environment. > > alloacted objectes in kfence_print_stack(). > > typo: allocated objects > Thank's for your comment. > > In principle, the additonal info is convenient, but I'd like to > generalize if possible. > > > + u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec; > > + unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC); > > + > > + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (age: %lu.%06lus):\n", > > I've found myself trying to figure out the elapsed time since the > allocation or free, based on the current timestamp. > > So something that would be more helpful is if you just change the > printed line for all alloc and free stack infos to say something like: > > seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus > (%lu.%06lus ago):\n", > > So rather than saying this info is the "age", we just say the elapsed > time. That generalizes this bit of info, and it'll be available for > both alloc and free stacks. > > Does that work for you? > It does not work for me actually, since it's unintuitive to figure out memory leaks by the elapsed time of allocated stacks when inspect /sys/kernel/debug/kfence/objects. It's unnecessary to print the elapsed time of allocated stacks for the objects in KFENCE_OBJECT_FREED state. For the elapsed time of free stacks, it seems no available scenarion currently. BTW, The change from "age" to "ago" is okay to me! > > show_alloc ? "allocated" : "freed", track->pid, > > - track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); > > + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000, > > + (unsigned long)interval_nsec, rem_interval_nsec / 1000); > > + } else > > Add braces {} even though it's a single statement - it spans several > lines and the above is also {}-enclosed, so it looks balanced. > > But if you follow my suggestion, you won't have the else branch anymore. > I'm not opposed to convert a single statement, but it will have an effort to find leaks. This change will be helpful to find leaks easier by grep "ago" keyword when inspect /sys/kernel/debug/kfence/objects. Thanks Qiwu
On Sun, 4 Aug 2024 at 05:46, chenqiwu <qiwuchen55@gmail.com> wrote: [...] > > I've found myself trying to figure out the elapsed time since the > > allocation or free, based on the current timestamp. > > > > So something that would be more helpful is if you just change the > > printed line for all alloc and free stack infos to say something like: > > > > seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago):\n", > > > > So rather than saying this info is the "age", we just say the elapsed > > time. That generalizes this bit of info, and it'll be available for > > both alloc and free stacks. > > > > Does that work for you? > > > It does not work for me actually, since it's unintuitive to figure out memory leaks The number printed is the same. It's just the change of "age" to "ago". > by the elapsed time of allocated stacks when inspect /sys/kernel/debug/kfence/objects. > It's unnecessary to print the elapsed time of allocated stacks for the objects in KFENCE_OBJECT_FREED > state. For the elapsed time of free stacks, it seems no available scenarion currently. > BTW, The change from "age" to "ago" is okay to me! Well, what I'm saying, having this info also for FREED objects on the free stack can be useful in some debugging scenarios when you get a use-after-free, and you want to know the elapsed time since the free happened. I have done this calculation manually before, which is why I suggested it. Maybe it's not useful for you for finding leaks, but that's just one usecase. Thanks, -- Marco
On Sun, Aug 04, 2024 at 10:37:43AM +0200, Marco Elver wrote: > > Well, what I'm saying, having this info also for FREED objects on the > free stack can be useful in some debugging scenarios when you get a > use-after-free, and you want to know the elapsed time since the free > happened. I have done this calculation manually before, which is why I > suggested it. Maybe it's not useful for you for finding leaks, but > that's just one usecase. > Agreed with your concern scenarios. How about the following change with additonal object state info? + u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec; + unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC); /* Timestamp matches printk timestamp format. */ - seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) for %s object:\n", show_alloc ? "allocated" : "freed", track->pid, - track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000, + (unsigned long)interval_nsec, rem_interval_nsec / 1000, + meta->state == KFENCE_OBJECT_ALLOCATED? "allocated" : "freed"); In this way, we can find leaks by grep "allocated object" and inspect the elapsed time of use-after-free by grep "freed object". Thanks Qiwu
On Mon, 5 Aug 2024 at 05:35, chenqiwu <qiwuchen55@gmail.com> wrote: > > On Sun, Aug 04, 2024 at 10:37:43AM +0200, Marco Elver wrote: > > > > Well, what I'm saying, having this info also for FREED objects on the > > free stack can be useful in some debugging scenarios when you get a > > use-after-free, and you want to know the elapsed time since the free > > happened. I have done this calculation manually before, which is why I > > suggested it. Maybe it's not useful for you for finding leaks, but > > that's just one usecase. > > > Agreed with your concern scenarios. > How about the following change with additonal object state info? > > + u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec; > + unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC); > > /* Timestamp matches printk timestamp format. */ > - seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", > + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) for %s object:\n", > show_alloc ? "allocated" : "freed", track->pid, > - track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); > + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000, > + (unsigned long)interval_nsec, rem_interval_nsec / 1000, > + meta->state == KFENCE_OBJECT_ALLOCATED? "allocated" : "freed"); > > In this way, we can find leaks by grep "allocated object" and inspect the elapsed time of > use-after-free by grep "freed object". The "allocated/freed" info is superfluous, as freed objects will have a free stack. Consider a slightly better script vs. just using grep. /sys/kernel/debug/kfence/objects is of secondary concern and was added primarily as a debugging aid for KFENCE developers. We never thought it could be used to look for leaks, but good you found another use for it. ;-) The priority is to keep regular error reports generated by KFENCE readable. Adding this "allocated/freed" info just makes the line longer and is not useful. I'm happy with the "(%lu.%06lus ago)" part alone.
On Mon, Aug 05, 2024 at 08:50:57AM +0200, Marco Elver wrote: > > The "allocated/freed" info is superfluous, as freed objects will have > a free stack. > > Consider a slightly better script vs. just using grep. Well, I think using grep is eaiser than a script to find leaks by a large number of alloc tracks. > /sys/kernel/debug/kfence/objects is of secondary concern and was added > primarily as a debugging aid for KFENCE developers. We never thought > it could be used to look for leaks, but good you found another use for > it. ;-) > The priority is to keep regular error reports generated by KFENCE > readable. Adding this "allocated/freed" info just makes the line > longer and is not useful. > How about print meta->state directly to get the object state for its alloc/free track? - seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) state %d:\n", show_alloc ? "allocated" : "freed", track->pid, - track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000, + (unsigned long)interval_nsec, rem_interval_nsec / 1000, + meta->state); > I'm happy with the "(%lu.%06lus ago)" part alone. If it's still a not good idea, I will follow your suggestion and resend it as v2. Thanks Qiwu
On Mon, 5 Aug 2024 at 16:06, chenqiwu <qiwuchen55@gmail.com> wrote: > > On Mon, Aug 05, 2024 at 08:50:57AM +0200, Marco Elver wrote: > > > > The "allocated/freed" info is superfluous, as freed objects will have > > a free stack. > > > > Consider a slightly better script vs. just using grep. > Well, I think using grep is eaiser than a script to find leaks by a > large number of alloc tracks. Sure. But a slightly more complex script is a better trade-off vs. impacting _all_ KFENCE users world-wide with slightly less readable error reports. > > /sys/kernel/debug/kfence/objects is of secondary concern and was added > > primarily as a debugging aid for KFENCE developers. We never thought > > it could be used to look for leaks, but good you found another use for > > it. ;-) > > The priority is to keep regular error reports generated by KFENCE > > readable. Adding this "allocated/freed" info just makes the line > > longer and is not useful. > > > How about print meta->state directly to get the object state for its > alloc/free track? > - seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", > + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) state %d:\n", > show_alloc ? "allocated" : "freed", track->pid, > - track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); > + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000, > + (unsigned long)interval_nsec, rem_interval_nsec / 1000, > + meta->state); > > I'm happy with the "(%lu.%06lus ago)" part alone. > If it's still a not good idea, I will follow your suggestion and resend > it as v2. No, that's just making it more ugly for no reason. It's replicating the state info (just like before) for alloc and free stacks and generally does not add anything useful. See, we are writing code that is deployed on millions of machines, and KFENCE error reports do appear in the wild occasionally. We have to optimize for the common case. Your change might be useful for you, which is a relatively unique usecase. The common use case of KFENCE is to detect memory-safety errors, and good error reports are a major feature of KFENCE. All information is already present in the reports (and /sys/kernel/debug/kfence/objects). I argue that you are able to write a slightly more complex script that simply looks for the free stack right after the allocation stack to determine if an object is live or freed. Maybe doing it in bash won't work so nicely, but a small Python script can easily do that job. Once you have that Python script you might even do further processing, sort things by age, size, etc. etc., and then print whole stack traces. Just grep can't do that. So if you want something useful, you'd have to give up on grep sooner or later.
diff --git a/mm/kfence/report.c b/mm/kfence/report.c index c509aed326ce..44c3f82b25a8 100644 --- a/mm/kfence/report.c +++ b/mm/kfence/report.c @@ -16,6 +16,7 @@ #include <linux/sprintf.h> #include <linux/stacktrace.h> #include <linux/string.h> +#include <linux/sched/clock.h> #include <trace/events/error_report.h> #include <asm/kfence.h> @@ -110,9 +111,18 @@ static void kfence_print_stack(struct seq_file *seq, const struct kfence_metadat unsigned long rem_nsec = do_div(ts_sec, NSEC_PER_SEC); /* Timestamp matches printk timestamp format. */ - seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", + if (meta->state == KFENCE_OBJECT_ALLOCATED) { + u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec; + unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC); + + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (age: %lu.%06lus):\n", show_alloc ? "allocated" : "freed", track->pid, - track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000, + (unsigned long)interval_nsec, rem_interval_nsec / 1000); + } else + seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n", + show_alloc ? "allocated" : "freed", track->pid, + track->cpu, (unsigned long)ts_sec, rem_nsec / 1000); if (track->num_stack_entries) { /* Skip allocation/free internals stack. */