diff mbox series

mm: kfence: print the age time for alloacted objectes to trace memleak

Message ID 20240803133608.2124-1-chenqiwu@xiaomi.com (mailing list archive)
State New
Headers show
Series mm: kfence: print the age time for alloacted objectes to trace memleak | expand

Commit Message

qiwu.chen Aug. 3, 2024, 1:36 p.m. UTC
From: "qiwu.chen" <qiwu.chen@transsion.com>

For a convienince of tracing slab object leak, print the age time for
alloacted objectes in kfence_print_stack().

Signed-off-by: qiwu.chen <qiwu.chen@transsion.com>
---
 mm/kfence/report.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

Comments

Marco Elver Aug. 3, 2024, 2:51 p.m. UTC | #1
On Sat, 3 Aug 2024 at 15:36, Qiwu Chen <qiwuchen55@gmail.com> wrote:
>
> From: "qiwu.chen" <qiwu.chen@transsion.com>
>
> For a convienince of tracing slab object leak, print the age time for

typo: convenience

What do you mean by "object leak"?

From what I see the additional info is only printed on out-of-bounds access.

Or do you mean when you inspect /sys/kernel/debug/kfence/objects? If
so, that information would be useful in the commit message.

However, to detect leaks there are better tools than KFENCE. Have you
tried KMEMLEAK? KFENCE is really not a good choice to manually look
for old objects, which themselves are sampled, to find leaks.
Have you been able to successfully debug a leak this way?

> alloacted objectes in kfence_print_stack().

typo: allocated objects

> Signed-off-by: qiwu.chen <qiwu.chen@transsion.com>
> ---
>  mm/kfence/report.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/mm/kfence/report.c b/mm/kfence/report.c
> index c509aed326ce..44c3f82b25a8 100644
> --- a/mm/kfence/report.c
> +++ b/mm/kfence/report.c
> @@ -16,6 +16,7 @@
>  #include <linux/sprintf.h>
>  #include <linux/stacktrace.h>
>  #include <linux/string.h>
> +#include <linux/sched/clock.h>
>  #include <trace/events/error_report.h>
>
>  #include <asm/kfence.h>
> @@ -110,9 +111,18 @@ static void kfence_print_stack(struct seq_file *seq, const struct kfence_metadat
>         unsigned long rem_nsec = do_div(ts_sec, NSEC_PER_SEC);
>
>         /* Timestamp matches printk timestamp format. */
> -       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
> +       if (meta->state == KFENCE_OBJECT_ALLOCATED) {

In principle, the additonal info is convenient, but I'd like to
generalize if possible.

> +               u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec;
> +               unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC);
> +
> +               seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (age: %lu.%06lus):\n",

I've found myself trying to figure out the elapsed time since the
allocation or free, based on the current timestamp.

So something that would be more helpful is if you just change the
printed line for all alloc and free stack infos to say something like:

    seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus
(%lu.%06lus ago):\n",

So rather than saying this info is the "age", we just say the elapsed
time. That generalizes this bit of info, and it'll be available for
both alloc and free stacks.

Does that work for you?

>                        show_alloc ? "allocated" : "freed", track->pid,
> -                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
> +                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000,
> +                          (unsigned long)interval_nsec, rem_interval_nsec / 1000);
> +       } else

Add braces {} even though it's a single statement - it spans several
lines and the above is also {}-enclosed, so it looks balanced.

But if you follow my suggestion, you won't have the else branch anymore.

> +               seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
> +                                  show_alloc ? "allocated" : "freed", track->pid,
> +                                  track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
>
>         if (track->num_stack_entries) {
>                 /* Skip allocation/free internals stack. */
qiwu.chen Aug. 4, 2024, 3:46 a.m. UTC | #2
On Sat, Aug 03, 2024 at 04:51:45PM +0200, Marco Elver wrote:
> 
> typo: convenience
> 
> What do you mean by "object leak"?
> 
It means an allocated object of slab memory which is considered orphan,
perhaps it's more clear to say "For a convenience of tracing memory leaks by kfence",
what do you think?
> From what I see the additional info is only printed on out-of-bounds access.
> 
> Or do you mean when you inspect /sys/kernel/debug/kfence/objects? If
> so, that information would be useful in the commit message.
>
The extra elapsed time of current allocated object would be useful to figure out memory
leaks when inspect /sys/kernel/debug/kfence/objects.
> However, to detect leaks there are better tools than KFENCE. Have you
> tried KMEMLEAK? KFENCE is really not a good choice to manually look
> for old objects, which themselves are sampled, to find leaks.
> Have you been able to successfully debug a leak this way?
> 
The kmemleak tool has limitations and drawbacks which cannot be used in productive environment
directly. KFENCE is a good choice to find leaks in productive environment.
> > alloacted objectes in kfence_print_stack().
> 
> typo: allocated objects
> 
Thank's for your comment.
> 
> In principle, the additonal info is convenient, but I'd like to
> generalize if possible.
> 
> > +               u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec;
> > +               unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC);
> > +
> > +               seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (age: %lu.%06lus):\n",
> 
> I've found myself trying to figure out the elapsed time since the
> allocation or free, based on the current timestamp.
> 
> So something that would be more helpful is if you just change the
> printed line for all alloc and free stack infos to say something like:
> 
>     seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus
> (%lu.%06lus ago):\n",
> 
> So rather than saying this info is the "age", we just say the elapsed
> time. That generalizes this bit of info, and it'll be available for
> both alloc and free stacks.
> 
> Does that work for you?
> 
It does not work for me actually, since it's unintuitive to figure out memory leaks
by the elapsed time of allocated stacks when inspect /sys/kernel/debug/kfence/objects.
It's unnecessary to print the elapsed time of allocated stacks for the objects in KFENCE_OBJECT_FREED
state. For the elapsed time of free stacks, it seems no available scenarion currently.
BTW, The change from "age" to "ago" is okay to me!
> >                        show_alloc ? "allocated" : "freed", track->pid,
> > -                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
> > +                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000,
> > +                          (unsigned long)interval_nsec, rem_interval_nsec / 1000);
> > +       } else
> 
> Add braces {} even though it's a single statement - it spans several
> lines and the above is also {}-enclosed, so it looks balanced.
> 
> But if you follow my suggestion, you won't have the else branch anymore.
> 
I'm not opposed to convert a single statement, but it will have an effort to find leaks.
This change will be helpful to find leaks easier by grep "ago" keyword when inspect /sys/kernel/debug/kfence/objects.

Thanks
Qiwu
Marco Elver Aug. 4, 2024, 8:37 a.m. UTC | #3
On Sun, 4 Aug 2024 at 05:46, chenqiwu <qiwuchen55@gmail.com> wrote:
[...]
> > I've found myself trying to figure out the elapsed time since the
> > allocation or free, based on the current timestamp.
> >
> > So something that would be more helpful is if you just change the
> > printed line for all alloc and free stack infos to say something like:
> >
> >     seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago):\n",
> >
> > So rather than saying this info is the "age", we just say the elapsed
> > time. That generalizes this bit of info, and it'll be available for
> > both alloc and free stacks.
> >
> > Does that work for you?
> >
> It does not work for me actually, since it's unintuitive to figure out memory leaks

The number printed is the same. It's just the change of "age" to "ago".

> by the elapsed time of allocated stacks when inspect /sys/kernel/debug/kfence/objects.
> It's unnecessary to print the elapsed time of allocated stacks for the objects in KFENCE_OBJECT_FREED
> state. For the elapsed time of free stacks, it seems no available scenarion currently.
> BTW, The change from "age" to "ago" is okay to me!

Well, what I'm saying, having this info also for FREED objects on the
free stack can be useful in some debugging scenarios when you get a
use-after-free, and you want to know the elapsed time since the free
happened. I have done this calculation manually before, which is why I
suggested it. Maybe it's not useful for you for finding leaks, but
that's just one usecase.

Thanks,
-- Marco
qiwu.chen Aug. 5, 2024, 3:35 a.m. UTC | #4
On Sun, Aug 04, 2024 at 10:37:43AM +0200, Marco Elver wrote:
> 
> Well, what I'm saying, having this info also for FREED objects on the
> free stack can be useful in some debugging scenarios when you get a
> use-after-free, and you want to know the elapsed time since the free
> happened. I have done this calculation manually before, which is why I
> suggested it. Maybe it's not useful for you for finding leaks, but
> that's just one usecase.
>
Agreed with your concern scenarios.
How about the following change with additonal object state info?

+       u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec;
+       unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC);

        /* Timestamp matches printk timestamp format. */
-       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
+       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) for %s object:\n",
                       show_alloc ? "allocated" : "freed", track->pid,
-                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
+                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000,
+                      (unsigned long)interval_nsec, rem_interval_nsec / 1000,
+                      meta->state == KFENCE_OBJECT_ALLOCATED? "allocated" : "freed");

In this way, we can find leaks by grep "allocated object" and inspect the elapsed time of
use-after-free by grep "freed object".

Thanks
Qiwu
Marco Elver Aug. 5, 2024, 6:50 a.m. UTC | #5
On Mon, 5 Aug 2024 at 05:35, chenqiwu <qiwuchen55@gmail.com> wrote:
>
> On Sun, Aug 04, 2024 at 10:37:43AM +0200, Marco Elver wrote:
> >
> > Well, what I'm saying, having this info also for FREED objects on the
> > free stack can be useful in some debugging scenarios when you get a
> > use-after-free, and you want to know the elapsed time since the free
> > happened. I have done this calculation manually before, which is why I
> > suggested it. Maybe it's not useful for you for finding leaks, but
> > that's just one usecase.
> >
> Agreed with your concern scenarios.
> How about the following change with additonal object state info?
>
> +       u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec;
> +       unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC);
>
>         /* Timestamp matches printk timestamp format. */
> -       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
> +       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) for %s object:\n",
>                        show_alloc ? "allocated" : "freed", track->pid,
> -                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
> +                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000,
> +                      (unsigned long)interval_nsec, rem_interval_nsec / 1000,
> +                      meta->state == KFENCE_OBJECT_ALLOCATED? "allocated" : "freed");
>
> In this way, we can find leaks by grep "allocated object" and inspect the elapsed time of
> use-after-free by grep "freed object".

The "allocated/freed" info is superfluous, as freed objects will have
a free stack.

Consider a slightly better script vs. just using grep.
/sys/kernel/debug/kfence/objects is of secondary concern and was added
primarily as a debugging aid for KFENCE developers. We never thought
it could be used to look for leaks, but good you found another use for
it. ;-)
The priority is to keep regular error reports generated by KFENCE
readable. Adding this "allocated/freed" info just makes the line
longer and is not useful.

I'm happy with the "(%lu.%06lus ago)" part alone.
qiwu.chen Aug. 5, 2024, 2:06 p.m. UTC | #6
On Mon, Aug 05, 2024 at 08:50:57AM +0200, Marco Elver wrote:
> 
> The "allocated/freed" info is superfluous, as freed objects will have
> a free stack.
> 
> Consider a slightly better script vs. just using grep.
Well, I think using grep is eaiser than a script to find leaks by a
large number of alloc tracks.

> /sys/kernel/debug/kfence/objects is of secondary concern and was added
> primarily as a debugging aid for KFENCE developers. We never thought
> it could be used to look for leaks, but good you found another use for
> it. ;-)
> The priority is to keep regular error reports generated by KFENCE
> readable. Adding this "allocated/freed" info just makes the line
> longer and is not useful.
>
How about print meta->state directly to get the object state for its
alloc/free track?
-       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
+       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) state %d:\n",
                       show_alloc ? "allocated" : "freed", track->pid,
-                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
+                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000,
+                      (unsigned long)interval_nsec, rem_interval_nsec / 1000,
+                      meta->state);
> I'm happy with the "(%lu.%06lus ago)" part alone.
If it's still a not good idea, I will follow your suggestion and resend
it as v2.

Thanks
Qiwu
Marco Elver Aug. 5, 2024, 2:18 p.m. UTC | #7
On Mon, 5 Aug 2024 at 16:06, chenqiwu <qiwuchen55@gmail.com> wrote:
>
> On Mon, Aug 05, 2024 at 08:50:57AM +0200, Marco Elver wrote:
> >
> > The "allocated/freed" info is superfluous, as freed objects will have
> > a free stack.
> >
> > Consider a slightly better script vs. just using grep.
> Well, I think using grep is eaiser than a script to find leaks by a
> large number of alloc tracks.

Sure. But a slightly more complex script is a better trade-off vs.
impacting _all_ KFENCE users world-wide with slightly less readable
error reports.

> > /sys/kernel/debug/kfence/objects is of secondary concern and was added
> > primarily as a debugging aid for KFENCE developers. We never thought
> > it could be used to look for leaks, but good you found another use for
> > it. ;-)
> > The priority is to keep regular error reports generated by KFENCE
> > readable. Adding this "allocated/freed" info just makes the line
> > longer and is not useful.
> >
> How about print meta->state directly to get the object state for its
> alloc/free track?
> -       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
> +       seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (%lu.%06lus ago) state %d:\n",
>                        show_alloc ? "allocated" : "freed", track->pid,
> -                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
> +                      track->cpu, (unsigned long)ts_sec, rem_nsec / 1000,
> +                      (unsigned long)interval_nsec, rem_interval_nsec / 1000,
> +                      meta->state);
> > I'm happy with the "(%lu.%06lus ago)" part alone.
> If it's still a not good idea, I will follow your suggestion and resend
> it as v2.

No, that's just making it more ugly for no reason. It's replicating
the state info (just like before) for alloc and free stacks and
generally does not add anything useful.

See, we are writing code that is deployed on millions of machines, and
KFENCE error reports do appear in the wild occasionally. We have to
optimize for the common case.

Your change might be useful for you, which is a relatively unique
usecase. The common use case of KFENCE is to detect memory-safety
errors, and good error reports are a major feature of KFENCE. All
information is already present in the reports (and
/sys/kernel/debug/kfence/objects).

I argue that you are able to write a slightly more complex script that
simply looks for the free stack right after the allocation stack to
determine if an object is live or freed. Maybe doing it in bash won't
work so nicely, but a small Python script can easily do that job. Once
you have that Python script you might even do further processing, sort
things by age, size, etc. etc., and then print whole stack traces.
Just grep can't do that. So if you want something useful, you'd have
to give up on grep sooner or later.
diff mbox series

Patch

diff --git a/mm/kfence/report.c b/mm/kfence/report.c
index c509aed326ce..44c3f82b25a8 100644
--- a/mm/kfence/report.c
+++ b/mm/kfence/report.c
@@ -16,6 +16,7 @@ 
 #include <linux/sprintf.h>
 #include <linux/stacktrace.h>
 #include <linux/string.h>
+#include <linux/sched/clock.h>
 #include <trace/events/error_report.h>
 
 #include <asm/kfence.h>
@@ -110,9 +111,18 @@  static void kfence_print_stack(struct seq_file *seq, const struct kfence_metadat
 	unsigned long rem_nsec = do_div(ts_sec, NSEC_PER_SEC);
 
 	/* Timestamp matches printk timestamp format. */
-	seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
+	if (meta->state == KFENCE_OBJECT_ALLOCATED) {
+		u64 interval_nsec = local_clock() - meta->alloc_track.ts_nsec;
+		unsigned long rem_interval_nsec = do_div(interval_nsec, NSEC_PER_SEC);
+
+		seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus (age: %lu.%06lus):\n",
 		       show_alloc ? "allocated" : "freed", track->pid,
-		       track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
+		       track->cpu, (unsigned long)ts_sec, rem_nsec / 1000,
+			   (unsigned long)interval_nsec, rem_interval_nsec / 1000);
+	} else
+		seq_con_printf(seq, "%s by task %d on cpu %d at %lu.%06lus:\n",
+				   show_alloc ? "allocated" : "freed", track->pid,
+				   track->cpu, (unsigned long)ts_sec, rem_nsec / 1000);
 
 	if (track->num_stack_entries) {
 		/* Skip allocation/free internals stack. */