
[v2,sl-b,1/5] mm: Add mem_dump_obj() to print source of memory block

Message ID 20201209011303.32737-1-paulmck@kernel.org (mailing list archive)
State New, archived
Series [v2,sl-b,1/5] mm: Add mem_dump_obj() to print source of memory block

Commit Message

Paul E. McKenney Dec. 9, 2020, 1:12 a.m. UTC
From: "Paul E. McKenney" <paulmck@kernel.org>

There are kernel facilities such as per-CPU reference counts that give
error messages in generic handlers or callbacks, whose messages are
unenlightening.  In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of
several uses the underflow was caused by.

This commit therefore exposes a mem_dump_obj() function that takes
a pointer to memory (which must still be allocated if it has been
dynamically allocated) and prints available information on where that
memory came from.  This pointer can reference the middle of the block as
well as the beginning of the block, as needed by things like RCU callback
functions and timer handlers that might not know where the beginning of
the memory block is.  These functions and handlers can use mem_dump_obj()
to print out better hints as to where the problem might lie.
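
For example, a handler that has only a pointer to a misbehaving object
might do something like the following (an illustrative sketch; the
preamble text and the object pointer name are made up):

	pr_err("%s: unexpected state for object %p, which is", __func__, obj);
	mem_dump_obj(obj);  /* Appends the provenance via pr_cont(). */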

The information printed can depend on kernel configuration.  For example,
the allocation return address can be printed only for slab and slub,
and even then only when the necessary debug has been enabled.  For slab,
build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
to the next power of two or use the SLAB_STORE_USER flag when creating the
kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
to enable printing of the allocation-time stack trace.
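
For example, a cache that wants allocation return addresses recorded
without enabling slub_debug globally might be created like this (an
illustrative call; the cache name and structure are made up):

	kc = kmem_cache_create("my_cache", sizeof(struct my_obj), 0,
			       SLAB_STORE_USER, NULL);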

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/mm.h   |  2 ++
 include/linux/slab.h |  2 ++
 mm/slab.c            | 28 +++++++++++++++++++++
 mm/slab.h            | 11 +++++++++
 mm/slab_common.c     | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slob.c            |  7 ++++++
 mm/slub.c            | 40 ++++++++++++++++++++++++++++++
 mm/util.c            | 25 +++++++++++++++++++
 8 files changed, 184 insertions(+)

Comments

Christoph Hellwig Dec. 9, 2020, 8:17 a.m. UTC | #1
Your two new exports don't actually seem to get used in modular code
at all in this series.
Paul E. McKenney Dec. 9, 2020, 2:57 p.m. UTC | #2
On Wed, Dec 09, 2020 at 08:17:10AM +0000, Christoph Hellwig wrote:
> Your two new exports don't actually seem to get used in modular code
> at all in this series.

Indeed, and I either need to remove the exports or make my test code in
kernel/rcu/rcuscale.o suitable for upstreaming.  Or find the appropriate
mm/slab selftest location.

							Thanx, Paul
Vlastimil Babka Dec. 9, 2020, 5:28 p.m. UTC | #3
On 12/9/20 2:12 AM, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a mem_dump_obj() function that takes
> a pointer to memory (which must still be allocated if it has been
> dynamically allocated) and prints available information on where that
> memory came from.  This pointer can reference the middle of the block as
> well as the beginning of the block, as needed by things like RCU callback
> functions and timer handlers that might not know where the beginning of
> the memory block is.  These functions and handlers can use mem_dump_obj()
> to print out better hints as to where the problem might lie.

Sounds useful, yeah. It occurred to me at least once that we don't have a nice
generic way to print this kind of info. I usually dig it from a crash dump...

> The information printed can depend on kernel configuration.  For example,
> the allocation return address can be printed only for slab and slub,
> and even then only when the necessary debug has been enabled.  For slab,
> build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> to the next power of two or use the SLAB_STORE_USER when creating the
> kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> to enable printing of the allocation-time stack trace.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

...

> +/**
> + * kmem_valid_obj - does the pointer reference a valid slab object?
> + * @object: pointer to query.
> + *
> + * Return: %true if the pointer is to a not-yet-freed object from
> + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> + * is to an already-freed object, and %false otherwise.
> + */

It should be possible to find out more about the object being free or not than
you currently do, at least to find out whether it's definitely free. When it
appears allocated, it might actually still be free in some kind of e.g. per-cpu
or per-node cache that would be infeasible to check. But that improvement to
the output can also be added later. Also, SLUB stores the freeing stack trace,
which might be useful...
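
For SLUB, grabbing the free track in kmem_provenance() might look roughly
like this (untested sketch; kp_free_ret would be a new, hypothetical field
mirroring the existing TRACK_ALLOC handling):

	if (s->flags & SLAB_STORE_USER) {
		trackp = get_track(s, objp, TRACK_FREE);
		kpp->kp_free_ret = (void *)trackp->addr;
	}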

> +bool kmem_valid_obj(void *object)
> +{
> +	struct page *page;
> +
> +	if (!virt_addr_valid(object))
> +		return false;
> +	page = virt_to_head_page(object);
> +	return PageSlab(page);
> +}
> +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> +
> +/**
> + * kmem_dump_obj - Print available slab provenance information
> + * @object: slab object for which to find provenance information.
> + *
> + * This function uses pr_cont(), so that the caller is expected to have
> + * printed out whatever preamble is appropriate.  The provenance information
> + * depends on the type of object and on how much debugging is enabled.
> + * For a slab-cache object, the fact that it is a slab object is printed,
> + * and, if available, the slab name, return address, and stack trace from
> + * the allocation of that object.
> + *
> + * This function will splat if passed a pointer to a non-slab object.
> + * If you are not sure what type of object you have, you should instead
> + * use mem_dump_obj().
> + */
> +void kmem_dump_obj(void *object)
> +{
> +	int i;
> +	struct page *page;
> +	struct kmem_provenance kp;
> +
> +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> +		return;
> +	page = virt_to_head_page(object);
> +	if (WARN_ON_ONCE(!PageSlab(page))) {
> +		pr_cont(" non-slab memory.\n");
> +		return;
> +	}
> +	kp.kp_ptr = object;
> +	kp.kp_page = page;
> +	kp.kp_nstack = KS_ADDRS_COUNT;
> +	kmem_provenance(&kp);

You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
would make sense in this patch already).

> +	if (page->slab_cache)
> +		pr_cont(" slab %s", page->slab_cache->name);
> +	else
> +		pr_cont(" slab ");
> +	if (kp.kp_ret)
> +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> +	else
> +		pr_cont("\n");
> +	if (kp.kp_stack[0]) {
> +		for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> +			if (!kp.kp_stack[i])
> +				break;
> +			pr_info("    %pS\n", kp.kp_stack[i]);
> +		}
> +	}
> +}

...

> diff --git a/mm/slub.c b/mm/slub.c
> index b30be23..027fe0f 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3918,6 +3918,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>  	return 0;
>  }
>  
> +void kmem_provenance(struct kmem_provenance *kpp)
> +{
> +#ifdef CONFIG_SLUB_DEBUG

I'd expect at least the very basic stuff (kp_obj) to be possible to determine
even under !CONFIG_SLUB_DEBUG?

> +	void *base;
> +	int i;
> +	void *object = kpp->kp_ptr;
> +	unsigned int objnr;
> +	void *objp;
> +	struct page *page = kpp->kp_page;
> +	struct kmem_cache *s = page->slab_cache;
> +	struct track *trackp;
> +
> +	base = page_address(page);
> +	objp = kasan_reset_tag(object);
> +	objp = restore_red_left(s, objp);
> +	objnr = obj_to_index(s, page, objp);
> +	objp = base + s->size * objnr;
> +	kpp->kp_objp = objp;
> +	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
> +	    !(s->flags & SLAB_STORE_USER))
> +		goto nodebug;
> +	trackp = get_track(s, objp, TRACK_ALLOC);
> +	kpp->kp_ret = (void *)trackp->addr;
> +#ifdef CONFIG_STACKTRACE
> +	for (i = 0; i < kpp->kp_nstack && i < TRACK_ADDRS_COUNT; i++) {
> +		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> +		if (!kpp->kp_stack[i])
> +			break;
> +	}
> +#endif
> +	if (kpp->kp_stack && i < kpp->kp_nstack)
> +		kpp->kp_stack[i] = NULL;
> +	return;
> +nodebug:
> +#endif
> +	kpp->kp_ret = NULL;
> +	if (kpp->kp_nstack)
> +		kpp->kp_stack[0] = NULL;
> +}
> +
>  /********************************************************************
>   *		Kmalloc subsystem
>   *******************************************************************/
> diff --git a/mm/util.c b/mm/util.c
> index 4ddb6e1..d0e60d2 100644
> --- a/mm/util.c
> +++ b/mm/util.c

I think mm/debug.c is a better fit, as it already has dump_page() of a similar
nature. Also, you can call that from mem_dump_obj(), at least in the case when
the more specific handlers fail. It will even include page_owner info if enabled! :)
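
Roughly something like this in mem_dump_obj() (an untested sketch; the
reason string is made up):

	if (kmem_valid_obj(object))
		kmem_dump_obj(object);
	else if (virt_addr_valid(object))
		dump_page(virt_to_head_page(object), "memory block dump");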

Thanks,
Vlastimil

> @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
>  	kunmap_atomic(addr1);
>  	return ret;
>  }
> +
> +/**
> + * mem_dump_obj - Print available provenance information
> + * @object: object for which to find provenance information.
> + *
> + * This function uses pr_cont(), so that the caller is expected to have
> + * printed out whatever preamble is appropriate.  The provenance information
> + * depends on the type of object and on how much debugging is enabled.
> + * For example, for a slab-cache object, the slab name is printed, and,
> + * if available, the return address and stack trace from the allocation
> + * of that object.
> + */
> +void mem_dump_obj(void *object)
> +{
> +	if (!virt_addr_valid(object)) {
> +		pr_cont(" non-paged (local) memory.\n");
> +		return;
> +	}
> +	if (kmem_valid_obj(object)) {
> +		kmem_dump_obj(object);
> +		return;
> +	}
> +	pr_cont(" non-slab memory.\n");
> +}
> +EXPORT_SYMBOL_GPL(mem_dump_obj);
>
Christoph Hellwig Dec. 9, 2020, 5:53 p.m. UTC | #4
On Wed, Dec 09, 2020 at 06:57:02AM -0800, Paul E. McKenney wrote:
> On Wed, Dec 09, 2020 at 08:17:10AM +0000, Christoph Hellwig wrote:
> > Your two new exports don't actually seem to get used in modular code
> > at all in this series.
> 
> Indeed, and I either need to remove the exports or make my test code in
> kernel/rcu/rcuscale.o suitable for upstreaming.  Or find the appropriate
> mm/slab selftest location.

I'd rather not export something like this which pokes deep into
internals.  That being said, I've been working on and off on an
EXPORT_SYMBOL_FOR() that just exports a symbol to one specific module.
Hopefully I'll finish it for the next merge window, and with that
I'd feel much more comfortable with an export.
Paul E. McKenney Dec. 9, 2020, 5:59 p.m. UTC | #5
On Wed, Dec 09, 2020 at 05:53:06PM +0000, Christoph Hellwig wrote:
> On Wed, Dec 09, 2020 at 06:57:02AM -0800, Paul E. McKenney wrote:
> > On Wed, Dec 09, 2020 at 08:17:10AM +0000, Christoph Hellwig wrote:
> > > Your two new exports don't actually seem to get used in modular code
> > > at all in this series.
> > 
> > Indeed, and I either need to remove the exports or make my test code in
> > kernel/rcu/rcuscale.o suitable for upstreaming.  Or find the appropriate
> > mm/slab selftest location.
> 
> I'd rather not export something like this which pokes deep into
> internals.  That being said I've been working on off on a
> EXPORT_SYMBOL_FOR() that just exports a symbol to one specific module.
> Hopefully I'll finish it for the next merge window, and with that
> I'd feel much more comfortable with an export.

That would be really useful!  I have a number of symbols that should
only be used by a few specific in-tree modules, independent of this
patch series.

For my part, I will see if there is a good mm-related location for this
sort of selftest.

							Thanx, Paul
Paul E. McKenney Dec. 9, 2020, 11:04 p.m. UTC | #6
On Wed, Dec 09, 2020 at 06:28:50PM +0100, Vlastimil Babka wrote:
> On 12/9/20 2:12 AM, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> > 
> > This commit therefore exposes a mem_dump_obj() function that takes
> > a pointer to memory (which must still be allocated if it has been
> > dynamically allocated) and prints available information on where that
> > memory came from.  This pointer can reference the middle of the block as
> > well as the beginning of the block, as needed by things like RCU callback
> > functions and timer handlers that might not know where the beginning of
> > the memory block is.  These functions and handlers can use mem_dump_obj()
> > to print out better hints as to where the problem might lie.
> 
> Sounds useful, yeah. It occured to me at least once that we don't have a nice
> generic way to print this kind of info. I usually dig it from a crash dump...

Glad to hear that it might be helpful, and thank you for looking this
over!

> > The information printed can depend on kernel configuration.  For example,
> > the allocation return address can be printed only for slab and slub,
> > and even then only when the necessary debug has been enabled.  For slab,
> > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > to the next power of two or use the SLAB_STORE_USER when creating the
> > kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > to enable printing of the allocation-time stack trace.
> > 
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> ...
> 
> > +/**
> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> > + * @object: pointer to query.
> > + *
> > + * Return: %true if the pointer is to a not-yet-freed object from
> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> > + * is to an already-freed object, and %false otherwise.
> > + */
> 
> It should be possible to find out more about object being free or not, than you
> currently do. At least to find out if it's definitely free. When it appears
> allocated, it can be actually still free in some kind of e.g. per-cpu or
> per-node cache that would be infeasible to check. But that improvement to the
> output can be also added later. Also SLUB stores the freeing stacktrace, which
> might be useful...

I can see how this could help debugging a use-after-free situation,
at least as long as the poor sap that subsequently allocated it doesn't
free it.

I can easily add more fields to the kmem_provenance structure.  Maybe
it would make sense to have another exported API that you provide a
kmem_provenance structure to, and it fills it in.

One caution though...  I rely on the object being allocated.
If it officially might already be freed, complex and high-overhead
synchronization seems to be required to safely access the various data
structures.

So any use on an already-freed object is on a "you break it you get to
keep the pieces" basis.  On the other hand, if you are dealing with a
use-after-free situation, life is hard anyway.

Or am I missing your point?

> > +bool kmem_valid_obj(void *object)
> > +{
> > +	struct page *page;
> > +
> > +	if (!virt_addr_valid(object))
> > +		return false;
> > +	page = virt_to_head_page(object);
> > +	return PageSlab(page);
> > +}
> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> > +
> > +/**
> > + * kmem_dump_obj - Print available slab provenance information
> > + * @object: slab object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For a slab-cache object, the fact that it is a slab object is printed,
> > + * and, if available, the slab name, return address, and stack trace from
> > + * the allocation of that object.
> > + *
> > + * This function will splat if passed a pointer to a non-slab object.
> > + * If you are not sure what type of object you have, you should instead
> > + * use mem_dump_obj().
> > + */
> > +void kmem_dump_obj(void *object)
> > +{
> > +	int i;
> > +	struct page *page;
> > +	struct kmem_provenance kp;
> > +
> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> > +		return;
> > +	page = virt_to_head_page(object);
> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> > +		pr_cont(" non-slab memory.\n");
> > +		return;
> > +	}
> > +	kp.kp_ptr = object;
> > +	kp.kp_page = page;
> > +	kp.kp_nstack = KS_ADDRS_COUNT;
> > +	kmem_provenance(&kp);
> 
> You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
> would make sense in this patch already).

Good point!

However, please note that the various debugging options reserve space
at the beginning.  This can make the meaning of kp.kp_objp a bit
different from what one might expect.

> > +	if (page->slab_cache)
> > +		pr_cont(" slab %s", page->slab_cache->name);
> > +	else
> > +		pr_cont(" slab ");
> > +	if (kp.kp_ret)
> > +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> > +	else
> > +		pr_cont("\n");
> > +	if (kp.kp_stack[0]) {
> > +		for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
> > +			if (!kp.kp_stack[i])
> > +				break;
> > +			pr_info("    %pS\n", kp.kp_stack[i]);
> > +		}
> > +	}
> > +}
> 
> ...
> 
> > diff --git a/mm/slub.c b/mm/slub.c
> > index b30be23..027fe0f 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3918,6 +3918,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> >  	return 0;
> >  }
> >  
> > +void kmem_provenance(struct kmem_provenance *kpp)
> > +{
> > +#ifdef CONFIG_SLUB_DEBUG
> 
> I'd expect at least the very basic stuff (kp_obj) to be possible to determine
> even under !CONFIG_SLUB_DEBUG?

And doing it that way even saves a line of code!  ;-)

> > +	void *base;
> > +	int i;
> > +	void *object = kpp->kp_ptr;
> > +	unsigned int objnr;
> > +	void *objp;
> > +	struct page *page = kpp->kp_page;
> > +	struct kmem_cache *s = page->slab_cache;
> > +	struct track *trackp;
> > +
> > +	base = page_address(page);
> > +	objp = kasan_reset_tag(object);
> > +	objp = restore_red_left(s, objp);
> > +	objnr = obj_to_index(s, page, objp);
> > +	objp = base + s->size * objnr;
> > +	kpp->kp_objp = objp;
> > +	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
> > +	    !(s->flags & SLAB_STORE_USER))
> > +		goto nodebug;
> > +	trackp = get_track(s, objp, TRACK_ALLOC);
> > +	kpp->kp_ret = (void *)trackp->addr;
> > +#ifdef CONFIG_STACKTRACE
> > +	for (i = 0; i < kpp->kp_nstack && i < TRACK_ADDRS_COUNT; i++) {
> > +		kpp->kp_stack[i] = (void *)trackp->addrs[i];
> > +		if (!kpp->kp_stack[i])
> > +			break;
> > +	}
> > +#endif
> > +	if (kpp->kp_stack && i < kpp->kp_nstack)
> > +		kpp->kp_stack[i] = NULL;
> > +	return;
> > +nodebug:
> > +#endif
> > +	kpp->kp_ret = NULL;
> > +	if (kpp->kp_nstack)
> > +		kpp->kp_stack[0] = NULL;
> > +}
> > +
> >  /********************************************************************
> >   *		Kmalloc subsystem
> >   *******************************************************************/
> > diff --git a/mm/util.c b/mm/util.c
> > index 4ddb6e1..d0e60d2 100644
> > --- a/mm/util.c
> > +++ b/mm/util.c
> 
> I think mm/debug.c is a better fit as it already has dump_page() of a similar
> nature. Also you can call that from from mem_dump_obj() at least in case when
> the more specific handlers fail. It will even include page_owner info if enabled! :)

I will count this as one vote for mm/debug.c.

Two things to consider, though...  First, Joonsoo suggests that because
this produces useful information without any debugging enabled, it is not
really debugging as such.  Second, mm/debug.c does not include either
slab.h or vmalloc.h.  The second point might not be a showstopper, but I
was interpreting it to mean that the file's role was less central.

							Thanx, Paul

> Thanks,
> Vlastimil
> 
> > @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> >  	kunmap_atomic(addr1);
> >  	return ret;
> >  }
> > +
> > +/**
> > + * mem_dump_obj - Print available provenance information
> > + * @object: object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For example, for a slab-cache object, the slab name is printed, and,
> > + * if available, the return address and stack trace from the allocation
> > + * of that object.
> > + */
> > +void mem_dump_obj(void *object)
> > +{
> > +	if (!virt_addr_valid(object)) {
> > +		pr_cont(" non-paged (local) memory.\n");
> > +		return;
> > +	}
> > +	if (kmem_valid_obj(object)) {
> > +		kmem_dump_obj(object);
> > +		return;
> > +	}
> > +	pr_cont(" non-slab memory.\n");
> > +}
> > +EXPORT_SYMBOL_GPL(mem_dump_obj);
> > 
>
Vlastimil Babka Dec. 10, 2020, 10:48 a.m. UTC | #7
On 12/10/20 12:04 AM, Paul E. McKenney wrote:
>> > +/**
>> > + * kmem_valid_obj - does the pointer reference a valid slab object?
>> > + * @object: pointer to query.
>> > + *
>> > + * Return: %true if the pointer is to a not-yet-freed object from
>> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
>> > + * is to an already-freed object, and %false otherwise.
>> > + */
>> 
>> It should be possible to find out more about object being free or not, than you
>> currently do. At least to find out if it's definitely free. When it appears
>> allocated, it can be actually still free in some kind of e.g. per-cpu or
>> per-node cache that would be infeasible to check. But that improvement to the
>> output can be also added later. Also SLUB stores the freeing stacktrace, which
>> might be useful...
> 
> I can see how this could help debugging a use-after-free situation,
> at least as long as the poor sap that subsequently allocated it doesn't
> free it.
> 
> I can easily add more fields to the kmem_provenance structure.  Maybe
> it would make sense to have another exported API that you provide a
> kmem_provenance structure to, and it fills it in.
> 
> One caution though...  I rely on the object being allocated.
> If it officially might already be freed, complex and high-overhead
> synchronization seems to be required to safely access the various data
> structures.

Good point! It's easy to forget that when one is used to doing similar
digging in a crash dump, where nothing changes.

> So any use on an already-freed object is on a "you break it you get to
> keep the pieces" basis.  On the other hand, if you are dealing with a
> use-after-free situation, life is hard anyway.

Yeah, even now I think it's potentially dangerous, as kmem_valid_obj() can
return true just because PageSlab(page) is true. But the object might already
be free, so as soon as another CPU frees another object from the same slab
page, the page also gets freed... or it was already freed and then allocated
by another slab, so it's PageSlab() again.
I guess at least some safety could be achieved by pinning the page with
get_page_unless_zero(). But maybe your current implementation is already safe;
I need to check in detail.
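
Something in this direction, perhaps (untested sketch):

	page = virt_to_head_page(object);
	if (!get_page_unless_zero(page))
		return false;	/* Page is on its way to being freed. */
	if (!PageSlab(page)) {
		put_page(page);
		return false;
	}
	/* ... inspect the object while holding the reference ... */
	put_page(page);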

> Or am I missing your point?
> 
>> > +bool kmem_valid_obj(void *object)
>> > +{
>> > +	struct page *page;
>> > +
>> > +	if (!virt_addr_valid(object))
>> > +		return false;
>> > +	page = virt_to_head_page(object);
>> > +	return PageSlab(page);
>> > +}
>> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
>> > +
>> > +/**
>> > + * kmem_dump_obj - Print available slab provenance information
>> > + * @object: slab object for which to find provenance information.
>> > + *
>> > + * This function uses pr_cont(), so that the caller is expected to have
>> > + * printed out whatever preamble is appropriate.  The provenance information
>> > + * depends on the type of object and on how much debugging is enabled.
>> > + * For a slab-cache object, the fact that it is a slab object is printed,
>> > + * and, if available, the slab name, return address, and stack trace from
>> > + * the allocation of that object.
>> > + *
>> > + * This function will splat if passed a pointer to a non-slab object.
>> > + * If you are not sure what type of object you have, you should instead
>> > + * use mem_dump_obj().
>> > + */
>> > +void kmem_dump_obj(void *object)
>> > +{
>> > +	int i;
>> > +	struct page *page;
>> > +	struct kmem_provenance kp;
>> > +
>> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
>> > +		return;
>> > +	page = virt_to_head_page(object);
>> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
>> > +		pr_cont(" non-slab memory.\n");
>> > +		return;
>> > +	}
>> > +	kp.kp_ptr = object;
>> > +	kp.kp_page = page;
>> > +	kp.kp_nstack = KS_ADDRS_COUNT;
>> > +	kmem_provenance(&kp);
>> 
>> You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
>> would make sense in this patch already).
> 
> Good point!
> 
> However, please note that the various debugging options that reserve
> space at the beginning.  This can make the meaning of kp.kp_objp a bit
> different than one might expect.

Yeah, I think the best would be to match the address that
kmalloc()/kmem_cache_alloc() would return, that is, the beginning of the
object itself, so you can calculate the offset within it, etc.

>> > --- a/mm/util.c
>> > +++ b/mm/util.c
>> 
>> I think mm/debug.c is a better fit as it already has dump_page() of a similar
>> nature. Also you can call that from from mem_dump_obj() at least in case when
>> the more specific handlers fail. It will even include page_owner info if enabled! :)
> 
> I will count this as one vote for mm/debug.c.
> 
> Two things to consider, though...  First, Joonsoo suggests that the fact
> that this produces useful information without any debugging information
> enabled makes it not be debugging as such.

Well, there's already dump_page(), which also produces information without
special configs. We're not the best subsystem when it comes to this kind of
consistency...

> Second, mm/debug.c does
> not include either slab.h or vmalloc.h.  The second might not be a
> showstopper, but I was interpreting this to mean that its role was
> less central.

I think it can include whatever becomes needed there :)

> 							Thanx, Paul
> 
>> Thanks,
>> Vlastimil
>> 
>> > @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
>> >  	kunmap_atomic(addr1);
>> >  	return ret;
>> >  }
>> > +
>> > +/**
>> > + * mem_dump_obj - Print available provenance information
>> > + * @object: object for which to find provenance information.
>> > + *
>> > + * This function uses pr_cont(), so that the caller is expected to have
>> > + * printed out whatever preamble is appropriate.  The provenance information
>> > + * depends on the type of object and on how much debugging is enabled.
>> > + * For example, for a slab-cache object, the slab name is printed, and,
>> > + * if available, the return address and stack trace from the allocation
>> > + * of that object.
>> > + */
>> > +void mem_dump_obj(void *object)
>> > +{
>> > +	if (!virt_addr_valid(object)) {
>> > +		pr_cont(" non-paged (local) memory.\n");
>> > +		return;
>> > +	}
>> > +	if (kmem_valid_obj(object)) {
>> > +		kmem_dump_obj(object);
>> > +		return;
>> > +	}
>> > +	pr_cont(" non-slab memory.\n");
>> > +}
>> > +EXPORT_SYMBOL_GPL(mem_dump_obj);
>> > 
>> 
>
Joonsoo Kim Dec. 10, 2020, 12:04 p.m. UTC | #8
On Tue, Dec 08, 2020 at 05:12:59PM -0800, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a mem_dump_obj() function that takes
> a pointer to memory (which must still be allocated if it has been
> dynamically allocated) and prints available information on where that
> memory came from.  This pointer can reference the middle of the block as
> well as the beginning of the block, as needed by things like RCU callback
> functions and timer handlers that might not know where the beginning of
> the memory block is.  These functions and handlers can use mem_dump_obj()
> to print out better hints as to where the problem might lie.
> 
> The information printed can depend on kernel configuration.  For example,
> the allocation return address can be printed only for slab and slub,
> and even then only when the necessary debug has been enabled.  For slab,
> build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> to the next power of two or use the SLAB_STORE_USER when creating the
> kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> to enable printing of the allocation-time stack trace.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> Signed-off-by: Paul E. McKenney <paulmck@kernel.org>

Introducing the three functions kmem_valid_obj(), kmem_provenance(), and
mem_dump_obj() looks better than patchset v1. Nice work. A few comments
below.

> ---
>  include/linux/mm.h   |  2 ++
>  include/linux/slab.h |  2 ++
>  mm/slab.c            | 28 +++++++++++++++++++++
>  mm/slab.h            | 11 +++++++++
>  mm/slab_common.c     | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  mm/slob.c            |  7 ++++++
>  mm/slub.c            | 40 ++++++++++++++++++++++++++++++
>  mm/util.c            | 25 +++++++++++++++++++
>  8 files changed, 184 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ef360fe..1eea266 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
>  
>  extern int sysctl_nr_trim_pages;
>  
> +void mem_dump_obj(void *object);
> +
>  #endif /* __KERNEL__ */
>  #endif /* _LINUX_MM_H */
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index dd6897f..169b511 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -186,6 +186,8 @@ void kfree(const void *);
>  void kfree_sensitive(const void *);
>  size_t __ksize(const void *);
>  size_t ksize(const void *);
> +bool kmem_valid_obj(void *object);
> +void kmem_dump_obj(void *object);
>  
>  #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
>  void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
> diff --git a/mm/slab.c b/mm/slab.c
> index b111356..72b6743 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3602,6 +3602,34 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
>  EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
>  #endif
>  
> +void kmem_provenance(struct kmem_provenance *kpp)

To open up the possibility of future enhancement, the name "provenance"
does not look good to me. This function could be used to extract various
kinds of object information, so something such as kmem_obj_info() looks
better to me. Any thoughts?

> +{
> +#ifdef DEBUG
> +	struct kmem_cache *cachep;
> +	void *object = kpp->kp_ptr;
> +	unsigned int objnr;
> +	void *objp;
> +	struct page *page = kpp->kp_page;
> +
> +	cachep = page->slab_cache;
> +	if (!(cachep->flags & SLAB_STORE_USER)) {
> +		kpp->kp_ret = NULL;
> +		goto nodebug;
> +	}
> +	objp = object - obj_offset(cachep);
> +	page = virt_to_head_page(objp);
> +	objnr = obj_to_index(cachep, page, objp);
> +	objp = index_to_obj(cachep, page, objnr);
> +	kpp->kp_objp = objp;
> +	kpp->kp_ret = *dbg_userword(cachep, objp);
> +nodebug:
> +#else
> +	kpp->kp_ret = NULL;
> +#endif
> +	if (kpp->kp_nstack)
> +		kpp->kp_stack[0] = NULL;
> +}
> +
>  static __always_inline void *
>  __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
>  {
> diff --git a/mm/slab.h b/mm/slab.h
> index 6d7c6a5..28a41d5 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -630,4 +630,15 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
>  	return false;
>  }
>  
> +#define KS_ADDRS_COUNT 16
> +struct kmem_provenance {
> +	void *kp_ptr;
> +	struct page *kp_page;
> +	void *kp_objp;
> +	void *kp_ret;
> +	void *kp_stack[KS_ADDRS_COUNT];
> +	int kp_nstack;
> +};
> +void kmem_provenance(struct kmem_provenance *kpp);
> +
>  #endif /* MM_SLAB_H */
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index f9ccd5d..09f0cbc 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -536,6 +536,75 @@ bool slab_is_available(void)
>  	return slab_state >= UP;
>  }
>  
> +/**
> + * kmem_valid_obj - does the pointer reference a valid slab object?
> + * @object: pointer to query.
> + *
> + * Return: %true if the pointer is to a not-yet-freed object from
> + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> + * is to an already-freed object, and %false otherwise.
> + */
> +bool kmem_valid_obj(void *object)
> +{
> +	struct page *page;
> +
> +	if (!virt_addr_valid(object))
> +		return false;
> +	page = virt_to_head_page(object);
> +	return PageSlab(page);
> +}
> +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> +
> +/**
> + * kmem_dump_obj - Print available slab provenance information
> + * @object: slab object for which to find provenance information.
> + *
> + * This function uses pr_cont(), so that the caller is expected to have
> + * printed out whatever preamble is appropriate.  The provenance information
> + * depends on the type of object and on how much debugging is enabled.
> + * For a slab-cache object, the fact that it is a slab object is printed,
> + * and, if available, the slab name, return address, and stack trace from
> + * the allocation of that object.
> + *
> + * This function will splat if passed a pointer to a non-slab object.
> + * If you are not sure what type of object you have, you should instead
> + * use mem_dump_obj().
> + */
> +void kmem_dump_obj(void *object)
> +{
> +	int i;
> +	struct page *page;
> +	struct kmem_provenance kp;
> +
> +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> +		return;
> +	page = virt_to_head_page(object);
> +	if (WARN_ON_ONCE(!PageSlab(page))) {
> +		pr_cont(" non-slab memory.\n");
> +		return;
> +	}
> +	kp.kp_ptr = object;
> +	kp.kp_page = page;
> +	kp.kp_nstack = KS_ADDRS_COUNT;

I would prefer that kmem_dump_obj() not set any kp fields. That job is
reserved for kmem_provenance().

> +	kmem_provenance(&kp);
> +	if (page->slab_cache)
> +		pr_cont(" slab %s", page->slab_cache->name);

Rather than accessing page->slab_cache, it's better to introduce a
slab_cache field in kp and use that. Note that slob doesn't use
page->slab_cache. In slob, that field of struct page would be NULL, so
it would not cause a problem, but using kp makes things clearer.
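
That is, something along these lines (just a sketch; kp_slab_cache is the
suggested new field name):

	struct kmem_provenance {
		void *kp_ptr;
		struct page *kp_page;
		void *kp_objp;
		struct kmem_cache *kp_slab_cache;
		void *kp_ret;
		void *kp_stack[KS_ADDRS_COUNT];
		int kp_nstack;
	};

and then in kmem_dump_obj():

	if (kp.kp_slab_cache)
		pr_cont(" slab %s", kp.kp_slab_cache->name);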

> +	else
> +		pr_cont(" slab ");
> +	if (kp.kp_ret)
> +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> +	else
> +		pr_cont("\n");
> +	if (kp.kp_stack[0]) {

This check is redundant, since the loop already checks it on every iteration.
 
Thanks.
Paul E. McKenney Dec. 10, 2020, 7:56 p.m. UTC | #9
On Thu, Dec 10, 2020 at 11:48:26AM +0100, Vlastimil Babka wrote:
> On 12/10/20 12:04 AM, Paul E. McKenney wrote:
> >> > +/**
> >> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> >> > + * @object: pointer to query.
> >> > + *
> >> > + * Return: %true if the pointer is to a not-yet-freed object from
> >> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> >> > + * is to an already-freed object, and %false otherwise.
> >> > + */
> >> 
> >> It should be possible to find out more about object being free or not, than you
> >> currently do. At least to find out if it's definitely free. When it appears
> >> allocated, it can be actually still free in some kind of e.g. per-cpu or
> >> per-node cache that would be infeasible to check. But that improvement to the
> >> output can be also added later. Also SLUB stores the freeing stacktrace, which
> >> might be useful...
> > 
> > I can see how this could help debugging a use-after-free situation,
> > at least as long as the poor sap that subsequently allocated it doesn't
> > free it.
> > 
> > I can easily add more fields to the kmem_provenance structure.  Maybe
> > it would make sense to have another exported API that you provide a
> > kmem_provenance structure to, and it fills it in.
> > 
> > One caution though...  I rely on the object being allocated.
> > If it officially might already be freed, complex and high-overhead
> > synchronization seems to be required to safely access the various data
> > structures.
> 
> Good point! It's easy to forget that when being used to similar digging in a
> crash dump, where nothing changes.

Maybe a similar addition to the crash-analysis tools would be helpful?

> > So any use on an already-freed object is on a "you break it you get to
> > keep the pieces" basis.  On the other hand, if you are dealing with a
> > use-after-free situation, life is hard anyway.
> 
> Yeah, even now I think it's potentially dangerous, as you can get
> kmem_valid_obj() as true because PageSlab(page) is true. But the object might be
> already free, so as soon as another CPU frees another object from the same slab
> page, the page gets also freed... or it was already freed and then allocated by
> another slab so it's PageSlab() again.
> I guess at least some safety could be achieved by pinning the page with
> get_page_unless_zero. But maybe your current implementation is already safe,
> need to check in detail.

The code on the various free paths looks to me to make the same
assumptions that I am making.  So if this is unsafe, we have other
problems.

> > Or am I missing your point?
> > 
> >> > +bool kmem_valid_obj(void *object)
> >> > +{
> >> > +	struct page *page;
> >> > +
> >> > +	if (!virt_addr_valid(object))
> >> > +		return false;
> >> > +	page = virt_to_head_page(object);
> >> > +	return PageSlab(page);
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> >> > +
> >> > +/**
> >> > + * kmem_dump_obj - Print available slab provenance information
> >> > + * @object: slab object for which to find provenance information.
> >> > + *
> >> > + * This function uses pr_cont(), so that the caller is expected to have
> >> > + * printed out whatever preamble is appropriate.  The provenance information
> >> > + * depends on the type of object and on how much debugging is enabled.
> >> > + * For a slab-cache object, the fact that it is a slab object is printed,
> >> > + * and, if available, the slab name, return address, and stack trace from
> >> > + * the allocation of that object.
> >> > + *
> >> > + * This function will splat if passed a pointer to a non-slab object.
> >> > + * If you are not sure what type of object you have, you should instead
> >> > + * use mem_dump_obj().
> >> > + */
> >> > +void kmem_dump_obj(void *object)
> >> > +{
> >> > +	int i;
> >> > +	struct page *page;
> >> > +	struct kmem_provenance kp;
> >> > +
> >> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> >> > +		return;
> >> > +	page = virt_to_head_page(object);
> >> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> >> > +		pr_cont(" non-slab memory.\n");
> >> > +		return;
> >> > +	}
> >> > +	kp.kp_ptr = object;
> >> > +	kp.kp_page = page;
> >> > +	kp.kp_nstack = KS_ADDRS_COUNT;
> >> > +	kmem_provenance(&kp);
> >> 
> >> You don't seem to be printing kp.kp_objp anywhere? (unless in later patch, but
> >> would make sense in this patch already).
> > 
> > Good point!
> > 
> > However, please note that the various debugging options that reserve
> > space at the beginning.  This can make the meaning of kp.kp_objp a bit
> > different than one might expect.
> 
> Yeah, I think the best would be to match the address that
> kmalloc/kmem_cache_alloc() would return, thus the beginning of the object
> itself, so you can calculate the offset within it, etc.

My thought is to do both.  Show the start address, the data offset (if
nonzero), and the pointer offset within the data.  My guess is that in
the absence of things like slub_debug=U, the pointer offset within the
data is the best way to figure out which structure is involved.

Or do you use other tricks to work this sort of thing out?
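
Perhaps something along these lines in kmem_dump_obj() (a sketch only,
using the field names from the current patch):

	pr_cont(" start %px pointer offset %lu",
		kp.kp_objp, (unsigned long)(kp.kp_ptr - kp.kp_objp));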

> >> > --- a/mm/util.c
> >> > +++ b/mm/util.c
> >> 
> >> I think mm/debug.c is a better fit as it already has dump_page() of a similar
> >> nature. Also you can call that from from mem_dump_obj() at least in case when
> >> the more specific handlers fail. It will even include page_owner info if enabled! :)
> > 
> > I will count this as one vote for mm/debug.c.
> > 
> > Two things to consider, though...  First, Joonsoo suggests that the fact
> > that this produces useful information without any debugging information
> > enabled makes it not be debugging as such.
> 
> Well there's already dump_page() which also produces information without special
> configs.
> We're not the best subsystem in this kind of consistency...
> 
> > Second, mm/debug.c does
> > not include either slab.h or vmalloc.h.  The second might not be a
> > showstopper, but I was interpreting this to mean that its role was
> > less central.
> 
> I think it can include whatever becomes needed there :)

I figured that there was a significant probability that I would have to
move it, and I really don't have a basis for a preference, let alone a
strong one.  But I would like to avoid moving it more than once, so I
also figured I should give anyone else with an educated preference a
chance to speak up.  ;-)

							Thanx, Paul

> >> Thanks,
> >> Vlastimil
> >> 
> >> > @@ -970,3 +970,28 @@ int __weak memcmp_pages(struct page *page1, struct page *page2)
> >> >  	kunmap_atomic(addr1);
> >> >  	return ret;
> >> >  }
> >> > +
> >> > +/**
> >> > + * mem_dump_obj - Print available provenance information
> >> > + * @object: object for which to find provenance information.
> >> > + *
> >> > + * This function uses pr_cont(), so that the caller is expected to have
> >> > + * printed out whatever preamble is appropriate.  The provenance information
> >> > + * depends on the type of object and on how much debugging is enabled.
> >> > + * For example, for a slab-cache object, the slab name is printed, and,
> >> > + * if available, the return address and stack trace from the allocation
> >> > + * of that object.
> >> > + */
> >> > +void mem_dump_obj(void *object)
> >> > +{
> >> > +	if (!virt_addr_valid(object)) {
> >> > +		pr_cont(" non-paged (local) memory.\n");
> >> > +		return;
> >> > +	}
> >> > +	if (kmem_valid_obj(object)) {
> >> > +		kmem_dump_obj(object);
> >> > +		return;
> >> > +	}
> >> > +	pr_cont(" non-slab memory.\n");
> >> > +}
> >> > +EXPORT_SYMBOL_GPL(mem_dump_obj);
> >> > 
> >> 
> > 
>
Paul E. McKenney Dec. 10, 2020, 11:41 p.m. UTC | #10
On Thu, Dec 10, 2020 at 09:04:11PM +0900, Joonsoo Kim wrote:
> On Tue, Dec 08, 2020 at 05:12:59PM -0800, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> > 
> > This commit therefore exposes a mem_dump_obj() function that takes
> > a pointer to memory (which must still be allocated if it has been
> > dynamically allocated) and prints available information on where that
> > memory came from.  This pointer can reference the middle of the block as
> > well as the beginning of the block, as needed by things like RCU callback
> > functions and timer handlers that might not know where the beginning of
> > the memory block is.  These functions and handlers can use mem_dump_obj()
> > to print out better hints as to where the problem might lie.
> > 
> > The information printed can depend on kernel configuration.  For example,
> > the allocation return address can be printed only for slab and slub,
> > and even then only when the necessary debug has been enabled.  For slab,
> > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > to the next power of two or use the SLAB_STORE_USER when creating the
> > kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > to enable printing of the allocation-time stack trace.
> > 
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
> 
> Introducing three functions, kmem_valid_obj(), kmem_provenance(),
> mem_dump_obj() looks better than patchset v1. Nice work. Few comments
> below.

Glad you like it!

> > ---
> >  include/linux/mm.h   |  2 ++
> >  include/linux/slab.h |  2 ++
> >  mm/slab.c            | 28 +++++++++++++++++++++
> >  mm/slab.h            | 11 +++++++++
> >  mm/slab_common.c     | 69 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  mm/slob.c            |  7 ++++++
> >  mm/slub.c            | 40 ++++++++++++++++++++++++++++++
> >  mm/util.c            | 25 +++++++++++++++++++
> >  8 files changed, 184 insertions(+)
> > 
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index ef360fe..1eea266 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -3153,5 +3153,7 @@ unsigned long wp_shared_mapping_range(struct address_space *mapping,
> >  
> >  extern int sysctl_nr_trim_pages;
> >  
> > +void mem_dump_obj(void *object);
> > +
> >  #endif /* __KERNEL__ */
> >  #endif /* _LINUX_MM_H */
> > diff --git a/include/linux/slab.h b/include/linux/slab.h
> > index dd6897f..169b511 100644
> > --- a/include/linux/slab.h
> > +++ b/include/linux/slab.h
> > @@ -186,6 +186,8 @@ void kfree(const void *);
> >  void kfree_sensitive(const void *);
> >  size_t __ksize(const void *);
> >  size_t ksize(const void *);
> > +bool kmem_valid_obj(void *object);
> > +void kmem_dump_obj(void *object);
> >  
> >  #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
> >  void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
> > diff --git a/mm/slab.c b/mm/slab.c
> > index b111356..72b6743 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3602,6 +3602,34 @@ void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
> >  EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
> >  #endif
> >  
> > +void kmem_provenance(struct kmem_provenance *kpp)
> 
> To open up the possibility of future enhancement, name, provenance,
> looks not good to me. This function could be used to extract various
> object information so such as kmem_obj_info() looks better to me. Any
> thought?

The name kmem_obj_info() works for me, updated.

> > +{
> > +#ifdef DEBUG
> > +	struct kmem_cache *cachep;
> > +	void *object = kpp->kp_ptr;
> > +	unsigned int objnr;
> > +	void *objp;
> > +	struct page *page = kpp->kp_page;
> > +
> > +	cachep = page->slab_cache;
> > +	if (!(cachep->flags & SLAB_STORE_USER)) {
> > +		kpp->kp_ret = NULL;
> > +		goto nodebug;
> > +	}
> > +	objp = object - obj_offset(cachep);
> > +	page = virt_to_head_page(objp);
> > +	objnr = obj_to_index(cachep, page, objp);
> > +	objp = index_to_obj(cachep, page, objnr);
> > +	kpp->kp_objp = objp;
> > +	kpp->kp_ret = *dbg_userword(cachep, objp);
> > +nodebug:
> > +#else
> > +	kpp->kp_ret = NULL;
> > +#endif
> > +	if (kpp->kp_nstack)
> > +		kpp->kp_stack[0] = NULL;
> > +}
> > +
> >  static __always_inline void *
> >  __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
> >  {
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 6d7c6a5..28a41d5 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -630,4 +630,15 @@ static inline bool slab_want_init_on_free(struct kmem_cache *c)
> >  	return false;
> >  }
> >  
> > +#define KS_ADDRS_COUNT 16
> > +struct kmem_provenance {
> > +	void *kp_ptr;
> > +	struct page *kp_page;
> > +	void *kp_objp;
> > +	void *kp_ret;
> > +	void *kp_stack[KS_ADDRS_COUNT];
> > +	int kp_nstack;
> > +};
> > +void kmem_provenance(struct kmem_provenance *kpp);
> > +
> >  #endif /* MM_SLAB_H */
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index f9ccd5d..09f0cbc 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -536,6 +536,75 @@ bool slab_is_available(void)
> >  	return slab_state >= UP;
> >  }
> >  
> > +/**
> > + * kmem_valid_obj - does the pointer reference a valid slab object?
> > + * @object: pointer to query.
> > + *
> > + * Return: %true if the pointer is to a not-yet-freed object from
> > + * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
> > + * is to an already-freed object, and %false otherwise.
> > + */
> > +bool kmem_valid_obj(void *object)
> > +{
> > +	struct page *page;
> > +
> > +	if (!virt_addr_valid(object))
> > +		return false;
> > +	page = virt_to_head_page(object);
> > +	return PageSlab(page);
> > +}
> > +EXPORT_SYMBOL_GPL(kmem_valid_obj);
> > +
> > +/**
> > + * kmem_dump_obj - Print available slab provenance information
> > + * @object: slab object for which to find provenance information.
> > + *
> > + * This function uses pr_cont(), so that the caller is expected to have
> > + * printed out whatever preamble is appropriate.  The provenance information
> > + * depends on the type of object and on how much debugging is enabled.
> > + * For a slab-cache object, the fact that it is a slab object is printed,
> > + * and, if available, the slab name, return address, and stack trace from
> > + * the allocation of that object.
> > + *
> > + * This function will splat if passed a pointer to a non-slab object.
> > + * If you are not sure what type of object you have, you should instead
> > + * use mem_dump_obj().
> > + */
> > +void kmem_dump_obj(void *object)
> > +{
> > +	int i;
> > +	struct page *page;
> > +	struct kmem_provenance kp;
> > +
> > +	if (WARN_ON_ONCE(!virt_addr_valid(object)))
> > +		return;
> > +	page = virt_to_head_page(object);
> > +	if (WARN_ON_ONCE(!PageSlab(page))) {
> > +		pr_cont(" non-slab memory.\n");
> > +		return;
> > +	}
> > +	kp.kp_ptr = object;
> > +	kp.kp_page = page;
> > +	kp.kp_nstack = KS_ADDRS_COUNT;
> 
> I hope that kmem_dump_obj() doesn't set any kp fields. It's the job
> reserved for kmem_provenance().

I assigned to kp.kp_ptr to avoid doing it in each of the three variants
of kmem_provenance(), but it is clearly not a big deal to do the three
assignments.  Ditto for kp.kp_page.

I can remove the kp.kp_nstack assignment entirely and have the variants
just use KS_ADDRS_COUNT directly.

And I will zero-initialize kp, thus getting rid of some of the NULL/0
assignments in the various kmem_provenance() functions, along with a lot
of the goto statements.
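
That is, roughly (sketch):

	void kmem_dump_obj(void *object)
	{
		struct kmem_provenance kp = { };
		...
	}

so that the slab/slob/slub variants no longer need explicit NULL/0
assignments (or the nodebug: labels) for fields they don't fill in.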

> > +	kmem_provenance(&kp);
> > +	if (page->slab_cache)
> > +		pr_cont(" slab %s", page->slab_cache->name);
> 
> Rather than accessing page->slab_cache, it's better to introduce
> slab_cache field on kp and use it. Note that slob doesn't use
> page->slab_cache. In slob, that field on struct page would be NULL so
> it would not cause a problem. But using kp makes things clear.

Easy enough!

> > +	else
> > +		pr_cont(" slab ");
> > +	if (kp.kp_ret)
> > +		pr_cont(" allocated at %pS\n", kp.kp_ret);
> > +	else
> > +		pr_cont("\n");
> > +	if (kp.kp_stack[0]) {
> 
> This check would be useless since we check it on every iteration.

Good catch, removed.

							Thanx, Paul

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index ef360fe..1eea266 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3153,5 +3153,7 @@  unsigned long wp_shared_mapping_range(struct address_space *mapping,
 
 extern int sysctl_nr_trim_pages;
 
+void mem_dump_obj(void *object);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/slab.h b/include/linux/slab.h
index dd6897f..169b511 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -186,6 +186,8 @@  void kfree(const void *);
 void kfree_sensitive(const void *);
 size_t __ksize(const void *);
 size_t ksize(const void *);
+bool kmem_valid_obj(void *object);
+void kmem_dump_obj(void *object);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
diff --git a/mm/slab.c b/mm/slab.c
index b111356..72b6743 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3602,6 +3602,34 @@  void *kmem_cache_alloc_node_trace(struct kmem_cache *cachep,
 EXPORT_SYMBOL(kmem_cache_alloc_node_trace);
 #endif
 
+void kmem_provenance(struct kmem_provenance *kpp)
+{
+#ifdef DEBUG
+	struct kmem_cache *cachep;
+	void *object = kpp->kp_ptr;
+	unsigned int objnr;
+	void *objp;
+	struct page *page = kpp->kp_page;
+
+	cachep = page->slab_cache;
+	if (!(cachep->flags & SLAB_STORE_USER)) {
+		kpp->kp_ret = NULL;
+		goto nodebug;
+	}
+	objp = object - obj_offset(cachep);
+	page = virt_to_head_page(objp);
+	objnr = obj_to_index(cachep, page, objp);
+	objp = index_to_obj(cachep, page, objnr);
+	kpp->kp_objp = objp;
+	kpp->kp_ret = *dbg_userword(cachep, objp);
+nodebug:
+#else
+	kpp->kp_ret = NULL;
+#endif
+	if (kpp->kp_nstack)
+		kpp->kp_stack[0] = NULL;
+}
+
 static __always_inline void *
 __do_kmalloc_node(size_t size, gfp_t flags, int node, unsigned long caller)
 {
diff --git a/mm/slab.h b/mm/slab.h
index 6d7c6a5..28a41d5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -630,4 +630,15 @@  static inline bool slab_want_init_on_free(struct kmem_cache *c)
 	return false;
 }
 
+#define KS_ADDRS_COUNT 16
+struct kmem_provenance {
+	void *kp_ptr;
+	struct page *kp_page;
+	void *kp_objp;
+	void *kp_ret;
+	void *kp_stack[KS_ADDRS_COUNT];
+	int kp_nstack;
+};
+void kmem_provenance(struct kmem_provenance *kpp);
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index f9ccd5d..09f0cbc 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -536,6 +536,75 @@  bool slab_is_available(void)
 	return slab_state >= UP;
 }
 
+/**
+ * kmem_valid_obj - does the pointer reference a valid slab object?
+ * @object: pointer to query.
+ *
+ * Return: %true if the pointer is to a not-yet-freed object from
+ * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
+ * is to an already-freed object, and %false otherwise.
+ */
+bool kmem_valid_obj(void *object)
+{
+	struct page *page;
+
+	if (!virt_addr_valid(object))
+		return false;
+	page = virt_to_head_page(object);
+	return PageSlab(page);
+}
+EXPORT_SYMBOL_GPL(kmem_valid_obj);
+
+/**
+ * kmem_dump_obj - Print available slab provenance information
+ * @object: slab object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For a slab-cache object, the fact that it is a slab object is printed,
+ * and, if available, the slab name, return address, and stack trace from
+ * the allocation of that object.
+ *
+ * This function will splat if passed a pointer to a non-slab object.
+ * If you are not sure what type of object you have, you should instead
+ * use mem_dump_obj().
+ */
+void kmem_dump_obj(void *object)
+{
+	int i;
+	struct page *page;
+	struct kmem_provenance kp;
+
+	if (WARN_ON_ONCE(!virt_addr_valid(object)))
+		return;
+	page = virt_to_head_page(object);
+	if (WARN_ON_ONCE(!PageSlab(page))) {
+		pr_cont(" non-slab memory.\n");
+		return;
+	}
+	kp.kp_ptr = object;
+	kp.kp_page = page;
+	kp.kp_nstack = KS_ADDRS_COUNT;
+	kmem_provenance(&kp);
+	if (page->slab_cache)
+		pr_cont(" slab %s", page->slab_cache->name);
+	else
+		pr_cont(" slab ");
+	if (kp.kp_ret)
+		pr_cont(" allocated at %pS\n", kp.kp_ret);
+	else
+		pr_cont("\n");
+	if (kp.kp_stack[0]) {
+		for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
+			if (!kp.kp_stack[i])
+				break;
+			pr_info("    %pS\n", kp.kp_stack[i]);
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(kmem_dump_obj);
+
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name,
diff --git a/mm/slob.c b/mm/slob.c
index 7cc9805..fb10493 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -461,6 +461,13 @@  static void slob_free(void *block, int size)
 	spin_unlock_irqrestore(&slob_lock, flags);
 }
 
+void kmem_provenance(struct kmem_provenance *kpp)
+{
+	kpp->kp_ret = NULL;
+	if (kpp->kp_nstack)
+		kpp->kp_stack[0] = NULL;
+}
+
 /*
  * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
  */
diff --git a/mm/slub.c b/mm/slub.c
index b30be23..027fe0f 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3918,6 +3918,46 @@  int __kmem_cache_shutdown(struct kmem_cache *s)
 	return 0;
 }
 
+void kmem_provenance(struct kmem_provenance *kpp)
+{
+#ifdef CONFIG_SLUB_DEBUG
+	void *base;
+	int i;
+	void *object = kpp->kp_ptr;
+	unsigned int objnr;
+	void *objp;
+	struct page *page = kpp->kp_page;
+	struct kmem_cache *s = page->slab_cache;
+	struct track *trackp;
+
+	base = page_address(page);
+	objp = kasan_reset_tag(object);
+	objp = restore_red_left(s, objp);
+	objnr = obj_to_index(s, page, objp);
+	objp = base + s->size * objnr;
+	kpp->kp_objp = objp;
+	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
+	    !(s->flags & SLAB_STORE_USER))
+		goto nodebug;
+	trackp = get_track(s, objp, TRACK_ALLOC);
+	kpp->kp_ret = (void *)trackp->addr;
+#ifdef CONFIG_STACKTRACE
+	for (i = 0; i < kpp->kp_nstack && i < TRACK_ADDRS_COUNT; i++) {
+		kpp->kp_stack[i] = (void *)trackp->addrs[i];
+		if (!kpp->kp_stack[i])
+			break;
+	}
+#endif
+	if (kpp->kp_stack && i < kpp->kp_nstack)
+		kpp->kp_stack[i] = NULL;
+	return;
+nodebug:
+#endif
+	kpp->kp_ret = NULL;
+	if (kpp->kp_nstack)
+		kpp->kp_stack[0] = NULL;
+}
+
 /********************************************************************
  *		Kmalloc subsystem
  *******************************************************************/
diff --git a/mm/util.c b/mm/util.c
index 4ddb6e1..d0e60d2 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -970,3 +970,28 @@  int __weak memcmp_pages(struct page *page1, struct page *page2)
 	kunmap_atomic(addr1);
 	return ret;
 }
+
+/**
+ * mem_dump_obj - Print available provenance information
+ * @object: object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For example, for a slab-cache object, the slab name is printed, and,
+ * if available, the return address and stack trace from the allocation
+ * of that object.
+ */
+void mem_dump_obj(void *object)
+{
+	if (!virt_addr_valid(object)) {
+		pr_cont(" non-paged (local) memory.\n");
+		return;
+	}
+	if (kmem_valid_obj(object)) {
+		kmem_dump_obj(object);
+		return;
+	}
+	pr_cont(" non-slab memory.\n");
+}
+EXPORT_SYMBOL_GPL(mem_dump_obj);