
[mm,percpu_ref,rcu,1/6] mm: Add mem_dump_obj() to print source of memory block

Message ID 20210106011750.13709-1-paulmck@kernel.org (mailing list archive)
State New, archived
Series Export return addresses etc. for better diagnostics

Commit Message

Paul E. McKenney Jan. 6, 2021, 1:17 a.m. UTC
From: "Paul E. McKenney" <paulmck@kernel.org>

There are kernel facilities, such as per-CPU reference counts, whose error
messages are emitted from generic handlers or callbacks and are therefore
unenlightening.  In the case of per-CPU reference-count underflow, this
is not a problem when creating a new use of this facility because in that
case the bug is almost certainly in the code implementing that new use.
However, trouble arises when deploying across many systems, which might
exercise corner cases that were not seen during development and testing.
Here, it would be really nice to get some kind of hint as to which of the
several uses caused the underflow.

This commit therefore exposes a mem_dump_obj() function that takes
a pointer to memory (which must still be allocated if it has been
dynamically allocated) and prints available information on where that
memory came from.  This pointer can reference the middle of the block as
well as the beginning of the block, as needed by things like RCU callback
functions and timer handlers that might not know where the beginning of
the memory block is.  These functions and handlers can use mem_dump_obj()
to print out better hints as to where the problem might lie.
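
For example (an illustrative sketch only, not part of this patch: struct foo,
its ->rh and ->initialized fields, and the callback are hypothetical), an RCU
callback that is handed only a pointer to an embedded rcu_head can still
identify the enclosing allocation:

	static void foo_rcu_cb(struct rcu_head *rhp)
	{
		struct foo *fp = container_of(rhp, struct foo, rh);

		if (WARN_ON_ONCE(!fp->initialized)) {
			pr_err("foo_rcu_cb(): suspicious object:");
			mem_dump_obj(rhp); /* rhp points into the middle of the block. */
			return;
		}
		/* Normal processing of fp goes here. */
	}

Because mem_dump_obj() uses pr_cont(), the caller prints its own preamble
(without a trailing newline) and the provenance information is appended to it.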

The information printed can depend on kernel configuration.  For example,
the allocation return address can be printed only for slab and slub,
and even then only when the necessary debug has been enabled.  For slab,
build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
to the next power of two or pass the SLAB_STORE_USER flag when creating
the kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
to enable printing of the allocation-time stack trace.
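
As a more focused illustration (a sketch only: the cache name, struct foo,
and the init function are made up for this example), a subsystem can request
allocation tracking for just one cache by passing SLAB_STORE_USER at
cache-creation time:

	/* Hypothetical cache whose allocations should record their caller. */
	static struct kmem_cache *foo_cache;

	static int __init foo_cache_init(void)
	{
		foo_cache = kmem_cache_create("foo_cache", sizeof(struct foo),
					      __alignof__(struct foo),
					      SLAB_STORE_USER, NULL);
		return foo_cache ? 0 : -ENOMEM;
	}

Objects allocated from such a cache then have their allocation return address
(and, for slub with CONFIG_STACKTRACE, the allocation-time stack trace)
available to mem_dump_obj().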

Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: <linux-mm@kvack.org>
Reported-by: Andrii Nakryiko <andrii@kernel.org>
[ paulmck: Convert to printing and change names per Joonsoo Kim. ]
[ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
[ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
[ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
[ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
---
 include/linux/mm.h   |  2 ++
 include/linux/slab.h |  2 ++
 mm/slab.c            | 20 ++++++++++++++
 mm/slab.h            | 12 +++++++++
 mm/slab_common.c     | 74 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 mm/slob.c            |  6 +++++
 mm/slub.c            | 40 ++++++++++++++++++++++++++++
 mm/util.c            | 24 +++++++++++++++++
 8 files changed, 180 insertions(+)

Comments

Vlastimil Babka Jan. 8, 2021, 1:50 p.m. UTC | #1
On 1/6/21 2:17 AM, paulmck@kernel.org wrote:
> From: "Paul E. McKenney" <paulmck@kernel.org>
> 
> There are kernel facilities such as per-CPU reference counts that give
> error messages in generic handlers or callbacks, whose messages are
> unenlightening.  In the case of per-CPU reference-count underflow, this
> is not a problem when creating a new use of this facility because in that
> case the bug is almost certainly in the code implementing that new use.
> However, trouble arises when deploying across many systems, which might
> exercise corner cases that were not seen during development and testing.
> Here, it would be really nice to get some kind of hint as to which of
> several uses the underflow was caused by.
> 
> This commit therefore exposes a mem_dump_obj() function that takes
> a pointer to memory (which must still be allocated if it has been
> dynamically allocated) and prints available information on where that
> memory came from.  This pointer can reference the middle of the block as
> well as the beginning of the block, as needed by things like RCU callback
> functions and timer handlers that might not know where the beginning of
> the memory block is.  These functions and handlers can use mem_dump_obj()
> to print out better hints as to where the problem might lie.
> 
> The information printed can depend on kernel configuration.  For example,
> the allocation return address can be printed only for slab and slub,
> and even then only when the necessary debug has been enabled.  For slab,
> build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> to the next power of two or use the SLAB_STORE_USER when creating the
> kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> to enable printing of the allocation-time stack trace.
> 
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Pekka Enberg <penberg@kernel.org>
> Cc: David Rientjes <rientjes@google.com>
> Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: <linux-mm@kvack.org>
> Reported-by: Andrii Nakryiko <andrii@kernel.org>
> [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
> [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
> [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
> Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>

Acked-by: Vlastimil Babka <vbabka@suse.cz>

Some nits below:

> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3635,6 +3635,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
>  EXPORT_SYMBOL(__kmalloc_node_track_caller);
>  #endif /* CONFIG_NUMA */
>  
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> +{
> +	struct kmem_cache *cachep;
> +	unsigned int objnr;
> +	void *objp;
> +
> +	kpp->kp_ptr = object;
> +	kpp->kp_page = page;
> +	cachep = page->slab_cache;
> +	kpp->kp_slab_cache = cachep;
> +	objp = object - obj_offset(cachep);
> +	kpp->kp_data_offset = obj_offset(cachep);
> +	page = virt_to_head_page(objp);

Hm when can this page be different from the "page" we got as function parameter?
I guess only if "object" was so close to the beginning of page that "object -
obj_offset(cachep)" underflowed it. So either "object" pointed to the
padding/redzone, or even below page->s_mem. Both situations sound like we
should handle them differently than continuing with an unrelated page that's
below our slab page?

> +	objnr = obj_to_index(cachep, page, objp);

Related, this will return a bogus value for objp below page->s_mem.
And if our "object" pointer pointed beyond the last valid object, this will
give us a too-large index.


> +	objp = index_to_obj(cachep, page, objnr);

A too-large index can cause dbg_userword to be beyond our page.
In the SLUB version you have the WARN_ON_ONCE that catches such invalid pointers
(before the first valid object or after the last valid object) and skips getting
the backtrace for those, so an analogous check should probably be done here?

> +	kpp->kp_objp = objp;
> +	if (DEBUG && cachep->flags & SLAB_STORE_USER)
> +		kpp->kp_ret = *dbg_userword(cachep, objp);
> +}
> +
> diff --git a/mm/slub.c b/mm/slub.c
> index 0c8b43a..3c1a843 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3919,6 +3919,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>  	return 0;
>  }
>  
> +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> +{
> +	void *base;
> +	int __maybe_unused i;
> +	unsigned int objnr;
> +	void *objp;
> +	void *objp0;
> +	struct kmem_cache *s = page->slab_cache;
> +	struct track __maybe_unused *trackp;
> +
> +	kpp->kp_ptr = object;
> +	kpp->kp_page = page;
> +	kpp->kp_slab_cache = s;
> +	base = page_address(page);
> +	objp0 = kasan_reset_tag(object);
> +#ifdef CONFIG_SLUB_DEBUG
> +	objp = restore_red_left(s, objp0);
> +#else
> +	objp = objp0;
> +#endif
> +	objnr = obj_to_index(s, page, objp);

It would be safer to use objp0 instead of objp here I think. In case "object"
was pointer to the first object's left red zone, then we would not have "objp"
underflow "base" and get a bogus objnr. The WARN_ON_ONCE below could then be
less paranoid? Basically just the "objp >= base + page->objects * s->size"
should be possible if "object" points beyond the last valid object. But
otherwise we should get valid index and thus valid "objp = base + s->size *
objnr;" below, and "objp < base" and "(objp - base) % s->size)" should be
impossible?

Hmm but since it would then be possible to get a negative pointer offset (into
the left padding/redzone), kmem_dump_obj() should calculate and print it as signed?
But it's not obvious if a pointer to left red zone is a pointer that was an
overflow of object N-1 or underflow of object N, and which one to report (unless
it's the very first object). AFAICS your current code will report all as
overflows of object N-1, which is problematic with N=0 (as described above) so
changing it to report underflows of object N would make more sense?

Thanks,
Vlastimil
Paul E. McKenney Jan. 8, 2021, 7:01 p.m. UTC | #2
On Fri, Jan 08, 2021 at 02:50:35PM +0100, Vlastimil Babka wrote:
> On 1/6/21 2:17 AM, paulmck@kernel.org wrote:
> > From: "Paul E. McKenney" <paulmck@kernel.org>
> > 
> > There are kernel facilities such as per-CPU reference counts that give
> > error messages in generic handlers or callbacks, whose messages are
> > unenlightening.  In the case of per-CPU reference-count underflow, this
> > is not a problem when creating a new use of this facility because in that
> > case the bug is almost certainly in the code implementing that new use.
> > However, trouble arises when deploying across many systems, which might
> > exercise corner cases that were not seen during development and testing.
> > Here, it would be really nice to get some kind of hint as to which of
> > several uses the underflow was caused by.
> > 
> > This commit therefore exposes a mem_dump_obj() function that takes
> > a pointer to memory (which must still be allocated if it has been
> > dynamically allocated) and prints available information on where that
> > memory came from.  This pointer can reference the middle of the block as
> > well as the beginning of the block, as needed by things like RCU callback
> > functions and timer handlers that might not know where the beginning of
> > the memory block is.  These functions and handlers can use mem_dump_obj()
> > to print out better hints as to where the problem might lie.
> > 
> > The information printed can depend on kernel configuration.  For example,
> > the allocation return address can be printed only for slab and slub,
> > and even then only when the necessary debug has been enabled.  For slab,
> > build with CONFIG_DEBUG_SLAB=y, and either use sizes with ample space
> > to the next power of two or use the SLAB_STORE_USER when creating the
> > kmem_cache structure.  For slub, build with CONFIG_SLUB_DEBUG=y and
> > boot with slub_debug=U, or pass SLAB_STORE_USER to kmem_cache_create()
> > if more focused use is desired.  Also for slub, use CONFIG_STACKTRACE
> > to enable printing of the allocation-time stack trace.
> > 
> > Cc: Christoph Lameter <cl@linux.com>
> > Cc: Pekka Enberg <penberg@kernel.org>
> > Cc: David Rientjes <rientjes@google.com>
> > Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: <linux-mm@kvack.org>
> > Reported-by: Andrii Nakryiko <andrii@kernel.org>
> > [ paulmck: Convert to printing and change names per Joonsoo Kim. ]
> > [ paulmck: Move slab definition per Stephen Rothwell and kbuild test robot. ]
> > [ paulmck: Handle CONFIG_MMU=n case where vmalloc() is kmalloc(). ]
> > [ paulmck: Apply Vlastimil Babka feedback on slab.c kmem_provenance(). ]
> > [ paulmck: Extract more info from !SLUB_DEBUG per Joonsoo Kim. ]
> > Acked-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
> 
> Acked-by: Vlastimil Babka <vbabka@suse.cz>

Thank you for the review and comments!

> Some nits below:

Andrew pushed this to an upstream maintainer, but I have not seen these
patches appear anywhere.  So if that upstream maintainer was Linus, I can
send a follow-up patch once we converge.  If the upstream maintainer was
in fact me, I can of course update the commit directly.  If the upstream
maintainer was someone else, please let me know who it is.  ;-)

(Linus does not appear to have pushed anything out since before Andrew's
email, hence my uncertainty.)

> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3635,6 +3635,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
> >  EXPORT_SYMBOL(__kmalloc_node_track_caller);
> >  #endif /* CONFIG_NUMA */
> >  
> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > +{
> > +	struct kmem_cache *cachep;
> > +	unsigned int objnr;
> > +	void *objp;
> > +
> > +	kpp->kp_ptr = object;
> > +	kpp->kp_page = page;
> > +	cachep = page->slab_cache;
> > +	kpp->kp_slab_cache = cachep;
> > +	objp = object - obj_offset(cachep);
> > +	kpp->kp_data_offset = obj_offset(cachep);
> > +	page = virt_to_head_page(objp);
> 
> Hm when can this page be different from the "page" we got as function parameter?
> I guess only if "object" was so close to the beginning of page that "object -
> obj_offset(cachep)" underflowed it. So either "object" pointed to the
> padding/redzone, or even below page->s_mem. Both situations sounds like we
> should handle them differently than continuing with an unrelated page that's
> below our slab page?

I examined other code to obtain this.  I have been assuming that the
point was to be able to handle multipage slabs, but I freely confess to
having no idea.  But I am reluctant to change this sequence unless the
other code translating from pointer to in-slab object is also changed.

> > +	objnr = obj_to_index(cachep, page, objp);
> 
> Related, this will return bogus value for objp below page->s_mem.
> And if our "object" pointer pointed beyond last valid object, this will give us
> too large index.
> 
> 
> > +	objp = index_to_obj(cachep, page, objnr);
> 
> Too large index can cause dbg_userword to be beyond our page.
> In SLUB version you have the WARN_ON_ONCE that catches such invalid pointers
> (before first valid object or after last valid object) and skips getting the
> backtrace for those, so analogical thing should probably be done here?

Like this, just before the "objp =" statement?

	WARN_ON_ONCE(objnr >= cachep->num);

> > +	kpp->kp_objp = objp;
> > +	if (DEBUG && cachep->flags & SLAB_STORE_USER)
> > +		kpp->kp_ret = *dbg_userword(cachep, objp);
> > +}
> > +
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 0c8b43a..3c1a843 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -3919,6 +3919,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
> >  	return 0;
> >  }
> >  
> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
> > +{
> > +	void *base;
> > +	int __maybe_unused i;
> > +	unsigned int objnr;
> > +	void *objp;
> > +	void *objp0;
> > +	struct kmem_cache *s = page->slab_cache;
> > +	struct track __maybe_unused *trackp;
> > +
> > +	kpp->kp_ptr = object;
> > +	kpp->kp_page = page;
> > +	kpp->kp_slab_cache = s;
> > +	base = page_address(page);
> > +	objp0 = kasan_reset_tag(object);
> > +#ifdef CONFIG_SLUB_DEBUG
> > +	objp = restore_red_left(s, objp0);
> > +#else
> > +	objp = objp0;
> > +#endif
> > +	objnr = obj_to_index(s, page, objp);
> 
> It would be safer to use objp0 instead of objp here I think. In case "object"
> was pointer to the first object's left red zone, then we would not have "objp"
> underflow "base" and get a bogus objnr. The WARN_ON_ONCE below could then be
> less paranoid? Basically just the "objp >= base + page->objects * s->size"
> should be possible if "object" points beyond the last valid object. But
> otherwise we should get valid index and thus valid "objp = base + s->size *
> objnr;" below, and "objp < base" and "(objp - base) % s->size)" should be
> impossible?
> 
> Hmm but since it would then be possible to get a negative pointer offset (into
> the left padding/redzone), kmem_dump_obj() should calculate and print it as signed?
> But it's not obvious if a pointer to left red zone is a pointer that was an
> overflow of object N-1 or underflow of object N, and which one to report (unless
> it's the very first object). AFAICS your current code will report all as
> overflows of object N-1, which is problematic with N=0 (as described above) so
> changing it to report underflows of object N would make more sense?

Doesn't the "WARN_ON_ONCE(objp < base" further down report underflows?
Or am I missing something subtle here?

							Thanx, Paul
Vlastimil Babka Jan. 8, 2021, 7:41 p.m. UTC | #3
On 1/8/21 8:01 PM, Paul E. McKenney wrote:
> 
> Andrew pushed this to an upstream maintainer, but I have not seen these
> patches appear anywhere.  So if that upstream maintainer was Linus, I can
> send a follow-up patch once we converge.  If the upstream maintainer was
> in fact me, I can of course update the commit directly.  If the upstream
> maintainer was someone else, please let me know who it is.  ;-)
> 
> (Linus does not appear to have pushed anything out since before Andrew's
> email, hence my uncertainty.)

I've wondered about the mm-commits messages too, and concluded that the most
probable explanation is that Andrew tried to add your series to mmotm, then
tried an mmotm merge to linux-next, and found out the series is already there
via your rcu tree :)
 
>> > --- a/mm/slab.c
>> > +++ b/mm/slab.c
>> > @@ -3635,6 +3635,26 @@ void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
>> >  EXPORT_SYMBOL(__kmalloc_node_track_caller);
>> >  #endif /* CONFIG_NUMA */
>> >  
>> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
>> > +{
>> > +	struct kmem_cache *cachep;
>> > +	unsigned int objnr;
>> > +	void *objp;
>> > +
>> > +	kpp->kp_ptr = object;
>> > +	kpp->kp_page = page;
>> > +	cachep = page->slab_cache;
>> > +	kpp->kp_slab_cache = cachep;
>> > +	objp = object - obj_offset(cachep);
>> > +	kpp->kp_data_offset = obj_offset(cachep);
>> > +	page = virt_to_head_page(objp);
>> 
>> Hm when can this page be different from the "page" we got as function parameter?
>> I guess only if "object" was so close to the beginning of page that "object -
>> obj_offset(cachep)" underflowed it. So either "object" pointed to the
>> padding/redzone, or even below page->s_mem. Both situations sounds like we
>> should handle them differently than continuing with an unrelated page that's
>> below our slab page?
> 
> I examined other code to obtain this.  I have been assuming that the
> point was to be able to handle multipage slabs, but I freely confess to
> having no idea.  But I am reluctant to change this sequence unless the
> other code translating from pointer to in-slab object is also changed.

OK, I will check the other code.

>> > +	objnr = obj_to_index(cachep, page, objp);
>> 
>> Related, this will return bogus value for objp below page->s_mem.
>> And if our "object" pointer pointed beyond last valid object, this will give us
>> too large index.
>> 
>> 
>> > +	objp = index_to_obj(cachep, page, objnr);
>> 
>> Too large index can cause dbg_userword to be beyond our page.
>> In SLUB version you have the WARN_ON_ONCE that catches such invalid pointers
>> (before first valid object or after last valid object) and skips getting the
>> backtrace for those, so analogical thing should probably be done here?
> 
> Like this, just before the "objp =" statement?
> 
> 	WARN_ON_ONCE(objnr >= cachep->num);

Yeah, that should do the trick to prevent accessing garbage dbg_userword.

But I wrote the comments about SLAB first, and only in the SLUB part did I
realize the larger picture. So now I would consider something like below,
which should find the closest valid object index; the resulting pointer
offset in kmem_dump_obj() might then become negative. Pointers to padding,
below page->s_mem, or beyond the last object just become large negative or
positive pointer offsets, respectively, and we probably don't need to warn
for them specially unless we also warn for all other pointers that are not
within the "data area" of the object.

void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
{
	struct kmem_cache *cachep;
	unsigned int objnr;
	void *objp;

	kpp->kp_ptr = object;
	kpp->kp_page = page;
	cachep = page->slab_cache;
	kpp->kp_slab_cache = cachep;
	kpp->kp_data_offset = obj_offset(cachep);
	if (object < page->s_mem)
		objnr = 0;
	else
		objnr = obj_to_index(cachep, page, object);
	if (objnr >= cachep->num)
		objnr = cachep->num - 1;
	objp = index_to_obj(cachep, page, objnr);
	kpp->kp_objp = objp;
	if (DEBUG && cachep->flags & SLAB_STORE_USER)
		kpp->kp_ret = *dbg_userword(cachep, objp);
}

 
>> > +	kpp->kp_objp = objp;
>> > +	if (DEBUG && cachep->flags & SLAB_STORE_USER)
>> > +		kpp->kp_ret = *dbg_userword(cachep, objp);
>> > +}
>> > +
>> > diff --git a/mm/slub.c b/mm/slub.c
>> > index 0c8b43a..3c1a843 100644
>> > --- a/mm/slub.c
>> > +++ b/mm/slub.c
>> > @@ -3919,6 +3919,46 @@ int __kmem_cache_shutdown(struct kmem_cache *s)
>> >  	return 0;
>> >  }
>> >  
>> > +void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
>> > +{
>> > +	void *base;
>> > +	int __maybe_unused i;
>> > +	unsigned int objnr;
>> > +	void *objp;
>> > +	void *objp0;
>> > +	struct kmem_cache *s = page->slab_cache;
>> > +	struct track __maybe_unused *trackp;
>> > +
>> > +	kpp->kp_ptr = object;
>> > +	kpp->kp_page = page;
>> > +	kpp->kp_slab_cache = s;
>> > +	base = page_address(page);
>> > +	objp0 = kasan_reset_tag(object);
>> > +#ifdef CONFIG_SLUB_DEBUG
>> > +	objp = restore_red_left(s, objp0);
>> > +#else
>> > +	objp = objp0;
>> > +#endif
>> > +	objnr = obj_to_index(s, page, objp);
>> 
>> It would be safer to use objp0 instead of objp here I think. In case "object"
>> was pointer to the first object's left red zone, then we would not have "objp"
>> underflow "base" and get a bogus objnr. The WARN_ON_ONCE below could then be
>> less paranoid? Basically just the "objp >= base + page->objects * s->size"
>> should be possible if "object" points beyond the last valid object. But
>> otherwise we should get valid index and thus valid "objp = base + s->size *
>> objnr;" below, and "objp < base" and "(objp - base) % s->size)" should be
>> impossible?
>> 
>> Hmm but since it would then be possible to get a negative pointer offset (into
>> the left padding/redzone), kmem_dump_obj() should calculate and print it as signed?
>> But it's not obvious if a pointer to left red zone is a pointer that was an
>> overflow of object N-1 or underflow of object N, and which one to report (unless
>> it's the very first object). AFAICS your current code will report all as
>> overflows of object N-1, which is problematic with N=0 (as described above) so
>> changing it to report underflows of object N would make more sense?
> 
> Doesn't the "WARN_ON_ONCE(objp < base" further down report underflows?

I don't think it could be possible; could you describe the conditions?

> Or am I missing something subtle here?

A version analogous to the SLAB one above could AFAICS look like this:

...
	kpp->kp_ptr = object;
	kpp->kp_page = page;
	kpp->kp_slab_cache = s;
	base = page_address(page);
	objp0 = kasan_reset_tag(object);
#ifdef CONFIG_SLUB_DEBUG
	objp = restore_red_left(s, objp0);
#else
	objp = objp0;
#endif
	kpp->kp_data_offset = (unsigned long)((char *)objp0 - (char *)objp);
	objnr = obj_to_index(s, page, objp0); // unlike SLAB this can't underflow
	if (objnr >= page->objects)
		objnr = page->objects - 1;
	objp = base + s->size * objnr;
	kpp->kp_objp = objp;
	// no WARN_ON_ONCE() needed, objp has to be valid, we just might have negative
	// offset to it, or a larger than s->size positive offset
#ifdef CONFIG_SLUB_DEBUG
	// etc, no changes below

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 5299b90a..af7d050 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -3169,5 +3169,7 @@  unsigned long wp_shared_mapping_range(struct address_space *mapping,
 
 extern int sysctl_nr_trim_pages;
 
+void mem_dump_obj(void *object);
+
 #endif /* __KERNEL__ */
 #endif /* _LINUX_MM_H */
diff --git a/include/linux/slab.h b/include/linux/slab.h
index be4ba58..7ae6040 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -186,6 +186,8 @@  void kfree(const void *);
 void kfree_sensitive(const void *);
 size_t __ksize(const void *);
 size_t ksize(const void *);
+bool kmem_valid_obj(void *object);
+void kmem_dump_obj(void *object);
 
 #ifdef CONFIG_HAVE_HARDENED_USERCOPY_ALLOCATOR
 void __check_heap_object(const void *ptr, unsigned long n, struct page *page,
diff --git a/mm/slab.c b/mm/slab.c
index d7c8da9..dcc55e7 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3635,6 +3635,26 @@  void *__kmalloc_node_track_caller(size_t size, gfp_t flags,
 EXPORT_SYMBOL(__kmalloc_node_track_caller);
 #endif /* CONFIG_NUMA */
 
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
+{
+	struct kmem_cache *cachep;
+	unsigned int objnr;
+	void *objp;
+
+	kpp->kp_ptr = object;
+	kpp->kp_page = page;
+	cachep = page->slab_cache;
+	kpp->kp_slab_cache = cachep;
+	objp = object - obj_offset(cachep);
+	kpp->kp_data_offset = obj_offset(cachep);
+	page = virt_to_head_page(objp);
+	objnr = obj_to_index(cachep, page, objp);
+	objp = index_to_obj(cachep, page, objnr);
+	kpp->kp_objp = objp;
+	if (DEBUG && cachep->flags & SLAB_STORE_USER)
+		kpp->kp_ret = *dbg_userword(cachep, objp);
+}
+
 /**
  * __do_kmalloc - allocate memory
  * @size: how many bytes of memory are required.
diff --git a/mm/slab.h b/mm/slab.h
index 1a756a3..ecad9b5 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -615,4 +615,16 @@  static inline bool slab_want_init_on_free(struct kmem_cache *c)
 	return false;
 }
 
+#define KS_ADDRS_COUNT 16
+struct kmem_obj_info {
+	void *kp_ptr;
+	struct page *kp_page;
+	void *kp_objp;
+	unsigned long kp_data_offset;
+	struct kmem_cache *kp_slab_cache;
+	void *kp_ret;
+	void *kp_stack[KS_ADDRS_COUNT];
+};
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page);
+
 #endif /* MM_SLAB_H */
diff --git a/mm/slab_common.c b/mm/slab_common.c
index e981c80..b594413 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -537,6 +537,80 @@  bool slab_is_available(void)
 	return slab_state >= UP;
 }
 
+/**
+ * kmem_valid_obj - does the pointer reference a valid slab object?
+ * @object: pointer to query.
+ *
+ * Return: %true if the pointer is to a not-yet-freed object from
+ * kmalloc() or kmem_cache_alloc(), either %true or %false if the pointer
+ * is to an already-freed object, and %false otherwise.
+ */
+bool kmem_valid_obj(void *object)
+{
+	struct page *page;
+
+	if (!virt_addr_valid(object))
+		return false;
+	page = virt_to_head_page(object);
+	return PageSlab(page);
+}
+
+/**
+ * kmem_dump_obj - Print available slab provenance information
+ * @object: slab object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For a slab-cache object, the fact that it is a slab object is printed,
+ * and, if available, the slab name, return address, and stack trace from
+ * the allocation of that object.
+ *
+ * This function will splat if passed a pointer to a non-slab object.
+ * If you are not sure what type of object you have, you should instead
+ * use mem_dump_obj().
+ */
+void kmem_dump_obj(void *object)
+{
+	char *cp = IS_ENABLED(CONFIG_MMU) ? "" : "/vmalloc";
+	int i;
+	struct page *page;
+	unsigned long ptroffset;
+	struct kmem_obj_info kp = { };
+
+	if (WARN_ON_ONCE(!virt_addr_valid(object)))
+		return;
+	page = virt_to_head_page(object);
+	if (WARN_ON_ONCE(!PageSlab(page))) {
+		pr_cont(" non-slab memory.\n");
+		return;
+	}
+	kmem_obj_info(&kp, object, page);
+	if (kp.kp_slab_cache)
+		pr_cont(" slab%s %s", cp, kp.kp_slab_cache->name);
+	else
+		pr_cont(" slab%s", cp);
+	if (kp.kp_objp)
+		pr_cont(" start %px", kp.kp_objp);
+	if (kp.kp_data_offset)
+		pr_cont(" data offset %lu", kp.kp_data_offset);
+	if (kp.kp_objp) {
+		ptroffset = ((char *)object - (char *)kp.kp_objp) - kp.kp_data_offset;
+		pr_cont(" pointer offset %lu", ptroffset);
+	}
+	if (kp.kp_slab_cache && kp.kp_slab_cache->usersize)
+		pr_cont(" size %u", kp.kp_slab_cache->usersize);
+	if (kp.kp_ret)
+		pr_cont(" allocated at %pS\n", kp.kp_ret);
+	else
+		pr_cont("\n");
+	for (i = 0; i < ARRAY_SIZE(kp.kp_stack); i++) {
+		if (!kp.kp_stack[i])
+			break;
+		pr_info("    %pS\n", kp.kp_stack[i]);
+	}
+}
+
 #ifndef CONFIG_SLOB
 /* Create a cache during boot when no slab services are available yet */
 void __init create_boot_cache(struct kmem_cache *s, const char *name,
diff --git a/mm/slob.c b/mm/slob.c
index 8d4bfa4..ef87ada 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -461,6 +461,12 @@  static void slob_free(void *block, int size)
 	spin_unlock_irqrestore(&slob_lock, flags);
 }
 
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
+{
+	kpp->kp_ptr = object;
+	kpp->kp_page = page;
+}
+
 /*
  * End of slob allocator proper. Begin kmem_cache_alloc and kmalloc frontend.
  */
diff --git a/mm/slub.c b/mm/slub.c
index 0c8b43a..3c1a843 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3919,6 +3919,46 @@  int __kmem_cache_shutdown(struct kmem_cache *s)
 	return 0;
 }
 
+void kmem_obj_info(struct kmem_obj_info *kpp, void *object, struct page *page)
+{
+	void *base;
+	int __maybe_unused i;
+	unsigned int objnr;
+	void *objp;
+	void *objp0;
+	struct kmem_cache *s = page->slab_cache;
+	struct track __maybe_unused *trackp;
+
+	kpp->kp_ptr = object;
+	kpp->kp_page = page;
+	kpp->kp_slab_cache = s;
+	base = page_address(page);
+	objp0 = kasan_reset_tag(object);
+#ifdef CONFIG_SLUB_DEBUG
+	objp = restore_red_left(s, objp0);
+#else
+	objp = objp0;
+#endif
+	objnr = obj_to_index(s, page, objp);
+	kpp->kp_data_offset = (unsigned long)((char *)objp0 - (char *)objp);
+	objp = base + s->size * objnr;
+	kpp->kp_objp = objp;
+	if (WARN_ON_ONCE(objp < base || objp >= base + page->objects * s->size || (objp - base) % s->size) ||
+	    !(s->flags & SLAB_STORE_USER))
+		return;
+#ifdef CONFIG_SLUB_DEBUG
+	trackp = get_track(s, objp, TRACK_ALLOC);
+	kpp->kp_ret = (void *)trackp->addr;
+#ifdef CONFIG_STACKTRACE
+	for (i = 0; i < KS_ADDRS_COUNT && i < TRACK_ADDRS_COUNT; i++) {
+		kpp->kp_stack[i] = (void *)trackp->addrs[i];
+		if (!kpp->kp_stack[i])
+			break;
+	}
+#endif
+#endif
+}
+
 /********************************************************************
  *		Kmalloc subsystem
  *******************************************************************/
diff --git a/mm/util.c b/mm/util.c
index 8c9b7d1..da46f9d 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -982,3 +982,27 @@  int __weak memcmp_pages(struct page *page1, struct page *page2)
 	kunmap_atomic(addr1);
 	return ret;
 }
+
+/**
+ * mem_dump_obj - Print available provenance information
+ * @object: object for which to find provenance information.
+ *
+ * This function uses pr_cont(), so that the caller is expected to have
+ * printed out whatever preamble is appropriate.  The provenance information
+ * depends on the type of object and on how much debugging is enabled.
+ * For example, for a slab-cache object, the slab name is printed, and,
+ * if available, the return address and stack trace from the allocation
+ * of that object.
+ */
+void mem_dump_obj(void *object)
+{
+	if (!virt_addr_valid(object)) {
+		pr_cont(" non-paged (local) memory.\n");
+		return;
+	}
+	if (kmem_valid_obj(object)) {
+		kmem_dump_obj(object);
+		return;
+	}
+	pr_cont(" non-slab memory.\n");
+}