[v2] mm/slub: introduce SLAB_WARN_ON_ERROR

Message ID 1548313223-17114-1-git-send-email-miles.chen@mediatek.com (mailing list archive)
State New, archived

Commit Message

Miles Chen Jan. 24, 2019, 7 a.m. UTC
From: Miles Chen <miles.chen@mediatek.com>

When debugging slab errors in slub.c, sometimes we have to trigger
a panic in order to get the coredump file. Add a debug option
SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.

Change since v1:
1. Add a special debug option SLAB_WARN_ON_ERROR and toggle WARN_ON()
if it is set.
2. SLAB_WARN_ON_ERROR can be set by kernel parameter slub_debug.

Cc: Christopher Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Signed-off-by: Miles Chen <miles.chen@mediatek.com>
---
 Documentation/vm/slub.rst |  1 +
 include/linux/slab.h      |  3 +++
 mm/slub.c                 | 34 ++++++++++++++++++++++++++++++++--
 3 files changed, 36 insertions(+), 2 deletions(-)
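
As a reading aid, here is a minimal userspace sketch of the behaviour the commit message describes. It assumes that the only difference the flag makes is which reporting call runs, and that a WARN escalates to a panic only when panic_on_warn is set. The enum and function names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Same bit value as the SLAB_WARN_ON_ERROR define in the patch. */
#define SLAB_WARN_ON_ERROR 0x10000000U

/* Hypothetical names for what a slab error report ends up doing. */
enum report_outcome {
	REPORT_DUMP_STACK,	/* dump_stack(): backtrace only, system keeps running */
	REPORT_WARN,		/* WARN_ON(1): warning and backtrace, system keeps running */
	REPORT_PANIC,		/* WARN_ON(1) with panic_on_warn set: kernel panics */
};

/* Models the branch added to print_trailer() and slab_err(). */
static enum report_outcome slab_report_outcome(unsigned int cache_flags,
					       bool panic_on_warn)
{
	if (!(cache_flags & SLAB_WARN_ON_ERROR))
		return REPORT_DUMP_STACK;
	return panic_on_warn ? REPORT_PANIC : REPORT_WARN;
}

/* Usage: walk through the three cases. */
static void check_report_outcomes(void)
{
	assert(slab_report_outcome(0, true) == REPORT_DUMP_STACK);
	assert(slab_report_outcome(SLAB_WARN_ON_ERROR, false) == REPORT_WARN);
	assert(slab_report_outcome(SLAB_WARN_ON_ERROR, true) == REPORT_PANIC);
}
```

Without the flag the existing dump_stack() path is unchanged, so panic_on_warn alone has no effect on slab error reports in this model.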

Comments

Andrew Morton Jan. 28, 2019, 8:29 p.m. UTC | #1
On Thu, 24 Jan 2019 15:00:23 +0800 <miles.chen@mediatek.com> wrote:

> From: Miles Chen <miles.chen@mediatek.com>
> 
> When debugging slab errors in slub.c, sometimes we have to trigger
> a panic in order to get the coredump file. Add a debug option
> SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.
> 
> Change since v1:
> 1. Add a special debug option SLAB_WARN_ON_ERROR and toggle WARN_ON()
> if it is set.
> 2. SLAB_WARN_ON_ERROR can be set by kernel parameter slub_debug.
> 

Hopefully the slab developers will have an opinion on this.

> --- a/Documentation/vm/slub.rst
> +++ b/Documentation/vm/slub.rst
> @@ -52,6 +52,7 @@ Possible debug options are::
>  	A		Toggle failslab filter mark for the cache
>  	O		Switch debugging off for caches that would have
>  			caused higher minimum slab orders
> +	W		Toggle WARN_ON() on slab errors
>  	-		Switch all debugging off (useful if the kernel is
>  			configured with CONFIG_SLUB_DEBUG_ON)

This documentation is poorly phrased.  The term "toggle" means to
invert the value of a boolean: if it was 1, make it 0 and if it was 0,
make it 1.  But that isn't what these options do.  Something like
"enable/disable" would be better.   So...

--- a/Documentation/vm/slub.rst~mm-slub-introduce-slab_warn_on_error-fix
+++ a/Documentation/vm/slub.rst
@@ -49,10 +49,10 @@ Possible debug options are::
 	P		Poisoning (object and padding)
 	U		User tracking (free and alloc)
 	T		Trace (please only use on single slabs)
-	A		Toggle failslab filter mark for the cache
+	A		Enable/disable failslab filter mark for the cache
 	O		Switch debugging off for caches that would have
 			caused higher minimum slab orders
-	W		Toggle WARN_ON() on slab errors
+	W		Enable/disable WARN_ON() on slab errors
 	-		Switch all debugging off (useful if the kernel is
 			configured with CONFIG_SLUB_DEBUG_ON)
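
The distinction can be illustrated with a small userspace sketch. It relies only on the fact that the option parser in the patch ORs the bit in (slub_debug |= SLAB_WARN_ON_ERROR). The helper names are hypothetical.

```c
#include <assert.h>

/* Same bit value as the SLAB_WARN_ON_ERROR define in the patch. */
#define SLAB_WARN_ON_ERROR 0x10000000U

/* "Toggle" in the strict sense: invert the bit whatever its current value. */
static unsigned int flag_toggle(unsigned int flags, unsigned int bit)
{
	return flags ^ bit;
}

/* "Enable": set the bit; doing it twice is the same as doing it once. */
static unsigned int flag_enable(unsigned int flags, unsigned int bit)
{
	return flags | bit;
}

/* Usage: apply the same option twice under each reading. */
static void check_toggle_vs_enable(void)
{
	unsigned int once, twice;

	/* True toggle semantics: the second application switches it back off. */
	once = flag_toggle(0, SLAB_WARN_ON_ERROR);
	twice = flag_toggle(once, SLAB_WARN_ON_ERROR);
	assert(once == SLAB_WARN_ON_ERROR);
	assert(twice == 0);

	/* Enable semantics, as in the parser: the bit stays set. */
	once = flag_enable(0, SLAB_WARN_ON_ERROR);
	twice = flag_enable(once, SLAB_WARN_ON_ERROR);
	assert(twice == SLAB_WARN_ON_ERROR);
}
```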

David Rientjes Jan. 29, 2019, 1:41 a.m. UTC | #2
On Thu, 24 Jan 2019, miles.chen@mediatek.com wrote:

> From: Miles Chen <miles.chen@mediatek.com>
> 
> When debugging slab errors in slub.c, sometimes we have to trigger
> a panic in order to get the coredump file. Add a debug option
> SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.
> 

Wouldn't it be better to enable/disable this for all slab caches instead 
of individual caches at runtime?  I'm not sure excluding some caches 
because you know they'll WARN and trigger panic_on_warn unnecessarily is 
valid since it could be enabled for that cache as well through this 
interface.

Miles Chen Jan. 29, 2019, 3:45 a.m. UTC | #3
On Mon, 2019-01-28 at 17:41 -0800, David Rientjes wrote:
> On Thu, 24 Jan 2019, miles.chen@mediatek.com wrote:
> 
> > From: Miles Chen <miles.chen@mediatek.com>
> > 
> > When debugging slab errors in slub.c, sometimes we have to trigger
> > a panic in order to get the coredump file. Add a debug option
> > SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.
> > 
> 
> Wouldn't it be better to enable/disable this for all slab caches instead 
> of individual caches at runtime?  I'm not sure excluding some caches 
> because you know they'll WARN and trigger panic_on_warn unnecessarily is 
> valid since it could be enabled for that cache as well through this 
> interface.

We can enable this option for specific slab(s) only, e.g.,
slub_debug=W,dentry
or for all slabs, e.g.,
slub_debug=W
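
To make the two forms concrete, here is a rough userspace sketch of how such an option string could be split into flags and an optional cache name. It is not the kernel's parser: it handles only the W letter, and it matches the cache name exactly, which is a simplifying assumption. The struct and function names are hypothetical.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Same bit value as the SLAB_WARN_ON_ERROR define in the patch. */
#define SLAB_WARN_ON_ERROR 0x10000000U

/* Hypothetical result of parsing a "slub_debug=" value. */
struct debug_request {
	unsigned int flags;	/* requested debug flags */
	const char *cache;	/* cache name after the comma, or NULL for all caches */
};

/* Returns 0 on success, -1 on an option letter this sketch does not know. */
static int parse_slub_debug(const char *str, struct debug_request *req)
{
	req->flags = 0;
	req->cache = NULL;

	/* Option letters run up to the first comma or the end of the string. */
	for (; *str && *str != ','; str++) {
		switch (tolower((unsigned char)*str)) {
		case 'w':
			req->flags |= SLAB_WARN_ON_ERROR;
			break;
		default:
			return -1;
		}
	}

	/* Anything after the comma is taken as the cache name. */
	if (*str == ',')
		req->cache = str + 1;
	return 0;
}

/* Exact-name match is an assumption made for this sketch. */
static int applies_to(const struct debug_request *req, const char *name)
{
	return req->cache == NULL || strcmp(req->cache, name) == 0;
}

/* Usage: the two command lines from the message above. */
static void check_parse_examples(void)
{
	struct debug_request req;

	assert(parse_slub_debug("W,dentry", &req) == 0);
	assert(req.flags == SLAB_WARN_ON_ERROR);
	assert(applies_to(&req, "dentry"));
	assert(!applies_to(&req, "kmalloc-256"));

	assert(parse_slub_debug("W", &req) == 0);
	assert(req.cache == NULL);
	assert(applies_to(&req, "kmalloc-256"));
}
```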

Christoph Lameter (Ampere) Jan. 29, 2019, 5:46 a.m. UTC | #4
On Mon, 28 Jan 2019, Andrew Morton wrote:

> > When debugging slab errors in slub.c, sometimes we have to trigger
> > a panic in order to get the coredump file. Add a debug option
> > SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.
> >
> > Change since v1:
> > 1. Add a special debug option SLAB_WARN_ON_ERROR and toggle WARN_ON()
> > if it is set.
> > 2. SLAB_WARN_ON_ERROR can be set by kernel parameter slub_debug.
> >
>
> Hopefully the slab developers will have an opinion on this.

Debugging slab itself is usually done in kvm or some other virtualized
environment. Then gdb can be used to set breakpoints. Otherwise one may
add printks and stuff to the allocators to figure out more or use perf.

What you are changing here is the debugging for data corruption within
objects managed by slub or the metadata. Slub currently outputs extensive
data about the metadata corruption (typically caused by a user of
slab allocation) which should allow you to set a proper
breakpoint not in the allocator but in the subsystem where the corruption
occurs.

Miles Chen Jan. 29, 2019, 7:53 a.m. UTC | #5
On Tue, 2019-01-29 at 05:46 +0000, Christopher Lameter wrote:
> On Mon, 28 Jan 2019, Andrew Morton wrote:
> 
> > > When debugging slab errors in slub.c, sometimes we have to trigger
> > > a panic in order to get the coredump file. Add a debug option
> > > SLAB_WARN_ON_ERROR to toggle WARN_ON() when the option is set.
> > >
> > > Change since v1:
> > > 1. Add a special debug option SLAB_WARN_ON_ERROR and toggle WARN_ON()
> > > if it is set.
> > > 2. SLAB_WARN_ON_ERROR can be set by kernel parameter slub_debug.
> > >
> >
> > Hopefully the slab developers will have an opinion on this.
> 
> Debugging slab itself is usually done in kvm or some other virtualized
> environment. Then gdb can be used to set breakpoints. Otherwise one may
> add printks and stuff to the allocators to figure out more or use perf.
> 
> 
> What you are changing here is the debugging for data corruption within
> objects managed by slub or the metadata. Slub currently outputs extensive
> data about the metadata corruption (typically caused by a user of
> slab allocation) which should allow you to set a proper
> breakpoint not in the allocator but in the subsystem where the corruption
> occurs.
> 
Thanks for your comments. The real problems this change can help with are:

a) Classic slub issues, e.g., use-after-free or an overwritten redzone.
It is more efficient to report an issue as soon as slub detects it
(compared with monitoring the log, setting a breakpoint, and reproducing
the issue). With the coredump file, we can analyze the issue.

b) Memory corruption caused by h/w writes, e.g., memory overwritten by
a DMA engine. The corruption may or may not be related to the slab cache
that reports the error. For example, kmalloc-256 or dentry may report
the same errors. If we can preserve the coredump file without any
restore/reset processing in slub, we get more information about the
memory corruption.

c) Memory corruption caused by unstable h/w, e.g., bit flipping
because of xxxx DRAM die or newly applied power settings. It is hard to
reproduce this kind of issue, and it is much easier to identify it in
the coredump file without any restore/reset processing.

Users can set the option through slub_debug. The original behavior
(keeping the system alive) is preserved if the option is not set. We can
turn on the option when we need the coredump file (with panic_on_warn
set, of course).

Christoph Lameter (Ampere) Jan. 29, 2019, 7:46 p.m. UTC | #6
On Tue, 29 Jan 2019, Miles Chen wrote:

> a) classic slub issue. e.g., use-after-free, redzone overwritten. It's
> more efficient to report a issue as soon as slub detects it. (comparing
> to monitor the log, set a breakpoint, and re-produce the issue). With
> the coredump file, we can analyze the issue.

What usually happens is that the system fails with a strange error
message. Then the system is rebooted using slub_debug options and the
issue is reproduced, yielding more information about the problem.

Then you run the scenario again with additional debugging in the subsystem
that caused the problem.

So you are already reproducing the issue because you need to activate
debugging to get more information. Doing it for the 3rd time is not that
much more difficult.

None of your modifications will be active in a production kernel.
slub_debug must be activated to use it and thus you are already
reproducing the issue.

> b) memory corruption issues caused by h/w write. e.g., memory
> overwritten by a DMA engine. Memory corruptions may or may not related
> to the slab cache that reports any error. For example: kmalloc-256 or
> dentry may report the same errors. If we can preserve the the coredump
> file without any restore/reset processing in slub, we could have more
> information of this memory corruption.

If debugging is active then reporting will include the accurate slab cache
affected. The memory layout is already changing when you enable the
existing debugging code. None of your code runs without that and thus it
cannot add a coredump for the prod case without debugging.

> c) memory corruption issues caused by unstable h/w. e.g., bit flipping
> because of xxxx DRAM die or applying new power settings. It's hard to
> re-produce this kind of issue and it much easier to tell this kind of
> issue in the coredump file without any restore/reset processing.

But then your patch does not help in this situation because the code has to
be enabled by special slub debug options.


> Users can set the option by slub_debug. We can still have the original
> behavior(keep the system alive) if the option is not set. We can turn on
> the option when we need the coredump file. (with panic_on_warn is set,
> of course).

I think we would need to turn on debugging by default and have your patch
for this to make sense. We are already reproducing the issue multiple times
for debugging. This patch does not change that.

Miles Chen Jan. 30, 2019, 1:43 a.m. UTC | #7
On Tue, 2019-01-29 at 19:46 +0000, Christopher Lameter wrote:
> On Tue, 29 Jan 2019, Miles Chen wrote:
> 
> > a) classic slub issue. e.g., use-after-free, redzone overwritten. It's
> > more efficient to report a issue as soon as slub detects it. (comparing
> > to monitor the log, set a breakpoint, and re-produce the issue). With
> > the coredump file, we can analyze the issue.
> 
> What usually happens is that the systems fails with a strange error
> message. Then the system is rebooted using slub_debug options and the
> issue is reproduced yielding more information about the problem.
> 
> Then you run the scenario again with additional debugging in the subsystem
> that caused the problem.

Thanks for your comments and patience.

I now understand the difference between our workflows.
I usually enable CONFIG_SLUB_DEBUG=y and CONFIG_SLUB_DEBUG_ON=y, set up
slub_debug by default, and run all tests that way (eng mode).
I do not hit an issue first and then set up slub_debug to reproduce it.

CONFIG_SLUB_DEBUG is disabled for products.

> 
> So you are already reproducing the issue because you need to activate
> debugging to get more information. Doing it for the 3rd time is not that
> much more difficult.
> 
> None of your modifications will be active in a production kernel.
> slub_debug must be activated to use it and thus you are already
> reproducing the issue.
> 
> > b) memory corruption issues caused by h/w write. e.g., memory
> > overwritten by a DMA engine. Memory corruptions may or may not related
> > to the slab cache that reports any error. For example: kmalloc-256 or
> > dentry may report the same errors. If we can preserve the the coredump
> > file without any restore/reset processing in slub, we could have more
> > information of this memory corruption.
> 
> If debugging is active then reporting will include the accurate slab cache
> affected. The memory layout is already changing when you enable the
> existing debugging code. None of your code runs without that and thus is
> cannot add a coredump for the prod case without debugging.

I usually set slub_debug by default and get the coredump file.

> > c) memory corruption issues caused by unstable h/w. e.g., bit flipping
> > because of xxxx DRAM die or applying new power settings. It's hard to
> > re-produce this kind of issue and it much easier to tell this kind of
> > issue in the coredump file without any restore/reset processing.
> 
> But then you patch does not help in this situation because the code has to
> be enabled by special  slub debug options.
> 
> 
> > Users can set the option by slub_debug. We can still have the original
> > behavior(keep the system alive) if the option is not set. We can turn on
> > the option when we need the coredump file. (with panic_on_warn is set,
> > of course).
> 
> I think we would need to turn on debugging by default and have your patch
> for this to make sense. We already reproducing the issue multiple times
> for debugging. This patch does not change that.
> 
Yes, I turn on debugging by default. Does that make sense now?

Thanks again for your comments.

Patch

diff --git a/Documentation/vm/slub.rst b/Documentation/vm/slub.rst
index 195928808bac..236c00b2d17b 100644
--- a/Documentation/vm/slub.rst
+++ b/Documentation/vm/slub.rst
@@ -52,6 +52,7 @@  Possible debug options are::
 	A		Toggle failslab filter mark for the cache
 	O		Switch debugging off for caches that would have
 			caused higher minimum slab orders
+	W		Toggle WARN_ON() on slab errors
 	-		Switch all debugging off (useful if the kernel is
 			configured with CONFIG_SLUB_DEBUG_ON)
 
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 11b45f7ae405..1fd9911890c6 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -109,6 +109,9 @@ 
 #define SLAB_KASAN		0
 #endif
 
+/* WARN_ON slab error */
+#define SLAB_WARN_ON_ERROR	((slab_flags_t __force)0x10000000U)
+
 /* The following flags affect the page allocator grouping pages by mobility */
 /* Objects are reclaimable */
 #define SLAB_RECLAIM_ACCOUNT	((slab_flags_t __force)0x00020000U)
diff --git a/mm/slub.c b/mm/slub.c
index 1e3d0ec4e200..60f93e0657fb 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -684,7 +684,10 @@  static void print_trailer(struct kmem_cache *s, struct page *page, u8 *p)
 		print_section(KERN_ERR, "Padding ", p + off,
 			      size_from_object(s) - off);
 
-	dump_stack();
+	if (unlikely(s->flags & SLAB_WARN_ON_ERROR))
+		WARN_ON(1);
+	else
+		dump_stack();
 }
 
 void object_err(struct kmem_cache *s, struct page *page,
@@ -705,7 +708,11 @@  static __printf(3, 4) void slab_err(struct kmem_cache *s, struct page *page,
 	va_end(args);
 	slab_bug(s, "%s", buf);
 	print_page_info(page);
-	dump_stack();
+
+	if (unlikely(s->flags & SLAB_WARN_ON_ERROR))
+		WARN_ON(1);
+	else
+		dump_stack();
 }
 
 static void init_object(struct kmem_cache *s, void *object, u8 val)
@@ -1254,6 +1261,9 @@  static int __init setup_slub_debug(char *str)
 		case 'a':
 			slub_debug |= SLAB_FAILSLAB;
 			break;
+		case 'w':
+			slub_debug |= SLAB_WARN_ON_ERROR;
+			break;
 		case 'o':
 			/*
 			 * Avoid enabling debugging on caches if its minimum
@@ -5220,6 +5230,25 @@  static ssize_t store_user_store(struct kmem_cache *s,
 }
 SLAB_ATTR(store_user);
 
+static ssize_t warn_on_error_show(struct kmem_cache *s, char *buf)
+{
+	return sprintf(buf, "%d\n", !!(s->flags & SLAB_WARN_ON_ERROR));
+}
+
+static ssize_t warn_on_error_store(struct kmem_cache *s,
+				const char *buf, size_t length)
+{
+	if (any_slab_objects(s))
+		return -EBUSY;
+
+	s->flags &= ~SLAB_WARN_ON_ERROR;
+	if (buf[0] == '1')
+		s->flags |= SLAB_WARN_ON_ERROR;
+
+	return length;
+}
+SLAB_ATTR(warn_on_error);
+
 static ssize_t validate_show(struct kmem_cache *s, char *buf)
 {
 	return 0;
@@ -5428,6 +5457,7 @@  static struct attribute *slab_attrs[] = {
 	&validate_attr.attr,
 	&alloc_calls_attr.attr,
 	&free_calls_attr.attr,
+	&warn_on_error_attr.attr,
 #endif
 #ifdef CONFIG_ZONE_DMA
 	&cache_dma_attr.attr,
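
For readers who want to try the per-cache attribute logic outside the kernel, here is a small userspace model of warn_on_error_store() as written in the hunk above. The struct is a hypothetical stand-in for struct kmem_cache, with a boolean in place of any_slab_objects(). It is only a sketch of the handler's control flow.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Same bit value as the SLAB_WARN_ON_ERROR define in the patch. */
#define SLAB_WARN_ON_ERROR 0x10000000U

/* Hypothetical stand-in for struct kmem_cache: only what the handler touches. */
struct cache_model {
	unsigned int flags;
	bool has_objects;	/* models any_slab_objects(s) */
};

/* Mirrors the store handler: refuse on a populated cache, else set or clear. */
static long warn_on_error_store_model(struct cache_model *s,
				      const char *buf, size_t length)
{
	if (s->has_objects)
		return -EBUSY;

	s->flags &= ~SLAB_WARN_ON_ERROR;
	if (buf[0] == '1')
		s->flags |= SLAB_WARN_ON_ERROR;

	return (long)length;
}

/* Usage: writes to an empty cache succeed, writes to a busy cache do not. */
static void check_store_model(void)
{
	struct cache_model empty = { .flags = 0, .has_objects = false };
	struct cache_model busy = { .flags = 0, .has_objects = true };

	assert(warn_on_error_store_model(&empty, "1\n", 2) == 2);
	assert(empty.flags & SLAB_WARN_ON_ERROR);

	assert(warn_on_error_store_model(&empty, "0\n", 2) == 2);
	assert(!(empty.flags & SLAB_WARN_ON_ERROR));

	assert(warn_on_error_store_model(&busy, "1\n", 2) == -EBUSY);
	assert(busy.flags == 0);
}
```

One consequence visible in the model: once a cache holds objects the write is rejected with -EBUSY, so for caches that are populated early the boot-time slub_debug parameter is the practical way to set the flag.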