[v4,0/2] mm: slub: Enhanced debugging in slub error

Message ID 20250226081206.680495-1-hyesoo.yu@samsung.com (mailing list archive)

Message

Hyesoo Yu Feb. 26, 2025, 8:11 a.m. UTC
Dear Maintainer,

The purpose is to improve the debugging capabilities of the slub allocator
when an error occurs. The following improvements have been made:

 - Added WARN() calls at specific locations (slab_err, object_err) to detect
errors effectively and to generate a crash dump if panic_on_warn is enabled.

 - Additionally, the error printing location in check_object has been adjusted to
display the broken data before the restoration process. This improvement
allows for a better understanding of how the data was corrupted.

This series combines two patches that were discussed separately in the links below.
https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/

Thank you.

version 2 changes
 - Replaced direct calling of BUG_ON with the use of WARN() to trigger a panic.
 - Modified the code to print the broken data only once before the restore.

version 3 changes
 - Moved WARN() from slab_fix to slab_err and object_err to call WARN() on all
 error reporting paths.
 - Changed the parameter type of check_bytes_and_report.

version 4 changes
 - Modified the print format to include specific error names.
 - Removed the redundant warning by removing WARN() in kmem_cache_destroy.

Hyesoo Yu (2):
  mm: slub: Print the broken data before restoring slub.
  mm: slub: call WARN() when the slab detect an error

 mm/slab_common.c |  3 ---
 mm/slub.c        | 63 +++++++++++++++++++++++++-----------------------
 2 files changed, 33 insertions(+), 33 deletions(-)
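
The "print the broken data before restoring" idea from patch 1 can be
illustrated outside the kernel. The following is a minimal userspace sketch,
not the code in mm/slub.c: find_fault(), dump_bytes() and check_and_restore()
are hypothetical names made up for this example, and the whole region is
restored here for simplicity. Only the value 0x6b is taken from the kernel,
where it is POISON_FREE.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define POISON_BYTE 0x6b	/* same value as the kernel's POISON_FREE */

/* Return the offset of the first byte that differs from value, or -1. */
static long find_fault(const unsigned char *buf, size_t len,
		       unsigned char value)
{
	for (size_t i = 0; i < len; i++)
		if (buf[i] != value)
			return (long)i;
	return -1;
}

/* Hex dump helper (hypothetical, for illustration only). */
static void dump_bytes(const char *label, const unsigned char *buf,
		       size_t len)
{
	printf("%s:", label);
	for (size_t i = 0; i < len; i++)
		printf(" %02x", buf[i]);
	printf("\n");
}

/*
 * Check a poisoned region. On corruption, dump the broken bytes first and
 * only then overwrite them with the expected value, so the report still
 * shows what the corruption looked like.
 * Returns 1 if the region was intact, 0 if it had to be restored.
 */
static int check_and_restore(unsigned char *buf, size_t len,
			     unsigned char value)
{
	long fault;

	assert(buf != NULL);

	fault = find_fault(buf, len, value);
	if (fault < 0)
		return 1;

	printf("corruption found at offset %ld\n", fault);
	dump_bytes("before restore", buf, len);	/* report first ... */
	memset(buf, value, len);		/* ... then restore */
	return 0;
}
```

If the dump were taken after the memset(), it would show only the poison
pattern and the evidence of the bad write would be lost, which is the
ordering problem the cover letter describes.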

Comments

Harry Yoo Feb. 27, 2025, 11:53 a.m. UTC | #1
On Wed, Feb 26, 2025 at 05:11:59PM +0900, Hyesoo Yu wrote:
> Dear Maintainer,
> 
> The purpose is to improve the debugging capabilities of the slub allocator
> when an error occurs. The following improvements have been made:
> 
>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
> errors effectively and to generate a crash dump if panic_on_warn is enabled.
> 
>  - Additionally, the error printing location in check_object has been adjusted to
> display the broken data before the restoration process. This improvement
> allows for a better understanding of how the data was corrupted.
> 
> This series combines two patches that were discussed separately in the links below.
> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/

IMHO it will be helpful if the cover letter includes error reporting output 
before and after this patch series.
Vlastimil Babka Feb. 27, 2025, 4:12 p.m. UTC | #2
On 2/26/25 09:11, Hyesoo Yu wrote:
> Dear Maintainer,
> 
> The purpose is to improve the debugging capabilities of the slub allocator
> when an error occurs. The following improvements have been made:
> 
>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
> errors effectively and to generate a crash dump if panic_on_warn is enabled.
> 
>  - Additionally, the error printing location in check_object has been adjusted to
> display the broken data before the restoration process. This improvement
> allows for a better understanding of how the data was corrupted.
> 
> This series combines two patches that were discussed separately in the links below.
> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/
> 
> Thank you.

Thanks. On top of things already mentioned, I added some kunit suppressions
in patch 2. Please check the result:

https://web.git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/fixes-cleanups
Vlastimil Babka Feb. 27, 2025, 4:26 p.m. UTC | #3
On 2/27/25 17:12, Vlastimil Babka wrote:
> On 2/26/25 09:11, Hyesoo Yu wrote:
>> Dear Maintainer,
>> 
>> The purpose is to improve the debugging capabilities of the slub allocator
>> when an error occurs. The following improvements have been made:
>> 
>>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
>> errors effectively and to generate a crash dump if panic_on_warn is enabled.
>> 
>>  - Additionally, the error printing location in check_object has been adjusted to
>> display the broken data before the restoration process. This improvement
>> allows for a better understanding of how the data was corrupted.
>> 
>> This series combines two patches that were discussed separately in the links below.
>> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
>> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/
>> 
>> Thank you.
> 
> Thanks. On top of things already mentioned, I added some kunit suppressions
> in patch 2. Please check the result:
> 
> https://web.git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/fixes-cleanups

What do you think about the following patch on top?

---8<---
From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
From: Vlastimil Babka <vbabka@suse.cz>
Date: Thu, 27 Feb 2025 16:05:46 +0100
Subject: [PATCH] mm, slab: cleanup slab_bug() parameters

slab_err() has variadic printf arguments but instead of passing them to
slab_bug() it does vsnprintf() to a buffer and passes %s, buf.

To allow passing them directly, turn slab_bug() to __slab_bug() with a
va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
Then slab_err() can call __slab_bug() without the intermediate buffer.

Also constify fmt everywhere, which also simplifies object_err()'s
call to slab_bug().

Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
---
 mm/slub.c | 28 ++++++++++++++++++----------
 1 file changed, 18 insertions(+), 10 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index a9a02b4ae4d6..d94af020b305 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1017,12 +1017,12 @@ void skip_orig_size_check(struct kmem_cache *s, const void *object)
 	set_orig_size(s, (void *)object, s->object_size);
 }
 
-static void slab_bug(struct kmem_cache *s, char *fmt, ...)
+static void __slab_bug(struct kmem_cache *s, const char *fmt, va_list argsp)
 {
 	struct va_format vaf;
 	va_list args;
 
-	va_start(args, fmt);
+	va_copy(args, argsp);
 	vaf.fmt = fmt;
 	vaf.va = &args;
 	pr_err("=============================================================================\n");
@@ -1031,8 +1031,17 @@ static void slab_bug(struct kmem_cache *s, char *fmt, ...)
 	va_end(args);
 }
 
+static void slab_bug(struct kmem_cache *s, const char *fmt, ...)
+{
+	va_list args;
+
+	va_start(args, fmt);
+	__slab_bug(s, fmt, args);
+	va_end(args);
+}
+
 __printf(2, 3)
-static void slab_fix(struct kmem_cache *s, char *fmt, ...)
+static void slab_fix(struct kmem_cache *s, const char *fmt, ...)
 {
 	struct va_format vaf;
 	va_list args;
@@ -1088,12 +1097,12 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p)
 }
 
 static void object_err(struct kmem_cache *s, struct slab *slab,
-			u8 *object, char *reason)
+			u8 *object, const char *reason)
 {
 	if (slab_add_kunit_errors())
 		return;
 
-	slab_bug(s, "%s", reason);
+	slab_bug(s, reason);
 	print_trailer(s, slab, object);
 	add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE);
 
@@ -1129,15 +1138,14 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab,
 			const char *fmt, ...)
 {
 	va_list args;
-	char buf[100];
 
 	if (slab_add_kunit_errors())
 		return;
 
 	va_start(args, fmt);
-	vsnprintf(buf, sizeof(buf), fmt, args);
+	__slab_bug(s, fmt, args);
 	va_end(args);
-	slab_bug(s, "%s", buf);
+
 	__slab_err(slab);
 }
 
@@ -1175,7 +1183,7 @@ static void init_object(struct kmem_cache *s, void *object, u8 val)
 					  s->inuse - poison_size);
 }
 
-static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
+static void restore_bytes(struct kmem_cache *s, const char *message, u8 data,
 						void *from, void *to)
 {
 	slab_fix(s, "Restoring %s 0x%p-0x%p=0x%x", message, from, to - 1, data);
@@ -1190,7 +1198,7 @@ static void restore_bytes(struct kmem_cache *s, char *message, u8 data,
 
 static pad_check_attributes int
 check_bytes_and_report(struct kmem_cache *s, struct slab *slab,
-		       u8 *object, char *what, u8 *start, unsigned int value,
+		       u8 *object, const char *what, u8 *start, unsigned int value,
 		       unsigned int bytes, bool slab_obj_print)
 {
 	u8 *fault;
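
The refactoring in the patch above follows the common C pattern of pairing a
va_list-taking core function with a thin variadic wrapper. Below is a minimal
userspace sketch of that pattern, not the kernel code itself: vreport(),
report() and report_err() are hypothetical names, vsnprintf() stands in for
the kernel's pr_err() with %pV, and __attribute__((format)) is a GCC/Clang
extension playing the role of the kernel's __printf() annotation.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Core reporter taking a va_list, analogous in shape to __slab_bug().
 * Formats into out and returns the vsnprintf()-style length.
 */
static int vreport(char *out, size_t size, const char *fmt, va_list ap)
{
	va_list copy;
	int n;

	/* Work on a copy so the caller's va_list is left untouched. */
	va_copy(copy, ap);
	n = vsnprintf(out, size, fmt, copy);
	va_end(copy);
	return n;
}

/* Variadic convenience wrapper, analogous in shape to slab_bug(). */
__attribute__((format(printf, 3, 4)))
static int report(char *out, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vreport(out, size, fmt, ap);
	va_end(ap);
	return n;
}

static int error_count;

/*
 * A second variadic entry point that forwards its arguments straight to
 * the core function, as slab_err() does after the patch. No intermediate
 * fixed-size buffer is needed, so nothing is truncated on the way.
 */
__attribute__((format(printf, 3, 4)))
static int report_err(char *out, size_t size, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vreport(out, size, fmt, ap);
	va_end(ap);
	error_count++;
	return n;
}
```

A variadic function cannot forward its "..." arguments to another variadic
function directly, which is why the core has to accept a va_list. Once it
does, any number of variadic front ends can share it.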
Harry Yoo Feb. 28, 2025, 12:47 p.m. UTC | #4
On Thu, Feb 27, 2025 at 05:26:26PM +0100, Vlastimil Babka wrote:
> On 2/27/25 17:12, Vlastimil Babka wrote:
> > On 2/26/25 09:11, Hyesoo Yu wrote:
> >> Dear Maintainer,
> >> 
> >> The purpose is to improve the debugging capabilities of the slub allocator
> >> when an error occurs. The following improvements have been made:
> >> 
> >>  - Added WARN() calls at specific locations (slab_err, object_err) to detect
> >> errors effectively and to generate a crash dump if panic_on_warn is enabled.
> >> 
> >>  - Additionally, the error printing location in check_object has been adjusted to
> >> display the broken data before the restoration process. This improvement
> >> allows for a better understanding of how the data was corrupted.
> >> 
> >> This series combines two patches that were discussed separately in the links below.
> >> https://lore.kernel.org/linux-mm/20250120082908.4162780-1-hyesoo.yu@samsung.com/
> >> https://lore.kernel.org/linux-mm/20250120083023.4162932-1-hyesoo.yu@samsung.com/
> >> 
> >> Thank you.
> > 
> > Thanks. On top of things already mentioned, I added some kunit suppressions
> > in patch 2. Please check the result:
> > 
> > https://web.git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab.git/log/?h=slab/for-6.15/fixes-cleanups
> 
> What do you think about the following patch on top?
> 
> ---8<---
> From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
> From: Vlastimil Babka <vbabka@suse.cz>
> Date: Thu, 27 Feb 2025 16:05:46 +0100
> Subject: [PATCH] mm, slab: cleanup slab_bug() parameters
> 
> slab_err() has variadic printf arguments but instead of passing them to
> slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
> 
> To allow passing them directly, turn slab_bug() to __slab_bug() with a
> va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
> Then slab_err() can call __slab_bug() without the intermediate buffer.
> 
> Also constify fmt everywhere, which also simplifies object_err()'s
> call to slab_bug().
> 
> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> ---

Looks good to me.

FWIW,
Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
Vlastimil Babka Feb. 28, 2025, 4:02 p.m. UTC | #5
On 2/28/25 13:47, Harry Yoo wrote:
> On Thu, Feb 27, 2025 at 05:26:26PM +0100, Vlastimil Babka wrote:
>> ---8<---
>> From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
>> From: Vlastimil Babka <vbabka@suse.cz>
>> Date: Thu, 27 Feb 2025 16:05:46 +0100
>> Subject: [PATCH] mm, slab: cleanup slab_bug() parameters
>> 
>> slab_err() has variadic printf arguments but instead of passing them to
>> slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
>> 
>> To allow passing them directly, turn slab_bug() to __slab_bug() with a
>> va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
>> Then slab_err() can call __slab_bug() without the intermediate buffer.
>> 
>> Also constify fmt everywhere, which also simplifies object_err()'s
>> call to slab_bug().
>> 
>> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
>> ---
> 
> Looks good to me.
> 
> FWIW,
> Reviewed-by: Harry Yoo <harry.yoo@oracle.com>

Thanks, adding to slab/for-next
Hyesoo Yu March 4, 2025, 1:37 a.m. UTC | #6
On Fri, Feb 28, 2025 at 05:02:00PM +0100, Vlastimil Babka wrote:
> On 2/28/25 13:47, Harry Yoo wrote:
> > On Thu, Feb 27, 2025 at 05:26:26PM +0100, Vlastimil Babka wrote:
> >> ---8<---
> >> From c38dadde6293cacdb91f95afc3615c22dec5830a Mon Sep 17 00:00:00 2001
> >> From: Vlastimil Babka <vbabka@suse.cz>
> >> Date: Thu, 27 Feb 2025 16:05:46 +0100
> >> Subject: [PATCH] mm, slab: cleanup slab_bug() parameters
> >> 
> >> slab_err() has variadic printf arguments but instead of passing them to
> >> slab_bug() it does vsnprintf() to a buffer and passes %s, buf.
> >> 
> >> To allow passing them directly, turn slab_bug() to __slab_bug() with a
> >> va_list parameter, and slab_bug() a wrapper with fmt, ... parameters.
> >> Then slab_err() can call __slab_bug() without the intermediate buffer.
> >> 
> >> Also constify fmt everywhere, which also simplifies object_err()'s
> >> call to slab_bug().
> >> 
> >> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
> >> ---
> > 
> > Looks good to me.
> > 
> > FWIW,
> > Reviewed-by: Harry Yoo <harry.yoo@oracle.com>
> 
> Thanks, adding to slab/for-next
> 
> 

Looks good to me.
Thanks!