Message ID | 20250226081206.680495-3-hyesoo.yu@samsung.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: slub: Enhanced debugging in slub error | expand |
On Wed, Feb 26, 2025 at 05:12:01PM +0900, Hyesoo Yu wrote: > If a slab object is corrupted or an error occurs in its internal > value, continuing after restoration may cause other side effects. > At this point, it is difficult to debug because the problem occurred > in the past. It is useful to use WARN() to catch errors at the point > of issue because WARN() could trigger panic for system debugging when > panic_on_warn is enabled. WARN() is added where to detect the error > on slab_err and object_err. > > It makes sense to only do the WARN() after printing the logs. slab_err > is splited to __slab_err that calls the WARN() and it is called after > printing logs. > > Changes in v4: > - Remove WARN() in kmem_cache_destroy to remove redundant warning. > > Changes in v3: > - move the WARN from slab_fix to slab_err, object_err and check_obj to > use WARN on all error reporting paths. > > Changes in v2: > - Replace direct calling with BUG_ON with the use of WARN in slab_fix. Same here, please rephrase the changelog without "Changes in vN" in the changelog, and move the patch version changes below "---" line. > Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> > --- Otherwise this change in general looks good to me (with a suggestion below). Please feel free to add: Reviewed-by: Harry Yoo <harry.yoo@oracle.com> # Suggestion There's a case where SLUB just calls pr_err() and dump_stack() instead of slab_err() or object_err() in free_consistency_checks(). I would like to add something like this: diff --git a/mm/slub.c b/mm/slub.c index b7660ee85db0..d5609fa63da4 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1022,12 +1022,16 @@ static void slab_bug(struct kmem_cache *s, char *fmt, ...) { struct va_format vaf; va_list args; + const char *name = "<unknown>"; + + if (s) + name = s->name; va_start(args, fmt); vaf.fmt = fmt; vaf.va = &args; pr_err("=============================================================================\n"); - pr_err("BUG %s (%s): %pV\n", s->name, print_tainted(), &vaf); + pr_err("BUG %s (%s): %pV\n", name, print_tainted(), &vaf); pr_err("-----------------------------------------------------------------------------\n\n"); va_end(args); } @@ -1628,9 +1632,8 @@ static inline int free_consistency_checks(struct kmem_cache *s, slab_err(s, slab, "Attempt to free object(0x%p) outside of slab", object); } else if (!slab->slab_cache) { - pr_err("SLUB <none>: no slab for object 0x%p.\n", - object); - dump_stack(); + slab_err(NULL, slab, "No slab for object 0x%p", + object); } else object_err(s, slab, object, "page slab pointer corrupt.");
On 2/26/25 09:12, Hyesoo Yu wrote: > If a slab object is corrupted or an error occurs in its internal > value, continuing after restoration may cause other side effects. > At this point, it is difficult to debug because the problem occurred > in the past. It is useful to use WARN() to catch errors at the point > of issue because WARN() could trigger panic for system debugging when > panic_on_warn is enabled. WARN() is added where to detect the error > on slab_err and object_err. > > It makes sense to only do the WARN() after printing the logs. slab_err > is splited to __slab_err that calls the WARN() and it is called after > printing logs. > > Changes in v4: > - Remove WARN() in kmem_cache_destroy to remove redundant warning. > > Changes in v3: > - move the WARN from slab_fix to slab_err, object_err and check_obj to > use WARN on all error reporting paths. > > Changes in v2: > - Replace direct calling with BUG_ON with the use of WARN in slab_fix. > > Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> As Harry said. I'll remove that locally. > --- > mm/slab_common.c | 3 --- > mm/slub.c | 31 +++++++++++++++++++------------ > 2 files changed, 19 insertions(+), 15 deletions(-) > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 477fa471da18..d13f4ffe252b 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -517,9 +517,6 @@ void kmem_cache_destroy(struct kmem_cache *s) > kasan_cache_shutdown(s); > > err = __kmem_cache_shutdown(s); > - if (!slab_in_kunit_test()) > - WARN(err, "%s %s: Slab cache still has objects when called from %pS", > - __func__, s->name, (void *)_RET_IP_); I think I'll keep this one, because the more detailed warning via list_slab_objects() is only enabled with CONFIG_SLUB_DEBUG. If it's not enabled, the kmem_cache_destroy() failure rather should not be silent. So slab_in_kunit_test() would also stay. > } else { > - list_slab_objects(s, slab, > - "Objects remaining in %s on __kmem_cache_shutdown()"); > + list_slab_objects(s, slab); I tried to extract slab_bug() and __slab_err() from list_slab_objects() but they were also only available with CONFIG_SLUB_DEBUG. Perhaps we can improve that, but as a follow-up cleanup so we don't hold this up further. > } > } > spin_unlock_irq(&n->list_lock);
On 2/27/25 13:55, Harry Yoo wrote: > On Wed, Feb 26, 2025 at 05:12:01PM +0900, Hyesoo Yu wrote: >> If a slab object is corrupted or an error occurs in its internal >> value, continuing after restoration may cause other side effects. >> At this point, it is difficult to debug because the problem occurred >> in the past. It is useful to use WARN() to catch errors at the point >> of issue because WARN() could trigger panic for system debugging when >> panic_on_warn is enabled. WARN() is added where to detect the error >> on slab_err and object_err. >> >> It makes sense to only do the WARN() after printing the logs. slab_err >> is splited to __slab_err that calls the WARN() and it is called after >> printing logs. >> >> Changes in v4: >> - Remove WARN() in kmem_cache_destroy to remove redundant warning. >> >> Changes in v3: >> - move the WARN from slab_fix to slab_err, object_err and check_obj to >> use WARN on all error reporting paths. >> >> Changes in v2: >> - Replace direct calling with BUG_ON with the use of WARN in slab_fix. > > Same here, please rephrase the changelog without "Changes in vN" in the > changelog, and move the patch version changes below "---" line. > >> Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> >> --- > > Otherwise this change in general looks good to me (with a suggestion below). > Please feel free to add: > Reviewed-by: Harry Yoo <harry.yoo@oracle.com> > > # Suggestion > > There's a case where SLUB just calls pr_err() and dump_stack() instead of > slab_err() or object_err() in free_consistency_checks(). > > I would like to add something like this: > > diff --git a/mm/slub.c b/mm/slub.c > index b7660ee85db0..d5609fa63da4 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -1022,12 +1022,16 @@ static void slab_bug(struct kmem_cache *s, char *fmt, ...) > { > struct va_format vaf; > va_list args; > + const char *name = "<unknown>"; > + > + if (s) > + name = s->name; > > va_start(args, fmt); > vaf.fmt = fmt; > vaf.va = &args; > pr_err("=============================================================================\n"); > - pr_err("BUG %s (%s): %pV\n", s->name, print_tainted(), &vaf); > + pr_err("BUG %s (%s): %pV\n", name, print_tainted(), &vaf); s ? s->name : "<unknown>" more concise > pr_err("-----------------------------------------------------------------------------\n\n"); > va_end(args); > } > @@ -1628,9 +1632,8 @@ static inline int free_consistency_checks(struct kmem_cache *s, > slab_err(s, slab, "Attempt to free object(0x%p) outside of slab", > object); > } else if (!slab->slab_cache) { > - pr_err("SLUB <none>: no slab for object 0x%p.\n", > - object); > - dump_stack(); > + slab_err(NULL, slab, "No slab for object 0x%p", > + object); Good suggestion, added that locally. Probably not likely to trigger as a use-after-free would mean we're rather hit !folio_test_slab() above than a folio that has a slab flag but has a NULL pointer (or the pointer might be garbage and not NULL). But at least the error handling will be consistent. Changed the error message to "No slab cache" as that's more accurate. Thanks. > } else > object_err(s, slab, object, > "page slab pointer corrupt."); >
diff --git a/mm/slab_common.c b/mm/slab_common.c index 477fa471da18..d13f4ffe252b 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -517,9 +517,6 @@ void kmem_cache_destroy(struct kmem_cache *s) kasan_cache_shutdown(s); err = __kmem_cache_shutdown(s); - if (!slab_in_kunit_test()) - WARN(err, "%s %s: Slab cache still has objects when called from %pS", - __func__, s->name, (void *)_RET_IP_); list_del(&s->list); diff --git a/mm/slub.c b/mm/slub.c index 8c13cd43c0fd..4961eeccf3ad 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1096,8 +1096,6 @@ static void print_trailer(struct kmem_cache *s, struct slab *slab, u8 *p) /* Beginning of the filler is the free pointer */ print_section(KERN_ERR, "Padding ", p + off, size_from_object(s) - off); - - dump_stack(); } static void object_err(struct kmem_cache *s, struct slab *slab, @@ -1109,6 +1107,8 @@ static void object_err(struct kmem_cache *s, struct slab *slab, slab_bug(s, "%s", reason); print_trailer(s, slab, object); add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + + WARN_ON(1); } static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab, @@ -1125,6 +1125,14 @@ static bool freelist_corrupted(struct kmem_cache *s, struct slab *slab, return false; } +static void __slab_err(struct slab *slab) +{ + print_slab_info(slab); + add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + + WARN_ON(1); +} + static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab, const char *fmt, ...) { @@ -1138,9 +1146,7 @@ static __printf(3, 4) void slab_err(struct kmem_cache *s, struct slab *slab, vsnprintf(buf, sizeof(buf), fmt, args); va_end(args); slab_bug(s, "%s", buf); - print_slab_info(slab); - dump_stack(); - add_taint(TAINT_BAD_PAGE, LOCKDEP_NOW_UNRELIABLE); + __slab_err(slab); } static void init_object(struct kmem_cache *s, void *object, u8 val) @@ -1313,9 +1319,10 @@ slab_pad_check(struct kmem_cache *s, struct slab *slab) while (end > fault && end[-1] == POISON_INUSE) end--; - slab_err(s, slab, "Padding overwritten. 0x%p-0x%p @offset=%tu", - fault, end - 1, fault - start); + slab_bug(s, "Padding overwritten. 0x%p-0x%p @offset=%tu", + fault, end - 1, fault - start); print_section(KERN_ERR, "Padding ", pad, remainder); + __slab_err(slab); restore_bytes(s, "slab padding", POISON_INUSE, fault, end); } @@ -5428,14 +5435,13 @@ static int calculate_sizes(struct kmem_cache_args *args, struct kmem_cache *s) return !!oo_objects(s->oo); } -static void list_slab_objects(struct kmem_cache *s, struct slab *slab, - const char *text) +static void list_slab_objects(struct kmem_cache *s, struct slab *slab) { #ifdef CONFIG_SLUB_DEBUG void *addr = slab_address(slab); void *p; - slab_err(s, slab, text, s->name); + slab_bug(s, "Objects remaining on __kmem_cache_shutdown()"); spin_lock(&object_map_lock); __fill_map(object_map, s, slab); @@ -5450,6 +5456,8 @@ static void list_slab_objects(struct kmem_cache *s, struct slab *slab, } } spin_unlock(&object_map_lock); + + __slab_err(slab); #endif } @@ -5470,8 +5478,7 @@ static void free_partial(struct kmem_cache *s, struct kmem_cache_node *n) remove_partial(n, slab); list_add(&slab->slab_list, &discard); } else { - list_slab_objects(s, slab, - "Objects remaining in %s on __kmem_cache_shutdown()"); + list_slab_objects(s, slab); } } spin_unlock_irq(&n->list_lock);
If a slab object is corrupted or an error occurs in its internal value, continuing after restoration may cause other side effects. At this point, it is difficult to debug because the problem occurred in the past. It is useful to use WARN() to catch errors at the point of issue because WARN() could trigger panic for system debugging when panic_on_warn is enabled. WARN() is added where to detect the error on slab_err and object_err. It makes sense to only do the WARN() after printing the logs. slab_err is splited to __slab_err that calls the WARN() and it is called after printing logs. Changes in v4: - Remove WARN() in kmem_cache_destroy to remove redundant warning. Changes in v3: - move the WARN from slab_fix to slab_err, object_err and check_obj to use WARN on all error reporting paths. Changes in v2: - Replace direct calling with BUG_ON with the use of WARN in slab_fix. Signed-off-by: Hyesoo Yu <hyesoo.yu@samsung.com> --- mm/slab_common.c | 3 --- mm/slub.c | 31 +++++++++++++++++++------------ 2 files changed, 19 insertions(+), 15 deletions(-)