diff mbox series

mm/slub: Avoid list corruption when removing a slab from the full list

Message ID 20241006044755.79634-1-yuan.gao@ucloud.cn (mailing list archive)
State New
Headers show
Series mm/slub: Avoid list corruption when removing a slab from the full list | expand

Commit Message

yuan.gao Oct. 6, 2024, 4:47 a.m. UTC
Boot with slub_debug=UFPZ.

If allocated object failed in alloc_consistency_checks, all objects of
the slab will be marked as used, and then the slab will be removed from
the partial list.

When an object belonging to the slab got freed later, the remove_full()
function is called. Because the slab is neither on the partial list nor
on the full list, it eventually lead to a list corruption.

So we need to add the slab to full list in this case.

[ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100)
[ 4277.387023] ------------[ cut here ]------------
[ 4277.387880] kernel BUG at lib/list_debug.c:56!
[ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G           OE      6.6.1-1 #1
[ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs]
[ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91
 82 e8 15 f9 9a ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 90 49 91 82 e8 01 f9 9a ff 0f 0b
[ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082
[ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000
[ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff
[ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0
[ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910
[ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0
[ 4277.404049] FS:  0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000
[ 4277.405357] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0
[ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 4277.410000] PKRU: 55555554
[ 4277.410645] Call Trace:
[ 4277.411234]  <TASK>
[ 4277.411777]  ? die+0x32/0x80
[ 4277.412439]  ? do_trap+0xd6/0x100
[ 4277.413150]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.414158]  ? do_error_trap+0x6a/0x90
[ 4277.414948]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.415915]  ? exc_invalid_op+0x4c/0x60
[ 4277.416710]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.417675]  ? asm_exc_invalid_op+0x16/0x20
[ 4277.418482]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.419466]  ? __list_del_entry_valid_or_report+0x7b/0xc0
[ 4277.420410]  free_to_partial_list+0x515/0x5e0
[ 4277.421242]  ? xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.422298]  xfs_iext_remove+0x41a/0xa10 [xfs]
[ 4277.423316]  ? xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.424383]  xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs]
[ 4277.425490]  __xfs_bunmapi+0x50d/0x840 [xfs]
[ 4277.426445]  xfs_itruncate_extents_flags+0x13a/0x490 [xfs]
[ 4277.427553]  xfs_inactive_truncate+0xa3/0x120 [xfs]
[ 4277.428567]  xfs_inactive+0x22d/0x290 [xfs]
[ 4277.429500]  xfs_inodegc_worker+0xb4/0x1a0 [xfs]
[ 4277.430479]  process_one_work+0x171/0x340
[ 4277.431227]  worker_thread+0x277/0x390
[ 4277.431962]  ? __pfx_worker_thread+0x10/0x10
[ 4277.432752]  kthread+0xf0/0x120
[ 4277.433382]  ? __pfx_kthread+0x10/0x10
[ 4277.434134]  ret_from_fork+0x2d/0x50
[ 4277.434837]  ? __pfx_kthread+0x10/0x10
[ 4277.435566]  ret_from_fork_asm+0x1b/0x30
[ 4277.436280]  </TASK>

Signed-off-by: yuan.gao <yuan.gao@ucloud.cn>
---
 mm/slub.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Hyeonggon Yoo Oct. 6, 2024, 1 p.m. UTC | #1
On Sun, Oct 6, 2024 at 1:48 PM yuan.gao <yuan.gao@ucloud.cn> wrote:
>
> Boot with slub_debug=UFPZ.
>
> If allocated object failed in alloc_consistency_checks, all objects of
> the slab will be marked as used, and then the slab will be removed from
> the partial list.
>
> When an object belonging to the slab got freed later, the remove_full()
> function is called. Because the slab is neither on the partial list nor
> on the full list, it eventually lead to a list corruption.

Good catch! Thanks for investigating the cause and fixing it.

> So we need to add the slab to full list in this case.

While I believe that behavior is not intended by alloc_debug_processing(),
I can't think of a better fix here without adding some complexity.
The approach looks fine to me.

> ---
>  mm/slub.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 21f71cb6cc06..a99522b9efc0 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2746,6 +2746,8 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
>
>         if (!alloc_debug_processing(s, slab, object, orig_size)) {
>                 remove_partial(n, slab);
> +               if (slab->inuse == slab->objects)
> +                       add_full(s, n, slab);

Shouldn't this be (folio_test_slab(slab_folio(slab))) instead of
(slab->inuse == slab->objects)?
Oh wait. the kernel also should not call remove_partial() for non-slab folios.

So I think it should be:

if (!alloc_debug_processing(s, slab, object, orig_size)) {
        if (folio_test_slab(slab_folio(slab))) {
                remove_partial(n, slab);
                add_full(s, n, slab);
        }
}

By the way, SLUB always messes with struct page fields even when it is
not a slab,
and I think SLUB should avoid modifying those fields before confirming
it is a slab.
(specifically, calling alloc_debug_processing() before updating
->freelist, ->inuse fields)

That is beyond the scope of this patch, but do you want to address it
in the next version
of your patch series?

Cheers,
Hyeonggon
yuan.gao Oct. 7, 2024, 9:12 a.m. UTC | #2
On 24/10/06 10:00PM, Hyeonggon Yoo wrote:
> On Sun, Oct 6, 2024 at 1:48 PM yuan.gao <yuan.gao@ucloud.cn> wrote:
> >
> > Boot with slub_debug=UFPZ.
> >
> > If allocated object failed in alloc_consistency_checks, all objects of
> > the slab will be marked as used, and then the slab will be removed from
> > the partial list.
> >
> > When an object belonging to the slab got freed later, the remove_full()
> > function is called. Because the slab is neither on the partial list nor
> > on the full list, it eventually lead to a list corruption.
> 
> Good catch! Thanks for investigating the cause and fixing it.
> 
> > So we need to add the slab to full list in this case.
> 
> While I believe that behavior is not intended by alloc_debug_processing(),
> I can't think of a better fix here without adding some complexity.
> The approach looks fine to me.
> 
> > ---
> >  mm/slub.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/mm/slub.c b/mm/slub.c
> > index 21f71cb6cc06..a99522b9efc0 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2746,6 +2746,8 @@ static void *alloc_single_from_partial(struct kmem_cache *s,
> >
> >         if (!alloc_debug_processing(s, slab, object, orig_size)) {
> >                 remove_partial(n, slab);
> > +               if (slab->inuse == slab->objects)
> > +                       add_full(s, n, slab);
> 
> Shouldn't this be (folio_test_slab(slab_folio(slab))) instead of
> (slab->inuse == slab->objects)?
> Oh wait. the kernel also should not call remove_partial() for non-slab folios.
> 
> So I think it should be:
> 
> if (!alloc_debug_processing(s, slab, object, orig_size)) {
>         if (folio_test_slab(slab_folio(slab))) {
>                 remove_partial(n, slab);
>                 add_full(s, n, slab);
>         }
> }

Thank you for reminding me of this. I didn't notice the subtle differences here. 
 
> By the way, SLUB always messes with struct page fields even when it is
> not a slab,
> and I think SLUB should avoid modifying those fields before confirming
> it is a slab.
> (specifically, calling alloc_debug_processing() before updating
> ->freelist, ->inuse fields)
> 
> That is beyond the scope of this patch, but do you want to address it
> in the next version
> of your patch series?
> 
> Cheers,
> Hyeonggon
> 

I'm glad to do that, just takes time.

Thanks
Christoph Lameter (Ampere) Oct. 7, 2024, 3:57 p.m. UTC | #3
On Sun, 6 Oct 2024, yuan.gao wrote:

> If allocated object failed in alloc_consistency_checks, all objects of
> the slab will be marked as used, and then the slab will be removed from
> the partial list.

Yea so the intend is to isolate the corrupted slab page. There could be
more corrupted data on the page.

> When an object belonging to the slab got freed later, the remove_full()
> function is called. Because the slab is neither on the partial list nor
> on the full list, it eventually lead to a list corruption.

Right. The full list is used in the debug case.

> So we need to add the slab to full list in this case.

That would mean to put the slab with corrupted memory back in circulation.

I guess we would need some check to avoid handling list operations on a
slab page that was removed from the lists because of metadata corruption.
diff mbox series

Patch

diff --git a/mm/slub.c b/mm/slub.c
index 21f71cb6cc06..a99522b9efc0 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2746,6 +2746,8 @@  static void *alloc_single_from_partial(struct kmem_cache *s,
 
 	if (!alloc_debug_processing(s, slab, object, orig_size)) {
 		remove_partial(n, slab);
+		if (slab->inuse == slab->objects)
+			add_full(s, n, slab);
 		return NULL;
 	}