Message ID | 20241006044755.79634-1-yuan.gao@ucloud.cn (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/slub: Avoid list corruption when removing a slab from the full list | expand |
On Sun, Oct 6, 2024 at 1:48 PM yuan.gao <yuan.gao@ucloud.cn> wrote: > > Boot with slub_debug=UFPZ. > > If allocated object failed in alloc_consistency_checks, all objects of > the slab will be marked as used, and then the slab will be removed from > the partial list. > > When an object belonging to the slab got freed later, the remove_full() > function is called. Because the slab is neither on the partial list nor > on the full list, it eventually lead to a list corruption. Good catch! Thanks for investigating the cause and fixing it. > So we need to add the slab to full list in this case. While I believe that behavior is not intended by alloc_debug_processing(), I can't think of a better fix here without adding some complexity. The approach looks fine to me. > --- > mm/slub.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/mm/slub.c b/mm/slub.c > index 21f71cb6cc06..a99522b9efc0 100644 > --- a/mm/slub.c > +++ b/mm/slub.c > @@ -2746,6 +2746,8 @@ static void *alloc_single_from_partial(struct kmem_cache *s, > > if (!alloc_debug_processing(s, slab, object, orig_size)) { > remove_partial(n, slab); > + if (slab->inuse == slab->objects) > + add_full(s, n, slab); Shouldn't this be (folio_test_slab(slab_folio(slab))) instead of (slab->inuse == slab->objects)? Oh wait. the kernel also should not call remove_partial() for non-slab folios. So I think it should be: if (!alloc_debug_processing(s, slab, object, orig_size)) { if (folio_test_slab(slab_folio(slab))) { remove_partial(n, slab); add_full(s, n, slab); } } By the way, SLUB always messes with struct page fields even when it is not a slab, and I think SLUB should avoid modifying those fields before confirming it is a slab. (specifically, calling alloc_debug_processing() before updating ->freelist, ->inuse fields) That is beyond the scope of this patch, but do you want to address it in the next version of your patch series? Cheers, Hyeonggon
On 24/10/06 10:00PM, Hyeonggon Yoo wrote: > On Sun, Oct 6, 2024 at 1:48 PM yuan.gao <yuan.gao@ucloud.cn> wrote: > > > > Boot with slub_debug=UFPZ. > > > > If allocated object failed in alloc_consistency_checks, all objects of > > the slab will be marked as used, and then the slab will be removed from > > the partial list. > > > > When an object belonging to the slab got freed later, the remove_full() > > function is called. Because the slab is neither on the partial list nor > > on the full list, it eventually lead to a list corruption. > > Good catch! Thanks for investigating the cause and fixing it. > > > So we need to add the slab to full list in this case. > > While I believe that behavior is not intended by alloc_debug_processing(), > I can't think of a better fix here without adding some complexity. > The approach looks fine to me. > > > --- > > mm/slub.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/mm/slub.c b/mm/slub.c > > index 21f71cb6cc06..a99522b9efc0 100644 > > --- a/mm/slub.c > > +++ b/mm/slub.c > > @@ -2746,6 +2746,8 @@ static void *alloc_single_from_partial(struct kmem_cache *s, > > > > if (!alloc_debug_processing(s, slab, object, orig_size)) { > > remove_partial(n, slab); > > + if (slab->inuse == slab->objects) > > + add_full(s, n, slab); > > Shouldn't this be (folio_test_slab(slab_folio(slab))) instead of > (slab->inuse == slab->objects)? > Oh wait. the kernel also should not call remove_partial() for non-slab folios. > > So I think it should be: > > if (!alloc_debug_processing(s, slab, object, orig_size)) { > if (folio_test_slab(slab_folio(slab))) { > remove_partial(n, slab); > add_full(s, n, slab); > } > } Thank you for reminding me of this. I didn't notice the subtle differences here. > By the way, SLUB always messes with struct page fields even when it is > not a slab, > and I think SLUB should avoid modifying those fields before confirming > it is a slab. > (specifically, calling alloc_debug_processing() before updating > ->freelist, ->inuse fields) > > That is beyond the scope of this patch, but do you want to address it > in the next version > of your patch series? > > Cheers, > Hyeonggon > I'm glad to do that, just takes time. Thanks
On Sun, 6 Oct 2024, yuan.gao wrote: > If allocated object failed in alloc_consistency_checks, all objects of > the slab will be marked as used, and then the slab will be removed from > the partial list. Yea so the intend is to isolate the corrupted slab page. There could be more corrupted data on the page. > When an object belonging to the slab got freed later, the remove_full() > function is called. Because the slab is neither on the partial list nor > on the full list, it eventually lead to a list corruption. Right. The full list is used in the debug case. > So we need to add the slab to full list in this case. That would mean to put the slab with corrupted memory back in circulation. I guess we would need some check to avoid handling list operations on a slab page that was removed from the lists because of metadata corruption.
diff --git a/mm/slub.c b/mm/slub.c index 21f71cb6cc06..a99522b9efc0 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2746,6 +2746,8 @@ static void *alloc_single_from_partial(struct kmem_cache *s, if (!alloc_debug_processing(s, slab, object, orig_size)) { remove_partial(n, slab); + if (slab->inuse == slab->objects) + add_full(s, n, slab); return NULL; }
Boot with slub_debug=UFPZ. If allocated object failed in alloc_consistency_checks, all objects of the slab will be marked as used, and then the slab will be removed from the partial list. When an object belonging to the slab got freed later, the remove_full() function is called. Because the slab is neither on the partial list nor on the full list, it eventually lead to a list corruption. So we need to add the slab to full list in this case. [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100) [ 4277.387023] ------------[ cut here ]------------ [ 4277.387880] kernel BUG at lib/list_debug.c:56! [ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G OE 6.6.1-1 #1 [ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs] [ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91 82 e8 15 f9 9a ff <0f> 0b 48 89 fe 48 89 ca 48 c7 c7 90 49 91 82 e8 01 f9 9a ff 0f 0b [ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082 [ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000 [ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff [ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0 [ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910 [ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0 [ 4277.404049] FS: 0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000 [ 4277.405357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0 [ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4277.410000] PKRU: 55555554 [ 4277.410645] Call Trace: [ 4277.411234] <TASK> [ 4277.411777] ? die+0x32/0x80 [ 4277.412439] ? do_trap+0xd6/0x100 [ 4277.413150] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.414158] ? do_error_trap+0x6a/0x90 [ 4277.414948] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.415915] ? exc_invalid_op+0x4c/0x60 [ 4277.416710] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.417675] ? asm_exc_invalid_op+0x16/0x20 [ 4277.418482] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.419466] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.420410] free_to_partial_list+0x515/0x5e0 [ 4277.421242] ? xfs_iext_remove+0x41a/0xa10 [xfs] [ 4277.422298] xfs_iext_remove+0x41a/0xa10 [xfs] [ 4277.423316] ? xfs_inodegc_worker+0xb4/0x1a0 [xfs] [ 4277.424383] xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs] [ 4277.425490] __xfs_bunmapi+0x50d/0x840 [xfs] [ 4277.426445] xfs_itruncate_extents_flags+0x13a/0x490 [xfs] [ 4277.427553] xfs_inactive_truncate+0xa3/0x120 [xfs] [ 4277.428567] xfs_inactive+0x22d/0x290 [xfs] [ 4277.429500] xfs_inodegc_worker+0xb4/0x1a0 [xfs] [ 4277.430479] process_one_work+0x171/0x340 [ 4277.431227] worker_thread+0x277/0x390 [ 4277.431962] ? __pfx_worker_thread+0x10/0x10 [ 4277.432752] kthread+0xf0/0x120 [ 4277.433382] ? __pfx_kthread+0x10/0x10 [ 4277.434134] ret_from_fork+0x2d/0x50 [ 4277.434837] ? __pfx_kthread+0x10/0x10 [ 4277.435566] ret_from_fork_asm+0x1b/0x30 [ 4277.436280] </TASK> Signed-off-by: yuan.gao <yuan.gao@ucloud.cn> --- mm/slub.c | 2 ++ 1 file changed, 2 insertions(+)