Message ID | 20190909061016.173927-1-yuzhao@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: avoid slub allocation while holding list_lock |
On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
>
> Instead, allocate pages directly. Given that page->objects currently has
> 15 bits, we only need 1 page. We may waste some memory, but we only do
> so when slub debug is on.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&(&n->list_lock)->rlock);
>   lock(&(&n->list_lock)->rlock);
>
>  *** DEADLOCK ***
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Looks sane to me:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> >> will run into deadlock because kmalloc() also tries to grab the same
> >> lock.
> >>
> >> Instead, allocate pages directly. Given that page->objects currently has
> >> 15 bits, we only need 1 page. We may waste some memory, but we only do
> >> so when slub debug is on.
> >>
> >> WARNING: possible recursive locking detected
> >> --------------------------------------------
> >> mount-encrypted/4921 is trying to acquire lock:
> >> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >>
> >> but task is already holding lock:
> >> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >>
> >> other info that might help us debug this:
> >>  Possible unsafe locking scenario:
> >>
> >>        CPU0
> >>        ----
> >>   lock(&(&n->list_lock)->rlock);
> >>   lock(&(&n->list_lock)->rlock);
> >>
> >>  *** DEADLOCK ***
> >>
> >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> >
> > Looks sane to me:
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >
>
> Really?
>
> Since page->objects is handled as a bitmap, the alignment should be BITS_PER_LONG
> rather than BITS_PER_BYTE (though in this particular case, get_order() would
> implicitly align to BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is
> undefined behavior.

I think we can safely assume PAGE_SIZE is unsigned long aligned and
page->objects is non-zero. But if you don't feel comfortable with these
assumptions, I'd be happy to ensure them explicitly.
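[Editor's note: if one wanted to make those assumptions explicit, a minimal sketch might look like the hypothetical helper below (the name alloc_object_map() is made up, and it is assumed to live in mm/slub.c, where the needed headers are already included). It rounds the bitmap up to whole unsigned longs with BITS_TO_LONGS(), making the BITS_PER_LONG alignment Tetsuo mentions explicit, and it refuses to reach get_order(0). This is not the posted patch.]

```c
/*
 * Hypothetical sketch only, not the posted patch: size the bitmap in whole
 * unsigned longs and bail out before get_order(0) can ever be evaluated.
 */
static unsigned long *alloc_object_map(struct page *page, int *order)
{
	size_t bytes;

	if (WARN_ON_ONCE(!page->objects))	/* get_order(0) is undefined */
		return NULL;

	/* Round up to whole unsigned longs, as the bitmap API expects. */
	bytes = BITS_TO_LONGS(page->objects) * sizeof(unsigned long);
	*order = get_order(bytes);

	return (unsigned long *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, *order);
}
```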
On Tue, Sep 10, 2019 at 10:41:31AM +0900, Tetsuo Handa wrote:
> Yu Zhao wrote:
> > I think we can safely assume PAGE_SIZE is unsigned long aligned and
> > page->objects is non-zero. But if you don't feel comfortable with these
> > assumptions, I'd be happy to ensure them explicitly.
>
> I know PAGE_SIZE is unsigned long aligned. If someone by chance happens to
> change from "dynamic allocation" to "on stack", get_order() will no longer
> be called and the bug will show up.
>
> I don't know whether __get_free_page(GFP_ATOMIC) can temporarily consume more
> than 4096 bytes, but if it can, we might want to avoid "dynamic allocation".

With GFP_ATOMIC and ~__GFP_HIGHMEM, it shouldn't.

> By the way, if "struct kmem_cache_node" is an object which won't have many
> thousands of instances, can't we embed that buffer into "struct kmem_cache_node",
> because the max size of that buffer is only 4096 bytes?

It seems to me that allocation in the error path is better than always keeping
a page around. But the latter may still be acceptable, given that it's done only
when debug is on and, of course, on a per-node scale.
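[Editor's note: for illustration, the embedding Tetsuo suggests might look roughly like the fragment below. The field name object_map and its placement are made up, and this is not what the posted patch does. One page of bitmap covers the worst case because page->objects is a 15-bit field, and in this path list_slab_objects() already runs under n->list_lock (which is what triggers the deadlock in the first place), so the per-node buffer would be serialized by that lock.]

```c
/*
 * Hypothetical sketch, not the posted patch: keep the scratch bitmap in
 * struct kmem_cache_node instead of allocating it in the error path.
 * PAGE_SIZE bytes suffice because page->objects is a 15-bit field.
 */
struct kmem_cache_node {
	spinlock_t	list_lock;
	/* ... existing fields elided ... */
#ifdef CONFIG_SLUB_DEBUG
	/* Scratch space for list_slab_objects(); callers hold list_lock. */
	unsigned long	object_map[PAGE_SIZE / sizeof(unsigned long)];
#endif
};
```

This is the trade-off Yu Zhao describes: one permanently reserved page per node versus an allocation that only happens on an error path with debugging enabled.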
On Mon, Sep 09, 2019 at 03:39:38PM -0600, Yu Zhao wrote:
> On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> > On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> > >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> > >> will run into deadlock because kmalloc() also tries to grab the same
> > >> lock.
> > >>
> > >> Instead, allocate pages directly. Given that page->objects currently has
> > >> 15 bits, we only need 1 page. We may waste some memory, but we only do
> > >> so when slub debug is on.
> > >>
> > >> WARNING: possible recursive locking detected
> > >> --------------------------------------------
> > >> mount-encrypted/4921 is trying to acquire lock:
> > >> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> > >>
> > >> but task is already holding lock:
> > >> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> > >>
> > >> other info that might help us debug this:
> > >>  Possible unsafe locking scenario:
> > >>
> > >>        CPU0
> > >>        ----
> > >>   lock(&(&n->list_lock)->rlock);
> > >>   lock(&(&n->list_lock)->rlock);
> > >>
> > >>  *** DEADLOCK ***
> > >>
> > >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> > >
> > > Looks sane to me:
> > >
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > >
> >
> > Really?
> >
> > Since page->objects is handled as a bitmap, the alignment should be BITS_PER_LONG
> > rather than BITS_PER_BYTE (though in this particular case, get_order() would
> > implicitly align to BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is
> > undefined behavior.
>
> I think we can safely assume PAGE_SIZE is unsigned long aligned and
> page->objects is non-zero.

I think it's better to handle page->objects == 0 gracefully. It should not
happen, but this code handles a situation that should not happen.
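[Editor's note: the graceful handling Kirill asks for could be as small as an early return ahead of the allocation. A sketch against the posted patch, not a tested change:]

```c
	/* Sketch: treat an impossible page->objects == 0 as "nothing to
	 * report" instead of feeding 0 to get_order(). */
	if (!page->objects)
		return;

	order = get_order(DIV_ROUND_UP(page->objects, BITS_PER_BYTE));
	map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
```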
On Mon, 9 Sep 2019 00:10:16 -0600 Yu Zhao <yuzhao@google.com> wrote:

> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
>
> Instead, allocate pages directly. Given that page->objects currently has
> 15 bits, we only need 1 page. We may waste some memory, but we only do
> so when slub debug is on.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&(&n->list_lock)->rlock);
>   lock(&(&n->list_lock)->rlock);
>
>  *** DEADLOCK ***
>

It would be better if a silly low-level debug function like this didn't try
to allocate memory at all.

Did you consider simply using a statically allocated buffer?

	{
		static char buffer[something large enough];
		static spinlock_t lock_to_protect_it;

Alternatively, do we need to call get_map() at all in there? We could simply
open-code the get_map() functionality inside list_slab_objects(). It would be
slower, but printk is already slow. Potentially extremely slow.
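[Editor's note: to make Andrew's first suggestion concrete, one possible shape is sketched below. It is illustrative only, reusing the helpers already present in mm/slub.c (slab_err(), get_map(), for_each_object(), slab_index(), print_tracking()) and bounding the bitmap with MAX_OBJS_PER_PAGE, which exists in mm/slub.c precisely because page->objects is a 15-bit field. It is not the posted patch.]

```c
/*
 * Illustrative sketch of a statically allocated scratch bitmap with its own
 * lock, so this debug path never allocates memory at all.
 */
static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
static DEFINE_SPINLOCK(object_map_lock);

static void list_slab_objects(struct kmem_cache *s, struct page *page,
			      const char *text)
{
#ifdef CONFIG_SLUB_DEBUG
	void *addr = page_address(page);
	void *p;

	slab_err(s, page, text, s->name);
	slab_lock(page);

	spin_lock(&object_map_lock);
	/* get_map() only sets bits, so clear the scratch bitmap first. */
	bitmap_zero(object_map, page->objects);
	get_map(s, page, object_map);
	for_each_object(p, s, addr, page->objects) {
		if (!test_bit(slab_index(p, s, addr), object_map)) {
			pr_err("INFO: Object 0x%p @offset=%tu\n", p, p - addr);
			print_tracking(s, p);
		}
	}
	spin_unlock(&object_map_lock);

	slab_unlock(page);
#endif
}
```

The design trade is the same one discussed above: a permanently reserved bitmap plus a lock, in exchange for never allocating on the error path.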
diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..574a53ee31e1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3683,7 +3683,11 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
+	int order;
+	unsigned long *map;
+
+	order = get_order(DIV_ROUND_UP(page->objects, BITS_PER_BYTE));
+	map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
 	if (!map)
 		return;
 	slab_err(s, page, text, s->name);
@@ -3698,7 +3702,7 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 		}
 	}
 	slab_unlock(page);
-	bitmap_free(map);
+	free_pages((unsigned long)map, order);
 #endif
 }
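[Editor's note: as a quick sanity check of the sizing in the first hunk (my arithmetic, not from the thread): with page->objects limited to 15 bits, the bitmap needs at most 32767 bits; DIV_ROUND_UP(32767, BITS_PER_BYTE) is 4096 bytes, and on the usual 4 KiB PAGE_SIZE get_order(4096) is 0, so __get_free_pages() returns exactly one page, matching the "we only need 1 page" claim in the changelog.]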
If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Instead, allocate pages directly. Given that page->objects currently has
15 bits, we only need 1 page. We may waste some memory, but we only do
so when slub debug is on.

WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&n->list_lock)->rlock);
  lock(&(&n->list_lock)->rlock);

 *** DEADLOCK ***

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)