
mm: avoid slub allocation while holding list_lock

Message ID 20190909061016.173927-1-yuzhao@google.com (mailing list archive)
State New, archived
Series mm: avoid slub allocation while holding list_lock

Commit Message

Yu Zhao Sept. 9, 2019, 6:10 a.m. UTC
If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Instead, allocate pages directly. Given currently page->objects has
15 bits, we only need 1 page. We may waste some memory but we only do
so when slub debug is on.

  WARNING: possible recursive locking detected
  --------------------------------------------
  mount-encrypted/4921 is trying to acquire lock:
  (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

  but task is already holding lock:
  (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

  other info that might help us debug this:
   Possible unsafe locking scenario:

         CPU0
         ----
    lock(&(&n->list_lock)->rlock);
    lock(&(&n->list_lock)->rlock);

   *** DEADLOCK ***

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)
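
For reference, the sizing argument behind "we only need 1 page" works out as
follows. This is a minimal sketch, assuming the usual 4096-byte PAGE_SIZE; the
helper name alloc_object_map() is illustrative and not part of the patch:

#include <linux/kernel.h>
#include <linux/gfp.h>
#include <linux/mm.h>

/*
 * page->objects is a 15-bit field, so the per-object bitmap needs at most
 * 2^15 bits, i.e. 32768 / BITS_PER_BYTE = 4096 bytes.  With a 4 KiB
 * PAGE_SIZE that is exactly one page, so get_order() returns 0 and
 * __get_free_pages() hands back a single page without going through a
 * kmem_cache, and therefore without taking n->list_lock.
 */
static unsigned long *alloc_object_map(unsigned int objects, int *order)
{
	*order = get_order(DIV_ROUND_UP(objects, BITS_PER_BYTE));
	return (unsigned long *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO,
						 *order);
}

The caller would release the buffer with free_pages((unsigned long)map, order),
as the patch below does.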

Comments

Kirill A. Shutemov Sept. 9, 2019, 4 p.m. UTC | #1
On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
> 
> Instead, allocate pages directly. Given currently page->objects has
> 15 bits, we only need 1 page. We may waste some memory but we only do
> so when slub debug is on.
> 
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   mount-encrypted/4921 is trying to acquire lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> 
>   but task is already holding lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(&(&n->list_lock)->rlock);
>     lock(&(&n->list_lock)->rlock);
> 
>    *** DEADLOCK ***
> 
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Looks sane to me:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Yu Zhao Sept. 9, 2019, 9:39 p.m. UTC | #2
On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> >> will run into deadlock because kmalloc() also tries to grab the same
> >> lock.
> >>
> >> Instead, allocate pages directly. Given currently page->objects has
> >> 15 bits, we only need 1 page. We may waste some memory but we only do
> >> so when slub debug is on.
> >>
> >>   WARNING: possible recursive locking detected
> >>   --------------------------------------------
> >>   mount-encrypted/4921 is trying to acquire lock:
> >>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >>
> >>   but task is already holding lock:
> >>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >>
> >>   other info that might help us debug this:
> >>    Possible unsafe locking scenario:
> >>
> >>          CPU0
> >>          ----
> >>     lock(&(&n->list_lock)->rlock);
> >>     lock(&(&n->list_lock)->rlock);
> >>
> >>    *** DEADLOCK ***
> >>
> >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> > 
> > Looks sane to me:
> > 
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > 
> 
> Really?
> 
> Since page->objects is handled as a bitmap, the alignment should be to
> BITS_PER_LONG rather than BITS_PER_BYTE (though in this particular case,
> get_order() would implicitly align to BITS_PER_BYTE * PAGE_SIZE). But
> get_order(0) is undefined behavior.

I think we can safely assume PAGE_SIZE is unsigned long aligned and
page->objects is non-zero. But if you don't feel comfortable with these
assumptions, I'd be happy to ensure them explicitly.
Yu Zhao Sept. 10, 2019, 2:16 a.m. UTC | #3
On Tue, Sep 10, 2019 at 10:41:31AM +0900, Tetsuo Handa wrote:
> Yu Zhao wrote:
> > I think we can safely assume PAGE_SIZE is unsigned long aligned and
> > page->objects is non-zero. But if you don't feel comfortable with these
> > assumptions, I'd be happy to ensure them explicitly.
> 
> I know PAGE_SIZE is unsigned long aligned. If someone by chance happens to
> change from "dynamic allocation" to "on stack", get_order() will no longer
> be called and the bug will show up.
> 
> I don't know whether __get_free_page(GFP_ATOMIC) can temporarily consume more
> than 4096 bytes, but if it can, we might want to avoid "dynamic allocation".

With GFP_ATOMIC and without __GFP_HIGHMEM, it shouldn't.

> By the way, if "struct kmem_cache_node" is an object which won't have many
> thousands of instances, can't we embed that buffer into "struct kmem_cache_node",
> given that the max size of that buffer is only 4096 bytes?

It seems to me that allocating in the error path is better than always
keeping a page around. But the latter may still be acceptable, given that
it's done only when debug is on and, of course, on a per-node scale.
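
For comparison, a minimal sketch of the embedding idea discussed above, with
the bitmap sized for the full 15-bit page->objects range. The field name
object_map is illustrative, and the real struct kmem_cache_node has more
members than shown here:

#include <linux/spinlock.h>
#include <linux/types.h>

struct kmem_cache_node {
	spinlock_t list_lock;
	/* ... existing fields elided ... */
#ifdef CONFIG_SLUB_DEBUG
	/*
	 * One bit per object, sized for the 15-bit page->objects field:
	 * 2^15 bits == 4096 bytes kept resident per node.
	 */
	DECLARE_BITMAP(object_map, 1 << 15);
#endif
};

This trades one resident page per node (debug builds only) for never having
to allocate under list_lock, which is exactly the trade-off weighed above.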
Kirill A. Shutemov Sept. 10, 2019, 9:16 a.m. UTC | #4
On Mon, Sep 09, 2019 at 03:39:38PM -0600, Yu Zhao wrote:
> On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> > On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> > >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> > >> will run into deadlock because kmalloc() also tries to grab the same
> > >> lock.
> > >>
> > >> Instead, allocate pages directly. Given currently page->objects has
> > >> 15 bits, we only need 1 page. We may waste some memory but we only do
> > >> so when slub debug is on.
> > >>
> > >>   WARNING: possible recursive locking detected
> > >>   --------------------------------------------
> > >>   mount-encrypted/4921 is trying to acquire lock:
> > >>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> > >>
> > >>   but task is already holding lock:
> > >>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> > >>
> > >>   other info that might help us debug this:
> > >>    Possible unsafe locking scenario:
> > >>
> > >>          CPU0
> > >>          ----
> > >>     lock(&(&n->list_lock)->rlock);
> > >>     lock(&(&n->list_lock)->rlock);
> > >>
> > >>    *** DEADLOCK ***
> > >>
> > >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> > > 
> > > Looks sane to me:
> > > 
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > 
> > 
> > Really?
> > 
> > Since page->objects is handled as a bitmap, the alignment should be to
> > BITS_PER_LONG rather than BITS_PER_BYTE (though in this particular case,
> > get_order() would implicitly align to BITS_PER_BYTE * PAGE_SIZE). But
> > get_order(0) is undefined behavior.
> 
> I think we can safely assume PAGE_SIZE is unsigned long aligned and
> page->objects is non-zero.

I think it's better to handle page->objects == 0 gracefully. It should not
happen, but this code handles situations that should not happen.
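
One way to make that graceful, building on the posted patch. Clamping to at
least one object is an assumption about what "gracefully" should mean here,
not something settled in this thread; the fragment is a drop-in for the hunk
in the patch below:

	unsigned int nbits = max_t(unsigned int, page->objects, 1);
	int order = get_order(DIV_ROUND_UP(nbits, BITS_PER_BYTE));
	unsigned long *map;

	/*
	 * get_order(0) is undefined, so never pass a zero size even if
	 * page->objects is unexpectedly zero.
	 */
	map = (unsigned long *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
	if (!map)
		return;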
Andrew Morton Sept. 11, 2019, 2:13 p.m. UTC | #5
On Mon,  9 Sep 2019 00:10:16 -0600 Yu Zhao <yuzhao@google.com> wrote:

> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
> 
> Instead, allocate pages directly. Given currently page->objects has
> 15 bits, we only need 1 page. We may waste some memory but we only do
> so when slub debug is on.
> 
>   WARNING: possible recursive locking detected
>   --------------------------------------------
>   mount-encrypted/4921 is trying to acquire lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> 
>   but task is already holding lock:
>   (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> 
>   other info that might help us debug this:
>    Possible unsafe locking scenario:
> 
>          CPU0
>          ----
>     lock(&(&n->list_lock)->rlock);
>     lock(&(&n->list_lock)->rlock);
> 
>    *** DEADLOCK ***
> 

It would be better if a silly low-level debug function like this
weren't to try to allocate memory at all.  Did you consider simply
using a statically allocated buffer?

{
	static char buffer[something large enough];
	static spinlock_t lock_to_protect_it;


Alternatively, do we need to call get_map() at all in there?  We could
simply open-code the get_map() functionality inside
list_slab_objects().  It would be slower, but printk is already slow. 
Potentially extremely slow.
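
A slightly fuller sketch of the static-buffer direction. The buffer and lock
names are assumptions, the bitmap is sized for the 15-bit page->objects field,
and the body would still fill and walk the map much as the existing function
does:

static DEFINE_SPINLOCK(object_map_lock);
static DECLARE_BITMAP(object_map, 1 << 15);	/* 2^15 bits == 4096 bytes */

static void list_slab_objects(struct kmem_cache *s, struct page *page,
			      const char *text)
{
#ifdef CONFIG_SLUB_DEBUG
	unsigned long flags;

	/* Serialize users of the shared map instead of allocating one. */
	spin_lock_irqsave(&object_map_lock, flags);
	bitmap_zero(object_map, page->objects);
	/* ... get_map() and the slab_err()/print loop as before ... */
	spin_unlock_irqrestore(&object_map_lock, flags);
#endif
}

This keeps allocation out of the debug path entirely, at the cost of one
shared 4 KiB buffer and a global lock.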

Patch

diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..574a53ee31e1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3683,7 +3683,11 @@  static void list_slab_objects(struct kmem_cache *s, struct page *page,
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
+	int order;
+	unsigned long *map;
+
+	order = get_order(DIV_ROUND_UP(page->objects, BITS_PER_BYTE));
+	map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
 	if (!map)
 		return;
 	slab_err(s, page, text, s->name);
@@ -3698,7 +3702,7 @@  static void list_slab_objects(struct kmem_cache *s, struct page *page,
 		}
 	}
 	slab_unlock(page);
-	bitmap_free(map);
+	free_pages((unsigned long)map, order);
 #endif
 }