Message ID | 20190909061016.173927-1-yuzhao@google.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: avoid slub allocation while holding list_lock |
On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
>
> Instead, allocate pages directly. Given that page->objects currently has
> 15 bits, we only need 1 page. We may waste some memory, but we only do
> so when slub debug is on.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&(&n->list_lock)->rlock);
>   lock(&(&n->list_lock)->rlock);
>
>  *** DEADLOCK ***
>
> Signed-off-by: Yu Zhao <yuzhao@google.com>

Looks sane to me:

Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> >> will run into deadlock because kmalloc() also tries to grab the same
> >> lock.
> >>
> >> Instead, allocate pages directly. Given that page->objects currently has
> >> 15 bits, we only need 1 page. We may waste some memory, but we only do
> >> so when slub debug is on.
> >>
> >> WARNING: possible recursive locking detected
> >> --------------------------------------------
> >> mount-encrypted/4921 is trying to acquire lock:
> >> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> >>
> >> but task is already holding lock:
> >> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> >>
> >> other info that might help us debug this:
> >>  Possible unsafe locking scenario:
> >>
> >>        CPU0
> >>        ----
> >>   lock(&(&n->list_lock)->rlock);
> >>   lock(&(&n->list_lock)->rlock);
> >>
> >>  *** DEADLOCK ***
> >>
> >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> >
> > Looks sane to me:
> >
> > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> >
>
> Really?
>
> Since page->objects is handled as a bitmap, the alignment should be BITS_PER_LONG
> rather than BITS_PER_BYTE (though in this particular case, get_order() would
> implicitly align to BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is
> undefined behavior.

I think we can safely assume PAGE_SIZE is unsigned long aligned and
page->objects is non-zero. But if you don't feel comfortable with these
assumptions, I'd be happy to ensure them explicitly.
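[Editor's note: if one wanted to make those assumptions explicit, a minimal sketch might look like the hypothetical helper below (the name alloc_object_map() is made up, and it is assumed to live in mm/slub.c, where the needed headers are already included). It rounds the bitmap up to whole unsigned longs with BITS_TO_LONGS(), making the BITS_PER_LONG alignment Tetsuo mentions explicit, and it refuses to reach get_order(0). This is not the posted patch.]

```c
/*
 * Hypothetical sketch only, not the posted patch: size the bitmap in whole
 * unsigned longs and bail out before get_order(0) can ever be evaluated.
 */
static unsigned long *alloc_object_map(struct page *page, int *order)
{
	size_t bytes;

	if (WARN_ON_ONCE(!page->objects))	/* get_order(0) is undefined */
		return NULL;

	/* Round up to whole unsigned longs, as the bitmap API expects. */
	bytes = BITS_TO_LONGS(page->objects) * sizeof(unsigned long);
	*order = get_order(bytes);

	return (unsigned long *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, *order);
}
```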
On Tue, Sep 10, 2019 at 10:41:31AM +0900, Tetsuo Handa wrote:
> Yu Zhao wrote:
> > I think we can safely assume PAGE_SIZE is unsigned long aligned and
> > page->objects is non-zero. But if you don't feel comfortable with these
> > assumptions, I'd be happy to ensure them explicitly.
>
> I know PAGE_SIZE is unsigned long aligned. If someone by chance happens to
> change from "dynamic allocation" to "on stack", get_order() will no longer
> be called and the bug will show up.
>
> I don't know whether __get_free_page(GFP_ATOMIC) can temporarily consume more
> than 4096 bytes, but if it can, we might want to avoid "dynamic allocation".

With GFP_ATOMIC and ~__GFP_HIGHMEM, it shouldn't.

> By the way, if "struct kmem_cache_node" is an object which won't have many
> thousands of instances, can't we embed that buffer into "struct kmem_cache_node",
> because the max size of that buffer is only 4096 bytes?

It seems to me that allocation in the error path is better than always keeping
a page around. But the latter may still be acceptable, given that it's done only
when debug is on and, of course, on a per-node scale.
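[Editor's note: for illustration, the embedding Tetsuo suggests might look roughly like the fragment below. The field name object_map and its placement are made up, and this is not what the posted patch does. One page of bitmap covers the worst case because page->objects is a 15-bit field, and in this path list_slab_objects() already runs under n->list_lock (which is what triggers the deadlock in the first place), so the per-node buffer would be serialized by that lock.]

```c
/*
 * Hypothetical sketch, not the posted patch: keep the scratch bitmap in
 * struct kmem_cache_node instead of allocating it in the error path.
 * PAGE_SIZE bytes suffice because page->objects is a 15-bit field.
 */
struct kmem_cache_node {
	spinlock_t	list_lock;
	/* ... existing fields elided ... */
#ifdef CONFIG_SLUB_DEBUG
	/* Scratch space for list_slab_objects(); callers hold list_lock. */
	unsigned long	object_map[PAGE_SIZE / sizeof(unsigned long)];
#endif
};
```

This is the trade-off Yu Zhao describes: one permanently reserved page per node versus an allocation that only happens on an error path with debugging enabled.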
On Mon, Sep 09, 2019 at 03:39:38PM -0600, Yu Zhao wrote:
> On Tue, Sep 10, 2019 at 05:57:22AM +0900, Tetsuo Handa wrote:
> > On 2019/09/10 1:00, Kirill A. Shutemov wrote:
> > > On Mon, Sep 09, 2019 at 12:10:16AM -0600, Yu Zhao wrote:
> > >> If we are already under list_lock, don't call kmalloc(). Otherwise we
> > >> will run into deadlock because kmalloc() also tries to grab the same
> > >> lock.
> > >>
> > >> Instead, allocate pages directly. Given that page->objects currently has
> > >> 15 bits, we only need 1 page. We may waste some memory, but we only do
> > >> so when slub debug is on.
> > >>
> > >> WARNING: possible recursive locking detected
> > >> --------------------------------------------
> > >> mount-encrypted/4921 is trying to acquire lock:
> > >> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
> > >>
> > >> but task is already holding lock:
> > >> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
> > >>
> > >> other info that might help us debug this:
> > >>  Possible unsafe locking scenario:
> > >>
> > >>        CPU0
> > >>        ----
> > >>   lock(&(&n->list_lock)->rlock);
> > >>   lock(&(&n->list_lock)->rlock);
> > >>
> > >>  *** DEADLOCK ***
> > >>
> > >> Signed-off-by: Yu Zhao <yuzhao@google.com>
> > >
> > > Looks sane to me:
> > >
> > > Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > >
> >
> > Really?
> >
> > Since page->objects is handled as a bitmap, the alignment should be BITS_PER_LONG
> > rather than BITS_PER_BYTE (though in this particular case, get_order() would
> > implicitly align to BITS_PER_BYTE * PAGE_SIZE). But get_order(0) is
> > undefined behavior.
>
> I think we can safely assume PAGE_SIZE is unsigned long aligned and
> page->objects is non-zero.

I think it's better to handle page->objects == 0 gracefully. It should not
happen, but this code handles a situation that should not happen.
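[Editor's note: the graceful handling Kirill asks for could be as small as an early return ahead of the allocation. A sketch against the posted patch, not a tested change:]

```c
	/* Sketch: treat an impossible page->objects == 0 as "nothing to
	 * report" instead of feeding 0 to get_order(). */
	if (!page->objects)
		return;

	order = get_order(DIV_ROUND_UP(page->objects, BITS_PER_BYTE));
	map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
```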
On Mon, 9 Sep 2019 00:10:16 -0600 Yu Zhao <yuzhao@google.com> wrote:

> If we are already under list_lock, don't call kmalloc(). Otherwise we
> will run into deadlock because kmalloc() also tries to grab the same
> lock.
>
> Instead, allocate pages directly. Given that page->objects currently has
> 15 bits, we only need 1 page. We may waste some memory, but we only do
> so when slub debug is on.
>
> WARNING: possible recursive locking detected
> --------------------------------------------
> mount-encrypted/4921 is trying to acquire lock:
> (&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437
>
> but task is already holding lock:
> (&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb
>
> other info that might help us debug this:
>  Possible unsafe locking scenario:
>
>        CPU0
>        ----
>   lock(&(&n->list_lock)->rlock);
>   lock(&(&n->list_lock)->rlock);
>
>  *** DEADLOCK ***
>

It would be better if a silly low-level debug function like this didn't try
to allocate memory at all.

Did you consider simply using a statically allocated buffer?

	{
		static char buffer[something large enough];
		static spinlock_t lock_to_protect_it;

Alternatively, do we need to call get_map() at all in there? We could simply
open-code the get_map() functionality inside list_slab_objects(). It would be
slower, but printk is already slow. Potentially extremely slow.
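[Editor's note: to make Andrew's first suggestion concrete, one possible shape is sketched below. It is illustrative only, reusing the helpers already present in mm/slub.c (slab_err(), get_map(), for_each_object(), slab_index(), print_tracking()) and bounding the bitmap with MAX_OBJS_PER_PAGE, which exists in mm/slub.c precisely because page->objects is a 15-bit field. It is not the posted patch.]

```c
/*
 * Illustrative sketch of a statically allocated scratch bitmap with its own
 * lock, so this debug path never allocates memory at all.
 */
static unsigned long object_map[BITS_TO_LONGS(MAX_OBJS_PER_PAGE)];
static DEFINE_SPINLOCK(object_map_lock);

static void list_slab_objects(struct kmem_cache *s, struct page *page,
			      const char *text)
{
#ifdef CONFIG_SLUB_DEBUG
	void *addr = page_address(page);
	void *p;

	slab_err(s, page, text, s->name);
	slab_lock(page);

	spin_lock(&object_map_lock);
	/* get_map() only sets bits, so clear the scratch bitmap first. */
	bitmap_zero(object_map, page->objects);
	get_map(s, page, object_map);
	for_each_object(p, s, addr, page->objects) {
		if (!test_bit(slab_index(p, s, addr), object_map)) {
			pr_err("INFO: Object 0x%p @offset=%tu\n", p, p - addr);
			print_tracking(s, p);
		}
	}
	spin_unlock(&object_map_lock);

	slab_unlock(page);
#endif
}
```

The design trade is the same one discussed above: a permanently reserved bitmap plus a lock, in exchange for never allocating on the error path.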
diff --git a/mm/slub.c b/mm/slub.c
index 8834563cdb4b..574a53ee31e1 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3683,7 +3683,11 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 #ifdef CONFIG_SLUB_DEBUG
 	void *addr = page_address(page);
 	void *p;
-	unsigned long *map = bitmap_zalloc(page->objects, GFP_ATOMIC);
+	int order;
+	unsigned long *map;
+
+	order = get_order(DIV_ROUND_UP(page->objects, BITS_PER_BYTE));
+	map = (void *)__get_free_pages(GFP_ATOMIC | __GFP_ZERO, order);
 	if (!map)
 		return;
 	slab_err(s, page, text, s->name);
@@ -3698,7 +3702,7 @@ static void list_slab_objects(struct kmem_cache *s, struct page *page,
 		}
 	}
 	slab_unlock(page);
-	bitmap_free(map);
+	free_pages((unsigned long)map, order);
 #endif
 }
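[Editor's note: as a quick sanity check of the sizing in the first hunk (my arithmetic, not from the thread): with page->objects limited to 15 bits, the bitmap needs at most 32767 bits; DIV_ROUND_UP(32767, BITS_PER_BYTE) is 4096 bytes, and on the usual 4 KiB PAGE_SIZE get_order(4096) is 0, so __get_free_pages() returns exactly one page, matching the "we only need 1 page" claim in the changelog.]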
If we are already under list_lock, don't call kmalloc(). Otherwise we
will run into deadlock because kmalloc() also tries to grab the same
lock.

Instead, allocate pages directly. Given that page->objects currently has
15 bits, we only need 1 page. We may waste some memory, but we only do
so when slub debug is on.

WARNING: possible recursive locking detected
--------------------------------------------
mount-encrypted/4921 is trying to acquire lock:
(&(&n->list_lock)->rlock){-.-.}, at: ___slab_alloc+0x104/0x437

but task is already holding lock:
(&(&n->list_lock)->rlock){-.-.}, at: __kmem_cache_shutdown+0x81/0x3cb

other info that might help us debug this:
 Possible unsafe locking scenario:

       CPU0
       ----
  lock(&(&n->list_lock)->rlock);
  lock(&(&n->list_lock)->rlock);

 *** DEADLOCK ***

Signed-off-by: Yu Zhao <yuzhao@google.com>
---
 mm/slub.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)