[RFC,v2,34/34] mm, slub: convert kmem_cpu_slab protection to local_lock

Message ID	20210609113903.1421-35-vbabka@suse.cz (mailing list archive)
State	New, archived
Headers	show Return-Path: <SRS0=BPbg=LD=kvack.org=owner-linux-mm@kernel.org> DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org F35546100B From: Vlastimil Babka <vbabka@suse.cz> To: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Christoph Lameter <cl@linux.com>, David Rientjes <rientjes@google.com>, Pekka Enberg <penberg@kernel.org>, Joonsoo Kim <iamjoonsoo.kim@lge.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>, Thomas Gleixner <tglx@linutronix.de>, Mel Gorman <mgorman@techsingularity.net>, Jesper Dangaard Brouer <brouer@redhat.com>, Peter Zijlstra <peterz@infradead.org>, Jann Horn <jannh@google.com>, Vlastimil Babka <vbabka@suse.cz> Subject: [RFC v2 34/34] mm, slub: convert kmem_cpu_slab protection to local_lock Date: Wed, 9 Jun 2021 13:39:03 +0200 Message-Id: <20210609113903.1421-35-vbabka@suse.cz> In-Reply-To: <20210609113903.1421-1-vbabka@suse.cz> References: <20210609113903.1421-1-vbabka@suse.cz> MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable Sender: owner-linux-mm@kvack.org Precedence: bulk
Series	SLUB: reduce irq disabled scope and make it RT compatible \| expand [RFC,v2,00/34] SLUB: reduce irq disabled scope and make it RT compatible [RFC,v2,01/34] mm, slub: don't call flush_all() from list_locations() [RFC,v2,02/34] mm, slub: allocate private object map for sysfs listings [RFC,v2,03/34] mm, slub: allocate private object map for validate_slab_cache() [RFC,v2,04/34] mm, slub: don't disable irq for debug_check_no_locks_freed() [RFC,v2,05/34] mm, slub: remove redundant unfreeze_partials() from put_cpu_partial() [RFC,v2,06/34] mm, slub: unify cmpxchg_double_slab() and __cmpxchg_double_slab() [RFC,v2,07/34] mm, slub: extract get_partial() from new_slab_objects() [RFC,v2,08/34] mm, slub: dissolve new_slab_objects() into ___slab_alloc() [RFC,v2,09/34] mm, slub: return slab page from get_partial() and set c->page afterwards [RFC,v2,10/34] mm, slub: restructure new page checks in ___slab_alloc() [RFC,v2,11/34] mm, slub: simplify kmem_cache_cpu and tid setup [RFC,v2,12/34] mm, slub: move disabling/enabling irqs to ___slab_alloc() [RFC,v2,13/34] mm, slub: do initial checks in ___slab_alloc() with irqs enabled [RFC,v2,14/34] mm, slub: move disabling irqs closer to get_partial() in ___slab_alloc() [RFC,v2,15/34] mm, slub: restore irqs around calling new_slab() [RFC,v2,16/34] mm, slub: validate slab from partial list or page allocator before making it cpu slab [RFC,v2,17/34] mm, slub: check new pages with restored irqs [RFC,v2,18/34] mm, slub: stop disabling irqs around get_partial() [RFC,v2,19/34] mm, slub: move reset of c->page and freelist out of deactivate_slab() [RFC,v2,20/34] mm, slub: make locking in deactivate_slab() irq-safe [RFC,v2,21/34] mm, slub: call deactivate_slab() without disabling irqs [RFC,v2,22/34] mm, slub: move irq control into unfreeze_partials() [RFC,v2,23/34] mm, slub: discard slabs in unfreeze_partials() without irqs disabled [RFC,v2,24/34] mm, slub: detach whole partial list at once in unfreeze_partials() [RFC,v2,25/34] mm, slub: detach percpu partial list in unfreeze_partials() using this_cpu_cmpxchg() [RFC,v2,26/34] mm, slub: only disable irq with spin_lock in __unfreeze_partials() [RFC,v2,27/34] mm, slub: don't disable irqs in slub_cpu_dead() [RFC,v2,28/34] mm, slab: make flush_slab() possible to call with irqs enabled [RFC,v2,29/34] mm: slub: Move flush_cpu_slab() invocations __free_slab() invocations out of IRQ con… [RFC,v2,30/34] mm: slub: Make object_map_lock a raw_spinlock_t [RFC,v2,31/34] mm, slub: optionally save/restore irqs in slab_[un]lock()/ [RFC,v2,32/34] mm, slub: make slab_lock() disable irqs with PREEMPT_RT [RFC,v2,33/34] mm, slub: use migrate_disable() on PREEMPT_RT [RFC,v2,34/34] mm, slub: convert kmem_cpu_slab protection to local_lock

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index dcde82a4434c..b5bcac29b979 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -10,6 +10,7 @@ #include <linux/kfence.h> #include <linux/kobject.h> #include <linux/reciprocal_div.h> +#include <linux/local_lock.h> enum stat_item { ALLOC_FASTPATH, /* Allocation from cpu slab */ @@ -41,6 +42,7 @@ enum stat_item { NR_SLUB_STAT_ITEMS }; struct kmem_cache_cpu { + local_lock_t lock; /* Protects the fields below except stat */ void **freelist; /* Pointer to next available object */ unsigned long tid; /* Globally unique transaction id */ struct page *page; /* The slab from which we are allocating */ diff --git a/mm/slub.c b/mm/slub.c index caa206213e72..500720ec1e57 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -43,13 +43,22 @@ /* * Lock order: * 1. slab_mutex (Global Mutex) - * 2. node->list_lock - * 3. slab_lock(page) (Only on some arches and for debugging) + * 2. node->list_lock (Spinlock) + * OR + * kmem_cache->cpu_slab->lock (Local lock) + * 3. slab_lock(page) (Only on some arches or for debugging) + * 4. object_map_lock (Only for debugging) * * slab_mutex * * The role of the slab_mutex is to protect the list of all the slabs * and to synchronize major metadata changes to slab cache structures. + * Also synchronizes memory hotplug callbacks. + * + * slab_lock + * + * The slab_lock is a wrapper around the page lock, thus it is a bit + * spinlock. * * The slab_lock is only used for debugging and on arches that do not * have the ability to do a cmpxchg_double. It only protects: @@ -58,6 +67,8 @@ * C. page->objects -> Number of objects in page * D. page->frozen -> frozen state * + * Frozen slabs + * * If a slab is frozen then it is exempt from list management. It is not * on any list except per cpu partial list. The processor that froze the * slab is the one who can perform list operations on the page. Other @@ -65,6 +76,8 @@ * froze the slab is the only one that can retrieve the objects from the * page's freelist. * + * list_lock + * * The list_lock protects the partial and full list on each node and * the partial slab counter. If taken then no new slabs may be added or * removed from the lists nor make the number of partial slabs be modified. @@ -76,10 +89,36 @@ * slabs, operations can continue without any centralized lock. F.e. * allocating a long series of objects that fill up slabs does not require * the list lock. - * Interrupts are disabled during allocation and deallocation in order to - * make the slab allocator safe to use in the context of an irq. In addition - * interrupts are disabled to ensure that the processor does not change - * while handling per_cpu slabs, due to kernel preemption. + * + * cpu_slab->lock local lock + * + * This locks protect slowpath manipulation of all kmem_cache_cpu fields + * except the stat counters. This is a percpu structure manipulated only by + * the local cpu, so the lock protects against being preempted or interrupted + * by an irq. Fast path operations rely on lockless operations instead. + * On PREEMPT_RT, the local lock does not actually disable irqs (and thus + * prevent the lockless operations), so fastpath operations also need to take + * the lock and are no longer lockless. + * + * lockless fastpaths + * + * The fast path allocation (slab_alloc_node()) and freeing (do_slab_free()) + * are fully lockless when satisfied from the percpu slab (and when + * cmpxchg_double is possible to use, otherwise slab_lock is taken). + * They also don't disable preemption or migration or irqs. They rely on + * the transaction id (tid) field to detect being preempted or moved to + * another cpu. + * + * irq, preemption, migration considerations + * + * Interrupts are disabled as part of list_lock or local_lock operations, or + * around the slab_lock operation, in order to make the slab allocator safe + * to use in the context of an irq. + * + * In addition, preemption (or migration on PREEMPT_RT) is disabled in the + * allocation slowpath, bulk allocation, and put_cpu_partial(), so that the + * local cpu doesn't change in the process and e.g. the kmem_cache_cpu pointer + * doesn't have to be revalidated in each section protected by the local lock. * * SLUB assigns one slab for allocation to each processor. * Allocations only occur from these slabs called cpu slabs. @@ -2179,9 +2218,13 @@ static inline void note_cmpxchg_failure(const char *n, static void init_kmem_cache_cpus(struct kmem_cache *s) { int cpu; + struct kmem_cache_cpu *c; - for_each_possible_cpu(cpu) - per_cpu_ptr(s->cpu_slab, cpu)->tid = init_tid(cpu); + for_each_possible_cpu(cpu) { + c = per_cpu_ptr(s->cpu_slab, cpu); + local_lock_init(&c->lock); + c->tid = init_tid(cpu); + } } /* @@ -2482,7 +2525,7 @@ static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c, struct page *page; if (lock) - local_irq_save(flags); + local_lock_irqsave(&s->cpu_slab->lock, flags); freelist = c->freelist; page = c->page; @@ -2492,7 +2535,7 @@ static inline void flush_slab(struct kmem_cache *s, struct kmem_cache_cpu *c, c->tid = next_tid(c->tid); if (lock) - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); if (page) deactivate_slab(s, page, freelist); @@ -2780,9 +2823,9 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, goto deactivate_slab; /* must check again c->page in case we got preempted and it changed */ - local_irq_save(flags); + local_lock_irqsave(&s->cpu_slab->lock, flags); if (unlikely(page != c->page)) { - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); goto reread_page; } freelist = c->freelist; @@ -2793,7 +2836,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, if (!freelist) { c->page = NULL; - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); stat(s, DEACTIVATE_BYPASS); goto new_slab; } @@ -2802,7 +2845,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, load_freelist: - lockdep_assert_irqs_disabled(); + lockdep_assert_held(this_cpu_ptr(&s->cpu_slab->lock)); /* * freelist is pointing to the list of objects to be used. @@ -2812,39 +2855,39 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, VM_BUG_ON(!c->page->frozen); c->freelist = get_freepointer(s, freelist); c->tid = next_tid(c->tid); - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); return freelist; deactivate_slab: - local_irq_save(flags); + local_lock_irqsave(&s->cpu_slab->lock, flags); if (page != c->page) { - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); goto reread_page; } freelist = c->freelist; c->page = NULL; c->freelist = NULL; - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); deactivate_slab(s, page, freelist); new_slab: if (slub_percpu_partial(c)) { - local_irq_save(flags); + local_lock_irqsave(&s->cpu_slab->lock, flags); if (unlikely(c->page)) { - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); goto reread_page; } if (unlikely(!slub_percpu_partial(c))) { - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); /* we were preempted and partial list got empty */ goto new_objects; } page = c->page = slub_percpu_partial(c); slub_set_percpu_partial(c, page); - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); stat(s, CPU_PARTIAL_ALLOC); goto redo; } @@ -2897,7 +2940,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, retry_load_page: - local_irq_save(flags); + local_lock_irqsave(&s->cpu_slab->lock, flags); if (unlikely(c->page)) { void *flush_freelist = c->freelist; struct page *flush_page = c->page; @@ -2906,7 +2949,7 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node, c->freelist = NULL; c->tid = next_tid(c->tid); - local_irq_restore(flags); + local_unlock_irqrestore(&s->cpu_slab->lock, flags); deactivate_slab(s, flush_page, flush_freelist); @@ -3025,7 +3068,15 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s, object = c->freelist; page = c->page; - if (unlikely(!object || !page || !node_match(page, node))) { + /* + * We cannot use the lockless fastpath on PREEMPT_RT because if a + * slowpath has taken the local_lock_irqsave(), it is not protected + * against a fast path operation in an irq handler. So we need to take + * the slow path which uses local_lock. It is still relatively fast if + * there is a suitable cpu freelist. + */ + if (IS_ENABLED(CONFIG_PREEMPT_RT) || + unlikely(!object || !page || !node_match(page, node))) { object = __slab_alloc(s, gfpflags, node, addr, c); } else { void *next_object = get_freepointer_safe(s, object); @@ -3285,6 +3336,7 @@ static __always_inline void do_slab_free(struct kmem_cache *s, barrier(); if (likely(page == c->page)) { +#ifndef CONFIG_PREEMPT_RT void **freelist = READ_ONCE(c->freelist); set_freepointer(s, tail_obj, freelist); @@ -3297,6 +3349,32 @@ static __always_inline void do_slab_free(struct kmem_cache *s, note_cmpxchg_failure("slab_free", s, tid); goto redo; } +#else /* CONFIG_PREEMPT_RT */ + /* + * We cannot use the lockless fastpath on PREEMPT_RT because if + * a slowpath has taken the local_lock_irqsave(), it is not + * protected against a fast path operation in an irq handler. So + * we need to take the local_lock. We shouldn't simply defer to + * __slab_free() as that wouldn't use the cpu freelist at all. + */ + unsigned long flags; + void **freelist; + + local_lock_irqsave(&s->cpu_slab->lock, flags); + c = this_cpu_ptr(s->cpu_slab); + if (unlikely(page != c->page)) { + local_unlock_irqrestore(&s->cpu_slab->lock, flags); + goto redo; + } + tid = c->tid; + freelist = c->freelist; + + set_freepointer(s, tail_obj, freelist); + c->freelist = head; + c->tid = next_tid(tid); + + local_unlock_irqrestore(&s->cpu_slab->lock, flags); +#endif stat(s, FREE_FASTPATH); } else __slab_free(s, page, head, tail_obj, cnt, addr); @@ -3467,7 +3545,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, * handlers invoking normal fastpath. */ c = slub_get_cpu_ptr(s->cpu_slab); - local_irq_disable(); + local_lock_irq(&s->cpu_slab->lock); for (i = 0; i < size; i++) { void *object = kfence_alloc(s, s->object_size, flags); @@ -3488,7 +3566,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, */ c->tid = next_tid(c->tid); - local_irq_enable(); + local_unlock_irq(&s->cpu_slab->lock); /* * Invoking slow path likely have side-effect @@ -3502,7 +3580,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, c = this_cpu_ptr(s->cpu_slab); maybe_wipe_obj_freeptr(s, p[i]); - local_irq_disable(); + local_lock_irq(&s->cpu_slab->lock); continue; /* goto for-loop */ } @@ -3511,7 +3589,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, maybe_wipe_obj_freeptr(s, p[i]); } c->tid = next_tid(c->tid); - local_irq_enable(); + local_unlock_irq(&s->cpu_slab->lock); slub_put_cpu_ptr(s->cpu_slab); /* @@ -3522,7 +3600,7 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size, slab_want_init_on_alloc(flags, s)); return i; error: - local_irq_enable(); + local_unlock_irq(&s->cpu_slab->lock); slab_post_alloc_hook(s, objcg, flags, i, p, false); __kmem_cache_free_bulk(s, i, p); return 0;

[RFC,v2,34/34] mm, slub: convert kmem_cpu_slab protection to local_lock

Commit Message

Patch