From patchwork Thu Aug 8 18:59:49 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Mateusz Guzik
X-Patchwork-Id: 13758114
From: Mateusz Guzik
To: surenb@google.com
Cc: linux-kernel@vger.kernel.org, linux-mm@kvack.org, Liam.Howlett@oracle.com,
 vbabka@suse.cz, lstoakes@gmail.com, pedro.falcato@gmail.com, Mateusz Guzik
Subject: [RFC PATCH] vm: align vma allocation and move the lock back into the struct
Date: Thu, 8 Aug 2024 20:59:49 +0200
Message-ID: <20240808185949.1094891-1-mjguzik@gmail.com>
X-Mailer: git-send-email 2.43.0
MIME-Version: 1.0

ACHTUNG: this is more of a request for benchmarking than a patch proposal
at this stage.

I was pointed at your patch which moved the vma lock to a separate
allocation [1]. The commit message does not say anything about making sure
the object itself is allocated with proper alignment, and I found that the
cache creation lacks the HWCACHE_ALIGN flag, which may or may not be the
problem.

I verified with a simple one-liner that on a stock kernel the vmas keep
roaming around with 16-byte alignment:

# bpftrace -e 'kretprobe:vm_area_alloc { @[retval & 0x3f] = count(); }'
@[16]: 39
@[0]: 46
@[32]: 53
@[48]: 56

Note the stock vma lock cache also lacks the alignment flag. While I have
not verified it experimentally, if the locks are also roaming it would mean
that 2 unrelated vmas can false-share locks. If the patch below is a bust,
the flag should probably be added to that cache instead.

The patch below is a slapped-together removal of the vma lock cache plus
HWCACHE_ALIGN for the vma cache. I left a dummy pointer in place so as not
to change the relative offsets of the current fields. It does compile
without CONFIG_PER_VMA_LOCK.

Vlastimil says you tested a case where the struct got bloated to 256 bytes
but the lock remained separate. It is unclear to me whether that was done
with allocations using the HWCACHE_ALIGN flag though.

There is 0 urgency on my end, but it would be nice if you could try this
out with your test rig.

1: https://lore.kernel.org/all/20230227173632.3292573-34-surenb@google.com/T/#u
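
For reference, a minimal userspace sketch of the false-sharing effect I am
worried about (not kernel code, not part of the patch; the file name,
iteration count and the assumed 64-byte line size are my own): two threads
hammer two independent test-and-set locks, once packed into the same cache
line and once padded onto separate lines.

/*
 * Illustrative userspace sketch only -- not kernel code. Two threads each
 * take and release their own lock in a loop; when the two locks sit in the
 * same cache line, the line ping-pongs between the cores even though the
 * threads never touch each other's lock.
 *
 * Build: cc -O2 -pthread false_share.c -o false_share
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <time.h>

#define ITERS	(20 * 1000 * 1000)
#define CL	64	/* assumed cache line size */

struct packed_locks { atomic_int a, b; };				/* share a line */
struct padded_locks { atomic_int a; char pad[CL]; atomic_int b; };	/* do not */

static void *hammer(void *arg)
{
	atomic_int *lock = arg;

	for (long i = 0; i < ITERS; i++) {
		while (atomic_exchange_explicit(lock, 1, memory_order_acquire))
			;	/* spin */
		atomic_store_explicit(lock, 0, memory_order_release);
	}
	return NULL;
}

static double run(atomic_int *l1, atomic_int *l2)
{
	pthread_t t1, t2;
	struct timespec s, e;

	clock_gettime(CLOCK_MONOTONIC, &s);
	pthread_create(&t1, NULL, hammer, l1);
	pthread_create(&t2, NULL, hammer, l2);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	clock_gettime(CLOCK_MONOTONIC, &e);
	return (e.tv_sec - s.tv_sec) + (e.tv_nsec - s.tv_nsec) / 1e9;
}

int main(void)
{
	static _Alignas(CL) struct packed_locks p;
	static _Alignas(CL) struct padded_locks q;

	printf("same cache line:      %.2fs\n", run(&p.a, &p.b));
	printf("separate cache lines: %.2fs\n", run(&q.a, &q.b));
	return 0;
}

The packed case should come out noticeably slower on any multi-core box;
the padded layout is roughly what cacheline-aligned, lock-embedding vmas
would give us.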
---
 include/linux/mm.h       | 18 +++++++--------
 include/linux/mm_types.h | 10 ++++-----
 kernel/fork.c            | 47 ++++------------------------------------
 mm/userfaultfd.c         |  6 ++---
 4 files changed, 19 insertions(+), 62 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 43b40334e9b2..6d8b668d3deb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -687,7 +687,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (READ_ONCE(vma->vm_lock_seq) == READ_ONCE(vma->vm_mm->mm_lock_seq))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock) == 0))
 		return false;
 
 	/*
@@ -702,7 +702,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	 * This pairs with RELEASE semantics in vma_end_write_all().
 	 */
 	if (unlikely(vma->vm_lock_seq == smp_load_acquire(&vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->vm_lock->lock);
+		up_read(&vma->vm_lock);
 		return false;
 	}
 	return true;
@@ -711,7 +711,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->vm_lock->lock);
+	up_read(&vma->vm_lock);
 	rcu_read_unlock();
 }
 
@@ -740,7 +740,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (__is_vma_write_locked(vma, &mm_lock_seq))
 		return;
 
-	down_write(&vma->vm_lock->lock);
+	down_write(&vma->vm_lock);
 	/*
 	 * We should use WRITE_ONCE() here because we can have concurrent reads
 	 * from the early lockless pessimistic check in vma_start_read().
@@ -748,7 +748,7 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	 * we should use WRITE_ONCE() for cleanliness and to keep KCSAN happy.
 	 */
 	WRITE_ONCE(vma->vm_lock_seq, mm_lock_seq);
-	up_write(&vma->vm_lock->lock);
+	up_write(&vma->vm_lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -760,7 +760,7 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma)
 
 static inline void vma_assert_locked(struct vm_area_struct *vma)
 {
-	if (!rwsem_is_locked(&vma->vm_lock->lock))
+	if (!rwsem_is_locked(&vma->vm_lock))
 		vma_assert_write_locked(vma);
 }
 
@@ -827,10 +827,6 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
 
-/*
- * WARNING: vma_init does not initialize vma->vm_lock.
- * Use vm_area_alloc()/vm_area_free() if vma needs locking.
- */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	memset(vma, 0, sizeof(*vma));
@@ -839,6 +835,8 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
 	vma_mark_detached(vma, false);
 	vma_numab_state_init(vma);
+	init_rwsem(&vma->vm_lock);
+	vma->vm_lock_seq = -1;
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 003619fab20e..caffdb4eeb94 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -615,10 +615,6 @@ static inline struct anon_vma_name *anon_vma_name_alloc(const char *name)
 }
 #endif
 
-struct vma_lock {
-	struct rw_semaphore lock;
-};
-
 struct vma_numab_state {
 	/*
 	 * Initialised as time in 'jiffies' after which VMA
@@ -716,8 +712,7 @@ struct vm_area_struct {
 	 * slowpath.
 	 */
 	int vm_lock_seq;
-	/* Unstable RCU readers are allowed to read this. */
-	struct vma_lock *vm_lock;
+	void *vm_dummy;
 #endif
 
 	/*
@@ -770,6 +765,9 @@ struct vm_area_struct {
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
+#ifdef CONFIG_PER_VMA_LOCK
+	struct rw_semaphore vm_lock ____cacheline_aligned_in_smp;
+#endif
 } __randomize_layout;
 
 #ifdef CONFIG_NUMA
diff --git a/kernel/fork.c b/kernel/fork.c
index 92bfe56c9fed..eab04a24d5f1 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -436,35 +436,6 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
-#ifdef CONFIG_PER_VMA_LOCK
-
-/* SLAB cache for vm_area_struct.lock */
-static struct kmem_cache *vma_lock_cachep;
-
-static bool vma_lock_alloc(struct vm_area_struct *vma)
-{
-	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
-	if (!vma->vm_lock)
-		return false;
-
-	init_rwsem(&vma->vm_lock->lock);
-	vma->vm_lock_seq = -1;
-
-	return true;
-}
-
-static inline void vma_lock_free(struct vm_area_struct *vma)
-{
-	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
-}
-
-#else /* CONFIG_PER_VMA_LOCK */
-
-static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
-static inline void vma_lock_free(struct vm_area_struct *vma) {}
-
-#endif /* CONFIG_PER_VMA_LOCK */
-
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
@@ -474,10 +445,6 @@ struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 		return NULL;
 
 	vma_init(vma, mm);
-	if (!vma_lock_alloc(vma)) {
-		kmem_cache_free(vm_area_cachep, vma);
-		return NULL;
-	}
 
 	return vma;
 }
@@ -496,10 +463,8 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 	 * will be reinitialized.
 	 */
 	data_race(memcpy(new, orig, sizeof(*new)));
-	if (!vma_lock_alloc(new)) {
-		kmem_cache_free(vm_area_cachep, new);
-		return NULL;
-	}
+	init_rwsem(&new->vm_lock);
+	new->vm_lock_seq = -1;
 	INIT_LIST_HEAD(&new->anon_vma_chain);
 	vma_numab_state_init(new);
 	dup_anon_vma_name(orig, new);
@@ -511,7 +476,6 @@ void __vm_area_free(struct vm_area_struct *vma)
 {
 	vma_numab_state_free(vma);
 	free_anon_vma_name(vma);
-	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -522,7 +486,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3192,10 +3156,7 @@ void __init proc_caches_init(void)
 			SLAB_HWCACHE_ALIGN|SLAB_PANIC|SLAB_ACCOUNT,
 			NULL);
 
-	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
-#ifdef CONFIG_PER_VMA_LOCK
-	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
-#endif
+	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT|SLAB_HWCACHE_ALIGN);
 	mmap_init();
 	nsproxy_cache_init();
 }
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 3b7715ecf292..e95ecb2063d2 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -92,7 +92,7 @@ static struct vm_area_struct *uffd_lock_vma(struct mm_struct *mm,
 		 * mmap_lock, which guarantees that nobody can lock the
 		 * vma for write (vma_start_write()) under us.
 		 */
-		down_read(&vma->vm_lock->lock);
+		down_read(&vma->vm_lock);
 	}
 
 	mmap_read_unlock(mm);
@@ -1468,9 +1468,9 @@ static int uffd_move_lock(struct mm_struct *mm,
 		 * See comment in uffd_lock_vma() as to why not using
 		 * vma_start_read() here.
 		 */
-		down_read(&(*dst_vmap)->vm_lock->lock);
+		down_read(&(*dst_vmap)->vm_lock);
 		if (*dst_vmap != *src_vmap)
-			down_read_nested(&(*src_vmap)->vm_lock->lock,
+			down_read_nested(&(*src_vmap)->vm_lock,
 					 SINGLE_DEPTH_NESTING);
 	}
 	mmap_read_unlock(mm);