From patchwork Fri Jan 27 19:41:10 2023
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13119270
Date: Fri, 27 Jan 2023 11:41:10 -0800
In-Reply-To: <20230127194110.533103-1-surenb@google.com>
Mime-Version: 1.0
References: <20230127194110.533103-1-surenb@google.com>
X-Mailer: git-send-email 2.39.1.456.gfc5497dd1b-goog
Message-ID: <20230127194110.533103-34-surenb@google.com>
Subject: [PATCH v2 33/33] mm: separate vma->lock from vm_area_struct
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz,
    hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net,
    willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org,
    ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org,
    luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
    dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
    kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com,
    peterjung1337@gmail.com, rientjes@google.com, axelrasmussen@google.com,
    joelaf@google.com, minchan@google.com, rppt@kernel.org, jannh@google.com,
    shakeelb@google.com, tatashin@google.com, edumazet@google.com,
    gthelen@google.com, gurua@google.com, arjunroy@google.com, soheil@google.com,
    leewalsh@google.com, posk@google.com, linux-mm@kvack.org,
    linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
    x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com,
    surenb@google.com
vma->lock being part of the vm_area_struct causes performance regression
during page faults because during contention its count and owner fields
are constantly updated and having other parts of vm_area_struct used
during page fault handling next to them causes constant cache line
bouncing. Fix that by moving the lock outside of the vm_area_struct.

All attempts to keep vma->lock inside vm_area_struct in a separate cache
line still produce a performance regression, especially on NUMA machines.
The smallest regression was achieved when the lock was placed in the
fourth cache line, but that bloats vm_area_struct to 256 bytes.

Considering the performance and memory impact, a separately allocated
lock looks like the best option. It increases the memory footprint of
each VMA, but that can be optimized later if the new size causes issues.

Note that after this change vma_init() no longer allocates or initializes
vma->lock. A number of drivers allocate a pseudo VMA on the stack, but
they never use the VMA's lock, so it does not need to be allocated there.
Future drivers that need the VMA lock should use
vm_area_alloc()/vm_area_free() to allocate the VMA.
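Illustration (not part of the patch): a minimal userspace C sketch of the
layout change described above. The struct names below are hypothetical
stand-ins for vm_area_struct and vma_lock, and pthread_rwlock_t stands in
for the kernel's rw_semaphore; the point is only that contending lockers
dirty a separately allocated cache line instead of the lines holding the
hot fields. Build with "gcc -pthread".

#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

/* Before: the lock shares cache lines with hot, read-mostly fields. */
struct area_embedded {
	unsigned long vm_start, vm_end;	/* read on every page fault */
	pthread_rwlock_t lock;		/* written on every lock/unlock */
};

struct area_lock {
	pthread_rwlock_t lock;
};

/* After: the hot fields stay clean; only a pointer remains inline. */
struct area_split {
	unsigned long vm_start, vm_end;
	struct area_lock *vm_lock;	/* separately allocated */
};

int main(void)
{
	struct area_split a = { .vm_start = 0, .vm_end = 4096 };

	a.vm_lock = malloc(sizeof(*a.vm_lock));
	if (!a.vm_lock)
		return 1;
	pthread_rwlock_init(&a.vm_lock->lock, NULL);

	/* Readers bounce only the separate allocation, not 'a' itself. */
	pthread_rwlock_rdlock(&a.vm_lock->lock);
	printf("range: %lu-%lu, embedded size %zu vs split size %zu\n",
	       a.vm_start, a.vm_end,
	       sizeof(struct area_embedded), sizeof(struct area_split));
	pthread_rwlock_unlock(&a.vm_lock->lock);

	pthread_rwlock_destroy(&a.vm_lock->lock);
	free(a.vm_lock);
	return 0;
}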
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm.h       | 23 ++++++-------
 include/linux/mm_types.h |  6 +++-
 kernel/fork.c            | 73 ++++++++++++++++++++++++++++++++--------
 3 files changed, 74 insertions(+), 28 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1c4ddcd6fd84..52e048c31239 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -621,12 +621,6 @@ struct vm_operations_struct {
 };
 
 #ifdef CONFIG_PER_VMA_LOCK
-static inline void vma_init_lock(struct vm_area_struct *vma)
-{
-	init_rwsem(&vma->lock);
-	vma->vm_lock_seq = -1;
-}
-
 /*
  * Try to read-lock a vma. The function is allowed to occasionally yield false
  * locked result to avoid performance overhead, in which case we fall back to
@@ -638,17 +632,17 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
 		return false;
 
-	if (unlikely(down_read_trylock(&vma->lock) == 0))
+	if (unlikely(down_read_trylock(&vma->vm_lock->lock) == 0))
 		return false;
 
 	/*
 	 * Overflow might produce false locked result.
 	 * False unlocked result is impossible because we modify and check
-	 * vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+	 * vma->vm_lock_seq under vma->vm_lock protection and mm->mm_lock_seq
 	 * modification invalidates all existing locks.
 	 */
 	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
-		up_read(&vma->lock);
+		up_read(&vma->vm_lock->lock);
 		return false;
 	}
 	return true;
@@ -657,7 +651,7 @@ static inline bool vma_start_read(struct vm_area_struct *vma)
 static inline void vma_end_read(struct vm_area_struct *vma)
 {
 	rcu_read_lock(); /* keeps vma alive till the end of up_read */
-	up_read(&vma->lock);
+	up_read(&vma->vm_lock->lock);
 	rcu_read_unlock();
 }
 
@@ -675,9 +669,9 @@ static inline void vma_start_write(struct vm_area_struct *vma)
 	if (vma->vm_lock_seq == mm_lock_seq)
 		return;
 
-	down_write(&vma->lock);
+	down_write(&vma->vm_lock->lock);
 	vma->vm_lock_seq = mm_lock_seq;
-	up_write(&vma->lock);
+	up_write(&vma->vm_lock->lock);
 }
 
 static inline void vma_assert_write_locked(struct vm_area_struct *vma)
@@ -704,6 +698,10 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 
 #endif /* CONFIG_PER_VMA_LOCK */
 
+/*
+ * WARNING: vma_init does not initialize vma->vm_lock.
+ * Use vm_area_alloc()/vm_area_free() if vma needs locking.
+ */
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
@@ -712,7 +710,6 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
-	vma_init_lock(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index c4c43f10344a..1e97bb98197c 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -471,6 +471,10 @@ struct anon_vma_name {
 	char name[];
 };
 
+struct vma_lock {
+	struct rw_semaphore lock;
+};
+
 /*
  * This struct describes a virtual memory area. There is one of these
  * per VM-area/task. A VM area is any part of the process virtual memory
@@ -510,7 +514,7 @@ struct vm_area_struct {
 
 #ifdef CONFIG_PER_VMA_LOCK
 	int vm_lock_seq;
-	struct rw_semaphore lock;
+	struct vma_lock *vm_lock;
 #endif
 
 	/*
diff --git a/kernel/fork.c b/kernel/fork.c
index d0999de82f94..a152804faa14 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -451,13 +451,49 @@ static struct kmem_cache *vm_area_cachep;
 /* SLAB cache for mm_struct structures (tsk->mm) */
 static struct kmem_cache *mm_cachep;
 
+#ifdef CONFIG_PER_VMA_LOCK
+
+/* SLAB cache for vm_area_struct.lock */
+static struct kmem_cache *vma_lock_cachep;
+
+static bool vma_lock_alloc(struct vm_area_struct *vma)
+{
+	vma->vm_lock = kmem_cache_alloc(vma_lock_cachep, GFP_KERNEL);
+	if (!vma->vm_lock)
+		return false;
+
+	init_rwsem(&vma->vm_lock->lock);
+	vma->vm_lock_seq = -1;
+
+	return true;
+}
+
+static inline void vma_lock_free(struct vm_area_struct *vma)
+{
+	kmem_cache_free(vma_lock_cachep, vma->vm_lock);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline bool vma_lock_alloc(struct vm_area_struct *vma) { return true; }
+static inline void vma_lock_free(struct vm_area_struct *vma) {}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
 struct vm_area_struct *vm_area_alloc(struct mm_struct *mm)
 {
 	struct vm_area_struct *vma;
 
 	vma = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
-	if (vma)
-		vma_init(vma, mm);
+	if (!vma)
+		return NULL;
+
+	vma_init(vma, mm);
+	if (!vma_lock_alloc(vma)) {
+		kmem_cache_free(vm_area_cachep, vma);
+		return NULL;
+	}
+
 	return vma;
 }
 
@@ -465,24 +501,30 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 {
 	struct vm_area_struct *new = kmem_cache_alloc(vm_area_cachep, GFP_KERNEL);
 
-	if (new) {
-		ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
-		ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
-		/*
-		 * orig->shared.rb may be modified concurrently, but the clone
-		 * will be reinitialized.
-		 */
-		data_race(memcpy(new, orig, sizeof(*new)));
-		INIT_LIST_HEAD(&new->anon_vma_chain);
-		vma_init_lock(new);
-		dup_anon_vma_name(orig, new);
+	if (!new)
+		return NULL;
+
+	ASSERT_EXCLUSIVE_WRITER(orig->vm_flags);
+	ASSERT_EXCLUSIVE_WRITER(orig->vm_file);
+	/*
+	 * orig->shared.rb may be modified concurrently, but the clone
+	 * will be reinitialized.
+	 */
+	data_race(memcpy(new, orig, sizeof(*new)));
+	if (!vma_lock_alloc(new)) {
+		kmem_cache_free(vm_area_cachep, new);
+		return NULL;
 	}
+	INIT_LIST_HEAD(&new->anon_vma_chain);
+	dup_anon_vma_name(orig, new);
+
 	return new;
 }
 
 void __vm_area_free(struct vm_area_struct *vma)
 {
 	free_anon_vma_name(vma);
+	vma_lock_free(vma);
 	kmem_cache_free(vm_area_cachep, vma);
 }
 
@@ -493,7 +535,7 @@ static void vm_area_free_rcu_cb(struct rcu_head *head)
 						  vm_rcu);
 
 	/* The vma should not be locked while being destroyed. */
-	VM_BUG_ON_VMA(rwsem_is_locked(&vma->lock), vma);
+	VM_BUG_ON_VMA(rwsem_is_locked(&vma->vm_lock->lock), vma);
 	__vm_area_free(vma);
 }
 #endif
@@ -3089,6 +3131,9 @@ void __init proc_caches_init(void)
 			NULL);
 
 	vm_area_cachep = KMEM_CACHE(vm_area_struct, SLAB_PANIC|SLAB_ACCOUNT);
+#ifdef CONFIG_PER_VMA_LOCK
+	vma_lock_cachep = KMEM_CACHE(vma_lock, SLAB_PANIC|SLAB_ACCOUNT);
+#endif
 	mmap_init();
 	nsproxy_cache_init();
 }
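A note on usage (hedged sketch, not code from this series): since vma_init()
no longer sets up vma->vm_lock, a future driver that needs a lockable VMA
would go through vm_area_alloc()/vm_area_free(), where the lock allocation
can now fail alongside the VMA allocation itself. Roughly, assuming a valid
mm_struct *mm is in hand:

	struct vm_area_struct *vma;

	vma = vm_area_alloc(mm);	/* allocates vma and vma->vm_lock */
	if (!vma)
		return -ENOMEM;		/* either allocation may have failed */

	/*
	 * ... set up and use the VMA; vma_start_read()/vma_start_write()
	 * now dereference vma->vm_lock->lock ...
	 */

	vm_area_free(vma);		/* vma_lock_free() runs before the VMA is freed */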