From patchwork Mon Feb 27 17:36:11 2023
X-Patchwork-Submitter: Suren Baghdasaryan <surenb@google.com>
X-Patchwork-Id: 13153975
Date: Mon, 27 Feb 2023 09:36:11 -0800
In-Reply-To: <20230227173632.3292573-1-surenb@google.com>
Mime-Version: 1.0
References: <20230227173632.3292573-1-surenb@google.com>
X-Mailer: git-send-email 2.39.2.722.g9855ee24e9-goog
Message-ID: <20230227173632.3292573-13-surenb@google.com>
Subject: [PATCH v4 12/33] mm: add per-VMA lock and helper functions to control it
From: Suren Baghdasaryan <surenb@google.com>
To: akpm@linux-foundation.org
Cc: michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, vbabka@suse.cz,
 hannes@cmpxchg.org, mgorman@techsingularity.net, dave@stgolabs.net,
 willy@infradead.org, liam.howlett@oracle.com, peterz@infradead.org,
 ldufour@linux.ibm.com, paulmck@kernel.org, mingo@redhat.com, will@kernel.org,
 luto@kernel.org, songliubraving@fb.com, peterx@redhat.com, david@redhat.com,
 dhowells@redhat.com, hughd@google.com, bigeasy@linutronix.de,
 kent.overstreet@linux.dev, punit.agrawal@bytedance.com, lstoakes@gmail.com,
 peterjung1337@gmail.com, rientjes@google.com, chriscli@google.com,
 axelrasmussen@google.com, joelaf@google.com, minchan@google.com,
 rppt@kernel.org, jannh@google.com, shakeelb@google.com, tatashin@google.com,
 edumazet@google.com, gthelen@google.com, gurua@google.com,
 arjunroy@google.com, soheil@google.com, leewalsh@google.com, posk@google.com,
 michalechner92@googlemail.com, linux-mm@kvack.org,
 linux-arm-kernel@lists.infradead.org, linuxppc-dev@lists.ozlabs.org,
 x86@kernel.org, linux-kernel@vger.kernel.org, kernel-team@android.com,
 Suren Baghdasaryan <surenb@google.com>
Introduce per-VMA locking. The lock implementation relies on per-vma and
per-mm sequence counters to note exclusive locking:

  - read lock - (implemented by vma_start_read) requires the vma
    (vm_lock_seq) and mm (mm_lock_seq) sequence counters to differ. If
    they match, a vma exclusive lock must be held somewhere and the
    reader falls back to mmap_lock (see the read-side sketch below).
  - read unlock - (implemented by vma_end_read) is a trivial vma->lock
    unlock.
  - write lock - (vma_start_write) requires the mmap_lock to be held
    exclusively; the current mm counter is assigned to the vma counter.
    This allows multiple vmas to be locked under a single mmap_lock
    write lock (e.g. during vma merging). The vma counter is modified
    under the exclusive vma lock.
  - write unlock - (vma_end_write_all) is a batch release of all vma
    locks held. It doesn't pair with a specific vma_start_write! It is
    done before the exclusive mmap_lock is released, by incrementing the
    mm sequence counter (mm_lock_seq).
  - write downgrade - if the mmap_lock is downgraded to a read lock, all
    vma write locks are released as well (effectively the same as write
    unlock).
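To illustrate the intended read-side usage, here is a sketch of how a
page fault handler could try the per-VMA lock first and fall back to
mmap_lock when vma_start_read() fails. This is illustrative only and not
part of this patch; the helpers marked "hypothetical" do not exist here:

	static vm_fault_t do_fault_per_vma(struct mm_struct *mm,
					   unsigned long address)
	{
		struct vm_area_struct *vma;
		vm_fault_t ret;

		rcu_read_lock();
		vma = find_vma_rcu(mm, address);  /* hypothetical RCU-safe lookup */
		if (!vma || !vma_start_read(vma)) {
			/*
			 * vma_start_read() may fail spuriously (false
			 * locked result); that only sends us down the
			 * slow path. It never succeeds while a writer
			 * holds the vma write-locked.
			 */
			rcu_read_unlock();
			/* hypothetical mmap_lock slow path */
			return fault_under_mmap_lock(mm, address);
		}
		rcu_read_unlock();

		/* the fault is handled under the vma read lock */
		ret = handle_fault_locked(vma, address);  /* hypothetical */
		vma_end_read(vma);
		return ret;
	}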
Signed-off-by: Suren Baghdasaryan <surenb@google.com>
---
 include/linux/mm.h        | 82 +++++++++++++++++++++++++++++++++++++++
 include/linux/mm_types.h  |  8 ++++
 include/linux/mmap_lock.h | 13 +++++++
 kernel/fork.c             |  4 ++
 mm/init-mm.c              |  3 ++
 5 files changed, 110 insertions(+)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 1f79667824eb..bbad5d4fa81b 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -623,6 +623,87 @@ struct vm_operations_struct {
 					  unsigned long addr);
 };
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_init_lock(struct vm_area_struct *vma)
+{
+	init_rwsem(&vma->lock);
+	vma->vm_lock_seq = -1;
+}
+
+/*
+ * Try to read-lock a vma. The function is allowed to occasionally yield false
+ * locked result to avoid performance overhead, in which case we fall back to
+ * using mmap_lock. The function should never yield false unlocked result.
+ */
+static inline bool vma_start_read(struct vm_area_struct *vma)
+{
+	/* Check before locking. A race might cause false locked result. */
+	if (vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))
+		return false;
+
+	if (unlikely(down_read_trylock(&vma->lock) == 0))
+		return false;
+
+	/*
+	 * Overflow might produce false locked result.
+	 * False unlocked result is impossible because we modify and check
+	 * vma->vm_lock_seq under vma->lock protection and mm->mm_lock_seq
+	 * modification invalidates all existing locks.
+	 */
+	if (unlikely(vma->vm_lock_seq == READ_ONCE(vma->vm_mm->mm_lock_seq))) {
+		up_read(&vma->lock);
+		return false;
+	}
+	return true;
+}
+
+static inline void vma_end_read(struct vm_area_struct *vma)
+{
+	rcu_read_lock(); /* keeps vma alive till the end of up_read */
+	up_read(&vma->lock);
+	rcu_read_unlock();
+}
+
+static inline void vma_start_write(struct vm_area_struct *vma)
+{
+	int mm_lock_seq;
+
+	mmap_assert_write_locked(vma->vm_mm);
+
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	mm_lock_seq = READ_ONCE(vma->vm_mm->mm_lock_seq);
+	if (vma->vm_lock_seq == mm_lock_seq)
+		return;
+
+	down_write(&vma->lock);
+	vma->vm_lock_seq = mm_lock_seq;
+	up_write(&vma->lock);
+}
+
+static inline void vma_assert_write_locked(struct vm_area_struct *vma)
+{
+	mmap_assert_write_locked(vma->vm_mm);
+	/*
+	 * current task is holding mmap_write_lock, both vma->vm_lock_seq and
+	 * mm->mm_lock_seq can't be concurrently modified.
+	 */
+	VM_BUG_ON_VMA(vma->vm_lock_seq != READ_ONCE(vma->vm_mm->mm_lock_seq), vma);
+}
+
+#else /* CONFIG_PER_VMA_LOCK */
+
+static inline void vma_init_lock(struct vm_area_struct *vma) {}
+static inline bool vma_start_read(struct vm_area_struct *vma)
+		{ return false; }
+static inline void vma_end_read(struct vm_area_struct *vma) {}
+static inline void vma_start_write(struct vm_area_struct *vma) {}
+static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
+
+#endif /* CONFIG_PER_VMA_LOCK */
+
 static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 {
 	static const struct vm_operations_struct dummy_vm_ops = {};
@@ -631,6 +712,7 @@ static inline void vma_init(struct vm_area_struct *vma, struct mm_struct *mm)
 	vma->vm_mm = mm;
 	vma->vm_ops = &dummy_vm_ops;
 	INIT_LIST_HEAD(&vma->anon_vma_chain);
+	vma_init_lock(vma);
 }
 
 /* Use when VMA is not part of the VMA tree and needs no locking */
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 64a6b3f6b74f..a4e7493bacd7 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -508,6 +508,11 @@ struct vm_area_struct {
 		vm_flags_t __private __vm_flags;
 	};
 
+#ifdef CONFIG_PER_VMA_LOCK
+	int vm_lock_seq;
+	struct rw_semaphore lock;
+#endif
+
 	/*
 	 * For areas with an address space and backing store,
 	 * linkage into the address_space->i_mmap interval tree.
@@ -644,6 +649,9 @@ struct mm_struct {
 					  * init_mm.mmlist, and are protected
 					  * by mmlist_lock
 					  */
+#ifdef CONFIG_PER_VMA_LOCK
+		int mm_lock_seq;
+#endif
 
 
 		unsigned long hiwater_rss; /* High-watermark of RSS usage */
diff --git a/include/linux/mmap_lock.h b/include/linux/mmap_lock.h
index e49ba91bb1f0..aab8f1b28d26 100644
--- a/include/linux/mmap_lock.h
+++ b/include/linux/mmap_lock.h
@@ -72,6 +72,17 @@ static inline void mmap_assert_write_locked(struct mm_struct *mm)
 	VM_BUG_ON_MM(!rwsem_is_locked(&mm->mmap_lock), mm);
 }
 
+#ifdef CONFIG_PER_VMA_LOCK
+static inline void vma_end_write_all(struct mm_struct *mm)
+{
+	mmap_assert_write_locked(mm);
+	/* No races during update due to exclusive mmap_lock being held */
+	WRITE_ONCE(mm->mm_lock_seq, mm->mm_lock_seq + 1);
+}
+#else
+static inline void vma_end_write_all(struct mm_struct *mm) {}
+#endif
+
 static inline void mmap_init_lock(struct mm_struct *mm)
 {
 	init_rwsem(&mm->mmap_lock);
@@ -114,12 +125,14 @@ static inline bool mmap_write_trylock(struct mm_struct *mm)
 static inline void mmap_write_unlock(struct mm_struct *mm)
 {
 	__mmap_lock_trace_released(mm, true);
+	vma_end_write_all(mm);
 	up_write(&mm->mmap_lock);
 }
 
 static inline void mmap_write_downgrade(struct mm_struct *mm)
 {
 	__mmap_lock_trace_acquire_returned(mm, false, true);
+	vma_end_write_all(mm);
 	downgrade_write(&mm->mmap_lock);
 }
 
diff --git a/kernel/fork.c b/kernel/fork.c
index a63b739aeca9..e1dd79c7738c 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -474,6 +474,7 @@ struct vm_area_struct *vm_area_dup(struct vm_area_struct *orig)
 		 */
 		data_race(memcpy(new, orig, sizeof(*new)));
 		INIT_LIST_HEAD(&new->anon_vma_chain);
+		vma_init_lock(new);
 		dup_anon_vma_name(orig, new);
 	}
 	return new;
@@ -1216,6 +1217,9 @@ static struct mm_struct *mm_init(struct mm_struct *mm, struct task_struct *p,
 	seqcount_init(&mm->write_protect_seq);
 	mmap_init_lock(mm);
 	INIT_LIST_HEAD(&mm->mmlist);
+#ifdef CONFIG_PER_VMA_LOCK
+	mm->mm_lock_seq = 0;
+#endif
 	mm_pgtables_bytes_init(mm);
 	mm->map_count = 0;
 	mm->locked_vm = 0;
diff --git a/mm/init-mm.c b/mm/init-mm.c
index c9327abb771c..33269314e060 100644
--- a/mm/init-mm.c
+++ b/mm/init-mm.c
@@ -37,6 +37,9 @@ struct mm_struct init_mm = {
 	.page_table_lock =  __SPIN_LOCK_UNLOCKED(init_mm.page_table_lock),
 	.arg_lock	= __SPIN_LOCK_UNLOCKED(init_mm.arg_lock),
 	.mmlist		= LIST_HEAD_INIT(init_mm.mmlist),
+#ifdef CONFIG_PER_VMA_LOCK
+	.mm_lock_seq	= 0,
+#endif
 	.user_ns	= &init_user_ns,
 	.cpu_bitmap	= CPU_BITS_NONE,
 #ifdef CONFIG_IOMMU_SVA
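
For completeness, a write-side sketch of the protocol the helpers above
implement (illustrative only, not part of the patch; vma1 and vma2 stand
for arbitrary vmas being modified under one mmap_lock write lock):

	mmap_write_lock(mm);	/* mm->mm_lock_seq is stable from here on  */
	vma_start_write(vma1);	/* vma1->vm_lock_seq = mm->mm_lock_seq     */
	vma_start_write(vma2);	/* any number of vmas under one mmap_lock  */

	/* ... modify vma1 and vma2, e.g. merge or split them ... */

	mmap_write_unlock(mm);	/*
				 * calls vma_end_write_all(): incrementing
				 * mm->mm_lock_seq releases every vma write
				 * lock taken above in a single step
				 */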