From patchwork Thu Aug 3 17:26:46 2023 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Suren Baghdasaryan X-Patchwork-Id: 13340368 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 2A092EB64DD for ; Thu, 3 Aug 2023 17:27:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 59DD0280280; Thu, 3 Aug 2023 13:27:01 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 573B428022C; Thu, 3 Aug 2023 13:27:01 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 379E3280280; Thu, 3 Aug 2023 13:27:01 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0016.hostedemail.com [216.40.44.16]) by kanga.kvack.org (Postfix) with ESMTP id 2559728022C for ; Thu, 3 Aug 2023 13:27:01 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id E7E0880545 for ; Thu, 3 Aug 2023 17:27:00 +0000 (UTC) X-FDA: 81083473800.17.4EA315B Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf30.hostedemail.com (Postfix) with ESMTP id E13C980019 for ; Thu, 3 Aug 2023 17:26:58 +0000 (UTC) Authentication-Results: imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=z+dQhQo+; spf=pass (imf30.hostedemail.com: domain of 3YePLZAYKCIQ02zmvjowwotm.kwutqv25-uus3iks.wzo@flex--surenb.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3YePLZAYKCIQ02zmvjowwotm.kwutqv25-uus3iks.wzo@flex--surenb.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1691083619; a=rsa-sha256; cv=none; b=uwpVy5f1ZEgCq73F/IIfqmJRsKmyvUFaxrOVx/lFo8bCFziqlE3cYZxetNIuvKWJJZGib+ I5/OQM9eHKPZ6rKnkIRRGNHjf7iaAO2HBeEU7tEJMPdRjuRk0rMrNTmjqj/2QC7QJOwoV2 g/5IIMohNFosmmbPglu7lb5MdH35Bw8= ARC-Authentication-Results: i=1; imf30.hostedemail.com; dkim=pass header.d=google.com header.s=20221208 header.b=z+dQhQo+; spf=pass (imf30.hostedemail.com: domain of 3YePLZAYKCIQ02zmvjowwotm.kwutqv25-uus3iks.wzo@flex--surenb.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=3YePLZAYKCIQ02zmvjowwotm.kwutqv25-uus3iks.wzo@flex--surenb.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1691083619; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=71RLrK6Mb0aadLJnf8lNJkVJrOaKPnZ5THHnlI7+5Ec=; b=DrHTbr1YuXxuyv469kZXDVB9Dn/3HZWhgPajObnT5y2Ps2ZPphp5wGVbpjLVwS8lOia6Uz P2Z5EnvkBZ0Al8jV6L9bQrBR6C/h4S+O4ouEEdUXGubEvTaSft1EMN7twZC4NzEOlT86Ok FdOLwi/JEVkKCKkwumc1iI3vd3CFFkI= Received: by mail-yb1-f201.google.com with SMTP id 3f1490d57ef6-d1efa597303so3974297276.0 for ; Thu, 03 Aug 2023 10:26:58 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20221208; t=1691083618; x=1691688418; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=71RLrK6Mb0aadLJnf8lNJkVJrOaKPnZ5THHnlI7+5Ec=; b=z+dQhQo+Kt6Eb1Ib+dHdcc5V/AMlevUGc3YlsFbmFxB8Ja4tQL0sH583LPCWHFEKjr LCJT0zSGk242YQimvb9fWrReMP3dYgoxBDMB5aSvFbBas9Qi+mtKIyhJ5sOA6H93jSPA R+X5vWxIjdvpEuwSIeyZt4s4fP03JBTeA8GITi/vGy+0azOReq95xpDC5gosxo4P7L+f qORD71ozdIKCy1hZFUlq7/ogEyre/zziudpgQzDONH/lXc9z3s8VuH3xg5QPQ8rKfCS5 Q5g6DHxNksBOYdDKBZ5rLMB641Cm5MK9gkaR7CAnuxbqNZ64rc/TRUA/dgOJMrYH2nl6 /KbQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1691083618; x=1691688418; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=71RLrK6Mb0aadLJnf8lNJkVJrOaKPnZ5THHnlI7+5Ec=; b=do3zfG1APMzt5pmKH8PqmDXE0uWmld86HHOkP8NQz3dHJAxQwUQ2pvbt2uCbplVpul Cff1+u8j57NFY0EN/7wquhjYGJ1gQmyLN2PqUOaT+8vDjz+9IZZEpWPNNRYCIsPZqeGo av8KLBicFyynPmDMmAoWM4JPlNpfzDM11ZYnICzSjhCgIiW7yTJVvFMro+s9LxhZfVSE 6cv+d66p2jQ8dYeg6ObRKKBl1afGYjtfEHCnPe9d2grAhi9EIHouB55RUvipUOHgv55L W/9Nub+NupV2Yzq2EqCI81zGPFU0CmuhscR2POSvfnrDAa+ZRkbSJ3zqNp+a8j2i86wG R8BA== X-Gm-Message-State: ABy/qLb/7EikUKA6I4K5bvDEKssI71YfFujNoLeNqey5C71AB9RNnFi3 MFzB6Cq8Xdjkik9jhYV1n2c955kdpek= X-Google-Smtp-Source: APBJJlHG20H85H4lsY8faC8hU9eTd9+dNHHyvDgi8rmCO+g2myKSnUTG+z8MfGkV4XoNO+8b/dUqf5AuJJw= X-Received: from surenb-desktop.mtv.corp.google.com ([2620:15c:211:201:3dec:ef9e:7bf4:7a6]) (user=surenb job=sendgmr) by 2002:a05:6902:11cb:b0:d16:7ccc:b406 with SMTP id n11-20020a05690211cb00b00d167cccb406mr243662ybu.5.1691083617987; Thu, 03 Aug 2023 10:26:57 -0700 (PDT) Date: Thu, 3 Aug 2023 10:26:46 -0700 In-Reply-To: <20230803172652.2849981-1-surenb@google.com> Mime-Version: 1.0 References: <20230803172652.2849981-1-surenb@google.com> X-Mailer: git-send-email 2.41.0.585.gd2178a4bd4-goog Message-ID: <20230803172652.2849981-2-surenb@google.com> Subject: [PATCH v3 1/6] mm: enable page walking API to lock vmas during the walk From: Suren Baghdasaryan To: akpm@linux-foundation.org Cc: torvalds@linux-foundation.org, jannh@google.com, willy@infradead.org, liam.howlett@oracle.com, david@redhat.com, peterx@redhat.com, ldufour@linux.ibm.com, vbabka@suse.cz, michel@lespinasse.org, jglisse@google.com, mhocko@suse.com, hannes@cmpxchg.org, dave@stgolabs.net, hughd@google.com, linux-kernel@vger.kernel.org, linux-mm@kvack.org, stable@vger.kernel.org, Suren Baghdasaryan , Linus Torvalds X-Rspamd-Server: rspam08 X-Rspamd-Queue-Id: E13C980019 X-Stat-Signature: ejkqk4agcq6sik3ji96o1tdpmqn5ohr7 X-Rspam-User: X-HE-Tag: 1691083618-379962 X-HE-Meta: U2FsdGVkX18z9WrtmcZVP4O22BTdCxmSTUagylPqxpNzzXTr3UjwVUCEzTm7BpJ04YjzhsCQ4czckvrvptZSb1SKiEblNOcHyd13GrNCVzOs3+WzqeRcVTMJlv/Ks44ERQ/ay4Y535KCDaXNO2A3YXw2GhFjd7W8hOTf+W20HwsHT3BAa6uFxok6g/KEdpKCAkG601nvqqcnmRoKtZqPePTEJtCapTuaQrf2mOTTwJn7QqURcTy+ps/arlV4T8UXLsIe9sl0rQPzzh1gk/zGzI6qex0PjB/j3jW9zQStQai1vYzAR8MKRvE947UhQOx4747TMnbYvnngpyF9jo6iBRj4H3bMyFM2mYIA9Uv95/8AtYEsQmRYk0hsEbFg13FKDlgXQixpXpZFc6mcUn9ibz3nCJPsMkACTsZJwfdsP4Z/buElxhPlhX7ROk2LsrrlskwBTFP8GDtDCF7npe2STrLAOW/5bTSnuZlN7riDhMDvQCJ0eLvZkyeDGQVMjm5zBfKL08sDXnjzFjZkOSBZ1b7r8olX0xfcE6uaKx2Y85Nc1MQhrqHxCSc9P783qzqSBExo4imuyjNdOIjtgqcWV7tt3Q3xTnI40pG0ucVMAoLrLVu9jwWD0WBtU+tXbpR0k/vBkJ3S0BQuBzfMplpjMt+OZb1LIYGCodrhslhEbaJ6uS3HKed6koQ5hEBExFGuffDv+u/epIL9fJOMcPn4IHMEvyGgy07wHNjQFsirXylvBECSzOerOwrzjoOLyS9YAUTPTBT4H38OxkZAzKYNx1J2TXxmxq+ZxnvOFx5LqGNxPeCbnnECpAMlGuS/BEjFNSHUAztP1IPodp69BDNiUzZoQcvCqdX+22aWxNsi5+6Ov/GJ+xhHflR2UXSkCNajkmxbm1oaY4Og8u8EA1IsK9Odrk+3s2QZTfU2UNQKFBsD9gsi9o6ljQ1W4Uf9Zh51+7T4HABzw2yuu7Hl8Xb EFK0gHEq omPEPpJ5pE7DBgy4aHtda/wViMKmfyzXSlmIGPmls8RlHOBH+ewJJT7+PPg2Jlzjyd2P5/xdgaR1ZDAIuQvqm3ObuDT0eDF5YbLUC+NgKTjTkI+OyUmE1EugiZ0gI67oaOE4G1pS5mZSRUT+26ecPBYFsCyK2lUxjlgwHzn39VXDQGFHDpfhVeohifXbGvMl/ecFui+m4zczWA0Qn42KnQIi/Ye6rABWS1CuYDj6gwRuX8I+WXW2q3U0gYj7a/Q3YVueks7TdA3KjMWOQ/r7TkWE7udyX/RE33Jq/FhsAqf/ur4YzcMFMQaJaKKqTzIwoE4gW7gGJY82nqE3ftzLd5wxL1fPZsKsVjZV0sHGJRd9Yeb+k2FTkk0AW/pHk6k/GfU9H+KWK5edg48kgy/Yo02n7szvRUxqAtsY5pgRvWUEr3bfW/OycAGxRm1g0YI/mmSARmQ+OuJGZ010ZmKeuhDoCgJ67lUUIMIge6B1kZvRHyQlyzPITm5eMmjs9EWDBdsUm8vSApYpSt3ghC+xJPapcsNb9W5cLLJ0aPt3gHvQ6qdCQbB/ncfXsaqq5XwxLkOQkSvX23g1VkpJEp0MEN7XqjmvS0B8irabe+4+4xa+XMY4YyZZCnV22/+mPM/5dlM6bfC4WmV4BoczQbFn+CSDadIug0oaIayUOeA4GMCxAGwg76piqxnD/VJiJLTA++6kJd/+xJH2XCB/P3+DY0ekFXw== X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: walk_page_range() and friends often operate under write-locked mmap_lock. With introduction of vma locks, the vmas have to be locked as well during such walks to prevent concurrent page faults in these areas. Add an additional member to mm_walk_ops to indicate locking requirements for the walk. Cc: stable@vger.kernel.org # 6.4.x Suggested-by: Linus Torvalds Suggested-by: Jann Horn Signed-off-by: Suren Baghdasaryan --- arch/powerpc/mm/book3s64/subpage_prot.c | 1 + arch/riscv/mm/pageattr.c | 1 + arch/s390/mm/gmap.c | 5 ++++ fs/proc/task_mmu.c | 5 ++++ include/linux/pagewalk.h | 11 ++++++++ mm/damon/vaddr.c | 2 ++ mm/hmm.c | 1 + mm/ksm.c | 25 ++++++++++------- mm/madvise.c | 3 +++ mm/memcontrol.c | 2 ++ mm/memory-failure.c | 1 + mm/mempolicy.c | 22 +++++++++------ mm/migrate_device.c | 1 + mm/mincore.c | 1 + mm/mlock.c | 1 + mm/mprotect.c | 1 + mm/pagewalk.c | 36 ++++++++++++++++++++++--- mm/vmscan.c | 1 + 18 files changed, 100 insertions(+), 20 deletions(-) diff --git a/arch/powerpc/mm/book3s64/subpage_prot.c b/arch/powerpc/mm/book3s64/subpage_prot.c index 0dc85556dec5..ec98e526167e 100644 --- a/arch/powerpc/mm/book3s64/subpage_prot.c +++ b/arch/powerpc/mm/book3s64/subpage_prot.c @@ -145,6 +145,7 @@ static int subpage_walk_pmd_entry(pmd_t *pmd, unsigned long addr, static const struct mm_walk_ops subpage_walk_ops = { .pmd_entry = subpage_walk_pmd_entry, + .walk_lock = PGWALK_WRLOCK_VERIFY, }; static void subpage_mark_vma_nohuge(struct mm_struct *mm, unsigned long addr, diff --git a/arch/riscv/mm/pageattr.c b/arch/riscv/mm/pageattr.c index ea3d61de065b..161d0b34c2cb 100644 --- a/arch/riscv/mm/pageattr.c +++ b/arch/riscv/mm/pageattr.c @@ -102,6 +102,7 @@ static const struct mm_walk_ops pageattr_ops = { .pmd_entry = pageattr_pmd_entry, .pte_entry = pageattr_pte_entry, .pte_hole = pageattr_pte_hole, + .walk_lock = PGWALK_RDLOCK, }; static int __set_memory(unsigned long addr, int numpages, pgprot_t set_mask, diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 9c8af31be970..906a7bfc2a78 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2514,6 +2514,7 @@ static int thp_split_walk_pmd_entry(pmd_t *pmd, unsigned long addr, static const struct mm_walk_ops thp_split_walk_ops = { .pmd_entry = thp_split_walk_pmd_entry, + .walk_lock = PGWALK_WRLOCK_VERIFY, }; static inline void thp_split_mm(struct mm_struct *mm) @@ -2565,6 +2566,7 @@ static int __zap_zero_pages(pmd_t *pmd, unsigned long start, static const struct mm_walk_ops zap_zero_walk_ops = { .pmd_entry = __zap_zero_pages, + .walk_lock = PGWALK_WRLOCK, }; /* @@ -2655,6 +2657,7 @@ static const struct mm_walk_ops enable_skey_walk_ops = { .hugetlb_entry = __s390_enable_skey_hugetlb, .pte_entry = __s390_enable_skey_pte, .pmd_entry = __s390_enable_skey_pmd, + .walk_lock = PGWALK_WRLOCK, }; int s390_enable_skey(void) @@ -2692,6 +2695,7 @@ static int __s390_reset_cmma(pte_t *pte, unsigned long addr, static const struct mm_walk_ops reset_cmma_walk_ops = { .pte_entry = __s390_reset_cmma, + .walk_lock = PGWALK_WRLOCK, }; void s390_reset_cmma(struct mm_struct *mm) @@ -2728,6 +2732,7 @@ static int s390_gather_pages(pte_t *ptep, unsigned long addr, static const struct mm_walk_ops gather_pages_ops = { .pte_entry = s390_gather_pages, + .walk_lock = PGWALK_RDLOCK, }; /* diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 507cd4e59d07..ef6ee330e3be 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -758,12 +758,14 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, static const struct mm_walk_ops smaps_walk_ops = { .pmd_entry = smaps_pte_range, .hugetlb_entry = smaps_hugetlb_range, + .walk_lock = PGWALK_RDLOCK, }; static const struct mm_walk_ops smaps_shmem_walk_ops = { .pmd_entry = smaps_pte_range, .hugetlb_entry = smaps_hugetlb_range, .pte_hole = smaps_pte_hole, + .walk_lock = PGWALK_RDLOCK, }; /* @@ -1245,6 +1247,7 @@ static int clear_refs_test_walk(unsigned long start, unsigned long end, static const struct mm_walk_ops clear_refs_walk_ops = { .pmd_entry = clear_refs_pte_range, .test_walk = clear_refs_test_walk, + .walk_lock = PGWALK_WRLOCK, }; static ssize_t clear_refs_write(struct file *file, const char __user *buf, @@ -1622,6 +1625,7 @@ static const struct mm_walk_ops pagemap_ops = { .pmd_entry = pagemap_pmd_range, .pte_hole = pagemap_pte_hole, .hugetlb_entry = pagemap_hugetlb_range, + .walk_lock = PGWALK_RDLOCK, }; /* @@ -1935,6 +1939,7 @@ static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, static const struct mm_walk_ops show_numa_ops = { .hugetlb_entry = gather_hugetlb_stats, .pmd_entry = gather_pte_stats, + .walk_lock = PGWALK_RDLOCK, }; /* diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 27a6df448ee5..27cd1e59ccf7 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -6,6 +6,16 @@ struct mm_walk; +/* Locking requirement during a page walk. */ +enum page_walk_lock { + /* mmap_lock should be locked for read to stabilize the vma tree */ + PGWALK_RDLOCK = 0, + /* vma will be write-locked during the walk */ + PGWALK_WRLOCK = 1, + /* vma is expected to be already write-locked during the walk */ + PGWALK_WRLOCK_VERIFY = 2, +}; + /** * struct mm_walk_ops - callbacks for walk_page_range * @pgd_entry: if set, called for each non-empty PGD (top-level) entry @@ -66,6 +76,7 @@ struct mm_walk_ops { int (*pre_vma)(unsigned long start, unsigned long end, struct mm_walk *walk); void (*post_vma)(struct mm_walk *walk); + enum page_walk_lock walk_lock; }; /* diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 2fcc9731528a..e0e59d420fca 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -386,6 +386,7 @@ static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask, static const struct mm_walk_ops damon_mkold_ops = { .pmd_entry = damon_mkold_pmd_entry, .hugetlb_entry = damon_mkold_hugetlb_entry, + .walk_lock = PGWALK_RDLOCK, }; static void damon_va_mkold(struct mm_struct *mm, unsigned long addr) @@ -525,6 +526,7 @@ static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask, static const struct mm_walk_ops damon_young_ops = { .pmd_entry = damon_young_pmd_entry, .hugetlb_entry = damon_young_hugetlb_entry, + .walk_lock = PGWALK_RDLOCK, }; static bool damon_va_young(struct mm_struct *mm, unsigned long addr, diff --git a/mm/hmm.c b/mm/hmm.c index 855e25e59d8f..277ddcab4947 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -562,6 +562,7 @@ static const struct mm_walk_ops hmm_walk_ops = { .pte_hole = hmm_vma_walk_hole, .hugetlb_entry = hmm_vma_walk_hugetlb_entry, .test_walk = hmm_vma_walk_test, + .walk_lock = PGWALK_RDLOCK, }; /** diff --git a/mm/ksm.c b/mm/ksm.c index ba266359da55..00c21fb4d94e 100644 --- a/mm/ksm.c +++ b/mm/ksm.c @@ -455,6 +455,12 @@ static int break_ksm_pmd_entry(pmd_t *pmd, unsigned long addr, unsigned long nex static const struct mm_walk_ops break_ksm_ops = { .pmd_entry = break_ksm_pmd_entry, + .walk_lock = PGWALK_RDLOCK, +}; + +static const struct mm_walk_ops break_ksm_lock_vma_ops = { + .pmd_entry = break_ksm_pmd_entry, + .walk_lock = PGWALK_WRLOCK, }; /* @@ -470,16 +476,17 @@ static const struct mm_walk_ops break_ksm_ops = { * of the process that owns 'vma'. We also do not want to enforce * protection keys here anyway. */ -static int break_ksm(struct vm_area_struct *vma, unsigned long addr) +static int break_ksm(struct vm_area_struct *vma, unsigned long addr, bool lock_vma) { vm_fault_t ret = 0; + const struct mm_walk_ops *ops = lock_vma ? + &break_ksm_lock_vma_ops : &break_ksm_ops; do { int ksm_page; cond_resched(); - ksm_page = walk_page_range_vma(vma, addr, addr + 1, - &break_ksm_ops, NULL); + ksm_page = walk_page_range_vma(vma, addr, addr + 1, ops, NULL); if (WARN_ON_ONCE(ksm_page < 0)) return ksm_page; if (!ksm_page) @@ -565,7 +572,7 @@ static void break_cow(struct ksm_rmap_item *rmap_item) mmap_read_lock(mm); vma = find_mergeable_vma(mm, addr); if (vma) - break_ksm(vma, addr); + break_ksm(vma, addr, false); mmap_read_unlock(mm); } @@ -871,7 +878,7 @@ static void remove_trailing_rmap_items(struct ksm_rmap_item **rmap_list) * in cmp_and_merge_page on one of the rmap_items we would be removing. */ static int unmerge_ksm_pages(struct vm_area_struct *vma, - unsigned long start, unsigned long end) + unsigned long start, unsigned long end, bool lock_vma) { unsigned long addr; int err = 0; @@ -882,7 +889,7 @@ static int unmerge_ksm_pages(struct vm_area_struct *vma, if (signal_pending(current)) err = -ERESTARTSYS; else - err = break_ksm(vma, addr); + err = break_ksm(vma, addr, lock_vma); } return err; } @@ -1029,7 +1036,7 @@ static int unmerge_and_remove_all_rmap_items(void) if (!(vma->vm_flags & VM_MERGEABLE) || !vma->anon_vma) continue; err = unmerge_ksm_pages(vma, - vma->vm_start, vma->vm_end); + vma->vm_start, vma->vm_end, false); if (err) goto error; } @@ -2530,7 +2537,7 @@ static int __ksm_del_vma(struct vm_area_struct *vma) return 0; if (vma->anon_vma) { - err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end); + err = unmerge_ksm_pages(vma, vma->vm_start, vma->vm_end, true); if (err) return err; } @@ -2668,7 +2675,7 @@ int ksm_madvise(struct vm_area_struct *vma, unsigned long start, return 0; /* just ignore the advice */ if (vma->anon_vma) { - err = unmerge_ksm_pages(vma, start, end); + err = unmerge_ksm_pages(vma, start, end, true); if (err) return err; } diff --git a/mm/madvise.c b/mm/madvise.c index 886f06066622..bfe0e06427bd 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -233,6 +233,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, static const struct mm_walk_ops swapin_walk_ops = { .pmd_entry = swapin_walk_pmd_entry, + .walk_lock = PGWALK_RDLOCK, }; static void shmem_swapin_range(struct vm_area_struct *vma, @@ -534,6 +535,7 @@ static int madvise_cold_or_pageout_pte_range(pmd_t *pmd, static const struct mm_walk_ops cold_walk_ops = { .pmd_entry = madvise_cold_or_pageout_pte_range, + .walk_lock = PGWALK_RDLOCK, }; static void madvise_cold_page_range(struct mmu_gather *tlb, @@ -757,6 +759,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, static const struct mm_walk_ops madvise_free_walk_ops = { .pmd_entry = madvise_free_pte_range, + .walk_lock = PGWALK_RDLOCK, }; static int madvise_free_single_vma(struct vm_area_struct *vma, diff --git a/mm/memcontrol.c b/mm/memcontrol.c index e8ca4bdcb03c..315fd5f45e3c 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -6024,6 +6024,7 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd, static const struct mm_walk_ops precharge_walk_ops = { .pmd_entry = mem_cgroup_count_precharge_pte_range, + .walk_lock = PGWALK_RDLOCK, }; static unsigned long mem_cgroup_count_precharge(struct mm_struct *mm) @@ -6303,6 +6304,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd, static const struct mm_walk_ops charge_walk_ops = { .pmd_entry = mem_cgroup_move_charge_pte_range, + .walk_lock = PGWALK_RDLOCK, }; static void mem_cgroup_move_charge(void) diff --git a/mm/memory-failure.c b/mm/memory-failure.c index ece5d481b5ff..6bfb762facab 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -831,6 +831,7 @@ static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask, static const struct mm_walk_ops hwp_walk_ops = { .pmd_entry = hwpoison_pte_range, .hugetlb_entry = hwpoison_hugetlb_range, + .walk_lock = PGWALK_RDLOCK, }; /* diff --git a/mm/mempolicy.c b/mm/mempolicy.c index c53f8beeb507..ec2eaceffd74 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -718,6 +718,14 @@ static const struct mm_walk_ops queue_pages_walk_ops = { .hugetlb_entry = queue_folios_hugetlb, .pmd_entry = queue_folios_pte_range, .test_walk = queue_pages_test_walk, + .walk_lock = PGWALK_RDLOCK, +}; + +static const struct mm_walk_ops queue_pages_lock_vma_walk_ops = { + .hugetlb_entry = queue_folios_hugetlb, + .pmd_entry = queue_folios_pte_range, + .test_walk = queue_pages_test_walk, + .walk_lock = PGWALK_WRLOCK, }; /* @@ -738,7 +746,7 @@ static const struct mm_walk_ops queue_pages_walk_ops = { static int queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, nodemask_t *nodes, unsigned long flags, - struct list_head *pagelist) + struct list_head *pagelist, bool lock_vma) { int err; struct queue_pages qp = { @@ -749,8 +757,10 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end, .end = end, .first = NULL, }; + const struct mm_walk_ops *ops = lock_vma ? + &queue_pages_lock_vma_walk_ops : &queue_pages_walk_ops; - err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp); + err = walk_page_range(mm, start, end, ops, &qp); if (!qp.first) /* whole range in hole */ @@ -1078,7 +1088,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest, vma = find_vma(mm, 0); VM_BUG_ON(!(flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))); queue_pages_range(mm, vma->vm_start, mm->task_size, &nmask, - flags | MPOL_MF_DISCONTIG_OK, &pagelist); + flags | MPOL_MF_DISCONTIG_OK, &pagelist, false); if (!list_empty(&pagelist)) { err = migrate_pages(&pagelist, alloc_migration_target, NULL, @@ -1321,12 +1331,8 @@ static long do_mbind(unsigned long start, unsigned long len, * Lock the VMAs before scanning for pages to migrate, to ensure we don't * miss a concurrently inserted page. */ - vma_iter_init(&vmi, mm, start); - for_each_vma_range(vmi, vma, end) - vma_start_write(vma); - ret = queue_pages_range(mm, start, end, nmask, - flags | MPOL_MF_INVERT, &pagelist); + flags | MPOL_MF_INVERT, &pagelist, true); if (ret < 0) { err = ret; diff --git a/mm/migrate_device.c b/mm/migrate_device.c index 8365158460ed..d5f492356e3e 100644 --- a/mm/migrate_device.c +++ b/mm/migrate_device.c @@ -279,6 +279,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, static const struct mm_walk_ops migrate_vma_walk_ops = { .pmd_entry = migrate_vma_collect_pmd, .pte_hole = migrate_vma_collect_hole, + .walk_lock = PGWALK_RDLOCK, }; /* diff --git a/mm/mincore.c b/mm/mincore.c index b7f7a516b26c..dad3622cc963 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -176,6 +176,7 @@ static const struct mm_walk_ops mincore_walk_ops = { .pmd_entry = mincore_pte_range, .pte_hole = mincore_unmapped_range, .hugetlb_entry = mincore_hugetlb, + .walk_lock = PGWALK_RDLOCK, }; /* diff --git a/mm/mlock.c b/mm/mlock.c index 0a0c996c5c21..479e09d0994c 100644 --- a/mm/mlock.c +++ b/mm/mlock.c @@ -371,6 +371,7 @@ static void mlock_vma_pages_range(struct vm_area_struct *vma, { static const struct mm_walk_ops mlock_walk_ops = { .pmd_entry = mlock_pte_range, + .walk_lock = PGWALK_WRLOCK_VERIFY, }; /* diff --git a/mm/mprotect.c b/mm/mprotect.c index 6f658d483704..3aef1340533a 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -568,6 +568,7 @@ static const struct mm_walk_ops prot_none_walk_ops = { .pte_entry = prot_none_pte_entry, .hugetlb_entry = prot_none_hugetlb_entry, .test_walk = prot_none_test, + .walk_lock = PGWALK_WRLOCK, }; int diff --git a/mm/pagewalk.c b/mm/pagewalk.c index 2022333805d3..9b2d23fbf4d3 100644 --- a/mm/pagewalk.c +++ b/mm/pagewalk.c @@ -400,6 +400,33 @@ static int __walk_page_range(unsigned long start, unsigned long end, return err; } +static inline void process_mm_walk_lock(struct mm_struct *mm, + enum page_walk_lock walk_lock) +{ + if (walk_lock == PGWALK_RDLOCK) + mmap_assert_locked(mm); + else + mmap_assert_write_locked(mm); +} + +static inline void process_vma_walk_lock(struct vm_area_struct *vma, + enum page_walk_lock walk_lock) +{ +#ifdef CONFIG_PER_VMA_LOCK + switch (walk_lock) { + case PGWALK_WRLOCK: + vma_start_write(vma); + break; + case PGWALK_WRLOCK_VERIFY: + vma_assert_write_locked(vma); + break; + case PGWALK_RDLOCK: + /* PGWALK_RDLOCK is handled by process_mm_walk_lock */ + break; + } +#endif +} + /** * walk_page_range - walk page table with caller specific callbacks * @mm: mm_struct representing the target process of page table walk @@ -459,7 +486,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long start, if (!walk.mm) return -EINVAL; - mmap_assert_locked(walk.mm); + process_mm_walk_lock(walk.mm, ops->walk_lock); vma = find_vma(walk.mm, start); do { @@ -474,6 +501,7 @@ int walk_page_range(struct mm_struct *mm, unsigned long start, if (ops->pte_hole) err = ops->pte_hole(start, next, -1, &walk); } else { /* inside vma */ + process_vma_walk_lock(vma, ops->walk_lock); walk.vma = vma; next = min(end, vma->vm_end); vma = find_vma(mm, vma->vm_end); @@ -549,7 +577,8 @@ int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start, if (start < vma->vm_start || end > vma->vm_end) return -EINVAL; - mmap_assert_locked(walk.mm); + process_mm_walk_lock(walk.mm, ops->walk_lock); + process_vma_walk_lock(vma, ops->walk_lock); return __walk_page_range(start, end, &walk); } @@ -566,7 +595,8 @@ int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops, if (!walk.mm) return -EINVAL; - mmap_assert_locked(walk.mm); + process_mm_walk_lock(walk.mm, ops->walk_lock); + process_vma_walk_lock(vma, ops->walk_lock); return __walk_page_range(vma->vm_start, vma->vm_end, &walk); } diff --git a/mm/vmscan.c b/mm/vmscan.c index 1080209a568b..3555927df9b5 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -4284,6 +4284,7 @@ static void walk_mm(struct lruvec *lruvec, struct mm_struct *mm, struct lru_gen_ static const struct mm_walk_ops mm_walk_ops = { .test_walk = should_skip_vma, .p4d_entry = walk_pud_range, + .walk_lock = PGWALK_RDLOCK, }; int err;