From patchwork Fri Jun 30 21:19:56 2023
X-Patchwork-Submitter: Suren Baghdasaryan
X-Patchwork-Id: 13298787
Date: Fri, 30 Jun 2023 14:19:56 -0700
In-Reply-To: <20230630211957.1341547-1-surenb@google.com>
Mime-Version: 1.0
References: <20230630211957.1341547-1-surenb@google.com>
X-Mailer: git-send-email 2.41.0.255.g8b1d071c50-goog
Message-ID: <20230630211957.1341547-6-surenb@google.com>
Subject: [PATCH v7 5/6] mm: handle swap page faults under per-VMA lock
From: Suren Baghdasaryan
To: akpm@linux-foundation.org
Cc: willy@infradead.org, hannes@cmpxchg.org, mhocko@suse.com,
    josef@toxicpanda.com, jack@suse.cz, ldufour@linux.ibm.com,
    laurent.dufour@fr.ibm.com, michel@lespinasse.org,
    liam.howlett@oracle.com, jglisse@google.com, vbabka@suse.cz,
    minchan@google.com, dave@stgolabs.net, punit.agrawal@bytedance.com,
    lstoakes@gmail.com, hdanton@sina.com, apopple@nvidia.com,
    peterx@redhat.com, ying.huang@intel.com, david@redhat.com,
    yuzhao@google.com, dhowells@redhat.com, hughd@google.com,
    viro@zeniv.linux.org.uk, brauner@kernel.org,
    pasha.tatashin@soleen.com, surenb@google.com, linux-mm@kvack.org,
    linux-fsdevel@vger.kernel.org, linux-kernel@vger.kernel.org,
    kernel-team@android.com

When a page fault is handled under per-VMA lock protection, all swap page
faults are retried with mmap_lock because folio_lock_or_retry has to drop
and reacquire mmap_lock if the folio could not be immediately locked.

Follow the same pattern as with mmap_lock: drop the per-VMA lock when
waiting for the folio and retry once the folio is available. With this
obstacle removed, enable do_swap_page to operate under per-VMA lock
protection. Drivers implementing ops->migrate_to_ram might still rely on
mmap_lock, therefore we have to fall back to mmap_lock in that particular
case.

Note that the only time do_swap_page calls synchronous swap_readpage is
when SWP_SYNCHRONOUS_IO is set, which is only set for
QUEUE_FLAG_SYNCHRONOUS devices: brd, zram and nvdimms (both btt and pmem).
Therefore we don't sleep in this path, and there's no need to drop the
mmap or per-VMA lock.
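
(Editorial illustration, not part of the patch: a minimal userspace model
of the lock-selection pattern that the new release_fault_lock() helper in
the diff below implements. The pthread rwlocks and the flag value are
simplified stand-ins for the kernel's mmap_lock, per-VMA lock and
FAULT_FLAG_VMA_LOCK; only the structure of the choice is meant to match.)

#include <pthread.h>
#include <stdio.h>

#define FAULT_FLAG_VMA_LOCK 0x1000	/* stand-in value, not the kernel's */

struct mm_struct {
	pthread_rwlock_t mmap_lock;	/* models the kernel's mmap_lock */
};

struct vm_area_struct {
	struct mm_struct *vm_mm;
	pthread_rwlock_t vma_lock;	/* models the per-VMA lock */
};

struct vm_fault {
	struct vm_area_struct *vma;
	unsigned int flags;
};

/*
 * Mirrors the shape of the new helper: drop whichever lock the fault
 * path actually holds, so the task may sleep waiting for the folio.
 */
static void release_fault_lock(struct vm_fault *vmf)
{
	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
		pthread_rwlock_unlock(&vmf->vma->vma_lock);	/* vma_end_read() */
	else
		pthread_rwlock_unlock(&vmf->vma->vm_mm->mmap_lock); /* mmap_read_unlock() */
}

int main(void)
{
	struct mm_struct mm;
	struct vm_area_struct vma = { .vm_mm = &mm };
	struct vm_fault vmf = { .vma = &vma, .flags = FAULT_FLAG_VMA_LOCK };

	pthread_rwlock_init(&mm.mmap_lock, NULL);
	pthread_rwlock_init(&vma.vma_lock, NULL);

	/* The fault was taken under the per-VMA lock, so only it is dropped. */
	pthread_rwlock_rdlock(&vma.vma_lock);
	release_fault_lock(&vmf);
	puts("per-VMA lock dropped; caller would return VM_FAULT_RETRY and wait on the folio");
	return 0;
}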
Signed-off-by: Suren Baghdasaryan
Tested-by: Alistair Popple
Reviewed-by: Alistair Popple
Acked-by: Peter Xu
---
 include/linux/mm.h | 13 +++++++++++++
 mm/filemap.c       | 17 ++++++++---------
 mm/memory.c        | 16 ++++++++++------
 3 files changed, 31 insertions(+), 15 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 39aa409e84d5..54ab11214f4f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -720,6 +720,14 @@ static inline void vma_mark_detached(struct vm_area_struct *vma, bool detached)
 	vma->detached = detached;
 }
 
+static inline void release_fault_lock(struct vm_fault *vmf)
+{
+	if (vmf->flags & FAULT_FLAG_VMA_LOCK)
+		vma_end_read(vmf->vma);
+	else
+		mmap_read_unlock(vmf->vma->vm_mm);
+}
+
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 					  unsigned long address);
 
@@ -735,6 +743,11 @@ static inline void vma_assert_write_locked(struct vm_area_struct *vma) {}
 static inline void vma_mark_detached(struct vm_area_struct *vma,
 				     bool detached) {}
 
+static inline void release_fault_lock(struct vm_fault *vmf)
+{
+	mmap_read_unlock(vmf->vma->vm_mm);
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 /*
diff --git a/mm/filemap.c b/mm/filemap.c
index 5da5ad6f7f4c..5ac1b7beea2a 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -1671,27 +1671,26 @@ static int __folio_lock_async(struct folio *folio, struct wait_page_queue *wait)
  * Return values:
  * 0 - folio is locked.
  * non-zero - folio is not locked.
- *     mmap_lock has been released (mmap_read_unlock(), unless flags had both
- *     FAULT_FLAG_ALLOW_RETRY and FAULT_FLAG_RETRY_NOWAIT set, in
- *     which case mmap_lock is still held.
+ *     mmap_lock or per-VMA lock has been released (mmap_read_unlock() or
+ *     vma_end_read()), unless flags had both FAULT_FLAG_ALLOW_RETRY and
+ *     FAULT_FLAG_RETRY_NOWAIT set, in which case the lock is still held.
  *
  * If neither ALLOW_RETRY nor KILLABLE are set, will always return 0
- * with the folio locked and the mmap_lock unperturbed.
+ * with the folio locked and the mmap_lock/per-VMA lock is left unperturbed.
  */
 vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 {
-	struct mm_struct *mm = vmf->vma->vm_mm;
 	unsigned int flags = vmf->flags;
 
 	if (fault_flag_allow_retry_first(flags)) {
 		/*
-		 * CAUTION! In this case, mmap_lock is not released
-		 * even though return VM_FAULT_RETRY.
+		 * CAUTION! In this case, mmap_lock/per-VMA lock is not
+		 * released even though returning VM_FAULT_RETRY.
 		 */
 		if (flags & FAULT_FLAG_RETRY_NOWAIT)
 			return VM_FAULT_RETRY;
 
-		mmap_read_unlock(mm);
+		release_fault_lock(vmf);
 		if (flags & FAULT_FLAG_KILLABLE)
 			folio_wait_locked_killable(folio);
 		else
@@ -1703,7 +1702,7 @@ vm_fault_t __folio_lock_or_retry(struct folio *folio, struct vm_fault *vmf)
 
 		ret = __folio_lock_killable(folio);
 		if (ret) {
-			mmap_read_unlock(mm);
+			release_fault_lock(vmf);
 			return VM_FAULT_RETRY;
 		}
 	} else {
diff --git a/mm/memory.c b/mm/memory.c
index 4ae3f046f593..bb0f68a73b0c 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3729,12 +3729,6 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	if (!pte_unmap_same(vmf))
 		goto out;
 
-	if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
-		ret = VM_FAULT_RETRY;
-		vma_end_read(vma);
-		goto out;
-	}
-
 	entry = pte_to_swp_entry(vmf->orig_pte);
 	if (unlikely(non_swap_entry(entry))) {
 		if (is_migration_entry(entry)) {
@@ -3744,6 +3738,16 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 			vmf->page = pfn_swap_entry_to_page(entry);
 			ret = remove_device_exclusive_entry(vmf);
 		} else if (is_device_private_entry(entry)) {
+			if (vmf->flags & FAULT_FLAG_VMA_LOCK) {
+				/*
+				 * migrate_to_ram is not yet ready to operate
+				 * under VMA lock.
+				 */
+				vma_end_read(vma);
+				ret = VM_FAULT_RETRY;
+				goto out;
+			}
+
 			vmf->page = pfn_swap_entry_to_page(entry);
 			vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd,
 					vmf->address, &vmf->ptl);
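
(Editorial illustration, not part of the diff: a sketch, under the same
simplified userspace-model assumptions as the earlier example, of how a
caller is expected to react when the per-VMA-lock path bails out with
VM_FAULT_RETRY, e.g. for a device-private entry whose driver's
migrate_to_ram may still rely on mmap_lock. fault_under_vma_lock() and
the flag value are hypothetical stand-ins, not kernel interfaces.)

#include <stdbool.h>
#include <stdio.h>

#define VM_FAULT_RETRY 0x0400	/* stand-in value, not the kernel's */

/*
 * Hypothetical stand-in for attempting the fault under the per-VMA
 * lock: returns VM_FAULT_RETRY when the fault needs mmap_lock, as the
 * new device-private bail-out in do_swap_page() does above.
 */
static unsigned int fault_under_vma_lock(bool needs_mmap_lock)
{
	return needs_mmap_lock ? VM_FAULT_RETRY : 0;
}

int main(void)
{
	/* A device-private swap entry: the VMA-lock attempt asks for a retry. */
	if (fault_under_vma_lock(true) & VM_FAULT_RETRY)
		puts("fall back: take mmap_lock and retry the fault");
	else
		puts("handled entirely under the per-VMA lock");
	return 0;
}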