From patchwork Tue Mar 23 00:48:50 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156477
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com, Nadav Amit,
 Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 01/23] shmem/userfaultfd: Take care of UFFDIO_COPY_MODE_WP
Date: Mon, 22 Mar 2021 20:48:50 -0400
Message-Id: <20210323004912.35132-2-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

Firstly, pass wp_copy into shmem_mfill_atomic_pte() through the stack.
Then apply the UFFD_WP bit properly when the UFFDIO_COPY on shmem is with
UFFDIO_COPY_MODE_WP.
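
As a rough illustration of the second step, the user-space sketch below
models how the flags of the newly installed pte could be chosen from the vma
permission and wp_copy.  The TOY_* bit values and the helper name are
assumptions made for this sketch only; they are not the kernel's pte layout
or API, and the dirty-bit handling follows the reasoning given in the rest of
this message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bits for this sketch; real pte layouts are arch-specific. */
#define TOY_PTE_WRITE   (1u << 0)
#define TOY_PTE_DIRTY   (1u << 1)
#define TOY_PTE_UFFD_WP (1u << 2)

/*
 * Toy model of the flag selection for a freshly installed shmem pte.
 * *page_dirty models the set_page_dirty() call made when the vma is not
 * writable.
 */
static uint32_t toy_mcopy_pte_flags(bool vma_writable, bool wp_copy,
				    bool *page_dirty)
{
	uint32_t pte = 0;

	assert(page_dirty != NULL);
	*page_dirty = false;

	if (vma_writable) {
		if (wp_copy)
			/* Write-protected and tagged for userfaultfd-wp. */
			pte = (pte & ~TOY_PTE_WRITE) | TOY_PTE_UFFD_WP;
		else
			pte |= TOY_PTE_WRITE;
		/* Dirty is set in both branches so the data is not dropped. */
		pte |= TOY_PTE_DIRTY;
	} else {
		/* No dirty pte for a read-only vma; dirty the page instead. */
		*page_dirty = true;
	}
	return pte;
}
```

In this model a wp_copy request on a writable vma yields a pte that is dirty
and uffd-wp tagged but not writable, which is the combination the patch aims
for.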

One thing to mention is that shmem_mfill_atomic_pte() needs to set the dirty
bit in pte even if UFFDIO_COPY_MODE_WP is set.  The reason is similar to
dcf7fe9d8976 ("userfaultfd: shmem: UFFDIO_COPY: set the page dirty if VM_WRITE
is not set") where we need to set page as dirty even if VM_WRITE is not there.
It's just that shmem can drop the pte any time later, and if it's not dirty
the data will be dropped.  For uffd-wp, that could lead to data loss without
the dirty bit set.

Note that shmem_mfill_zeropage_pte() will always call
shmem_mfill_atomic_pte() with wp_copy==false because UFFDIO_ZEROPAGE does not
support UFFDIO_COPY_MODE_WP.

Signed-off-by: Peter Xu
---
 include/linux/shmem_fs.h |  5 +++--
 mm/shmem.c               | 18 ++++++++++++++----
 mm/userfaultfd.c         |  2 +-
 3 files changed, 18 insertions(+), 7 deletions(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index f0919c3722e7..dfd0369657d8 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -128,10 +128,11 @@ extern void shmem_uncharge(struct inode *inode, long pages);
 int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr, unsigned long src_addr,
-			   enum mcopy_atomic_mode mode, struct page **pagep);
+			   enum mcopy_atomic_mode mode, struct page **pagep,
+			   bool wp_copy);
 #else /* !CONFIG_SHMEM */
 #define shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, \
-			       src_addr, mode, pagep) ({ BUG(); 0; })
+			       src_addr, mode, pagep, wp_copy) ({ BUG(); 0; })
 #endif /* CONFIG_SHMEM */
 #endif /* CONFIG_USERFAULTFD */
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 5cfd2fb6e52b..e88aaabaeb27 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2364,7 +2364,8 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr, unsigned long src_addr,
-			   enum mcopy_atomic_mode mode, struct page **pagep)
+			   enum mcopy_atomic_mode mode, struct page **pagep,
+			   bool wp_copy)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct inode *inode = file_inode(dst_vma->vm_file);
@@ -2438,9 +2439,18 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	}
 
 	_dst_pte = mk_pte(page, dst_vma->vm_page_prot);
-	if (dst_vma->vm_flags & VM_WRITE)
-		_dst_pte = pte_mkwrite(pte_mkdirty(_dst_pte));
-	else {
+	if (dst_vma->vm_flags & VM_WRITE) {
+		if (wp_copy)
+			_dst_pte = pte_mkuffd_wp(pte_wrprotect(_dst_pte));
+		else
+			_dst_pte = pte_mkwrite(_dst_pte);
+		/*
+		 * Similar reason to set_page_dirty(), that we need to mark the
+		 * pte dirty even if wp_copy==true here, otherwise the pte and
+		 * its page could be dropped at anytime when e.g. swapped out.
+		 */
+		_dst_pte = pte_mkdirty(_dst_pte);
+	} else {
 		/*
 		 * We don't set the pte dirty if the vma has no
 		 * VM_WRITE permission, so mark the page dirty or it
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index cbb7c8d79a4d..0963e0d9ed20 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -448,7 +448,7 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 	} else {
 		VM_WARN_ON_ONCE(wp_copy);
 		err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr,
-					     src_addr, mode, page);
+					     src_addr, mode, page, wp_copy);
 	}
 
 	return err;

From patchwork Tue Mar 23 00:48:51 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156479
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com, Nadav Amit,
 Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 02/23] mm: Clear vmf->pte after pte_unmap_same() returns
Date: Mon, 22 Mar 2021 20:48:51 -0400
Message-Id: <20210323004912.35132-3-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

pte_unmap_same() will always unmap the pte pointer.  After the unmap,
vmf->pte will not be valid any more, so we should clear it.

It was safe only because no one accesses vmf->pte after pte_unmap_same()
returns, since the only caller of pte_unmap_same() (so far) is
do_swap_page(), where vmf->pte will in most cases be overwritten very soon.

pte_unmap_same() will be used in other places in follow up patches, so
vmf->pte will not always be re-written.  This patch enables us to call
functions like finish_fault(), because that will conditionally unmap the pte
by checking vmf->pte first.  Likewise, alloc_set_pte() will make sure to
allocate a new pte even after calling pte_unmap_same().

Since we'll need to modify vmf->pte, pass vmf directly into pte_unmap_same(),
which also avoids the long parameter list.
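
The hazard described above can be sketched in user space as follows.  The
toy_* types and helpers are hypothetical stand-ins for pte_t, struct vm_fault
and the kernel functions; they only model the "compare, unmap, then clear the
stale pointer" pattern and are not the kernel implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for pte_t and struct vm_fault. */
typedef unsigned long toy_pte_t;

struct toy_fault {
	toy_pte_t *pte;		/* mapped pte pointer, NULL when not mapped */
	toy_pte_t orig_pte;	/* value sampled when the fault was taken */
};

/* Models pte_unmap_same(): compare, "unmap", then clear the pointer. */
static int toy_unmap_same(struct toy_fault *f)
{
	int same;

	assert(f->pte != NULL);
	same = (*f->pte == f->orig_pte);
	/* The real code unmaps here; afterwards the pointer is stale. */
	f->pte = NULL;
	return same;
}

/* Models a later helper that unmaps only if a pte is still mapped. */
static int toy_needs_unmap(const struct toy_fault *f)
{
	return f->pte != NULL;
}
```

With the pointer cleared, a later helper that tests the pointer before
unmapping (as the message says finish_fault() does) sees NULL and skips the
unmap instead of touching a stale mapping.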

Signed-off-by: Peter Xu
Reviewed-by: Miaohe Lin
---
 mm/memory.c | 13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index a458a595331f..d534eba85756 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2607,19 +2607,20 @@ EXPORT_SYMBOL_GPL(apply_to_existing_page_range);
  * proceeding (but do_wp_page is only called after already making such a check;
  * and do_anonymous_page can safely check later on).
  */
-static inline int pte_unmap_same(struct mm_struct *mm, pmd_t *pmd,
-				pte_t *page_table, pte_t orig_pte)
+static inline int pte_unmap_same(struct vm_fault *vmf)
 {
 	int same = 1;
 #if defined(CONFIG_SMP) || defined(CONFIG_PREEMPTION)
 	if (sizeof(pte_t) > sizeof(unsigned long)) {
-		spinlock_t *ptl = pte_lockptr(mm, pmd);
+		spinlock_t *ptl = pte_lockptr(vmf->vma->vm_mm, vmf->pmd);
 		spin_lock(ptl);
-		same = pte_same(*page_table, orig_pte);
+		same = pte_same(*vmf->pte, vmf->orig_pte);
 		spin_unlock(ptl);
 	}
 #endif
-	pte_unmap(page_table);
+	pte_unmap(vmf->pte);
+	/* After unmap of pte, the pointer is invalid now - clear it. */
+	vmf->pte = NULL;
 	return same;
 }
 
@@ -3308,7 +3309,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
 	vm_fault_t ret = 0;
 	void *shadow = NULL;
 
-	if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
+	if (!pte_unmap_same(vmf))
 		goto out;
 
 	entry = pte_to_swp_entry(vmf->orig_pte);

From patchwork Tue Mar 23 00:48:52 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156481
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com, Nadav Amit,
 Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 03/23] mm/userfaultfd: Introduce special pte for unmapped file-backed mem
Date: Mon, 22 Mar 2021 20:48:52 -0400
Message-Id: <20210323004912.35132-4-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

This patch introduces a very special swap-like pte for file-backed memory.

Currently it's defined for x86_64 only, but any arch that can properly define
the UFFD_WP_SWP_PTE_SPECIAL value as required should conceptually work too.

We will use this special pte to arm the ptes that got either unmapped or
swapped out for a file-backed region that was previously wr-protected.  This
special pte can trigger a page fault just like swap entries, as long as it
satisfies pte_none()==false && pte_present()==false.  Then we can revive the
special pte into a normal pte backed by the page cache.

This idea is greatly inspired by Hugh and Andrea in the discussion, which is
referenced in the links below.

The other idea (from Hugh) is that we use swp_type==1 and swp_offset=0 as the
special pte.  The current solution (as pointed out by Andrea) is slightly
preferred in that we don't even need swp_entry_t knowledge at all in trapping
these accesses.  Meanwhile, we also reuse _PAGE_SWP_UFFD_WP from the anonymous
swp entries.

This patch only introduces the special pte and its operators.  They are not
used yet, so there is no functional change.
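
To make the requirements on the special value concrete, here is a small
user-space model.  The TOY_* bit positions and toy_* helpers are assumptions
for this sketch and do not match the real x86 pte layout; only the shape of
the checks (not none, not present, swap-style uffd-wp bit set, nothing else)
is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t toy_pte_t;

/* Hypothetical bit positions for this sketch; the real layout differs. */
#define TOY_PAGE_PRESENT	(1ull << 0)
#define TOY_PAGE_SWP_UFFD_WP	(1ull << 2)
#define TOY_SWP_ENTRY_SHIFT	8	/* toy swap type/offset start here */

/* The special value: only the swap-style uffd-wp bit is set. */
#define TOY_UFFD_WP_SPECIAL	((toy_pte_t)TOY_PAGE_SWP_UFFD_WP)

static bool toy_pte_none(toy_pte_t pte)
{
	return pte == 0;
}

static bool toy_pte_present(toy_pte_t pte)
{
	return (pte & TOY_PAGE_PRESENT) != 0;
}

static bool toy_is_swap_pte(toy_pte_t pte)
{
	return !toy_pte_none(pte) && !toy_pte_present(pte);
}

static bool toy_pte_swp_uffd_wp(toy_pte_t pte)
{
	return (pte & TOY_PAGE_SWP_UFFD_WP) != 0;
}

static bool toy_pte_swp_uffd_wp_special(toy_pte_t pte)
{
	return pte == TOY_UFFD_WP_SPECIAL;
}

/* Checks that the toy special value behaves like a swap pte with uffd-wp. */
static void toy_check_special_rules(void)
{
	assert(!toy_pte_none(TOY_UFFD_WP_SPECIAL));
	assert(!toy_pte_present(TOY_UFFD_WP_SPECIAL));
	assert(toy_is_swap_pte(TOY_UFFD_WP_SPECIAL));
	assert(toy_pte_swp_uffd_wp(TOY_UFFD_WP_SPECIAL));
}
```

A pte that carries the uffd-wp bit together with a non-zero toy swap entry
still passes the first three checks but is not the special value, which is
how the two cases are told apart in this model.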

Link: https://lore.kernel.org/lkml/20201126222359.8120-1-peterx@redhat.com/
Link: https://lore.kernel.org/lkml/20201130230603.46187-1-peterx@redhat.com/
Suggested-by: Andrea Arcangeli
Suggested-by: Hugh Dickins
Signed-off-by: Peter Xu
---
 arch/x86/include/asm/pgtable.h     | 28 ++++++++++++++++++++++++++++
 include/asm-generic/pgtable_uffd.h |  3 +++
 include/linux/userfaultfd_k.h      | 21 +++++++++++++++++++++
 3 files changed, 52 insertions(+)

diff --git a/arch/x86/include/asm/pgtable.h b/arch/x86/include/asm/pgtable.h
index a02c67291cfc..379bae343dd1 100644
--- a/arch/x86/include/asm/pgtable.h
+++ b/arch/x86/include/asm/pgtable.h
@@ -1329,6 +1329,34 @@ static inline pmd_t pmd_swp_clear_soft_dirty(pmd_t pmd)
 #endif
 
 #ifdef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+
+/*
+ * This is a very special swap-like pte that marks this pte as "wr-protected"
+ * by userfaultfd-wp.  It should only exist for file-backed memory where the
+ * page (previously got wr-protected) has been unmapped or swapped out.
+ *
+ * For anonymous memories, the userfaultfd-wp _PAGE_SWP_UFFD_WP bit is kept
+ * along with a real swp entry instead.
+ *
+ * Let's make some rules for this special pte:
+ *
+ * (1) pte_none()==false, so that it'll not trigger a missing page fault.
+ *
+ * (2) pte_present()==false, so that it's recognized as swap (is_swap_pte).
+ *
+ * (3) pte_swp_uffd_wp()==true, so it can be tested just like a swap pte that
+ *     contains a valid swap entry, so that we can check a swap pte always
+ *     using "is_swap_pte() && pte_swp_uffd_wp()" without caring about whether
+ *     there's one swap entry inside of the pte.
+ *
+ * (4) It should not be a valid swap pte anywhere, so that when we see this pte
+ *     we know it does not contain a swap entry.
+ *
+ * For x86, the simplest special pte which satisfies all of above should be the
+ * pte with only _PAGE_SWP_UFFD_WP bit set (where swp_type==swp_offset==0).
+ */
+#define UFFD_WP_SWP_PTE_SPECIAL __pte(_PAGE_SWP_UFFD_WP)
+
 static inline pte_t pte_swp_mkuffd_wp(pte_t pte)
 {
 	return pte_set_flags(pte, _PAGE_SWP_UFFD_WP);
diff --git a/include/asm-generic/pgtable_uffd.h b/include/asm-generic/pgtable_uffd.h
index 828966d4c281..95e9811ce9d1 100644
--- a/include/asm-generic/pgtable_uffd.h
+++ b/include/asm-generic/pgtable_uffd.h
@@ -2,6 +2,9 @@
 #define _ASM_GENERIC_PGTABLE_UFFD_H
 
 #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP
+
+#define UFFD_WP_SWP_PTE_SPECIAL __pte(0)
+
 static __always_inline int pte_uffd_wp(pte_t pte)
 {
 	return 0;
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 794d1538b8ba..bc733512c690 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -140,6 +140,17 @@ extern int userfaultfd_unmap_prep(struct vm_area_struct *vma,
 extern void userfaultfd_unmap_complete(struct mm_struct *mm,
 				       struct list_head *uf);
 
+static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma)
+{
+	WARN_ON_ONCE(vma_is_anonymous(vma));
+	return UFFD_WP_SWP_PTE_SPECIAL;
+}
+
+static inline bool pte_swp_uffd_wp_special(pte_t pte)
+{
+	return pte_same(pte, UFFD_WP_SWP_PTE_SPECIAL);
+}
+
 #else /* CONFIG_USERFAULTFD */
 
 /* mm helpers */
@@ -229,6 +240,16 @@ static inline void userfaultfd_unmap_complete(struct mm_struct *mm,
 {
 }
 
+static inline pte_t pte_swp_mkuffd_wp_special(struct vm_area_struct *vma)
+{
+	return __pte(0);
+}
+
+static inline bool pte_swp_uffd_wp_special(pte_t pte)
+{
+	return false;
+}
+
 #endif /* CONFIG_USERFAULTFD */
 
 #endif /* _LINUX_USERFAULTFD_K_H */

From patchwork Tue Mar 23 00:48:53 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156483
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com, Nadav Amit,
 Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 04/23] mm/swap: Introduce the idea of special swap ptes
Date: Mon, 22 Mar 2021 20:48:53 -0400
Message-Id: <20210323004912.35132-5-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

We already have special swap entries, like migration entries, hw-poison
entries, device private entries, etc.  Those "special swap entries" must
still be swap entries first, and their types are decided by swp_type(entry).

This patch introduces another idea called "special swap ptes".  It's very
easy to confuse them with "special swap entries", but a special swap pte
should never contain a swap entry at all.  That means it's illegal to call
pte_to_swp_entry() upon a special swap pte.
Make the uffd-wp special pte the first special swap pte.

Before this patch, is_swap_pte()==true means one of the below:

  (a.1) The pte has a normal swap entry (non_swap_entry()==false). For example, when an anonymous page got swapped out.

  (a.2) The pte has a special swap entry (non_swap_entry()==true). For example, a migration entry, a hw-poison entry, etc.

After this patch, is_swap_pte()==true means one of the below, where case (b) is added:

  (a) The pte contains a swap entry.

    (a.1) The pte has a normal swap entry (non_swap_entry()==false). For example, when an anonymous page got swapped out.

    (a.2) The pte has a special swap entry (non_swap_entry()==true). For example, a migration entry, a hw-poison entry, etc.

  (b) The pte does not contain a swap entry at all (so it cannot be passed into pte_to_swp_entry()). For example, uffd-wp special swap pte.

Teach the whole mm core about this new idea. It's done by introducing another helper called pte_has_swap_entry(), which stands for cases (a.1) and (a.2). Before this patch, it would be the same as is_swap_pte() because there was no special swap pte yet. Now for most of the previous uses of is_swap_pte() in mm core, we need to use the new helper pte_has_swap_entry() instead, to make sure we won't try to parse a swap entry from a special swap pte (which does not contain a swap entry at all!). We either handle the special swap pte explicitly, or it naturally takes the default "else" paths.

Warn properly (e.g., in do_swap_page()) when we see a special swap pte: we should never call do_swap_page() on those ptes, so bail out early if it happens.

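The classification above can be sketched as a small userspace model. This is only an illustration under stated assumptions: the toy_* names and the enum are hypothetical stand-ins invented here, not the kernel's pte_t or its helpers, and the model ignores every arch-specific detail of how a pte is encoded.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical pte states; a real pte_t encodes these in arch-specific bits. */
enum toy_pte_kind {
	TOY_PTE_NONE,			/* empty pte */
	TOY_PTE_PRESENT,		/* mapped page */
	TOY_PTE_SWAP_NORMAL,		/* case (a.1): normal swap entry */
	TOY_PTE_SWAP_MIGRATION,		/* case (a.2): special swap entry */
	TOY_PTE_UFFD_WP_SPECIAL,	/* case (b): special swap pte, no entry */
};

/* Models is_swap_pte(): anything that is neither none nor present. */
static inline bool toy_is_swap_pte(enum toy_pte_kind pte)
{
	return pte != TOY_PTE_NONE && pte != TOY_PTE_PRESENT;
}

/* Models is_swap_special_pte(): only the uffd-wp marker for now. */
static inline bool toy_is_swap_special_pte(enum toy_pte_kind pte)
{
	return pte == TOY_PTE_UFFD_WP_SPECIAL;
}

/* Models pte_has_swap_entry(): cases (a.1) and (a.2), but never (b). */
static inline bool toy_pte_has_swap_entry(enum toy_pte_kind pte)
{
	return toy_is_swap_pte(pte) && !toy_is_swap_special_pte(pte);
}
```

In this model a caller would only go on to decode a swap entry when toy_pte_has_swap_entry() returns true, which mirrors why the call sites in the diff switch from is_swap_pte() to pte_has_swap_entry() before calling pte_to_swp_entry().
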
Signed-off-by: Peter Xu --- arch/arm64/kernel/mte.c | 2 +- fs/proc/task_mmu.c | 14 ++++++++------ include/linux/swapops.h | 39 ++++++++++++++++++++++++++++++++++++++- mm/gup.c | 2 +- mm/hmm.c | 2 +- mm/khugepaged.c | 11 ++++++++++- mm/madvise.c | 4 ++-- mm/memcontrol.c | 2 +- mm/memory.c | 7 +++++++ mm/migrate.c | 4 ++-- mm/mincore.c | 2 +- mm/mprotect.c | 2 +- mm/mremap.c | 2 +- mm/page_vma_mapped.c | 6 +++--- mm/swapfile.c | 2 +- 15 files changed, 78 insertions(+), 23 deletions(-) diff --git a/arch/arm64/kernel/mte.c b/arch/arm64/kernel/mte.c index b3c70a612c7a..ebe213cba913 100644 --- a/arch/arm64/kernel/mte.c +++ b/arch/arm64/kernel/mte.c @@ -30,7 +30,7 @@ static void mte_sync_page_tags(struct page *page, pte_t *ptep, bool check_swap) { pte_t old_pte = READ_ONCE(*ptep); - if (check_swap && is_swap_pte(old_pte)) { + if (check_swap && pte_has_swap_entry(old_pte)) { swp_entry_t entry = pte_to_swp_entry(old_pte); if (!non_swap_entry(entry) && mte_restore_tags(entry, page)) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index fc9784544b24..4c95cc57a66a 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -498,7 +498,7 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, if (pte_present(*pte)) { page = vm_normal_page(vma, addr, *pte); - } else if (is_swap_pte(*pte)) { + } else if (pte_has_swap_entry(*pte)) { swp_entry_t swpent = pte_to_swp_entry(*pte); if (!non_swap_entry(swpent)) { @@ -518,8 +518,10 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr, page = migration_entry_to_page(swpent); else if (is_device_private_entry(swpent)) page = device_private_entry_to_page(swpent); - } else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap - && pte_none(*pte))) { + } else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && + mss->check_shmem_swap && + /* Here swap special pte is the same as none pte */ + (pte_none(*pte) || is_swap_special_pte(*pte)))) { page = xa_load(&vma->vm_file->f_mapping->i_pages, linear_page_index(vma, addr)); if 
(xa_is_value(page)) @@ -691,7 +693,7 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, if (pte_present(*pte)) { page = vm_normal_page(vma, addr, *pte); - } else if (is_swap_pte(*pte)) { + } else if (pte_has_swap_entry(*pte)) { swp_entry_t swpent = pte_to_swp_entry(*pte); if (is_migration_entry(swpent)) @@ -1075,7 +1077,7 @@ static inline void clear_soft_dirty(struct vm_area_struct *vma, ptent = pte_wrprotect(old_pte); ptent = pte_clear_soft_dirty(ptent); ptep_modify_prot_commit(vma, addr, pte, old_pte, ptent); - } else if (is_swap_pte(ptent)) { + } else if (pte_has_swap_entry(ptent)) { ptent = pte_swp_clear_soft_dirty(ptent); set_pte_at(vma->vm_mm, addr, pte, ptent); } @@ -1375,7 +1377,7 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm, page = vm_normal_page(vma, addr, pte); if (pte_soft_dirty(pte)) flags |= PM_SOFT_DIRTY; - } else if (is_swap_pte(pte)) { + } else if (pte_has_swap_entry(pte)) { swp_entry_t entry; if (pte_swp_soft_dirty(pte)) flags |= PM_SOFT_DIRTY; diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 7dd57303bb0c..7b7387d2892f 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -5,6 +5,7 @@ #include #include #include +#include #ifdef CONFIG_MMU @@ -52,12 +53,48 @@ static inline pgoff_t swp_offset(swp_entry_t entry) return entry.val & SWP_OFFSET_MASK; } -/* check whether a pte points to a swap entry */ +/* + * is_swap_pte() returns true for three cases: + * + * (a) The pte contains a swap entry. + * + * (a.1) The pte has a normal swap entry (non_swap_entry()==false). For + * example, when an anonymous page got swapped out. + * + * (a.2) The pte has a special swap entry (non_swap_entry()==true). For + * example, a migration entry, a hw-poison entry, etc. + * + * (b) The pte does not contain a swap entry at all (so it cannot be passed + * into pte_to_swp_entry()). For example, uffd-wp special swap pte. 
+ */ static inline int is_swap_pte(pte_t pte) { return !pte_none(pte) && !pte_present(pte); } +/* + * A swap-like special pte should only be used as special marker to trigger a + * page fault. We should treat them similarly as pte_none() in most cases, + * except that it may contain some special information that can persist within + * the pte. Currently the only special swap pte is UFFD_WP_SWP_PTE_SPECIAL. + * + * Note: we should never call pte_to_swp_entry() upon a special swap pte, + * Because a swap special pte does not contain a swap entry! + */ +static inline bool is_swap_special_pte(pte_t pte) +{ + return pte_swp_uffd_wp_special(pte); +} + +/* + * Returns true if the pte contains a swap entry. This includes not only the + * normal swp entry case, but also for migration entries, etc. + */ +static inline bool pte_has_swap_entry(pte_t pte) +{ + return is_swap_pte(pte) && !is_swap_special_pte(pte); +} + /* * Convert the arch-dependent pte representation of a swp_entry_t into an * arch-independent swp_entry_t. 
diff --git a/mm/gup.c b/mm/gup.c index b3e647c8b7ee..53e9ddc3a829 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -474,7 +474,7 @@ static struct page *follow_page_pte(struct vm_area_struct *vma, */ if (likely(!(flags & FOLL_MIGRATION))) goto no_page; - if (pte_none(pte)) + if (!pte_has_swap_entry(pte)) goto no_page; entry = pte_to_swp_entry(pte); if (!is_migration_entry(entry)) diff --git a/mm/hmm.c b/mm/hmm.c index 943cb2ba4442..4dba5debf163 100644 --- a/mm/hmm.c +++ b/mm/hmm.c @@ -237,7 +237,7 @@ static int hmm_vma_handle_pte(struct mm_walk *walk, unsigned long addr, pte_t pte = *ptep; uint64_t pfn_req_flags = *hmm_pfn; - if (pte_none(pte)) { + if (pte_none(pte) || is_swap_special_pte(pte)) { required_fault = hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, 0); if (required_fault) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b81521dfbb1a..419a6acce326 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1019,7 +1019,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, vmf.pte = pte_offset_map(pmd, address); vmf.orig_pte = *vmf.pte; - if (!is_swap_pte(vmf.orig_pte)) { + if (!pte_has_swap_entry(vmf.orig_pte)) { pte_unmap(vmf.pte); continue; } @@ -1248,6 +1248,15 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, _pte++, _address += PAGE_SIZE) { pte_t pteval = *_pte; if (is_swap_pte(pteval)) { + if (is_swap_special_pte(pteval)) { + /* + * Reuse SCAN_PTE_UFFD_WP. If there will be + * new users of is_swap_special_pte(), we'd + * better introduce a new result type. 
+ */ + result = SCAN_PTE_UFFD_WP; + goto out_unmap; + } if (++unmapped <= khugepaged_max_ptes_swap) { /* * Always be strict with uffd-wp diff --git a/mm/madvise.c b/mm/madvise.c index 01fef79ac761..c77499d21aac 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -202,7 +202,7 @@ static int swapin_walk_pmd_entry(pmd_t *pmd, unsigned long start, pte = *(orig_pte + ((index - start) / PAGE_SIZE)); pte_unmap_unlock(orig_pte, ptl); - if (pte_present(pte) || pte_none(pte)) + if (!pte_has_swap_entry(pte)) continue; entry = pte_to_swp_entry(pte); if (unlikely(non_swap_entry(entry))) @@ -594,7 +594,7 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, for (; addr != end; pte++, addr += PAGE_SIZE) { ptent = *pte; - if (pte_none(ptent)) + if (pte_none(ptent) || is_swap_special_pte(ptent)) continue; /* * If the pte has swp_entry, just clear page table to diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 668d1d7c2645..64b347a15ded 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -5558,7 +5558,7 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma, if (pte_present(ptent)) page = mc_handle_present_pte(vma, addr, ptent); - else if (is_swap_pte(ptent)) + else if (pte_has_swap_entry(ptent)) page = mc_handle_swap_pte(vma, ptent, &ent); else if (pte_none(ptent)) page = mc_handle_file_pte(vma, addr, ptent, &ent); diff --git a/mm/memory.c b/mm/memory.c index d534eba85756..8c4ed1f9693c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3312,6 +3312,13 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) if (!pte_unmap_same(vmf)) goto out; + /* + * We should never call do_swap_page upon a swap special pte; just be + * safe to bail out if it happens. 
+ */ + if (WARN_ON_ONCE(is_swap_special_pte(vmf->orig_pte))) + goto out; + entry = pte_to_swp_entry(vmf->orig_pte); if (unlikely(non_swap_entry(entry))) { if (is_migration_entry(entry)) { diff --git a/mm/migrate.c b/mm/migrate.c index 47df0df8f21a..08425acc2563 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -314,7 +314,7 @@ void __migration_entry_wait(struct mm_struct *mm, pte_t *ptep, spin_lock(ptl); pte = *ptep; - if (!is_swap_pte(pte)) + if (!pte_has_swap_entry(pte)) goto out; entry = pte_to_swp_entry(pte); @@ -2425,7 +2425,7 @@ static int migrate_vma_collect_pmd(pmd_t *pmdp, pte = *ptep; - if (pte_none(pte)) { + if (pte_none(pte) || is_swap_special_pte(pte)) { if (vma_is_anonymous(vma)) { mpfn = MIGRATE_PFN_MIGRATE; migrate->cpages++; diff --git a/mm/mincore.c b/mm/mincore.c index 9122676b54d6..5728c3e6473f 100644 --- a/mm/mincore.c +++ b/mm/mincore.c @@ -121,7 +121,7 @@ static int mincore_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, for (; addr != end; ptep++, addr += PAGE_SIZE) { pte_t pte = *ptep; - if (pte_none(pte)) + if (pte_none(pte) || is_swap_special_pte(pte)) __mincore_unmapped_range(addr, addr + PAGE_SIZE, vma, vec); else if (pte_present(pte)) diff --git a/mm/mprotect.c b/mm/mprotect.c index 94188df1ee55..b3def0a102bf 100644 --- a/mm/mprotect.c +++ b/mm/mprotect.c @@ -139,7 +139,7 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd, } ptep_modify_prot_commit(vma, addr, pte, oldpte, ptent); pages++; - } else if (is_swap_pte(oldpte)) { + } else if (pte_has_swap_entry(oldpte)) { swp_entry_t entry = pte_to_swp_entry(oldpte); pte_t newpte; diff --git a/mm/mremap.c b/mm/mremap.c index 6934d199da54..cd9759ede04b 100644 --- a/mm/mremap.c +++ b/mm/mremap.c @@ -124,7 +124,7 @@ static pte_t move_soft_dirty_pte(pte_t pte) #ifdef CONFIG_MEM_SOFT_DIRTY if (pte_present(pte)) pte = pte_mksoft_dirty(pte); - else if (is_swap_pte(pte)) + else if (pte_has_swap_entry(pte)) pte = pte_swp_mksoft_dirty(pte); #endif return pte; 
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c index 86e3a3688d59..6b51759d9203 100644 --- a/mm/page_vma_mapped.c +++ b/mm/page_vma_mapped.c @@ -36,7 +36,7 @@ static bool map_pte(struct page_vma_mapped_walk *pvmw) * For more details on device private memory see HMM * (include/linux/hmm.h or mm/hmm.c). */ - if (is_swap_pte(*pvmw->pte)) { + if (pte_has_swap_entry(*pvmw->pte)) { swp_entry_t entry; /* Handle un-addressable ZONE_DEVICE memory */ @@ -89,7 +89,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) if (pvmw->flags & PVMW_MIGRATION) { swp_entry_t entry; - if (!is_swap_pte(*pvmw->pte)) + if (!pte_has_swap_entry(*pvmw->pte)) return false; entry = pte_to_swp_entry(*pvmw->pte); @@ -97,7 +97,7 @@ static bool check_pte(struct page_vma_mapped_walk *pvmw) return false; pfn = migration_entry_to_pfn(entry); - } else if (is_swap_pte(*pvmw->pte)) { + } else if (pte_has_swap_entry(*pvmw->pte)) { swp_entry_t entry; /* Handle un-addressable ZONE_DEVICE memory */ diff --git a/mm/swapfile.c b/mm/swapfile.c index 149e77454e3c..8aa4be074659 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -1964,7 +1964,7 @@ static int unuse_pte_range(struct vm_area_struct *vma, pmd_t *pmd, si = swap_info[type]; pte = pte_offset_map(pmd, addr); do { - if (!is_swap_pte(*pte)) + if (!pte_has_swap_entry(*pte)) continue; entry = pte_to_swp_entry(*pte); From patchwork Tue Mar 23 00:48:54 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12156485 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) 
by smtp.lore.kernel.org (Postfix) with ESMTP id 517E3C433C1 for ; Tue, 23 Mar 2021 00:49:34 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id BB5B76199F for ; Tue, 23 Mar 2021 00:49:33 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org BB5B76199F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 30F946B011C; Mon, 22 Mar 2021 20:49:27 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 221746B011F; Mon, 22 Mar 2021 20:49:27 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0259E6B0121; Mon, 22 Mar 2021 20:49:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0183.hostedemail.com [216.40.44.183]) by kanga.kvack.org (Postfix) with ESMTP id D2CB86B011C for ; Mon, 22 Mar 2021 20:49:26 -0400 (EDT) Received: from smtpin35.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 92E798249980 for ; Tue, 23 Mar 2021 00:49:26 +0000 (UTC) X-FDA: 77949305532.35.9696DEF Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf17.hostedemail.com (Postfix) with ESMTP id DC2EC407F8F3 for ; Tue, 23 Mar 2021 00:49:24 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460565; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=hDWZ3WcjzLBFN4Ysj1UCXP8mvHfkjlwVRvCaVc8sMRI=; b=RY9moxalfhN/iKhCp4bMClszUF28OCuZiq+axhiQ25DoS/WMo+oE72nKnDVwucF21OaI6A 
Uc4JuG/s5449SZF/snV6h1VCgh/EAQA18/3Lcdlp+reuk/uEZtiwngp7nIvkzU7koXQwFz fPdFaKf79AiSxqAySSAQDcKhO/tut4M= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-296-Weem5-HJONWqat12T6Kn-g-1; Mon, 22 Mar 2021 20:49:24 -0400 X-MC-Unique: Weem5-HJONWqat12T6Kn-g-1 Received: by mail-qv1-f72.google.com with SMTP id o14so543859qvn.18 for ; Mon, 22 Mar 2021 17:49:24 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=hDWZ3WcjzLBFN4Ysj1UCXP8mvHfkjlwVRvCaVc8sMRI=; b=aJ+e866ykPVV7lIbIYnwTinEn+ye9lX8pnWkFYSKvu2MrYGMJ0UGEbKKLZVuehVCvr FMjGKtBBQY6eHlYcdw8QU0Xxw6xPuM+3aLq2i/klbJ+TdT75bKgKXIOb94Tke9awo9bi gIIvA+x2xPTNeSYDwD7jQDSKuKgvGZ1+LpvzNW3beoqBJIBL1VzCdo4SGpVbrKbrt/GW oaUwnIEVoMJx3ffO3UDxKiZUZ/mAW4UwkDsUHISugnJ+UUHxuRirJkpq3B3fAMunE0bm E1cxvkhzgk+JDFPSqBhJcMlBs+4NDiul1uzDNKuoEUp9fuB2ixKOllHve33uXmhogXqs Hluw== X-Gm-Message-State: AOAM531IU8SFPyNm6aXB3FXTSDPzmRw4IUc0MX/psIrZndZd4TMLCPk7 pz6L7xS3Ph+7LEj7ONxIQfDlTzHCNKf05iMjipdNY3mWfwQFIvaor/8je0sVBGb1mhzebv0TFgE b+yXkrUbA0bM= X-Received: by 2002:ac8:5212:: with SMTP id r18mr2326354qtn.290.1616460563373; Mon, 22 Mar 2021 17:49:23 -0700 (PDT) X-Google-Smtp-Source: ABdhPJygmCCy7L0hZsid2x69DTLeqwC012NMnR0C/+lqM+3H/VKhd03Jxq9vTuX6ffs2CJHo0ib0KA== X-Received: by 2002:ac8:5212:: with SMTP id r18mr2326329qtn.290.1616460562949; Mon, 22 Mar 2021 17:49:22 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.21 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:22 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . 
Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport Subject: [PATCH 05/23] shmem/userfaultfd: Handle uffd-wp special pte in page fault handler Date: Mon, 22 Mar 2021 20:48:54 -0400 Message-Id: <20210323004912.35132-6-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210323004912.35132-1-peterx@redhat.com> References: <20210323004912.35132-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Stat-Signature: 7r7yihh8gx1ih8ppug1i5ws7mhfs64c6 X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: DC2EC407F8F3 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf17; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=170.10.133.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616460564-609652 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

File-backed memory is prone to being unmapped or swapped out, so its ptes are never stable. This can cause userfaultfd-wp information to be lost when such memory (for example, shmem) is unmapped or swapped out. To keep that information persistent, we will start using the newly introduced swap-like special ptes to replace a none pte when those ptes are removed.

Prepare for this by handling such a special pte before it is applied.

Here a new fault flag FAULT_FLAG_UFFD_WP is introduced. When this flag is set, it means the current fault is to resolve a page access (either read or write) to the uffd-wp special pte.

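As a rough illustration of the flag and of where the special pte sits in fault handling, here is a minimal userspace sketch. Everything named toy_* is a hypothetical stand-in invented for this example; the flag values are assumptions chosen to mirror the diff below, and the dispatch order follows what the rest of this description spells out (after the pte-missing check, before normal swap handling).

```c
#include <assert.h>

/* Toy flag values; the real definitions live in include/linux/mm.h. */
#define TOY_FAULT_FLAG_WRITE	0x01u
#define TOY_FAULT_FLAG_UFFD_WP	0x400u

/* Hypothetical pte states standing in for the kernel's pte checks. */
enum toy_pte_state {
	TOY_PTE_NONE,
	TOY_PTE_PRESENT,
	TOY_PTE_UFFD_WP_SPECIAL,
	TOY_PTE_SWAP_ENTRY,
};

enum toy_handler {
	TOY_HANDLE_MISSING,
	TOY_HANDLE_UFFD_WP_SPECIAL,
	TOY_HANDLE_SWAP,
	TOY_HANDLE_PRESENT,
};

/* Pick a handler: missing first, then the special pte, then real swap. */
static inline enum toy_handler toy_dispatch(enum toy_pte_state state)
{
	if (state == TOY_PTE_NONE)
		return TOY_HANDLE_MISSING;
	if (state != TOY_PTE_PRESENT) {
		/* Special swap ptes must be caught before swap handling. */
		if (state == TOY_PTE_UFFD_WP_SPECIAL)
			return TOY_HANDLE_UFFD_WP_SPECIAL;
		return TOY_HANDLE_SWAP;
	}
	return TOY_HANDLE_PRESENT;
}

/* Arm the uffd-wp flag and treat the access as a read fault. */
static inline unsigned int toy_special_fault_flags(unsigned int flags)
{
	return (flags | TOY_FAULT_FLAG_UFFD_WP) & ~TOY_FAULT_FLAG_WRITE;
}
```

Dropping the write bit in the sketch reflects the simplification described in this message: a real write access is expected to fault a second time and be resolved on the write-protect path.
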
The handling of this special pte page fault is similar to a missing fault, but it should happen after the pte-missing logic, since the special pte is designed to be a swap-like pte. Meanwhile, it should be handled before do_swap_page() so that the swap core logic won't be confused by such an illegal swap pte.

This is a slow path of uffd-wp handling, because unmapping wr-protected shmem ptes should be rare. So far it should only trigger in two conditions:

  (1) When trying to punch holes in shmem_fallocate(), there is a pre-unmap optimization before evicting the page. That will create unmapped shmem ptes with wr-protected pages covered.

  (2) Swapping out of shmem pages.

Because of this, the page fault handling is also simplified by always assuming it's a read fault when calling do_fault(). When it's a write fault, it will fault again when the page access is retried, and then do_wp_page() will handle the rest of message generation and delivery to the userfaultfd.

Disable fault-around for such a special page fault, because the newly introduced flag (FAULT_FLAG_UFFD_WP) only applies to the current pte rather than all the pages around it. Doing fault-around with the new flag could confuse the rest of the pages when installing ptes from the page cache on a cache hit.

Signed-off-by: Peter Xu --- include/linux/mm.h | 2 + include/linux/userfaultfd_k.h | 11 ++++ mm/memory.c | 103 +++++++++++++++++++++++++++++++--- 3 files changed, 107 insertions(+), 9 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index cb1e191da319..b78306eb7a63 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -441,6 +441,7 @@ extern pgprot_t protection_map[16]; * @FAULT_FLAG_REMOTE: The fault is not for current task/mm. * @FAULT_FLAG_INSTRUCTION: The fault was during an instruction fetch. * @FAULT_FLAG_INTERRUPTIBLE: The fault can be interrupted by non-fatal signals. + * @FAULT_FLAG_UFFD_WP: When install new page entries, set uffd-wp bit.
* * About @FAULT_FLAG_ALLOW_RETRY and @FAULT_FLAG_TRIED: we can specify * whether we would allow page faults to retry by specifying these two @@ -471,6 +472,7 @@ extern pgprot_t protection_map[16]; #define FAULT_FLAG_REMOTE 0x80 #define FAULT_FLAG_INSTRUCTION 0x100 #define FAULT_FLAG_INTERRUPTIBLE 0x200 +#define FAULT_FLAG_UFFD_WP 0x400 /* * The default fault flags that should be used by most of the diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index bc733512c690..fefebe6e9656 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -89,6 +89,17 @@ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); } +/* + * Don't do fault around for FAULT_FLAG_UFFD_WP because it means we want to + * recover a previously wr-protected pte. This flag is a per-pte information, + * so it could confuse all the pages around the current page when faulted in. + * Similar reason for MINOR mode faults. 
+ */ +static inline bool uffd_disable_fault_around(struct vm_area_struct *vma) +{ + return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); +} + static inline bool userfaultfd_missing(struct vm_area_struct *vma) { return vma->vm_flags & VM_UFFD_MISSING; diff --git a/mm/memory.c b/mm/memory.c index 8c4ed1f9693c..b4ddba343abc 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3775,6 +3775,7 @@ vm_fault_t do_set_pmd(struct vm_fault *vmf, struct page *page) void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr) { struct vm_area_struct *vma = vmf->vma; + bool uffd_wp = vmf->flags & FAULT_FLAG_UFFD_WP; bool write = vmf->flags & FAULT_FLAG_WRITE; bool prefault = vmf->address != addr; pte_t entry; @@ -3787,6 +3788,11 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr) if (write) entry = maybe_mkwrite(pte_mkdirty(entry), vma); + if (uffd_wp) { + /* This should only be triggered by a read fault */ + WARN_ON_ONCE(write); + entry = pte_mkuffd_wp(pte_wrprotect(entry)); + } /* copy-on-write page */ if (write && !(vma->vm_flags & VM_SHARED)) { inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); @@ -3816,6 +3822,7 @@ void do_set_pte(struct vm_fault *vmf, struct page *page, unsigned long addr) */ vm_fault_t finish_fault(struct vm_fault *vmf) { + bool pte_stable, uffd_wp = vmf->flags & FAULT_FLAG_UFFD_WP; struct vm_area_struct *vma = vmf->vma; struct page *page; vm_fault_t ret; @@ -3854,8 +3861,14 @@ vm_fault_t finish_fault(struct vm_fault *vmf) vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, vmf->address, &vmf->ptl); ret = 0; + + if (unlikely(uffd_wp)) + pte_stable = pte_swp_uffd_wp_special(*vmf->pte); + else + pte_stable = pte_none(*vmf->pte); + /* Re-check under ptl */ - if (likely(pte_none(*vmf->pte))) + if (likely(pte_stable)) do_set_pte(vmf, page, vmf->address); else ret = VM_FAULT_NOPAGE; @@ -3959,9 +3972,21 @@ static vm_fault_t do_fault_around(struct vm_fault *vmf) return vmf->vma->vm_ops->map_pages(vmf, start_pgoff, end_pgoff); } 
+/* Return true if we should do read fault-around, false otherwise */ +static inline bool should_fault_around(struct vm_fault *vmf) +{ + /* No ->map_pages? No way to fault around... */ + if (!vmf->vma->vm_ops->map_pages) + return false; + + if (uffd_disable_fault_around(vmf->vma)) + return false; + + return fault_around_bytes >> PAGE_SHIFT > 1; +} + static vm_fault_t do_read_fault(struct vm_fault *vmf) { - struct vm_area_struct *vma = vmf->vma; vm_fault_t ret = 0; /* @@ -3969,12 +3994,10 @@ static vm_fault_t do_read_fault(struct vm_fault *vmf) * if page by the offset is not ready to be mapped (cold cache or * something). */ - if (vma->vm_ops->map_pages && fault_around_bytes >> PAGE_SHIFT > 1) { - if (likely(!userfaultfd_minor(vmf->vma))) { - ret = do_fault_around(vmf); - if (ret) - return ret; - } + if (should_fault_around(vmf)) { + ret = do_fault_around(vmf); + if (ret) + return ret; } ret = __do_fault(vmf); @@ -4284,6 +4307,68 @@ static vm_fault_t wp_huge_pud(struct vm_fault *vmf, pud_t orig_pud) return VM_FAULT_FALLBACK; } +static vm_fault_t uffd_wp_clear_special(struct vm_fault *vmf) +{ + vmf->pte = pte_offset_map_lock(vmf->vma->vm_mm, vmf->pmd, + vmf->address, &vmf->ptl); + /* + * Be careful so that we will only recover a special uffd-wp pte into a + * none pte. Otherwise it means the pte could have changed, so retry. + */ + if (pte_swp_uffd_wp_special(*vmf->pte)) + pte_clear(vmf->vma->vm_mm, vmf->address, vmf->pte); + pte_unmap_unlock(vmf->pte, vmf->ptl); + return 0; +} + +/* + * This is actually a page-missing access, but with uffd-wp special pte + * installed. It means this pte was wr-protected before being unmapped. + */ +static vm_fault_t uffd_wp_handle_special(struct vm_fault *vmf) +{ + /* Careful! vmf->pte unmapped after return */ + if (!pte_unmap_same(vmf)) + return 0; + + /* + * Just in case there're leftover special ptes even after the region + * got unregistered - we can simply clear them. 
+ */ + if (unlikely(!userfaultfd_wp(vmf->vma) || vma_is_anonymous(vmf->vma))) + return uffd_wp_clear_special(vmf); + + /* + * Tell all the rest of the fault code: we're handling a special pte, + * always remember to arm the uffd-wp bit when intalling the new pte. + */ + vmf->flags |= FAULT_FLAG_UFFD_WP; + + /* + * Let's assume this is a read fault no matter what. If it is a real + * write access, it'll fault again into do_wp_page() where the message + * will be generated before the thread yields itself. + * + * Ideally we can also handle write immediately before return, but this + * should be a slow path already (pte unmapped), so be simple first. + */ + vmf->flags &= ~FAULT_FLAG_WRITE; + + return do_fault(vmf); +} + +static vm_fault_t do_swap_pte(struct vm_fault *vmf) +{ + /* + * We need to handle special swap ptes before handling ptes that + * contain swap entries, always. + */ + if (unlikely(pte_swp_uffd_wp_special(vmf->orig_pte))) + return uffd_wp_handle_special(vmf); + + return do_swap_page(vmf); +} + /* * These routines also need to handle stuff like marking pages dirty * and/or accessed for architectures that don't do it in hardware (most @@ -4358,7 +4443,7 @@ static vm_fault_t handle_pte_fault(struct vm_fault *vmf) } if (!pte_present(vmf->orig_pte)) - return do_swap_page(vmf); + return do_swap_pte(vmf); if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma)) return do_numa_page(vmf); From patchwork Tue Mar 23 00:48:55 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12156487 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id BC937C433E0 for ; Tue, 23 Mar 2021 00:49:36 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 60C456196C for ; Tue, 23 Mar 2021 00:49:36 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 60C456196C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 0C3276B011F; Mon, 22 Mar 2021 20:49:30 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 024956B0122; Mon, 22 Mar 2021 20:49:29 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id DB9EB6B0123; Mon, 22 Mar 2021 20:49:29 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0061.hostedemail.com [216.40.44.61]) by kanga.kvack.org (Postfix) with ESMTP id B6E6A6B011F for ; Mon, 22 Mar 2021 20:49:29 -0400 (EDT) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 811928249980 for ; Tue, 23 Mar 2021 00:49:29 +0000 (UTC) X-FDA: 77949305658.26.BDAD197 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf10.hostedemail.com (Postfix) with ESMTP id 379FC407F8E8 for ; Tue, 23 Mar 2021 00:49:28 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460568; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=jLKkf+ZPkMgUQMvwd6XDyh9qcQY+uJivyMkVgKqyvds=; 
b=DglbnqE3gGL7uWHXw4R5qlxNZQZ1iWKfzcq/cSp7vIgypLMkGn69AkbUxSbP6w8DxASIE5 pOx8mhyP0cfG27FSl5tOV7LKZFU9QzSnVNf/GwlmlkwpeS8LQw8hDj9byIOgbrvEHpuBAJ IJVJyVLtWa4BjkJxgfgrottq1UtnUg4= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-193-o5fsniqMMyOXrhYzNeIakg-1; Mon, 22 Mar 2021 20:49:25 -0400 X-MC-Unique: o5fsniqMMyOXrhYzNeIakg-1 Received: by mail-qt1-f197.google.com with SMTP id v18so418386qtx.0 for ; Mon, 22 Mar 2021 17:49:25 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=jLKkf+ZPkMgUQMvwd6XDyh9qcQY+uJivyMkVgKqyvds=; b=cwJKW02vsXn5mfAndMVrdGJtk9JVxo4MDF95EPDMoj0MrLOhYi+LR40pRdXuWrZ9Ta z7wOuO4mZLhTDiHEIlgb/W/pa07cF/GnmyPmvLHN5yY3mL6VKf8SbnpHbG5UrFaRGBFp O3qA07PDl+FOaF29aGtgPjeBCpuAjgLjEWf9CkOlxHICF+eR4plFpnIeypF+C7ipoKnE lsR/ctIH01f2OFBn5O+7Ovnszti3n6SL3GMki4qCpA8x6prVnXnFrboADR9pJ19Y7RBx 3KGuVXtyeXFi4Vk2FoFE55t4alwzj7hiBROcpFDjM1hsW9/NiswaS5dQH5gRpzd8dFcZ QzEg== X-Gm-Message-State: AOAM530/2HNV+xpVMR3ec2Y+zwlZ88rIsYVgrnCfY0EUIx4ZbAWTqmeE ArHSX/rmzKaI8MKWS2/u/ogIJ1miGtlCgajo00r0TnsWuxWTmmgSA5e4cRzSQxzHfN6dO6NKyJq OlXnwFt8s01k= X-Received: by 2002:a05:622a:38a:: with SMTP id j10mr2356387qtx.321.1616460564553; Mon, 22 Mar 2021 17:49:24 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxWTu7ggSiwl9a2400ldzG7CNv+Tgn2StD4wlyGGg2tML5uryFjvwpa6kp0vQM/I+lvGp7Ilw== X-Received: by 2002:a05:622a:38a:: with SMTP id j10mr2356368qtx.321.1616460564311; Mon, 22 Mar 2021 17:49:24 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. 
 [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.23 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:23 -0700 (PDT)
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport
Subject: [PATCH 06/23] mm: Drop first_index/last_index in zap_details
Date: Mon, 22 Mar 2021 20:48:55 -0400
Message-Id: <20210323004912.35132-7-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>
MIME-Version: 1.0
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
X-Rspamd-Server: rspam03
X-Rspamd-Queue-Id: 379FC407F8E8
X-Stat-Signature: e4qukhpnmkcbr1r7kggmkpmr6qn69tmp
Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf10; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124
X-HE-DKIM-Result: pass/pass
X-HE-Tag: 1616460568-218694
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

The first_index/last_index fields in zap_details are only used in
unmap_mapping_range_tree(). Meanwhile, that function is called only once,
by unmap_mapping_pages().

Instead of passing these two variables through the whole stack of page
zapping code, remove them from zap_details and let them simply be parameters
of unmap_mapping_range_tree(), which is inlined.
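
As an aside for readers of the archive (not part of the patch itself): the
index arithmetic that this patch moves into local variables can be modelled
in userspace. The sketch below is only an illustration under assumptions:
struct index_range, zap_index_range() and clamp_to_vma() are hypothetical
names, and pgoff_t is a plain unsigned long stand-in for the kernel type.

```c
#include <assert.h>
#include <limits.h>

typedef unsigned long pgoff_t;	/* stand-in for the kernel type */

/* Hypothetical helper type: an inclusive page-index range [first, last]. */
struct index_range {
	pgoff_t first;
	pgoff_t last;
};

/*
 * Mirrors the computation in unmap_mapping_pages(): if start + nr - 1
 * falls below start (nr == 0, or overflow), extend the range to ULONG_MAX.
 */
static struct index_range zap_index_range(pgoff_t start, pgoff_t nr)
{
	struct index_range r = { .first = start, .last = start + nr - 1 };

	if (r.last < r.first)
		r.last = ULONG_MAX;
	return r;
}

/*
 * Mirrors the clamping in unmap_mapping_range_tree() for a single VMA
 * that covers page indexes [vba, vea].
 */
static struct index_range clamp_to_vma(struct index_range zap,
				       pgoff_t vba, pgoff_t vea)
{
	assert(vba <= vea);

	if (zap.first < vba)
		zap.first = vba;
	if (zap.last > vea)
		zap.last = vea;
	return zap;
}
```

With start=10 and nr=5 the range is [10, 14]. With nr=0 the subtraction
lands below start, so the range is extended to ULONG_MAX, which matches the
"0 to unmap to end of file" convention documented for @nr.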

Signed-off-by: Peter Xu
---
 include/linux/mm.h |  2 --
 mm/memory.c        | 20 ++++++++++----------
 2 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index b78306eb7a63..389dd91134f9 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1707,8 +1707,6 @@ extern void user_shm_unlock(size_t, struct user_struct *);
  */
 struct zap_details {
 	struct address_space *check_mapping;	/* Check page->mapping if set */
-	pgoff_t first_index;			/* Lowest page->index to unmap */
-	pgoff_t last_index;			/* Highest page->index to unmap */
 };
 
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index b4ddba343abc..5e6175e00617 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -3202,20 +3202,20 @@ static void unmap_mapping_range_vma(struct vm_area_struct *vma,
 }
 
 static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
+					    pgoff_t first_index,
+					    pgoff_t last_index,
 					    struct zap_details *details)
 {
 	struct vm_area_struct *vma;
 	pgoff_t vba, vea, zba, zea;
 
-	vma_interval_tree_foreach(vma, root,
-			details->first_index, details->last_index) {
-
+	vma_interval_tree_foreach(vma, root, first_index, last_index) {
 		vba = vma->vm_pgoff;
 		vea = vba + vma_pages(vma) - 1;
-		zba = details->first_index;
+		zba = first_index;
 		if (zba < vba)
 			zba = vba;
-		zea = details->last_index;
+		zea = last_index;
 		if (zea > vea)
 			zea = vea;
 
@@ -3241,17 +3241,17 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root,
 void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
+	pgoff_t first_index = start, last_index = start + nr - 1;
 	struct zap_details details = { };
 
 	details.check_mapping = even_cows ? NULL : mapping;
-	details.first_index = start;
-	details.last_index = start + nr - 1;
-	if (details.last_index < details.first_index)
-		details.last_index = ULONG_MAX;
+	if (last_index < first_index)
+		last_index = ULONG_MAX;
 
 	i_mmap_lock_write(mapping);
 	if (unlikely(!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)))
-		unmap_mapping_range_tree(&mapping->i_mmap, &details);
+		unmap_mapping_range_tree(&mapping->i_mmap, first_index,
+					 last_index, &details);
 	i_mmap_unlock_write(mapping);
 }
 

From patchwork Tue Mar 23 00:48:56 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156489
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 37FDDC433DB for ; Tue, 23 Mar 2021 00:49:39 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B54DC6196C for ; Tue, 23 Mar 2021 00:49:38 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B54DC6196C
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix) id 160A56B0122; Mon, 22 Mar 2021 20:49:31 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id 0ED006B0124; Mon, 22 Mar 2021 20:49:31 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id DE69D6B0125; Mon, 22 Mar 2021 20:49:30 -0400 (EDT)
X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0059.hostedemail.com [216.40.44.59]) by kanga.kvack.org (Postfix) with ESMTP id B196C6B0122 for ; Mon, 22 Mar 2021 20:49:30 -0400 (EDT) Received: from smtpin35.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 736DA181AF5C4 for ; Tue, 23 Mar 2021 00:49:30 +0000 (UTC) X-FDA: 77949305700.35.D09A9F9 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf23.hostedemail.com (Postfix) with ESMTP id 80E4EA0000FC for ; Tue, 23 Mar 2021 00:49:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460569; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=fV3MtSOZBbfBia062YNmBOAWjcsiN3b7tgIAUmg+MOk=; b=JVEjazCTSPLJ8o7kXiZMpKFtV4MyT4DIYlc6BEmFektjuA1wzvTE4671mMsC0O2Bwh355a 3d2S7E37cBwm6wUIiNctCcxlw9mXJ8v8Gzjy/CYDaHiQRSutPhMxB6F57YM3CU8wWFHZE2 84HpWPebs2QtLjHEBx+/xo68N9PAbK8= Received: from mail-qk1-f199.google.com (mail-qk1-f199.google.com [209.85.222.199]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-84-ZDXs1yNHOBaB1yA7uEvqdg-1; Mon, 22 Mar 2021 20:49:26 -0400 X-MC-Unique: ZDXs1yNHOBaB1yA7uEvqdg-1 Received: by mail-qk1-f199.google.com with SMTP id p133so796709qka.17 for ; Mon, 22 Mar 2021 17:49:26 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=fV3MtSOZBbfBia062YNmBOAWjcsiN3b7tgIAUmg+MOk=; b=mA02RpSpm8SttPkooM0gsir4YZTo+rY1q+GabtTRudv51/mehEZIIKxlfARYgCwaRz m/7x1Daj38Se+fEhUDAwI8+k0b6WdPJNoytFy0dqGZqm40/vhwSK1is10HPF/gFrTyhF 
H56NUrPIRU5VjkG014oGYBMtEuWFLw8BPNpEqpg+npmcuxErBQnNW6ATie9YRmH+KBj4 R18X+0zNhy7bNDsGHQNEsfZRhS51Iz6KYWuF13dUlGG8yUiXLE7X5fxVRQKmV2ItqJ/k qYBufYtIMvRu2sQqAkulqlfWaoqKDNwt6RycFqLD5cTm5ZN1rSWdSxW0lPJuu9aw7ZkH lOfA== X-Gm-Message-State: AOAM532b0gRw28aoFRHIADjN/i2d6I2ieeYXBL1UEkrHBzd3o8rMayP5 +UJigoI+irW8F7Zckj6PL0uw4C9g8N+qz71pwueg/oOCeoaRvm/nOJpcvokM+h54HwcFlEptsJo 9Xd5yUbiUICE= X-Received: by 2002:ae9:e64b:: with SMTP id x11mr2923236qkl.290.1616460566114; Mon, 22 Mar 2021 17:49:26 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzq1QbH70CTFaHpcK8W1TcA+gRvt2O2nG2Ow+91rYU6OblOmJVrYvP9Ef+HtjDx0NsEGppiUg== X-Received: by 2002:ae9:e64b:: with SMTP id x11mr2923216qkl.290.1616460565830; Mon, 22 Mar 2021 17:49:25 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.24 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:25 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . 
 Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport
Subject: [PATCH 07/23] mm: Introduce zap_details.zap_flags
Date: Mon, 22 Mar 2021 20:48:56 -0400
Message-Id: <20210323004912.35132-8-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>
MIME-Version: 1.0
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
X-Rspamd-Server: rspam04
X-Rspamd-Queue-Id: 80E4EA0000FC
X-Stat-Signature: 7qsrpkjzfb7kza44ga47gejbmuonuyg9
Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf23; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=170.10.133.124
X-HE-DKIM-Result: pass/pass
X-HE-Tag: 1616460569-326650
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Instead of introducing one variable for every new zap_details field, let's
introduce a flags field that can encode true/false information.

Let's first use this field to clean up the only existing variable,
check_mapping. The name "check_mapping" implies a boolean, but it actually
stores the mapping itself, and is simply left unset when we don't want to
check the mapping.

To make things clearer, introduce the first zap flag, ZAP_FLAG_CHECK_MAPPING,
so that we only check against the mapping if this bit is set. At the same
time, rename check_mapping to zap_mapping and always set it.

While at it, introduce a helper zap_check_mapping_skip() and use it in
zap_pte_range().
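
As an aside for readers of the archive (not part of the patch itself): the
decision made by zap_check_mapping_skip() can be exercised in userspace. This
is a minimal sketch under assumptions: struct address_space and struct page
are reduced stand-ins, page_rmapping() here just returns the stored pointer,
and zap_check_mapping_example() is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZAP_FLAG_CHECK_MAPPING	(1UL << 0)	/* BIT(0) in the kernel */

/* Reduced stand-ins for the kernel types (assumptions for this sketch). */
struct address_space { int id; };
struct page { struct address_space *mapping; };

struct zap_details {
	struct address_space *zap_mapping;	/* always set by the caller */
	unsigned long zap_flags;		/* ZAP_FLAG_* bits */
};

/* Stand-in for the kernel helper of the same name. */
static struct address_space *page_rmapping(struct page *page)
{
	return page->mapping;
}

/* Same logic as the helper added by the patch. */
static bool zap_check_mapping_skip(struct zap_details *details,
				   struct page *page)
{
	if (!details || !page)
		return false;

	if (!(details->zap_flags & ZAP_FLAG_CHECK_MAPPING))
		return false;

	return details->zap_mapping != page_rmapping(page);
}

/* Hypothetical usage example covering the four interesting cases. */
static void zap_check_mapping_example(void)
{
	struct address_space file = { 1 }, other = { 2 };
	struct page shared = { &file }, private_copy = { &other };
	struct zap_details check = { &file, ZAP_FLAG_CHECK_MAPPING };
	struct zap_details nocheck = { &file, 0 };

	assert(!zap_check_mapping_skip(&check, &shared));	/* same mapping: zap */
	assert(zap_check_mapping_skip(&check, &private_copy));	/* other mapping: skip */
	assert(!zap_check_mapping_skip(&nocheck, &private_copy)); /* flag clear: zap */
	assert(!zap_check_mapping_skip(NULL, &private_copy));	/* no details: zap */
}
```

The point of the flag is visible in the third case: zap_mapping is set, yet
nothing is skipped because ZAP_FLAG_CHECK_MAPPING is clear.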

Some old comments in zap_pte_range() have been removed because they were
duplicated; now that we have the ZAP_FLAG_CHECK_MAPPING flag, this
information is easy to find by simply grepping for the flag.

It will also make life easier when we want to, e.g., pass zap_flags into
callers like unmap_mapping_pages() (instead of adding new booleans besides
the even_cows parameter).

Signed-off-by: Peter Xu
---
 include/linux/mm.h | 19 ++++++++++++++++++-
 mm/memory.c        | 31 ++++++++-----------------------
 2 files changed, 26 insertions(+), 24 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 389dd91134f9..bb513a346beb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1702,13 +1702,30 @@ static inline bool can_do_mlock(void) { return false; }
 extern int user_shm_lock(size_t, struct user_struct *);
 extern void user_shm_unlock(size_t, struct user_struct *);
 
+/* Whether to check page->mapping when zapping */
+#define ZAP_FLAG_CHECK_MAPPING BIT(0)
+
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
  */
 struct zap_details {
-	struct address_space *check_mapping;	/* Check page->mapping if set */
+	struct address_space *zap_mapping;	/* Check page->mapping if set */
+	unsigned long zap_flags;		/* Special flags for zapping */
 };
 
+/* Return true if skip zapping this page, false otherwise */
+static inline bool
+zap_check_mapping_skip(struct zap_details *details, struct page *page)
+{
+	if (!details || !page)
+		return false;
+
+	if (!(details->zap_flags & ZAP_FLAG_CHECK_MAPPING))
+		return false;
+
+	return details->zap_mapping != page_rmapping(page);
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 5e6175e00617..2e540b315481 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1242,16 +1242,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			struct page *page;
 
 			page = vm_normal_page(vma, addr, ptent);
-			if (unlikely(details) && page) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping &&
-				    details->check_mapping != page_rmapping(page))
-					continue;
-			}
+			if (unlikely(zap_check_mapping_skip(details, page)))
+				continue;
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
@@ -1283,17 +1275,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		if (is_device_private_entry(entry)) {
 			struct page *page = device_private_entry_to_page(entry);
 
-			if (unlikely(details && details->check_mapping)) {
-				/*
-				 * unmap_shared_mapping_pages() wants to
-				 * invalidate cache without truncating:
-				 * unmap shared but keep private pages.
-				 */
-				if (details->check_mapping !=
-				    page_rmapping(page))
-					continue;
-			}
-
+			if (unlikely(zap_check_mapping_skip(details, page)))
+				continue;
 			pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
 			rss[mm_counter(page)]--;
 			page_remove_rmap(page, false);
@@ -3242,9 +3225,11 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
 	pgoff_t first_index = start, last_index = start + nr - 1;
-	struct zap_details details = { };
+	struct zap_details details = { .zap_mapping = mapping };
+
+	if (!even_cows)
+		details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;
 
-	details.check_mapping = even_cows ? NULL : mapping;
 	if (last_index < first_index)
 		last_index = ULONG_MAX;
 

From patchwork Tue Mar 23 00:48:57 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156491
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7F67CC433C1 for ; Tue, 23 Mar 2021 00:49:41 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 009D16196C for ; Tue, 23 Mar 2021 00:49:40 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 009D16196C
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix) id 8645E6B0124; Mon, 22 Mar 2021 20:49:31 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from
userid 40) id 7C6D06B0125; Mon, 22 Mar 2021 20:49:31 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5CBC66B0127; Mon, 22 Mar 2021 20:49:31 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0124.hostedemail.com [216.40.44.124]) by kanga.kvack.org (Postfix) with ESMTP id 27C466B0125 for ; Mon, 22 Mar 2021 20:49:31 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id D7E44180226ED for ; Tue, 23 Mar 2021 00:49:30 +0000 (UTC) X-FDA: 77949305700.03.A9965A9 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf06.hostedemail.com (Postfix) with ESMTP id ED255C0007CA for ; Tue, 23 Mar 2021 00:49:29 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460569; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=t66g/F9+6bLvOsk6XEcPT7bVSiazBzgvig/355lAXQE=; b=fazX2JDx6zjlqrJNQmADyFAJxboycQAnfAUgs3Cx5lW0BynMTchcub5VH4z5XezL4bQwvf yfAz0qWWXf+7jRb8pPmfaWxh3/eeynAaJRR/T78wGv970tUnTSpz7di+Cns8LSaZwAYGvQ wGCt0oo0zKr7fqhBm8Xig4qBvRNmVZ8= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-277-w0aEHrBXOLiwUnaEsz6zJw-1; Mon, 22 Mar 2021 20:49:28 -0400 X-MC-Unique: w0aEHrBXOLiwUnaEsz6zJw-1 Received: by mail-qt1-f200.google.com with SMTP id d11so413529qth.3 for ; Mon, 22 Mar 2021 17:49:28 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=t66g/F9+6bLvOsk6XEcPT7bVSiazBzgvig/355lAXQE=; b=FRGSfQy7zUlcIRJEOO7xTW8s2ihNbixU/O4/tJZH7/3YFR8s3WDf27xenoxAtHjb6P 169OFyGByrWB2OQidSRLphgbwSiuwdTUYo3N/5pkinvs2Ss4npgW5iKx2GUmoLLcAyaY mzwKf4v+605Z5VvBPLBf3GFtpVM6f4EQoL6e9qPch4lcBrEkxCevgg6Tk/0ffKVpjm1l TuROgmA0BUkPztQ/r/z/nbAXorSkaho/MfQBOK3u6oOMydEsn8W7tFEkipgb6+NaMkNF uFE+2uezVqukmCKq+wVE3ajj5lzIyBmpmkL1jeD+0prsr/OYjvxOTNCwfFJbw1jHu22i FjvQ== X-Gm-Message-State: AOAM5331jFch8lfuourkVoqSOwEauIC75Rns6J0xpsP90wG7jd64cekH LA9pFQJGxvTioH8f1ZeMOveQLh41+n2vpgEERvyZzeGri8f34hV4koZ828dGHI2jV7AAaOU4UgO hjE7ENvBu7gY= X-Received: by 2002:ac8:520d:: with SMTP id r13mr2445788qtn.38.1616460567842; Mon, 22 Mar 2021 17:49:27 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxQ3OkZpVOpbW4WMpnPVKmwjNCxNhfbWMLukWj6w5MQB4ZMF7sL7y6kqxbDNBAVeG3FmEY40Q== X-Received: by 2002:ac8:520d:: with SMTP id r13mr2445775qtn.38.1616460567535; Mon, 22 Mar 2021 17:49:27 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.25 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:26 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . 
 Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport
Subject: [PATCH 08/23] mm: Introduce ZAP_FLAG_SKIP_SWAP
Date: Mon, 22 Mar 2021 20:48:57 -0400
Message-Id: <20210323004912.35132-9-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>
MIME-Version: 1.0
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
X-Stat-Signature: 456djc51juten4yfn5gytkz3x19ng8pu
X-Rspamd-Server: rspam05
X-Rspamd-Queue-Id: ED255C0007CA
Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf06; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124
X-HE-DKIM-Result: pass/pass
X-HE-Tag: 1616460569-128890
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Firstly, the comment in zap_pte_range() is misleading: the code checks
against details rather than check_mapping, so the comment contradicts what
the code does.

Meanwhile, it is also confusing because it does not explain why passing in
the details pointer means skipping all swap entries. New users of zap_details
could easily miss this fact unless they read all the way down into
zap_pte_range(), because there is no comment at zap_details mentioning it at
all, so swap entries could be erroneously skipped without anyone noticing.

This partly reverts 3e8715fdc03e ("mm: drop zap_details::check_swap_entries"),
but introduces the ZAP_FLAG_SKIP_SWAP flag, which means the opposite of the
previous "details" parameter: the caller must explicitly set it to skip swap
entries; otherwise swap entries will always be considered (which is still the
common case here).
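
As an aside for readers of the archive (not part of the patch itself): the
pitfall described above can be shown by comparing the old implicit rule with
the new explicit flag. This is a minimal userspace sketch under assumptions:
struct zap_details is reduced to its flags field, and old_skip_swap() and
zap_skip_swap_example() are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ZAP_FLAG_CHECK_MAPPING	(1UL << 0)	/* BIT(0) in the kernel */
#define ZAP_FLAG_SKIP_SWAP	(1UL << 1)	/* BIT(1) in the kernel */

/* Reduced stand-in: only the field this sketch needs. */
struct zap_details {
	unsigned long zap_flags;
};

/* Hypothetical model of the old rule: any non-NULL details skips swap. */
static bool old_skip_swap(const struct zap_details *details)
{
	return details != NULL;
}

/* Same logic as the helper added by the patch. */
static bool zap_skip_swap(const struct zap_details *details)
{
	if (!details)
		return false;

	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
}

/* Hypothetical usage example. */
static void zap_skip_swap_example(void)
{
	/* A new zap_details user that only wants the mapping check. */
	struct zap_details check_only = { ZAP_FLAG_CHECK_MAPPING };
	struct zap_details skip = { ZAP_FLAG_SKIP_SWAP };

	assert(old_skip_swap(&check_only));	/* swap silently skipped */
	assert(!zap_skip_swap(&check_only));	/* swap entries considered */
	assert(zap_skip_swap(&skip));		/* skipped only on request */
	assert(!zap_skip_swap(NULL));
}
```

Under the old rule the first caller would have lost its swap entries without
asking for that; with the flag, skipping is opt-in.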

Cc: Kirill A. Shutemov
Signed-off-by: Peter Xu
---
 include/linux/mm.h | 12 ++++++++++++
 mm/memory.c        |  8 +++++---
 2 files changed, 17 insertions(+), 3 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index bb513a346beb..c11fbce0d557 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1704,6 +1704,8 @@ extern void user_shm_unlock(size_t, struct user_struct *);
 
 /* Whether to check page->mapping when zapping */
 #define ZAP_FLAG_CHECK_MAPPING BIT(0)
+/* Whether to skip zapping swap entries */
+#define ZAP_FLAG_SKIP_SWAP BIT(1)
 
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
@@ -1726,6 +1728,16 @@ zap_check_mapping_skip(struct zap_details *details, struct page *page)
 	return details->zap_mapping != page_rmapping(page);
 }
 
+/* Return true if skip swap entries, false otherwise */
+static inline bool
+zap_skip_swap(struct zap_details *details)
+{
+	if (!details)
+		return false;
+
+	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/mm/memory.c b/mm/memory.c
index 2e540b315481..a02c4d851cd4 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1284,8 +1284,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
-		/* If details->check_mapping, we leave swap entries. */
-		if (unlikely(details))
+		if (unlikely(zap_skip_swap(details)))
 			continue;
 
 		if (!non_swap_entry(entry))
@@ -3225,7 +3224,10 @@ void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
 		pgoff_t nr, bool even_cows)
 {
 	pgoff_t first_index = start, last_index = start + nr - 1;
-	struct zap_details details = { .zap_mapping = mapping };
+	struct zap_details details = {
+		.zap_mapping = mapping,
+		.zap_flags = ZAP_FLAG_SKIP_SWAP,
+	};
 
 	if (!even_cows)
 		details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;

From patchwork Tue Mar 23 00:48:58 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156493
Return-Path:
X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0
Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 5089AC433C1 for ; Tue, 23 Mar 2021 00:49:44 +0000 (UTC)
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id C590D6196C for ; Tue, 23 Mar 2021 00:49:43 +0000 (UTC)
DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org C590D6196C
Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com
Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix) id AD8726B0125; Mon, 22 Mar 2021 20:49:34 -0400 (EDT)
Received: by kanga.kvack.org (Postfix, from userid 40) id A0FF86B0128; Mon, 22 Mar 2021 20:49:34 -0400 (EDT)
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042) id 79E916B0129; Mon, 22 Mar 2021
20:49:34 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0132.hostedemail.com [216.40.44.132]) by kanga.kvack.org (Postfix) with ESMTP id 565986B0125 for ; Mon, 22 Mar 2021 20:49:34 -0400 (EDT) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 19C25181AF5C4 for ; Tue, 23 Mar 2021 00:49:34 +0000 (UTC) X-FDA: 77949305868.25.E184255 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf24.hostedemail.com (Postfix) with ESMTP id 73D67A0000FC for ; Tue, 23 Mar 2021 00:49:32 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460572; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=ycNGR7jI4EfGPoKNtAWo/hwPX/71F3+EQicoRsueR2k=; b=H5m9HpNjfO1btp+t8O5pBj5SJs1NIz7ACKnm6AGEShkrT3/dLU+XR7aHxuPEwxsvOz0ABi aEpOu4Nrp9aPF36rCSWZcdpgzVi/WdfiKyhuEsfWRe5wrtTShUQJNG69tx2RIAIYLtp6GA LpO5UDOlupF4/3opvzR0CKJS5/mE5/k= Received: from mail-qk1-f197.google.com (mail-qk1-f197.google.com [209.85.222.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-9-vV0Ni2nPPgSx6LWfAGwSPw-1; Mon, 22 Mar 2021 20:49:30 -0400 X-MC-Unique: vV0Ni2nPPgSx6LWfAGwSPw-1 Received: by mail-qk1-f197.google.com with SMTP id 130so833058qkm.0 for ; Mon, 22 Mar 2021 17:49:30 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=ycNGR7jI4EfGPoKNtAWo/hwPX/71F3+EQicoRsueR2k=; b=GZ0yKYEdKqvWioegr29PQHDmqLh7Y3HNmLoc56thODkoyQO0/f7e0zg6TCbDWZ5QMC QjqJSCZ2dOaATW9X934qgPp6J7JKz1SynYcUiyIwE3wPzkZdGUfN3r5CMZPpRFa8tk2Q 
2DiCeSM/BQBmiJjpjzpHn9mY05Y+yvdHHNHqjVweSsd748H7b2sa5qd70Nz2EN7eeHrm c9ATGpyNbCA15zIjQkQfuaMcucxTJ29Nt7HSNjENtAndaXXmRam5BHTxXqYi25fMxDbU M2MzQTtv9Jr+EzvDt4iMdJg+WlrItUJnS77smzr7h5zSguGm13sAK0P2JTrBydFh6LEV IKBA== X-Gm-Message-State: AOAM530lEQ1/3ejbL0lFxugPBsppQMOP/xSuQOvAvTEo8i7vEudINnXU lvQvHesNfk6A76T0hMSm/MyfHTzTpir8UV1kXMGOva72nzsmPQCeYYNgqe+KZALWOZ94V90Dg/6 GQbdW0SYOudw= X-Received: by 2002:ac8:6059:: with SMTP id k25mr2375048qtm.251.1616460569741; Mon, 22 Mar 2021 17:49:29 -0700 (PDT) X-Google-Smtp-Source: ABdhPJwtV0VT7cWYfkT7bF1v+/17du36Bhz1HhGotTYXOvU4CEqVh7lpJjcB0mZq2d3qpwefNyxi1w== X-Received: by 2002:ac8:6059:: with SMTP id k25mr2375035qtm.251.1616460569403; Mon, 22 Mar 2021 17:49:29 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.27 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:28 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . 
 Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport
Subject: [PATCH 09/23] mm: Pass zap_flags into unmap_mapping_pages()
Date: Mon, 22 Mar 2021 20:48:58 -0400
Message-Id: <20210323004912.35132-10-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>
MIME-Version: 1.0
Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: redhat.com
X-Rspamd-Server: rspam04
X-Rspamd-Queue-Id: 73D67A0000FC
X-Stat-Signature: 6d5a9qnimhpkq5bfibpq5sx4m39xxpwi
Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf24; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124
X-HE-DKIM-Result: pass/pass
X-HE-Tag: 1616460572-397971
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:

Give unmap_mapping_pages() more power by allowing callers to specify zap
flags, so that they can pass in more information than "whether we'd also
like to zap cow pages". With the new parameter, we can remove the even_cows
parameter, because even_cows==false is equivalent to
zap_flags==ZAP_FLAG_CHECK_MAPPING, while even_cows==true means passing in no
zap flag (though in most cases we have had even_cows==false).

No functional change intended.
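
As an aside for readers of the archive (not part of the patch itself): the
even_cows to zap_flags translation can be written down as two small
functions. This is a minimal userspace sketch; even_cows_to_zap_flags(),
effective_zap_flags() and zap_flags_example() are hypothetical names used
only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

#define ZAP_FLAG_CHECK_MAPPING	(1UL << 0)	/* BIT(0) in the kernel */
#define ZAP_FLAG_SKIP_SWAP	(1UL << 1)	/* BIT(1) in the kernel */

/*
 * Hypothetical helper: the value unmap_mapping_range() passes on for its
 * even_cows argument after this patch.
 */
static unsigned long even_cows_to_zap_flags(bool even_cows)
{
	return even_cows ? 0 : ZAP_FLAG_CHECK_MAPPING;
}

/*
 * Hypothetical helper: the flags that unmap_mapping_pages() stores in
 * zap_details.zap_flags; ZAP_FLAG_SKIP_SWAP is always added.
 */
static unsigned long effective_zap_flags(unsigned long zap_flags)
{
	return zap_flags | ZAP_FLAG_SKIP_SWAP;
}

/* Hypothetical usage example. */
static void zap_flags_example(void)
{
	/* Former "false" callers now pass ZAP_FLAG_CHECK_MAPPING. */
	assert(even_cows_to_zap_flags(false) == ZAP_FLAG_CHECK_MAPPING);
	/* Former "true" callers pass no flag at all. */
	assert(even_cows_to_zap_flags(true) == 0);
	/* Either way, swap entries keep being skipped on this path. */
	assert(effective_zap_flags(0) == ZAP_FLAG_SKIP_SWAP);
}
```

Both translations yield the same zap_details contents as the even_cows
version did, which is what "no functional change" relies on.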
Signed-off-by: Peter Xu --- fs/dax.c | 10 ++++++---- include/linux/mm.h | 4 ++-- mm/khugepaged.c | 3 ++- mm/memory.c | 15 ++++++++------- mm/truncate.c | 11 +++++++---- 5 files changed, 25 insertions(+), 18 deletions(-) diff --git a/fs/dax.c b/fs/dax.c index 177b7d305a52..dd90a35d38be 100644 --- a/fs/dax.c +++ b/fs/dax.c @@ -514,7 +514,7 @@ static void *grab_mapping_entry(struct xa_state *xas, xas_unlock_irq(xas); unmap_mapping_pages(mapping, xas->xa_index & ~PG_PMD_COLOUR, - PG_PMD_NR, false); + PG_PMD_NR, ZAP_FLAG_CHECK_MAPPING); xas_reset(xas); xas_lock_irq(xas); } @@ -609,7 +609,8 @@ struct page *dax_layout_busy_page_range(struct address_space *mapping, * guaranteed to either see new references or prevent new * references from being established. */ - unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, 0); + unmap_mapping_pages(mapping, start_idx, end_idx - start_idx + 1, + ZAP_FLAG_CHECK_MAPPING); xas_lock_irq(&xas); xas_for_each(&xas, entry, end_idx) { @@ -740,9 +741,10 @@ static void *dax_insert_entry(struct xa_state *xas, /* we are replacing a zero page with block mapping */ if (dax_is_pmd_entry(entry)) unmap_mapping_pages(mapping, index & ~PG_PMD_COLOUR, - PG_PMD_NR, false); + PG_PMD_NR, ZAP_FLAG_CHECK_MAPPING); else /* pte entry */ - unmap_mapping_pages(mapping, index, 1, false); + unmap_mapping_pages(mapping, index, 1, + ZAP_FLAG_CHECK_MAPPING); } xas_reset(xas); diff --git a/include/linux/mm.h b/include/linux/mm.h index c11fbce0d557..d38cd23a08be 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1784,7 +1784,7 @@ extern int fixup_user_fault(struct mm_struct *mm, unsigned long address, unsigned int fault_flags, bool *unlocked); void unmap_mapping_pages(struct address_space *mapping, - pgoff_t start, pgoff_t nr, bool even_cows); + pgoff_t start, pgoff_t nr, unsigned long zap_flags); void unmap_mapping_range(struct address_space *mapping, loff_t const holebegin, loff_t const holelen, int even_cows); #else @@ -1804,7 +1804,7 @@ static 
inline int fixup_user_fault(struct mm_struct *mm, unsigned long address, return -EFAULT; } static inline void unmap_mapping_pages(struct address_space *mapping, - pgoff_t start, pgoff_t nr, bool even_cows) { } + pgoff_t start, pgoff_t nr, unsigned long zap_flags) { } static inline void unmap_mapping_range(struct address_space *mapping, loff_t const holebegin, loff_t const holelen, int even_cows) { } #endif diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 419a6acce326..7c75dff637e2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -1837,7 +1837,8 @@ static void collapse_file(struct mm_struct *mm, } if (page_mapped(page)) - unmap_mapping_pages(mapping, index, 1, false); + unmap_mapping_pages(mapping, index, 1, + ZAP_FLAG_CHECK_MAPPING); xas_lock_irq(&xas); xas_set(&xas, index); diff --git a/mm/memory.c b/mm/memory.c index a02c4d851cd4..36204b898894 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -3213,7 +3213,10 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root, * @mapping: The address space containing pages to be unmapped. * @start: Index of first page to be unmapped. * @nr: Number of pages to be unmapped. 0 to unmap to end of file. - * @even_cows: Whether to unmap even private COWed pages. + * @zap_flags: Zap flags for the process. E.g., when ZAP_FLAG_CHECK_MAPPING is + * passed into it, we will only zap the pages that are in the same mapping + * specified in the @mapping parameter; otherwise we will not check mapping, + * IOW cow pages will be zapped too. * * Unmap the pages in this address space from any userspace process which * has them mmaped. Generally, you want to remove COWed pages as well when @@ -3221,17 +3224,14 @@ static inline void unmap_mapping_range_tree(struct rb_root_cached *root, * cache. 
  */
 void unmap_mapping_pages(struct address_space *mapping, pgoff_t start,
-		pgoff_t nr, bool even_cows)
+		pgoff_t nr, unsigned long zap_flags)
 {
 	pgoff_t first_index = start, last_index = start + nr - 1;
 	struct zap_details details = {
 		.zap_mapping = mapping,
-		.zap_flags = ZAP_FLAG_SKIP_SWAP,
+		.zap_flags = zap_flags | ZAP_FLAG_SKIP_SWAP,
 	};
 
-	if (!even_cows)
-		details.zap_flags |= ZAP_FLAG_CHECK_MAPPING;
-
 	if (last_index < first_index)
 		last_index = ULONG_MAX;
 
@@ -3273,7 +3273,8 @@ void unmap_mapping_range(struct address_space *mapping,
 			hlen = ULONG_MAX - hba + 1;
 	}
 
-	unmap_mapping_pages(mapping, hba, hlen, even_cows);
+	unmap_mapping_pages(mapping, hba, hlen, even_cows ?
+			    0 : ZAP_FLAG_CHECK_MAPPING);
 }
 EXPORT_SYMBOL(unmap_mapping_range);
 
diff --git a/mm/truncate.c b/mm/truncate.c
index 95af244b112a..ba2cbe300e83 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -172,7 +172,8 @@ truncate_cleanup_page(struct address_space *mapping, struct page *page)
 {
 	if (page_mapped(page)) {
 		unsigned int nr = thp_nr_pages(page);
-		unmap_mapping_pages(mapping, page->index, nr, false);
+		unmap_mapping_pages(mapping, page->index, nr,
+				    ZAP_FLAG_CHECK_MAPPING);
 	}
 
 	if (page_has_private(page))
@@ -652,14 +653,15 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 					 * Zap the rest of the file in one hit.
 					 */
 					unmap_mapping_pages(mapping, index,
-						(1 + end - index), false);
+						(1 + end - index),
+						ZAP_FLAG_CHECK_MAPPING);
 					did_range_unmap = 1;
 				} else {
 					/*
 					 * Just zap this page
 					 */
 					unmap_mapping_pages(mapping, index,
-								1, false);
+						1, ZAP_FLAG_CHECK_MAPPING);
 				}
 			}
 			BUG_ON(page_mapped(page));
@@ -685,7 +687,8 @@ int invalidate_inode_pages2_range(struct address_space *mapping,
 	 * get remapped later.
 	 */
 	if (dax_mapping(mapping)) {
-		unmap_mapping_pages(mapping, start, end - start + 1, false);
+		unmap_mapping_pages(mapping, start, end - start + 1,
+				    ZAP_FLAG_CHECK_MAPPING);
 	}
 out:
 	cleancache_invalidate_inode(mapping);

From patchwork Tue Mar 23 00:48:59 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156495
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com,
 Nadav Amit, Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 10/23] shmem/userfaultfd: Persist uffd-wp bit across zapping for file-backed
Date: Mon, 22 Mar 2021 20:48:59 -0400
Message-Id: <20210323004912.35132-11-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

File-backed memory can be unmapped at any time. When that happens, all
information in the pte is dropped, including the uffd-wp flag.

Since the uffd-wp info cannot be stored in the page cache or swap cache,
persist this wr-protect information by installing the special uffd-wp
marker pte when we're about to unmap a uffd wr-protected pte. When the pte
is accessed again, we will know it was previously wr-protected by
recognizing the special pte.

Meanwhile, add a new flag ZAP_FLAG_DROP_FILE_UFFD_WP for the cases where we
don't want to persist such information, for example when destroying the
whole vma or punching a hole in a shmem file. For the latter, we can only
drop the uffd-wp bit while holding the page lock. This means the
unmap_mapping_range() in shmem_fallocate() still needs to zap without
ZAP_FLAG_DROP_FILE_UFFD_WP, because it still races with page faults.

Signed-off-by: Peter Xu
---
 include/linux/mm.h        | 11 ++++++++++
 include/linux/mm_inline.h | 43 +++++++++++++++++++++++++++++++++++++++
 mm/memory.c               | 42 +++++++++++++++++++++++++++++++++++++-
 mm/rmap.c                 |  8 ++++++++
 mm/truncate.c             |  8 +++++++-
 5 files changed, 110 insertions(+), 2 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index d38cd23a08be..9cd215454cdb 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -1706,6 +1706,8 @@ extern void user_shm_unlock(size_t, struct user_struct *);
 #define ZAP_FLAG_CHECK_MAPPING BIT(0)
 /* Whether to skip zapping swap entries */
 #define ZAP_FLAG_SKIP_SWAP BIT(1)
+/* Whether to completely drop uffd-wp entries for file-backed memory */
+#define ZAP_FLAG_DROP_FILE_UFFD_WP BIT(2)
 
 /*
  * Parameter block passed down to zap_pte_range in exceptional cases.
@@ -1738,6 +1740,15 @@ zap_skip_swap(struct zap_details *details)
 	return details->zap_flags & ZAP_FLAG_SKIP_SWAP;
 }
 
+static inline bool
+zap_drop_file_uffd_wp(struct zap_details *details)
+{
+	if (!details)
+		return false;
+
+	return details->zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP;
+}
+
 struct page *vm_normal_page(struct vm_area_struct *vma, unsigned long addr,
 			     pte_t pte);
 struct page *vm_normal_page_pmd(struct vm_area_struct *vma, unsigned long addr,
diff --git a/include/linux/mm_inline.h b/include/linux/mm_inline.h
index 355ea1ee32bd..c29a6ef3a642 100644
--- a/include/linux/mm_inline.h
+++ b/include/linux/mm_inline.h
@@ -4,6 +4,8 @@
 
 #include
 #include
+#include
+#include
 
 /**
  * page_is_file_lru - should the page be on a file LRU or anon LRU?
@@ -104,4 +106,45 @@ static __always_inline void del_page_from_lru_list(struct page *page,
 	update_lru_size(lruvec, page_lru(page), page_zonenum(page),
 			-thp_nr_pages(page));
 }
+
+/*
+ * If this pte is wr-protected by uffd-wp in any form, arm the special pte to
+ * replace a none pte. NOTE! This should only be called when *pte is already
+ * cleared so we will never accidentally replace something valuable. Meanwhile
+ * none pte also means we are not demoting the pte so if tlb flushed then we
+ * don't need to do it again; otherwise if tlb flush is postponed then it's
+ * even better.
+ *
+ * Must be called with pgtable lock held.
+ */
+static inline void
+pte_install_uffd_wp_if_needed(struct vm_area_struct *vma, unsigned long addr,
+			      pte_t *pte, pte_t pteval)
+{
+#ifdef CONFIG_USERFAULTFD
+	bool arm_uffd_pte = false;
+
+	/* The current status of the pte should be "cleared" before calling */
+	WARN_ON_ONCE(!pte_none(*pte));
+
+	if (vma_is_anonymous(vma))
+		return;
+
+	/* A uffd-wp wr-protected normal pte */
+	if (unlikely(pte_present(pteval) && pte_uffd_wp(pteval)))
+		arm_uffd_pte = true;
+
+	/*
+	 * A uffd-wp wr-protected swap pte. Note: this should even work for
+	 * pte_swp_uffd_wp_special() too.
+	 */
+	if (unlikely(is_swap_pte(pteval) && pte_swp_uffd_wp(pteval)))
+		arm_uffd_pte = true;
+
+	if (unlikely(arm_uffd_pte))
+		set_pte_at(vma->vm_mm, addr, pte,
+			   pte_swp_mkuffd_wp_special(vma));
+#endif
+}
+
 #endif
diff --git a/mm/memory.c b/mm/memory.c
index 36204b898894..8be28bcaa044 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -73,6 +73,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1210,6 +1211,21 @@ copy_page_range(struct vm_area_struct *dst_vma, struct vm_area_struct *src_vma)
 	return ret;
 }
 
+/*
+ * This function makes sure that we'll replace the none pte with an uffd-wp
+ * swap special pte marker when necessary. Must be with the pgtable lock held.
+ */
+static inline void
+zap_install_uffd_wp_if_needed(struct vm_area_struct *vma,
+			      unsigned long addr, pte_t *pte,
+			      struct zap_details *details, pte_t pteval)
+{
+	if (zap_drop_file_uffd_wp(details))
+		return;
+
+	pte_install_uffd_wp_if_needed(vma, addr, pte, pteval);
+}
+
 static unsigned long zap_pte_range(struct mmu_gather *tlb,
 				struct vm_area_struct *vma, pmd_t *pmd,
 				unsigned long addr, unsigned long end,
@@ -1247,6 +1263,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			ptent = ptep_get_and_clear_full(mm, addr, pte,
 							tlb->fullmm);
 			tlb_remove_tlb_entry(tlb, pte, addr);
+			zap_install_uffd_wp_if_needed(vma, addr, pte, details,
+						      ptent);
 			if (unlikely(!page))
 				continue;
 
@@ -1271,6 +1289,22 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			continue;
 		}
 
+		/*
+		 * If this is a special uffd-wp marker pte... Drop it only if
+		 * enforced to do so.
+		 */
+		if (unlikely(is_swap_special_pte(ptent))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(ptent));
+			/*
+			 * If this is a common unmap of ptes, keep this as is.
+			 * Drop it only if this is a whole-vma destruction.
+			 */
+			if (zap_drop_file_uffd_wp(details))
+				ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			continue;
+		}
+
 		entry = pte_to_swp_entry(ptent);
 		if (is_device_private_entry(entry)) {
 			struct page *page = device_private_entry_to_page(entry);
@@ -1281,6 +1315,8 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 			rss[mm_counter(page)]--;
 			page_remove_rmap(page, false);
 			put_page(page);
+			zap_install_uffd_wp_if_needed(vma, addr, pte, details,
+						      ptent);
 			continue;
 		}
 
@@ -1298,6 +1334,7 @@ static unsigned long zap_pte_range(struct mmu_gather *tlb,
 		if (unlikely(!free_swap_and_cache(entry)))
 			print_bad_pte(vma, addr, ptent, NULL);
 		pte_clear_not_present_full(mm, addr, pte, tlb->fullmm);
+		zap_install_uffd_wp_if_needed(vma, addr, pte, details, ptent);
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 
 	add_mm_rss_vec(mm, rss);
@@ -1497,12 +1534,15 @@ void unmap_vmas(struct mmu_gather *tlb,
 		unsigned long end_addr)
 {
 	struct mmu_notifier_range range;
+	struct zap_details details = {
+		.zap_flags = ZAP_FLAG_DROP_FILE_UFFD_WP,
+	};
 
 	mmu_notifier_range_init(&range, MMU_NOTIFY_UNMAP, 0, vma, vma->vm_mm,
 				start_addr, end_addr);
 	mmu_notifier_invalidate_range_start(&range);
 	for ( ; vma && vma->vm_start < end_addr; vma = vma->vm_next)
-		unmap_single_vma(tlb, vma, start_addr, end_addr, NULL);
+		unmap_single_vma(tlb, vma, start_addr, end_addr, &details);
 	mmu_notifier_invalidate_range_end(&range);
 }
 
diff --git a/mm/rmap.c b/mm/rmap.c
index b0fc27e77d6d..5e25c57164fc 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -72,6 +72,7 @@
 #include
 #include
 #include
+#include
 
 #include
 
@@ -1571,6 +1572,13 @@ static bool try_to_unmap_one(struct page *page, struct vm_area_struct *vma,
 			pteval = ptep_clear_flush(vma, address, pvmw.pte);
 		}
 
+		/*
+		 * Now the pte is cleared. If this is uffd-wp armed pte, we
+		 * may want to replace a none pte with a marker pte if it's
+		 * file-backed, so we don't lose the tracking information.
+		 */
+		pte_install_uffd_wp_if_needed(vma, address, pvmw.pte, pteval);
+
 		/* Move the dirty bit to the page. Now the pte is gone. */
 		if (pte_dirty(pteval))
 			set_page_dirty(page);
diff --git a/mm/truncate.c b/mm/truncate.c
index ba2cbe300e83..65fed21e52bd 100644
--- a/mm/truncate.c
+++ b/mm/truncate.c
@@ -173,7 +173,13 @@ truncate_cleanup_page(struct address_space *mapping, struct page *page)
 	if (page_mapped(page)) {
 		unsigned int nr = thp_nr_pages(page);
 		unmap_mapping_pages(mapping, page->index, nr,
-				    ZAP_FLAG_CHECK_MAPPING);
+				    ZAP_FLAG_CHECK_MAPPING |
+				    /*
+				     * Now it's safe to drop uffd-wp because
+				     * we're with page lock, and the page is
+				     * being truncated.
+				     */
+				    ZAP_FLAG_DROP_FILE_UFFD_WP);
 	}
 
 	if (page_has_private(page))

From patchwork Tue Mar 23 00:49:00 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156497
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com,
 Nadav Amit, Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 11/23] shmem/userfaultfd: Allow wr-protect none pte for file-backed mem
Date: Mon, 22 Mar 2021 20:49:00 -0400
Message-Id: <20210323004912.35132-12-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

File-backed memory differs from anonymous memory in that even if the pte is
missing, the data could still reside either in the file or in the page/swap
cache. So when wr-protecting a pte, we need to consider none ptes too.

We do that by installing the uffd-wp special swap pte as a marker. When
there's a future write to the pte, the fault handler will take the special
path to first fault in the page as read-only, then report to the
userfaultfd server with the wr-protect message.

On the other hand, when unprotecting a page, it's also possible that the
pte got unmapped but replaced by the special uffd-wp marker.
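
As a rough illustration only (not part of this patch), the pte-level
transitions this commit message describes for non-present ptes can be
modeled in userspace C. The enum values and model_change_pte() below are
hypothetical names made up for this sketch; the real logic lives in
change_pte_range(), works on pte bits, and runs under the pgtable lock.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified pte states; the kernel encodes these in pte bits. */
enum model_pte {
	MODEL_PTE_NONE,			/* nothing mapped */
	MODEL_PTE_UFFD_WP_MARKER,	/* uffd-wp special swap pte */
};

/* Hypothetical request kinds, standing in for the uffd-wp change flags. */
enum model_req {
	MODEL_REQ_WP,		/* wr-protect the pte */
	MODEL_REQ_RESOLVE,	/* remove the wr-protection */
};

/*
 * Apply one request to one non-present pte of a vma.
 * Returns the number of ptes changed (0 or 1).
 */
static int model_change_pte(enum model_pte *pte, bool vma_is_anon,
			    enum model_req req)
{
	assert(pte != NULL);

	/* Anonymous memory has no backing cache: leave none ptes alone. */
	if (vma_is_anon)
		return 0;

	if (*pte == MODEL_PTE_UFFD_WP_MARKER) {
		/* Unprotect: turn the marker back into a none pte. */
		if (req == MODEL_REQ_RESOLVE) {
			*pte = MODEL_PTE_NONE;
			return 1;
		}
		return 0;
	}

	/* None pte: remember the wr-protection by installing a marker. */
	if (req == MODEL_REQ_WP) {
		*pte = MODEL_PTE_UFFD_WP_MARKER;
		return 1;
	}
	return 0;
}
```

In this model, wr-protecting a none pte of a file-backed vma yields the
marker, resolving it yields a none pte again, and anonymous vmas are left
untouched.
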
Unprotecting therefore needs to be able to turn a uffd-wp special swap pte
back into a none pte, so that the next access to the page faults in as
usual rather than sending a uffd-wp message.

Special care needs to be taken throughout the change_protection_range()
process. Since we now allow the user to wr-protect a none pte, we need to
be able to pre-populate the page table entries when we see
!anonymous && MM_CP_UFFD_WP requests; otherwise change_protection_range()
will always skip the range when the pgtable entry does not exist.

Note that this patch only covers small pages (pte level) and does not cover
transparent huge pages yet, but it will be a base for thps too.

Signed-off-by: Peter Xu
---
 mm/mprotect.c | 48 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 48 insertions(+)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index b3def0a102bf..6b63e3544b47 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -29,6 +29,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -176,6 +177,32 @@ static unsigned long change_pte_range(struct vm_area_struct *vma, pmd_t *pmd,
 				set_pte_at(vma->vm_mm, addr, pte, newpte);
 				pages++;
 			}
+		} else if (unlikely(is_swap_special_pte(oldpte))) {
+			if (uffd_wp_resolve && !vma_is_anonymous(vma) &&
+			    pte_swp_uffd_wp_special(oldpte)) {
+				/*
+				 * This is uffd-wp special pte and we'd like to
+				 * unprotect it. What we need to do is simply
+				 * recover the pte into a none pte; the next
+				 * page fault will fault in the page.
+				 */
+				pte_clear(vma->vm_mm, addr, pte);
+				pages++;
+			}
+		} else {
+			/* It must be an none page, or what else?.. */
+			WARN_ON_ONCE(!pte_none(oldpte));
+			if (unlikely(uffd_wp && !vma_is_anonymous(vma))) {
+				/*
+				 * For file-backed mem, we need to be able to
+				 * wr-protect even for a none pte! Because
+				 * even if the pte is null, the page/swap cache
+				 * could exist.
+				 */
+				set_pte_at(vma->vm_mm, addr, pte,
+					   pte_swp_mkuffd_wp_special(vma));
+				pages++;
+			}
 		}
 	} while (pte++, addr += PAGE_SIZE, addr != end);
 	arch_leave_lazy_mmu_mode();
@@ -209,6 +236,25 @@ static inline int pmd_none_or_clear_bad_unless_trans_huge(pmd_t *pmd)
 	return 0;
 }
 
+/*
+ * File-backed vma allows uffd wr-protect upon none ptes, because even if pte
+ * is missing, page/swap cache could exist. When that happens, the wr-protect
+ * information will be stored in the page table entries with the marker (e.g.,
+ * PTE_SWP_UFFD_WP_SPECIAL). Prepare for that by always populating the page
+ * tables to pte level, so that we'll install the markers in change_pte_range()
+ * where necessary.
+ *
+ * Note that we only need to do this in pmd level, because if pmd does not
+ * exist, it means the whole range covered by the pmd entry (of a pud) does not
+ * contain any valid data but all zeros. Then nothing to wr-protect.
+ */
+#define change_protection_prepare(vma, pmd, addr, cp_flags)		\
+	do {								\
+		if (unlikely((cp_flags & MM_CP_UFFD_WP) && pmd_none(*pmd) && \
+			     !vma_is_anonymous(vma)))			\
+			WARN_ON_ONCE(pte_alloc(vma->vm_mm, pmd));	\
+	} while (0)
+
 static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pud_t *pud, unsigned long addr, unsigned long end,
 		pgprot_t newprot, unsigned long cp_flags)
@@ -227,6 +273,8 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 
 		next = pmd_addr_end(addr, end);
 
+		change_protection_prepare(vma, pmd, addr, cp_flags);
+
 		/*
 		 * Automatic NUMA balancing walks the tables with mmap_lock
 		 * held for read. It's possible a parallel update to occur

From patchwork Tue Mar 23 00:49:01 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156499
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com,
 Nadav Amit, Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 12/23] shmem/userfaultfd: Allows file-back mem to be uffd wr-protected on thps
Date: Mon, 22 Mar 2021 20:49:01 -0400
Message-Id: <20210323004912.35132-13-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

We don't have a "huge" version of PTE_SWP_UFFD_WP_SPECIAL; instead, when
necessary, we split the thp if the huge page was previously uffd
wr-protected.
However, splitting the thp is not enough, because file-backed thps are
handled totally differently from anonymous thps: rather than doing a real
split, the thp pmd is simply dropped in __split_huge_pmd_locked().

That is definitely not enough when, e.g., a thp covers range [0, 2M) but we
want to wr-protect a small page residing in the [4K, 8K) range, because
after __split_huge_pmd() returns there will be a none pmd.

Here we leverage the previously introduced change_protection_prepare()
macro so that we populate the pmd with a pgtable page. Then
change_pte_range() will do the rest for us, e.g., install the uffd-wp swap
special pte marker at any pte that we'd like to wr-protect, under the
protection of the pgtable lock.

Signed-off-by: Peter Xu
---
 mm/mprotect.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6b63e3544b47..51c954afa406 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -296,8 +296,16 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		}
 
 		if (is_swap_pmd(*pmd) || pmd_trans_huge(*pmd) || pmd_devmap(*pmd)) {
-			if (next - addr != HPAGE_PMD_SIZE) {
+			if (next - addr != HPAGE_PMD_SIZE ||
+			    /* Uffd wr-protecting a file-backed memory range */
+			    unlikely(!vma_is_anonymous(vma) &&
+				     (cp_flags & MM_CP_UFFD_WP))) {
 				__split_huge_pmd(vma, pmd, addr, false, NULL);
+				/*
+				 * For file-backed, the pmd could have been
+				 * gone; still provide a pte pgtable if needed.
+ */ + change_protection_prepare(vma, pmd, addr, cp_flags); } else { int nr_ptes = change_huge_pmd(vma, pmd, addr, newprot, cp_flags); From patchwork Tue Mar 23 00:49:02 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12156501 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CE4A3C433C1 for ; Tue, 23 Mar 2021 00:49:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 766976196C for ; Tue, 23 Mar 2021 00:49:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 766976196C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id B71A96B012E; Mon, 22 Mar 2021 20:49:40 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B22466B0130; Mon, 22 Mar 2021 20:49:40 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8FF2B6B0131; Mon, 22 Mar 2021 20:49:40 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0131.hostedemail.com [216.40.44.131]) by kanga.kvack.org (Postfix) with ESMTP id 65A456B012E for ; Mon, 22 Mar 2021 20:49:40 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP 
id 2D90362F2 for ; Tue, 23 Mar 2021 00:49:40 +0000 (UTC) X-FDA: 77949306120.03.132D71F Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf13.hostedemail.com (Postfix) with ESMTP id 88016E0011C9 for ; Tue, 23 Mar 2021 00:49:39 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460579; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=Eo6GR1szVdaE50+5HSQqPNu3QULjjOofe28eTN5phnY=; b=QTKagX470a4SOSZBfUMNNC0czFPhCfXLBRf+Vl4vZZep1RTwPrRSYtMnSBUCRra+iqmYT8 4/EtssL5KW9lElCc8uGZAnLt6NSHTO2imXiMonh4FzIkG0/T2t2IS2xxkGJpPjXJvq28G+ lZ+TwuEBnhsAEwHcAYSoWvQAkT9S9eA= Received: from mail-qt1-f200.google.com (mail-qt1-f200.google.com [209.85.160.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-431--A5az_gHPhi-EHXYQdf9qw-1; Mon, 22 Mar 2021 20:49:38 -0400 X-MC-Unique: -A5az_gHPhi-EHXYQdf9qw-1 Received: by mail-qt1-f200.google.com with SMTP id t5so410628qti.5 for ; Mon, 22 Mar 2021 17:49:38 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=Eo6GR1szVdaE50+5HSQqPNu3QULjjOofe28eTN5phnY=; b=MxHHwRJnAyKrLx0ZHd8hGoQp+l/ukfZ8VdFfLZ1/kdbx31amTMVxbkVNlS2lJa1yaP 7DgiKmQomNpVThd36pNp/LGlyGYQonf0w3LeDDqnVmkJNUc/VMUZUMxVqEXYDy/l2cJ4 X7GHv82/hzZGLDz3I6K5Tcbw05k3TwzoinCREymD7uH31gCw5OAJ4929uhz9vJXUyirn Oy0vLjw6iZqSfcMvujP+38R6V6bkxYhv+rRhTPayBG/uwbUqTwrMPTbcXi96wvS2tazh SzlKOZix4LW50zuWbbSLS1w4/Dq75Kr6PSKccyFAWB6BKS7kA18KKp4ORczsevm5n2c/ Vt6Q== X-Gm-Message-State: AOAM532Jc4nzjLceb9gHA98Jkp2yabILeebdU5L5z8qcMNCS/QrOa9sb pCGE50eyaGuZEoShIw/ndxWgcojSQbG2HqtFccMWHmOko0Pi46jNdrryn8eoQOJCE+K2pCvWwpG v/Kp6Xwc2aXQ= X-Received: by 
2002:a0c:f805:: with SMTP id r5mr2752058qvn.45.1616460577099; Mon, 22 Mar 2021 17:49:37 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyf8TDGDMVgqtBSuLbWSmrCugVffrpBoNx4NuHOZZ6VA9Ugvm4eHQ0tm4uAnu2RFfyiBQGo4A== X-Received: by 2002:a0c:f805:: with SMTP id r5mr2752046qvn.45.1616460576859; Mon, 22 Mar 2021 17:49:36 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.35 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:36 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport Subject: [PATCH 13/23] shmem/userfaultfd: Handle the left-overed special swap ptes Date: Mon, 22 Mar 2021 20:49:02 -0400 Message-Id: <20210323004912.35132-14-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210323004912.35132-1-peterx@redhat.com> References: <20210323004912.35132-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 88016E0011C9 X-Stat-Signature: qjmoyh9qwtdrwdcpbwjm5po16xnciceg Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf13; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616460579-774471 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Note that the special uffd-wp swap pte can be left over even if the page under the pte got evicted. 
Normally when evicting a page, we unmap the ptes by walking through the reverse mapping. However, we never tracked such information for the special swap ptes because they're not real mappings but just markers. So we need to take care of the case where we see a marker that is actually meaningless (the page behind it got evicted). We have already taken care of that in e.g. alloc_set_pte(), where we treat the special swap pte as pte_none() when necessary. However, we also need to teach userfaultfd itself about this, in both UFFDIO_COPY and page fault handling, so that everything still works as expected. Signed-off-by: Peter Xu --- fs/userfaultfd.c | 15 +++++++++++++++ mm/shmem.c | 13 ++++++++++++- 2 files changed, 27 insertions(+), 1 deletion(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index bd83379d4dd2..72956f9cc892 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -329,6 +329,21 @@ static inline bool userfaultfd_must_wait(struct userfaultfd_ctx *ctx, */ if (pte_none(*pte)) ret = true; + /* + * We also treat the swap special uffd-wp pte as the pte_none() here. + * This should in most cases be a missing event, as we never handle + * wr-protect upon a special uffd-wp swap pte - it should first be + * converted into a normal read request before handling wp. It just + * means the page/swap cache that backing this pte is gone, so this + * special pte is leftover. + * + * We can't simply replace it with a none pte because we're not with + * the pgtable lock here. Instead of taking it and clearing the pte, + * the easy way is to let UFFDIO_COPY understand this pte too when + * trying to install a new page onto it. 
+ */ + if (pte_swp_uffd_wp_special(*pte)) + ret = true; if (!pte_write(*pte) && (reason & VM_UFFD_WP)) ret = true; pte_unmap(pte); diff --git a/mm/shmem.c b/mm/shmem.c index e88aaabaeb27..90d67406af66 100644 --- a/mm/shmem.c +++ b/mm/shmem.c @@ -2469,7 +2469,18 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd, goto out_release_unlock; ret = -EEXIST; - if (!pte_none(*dst_pte)) + /* + * Besides the none pte, we also allow UFFDIO_COPY to install a pte + * onto the uffd-wp swap special pte, because that pte should be the + * same as a pte_none() just in that it contains wr-protect information + * (which could only be dropped when unmap the memory). + * + * It's safe to drop that marker because we know this is part of a + * MISSING fault, and the caller is very clear about this page missing + * rather than wr-protected. Then we're sure the wr-protect bit is + * just a leftover so it's useless already. + */ + if (!pte_none(*dst_pte) && !pte_swp_uffd_wp_special(*dst_pte)) goto out_release_unlock; if (!is_continue) { From patchwork Tue Mar 23 00:49:03 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12156505 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 936FFC433E0 for ; Tue, 23 Mar 2021 00:49:58 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 370196196C for ; Tue, 23 Mar 2021 00:49:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 
mail.kernel.org 370196196C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8E6126B0132; Mon, 22 Mar 2021 20:49:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 862896B0134; Mon, 22 Mar 2021 20:49:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5F02C6B0135; Mon, 22 Mar 2021 20:49:44 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0252.hostedemail.com [216.40.44.252]) by kanga.kvack.org (Postfix) with ESMTP id 3CC136B0134 for ; Mon, 22 Mar 2021 20:49:44 -0400 (EDT) Received: from smtpin22.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 06AFF8249980 for ; Tue, 23 Mar 2021 00:49:44 +0000 (UTC) X-FDA: 77949306288.22.D5AA9F6 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf24.hostedemail.com (Postfix) with ESMTP id 42F6EA0009CA for ; Tue, 23 Mar 2021 00:49:42 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460582; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=cKuTAK/coeu/o9lPtGRIHWod3LERVBmK315ziMrYcLI=; b=FjbuUcEalzW8GqGXb3GZXAZSQtbn0bCJMuf9ffQ8NJQE03Y9qKAH/YTYKNvwyOh6lJbufX YsMf5OaFwJUMgcmVlg8lUsDgQ9w1c1wJbtRftAPpN1yCyPH1eoO1kR8iAODrBwyD+4c7tZ QS+T1Yl8cSjmBPcGj9bG6rYlIfXkUsQ= Received: from mail-qt1-f198.google.com (mail-qt1-f198.google.com [209.85.160.198]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-602-t1kdNRq2NBq_qOs_i2H6_A-1; Mon, 22 Mar 2021 20:49:39 -0400 X-MC-Unique: 
t1kdNRq2NBq_qOs_i2H6_A-1 Received: by mail-qt1-f198.google.com with SMTP id h26so397598qtm.13 for ; Mon, 22 Mar 2021 17:49:39 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=cKuTAK/coeu/o9lPtGRIHWod3LERVBmK315ziMrYcLI=; b=t8Xn59YD8cQKGMWkr1J/keSBJjuDlnV5lOB7Dw3JU8PKBeRG6u+Dm/G76W3i8+Vj5N +yq+z+5ce9O3+3sIYLoxf2C/bpmPeKA3zo44zLyl6a/GlqIC7C5f4NimJxA0IfbgafmK elCvW35u3LzlEJmXwa0Dd0xdyub7Bf8milZGDNrR94wX22LqxCsdo2toMqNEhXfbw/VO R857PZHX1f73I+q89Yjx2WC+7BhRuLYkzc7R+4yN0VP5EZJZrW9qceCWPQJpAGs8u+88 /ZJZrwUg7uiz0JIAqzTTmAEBtt5dV5LO74fb2EnIq3Oo0zIN0Pr4l9CNPEaoxvc1+BDN /nEw== X-Gm-Message-State: AOAM532dX37m4wzuf2uSxE+xZe/WkrbSvGJJ2RdXWtbjzgbCd5jJGa/P l073bYQn+d2o7RceGcblhobJphD9nlEMNid9KUUbmvxokVQUwKZI5HXxV/IJntiMDgDJ/hEmMme Jd9CivPnm+bw= X-Received: by 2002:a05:620a:1388:: with SMTP id k8mr2974993qki.224.1616460578814; Mon, 22 Mar 2021 17:49:38 -0700 (PDT) X-Google-Smtp-Source: ABdhPJy+wBwPeNPOsPise+7lL7dkN21Mn6lkzdRj7CMeozDPNGfvtHmM2/ikJDtiTyBJ8cbrfkFN1A== X-Received: by 2002:a05:620a:1388:: with SMTP id k8mr2974981qki.224.1616460578591; Mon, 22 Mar 2021 17:49:38 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.37 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:38 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . 
Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport Subject: [PATCH 14/23] shmem/userfaultfd: Pass over uffd-wp special swap pte when fork() Date: Mon, 22 Mar 2021 20:49:03 -0400 Message-Id: <20210323004912.35132-15-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210323004912.35132-1-peterx@redhat.com> References: <20210323004912.35132-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 42F6EA0009CA X-Stat-Signature: ude8pc7aqom3mdo4zf3khgesz5pupytf Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf24; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616460582-259586 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The uffd-wp special swap pte should be handled like other uffd-wp wr-protected ptes on fork(): we should pass it over when the dst_vma has VM_UFFD_WP armed, and drop it otherwise. Signed-off-by: Peter Xu --- mm/memory.c | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/mm/memory.c b/mm/memory.c index 8be28bcaa044..766946d3eab0 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -715,8 +715,21 @@ copy_nonpresent_pte(struct mm_struct *dst_mm, struct mm_struct *src_mm, unsigned long vm_flags = dst_vma->vm_flags; pte_t pte = *src_pte; struct page *page; - swp_entry_t entry = pte_to_swp_entry(pte); + swp_entry_t entry; + + if (unlikely(is_swap_special_pte(pte))) { + /* + * uffd-wp special swap pte is the only possibility for now. + * If dst vma is registered with uffd-wp, copy it over. 
+ * Otherwise, ignore this pte as if it's a none pte would work. + */ + WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte)); + if (userfaultfd_wp(dst_vma)) + set_pte_at(dst_mm, addr, dst_pte, pte); + return 0; + } + entry = pte_to_swp_entry(pte); if (likely(!non_swap_entry(entry))) { if (swap_duplicate(entry) < 0) return entry.val; From patchwork Tue Mar 23 00:49:04 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12156503 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 10DD7C433DB for ; Tue, 23 Mar 2021 00:49:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id A1E6E6196C for ; Tue, 23 Mar 2021 00:49:55 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org A1E6E6196C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 1AE936B0130; Mon, 22 Mar 2021 20:49:44 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 160826B0132; Mon, 22 Mar 2021 20:49:44 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id E8A246B0133; Mon, 22 Mar 2021 20:49:43 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0120.hostedemail.com [216.40.44.120]) by kanga.kvack.org (Postfix) with ESMTP 
id C64336B0130 for ; Mon, 22 Mar 2021 20:49:43 -0400 (EDT) Received: from smtpin39.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 8A29A8249980 for ; Tue, 23 Mar 2021 00:49:43 +0000 (UTC) X-FDA: 77949306246.39.7B6F93C Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf24.hostedemail.com (Postfix) with ESMTP id EB951A0009E2 for ; Tue, 23 Mar 2021 00:49:41 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460582; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=LwDOET0+t9FVkgq293BQMlQJW4s7rxaX3ykZpO1NCQ8=; b=T3JJr4TjZgXIAauGUHzlMgVPV/1p1YOpNrgVpd9ZbNJ2iWkEg26Pq1K580hchp8dlhca+N Exrqw/J1eyTnigW+dJV0L3KjRdMwKLpSTWZWXK/gAYWiYjEMbpIgVLMtuPVBMdjowzFxbH gp+yrpujGhwNNfudjsBEToKKzch9BPw= Received: from mail-qv1-f72.google.com (mail-qv1-f72.google.com [209.85.219.72]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-437-Ec2ynQsNN1qzTS9tLftHEQ-1; Mon, 22 Mar 2021 20:49:41 -0400 X-MC-Unique: Ec2ynQsNN1qzTS9tLftHEQ-1 Received: by mail-qv1-f72.google.com with SMTP id k92so540840qva.20 for ; Mon, 22 Mar 2021 17:49:40 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=LwDOET0+t9FVkgq293BQMlQJW4s7rxaX3ykZpO1NCQ8=; b=REZ2+BSoY+t+crrE05gOuCeE07vXL4q+lJnVFHTioKy800lJEScYlf9csEjB2NO1rl 3CEMxgsXTSuVcF0m76fIWRMACgE7QeA+n4tu+L1hb0sj/wrhfjhYl/+ho6UOTMsuRBKY 3NRBmiXGiFBX9xLEsqKEewPnxpAY4YRpeUKBYVEcCgo7TsdBQaNlYDO2L/N1fLfeg51E BvkqbOC6uUW5iA8Bn3h1iufn1HvB3pzfG6CwT2w4Gi4aJglFbpMYWv7uB+PXOM3DMwv7 BPK3P2Ut1WZNd4+ZztojQUxAZu/PB2l84AX3PYRYM/UWVomUGlNFMSqW0hxWkk7rBflL 
8pug== X-Gm-Message-State: AOAM533tK9dQM1RF++KKHFDoiwTsFaTervdUP1nZHWweoK5Vh3//dHmv uJeNhu5AD9AAB/L0MnUTGCgIiFaa4KaaKVgkcDwNU+/quJ9SEvC//NxSBlTDXLanCd5UjxqYzgH oykEM+n3VM70= X-Received: by 2002:a37:e315:: with SMTP id y21mr2972106qki.418.1616460580486; Mon, 22 Mar 2021 17:49:40 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzGhlvWbuuhEk+ZKMiW+MQgkfAcTKCZx9vxyblCGaELts7EA6fuF6kIJ/7TdB+zCUR+L3XDeA== X-Received: by 2002:a37:e315:: with SMTP id y21mr2972089qki.418.1616460580286; Mon, 22 Mar 2021 17:49:40 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.38 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:39 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport Subject: [PATCH 15/23] hugetlb/userfaultfd: Hook page faults for uffd write protection Date: Mon, 22 Mar 2021 20:49:04 -0400 Message-Id: <20210323004912.35132-16-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210323004912.35132-1-peterx@redhat.com> References: <20210323004912.35132-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: EB951A0009E2 X-Stat-Signature: 4qgst1x4canfmya5ur8kozsd9roj6u4i Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf24; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616460581-514381 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: 
owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Hook up hugetlb_fault() with the capability to handle userfaultfd-wp faults. We do this slightly earlier than hugetlb_cow() so that we can avoid taking some extra locks that we definitely don't need. Signed-off-by: Peter Xu Reviewed-by: Mike Kravetz --- mm/hugetlb.c | 19 +++++++++++++++++++ 1 file changed, 19 insertions(+) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 56b78a206913..def2c7ddf3ae 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4643,6 +4643,25 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, if (unlikely(!pte_same(entry, huge_ptep_get(ptep)))) goto out_ptl; + /* Handle userfault-wp first, before trying to lock more pages */ + if (userfaultfd_pte_wp(vma, huge_ptep_get(ptep)) && + (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) { + struct vm_fault vmf = { + .vma = vma, + .address = haddr, + .flags = flags, + }; + + spin_unlock(ptl); + if (pagecache_page) { + unlock_page(pagecache_page); + put_page(pagecache_page); + } + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + i_mmap_unlock_read(mapping); + return handle_userfault(&vmf, VM_UFFD_WP); + } + /* * hugetlb_cow() requires page locks of pte_page(entry) and * pagecache_page, so here we need take the former one From patchwork Tue Mar 23 00:49:05 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12156509 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with 
ESMTP id CBDFFC433E0 for ; Tue, 23 Mar 2021 00:50:03 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 6EBB26196C for ; Tue, 23 Mar 2021 00:50:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 6EBB26196C Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id D215E6B0136; Mon, 22 Mar 2021 20:49:48 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C5F896B0137; Mon, 22 Mar 2021 20:49:48 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 9C4AF6B0139; Mon, 22 Mar 2021 20:49:48 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0021.hostedemail.com [216.40.44.21]) by kanga.kvack.org (Postfix) with ESMTP id 62F816B0137 for ; Mon, 22 Mar 2021 20:49:48 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2A9C218035453 for ; Tue, 23 Mar 2021 00:49:48 +0000 (UTC) X-FDA: 77949306456.05.822FEF9 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf20.hostedemail.com (Postfix) with ESMTP id 2AB71E7 for ; Tue, 23 Mar 2021 00:49:46 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460587; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=AiO26VDrmUg6vB+7BT5Jj5YPPTcAM0VGHQAxP1RtUns=; b=WGvD8H8hhqbaZSFJnIxUpXur3+sdcp4POjBfbPG7IK2UU1sU4BijklJXZ5hdZ6vf18gtZm 
HhH0TWfgT6Cx7YQVjD1TAkcMQ3327w+sFmTO8X9GF1ZVWS2ANz07D+Nr9LedKICMvErfxT +eD7fMu6tNFHxScAUXxqbyN0vaaZHdE= Received: from mail-qt1-f197.google.com (mail-qt1-f197.google.com [209.85.160.197]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-172--pX-GioUOI2NEbZe4GkZgA-1; Mon, 22 Mar 2021 20:49:43 -0400 X-MC-Unique: -pX-GioUOI2NEbZe4GkZgA-1 Received: by mail-qt1-f197.google.com with SMTP id m8so392879qtp.14 for ; Mon, 22 Mar 2021 17:49:43 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=AiO26VDrmUg6vB+7BT5Jj5YPPTcAM0VGHQAxP1RtUns=; b=Dqo1siJvd2usyPuGZOHub3oYy7yDCHzruhJ/IjciTGIw2xSjesaRk8xMe/LbFRoaui tt/FWwQ6vvWuZlzlAL4UG7VhSUxRfNgVBlQa/JJ0pT+ODRnbWhLJmlUAb0LwUhwK7rLn acIx0uvBrIQzBl7Wk7xwvS9j8uHrTXXlDDQTLzObAj7a5lnVhULHZz6ZWZpiL9y7ycDw zhdibqZ9yKsGpzCnqO0TCW7w+IkS8Xda4jdmCJsbGisNBKZm6IgsMtS0vCv87JCBEXKA LlPPPkq/rsUaMg/5UMVwzTxtb4keqzvU+XfOSzBjjuFGhnODfj8CMkYaPmUObTDvgvpR QoSw== X-Gm-Message-State: AOAM530M6wdNTIumwnK6yvkmFyZY1aAYnBS93uNXM5dbG8DZaByaNHt1 oUFlk5XyW1SU2Px05WjzCj5bhpdIy4GMIcfP50qX7PcNpGDdqX5n0mupEPlW4wZcUl3gce3d1kn 7wJl9LZjKzFY= X-Received: by 2002:a05:620a:244f:: with SMTP id h15mr2827100qkn.235.1616460582825; Mon, 22 Mar 2021 17:49:42 -0700 (PDT) X-Google-Smtp-Source: ABdhPJzWIpDXhwdNCAyIRtAVCaaAMdmYrNrenAgax2tWD83MS43YI42hO1l+4DhwqPJ+NaQThw1Wgg== X-Received: by 2002:a05:620a:244f:: with SMTP id h15mr2827076qkn.235.1616460582571; Mon, 22 Mar 2021 17:49:42 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id n6sm5031793qtx.22.2021.03.22.17.49.40 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:49:41 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: "Kirill A . 
Shutemov" , Jerome Glisse , Mike Kravetz , Matthew Wilcox , Andrew Morton , Axel Rasmussen , Hugh Dickins , peterx@redhat.com, Nadav Amit , Andrea Arcangeli , Mike Rapoport Subject: [PATCH 16/23] hugetlb/userfaultfd: Take care of UFFDIO_COPY_MODE_WP Date: Mon, 22 Mar 2021 20:49:05 -0400 Message-Id: <20210323004912.35132-17-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210323004912.35132-1-peterx@redhat.com> References: <20210323004912.35132-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 2AB71E7 X-Stat-Signature: pc7q89ioeh1ywk7m8uy43okpzn3hcdec Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf20; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=170.10.133.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616460586-713205 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: First, pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout the stack. Then, apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is set along with UFFDIO_COPY. Introduce huge_pte_mkuffd_wp() for it. Note that, similar to how we've handled shmem, we'd better keep setting the dirty bit even if UFFDIO_COPY_MODE_WP is provided, so that the core mm will know this page contains valid data and never drop it. 
Signed-off-by: Peter Xu
---
 include/asm-generic/hugetlb.h |  5 +++++
 include/linux/hugetlb.h       |  6 ++++--
 mm/hugetlb.c                  | 22 +++++++++++++++++-----
 mm/userfaultfd.c              | 12 ++++++++----
 4 files changed, 34 insertions(+), 11 deletions(-)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 8e1e6244a89d..548212eccbd6 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -27,6 +27,11 @@ static inline pte_t huge_pte_mkdirty(pte_t pte)
 	return pte_mkdirty(pte);
 }
 
+static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
+{
+	return pte_mkuffd_wp(pte);
+}
+
 static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
 {
 	return pte_modify(pte, newprot);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a7f7d5f328dc..ef8d2b8427b1 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -141,7 +141,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
-				struct page **pagep);
+				struct page **pagep,
+				bool wp_copy);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
@@ -321,7 +322,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long dst_addr,
 						unsigned long src_addr,
 						enum mcopy_atomic_mode mode,
-						struct page **pagep)
+						struct page **pagep,
+						bool wp_copy)
 {
 	BUG();
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index def2c7ddf3ae..f0e55b341ebd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4725,7 +4725,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    enum mcopy_atomic_mode mode,
-			    struct page **pagep)
+			    struct page **pagep,
+			    bool wp_copy)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct address_space *mapping;
@@ -4822,17 +4823,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
-	if (is_continue && !vm_shared)
+	/*
+	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
+	 * with wp flag set, don't set pte write bit.
+	 */
+	if (wp_copy || (is_continue && !vm_shared))
 		writable = 0;
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
 	_dst_pte = make_huge_pte(dst_vma, page, writable);
-	if (writable)
-		_dst_pte = huge_pte_mkdirty(_dst_pte);
+	/*
+	 * Always mark UFFDIO_COPY page dirty; note that this may not be
+	 * extremely important for hugetlbfs for now since swapping is not
+	 * supported, but we should still be clear in that this page cannot be
+	 * thrown away at will, even if write bit not set.
+	 */
+	_dst_pte = huge_pte_mkdirty(_dst_pte);
 	_dst_pte = pte_mkyoung(_dst_pte);
 
+	if (wp_copy)
+		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+
 	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 
 	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 0963e0d9ed20..78471ae3d25c 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -207,7 +207,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      enum mcopy_atomic_mode mode)
+					      enum mcopy_atomic_mode mode,
+					      bool wp_copy)
 {
 	int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
@@ -304,7 +305,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-					       dst_addr, src_addr, mode, &page);
+					       dst_addr, src_addr, mode, &page,
+					       wp_copy);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -406,7 +408,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      enum mcopy_atomic_mode mode);
+				      enum mcopy_atomic_mode mode,
+				      bool wp_copy);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -527,7 +530,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return  __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-						src_start, len, mcopy_mode);
+						src_start, len, mcopy_mode,
+						wp_copy);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;

From patchwork Tue Mar 23 00:49:06 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156507
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com,
 Nadav Amit, Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 17/23] hugetlb/userfaultfd: Handle UFFDIO_WRITEPROTECT
Date: Mon, 22 Mar 2021 20:49:06 -0400
Message-Id: <20210323004912.35132-18-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

Start by passing cp_flags into hugetlb_change_protection() so that hugetlb
can handle MM_CP_UFFD_WP[_RESOLVE] requests.

huge_pte_clear_uffd_wp() is introduced to clear the uffd-wp bit when
resolving write protection. Migrating huge page entries are covered as
well, through the swap pte helpers.
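As a rough illustration of the above, here is a minimal userspace sketch of
how the two cp_flags requests change a present huge pte, plus the alignment
check this patch adds to mwriteprotect_range(). It is a toy model only: the
TOY_* bit values and the toy_* function names are made up for illustration
and are not kernel definitions. Restoring the write bit on resolve happens
through newprot in the kernel and is not modeled here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy pte bits; the values are invented for this model, not the kernel's. */
#define TOY_PTE_WRITE		(1u << 0)
#define TOY_PTE_UFFD_WP		(1u << 1)

/* Stand-ins for MM_CP_UFFD_WP and MM_CP_UFFD_WP_RESOLVE. */
#define TOY_CP_UFFD_WP		(1u << 0)
#define TOY_CP_UFFD_WP_RESOLVE	(1u << 1)

/* Models the uffd-wp handling added for present ptes. */
static uint32_t toy_change_protection(uint32_t pte, unsigned int cp_flags)
{
	unsigned int both = TOY_CP_UFFD_WP | TOY_CP_UFFD_WP_RESOLVE;

	/* The two requests are exclusive; change_protection() BUG_ON()s this. */
	assert((cp_flags & both) != both);

	if (cp_flags & TOY_CP_UFFD_WP)
		/* wr-protect first, then set the uffd-wp bit */
		pte = (pte & ~TOY_PTE_WRITE) | TOY_PTE_UFFD_WP;
	else if (cp_flags & TOY_CP_UFFD_WP_RESOLVE)
		/* only the uffd-wp bit is cleared here */
		pte &= ~TOY_PTE_UFFD_WP;

	return pte;
}

/* Models the hugetlb alignment check in mwriteprotect_range(). */
static bool toy_range_is_aligned(unsigned long start, unsigned long len,
				 unsigned long page_size)
{
	unsigned long page_mask = page_size - 1;

	return !((start & page_mask) || (len & page_mask));
}
```

With a 2MB huge page size, a range starting at 0x201000 fails the check,
which corresponds to the -EINVAL path in the patch.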
Signed-off-by: Peter Xu
Reviewed-by: Mike Kravetz
---
 include/asm-generic/hugetlb.h |  5 +++++
 include/linux/hugetlb.h       |  6 ++++--
 mm/hugetlb.c                  | 13 ++++++++++++-
 mm/mprotect.c                 |  3 ++-
 mm/userfaultfd.c              |  8 ++++++++
 5 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index 548212eccbd6..181cdc3297e7 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -32,6 +32,11 @@ static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
 	return pte_mkuffd_wp(pte);
 }
 
+static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
+{
+	return pte_clear_uffd_wp(pte);
+}
+
 static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
 {
 	return pte_modify(pte, newprot);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ef8d2b8427b1..92710600596e 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -190,7 +190,8 @@ struct page *follow_huge_pgd(struct mm_struct *mm, unsigned long address,
 int pmd_huge(pmd_t pmd);
 int pud_huge(pud_t pud);
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot);
+		unsigned long address, unsigned long end, pgprot_t newprot,
+		unsigned long cp_flags);
 
 bool is_hugetlb_entry_migration(pte_t pte);
 void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
@@ -352,7 +353,8 @@ static inline void move_hugetlb_state(struct page *oldpage,
 
 static inline unsigned long hugetlb_change_protection(
 			struct vm_area_struct *vma, unsigned long address,
-			unsigned long end, pgprot_t newprot)
+			unsigned long end, pgprot_t newprot,
+			unsigned long cp_flags)
 {
 	return 0;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f0e55b341ebd..fd3e87517e10 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5063,7 +5063,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
-		unsigned long address, unsigned long end, pgprot_t newprot)
+		unsigned long address, unsigned long end,
+		pgprot_t newprot, unsigned long cp_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
@@ -5073,6 +5074,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	unsigned long pages = 0;
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
+	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
+	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
 
 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -5113,6 +5116,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 				make_migration_entry_read(&entry);
 				newpte = swp_entry_to_pte(entry);
+				if (uffd_wp)
+					newpte = pte_swp_mkuffd_wp(newpte);
+				else if (uffd_wp_resolve)
+					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
 						     newpte, huge_page_size(h));
 				pages++;
@@ -5126,6 +5133,10 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
 			pte = pte_mkhuge(huge_pte_modify(old_pte, newprot));
 			pte = arch_make_huge_pte(pte, vma, NULL, 0);
+			if (uffd_wp)
+				pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
+			else if (uffd_wp_resolve)
+				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
 		}
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 51c954afa406..fe5a5b96a61f 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -416,7 +416,8 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
 	BUG_ON((cp_flags & MM_CP_UFFD_WP_ALL) == MM_CP_UFFD_WP_ALL);
 
 	if (is_vm_hugetlb_page(vma))
-		pages = hugetlb_change_protection(vma, start, end, newprot);
+		pages = hugetlb_change_protection(vma, start, end, newprot,
+						  cp_flags);
 	else
 		pages = change_protection_range(vma, start, end, newprot,
 						cp_flags);
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 78471ae3d25c..01170197a3d7 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -654,6 +654,7 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 			unsigned long len, bool enable_wp, bool *mmap_changing)
 {
 	struct vm_area_struct *dst_vma;
+	unsigned long page_mask;
 	pgprot_t newprot;
 	int err;
 
@@ -690,6 +691,13 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 	if (!vma_is_anonymous(dst_vma))
 		goto out_unlock;
 
+	if (is_vm_hugetlb_page(dst_vma)) {
+		err = -EINVAL;
+		page_mask = vma_kernel_pagesize(dst_vma) - 1;
+		if ((start & page_mask) || (len & page_mask))
+			goto out_unlock;
+	}
+
 	if (enable_wp)
 		newprot = vm_get_page_prot(dst_vma->vm_flags & ~(VM_WRITE));
 	else

From patchwork Tue Mar 23 00:49:07 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156511
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: "Kirill A . Shutemov", Jerome Glisse, Mike Kravetz, Matthew Wilcox,
 Andrew Morton, Axel Rasmussen, Hugh Dickins, peterx@redhat.com,
 Nadav Amit, Andrea Arcangeli, Mike Rapoport
Subject: [PATCH 18/23] mm/hugetlb: Introduce huge version of special swap pte helpers
Date: Mon, 22 Mar 2021 20:49:07 -0400
Message-Id: <20210323004912.35132-19-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

Prepare hugetlbfs to also recognize special swap ptes, such as the uffd-wp
special swap pte.

Signed-off-by: Peter Xu
Reviewed-by: Mike Kravetz
---
 mm/hugetlb.c | 23 +++++++++++++++++++++--
 1 file changed, 21 insertions(+), 2 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index fd3e87517e10..64e424b03774 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -93,6 +93,25 @@ static inline bool subpool_is_free(struct hugepage_subpool *spool)
 	return true;
 }
 
+/*
+ * These are sister versions of is_swap_pte() and pte_has_swap_entry().  We
+ * need standalone ones because huge_pte_none() is handled differently from
+ * pte_none().  For more information, please refer to comments above
+ * is_swap_pte() and pte_has_swap_entry().
+ *
+ * Here we directly reuse the pte level of swap special ptes, for example, the
+ * pte_swp_uffd_wp_special().  It just stands for a huge page rather than a
+ * small page for hugetlbfs pages.
+ */
+static inline bool is_huge_swap_pte(pte_t pte)
+{
+	return !huge_pte_none(pte) && !pte_present(pte);
+}
+static inline bool huge_pte_has_swap_entry(pte_t pte)
+{
+	return is_huge_swap_pte(pte) && !is_swap_special_pte(pte);
+}
+
 static inline void unlock_or_release_subpool(struct hugepage_subpool *spool)
 {
 	spin_unlock(&spool->lock);
@@ -3726,7 +3745,7 @@ bool is_hugetlb_entry_migration(pte_t pte)
 {
 	swp_entry_t swp;
 
-	if (huge_pte_none(pte) || pte_present(pte))
+	if (!huge_pte_has_swap_entry(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
 	if (is_migration_entry(swp))
@@ -3739,7 +3758,7 @@ static bool is_hugetlb_entry_hwpoisoned(pte_t pte)
 {
 	swp_entry_t swp;
 
-	if (huge_pte_none(pte) || pte_present(pte))
+	if (!huge_pte_has_swap_entry(pte))
 		return false;
 	swp = pte_to_swp_entry(pte);
 	if (is_hwpoison_entry(swp))

From patchwork Tue Mar 23 00:50:49 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156513
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Rapoport, Nadav Amit, Jerome Glisse, Hugh Dickins,
 Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov", Axel Rasmussen,
 peterx@redhat.com, Mike Kravetz, Matthew Wilcox
Subject: [PATCH 19/23] hugetlb/userfaultfd: Handle uffd-wp special pte in hugetlb pf handler
Date: Mon, 22 Mar 2021 20:50:49 -0400
Message-Id: <20210323005049.35862-1-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

Teach the hugetlb page fault code to understand the uffd-wp special pte.
For example, when seeing such a pte we need to convert any write fault into
a read one (which is fake - we'll retry the write later if so). Meanwhile,
for handle_userfault() we need to make sure we wait for the special swap
pte too, just like for a none pte.

Note that we also need to teach UFFDIO_COPY about this special pte across
the code path, so that we can safely install a new page at this special pte
as long as we know it's a stale entry.
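The three rules above can be sketched as a small userspace model. This is
only an illustration under assumptions: the toy_pte enum, the TOY_FAULT_*
bits and the toy_* helpers are hypothetical names, standing in for the real
pte checks (huge_pte_none(), pte_swp_uffd_wp_special()) and fault flags
(FAULT_FLAG_WRITE, FAULT_FLAG_UFFD_WP).

```c
#include <assert.h>
#include <stdbool.h>

/* Toy classification of a huge pte; not a kernel type. */
enum toy_pte {
	TOY_PTE_NONE,
	TOY_PTE_UFFD_WP_SPECIAL,
	TOY_PTE_PRESENT,
};

/* Stand-ins for FAULT_FLAG_WRITE and FAULT_FLAG_UFFD_WP. */
#define TOY_FAULT_WRITE		(1u << 0)
#define TOY_FAULT_UFFD_WP	(1u << 1)

/* A fault on the special pte is tagged and downgraded to a read fault. */
static unsigned int toy_adjust_fault_flags(enum toy_pte entry,
					   unsigned int flags)
{
	if (entry == TOY_PTE_UFFD_WP_SPECIAL) {
		flags |= TOY_FAULT_UFFD_WP;
		flags &= ~TOY_FAULT_WRITE;
		/* mirrors the WARN_ON_ONCE() in hugetlb_no_page() */
		assert(!(flags & TOY_FAULT_WRITE));
	}
	return flags;
}

/* The special pte takes the same path as a none pte in the fault handler. */
static bool toy_is_no_page_fault(enum toy_pte entry)
{
	return entry == TOY_PTE_NONE || entry == TOY_PTE_UFFD_WP_SPECIAL;
}

/* UFFDIO_COPY may install over none or special ptes; else it is -EEXIST. */
static bool toy_copy_may_install(enum toy_pte entry)
{
	return entry == TOY_PTE_NONE || entry == TOY_PTE_UFFD_WP_SPECIAL;
}
```

The point of the model is that the special pte behaves like a none pte
everywhere except that it carries the write-protect intent along.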
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c |  5 ++++-
 mm/hugetlb.c     | 34 +++++++++++++++++++++++++++-------
 mm/userfaultfd.c |  5 ++++-
 3 files changed, 35 insertions(+), 9 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 72956f9cc892..f6fa34f58c37 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -245,8 +245,11 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx,
 	/*
 	 * Lockless access: we're in a wait_event so it's ok if it
 	 * changes under us.
+	 *
+	 * Regarding uffd-wp special case, please refer to comments in
+	 * userfaultfd_must_wait().
 	 */
-	if (huge_pte_none(pte))
+	if (huge_pte_none(pte) || pte_swp_uffd_wp_special(pte))
 		ret = true;
 	if (!huge_pte_write(pte) && (reason & VM_UFFD_WP))
 		ret = true;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 64e424b03774..448ef745d5ee 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4369,7 +4369,8 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep, unsigned int flags)
+			unsigned long address, pte_t *ptep,
+			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
@@ -4493,7 +4494,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 
 	ptl = huge_pte_lock(h, mm, ptep);
 	ret = 0;
-	if (!huge_pte_none(huge_ptep_get(ptep)))
+	if (!pte_same(huge_ptep_get(ptep), old_pte))
 		goto backout;
 
 	if (anon_rmap) {
@@ -4503,6 +4504,11 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		page_dup_rmap(page, true);
 	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
 				&& (vma->vm_flags & VM_SHARED)));
+	if (unlikely(flags & FAULT_FLAG_UFFD_WP)) {
+		WARN_ON_ONCE(flags & FAULT_FLAG_WRITE);
+		/* We should have the write bit cleared already, but be safe */
+		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
+	}
 	set_huge_pte_at(mm, haddr, ptep, new_pte);
 
 	hugetlb_count_add(pages_per_huge_page(h), mm);
@@ -4584,9 +4590,16 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (unlikely(is_hugetlb_entry_migration(entry))) {
 			migration_entry_wait_huge(vma, mm, ptep);
 			return 0;
-		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
+		} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
 			return VM_FAULT_HWPOISON_LARGE |
 				VM_FAULT_SET_HINDEX(hstate_index(h));
+		} else if (unlikely(is_swap_special_pte(entry))) {
+			/* Must be a uffd-wp special swap pte */
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(entry));
+			flags |= FAULT_FLAG_UFFD_WP;
+			/* Emulate a read fault */
+			flags &= ~FAULT_FLAG_WRITE;
+		}
 	}
 
 	/*
@@ -4618,8 +4631,13 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 	entry = huge_ptep_get(ptep);
-	if (huge_pte_none(entry)) {
-		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
+	/*
+	 * FAULT_FLAG_UFFD_WP should be handled merely the same as pte none
+	 * because it's basically a none pte with a special marker
+	 */
+	if (huge_pte_none(entry) || pte_swp_uffd_wp_special(entry)) {
+		ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+				      entry, flags);
 		goto out_mutex;
 	}
 
@@ -4753,7 +4771,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	unsigned long size;
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	struct hstate *h = hstate_vma(dst_vma);
-	pte_t _dst_pte;
+	pte_t _dst_pte, cur_pte;
 	spinlock_t *ptl;
 	int ret;
 	struct page *page;
@@ -4831,8 +4849,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 	if (idx >= size)
 		goto out_release_unlock;
 
+	cur_pte = huge_ptep_get(dst_pte);
 	ret = -EEXIST;
-	if (!huge_pte_none(huge_ptep_get(dst_pte)))
+	/* Please refer to shmem_mfill_atomic_pte() for uffd-wp special case */
+	if (!huge_pte_none(cur_pte) && !pte_swp_uffd_wp_special(cur_pte))
 		goto out_release_unlock;
 
 	if (vm_shared) {
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 01170197a3d7..a2b0dcc80a19 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -274,6 +274,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 	}
 
 	while (src_addr < src_start + len) {
+		pte_t pteval;
+
 		BUG_ON(dst_addr >= dst_start + len);
 
 		/*
@@ -296,8 +298,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 			goto out_unlock;
 		}
 
+		pteval = huge_ptep_get(dst_pte);
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none(huge_ptep_get(dst_pte))) {
+		    !huge_pte_none(pteval) && !pte_swp_uffd_wp_special(pteval)) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);

From patchwork Tue Mar 23 00:50:52 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156515
Mar 2021 20:50:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 63F766B013F; Mon, 22 Mar 2021 20:50:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0140.hostedemail.com [216.40.44.140]) by kanga.kvack.org (Postfix) with ESMTP id 429726B013C for ; Mon, 22 Mar 2021 20:50:58 -0400 (EDT) Received: from smtpin03.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 06A5E8249980 for ; Tue, 23 Mar 2021 00:50:58 +0000 (UTC) X-FDA: 77949309396.03.4C58F1D Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf08.hostedemail.com (Postfix) with ESMTP id CCA6E80192D5 for ; Tue, 23 Mar 2021 00:50:57 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1616460657; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=QYSwV1DdjoGbYhLD7l7jd/ThoyIb5S4HvnYDjWOs2EQ=; b=atv6ChkpYXCADsCIO4jC6My370XExgsucXq8yq9iwoKxw1BGxrqUOQtNRRJ/OjCVWxFvju CcJLl7k0BTZDNsunMbTi4giC4Qp8gZL+YEDPdjJLfIlnKOYcB534lDsmrBboi0HNtMUbqN gjtOu/1qFVAt+FIcL1BUD/jinGgpnD0= Received: from mail-qk1-f200.google.com (mail-qk1-f200.google.com [209.85.222.200]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-172--Fx1KWIONYCYcZ0VPFrkUQ-1; Mon, 22 Mar 2021 20:50:55 -0400 X-MC-Unique: -Fx1KWIONYCYcZ0VPFrkUQ-1 Received: by mail-qk1-f200.google.com with SMTP id b127so792171qkf.19 for ; Mon, 22 Mar 2021 17:50:55 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; 
bh=QYSwV1DdjoGbYhLD7l7jd/ThoyIb5S4HvnYDjWOs2EQ=; b=MPsrQAk9pPqu5FALAic+doeqGNR9hsvQMbyZTvBB4AB3MWqty79GT0yGzQ9aP1PeVX LLv0AV8ZzoFBWsDrdc3d4GulBqOwO9QoeYROK+ZZGZJ66AoWjKKT2BcY6uAaSjKgIRDe /DOirQSA07k/Em/jbVICqBJxxQyKkeNnAV3GtH1rVwJoMCcOIe8gptRx3f46PJ3UQUZx qv0FVMyqocQyV04nd44jx/xtuqkuqTt+hDObiT5TWb7uz05SZYXpgDyOmu+N+g89zPP9 H8l3lzooQSD6LVCZvfO0SGgd0uG6re78xpJji3j5TbD4Q9x78KpbhbStYIy0aJaSeS1m mLAA== X-Gm-Message-State: AOAM531NId4T2fcDsQM6RMpJbKlvg7MccPvq61B+QDij8qU7cI3Ve0BR 7GIsGXeBUqqEF5qegiXYEb+YebjoANZbVl/eMTQ2C3bjmnA+wq78NpvK0gaCXqf27VYa9JNCa0q 0cVn9Cjh954sloXwzjug4s/u4m1DG+Hsc+oVZGOurQrYDL9BFmt7nhGGdMwUj X-Received: by 2002:ad4:4cc8:: with SMTP id i8mr2460908qvz.56.1616460654793; Mon, 22 Mar 2021 17:50:54 -0700 (PDT) X-Google-Smtp-Source: ABdhPJz/wYGS6MPDtkctKR6iln/9KAo2nZVmqKkBxLuPXgL5Im3S3NO0Z0sGaNw2LONKOI8LyEdDOw== X-Received: by 2002:ad4:4cc8:: with SMTP id i8mr2460882qvz.56.1616460654532; Mon, 22 Mar 2021 17:50:54 -0700 (PDT) Received: from localhost.localdomain (bras-base-toroon474qw-grc-82-174-91-135-175.dsl.bell.ca. [174.91.135.175]) by smtp.gmail.com with ESMTPSA id 8sm12051921qkc.32.2021.03.22.17.50.53 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 22 Mar 2021 17:50:54 -0700 (PDT) From: Peter Xu To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Mike Rapoport , Nadav Amit , Jerome Glisse , Hugh Dickins , Andrea Arcangeli , Andrew Morton , "Kirill A . 
Shutemov" , Axel Rasmussen , peterx@redhat.com, Mike Kravetz , Matthew Wilcox Subject: [PATCH 20/23] hugetlb/userfaultfd: Allow wr-protect none ptes Date: Mon, 22 Mar 2021 20:50:52 -0400 Message-Id: <20210323005052.35916-1-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210323004912.35132-1-peterx@redhat.com> References: <20210323004912.35132-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: CCA6E80192D5 X-Stat-Signature: 6dn9gsy8s68cbqagadkcs9g67wwh8q4r Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf08; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=170.10.133.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1616460657-793427 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Teach hugetlbfs code to wr-protect none ptes just in case the page cache existed for that pte. Meanwhile we also need to be able to recognize a uffd-wp marker pte and remove it for uffd_wp_resolve. Since at it, introduce a variable "psize" to replace all references to the huge page size fetcher. 
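The intended per-pte transitions can be sketched as a small userspace
model.  This is only an illustration under stated assumptions: the enum
and function names below are hypothetical, not kernel APIs, and
migration entries and plain protection changes are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model; none of these names exist in the kernel. */
enum model_pte {
	MODEL_PTE_NONE,           /* nothing mapped */
	MODEL_PTE_UFFD_WP_MARKER, /* stands in for the uffd-wp special swap pte */
	MODEL_PTE_PRESENT,        /* mapped, not uffd-wp protected */
	MODEL_PTE_PRESENT_WP,     /* mapped and uffd-wp protected */
};

/*
 * Model of the transitions described in the commit message for
 * hugetlb_change_protection().  Other protection changes are ignored.
 */
static enum model_pte model_change_protection(enum model_pte pte,
					      bool uffd_wp,
					      bool uffd_wp_resolve)
{
	/* The two requests are mutually exclusive in this model. */
	assert(!(uffd_wp && uffd_wp_resolve));

	switch (pte) {
	case MODEL_PTE_UFFD_WP_MARKER:
		/* Resolving removes the marker; anything else keeps it. */
		return uffd_wp_resolve ? MODEL_PTE_NONE : pte;
	case MODEL_PTE_NONE:
		/* Write-protecting a none pte installs the marker. */
		return uffd_wp ? MODEL_PTE_UFFD_WP_MARKER : pte;
	case MODEL_PTE_PRESENT:
	case MODEL_PTE_PRESENT_WP:
		if (uffd_wp)
			return MODEL_PTE_PRESENT_WP;
		if (uffd_wp_resolve)
			return MODEL_PTE_PRESENT;
		return pte;
	}
	return pte;
}
```

In this model, protecting a none pte and then resolving it returns the
pte to the none state, which is the round trip the marker is meant to
support.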
Signed-off-by: Peter Xu
Reviewed-by: Mike Kravetz
---
 mm/hugetlb.c | 29 +++++++++++++++++++++++++----
 1 file changed, 25 insertions(+), 4 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 448ef745d5ee..d4acf9d9d087 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5110,7 +5110,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0;
+	unsigned long pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
@@ -5130,13 +5130,19 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 
 	mmu_notifier_invalidate_range_start(&range);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
-	for (; address < end; address += huge_page_size(h)) {
+	for (; address < end; address += psize) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, huge_page_size(h));
+		ptep = huge_pte_offset(mm, address, psize);
 		if (!ptep)
 			continue;
 		ptl = huge_pte_lock(h, mm, ptep);
 		if (huge_pmd_unshare(mm, vma, &address, ptep)) {
+			/*
+			 * When uffd-wp is enabled on the vma, unshare
+			 * shouldn't happen at all.  Warn about it if it
+			 * happened due to some reason.
+			 */
+			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
 			pages++;
 			spin_unlock(ptl);
 			shared_pmd = true;
@@ -5160,12 +5166,21 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				else if (uffd_wp_resolve)
 					newpte = pte_swp_clear_uffd_wp(newpte);
 				set_huge_swap_pte_at(mm, address, ptep,
-						     newpte, huge_page_size(h));
+						     newpte, psize);
 				pages++;
 			}
 			spin_unlock(ptl);
 			continue;
 		}
+		if (unlikely(is_swap_special_pte(pte))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+			/*
+			 * This is changing a non-present pte into a none pte,
+			 * no need for huge_ptep_modify_prot_start/commit().
+			 */
+			if (uffd_wp_resolve)
+				huge_pte_clear(mm, address, ptep, psize);
+		}
 		if (!huge_pte_none(pte)) {
 			pte_t old_pte;
 
@@ -5178,6 +5193,12 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 				pte = huge_pte_clear_uffd_wp(pte);
 			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
 			pages++;
+		} else {
+			/* None pte */
+			if (unlikely(uffd_wp))
+				/* Safe to modify directly (none->non-present). */
+				set_huge_pte_at(mm, address, ptep,
+						pte_swp_mkuffd_wp_special(vma));
 		}
 		spin_unlock(ptl);
 	}

From patchwork Tue Mar 23 00:50:54 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156517
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Rapoport, Nadav Amit, Jerome Glisse, Hugh Dickins,
    Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov",
    Axel Rasmussen, peterx@redhat.com, Mike Kravetz, Matthew Wilcox
Subject: [PATCH 21/23] hugetlb/userfaultfd: Only drop uffd-wp special pte if required
Date: Mon, 22 Mar 2021 20:50:54 -0400
Message-Id: <20210323005054.35973-1-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

As with shmem uffd-wp special ptes, we shouldn't drop the uffd-wp special
swap pte for hugetlb either, unless we're unmapping the whole vma or
punching a hole with safe locks held.

For example, remove_inode_hugepages() can safely drop uffd-wp ptes because
it has taken the hugetlb fault mutex, so no concurrent page fault can
trigger.  The call to hugetlb_vmdelete_list() in hugetlbfs_punch_hole() is
not safe.  That's why the former passes ZAP_FLAG_DROP_FILE_UFFD_WP while
the latter cannot.
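The caller-by-caller choice of zap flags can be summarised with a
minimal userspace sketch.  Everything prefixed with model_/MODEL_ is a
hypothetical stand-in introduced for illustration; in particular the
flag value below is an assumption, not the value used by the series.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed stand-in value; the real flag is defined elsewhere in the series. */
#define MODEL_ZAP_FLAG_DROP_FILE_UFFD_WP 0x1UL

/* The four call sites this patch touches, as hypothetical labels. */
enum model_caller {
	MODEL_REMOVE_INODE_HUGEPAGES, /* holds the hugetlb fault mutex */
	MODEL_VMTRUNCATE,             /* hugetlb_vmtruncate() */
	MODEL_PUNCH_HOLE,             /* hugetlb_vmdelete_list() in hugetlbfs_punch_hole() */
	MODEL_UNMAP_REF_PRIVATE,      /* unmap_ref_private() */
};

/* Flags each caller passes down, following the diff in this patch. */
static unsigned long model_zap_flags(enum model_caller caller)
{
	switch (caller) {
	case MODEL_REMOVE_INODE_HUGEPAGES:
	case MODEL_VMTRUNCATE:
		return MODEL_ZAP_FLAG_DROP_FILE_UFFD_WP;
	case MODEL_PUNCH_HOLE:
	case MODEL_UNMAP_REF_PRIVATE:
		return 0;
	}
	return 0;
}

/* The marker is cleared only when the drop flag is set. */
static bool model_marker_survives_zap(unsigned long zap_flags)
{
	return !(zap_flags & MODEL_ZAP_FLAG_DROP_FILE_UFFD_WP);
}

/* Small self-check that doubles as a usage example. */
static void model_selfcheck(void)
{
	assert(!model_marker_survives_zap(
		model_zap_flags(MODEL_REMOVE_INODE_HUGEPAGES)));
	assert(model_marker_survives_zap(model_zap_flags(MODEL_PUNCH_HOLE)));
}
```

Calling model_selfcheck() from a test program's main() exercises both
the "drop" and the "keep" path.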
Signed-off-by: Peter Xu
Reviewed-by: Mike Kravetz
---
 fs/hugetlbfs/inode.c    | 15 +++++++++------
 include/linux/hugetlb.h | 13 ++++++++-----
 mm/hugetlb.c            | 27 +++++++++++++++++++++------
 mm/memory.c             |  5 ++++-
 4 files changed, 42 insertions(+), 18 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index d81f52b87bd7..5fe19e801a2b 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -399,7 +399,8 @@ static void remove_huge_page(struct page *page)
 }
 
 static void
-hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
+hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end,
+		      unsigned long zap_flags)
 {
 	struct vm_area_struct *vma;
 
@@ -432,7 +433,7 @@ hugetlb_vmdelete_list(struct rb_root_cached *root, pgoff_t start, pgoff_t end)
 		}
 
 		unmap_hugepage_range(vma, vma->vm_start + v_offset, v_end,
-				     NULL);
+				     NULL, zap_flags);
 	}
 }
 
@@ -513,7 +514,8 @@ static void remove_inode_hugepages(struct inode *inode, loff_t lstart,
 				mutex_lock(&hugetlb_fault_mutex_table[hash]);
 				hugetlb_vmdelete_list(&mapping->i_mmap,
 					index * pages_per_huge_page(h),
-					(index + 1) * pages_per_huge_page(h));
+					(index + 1) * pages_per_huge_page(h),
+					ZAP_FLAG_DROP_FILE_UFFD_WP);
 				i_mmap_unlock_write(mapping);
 			}
 
@@ -579,7 +581,8 @@ static void hugetlb_vmtruncate(struct inode *inode, loff_t offset)
 	i_mmap_lock_write(mapping);
 	i_size_write(inode, offset);
 	if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
-		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0);
+		hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0,
+				      ZAP_FLAG_DROP_FILE_UFFD_WP);
 	i_mmap_unlock_write(mapping);
 	remove_inode_hugepages(inode, offset, LLONG_MAX);
 }
@@ -612,8 +615,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len)
 		i_mmap_lock_write(mapping);
 		if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root))
 			hugetlb_vmdelete_list(&mapping->i_mmap,
-						hole_start >> PAGE_SHIFT,
-						hole_end >> PAGE_SHIFT);
+					      hole_start >> PAGE_SHIFT,
+					      hole_end >> PAGE_SHIFT, 0);
 		i_mmap_unlock_write(mapping);
 		remove_inode_hugepages(inode, hole_start, hole_end);
 		inode_unlock(inode);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 92710600596e..4047fa042782 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -121,14 +121,15 @@ long follow_hugetlb_page(struct mm_struct *, struct vm_area_struct *,
 			 unsigned long *, unsigned long *, long, unsigned int,
 			 int *);
 void unmap_hugepage_range(struct vm_area_struct *,
-			  unsigned long, unsigned long, struct page *);
+			  unsigned long, unsigned long, struct page *,
+			  unsigned long);
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma,
 			  unsigned long start, unsigned long end,
-			  struct page *ref_page);
+			  struct page *ref_page, unsigned long zap_flags);
 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 				unsigned long start, unsigned long end,
-				struct page *ref_page);
+				struct page *ref_page, unsigned long zap_flags);
 void hugetlb_report_meminfo(struct seq_file *);
 int hugetlb_report_node_meminfo(char *buf, int len, int nid);
 void hugetlb_show_meminfo(void);
@@ -361,14 +362,16 @@ static inline unsigned long hugetlb_change_protection(
 
 static inline void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
 
 static inline void __unmap_hugepage_range(struct mmu_gather *tlb,
 			struct vm_area_struct *vma, unsigned long start,
-			unsigned long end, struct page *ref_page)
+			unsigned long end, struct page *ref_page,
+			unsigned long zap_flags)
 {
 	BUG();
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d4acf9d9d087..deeae6d40dad 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3936,7 +3936,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			    unsigned long start, unsigned long end,
-			    struct page *ref_page)
+			    struct page *ref_page, unsigned long zap_flags)
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
@@ -3988,6 +3988,19 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 			continue;
 		}
 
+		if (unlikely(is_swap_special_pte(pte))) {
+			WARN_ON_ONCE(!pte_swp_uffd_wp_special(pte));
+			/*
+			 * Only drop the special swap uffd-wp pte if
+			 * e.g. unmapping a vma or punching a hole (with proper
+			 * lock held so that concurrent page fault won't happen).
+			 */
+			if (zap_flags & ZAP_FLAG_DROP_FILE_UFFD_WP)
+				huge_pte_clear(mm, address, ptep, sz);
+			spin_unlock(ptl);
+			continue;
+		}
+
 		/*
 		 * Migrating hugepage or HWPoisoned hugepage is already
 		 * unmapped and its refcount is dropped, so just clear pte here.
@@ -4039,9 +4052,10 @@ void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct *vma,
 
 void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 			  struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
-	__unmap_hugepage_range(tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(tlb, vma, start, end, ref_page, zap_flags);
 
 	/*
 	 * Clear this flag so that x86's huge_pmd_share page_table_shareable
@@ -4057,12 +4071,13 @@ void __unmap_hugepage_range_final(struct mmu_gather *tlb,
 }
 
 void unmap_hugepage_range(struct vm_area_struct *vma, unsigned long start,
-			  unsigned long end, struct page *ref_page)
+			  unsigned long end, struct page *ref_page,
+			  unsigned long zap_flags)
 {
 	struct mmu_gather tlb;
 
 	tlb_gather_mmu(&tlb, vma->vm_mm);
-	__unmap_hugepage_range(&tlb, vma, start, end, ref_page);
+	__unmap_hugepage_range(&tlb, vma, start, end, ref_page, zap_flags);
 	tlb_finish_mmu(&tlb);
 }
 
@@ -4117,7 +4132,7 @@ static void unmap_ref_private(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (!is_vma_resv_set(iter_vma, HPAGE_RESV_OWNER))
 			unmap_hugepage_range(iter_vma, address,
-					     address + huge_page_size(h), page);
+					     address + huge_page_size(h), page, 0);
 	}
 	i_mmap_unlock_write(mapping);
 }
diff --git a/mm/memory.c b/mm/memory.c
index 766946d3eab0..4bf7f8e83733 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -1515,8 +1515,11 @@ static void unmap_single_vma(struct mmu_gather *tlb,
 			 * safe to do nothing in this case.
 			 */
 			if (vma->vm_file) {
+				unsigned long zap_flags = details ?
+				    details->zap_flags : 0;
 				i_mmap_lock_write(vma->vm_file->f_mapping);
-				__unmap_hugepage_range_final(tlb, vma, start, end, NULL);
+				__unmap_hugepage_range_final(tlb, vma, start, end,
+							     NULL, zap_flags);
 				i_mmap_unlock_write(vma->vm_file->f_mapping);
 			}
 		} else

From patchwork Tue Mar 23 00:50:57 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156519
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Rapoport, Nadav Amit, Jerome Glisse, Hugh Dickins,
    Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov",
    Axel Rasmussen, peterx@redhat.com, Mike Kravetz, Matthew Wilcox
Subject: [PATCH 22/23] mm/userfaultfd: Enable write protection for shmem & hugetlbfs
Date: Mon, 22 Mar 2021 20:50:57 -0400
Message-Id: <20210323005057.36027-1-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

All the necessary changes are now in place for both shmem and hugetlbfs,
so turn on all the shmem/hugetlbfs switches for userfaultfd-wp.

vma_can_userfault() no longer needs its VM_UFFD_WP check, so remove it.
We can also add _UFFDIO_WRITEPROTECT to UFFD_API_RANGE_IOCTLS_BASIC,
because all existing memory types now support write-protection mode.

Since vma_can_userfault() will be used elsewhere, move it into
userfaultfd_k.h.
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c                 | 18 ------------------
 include/linux/userfaultfd_k.h    | 14 ++++++++++++++
 include/uapi/linux/userfaultfd.h |  7 +++++--
 mm/userfaultfd.c                 | 10 +++-------
 4 files changed, 22 insertions(+), 27 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index f6fa34f58c37..b4f30fe84aa3 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1275,24 +1275,6 @@ static __always_inline int validate_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline bool vma_can_userfault(struct vm_area_struct *vma,
-				     unsigned long vm_flags)
-{
-	/* FIXME: add WP support to hugetlbfs and shmem */
-	if (vm_flags & VM_UFFD_WP) {
-		if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma))
-			return false;
-	}
-
-	if (vm_flags & VM_UFFD_MINOR) {
-		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
-			return false;
-	}
-
-	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
-	       vma_is_shmem(vma);
-}
-
 static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 				unsigned long arg)
 {
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index fefebe6e9656..413323fc81ca 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -16,6 +16,7 @@
 #include
 #include
 #include
+#include
 
 /* The set of all possible UFFD-related VM flags. */
 #define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR)
@@ -132,6 +133,19 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma)
 	return vma->vm_flags & __VM_UFFD_FLAGS;
 }
 
+static inline bool vma_can_userfault(struct vm_area_struct *vma,
+				     unsigned long vm_flags)
+{
+	if (vm_flags & VM_UFFD_MINOR) {
+		if (!(is_vm_hugetlb_page(vma) || vma_is_shmem(vma)))
+			return false;
+	}
+
+	return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) ||
+	       vma_is_shmem(vma);
+}
+
+
 extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *);
 extern void dup_userfaultfd_complete(struct list_head *);
 
diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h
index 47d9790d863d..000cc4cfc787 100644
--- a/include/uapi/linux/userfaultfd.h
+++ b/include/uapi/linux/userfaultfd.h
@@ -32,7 +32,8 @@
 			   UFFD_FEATURE_SIGBUS |		\
 			   UFFD_FEATURE_THREAD_ID |		\
 			   UFFD_FEATURE_MINOR_HUGETLBFS |	\
-			   UFFD_FEATURE_MINOR_SHMEM)
+			   UFFD_FEATURE_MINOR_SHMEM |		\
+			   UFFD_FEATURE_WP_HUGETLBFS_SHMEM)
 #define UFFD_API_IOCTLS				\
 	((__u64)1 << _UFFDIO_REGISTER |		\
 	 (__u64)1 << _UFFDIO_UNREGISTER |	\
@@ -46,7 +47,8 @@
 #define UFFD_API_RANGE_IOCTLS_BASIC		\
 	((__u64)1 << _UFFDIO_WAKE |		\
 	 (__u64)1 << _UFFDIO_COPY |		\
-	 (__u64)1 << _UFFDIO_CONTINUE)
+	 (__u64)1 << _UFFDIO_CONTINUE |		\
+	 (__u64)1 << _UFFDIO_WRITEPROTECT)
 
 /*
  * Valid ioctl command number range with this API is from 0x00 to
@@ -198,6 +200,7 @@ struct uffdio_api {
 #define UFFD_FEATURE_THREAD_ID			(1<<8)
 #define UFFD_FEATURE_MINOR_HUGETLBFS		(1<<9)
 #define UFFD_FEATURE_MINOR_SHMEM		(1<<10)
+#define UFFD_FEATURE_WP_HUGETLBFS_SHMEM		(1<<11)
 	__u64 features;
 
 	__u64 ioctls;
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index a2b0dcc80a19..e6c6095878bb 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -452,7 +452,6 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
 			break;
 		}
 	} else {
-		VM_WARN_ON_ONCE(wp_copy);
 		err = shmem_mcopy_atomic_pte(dst_mm, dst_pmd, dst_vma,
 					     dst_addr, src_addr, mode, page, wp_copy);
 	}
@@ -683,15 +682,12 @@ int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start,
 
 	err = -ENOENT;
 	dst_vma = find_dst_vma(dst_mm, start, len);
-	/*
-	 * Make sure the vma is not shared, that the dst range is
-	 * both valid and fully within a single existing vma.
-	 */
-	if (!dst_vma || (dst_vma->vm_flags & VM_SHARED))
+
+	if (!dst_vma)
 		goto out_unlock;
 	if (!userfaultfd_wp(dst_vma))
 		goto out_unlock;
-	if (!vma_is_anonymous(dst_vma))
+	if (!vma_can_userfault(dst_vma, dst_vma->vm_flags))
 		goto out_unlock;
 
 	if (is_vm_hugetlb_page(dst_vma)) {

From patchwork Tue Mar 23 00:50:59 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12156521
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Rapoport, Nadav Amit, Jerome Glisse, Hugh Dickins,
    Andrea Arcangeli, Andrew Morton, "Kirill A . Shutemov",
    Axel Rasmussen, peterx@redhat.com, Mike Kravetz, Matthew Wilcox
Subject: [PATCH 23/23] userfaultfd/selftests: Enable uffd-wp for shmem/hugetlbfs
Date: Mon, 22 Mar 2021 20:50:59 -0400
Message-Id: <20210323005059.36084-1-peterx@redhat.com>
In-Reply-To: <20210323004912.35132-1-peterx@redhat.com>
References: <20210323004912.35132-1-peterx@redhat.com>

Now that shmem and hugetlbfs support uffd-wp, always turn the uffd-wp
test on.

Define HUGETLB_EXPECTED_IOCTLS instead of using
UFFD_API_RANGE_IOCTLS_BASIC, because UFFD_API_RANGE_IOCTLS_BASIC is
normally a superset of capabilities and the test may not satisfy them
all.  For example, when hugetlb is registered without minor mode, we need
to explicitly remove _UFFDIO_CONTINUE.  The same applies to uffd-wp: we
need to explicitly remove _UFFDIO_WRITEPROTECT if the range is not
registered with uffd-wp.

In the long term, we may consider dropping the UFFD_API_* macros from the
uapi/linux/userfaultfd.h header completely, because a kernel header
update can easily break userspace.
Signed-off-by: Peter Xu
---
 tools/testing/selftests/vm/userfaultfd.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 1f5f9362ec7b..5fa9a506ded5 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -80,7 +80,7 @@ static int test_type;
 static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
-static bool test_uffdio_wp = false;
+static bool test_uffdio_wp = true;
 
 static bool map_shared;
 static int shm_fd;
@@ -319,6 +319,9 @@ struct uffd_test_ops {
 					 (1 << _UFFDIO_ZEROPAGE) | \
 					 (1 << _UFFDIO_WRITEPROTECT))
 
+#define HUGETLB_EXPECTED_IOCTLS		((1 << _UFFDIO_WAKE) | \
+					 (1 << _UFFDIO_COPY))
+
 static struct uffd_test_ops anon_uffd_test_ops = {
 	.expected_ioctls = ANON_EXPECTED_IOCTLS,
 	.allocate_area	= anon_allocate_area,
@@ -334,7 +337,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-	.expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC & ~(1 << _UFFDIO_CONTINUE),
+	.expected_ioctls = HUGETLB_EXPECTED_IOCTLS,
 	.allocate_area	= hugetlb_allocate_area,
 	.release_pages	= hugetlb_release_pages,
 	.alias_mapping = hugetlb_alias_mapping,
@@ -1433,8 +1436,6 @@ static void set_test_type(const char *type)
 	if (!strcmp(type, "anon")) {
 		test_type = TEST_ANON;
 		uffd_test_ops = &anon_uffd_test_ops;
-		/* Only enable write-protect test for anonymous test */
-		test_uffdio_wp = true;
 	} else if (!strcmp(type, "hugetlb")) {
 		test_type = TEST_HUGETLB;
 		uffd_test_ops = &hugetlb_uffd_test_ops;
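
Illustration only, not part of the patch: the reasoning in the commit
message about per-mode ioctl masks can be sketched roughly as below. This
is a minimal sketch under stated assumptions. Every DEMO_* and demo_* name
is hypothetical, and the bit indices are local stand-ins rather than values
pulled from uapi/linux/userfaultfd.h. Unlike the commit message, which
talks about removing bits from a superset, the sketch starts from a small
baseline (WAKE and COPY, mirroring HUGETLB_EXPECTED_IOCTLS) and adds
optional bits only when the matching mode was registered. The resulting
mask is the same.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical stand-ins for the _UFFDIO_* bit indices. A real test would
 * take these from <linux/userfaultfd.h> instead of defining them locally.
 */
enum demo_uffdio_bit {
	DEMO_UFFDIO_WAKE = 2,
	DEMO_UFFDIO_COPY = 3,
	DEMO_UFFDIO_WRITEPROTECT = 6,
	DEMO_UFFDIO_CONTINUE = 7,
};

#define DEMO_BIT(nr)	(UINT64_C(1) << (nr))

/* Baseline that every registration is assumed to offer in this sketch. */
#define DEMO_HUGETLB_BASE_IOCTLS \
	(DEMO_BIT(DEMO_UFFDIO_WAKE) | DEMO_BIT(DEMO_UFFDIO_COPY))

/* Build the expected mask from the modes the range was registered with. */
static uint64_t demo_expected_ioctls(bool wp_registered, bool minor_registered)
{
	uint64_t expected = DEMO_HUGETLB_BASE_IOCTLS;

	if (wp_registered)
		expected |= DEMO_BIT(DEMO_UFFDIO_WRITEPROTECT);
	if (minor_registered)
		expected |= DEMO_BIT(DEMO_UFFDIO_CONTINUE);
	return expected;
}

/* True when every expected bit is present; extra reported bits are fine. */
static bool demo_ioctls_ok(uint64_t reported, uint64_t expected)
{
	return (reported & expected) == expected;
}
```

With this shape, a range registered without uffd-wp or minor mode is only
checked against WAKE and COPY, so a kernel that reports additional ioctls
does not cause a spurious failure.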