From patchwork Mon Nov 15 08:01:46 2021
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12618871
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Nadav Amit, peterx@redhat.com, Alistair Popple, Andrew Morton,
 Mike Kravetz, Mike Rapoport, Matthew Wilcox, Jerome Glisse,
 Axel Rasmussen, "Kirill A . Shutemov", David Hildenbrand,
 Andrea Arcangeli, Hugh Dickins
Subject: [PATCH v6 13/23] mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP
Date: Mon, 15 Nov 2021 16:01:46 +0800
Message-Id: <20211115080146.74812-1-peterx@redhat.com>
X-Mailer: git-send-email 2.32.0
In-Reply-To: <20211115075522.73795-1-peterx@redhat.com>
References: <20211115075522.73795-1-peterx@redhat.com>

Pass the wp_copy variable into hugetlb_mcopy_atomic_pte() throughout the
stack.  Apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is set with UFFDIO_COPY.

Hugetlb pages are only managed by hugetlbfs, so we're safe even without
setting the dirty bit in the huge pte if the page is installed as read-only.
However, we'd better still keep the dirty bit set for a read-only UFFDIO_COPY
pte (when the UFFDIO_COPY_MODE_WP bit is set), not only to match what we do
with shmem, but also because the page does contain dirty data that the kernel
just copied from userspace.

Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h |  6 ++++--
 mm/hugetlb.c            | 29 +++++++++++++++++++++++------
 mm/userfaultfd.c        | 14 +++++++++-----
 3 files changed, 36 insertions(+), 13 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 00351ccb49a3..4da0c4b4159a 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -160,7 +160,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
 				unsigned long dst_addr,
 				unsigned long src_addr,
 				enum mcopy_atomic_mode mode,
-				struct page **pagep);
+				struct page **pagep,
+				bool wp_copy);
 #endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
 						struct vm_area_struct *vma,
@@ -355,7 +356,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 						unsigned long dst_addr,
 						unsigned long src_addr,
 						enum mcopy_atomic_mode mode,
-						struct page **pagep)
+						struct page **pagep,
+						bool wp_copy)
 {
 	BUG();
 	return 0;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 3a10274b2e39..8146240eefc6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5740,7 +5740,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 			    unsigned long dst_addr,
 			    unsigned long src_addr,
 			    enum mcopy_atomic_mode mode,
-			    struct page **pagep)
+			    struct page **pagep,
+			    bool wp_copy)
 {
 	bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE);
 	struct hstate *h = hstate_vma(dst_vma);
@@ -5868,7 +5869,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		goto out_release_unlock;
 
 	ret = -EEXIST;
-	if (!huge_pte_none(huge_ptep_get(dst_pte)))
+	/*
+	 * We allow to overwrite a pte marker: consider when both MISSING|WP
+	 * registered, we firstly wr-protect a none pte which has no page cache
+	 * page backing it, then access the page.
+	 */
+	if (!huge_pte_none_mostly(huge_ptep_get(dst_pte)))
 		goto out_release_unlock;
 
 	if (vm_shared) {
@@ -5878,17 +5884,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
 		hugepage_add_new_anon_rmap(page, dst_vma, dst_addr);
 	}
 
-	/* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */
-	if (is_continue && !vm_shared)
+	/*
+	 * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY
+	 * with wp flag set, don't set pte write bit.
+	 */
+	if (wp_copy || (is_continue && !vm_shared))
 		writable = 0;
 	else
 		writable = dst_vma->vm_flags & VM_WRITE;
 
 	_dst_pte = make_huge_pte(dst_vma, page, writable);
-	if (writable)
-		_dst_pte = huge_pte_mkdirty(_dst_pte);
+	/*
+	 * Always mark UFFDIO_COPY page dirty; note that this may not be
+	 * extremely important for hugetlbfs for now since swapping is not
+	 * supported, but we should still be clear in that this page cannot be
+	 * thrown away at will, even if write bit not set.
+	 */
+	_dst_pte = huge_pte_mkdirty(_dst_pte);
 	_dst_pte = pte_mkyoung(_dst_pte);
 
+	if (wp_copy)
+		_dst_pte = huge_pte_mkuffd_wp(_dst_pte);
+
 	set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte);
 
 	(void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte,
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 95e5a9ba3196..6174a212c72f 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -291,7 +291,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 					      unsigned long dst_start,
 					      unsigned long src_start,
 					      unsigned long len,
-					      enum mcopy_atomic_mode mode)
+					      enum mcopy_atomic_mode mode,
+					      bool wp_copy)
 {
 	int vm_shared = dst_vma->vm_flags & VM_SHARED;
 	ssize_t err;
@@ -379,7 +380,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		if (mode != MCOPY_ATOMIC_CONTINUE &&
-		    !huge_pte_none(huge_ptep_get(dst_pte))) {
+		    !huge_pte_none_mostly(huge_ptep_get(dst_pte))) {
 			err = -EEXIST;
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
@@ -387,7 +388,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		}
 
 		err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma,
-					       dst_addr, src_addr, mode, &page);
+					       dst_addr, src_addr, mode, &page,
+					       wp_copy);
 
 		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 		i_mmap_unlock_read(mapping);
@@ -442,7 +444,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 				      unsigned long dst_start,
 				      unsigned long src_start,
 				      unsigned long len,
-				      enum mcopy_atomic_mode mode);
+				      enum mcopy_atomic_mode mode,
+				      bool wp_copy);
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm,
@@ -562,7 +565,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
 	 */
 	if (is_vm_hugetlb_page(dst_vma))
 		return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start,
-						src_start, len, mcopy_mode);
+					       src_start, len, mcopy_mode,
+					       wp_copy);
 
 	if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma))
 		goto out_unlock;
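For readers following the mm/hugetlb.c hunks, the pte bits that result from
the new logic can be modelled outside the kernel.  What follows is only a toy
sketch in C, not kernel code: the TOY_PTE_* flags and toy_mcopy_pte_bits() are
hypothetical names invented for this illustration, the bit positions are
arbitrary, and vma_writable stands in for (dst_vma->vm_flags & VM_WRITE).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for pte bits; NOT the real architecture layout. */
enum {
	TOY_PTE_WRITE   = 1u << 0,
	TOY_PTE_DIRTY   = 1u << 1,
	TOY_PTE_YOUNG   = 1u << 2,
	TOY_PTE_UFFD_WP = 1u << 3,
};

/*
 * Toy model (assumption, not the kernel function) of which bits the patched
 * hugetlb_mcopy_atomic_pte() sets in the huge pte it installs.
 */
static unsigned int toy_mcopy_pte_bits(bool is_continue, bool vm_shared,
				       bool vma_writable, bool wp_copy)
{
	unsigned int pte = 0;
	bool writable;

	/* CONTINUE on a non-shared VMA, or a wr-protected copy: no write bit. */
	if (wp_copy || (is_continue && !vm_shared))
		writable = false;
	else
		writable = vma_writable;

	if (writable)
		pte |= TOY_PTE_WRITE;

	/* The patch marks the pte dirty unconditionally, then young. */
	pte |= TOY_PTE_DIRTY | TOY_PTE_YOUNG;

	/* UFFDIO_COPY_MODE_WP additionally tags the pte as uffd-wp. */
	if (wp_copy)
		pte |= TOY_PTE_UFFD_WP;

	/* Invariant of the model: a wr-protected copy is never writable. */
	assert(!(wp_copy && (pte & TOY_PTE_WRITE)));

	return pte;
}
```

Under these assumptions, toy_mcopy_pte_bits(false, true, true, true) models a
UFFDIO_COPY with UFFDIO_COPY_MODE_WP on a writable shared VMA and gives a pte
that is dirty and uffd-wp but not writable, whereas passing wp_copy = false for
the same VMA gives a writable, dirty pte without the uffd-wp bit.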