From patchwork Tue Apr 5 01:49:04 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12800969 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 86D69C433F5 for ; Tue, 5 Apr 2022 01:54:35 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8F8186B0083; Mon, 4 Apr 2022 21:49:20 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 8A7EC6B0085; Mon, 4 Apr 2022 21:49:20 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7206B6B0087; Mon, 4 Apr 2022 21:49:20 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0243.hostedemail.com [216.40.44.243]) by kanga.kvack.org (Postfix) with ESMTP id 64FA26B0083 for ; Mon, 4 Apr 2022 21:49:20 -0400 (EDT) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 2B30A1828AC85 for ; Tue, 5 Apr 2022 01:49:10 +0000 (UTC) X-FDA: 79321142460.28.E4537D6 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by imf11.hostedemail.com (Postfix) with ESMTP id B139D40018 for ; Tue, 5 Apr 2022 01:49:09 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1649123349; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=95EZeCyII9xxx3gyIQ0ulWF2wAsEmWfCQ2/ndLgG/cw=; b=Mv7AcGNkNNbq1D8aJcJtFebae95Oo8GcyqjDHzyuOcKysKS2Wvi5JckLtliRxg7M3lknEz rSp13uCw9c/Eb8bwPwZOfmccsTalnswm/eK+fMpgyK711ELObSh63sbAbHJ6ogNwXCb8bN xZJJYHpWcuMzDxxW0Ao0RG5voJj/OI8= Received: from mail-il1-f200.google.com (mail-il1-f200.google.com [209.85.166.200]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-550--Lz3lX70PGO93-u_lsaClw-1; Mon, 04 Apr 2022 21:49:08 -0400 X-MC-Unique: -Lz3lX70PGO93-u_lsaClw-1 Received: by mail-il1-f200.google.com with SMTP id d13-20020a056e02214d00b002ca4d440f73so1848343ilv.15 for ; Mon, 04 Apr 2022 18:49:08 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=95EZeCyII9xxx3gyIQ0ulWF2wAsEmWfCQ2/ndLgG/cw=; b=mX6c9I4mP02Wyw8p4chhxQ6saCuffUH+7lzZDI3w6vD58SwC98II+kO16UUdVVgmNx bzAnfw7vVzJ/Aas0N5w//cmS3W7G5K14D259/JmZ6zCpzLMCQkURYbseKNI67kzgJrMR +eQLgSf3DKLD85FscespMAhdR60u3Bcv6vqdc3q3fNYZlI/DA49NvgdRNCaz0g1bpx+O ZuFbL/mQu07CnIsMnai+LZH4JyhEDLXdgtJ/iQXSE1hFApnpRL+w0mn8+R3qlVxQJSQw cpKVSR5uQTMAtThFf6w9VdgWLIvf52ErMpQbV7uk2ojwU1kGNhqIq9rdGE0riPlMiUWj /jrA== X-Gm-Message-State: AOAM532MSWLfP8jsPz/et5N6AgudP3ECzk6DcMc4bRFZZ4qY5k/rnzNT kDu2KttGaUtL1Pag9Yymb1BtkOoFm7W1ROpM1sT+3lsUL4SPSe60OppXXefWkhIYncWa6Ajez8y yvSh58+swz4g= X-Received: by 2002:a05:6602:13d5:b0:64c:9ef0:65e1 with SMTP id o21-20020a05660213d500b0064c9ef065e1mr588045iov.157.1649123346612; Mon, 04 Apr 2022 18:49:06 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxvBZ1x9YNEIDP6fPb8GcdToNu45lYsI3Q8dcSYoOOQ7uq3K76BU1AVUi+jy6YnevEjPpvOKw== X-Received: by 2002:a05:6602:13d5:b0:64c:9ef0:65e1 with SMTP id o21-20020a05660213d500b0064c9ef065e1mr588020iov.157.1649123346392; Mon, 04 Apr 2022 18:49:06 -0700 (PDT) Received: from localhost.localdomain (cpec09435e3e0ee-cmc09435e3e0ec.cpe.net.cable.rogers.com. [99.241.198.116]) by smtp.gmail.com with ESMTPSA id b15-20020a05660214cf00b0064cb75d7e97sm7836568iow.53.2022.04.04.18.49.05 (version=TLS1_3 cipher=TLS_CHACHA20_POLY1305_SHA256 bits=256/256); Mon, 04 Apr 2022 18:49:06 -0700 (PDT) From: Peter Xu To: linux-kernel@vger.kernel.org, linux-mm@kvack.org Cc: Mike Kravetz , Nadav Amit , Matthew Wilcox , Mike Rapoport , David Hildenbrand , Hugh Dickins , Jerome Glisse , "Kirill A . Shutemov" , Andrea Arcangeli , Andrew Morton , Axel Rasmussen , Alistair Popple , peterx@redhat.com Subject: [PATCH v8 13/23] mm/hugetlb: Take care of UFFDIO_COPY_MODE_WP Date: Mon, 4 Apr 2022 21:49:04 -0400 Message-Id: <20220405014904.14643-1-peterx@redhat.com> X-Mailer: git-send-email 2.32.0 In-Reply-To: <20220405014646.13522-1-peterx@redhat.com> References: <20220405014646.13522-1-peterx@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Stat-Signature: wtq5fj7ywzjzpurbd3poo4pitjme3mx1 Authentication-Results: imf11.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=Mv7AcGNk; spf=none (imf11.hostedemail.com: domain of peterx@redhat.com has no SPF policy when checking 170.10.133.124) smtp.mailfrom=peterx@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-Rspam-User: X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: B139D40018 X-HE-Tag: 1649123349-81044 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Pass the wp_copy variable into hugetlb_mcopy_atomic_pte() thoughout the stack. Apply the UFFD_WP bit if UFFDIO_COPY_MODE_WP is with UFFDIO_COPY. Hugetlb pages are only managed by hugetlbfs, so we're safe even without setting dirty bit in the huge pte if the page is installed as read-only. However we'd better still keep the dirty bit set for a read-only UFFDIO_COPY pte (when UFFDIO_COPY_MODE_WP bit is set), not only to match what we do with shmem, but also because the page does contain dirty data that the kernel just copied from the userspace. Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 6 ++++-- mm/hugetlb.c | 29 +++++++++++++++++++++++------ mm/userfaultfd.c | 14 +++++++++----- 3 files changed, 36 insertions(+), 13 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 53c1b6082a4c..6347298778b6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -160,7 +160,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep); + struct page **pagep, + bool wp_copy); #endif /* CONFIG_USERFAULTFD */ bool hugetlb_reserve_pages(struct inode *inode, long from, long to, struct vm_area_struct *vma, @@ -355,7 +356,8 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep) + struct page **pagep, + bool wp_copy) { BUG(); return 0; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 82df0fcfedf9..c94deead22b2 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5795,7 +5795,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, unsigned long dst_addr, unsigned long src_addr, enum mcopy_atomic_mode mode, - struct page **pagep) + struct page **pagep, + bool wp_copy) { bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); struct hstate *h = hstate_vma(dst_vma); @@ -5925,7 +5926,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, goto out_release_unlock; ret = -EEXIST; - if (!huge_pte_none(huge_ptep_get(dst_pte))) + /* + * We allow to overwrite a pte marker: consider when both MISSING|WP + * registered, we firstly wr-protect a none pte which has no page cache + * page backing it, then access the page. + */ + if (!huge_pte_none_mostly(huge_ptep_get(dst_pte))) goto out_release_unlock; if (vm_shared) { @@ -5935,17 +5941,28 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, hugepage_add_new_anon_rmap(page, dst_vma, dst_addr); } - /* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */ - if (is_continue && !vm_shared) + /* + * For either: (1) CONTINUE on a non-shared VMA, or (2) UFFDIO_COPY + * with wp flag set, don't set pte write bit. + */ + if (wp_copy || (is_continue && !vm_shared)) writable = 0; else writable = dst_vma->vm_flags & VM_WRITE; _dst_pte = make_huge_pte(dst_vma, page, writable); - if (writable) - _dst_pte = huge_pte_mkdirty(_dst_pte); + /* + * Always mark UFFDIO_COPY page dirty; note that this may not be + * extremely important for hugetlbfs for now since swapping is not + * supported, but we should still be clear in that this page cannot be + * thrown away at will, even if write bit not set. + */ + _dst_pte = huge_pte_mkdirty(_dst_pte); _dst_pte = pte_mkyoung(_dst_pte); + if (wp_copy) + _dst_pte = huge_pte_mkuffd_wp(_dst_pte); + set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); (void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte, diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index b1c875b77fbb..da0b3ed2a6b5 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -304,7 +304,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - enum mcopy_atomic_mode mode) + enum mcopy_atomic_mode mode, + bool wp_copy) { int vm_shared = dst_vma->vm_flags & VM_SHARED; ssize_t err; @@ -392,7 +393,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, } if (mode != MCOPY_ATOMIC_CONTINUE && - !huge_pte_none(huge_ptep_get(dst_pte))) { + !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { err = -EEXIST; mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_unlock_read(mapping); @@ -400,7 +401,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, } err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, - dst_addr, src_addr, mode, &page); + dst_addr, src_addr, mode, &page, + wp_copy); mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_unlock_read(mapping); @@ -455,7 +457,8 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - enum mcopy_atomic_mode mode); + enum mcopy_atomic_mode mode, + bool wp_copy); #endif /* CONFIG_HUGETLB_PAGE */ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, @@ -575,7 +578,8 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, */ if (is_vm_hugetlb_page(dst_vma)) return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, - src_start, len, mcopy_mode); + src_start, len, mcopy_mode, + wp_copy); if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock;