From patchwork Wed Feb 17 16:31:00 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12091899
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Axel Rasmussen, Andrew Morton, Matthew Wilcox,
    Mike Rapoport, "Kirill A . Shutemov", Andrea Arcangeli, Mike Kravetz
Subject: [PATCH 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
Date: Wed, 17 Feb 2021 11:31:00 -0500
Message-Id: <20210217163102.13436-3-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210217163102.13436-1-peterx@redhat.com>
References: <20210217163102.13436-1-peterx@redhat.com>
MIME-Version: 1.0

Huge pmd sharing can cause problems for userfaultfd. Userfaultfd drives
its logic from special bits in page table entries, while huge pmd sharing
lets different address ranges share the same page table entries. That
causes issues in two ways:

- When huge pmd page tables are shared into an uffd write-protected
  range, the newly mapped huge pmd range will also be write-protected
  unexpectedly, or,

- When we try to write-protect a huge pmd shared range, we first do
  huge_pmd_unshare() in hugetlb_change_protection(); however, that also
  means the UFFDIO_WRITEPROTECT could be silently skipped for the shared
  region, which could lead to data loss.

While at it, a few other things are done altogether:

- Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because
  that's definitely something that arch code would like to use too

- arm64 currently checks CONFIG_ARCH_WANT_HUGE_PMD_SHARE directly when
  trying to share huge pmds. Switch it to the want_pmd_share() helper.
  While at it, move vma_shareable() from huge_pmd_share() into
  want_pmd_share().
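To make the first failure mode concrete, below is a minimal userspace
sketch, illustrative only: the path /dev/hugepages/test is hypothetical,
the 1GB length assumes the mappings cover a PUD's worth of huge pages so
pmd sharing can kick in, and it assumes a kernel where uffd-wp is wired
up for hugetlbfs (which is what this series works toward). Two mappings
of the same hugetlbfs file are candidates for pmd sharing, so a wp bit
installed through one mapping could become visible through the other:

    #include <fcntl.h>
    #include <sys/ioctl.h>
    #include <sys/mman.h>
    #include <sys/syscall.h>
    #include <unistd.h>
    #include <linux/userfaultfd.h>

    int main(void)
    {
        size_t len = 1UL << 30;   /* a full PUD's worth of huge pages */
        int fd = open("/dev/hugepages/test", O_CREAT | O_RDWR, 0600);
        ftruncate(fd, len);

        /* Two mappings of the same file: candidates for pmd sharing */
        char *a = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        char *b = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);

        int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
        struct uffdio_api api = { .api = UFFD_API };
        ioctl(uffd, UFFDIO_API, &api);

        /* Register only 'a' for write protection */
        struct uffdio_register reg = {
            .range = { .start = (unsigned long)a, .len = len },
            .mode  = UFFDIO_REGISTER_MODE_WP,
        };
        ioctl(uffd, UFFDIO_REGISTER, &reg);

        /*
         * If 'a' and 'b' shared a pmd page table, wp bits installed
         * for 'a' would unexpectedly apply to 'b' as well -- which is
         * exactly what this patch forbids.
         */
        b[0] = 1;
        return 0;
    }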
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 arch/arm64/mm/hugetlbpage.c   |  3 +--
 include/linux/hugetlb.h       |  2 ++
 include/linux/userfaultfd_k.h |  9 +++++++++
 mm/hugetlb.c                  | 20 ++++++++++++++------
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 6e3bcffe2837..58987a98e179 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		ptep = pte_alloc_map(mm, pmdp, addr);
 	} else if (sz == PMD_SIZE) {
-		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
-		    pud_none(READ_ONCE(*pudp)))
+		if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))
 			ptep = huge_pmd_share(mm, vma, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a6113fa6d21d..bc86f2f516e7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -950,4 +950,6 @@ static inline __init void hugetlb_cma_check(void)
 }
 #endif
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..c63ccdae3eab 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -52,6 +52,15 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 	return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx;
 }
 
+/*
+ * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
+ * protect information is per pgtable entry.
+ */
+static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07bb9bdc3282..8e8e2f3dfe06 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5292,6 +5292,18 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
 	return false;
 }
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	return false;
+#endif
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
+	return vma_shareable(vma, addr);
+}
+
 /*
  * Determine if start,end range within vma could be mapped by shared pmd.
  * If yes, adjust start and end to cover range associated with possible
@@ -5346,9 +5358,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	if (!vma_shareable(vma, addr))
-		return (pte_t *)pmd_alloc(mm, pud, addr);
-
 	i_mmap_assert_locked(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
@@ -5412,7 +5421,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
 	return 1;
 }
-#define want_pmd_share()	(1)
+
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
@@ -5430,7 +5439,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 					  unsigned long *start, unsigned long *end)
 {
 }
-#define want_pmd_share()	(0)
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
@@ -5452,7 +5460,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		pte = (pte_t *)pud;
 	} else {
 		BUG_ON(sz != PMD_SIZE);
-		if (want_pmd_share() && pud_none(*pud))
+		if (want_pmd_share(vma, addr) && pud_none(*pud))
 			pte = huge_pmd_share(mm, vma, addr, pud);
 		else
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
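One note on the vma_shareable() move: since the check now lives in
want_pmd_share(), huge_pmd_share() no longer falls back to pmd_alloc()
on its own, so callers must gate on want_pmd_share() first. This is a
sketch of the resulting caller contract, taken from the arm64 and
generic huge_pte_alloc() hunks above, not new code:

	/* Only share when the arch and the vma allow it, and only if no
	 * pmd page table has been installed under this pud yet. */
	if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pud)))
		pte = huge_pmd_share(mm, vma, addr, pud);
	else
		pte = (pte_t *)pmd_alloc(mm, pud, addr);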