From patchwork Thu Feb 18 23:06:30 2021
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: Andrew Morton, Matthew Wilcox, Axel Rasmussen, Kirill A. Shutemov,
    Mike Rapoport, Mike Kravetz, Andrea Arcangeli, peterx@redhat.com
Subject: [PATCH v4 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share()
Date: Thu, 18 Feb 2021 18:06:30 -0500
Message-Id: <20210218230633.15028-2-peterx@redhat.com>
In-Reply-To: <20210218230633.15028-1-peterx@redhat.com>
References: <20210218230633.15028-1-peterx@redhat.com>

This is preparation work so that the per-architecture huge_pte_alloc()
can behave differently according to the attributes of the VMA. Pass the
vma deeper into huge_pmd_share() as well, so that its find_vma() call
can be dropped.
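For quick reference, the two prototypes after this change (as they
appear in the diff below) become:

	pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
			      unsigned long addr, unsigned long sz);
	pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
			      unsigned long addr, pud_t *pud);

Callers such as hugetlb_fault() already hold the vma, so threading it
through avoids re-deriving it from the mm with find_vma().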
Suggested-by: Mike Kravetz
Signed-off-by: Peter Xu
Reported-by: kernel test robot
---
 arch/arm64/mm/hugetlbpage.c   |  4 ++--
 arch/ia64/mm/hugetlbpage.c    |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  4 ++--
 arch/parisc/mm/hugetlbpage.c  |  2 +-
 arch/powerpc/mm/hugetlbpage.c |  3 ++-
 arch/s390/mm/hugetlbpage.c    |  2 +-
 arch/sh/mm/hugetlbpage.c      |  2 +-
 arch/sparc/mm/hugetlbpage.c   |  2 +-
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 15 ++++++++-------
 mm/userfaultfd.c              |  2 +-
 11 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 55ecf6de9ff7..6e3bcffe2837 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 	set_pte(ptep, pte);
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
@@ -286,7 +286,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	} else if (sz == PMD_SIZE) {
 		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
 		    pud_none(READ_ONCE(*pudp)))
-			ptep = huge_pmd_share(mm, addr, pudp);
+			ptep = huge_pmd_share(mm, vma, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
 	} else if (sz == (CONT_PMD_SIZE)) {
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index b331f94d20ac..f993cb36c062 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -25,7 +25,8 @@ unsigned int hpage_shift = HPAGE_SHIFT_DEFAULT;
 EXPORT_SYMBOL(hpage_shift);
 
 pte_t *
-huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+	       unsigned long addr, unsigned long sz)
 {
 	unsigned long taddr = htlbpage_to_page(addr);
 	pgd_t *pgd;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index b9f76f433617..7eaff5b07873 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -21,8 +21,8 @@
 #include
 #include
 
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
-		      unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index d7ba014a7fbb..e141441bfa64 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -44,7 +44,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 }
 
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8b3cc4d688e8..d57276b8791c 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -106,7 +106,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
  * At this point we do the placement change only for BOOK3S 64. This would
  * possibly work on other subarchs.
  */
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pg;
 	p4d_t *p4;
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 3b5a4d25ca9b..da36d13ffc16 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -189,7 +189,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index 220d7bc43d2b..999ab5916e69 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -21,7 +21,7 @@
 #include
 #include
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index ad4b42f04988..04d8790f6c32 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -279,7 +279,7 @@ unsigned long pud_leaf_size(pud_t pud) { return 1UL << tte_to_shift(*(pte_t *)&p
 unsigned long pmd_leaf_size(pmd_t pmd) { return 1UL << tte_to_shift(*(pte_t *)&pmd); }
 unsigned long pte_leaf_size(pte_t pte) { return 1UL << tte_to_shift(pte); }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b5807f23caf8..a6113fa6d21d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -152,7 +152,8 @@ void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud);
 
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
@@ -161,7 +162,7 @@ extern struct list_head huge_boot_pages;
 
 /* arch callbacks */
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..07bb9bdc3282 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3807,7 +3807,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		src_pte = huge_pte_offset(src, addr, sz);
 		if (!src_pte)
 			continue;
-		dst_pte = huge_pte_alloc(dst, addr, sz);
+		dst_pte = huge_pte_alloc(dst, vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
@@ -4544,7 +4544,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 */
 	mapping = vma->vm_file->f_mapping;
 	i_mmap_lock_read(mapping);
-	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
+	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
 	if (!ptep) {
 		i_mmap_unlock_read(mapping);
 		return VM_FAULT_OOM;
@@ -5334,9 +5334,9 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
  * if !vma_shareable check at the beginning of the routine. i_mmap_rwsem is
  * only required for subsequent processing.
  */
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud)
 {
-	struct vm_area_struct *vma = find_vma(mm, addr);
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
 			vma->vm_pgoff;
@@ -5414,7 +5414,8 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 #define want_pmd_share() (1)
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud)
 {
 	return NULL;
 }
@@ -5433,7 +5434,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
@@ -5452,7 +5453,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	} else {
 		BUG_ON(sz != PMD_SIZE);
 		if (want_pmd_share() && pud_none(*pud))
-			pte = huge_pmd_share(mm, addr, pud);
+			pte = huge_pmd_share(mm, vma, addr, pud);
 		else
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
 	}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9a3d451402d7..063cbb17e8d8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -290,7 +290,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_addr, vma_hpagesize);
+		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
 		if (!dst_pte) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);

From patchwork Thu Feb 18 23:12:02 2021
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport,
Shutemov" , Andrew Morton , Matthew Wilcox , Mike Kravetz Subject: [PATCH v4 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Date: Thu, 18 Feb 2021 18:12:02 -0500 Message-Id: <20210218231202.15426-1-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210218230633.15028-1-peterx@redhat.com> References: <20210218230633.15028-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 13DA4A0009E7 X-Stat-Signature: s98t5he91iecwauyon8jnsgfmrt95hzm Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf15; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=216.205.24.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1613689925-570430 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Huge pmd sharing could bring problem to userfaultfd. The thing is that userfaultfd is running its logic based on the special bits on page table entries, however the huge pmd sharing could potentially share page table entries for different address ranges. That could cause issues on either: - When sharing huge pmd page tables for an uffd write protected range, the newly mapped huge pmd range will also be write protected unexpectedly, or, - When we try to write protect a range of huge pmd shared range, we'll first do huge_pmd_unshare() in hugetlb_change_protection(), however that also means the UFFDIO_WRITEPROTECT could be silently skipped for the shared region, which could lead to data loss. Since at it, a few other things are done altogether: - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because that's definitely something that arch code would like to use too - ARM64 currently directly check against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when trying to share huge pmd. Switch to the want_pmd_share() helper. Since at it, move vma_shareable() from huge_pmd_share() into want_pmd_share(). 
Reviewed-by: Mike Kravetz
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
Reported-by: Naresh Kamboju
---
 arch/arm64/mm/hugetlbpage.c   |  3 +--
 include/linux/hugetlb.h       |  2 ++
 include/linux/userfaultfd_k.h |  9 +++++++++
 mm/hugetlb.c                  | 20 ++++++++++++++------
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 6e3bcffe2837..58987a98e179 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		ptep = pte_alloc_map(mm, pmdp, addr);
 	} else if (sz == PMD_SIZE) {
-		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
-		    pud_none(READ_ONCE(*pudp)))
+		if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))
 			ptep = huge_pmd_share(mm, vma, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a6113fa6d21d..bc86f2f516e7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -950,4 +950,6 @@ static inline __init void hugetlb_cma_check(void)
 }
 #endif
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..c63ccdae3eab 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -52,6 +52,15 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 	return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx;
 }
 
+/*
+ * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
+ * protect information is per pgtable entry.
+ */
+static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07bb9bdc3282..8e8e2f3dfe06 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5292,6 +5292,18 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
 	return false;
 }
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	return false;
+#endif
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
+	return vma_shareable(vma, addr);
+}
+
 /*
  * Determine if start,end range within vma could be mapped by shared pmd.
  * If yes, adjust start and end to cover range associated with possible
@@ -5346,9 +5358,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	if (!vma_shareable(vma, addr))
-		return (pte_t *)pmd_alloc(mm, pud, addr);
-
 	i_mmap_assert_locked(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
@@ -5412,7 +5421,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
 	return 1;
 }
-#define want_pmd_share() (1)
+
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
@@ -5430,7 +5439,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
 }
-#define want_pmd_share() (0)
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
@@ -5452,7 +5460,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		pte = (pte_t *)pud;
 	} else {
 		BUG_ON(sz != PMD_SIZE);
-		if (want_pmd_share() && pud_none(*pud))
+		if (want_pmd_share(vma, addr) && pud_none(*pud))
 			pte = huge_pmd_share(mm, vma, addr, pud);
 		else
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);

From patchwork Thu Feb 18 23:12:04 2021
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport,
    Kirill A. Shutemov, Andrew Morton, Matthew Wilcox, Mike Kravetz
Subject: [PATCH v4 3/4] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
Date: Thu, 18 Feb 2021 18:12:04 -0500
Message-Id: <20210218231204.15474-1-peterx@redhat.com>
In-Reply-To: <20210218230633.15028-1-peterx@redhat.com>
References: <20210218230633.15028-1-peterx@redhat.com>

Prepare for it to be called outside of mm/hugetlb.c.
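Once the fallback lives in the header, an architecture with special
huge-TLB invalidation requirements overrides it by defining the guard
macro before providing its own version. A hypothetical arch override
(illustrative only; the function body and placement are assumptions,
not taken from this patch) would look like:

	/* In the (hypothetical) arch's asm/hugetlb.h: */
	#define __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
	static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
						   unsigned long start,
						   unsigned long end)
	{
		/* arch-specific TLB invalidation for huge mappings goes here */
	}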
Reviewed-by: Mike Kravetz
Reviewed-by: Axel Rasmussen
Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h | 8 ++++++++
 mm/hugetlb.c            | 8 --------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bc86f2f516e7..3b4104021dd3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -952,4 +952,12 @@ static inline __init void hugetlb_cma_check(void)
 
 bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
 
+#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
+/*
+ * ARCHes with special requirements for evicting HUGETLB backing TLB entries can
+ * implement this.
+ */
+#define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
+#endif
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8e8e2f3dfe06..f53a0b852ed8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4965,14 +4965,6 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	return i ? i : err;
 }
 
-#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
-/*
- * ARCHes with special requirements for evicting HUGETLB backing TLB entries can
- * implement this.
- */
-#define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
-#endif
-
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot)
 {

From patchwork Thu Feb 18 23:12:06 2021
From: Peter Xu <peterx@redhat.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Andrea Arcangeli, Axel Rasmussen, Mike Rapoport,
Shutemov" , Andrew Morton , Matthew Wilcox , Mike Kravetz Subject: [PATCH v4 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Date: Thu, 18 Feb 2021 18:12:06 -0500 Message-Id: <20210218231206.15524-1-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210218230633.15028-1-peterx@redhat.com> References: <20210218230633.15028-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: 5A2F890009EA X-Stat-Signature: hhf5hpx68h1q9odkz8fg358mzsi3p7bu Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf19; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1613689931-966307 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because userfaultfd-wp is always based on pgtable entries, so they cannot be shared. Walk the hugetlb range and unshare all such mappings if there is, right before UFFDIO_REGISTER will succeed and return to userspace. This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing is completely disabled for userfaultfd-wp registered range. Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu --- fs/userfaultfd.c | 4 ++++ include/linux/hugetlb.h | 3 +++ mm/hugetlb.c | 51 +++++++++++++++++++++++++++++++++++++++++ 3 files changed, 58 insertions(+) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 894cc28142e7..e259318fcae1 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx.ctx = ctx; + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) + hugetlb_unshare_all_pmds(vma); + skip: prev = vma; start = vma->vm_end; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 3b4104021dd3..6437483ad01b 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot); bool is_hugetlb_entry_migration(pte_t pte); +void hugetlb_unshare_all_pmds(struct vm_area_struct *vma); #else /* !CONFIG_HUGETLB_PAGE */ @@ -369,6 +370,8 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm, return 0; } +static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { } + #endif /* !CONFIG_HUGETLB_PAGE */ /* * hugepages at page global directory. If arch support diff --git a/mm/hugetlb.c b/mm/hugetlb.c index f53a0b852ed8..fc62932c31cb 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5653,6 +5653,57 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason) } } +/* + * This function will unconditionally remove all the shared pmd pgtable entries + * within the specific vma for a hugetlbfs memory range. 
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c        |  4 ++++
 include/linux/hugetlb.h |  3 +++
 mm/hugetlb.c            | 51 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..e259318fcae1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include <linux/hugetlb.h>
 #include
 #include
 #include
@@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma->vm_flags = new_flags;
 		vma->vm_userfaultfd_ctx.ctx = ctx;
 
+		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
+			hugetlb_unshare_all_pmds(vma);
+
 	skip:
 		prev = vma;
 		start = vma->vm_end;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3b4104021dd3..6437483ad01b 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot);
 
 bool is_hugetlb_entry_migration(pte_t pte);
+void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
@@ -369,6 +370,8 @@ static inline vm_fault_t hugetlb_fault(struct mm_struct *mm,
 	return 0;
 }
 
+static inline void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) { }
+
 #endif /* !CONFIG_HUGETLB_PAGE */
 
 /*
  * hugepages at page global directory. If arch support
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f53a0b852ed8..fc62932c31cb 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5653,6 +5653,57 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason)
 	}
 }
 
+/*
+ * This function will unconditionally remove all the shared pmd pgtable entries
+ * within the specific vma for a hugetlbfs memory range.
+ */
+void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
+{
+	struct hstate *h = hstate_vma(vma);
+	unsigned long sz = huge_page_size(h);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long address, start, end;
+	spinlock_t *ptl;
+	pte_t *ptep;
+
+	if (!(vma->vm_flags & VM_MAYSHARE))
+		return;
+
+	start = ALIGN(vma->vm_start, PUD_SIZE);
+	end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
+
+	if (start >= end)
+		return;
+
+	/*
+	 * No need to call adjust_range_if_pmd_sharing_possible(), because
+	 * we have already done the PUD_SIZE alignment.
+	 */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+				start, end);
+	mmu_notifier_invalidate_range_start(&range);
+	i_mmap_lock_write(vma->vm_file->f_mapping);
+	for (address = start; address < end; address += PUD_SIZE) {
+		unsigned long tmp = address;
+
+		ptep = huge_pte_offset(mm, address, sz);
+		if (!ptep)
+			continue;
+		ptl = huge_pte_lock(h, mm, ptep);
+		/* We don't want 'address' to be changed */
+		huge_pmd_unshare(mm, vma, &tmp, ptep);
+		spin_unlock(ptl);
+	}
+	flush_hugetlb_tlb_range(vma, start, end);
+	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	/*
+	 * No need to call mmu_notifier_invalidate_range(), see
+	 * Documentation/vm/mmu_notifier.rst.
+	 */
+	mmu_notifier_invalidate_range_end(&range);
+}
+
 #ifdef CONFIG_CMA
 static bool cma_reserve_called __initdata;