From patchwork Wed Feb 17 16:30:59 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12091891
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Axel Rasmussen, Andrew Morton, Matthew Wilcox,
 Mike Rapoport, "Kirill A . Shutemov", Andrea Arcangeli, Mike Kravetz
Subject: [PATCH 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share()
Date: Wed, 17 Feb 2021 11:30:59 -0500
Message-Id: <20210217163102.13436-2-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210217163102.13436-1-peterx@redhat.com>
References: <20210217163102.13436-1-peterx@redhat.com>

This is preparatory work that lets the per-architecture huge_pte_alloc()
behave differently depending on VMA attributes.

Pass the vma down into huge_pmd_share() as well, so that the find_vma()
call there can be dropped.

Suggested-by: Mike Kravetz
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
Reported-by: kernel test robot
---
 arch/arm64/mm/hugetlbpage.c   |  4 ++--
 arch/ia64/mm/hugetlbpage.c    |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  4 ++--
 arch/parisc/mm/hugetlbpage.c  |  2 +-
 arch/powerpc/mm/hugetlbpage.c |  3 ++-
 arch/s390/mm/hugetlbpage.c    |  2 +-
 arch/sh/mm/hugetlbpage.c      |  2 +-
 arch/sparc/mm/hugetlbpage.c   |  2 +-
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 15 ++++++++-------
 mm/userfaultfd.c              |  2 +-
 11 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 55ecf6de9ff7..6e3bcffe2837 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 	set_pte(ptep, pte);
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
@@ -286,7 +286,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	} else if (sz == PMD_SIZE) {
 		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
 		    pud_none(READ_ONCE(*pudp)))
-			ptep = huge_pmd_share(mm, addr, pudp);
+			ptep = huge_pmd_share(mm, vma, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
 	} else if (sz == (CONT_PMD_SIZE)) {
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index b331f94d20ac..f993cb36c062 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -25,7 +25,8 @@ unsigned int hpage_shift = HPAGE_SHIFT_DEFAULT;
 EXPORT_SYMBOL(hpage_shift);
 
 pte_t *
-huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+	       unsigned long addr, unsigned long sz)
 {
 	unsigned long taddr = htlbpage_to_page(addr);
 	pgd_t *pgd;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index b9f76f433617..7eaff5b07873 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -21,8 +21,8 @@
 #include
 #include
 
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
-		      unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index d7ba014a7fbb..e141441bfa64 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -44,7 +44,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 }
 
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8b3cc4d688e8..d57276b8791c 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -106,7 +106,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
  * At this point we do the placement change only for BOOK3S 64. This would
  * possibly work on other subarchs.
  */
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pg;
 	p4d_t *p4;
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 3b5a4d25ca9b..da36d13ffc16 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -189,7 +189,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index 220d7bc43d2b..999ab5916e69 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -21,7 +21,7 @@
 #include
 #include
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index ad4b42f04988..97e0824fdbe7 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -280,6 +280,6 @@ unsigned long pmd_leaf_size(pmd_t pmd) { return 1UL << tte_to_shift(*(pte_t *)&p
 unsigned long pte_leaf_size(pte_t pte) { return 1UL << tte_to_shift(pte); }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b5807f23caf8..a6113fa6d21d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -152,7 +152,8 @@ void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud);
 
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
@@ -161,7 +162,7 @@ extern struct list_head huge_boot_pages;
 
 /* arch callbacks */
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm,
 		       unsigned long addr, unsigned long sz);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..07bb9bdc3282 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3807,7 +3807,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		src_pte = huge_pte_offset(src, addr, sz);
 		if (!src_pte)
 			continue;
-		dst_pte = huge_pte_alloc(dst, addr, sz);
+		dst_pte = huge_pte_alloc(dst, vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
@@ -4544,7 +4544,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 */
 	mapping = vma->vm_file->f_mapping;
 	i_mmap_lock_read(mapping);
-	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
+	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
 	if (!ptep) {
 		i_mmap_unlock_read(mapping);
 		return VM_FAULT_OOM;
@@ -5334,9 +5334,9 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
  * if !vma_shareable check at the beginning of the routine. i_mmap_rwsem is
  * only required for subsequent processing.
  */
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud)
 {
-	struct vm_area_struct *vma = find_vma(mm, addr);
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
 			vma->vm_pgoff;
@@ -5414,7 +5414,8 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 #define want_pmd_share()	(1)
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud)
 {
 	return NULL;
 }
@@ -5433,7 +5434,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
@@ -5452,7 +5453,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	} else {
 		BUG_ON(sz != PMD_SIZE);
 		if (want_pmd_share() && pud_none(*pud))
-			pte = huge_pmd_share(mm, addr, pud);
+			pte = huge_pmd_share(mm, vma, addr, pud);
 		else
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
 	}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9a3d451402d7..063cbb17e8d8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -290,7 +290,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 
 		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_addr, vma_hpagesize);
+		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
 		if (!dst_pte) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);

From patchwork Wed Feb 17 16:31:00 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12091899
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Axel Rasmussen, Andrew Morton, Matthew Wilcox,
 Mike Rapoport, "Kirill A . Shutemov", Andrea Arcangeli, Mike Kravetz
Subject: [PATCH 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled
Date: Wed, 17 Feb 2021 11:31:00 -0500
Message-Id: <20210217163102.13436-3-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210217163102.13436-1-peterx@redhat.com>
References: <20210217163102.13436-1-peterx@redhat.com>

Huge pmd sharing can cause problems for userfaultfd. Userfaultfd bases its
logic on special bits in page table entries, but huge pmd sharing can end
up sharing page table entries for different address ranges.
That can cause problems in either of two ways:

  - When huge pmd page tables are shared for a uffd write-protected range,
    the newly mapped huge pmd range is unexpectedly write protected too, or

  - When we try to write protect a range that is huge pmd shared, we first
    call huge_pmd_unshare() in hugetlb_change_protection(). That means
    UFFDIO_WRITEPROTECT can be silently skipped for the shared region,
    which could lead to data loss.

While at it, a few other things are done together:

  - Turn want_pmd_share() from a macro private to mm/hugetlb.c into a
    function declared in linux/hugetlb.h, because arch code will want to
    use it too.

  - ARM64 currently checks CONFIG_ARCH_WANT_HUGE_PMD_SHARE directly when
    trying to share a huge pmd. Switch it to the want_pmd_share() helper.

Also move the vma_shareable() check from huge_pmd_share() into
want_pmd_share().

Signed-off-by: Peter Xu
---
 arch/arm64/mm/hugetlbpage.c   |  3 +--
 include/linux/hugetlb.h       |  2 ++
 include/linux/userfaultfd_k.h |  9 +++++++++
 mm/hugetlb.c                  | 20 ++++++++++++++------
 4 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 6e3bcffe2837..58987a98e179 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		ptep = pte_alloc_map(mm, pmdp, addr);
 	} else if (sz == PMD_SIZE) {
-		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
-		    pud_none(READ_ONCE(*pudp)))
+		if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp)))
 			ptep = huge_pmd_share(mm, vma, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a6113fa6d21d..bc86f2f516e7 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -950,4 +950,6 @@ static inline __init void hugetlb_cma_check(void)
 }
 #endif
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index a8e5f3ea9bb2..c63ccdae3eab 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -52,6 +52,15 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 	return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx;
 }
 
+/*
+ * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
+ * protect information is per pgtable entry.
+ */
+static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07bb9bdc3282..8e8e2f3dfe06 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5292,6 +5292,18 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
 	return false;
 }
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	return false;
+#endif
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
+	return vma_shareable(vma, addr);
+}
+
 /*
  * Determine if start,end range within vma could be mapped by shared pmd.
  * If yes, adjust start and end to cover range associated with possible
@@ -5346,9 +5358,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	if (!vma_shareable(vma, addr))
-		return (pte_t *)pmd_alloc(mm, pud, addr);
-
 	i_mmap_assert_locked(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
@@ -5412,7 +5421,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
 	return 1;
 }
-#define want_pmd_share()	(1)
+
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
@@ -5430,7 +5439,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
 }
-#define want_pmd_share()	(0)
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
@@ -5452,7 +5460,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		pte = (pte_t *)pud;
 	} else {
 		BUG_ON(sz != PMD_SIZE);
-		if (want_pmd_share() && pud_none(*pud))
+		if (want_pmd_share(vma, addr) && pud_none(*pud))
 			pte = huge_pmd_share(mm, vma, addr, pud);
 		else
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);

From patchwork Wed Feb 17 16:31:01 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12091897
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Axel Rasmussen, Andrew Morton, Matthew Wilcox,
 Mike Rapoport, "Kirill A . Shutemov", Andrea Arcangeli, Mike Kravetz
Subject: [PATCH 3/4] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h
Date: Wed, 17 Feb 2021 11:31:01 -0500
Message-Id: <20210217163102.13436-4-peterx@redhat.com>
X-Mailer: git-send-email 2.26.2
In-Reply-To: <20210217163102.13436-1-peterx@redhat.com>
References: <20210217163102.13436-1-peterx@redhat.com>

Prepare for flush_hugetlb_tlb_range() to be called from outside
mm/hugetlb.c.

Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
---
 include/linux/hugetlb.h | 8 ++++++++
 mm/hugetlb.c            | 8 --------
 2 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index bc86f2f516e7..3b4104021dd3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -952,4 +952,12 @@ static inline __init void hugetlb_cma_check(void)
 
 bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr);
 
+#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
+/*
+ * ARCHes with special requirements for evicting HUGETLB backing TLB entries can
+ * implement this.
+ */
+#define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
+#endif
+
 #endif /* _LINUX_HUGETLB_H */
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8e8e2f3dfe06..f53a0b852ed8 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4965,14 +4965,6 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	return i ? i : err;
 }
 
-#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE
-/*
- * ARCHes with special requirements for evicting HUGETLB backing TLB entries can
- * implement this.
- */
-#define flush_hugetlb_tlb_range(vma, addr, end)	flush_tlb_range(vma, addr, end)
-#endif
-
 unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot)
 {

From patchwork Wed Feb 17 16:31:02 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12091895
From: Peter Xu
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: peterx@redhat.com, Axel Rasmussen, Andrew Morton, Matthew Wilcox, Mike Rapoport, "Kirill A . 
Shutemov" , Andrea Arcangeli , Mike Kravetz Subject: [PATCH 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp Date: Wed, 17 Feb 2021 11:31:02 -0500 Message-Id: <20210217163102.13436-5-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210217163102.13436-1-peterx@redhat.com> References: <20210217163102.13436-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID:

Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp: userfaultfd-wp
is always based on pgtable entries, so those entries cannot be shared.

Walk the hugetlb range and unshare any shared pmds, right before
UFFDIO_REGISTER succeeds and returns to userspace.

This pairs with want_pmd_share() in the hugetlb code, so that huge pmd
sharing is completely disabled for a userfaultfd-wp registered range.

Signed-off-by: Peter Xu
---
 fs/userfaultfd.c | 57 ++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 57 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..3fbdacc25ff4 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -1190,6 +1191,59 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf,
 	}
 }
 
+/*
+ * This function will unconditionally remove all the shared pmd pgtable entries
+ * within the specific vma for a hugetlbfs memory range.
+ */
+static void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
+{
+#ifdef CONFIG_HUGETLB_PAGE
+	struct hstate *h = hstate_vma(vma);
+	unsigned long sz = huge_page_size(h);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long address, start, end;
+	spinlock_t *ptl;
+	pte_t *ptep;
+
+	if (!(vma->vm_flags & VM_MAYSHARE))
+		return;
+
+	start = ALIGN(vma->vm_start, PUD_SIZE);
+	end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
+
+	if (start >= end)
+		return;
+
+	/*
+	 * No need to call adjust_range_if_pmd_sharing_possible(), because
+	 * we're going to operate on the whole vma
+	 */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+				vma->vm_start, vma->vm_end);
+	mmu_notifier_invalidate_range_start(&range);
+	i_mmap_lock_write(vma->vm_file->f_mapping);
+	for (address = start; address < end; address += PUD_SIZE) {
+		unsigned long tmp = address;
+
+		ptep = huge_pte_offset(mm, address, sz);
+		if (!ptep)
+			continue;
+		ptl = huge_pte_lock(h, mm, ptep);
+		/* We don't want 'address' to be changed */
+		huge_pmd_unshare(mm, vma, &tmp, ptep);
+		spin_unlock(ptl);
+	}
+	flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end);
+	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	/*
+	 * No need to call mmu_notifier_invalidate_range(), see
+	 * Documentation/vm/mmu_notifier.rst.
+	 */
+	mmu_notifier_invalidate_range_end(&range);
+#endif
+}
+
 static void __wake_userfault(struct userfaultfd_ctx *ctx,
 			     struct userfaultfd_wake_range *range)
 {
@@ -1448,6 +1502,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma->vm_flags = new_flags;
 		vma->vm_userfaultfd_ctx.ctx = ctx;
 
+		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
+			hugetlb_unshare_all_pmds(vma);
+
 	skip:
 		prev = vma;
 		start = vma->vm_end;
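
Note on the range computation in hugetlb_unshare_all_pmds(): the walk only
covers the PUD_SIZE-aligned interior of the vma, and returns early when no such
interior exists. The following is a minimal userspace sketch of that arithmetic
only, not kernel code. It assumes PUD_SIZE is 1 GiB (as on x86-64 with 4 KiB
base pages); SKETCH_PUD_SIZE, SKETCH_ALIGN, SKETCH_ALIGN_DOWN, struct pud_range
and pud_steps() are hypothetical names introduced for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 1 GiB PUD_SIZE, as on x86-64 with 4 KiB base pages. */
#define SKETCH_PUD_SIZE (UINT64_C(1) << 30)

/* Round x up or down to a multiple of a; a must be a power of two. */
#define SKETCH_ALIGN(x, a)      (((x) + ((a) - 1)) & ~((uint64_t)(a) - 1))
#define SKETCH_ALIGN_DOWN(x, a) ((x) & ~((uint64_t)(a) - 1))

/* Hypothetical holder for the aligned range that the loop would walk. */
struct pud_range {
	uint64_t start;
	uint64_t end;
};

/*
 * Mirror the start/end computation from the patch and return how many
 * PUD_SIZE steps the for loop would take. Zero corresponds to the
 * "start >= end" early return.
 */
unsigned long pud_steps(uint64_t vm_start, uint64_t vm_end,
			struct pud_range *out)
{
	assert(vm_start <= vm_end);

	out->start = SKETCH_ALIGN(vm_start, SKETCH_PUD_SIZE);
	out->end = SKETCH_ALIGN_DOWN(vm_end, SKETCH_PUD_SIZE);

	if (out->start >= out->end)
		return 0;

	return (unsigned long)((out->end - out->start) / SKETCH_PUD_SIZE);
}
```

Under these assumptions, a vma that starts 2 MiB past a 1 GiB boundary is
rounded up to the next boundary before the walk begins, and a vma that lies
entirely inside one 1 GiB region yields no steps at all.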