From patchwork Wed Feb 17 20:44:15 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12092287
From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: peterx@redhat.com, Axel Rasmussen, Mike Kravetz, Mike Rapoport,
    Matthew Wilcox, Andrea Arcangeli, Andrew Morton, "Kirill A. Shutemov"
Subject: [PATCH v2 1/4] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share()
Date: Wed, 17 Feb 2021 15:44:15 -0500
Message-Id: <20210217204418.54259-2-peterx@redhat.com>
In-Reply-To: <20210217204418.54259-1-peterx@redhat.com>
References: <20210217204418.54259-1-peterx@redhat.com>

This is preparation work so that the per-architecture huge_pte_alloc() can
behave differently according to the attributes of the VMA. Pass the vma
deeper into huge_pmd_share() as well, so that the find_vma() call there can
be dropped.
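In short, the interface change is the following (a sketch of the prototypes
only; the full diff follows):

  /* Before: huge_pmd_share() re-derived the vma from the address. */
  pte_t *huge_pte_alloc(struct mm_struct *mm,
                        unsigned long addr, unsigned long sz);
  pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);

  /* After: callers that already hold the vma pass it down. */
  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
                        unsigned long addr, unsigned long sz);
  pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
                        unsigned long addr, pud_t *pud);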
Suggested-by: Mike Kravetz
Reviewed-by: Mike Kravetz
Signed-off-by: Peter Xu
---
 arch/arm64/mm/hugetlbpage.c   |  4 ++--
 arch/ia64/mm/hugetlbpage.c    |  3 ++-
 arch/mips/mm/hugetlbpage.c    |  4 ++--
 arch/parisc/mm/hugetlbpage.c  |  2 +-
 arch/powerpc/mm/hugetlbpage.c |  3 ++-
 arch/s390/mm/hugetlbpage.c    |  2 +-
 arch/sh/mm/hugetlbpage.c      |  2 +-
 arch/sparc/mm/hugetlbpage.c   |  2 +-
 include/linux/hugetlb.h       |  5 +++--
 mm/hugetlb.c                  | 15 ++++++++-------
 mm/userfaultfd.c              |  2 +-
 11 files changed, 24 insertions(+), 20 deletions(-)

diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c
index 55ecf6de9ff7..6e3bcffe2837 100644
--- a/arch/arm64/mm/hugetlbpage.c
+++ b/arch/arm64/mm/hugetlbpage.c
@@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr,
 	set_pte(ptep, pte);
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
@@ -286,7 +286,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	} else if (sz == PMD_SIZE) {
 		if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) &&
 		    pud_none(READ_ONCE(*pudp)))
-			ptep = huge_pmd_share(mm, addr, pudp);
+			ptep = huge_pmd_share(mm, vma, addr, pudp);
 		else
 			ptep = (pte_t *)pmd_alloc(mm, pudp, addr);
 	} else if (sz == (CONT_PMD_SIZE)) {
diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c
index b331f94d20ac..f993cb36c062 100644
--- a/arch/ia64/mm/hugetlbpage.c
+++ b/arch/ia64/mm/hugetlbpage.c
@@ -25,7 +25,8 @@ unsigned int hpage_shift = HPAGE_SHIFT_DEFAULT;
 EXPORT_SYMBOL(hpage_shift);
 
 pte_t *
-huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+	       unsigned long addr, unsigned long sz)
 {
 	unsigned long taddr = htlbpage_to_page(addr);
 	pgd_t *pgd;
diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c
index b9f76f433617..7eaff5b07873 100644
--- a/arch/mips/mm/hugetlbpage.c
+++ b/arch/mips/mm/hugetlbpage.c
@@ -21,8 +21,8 @@
 #include
 #include
 
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr,
-		      unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c
index d7ba014a7fbb..e141441bfa64 100644
--- a/arch/parisc/mm/hugetlbpage.c
+++ b/arch/parisc/mm/hugetlbpage.c
@@ -44,7 +44,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr,
 }
 
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c
index 8b3cc4d688e8..d57276b8791c 100644
--- a/arch/powerpc/mm/hugetlbpage.c
+++ b/arch/powerpc/mm/hugetlbpage.c
@@ -106,7 +106,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp,
  * At this point we do the placement change only for BOOK3S 64. This would
  * possibly work on other subarchs.
 */
-pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz)
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pg;
 	p4d_t *p4;
diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c
index 3b5a4d25ca9b..da36d13ffc16 100644
--- a/arch/s390/mm/hugetlbpage.c
+++ b/arch/s390/mm/hugetlbpage.c
@@ -189,7 +189,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 	return pte;
 }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgdp;
diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c
index 220d7bc43d2b..999ab5916e69 100644
--- a/arch/sh/mm/hugetlbpage.c
+++ b/arch/sh/mm/hugetlbpage.c
@@ -21,7 +21,7 @@
 #include
 #include
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c
index ad4b42f04988..97e0824fdbe7 100644
--- a/arch/sparc/mm/hugetlbpage.c
+++ b/arch/sparc/mm/hugetlbpage.c
@@ -280,6 +280,6 @@ unsigned long pmd_leaf_size(pmd_t pmd) { return 1UL << tte_to_shift(*(pte_t *)&p
 unsigned long pte_leaf_size(pte_t pte) { return 1UL << tte_to_shift(pte); }
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index b5807f23caf8..a6113fa6d21d 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -152,7 +152,8 @@ void hugetlb_fix_reserve_counts(struct inode *inode);
 extern struct mutex *hugetlb_fault_mutex_table;
 u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud);
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud);
 
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
@@ -161,7 +162,7 @@ extern struct list_head huge_boot_pages;
 
 /* arch callbacks */
 
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz);
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdb58ab14cb..07bb9bdc3282 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -3807,7 +3807,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		src_pte = huge_pte_offset(src, addr, sz);
 		if (!src_pte)
 			continue;
-		dst_pte = huge_pte_alloc(dst, addr, sz);
+		dst_pte = huge_pte_alloc(dst, vma, addr, sz);
 		if (!dst_pte) {
 			ret = -ENOMEM;
 			break;
@@ -4544,7 +4544,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 */
 	mapping = vma->vm_file->f_mapping;
 	i_mmap_lock_read(mapping);
-	ptep = huge_pte_alloc(mm, haddr, huge_page_size(h));
+	ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
 	if (!ptep) {
 		i_mmap_unlock_read(mapping);
 		return VM_FAULT_OOM;
@@ -5334,9 +5334,9 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
  * if !vma_shareable check at the beginning of the routine. i_mmap_rwsem is
  * only required for subsequent processing.
 */
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud)
 {
-	struct vm_area_struct *vma = find_vma(mm, addr);
 	struct address_space *mapping = vma->vm_file->f_mapping;
 	pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) +
 			vma->vm_pgoff;
@@ -5414,7 +5414,8 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 }
 #define want_pmd_share()	(1)
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
-pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud)
+pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
+		      unsigned long addr, pud_t *pud)
 {
 	return NULL;
 }
@@ -5433,7 +5434,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
-pte_t *huge_pte_alloc(struct mm_struct *mm,
+pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, unsigned long sz)
 {
 	pgd_t *pgd;
@@ -5452,7 +5453,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm,
 	} else {
 		BUG_ON(sz != PMD_SIZE);
 		if (want_pmd_share() && pud_none(*pud))
-			pte = huge_pmd_share(mm, addr, pud);
+			pte = huge_pmd_share(mm, vma, addr, pud);
 		else
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
 	}
diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
index 9a3d451402d7..063cbb17e8d8 100644
--- a/mm/userfaultfd.c
+++ b/mm/userfaultfd.c
@@ -290,7 +290,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm,
 		mutex_lock(&hugetlb_fault_mutex_table[hash]);
 		err = -ENOMEM;
-		dst_pte = huge_pte_alloc(dst_mm, dst_addr, vma_hpagesize);
+		dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize);
 		if (!dst_pte) {
 			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
 			i_mmap_unlock_read(mapping);
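One practical consequence of the new prototype: every architecture-private
huge_pte_alloc() implementation now takes the vma, even where it is unused
for now. A minimal sketch of what such an implementation looks like
(modeled on the simpler arch versions above; not literal code from this
series):

  pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
                        unsigned long addr, unsigned long sz)
  {
          pgd_t *pgd = pgd_offset(mm, addr);
          p4d_t *p4d = p4d_alloc(mm, pgd, addr);
          pud_t *pud = p4d ? pud_alloc(mm, p4d, addr) : NULL;

          /* The vma is available here for attribute checks added later. */
          return pud ? (pte_t *)pmd_alloc(mm, pud, addr) : NULL;
  }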
From patchwork Wed Feb 17 20:46:17 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12092289

From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Kravetz, peterx@redhat.com, Mike Rapoport, Andrea Arcangeli,
    Axel Rasmussen, Matthew Wilcox, "Kirill A. Shutemov", Andrew Morton
Shutemov" , Andrew Morton Subject: [PATCH v2 2/4] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled Date: Wed, 17 Feb 2021 15:46:17 -0500 Message-Id: <20210217204619.54761-1-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210217204418.54259-1-peterx@redhat.com> References: <20210217204418.54259-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Huge pmd sharing could bring problem to userfaultfd. The thing is that userfaultfd is running its logic based on the special bits on page table entries, however the huge pmd sharing could potentially share page table entries for different address ranges. That could cause issues on either: - When sharing huge pmd page tables for an uffd write protected range, the newly mapped huge pmd range will also be write protected unexpectedly, or, - When we try to write protect a range of huge pmd shared range, we'll first do huge_pmd_unshare() in hugetlb_change_protection(), however that also means the UFFDIO_WRITEPROTECT could be silently skipped for the shared region, which could lead to data loss. Since at it, a few other things are done altogether: - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because that's definitely something that arch code would like to use too - ARM64 currently directly check against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when trying to share huge pmd. Switch to the want_pmd_share() helper. Since at it, move vma_shareable() from huge_pmd_share() into want_pmd_share(). Signed-off-by: Peter Xu Reviewed-by: Mike Kravetz --- arch/arm64/mm/hugetlbpage.c | 3 +-- include/linux/hugetlb.h | 2 ++ include/linux/userfaultfd_k.h | 9 +++++++++ mm/hugetlb.c | 20 ++++++++++++++------ 4 files changed, 26 insertions(+), 8 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 6e3bcffe2837..58987a98e179 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, */ ptep = pte_alloc_map(mm, pmdp, addr); } else if (sz == PMD_SIZE) { - if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && - pud_none(READ_ONCE(*pudp))) + if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp))) ptep = huge_pmd_share(mm, vma, addr, pudp); else ptep = (pte_t *)pmd_alloc(mm, pudp, addr); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index a6113fa6d21d..bc86f2f516e7 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -950,4 +950,6 @@ static inline __init void hugetlb_cma_check(void) } #endif +bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); + #endif /* _LINUX_HUGETLB_H */ diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index a8e5f3ea9bb2..c63ccdae3eab 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -52,6 +52,15 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx; } +/* + * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp + * protect information is per pgtable entry. 
+ */
+static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
+{
+	return vma->vm_flags & VM_UFFD_WP;
+}
+
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
 {
 	return vma->vm_flags & VM_UFFD_MISSING;
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 07bb9bdc3282..8e8e2f3dfe06 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5292,6 +5292,18 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr)
 	return false;
 }
 
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE
+	return false;
+#endif
+#ifdef CONFIG_USERFAULTFD
+	if (uffd_disable_huge_pmd_share(vma))
+		return false;
+#endif
+	return vma_shareable(vma, addr);
+}
+
 /*
  * Determine if start,end range within vma could be mapped by shared pmd.
  * If yes, adjust start and end to cover range associated with possible
@@ -5346,9 +5358,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 	pte_t *pte;
 	spinlock_t *ptl;
 
-	if (!vma_shareable(vma, addr))
-		return (pte_t *)pmd_alloc(mm, pud, addr);
-
 	i_mmap_assert_locked(mapping);
 	vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) {
 		if (svma == vma)
@@ -5412,7 +5421,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 	*addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE;
 	return 1;
 }
-#define want_pmd_share()	(1)
+
 #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		      unsigned long addr, pud_t *pud)
@@ -5430,7 +5439,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
 				unsigned long *start, unsigned long *end)
 {
 }
-#define want_pmd_share()	(0)
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
@@ -5452,7 +5460,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		pte = (pte_t *)pud;
 	} else {
 		BUG_ON(sz != PMD_SIZE);
-		if (want_pmd_share() && pud_none(*pud))
+		if (want_pmd_share(vma, addr) && pud_none(*pud))
 			pte = huge_pmd_share(mm, vma, addr, pud);
 		else
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
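The net sharing policy after this patch can be read as follows, restated
with IS_ENABLED() instead of the #ifdef blocks above (illustrative only;
the #ifdef form in the patch is what actually compiles in all configs):

  bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
  {
          /* Sharing requires arch opt-in... */
          if (!IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE))
                  return false;
          /* ...must never happen on an uffd-wp registered vma... */
          if (IS_ENABLED(CONFIG_USERFAULTFD) &&
              uffd_disable_huge_pmd_share(vma))
                  return false;
          /* ...and still depends on the usual alignment/flags checks. */
          return vma_shareable(vma, addr);
  }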
From patchwork Wed Feb 17 20:46:18 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12092291

From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Kravetz, peterx@redhat.com, Mike Rapoport, Andrea Arcangeli,
    Axel Rasmussen, Matthew Wilcox, "Kirill A. Shutemov", Andrew Morton
Shutemov" , Andrew Morton Subject: [PATCH v2 3/4] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h Date: Wed, 17 Feb 2021 15:46:18 -0500 Message-Id: <20210217204619.54761-2-peterx@redhat.com> X-Mailer: git-send-email 2.26.2 In-Reply-To: <20210217204619.54761-1-peterx@redhat.com> References: <20210217204418.54259-1-peterx@redhat.com> <20210217204619.54761-1-peterx@redhat.com> MIME-Version: 1.0 Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com X-Stat-Signature: 5dx4gy74mcy8kjbir5mhou1ceu3oacgo X-Rspamd-Server: rspam02 X-Rspamd-Queue-Id: 723E82000D82 Received-SPF: none (redhat.com>: No applicable sender policy available) receiver=imf18; identity=mailfrom; envelope-from=""; helo=us-smtp-delivery-124.mimecast.com; client-ip=63.128.21.124 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1613594786-835632 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Prepare for it to be called outside of mm/hugetlb.c. Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu --- include/linux/hugetlb.h | 8 ++++++++ mm/hugetlb.c | 8 -------- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index bc86f2f516e7..3b4104021dd3 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -952,4 +952,12 @@ static inline __init void hugetlb_cma_check(void) bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); +#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE +/* + * ARCHes with special requirements for evicting HUGETLB backing TLB entries can + * implement this. + */ +#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) +#endif + #endif /* _LINUX_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 8e8e2f3dfe06..f53a0b852ed8 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4965,14 +4965,6 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, return i ? i : err; } -#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE -/* - * ARCHes with special requirements for evicting HUGETLB backing TLB entries can - * implement this. 
- */ -#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) -#endif - unsigned long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot) { From patchwork Wed Feb 17 20:46:19 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Peter Xu X-Patchwork-Id: 12092293 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.6 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER,INCLUDES_PATCH, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 59F04C433E0 for ; Wed, 17 Feb 2021 20:46:30 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 02BFB64E76 for ; Wed, 17 Feb 2021 20:46:29 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 02BFB64E76 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A7BF28D0001; Wed, 17 Feb 2021 15:46:28 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 969246B0071; Wed, 17 Feb 2021 15:46:28 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 82E2B8D0001; Wed, 17 Feb 2021 15:46:28 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0204.hostedemail.com [216.40.44.204]) by kanga.kvack.org (Postfix) with ESMTP id 6DDD16B0070 for ; Wed, 17 Feb 2021 15:46:28 -0500 (EST) Received: from smtpin19.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id 3711380ACD83 for ; Wed, 17 Feb 2021 20:46:28 +0000 (UTC) X-FDA: 77828942856.19.5A8FD5E Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [216.205.24.124]) by imf18.hostedemail.com (Postfix) with ESMTP id F25662000D89 for ; Wed, 17 Feb 2021 20:46:27 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1613594787; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=9gY06+Wkmcfht3KanCaWtohAIi6SQMiBnqeZFMOnly8=; b=IN2ElJ9cL3cqba1j7Fco5GJVYQf1K2TPSUTrkqkqOzXz0mVNiJbdYeZdlQ1wmpfwhY3Wss o3IKypfp4Ke/JfNhko8ujSVHCea54wn+Afl9jSh6z1P8eqayY4MzZM0A9Vr8gxd1HpZkN9 vqc/r7Ilsp4AKz3t7vH9xqBHjFSjyuI= Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-175-9Qoyw68jMH2Xi6decUZoVg-1; Wed, 17 Feb 2021 15:46:25 -0500 X-MC-Unique: 9Qoyw68jMH2Xi6decUZoVg-1 Received: by mail-qv1-f70.google.com with SMTP id dk3so9623377qvb.1 for ; Wed, 17 Feb 2021 12:46:25 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:in-reply-to :references:mime-version:content-transfer-encoding; bh=9gY06+Wkmcfht3KanCaWtohAIi6SQMiBnqeZFMOnly8=; 
From patchwork Wed Feb 17 20:46:19 2021
X-Patchwork-Submitter: Peter Xu
X-Patchwork-Id: 12092293

From: Peter Xu
To: linux-mm@kvack.org, linux-kernel@vger.kernel.org
Cc: Mike Kravetz, peterx@redhat.com, Mike Rapoport, Andrea Arcangeli,
    Axel Rasmussen, Matthew Wilcox, "Kirill A. Shutemov", Andrew Morton
Subject: [PATCH v2 4/4] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp
Date: Wed, 17 Feb 2021 15:46:19 -0500
Message-Id: <20210217204619.54761-3-peterx@redhat.com>
In-Reply-To: <20210217204619.54761-1-peterx@redhat.com>
References: <20210217204418.54259-1-peterx@redhat.com>
 <20210217204619.54761-1-peterx@redhat.com>

Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp, because
userfaultfd-wp is always based on pgtable entries, so such entries cannot
be shared. Walk the hugetlb range and unshare all such mappings, if any,
right before UFFDIO_REGISTER succeeds and returns to userspace.

This pairs with want_pmd_share() in the hugetlb code so that huge pmd
sharing is completely disabled for a userfaultfd-wp registered range.
Signed-off-by: Peter Xu
---
 fs/userfaultfd.c        |  4 ++++
 include/linux/hugetlb.h |  1 +
 mm/hugetlb.c            | 51 +++++++++++++++++++++++++++++++++++++++++
 3 files changed, 56 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 894cc28142e7..e259318fcae1 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -15,6 +15,7 @@
 #include
 #include
 #include
+#include <linux/hugetlb.h>
 #include
 #include
 #include
@@ -1448,6 +1449,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
 		vma->vm_flags = new_flags;
 		vma->vm_userfaultfd_ctx.ctx = ctx;
 
+		if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma))
+			hugetlb_unshare_all_pmds(vma);
+
 	skip:
 		prev = vma;
 		start = vma->vm_end;
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 3b4104021dd3..97ecfd4c20b2 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -188,6 +188,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 		unsigned long address, unsigned long end, pgprot_t newprot);
 
 bool is_hugetlb_entry_migration(pte_t pte);
+void hugetlb_unshare_all_pmds(struct vm_area_struct *vma);
 
 #else /* !CONFIG_HUGETLB_PAGE */
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index f53a0b852ed8..83c006ea3ff9 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5723,4 +5723,55 @@ void __init hugetlb_cma_check(void)
 		pr_warn("hugetlb_cma: the option isn't supported by current arch\n");
 }
 
+/*
+ * This function will unconditionally remove all the shared pmd pgtable entries
+ * within the specific vma for a hugetlbfs memory range.
+ */
+void hugetlb_unshare_all_pmds(struct vm_area_struct *vma)
+{
+	struct hstate *h = hstate_vma(vma);
+	unsigned long sz = huge_page_size(h);
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_notifier_range range;
+	unsigned long address, start, end;
+	spinlock_t *ptl;
+	pte_t *ptep;
+
+	if (!(vma->vm_flags & VM_MAYSHARE))
+		return;
+
+	start = ALIGN(vma->vm_start, PUD_SIZE);
+	end = ALIGN_DOWN(vma->vm_end, PUD_SIZE);
+
+	if (start >= end)
+		return;
+
+	/*
+	 * No need to call adjust_range_if_pmd_sharing_possible(), because
+	 * we're going to operate on the whole vma
+	 */
+	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm,
+				vma->vm_start, vma->vm_end);
+	mmu_notifier_invalidate_range_start(&range);
+	i_mmap_lock_write(vma->vm_file->f_mapping);
+	for (address = start; address < end; address += PUD_SIZE) {
+		unsigned long tmp = address;
+
+		ptep = huge_pte_offset(mm, address, sz);
+		if (!ptep)
+			continue;
+		ptl = huge_pte_lock(h, mm, ptep);
+		/* We don't want 'address' to be changed */
+		huge_pmd_unshare(mm, vma, &tmp, ptep);
+		spin_unlock(ptl);
+	}
+	flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end);
+	i_mmap_unlock_write(vma->vm_file->f_mapping);
+	/*
+	 * No need to call mmu_notifier_invalidate_range(), see
+	 * Documentation/vm/mmu_notifier.rst.
+	 */
+	mmu_notifier_invalidate_range_end(&range);
+}
+
 #endif /* CONFIG_CMA */
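For context, the userspace path that reaches the new unsharing code is
UFFDIO_REGISTER with the WP mode bit set on a shared hugetlb mapping. A
rough sketch of that sequence is below; it assumes a kernel where uffd-wp
for hugetlbfs is fully wired up (this series is only part of that work),
2MB huge pages, and sufficient hugepage reservations on the system:

  #include <fcntl.h>
  #include <linux/userfaultfd.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <sys/mman.h>
  #include <sys/syscall.h>
  #include <unistd.h>

  int main(void)
  {
          size_t len = 4UL << 20;   /* two 2MB huge pages, adjust as needed */

          int uffd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);
          if (uffd < 0) { perror("userfaultfd"); return 1; }

          struct uffdio_api api = { .api = UFFD_API, .features = 0 };
          if (ioctl(uffd, UFFDIO_API, &api)) { perror("UFFDIO_API"); return 1; }

          /* A shared hugetlb mapping: the kind that may have shared pmds. */
          void *addr = mmap(NULL, len, PROT_READ | PROT_WRITE,
                            MAP_SHARED | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
          if (addr == MAP_FAILED) { perror("mmap"); return 1; }

          /*
           * Registering in WP mode is the moment the kernel now walks the
           * range and unshares any huge pmds under it.
           */
          struct uffdio_register reg = {
                  .range = { .start = (unsigned long)addr, .len = len },
                  .mode = UFFDIO_REGISTER_MODE_WP,
          };
          if (ioctl(uffd, UFFDIO_REGISTER, &reg)) {
                  perror("UFFDIO_REGISTER");
                  return 1;
          }

          printf("registered %zu bytes for uffd-wp\n", len);
          return 0;
  }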