From patchwork Thu Feb 4 18:34:24 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068339 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C3D2C433E6 for ; Thu, 4 Feb 2021 18:34:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1875464E49 for ; Thu, 4 Feb 2021 18:34:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1875464E49 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id A6F9F6B0006; Thu, 4 Feb 2021 13:34:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 9FDA56B006E; Thu, 4 Feb 2021 13:34:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 875A26B0070; Thu, 4 Feb 2021 13:34:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0125.hostedemail.com [216.40.44.125]) by kanga.kvack.org (Postfix) with ESMTP id 6E8EA6B0006 for ; Thu, 4 Feb 2021 13:34:48 -0500 (EST) Received: from smtpin29.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 26D413621 for ; Thu, 4 Feb 2021 18:34:48 +0000 (UTC) X-FDA: 77781436656.29.boys41_2e12260275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin29.hostedemail.com (Postfix) with ESMTP id E154018086CD4 for ; Thu, 4 Feb 2021 18:34:47 +0000 (UTC) X-HE-Tag: boys41_2e12260275de X-Filterd-Recvd-Size: 13815 Received: from mail-pf1-f202.google.com (mail-pf1-f202.google.com [209.85.210.202]) by imf21.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:34:47 +0000 (UTC) Received: by mail-pf1-f202.google.com with SMTP id t13so3026672pfg.13 for ; Thu, 04 Feb 2021 10:34:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=5vrPV9I8JVFeLdRQprLM/i7ud6Fy/UbsW1wB6hwfbrM=; b=cw6vFbn3ZtVYziWDAoBJPVdXzmHwj3gshfcBd1n8LL3h71vdCBHnUfzDNuIHurdyvJ ycz20vw2diYjsV+3EW35kdzxtG/Nexr+DkhHDlEx0iMBzasYKUE7+LjJ8OHP+qpLEK9K nndBClB+32wqFL5m6eITon3s6xqbowdJ2jHGrLUVQRLv2/POTtvXJ1XRsYwPIWnTFf8k 3TUzG+2PIZrzcFQKA/torTQG1dLeV2l8suMHlji4DQCtaZFL8bfHpKHXM8ucI4+fr+r5 mCiTV3/q/7WlCXl90KJS936FaVFcYxAY27DS7aobWg0MxNbhQVl0UkE4JCZ86d5ID3PF gtxw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=5vrPV9I8JVFeLdRQprLM/i7ud6Fy/UbsW1wB6hwfbrM=; b=hpTXwF6TfSimj8V40oI7Nz73qUF7JFI/LXwDswqw1P7Y0V0ccgjy5uv6qMFt1yNU7i 0Pv4b895XGLCIyfm7T1fF0SK15Hq6f8IIglG0ku/jH0JDVjCweDcIpaODV+zKxxKnnuW EnTlLpkJh5oQ2ac3RRk7a9O7gNcRNPpOa6P9iQpu/XcZYdlnqzyj8vMEWnKahhoAI1e/ dTdCF8geimAZ94n/OlTXeSzmY14l/+GJw7+L6Sg06QXHUuxpZqsKeesnCiX0sB+XQm6W bBXDjUiUcLXn49rBepHc/KYQyZPB8A2xXwRlNtHEbtPe80UN73B7Cc/YaCWK7juOkZvP J6VQ== X-Gm-Message-State: AOAM530wvezWqKpypq/3dxHn9JYRP0NMeNLShWDI7TZ5mxRrfP0UMV27 DaPaCGK6Dg5dXwG1qwKvt6OfLQNxKd2HCycawcpf X-Google-Smtp-Source: ABdhPJxsdlvp6hPq74MOCiF+ZTBW7uOux6t8O1EJFW/gr3u7o5tudyn3A9Ll+OiQb/fSyTWQ+hrWuZgBwoF0X+ez4RY4 X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a17:902:9a49:b029:df:fab8:384 with SMTP id x9-20020a1709029a49b02900dffab80384mr455786plv.37.1612463686168; Thu, 04 Feb 2021 10:34:46 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:24 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-2-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 01/10] hugetlb: Pass vma into huge_pte_alloc() and huge_pmd_share() From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peter Xu It is a preparation work to be able to behave differently in the per architecture huge_pte_alloc() according to different VMA attributes. Pass it deeper into huge_pmd_share() so that we can avoid the find_vma() call. Suggested-by: Mike Kravetz Signed-off-by: Peter Xu Signed-off-by: Axel Rasmussen Reported-by: kernel test robot --- arch/arm64/mm/hugetlbpage.c | 4 ++-- arch/ia64/mm/hugetlbpage.c | 3 ++- arch/mips/mm/hugetlbpage.c | 4 ++-- arch/parisc/mm/hugetlbpage.c | 2 +- arch/powerpc/mm/hugetlbpage.c | 3 ++- arch/s390/mm/hugetlbpage.c | 2 +- arch/sh/mm/hugetlbpage.c | 2 +- arch/sparc/mm/hugetlbpage.c | 6 +----- include/linux/hugetlb.h | 5 +++-- mm/hugetlb.c | 15 ++++++++------- mm/userfaultfd.c | 2 +- 11 files changed, 24 insertions(+), 24 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 55ecf6de9ff7..6e3bcffe2837 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -252,7 +252,7 @@ void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr, set_pte(ptep, pte); } -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgdp; @@ -286,7 +286,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, } else if (sz == PMD_SIZE) { if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && pud_none(READ_ONCE(*pudp))) - ptep = huge_pmd_share(mm, addr, pudp); + ptep = huge_pmd_share(mm, vma, addr, pudp); else ptep = (pte_t *)pmd_alloc(mm, pudp, addr); } else if (sz == (CONT_PMD_SIZE)) { diff --git a/arch/ia64/mm/hugetlbpage.c b/arch/ia64/mm/hugetlbpage.c index b331f94d20ac..f993cb36c062 100644 --- a/arch/ia64/mm/hugetlbpage.c +++ b/arch/ia64/mm/hugetlbpage.c @@ -25,7 +25,8 @@ unsigned int hpage_shift = HPAGE_SHIFT_DEFAULT; EXPORT_SYMBOL(hpage_shift); pte_t * -huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz) +huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, unsigned long sz) { unsigned long taddr = htlbpage_to_page(addr); pgd_t *pgd; diff --git a/arch/mips/mm/hugetlbpage.c b/arch/mips/mm/hugetlbpage.c index b9f76f433617..7eaff5b07873 100644 --- a/arch/mips/mm/hugetlbpage.c +++ b/arch/mips/mm/hugetlbpage.c @@ -21,8 +21,8 @@ #include #include -pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, - unsigned long sz) +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, unsigned long sz) { pgd_t *pgd; p4d_t *p4d; diff --git a/arch/parisc/mm/hugetlbpage.c b/arch/parisc/mm/hugetlbpage.c index d7ba014a7fbb..e141441bfa64 100644 --- a/arch/parisc/mm/hugetlbpage.c +++ b/arch/parisc/mm/hugetlbpage.c @@ -44,7 +44,7 @@ hugetlb_get_unmapped_area(struct file *file, unsigned long addr, } -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; diff --git a/arch/powerpc/mm/hugetlbpage.c b/arch/powerpc/mm/hugetlbpage.c index 8b3cc4d688e8..d57276b8791c 100644 --- a/arch/powerpc/mm/hugetlbpage.c +++ b/arch/powerpc/mm/hugetlbpage.c @@ -106,7 +106,8 @@ static int __hugepte_alloc(struct mm_struct *mm, hugepd_t *hpdp, * At this point we do the placement change only for BOOK3S 64. This would * possibly work on other subarchs. */ -pte_t *huge_pte_alloc(struct mm_struct *mm, unsigned long addr, unsigned long sz) +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, unsigned long sz) { pgd_t *pg; p4d_t *p4; diff --git a/arch/s390/mm/hugetlbpage.c b/arch/s390/mm/hugetlbpage.c index 3b5a4d25ca9b..da36d13ffc16 100644 --- a/arch/s390/mm/hugetlbpage.c +++ b/arch/s390/mm/hugetlbpage.c @@ -189,7 +189,7 @@ pte_t huge_ptep_get_and_clear(struct mm_struct *mm, return pte; } -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgdp; diff --git a/arch/sh/mm/hugetlbpage.c b/arch/sh/mm/hugetlbpage.c index 220d7bc43d2b..999ab5916e69 100644 --- a/arch/sh/mm/hugetlbpage.c +++ b/arch/sh/mm/hugetlbpage.c @@ -21,7 +21,7 @@ #include #include -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; diff --git a/arch/sparc/mm/hugetlbpage.c b/arch/sparc/mm/hugetlbpage.c index ad4b42f04988..a487ea2977a2 100644 --- a/arch/sparc/mm/hugetlbpage.c +++ b/arch/sparc/mm/hugetlbpage.c @@ -275,11 +275,7 @@ static unsigned long huge_tte_to_size(pte_t pte) return size; } -unsigned long pud_leaf_size(pud_t pud) { return 1UL << tte_to_shift(*(pte_t *)&pud); } -unsigned long pmd_leaf_size(pmd_t pmd) { return 1UL << tte_to_shift(*(pte_t *)&pmd); } -unsigned long pte_leaf_size(pte_t pte) { return 1UL << tte_to_shift(pte); } - -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index ebca2ef02212..92e8799edffd 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -152,7 +152,8 @@ void hugetlb_fix_reserve_counts(struct inode *inode); extern struct mutex *hugetlb_fault_mutex_table; u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx); -pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud); +pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, pud_t *pud); struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage); @@ -161,7 +162,7 @@ extern struct list_head huge_boot_pages; /* arch callbacks */ -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz); pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz); diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 18f6ee317900..3185631f61bc 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -3766,7 +3766,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, src_pte = huge_pte_offset(src, addr, sz); if (!src_pte) continue; - dst_pte = huge_pte_alloc(dst, addr, sz); + dst_pte = huge_pte_alloc(dst, vma, addr, sz); if (!dst_pte) { ret = -ENOMEM; break; @@ -4503,7 +4503,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, */ mapping = vma->vm_file->f_mapping; i_mmap_lock_read(mapping); - ptep = huge_pte_alloc(mm, haddr, huge_page_size(h)); + ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h)); if (!ptep) { i_mmap_unlock_read(mapping); return VM_FAULT_OOM; @@ -5293,9 +5293,9 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, * if !vma_shareable check at the beginning of the routine. i_mmap_rwsem is * only required for subsequent processing. */ -pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) +pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long addr, pud_t *pud) { - struct vm_area_struct *vma = find_vma(mm, addr); struct address_space *mapping = vma->vm_file->f_mapping; pgoff_t idx = ((addr - vma->vm_start) >> PAGE_SHIFT) + vma->vm_pgoff; @@ -5373,7 +5373,8 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, } #define want_pmd_share() (1) #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ -pte_t *huge_pmd_share(struct mm_struct *mm, unsigned long addr, pud_t *pud) +pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct vma, + unsigned long addr, pud_t *pud) { return NULL; } @@ -5392,7 +5393,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB -pte_t *huge_pte_alloc(struct mm_struct *mm, +pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr, unsigned long sz) { pgd_t *pgd; @@ -5411,7 +5412,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, } else { BUG_ON(sz != PMD_SIZE); if (want_pmd_share() && pud_none(*pud)) - pte = huge_pmd_share(mm, addr, pud); + pte = huge_pmd_share(mm, vma, addr, pud); else pte = (pte_t *)pmd_alloc(mm, pud, addr); } diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index 7423808640ef..b2ce61c1b50d 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -290,7 +290,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, mutex_lock(&hugetlb_fault_mutex_table[hash]); err = -ENOMEM; - dst_pte = huge_pte_alloc(dst_mm, dst_addr, vma_hpagesize); + dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); if (!dst_pte) { mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_unlock_read(mapping); From patchwork Thu Feb 4 18:34:25 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12070409 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=unavailable autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id CFB57C433E0 for ; Fri, 5 Feb 2021 16:20:49 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2A37A64D9A for ; Fri, 5 Feb 2021 16:20:49 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2A37A64D9A Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 8CBD56B0006; Fri, 5 Feb 2021 11:20:48 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 87DC86B0073; Fri, 5 Feb 2021 11:20:48 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 7934C6B0074; Fri, 5 Feb 2021 11:20:48 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0014.hostedemail.com [216.40.44.14]) by kanga.kvack.org (Postfix) with ESMTP id 63DAC6B0006 for ; Fri, 5 Feb 2021 11:20:48 -0500 (EST) Received: from smtpin18.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 2989C181AEF10 for ; Fri, 5 Feb 2021 16:20:48 +0000 (UTC) X-FDA: 77784727776.18.983573B Received: by imf03.hostedemail.com (Postfix, from userid 200) id E9E73C00CEC7; Thu, 4 Feb 2021 18:34:50 +0000 (UTC) Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf03.hostedemail.com (Postfix) with ESMTP id C0CA2DB3D7D7 for ; Thu, 4 Feb 2021 18:34:47 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id 134so4180469ybd.3 for ; Thu, 04 Feb 2021 10:34:48 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=s9r2ulxIVo+JxzRZfbGUdKJOP0/th7I3qsQK2fKA868=; b=wV+Hva7Qm6LmWtboJMy1O3oNDDp9+Ve3p0BFRSkoshlP4KJG6Pvq+kn2LJL+fXNz0H 7Be7hCNSDbaTmGhW64WInTBmORgd0QNu3sU1mKpY0L8+8id+LYOX5eXOk3umYJ1Xe0JA S4YuMEO/342CuRpnjziXI81SaQS/YKnf6ibKDL5oBM5EJyHhJFroah7osop6dWo/lzEx 6agFStDL6ZeCHk295SjxM/nC0hV8tAuhTDJ90CD8vJZLw+1EGbDMJTJpn4be4+UREWYa WOcBxClzEpU87vKEFl6J3wIo8cL88U+kEP43Otj+h35ibnJpTaCGn3XsX5Mtenym6CCo iW8g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=s9r2ulxIVo+JxzRZfbGUdKJOP0/th7I3qsQK2fKA868=; b=s7u8jcWE+kk1Pl7ynvCtuwphWybGnVovcXj/xBpdwa00VzJaZTA/pfMvMPZkB+Gjtp esZTX8jxz1DcqBYYznBy9mHfE2OXV7QOT44bntFVCOS9LieQajwe5YDCNhZ5QLfxkZL9 zM3cas1pG9OjvrA7SASxRB2p32Mbts6woM0Pm7u25qb5mi9py9e5UXvoUo834zeEOiZF 6GnR5N6SeQ0mlhfX/oU4+jxPR4gf0GwMdxS+hmgBVSCst5GRz2hFV1YGrFVPnglvWjGG RC59UAfuvxFIO4VxWTxIX5ESMN/enZ+S2cNXAWmNcbtssKk8vDlVrNRImiXkYF1B/5Q9 zZZw== X-Gm-Message-State: AOAM530LR4AfsX7vt8S1p0xb+F9M8qraLVq/fYgYnK9U0yLBEJhCISSQ 66Zl0P937HCcrkYAvs+iGlp5zAk6+Zv4KoBck3dD X-Google-Smtp-Source: ABdhPJx6wGFI4hlg9+bzkoNUY3w5U7WH/0NErIIYgtcvUJiTksoN103vNGBceqQZ7UuGL6uHIDxGzIj67AWdShq6BnyQ X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a25:6cc6:: with SMTP id h189mr814735ybc.434.1612463688006; Thu, 04 Feb 2021 10:34:48 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:25 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-3-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 02/10] hugetlb/userfaultfd: Forbid huge pmd sharing when uffd enabled From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Rspamd-Server: rspam03 X-Rspamd-Queue-Id: C0CA2DB3D7D7 X-Stat-Signature: 8hs6mc7fd7wukz8u1j9e86neabpo7tkm Received-SPF: none (flex--axelrasmussen.bounces.google.com>: No applicable sender policy available) receiver=imf03; identity=mailfrom; envelope-from="<3SD4cYA0KCD0ZwdkqZrltrrdmfnnfkd.bnlkhmtw-lljuZbj.nqf@flex--axelrasmussen.bounces.google.com>"; helo=mail-yb1-f201.google.com; client-ip=209.85.219.201 X-HE-DKIM-Result: pass/pass X-HE-Tag: 1612463687-744288 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peter Xu Huge pmd sharing could bring problem to userfaultfd. The thing is that userfaultfd is running its logic based on the special bits on page table entries, however the huge pmd sharing could potentially share page table entries for different address ranges. That could cause issues on either: - When sharing huge pmd page tables for an uffd write protected range, the newly mapped huge pmd range will also be write protected unexpectedly, or, - When we try to write protect a range of huge pmd shared range, we'll first do huge_pmd_unshare() in hugetlb_change_protection(), however that also means the UFFDIO_WRITEPROTECT could be silently skipped for the shared region, which could lead to data loss. Since at it, a few other things are done altogether: - Move want_pmd_share() from mm/hugetlb.c into linux/hugetlb.h, because that's definitely something that arch code would like to use too - ARM64 currently directly check against CONFIG_ARCH_WANT_HUGE_PMD_SHARE when trying to share huge pmd. Switch to the want_pmd_share() helper. Since at it, move vma_shareable() from huge_pmd_share() into want_pmd_share(). Signed-off-by: Peter Xu Signed-off-by: Axel Rasmussen --- arch/arm64/mm/hugetlbpage.c | 3 +-- include/linux/hugetlb.h | 2 ++ include/linux/userfaultfd_k.h | 9 +++++++++ mm/hugetlb.c | 20 ++++++++++++++------ 4 files changed, 26 insertions(+), 8 deletions(-) diff --git a/arch/arm64/mm/hugetlbpage.c b/arch/arm64/mm/hugetlbpage.c index 6e3bcffe2837..58987a98e179 100644 --- a/arch/arm64/mm/hugetlbpage.c +++ b/arch/arm64/mm/hugetlbpage.c @@ -284,8 +284,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, */ ptep = pte_alloc_map(mm, pmdp, addr); } else if (sz == PMD_SIZE) { - if (IS_ENABLED(CONFIG_ARCH_WANT_HUGE_PMD_SHARE) && - pud_none(READ_ONCE(*pudp))) + if (want_pmd_share(vma, addr) && pud_none(READ_ONCE(*pudp))) ptep = huge_pmd_share(mm, vma, addr, pudp); else ptep = (pte_t *)pmd_alloc(mm, pudp, addr); diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 92e8799edffd..f6d5939a6eb0 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -948,4 +948,6 @@ static inline __init void hugetlb_cma_check(void) } #endif +bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); + #endif /* _LINUX_HUGETLB_H */ diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index a8e5f3ea9bb2..c63ccdae3eab 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -52,6 +52,15 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, return vma->vm_userfaultfd_ctx.ctx == vm_ctx.ctx; } +/* + * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp + * protect information is per pgtable entry. + */ +static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) +{ + return vma->vm_flags & VM_UFFD_WP; +} + static inline bool userfaultfd_missing(struct vm_area_struct *vma) { return vma->vm_flags & VM_UFFD_MISSING; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 3185631f61bc..588c4c28c44d 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5251,6 +5251,18 @@ static bool vma_shareable(struct vm_area_struct *vma, unsigned long addr) return false; } +bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr) +{ +#ifndef CONFIG_ARCH_WANT_HUGE_PMD_SHARE + return false; +#endif +#ifdef CONFIG_USERFAULTFD + if (uffd_disable_huge_pmd_share(vma)) + return false; +#endif + return vma_shareable(vma, addr); +} + /* * Determine if start,end range within vma could be mapped by shared pmd. * If yes, adjust start and end to cover range associated with possible @@ -5305,9 +5317,6 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *pte; spinlock_t *ptl; - if (!vma_shareable(vma, addr)) - return (pte_t *)pmd_alloc(mm, pud, addr); - i_mmap_assert_locked(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { if (svma == vma) @@ -5371,7 +5380,7 @@ int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma, *addr = ALIGN(*addr, HPAGE_SIZE * PTRS_PER_PTE) - HPAGE_SIZE; return 1; } -#define want_pmd_share() (1) + #else /* !CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct vma, unsigned long addr, pud_t *pud) @@ -5389,7 +5398,6 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma, unsigned long *start, unsigned long *end) { } -#define want_pmd_share() (0) #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */ #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB @@ -5411,7 +5419,7 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma, pte = (pte_t *)pud; } else { BUG_ON(sz != PMD_SIZE); - if (want_pmd_share() && pud_none(*pud)) + if (want_pmd_share(vma, addr) && pud_none(*pud)) pte = huge_pmd_share(mm, vma, addr, pud); else pte = (pte_t *)pmd_alloc(mm, pud, addr); From patchwork Thu Feb 4 18:34:26 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068341 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4775EC433E6 for ; Thu, 4 Feb 2021 18:34:52 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E136664D9A for ; Thu, 4 Feb 2021 18:34:51 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E136664D9A Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 806606B006E; Thu, 4 Feb 2021 13:34:51 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 712C46B0070; Thu, 4 Feb 2021 13:34:51 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 5DB526B0071; Thu, 4 Feb 2021 13:34:51 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0241.hostedemail.com [216.40.44.241]) by kanga.kvack.org (Postfix) with ESMTP id 478606B006E for ; Thu, 4 Feb 2021 13:34:51 -0500 (EST) Received: from smtpin14.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 12047181AC9CB for ; Thu, 4 Feb 2021 18:34:51 +0000 (UTC) X-FDA: 77781436782.14.wave38_55001ae275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin14.hostedemail.com (Postfix) with ESMTP id EF63C18229837 for ; Thu, 4 Feb 2021 18:34:50 +0000 (UTC) X-HE-Tag: wave38_55001ae275de X-Filterd-Recvd-Size: 5776 Received: from mail-qk1-f202.google.com (mail-qk1-f202.google.com [209.85.222.202]) by imf12.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:34:50 +0000 (UTC) Received: by mail-qk1-f202.google.com with SMTP id m64so3456439qke.12 for ; Thu, 04 Feb 2021 10:34:50 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=OEiarKErHeQBPZKsYeL3IkSXq4c5mYMZ0dIk1JQlp5Y=; b=HaC1/BN3UFOD+8cUBOspMU73ct/e+qcOiXXFRVMYRHECbtjGH4eFsMreWMsc1gLI8E D4HwfqBpv8ifvkQ0WTBn6pGbOvbPMH5naPNpoq6uIrdBp6aAsuqDwCDf2SHMU8U7EnQf as+ddooJ5YbrDa0uuHll/bhyoJhHSwd8mgvYbwJbZpIPtS3RdBrNlhZIAxBzHn83KTEe ColYJ67sQiwpnOsZb66BnQz+LGNeOAusyJeYaEc/NUueGhZQKpMBT4BxAkZqBVp13ukv WAAZ4mkE7OgDpUP7dnncZRPz8SVVk3PinhZ4MkQv9OelGo/fPIj3rr0yOBddPveGFIiu udHg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=OEiarKErHeQBPZKsYeL3IkSXq4c5mYMZ0dIk1JQlp5Y=; b=Ib2iWC0ZcVHxJbGNLMVl10gR8RPs2NNi5VHV4Lj8D2U0eqOVubGq+kL7RiVgDk6qKk o8Zsm5k0McsWqQJPgQQYm9E+XN3sQJ3CdwuvVJv8BmMB6ZJyRXbyzDuoSyTOjpP9euXv 4UbT+TaPA1eL0CKGoQJl8QEcKEMO0Wyt7iWYXM5g2u5S2oBwpZE26dw6inNwcoEMG1dy 7wdlqQA6+IZgRTbiUs88jabeFlgdUM9m/zeM07FuOQMwPnATsRcyrV/gQtMgCeVGBmFJ 1nV54rZaCefeHh3JDENPEhAF9IPabBhABsaCbjk9B2ncV3nN1DThdYiL/306k2fXDBrt weeg== X-Gm-Message-State: AOAM530z/oabppJ4j+DlryMeuoqZY+3Pc1Bx7xWP+bTcEavr7MB0pGgn h+chChaN2Zvns+l5kwRh40IxccCJopCVJFCu+lhn X-Google-Smtp-Source: ABdhPJwoP//UfZwpJJBLBfBtdgwOdsWrZ/lchYHtEI3yC3eDyBTvQXSkLtfePQ0byVtVxFApIJiz12FKAuAPClXh2/pD X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a0c:a692:: with SMTP id t18mr803070qva.18.1612463689620; Thu, 04 Feb 2021 10:34:49 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:26 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-4-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 03/10] mm/hugetlb: Move flush_hugetlb_tlb_range() into hugetlb.h From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peter Xu Prepare for it to be called outside of mm/hugetlb.c. Reviewed-by: Mike Kravetz Signed-off-by: Peter Xu Signed-off-by: Axel Rasmussen --- include/linux/hugetlb.h | 8 ++++++++ mm/hugetlb.c | 8 -------- 2 files changed, 8 insertions(+), 8 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index f6d5939a6eb0..af40500c99f0 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -950,4 +950,12 @@ static inline __init void hugetlb_cma_check(void) bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr); +#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE +/* + * ARCHes with special requirements for evicting HUGETLB backing TLB entries can + * implement this. + */ +#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) +#endif + #endif /* _LINUX_HUGETLB_H */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 588c4c28c44d..5b3f00a1e276 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4924,14 +4924,6 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, return i ? i : err; } -#ifndef __HAVE_ARCH_FLUSH_HUGETLB_TLB_RANGE -/* - * ARCHes with special requirements for evicting HUGETLB backing TLB entries can - * implement this. - */ -#define flush_hugetlb_tlb_range(vma, addr, end) flush_tlb_range(vma, addr, end) -#endif - unsigned long hugetlb_change_protection(struct vm_area_struct *vma, unsigned long address, unsigned long end, pgprot_t newprot) { From patchwork Thu Feb 4 18:34:27 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068343 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 7257DC433E0 for ; Thu, 4 Feb 2021 18:34:54 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id EF4F964D9A for ; Thu, 4 Feb 2021 18:34:53 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org EF4F964D9A Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 80B616B0070; Thu, 4 Feb 2021 13:34:53 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7192A6B0071; Thu, 4 Feb 2021 13:34:53 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 58EA26B0072; Thu, 4 Feb 2021 13:34:53 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0224.hostedemail.com [216.40.44.224]) by kanga.kvack.org (Postfix) with ESMTP id 36EBE6B0070 for ; Thu, 4 Feb 2021 13:34:53 -0500 (EST) Received: from smtpin28.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 0008D1EF1 for ; Thu, 4 Feb 2021 18:34:52 +0000 (UTC) X-FDA: 77781436866.28.slip05_050ab6e275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin28.hostedemail.com (Postfix) with ESMTP id D0D336C3A for ; Thu, 4 Feb 2021 18:34:52 +0000 (UTC) X-HE-Tag: slip05_050ab6e275de X-Filterd-Recvd-Size: 7569 Received: from mail-qk1-f201.google.com (mail-qk1-f201.google.com [209.85.222.201]) by imf37.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:34:52 +0000 (UTC) Received: by mail-qk1-f201.google.com with SMTP id s4so3448441qkj.18 for ; Thu, 04 Feb 2021 10:34:52 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=BOdWGt2YrrFhUAh6Oya9odQ31xPXJajGw7XfNah1Ei0=; b=NBHdLCMVmjlgN5mUcDN7btxS+7ZLWcclBrRA56QJlXrMR8QRMxe4LKcc+rDaJR88Tg 81ZpUhmRyV+2iRs9b9i6X1F53gwpkWEkrPHiso9lwqtE01nZ0LC76zIowyeGwRQger9n MvN1CUkSc5RzCH3+1uJPnTf4lcAKThO0g7tGROkLDYgTOlEt+CYUjMazGvYBxhzJ26is 2gH5HlvO0XbpfhMs2p9bBdRgOCjFHHwlv8y9M7F5y30awvPM3pYsdA3nlMCVnO+SfpbC VJuVKmjkYK4vCc8vvFkmNhpnzIoB+2Lu2ef0b8fXz1jWw9ZJYS+CaSQD16ThR+WRoF/o 1JOQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=BOdWGt2YrrFhUAh6Oya9odQ31xPXJajGw7XfNah1Ei0=; b=Hd5Q83BTGJkeKKst2QRxF66Rk3+5M5NFQIVYHLZ6HEANOVrobKQAWqjZ84GLEjvL4O tkie/1JOQ7w9xEPdQDCfZgdCSklxQehijGCnKLURIyYmV97ziCwbq+Ro+6A55aFzAzBs pDG0H1kCIIA2tt38zg6COFL+1pqLDBZk2vkz27h+lyxc32qrxIJyHYhYhraluOiKoYCL tM8vefBIWEre/r6GnnVfUYIPLcYpA+PB35ZZdzSXWM+Axz0i4KnKEDlIJME9/LXOyKoG kD4/CHVoFPy8QkCwnKfLEpFh7cOSOf+dscgnme3H5DmfmkMV1w3JO7xp+EHsbRr6mxq2 tZsA== X-Gm-Message-State: AOAM533MvDeuCHOqt2bgfCFayr8E71KqlxO0qbRhedeWR/ifBod9bsIi yT+HrSJLTkh84SPW6AloGP0rCuNZMVfUkaDcfVLR X-Google-Smtp-Source: ABdhPJyPb3Mlg/0ElAXzwxLs8sVTCzLB5QqJgGLurcVZpTtQkepWRndQWdvko4eXoVcaKAI92jkYW/tc6Gcw4S4Evpba X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a05:6214:2ee:: with SMTP id h14mr785467qvu.34.1612463691582; Thu, 04 Feb 2021 10:34:51 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:27 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-5-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 04/10] hugetlb/userfaultfd: Unshare all pmds for hugetlbfs when register wp From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: From: Peter Xu Huge pmd sharing for hugetlbfs is racy with userfaultfd-wp because userfaultfd-wp is always based on pgtable entries, so they cannot be shared. Walk the hugetlb range and unshare all such mappings if there is, right before UFFDIO_REGISTER will succeed and return to userspace. This will pair with want_pmd_share() in hugetlb code so that huge pmd sharing is completely disabled for userfaultfd-wp registered range. Signed-off-by: Peter Xu Signed-off-by: Axel Rasmussen --- fs/userfaultfd.c | 48 ++++++++++++++++++++++++++++++++++++ include/linux/mmu_notifier.h | 1 + 2 files changed, 49 insertions(+) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 894cc28142e7..a0f66e12026b 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -15,6 +15,7 @@ #include #include #include +#include #include #include #include @@ -1190,6 +1191,50 @@ static ssize_t userfaultfd_read(struct file *file, char __user *buf, } } +/* + * This function will unconditionally remove all the shared pmd pgtable entries + * within the specific vma for a hugetlbfs memory range. + */ +static void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) +{ +#ifdef CONFIG_HUGETLB_PAGE + struct hstate *h = hstate_vma(vma); + unsigned long sz = huge_page_size(h); + struct mm_struct *mm = vma->vm_mm; + struct mmu_notifier_range range; + unsigned long address; + spinlock_t *ptl; + pte_t *ptep; + + if (!(vma->vm_flags & VM_MAYSHARE)) + return; + + /* + * No need to call adjust_range_if_pmd_sharing_possible(), because + * we're going to operate on the whole vma + */ + mmu_notifier_range_init(&range, MMU_NOTIFY_HUGETLB_UNSHARE, + 0, vma, mm, vma->vm_start, vma->vm_end); + mmu_notifier_invalidate_range_start(&range); + i_mmap_lock_write(vma->vm_file->f_mapping); + for (address = vma->vm_start; address < vma->vm_end; address += sz) { + ptep = huge_pte_offset(mm, address, sz); + if (!ptep) + continue; + ptl = huge_pte_lock(h, mm, ptep); + huge_pmd_unshare(mm, vma, &address, ptep); + spin_unlock(ptl); + } + flush_hugetlb_tlb_range(vma, vma->vm_start, vma->vm_end); + i_mmap_unlock_write(vma->vm_file->f_mapping); + /* + * No need to call mmu_notifier_invalidate_range(), see + * Documentation/vm/mmu_notifier.rst. + */ + mmu_notifier_invalidate_range_end(&range); +#endif +} + static void __wake_userfault(struct userfaultfd_ctx *ctx, struct userfaultfd_wake_range *range) { @@ -1448,6 +1493,9 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, vma->vm_flags = new_flags; vma->vm_userfaultfd_ctx.ctx = ctx; + if (is_vm_hugetlb_page(vma) && uffd_disable_huge_pmd_share(vma)) + hugetlb_unshare_all_pmds(vma); + skip: prev = vma; start = vma->vm_end; diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h index b8200782dede..ff50c8528113 100644 --- a/include/linux/mmu_notifier.h +++ b/include/linux/mmu_notifier.h @@ -51,6 +51,7 @@ enum mmu_notifier_event { MMU_NOTIFY_SOFT_DIRTY, MMU_NOTIFY_RELEASE, MMU_NOTIFY_MIGRATE, + MMU_NOTIFY_HUGETLB_UNSHARE, }; #define MMU_NOTIFIER_RANGE_BLOCKABLE (1 << 0) From patchwork Thu Feb 4 18:34:28 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068345 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id B824FC433DB for ; Thu, 4 Feb 2021 18:34:56 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 2DEE964E06 for ; Thu, 4 Feb 2021 18:34:56 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 2DEE964E06 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BD0D06B0071; Thu, 4 Feb 2021 13:34:55 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B5C216B0072; Thu, 4 Feb 2021 13:34:55 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8E9676B0073; Thu, 4 Feb 2021 13:34:55 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0243.hostedemail.com [216.40.44.243]) by kanga.kvack.org (Postfix) with ESMTP id 6696F6B0071 for ; Thu, 4 Feb 2021 13:34:55 -0500 (EST) Received: from smtpin23.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay04.hostedemail.com (Postfix) with ESMTP id 2E9A21EF1 for ; Thu, 4 Feb 2021 18:34:55 +0000 (UTC) X-FDA: 77781436950.23.brush07_1f008fb275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin23.hostedemail.com (Postfix) with ESMTP id 004E33760C for ; Thu, 4 Feb 2021 18:34:54 +0000 (UTC) X-HE-Tag: brush07_1f008fb275de X-Filterd-Recvd-Size: 20129 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf44.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:34:54 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id n81so4115241ybg.20 for ; Thu, 04 Feb 2021 10:34:54 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=q1L/k0H/UGBr4uAJUByz+DhF3hH27nDxUF8N5ZGwtFY=; b=QRRhdHI8S5EFPnydNq/3UyLEJEXtIyDf7DDQTfWX5mpw2nTHpw95gM2Xkwi+5bAQN7 HnmlG+lBrEuGHeAmYhqx6+TK098yhDBNxPih+7Xia+fC1WryN22k7JYIAJvveqTy2e3p /xRTrRH9D60mV0yPQSDwyFroID0FEIYNcHk35ZKNYRvnARU3XXyJeGkT/3c3xDYRUTPS SRy/r95iKmZnloQD1mjOEqEu2/dB29UrqUj1VN0bsh4uRtyZk7ocqxJ18j0EOWmeryNC jtJAUyvdklG7DTrnktRnSLCr+BnamcxFAO08yRQccR8sljiq/5BQfiadHJ0dRQqjvwXK 96eQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=q1L/k0H/UGBr4uAJUByz+DhF3hH27nDxUF8N5ZGwtFY=; b=JLLR/hQ5PadJjwXz8z8ygy06QEfyoGSDBtS8Sc/obXqVkTIi4pLNQVkpZUE0XP1PR3 hjjBbpBzeP9UY3ylAPstozgD7x8qeh/NbAejEAYJnLUQr4mMx2kWxdfiB1Y4knGg30zY UF1JTTYknDIZoJM4qSit3oP0WtW9VA01f9bsIf9iLUqVq8Oxj4PpJr8OkbqVznm69ewv RQWcz1IUTbgpjLLBGhVGxrYcJ6cNqzUdyJVv6dcKHL688N/ToGhv0cmU5dS8ncE6iQzs bbWvrzj3Lb4C/epuJZgqDI4CRlTl0OjV4YJoHubkjfQrZhDJEBRZcNRm1rkKojZsRwz9 jBzA== X-Gm-Message-State: AOAM530Bm9BOnVar4q9Dn1GpZvVghg0E16024QWWVRo+KrY/8vormc19 3ujIV/FLxyCOIAmZXux8TkDZ1QslDnd+ldwkPN6t X-Google-Smtp-Source: ABdhPJyEkF0Fa2re9M01HKqw2JB4GvealVad+nbeRd/5Pt/ixMObAE36/47Caf4dk2THPtwlSg1vwfSDX+Y8lzny9IPk X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a25:b951:: with SMTP id s17mr843516ybm.498.1612463693608; Thu, 04 Feb 2021 10:34:53 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:28 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-6-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 05/10] userfaultfd: add minor fault registration mode From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This feature allows userspace to intercept "minor" faults. By "minor" faults, I mean the following situation: Let there exist two mappings (i.e., VMAs) to the same page(s). One of the mappings is registered with userfaultfd (in minor mode), and the other is not. Via the non-UFFD mapping, the underlying pages have already been allocated & filled with some contents. The UFFD mapping has not yet been faulted in; when it is touched for the first time, this results in what I'm calling a "minor" fault. As a concrete example, when working with hugetlbfs, we have huge_pte_none(), but find_lock_page() finds an existing page. This commit adds the new registration mode, and sets the relevant flag on the VMAs being registered. In the hugetlb fault path, if we find that we have huge_pte_none(), but find_lock_page() does indeed find an existing page, then we have a "minor" fault, and if the VMA has the userfaultfd registration flag, we call into userfaultfd to handle it. Why add a new registration mode, as opposed to adding a feature to MISSING registration, like UFFD_FEATURE_SIGBUS? - The semantics are significantly different. UFFDIO_COPY or UFFDIO_ZEROPAGE do not make sense for these minor faults; userspace would instead just memset() or memcpy() or whatever via the non-UFFD mapping. Unlike MISSING registration, MINOR registration only makes sense for hugetlbfs (or, in the future, shmem), as this is the only way to get two VMAs to a single set of underlying pages. - Doing so would make handle_userfault()'s "reason" argument confusing. We'd pass in "MISSING" even if the pages weren't really missing. Signed-off-by: Axel Rasmussen Reviewed-by: Peter Xu --- fs/proc/task_mmu.c | 1 + fs/userfaultfd.c | 81 ++++++++++++++++++++------------ include/linux/mm.h | 1 + include/linux/userfaultfd_k.h | 15 +++++- include/trace/events/mmflags.h | 1 + include/uapi/linux/userfaultfd.h | 15 +++++- mm/hugetlb.c | 32 +++++++++++++ 7 files changed, 112 insertions(+), 34 deletions(-) diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 602e3a52884d..94e951ea3e03 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -651,6 +651,7 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) [ilog2(VM_MTE)] = "mt", [ilog2(VM_MTE_ALLOWED)] = "", #endif + [ilog2(VM_UFFD_MINOR)] = "ui", #ifdef CONFIG_ARCH_HAS_PKEYS /* These come out via ProtectionKey: */ [ilog2(VM_PKEY_BIT0)] = "", diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index a0f66e12026b..c643cf13d957 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -197,24 +197,21 @@ static inline struct uffd_msg userfault_msg(unsigned long address, msg_init(&msg); msg.event = UFFD_EVENT_PAGEFAULT; msg.arg.pagefault.address = address; + /* + * These flags indicate why the userfault occurred: + * - UFFD_PAGEFAULT_FLAG_WP indicates a write protect fault. + * - UFFD_PAGEFAULT_FLAG_MINOR indicates a minor fault. + * - Neither of these flags being set indicates a MISSING fault. + * + * Separately, UFFD_PAGEFAULT_FLAG_WRITE indicates it was a write + * fault. Otherwise, it was a read fault. + */ if (flags & FAULT_FLAG_WRITE) - /* - * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the - * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE - * was not set in a UFFD_EVENT_PAGEFAULT, it means it - * was a read fault, otherwise if set it means it's - * a write fault. - */ msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE; if (reason & VM_UFFD_WP) - /* - * If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the - * uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was - * not set in a UFFD_EVENT_PAGEFAULT, it means it was - * a missing fault, otherwise if set it means it's a - * write protect fault. - */ msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WP; + if (reason & VM_UFFD_MINOR) + msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_MINOR; if (features & UFFD_FEATURE_THREAD_ID) msg.arg.pagefault.feat.ptid = task_pid_vnr(current); return msg; @@ -401,8 +398,10 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) BUG_ON(ctx->mm != mm); - VM_BUG_ON(reason & ~(VM_UFFD_MISSING|VM_UFFD_WP)); - VM_BUG_ON(!(reason & VM_UFFD_MISSING) ^ !!(reason & VM_UFFD_WP)); + /* Any unrecognized flag is a bug. */ + VM_BUG_ON(reason & ~__VM_UFFD_FLAGS); + /* 0 or > 1 flags set is a bug; we expect exactly 1. */ + VM_BUG_ON(!reason || !!(reason & (reason - 1))); if (ctx->features & UFFD_FEATURE_SIGBUS) goto out; @@ -612,7 +611,7 @@ static void userfaultfd_event_wait_completion(struct userfaultfd_ctx *ctx, for (vma = mm->mmap; vma; vma = vma->vm_next) if (vma->vm_userfaultfd_ctx.ctx == release_new_ctx) { vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING); + vma->vm_flags &= ~__VM_UFFD_FLAGS; } mmap_write_unlock(mm); @@ -644,7 +643,7 @@ int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *fcs) octx = vma->vm_userfaultfd_ctx.ctx; if (!octx || !(octx->features & UFFD_FEATURE_EVENT_FORK)) { vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING); + vma->vm_flags &= ~__VM_UFFD_FLAGS; return 0; } @@ -726,7 +725,7 @@ void mremap_userfaultfd_prep(struct vm_area_struct *vma, } else { /* Drop uffd context if remap feature not enabled */ vma->vm_userfaultfd_ctx = NULL_VM_UFFD_CTX; - vma->vm_flags &= ~(VM_UFFD_WP | VM_UFFD_MISSING); + vma->vm_flags &= ~__VM_UFFD_FLAGS; } } @@ -867,12 +866,12 @@ static int userfaultfd_release(struct inode *inode, struct file *file) for (vma = mm->mmap; vma; vma = vma->vm_next) { cond_resched(); BUG_ON(!!vma->vm_userfaultfd_ctx.ctx ^ - !!(vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP))); + !!(vma->vm_flags & __VM_UFFD_FLAGS)); if (vma->vm_userfaultfd_ctx.ctx != ctx) { prev = vma; continue; } - new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP); + new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; prev = vma_merge(mm, prev, vma->vm_start, vma->vm_end, new_flags, vma->anon_vma, vma->vm_file, vma->vm_pgoff, @@ -1305,9 +1304,29 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma, unsigned long vm_flags) { /* FIXME: add WP support to hugetlbfs and shmem */ - return vma_is_anonymous(vma) || - ((is_vm_hugetlb_page(vma) || vma_is_shmem(vma)) && - !(vm_flags & VM_UFFD_WP)); + if (vm_flags & VM_UFFD_WP) { + if (is_vm_hugetlb_page(vma) || vma_is_shmem(vma)) + return false; + } + + if (vm_flags & VM_UFFD_MINOR) { + /* + * The use case for minor registration (intercepting minor + * faults) is to handle the case where a page is present, but + * needs to be modified before it can be used. This only makes + * sense when you have two mappings to the same underlying + * pages (one UFFD registered, one not), but the memory doesn't + * have to be shared (consider one process mapping a hugetlbfs + * file with MAP_SHARED, and then a second process doing + * MAP_PRIVATE). + */ + /* FIXME: Add minor fault interception for shmem. */ + if (!is_vm_hugetlb_page(vma)) + return false; + } + + return vma_is_anonymous(vma) || is_vm_hugetlb_page(vma) || + vma_is_shmem(vma); } static int userfaultfd_register(struct userfaultfd_ctx *ctx, @@ -1333,14 +1352,15 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, ret = -EINVAL; if (!uffdio_register.mode) goto out; - if (uffdio_register.mode & ~(UFFDIO_REGISTER_MODE_MISSING| - UFFDIO_REGISTER_MODE_WP)) + if (uffdio_register.mode & ~UFFD_API_REGISTER_MODES) goto out; vm_flags = 0; if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MISSING) vm_flags |= VM_UFFD_MISSING; if (uffdio_register.mode & UFFDIO_REGISTER_MODE_WP) vm_flags |= VM_UFFD_WP; + if (uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR) + vm_flags |= VM_UFFD_MINOR; ret = validate_range(mm, &uffdio_register.range.start, uffdio_register.range.len); @@ -1384,7 +1404,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, cond_resched(); BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^ - !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP))); + !!(cur->vm_flags & __VM_UFFD_FLAGS)); /* check not compatible vmas */ ret = -EINVAL; @@ -1464,8 +1484,7 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, start = vma->vm_start; vma_end = min(end, vma->vm_end); - new_flags = (vma->vm_flags & - ~(VM_UFFD_MISSING|VM_UFFD_WP)) | vm_flags; + new_flags = (vma->vm_flags & ~__VM_UFFD_FLAGS) | vm_flags; prev = vma_merge(mm, prev, start, vma_end, new_flags, vma->anon_vma, vma->vm_file, vma->vm_pgoff, vma_policy(vma), @@ -1587,7 +1606,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, cond_resched(); BUG_ON(!!cur->vm_userfaultfd_ctx.ctx ^ - !!(cur->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP))); + !!(cur->vm_flags & __VM_UFFD_FLAGS)); /* * Check not compatible vmas, not strictly required @@ -1638,7 +1657,7 @@ static int userfaultfd_unregister(struct userfaultfd_ctx *ctx, wake_userfault(vma->vm_userfaultfd_ctx.ctx, &range); } - new_flags = vma->vm_flags & ~(VM_UFFD_MISSING | VM_UFFD_WP); + new_flags = vma->vm_flags & ~__VM_UFFD_FLAGS; prev = vma_merge(mm, prev, start, vma_end, new_flags, vma->anon_vma, vma->vm_file, vma->vm_pgoff, vma_policy(vma), diff --git a/include/linux/mm.h b/include/linux/mm.h index ecdf8a8cd6ae..1d7041bd3148 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -276,6 +276,7 @@ extern unsigned int kobjsize(const void *objp); #define VM_PFNMAP 0x00000400 /* Page-ranges managed without "struct page", just pure PFN */ #define VM_DENYWRITE 0x00000800 /* ETXTBSY on write attempts.. */ #define VM_UFFD_WP 0x00001000 /* wrprotect pages tracking */ +#define VM_UFFD_MINOR 0x00002000 /* minor fault interception */ #define VM_LOCKED 0x00002000 #define VM_IO 0x00004000 /* Memory mapped I/O or similar */ diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index c63ccdae3eab..0390e5ac63b3 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -17,6 +17,9 @@ #include #include +/* The set of all possible UFFD-related VM flags. */ +#define __VM_UFFD_FLAGS (VM_UFFD_MISSING | VM_UFFD_WP | VM_UFFD_MINOR) + /* * CAREFUL: Check include/uapi/asm-generic/fcntl.h when defining * new flags, since they might collide with O_* ones. We want @@ -71,6 +74,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma) return vma->vm_flags & VM_UFFD_WP; } +static inline bool userfaultfd_minor(struct vm_area_struct *vma) +{ + return vma->vm_flags & VM_UFFD_MINOR; +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { @@ -85,7 +93,7 @@ static inline bool userfaultfd_huge_pmd_wp(struct vm_area_struct *vma, static inline bool userfaultfd_armed(struct vm_area_struct *vma) { - return vma->vm_flags & (VM_UFFD_MISSING | VM_UFFD_WP); + return vma->vm_flags & __VM_UFFD_FLAGS; } extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *); @@ -132,6 +140,11 @@ static inline bool userfaultfd_wp(struct vm_area_struct *vma) return false; } +static inline bool userfaultfd_minor(struct vm_area_struct *vma) +{ + return false; +} + static inline bool userfaultfd_pte_wp(struct vm_area_struct *vma, pte_t pte) { diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 67018d367b9f..2d583ffd4100 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -151,6 +151,7 @@ IF_HAVE_PG_ARCH_2(PG_arch_2, "arch_2" ) {VM_PFNMAP, "pfnmap" }, \ {VM_DENYWRITE, "denywrite" }, \ {VM_UFFD_WP, "uffd_wp" }, \ + {VM_UFFD_MINOR, "uffd_minor" }, \ {VM_LOCKED, "locked" }, \ {VM_IO, "io" }, \ {VM_SEQ_READ, "seqread" }, \ diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 5f2d88212f7c..f24dd4fcbad9 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -19,15 +19,19 @@ * means the userland is reading). */ #define UFFD_API ((__u64)0xAA) +#define UFFD_API_REGISTER_MODES (UFFDIO_REGISTER_MODE_MISSING | \ + UFFDIO_REGISTER_MODE_WP | \ + UFFDIO_REGISTER_MODE_MINOR) #define UFFD_API_FEATURES (UFFD_FEATURE_PAGEFAULT_FLAG_WP | \ UFFD_FEATURE_EVENT_FORK | \ UFFD_FEATURE_EVENT_REMAP | \ - UFFD_FEATURE_EVENT_REMOVE | \ + UFFD_FEATURE_EVENT_REMOVE | \ UFFD_FEATURE_EVENT_UNMAP | \ UFFD_FEATURE_MISSING_HUGETLBFS | \ UFFD_FEATURE_MISSING_SHMEM | \ UFFD_FEATURE_SIGBUS | \ - UFFD_FEATURE_THREAD_ID) + UFFD_FEATURE_THREAD_ID | \ + UFFD_FEATURE_MINOR_HUGETLBFS) #define UFFD_API_IOCTLS \ ((__u64)1 << _UFFDIO_REGISTER | \ (__u64)1 << _UFFDIO_UNREGISTER | \ @@ -127,6 +131,7 @@ struct uffd_msg { /* flags for UFFD_EVENT_PAGEFAULT */ #define UFFD_PAGEFAULT_FLAG_WRITE (1<<0) /* If this was a write fault */ #define UFFD_PAGEFAULT_FLAG_WP (1<<1) /* If reason is VM_UFFD_WP */ +#define UFFD_PAGEFAULT_FLAG_MINOR (1<<2) /* If reason is VM_UFFD_MINOR */ struct uffdio_api { /* userland asks for an API number and the features to enable */ @@ -171,6 +176,10 @@ struct uffdio_api { * * UFFD_FEATURE_THREAD_ID pid of the page faulted task_struct will * be returned, if feature is not requested 0 will be returned. + * + * UFFD_FEATURE_MINOR_HUGETLBFS indicates that minor faults + * can be intercepted (via REGISTER_MODE_MINOR) for + * hugetlbfs-backed pages. */ #define UFFD_FEATURE_PAGEFAULT_FLAG_WP (1<<0) #define UFFD_FEATURE_EVENT_FORK (1<<1) @@ -181,6 +190,7 @@ struct uffdio_api { #define UFFD_FEATURE_EVENT_UNMAP (1<<6) #define UFFD_FEATURE_SIGBUS (1<<7) #define UFFD_FEATURE_THREAD_ID (1<<8) +#define UFFD_FEATURE_MINOR_HUGETLBFS (1<<9) __u64 features; __u64 ioctls; @@ -195,6 +205,7 @@ struct uffdio_register { struct uffdio_range range; #define UFFDIO_REGISTER_MODE_MISSING ((__u64)1<<0) #define UFFDIO_REGISTER_MODE_WP ((__u64)1<<1) +#define UFFDIO_REGISTER_MODE_MINOR ((__u64)1<<2) __u64 mode; /* diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 5b3f00a1e276..2b6c1c67ee88 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4375,6 +4375,38 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, VM_FAULT_SET_HINDEX(hstate_index(h)); goto backout_unlocked; } + + /* Check for page in userfault range. */ + if (userfaultfd_minor(vma)) { + u32 hash; + struct vm_fault vmf = { + .vma = vma, + .address = haddr, + .flags = flags, + /* + * Hard to debug if it ends up being used by a + * callee that assumes something about the + * other uninitialized fields... same as in + * memory.c + */ + }; + + unlock_page(page); + + /* + * hugetlb_fault_mutex and i_mmap_rwsem must be dropped + * before handling userfault. Reacquire after handling + * fault to make calling code simpler. + */ + + hash = hugetlb_fault_mutex_hash(mapping, idx); + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + i_mmap_unlock_read(mapping); + ret = handle_userfault(&vmf, VM_UFFD_MINOR); + i_mmap_lock_read(mapping); + mutex_lock(&hugetlb_fault_mutex_table[hash]); + goto out; + } } /* From patchwork Thu Feb 4 18:34:29 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068347 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 30D0DC433E0 for ; Thu, 4 Feb 2021 18:34:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id B6F5064E06 for ; Thu, 4 Feb 2021 18:34:58 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org B6F5064E06 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 2D1C26B0072; Thu, 4 Feb 2021 13:34:57 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 285F86B0073; Thu, 4 Feb 2021 13:34:57 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0FEE56B0074; Thu, 4 Feb 2021 13:34:57 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0235.hostedemail.com [216.40.44.235]) by kanga.kvack.org (Postfix) with ESMTP id ED7496B0072 for ; Thu, 4 Feb 2021 13:34:56 -0500 (EST) Received: from smtpin26.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay03.hostedemail.com (Postfix) with ESMTP id B55FA8249980 for ; Thu, 4 Feb 2021 18:34:56 +0000 (UTC) X-FDA: 77781436992.26.fear07_0715edc275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin26.hostedemail.com (Postfix) with ESMTP id 8C4291804B669 for ; Thu, 4 Feb 2021 18:34:56 +0000 (UTC) X-HE-Tag: fear07_0715edc275de X-Filterd-Recvd-Size: 5991 Received: from mail-qt1-f202.google.com (mail-qt1-f202.google.com [209.85.160.202]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:34:55 +0000 (UTC) Received: by mail-qt1-f202.google.com with SMTP id j14so3313172qtv.3 for ; Thu, 04 Feb 2021 10:34:55 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=dnNo0P9pQk4mCWpPk9SkHpqnJh6eIBtgx02Oeb/UXjo=; b=Al/UauZV1DDE1GQ/B3cVQR4rotIVxh2DLihJOd/5CWM7XthzAeW499I1SxHk1BPVLl dEowBo6pNqZumu/JZm1jSFVw8i1MjyIgVLbAi90z4s0xRZzHVnnToEs7fW+HS9kf00lk 4oAQmnRtfTxMbdZjOrBsGE/MZfMsT25k5kWdEbxtyY9wMtiqQtxHtx5uhlPsRx/VHdLG sJ/ahIQhrG/y3oGYrRQvsUscM6Yru+JLmb67PlIw4/1jSXn+PrZ7i7EzVDQgg/tdricm UC9q2irp8osM7qLusXPZ7c9F8DyZjphS272uL5NoPER97hCV15fQvLGUwQktxrXH2CMl J3+w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=dnNo0P9pQk4mCWpPk9SkHpqnJh6eIBtgx02Oeb/UXjo=; b=l2OvYXcI6GdlnJmtxMmL3WFxEFxilZJhOGtB1lddQ2YQseMaWZEGJ9dK0m6eY2S6bn /i4j0GhycxUcwQ0Q2u5lQi0TGaNkSbc04sz+F+XdmeeyV1WCnszBzn4PfrBAlR/hnEmY 0fHwWNmvosi9V8cQfgmHQ1NIy9vqVQJQGZ67S89CZBuiOY9lxdksHY24Iw5DGeZwPytm NgeONos29u6q/tPzcB0ramLr0SNXdgbHX/wltjAF9um09Y10P/n8AeoY2HQXzyzz8Wtl gcDNNW3uO3BaNBHiTZk4omhxvkuVv3dpVcYu0HzCUtpsVmQfOfYX0qDEzCohnj9K/xK5 Zkmg== X-Gm-Message-State: AOAM533Y33IMLZ1X5xbo2LoYm5EKJlonyoxdC+fISd2xfBOqvOioOyY0 N5zRSGlzhi7C9ucYIUzI79M70ZLzP+5RE9SKp4Hg X-Google-Smtp-Source: ABdhPJwzY/m5iDRQzjnkOcfC3idOuJxllyLq6tjUscyKEYj5/eihbt8Wssn5DtrwP7t8RsmeAVnnF7FClkXApTydM2XJ X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a0c:e652:: with SMTP id c18mr505898qvn.59.1612463695305; Thu, 04 Feb 2021 10:34:55 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:29 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-7-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 06/10] userfaultfd: disable huge PMD sharing for MINOR registered VMAs From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: As the comment says: for the MINOR fault use case, although the page might be present and populated in the other (non-UFFD-registered) half of the mapping, it may be out of date, and we explicitly want userspace to get a minor fault so it can check and potentially update the page's contents. Huge PMD sharing would prevent these faults from occurring for suitably aligned areas, so disable it upon UFFD registration. Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- include/linux/userfaultfd_k.h | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index 0390e5ac63b3..e060d5f77cc5 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -56,12 +56,19 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma, } /* - * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp - * protect information is per pgtable entry. + * Never enable huge pmd sharing on some uffd registered vmas: + * + * - VM_UFFD_WP VMAs, because write protect information is per pgtable entry. + * + * - VM_UFFD_MINOR VMAs, because otherwise we would never get minor faults for + * VMAs which share huge pmds. (If you have two mappings to the same + * underlying pages, and fault in the non-UFFD-registered one with a write, + * with huge pmd sharing this would *also* setup the second UFFD-registered + * mapping, and we'd not get minor faults.) */ static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma) { - return vma->vm_flags & VM_UFFD_WP; + return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR); } static inline bool userfaultfd_missing(struct vm_area_struct *vma) From patchwork Thu Feb 4 18:34:30 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068349 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 6897AC433DB for ; Thu, 4 Feb 2021 18:35:01 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 1149264D9A for ; Thu, 4 Feb 2021 18:35:01 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1149264D9A Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 199236B0073; Thu, 4 Feb 2021 13:34:59 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 0FD8C6B0074; Thu, 4 Feb 2021 13:34:59 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F2A7F6B0075; Thu, 4 Feb 2021 13:34:58 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0175.hostedemail.com [216.40.44.175]) by kanga.kvack.org (Postfix) with ESMTP id D7B5F6B0073 for ; Thu, 4 Feb 2021 13:34:58 -0500 (EST) Received: from smtpin11.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 9EE1B180AD80F for ; Thu, 4 Feb 2021 18:34:58 +0000 (UTC) X-FDA: 77781437076.11.pies36_0a150fb275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin11.hostedemail.com (Postfix) with ESMTP id 7EB6A180F8B80 for ; Thu, 4 Feb 2021 18:34:58 +0000 (UTC) X-HE-Tag: pies36_0a150fb275de X-Filterd-Recvd-Size: 7257 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf23.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:34:57 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id v17so4164891ybq.9 for ; Thu, 04 Feb 2021 10:34:57 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=6t26rKAvHjVFZcXii/SQ9LPddb8Vo4DY2aiNhPD/qRs=; b=cqkrziIP5zqsbNi+7iPKrIRJFkRc5OoJGU6PPxyXFOvTc5OTgyewUBCxxU7+RiugT2 rLqQmq7zKRktfRXm4KL6H97QyIP8GuCtz0LucRcs4E9xMev4vFQHoC92QvxehmX7JU8U nRhtKwXJoDT4jT0ppA10AMLICawO44OafW3HomNTDDnDn5hBePss1bXwnTI5BEwaCwoT XaZi6GYGz7iW59Coj6CGgt8ir7XbK/7yz6ZwapZMRDI9mX2nxtSjES+HW4yOALhbNCkD YTB9e8wV4XGRRBD07yV9RL4YAkk1rMrOagY8UgbCof4KrLKLBqb9stuDztyc1OMV42Ys /UDw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=6t26rKAvHjVFZcXii/SQ9LPddb8Vo4DY2aiNhPD/qRs=; b=pdyGNHgbff06wHY8nETs7LdoSwxry907E+ezk4YmKLRqZclqa5iA7wmW9Vyhc9stAM tUqYXPfJ52t8sTC4cJTQYaNOR5lLlvEDyBu063KV5lsyRHVY7P8NhKKZW88RRcfSVyVx vgpFJzgdURnh2G5qwH4kpFDcPDcXuCe48jnlDxz+nQaplCnZjtC+BwnvHbfGA/GySxcL OZL5z2QBXczff0uCF4hyo971qn1yzoP0KHgyhSI6VB8Be6eGSLFbPEr/65TM8ZdFymJE CT3d0Gg7mngdBAOnf0ergipLTQ5KvuYCcwh4LxrJXajBNO2rYtnrc51GXlkzFmfWe8QU vNcA== X-Gm-Message-State: AOAM530uarsQ+t+L39SQ66ieKnLy+BiCzw0Smv9bFMNe7jEfWeBKB7qp YDEyXYO2/CeDTcv3tkI98V99e9fxQd+BwEfdhXpW X-Google-Smtp-Source: ABdhPJyaJGA2aAl5NZt4ao326zJOwq4Igh8Fg36gWGJSfGsKs4TIk146f12vqJ/nXSy/hNeR1lumq3NaHFnv9uKgt4+q X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a25:cb8f:: with SMTP id b137mr841939ybg.312.1612463697102; Thu, 04 Feb 2021 10:34:57 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:30 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-8-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 07/10] userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For background, mm/userfaultfd.c provides a general mcopy_atomic implementation. But some types of memory (i.e., hugetlb and shmem) need a slightly different implementation, so they provide their own helpers for this. In other words, userfaultfd is the only caller of these functions. This patch achieves two things: 1. Don't spend time compiling code which will end up never being referenced anyway (a small build time optimization). 2. In patches later in this series, we extend the signature of these helpers with UFFD-specific state (a mode enumeration). Once this happens, we *have to* either not compile the helpers, or unconditionally define the UFFD-only state (which seems messier to me). This includes the declarations in the headers, as otherwise they'd yield warnings about implicitly defining the type of those arguments. Reviewed-by: Mike Kravetz Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- include/linux/hugetlb.h | 4 ++++ mm/hugetlb.c | 2 ++ 2 files changed, 6 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index af40500c99f0..261c3284015d 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -134,11 +134,13 @@ void hugetlb_show_meminfo(void); unsigned long hugetlb_total_pages(void); vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags); +#ifdef CONFIG_USERFAULTFD int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, struct page **pagep); +#endif /* CONFIG_USERFAULTFD */ int hugetlb_reserve_pages(struct inode *inode, long from, long to, struct vm_area_struct *vma, vm_flags_t vm_flags); @@ -309,6 +311,7 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb, BUG(); } +#ifdef CONFIG_USERFAULTFD static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct vm_area_struct *dst_vma, @@ -319,6 +322,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, BUG(); return 0; } +#endif /* CONFIG_USERFAULTFD */ static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr, unsigned long sz) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2b6c1c67ee88..868292cf148a 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4647,6 +4647,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, return ret; } +#ifdef CONFIG_USERFAULTFD /* * Used by userfaultfd UFFDIO_COPY. Based on mcopy_atomic_pte with * modifications for huge pages. @@ -4777,6 +4778,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, put_page(page); goto out; } +#endif /* CONFIG_USERFAULTFD */ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma, struct page **pages, struct vm_area_struct **vmas, From patchwork Thu Feb 4 18:34:31 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068351 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 32B16C433E0 for ; Thu, 4 Feb 2021 18:35:04 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 988F564E49 for ; Thu, 4 Feb 2021 18:35:03 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 988F564E49 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 3E3516B0074; Thu, 4 Feb 2021 13:35:01 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 343E66B0075; Thu, 4 Feb 2021 13:35:01 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 0B0816B0078; Thu, 4 Feb 2021 13:35:01 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0067.hostedemail.com [216.40.44.67]) by kanga.kvack.org (Postfix) with ESMTP id E28626B0074 for ; Thu, 4 Feb 2021 13:35:00 -0500 (EST) Received: from smtpin25.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id A603E33C4 for ; Thu, 4 Feb 2021 18:35:00 +0000 (UTC) X-FDA: 77781437160.25.mass85_3f1380a275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin25.hostedemail.com (Postfix) with ESMTP id 710751804E3A1 for ; Thu, 4 Feb 2021 18:35:00 +0000 (UTC) X-HE-Tag: mass85_3f1380a275de X-Filterd-Recvd-Size: 20411 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf15.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:34:59 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id n81so4115465ybg.20 for ; Thu, 04 Feb 2021 10:34:59 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=3lWRiNvorrANLNubkdazwO3PEMqtv4ngxTmOGJ+LQqE=; b=reiuVt4YTT/tCaomr+epXUe2L9IJ+FX8GAmxw+5eaHGh+1cShkT04VjwADJes6B/N5 z4YZMqEfe922Lw8KooIeII4w1pJObwl6rZ75nZjedVy6uTNXEZWHTc4yRn2/Q5NaLM5M MTdLRQPHfptXg9kveJKeFC+zq2A1dq6gV6WV9eAm6Ksb64nTjPfAoGDbk5yiyQdd5aRW wC01J4026tStlLvxtC+6//juBc+Pc+U8rZZ4AlmT2ID9XR1WcbUov4i9Xewl1pzL/cCz pLP8QkLkzK0m427RUB3EznNIaJz2Gsurp1LuhGDNxCbWsLmdli8oi7sELLBV647WaIPG aSPw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=3lWRiNvorrANLNubkdazwO3PEMqtv4ngxTmOGJ+LQqE=; b=mMaZIjuiTZz5nwisoXL72ar9i8EJju81D9JXwFNB4HGpDTiozmUKSwJQ2PEqYX2Ald xoKdQcFZiUK5MSfUksEQQ6IdC8KCZNWsca3EEQUoap8m0UHoRqSeZE4ER3Fa5LfADHzk cX2sE6SIAkgpGIensk5jBYuzIl/hmOySee6CfpVhaK9MN9syvrt5MbyrCxK9oNRcFmWB zdXKkmus7oCLhXGI+xaHTiPYP1mzPCynrn2Zzz9Of3a0km8qptovwVAlLSIHCw0A4Xki 0ZMKRPscSHv+DbS9RDmwg6GsDUJKsYAvj85UBAUQmSgg7VVrNDz/MK5DCKWnN7BboP4O hYJA== X-Gm-Message-State: AOAM533xU6DHX75JypYC41Ne61y3P/0FLyKhKS7H2qFHlOjpHuL451il dNI0PkywRIH7jOcfEcx7/1w/U9+TNHP7Gm90CDdp X-Google-Smtp-Source: ABdhPJyM/n+o2xKV6PEEU4Ej1wNFj2npcLtYgnbgz5T8IsnusUFfZbjaHjq8Fp1TdwNHaJ5gRe33hYxG5D8UU7igcLHu X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a25:4015:: with SMTP id n21mr911266yba.180.1612463698992; Thu, 04 Feb 2021 10:34:58 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:31 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-9-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 08/10] userfaultfd: add UFFDIO_CONTINUE ioctl From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This ioctl is how userspace ought to resolve "minor" userfaults. The idea is, userspace is notified that a minor fault has occurred. It might change the contents of the page using its second non-UFFD mapping, or not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured the page contents are correct, carry on setting up the mapping". Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in the minor fault case, we already have some pre-existing underlying page. Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping. We'd just use memcpy() or similar instead. It turns out hugetlb_mcopy_atomic_pte() already does very close to what we want, if an existing page is provided via `struct page **pagep`. We already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so just extend that design: add an enum for the three modes of operation, and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE case. (Basically, look up the existing page, and avoid adding the existing page to the page cache or calling set_page_huge_active() on it.) Signed-off-by: Axel Rasmussen --- fs/userfaultfd.c | 67 ++++++++++++++++++++++++++++++++ include/linux/hugetlb.h | 3 ++ include/linux/userfaultfd_k.h | 18 +++++++++ include/uapi/linux/userfaultfd.h | 21 +++++++++- mm/hugetlb.c | 36 ++++++++++------- mm/userfaultfd.c | 49 ++++++++++++++--------- 6 files changed, 160 insertions(+), 34 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index c643cf13d957..c311086b6085 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -1536,6 +1536,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP)) ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT); + /* CONTINUE ioctl is only supported for MINOR ranges. */ + if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR)) + ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE); + /* * Now that we scanned all vmas we can already tell * userland which ioctls methods are guaranteed to @@ -1889,6 +1893,66 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx, return ret; } +static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg) +{ + __s64 ret; + struct uffdio_continue uffdio_continue; + struct uffdio_continue __user *user_uffdio_continue; + struct userfaultfd_wake_range range; + + user_uffdio_continue = (struct uffdio_continue __user *)arg; + + ret = -EAGAIN; + if (READ_ONCE(ctx->mmap_changing)) + goto out; + + ret = -EFAULT; + if (copy_from_user(&uffdio_continue, user_uffdio_continue, + /* don't copy the output fields */ + sizeof(uffdio_continue) - (sizeof(__s64)))) + goto out; + + ret = validate_range(ctx->mm, &uffdio_continue.range.start, + uffdio_continue.range.len); + if (ret) + goto out; + + ret = -EINVAL; + /* double check for wraparound just in case. */ + if (uffdio_continue.range.start + uffdio_continue.range.len <= + uffdio_continue.range.start) { + goto out; + } + if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE) + goto out; + + if (mmget_not_zero(ctx->mm)) { + ret = mcopy_continue(ctx->mm, uffdio_continue.range.start, + uffdio_continue.range.len, + &ctx->mmap_changing); + mmput(ctx->mm); + } else { + return -ESRCH; + } + + if (unlikely(put_user(ret, &user_uffdio_continue->mapped))) + return -EFAULT; + if (ret < 0) + goto out; + + /* len == 0 would wake all */ + BUG_ON(!ret); + range.len = ret; + if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) { + range.start = uffdio_continue.range.start; + wake_userfault(ctx, &range); + } + ret = range.len == uffdio_continue.range.len ? 0 : -EAGAIN; + +out: + return ret; +} + static inline unsigned int uffd_ctx_features(__u64 user_features) { /* @@ -1973,6 +2037,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd, case UFFDIO_WRITEPROTECT: ret = userfaultfd_writeprotect(ctx, arg); break; + case UFFDIO_CONTINUE: + ret = userfaultfd_continue(ctx, arg); + break; } return ret; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 261c3284015d..9d4badc73cc8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -11,6 +11,7 @@ #include #include #include +#include struct ctl_table; struct user_struct; @@ -139,6 +140,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, + enum mcopy_atomic_mode mode, struct page **pagep); #endif /* CONFIG_USERFAULTFD */ int hugetlb_reserve_pages(struct inode *inode, long from, long to, @@ -317,6 +319,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, + enum mcopy_atomic_mode mode, struct page **pagep) { BUG(); diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h index e060d5f77cc5..333ee531e8be 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -37,6 +37,22 @@ extern int sysctl_unprivileged_userfaultfd; extern vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason); +/* + * The mode of operation for __mcopy_atomic and its helpers. + * + * This is almost an implementation detail (mcopy_atomic below doesn't take this + * as a parameter), but it's exposed here because memory-kind-specific + * implementations (e.g. hugetlbfs) need to know the mode of operation. + */ +enum mcopy_atomic_mode { + /* A normal copy_from_user into the destination range. */ + MCOPY_ATOMIC_NORMAL, + /* Don't copy; map the destination range to the zero page. */ + MCOPY_ATOMIC_ZEROPAGE, + /* Just setup the dst_vma, without modifying the underlying page(s). */ + MCOPY_ATOMIC_CONTINUE, +}; + extern ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, bool *mmap_changing, __u64 mode); @@ -44,6 +60,8 @@ extern ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long len, bool *mmap_changing); +extern ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long dst_start, + unsigned long len, bool *mmap_changing); extern int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, unsigned long len, bool enable_wp, bool *mmap_changing); diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index f24dd4fcbad9..bafbeb1a2624 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -40,10 +40,12 @@ ((__u64)1 << _UFFDIO_WAKE | \ (__u64)1 << _UFFDIO_COPY | \ (__u64)1 << _UFFDIO_ZEROPAGE | \ - (__u64)1 << _UFFDIO_WRITEPROTECT) + (__u64)1 << _UFFDIO_WRITEPROTECT | \ + (__u64)1 << _UFFDIO_CONTINUE) #define UFFD_API_RANGE_IOCTLS_BASIC \ ((__u64)1 << _UFFDIO_WAKE | \ - (__u64)1 << _UFFDIO_COPY) + (__u64)1 << _UFFDIO_COPY | \ + (__u64)1 << _UFFDIO_CONTINUE) /* * Valid ioctl command number range with this API is from 0x00 to @@ -59,6 +61,7 @@ #define _UFFDIO_COPY (0x03) #define _UFFDIO_ZEROPAGE (0x04) #define _UFFDIO_WRITEPROTECT (0x06) +#define _UFFDIO_CONTINUE (0x07) #define _UFFDIO_API (0x3F) /* userfaultfd ioctl ids */ @@ -77,6 +80,8 @@ struct uffdio_zeropage) #define UFFDIO_WRITEPROTECT _IOWR(UFFDIO, _UFFDIO_WRITEPROTECT, \ struct uffdio_writeprotect) +#define UFFDIO_CONTINUE _IOR(UFFDIO, _UFFDIO_CONTINUE, \ + struct uffdio_continue) /* read() structure */ struct uffd_msg { @@ -268,6 +273,18 @@ struct uffdio_writeprotect { __u64 mode; }; +struct uffdio_continue { + struct uffdio_range range; +#define UFFDIO_CONTINUE_MODE_DONTWAKE ((__u64)1<<0) + __u64 mode; + + /* + * Fields below here are written by the ioctl and must be at the end: + * the copy_from_user will not read past here. + */ + __s64 mapped; +}; + /* * Flags for the userfaultfd(2) system call itself. */ diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 868292cf148a..fd0693996029 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -39,7 +39,6 @@ #include #include #include -#include #include #include "internal.h" @@ -4657,6 +4656,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, + enum mcopy_atomic_mode mode, struct page **pagep) { struct address_space *mapping; @@ -4668,8 +4668,18 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, spinlock_t *ptl; int ret; struct page *page; + vm_flags_t dst_pte_flags; - if (!*pagep) { + mapping = dst_vma->vm_file->f_mapping; + idx = vma_hugecache_offset(h, dst_vma, dst_addr); + + if (mode == MCOPY_ATOMIC_CONTINUE) { + ret = -EFAULT; + page = find_lock_page(mapping, idx); + *pagep = NULL; + if (!page) + goto out; + } else if (!*pagep) { ret = -ENOMEM; page = alloc_huge_page(dst_vma, dst_addr, 0); if (IS_ERR(page)) @@ -4698,13 +4708,8 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, */ __SetPageUptodate(page); - mapping = dst_vma->vm_file->f_mapping; - idx = vma_hugecache_offset(h, dst_vma, dst_addr); - - /* - * If shared, add to page cache - */ - if (vm_shared) { + /* Add shared, newly allocated pages to the page cache. */ + if (vm_shared && mode != MCOPY_ATOMIC_CONTINUE) { size = i_size_read(mapping->host) >> huge_page_shift(h); ret = -EFAULT; if (idx >= size) @@ -4749,22 +4754,27 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, hugepage_add_new_anon_rmap(page, dst_vma, dst_addr); } - _dst_pte = make_huge_pte(dst_vma, page, dst_vma->vm_flags & VM_WRITE); - if (dst_vma->vm_flags & VM_WRITE) + dst_pte_flags = dst_vma->vm_flags & VM_WRITE; + /* For CONTINUE on a non-shared VMA, don't set VM_WRITE for CoW. */ + if (mode == MCOPY_ATOMIC_CONTINUE && !vm_shared) + dst_pte_flags &= ~VM_WRITE; + _dst_pte = make_huge_pte(dst_vma, page, dst_pte_flags); + if (dst_pte_flags & VM_WRITE) _dst_pte = huge_pte_mkdirty(_dst_pte); _dst_pte = pte_mkyoung(_dst_pte); set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); (void)huge_ptep_set_access_flags(dst_vma, dst_addr, dst_pte, _dst_pte, - dst_vma->vm_flags & VM_WRITE); + dst_pte_flags); hugetlb_count_add(pages_per_huge_page(h), dst_mm); /* No need to invalidate - it was non-present before */ update_mmu_cache(dst_vma, dst_addr, dst_pte); spin_unlock(ptl); - set_page_huge_active(page); + if (mode != MCOPY_ATOMIC_CONTINUE) + set_page_huge_active(page); if (vm_shared) unlock_page(page); ret = 0; diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index b2ce61c1b50d..7bf83ffa456b 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -207,7 +207,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - bool zeropage) + enum mcopy_atomic_mode mode) { int vm_alloc_shared = dst_vma->vm_flags & VM_SHARED; int vm_shared = dst_vma->vm_flags & VM_SHARED; @@ -227,7 +227,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, * by THP. Since we can not reliably insert a zero page, this * feature is not supported. */ - if (zeropage) { + if (mode == MCOPY_ATOMIC_ZEROPAGE) { mmap_read_unlock(dst_mm); return -EINVAL; } @@ -273,8 +273,6 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, } while (src_addr < src_start + len) { - pte_t dst_pteval; - BUG_ON(dst_addr >= dst_start + len); /* @@ -297,16 +295,17 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, goto out_unlock; } - err = -EEXIST; - dst_pteval = huge_ptep_get(dst_pte); - if (!huge_pte_none(dst_pteval)) { - mutex_unlock(&hugetlb_fault_mutex_table[hash]); - i_mmap_unlock_read(mapping); - goto out_unlock; + if (mode != MCOPY_ATOMIC_CONTINUE) { + if (!huge_pte_none(huge_ptep_get(dst_pte))) { + err = -EEXIST; + mutex_unlock(&hugetlb_fault_mutex_table[hash]); + i_mmap_unlock_read(mapping); + goto out_unlock; + } } err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, - dst_addr, src_addr, &page); + dst_addr, src_addr, mode, &page); mutex_unlock(&hugetlb_fault_mutex_table[hash]); i_mmap_unlock_read(mapping); @@ -408,7 +407,7 @@ extern ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - bool zeropage); + enum mcopy_atomic_mode mode); #endif /* CONFIG_HUGETLB_PAGE */ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, @@ -417,10 +416,14 @@ static __always_inline ssize_t mfill_atomic_pte(struct mm_struct *dst_mm, unsigned long dst_addr, unsigned long src_addr, struct page **page, - bool zeropage, + enum mcopy_atomic_mode mode, bool wp_copy) { ssize_t err; + bool zeropage = (mode == MCOPY_ATOMIC_ZEROPAGE); + + if (mode == MCOPY_ATOMIC_CONTINUE) + return -EINVAL; /* * The normal page fault path for a shmem will invoke the @@ -458,7 +461,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, - bool zeropage, + enum mcopy_atomic_mode mcopy_mode, bool *mmap_changing, __u64 mode) { @@ -527,7 +530,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, */ if (is_vm_hugetlb_page(dst_vma)) return __mcopy_atomic_hugetlb(dst_mm, dst_vma, dst_start, - src_start, len, zeropage); + src_start, len, mcopy_mode); if (!vma_is_anonymous(dst_vma) && !vma_is_shmem(dst_vma)) goto out_unlock; @@ -577,7 +580,7 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm, BUG_ON(pmd_trans_huge(*dst_pmd)); err = mfill_atomic_pte(dst_mm, dst_pmd, dst_vma, dst_addr, - src_addr, &page, zeropage, wp_copy); + src_addr, &page, mcopy_mode, wp_copy); cond_resched(); if (unlikely(err == -ENOENT)) { @@ -626,14 +629,22 @@ ssize_t mcopy_atomic(struct mm_struct *dst_mm, unsigned long dst_start, unsigned long src_start, unsigned long len, bool *mmap_changing, __u64 mode) { - return __mcopy_atomic(dst_mm, dst_start, src_start, len, false, - mmap_changing, mode); + return __mcopy_atomic(dst_mm, dst_start, src_start, len, + MCOPY_ATOMIC_NORMAL, mmap_changing, mode); } ssize_t mfill_zeropage(struct mm_struct *dst_mm, unsigned long start, unsigned long len, bool *mmap_changing) { - return __mcopy_atomic(dst_mm, start, 0, len, true, mmap_changing, 0); + return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_ZEROPAGE, + mmap_changing, 0); +} + +ssize_t mcopy_continue(struct mm_struct *dst_mm, unsigned long start, + unsigned long len, bool *mmap_changing) +{ + return __mcopy_atomic(dst_mm, start, 0, len, MCOPY_ATOMIC_CONTINUE, + mmap_changing, 0); } int mwriteprotect_range(struct mm_struct *dst_mm, unsigned long start, From patchwork Thu Feb 4 18:34:32 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068353 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id EFEC0C433E6 for ; Thu, 4 Feb 2021 18:35:06 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 738D664E06 for ; Thu, 4 Feb 2021 18:35:06 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 738D664E06 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 31F806B0075; Thu, 4 Feb 2021 13:35:03 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2D1DF6B0078; Thu, 4 Feb 2021 13:35:03 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 148916B007B; Thu, 4 Feb 2021 13:35:03 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0135.hostedemail.com [216.40.44.135]) by kanga.kvack.org (Postfix) with ESMTP id EFBE06B0075 for ; Thu, 4 Feb 2021 13:35:02 -0500 (EST) Received: from smtpin10.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 96BE0181AC9CB for ; Thu, 4 Feb 2021 18:35:02 +0000 (UTC) X-FDA: 77781437244.10.year28_6216cab275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin10.hostedemail.com (Postfix) with ESMTP id 718B116A4AB for ; Thu, 4 Feb 2021 18:35:02 +0000 (UTC) X-HE-Tag: year28_6216cab275de X-Filterd-Recvd-Size: 11898 Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf34.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:35:01 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id n81so4115517ybg.20 for ; Thu, 04 Feb 2021 10:35:01 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=ZRnTzwA9VW80nusel52jLxZxnTscC+VsgLNozj8M9Ek=; b=TqhTeWZBNzktQArxFmaBl+YrTMIRGYZ6yelt4qpqYEK34O6Bi820bZ4M+eK6TOeLT/ pOsio1N8xXmjhA1KfGOGgsPQchO3SJgRcz/I0fEwijFyjxli0eLUW9kjpz/9EF/07ylP 9+pC/+baAPwELjikAyFZoC7Vdx/39zZIyzAmKgMXAU0LCTk3nl7aFZCFiFq70YnZTUmc 7LtGqnicqyb5MU9TYp8sQm78Y1vxCDzLF9NprAbxbItVMfAVoyFDAKt1n1YusvpzZBwq cTDJ5HLyZZpO2dli2gzLEuba3PDbuSU8LvXRN6lQsaff/TWscK6VaVQBd+S7PLFXDSRT xMlw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=ZRnTzwA9VW80nusel52jLxZxnTscC+VsgLNozj8M9Ek=; b=Af0kAiPsByWncbDTxjTu1QRwe0ESKCYi4rLp90xEcADeCU+/AAIUjhMUdT5ywIYmQS HStX07LddqZl+rxRoWL7GIJXt41Qtr0f8khd5O9QPikIQULAUUGrXIDLbIakwrXRwXd2 +BPkjqDbF1XsWH+T+LUPhn3D9r946staZYIqXmnGA3SLfuX6g26xqsjp50Tbnnm543tl T0HZk2UEyQj8BzCaNgyqejT3+YjQjiW99HpbebW80Tdh8BQgP0yy661dKWrcF6l9rLy5 BYn9fxU0z1mrXkAnCjnJg9eH4WbAWtRlG8PlopOVQsxeFeD6gYnY2Rr6E6KaK6ahyp5h JMig== X-Gm-Message-State: AOAM5307uQl/3ksvzxTBVi3cJ8xbsdZPvYypc8SoFhA9lCJ3QHKCPZEv XcCBQSVc/RWj4N4Xt36O9xc4X8dWn0L6FOOAzdFT X-Google-Smtp-Source: ABdhPJw1TPN7+GxXKgnoUawyuqvDsJG9JEF8QbuwQPe2qj8/t8BThuj5NmCawCKbTq0jvP2L3uY2vjXMzJgYp3M/1S2B X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a25:bec2:: with SMTP id k2mr903121ybm.42.1612463700943; Thu, 04 Feb 2021 10:35:00 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:32 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-10-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 09/10] userfaultfd: update documentation to describe minor fault handling From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Reword / reorganize things a little bit into "lists", so new features / modes / ioctls can sort of just be appended. Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used to intercept and resolve minor faults. Make it clear that COPY and ZEROPAGE are used for MISSING faults, whereas CONTINUE is used for MINOR faults. Signed-off-by: Axel Rasmussen --- Documentation/admin-guide/mm/userfaultfd.rst | 107 ++++++++++++------- 1 file changed, 66 insertions(+), 41 deletions(-) diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst index 65eefa66c0ba..cfd3daf59d0e 100644 --- a/Documentation/admin-guide/mm/userfaultfd.rst +++ b/Documentation/admin-guide/mm/userfaultfd.rst @@ -63,36 +63,36 @@ the generic ioctl available. The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl defines what memory types are supported by the ``userfaultfd`` and what -events, except page fault notifications, may be generated. - -If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs -virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in -``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be -set if the kernel supports registering ``userfaultfd`` ranges on shared -memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, -``MAP_SHARED``, ``memfd_create``, etc). - -The userland application that wants to use ``userfaultfd`` with hugetlbfs -or shared memory need to set the corresponding flag in -``uffdio_api.features`` to enable those features. - -If the userland desires to receive notifications for events other than -page faults, it has to verify that ``uffdio_api.features`` has appropriate -``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more -detail below in `Non-cooperative userfaultfd`_ section. - -Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should -be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to -register a memory range in the ``userfaultfd`` by setting the +events, except page fault notifications, may be generated: + +- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events + other than page faults are supported. These events are described in more + detail below in the `Non-cooperative userfaultfd`_ section. + +- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM`` + indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING`` + registrations for hugetlbfs and shared memory (covering all shmem APIs, + i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``, + etc) virtual memory areas, respectively. + +- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports + ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory + areas. + +The userland application should set the feature flags it intends to use +when envoking the ``UFFDIO_API`` ioctl, to request that those features be +enabled if supported. + +Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER`` +ioctl should be invoked (if present in the returned ``uffdio_api.ioctls`` +bitmask) to register a memory range in the ``userfaultfd`` by setting the uffdio_register structure accordingly. The ``uffdio_register.mode`` bitmask will specify to the kernel which kind of faults to track for -the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing -pages). The ``UFFDIO_REGISTER`` ioctl will return the +the range. The ``UFFDIO_REGISTER`` ioctl will return the ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve userfaults on the range registered. Not all ioctls will necessarily be -supported for all memory types depending on the underlying virtual -memory backend (anonymous memory vs tmpfs vs real filebacked -mappings). +supported for all memory types (e.g. anonymous memory vs. shmem vs. +hugetlbfs), or all types of intercepted faults. Userland can use the ``uffdio_register.ioctls`` to manage the virtual address space in the background (to add or potentially also remove @@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault could be triggering just before userland maps in the background the user-faulted page. -The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That -atomically copies a page into the userfault registered range and wakes -up the blocked userfaults -(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set). -Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in -guaranteeing that nothing can see an half copied page since it'll -keep userfaulting until the copy has finished. +Resolving Userfaults +-------------------- + +There are three basic ways to resolve userfaults: + +- ``UFFDIO_COPY`` atomically copies some existing page contents from + userspace. + +- ``UFFDIO_ZEROPAGE`` atomically zeros the new page. + +- ``UFFDIO_CONTINUE`` maps an existing, previously-populated page. + +These operations are atomic in the sense that they guarantee nothing can +see a half-populated page, since readers will keep userfaulting until the +operation has finished. + +By default, these wake up userfaults blocked on the range in question. +They support a ``UFFDIO_*_MODE_DONTWAKE`` ``mode`` flag, which indicates +that waking will be done separately at some later time. + +Which ioctl to choose depends on the kind of page fault, and what we'd +like to do to resolve it: + +- For ``UFFDIO_REGISTER_MODE_MISSING`` faults, the fault needs to be + resolved by either providing a new page (``UFFDIO_COPY``), or mapping + the zero page (``UFFDIO_ZEROPAGE``). By default, the kernel would map + the zero page for a missing fault. With userfaultfd, userspace can + decide what content to provide before the faulting thread continues. + +- For ``UFFDIO_REGISTER_MODE_MINOR`` faults, there is an existing page (in + the page cache). Userspace has the option of modifying the page's + contents before resolving the fault. Once the contents are correct + (modified or not), userspace asks the kernel to map the page and let the + faulting thread continue with ``UFFDIO_CONTINUE``. Notes: -- If you requested ``UFFDIO_REGISTER_MODE_MISSING`` when registering then - you must provide some kind of page in your thread after reading from - the uffd. You must provide either ``UFFDIO_COPY`` or ``UFFDIO_ZEROPAGE``. - The normal behavior of the OS automatically providing a zero page on - an anonymous mmaping is not in place. +- You can tell which kind of fault occurred by examining + ``pagefault.flags`` within the ``uffd_msg``, checking for the + ``UFFD_PAGEFAULT_FLAG_*`` flags. - None of the page-delivering ioctls default to the range that you registered with. You must fill in all fields for the appropriate @@ -122,9 +147,9 @@ Notes: - You get the address of the access that triggered the missing page event out of a struct uffd_msg that you read in the thread from the - uffd. You can supply as many pages as you want with ``UFFDIO_COPY`` or - ``UFFDIO_ZEROPAGE``. Keep in mind that unless you used DONTWAKE then - the first of any of those IOCTLs wakes up the faulting thread. + uffd. You can supply as many pages as you want with these IOCTLs. + Keep in mind that unless you used DONTWAKE then the first of any of + those IOCTLs wakes up the faulting thread. - Be sure to test for all errors including (``pollfd[0].revents & POLLERR``). This can happen, e.g. when ranges From patchwork Thu Feb 4 18:34:33 2021 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Axel Rasmussen X-Patchwork-Id: 12068355 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-16.5 required=3.0 tests=BAYES_00, DKIM_ADSP_CUSTOM_MED,DKIM_INVALID,DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS, INCLUDES_CR_TRAILER,INCLUDES_PATCH,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS, USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9C39AC433E0 for ; Thu, 4 Feb 2021 18:35:09 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 4043664E06 for ; Thu, 4 Feb 2021 18:35:09 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 4043664E06 Authentication-Results: mail.kernel.org; dmarc=fail (p=reject dis=none) header.from=google.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 827B66B0078; Thu, 4 Feb 2021 13:35:05 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 7D66F6B007B; Thu, 4 Feb 2021 13:35:05 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 67A9D6B007D; Thu, 4 Feb 2021 13:35:05 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0089.hostedemail.com [216.40.44.89]) by kanga.kvack.org (Postfix) with ESMTP id 4C2A56B0078 for ; Thu, 4 Feb 2021 13:35:05 -0500 (EST) Received: from smtpin01.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with ESMTP id 10641181AC9CB for ; Thu, 4 Feb 2021 18:35:05 +0000 (UTC) X-FDA: 77781437370.01.pin23_0407612275de Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin01.hostedemail.com (Postfix) with ESMTP id DEA271004762A for ; Thu, 4 Feb 2021 18:35:04 +0000 (UTC) X-HE-Tag: pin23_0407612275de X-Filterd-Recvd-Size: 13047 Received: from mail-pg1-f202.google.com (mail-pg1-f202.google.com [209.85.215.202]) by imf46.hostedemail.com (Postfix) with ESMTP for ; Thu, 4 Feb 2021 18:35:03 +0000 (UTC) Received: by mail-pg1-f202.google.com with SMTP id s4so1014860pgi.7 for ; Thu, 04 Feb 2021 10:35:03 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20161025; h=sender:date:in-reply-to:message-id:mime-version:references:subject :from:to:cc; bh=8hGsRhwaDWoDizfeeTsq9J+LowqD142CLh64m52wJCk=; b=n4kHyH/OcEolnGZyvK4JrmtjTn9vIJiS9Exo+vxqn0fPsJxQLoaDR1yeHr1imFSxSF eOvQgyctfyn20ByXGTJwj+D9+7w7R4k7ghmL80M7T2dCdnYgSF+9g03v8ZypyZwIwupc qeOJuIfd6kJKY6IFt3CJS2hrh69ChAE6QX+EVTZO1yUr7EHVjVyteOpvMiXWepiGIKs2 VRvczP54eru7JR4J1IXVdiY2+fAOq//EDBUltNRFdDRaTl/6YH5hnkYQm/FjV73lDGwZ +o/WIjU8Hmuckv7qO6s9fMuMyUkd8UAyO0JhXTawp0Z4IimSgq9KUvBAbxQb6wkJ2x/Y jVbg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:sender:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=8hGsRhwaDWoDizfeeTsq9J+LowqD142CLh64m52wJCk=; b=MkszMWhlfQFAAGYG4qsg9nUR9fNrZLrUwY8ifnciwGS2Z8Onzmi975Fnqwx2/Tas1c to/7FaDMSezeGYPLaiLT7Mm2niRTMtyxM6P1i3XyIai/iku/wGaLdzBlSEsNRrxBMw9V wm6e1n5ZZ9KVICNIcQQXTDm7VdkbMVzCkQbTCcbhSen2RXxCbF5m2TjLSfV0Qr3tUNUE o2XWfGHTQtMgnMjKZxIsUxq4MQhBL9Z2UZuZC+2S2r+RSyqim/rFBzUXCvE7xLHp3i0a PxbuYRrJq5Oh2eVWvXp0lgfufsrCE2hFBLI5aQtRAIxqn/8pUiR6GbOOmHbS6e6U8zjm tkqA== X-Gm-Message-State: AOAM531Q67BHNkRN+dqWwr0nrt9xnRlNzMLB+tFNN3CaMmyCFQQheuAm mC5rFKO5v457ekoVn6kx6YguGREhKMXX1eqkEHr1 X-Google-Smtp-Source: ABdhPJzDUBTF1SiPoJJMYyg7MiYrHCvKNsxpANXekJSLV+B3WJJMPETX3zn5RQvsW0WBeHqUFuPcg5GZU7J0rGLJSdr4 X-Received: from ajr0.svl.corp.google.com ([2620:15c:2cd:203:b001:12c1:dc19:2089]) (user=axelrasmussen job=sendgmr) by 2002:a17:903:185:b029:e1:8692:90bc with SMTP id z5-20020a1709030185b02900e1869290bcmr360402plg.75.1612463702914; Thu, 04 Feb 2021 10:35:02 -0800 (PST) Date: Thu, 4 Feb 2021 10:34:33 -0800 In-Reply-To: <20210204183433.1431202-1-axelrasmussen@google.com> Message-Id: <20210204183433.1431202-11-axelrasmussen@google.com> Mime-Version: 1.0 References: <20210204183433.1431202-1-axelrasmussen@google.com> X-Mailer: git-send-email 2.30.0.365.g02bc693789-goog Subject: [PATCH v4 10/10] userfaultfd/selftests: add test exercising minor fault handling From: Axel Rasmussen To: Alexander Viro , Alexey Dobriyan , Andrea Arcangeli , Andrew Morton , Anshuman Khandual , Catalin Marinas , Chinwen Chang , Huang Ying , Ingo Molnar , Jann Horn , Jerome Glisse , Lokesh Gidra , "Matthew Wilcox (Oracle)" , Michael Ellerman , " =?utf-8?q?Michal_Koutn=C3=BD?= " , Michel Lespinasse , Mike Kravetz , Mike Rapoport , Nicholas Piggin , Peter Xu , Shaohua Li , Shawn Anastasio , Steven Rostedt , Steven Price , Vlastimil Babka Cc: linux-kernel@vger.kernel.org, linux-fsdevel@vger.kernel.org, linux-mm@kvack.org, Adam Ruprecht , Axel Rasmussen , Cannon Matthews , "Dr . David Alan Gilbert" , David Rientjes , Mina Almasry , Oliver Upton X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Fix a dormant bug in userfaultfd_events_test(), where we did `return faulting_process(0)` instead of `exit(faulting_process(0))`. This caused the forked process to keep running, trying to execute any further test cases after the events test in parallel with the "real" process. Add a simple test case which exercises minor faults. In short, it does the following: 1. "Sets up" an area (area_dst) and a second shared mapping to the same underlying pages (area_dst_alias). 2. Register one of these areas with userfaultfd, in minor fault mode. 3. Start a second thread to handle any minor faults. 4. Populate the underlying pages with the non-UFFD-registered side of the mapping. Basically, memset() each page with some arbitrary contents. 5. Then, using the UFFD-registered mapping, read all of the page contents, asserting that the contents match expectations (we expect the minor fault handling thread can modify the page contents before resolving the fault). The minor fault handling thread, upon receiving an event, flips all the bits (~) in that page, just to prove that it can modify it in some arbitrary way. Then it issues a UFFDIO_CONTINUE ioctl, to setup the mapping and resolve the fault. The reading thread should wake up and see this modification. Currently the minor fault test is only enabled in hugetlb_shared mode, as this is the only configuration the kernel feature supports. Reviewed-by: Peter Xu Signed-off-by: Axel Rasmussen --- tools/testing/selftests/vm/userfaultfd.c | 147 ++++++++++++++++++++++- 1 file changed, 143 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 92b8ec423201..73a72a3c4189 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -81,6 +81,8 @@ static volatile bool test_uffdio_copy_eexist = true; static volatile bool test_uffdio_zeropage_eexist = true; /* Whether to test uffd write-protection */ static bool test_uffdio_wp = false; +/* Whether to test uffd minor faults */ +static bool test_uffdio_minor = false; static bool map_shared; static int huge_fd; @@ -96,6 +98,7 @@ struct uffd_stats { int cpu; unsigned long missing_faults; unsigned long wp_faults; + unsigned long minor_faults; }; /* pthread_mutex_t starts at page offset 0 */ @@ -153,17 +156,19 @@ static void uffd_stats_reset(struct uffd_stats *uffd_stats, uffd_stats[i].cpu = i; uffd_stats[i].missing_faults = 0; uffd_stats[i].wp_faults = 0; + uffd_stats[i].minor_faults = 0; } } static void uffd_stats_report(struct uffd_stats *stats, int n_cpus) { int i; - unsigned long long miss_total = 0, wp_total = 0; + unsigned long long miss_total = 0, wp_total = 0, minor_total = 0; for (i = 0; i < n_cpus; i++) { miss_total += stats[i].missing_faults; wp_total += stats[i].wp_faults; + minor_total += stats[i].minor_faults; } printf("userfaults: %llu missing (", miss_total); @@ -172,6 +177,9 @@ static void uffd_stats_report(struct uffd_stats *stats, int n_cpus) printf("\b), %llu wp (", wp_total); for (i = 0; i < n_cpus; i++) printf("%lu+", stats[i].wp_faults); + printf("\b), %llu minor (", minor_total); + for (i = 0; i < n_cpus; i++) + printf("%lu+", stats[i].minor_faults); printf("\b)\n"); } @@ -328,7 +336,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = { }; static struct uffd_test_ops hugetlb_uffd_test_ops = { - .expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC, + .expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC & ~(1 << _UFFDIO_CONTINUE), .allocate_area = hugetlb_allocate_area, .release_pages = hugetlb_release_pages, .alias_mapping = hugetlb_alias_mapping, @@ -362,6 +370,22 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp) } } +static void continue_range(int ufd, __u64 start, __u64 len) +{ + struct uffdio_continue req; + + req.range.start = start; + req.range.len = len; + req.mode = 0; + + if (ioctl(ufd, UFFDIO_CONTINUE, &req)) { + fprintf(stderr, + "UFFDIO_CONTINUE failed for address 0x%" PRIx64 "\n", + (uint64_t)start); + exit(1); + } +} + static void *locking_thread(void *arg) { unsigned long cpu = (unsigned long) arg; @@ -569,8 +593,32 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, } if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) { + /* Write protect page faults */ wp_range(uffd, msg->arg.pagefault.address, page_size, false); stats->wp_faults++; + } else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) { + uint8_t *area; + int b; + + /* + * Minor page faults + * + * To prove we can modify the original range for testing + * purposes, we're going to bit flip this range before + * continuing. + * + * Note that this requires all minor page fault tests operate on + * area_dst (non-UFFD-registered) and area_dst_alias + * (UFFD-registered). + */ + + area = (uint8_t *)(area_dst + + ((char *)msg->arg.pagefault.address - + area_dst_alias)); + for (b = 0; b < page_size; ++b) + area[b] = ~area[b]; + continue_range(uffd, msg->arg.pagefault.address, page_size); + stats->minor_faults++; } else { /* Missing page faults */ if (bounces & BOUNCE_VERIFY && @@ -1112,7 +1160,7 @@ static int userfaultfd_events_test(void) } if (!pid) - return faulting_process(0); + exit(faulting_process(0)); waitpid(pid, &err, 0); if (err) { @@ -1215,6 +1263,95 @@ static int userfaultfd_sig_test(void) return userfaults != 0; } +static int userfaultfd_minor_test(void) +{ + struct uffdio_register uffdio_register; + unsigned long expected_ioctls; + unsigned long p; + pthread_t uffd_mon; + uint8_t expected_byte; + void *expected_page; + char c; + struct uffd_stats stats = { 0 }; + + if (!test_uffdio_minor) + return 0; + + printf("testing minor faults: "); + fflush(stdout); + + if (uffd_test_ops->release_pages(area_dst)) + return 1; + + if (userfaultfd_open(0)) + return 1; + + uffdio_register.range.start = (unsigned long)area_dst_alias; + uffdio_register.range.len = nr_pages * page_size; + uffdio_register.mode = UFFDIO_REGISTER_MODE_MINOR; + if (ioctl(uffd, UFFDIO_REGISTER, &uffdio_register)) { + fprintf(stderr, "register failure\n"); + exit(1); + } + + expected_ioctls = uffd_test_ops->expected_ioctls; + expected_ioctls |= 1 << _UFFDIO_CONTINUE; + if ((uffdio_register.ioctls & expected_ioctls) != expected_ioctls) { + fprintf(stderr, "unexpected missing ioctl(s)\n"); + exit(1); + } + + /* + * After registering with UFFD, populate the non-UFFD-registered side of + * the shared mapping. This should *not* trigger any UFFD minor faults. + */ + for (p = 0; p < nr_pages; ++p) { + memset(area_dst + (p * page_size), p % ((uint8_t)-1), + page_size); + } + + if (pthread_create(&uffd_mon, &attr, uffd_poll_thread, &stats)) { + perror("uffd_poll_thread create"); + exit(1); + } + + /* + * Read each of the pages back using the UFFD-registered mapping. We + * expect that the first time we touch a page, it will result in a minor + * fault. uffd_poll_thread will resolve the fault by bit-flipping the + * page's contents, and then issuing a CONTINUE ioctl. + */ + + if (posix_memalign(&expected_page, page_size, page_size)) { + fprintf(stderr, "out of memory\n"); + return 1; + } + + for (p = 0; p < nr_pages; ++p) { + expected_byte = ~((uint8_t)(p % ((uint8_t)-1))); + memset(expected_page, expected_byte, page_size); + if (my_bcmp(expected_page, area_dst_alias + (p * page_size), + page_size)) { + fprintf(stderr, + "unexpected page contents after minor fault\n"); + exit(1); + } + } + + if (write(pipefd[1], &c, sizeof(c)) != sizeof(c)) { + perror("pipe write"); + exit(1); + } + if (pthread_join(uffd_mon, NULL)) + return 1; + + close(uffd); + + uffd_stats_report(&stats, 1); + + return stats.minor_faults != nr_pages; +} + static int userfaultfd_stress(void) { void *area; @@ -1413,7 +1550,7 @@ static int userfaultfd_stress(void) close(uffd); return userfaultfd_zeropage_test() || userfaultfd_sig_test() - || userfaultfd_events_test(); + || userfaultfd_events_test() || userfaultfd_minor_test(); } /* @@ -1454,6 +1591,8 @@ static void set_test_type(const char *type) map_shared = true; test_type = TEST_HUGETLB; uffd_test_ops = &hugetlb_uffd_test_ops; + /* Minor faults require shared hugetlb; only enable here. */ + test_uffdio_minor = true; } else if (!strcmp(type, "shmem")) { map_shared = true; test_type = TEST_SHMEM;