From patchwork Wed Dec 4 18:26:50 2024
X-Patchwork-Submitter: Guillaume Morin <guillaume@morinfr.org>
X-Patchwork-Id: 13894174
Date: Wed, 4 Dec 2024 19:26:50 +0100
From: Guillaume Morin <guillaume@morinfr.org>
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, guillaume@morinfr.org, Muchun Song,
 Andrew Morton, Peter Xu, David Hildenbrand, Eric Hagberg
Subject: [PATCH v1] hugetlb: support FOLL_FORCE|FOLL_WRITE

FOLL_FORCE|FOLL_WRITE has never been properly supported for hugetlb
mappings: since commit 1d8d14641fd94 it has been explicitly rejected.
However, running software out of hugetlb mappings is a useful
optimization, and multiple tools, such as Intel iodlr or libhugetlbfs,
make it easy to do so. Because such code mappings are read-only,
debuggers need FOLL_FORCE|FOLL_WRITE to plant breakpoints in them.

Support it the same way it is supported for regular mappings: factor
the common checks out of can_follow_write_pmd()/can_follow_write_pte()
into a helper, use it for hugetlb PUD mappings as well, and make
hugetlb_wp() leave the PTE read-only after breaking COW when the VMA
itself is not writable.
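As a minimal userspace sketch of the case this enables: writes through
/proc/<pid>/mem are serviced by GUP with FOLL_FORCE|FOLL_WRITE, so a
debugger-style write into a read-only private hugetlb mapping now
breaks COW instead of failing with EFAULT. The sketch assumes a 2 MiB
default hugepage size and at least one free hugepage (sysctl
vm.nr_hugepages):

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define HPAGE_SIZE (2UL << 20)

int main(void)
{
	const char msg[] = "hello";
	/* Read-only MAP_PRIVATE hugetlb VMA: VM_MAYWRITE but !VM_WRITE */
	char *p = mmap(NULL, HPAGE_SIZE, PROT_READ,
		       MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
	int fd = open("/proc/self/mem", O_RDWR);

	if (p == MAP_FAILED || fd < 0) {
		perror("setup");
		return 1;
	}

	/*
	 * /proc/<pid>/mem writes are serviced with FOLL_FORCE|FOLL_WRITE:
	 * EFAULT before this patch, COW break and success with it.
	 */
	if (pwrite(fd, msg, sizeof(msg), (uintptr_t)p) != sizeof(msg)) {
		perror("pwrite");
		return 1;
	}

	printf("%s\n", p);	/* reads back "hello" from the COW copy */
	close(fd);
	return 0;
}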
Cc: Muchun Song
Cc: Andrew Morton
Cc: Peter Xu
Cc: David Hildenbrand
Cc: Eric Hagberg
Signed-off-by: Guillaume Morin <guillaume@morinfr.org>
---
 mm/gup.c     | 93 ++++++++++++++++++++++++++--------------------------
 mm/hugetlb.c | 20 ++++++-----
 2 files changed, 58 insertions(+), 55 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 746070a1d8bf..c680edf33248 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -587,6 +587,33 @@ static struct folio *try_grab_folio_fast(struct page *page, int refs,
 }
 #endif	/* CONFIG_HAVE_GUP_FAST */
 
+/* Common code for can_follow_write_* */
+static inline bool can_follow_write_common(struct page *page,
+	struct vm_area_struct *vma, unsigned int flags)
+{
+	/* Maybe FOLL_FORCE is set to override it? */
+	if (!(flags & FOLL_FORCE))
+		return false;
+
+	/* But FOLL_FORCE has no effect on shared mappings */
+	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
+		return false;
+
+	/* ... or read-only private ones */
+	if (!(vma->vm_flags & VM_MAYWRITE))
+		return false;
+
+	/* ... or already writable ones that just need to take a write fault */
+	if (vma->vm_flags & VM_WRITE)
+		return false;
+
+	/*
+	 * See can_change_pte_writable(): we broke COW and could map the page
+	 * writable if we have an exclusive anonymous page ...
+	 */
+	return page && PageAnon(page) && PageAnonExclusive(page);
+}
+
 static struct page *no_page_table(struct vm_area_struct *vma,
 				  unsigned int flags, unsigned long address)
 {
@@ -613,6 +640,22 @@ static struct page *no_page_table(struct vm_area_struct *vma,
 }
 
 #ifdef CONFIG_PGTABLE_HAS_HUGE_LEAVES
+/* FOLL_FORCE can write to even unwritable PUDs in COW mappings. */
+static inline bool can_follow_write_pud(pud_t pud, struct page *page,
+					struct vm_area_struct *vma,
+					unsigned int flags)
+{
+	/* If the pud is writable, we can write to the page. */
+	if (pud_write(pud))
+		return true;
+
+	if (!can_follow_write_common(page, vma, flags))
+		return false;
+
+	/* ... and a write-fault isn't required for other reasons. */
+	return !vma_soft_dirty_enabled(vma) || pud_soft_dirty(pud);
+}
+
 static struct page *follow_huge_pud(struct vm_area_struct *vma,
 				    unsigned long addr, pud_t *pudp,
 				    int flags, struct follow_page_context *ctx)
@@ -625,7 +668,8 @@ static struct page *follow_huge_pud(struct vm_area_struct *vma,
 
 	assert_spin_locked(pud_lockptr(mm, pudp));
 
-	if ((flags & FOLL_WRITE) && !pud_write(pud))
+	if ((flags & FOLL_WRITE) &&
+	    !can_follow_write_pud(pud, page, vma, flags))
 		return NULL;
 
 	if (!pud_present(pud))
@@ -677,27 +721,7 @@ static inline bool can_follow_write_pmd(pmd_t pmd, struct page *page,
 	if (pmd_write(pmd))
 		return true;
 
-	/* Maybe FOLL_FORCE is set to override it? */
-	if (!(flags & FOLL_FORCE))
-		return false;
-
-	/* But FOLL_FORCE has no effect on shared mappings */
-	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
-		return false;
-
-	/* ... or read-only private ones */
-	if (!(vma->vm_flags & VM_MAYWRITE))
-		return false;
-
-	/* ... or already writable ones that just need to take a write fault */
-	if (vma->vm_flags & VM_WRITE)
-		return false;
-
-	/*
-	 * See can_change_pte_writable(): we broke COW and could map the page
-	 * writable if we have an exclusive anonymous page ...
-	 */
-	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+	if (!can_follow_write_common(page, vma, flags))
 		return false;
 
 	/* ... and a write-fault isn't required for other reasons. */
@@ -798,27 +822,7 @@ static inline bool can_follow_write_pte(pte_t pte, struct page *page,
 	if (pte_write(pte))
 		return true;
 
-	/* Maybe FOLL_FORCE is set to override it? */
-	if (!(flags & FOLL_FORCE))
-		return false;
-
-	/* But FOLL_FORCE has no effect on shared mappings */
-	if (vma->vm_flags & (VM_MAYSHARE | VM_SHARED))
-		return false;
-
-	/* ... or read-only private ones */
-	if (!(vma->vm_flags & VM_MAYWRITE))
-		return false;
-
-	/* ... or already writable ones that just need to take a write fault */
-	if (vma->vm_flags & VM_WRITE)
-		return false;
-
-	/*
-	 * See can_change_pte_writable(): we broke COW and could map the page
-	 * writable if we have an exclusive anonymous page ...
-	 */
-	if (!page || !PageAnon(page) || !PageAnonExclusive(page))
+	if (!can_follow_write_common(page, vma, flags))
 		return false;
 
 	/* ... and a write-fault isn't required for other reasons. */
@@ -1285,9 +1289,6 @@ static int check_vma_flags(struct vm_area_struct *vma, unsigned long gup_flags)
 	if (!(vm_flags & VM_WRITE) || (vm_flags & VM_SHADOW_STACK)) {
 		if (!(gup_flags & FOLL_FORCE))
 			return -EFAULT;
-		/* hugetlb does not support FOLL_FORCE|FOLL_WRITE. */
-		if (is_vm_hugetlb_page(vma))
-			return -EFAULT;
 		/*
 		 * We used to let the write,force case do COW in a
 		 * VM_MAYWRITE VM_SHARED !VM_WRITE vma, so ptrace could
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ea2ed8e301ef..52517b7ce308 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5169,6 +5169,13 @@ static void set_huge_ptep_writable(struct vm_area_struct *vma,
 	update_mmu_cache(vma, address, ptep);
 }
 
+static void set_huge_ptep_maybe_writable(struct vm_area_struct *vma,
+					 unsigned long address, pte_t *ptep)
+{
+	if (vma->vm_flags & VM_WRITE)
+		set_huge_ptep_writable(vma, address, ptep);
+}
+
 bool is_hugetlb_entry_migration(pte_t pte)
 {
 	swp_entry_t swp;
@@ -5802,13 +5809,6 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	if (!unshare && huge_pte_uffd_wp(pte))
 		return 0;
 
-	/*
-	 * hugetlb does not support FOLL_FORCE-style write faults that keep the
-	 * PTE mapped R/O such as maybe_mkwrite() would do.
-	 */
-	if (WARN_ON_ONCE(!unshare && !(vma->vm_flags & VM_WRITE)))
-		return VM_FAULT_SIGSEGV;
-
 	/* Let's take out MAP_SHARED mappings first. */
 	if (vma->vm_flags & VM_MAYSHARE) {
 		set_huge_ptep_writable(vma, vmf->address, vmf->pte);
@@ -5837,7 +5837,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 			SetPageAnonExclusive(&old_folio->page);
 		}
 		if (likely(!unshare))
-			set_huge_ptep_writable(vma, vmf->address, vmf->pte);
+			set_huge_ptep_maybe_writable(vma, vmf->address,
+						     vmf->pte);
 
 		delayacct_wpcopy_end();
 		return 0;
@@ -5943,7 +5944,8 @@ static vm_fault_t hugetlb_wp(struct folio *pagecache_folio,
 	spin_lock(vmf->ptl);
 	vmf->pte = hugetlb_walk(vma, vmf->address, huge_page_size(h));
 	if (likely(vmf->pte && pte_same(huge_ptep_get(mm, vmf->address, vmf->pte), pte))) {
-		pte_t newpte = make_huge_pte(vma, &new_folio->page, !unshare);
+		const bool writable = !unshare && (vma->vm_flags & VM_WRITE);
+		pte_t newpte = make_huge_pte(vma, &new_folio->page, writable);
 
 		/* Break COW or unshare */
 		huge_ptep_clear_flush(vma, vmf->address, vmf->pte);