From patchwork Fri Oct 21 16:36:17 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015070
Date: Fri, 21 Oct 2022 16:36:17 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-2-jthoughton@google.com>
Subject: [RFC PATCH v2 01/47] hugetlb: don't set PageUptodate for UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton
This is how it should have been to begin with. It would be very bad if
we actually set PageUptodate with a UFFDIO_CONTINUE, as UFFDIO_CONTINUE
doesn't actually set/update the contents of the page, so we would be
exposing a non-zeroed page to the user.

The reason this change is being made now is because UFFDIO_CONTINUEs on
subpages definitely shouldn't set this page flag on the head page.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 1a7dc7b2e16c..650761cdd2f6 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6097,7 +6097,10 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
	 * preceding stores to the page contents become visible before
	 * the set_pte_at() write.
	 */
-	__SetPageUptodate(page);
+	if (!is_continue)
+		__SetPageUptodate(page);
+	else
+		VM_WARN_ON_ONCE_PAGE(!PageUptodate(page), page);

	/* Add shared, newly allocated pages to the page cache. */
	if (vm_shared && !is_continue) {

From patchwork Fri Oct 21 16:36:18 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015071
Date: Fri, 21 Oct 2022 16:36:18 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-3-jthoughton@google.com>
Subject: [RFC PATCH v2 02/47] hugetlb: remove mk_huge_pte; it is unused
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton
mk_huge_pte is unused and not necessary. pte_mkhuge is the appropriate
function to call to create a HugeTLB PTE (see
Documentation/mm/arch_pgtable_helpers.rst). It is being removed now to
avoid complicating the implementation of HugeTLB high-granularity
mapping.

Signed-off-by: James Houghton
Acked-by: Peter Xu
Acked-by: Mina Almasry
Reviewed-by: Mike Kravetz
---
 arch/s390/include/asm/hugetlb.h | 5 -----
 include/asm-generic/hugetlb.h   | 5 -----
 mm/debug_vm_pgtable.c           | 2 +-
 mm/hugetlb.c                    | 7 +++----
 4 files changed, 4 insertions(+), 15 deletions(-)

diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index ccdbccfde148..c34893719715 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -77,11 +77,6 @@ static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
	set_huge_pte_at(mm, addr, ptep, pte_wrprotect(pte));
 }

-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline int huge_pte_none(pte_t pte)
 {
	return pte_none(pte);

diff --git a/include/asm-generic/hugetlb.h b/include/asm-generic/hugetlb.h
index a57d667addd2..aab9e46fa628 100644
--- a/include/asm-generic/hugetlb.h
+++ b/include/asm-generic/hugetlb.h
@@ -5,11 +5,6 @@
 #include <linux/swap.h>
 #include <linux/swapops.h>

-static inline pte_t mk_huge_pte(struct page *page, pgprot_t pgprot)
-{
-	return mk_pte(page, pgprot);
-}
-
 static inline unsigned long huge_pte_write(pte_t pte)
 {
	return pte_write(pte);

diff --git a/mm/debug_vm_pgtable.c b/mm/debug_vm_pgtable.c
index 2b61fde8c38c..10573a283a12 100644
--- a/mm/debug_vm_pgtable.c
+++ b/mm/debug_vm_pgtable.c
@@ -929,7 +929,7 @@ static void __init hugetlb_basic_tests(struct pgtable_debug_args *args)
	 * as it was previously derived from a real kernel symbol.
	 */
	page = pfn_to_page(args->fixed_pmd_pfn);
-	pte = mk_huge_pte(page, args->page_prot);
+	pte = mk_pte(page, args->page_prot);

	WARN_ON(!huge_pte_dirty(huge_pte_mkdirty(pte)));
	WARN_ON(!huge_pte_write(huge_pte_mkwrite(huge_pte_wrprotect(pte))));

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 650761cdd2f6..20a111b532aa 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4728,11 +4728,10 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
	unsigned int shift = huge_page_shift(hstate_vma(vma));

	if (writable) {
-		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_huge_pte(page,
-					 vma->vm_page_prot)));
+		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page,
+					 vma->vm_page_prot)));
	} else {
-		entry = huge_pte_wrprotect(mk_huge_pte(page,
-					   vma->vm_page_prot));
+		entry = huge_pte_wrprotect(mk_pte(page, vma->vm_page_prot));
	}
	entry = pte_mkyoung(entry);
	entry = arch_make_huge_pte(entry, shift, vma->vm_flags);

From patchwork Fri Oct 21 16:36:19 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015072
Date: Fri, 21 Oct 2022 16:36:19 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-4-jthoughton@google.com>
Subject: [RFC PATCH v2 03/47] hugetlb: remove redundant pte_mkhuge in migration path
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton
arch_make_huge_pte, which is called immediately following pte_mkhuge,
already makes the necessary changes to the PTE that pte_mkhuge would
have made. The generic implementation of arch_make_huge_pte simply
calls pte_mkhuge.
Signed-off-by: James Houghton
Acked-by: Peter Xu
Acked-by: Mina Almasry
Reviewed-by: Mike Kravetz
---
 mm/migrate.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index 8e5eb6ed9da2..1457cdbb7828 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -237,7 +237,6 @@ static bool remove_migration_pte(struct folio *folio,
		if (folio_test_hugetlb(folio)) {
			unsigned int shift = huge_page_shift(hstate_vma(vma));

-			pte = pte_mkhuge(pte);
			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
			if (folio_test_anon(folio))
				hugepage_add_anon_rmap(new, vma, pvmw.address,

From patchwork Fri Oct 21 16:36:20 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015073
Date: Fri, 21 Oct 2022 16:36:20 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-5-jthoughton@google.com>
Subject: [RFC PATCH v2 04/47] hugetlb: only adjust address ranges when VMAs want PMD sharing
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org,
 linux-kernel@vger.kernel.org, James Houghton
Currently this check is overly aggressive: for some userfaultfd VMAs,
PMD sharing is disabled, yet we still widen the address range that is
used for flushing TLBs and sending MMU notifiers. This change is made
now because HGM VMAs also have PMD sharing disabled, yet would still
have their flush ranges adjusted. Over-aggressively flushing TLBs and
triggering MMU notifiers is particularly harmful with lots of
high-granularity operations.

Signed-off-by: James Houghton
Acked-by: Peter Xu
Reviewed-by: Mike Kravetz
---
 mm/hugetlb.c | 21 +++++++++++++++------
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 20a111b532aa..52cec5b0789e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6835,22 +6835,31 @@ static unsigned long page_table_shareable(struct vm_area_struct *svma,
	return saddr;
 }

-bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+static bool pmd_sharing_possible(struct vm_area_struct *vma)
 {
-	unsigned long start = addr & PUD_MASK;
-	unsigned long end = start + PUD_SIZE;
-
 #ifdef CONFIG_USERFAULTFD
	if (uffd_disable_huge_pmd_share(vma))
		return false;
 #endif
	/*
-	 * check on proper vm_flags and page table alignment
+	 * Only shared VMAs can share PMDs.
	 */
	if (!(vma->vm_flags & VM_MAYSHARE))
		return false;
	if (!vma->vm_private_data)	/* vma lock required for sharing */
		return false;
+	return true;
+}
+
+bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
+{
+	unsigned long start = addr & PUD_MASK;
+	unsigned long end = start + PUD_SIZE;
+	/*
+	 * check on proper vm_flags and page table alignment
+	 */
+	if (!pmd_sharing_possible(vma))
+		return false;
	if (!range_in_vma(vma, start, end))
		return false;
	return true;
@@ -6871,7 +6880,7 @@ void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
	 * vma needs to span at least one aligned PUD size, and the range
	 * must be at least partially within in.
	 */
-	if (!(vma->vm_flags & VM_MAYSHARE) || !(v_end > v_start) ||
+	if (!pmd_sharing_possible(vma) || !(v_end > v_start) ||
	    (*end <= v_start) || (*start >= v_end))
		return;

From patchwork Fri Oct 21 16:36:21 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015074
unirelay04.hostedemail.com (Postfix) with ESMTP id 8044F1A1135 for ; Fri, 21 Oct 2022 16:37:22 +0000 (UTC) X-FDA: 80045511924.20.7037E4E Received: from mail-vs1-f73.google.com (mail-vs1-f73.google.com [209.85.217.73]) by imf14.hostedemail.com (Postfix) with ESMTP id 26BEA100026 for ; Fri, 21 Oct 2022 16:37:21 +0000 (UTC) Received: by mail-vs1-f73.google.com with SMTP id 124-20020a671082000000b0039b07671c7aso1059315vsq.13 for ; Fri, 21 Oct 2022 09:37:21 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=34cx5eEOWWMEza3+3K1vLVCCrFvx1bz9vEUMrzq2ytU=; b=Sp9ejFIjwkTD2k7b5O35d/eTr3qFaL4Cu0+Y3Jqen1+jW42rNvItxKaNURZz5TSKk7 x35Bvv5PH68z6pCQYFbhWyoyGgG4rzUpeOPbjmYrzgvXuzuUNcyvX4/Otvfkuiq5+cZv HHQynZVto3uDixtcVQsCf3qWQvklcY23DsStSgC2qGnXAGQ/l0stiqvSXSvt9oZVTDkE Caber1o4FewbpJvcwU1yEnRVdmKdy2XZSnVqN1DFbOLIkZI+hKbcR55fyjn4Og0CJ7M/ x13b8bjYpqhnO2fokhLpSAWC6h6Udyc8I00LhZRGAFNopGb6wpiImE12TuoAvXq6t9Ke 1k4A== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=34cx5eEOWWMEza3+3K1vLVCCrFvx1bz9vEUMrzq2ytU=; b=O1LSqQsoxEa/98CjBfxgKUFCi55nFJSCNAPOiIgmCpf+TxW+Vs7zbPRtkhSOLQsAkr hJ5UE9o9GaioGfuZAJ8M7svUN9pEjt8PnPpVhZDdXgEoCF9Y5fNkVR50zu/PFOnLXemG yPrI14NBhqrTNMDoT16P9nd/SEU43M9gmQFFnG3HMAsVLCKJWXv46rE1ZcU7IC6inohw LwByMKoYIKoFgWbYqnn+3HgyXYYk3NxHfVe9kq3LG4GOUKn0ODpQZn56assnJxn+pwA3 uig2AjUborLWoq8kUi0DuzGlD/+yjOiENybNXxGuz0mrEDUC5tm0Qlp6SIqP1lCqJpUg i1Fw== X-Gm-Message-State: ACrzQf0uRgN9WQR6ybQ9UjUuamBhGMQ0MHPrZ+AssXlC8aLZpILnbqtI nlxJ8kMSjiuHDHV58Qkard9/9JZ3SZlmOYU1 X-Google-Smtp-Source: AMsMyM6Mvo2qQVK8sdSIbcMVnfOhwFg4FhU4eWPB/UUGtomewuseawpOfNBtO35uLxun2jYxXKf3dRRjIA99NUXN X-Received: from jthoughton.c.googlers.com 
([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a67:fa59:0:b0:3a7:7516:b43b with SMTP id j25-20020a67fa59000000b003a77516b43bmr14278433vsq.83.1666370241512; Fri, 21 Oct 2022 09:37:21 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:21 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-6-jthoughton@google.com> Subject: [RFC PATCH v2 05/47] hugetlb: make hugetlb_vma_lock_alloc return its failure reason From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1666370242; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=34cx5eEOWWMEza3+3K1vLVCCrFvx1bz9vEUMrzq2ytU=; b=fi9Sq/re+NHFjELxDOxQFUlolivAOMvsONqYGCTIfHry94PDVZ96KJnMfyOug/xRjLSxMz 2PT2FKgB5qh4lKnYFo5Q1bbKiPVIlaGKQ29rk+wsAJn6rQPBrtwkqiMCwCbs1aJeb+27OD mOqyGWG71QEUgtnO8kWS1OKRnXazBns= ARC-Authentication-Results: i=1; imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=Sp9ejFIj; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf14.hostedemail.com: domain of 3wcpSYwoKCMIr1pw2op1wvowwotm.kwutqv25-uus3iks.wzo@flex--jthoughton.bounces.google.com designates 209.85.217.73 as permitted sender) 
smtp.mailfrom=3wcpSYwoKCMIr1pw2op1wvowwotm.kwutqv25-uus3iks.wzo@flex--jthoughton.bounces.google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1666370242; a=rsa-sha256; cv=none; b=CCkYb4uH9m3ibc7IOOkJLFSx8gj5Qna8YBmOldjzZvzRi25qkhmEjillXUsE6BX1CXFs7j cl7vLxTuGmV2zckJ5h1sgBtxJnMn/rPuEgXTTGoWJAjHSiV2RjIuSUxi1lf66ZArxjjqPE cNQL2c/ZjODZlnWamamnLvIGT6HRI5M= X-Rspam-User: X-Rspamd-Queue-Id: 26BEA100026 X-Stat-Signature: 7tnu6fiqpb7p5oksezrjohityacgcy7t Authentication-Results: imf14.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=Sp9ejFIj; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf14.hostedemail.com: domain of 3wcpSYwoKCMIr1pw2op1wvowwotm.kwutqv25-uus3iks.wzo@flex--jthoughton.bounces.google.com designates 209.85.217.73 as permitted sender) smtp.mailfrom=3wcpSYwoKCMIr1pw2op1wvowwotm.kwutqv25-uus3iks.wzo@flex--jthoughton.bounces.google.com X-Rspamd-Server: rspam07 X-HE-Tag: 1666370241-790313 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Currently hugetlb_vma_lock_alloc doesn't return anything, as there is no need: if it fails, PMD sharing won't be enabled. However, HGM requires that the VMA lock exists, so we need to verify that hugetlb_vma_lock_alloc actually succeeded. If hugetlb_vma_lock_alloc fails, then we can pass that up to the caller that is attempting to enable HGM. 
Signed-off-by: James Houghton
---
 mm/hugetlb.c | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 52cec5b0789e..dc82256b89dd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -92,7 +92,7 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
 static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
+static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
@@ -7001,17 +7001,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 	}
 }
 
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 {
 	struct hugetlb_vma_lock *vma_lock;
 
 	/* Only establish in (flags) sharable vmas */
 	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
-		return;
+		return -EINVAL;
 
-	/* Should never get here with non-NULL vm_private_data */
+	/* We've already allocated the lock. */
 	if (vma->vm_private_data)
-		return;
+		return 0;
 
 	vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
 	if (!vma_lock) {
@@ -7026,13 +7026,14 @@ static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 		 * allocation failure.
 		 */
 		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
-		return;
+		return -ENOMEM;
 	}
 
 	kref_init(&vma_lock->refs);
 	init_rwsem(&vma_lock->rw_sema);
 	vma_lock->vma = vma;
 	vma->vm_private_data = vma_lock;
+	return 0;
 }
 
 /*
@@ -7160,8 +7161,9 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 {
 }
 
-static void hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 {
+	return 0;
 }
 
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,

From patchwork Fri Oct 21 16:36:22 2022
Date: Fri, 21 Oct 2022 16:36:22 +0000
Message-ID: <20221021163703.3218176-7-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 06/47] hugetlb: extend vma lock for shared vmas
From: James Houghton

This allows us to add more data into the shared structure; we will use
it to store whether or not HGM is enabled for this VMA, as HGM is only
available for shared mappings.

It may be better to include HGM as a VMA flag instead of extending the
VMA lock structure.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  4 +++
 mm/hugetlb.c            | 65 +++++++++++++++++++++--------------
 2 files changed, 37 insertions(+), 32 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index a899bc76d677..534958499ac4 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -121,6 +121,10 @@ struct hugetlb_vma_lock {
 	struct vm_area_struct *vma;
 };
 
+struct hugetlb_shared_vma_data {
+	struct hugetlb_vma_lock vma_lock;
+};
+
 extern struct resv_map *resv_map_alloc(void);
 void resv_map_release(struct kref *ref);
 
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index dc82256b89dd..5ae8bc8c928e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -91,8 +91,8 @@ struct mutex *hugetlb_fault_mutex_table ____cacheline_aligned_in_smp;
 
 /* Forward declaration */
 static int hugetlb_acct_memory(struct hstate *h, long delta);
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma);
-static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma);
+static void hugetlb_vma_data_free(struct vm_area_struct *vma);
+static int hugetlb_vma_data_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
@@ -4643,11 +4643,11 @@ static void hugetlb_vm_op_open(struct vm_area_struct *vma)
 	if (vma_lock) {
 		if (vma_lock->vma != vma) {
 			vma->vm_private_data = NULL;
-			hugetlb_vma_lock_alloc(vma);
+			hugetlb_vma_data_alloc(vma);
 		} else
 			pr_warn("HugeTLB: vma_lock already exists in %s.\n",
 				__func__);
 	} else
-		hugetlb_vma_lock_alloc(vma);
+		hugetlb_vma_data_alloc(vma);
 	}
 }
@@ -4659,7 +4659,7 @@ static void hugetlb_vm_op_close(struct vm_area_struct *vma)
 	unsigned long reserve, start, end;
 	long gbl_reserve;
 
-	hugetlb_vma_lock_free(vma);
+	hugetlb_vma_data_free(vma);
 
 	resv = vma_resv_map(vma);
 	if (!resv || !is_vma_resv_set(vma, HPAGE_RESV_OWNER))
@@ -6629,7 +6629,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
 	/*
 	 * vma specific semaphore used for pmd sharing synchronization
 	 */
-	hugetlb_vma_lock_alloc(vma);
+	hugetlb_vma_data_alloc(vma);
 
 	/*
 	 * Only apply hugepage reservation if asked. At fault time, an
@@ -6753,7 +6753,7 @@ bool hugetlb_reserve_pages(struct inode *inode,
 	hugetlb_cgroup_uncharge_cgroup_rsvd(hstate_index(h),
					    chg * pages_per_huge_page(h), h_cg);
 out_err:
-	hugetlb_vma_lock_free(vma);
+	hugetlb_vma_data_free(vma);
 	if (!vma || vma->vm_flags & VM_MAYSHARE)
 		/* Only call region_abort if the region_chg succeeded but the
 		 * region_add failed or didn't run.
@@ -6901,55 +6901,55 @@ static bool __vma_shareable_flags_pmd(struct vm_area_struct *vma)
 void hugetlb_vma_lock_read(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		down_read(&vma_lock->rw_sema);
+		down_read(&data->vma_lock.rw_sema);
 	}
 }
 
 void hugetlb_vma_unlock_read(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		up_read(&vma_lock->rw_sema);
+		up_read(&data->vma_lock.rw_sema);
 	}
 }
 
 void hugetlb_vma_lock_write(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		down_write(&vma_lock->rw_sema);
+		down_write(&data->vma_lock.rw_sema);
 	}
 }
 
 void hugetlb_vma_unlock_write(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		up_write(&vma_lock->rw_sema);
+		up_write(&data->vma_lock.rw_sema);
 	}
 }
 
 int hugetlb_vma_trylock_write(struct vm_area_struct *vma)
 {
-	struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+	struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
 	if (!__vma_shareable_flags_pmd(vma))
 		return 1;
 
-	return down_write_trylock(&vma_lock->rw_sema);
+	return down_write_trylock(&data->vma_lock.rw_sema);
 }
 
 void hugetlb_vma_assert_locked(struct vm_area_struct *vma)
 {
 	if (__vma_shareable_flags_pmd(vma)) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
 
-		lockdep_assert_held(&vma_lock->rw_sema);
+		lockdep_assert_held(&data->vma_lock.rw_sema);
 	}
 }
 
@@ -6985,7 +6985,7 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
 	}
 }
 
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
+static void hugetlb_vma_data_free(struct vm_area_struct *vma)
 {
 	/*
 	 * Only present in sharable vmas.
@@ -6994,16 +6994,17 @@ static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
 		return;
 
 	if (vma->vm_private_data) {
-		struct hugetlb_vma_lock *vma_lock = vma->vm_private_data;
+		struct hugetlb_shared_vma_data *data = vma->vm_private_data;
+		struct hugetlb_vma_lock *vma_lock = &data->vma_lock;
 
 		down_write(&vma_lock->rw_sema);
 		__hugetlb_vma_unlock_write_put(vma_lock);
 	}
 }
 
-static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
 {
-	struct hugetlb_vma_lock *vma_lock;
+	struct hugetlb_shared_vma_data *data;
 
 	/* Only establish in (flags) sharable vmas */
 	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
@@ -7013,8 +7014,8 @@ static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 	if (vma->vm_private_data)
 		return 0;
 
-	vma_lock = kmalloc(sizeof(*vma_lock), GFP_KERNEL);
-	if (!vma_lock) {
+	data = kmalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
 		/*
 		 * If we can not allocate structure, then vma can not
 		 * participate in pmd sharing. This is only a possible
@@ -7025,14 +7026,14 @@ static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
 		 * until the file is removed. Warn in the unlikely case of
 		 * allocation failure.
 		 */
-		pr_warn_once("HugeTLB: unable to allocate vma specific lock\n");
+		pr_warn_once("HugeTLB: unable to allocate vma shared data\n");
 		return -ENOMEM;
 	}
 
-	kref_init(&vma_lock->refs);
-	init_rwsem(&vma_lock->rw_sema);
-	vma_lock->vma = vma;
-	vma->vm_private_data = vma_lock;
+	kref_init(&data->vma_lock.refs);
+	init_rwsem(&data->vma_lock.rw_sema);
+	data->vma_lock.vma = vma;
+	vma->vm_private_data = data;
 
 	return 0;
 }
 
@@ -7157,11 +7158,11 @@ static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma)
 {
 }
 
-static void hugetlb_vma_lock_free(struct vm_area_struct *vma)
+static void hugetlb_vma_data_free(struct vm_area_struct *vma)
 {
 }
 
-static int hugetlb_vma_lock_alloc(struct vm_area_struct *vma)
+static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
 {
 	return 0;
 }
 
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,

From patchwork Fri Oct 21 16:36:23 2022
Date: Fri, 21 Oct 2022 16:36:23 +0000
Message-ID: <20221021163703.3218176-8-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Subject: [RFC PATCH v2 07/47] hugetlb: add CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
From: James Houghton

This adds the Kconfig to enable or disable high-granularity mapping.
Each architecture must explicitly opt in to it (via
ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING), but when opted in, HGM
will be enabled by default if HUGETLB_PAGE is enabled.

Signed-off-by: James Houghton
---
 fs/Kconfig | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/fs/Kconfig b/fs/Kconfig
index 2685a4d0d353..ce2567946016 100644
--- a/fs/Kconfig
+++ b/fs/Kconfig
@@ -267,6 +267,13 @@ config HUGETLB_PAGE_OPTIMIZE_VMEMMAP_DEFAULT_ON
 	  enable HVO by default. It can be disabled via hugetlb_free_vmemmap=off
 	  (boot command line) or hugetlb_optimize_vmemmap (sysctl).
+config ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING + bool + +config HUGETLB_HIGH_GRANULARITY_MAPPING + def_bool HUGETLB_PAGE + depends on ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING + config MEMFD_CREATE def_bool TMPFS || HUGETLBFS From patchwork Fri Oct 21 16:36:24 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13015077 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 1A1BAFA373E for ; Fri, 21 Oct 2022 16:37:27 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 774418E000C; Fri, 21 Oct 2022 12:37:26 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 6AE5D8E0001; Fri, 21 Oct 2022 12:37:26 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 3A2488E000B; Fri, 21 Oct 2022 12:37:26 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0010.hostedemail.com [216.40.44.10]) by kanga.kvack.org (Postfix) with ESMTP id 214A68E0001 for ; Fri, 21 Oct 2022 12:37:26 -0400 (EDT) Received: from smtpin08.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id E9A8480602 for ; Fri, 21 Oct 2022 16:37:25 +0000 (UTC) X-FDA: 80045512050.08.0162935 Received: from mail-ua1-f73.google.com (mail-ua1-f73.google.com [209.85.222.73]) by imf12.hostedemail.com (Postfix) with ESMTP id D41F14001A for ; Fri, 21 Oct 2022 16:37:24 +0000 (UTC) Received: by mail-ua1-f73.google.com with SMTP id k9-20020a9f30c9000000b003df15f05649so2386041uab.6 for ; Fri, 21 Oct 2022 09:37:24 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to 
:date:from:to:cc:subject:date:message-id:reply-to; bh=2b1VLSKb1qIoBhHv68uc7sEp7uA+Ju1UfoLt8HAduUY=; b=fFKq9A+3A5TkI1IGkVnB2YWBf8fVHVGy1IiVCq2xH2D+ukznX6/SfUTTA86X+pjDnq mbkAvkkQZk6J12PGLDzV/KGi2b7kOMU02Aktw1M0kEN5f6ZdQppBNHzpLJZ/2+Tr/2/g ncbFQte81ZEqmhzmntgBziWdoyhRcdyDj0o5k51Fk87D8MZ5FdHzTMkbyahm7686yY3i DZOqrcFZGBRSPmwIMgUobG/r3zXICtn1+y5idZYO2hGxg2n7PJaREU84U/umC4o0E0Ow /Kbl96xQv1BiA7GCRPg/XqDMmTLQZjscb0wPrtARpfKTcFdOZGNtTdI8o1uc4cbj1uEg 69aw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=2b1VLSKb1qIoBhHv68uc7sEp7uA+Ju1UfoLt8HAduUY=; b=4I+BYlM/1xLNQ6daUuxLl7KLqeiQfEiXho5tx9tPvzYsK4TNdLvXh4H5uZycYpyiE6 lUFuifHepaTwOkPlnNaXCQ72tJkPPukeXrIUVtkH9GBNBZ470Ch/OcACeP4r7distA/I ByLlXKiYwfgOxyK5hMOD1u5m+QO/P4KrbMp02KSSDIwxqHsuFs+BDCXa0P+t4cDJgUgb 60alyFjjp7xL8reWxYZdiK5oc7W9/Wc5iGuv5TjA9nNfD3uKMSTxUAHJ5CuOLoUj/hfL VENgh2dD9tOg/CXODCOqzgyLW57ycSTgUxOTVsSvWyE2D/QD0Iip8Fh7H1r/gB8EwqvE gv4g== X-Gm-Message-State: ACrzQf0GFLMkaQJa51jkOTvsCnKfzQQ+JlXDrTU5Cp/X0xmwjanwTyig 4KfQLJY35WWFovL6xzrqPanw5pATb3/aFckg X-Google-Smtp-Source: AMsMyM4U494lNIuvyXBYQh14d1SDt3UAjeXmdAvTI/QhS8Lp1qSrtxq6ioU81FqwlquHVCLPB+1LIMea9tgSCnQ7 X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:ab0:6847:0:b0:3f0:c29b:e14a with SMTP id a7-20020ab06847000000b003f0c29be14amr9896783uas.33.1666370244124; Fri, 21 Oct 2022 09:37:24 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:24 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-9-jthoughton@google.com> Subject: [RFC PATCH v2 08/47] hugetlb: add HGM enablement functions From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: 
David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1666370244; a=rsa-sha256; cv=none; b=tLP68V5I8E+FnZPxVwmmDC2oEPzCnBUMFcr5zpX8NBGvYG+rN32ouF21nnp/xxtb9EB8Fk vI8zFqMuv8pM7oWVvXuBoeHokvbWiaNrtC7g5H8lGTrN2BZmadRL77VbQQSnRxmdKu6T92 kQ1mWtwhaDTxignAE3ZNfj7kCrCKH8I= ARC-Authentication-Results: i=1; imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=fFKq9A+3; spf=pass (imf12.hostedemail.com: domain of 3xMpSYwoKCMUu4sz5rs4zyrzzrwp.nzxwty58-xxv6lnv.z2r@flex--jthoughton.bounces.google.com designates 209.85.222.73 as permitted sender) smtp.mailfrom=3xMpSYwoKCMUu4sz5rs4zyrzzrwp.nzxwty58-xxv6lnv.z2r@flex--jthoughton.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1666370244; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=2b1VLSKb1qIoBhHv68uc7sEp7uA+Ju1UfoLt8HAduUY=; b=ZRdbSyr3Ol2mGz66SGK9O/kkgqr2YovixdsCL20CUD8TQTGlqUePAhVHELSbz/E8TtN92/ y6xM56PkGtjkADkQ0DlKgBBsS4apUdkZ3oN8Of9/5uaC884y/ZwpAIqnwC2f+sOFE78kPB uXawF+ooDJwwxhAcIjpAZp4x0N9FDmo= X-Stat-Signature: tsf41e98t4xgepzm4gzow4yxmsweyjx7 X-Rspamd-Queue-Id: D41F14001A X-Rspam-User: X-Rspamd-Server: rspam03 Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=fFKq9A+3; spf=pass (imf12.hostedemail.com: domain of 3xMpSYwoKCMUu4sz5rs4zyrzzrwp.nzxwty58-xxv6lnv.z2r@flex--jthoughton.bounces.google.com designates 209.85.222.73 as permitted 
Currently it is possible for all shared VMAs to use HGM, but it must be
enabled first. This is because with HGM, we lose PMD sharing, and page
table walks require additional synchronization (we need to take the VMA
lock).

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 22 +++++++++++++
 mm/hugetlb.c            | 69 +++++++++++++++++++++++++++++++++++++++++
 2 files changed, 91 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 534958499ac4..6e0c36b08a0c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -123,6 +123,9 @@ struct hugetlb_vma_lock {
 
 struct hugetlb_shared_vma_data {
 	struct hugetlb_vma_lock vma_lock;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	bool hgm_enabled;
+#endif
 };
 
 extern struct resv_map *resv_map_alloc(void);
@@ -1179,6 +1182,25 @@ static inline void hugetlb_unregister_node(struct node *node)
 }
 #endif	/* CONFIG_HUGETLB_PAGE */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma);
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma);
+int enable_hugetlb_hgm(struct vm_area_struct *vma);
+#else
+static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	return false;
+}
+static inline bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	return false;
+}
+static inline int enable_hugetlb_hgm(struct vm_area_struct *vma)
+{
+	return -EINVAL;
+}
+#endif
+
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
 					struct mm_struct *mm, pte_t *pte)
 {
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5ae8bc8c928e..a18143add956 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6840,6 +6840,10 @@ static bool pmd_sharing_possible(struct vm_area_struct *vma)
 #ifdef CONFIG_USERFAULTFD
 	if (uffd_disable_huge_pmd_share(vma))
 		return false;
+#endif
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	if (hugetlb_hgm_enabled(vma))
+		return false;
 #endif
 	/*
 	 * Only shared VMAs can share PMDs.
@@ -7033,6 +7037,9 @@ static int hugetlb_vma_data_alloc(struct vm_area_struct *vma)
 	kref_init(&data->vma_lock.refs);
 	init_rwsem(&data->vma_lock.rw_sema);
 	data->vma_lock.vma = vma;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	data->hgm_enabled = false;
+#endif
 	vma->vm_private_data = data;
 	return 0;
 }
@@ -7290,6 +7297,68 @@ __weak unsigned long hugetlb_mask_last_page(struct hstate *h)
 
 #endif /* CONFIG_ARCH_WANT_GENERAL_HUGETLB */
 
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+bool hugetlb_hgm_eligible(struct vm_area_struct *vma)
+{
+	/*
+	 * All shared VMAs may have HGM.
+	 *
+	 * HGM requires using the VMA lock, which only exists for shared VMAs.
+	 * To make HGM work for private VMAs, we would need to use another
+	 * scheme to prevent collapsing/splitting from invalidating other
+	 * threads' page table walks.
+	 */
+	return vma && (vma->vm_flags & VM_MAYSHARE);
+}
+bool hugetlb_hgm_enabled(struct vm_area_struct *vma)
+{
+	struct hugetlb_shared_vma_data *data = vma->vm_private_data;
+
+	if (!vma || !(vma->vm_flags & VM_MAYSHARE))
+		return false;
+
+	return data && data->hgm_enabled;
+}
+
+/*
+ * Enable high-granularity mapping (HGM) for this VMA. Once enabled, HGM
+ * cannot be turned off.
+ *
+ * PMDs cannot be shared in HGM VMAs.
+ */
+int enable_hugetlb_hgm(struct vm_area_struct *vma)
+{
+	int ret;
+	struct hugetlb_shared_vma_data *data;
+
+	if (!hugetlb_hgm_eligible(vma))
+		return -EINVAL;
+
+	if (hugetlb_hgm_enabled(vma))
+		return 0;
+
+	/*
+	 * We must hold the mmap lock for writing so that callers can rely on
+	 * hugetlb_hgm_enabled returning a consistent result while holding
+	 * the mmap lock for reading.
+	 */
+	mmap_assert_write_locked(vma->vm_mm);
+
+	/* HugeTLB HGM requires the VMA lock to synchronize collapsing. */
+	ret = hugetlb_vma_data_alloc(vma);
+	if (ret)
+		return ret;
+
+	data = vma->vm_private_data;
+	BUG_ON(!data);
+	data->hgm_enabled = true;
+
+	/* We don't support PMD sharing with HGM. */
+	hugetlb_unshare_all_pmds(vma);
+	return 0;
+}
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+
 /*
  * These functions are overwritable if your architecture needs its own
  * behavior.
From patchwork Fri Oct 21 16:36:25 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015078
Date: Fri, 21 Oct 2022 16:36:25 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-10-jthoughton@google.com>
Subject: [RFC PATCH v2 09/47] hugetlb: make huge_pte_lockptr take an explicit shift argument.
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org

This is needed to handle PTL locking with high-granularity mapping. We
won't always be using the PMD-level PTL even if we're using the 2M
hugepage hstate. It's possible that we're dealing with 4K PTEs, in
which case, we need to lock the PTL for the 4K PTE.

Signed-off-by: James Houghton <jthoughton@google.com>
Reviewed-by: Mina Almasry
Acked-by: Mike Kravetz
---
 arch/powerpc/mm/pgtable.c | 3 ++-
 include/linux/hugetlb.h   | 9 ++++-----
 mm/hugetlb.c              | 7 ++++---
 mm/migrate.c              | 3 ++-
 4 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/pgtable.c b/arch/powerpc/mm/pgtable.c
index cb2dcdb18f8e..035a0df47af0 100644
--- a/arch/powerpc/mm/pgtable.c
+++ b/arch/powerpc/mm/pgtable.c
@@ -261,7 +261,8 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 	psize = hstate_get_psize(h);
 #ifdef CONFIG_DEBUG_VM
-	assert_spin_locked(huge_pte_lockptr(h, vma->vm_mm, ptep));
+	assert_spin_locked(huge_pte_lockptr(huge_page_shift(h),
+					    vma->vm_mm, ptep));
 #endif
 
 #else
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 6e0c36b08a0c..db3ed6095b1c 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -934,12 +934,11 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return modified_mask;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
-	if (huge_page_size(h) == PMD_SIZE)
+	if (shift == PMD_SHIFT)
 		return pmd_lockptr(mm, (pmd_t *) pte);
-	VM_BUG_ON(huge_page_size(h) == PAGE_SIZE);
 	return &mm->page_table_lock;
 }
 
@@ -1144,7 +1143,7 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask)
 	return 0;
 }
 
-static inline spinlock_t *huge_pte_lockptr(struct hstate *h,
+static inline spinlock_t *huge_pte_lockptr(unsigned int shift,
 					   struct mm_struct *mm, pte_t *pte)
 {
 	return &mm->page_table_lock;
@@ -1206,7 +1205,7 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 {
 	spinlock_t *ptl;
 
-	ptl = huge_pte_lockptr(h, mm, pte);
+	ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte);
 	spin_lock(ptl);
 	return ptl;
 }
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index a18143add956..ef7662bd0068 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4847,7 +4847,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		}
 
 		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(h, src, src_pte);
+		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
 again:
@@ -4925,7 +4925,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 
 			/* Install the new huge page if src pte stable */
 			dst_ptl = huge_pte_lock(h, dst, dst_pte);
-			src_ptl = huge_pte_lockptr(h, src, src_pte);
+			src_ptl = huge_pte_lockptr(huge_page_shift(h),
+						   src, src_pte);
 			spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 			entry = huge_ptep_get(src_pte);
 			if (!pte_same(src_pte_old, entry)) {
@@ -4979,7 +4980,7 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	pte_t pte;
 
 	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(h, mm, src_pte);
+	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
 
 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
diff --git a/mm/migrate.c b/mm/migrate.c
index 1457cdbb7828..a0105fa6e3b2 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -334,7 +334,8 @@ void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
 
 void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(hstate_vma(vma), vma->vm_mm, pte);
+	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
+					   vma->vm_mm, pte);
 	__migration_entry_wait_huge(pte, ptl);
 }
From patchwork Fri Oct 21 16:36:26 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015079
Date: Fri, 21 Oct 2022 16:36:26 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-11-jthoughton@google.com>
Subject: [RFC PATCH v2 10/47] hugetlb: add hugetlb_pte to track HugeTLB page table entries
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org

After high-granularity mapping, page table entries for HugeTLB pages can
be of any size/type. (For example, we can have a 1G page mapped with a
mix of PMDs and PTEs.) This struct is to help keep track of a HugeTLB
PTE after we have done a page table walk.

Without this, we'd have to pass around the "size" of the PTE everywhere.
We effectively did this before; it could be fetched from the hstate,
which we pass around pretty much everywhere.

hugetlb_pte_present_leaf is included here as a helper function that will
be used frequently later on.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/hugetlb.h | 88 +++++++++++++++++++++++++++++++++++++++++
 mm/hugetlb.c            | 29 ++++++++++++++
 2 files changed, 117 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index db3ed6095b1c..d30322108b34 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -50,6 +50,75 @@ enum {
 	__NR_USED_SUBPAGE,
 };
 
+enum hugetlb_level {
+	HUGETLB_LEVEL_PTE = 1,
+	/*
+	 * We always include PMD, PUD, and P4D in this enum definition so that,
+	 * when logged as an integer, we can easily tell which level it is.
+	 */
+	HUGETLB_LEVEL_PMD,
+	HUGETLB_LEVEL_PUD,
+	HUGETLB_LEVEL_P4D,
+	HUGETLB_LEVEL_PGD,
+};
+
+struct hugetlb_pte {
+	pte_t *ptep;
+	unsigned int shift;
+	enum hugetlb_level level;
+	spinlock_t *ptl;
+};
+
+static inline
+void hugetlb_pte_populate(struct hugetlb_pte *hpte, pte_t *ptep,
+			  unsigned int shift, enum hugetlb_level level)
+{
+	WARN_ON_ONCE(!ptep);
+	hpte->ptep = ptep;
+	hpte->shift = shift;
+	hpte->level = level;
+	hpte->ptl = NULL;
+}
+
+static inline
+unsigned long hugetlb_pte_size(const struct hugetlb_pte *hpte)
+{
+	WARN_ON_ONCE(!hpte->ptep);
+	return 1UL << hpte->shift;
+}
+
+static inline
+unsigned long hugetlb_pte_mask(const struct hugetlb_pte *hpte)
+{
+	WARN_ON_ONCE(!hpte->ptep);
+	return ~(hugetlb_pte_size(hpte) - 1);
+}
+
+static inline
+unsigned int hugetlb_pte_shift(const struct hugetlb_pte *hpte)
+{
+	WARN_ON_ONCE(!hpte->ptep);
+	return hpte->shift;
+}
+
+static inline
+enum hugetlb_level hugetlb_pte_level(const struct hugetlb_pte *hpte)
+{
+	WARN_ON_ONCE(!hpte->ptep);
+	return hpte->level;
+}
+
+static inline
+void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src)
+{
+	dest->ptep = src->ptep;
+	dest->shift = src->shift;
+	dest->level = src->level;
+	dest->ptl = src->ptl;
+}
+
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte);
+
 struct hugepage_subpool {
 	spinlock_t lock;
 	long count;
@@ -1210,6 +1279,25 @@ static inline spinlock_t *huge_pte_lock(struct hstate *h,
 	return ptl;
 }
 
+static inline
+spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte)
+{
+
+	BUG_ON(!hpte->ptep);
+	if (hpte->ptl)
+		return hpte->ptl;
+	return huge_pte_lockptr(hugetlb_pte_shift(hpte), mm, hpte->ptep);
+}
+
+static inline
+spinlock_t *hugetlb_pte_lock(struct mm_struct *mm, struct hugetlb_pte *hpte)
+{
+	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
+
+	spin_lock(ptl);
+	return ptl;
+}
+
 #if defined(CONFIG_HUGETLB_PAGE) && defined(CONFIG_CMA)
 extern void __init hugetlb_cma_reserve(int order);
 #else
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index ef7662bd0068..a0e46d35dabc 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -1127,6 +1127,35 @@ static bool vma_has_reserves(struct vm_area_struct *vma, long chg)
 	return false;
 }
 
+bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte)
+{
+	pgd_t pgd;
+	p4d_t p4d;
+	pud_t pud;
+	pmd_t pmd;
+
+	WARN_ON_ONCE(!hpte->ptep);
+	switch (hugetlb_pte_level(hpte)) {
+	case HUGETLB_LEVEL_PGD:
+		pgd = __pgd(pte_val(pte));
+		return pgd_present(pgd) && pgd_leaf(pgd);
+	case HUGETLB_LEVEL_P4D:
+		p4d = __p4d(pte_val(pte));
+		return p4d_present(p4d) && p4d_leaf(p4d);
+	case HUGETLB_LEVEL_PUD:
+		pud = __pud(pte_val(pte));
+		return pud_present(pud) && pud_leaf(pud);
+	case HUGETLB_LEVEL_PMD:
+		pmd = __pmd(pte_val(pte));
+		return pmd_present(pmd) && pmd_leaf(pmd);
+	case HUGETLB_LEVEL_PTE:
+		return pte_present(pte);
+	default:
+		WARN_ON_ONCE(1);
+		return false;
+	}
+}
+
 static void enqueue_huge_page(struct hstate *h, struct page *page)
 {
 	int nid = page_to_nid(page);
From patchwork Fri Oct 21 16:36:27 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015080
Date: Fri, 21 Oct 2022 16:36:27 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-12-jthoughton@google.com>
Subject: [RFC PATCH v2 11/47] hugetlb: add hugetlb_pmd_alloc and hugetlb_pte_alloc
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
 "Zach O'Keefe", Manish Mishra, Naoya Horiguchi,
 "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka,
 Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton,
 linux-mm@kvack.org, linux-kernel@vger.kernel.org

These functions are used to allocate new PTEs below the hstate PTE. This
will be used by hugetlb_walk_step, which implements stepping forwards in
a HugeTLB high-granularity page table walk.

The reasons that we don't use the standard pmd_alloc/pte_alloc*
functions are:
 1) This prevents us from accidentally overwriting swap entries or
    attempting to use swap entries as present non-leaf PTEs (see
    pmd_alloc(); we assume that !pte_none means pte_present and
    non-leaf).
 2) Locking hugetlb PTEs can be different from locking regular PTEs.
    (Although, as implemented right now, locking is the same.)
 3) We can maintain compatibility with CONFIG_HIGHPTE. That is, HugeTLB
    HGM won't use HIGHPTE, but the kernel can still be built with it,
    and other mm code will use it.
When GENERAL_HUGETLB supports P4D-based hugepages, we will need to implement hugetlb_pud_alloc to implement hugetlb_walk_step. Signed-off-by: James Houghton --- include/linux/hugetlb.h | 5 +++ mm/hugetlb.c | 94 +++++++++++++++++++++++++++++++++++++++++ 2 files changed, 99 insertions(+) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index d30322108b34..003255b0e40f 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -119,6 +119,11 @@ void hugetlb_pte_copy(struct hugetlb_pte *dest, const struct hugetlb_pte *src) bool hugetlb_pte_present_leaf(const struct hugetlb_pte *hpte, pte_t pte); +pmd_t *hugetlb_pmd_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr); +pte_t *hugetlb_pte_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr); + struct hugepage_subpool { spinlock_t lock; long count; diff --git a/mm/hugetlb.c b/mm/hugetlb.c index a0e46d35dabc..e3733388adee 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -341,6 +341,100 @@ static bool has_same_uncharge_info(struct file_region *rg, #endif } +pmd_t *hugetlb_pmd_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte, + unsigned long addr) +{ + spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte); + pmd_t *new; + pud_t *pudp; + pud_t pud; + + if (hpte->level != HUGETLB_LEVEL_PUD) + return ERR_PTR(-EINVAL); + + pudp = (pud_t *)hpte->ptep; +retry: + pud = *pudp; + if (likely(pud_present(pud))) + return unlikely(pud_leaf(pud)) + ? ERR_PTR(-EEXIST) + : pmd_offset(pudp, addr); + else if (!huge_pte_none(huge_ptep_get(hpte->ptep))) + /* + * Not present and not none means that a swap entry lives here, + * and we can't get rid of it. 
+		 */
+		return ERR_PTR(-EEXIST);
+
+	new = pmd_alloc_one(mm, addr);
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock(ptl);
+	if (!pud_same(pud, *pudp)) {
+		spin_unlock(ptl);
+		pmd_free(mm, new);
+		goto retry;
+	}
+
+	mm_inc_nr_pmds(mm);
+	smp_wmb(); /* See comment in pmd_install() */
+	pud_populate(mm, pudp, new);
+	spin_unlock(ptl);
+	return pmd_offset(pudp, addr);
+}
+
+pte_t *hugetlb_pte_alloc(struct mm_struct *mm, struct hugetlb_pte *hpte,
+			 unsigned long addr)
+{
+	spinlock_t *ptl = hugetlb_pte_lockptr(mm, hpte);
+	pgtable_t new;
+	pmd_t *pmdp;
+	pmd_t pmd;
+
+	if (hpte->level != HUGETLB_LEVEL_PMD)
+		return ERR_PTR(-EINVAL);
+
+	pmdp = (pmd_t *)hpte->ptep;
+retry:
+	pmd = *pmdp;
+	if (likely(pmd_present(pmd)))
+		return unlikely(pmd_leaf(pmd))
+			? ERR_PTR(-EEXIST)
+			: pte_offset_kernel(pmdp, addr);
+	else if (!huge_pte_none(huge_ptep_get(hpte->ptep)))
+		/*
+		 * Not present and not none means that a swap entry lives here,
+		 * and we can't get rid of it.
+		 */
+		return ERR_PTR(-EEXIST);
+
+	/*
+	 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
+	 * in page tables being allocated in high memory, needing a kmap to
+	 * access. Instead, we call __pte_alloc_one directly with
+	 * GFP_PGTABLE_USER to prevent these PTEs being allocated in high
+	 * memory.
+	 */
+	new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
+	if (!new)
+		return ERR_PTR(-ENOMEM);
+
+	spin_lock(ptl);
+	if (!pmd_same(pmd, *pmdp)) {
+		spin_unlock(ptl);
+		pgtable_pte_page_dtor(new);
+		__free_page(new);
+		goto retry;
+	}
+
+	mm_inc_nr_ptes(mm);
+	smp_wmb(); /* See comment in pmd_install() */
+	pmd_populate(mm, pmdp, new);
+	spin_unlock(ptl);
+	return pte_offset_kernel(pmdp, addr);
+}
+
 static void coalesce_file_region(struct resv_map *resv, struct file_region *rg)
 {
 	struct file_region *nrg, *prg;

From patchwork Fri Oct 21 16:36:28 2022
Date: Fri, 21 Oct 2022 16:36:28 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References:
<20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-13-jthoughton@google.com>
Subject: [RFC PATCH v2 12/47] hugetlb: add hugetlb_hgm_walk and hugetlb_walk_step
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
hugetlb_hgm_walk implements high-granularity page table walks for HugeTLB. It is safe to call on non-HGM enabled VMAs; it will return immediately.

hugetlb_walk_step implements how we step forwards in the walk. Architectures that don't use GENERAL_HUGETLB will need to provide their own implementation.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  13 +++++
 mm/hugetlb.c            | 125 ++++++++++++++++++++++++++++++++++++++++
 2 files changed, 138 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 003255b0e40f..4b1548adecde 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -276,6 +276,10 @@ u32 hugetlb_fault_mutex_hash(struct address_space *mapping, pgoff_t idx);
 pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long addr, pud_t *pud);
 
+int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
+		struct hugetlb_pte *hpte, unsigned long addr,
+		unsigned long sz, bool stop_at_none);
+
 struct address_space *hugetlb_page_mapping_lock_write(struct page *hpage);
 
 extern int sysctl_hugetlb_shm_group;
@@ -288,6 +292,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
 		unsigned long sz);
 unsigned long hugetlb_mask_last_page(struct
hstate *h);
+int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr, unsigned long sz);
 int huge_pmd_unshare(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long addr, pte_t *ptep);
 void adjust_range_if_pmd_sharing_possible(struct vm_area_struct *vma,
@@ -1066,6 +1072,8 @@ void hugetlb_register_node(struct node *node);
 void hugetlb_unregister_node(struct node *node);
 #endif
 
+enum hugetlb_level hpage_size_to_level(unsigned long sz);
+
 #else /* CONFIG_HUGETLB_PAGE */
 
 struct hstate {};
@@ -1253,6 +1261,11 @@ static inline void hugetlb_register_node(struct node *node)
 static inline void hugetlb_unregister_node(struct node *node)
 {
 }
+
+static inline enum hugetlb_level hpage_size_to_level(unsigned long sz)
+{
+	return HUGETLB_LEVEL_PTE;
+}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 #ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index e3733388adee..90db59632559 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -95,6 +95,29 @@ static void hugetlb_vma_data_free(struct vm_area_struct *vma);
 static int hugetlb_vma_data_alloc(struct vm_area_struct *vma);
 static void __hugetlb_vma_unlock_write_free(struct vm_area_struct *vma);
 
+/*
+ * hpage_size_to_level() - convert @sz to the corresponding page table level
+ *
+ * @sz must be less than or equal to a valid hugepage size.
+ */
+enum hugetlb_level hpage_size_to_level(unsigned long sz)
+{
+	/*
+	 * We order the conditionals from smallest to largest to pick the
+	 * smallest level when multiple levels have the same size (i.e.,
+	 * when levels are folded).
+	 */
+	if (sz < PMD_SIZE)
+		return HUGETLB_LEVEL_PTE;
+	if (sz < PUD_SIZE)
+		return HUGETLB_LEVEL_PMD;
+	if (sz < P4D_SIZE)
+		return HUGETLB_LEVEL_PUD;
+	if (sz < PGDIR_SIZE)
+		return HUGETLB_LEVEL_P4D;
+	return HUGETLB_LEVEL_PGD;
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -7321,6 +7344,70 @@ bool want_pmd_share(struct vm_area_struct *vma, unsigned long addr)
 }
 #endif /* CONFIG_ARCH_WANT_HUGE_PMD_SHARE */
 
+/* hugetlb_hgm_walk - walks a high-granularity HugeTLB page table to resolve
+ * the page table entry for @addr.
+ *
+ * @hpte must always be pointing at an hstate-level PTE (or deeper).
+ *
+ * This function will never walk further if it encounters a PTE of a size
+ * less than or equal to @sz.
+ *
+ * @stop_at_none determines what we do when we encounter an empty PTE. If true,
+ * we return that PTE. If false and @sz is less than the current PTE's size,
+ * we make that PTE point to the next level down, going until @sz is the same
+ * as our current PTE.
+ *
+ * If @stop_at_none is true and @sz is PAGE_SIZE, this function will always
+ * succeed, but that does not guarantee that hugetlb_pte_size(hpte) is @sz.
+ *
+ * Return:
+ *	-ENOMEM if we couldn't allocate new PTEs.
+ *	-EEXIST if the caller wanted to walk further than a migration PTE,
+ *		poison PTE, or a PTE marker. The caller needs to manually deal
+ *		with this scenario.
+ *	-EINVAL if called with invalid arguments (@sz invalid, @hpte not
+ *		initialized).
+ *	0 otherwise.
+ *
+ * Even if this function fails, @hpte is guaranteed to always remain
+ * valid.
+ */
+int hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
+		struct hugetlb_pte *hpte, unsigned long addr,
+		unsigned long sz, bool stop_at_none)
+{
+	int ret = 0;
+	pte_t pte;
+
+	if (WARN_ON_ONCE(sz < PAGE_SIZE))
+		return -EINVAL;
+
+	if (!hugetlb_hgm_enabled(vma)) {
+		if (stop_at_none)
+			return 0;
+		return sz == huge_page_size(hstate_vma(vma)) ?
+			0 : -EINVAL;
+	}
+
+	hugetlb_vma_assert_locked(vma);
+
+	if (WARN_ON_ONCE(!hpte->ptep))
+		return -EINVAL;
+
+	while (hugetlb_pte_size(hpte) > sz && !ret) {
+		pte = huge_ptep_get(hpte->ptep);
+		if (!pte_present(pte)) {
+			if (stop_at_none)
+				return 0;
+			if (unlikely(!huge_pte_none(pte)))
+				return -EEXIST;
+		} else if (hugetlb_pte_present_leaf(hpte, pte))
+			return 0;
+		ret = hugetlb_walk_step(mm, hpte, addr, sz);
+	}
+
+	return ret;
+}
+
 #ifdef CONFIG_ARCH_WANT_GENERAL_HUGETLB
 pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 		unsigned long addr, unsigned long sz)
@@ -7388,6 +7475,44 @@ pte_t *huge_pte_offset(struct mm_struct *mm,
 	return (pte_t *)pmd;
 }
 
+/*
+ * hugetlb_walk_step() - Walk the page table one step to resolve the page
+ * (hugepage or subpage) entry at address @addr.
+ *
+ * @sz always points at the final target PTE size (e.g. PAGE_SIZE for the
+ * lowest level PTE).
+ *
+ * @hpte will always remain valid, even if this function fails.
+ */
+int hugetlb_walk_step(struct mm_struct *mm, struct hugetlb_pte *hpte,
+		unsigned long addr, unsigned long sz)
+{
+	pte_t *ptep;
+	spinlock_t *ptl;
+
+	switch (hpte->level) {
+	case HUGETLB_LEVEL_PUD:
+		ptep = (pte_t *)hugetlb_pmd_alloc(mm, hpte, addr);
+		if (IS_ERR(ptep))
+			return PTR_ERR(ptep);
+		hugetlb_pte_populate(hpte, ptep, PMD_SHIFT, HUGETLB_LEVEL_PMD);
+		break;
+	case HUGETLB_LEVEL_PMD:
+		ptep = hugetlb_pte_alloc(mm, hpte, addr);
+		if (IS_ERR(ptep))
+			return PTR_ERR(ptep);
+		ptl = pte_lockptr(mm, (pmd_t *)hpte->ptep);
+		hugetlb_pte_populate(hpte, ptep, PAGE_SHIFT, HUGETLB_LEVEL_PTE);
+		hpte->ptl = ptl;
+		break;
+	default:
+		WARN_ONCE(1, "%s: got invalid level: %d (shift: %d)\n",
+				__func__, hpte->level, hpte->shift);
+		return -EINVAL;
+	}
+	return 0;
+}
+
 /*
  * Return a mask that can be used to update an address to the last huge
  * page in a page table page mapping size.
Used to skip non-present

From patchwork Fri Oct 21 16:36:29 2022
Date: Fri, 21 Oct 2022 16:36:29 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-14-jthoughton@google.com>
Subject: [RFC PATCH v2 13/47] hugetlb: add make_huge_pte_with_shift
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr.
David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This allows us to make huge PTEs at shifts other than the hstate shift, which will be necessary for high-granularity mappings.

Signed-off-by: James Houghton
Acked-by: Mike Kravetz
---
 mm/hugetlb.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 90db59632559..74a4afda1a7e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4867,11 +4867,11 @@ const struct vm_operations_struct hugetlb_vm_ops = {
 	.pagesize = hugetlb_vm_op_pagesize,
 };
 
-static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
-		int writable)
+static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma,
+				      struct page *page, int writable,
+				      int shift)
 {
 	pte_t entry;
-	unsigned int shift = huge_page_shift(hstate_vma(vma));
 
 	if (writable) {
 		entry = huge_pte_mkwrite(huge_pte_mkdirty(mk_pte(page,
@@ -4885,6 +4885,14 @@ static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
 	return entry;
 }
 
+static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page,
+		int writable)
+{
+	unsigned int shift = huge_page_shift(hstate_vma(vma));
+
+	return make_huge_pte_with_shift(vma, page, writable, shift);
+}
+
 static void set_huge_ptep_writable(struct vm_area_struct *vma,
 		unsigned long address, pte_t *ptep)
 {

From patchwork Fri Oct 21 16:36:30 2022
Date: Fri, 21 Oct 2022 16:36:30 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-15-jthoughton@google.com>
Subject: [RFC PATCH v2 14/47] hugetlb: make default arch_make_huge_pte understand small mappings
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr.
David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This is a simple change: don't create a "huge" PTE if we are making a regular, PAGE_SIZE PTE. All architectures that want to implement HGM likely need to be changed in a similar way if they implement their own version of arch_make_huge_pte.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 4b1548adecde..d305742e9d44 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -907,7 +907,7 @@ static inline void arch_clear_hugepage_flags(struct page *page) { }
 static inline pte_t arch_make_huge_pte(pte_t entry, unsigned int shift,
 				       vm_flags_t flags)
 {
-	return pte_mkhuge(entry);
+	return shift > PAGE_SHIFT ? pte_mkhuge(entry) : entry;
 }
 #endif

From patchwork Fri Oct 21 16:36:31 2022
Date: Fri, 21 Oct 2022 16:36:31 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-16-jthoughton@google.com>
Subject: [RFC PATCH v2 15/47] hugetlbfs: for unmapping, treat HGM-mapped pages as potentially mapped
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
hugetlb_vma_maps_page was mostly used as an optimization: if the VMA
isn't mapping a page, then we don't have to attempt to unmap it again.
We are still able to call the unmap routine if we need to.

For high-granularity mapped pages, we can't easily do a full walk to
see if the page is actually mapped or not, so simply return that it
might be.

Signed-off-by: James Houghton
---
 fs/hugetlbfs/inode.c | 27 +++++++++++++++++++++------
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c
index 7f836f8f9db1..a7ab62e39b8c 100644
--- a/fs/hugetlbfs/inode.c
+++ b/fs/hugetlbfs/inode.c
@@ -383,21 +383,34 @@ static void hugetlb_delete_from_page_cache(struct folio *folio)
  * mutex for the page in the mapping.
  * So, we can not race with page being
  * faulted into the vma.
  */
-static bool hugetlb_vma_maps_page(struct vm_area_struct *vma,
-				unsigned long addr, struct page *page)
+static bool hugetlb_vma_maybe_maps_page(struct vm_area_struct *vma,
+				unsigned long addr, struct page *page)
 {
 	pte_t *ptep, pte;
+	struct hugetlb_pte hpte;
+	struct hstate *h = hstate_vma(vma);
 
-	ptep = huge_pte_offset(vma->vm_mm, addr,
-			huge_page_size(hstate_vma(vma)));
+	ptep = huge_pte_offset(vma->vm_mm, addr, huge_page_size(h));
 	if (!ptep)
 		return false;
 
+	hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h),
+			hpage_size_to_level(huge_page_size(h)));
+
 	pte = huge_ptep_get(ptep);
 	if (huge_pte_none(pte) || !pte_present(pte))
 		return false;
 
+	if (!hugetlb_pte_present_leaf(&hpte, pte))
+		/*
+		 * The top-level PTE is not a leaf, so it's possible that a PTE
+		 * under us is mapping the page. We aren't holding the VMA
+		 * lock, so it is unsafe to continue the walk further. Instead,
+		 * return true to indicate that we might be mapping the page.
+		 */
+		return true;
+
 	if (pte_page(pte) == page)
 		return true;
 
@@ -457,7 +470,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
 
-		if (!hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
+		if (!hugetlb_vma_maybe_maps_page(vma, vma->vm_start + v_start,
+					page))
 			continue;
 
 		if (!hugetlb_vma_trylock_write(vma)) {
@@ -507,7 +521,8 @@ static void hugetlb_unmap_file_folio(struct hstate *h,
 		 */
 		v_start = vma_offset_start(vma, start);
 		v_end = vma_offset_end(vma, end);
-		if (hugetlb_vma_maps_page(vma, vma->vm_start + v_start, page))
+		if (hugetlb_vma_maybe_maps_page(vma, vma->vm_start + v_start,
+					page))
 			unmap_hugepage_range(vma, vma->vm_start + v_start,
 						v_end, NULL,
 						ZAP_FLAG_DROP_MARKER);

From patchwork Fri Oct 21 16:36:32 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015085
Date: Fri, 21 Oct 2022 16:36:32 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-17-jthoughton@google.com>
Subject: [RFC PATCH v2 16/47] hugetlb: make unmapping compatible with high-granularity mappings
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
Enlighten __unmap_hugepage_range to deal with high-granularity mappings.
This doesn't change its API; it still must be called with hugepage
alignment, but it will correctly unmap hugepages that have been mapped
at high granularity.

The rules for mapcount and refcount here are:
1. Refcount and mapcount are tracked on the head page.
2. Each page table mapping into some of an hpage will increase that
   hpage's mapcount and refcount by 1.

Eventually, functionality here can be expanded to allow users to call
MADV_DONTNEED on PAGE_SIZE-aligned sections of a hugepage, but that is
not done here.
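The two rules above can be illustrated with a tiny userspace model: every
page-table mapping that covers any part of a hugepage, whatever its
granularity, bumps the head page's mapcount and refcount by one. All of
the names below (`toy_hpage`, `toy_map`, `toy_unmap`) are illustrative
only, not kernel API:

```c
#include <assert.h>

/* Toy model of the rules above: both counts live on the head page. */
struct toy_hpage {
	int refcount;	/* e.g. starts at 1 for the page cache's reference */
	int mapcount;
};

/* One page-table mapping into any part of the hugepage: +1 to each. */
static void toy_map(struct toy_hpage *hp)
{
	hp->refcount++;
	hp->mapcount++;
}

/* Clearing one page-table mapping: -1 to each. */
static void toy_unmap(struct toy_hpage *hp)
{
	hp->refcount--;
	hp->mapcount--;
}
```

Under this model, a 2M hugepage mapped by a single PMD leaf costs one
reference, while the same page mapped at 4K granularity costs 512: one per
PTE. That is consistent with the unmap path in this patch, which subtracts
`hugetlb_pte_size(&hpte)/PAGE_SIZE` base pages from the hugetlb count but
drops exactly one head-page reference per page table entry it clears.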
Signed-off-by: James Houghton
---
 include/asm-generic/tlb.h |  6 ++--
 mm/hugetlb.c              | 76 +++++++++++++++++++++++++--------------
 2 files changed, 52 insertions(+), 30 deletions(-)

diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 492dce43236e..c378a44915a9 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -566,9 +566,9 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
+#define tlb_remove_huge_tlb_entry(tlb, hpte, address)	\
 	do {							\
-		unsigned long _sz = huge_page_size(h);		\
+		unsigned long _sz = hugetlb_pte_size(&hpte);	\
 		if (_sz >= P4D_SIZE)				\
 			tlb_flush_p4d_range(tlb, address, _sz);	\
 		else if (_sz >= PUD_SIZE)			\
@@ -577,7 +577,7 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 			tlb_flush_pmd_range(tlb, address, _sz);	\
 		else						\
 			tlb_flush_pte_range(tlb, address, _sz);	\
-		__tlb_remove_tlb_entry(tlb, ptep, address);	\
+		__tlb_remove_tlb_entry(tlb, hpte.ptep, address);\
 	} while (0)
 
 /**
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 74a4afda1a7e..227150c25763 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5221,10 +5221,10 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long address;
-	pte_t *ptep;
+	struct hugetlb_pte hpte;
 	pte_t pte;
 	spinlock_t *ptl;
-	struct page *page;
+	struct page *hpage, *subpage;
 	struct hstate *h = hstate_vma(vma);
 	unsigned long sz = huge_page_size(h);
 	struct mmu_notifier_range range;
@@ -5235,11 +5235,6 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	BUG_ON(start & ~huge_page_mask(h));
 	BUG_ON(end & ~huge_page_mask(h));
 
-	/*
-	 * This is a hugetlb vma, all the pte entries should point
-	 * to huge page.
-	 */
-	tlb_change_page_size(tlb, sz);
 	tlb_start_vma(tlb, vma);
 
 	/*
@@ -5251,26 +5246,35 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 	mmu_notifier_invalidate_range_start(&range);
 	last_addr_mask = hugetlb_mask_last_page(h);
 	address = start;
-	for (; address < end; address += sz) {
-		ptep = huge_pte_offset(mm, address, sz);
+
+	while (address < end) {
+		pte_t *ptep = huge_pte_offset(mm, address, sz);
+
 		if (!ptep) {
 			address |= last_addr_mask;
+			address += sz;
 			continue;
 		}
 
+		hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h),
+				hpage_size_to_level(huge_page_size(h)));
+		hugetlb_hgm_walk(mm, vma, &hpte, address,
+				PAGE_SIZE, /*stop_at_none=*/true);
 
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		ptl = hugetlb_pte_lock(mm, &hpte);
+		if (huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
 			spin_unlock(ptl);
 			tlb_flush_pmd_range(tlb, address & PUD_MASK, PUD_SIZE);
 			force_flush = true;
 			address |= last_addr_mask;
+			address += sz;
 			continue;
 		}
 
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
+
 		if (huge_pte_none(pte)) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 
 		/*
@@ -5287,25 +5291,36 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			 */
 			if (pte_swp_uffd_wp_any(pte) &&
 			    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-				set_huge_pte_at(mm, address, ptep,
+				set_huge_pte_at(mm, address, hpte.ptep,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 			else
 #endif
-				huge_pte_clear(mm, address, ptep, sz);
+				huge_pte_clear(mm, address, hpte.ptep,
+						hugetlb_pte_size(&hpte));
+			spin_unlock(ptl);
+			goto next_hpte;
+		}
+
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+			/*
+			 * We raced with someone splitting out from under us.
+			 * Retry the walk.
+			 */
 			spin_unlock(ptl);
 			continue;
 		}
 
-		page = pte_page(pte);
+		subpage = pte_page(pte);
+		hpage = compound_head(subpage);
 
 		/*
 		 * If a reference page is supplied, it is because a specific
 		 * page is being unmapped, not a range. Ensure the page we
 		 * are about to unmap is the actual page of interest.
 		 */
 		if (ref_page) {
-			if (page != ref_page) {
+			if (hpage != ref_page) {
 				spin_unlock(ptl);
-				continue;
+				goto next_hpte;
 			}
 			/*
 			 * Mark the VMA as having unmapped its page so that
@@ -5315,27 +5330,34 @@ static void __unmap_hugepage_range(struct mmu_gather *tlb, struct vm_area_struct
 			set_vma_resv_flags(vma, HPAGE_RESV_UNMAPPED);
 		}
 
-		pte = huge_ptep_get_and_clear(mm, address, ptep);
-		tlb_remove_huge_tlb_entry(h, tlb, ptep, address);
+		pte = huge_ptep_get_and_clear(mm, address, hpte.ptep);
+		tlb_change_page_size(tlb, hugetlb_pte_size(&hpte));
+		tlb_remove_huge_tlb_entry(tlb, hpte, address);
 		if (huge_pte_dirty(pte))
-			set_page_dirty(page);
+			set_page_dirty(hpage);
 #ifdef CONFIG_PTE_MARKER_UFFD_WP
 		/* Leave a uffd-wp pte marker if needed */
 		if (huge_pte_uffd_wp(pte) &&
 		    !(zap_flags & ZAP_FLAG_DROP_MARKER))
-			set_huge_pte_at(mm, address, ptep,
+			set_huge_pte_at(mm, address, hpte.ptep,
 					make_pte_marker(PTE_MARKER_UFFD_WP));
 #endif
-		hugetlb_count_sub(pages_per_huge_page(h), mm);
-		page_remove_rmap(page, vma, true);
+		hugetlb_count_sub(hugetlb_pte_size(&hpte)/PAGE_SIZE, mm);
+		page_remove_rmap(hpage, vma, true);
 
 		spin_unlock(ptl);
-		tlb_remove_page_size(tlb, page, huge_page_size(h));
 		/*
-		 * Bail out after unmapping reference page if supplied
+		 * Lower the reference count on the head page.
+		 */
+		tlb_remove_page_size(tlb, hpage, sz);
+		/*
+		 * Bail out after unmapping reference page if supplied,
+		 * and there's only one PTE mapping this page.
 		 */
-		if (ref_page)
+		if (ref_page && hugetlb_pte_size(&hpte) == sz)
 			break;
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 	mmu_notifier_invalidate_range_end(&range);
 	tlb_end_vma(tlb, vma);

From patchwork Fri Oct 21 16:36:33 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015116
Date: Fri, 21 Oct 2022 16:36:33 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-18-jthoughton@google.com>
Subject: [RFC PATCH v2 17/47] hugetlb: make hugetlb_change_protection compatible with HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
The main change here is to do a high-granularity walk and to pull the
shift from the walk (not from the hstate).

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 65 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 45 insertions(+), 20 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 227150c25763..2d096cef53cd 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6654,15 +6654,15 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 {
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long start = address;
-	pte_t *ptep;
 	pte_t pte;
 	struct hstate *h = hstate_vma(vma);
-	unsigned long pages = 0, psize = huge_page_size(h);
+	unsigned long base_pages = 0, psize = huge_page_size(h);
 	bool shared_pmd = false;
 	struct mmu_notifier_range range;
 	unsigned long last_addr_mask;
 	bool uffd_wp = cp_flags & MM_CP_UFFD_WP;
 	bool uffd_wp_resolve = cp_flags & MM_CP_UFFD_WP_RESOLVE;
+	struct hugetlb_pte hpte;
 
 	/*
 	 * In the case of shared PMDs, the area to flush could be beyond
@@ -6680,31 +6680,38 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(vma->vm_file->f_mapping);
 	last_addr_mask = hugetlb_mask_last_page(h);
-	for (; address < end; address += psize) {
+	while (address < end) {
 		spinlock_t *ptl;
-		ptep = huge_pte_offset(mm, address, psize);
+		pte_t *ptep = huge_pte_offset(mm, address, psize);
+
 		if (!ptep) {
 			address |= last_addr_mask;
+			address += huge_page_size(h);
 			continue;
 		}
-		ptl = huge_pte_lock(h, mm, ptep);
-		if (huge_pmd_unshare(mm, vma, address, ptep)) {
+		hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h),
+				hpage_size_to_level(psize));
+
+		hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE,
+				/*stop_at_none=*/true);
+
+		ptl = hugetlb_pte_lock(mm, &hpte);
+		if (huge_pmd_unshare(mm, vma, address, hpte.ptep)) {
 			/*
 			 * When uffd-wp is enabled on the vma, unshare
 			 * shouldn't happen at all.  Warn about it if it
 			 * happened due to some reason.
 			 */
 			WARN_ON_ONCE(uffd_wp || uffd_wp_resolve);
-			pages++;
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 			spin_unlock(ptl);
 			shared_pmd = true;
 			address |= last_addr_mask;
-			continue;
+			goto next_hpte;
 		}
-		pte = huge_ptep_get(ptep);
+		pte = huge_ptep_get(hpte.ptep);
 		if (unlikely(is_hugetlb_entry_hwpoisoned(pte))) {
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 		if (unlikely(is_hugetlb_entry_migration(pte))) {
 			swp_entry_t entry = pte_to_swp_entry(pte);
@@ -6724,11 +6731,11 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 					newpte = pte_swp_mkuffd_wp(newpte);
 				else if (uffd_wp_resolve)
 					newpte = pte_swp_clear_uffd_wp(newpte);
-				set_huge_pte_at(mm, address, ptep, newpte);
-				pages++;
+				set_huge_pte_at(mm, address, hpte.ptep, newpte);
+				base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 			}
 			spin_unlock(ptl);
-			continue;
+			goto next_hpte;
 		}
 		if (unlikely(pte_marker_uffd_wp(pte))) {
 			/*
@@ -6736,21 +6743,37 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 			 * no need for huge_ptep_modify_prot_start/commit().
 			 */
 			if (uffd_wp_resolve)
-				huge_pte_clear(mm, address, ptep, psize);
+				huge_pte_clear(mm, address, hpte.ptep,
+						hugetlb_pte_size(&hpte));
 		}
 		if (!huge_pte_none(pte)) {
 			pte_t old_pte;
-			unsigned int shift = huge_page_shift(hstate_vma(vma));
+			unsigned int shift = hpte.shift;
 
-			old_pte = huge_ptep_modify_prot_start(vma, address, ptep);
+			/*
+			 * Because we are holding the VMA lock for writing, pte
+			 * will always be a leaf. WARN if it is not.
+			 */
+			if (unlikely(!hugetlb_pte_present_leaf(&hpte, pte))) {
+				spin_unlock(ptl);
+				WARN_ONCE(1, "Unexpected non-leaf PTE: ptep:%p, address:0x%lx\n",
+					  hpte.ptep, address);
+				continue;
+			}
+
+			old_pte = huge_ptep_modify_prot_start(
+					vma, address, hpte.ptep);
 			pte = huge_pte_modify(old_pte, newprot);
-			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
+			pte = arch_make_huge_pte(
+					pte, shift, vma->vm_flags);
 			if (uffd_wp)
 				pte = huge_pte_mkuffd_wp(huge_pte_wrprotect(pte));
 			else if (uffd_wp_resolve)
 				pte = huge_pte_clear_uffd_wp(pte);
-			huge_ptep_modify_prot_commit(vma, address, ptep, old_pte, pte);
-			pages++;
+			huge_ptep_modify_prot_commit(
+					vma, address, hpte.ptep,
+					old_pte, pte);
+			base_pages += hugetlb_pte_size(&hpte) / PAGE_SIZE;
 		} else {
 			/* None pte */
 			if (unlikely(uffd_wp))
@@ -6759,6 +6782,8 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 						make_pte_marker(PTE_MARKER_UFFD_WP));
 		}
 		spin_unlock(ptl);
+next_hpte:
+		address += hugetlb_pte_size(&hpte);
 	}
 	/*
 	 * Must flush TLB before releasing i_mmap_rwsem: x86's huge_pmd_unshare
@@ -6781,7 +6806,7 @@ unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
 	hugetlb_vma_unlock_write(vma);
 	mmu_notifier_invalidate_range_end(&range);
 
-	return pages << h->order;
+	return base_pages;
 }
 
 /* Return true if reservation was successful, false otherwise.
 */

From patchwork Fri Oct 21 16:36:34 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015086
Date: Fri, 21 Oct 2022 16:36:34 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-19-jthoughton@google.com>
Subject: [RFC PATCH v2 18/47] hugetlb: enlighten follow_hugetlb_page to support HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
This enables high-granularity mapping support in GUP.

One important change: we never needed to grab the VMA lock before, but now we grab it for reading during high-granularity page table walks so that no one can collapse the page tables out from under us.

In case it is confusing: pfn_offset is the offset (in PAGE_SIZE units) of vaddr within the subpage that hpte points to.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 76 ++++++++++++++++++++++++++++++++++++----------------
 1 file changed, 53 insertions(+), 23 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 2d096cef53cd..d76ab32fb6d3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6382,11 +6382,9 @@ static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
 	}
 }
 
-static inline bool __follow_hugetlb_must_fault(unsigned int flags, pte_t *pte,
+static inline bool __follow_hugetlb_must_fault(unsigned int flags, pte_t pteval,
 					       bool *unshare)
 {
-	pte_t pteval = huge_ptep_get(pte);
-
 	*unshare = false;
 	if (is_swap_pte(pteval))
 		return true;
@@ -6478,12 +6476,20 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 	struct hstate *h = hstate_vma(vma);
 	int err = -EFAULT, refs;
 
+	/*
+	 * Grab the VMA lock for reading now so no one can collapse the page
+	 * table from under us.
+	 */
+	hugetlb_vma_lock_read(vma);
+
 	while (vaddr < vma->vm_end && remainder) {
-		pte_t *pte;
+		pte_t *ptep, pte;
 		spinlock_t *ptl = NULL;
 		bool unshare = false;
 		int absent;
-		struct page *page;
+		unsigned long pages_per_hpte;
+		struct page *page, *subpage;
+		struct hugetlb_pte hpte;
 
 		/*
 		 * If we have a pending SIGKILL, don't keep faulting pages and
@@ -6499,13 +6505,22 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * each hugepage.  We have to make sure we get the
 		 * first, for the page indexing below to work.
 		 *
-		 * Note that page table lock is not held when pte is null.
+		 * Note that page table lock is not held when ptep is null.
 		 */
-		pte = huge_pte_offset(mm, vaddr & huge_page_mask(h),
-				      huge_page_size(h));
-		if (pte)
-			ptl = huge_pte_lock(h, mm, pte);
-		absent = !pte || huge_pte_none(huge_ptep_get(pte));
+		ptep = huge_pte_offset(mm, vaddr & huge_page_mask(h),
+				       huge_page_size(h));
+		if (ptep) {
+			hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h),
+					hpage_size_to_level(huge_page_size(h)));
+			hugetlb_hgm_walk(mm, vma, &hpte, vaddr,
+					PAGE_SIZE,
+					/*stop_at_none=*/true);
+			ptl = hugetlb_pte_lock(mm, &hpte);
+			ptep = hpte.ptep;
+			pte = huge_ptep_get(ptep);
+		}
+
+		absent = !ptep || huge_pte_none(pte);
 
 		/*
 		 * When coredumping, it suits get_dump_page if we just return
@@ -6516,12 +6531,19 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 */
 		if (absent && (flags & FOLL_DUMP) &&
 		    !hugetlbfs_pagecache_present(h, vma, vaddr)) {
-			if (pte)
+			if (ptep)
 				spin_unlock(ptl);
 			remainder = 0;
 			break;
 		}
 
+		if (!absent && pte_present(pte) &&
+				!hugetlb_pte_present_leaf(&hpte, pte)) {
+			/* We raced with someone splitting the PTE, so retry. */
+			spin_unlock(ptl);
+			continue;
+		}
+
 		/*
 		 * We need call hugetlb_fault for both hugepages under migration
 		 * (in which case hugetlb_fault waits for the migration,) and
@@ -6537,7 +6559,10 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			vm_fault_t ret;
 			unsigned int fault_flags = 0;
 
-			if (pte)
+			/* Drop the lock before entering hugetlb_fault. */
+			hugetlb_vma_unlock_read(vma);
+
+			if (ptep)
 				spin_unlock(ptl);
 			if (flags & FOLL_WRITE)
 				fault_flags |= FAULT_FLAG_WRITE;
@@ -6560,7 +6585,7 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 			if (ret & VM_FAULT_ERROR) {
 				err = vm_fault_to_errno(ret, flags);
 				remainder = 0;
-				break;
+				goto out;
 			}
 			if (ret & VM_FAULT_RETRY) {
 				if (locked &&
@@ -6578,11 +6603,14 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 				 */
 				return i;
 			}
+			hugetlb_vma_lock_read(vma);
 			continue;
 		}
 
-		pfn_offset = (vaddr & ~huge_page_mask(h)) >> PAGE_SHIFT;
-		page = pte_page(huge_ptep_get(pte));
+		pfn_offset = (vaddr & ~hugetlb_pte_mask(&hpte)) >> PAGE_SHIFT;
+		subpage = pte_page(pte);
+		pages_per_hpte = hugetlb_pte_size(&hpte) / PAGE_SIZE;
+		page = compound_head(subpage);
 
 		VM_BUG_ON_PAGE((flags & FOLL_PIN) && PageAnon(page) &&
 			       !PageAnonExclusive(page), page);
@@ -6592,21 +6620,21 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		 * and skip the same_page loop below.
 		 */
 		if (!pages && !vmas && !pfn_offset &&
-		    (vaddr + huge_page_size(h) < vma->vm_end) &&
-		    (remainder >= pages_per_huge_page(h))) {
-			vaddr += huge_page_size(h);
-			remainder -= pages_per_huge_page(h);
-			i += pages_per_huge_page(h);
+		    (vaddr + hugetlb_pte_size(&hpte) < vma->vm_end) &&
+		    (remainder >= pages_per_hpte)) {
+			vaddr += hugetlb_pte_size(&hpte);
+			remainder -= pages_per_hpte;
+			i += pages_per_hpte;
 			spin_unlock(ptl);
 			continue;
 		}
 
 		/* vaddr may not be aligned to PAGE_SIZE */
-		refs = min3(pages_per_huge_page(h) - pfn_offset, remainder,
+		refs = min3(pages_per_hpte - pfn_offset, remainder,
 			(vma->vm_end - ALIGN_DOWN(vaddr, PAGE_SIZE)) >> PAGE_SHIFT);
 
 		if (pages || vmas)
-			record_subpages_vmas(nth_page(page, pfn_offset),
+			record_subpages_vmas(nth_page(subpage, pfn_offset),
 					     vma, refs,
 					     likely(pages) ? pages + i : NULL,
 					     vmas ? vmas + i : NULL);
@@ -6637,6 +6665,8 @@ long follow_hugetlb_page(struct mm_struct *mm, struct vm_area_struct *vma,
 		spin_unlock(ptl);
 	}
 
+	hugetlb_vma_unlock_read(vma);
+out:
 	*nr_pages = remainder;
 	/*
 	 * setting position is actually required only if remainder is

From patchwork Fri Oct 21 16:36:35 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015087
Date: Fri, 21 Oct 2022 16:36:35 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-20-jthoughton@google.com>
Subject: [RFC PATCH v2 19/47] hugetlb: make hugetlb_follow_page_mask HGM-enabled
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

The change here is very simple: do a high-granularity walk.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 16 +++++++++++++++-
 1 file changed, 15 insertions(+), 1 deletion(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d76ab32fb6d3..5783a8307a77 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6408,6 +6408,7 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 	struct page *page = NULL;
 	spinlock_t *ptl;
 	pte_t *pte, entry;
+	struct hugetlb_pte hpte;
 
 	/*
 	 * FOLL_PIN is not supported for follow_page(). Ordinary GUP goes via
@@ -6429,9 +6430,22 @@ struct page *hugetlb_follow_page_mask(struct vm_area_struct *vma,
 		return NULL;
 	}
 
-	ptl = huge_pte_lock(h, mm, pte);
+retry_walk:
+	hugetlb_pte_populate(&hpte, pte, huge_page_shift(h),
+			hpage_size_to_level(huge_page_size(h)));
+	hugetlb_hgm_walk(mm, vma, &hpte, address,
+			PAGE_SIZE,
+			/*stop_at_none=*/true);
+
+	ptl = hugetlb_pte_lock(mm, &hpte);
 	entry = huge_ptep_get(pte);
 	if (pte_present(entry)) {
+		if (unlikely(!hugetlb_pte_present_leaf(&hpte, entry))) {
+			/* We raced with someone splitting from under us. */
+			spin_unlock(ptl);
+			goto retry_walk;
+		}
+
 		page = pte_page(entry) +
 				((address & ~huge_page_mask(h)) >> PAGE_SHIFT);
 
 		/*

From patchwork Fri Oct 21 16:36:36 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015114
Date: Fri, 21 Oct 2022 16:36:36 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-21-jthoughton@google.com>
Subject: [RFC PATCH v2 20/47] hugetlb: use struct hugetlb_pte for walk_hugetlb_range
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

The main change in this commit is to make walk_hugetlb_range support walking HGM mappings; all walk_hugetlb_range callers must be updated to use the new API and take the correct action. Listing the changes to the callers:

For s390, we simply ignore HGM PTEs (we don't support HGM on s390 yet).

For smaps, shared_hugetlb (and private_hugetlb, although private mappings don't support HGM) may now not be divisible by the hugepage size. The appropriate changes have been made to support analyzing HGM PTEs.

For pagemap, we ignore non-leaf PTEs by treating them as if they were none PTEs. We can only end up with a non-leaf PTE if it had just been updated from a none PTE.

For show_numa_map, the challenge is that, if any part of a hugepage is mapped, we have to count that entire page exactly once, as the results are given in units of hugepages. To support HGM mappings, we keep track of the last page that we looked at. If the hugepage we are currently looking at is the same as the last one, it has been mapped at high granularity and we have already accounted for it.
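[Editorial note: the last-page deduplication described above relies only on the walk visiting addresses in order. A minimal userspace sketch of the idea follows; the names here are illustrative, not kernel APIs. Each element of map[] stands for the hugepage that one consecutive high-granularity PTE maps into.]

```c
#include <stddef.h>

/*
 * Count each distinct hugepage exactly once, even when several
 * consecutive high-granularity PTEs map into the same hugepage.
 * Because the walk is address-ordered, remembering only the
 * previously seen page suffices, mirroring the last_page field of
 * struct show_numa_map_private in the patch above.
 */
static int count_distinct_hugepages(const int *map, size_t n)
{
	int last_page = -1;	/* like priv->last_page == NULL */
	int count = 0;

	for (size_t i = 0; i < n; i++) {
		if (map[i] == last_page)
			continue;	/* already accounted for this page */
		last_page = map[i];
		count++;
	}
	return count;
}
```

For the sequence {1, 1, 1, 2, 3, 3} this counts three hugepages: pages 1 and 3 are mapped by multiple high-granularity PTEs but are each accounted for once.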
For DAMON, we treat non-leaf PTEs as if they were blank, for the same reason as pagemap. For hwpoison, we proactively update the logic to support the case when hpte is pointing to a subpage within the poisoned hugepage. For queue_pages_hugetlb/migration, we ignore all HGM-enabled VMAs for now. For mincore, we ignore non-leaf PTEs for the same reason as pagemap. For mprotect/prot_none_hugetlb_entry, we retry the walk when we get a non-leaf PTE. Signed-off-by: James Houghton --- arch/s390/mm/gmap.c | 20 ++++++++-- fs/proc/task_mmu.c | 83 +++++++++++++++++++++++++++++----------- include/linux/pagewalk.h | 11 ++++-- mm/damon/vaddr.c | 57 +++++++++++++++++---------- mm/hmm.c | 21 ++++++---- mm/memory-failure.c | 17 ++++---- mm/mempolicy.c | 12 ++++-- mm/mincore.c | 17 ++++++-- mm/mprotect.c | 18 ++++++--- mm/pagewalk.c | 32 +++++++++++++--- 10 files changed, 203 insertions(+), 85 deletions(-) diff --git a/arch/s390/mm/gmap.c b/arch/s390/mm/gmap.c index 02d15c8dc92e..d65c15b5dccb 100644 --- a/arch/s390/mm/gmap.c +++ b/arch/s390/mm/gmap.c @@ -2622,13 +2622,25 @@ static int __s390_enable_skey_pmd(pmd_t *pmd, unsigned long addr, return 0; } -static int __s390_enable_skey_hugetlb(pte_t *pte, unsigned long addr, - unsigned long hmask, unsigned long next, +static int __s390_enable_skey_hugetlb(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - pmd_t *pmd = (pmd_t *)pte; + struct hstate *h = hstate_vma(walk->vma); + pmd_t *pmd; unsigned long start, end; - struct page *page = pmd_page(*pmd); + struct page *page; + + if (huge_page_size(h) != hugetlb_pte_size(hpte)) + /* Ignore high-granularity PTEs. */ + return 0; + + if (!pte_present(huge_ptep_get(hpte->ptep))) + /* Ignore non-present PTEs. 
*/ + return 0; + + pmd = (pmd_t *)pte; + page = pmd_page(*pmd); /* * The write check makes sure we do not set a key on shared diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c index 8a74cdcc9af0..be78cdb7677e 100644 --- a/fs/proc/task_mmu.c +++ b/fs/proc/task_mmu.c @@ -720,18 +720,28 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma) } #ifdef CONFIG_HUGETLB_PAGE -static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, - struct mm_walk *walk) +static int smaps_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, + struct mm_walk *walk) { struct mem_size_stats *mss = walk->private; struct vm_area_struct *vma = walk->vma; struct page *page = NULL; + pte_t pte = huge_ptep_get(hpte->ptep); - if (pte_present(*pte)) { - page = vm_normal_page(vma, addr, *pte); - } else if (is_swap_pte(*pte)) { - swp_entry_t swpent = pte_to_swp_entry(*pte); + if (pte_present(pte)) { + /* We only care about leaf-level PTEs. */ + if (!hugetlb_pte_present_leaf(hpte, pte)) + /* + * The only case where hpte is not a leaf is that + * it was originally none, but it was split from + * under us. It was originally none, so exclude it. 
+ */ + return 0; + + page = vm_normal_page(vma, addr, pte); + } else if (is_swap_pte(pte)) { + swp_entry_t swpent = pte_to_swp_entry(pte); if (is_pfn_swap_entry(swpent)) page = pfn_swap_entry_to_page(swpent); @@ -740,9 +750,9 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask, int mapcount = page_mapcount(page); if (mapcount >= 2) - mss->shared_hugetlb += huge_page_size(hstate_vma(vma)); + mss->shared_hugetlb += hugetlb_pte_size(hpte); else - mss->private_hugetlb += huge_page_size(hstate_vma(vma)); + mss->private_hugetlb += hugetlb_pte_size(hpte); } return 0; } @@ -1561,22 +1571,31 @@ static int pagemap_pmd_range(pmd_t *pmdp, unsigned long addr, unsigned long end, #ifdef CONFIG_HUGETLB_PAGE /* This function walks within one hugetlb entry in the single call */ -static int pagemap_hugetlb_range(pte_t *ptep, unsigned long hmask, - unsigned long addr, unsigned long end, +static int pagemap_hugetlb_range(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { struct pagemapread *pm = walk->private; struct vm_area_struct *vma = walk->vma; u64 flags = 0, frame = 0; int err = 0; - pte_t pte; + unsigned long hmask = hugetlb_pte_mask(hpte); + unsigned long end = addr + hugetlb_pte_size(hpte); + pte_t pte = huge_ptep_get(hpte->ptep); + struct page *page; if (vma->vm_flags & VM_SOFTDIRTY) flags |= PM_SOFT_DIRTY; - pte = huge_ptep_get(ptep); if (pte_present(pte)) { - struct page *page = pte_page(pte); + /* + * We raced with this PTE being split, which can only happen if + * it was blank before. Treat it is as if it were blank. 
+ */ + if (!hugetlb_pte_present_leaf(hpte, pte)) + return 0; + + page = pte_page(pte); if (!PageAnon(page)) flags |= PM_FILE; @@ -1857,10 +1876,16 @@ static struct page *can_gather_numa_stats_pmd(pmd_t pmd, } #endif +struct show_numa_map_private { + struct numa_maps *md; + struct page *last_page; +}; + static int gather_pte_stats(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) { - struct numa_maps *md = walk->private; + struct show_numa_map_private *priv = walk->private; + struct numa_maps *md = priv->md; struct vm_area_struct *vma = walk->vma; spinlock_t *ptl; pte_t *orig_pte; @@ -1872,6 +1897,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, struct page *page; page = can_gather_numa_stats_pmd(*pmd, vma, addr); + priv->last_page = page; if (page) gather_stats(page, md, pmd_dirty(*pmd), HPAGE_PMD_SIZE/PAGE_SIZE); @@ -1885,6 +1911,7 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, orig_pte = pte = pte_offset_map_lock(walk->mm, pmd, addr, &ptl); do { struct page *page = can_gather_numa_stats(*pte, vma, addr); + priv->last_page = page; if (!page) continue; gather_stats(page, md, pte_dirty(*pte), 1); @@ -1895,19 +1922,25 @@ static int gather_pte_stats(pmd_t *pmd, unsigned long addr, return 0; } #ifdef CONFIG_HUGETLB_PAGE -static int gather_hugetlb_stats(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, struct mm_walk *walk) +static int gather_hugetlb_stats(struct hugetlb_pte *hpte, unsigned long addr, + struct mm_walk *walk) { - pte_t huge_pte = huge_ptep_get(pte); + struct show_numa_map_private *priv = walk->private; + pte_t huge_pte = huge_ptep_get(hpte->ptep); struct numa_maps *md; struct page *page; - if (!pte_present(huge_pte)) + if (!hugetlb_pte_present_leaf(hpte, huge_pte)) + return 0; + + page = compound_head(pte_page(huge_pte)); + if (priv->last_page == page) + /* we've already accounted for this page */ return 0; - page = pte_page(huge_pte); + priv->last_page = page; - md = 
walk->private; + md = priv->md; gather_stats(page, md, pte_dirty(huge_pte), 1); return 0; } @@ -1937,9 +1970,15 @@ static int show_numa_map(struct seq_file *m, void *v) struct file *file = vma->vm_file; struct mm_struct *mm = vma->vm_mm; struct mempolicy *pol; + char buffer[64]; int nid; + struct show_numa_map_private numa_map_private; + + numa_map_private.md = md; + numa_map_private.last_page = NULL; + if (!mm) return 0; @@ -1969,7 +2008,7 @@ static int show_numa_map(struct seq_file *m, void *v) seq_puts(m, " huge"); /* mmap_lock is held by m_start */ - walk_page_vma(vma, &show_numa_ops, md); + walk_page_vma(vma, &show_numa_ops, &numa_map_private); if (!md->pages) goto out; diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h index 2f8f6cc980b4..7ed065ea5dba 100644 --- a/include/linux/pagewalk.h +++ b/include/linux/pagewalk.h @@ -3,6 +3,7 @@ #define _LINUX_PAGEWALK_H #include +#include struct mm_walk; @@ -21,7 +22,10 @@ struct mm_walk; * depth is -1 if not known, 0:PGD, 1:P4D, 2:PUD, 3:PMD. * Any folded depths (where PTRS_PER_P?D is equal to 1) * are skipped. - * @hugetlb_entry: if set, called for each hugetlb entry + * @hugetlb_entry: if set, called for each hugetlb entry. In the presence + * of high-granularity hugetlb entries, @hugetlb_entry is + * called only for leaf-level entries (i.e., hstate-level + * page table entries are ignored if they are not leaves). * @test_walk: caller specific callback function to determine whether * we walk over the current vma or not. 
Returning 0 means * "do page table walk over the current vma", returning @@ -47,9 +51,8 @@ struct mm_walk_ops { unsigned long next, struct mm_walk *walk); int (*pte_hole)(unsigned long addr, unsigned long next, int depth, struct mm_walk *walk); - int (*hugetlb_entry)(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long next, - struct mm_walk *walk); + int (*hugetlb_entry)(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk); int (*test_walk)(unsigned long addr, unsigned long next, struct mm_walk *walk); int (*pre_vma)(unsigned long start, unsigned long end, diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c index 15f03df66db6..42845e1b560d 100644 --- a/mm/damon/vaddr.c +++ b/mm/damon/vaddr.c @@ -330,48 +330,55 @@ static int damon_mkold_pmd_entry(pmd_t *pmd, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static void damon_hugetlb_mkold(pte_t *pte, struct mm_struct *mm, +static void damon_hugetlb_mkold(struct hugetlb_pte *hpte, pte_t entry, + struct mm_struct *mm, struct vm_area_struct *vma, unsigned long addr) { bool referenced = false; - pte_t entry = huge_ptep_get(pte); struct page *page = pte_page(entry); + struct page *hpage = compound_head(page); - get_page(page); + get_page(hpage); if (pte_young(entry)) { referenced = true; entry = pte_mkold(entry); - set_huge_pte_at(mm, addr, pte, entry); + set_huge_pte_at(mm, addr, hpte->ptep, entry); } #ifdef CONFIG_MMU_NOTIFIER if (mmu_notifier_clear_young(mm, addr, - addr + huge_page_size(hstate_vma(vma)))) + addr + hugetlb_pte_size(hpte))) referenced = true; #endif /* CONFIG_MMU_NOTIFIER */ if (referenced) - set_page_young(page); + set_page_young(hpage); - set_page_idle(page); - put_page(page); + set_page_idle(hpage); + put_page(hpage); } -static int damon_mkold_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int damon_mkold_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { - struct hstate *h = 
hstate_vma(walk->vma); spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); - entry = huge_ptep_get(pte); + ptl = hugetlb_pte_lock(walk->mm, hpte); + entry = huge_ptep_get(hpte->ptep); if (!pte_present(entry)) goto out; - damon_hugetlb_mkold(pte, walk->mm, walk->vma, addr); + if (!hugetlb_pte_present_leaf(hpte, entry)) + /* + * We raced with someone splitting a blank PTE. Treat this PTE + * as if it were blank. + */ + goto out; + + damon_hugetlb_mkold(hpte, entry, walk->mm, walk->vma, addr); out: spin_unlock(ptl); @@ -484,31 +491,39 @@ static int damon_young_pmd_entry(pmd_t *pmd, unsigned long addr, } #ifdef CONFIG_HUGETLB_PAGE -static int damon_young_hugetlb_entry(pte_t *pte, unsigned long hmask, - unsigned long addr, unsigned long end, +static int damon_young_hugetlb_entry(struct hugetlb_pte *hpte, + unsigned long addr, struct mm_walk *walk) { struct damon_young_walk_private *priv = walk->private; struct hstate *h = hstate_vma(walk->vma); - struct page *page; + struct page *page, *hpage; spinlock_t *ptl; pte_t entry; - ptl = huge_pte_lock(h, walk->mm, pte); + ptl = hugetlb_pte_lock(walk->mm, hpte); entry = huge_ptep_get(pte); if (!pte_present(entry)) goto out; + if (!hugetlb_pte_present_leaf(hpte, entry)) + /* + * We raced with someone splitting a blank PTE. Treat this PTE + * as if it were blank. 
+		 */
+		goto out;
+
 	page = pte_page(entry);
-	get_page(page);
+	hpage = compound_head(page);
+	get_page(hpage);
 
-	if (pte_young(entry) || !page_is_idle(page) ||
+	if (pte_young(entry) || !page_is_idle(hpage) ||
 	    mmu_notifier_test_young(walk->mm, addr)) {
 		*priv->page_sz = huge_page_size(h);
 		priv->young = true;
 	}
 
-	put_page(page);
+	put_page(hpage);
 
 out:
 	spin_unlock(ptl);
diff --git a/mm/hmm.c b/mm/hmm.c
index 3850fb625dda..76679b46ad5e 100644
--- a/mm/hmm.c
+++ b/mm/hmm.c
@@ -469,27 +469,34 @@ static int hmm_vma_walk_pud(pud_t *pudp, unsigned long start, unsigned long end,
 #endif
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int hmm_vma_walk_hugetlb_entry(pte_t *pte, unsigned long hmask,
-				      unsigned long start, unsigned long end,
+static int hmm_vma_walk_hugetlb_entry(struct hugetlb_pte *hpte,
+				      unsigned long start,
 				      struct mm_walk *walk)
 {
 	unsigned long addr = start, i, pfn;
 	struct hmm_vma_walk *hmm_vma_walk = walk->private;
 	struct hmm_range *range = hmm_vma_walk->range;
-	struct vm_area_struct *vma = walk->vma;
 	unsigned int required_fault;
 	unsigned long pfn_req_flags;
 	unsigned long cpu_flags;
+	unsigned long hmask = hugetlb_pte_mask(hpte);
+	unsigned int order = hugetlb_pte_shift(hpte) - PAGE_SHIFT;
+	unsigned long end = start + hugetlb_pte_size(hpte);
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(vma), walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	ptl = hugetlb_pte_lock(walk->mm, hpte);
+	entry = huge_ptep_get(hpte->ptep);
+
+	if (!hugetlb_pte_present_leaf(hpte, entry)) {
+		spin_unlock(ptl);
+		return -EAGAIN;
+	}
 
 	i = (start - range->start) >> PAGE_SHIFT;
 	pfn_req_flags = range->hmm_pfns[i];
 	cpu_flags = pte_to_hmm_pfn_flags(range, entry) |
-		    hmm_pfn_flags_order(huge_page_order(hstate_vma(vma)));
+		    hmm_pfn_flags_order(order);
 	required_fault =
 		hmm_pte_need_fault(hmm_vma_walk, pfn_req_flags, cpu_flags);
 	if (required_fault) {
@@ -593,7 +600,7 @@ int hmm_range_fault(struct hmm_range *range)
 	 * in pfns. All entries < last in the pfn array are set to their
 	 * output, and all >= are still at their input values.
 	 */
-	} while (ret == -EBUSY);
+	} while (ret == -EBUSY || ret == -EAGAIN);
 
 	return ret;
 }
 EXPORT_SYMBOL(hmm_range_fault);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index bead6bccc7f2..505efba59d29 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -628,6 +628,7 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
 		unsigned long poisoned_pfn, struct to_kill *tk)
 {
 	unsigned long pfn = 0;
+	unsigned long base_pages_poisoned = (1UL << shift) / PAGE_SIZE;
 
 	if (pte_present(pte)) {
 		pfn = pte_pfn(pte);
@@ -638,7 +639,8 @@ static int check_hwpoisoned_entry(pte_t pte, unsigned long addr, short shift,
 		pfn = swp_offset_pfn(swp);
 	}
 
-	if (!pfn || pfn != poisoned_pfn)
+	if (!pfn || pfn < poisoned_pfn ||
+	    pfn >= poisoned_pfn + base_pages_poisoned)
 		return 0;
 
 	set_to_kill(tk, addr, shift);
@@ -704,16 +706,15 @@ static int hwpoison_pte_range(pmd_t *pmdp, unsigned long addr,
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
-static int hwpoison_hugetlb_range(pte_t *ptep, unsigned long hmask,
-		    unsigned long addr, unsigned long end,
-		    struct mm_walk *walk)
+static int hwpoison_hugetlb_range(struct hugetlb_pte *hpte,
+		    unsigned long addr,
+		    struct mm_walk *walk)
 {
 	struct hwp_walk *hwp = walk->private;
-	pte_t pte = huge_ptep_get(ptep);
-	struct hstate *h = hstate_vma(walk->vma);
+	pte_t pte = huge_ptep_get(hpte->ptep);
 
-	return check_hwpoisoned_entry(pte, addr, huge_page_shift(h),
-				      hwp->pfn, &hwp->tk);
+	return check_hwpoisoned_entry(pte, addr & hugetlb_pte_mask(hpte),
+				      hpte->shift, hwp->pfn, &hwp->tk);
 }
 #else
 #define hwpoison_hugetlb_range NULL
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 61aa9aedb728..275bc549590e 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -558,8 +558,8 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
 	return addr != end ? -EIO : 0;
 }
 
-static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
-			       unsigned long addr, unsigned long end,
+static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
+			       unsigned long addr,
 			       struct mm_walk *walk)
 {
 	int ret = 0;
@@ -570,8 +570,12 @@ static int queue_pages_hugetlb(pte_t *pte, unsigned long hmask,
 	spinlock_t *ptl;
 	pte_t entry;
 
-	ptl = huge_pte_lock(hstate_vma(walk->vma), walk->mm, pte);
-	entry = huge_ptep_get(pte);
+	/* We don't migrate high-granularity HugeTLB mappings for now. */
+	if (hugetlb_hgm_enabled(walk->vma))
+		return -EINVAL;
+
+	ptl = hugetlb_pte_lock(walk->mm, hpte);
+	entry = huge_ptep_get(hpte->ptep);
 	if (!pte_present(entry))
 		goto unlock;
 	page = pte_page(entry);
diff --git a/mm/mincore.c b/mm/mincore.c
index a085a2aeabd8..0894965b3944 100644
--- a/mm/mincore.c
+++ b/mm/mincore.c
@@ -22,18 +22,29 @@
 #include
 #include "swap.h"
 
-static int mincore_hugetlb(pte_t *pte, unsigned long hmask, unsigned long addr,
-			unsigned long end, struct mm_walk *walk)
+static int mincore_hugetlb(struct hugetlb_pte *hpte, unsigned long addr,
+			struct mm_walk *walk)
 {
 #ifdef CONFIG_HUGETLB_PAGE
 	unsigned char present;
+	unsigned long end = addr + hugetlb_pte_size(hpte);
 	unsigned char *vec = walk->private;
+	pte_t pte = huge_ptep_get(hpte->ptep);
 
 	/*
 	 * Hugepages under user process are always in RAM and never
 	 * swapped out, but theoretically it needs to be checked.
 	 */
-	present = pte && !huge_pte_none(huge_ptep_get(pte));
+	present = !huge_pte_none(pte);
+
+	/*
+	 * If the pte is present but not a leaf, we raced with someone
+	 * splitting it. For someone to have split it, it must have been
+	 * huge_pte_none before, so treat it as such.
+	 */
+	if (pte_present(pte) && !hugetlb_pte_present_leaf(hpte, pte))
+		present = false;
+
 	for (; addr != end; vec++, addr += PAGE_SIZE)
 		*vec = present;
 	walk->private = vec;
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 99762403cc8f..9975b86035e0 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -524,12 +524,16 @@ static int prot_none_pte_entry(pte_t *pte, unsigned long addr,
 		0 : -EACCES;
 }
 
-static int prot_none_hugetlb_entry(pte_t *pte, unsigned long hmask,
-				   unsigned long addr, unsigned long next,
+static int prot_none_hugetlb_entry(struct hugetlb_pte *hpte,
+				   unsigned long addr,
 				   struct mm_walk *walk)
 {
-	return pfn_modify_allowed(pte_pfn(*pte), *(pgprot_t *)(walk->private)) ?
-		0 : -EACCES;
+	pte_t pte = huge_ptep_get(hpte->ptep);
+
+	if (!hugetlb_pte_present_leaf(hpte, pte))
+		return -EAGAIN;
+	return pfn_modify_allowed(pte_pfn(pte),
+			*(pgprot_t *)(walk->private)) ? 0 : -EACCES;
 }
 
 static int prot_none_test(unsigned long addr, unsigned long next,
@@ -572,8 +576,10 @@ mprotect_fixup(struct mmu_gather *tlb, struct vm_area_struct *vma,
 	    (newflags & VM_ACCESS_FLAGS) == 0) {
 		pgprot_t new_pgprot = vm_get_page_prot(newflags);
 
-		error = walk_page_range(current->mm, start, end,
-				&prot_none_walk_ops, &new_pgprot);
+		do {
+			error = walk_page_range(current->mm, start, end,
+					&prot_none_walk_ops, &new_pgprot);
+		} while (error == -EAGAIN);
 		if (error)
 			return error;
 	}
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index bb33c1e8c017..2318aae98f1e 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -3,6 +3,7 @@
 #include
 #include
 #include
+#include
 
 /*
  * We want to know the real level where a entry is located ignoring any
@@ -301,20 +302,39 @@ static int walk_hugetlb_range(unsigned long addr, unsigned long end,
 	pte_t *pte;
 	const struct mm_walk_ops *ops = walk->ops;
 	int err = 0;
+	struct hugetlb_pte hpte;
+
+	if (hugetlb_hgm_enabled(vma))
+		/*
+		 * We could potentially do high-granularity walks. Grab the
+		 * VMA lock to prevent PTEs from becoming invalid.
+		 */
+		hugetlb_vma_lock_read(vma);
 
 	do {
-		next = hugetlb_entry_end(h, addr, end);
 		pte = huge_pte_offset(walk->mm, addr & hmask, sz);
-
-		if (pte)
-			err = ops->hugetlb_entry(pte, hmask, addr, next, walk);
-		else if (ops->pte_hole)
-			err = ops->pte_hole(addr, next, -1, walk);
+		if (!pte) {
+			next = hugetlb_entry_end(h, addr, end);
+			if (ops->pte_hole)
+				err = ops->pte_hole(addr, next, -1, walk);
+		} else {
+			hugetlb_pte_populate(&hpte, pte, huge_page_shift(h),
+					hpage_size_to_level(sz));
+			hugetlb_hgm_walk(walk->mm, vma, &hpte, addr,
+					PAGE_SIZE,
+					/*stop_at_none=*/true);
+			err = ops->hugetlb_entry(
+					&hpte, addr, walk);
+			next = min(addr + hugetlb_pte_size(&hpte), end);
+		}
 		if (err)
 			break;
 	} while (addr = next, addr != end);
 
+	if (hugetlb_hgm_enabled(vma))
+		hugetlb_vma_unlock_read(vma);
+
 	return err;
 }

From patchwork Fri Oct 21 16:36:37 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015088
Date: Fri, 21 Oct 2022 16:36:37 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-22-jthoughton@google.com>
Subject: [RFC PATCH v2 21/47] mm: rmap: provide pte_order in page_vma_mapped_walk
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
page_vma_mapped_walk callers will need this information to know how
HugeTLB pages are mapped. pte_order only applies if pte is not NULL.
Signed-off-by: James Houghton
---
 include/linux/rmap.h | 1 +
 mm/page_vma_mapped.c | 3 +++
 2 files changed, 4 insertions(+)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index bd3504d11b15..e0557ede2951 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -378,6 +378,7 @@ struct page_vma_mapped_walk {
 	pmd_t *pmd;
 	pte_t *pte;
 	spinlock_t *ptl;
+	unsigned int pte_order;
 	unsigned int flags;
 };
 
diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 93e13fc17d3c..395ca4e21c56 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -16,6 +16,7 @@ static inline bool not_found(struct page_vma_mapped_walk *pvmw)
 static bool map_pte(struct page_vma_mapped_walk *pvmw)
 {
 	pvmw->pte = pte_offset_map(pvmw->pmd, pvmw->address);
+	pvmw->pte_order = 0;
 	if (!(pvmw->flags & PVMW_SYNC)) {
 		if (pvmw->flags & PVMW_MIGRATION) {
 			if (!is_swap_pte(*pvmw->pte))
@@ -174,6 +175,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 		if (!pvmw->pte)
 			return false;
 
+		pvmw->pte_order = huge_page_order(hstate);
 		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
 		if (!check_pte(pvmw))
 			return not_found(pvmw);
@@ -269,6 +271,7 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 			}
 			pte_unmap(pvmw->pte);
 			pvmw->pte = NULL;
+			pvmw->pte_order = 0;
 			goto restart;
 		}
 		pvmw->pte++;

From patchwork Fri Oct 21 16:36:38 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015089
Date: Fri, 21 Oct 2022 16:36:38 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-23-jthoughton@google.com>
Subject: [RFC PATCH v2 22/47] mm: rmap: make page_vma_mapped_walk callers use pte_order
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
This also updates the callers' hugetlb mapcounting code to handle
mapcount properly for subpage-mapped hugetlb pages.

Signed-off-by: James Houghton
---
 mm/migrate.c |  2 +-
 mm/rmap.c    | 17 +++++++++++++----
 2 files changed, 14 insertions(+), 5 deletions(-)

diff --git a/mm/migrate.c b/mm/migrate.c
index a0105fa6e3b2..8712b694c5a7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -235,7 +235,7 @@ static bool remove_migration_pte(struct folio *folio,
 
 #ifdef CONFIG_HUGETLB_PAGE
 	if (folio_test_hugetlb(folio)) {
-		unsigned int shift = huge_page_shift(hstate_vma(vma));
+		unsigned int shift = pvmw.pte_order + PAGE_SHIFT;
 
 		pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
 		if (folio_test_anon(folio))
diff --git a/mm/rmap.c b/mm/rmap.c
index 9bba65b30e4d..19850d955aea 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1626,7 +1626,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
+				hugetlb_count_sub(1UL << pvmw.pte_order, mm);
 				set_huge_pte_at(mm, address, pvmw.pte, pteval);
 			} else {
 				dec_mm_counter(mm, mm_counter(&folio->page));
@@ -1785,7 +1785,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		 *
 		 * See Documentation/mm/mmu_notifier.rst
 		 */
-		page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+		if (folio_test_hugetlb(folio))
+			page_remove_rmap(&folio->page, vma, true);
+		else
+			page_remove_rmap(subpage, vma, false);
+
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_page_drain_local();
 		folio_put(folio);
@@ -2034,7 +2038,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		} else if (PageHWPoison(subpage)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
-				hugetlb_count_sub(folio_nr_pages(folio), mm);
+				hugetlb_count_sub(1L << pvmw.pte_order, mm);
 				set_huge_pte_at(mm, address, pvmw.pte, pteval);
 			} else {
 				dec_mm_counter(mm, mm_counter(&folio->page));
@@ -2126,7 +2130,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		 *
 		 * See Documentation/mm/mmu_notifier.rst
 		 */
-		page_remove_rmap(subpage, vma, folio_test_hugetlb(folio));
+		if (folio_test_hugetlb(folio))
+			page_remove_rmap(&folio->page, vma, true);
+		else
+			page_remove_rmap(subpage, vma, false);
 		if (vma->vm_flags & VM_LOCKED)
 			mlock_page_drain_local();
 		folio_put(folio);
@@ -2210,6 +2217,8 @@ static bool page_make_device_exclusive_one(struct folio *folio,
 					      args->owner);
 	mmu_notifier_invalidate_range_start(&range);
 
+	VM_BUG_ON_FOLIO(folio_test_hugetlb(folio), folio);
+
 	while (page_vma_mapped_walk(&pvmw)) {
 		/* Unexpected PMD-mapped THP? */
 		VM_BUG_ON_FOLIO(!pvmw.pte, folio);

From patchwork Fri Oct 21 16:36:39 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015090
Date: Fri, 21 Oct 2022 16:36:39 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-24-jthoughton@google.com>
Subject: [RFC PATCH v2 23/47] rmap: update hugetlb lock comment for HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry,
    Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert,
    Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin,
    Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    James Houghton
The VMA lock is used to prevent high-granularity HugeTLB mappings from
being collapsed while other threads are doing high-granularity page
table walks.
Signed-off-by: James Houghton
---
 mm/rmap.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index 19850d955aea..527463c1e936 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -47,7 +47,8 @@
  *
  * hugetlbfs PageHuge() take locks in this order:
  *   hugetlb_fault_mutex (hugetlbfs specific page fault mutex)
- *   vma_lock (hugetlb specific lock for pmd_sharing)
+ *   vma_lock (hugetlb specific lock for pmd_sharing and high-granularity
+ *             mapping)
  *   mapping->i_mmap_rwsem (also used for hugetlb pmd sharing)
  *   page->flags PG_locked (lock_page)
  */

From patchwork Fri Oct 21 16:36:40 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015091
Fri, 21 Oct 2022 16:37:38 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id m66-20020a257145000000b006c23949ec98so3762299ybc.4 for ; Fri, 21 Oct 2022 09:37:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=kLa4FDL78/heGirMrV4t2jI2YXb52QMfSMVBDglb1pI=; b=Q5PTuLo7HRBrnlQ1Ey9ThRuaUPzdRht6lDvMahIeRZWgEby1HC2ivIu1k6cVhj8WmR oO/uNViSfO/2l5oua1WWbkZ0eHLvVgNvH7kU5i5kmpn4hbwV9AMGBGCX3YqInEpOxWcS OQI52pRTR+PnZRAz0BqQAhpT9WrDVcR13NBnsTM4rCctkW7Z+CR0eHFNnJ4WxMOCiJ5T SDYleezhNhwhzNFFFV+R09sJxih1+czcs9xyXQDbsEpbaRwdWjhHE0DoJ1w0cD/H3Kzs ktG3Exmi3FDbo2/KLeQiGqY7fizeM3zEOYgw8vPiHu8GPzznlIQ9RQAUhOcOYC4PvbiJ fpjQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=kLa4FDL78/heGirMrV4t2jI2YXb52QMfSMVBDglb1pI=; b=SyjB5U3S5rTxj+pkGOoWzoi53aoahtteLGGYIjBdga1JmUp3RGR0nwr2bj5QYfxaft MoQM0QHy5S9FTIvFEvny3idmupvbfGDBzyXd+xdLjXB9ydFomLNFua37gBBqY0e5rVb+ eC3NHMMgEKVMBOrm/jrk0ny7KhKKxWA+7O8Y3Eic9u+bUsmjskrjH3UfjWLDR6G0aqRi UGTgQdZdNLnaCjmsVdDGxgv+ac8XvaJ1kK8XDHZfaIxh8JARNon4P2G4vZiw50W23XDS WNO3opBuKKwuSyODhNCAipXGT7vKFmZoM0vjVv3UwDySj80ieD58D3jemqRiSSJyi8VK frzA== X-Gm-Message-State: ACrzQf1pDK056JPIXgzVeQW8Z2mM4FVugDoMJ+aA73Cy+idtxDcTRV2l 548actjOkvUNLCzL5inX/c/0CWk+GOcPR0Le X-Google-Smtp-Source: AMsMyM7h5gdYFH3DD5K/ZNJhJDsbg1FfZUc+C9HxdHoM0T7ZqKXkPe+MqKnLU+sCEA2oMl8JZ8rXkLKHBL6YNc9r X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a25:244e:0:b0:6ca:1972:f851 with SMTP id k75-20020a25244e000000b006ca1972f851mr10580295ybk.277.1666370258436; Fri, 21 Oct 2022 09:37:38 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:40 +0000 In-Reply-To: 
<20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-25-jthoughton@google.com>
Subject: [RFC PATCH v2 24/47] hugetlb: update page_vma_mapped to do high-granularity walks
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This updates the HugeTLB logic to look a lot more like the PTE-mapped THP logic. When a user calls us in a loop, we will update pvmw->address to walk to each page table entry that could possibly map the hugepage containing pvmw->pfn.

This makes use of the new pte_order so callers know what size PTE they're getting.
Signed-off-by: James Houghton <jthoughton@google.com>
---
 include/linux/rmap.h |  4 +++
 mm/page_vma_mapped.c | 59 ++++++++++++++++++++++++++++++++++++--------
 mm/rmap.c            | 48 +++++++++++++++++++++--------------
 3 files changed, 83 insertions(+), 28 deletions(-)

diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index e0557ede2951..d7d2d9f65a01 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -13,6 +13,7 @@
 #include
 #include
 #include
+#include

 /*
  * The anon_vma heads a list of private "related" vmas, to scan if
@@ -409,6 +410,9 @@ static inline void page_vma_mapped_walk_done(struct page_vma_mapped_walk *pvmw)
 		pte_unmap(pvmw->pte);
 	if (pvmw->ptl)
 		spin_unlock(pvmw->ptl);
+	if (pvmw->pte && is_vm_hugetlb_page(pvmw->vma) &&
+			hugetlb_hgm_enabled(pvmw->vma))
+		hugetlb_vma_unlock_read(pvmw->vma);
 }

 bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw);

diff --git a/mm/page_vma_mapped.c b/mm/page_vma_mapped.c
index 395ca4e21c56..1994b3f9a4c2 100644
--- a/mm/page_vma_mapped.c
+++ b/mm/page_vma_mapped.c
@@ -133,7 +133,8 @@ static void step_forward(struct page_vma_mapped_walk *pvmw, unsigned long size)
  *
  * Returns true if the page is mapped in the vma. @pvmw->pmd and @pvmw->pte point
  * to relevant page table entries. @pvmw->ptl is locked. @pvmw->address is
- * adjusted if needed (for PTE-mapped THPs).
+ * adjusted if needed (for PTE-mapped THPs and high-granularity-mapped HugeTLB
+ * pages).
  *
  * If @pvmw->pmd is set but @pvmw->pte is not, you have found PMD-mapped page
  * (usually THP).
 * For PTE-mapped THP, you should run page_vma_mapped_walk() in
@@ -166,19 +167,57 @@ bool page_vma_mapped_walk(struct page_vma_mapped_walk *pvmw)
 	if (unlikely(is_vm_hugetlb_page(vma))) {
 		struct hstate *hstate = hstate_vma(vma);
 		unsigned long size = huge_page_size(hstate);
-		/* The only possible mapping was handled on last iteration */
-		if (pvmw->pte)
-			return not_found(pvmw);
+		struct hugetlb_pte hpte;
+		pte_t *pte;
+		pte_t pteval;
+
+		end = (pvmw->address & huge_page_mask(hstate)) +
+			huge_page_size(hstate);

 		/* when pud is not present, pte will be NULL */
-		pvmw->pte = huge_pte_offset(mm, pvmw->address, size);
-		if (!pvmw->pte)
+		pte = huge_pte_offset(mm, pvmw->address, size);
+		if (!pte)
 			return false;

-		pvmw->pte_order = huge_page_order(hstate);
-		pvmw->ptl = huge_pte_lock(hstate, mm, pvmw->pte);
-		if (!check_pte(pvmw))
-			return not_found(pvmw);
+		do {
+			hugetlb_pte_populate(&hpte, pte, huge_page_shift(hstate),
+					hpage_size_to_level(size));
+
+			/*
+			 * Do a high granularity page table walk. The vma lock
+			 * is grabbed to prevent the page table from being
+			 * collapsed mid-walk. It is dropped in
+			 * page_vma_mapped_walk_done().
+			 */
+			if (pvmw->pte) {
+				if (pvmw->ptl)
+					spin_unlock(pvmw->ptl);
+				pvmw->ptl = NULL;
+				pvmw->address += PAGE_SIZE << pvmw->pte_order;
+				if (pvmw->address >= end)
+					return not_found(pvmw);
+			} else if (hugetlb_hgm_enabled(vma))
+				/* Only grab the lock once. */
+				hugetlb_vma_lock_read(vma);
+
+retry_walk:
+			hugetlb_hgm_walk(mm, vma, &hpte, pvmw->address,
+					PAGE_SIZE, /*stop_at_none=*/true);
+
+			pvmw->pte = hpte.ptep;
+			pvmw->pte_order = hpte.shift - PAGE_SHIFT;
+			pvmw->ptl = hugetlb_pte_lock(mm, &hpte);
+			pteval = huge_ptep_get(hpte.ptep);
+			if (pte_present(pteval) && !hugetlb_pte_present_leaf(
+						&hpte, pteval)) {
+				/*
+				 * Someone split from under us, so keep
+				 * walking.
+				 */
+				spin_unlock(pvmw->ptl);
+				goto retry_walk;
+			}
+		} while (!check_pte(pvmw));
 		return true;
 	}

diff --git a/mm/rmap.c b/mm/rmap.c
index 527463c1e936..a8359584467e 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1552,17 +1552,23 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			flush_cache_range(vma, range.start, range.end);

 			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode. Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem. We can only try lock here and fail
-			 * if unsuccessful.
+			 * If HGM is enabled, we have already grabbed the VMA
+			 * lock for reading, and we cannot safely release it.
+			 * Because HGM-enabled VMAs have already unshared all
+			 * PMDs, we can safely ignore PMD unsharing here.
 			 */
-			if (!anon) {
+			if (!anon && !hugetlb_hgm_enabled(vma)) {
 				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+				/*
+				 * To call huge_pmd_unshare, i_mmap_rwsem must
+				 * be held in write mode. Caller needs to
+				 * explicitly do this outside rmap routines.
+				 *
+				 * We also must hold hugetlb vma_lock in write
+				 * mode. Lock order dictates acquiring vma_lock
+				 * BEFORE i_mmap_rwsem. We can only try lock
+				 * here and fail if unsuccessful.
+				 */
 				if (!hugetlb_vma_trylock_write(vma)) {
 					page_vma_mapped_walk_done(&pvmw);
 					ret = false;
@@ -1946,17 +1952,23 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			flush_cache_range(vma, range.start, range.end);

 			/*
-			 * To call huge_pmd_unshare, i_mmap_rwsem must be
-			 * held in write mode. Caller needs to explicitly
-			 * do this outside rmap routines.
-			 *
-			 * We also must hold hugetlb vma_lock in write mode.
-			 * Lock order dictates acquiring vma_lock BEFORE
-			 * i_mmap_rwsem. We can only try lock here and
-			 * fail if unsuccessful.
+			 * If HGM is enabled, we have already grabbed the VMA
+			 * lock for reading, and we cannot safely release it.
+			 * Because HGM-enabled VMAs have already unshared all
+			 * PMDs, we can safely ignore PMD unsharing here.
 			 */
-			if (!anon) {
+			if (!anon && !hugetlb_hgm_enabled(vma)) {
 				VM_BUG_ON(!(flags & TTU_RMAP_LOCKED));
+				/*
+				 * To call huge_pmd_unshare, i_mmap_rwsem must
+				 * be held in write mode. Caller needs to
+				 * explicitly do this outside rmap routines.
+				 *
+				 * We also must hold hugetlb vma_lock in write
+				 * mode. Lock order dictates acquiring vma_lock
+				 * BEFORE i_mmap_rwsem. We can only try lock
+				 * here and fail if unsuccessful.
+				 */
 				if (!hugetlb_vma_trylock_write(vma)) {
 					page_vma_mapped_walk_done(&pvmw);
 					ret = false;

From patchwork Fri Oct 21 16:36:41 2022
Date: Fri, 21 Oct 2022 16:36:41 +0000
In-Reply-To:
<20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-26-jthoughton@google.com>
Subject: [RFC PATCH v2 25/47] hugetlb: add HGM support for copy_hugetlb_page_range
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This allows fork() to work with high-granularity mappings. The page table structure is copied such that partially mapped regions will remain partially mapped in the same way for the new process.

A page's reference count is incremented for *each* portion of it that is mapped in the page table. For example, if you have a PMD-mapped 1G page, the reference count and mapcount will be incremented by 512.
Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 81 +++++++++++++++++++++++++++++++++++++++++-----------
 1 file changed, 65 insertions(+), 16 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 5783a8307a77..7d692907cbf3 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4946,7 +4946,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			    struct vm_area_struct *src_vma)
 {
 	pte_t *src_pte, *dst_pte, entry;
-	struct page *ptepage;
+	struct hugetlb_pte src_hpte, dst_hpte;
+	struct page *ptepage, *hpage;
 	unsigned long addr;
 	bool cow = is_cow_mapping(src_vma->vm_flags);
 	struct hstate *h = hstate_vma(src_vma);
@@ -4956,6 +4957,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	unsigned long last_addr_mask;
 	int ret = 0;

+	if (hugetlb_hgm_enabled(src_vma)) {
+		/*
+		 * src_vma might have high-granularity PTEs, and dst_vma will
+		 * need to copy those.
+		 */
+		ret = enable_hugetlb_hgm(dst_vma);
+		if (ret)
+			return ret;
+	}
+
 	if (cow) {
 		mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, src_vma,
 					src, src_vma->vm_start,
@@ -4967,18 +4978,22 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		/*
 		 * For shared mappings the vma lock must be held before
 		 * calling huge_pte_offset in the src vma. Otherwise, the
-		 * returned ptep could go away if part of a shared pmd and
-		 * another thread calls huge_pmd_unshare.
+		 * returned ptep could go away if
+		 *  - part of a shared pmd and another thread calls
+		 *    huge_pmd_unshare, or
+		 *  - another thread collapses a high-granularity mapping.
 		 */
 		hugetlb_vma_lock_read(src_vma);
 	}

 	last_addr_mask = hugetlb_mask_last_page(h);
-	for (addr = src_vma->vm_start; addr < src_vma->vm_end; addr += sz) {
+	addr = src_vma->vm_start;
+	while (addr < src_vma->vm_end) {
 		spinlock_t *src_ptl, *dst_ptl;
+		unsigned long hpte_sz;
 		src_pte = huge_pte_offset(src, addr, sz);
 		if (!src_pte) {
-			addr |= last_addr_mask;
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}
 		dst_pte = huge_pte_alloc(dst, dst_vma, addr, sz);
@@ -4987,6 +5002,26 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			break;
 		}

+		hugetlb_pte_populate(&src_hpte, src_pte, huge_page_shift(h),
+				hpage_size_to_level(huge_page_size(h)));
+		hugetlb_pte_populate(&dst_hpte, dst_pte, huge_page_shift(h),
+				hpage_size_to_level(huge_page_size(h)));
+
+		if (hugetlb_hgm_enabled(src_vma)) {
+			hugetlb_hgm_walk(src, src_vma, &src_hpte, addr,
+					PAGE_SIZE, /*stop_at_none=*/true);
+			ret = hugetlb_hgm_walk(dst, dst_vma, &dst_hpte, addr,
+					hugetlb_pte_size(&src_hpte),
+					/*stop_at_none=*/false);
+			if (ret)
+				break;
+
+			src_pte = src_hpte.ptep;
+			dst_pte = dst_hpte.ptep;
+		}
+
+		hpte_sz = hugetlb_pte_size(&src_hpte);
+
 		/*
 		 * If the pagetables are shared don't copy or take references.
 		 *
@@ -4996,12 +5031,12 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 		 * to reliably determine whether pte is shared.
 		 */
 		if (page_count(virt_to_page(dst_pte)) > 1) {
-			addr |= last_addr_mask;
+			addr = (addr | last_addr_mask) + sz;
 			continue;
 		}

-		dst_ptl = huge_pte_lock(h, dst, dst_pte);
-		src_ptl = huge_pte_lockptr(huge_page_shift(h), src, src_pte);
+		dst_ptl = hugetlb_pte_lock(dst, &dst_hpte);
+		src_ptl = hugetlb_pte_lockptr(src, &src_hpte);
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);
 		entry = huge_ptep_get(src_pte);
again:
@@ -5042,10 +5077,15 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 */
 			if (userfaultfd_wp(dst_vma))
 				set_huge_pte_at(dst, addr, dst_pte, entry);
+		} else if (!hugetlb_pte_present_leaf(&src_hpte, entry)) {
+			/* Retry the walk. */
+			spin_unlock(src_ptl);
+			spin_unlock(dst_ptl);
+			continue;
 		} else {
-			entry = huge_ptep_get(src_pte);
 			ptepage = pte_page(entry);
-			get_page(ptepage);
+			hpage = compound_head(ptepage);
+			get_page(hpage);

 			/*
 			 * Failing to duplicate the anon rmap is a rare case
@@ -5058,24 +5098,29 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			 * sleep during the process.
 			 */
 			if (!PageAnon(ptepage)) {
-				page_dup_file_rmap(ptepage, true);
-			} else if (page_try_dup_anon_rmap(ptepage, true,
+				page_dup_file_rmap(hpage, true);
+			} else if (page_try_dup_anon_rmap(hpage, true,
 						src_vma)) {
 				pte_t src_pte_old = entry;
 				struct page *new;

+				if (hugetlb_hgm_enabled(src_vma)) {
+					ret = -EINVAL;
+					break;
+				}
+
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
 				/* Do not use reserve as it's private owned */
 				new = alloc_huge_page(dst_vma, addr, 1);
 				if (IS_ERR(new)) {
-					put_page(ptepage);
+					put_page(hpage);
 					ret = PTR_ERR(new);
 					break;
 				}
-				copy_user_huge_page(new, ptepage, addr, dst_vma,
+				copy_user_huge_page(new, hpage, addr, dst_vma,
 						    npages);
-				put_page(ptepage);
+				put_page(hpage);

 				/* Install the new huge page if src pte stable */
 				dst_ptl = huge_pte_lock(h, dst, dst_pte);
@@ -5093,6 +5138,7 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 				hugetlb_install_page(dst_vma, dst_pte, addr, new);
 				spin_unlock(src_ptl);
 				spin_unlock(dst_ptl);
+				addr += hugetlb_pte_size(&src_hpte);
 				continue;
 			}

@@ -5109,10 +5155,13 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 			}
 			set_huge_pte_at(dst, addr, dst_pte, entry);
-			hugetlb_count_add(npages, dst);
+			hugetlb_count_add(
+					hugetlb_pte_size(&dst_hpte) / PAGE_SIZE,
+					dst);
 		}
 		spin_unlock(src_ptl);
 		spin_unlock(dst_ptl);
+		addr += hugetlb_pte_size(&src_hpte);
 	}

 	if (cow) {

From patchwork Fri Oct 21 16:36:42 2022
Date: Fri, 21 Oct 2022 16:36:42 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-27-jthoughton@google.com>
Subject: [RFC PATCH v2 26/47] hugetlb: make move_hugetlb_page_tables compatible with HGM
From: James Houghton <jthoughton@google.com>
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr.
David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This is very similar to the support that was added to copy_hugetlb_page_range. We simply do a high-granularity walk now, and most of the rest of the code stays the same.

Signed-off-by: James Houghton <jthoughton@google.com>
---
 mm/hugetlb.c | 47 ++++++++++++++++++++++++++++++++---------------
 1 file changed, 32 insertions(+), 15 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 7d692907cbf3..16b0d192445c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5174,16 +5174,16 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src,
 	return ret;
 }

-static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
-			  unsigned long new_addr, pte_t *src_pte, pte_t *dst_pte)
+static void move_hugetlb_pte(struct vm_area_struct *vma, unsigned long old_addr,
+		unsigned long new_addr, struct hugetlb_pte *src_hpte,
+		struct hugetlb_pte *dst_hpte)
 {
-	struct hstate *h = hstate_vma(vma);
 	struct mm_struct *mm = vma->vm_mm;
 	spinlock_t *src_ptl, *dst_ptl;
 	pte_t pte;

-	dst_ptl = huge_pte_lock(h, mm, dst_pte);
-	src_ptl = huge_pte_lockptr(huge_page_shift(h), mm, src_pte);
+	dst_ptl = hugetlb_pte_lock(mm, dst_hpte);
+	src_ptl = hugetlb_pte_lockptr(mm, src_hpte);

 	/*
 	 * We don't have to worry about the ordering of src and dst ptlocks
@@ -5192,8 +5192,8 @@ static void move_huge_pte(struct vm_area_struct *vma, unsigned long old_addr,
 	if (src_ptl != dst_ptl)
 		spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING);

-	pte = huge_ptep_get_and_clear(mm, old_addr, src_pte);
-	set_huge_pte_at(mm, new_addr, dst_pte, pte);
+	pte = huge_ptep_get_and_clear(mm, old_addr, src_hpte->ptep);
+	set_huge_pte_at(mm, new_addr, dst_hpte->ptep, pte);

 	if (src_ptl != dst_ptl)
 		spin_unlock(src_ptl);
@@ -5214,6 +5214,7 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	pte_t *src_pte, *dst_pte;
 	struct mmu_notifier_range range;
 	bool shared_pmd = false;
+	struct hugetlb_pte src_hpte, dst_hpte;

 	mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, old_addr,
 				old_end);
@@ -5229,20 +5230,28 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 	/* Prevent race with file truncation */
 	hugetlb_vma_lock_write(vma);
 	i_mmap_lock_write(mapping);
-	for (; old_addr < old_end; old_addr += sz, new_addr += sz) {
+	while (old_addr < old_end) {
 		src_pte = huge_pte_offset(mm, old_addr, sz);
 		if (!src_pte) {
-			old_addr |= last_addr_mask;
-			new_addr |= last_addr_mask;
+			old_addr = (old_addr | last_addr_mask) + sz;
+			new_addr = (new_addr | last_addr_mask) + sz;
 			continue;
 		}
-		if (huge_pte_none(huge_ptep_get(src_pte)))
+
+		hugetlb_pte_populate(&src_hpte, src_pte, huge_page_shift(h),
+				hpage_size_to_level(sz));
+		hugetlb_hgm_walk(mm, vma, &src_hpte, old_addr,
+				PAGE_SIZE, /*stop_at_none=*/true);
+		if (huge_pte_none(huge_ptep_get(src_hpte.ptep))) {
+			old_addr += hugetlb_pte_size(&src_hpte);
+			new_addr += hugetlb_pte_size(&src_hpte);
 			continue;
+		}

-		if (huge_pmd_unshare(mm, vma, old_addr, src_pte)) {
+		if (huge_pmd_unshare(mm, vma, old_addr, src_hpte.ptep)) {
 			shared_pmd = true;
-			old_addr |= last_addr_mask;
-			new_addr |= last_addr_mask;
+			old_addr = (old_addr | last_addr_mask) + sz;
+			new_addr = (new_addr | last_addr_mask) + sz;
 			continue;
 		}

@@ -5250,7 +5259,15 @@ int move_hugetlb_page_tables(struct vm_area_struct *vma,
 		if (!dst_pte)
 			break;

-		move_huge_pte(vma, old_addr, new_addr, src_pte, dst_pte);
+		hugetlb_pte_populate(&dst_hpte, dst_pte, huge_page_shift(h),
+				hpage_size_to_level(sz));
+		if (hugetlb_hgm_walk(mm, vma, &dst_hpte, new_addr,
+					hugetlb_pte_size(&src_hpte),
+					/*stop_at_none=*/false))
+			break;
+		move_hugetlb_pte(vma, old_addr, new_addr, &src_hpte, &dst_hpte);
+		old_addr += hugetlb_pte_size(&src_hpte);
+		new_addr += hugetlb_pte_size(&src_hpte);
 	}

 	if (shared_pmd)

From patchwork Fri Oct 21 16:36:43 2022
um0YkZS3Pl93a17EcSQA2mJP+JIyDjPEhxfNNRk/fLvPyR7v7qRC28tNFCPNiLrxtNze fip6KSAYskk7JdwZzlm12LEGDesz9sLZXePmgdnFbFD5TgEwuKmoBgaXaElakdHSJy4e /oeQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xcIc6o2q1ZWtyzFetKZVD9MlAzdaCFh0Vvt9+xlVO4U=; b=sr7isUposZROUGfaqr4qquEJi9krABzl0YRsWPYtxNwSBHM1c9iOP59vZY+r9oHuEf fgfux6KmsScIbXujgkGNc3gRzBxAsBaj/8C+0yCG5OhDEmGUaTpoLUEKNrRDFBmCAWvk WWdqav1GiC6A9Vcad3qT7dvxMSJp6zC6oOMtv+roe0WbFZ+yxoX0fTxciMY9gVvrqdvy 8NWXTJrGIcEwgoaJP2G2ufrnVnklj+qIY7pa4/BQNPktd1Ze7Vjxq8mXaNMJr/J6mDOM RFUk9T9bSm09foubFl5Jyr0MgANn7HDFJc6nlJ11fj3ZLLfSTb9RvlUViYvw89s3fep3 Oc9A== X-Gm-Message-State: ACrzQf0mCh38CqtKvEZo3QdL8thYp81BrWeHgzrq7vMtTQM6LJSAtDQe dpk5sJ2194IiXSi7u2yZsqP1Rte8wIh1ls0m X-Google-Smtp-Source: AMsMyM4u8CD3I4d/tpVW86y/KkJ5f8bn6NYPFbZDH2CmgCL5LSoZ0MSJDJQ0DAkF1J5yuH1SLUcG/xHay8AE1gFf X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a5b:8c2:0:b0:6bc:272:4f42 with SMTP id w2-20020a5b08c2000000b006bc02724f42mr17980966ybq.555.1666370261433; Fri, 21 Oct 2022 09:37:41 -0700 (PDT) Date: Fri, 21 Oct 2022 16:36:43 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-28-jthoughton@google.com> Subject: [RFC PATCH v2 27/47] hugetlb: add HGM support for hugetlb_fault and hugetlb_no_page From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1666370262; a=rsa-sha256; cv=none; b=AUtV0E+aNmM8XnyiuQaNy3LJStOfxYf0niT906rvZ54hAOb1KIHbJfMVjIOCGni5ANRbJk vaq0D9hAM6+2ZSkF8CpwfZKB2IQtdGXBJ6Hbj0Y2Aj+yo3IKjw6LOZ/lbDHZvC5Pc/blvY WPhyEsJdn0l4UNlBw4+RDRcpsBico5M= ARC-Authentication-Results: i=1; imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=RBhXCh4+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf10.hostedemail.com: domain of 31cpSYwoKCNYBL9GM89LGF8GG8D6.4GEDAFMP-EECN24C.GJ8@flex--jthoughton.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=31cpSYwoKCNYBL9GM89LGF8GG8D6.4GEDAFMP-EECN24C.GJ8@flex--jthoughton.bounces.google.com ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1666370262; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=xcIc6o2q1ZWtyzFetKZVD9MlAzdaCFh0Vvt9+xlVO4U=; b=MCVCy7gxHr7bdI1RSAp7o0yts15fDwRq+vDsT3Yt0FeJrPDiQa8ORH9GNp9ICX5YWjQeAb ftLYIxK2yUTU6k2WKEBtIQhQcIDrvOZkRjrbSgNEB4Tltnx3iZO9zDLTYb6Ht6jg0c2gFU dA2HoZ5N4dlJK0NY1dS4C4WkX2lzQKs= Authentication-Results: imf10.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=RBhXCh4+; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf10.hostedemail.com: domain of 31cpSYwoKCNYBL9GM89LGF8GG8D6.4GEDAFMP-EECN24C.GJ8@flex--jthoughton.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=31cpSYwoKCNYBL9GM89LGF8GG8D6.4GEDAFMP-EECN24C.GJ8@flex--jthoughton.bounces.google.com X-Rspamd-Server: rspam04 X-Rspam-User: X-Stat-Signature: 3hqmdp3whi1yehcf34ghk4pxqhusz3w3 
Update the page fault handler to support high-granularity page faults. While handling a page fault on a partially-mapped HugeTLB page, if the PTE we find with hugetlb_pte_walk is none, then we will replace it with a leaf-level PTE to map the page. To give some examples:

1. For a completely unmapped 1G page, it will be mapped with a 1G PUD.
2. For a 1G page that has its first 512M mapped, any faults on the unmapped
   sections will result in 2M PMDs mapping each unmapped 2M section.
3. For a 1G page that has only its first 4K mapped, a page fault on its
   second 4K section will get a 4K PTE to map it.

Unless high-granularity mappings are created via UFFDIO_CONTINUE, it is impossible for hugetlb_fault to create high-granularity mappings. This commit does not handle hugetlb_wp right now, and it doesn't handle HugeTLB page migration and swap entries.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 90 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 64 insertions(+), 26 deletions(-)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 16b0d192445c..2ee2c48ee79c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -118,6 +118,18 @@ enum hugetlb_level hpage_size_to_level(unsigned long sz)
 	return HUGETLB_LEVEL_PGD;
 }

+/*
+ * Find the subpage that corresponds to `addr` in `hpage`.
+ */
+static struct page *hugetlb_find_subpage(struct hstate *h, struct page *hpage,
+					 unsigned long addr)
+{
+	size_t idx = (addr & ~huge_page_mask(h))/PAGE_SIZE;
+
+	BUG_ON(idx >= pages_per_huge_page(h));
+	return &hpage[idx];
+}
+
 static inline bool subpool_is_free(struct hugepage_subpool *spool)
 {
 	if (spool->count)
@@ -5810,13 +5822,13 @@ static inline vm_fault_t hugetlb_handle_userfault(struct vm_area_struct *vma,
  * false if pte changed or is changing.
  */
 static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm,
-			       pte_t *ptep, pte_t old_pte)
+			       struct hugetlb_pte *hpte, pte_t old_pte)
 {
 	spinlock_t *ptl;
 	bool same;

-	ptl = huge_pte_lock(h, mm, ptep);
-	same = pte_same(huge_ptep_get(ptep), old_pte);
+	ptl = hugetlb_pte_lock(mm, hpte);
+	same = pte_same(huge_ptep_get(hpte->ptep), old_pte);
 	spin_unlock(ptl);

 	return same;
@@ -5825,17 +5837,18 @@ static bool hugetlb_pte_stable(struct hstate *h, struct mm_struct *mm,
 static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			struct vm_area_struct *vma,
 			struct address_space *mapping, pgoff_t idx,
-			unsigned long address, pte_t *ptep,
+			unsigned long address, struct hugetlb_pte *hpte,
 			pte_t old_pte, unsigned int flags)
 {
 	struct hstate *h = hstate_vma(vma);
 	vm_fault_t ret = VM_FAULT_SIGBUS;
 	int anon_rmap = 0;
 	unsigned long size;
-	struct page *page;
+	struct page *page, *subpage;
 	pte_t new_pte;
 	spinlock_t *ptl;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm = address & hugetlb_pte_mask(hpte);
 	bool new_page, new_pagecache_page = false;
 	u32 hash = hugetlb_fault_mutex_hash(mapping, idx);
@@ -5880,7 +5893,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			 * never happen on the page after UFFDIO_COPY has
 			 * correctly installed the page and returned.
 			 */
-			if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) {
+			if (!hugetlb_pte_stable(h, mm, hpte, old_pte)) {
 				ret = 0;
 				goto out;
 			}
@@ -5904,7 +5917,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			 * here.  Before returning error, get ptl and make
 			 * sure there really is no pte entry.
 			 */
-			if (hugetlb_pte_stable(h, mm, ptep, old_pte))
+			if (hugetlb_pte_stable(h, mm, hpte, old_pte))
 				ret = vmf_error(PTR_ERR(page));
 			else
 				ret = 0;
@@ -5954,7 +5967,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 			unlock_page(page);
 			put_page(page);
 			/* See comment in userfaultfd_missing() block above */
-			if (!hugetlb_pte_stable(h, mm, ptep, old_pte)) {
+			if (!hugetlb_pte_stable(h, mm, hpte, old_pte)) {
 				ret = 0;
 				goto out;
 			}
@@ -5979,10 +5992,10 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		vma_end_reservation(h, vma, haddr);
 	}

-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(mm, hpte);
 	ret = 0;
 	/* If pte changed from under us, retry */
-	if (!pte_same(huge_ptep_get(ptep), old_pte))
+	if (!pte_same(huge_ptep_get(hpte->ptep), old_pte))
 		goto backout;

 	if (anon_rmap) {
@@ -5990,20 +6003,25 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm,
 		hugepage_add_new_anon_rmap(page, vma, haddr);
 	} else
 		page_dup_file_rmap(page, true);
-	new_pte = make_huge_pte(vma, page, ((vma->vm_flags & VM_WRITE)
-				&& (vma->vm_flags & VM_SHARED)));
+
+	subpage = hugetlb_find_subpage(h, page, haddr_hgm);
+	new_pte = make_huge_pte_with_shift(vma, subpage,
+			((vma->vm_flags & VM_WRITE)
+			 && (vma->vm_flags & VM_SHARED)),
+			hpte->shift);
 	/*
 	 * If this pte was previously wr-protected, keep it wr-protected even
 	 * if populated.
 	 */
 	if (unlikely(pte_marker_uffd_wp(old_pte)))
 		new_pte = huge_pte_wrprotect(huge_pte_mkuffd_wp(new_pte));
-	set_huge_pte_at(mm, haddr, ptep, new_pte);
+	set_huge_pte_at(mm, haddr_hgm, hpte->ptep, new_pte);

-	hugetlb_count_add(pages_per_huge_page(h), mm);
+	hugetlb_count_add(hugetlb_pte_size(hpte) / PAGE_SIZE, mm);
 	if ((flags & FAULT_FLAG_WRITE) && !(vma->vm_flags & VM_SHARED)) {
+		BUG_ON(hugetlb_pte_size(hpte) != huge_page_size(h));
 		/* Optimization, do the COW without a second fault */
-		ret = hugetlb_wp(mm, vma, address, ptep, flags, page, ptl);
+		ret = hugetlb_wp(mm, vma, address, hpte->ptep, flags, page, ptl);
 	}

 	spin_unlock(ptl);
@@ -6066,11 +6084,14 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	u32 hash;
 	pgoff_t idx;
 	struct page *page = NULL;
+	struct page *subpage = NULL;
 	struct page *pagecache_page = NULL;
 	struct hstate *h = hstate_vma(vma);
 	struct address_space *mapping;
 	int need_wait_lock = 0;
 	unsigned long haddr = address & huge_page_mask(h);
+	unsigned long haddr_hgm;
+	struct hugetlb_pte hpte;

 	ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
 	if (ptep) {
@@ -6115,15 +6136,22 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		return VM_FAULT_OOM;
 	}

-	entry = huge_ptep_get(ptep);
+	hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h),
+			     hpage_size_to_level(huge_page_size(h)));
+	/* Do a high-granularity page table walk. */
+	hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE,
+			 /*stop_at_none=*/true);
+
+	entry = huge_ptep_get(hpte.ptep);
 	/* PTE markers should be handled the same way as none pte */
-	if (huge_pte_none_mostly(entry))
+	if (huge_pte_none_mostly(entry)) {
 		/*
 		 * hugetlb_no_page will drop vma lock and hugetlb fault
 		 * mutex internally, which make us return immediately.
 		 */
-		return hugetlb_no_page(mm, vma, mapping, idx, address, ptep,
+		return hugetlb_no_page(mm, vma, mapping, idx, address, &hpte,
 				      entry, flags);
+	}

 	ret = 0;
@@ -6137,6 +6165,10 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (!pte_present(entry))
 		goto out_mutex;

+	if (!hugetlb_pte_present_leaf(&hpte, entry))
+		/* We raced with someone splitting the entry. */
+		goto out_mutex;
+
 	/*
 	 * If we are going to COW/unshare the mapping later, we examine the
 	 * pending reservations for this page now. This will ensure that any
@@ -6156,14 +6188,17 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		pagecache_page = find_lock_page(mapping, idx);
 	}

-	ptl = huge_pte_lock(h, mm, ptep);
+	ptl = hugetlb_pte_lock(mm, &hpte);

 	/* Check for a racing update before calling hugetlb_wp() */
-	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
+	if (unlikely(!pte_same(entry, huge_ptep_get(hpte.ptep))))
 		goto out_ptl;

+	/* haddr_hgm is the base address of the region that hpte maps. */
+	haddr_hgm = address & hugetlb_pte_mask(&hpte);
+
 	/* Handle userfault-wp first, before trying to lock more pages */
-	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(entry) &&
 	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
 		struct vm_fault vmf = {
 			.vma = vma,
@@ -6187,7 +6222,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	 * pagecache_page, so here we need take the former one
 	 * when page != pagecache_page or !pagecache_page.
 	 */
-	page = pte_page(entry);
+	subpage = pte_page(entry);
+	page = compound_head(subpage);
 	if (page != pagecache_page)
 		if (!trylock_page(page)) {
 			need_wait_lock = 1;
@@ -6198,7 +6234,8 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
 		if (!huge_pte_write(entry)) {
-			ret = hugetlb_wp(mm, vma, address, ptep, flags,
+			BUG_ON(hugetlb_pte_size(&hpte) != huge_page_size(h));
+			ret = hugetlb_wp(mm, vma, address, hpte.ptep, flags,
 					 pagecache_page, ptl);
 			goto out_put_page;
 		} else if (likely(flags & FAULT_FLAG_WRITE)) {
@@ -6206,9 +6243,9 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 		}
 	}
 	entry = pte_mkyoung(entry);
-	if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
+	if (huge_ptep_set_access_flags(vma, haddr_hgm, hpte.ptep, entry,
 						flags & FAULT_FLAG_WRITE))
-		update_mmu_cache(vma, haddr, ptep);
+		update_mmu_cache(vma, haddr_hgm, hpte.ptep);
 out_put_page:
 	if (page != pagecache_page)
 		unlock_page(page);
@@ -7598,7 +7635,8 @@ pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
 			pte = (pte_t *)pmd_alloc(mm, pud, addr);
 		}
 	}
-	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte));
+	BUG_ON(pte && pte_present(*pte) && !pte_huge(*pte) &&
+	       !hugetlb_hgm_enabled(vma));

 	return pte;
 }

From patchwork Fri Oct 21 16:36:44 2022
Subject: [RFC PATCH v2 28/47] rmap: in try_to_{migrate,unmap}_one, check head page for page flags
From: James Houghton
Date: Fri, 21 Oct 2022 16:36:44 +0000
Message-ID: <20221021163703.3218176-29-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
The main complication here is that HugeTLB pages have their poison status stored in the head page as the HWPoison page flag. Because HugeTLB high-granularity mapping can create PTEs that point to subpages instead of always the head of a hugepage, we need to check the compound_head for page flags.

Signed-off-by: James Houghton
---
 mm/rmap.c | 34 ++++++++++++++++++++++++++--------
 1 file changed, 26 insertions(+), 8 deletions(-)

diff --git a/mm/rmap.c b/mm/rmap.c
index a8359584467e..d5e1eb6b8ce5 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1474,10 +1474,11 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
 	pte_t pteval;
-	struct page *subpage;
+	struct page *subpage, *page_flags_page;
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
+	bool page_poisoned;

 	/*
 	 * When racing against e.g. zap_pte_range() on another cpu,
@@ -1530,9 +1531,17 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		subpage = folio_page(folio,
 				pte_pfn(*pvmw.pte) - folio_pfn(folio));
+		/*
+		 * We check the page flags of HugeTLB pages by checking the
+		 * head page.
+		 */
+		page_flags_page = folio_test_hugetlb(folio)
+				  ? &folio->page
+				  : subpage;
+		page_poisoned = PageHWPoison(page_flags_page);
 		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
+				 PageAnonExclusive(page_flags_page);

 		if (folio_test_hugetlb(folio)) {
 			bool anon = folio_test_anon(folio);
@@ -1541,7 +1550,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			 * The try_to_unmap() is only passed a hugetlb page
 			 * in the case where the hugetlb page is poisoned.
 			 */
-			VM_BUG_ON_PAGE(!PageHWPoison(subpage), subpage);
+			VM_BUG_ON_FOLIO(!page_poisoned, folio);
 			/*
 			 * huge_pmd_unshare may unmap an entire PMD page.
 			 * There is no way of knowing exactly which PMDs may
@@ -1630,7 +1639,7 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 		/* Update high watermark before we lower rss */
 		update_hiwater_rss(mm);

-		if (PageHWPoison(subpage) && !(flags & TTU_IGNORE_HWPOISON)) {
+		if (page_poisoned && !(flags & TTU_IGNORE_HWPOISON)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
 				hugetlb_count_sub(1UL << pvmw.pte_order, mm);
@@ -1656,7 +1665,9 @@ static bool try_to_unmap_one(struct folio *folio, struct vm_area_struct *vma,
 			mmu_notifier_invalidate_range(mm, address,
 						      address + PAGE_SIZE);
 		} else if (folio_test_anon(folio)) {
-			swp_entry_t entry = { .val = page_private(subpage) };
+			swp_entry_t entry = {
+				.val = page_private(page_flags_page)
+			};
 			pte_t swp_pte;
 			/*
 			 * Store the swap location in the pte.
@@ -1855,7 +1866,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 	struct mm_struct *mm = vma->vm_mm;
 	DEFINE_FOLIO_VMA_WALK(pvmw, folio, vma, address, 0);
 	pte_t pteval;
-	struct page *subpage;
+	struct page *subpage, *page_flags_page;
 	bool anon_exclusive, ret = true;
 	struct mmu_notifier_range range;
 	enum ttu_flags flags = (enum ttu_flags)(long)arg;
@@ -1935,9 +1946,16 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			subpage = folio_page(folio,
 					pte_pfn(*pvmw.pte) - folio_pfn(folio));
 		}
+		/*
+		 * We check the page flags of HugeTLB pages by checking the
+		 * head page.
+		 */
+		page_flags_page = folio_test_hugetlb(folio)
+				  ? &folio->page
+				  : subpage;
 		address = pvmw.address;
 		anon_exclusive = folio_test_anon(folio) &&
-				 PageAnonExclusive(subpage);
+				 PageAnonExclusive(page_flags_page);

 		if (folio_test_hugetlb(folio)) {
 			bool anon = folio_test_anon(folio);
@@ -2048,7 +2066,7 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			 * No need to invalidate here it will synchronize on
 			 * against the special swap migration pte.
 			 */
-		} else if (PageHWPoison(subpage)) {
+		} else if (PageHWPoison(page_flags_page)) {
 			pteval = swp_entry_to_pte(make_hwpoison_entry(subpage));
 			if (folio_test_hugetlb(folio)) {
 				hugetlb_count_sub(1L << pvmw.pte_order, mm);

From patchwork Fri Oct 21 16:36:45 2022
Subject: [RFC PATCH v2 29/47] hugetlb: add high-granularity migration support
From: James Houghton
Date: Fri, 21 Oct 2022 16:36:45 +0000
Message-ID: <20221021163703.3218176-30-jthoughton@google.com>
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
smtp.mailfrom=318pSYwoKCNgDNBIOABNIHAIIAF8.6IGFCHOR-GGEP46E.ILA@flex--jthoughton.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-Rspamd-Server: rspam01 X-HE-Tag: 1666370263-243678 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: To prevent queueing a hugepage for migration multiple times, we use last_page to keep track of the last page we saw in queue_pages_hugetlb, and if the page we're looking at is last_page, then we skip it. For the non-hugetlb cases, last_page, although unused, is still updated so that it has a consistent meaning with the hugetlb case. This commit adds a check in hugetlb_fault for high-granularity migration PTEs. Signed-off-by: James Houghton --- include/linux/swapops.h | 8 ++++++-- mm/hugetlb.c | 15 ++++++++++++++- mm/mempolicy.c | 24 +++++++++++++++++++----- mm/migrate.c | 18 +++++++++++------- 4 files changed, 50 insertions(+), 15 deletions(-) diff --git a/include/linux/swapops.h b/include/linux/swapops.h index 86b95ccb81bb..2939323d0fd2 100644 --- a/include/linux/swapops.h +++ b/include/linux/swapops.h @@ -66,6 +66,8 @@ static inline bool is_pfn_swap_entry(swp_entry_t entry); +struct hugetlb_pte; + /* Clear all flags but only keep swp_entry_t related information */ static inline pte_t pte_swp_clear_flags(pte_t pte) { @@ -346,7 +348,8 @@ extern void migration_entry_wait(struct mm_struct *mm, pmd_t *pmd, unsigned long address); #ifdef CONFIG_HUGETLB_PAGE extern void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl); -extern void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte); +extern void migration_entry_wait_huge(struct vm_area_struct *vma, + struct hugetlb_pte *hpte); #endif /* CONFIG_HUGETLB_PAGE */ #else /* CONFIG_MIGRATION */ static inline swp_entry_t make_readable_migration_entry(pgoff_t offset) @@ -375,7 +378,8 @@ static inline void migration_entry_wait(struct 
mm_struct *mm, pmd_t *pmd, unsigned long address) { } #ifdef CONFIG_HUGETLB_PAGE static inline void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl) { } -static inline void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte) { } +static inline void migration_entry_wait_huge(struct vm_area_struct *vma, + struct hugetlb_pte *hpte) { } #endif /* CONFIG_HUGETLB_PAGE */ static inline int is_writable_migration_entry(swp_entry_t entry) { diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 2ee2c48ee79c..8dba8d59ebe5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6100,9 +6100,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * OK as we are only making decisions based on content and * not actually modifying content here. */ + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); entry = huge_ptep_get(ptep); if (unlikely(is_hugetlb_entry_migration(entry))) { - migration_entry_wait_huge(vma, ptep); + migration_entry_wait_huge(vma, &hpte); return 0; } else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) return VM_FAULT_HWPOISON_LARGE | @@ -6142,7 +6144,18 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE, /*stop_at_none=*/true); + /* + * Now that we have done a high-granularity walk, check again if we are + * looking at a migration entry. 
+	 */
	entry = huge_ptep_get(hpte.ptep);
+	if (unlikely(is_hugetlb_entry_migration(entry))) {
+		hugetlb_vma_unlock_read(vma);
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		migration_entry_wait_huge(vma, &hpte);
+		return 0;
+	}
+
	/* PTE markers should be handled the same way as none pte */
	if (huge_pte_none_mostly(entry)) {
		/*
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 275bc549590e..47bf9b16a9c0 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -424,6 +424,7 @@ struct queue_pages {
	unsigned long start;
	unsigned long end;
	struct vm_area_struct *first;
+	struct page *last_page;
 };

 /*
@@ -475,6 +476,7 @@ static int queue_pages_pmd(pmd_t *pmd, spinlock_t *ptl, unsigned long addr,
	flags = qp->flags;
	/* go to thp migration */
	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
+		qp->last_page = page;
		if (!vma_migratable(walk->vma) ||
		    migrate_page_add(page, qp->pagelist, flags)) {
			ret = 1;
@@ -532,6 +534,7 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
			continue;
		if (!queue_pages_required(page, qp))
			continue;
+
		if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
			/* MPOL_MF_STRICT must be specified if we get here */
			if (!vma_migratable(vma)) {
@@ -539,6 +542,8 @@ static int queue_pages_pte_range(pmd_t *pmd, unsigned long addr,
				break;
			}

+			qp->last_page = page;
+
			/*
			 * Do not abort immediately since there may be
			 * temporary off LRU pages in the range. Still
@@ -570,15 +575,22 @@ static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
	spinlock_t *ptl;
	pte_t entry;

-	/* We don't migrate high-granularity HugeTLB mappings for now. */
-	if (hugetlb_hgm_enabled(walk->vma))
-		return -EINVAL;
-
	ptl = hugetlb_pte_lock(walk->mm, hpte);
	entry = huge_ptep_get(hpte->ptep);
	if (!pte_present(entry))
		goto unlock;
-	page = pte_page(entry);
+
+	if (!hugetlb_pte_present_leaf(hpte, entry)) {
+		ret = -EAGAIN;
+		goto unlock;
+	}
+
+	page = compound_head(pte_page(entry));
+
+	/* We already queued this page with another high-granularity PTE.
+	 */
+	if (page == qp->last_page)
+		goto unlock;
+
	if (!queue_pages_required(page, qp))
		goto unlock;
@@ -605,6 +617,7 @@ static int queue_pages_hugetlb(struct hugetlb_pte *hpte,
	/* With MPOL_MF_MOVE, we migrate only unshared hugepage. */
	if (flags & (MPOL_MF_MOVE_ALL) ||
	    (flags & MPOL_MF_MOVE && page_mapcount(page) == 1)) {
+		qp->last_page = page;
		if (isolate_hugetlb(page, qp->pagelist) &&
		    (flags & MPOL_MF_STRICT))
			/*
@@ -740,6 +753,7 @@ queue_pages_range(struct mm_struct *mm, unsigned long start, unsigned long end,
		.start = start,
		.end = end,
		.first = NULL,
+		.last_page = NULL,
	};

	err = walk_page_range(mm, start, end, &queue_pages_walk_ops, &qp);
diff --git a/mm/migrate.c b/mm/migrate.c
index 8712b694c5a7..197662dd1dc0 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -186,6 +186,9 @@ static bool remove_migration_pte(struct folio *folio,
		/* pgoff is invalid for ksm pages, but they are never large */
		if (folio_test_large(folio) && !folio_test_hugetlb(folio))
			idx = linear_page_index(vma, pvmw.address) - pvmw.pgoff;
+		else if (folio_test_hugetlb(folio))
+			idx = (pvmw.address & ~huge_page_mask(hstate_vma(vma))) /
+				PAGE_SIZE;
		new = folio_page(folio, idx);

 #ifdef CONFIG_ARCH_ENABLE_THP_MIGRATION
@@ -235,14 +238,15 @@ static bool remove_migration_pte(struct folio *folio,

 #ifdef CONFIG_HUGETLB_PAGE
		if (folio_test_hugetlb(folio)) {
+			struct page *hpage = folio_page(folio, 0);
			unsigned int shift = pvmw.pte_order + PAGE_SHIFT;

			pte = arch_make_huge_pte(pte, shift, vma->vm_flags);
			if (folio_test_anon(folio))
-				hugepage_add_anon_rmap(new, vma, pvmw.address,
+				hugepage_add_anon_rmap(hpage, vma, pvmw.address,
						       rmap_flags);
			else
-				page_dup_file_rmap(new, true);
+				page_dup_file_rmap(hpage, true);
			set_huge_pte_at(vma->vm_mm, pvmw.address, pvmw.pte, pte);
		} else
 #endif
@@ -258,7 +262,7 @@ static bool remove_migration_pte(struct folio *folio,
		mlock_page_drain_local();

		trace_remove_migration_pte(pvmw.address, pte_val(pte),
-					   compound_order(new));
+					   pvmw.pte_order);

		/* No need to invalidate
 - it was non-present before */
		update_mmu_cache(vma, pvmw.address, pvmw.pte);
@@ -332,12 +336,12 @@ void __migration_entry_wait_huge(pte_t *ptep, spinlock_t *ptl)
	migration_entry_wait_on_locked(pte_to_swp_entry(pte), NULL, ptl);
 }

-void migration_entry_wait_huge(struct vm_area_struct *vma, pte_t *pte)
+void migration_entry_wait_huge(struct vm_area_struct *vma,
+			       struct hugetlb_pte *hpte)
 {
-	spinlock_t *ptl = huge_pte_lockptr(huge_page_shift(hstate_vma(vma)),
-					vma->vm_mm, pte);
+	spinlock_t *ptl = hugetlb_pte_lockptr(vma->vm_mm, hpte);

-	__migration_entry_wait_huge(pte, ptl);
+	__migration_entry_wait_huge(hpte->ptep, ptl);
 }
 #endif

From patchwork Fri Oct 21 16:36:46 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015115
Date: Fri, 21 Oct 2022
16:36:46 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-31-jthoughton@google.com>
Subject: [RFC PATCH v2 30/47] hugetlb: add high-granularity check for hwpoison in fault path
From: James Houghton
Because hwpoison swap entries may be placed beneath the hstate-level
PTE, we need to check for them separately (in addition to the
hstate-level PTE check that remains).

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 8dba8d59ebe5..bb0005d57cab 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -6154,6 +6154,11 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
		migration_entry_wait_huge(vma, &hpte);
		return 0;
+	} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry))) {
+		hugetlb_vma_unlock_read(vma);
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		return VM_FAULT_HWPOISON_LARGE |
+		       VM_FAULT_SET_HINDEX(hstate_index(h));
	}

	/* PTE markers should be handled the same way as none pte */

From patchwork Fri Oct 21 16:36:47 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015097
Date: Fri, 21 Oct 2022 16:36:47 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-32-jthoughton@google.com>
Subject: [RFC PATCH v2 31/47] hugetlb: sort hstates in hugetlb_init_hstates
From: James Houghton
When using HugeTLB high-granularity mapping, we need to go through the
supported hugepage sizes in decreasing order so that we pick the largest
size that works. Consider the case where we're faulting in a 1G hugepage
for the first time: we want hugetlb_fault/hugetlb_no_page to map it with
a PUD. By going through the sizes in decreasing order, we will find that
PUD_SIZE works before finding out that PMD_SIZE or PAGE_SIZE work too.

This commit also changes bootmem hugepages from storing hstate pointers
directly to storing the hstate sizes: the hstate pointers used for
boot-time-allocated hugepages become invalid after we sort the hstates.
gather_bootmem_prealloc, called after the hstates have been sorted, now
converts the size back to the correct hstate.

Signed-off-by: James Houghton
---
 include/linux/hugetlb.h |  2 +-
 mm/hugetlb.c            | 49 ++++++++++++++++++++++++++++++++---------
 2 files changed, 40 insertions(+), 11 deletions(-)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index d305742e9d44..e25f97cdd086 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -772,7 +772,7 @@ struct hstate {

 struct huge_bootmem_page {
	struct list_head list;
-	struct hstate *hstate;
+	unsigned long hstate_sz;
 };

 int isolate_or_dissolve_huge_page(struct page *page, struct list_head *list);
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index bb0005d57cab..d6f07968156c 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -34,6 +34,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
@@ -49,6 +50,10 @@ int hugetlb_max_hstate __read_mostly;
 unsigned int default_hstate_idx;

+/*
+ * After hugetlb_init_hstates is called, hstates will be sorted from largest
+ * to smallest.
+ */
 struct hstate hstates[HUGE_MAX_HSTATE];

 #ifdef CONFIG_CMA
@@ -3189,7 +3194,7 @@ int __alloc_bootmem_huge_page(struct hstate *h, int nid)
	/* Put them into a private list first because mem_map is not up yet */
	INIT_LIST_HEAD(&m->list);
	list_add(&m->list, &huge_boot_pages);
-	m->hstate = h;
+	m->hstate_sz = huge_page_size(h);
	return 1;
 }
@@ -3203,7 +3208,7 @@ static void __init gather_bootmem_prealloc(void)
	list_for_each_entry(m, &huge_boot_pages, list) {
		struct page *page = virt_to_page(m);
-		struct hstate *h = m->hstate;
+		struct hstate *h = size_to_hstate(m->hstate_sz);

		VM_BUG_ON(!hstate_is_gigantic(h));
		WARN_ON(page_count(page) != 1);
@@ -3319,9 +3324,38 @@ static void __init hugetlb_hstate_alloc_pages(struct hstate *h)
	kfree(node_alloc_noretry);
 }

+static int compare_hstates_decreasing(const void *a, const void *b)
+{
+	unsigned long sz_a = huge_page_size((const struct hstate *)a);
+	unsigned long sz_b = huge_page_size((const struct hstate *)b);
+
+	if (sz_a < sz_b)
+		return 1;
+	if (sz_a > sz_b)
+		return -1;
+	return 0;
+}
+
+static void sort_hstates(void)
+{
+	unsigned long default_hstate_sz = huge_page_size(&default_hstate);
+
+	/* Sort from largest to smallest. */
+	sort(hstates, hugetlb_max_hstate, sizeof(*hstates),
+	     compare_hstates_decreasing, NULL);
+
+	/*
+	 * We may have changed the location of the default hstate, so we need to
+	 * update it.
+	 */
+	default_hstate_idx = hstate_index(size_to_hstate(default_hstate_sz));
+}
+
 static void __init hugetlb_init_hstates(void)
 {
-	struct hstate *h, *h2;
+	struct hstate *h;
+
+	sort_hstates();

	for_each_hstate(h) {
		/* oversize hugepages were init'ed in early boot */
@@ -3340,13 +3374,8 @@ static void __init hugetlb_init_hstates(void)
			continue;
		if (hugetlb_cma_size && h->order <= HUGETLB_PAGE_ORDER)
			continue;
-		for_each_hstate(h2) {
-			if (h2 == h)
-				continue;
-			if (h2->order < h->order &&
-			    h2->order > h->demote_order)
-				h->demote_order = h2->order;
-		}
+		if (h - 1 >= &hstates[0])
+			h->demote_order = huge_page_order(h - 1);
	}
 }

From patchwork Fri Oct 21 16:36:48 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015098
Date: Fri, 21 Oct 2022 16:36:48
+0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-33-jthoughton@google.com>
Subject: [RFC PATCH v2 32/47] hugetlb: add for_each_hgm_shift
From: James Houghton
This is a helper macro to loop through all the usable page sizes for a
high-granularity-enabled HugeTLB VMA. Given the VMA's hstate, it loops,
in descending order, through the page sizes that HugeTLB supports for
this architecture, always including PAGE_SIZE. This is done by looping
through the hstates; however, there is no hstate for PAGE_SIZE. To
handle this case, the loop intentionally goes out of bounds, and the
out-of-bounds pointer is mapped to PAGE_SIZE.

Signed-off-by: James Houghton
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index d6f07968156c..6eaec40d66ad 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -7856,6 +7856,25 @@ int enable_hugetlb_hgm(struct vm_area_struct *vma)
	hugetlb_unshare_all_pmds(vma);
	return 0;
 }
+
+/* Should only be used by the for_each_hgm_shift macro. */
+static unsigned int __shift_for_hstate(struct hstate *h)
+{
+	/* If h is out of bounds, we have reached the end, so give PAGE_SIZE */
+	if (h >= &hstates[hugetlb_max_hstate])
+		return PAGE_SHIFT;
+	return huge_page_shift(h);
+}
+
+/*
+ * Intentionally go out of bounds. An out-of-bounds hstate will be converted to
+ * PAGE_SIZE.
+ */
+#define for_each_hgm_shift(hstate, tmp_h, shift)			\
+	for ((tmp_h) = hstate; (shift) = __shift_for_hstate(tmp_h),	\
+	     (tmp_h) <= &hstates[hugetlb_max_hstate];			\
+	     (tmp_h)++)
+
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */

 /*

From patchwork Fri Oct 21 16:36:49 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015099
Date: Fri, 21 Oct 2022 16:36:49 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-34-jthoughton@google.com>
Subject: [RFC PATCH v2 33/47] userfaultfd: add UFFD_FEATURE_MINOR_HUGETLBFS_HGM
From: James Houghton
, Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1666370268; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=+C7/ATuUriMIrUw34NjzRIATqKexU0rnzCYq9OSaiU4=; b=tG3qjGbaPYOQJvU4LHqB/es0p/bbJwntYmiIdCFzyrLRQKp93EJkiB/ObIVnAlZuOK6H3h PmbIj/DcudZzeboE7TzOtzm1fuZMy5GtijTHGG5oR3K+nQ5DuDE4l7maYGGB9c0ecv7rc6 pkBQ/54M9tDwTQo+q6WwCGULfNZ4CuM= ARC-Authentication-Results: i=1; imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=jJGO9Z7Z; spf=pass (imf16.hostedemail.com: domain of 328pSYwoKCNwHRFMSEFRMLEMMEJC.AMKJGLSV-KKIT8AI.MPE@flex--jthoughton.bounces.google.com designates 209.85.128.201 as permitted sender) smtp.mailfrom=328pSYwoKCNwHRFMSEFRMLEMMEJC.AMKJGLSV-KKIT8AI.MPE@flex--jthoughton.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1666370268; a=rsa-sha256; cv=none; b=hNylix9+HFREalwDT1maZRAwrzX/pOd8yGvo/u1W6uF6yU7NzXzAOnRuvw+dtIKTJ/VxOX ygJ99tcMmPX3KHrN+dTDRP7R+xp1SbpmmbcGj7z/nucpPWGhB+4RfBJWJNeQ0sa+vO+0OL Irpw8CAZZh0WMDq/wN9/RgNgcqIDErQ= X-Stat-Signature: 8j9h1mnu9tzim97o1j3wdgukeftib6gt X-Rspam-User: Authentication-Results: imf16.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=jJGO9Z7Z; spf=pass (imf16.hostedemail.com: domain of 328pSYwoKCNwHRFMSEFRMLEMMEJC.AMKJGLSV-KKIT8AI.MPE@flex--jthoughton.bounces.google.com designates 209.85.128.201 as permitted sender) 
Userspace must provide this new feature when it calls UFFDIO_API in order to enable HGM. Userspace can check whether the feature is present in uffdio_api.features; if it is absent, the kernel does not support HGM and therefore has not enabled it. Signed-off-by: James Houghton --- fs/userfaultfd.c | 12 +++++++++++- include/linux/userfaultfd_k.h | 7 +++++++ include/uapi/linux/userfaultfd.h | 2 ++ 3 files changed, 20 insertions(+), 1 deletion(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 07c81ab3fd4d..3a3e9ef74dab 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -226,6 +226,11 @@ static inline struct uffd_msg userfault_msg(unsigned long address, return msg; } +bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *ctx) +{ + return ctx->ctx->features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM; +} + #ifdef CONFIG_HUGETLB_PAGE /* * Same functionality as userfaultfd_must_wait below with modifications for @@ -1954,10 +1959,15 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx, goto err_out; /* report all available features and ioctls to userland */ uffdio_api.features = UFFD_API_FEATURES; + #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR uffdio_api.features &= ~(UFFD_FEATURE_MINOR_HUGETLBFS | UFFD_FEATURE_MINOR_SHMEM); -#endif +#ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING + uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS_HGM; +#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ +#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */ + #ifndef CONFIG_HAVE_ARCH_USERFAULTFD_WP uffdio_api.features &= ~UFFD_FEATURE_PAGEFAULT_FLAG_WP; #endif diff --git a/include/linux/userfaultfd_k.h
b/include/linux/userfaultfd_k.h index f07e6998bb68..d8fa37f308f7 100644 --- a/include/linux/userfaultfd_k.h +++ b/include/linux/userfaultfd_k.h @@ -162,6 +162,8 @@ static inline bool vma_can_userfault(struct vm_area_struct *vma, vma_is_shmem(vma); } +extern bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *); + extern int dup_userfaultfd(struct vm_area_struct *, struct list_head *); extern void dup_userfaultfd_complete(struct list_head *); @@ -228,6 +230,11 @@ static inline bool userfaultfd_armed(struct vm_area_struct *vma) return false; } +static inline bool uffd_ctx_has_hgm(struct vm_userfaultfd_ctx *ctx) +{ + return false; +} + static inline int dup_userfaultfd(struct vm_area_struct *vma, struct list_head *l) { diff --git a/include/uapi/linux/userfaultfd.h b/include/uapi/linux/userfaultfd.h index 005e5e306266..ae8080003560 100644 --- a/include/uapi/linux/userfaultfd.h +++ b/include/uapi/linux/userfaultfd.h @@ -36,6 +36,7 @@ UFFD_FEATURE_SIGBUS | \ UFFD_FEATURE_THREAD_ID | \ UFFD_FEATURE_MINOR_HUGETLBFS | \ + UFFD_FEATURE_MINOR_HUGETLBFS_HGM | \ UFFD_FEATURE_MINOR_SHMEM | \ UFFD_FEATURE_EXACT_ADDRESS | \ UFFD_FEATURE_WP_HUGETLBFS_SHMEM) @@ -217,6 +218,7 @@ struct uffdio_api { #define UFFD_FEATURE_MINOR_SHMEM (1<<10) #define UFFD_FEATURE_EXACT_ADDRESS (1<<11) #define UFFD_FEATURE_WP_HUGETLBFS_SHMEM (1<<12) +#define UFFD_FEATURE_MINOR_HUGETLBFS_HGM (1<<13) __u64 features; __u64 ioctls;
From patchwork Fri Oct 21 16:36:50 2022
Date: Fri, 21 Oct 2022 16:36:50 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-35-jthoughton@google.com>
Subject: [RFC PATCH v2 34/47] hugetlb: userfaultfd: add support for high-granularity UFFDIO_CONTINUE
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
Changes here are similar to the changes made for hugetlb_no_page. Pass vmf->real_address to userfaultfd_huge_must_wait because vmf->address is rounded down to the hugepage size, and a high-granularity page table walk would look up the wrong PTE. Also change the call to userfaultfd_must_wait in the same way for consistency. This commit introduces hugetlb_alloc_largest_pte, which is used to find the appropriate PTE size with which to map pages in UFFDIO_CONTINUE. Signed-off-by: James Houghton --- fs/userfaultfd.c | 33 +++++++++++++++--- include/linux/hugetlb.h | 14 +++++++- mm/hugetlb.c | 76 +++++++++++++++++++++++++++++++++-------- mm/userfaultfd.c | 46 +++++++++++++++++-------- 4 files changed, 135 insertions(+), 34 deletions(-) diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c index 3a3e9ef74dab..0204108e3882 100644 --- a/fs/userfaultfd.c +++ b/fs/userfaultfd.c @@ -245,14 +245,22 @@ static inline bool userfaultfd_huge_must_wait(struct userfaultfd_ctx *ctx, struct mm_struct *mm = ctx->mm; pte_t *ptep, pte; bool ret = true; + struct hugetlb_pte hpte; + unsigned long sz = vma_mmu_pagesize(vma); + unsigned int shift = huge_page_shift(hstate_vma(vma)); mmap_assert_locked(mm); - ptep = huge_pte_offset(mm, address, vma_mmu_pagesize(vma)); + ptep = huge_pte_offset(mm, address, sz); if (!ptep) goto out; + hugetlb_pte_populate(&hpte, ptep, shift, hpage_size_to_level(sz)); + hugetlb_hgm_walk(mm, vma, &hpte, address, PAGE_SIZE, + /*stop_at_none=*/true); + ptep = hpte.ptep; + ret = false; pte = huge_ptep_get(ptep); @@ -498,6 +506,14 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) blocking_state = userfaultfd_get_blocking_state(vmf->flags); + if (is_vm_hugetlb_page(vmf->vma) && hugetlb_hgm_enabled(vmf->vma)) + /* + * Lock the VMA lock so
we can do a high-granularity walk in + * userfaultfd_huge_must_wait. We have to grab this lock before + * we set our state to blocking. + */ + hugetlb_vma_lock_read(vmf->vma); + spin_lock_irq(&ctx->fault_pending_wqh.lock); /* * After the __add_wait_queue the uwq is visible to userland @@ -513,12 +529,15 @@ vm_fault_t handle_userfault(struct vm_fault *vmf, unsigned long reason) spin_unlock_irq(&ctx->fault_pending_wqh.lock); if (!is_vm_hugetlb_page(vmf->vma)) - must_wait = userfaultfd_must_wait(ctx, vmf->address, vmf->flags, - reason); + must_wait = userfaultfd_must_wait(ctx, vmf->real_address, + vmf->flags, reason); else must_wait = userfaultfd_huge_must_wait(ctx, vmf->vma, - vmf->address, + vmf->real_address, vmf->flags, reason); + + if (is_vm_hugetlb_page(vmf->vma) && hugetlb_hgm_enabled(vmf->vma)) + hugetlb_vma_unlock_read(vmf->vma); mmap_read_unlock(mm); if (likely(must_wait && !READ_ONCE(ctx->released))) { @@ -1463,6 +1482,12 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx, mas_pause(&mas); } next: + if (is_vm_hugetlb_page(vma) && (ctx->features & + UFFD_FEATURE_MINOR_HUGETLBFS_HGM)) { + ret = enable_hugetlb_hgm(vma); + if (ret) + break; + } /* * In the vma_merge() successful mprotect-like case 8: * the next vma was merged into the current one and diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index e25f97cdd086..00c22a84a1c6 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -250,7 +250,8 @@ unsigned long hugetlb_total_pages(void); vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, unsigned long address, unsigned int flags); #ifdef CONFIG_USERFAULTFD -int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte, +int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, + struct hugetlb_pte *dst_hpte, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, @@ -1272,6 +1273,9 @@ static inline enum hugetlb_level hpage_size_to_level(unsigned long sz) bool 
hugetlb_hgm_enabled(struct vm_area_struct *vma); bool hugetlb_hgm_eligible(struct vm_area_struct *vma); int enable_hugetlb_hgm(struct vm_area_struct *vma); +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long start, + unsigned long end); #else static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) { @@ -1285,6 +1289,14 @@ static inline int enable_hugetlb_hgm(struct vm_area_struct *vma) { return -EINVAL; } + +static inline +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + return -EINVAL; +} #endif static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 6eaec40d66ad..c25d3cd73ac9 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -6325,7 +6325,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma, * modifications for huge pages. */ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, - pte_t *dst_pte, + struct hugetlb_pte *dst_hpte, struct vm_area_struct *dst_vma, unsigned long dst_addr, unsigned long src_addr, @@ -6336,13 +6336,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, bool is_continue = (mode == MCOPY_ATOMIC_CONTINUE); struct hstate *h = hstate_vma(dst_vma); struct address_space *mapping = dst_vma->vm_file->f_mapping; - pgoff_t idx = vma_hugecache_offset(h, dst_vma, dst_addr); + unsigned long haddr = dst_addr & huge_page_mask(h); + pgoff_t idx = vma_hugecache_offset(h, dst_vma, haddr); unsigned long size; int vm_shared = dst_vma->vm_flags & VM_SHARED; pte_t _dst_pte; spinlock_t *ptl; int ret = -ENOMEM; - struct page *page; + struct page *page, *subpage; int writable; bool page_in_pagecache = false; @@ -6357,12 +6358,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, * a non-missing case. Return -EEXIST. 
*/ if (vm_shared && - hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) { + hugetlbfs_pagecache_present(h, dst_vma, haddr)) { ret = -EEXIST; goto out; } - page = alloc_huge_page(dst_vma, dst_addr, 0); + page = alloc_huge_page(dst_vma, haddr, 0); if (IS_ERR(page)) { ret = -ENOMEM; goto out; @@ -6378,13 +6379,13 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, /* Free the allocated page which may have * consumed a reservation. */ - restore_reserve_on_error(h, dst_vma, dst_addr, page); + restore_reserve_on_error(h, dst_vma, haddr, page); put_page(page); /* Allocate a temporary page to hold the copied * contents. */ - page = alloc_huge_page_vma(h, dst_vma, dst_addr); + page = alloc_huge_page_vma(h, dst_vma, haddr); if (!page) { ret = -ENOMEM; goto out; @@ -6398,14 +6399,14 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, } } else { if (vm_shared && - hugetlbfs_pagecache_present(h, dst_vma, dst_addr)) { + hugetlbfs_pagecache_present(h, dst_vma, haddr)) { put_page(*pagep); ret = -EEXIST; *pagep = NULL; goto out; } - page = alloc_huge_page(dst_vma, dst_addr, 0); + page = alloc_huge_page(dst_vma, haddr, 0); if (IS_ERR(page)) { put_page(*pagep); ret = -ENOMEM; @@ -6447,7 +6448,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, page_in_pagecache = true; } - ptl = huge_pte_lock(h, dst_mm, dst_pte); + ptl = hugetlb_pte_lock(dst_mm, dst_hpte); ret = -EIO; if (PageHWPoison(page)) @@ -6459,7 +6460,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, * page backing it, then access the page. 
*/ ret = -EEXIST; - if (!huge_pte_none_mostly(huge_ptep_get(dst_pte))) + if (!huge_pte_none_mostly(huge_ptep_get(dst_hpte->ptep))) goto out_release_unlock; if (page_in_pagecache) { @@ -6478,7 +6479,11 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, else writable = dst_vma->vm_flags & VM_WRITE; - _dst_pte = make_huge_pte(dst_vma, page, writable); + subpage = hugetlb_find_subpage(h, page, dst_addr); + WARN_ON_ONCE(subpage != page && !hugetlb_hgm_enabled(dst_vma)); + + _dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable, + dst_hpte->shift); /* * Always mark UFFDIO_COPY page dirty; note that this may not be * extremely important for hugetlbfs for now since swapping is not @@ -6491,12 +6496,12 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, if (wp_copy) _dst_pte = huge_pte_mkuffd_wp(_dst_pte); - set_huge_pte_at(dst_mm, dst_addr, dst_pte, _dst_pte); + set_huge_pte_at(dst_mm, dst_addr, dst_hpte->ptep, _dst_pte); - hugetlb_count_add(pages_per_huge_page(h), dst_mm); + hugetlb_count_add(hugetlb_pte_size(dst_hpte) / PAGE_SIZE, dst_mm); /* No need to invalidate - it was non-present before */ - update_mmu_cache(dst_vma, dst_addr, dst_pte); + update_mmu_cache(dst_vma, dst_addr, dst_hpte->ptep); spin_unlock(ptl); if (!is_continue) @@ -7875,6 +7880,47 @@ static unsigned int __shift_for_hstate(struct hstate *h) (tmp_h) <= &hstates[hugetlb_max_hstate]; \ (tmp_h)++) +/* + * Allocate a HugeTLB PTE that maps as much of [start, end) as possible with a + * single page table entry. The allocated HugeTLB PTE is returned in @hpte. 
+ */ +int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, + struct vm_area_struct *vma, unsigned long start, + unsigned long end) +{ + struct hstate *h = hstate_vma(vma), *tmp_h; + unsigned int shift; + unsigned long sz; + int ret; + pte_t *ptep; + + for_each_hgm_shift(h, tmp_h, shift) { + sz = 1UL << shift; + + if (!IS_ALIGNED(start, sz) || start + sz > end) + continue; + goto found; + } + return -EINVAL; +found: + ptep = huge_pte_alloc(mm, vma, start, huge_page_size(h)); + if (!ptep) + return -ENOMEM; + + hugetlb_pte_populate(hpte, ptep, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + + ret = hugetlb_hgm_walk(mm, vma, hpte, start, 1L << shift, + /*stop_at_none=*/false); + if (ret) + return ret; + + if (hpte->shift > shift) + return -EEXIST; + + return 0; +} + #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ /* diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c index e24e8a47ce8a..c4a8e6666ea6 100644 --- a/mm/userfaultfd.c +++ b/mm/userfaultfd.c @@ -315,14 +315,16 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, { int vm_shared = dst_vma->vm_flags & VM_SHARED; ssize_t err; - pte_t *dst_pte; unsigned long src_addr, dst_addr; long copied; struct page *page; - unsigned long vma_hpagesize; + unsigned long vma_hpagesize, target_pagesize; pgoff_t idx; u32 hash; struct address_space *mapping; + bool use_hgm = uffd_ctx_has_hgm(&dst_vma->vm_userfaultfd_ctx) && + mode == MCOPY_ATOMIC_CONTINUE; + struct hstate *h = hstate_vma(dst_vma); /* * There is no default zero huge page for all huge page sizes as @@ -340,12 +342,13 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, copied = 0; page = NULL; vma_hpagesize = vma_kernel_pagesize(dst_vma); + target_pagesize = use_hgm ? PAGE_SIZE : vma_hpagesize; /* - * Validate alignment based on huge page size + * Validate alignment based on the targeted page size. 
*/ err = -EINVAL; - if (dst_start & (vma_hpagesize - 1) || len & (vma_hpagesize - 1)) + if (dst_start & (target_pagesize - 1) || len & (target_pagesize - 1)) goto out_unlock; retry: @@ -362,6 +365,8 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, err = -EINVAL; if (vma_hpagesize != vma_kernel_pagesize(dst_vma)) goto out_unlock; + if (use_hgm && !hugetlb_hgm_enabled(dst_vma)) + goto out_unlock; vm_shared = dst_vma->vm_flags & VM_SHARED; } @@ -376,13 +381,15 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, } while (src_addr < src_start + len) { + struct hugetlb_pte hpte; + pte_t *dst_pte; BUG_ON(dst_addr >= dst_start + len); /* * Serialize via vma_lock and hugetlb_fault_mutex. - * vma_lock ensures the dst_pte remains valid even - * in the case of shared pmds. fault mutex prevents - * races with other faulting threads. + * vma_lock ensures the hpte.ptep remains valid even + * in the case of shared pmds and page table collapsing. + * fault mutex prevents races with other faulting threads. 
*/ idx = linear_page_index(dst_vma, dst_addr); mapping = dst_vma->vm_file->f_mapping; @@ -390,23 +397,33 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, mutex_lock(&hugetlb_fault_mutex_table[hash]); hugetlb_vma_lock_read(dst_vma); - err = -ENOMEM; + err = 0; dst_pte = huge_pte_alloc(dst_mm, dst_vma, dst_addr, vma_hpagesize); - if (!dst_pte) { + if (!dst_pte) + err = -ENOMEM; + else { + hugetlb_pte_populate(&hpte, dst_pte, huge_page_shift(h), + hpage_size_to_level(huge_page_size(h))); + if (use_hgm) + err = hugetlb_alloc_largest_pte(&hpte, + dst_mm, dst_vma, dst_addr, + dst_start + len); + } + if (err) { hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } if (mode != MCOPY_ATOMIC_CONTINUE && - !huge_pte_none_mostly(huge_ptep_get(dst_pte))) { + !huge_pte_none_mostly(huge_ptep_get(hpte.ptep))) { err = -EEXIST; hugetlb_vma_unlock_read(dst_vma); mutex_unlock(&hugetlb_fault_mutex_table[hash]); goto out_unlock; } - err = hugetlb_mcopy_atomic_pte(dst_mm, dst_pte, dst_vma, + err = hugetlb_mcopy_atomic_pte(dst_mm, &hpte, dst_vma, dst_addr, src_addr, mode, &page, wp_copy); @@ -418,6 +435,7 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, if (unlikely(err == -ENOENT)) { mmap_read_unlock(dst_mm); BUG_ON(!page); + BUG_ON(hpte.shift != huge_page_shift(h)); err = copy_huge_page_from_user(page, (const void __user *)src_addr, @@ -435,9 +453,9 @@ static __always_inline ssize_t __mcopy_atomic_hugetlb(struct mm_struct *dst_mm, BUG_ON(page); if (!err) { - dst_addr += vma_hpagesize; - src_addr += vma_hpagesize; - copied += vma_hpagesize; + dst_addr += hugetlb_pte_size(&hpte); + src_addr += hugetlb_pte_size(&hpte); + copied += hugetlb_pte_size(&hpte); if (fatal_signal_pending(current)) err = -EINTR; From patchwork Fri Oct 21 16:36:51 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton 
X-Patchwork-Id: 13015101
Date: Fri, 21 Oct 2022 16:36:51 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-36-jthoughton@google.com>
Subject: [RFC PATCH v2 35/47] userfaultfd: require UFFD_FEATURE_EXACT_ADDRESS when using HugeTLB HGM
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr .
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
To avoid bugs in userspace, we require that userspace provide UFFD_FEATURE_EXACT_ADDRESS when using UFFD_FEATURE_MINOR_HUGETLBFS_HGM; otherwise UFFDIO_API will fail with EINVAL. The potential confusion is this: without EXACT_ADDRESS, the address given in the userfaultfd message will be rounded down to the hugepage size. Userspace may think that, because it is using HGM, it can simply UFFDIO_CONTINUE the interval [address, address+PAGE_SIZE), but for faults that didn't occur in the first base page of the hugepage, this won't resolve the fault. The only choice it has in this scenario is to UFFDIO_CONTINUE the interval [address, address+hugepage_size), which negates the purpose of using HGM in the first place. By requiring userspace to provide UFFD_FEATURE_EXACT_ADDRESS, there is no rounding, and userspace has the information it needs to resolve the fault appropriately. Another potential solution here is to change the behavior when UFFD_FEATURE_EXACT_ADDRESS is not provided: when HGM is enabled, round to PAGE_SIZE instead of to the hugepage size. I think requiring UFFD_FEATURE_EXACT_ADDRESS is cleaner.
Signed-off-by: James Houghton
---
 fs/userfaultfd.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 0204108e3882..c8f21f53e37d 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1990,6 +1990,17 @@ static int userfaultfd_api(struct userfaultfd_ctx *ctx,
 			~(UFFD_FEATURE_MINOR_HUGETLBFS |
 			  UFFD_FEATURE_MINOR_SHMEM);
 #ifndef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
 	uffdio_api.features &= ~UFFD_FEATURE_MINOR_HUGETLBFS_HGM;
+#else
+
+	ret = -EINVAL;
+	if ((uffdio_api.features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM) &&
+	    !(uffdio_api.features & UFFD_FEATURE_EXACT_ADDRESS))
+		/*
+		 * UFFD_FEATURE_MINOR_HUGETLBFS_HGM is mostly
+		 * useless without UFFD_FEATURE_EXACT_ADDRESS,
+		 * so require userspace to provide both.
+		 */
+		goto err_out;
 #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
 #endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */

From patchwork Fri Oct 21 16:36:52 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015102
Date: Fri, 21 Oct 2022 16:36:52 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-37-jthoughton@google.com>
Subject: [RFC PATCH v2 36/47] hugetlb: add MADV_COLLAPSE for hugetlb
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This is a necessary extension to the UFFDIO_CONTINUE changes. When
userspace finishes mapping an entire hugepage with UFFDIO_CONTINUE, the
kernel has no mechanism to automatically collapse the page table to map
the whole hugepage normally. We require userspace to inform us that it
would like the mapping to be collapsed; it does this with MADV_COLLAPSE.

If userspace has mapped only part of a hugepage with UFFDIO_CONTINUE,
hugetlb_collapse will cause the requested range to be mapped as if it
had already been UFFDIO_CONTINUE'd. The effects of any
UFFDIO_WRITEPROTECT calls may be undone by a call to MADV_COLLAPSE for
intersecting address ranges.

This commit co-opts the same madvise mode that was introduced to
synchronously collapse THPs. The function that does THP collapsing has
been renamed to madvise_collapse_thp.
As with the rest of the high-granularity mapping support, MADV_COLLAPSE is only supported for shared VMAs right now. Signed-off-by: James Houghton --- include/linux/huge_mm.h | 12 ++-- include/linux/hugetlb.h | 8 +++ mm/hugetlb.c | 142 ++++++++++++++++++++++++++++++++++++++++ mm/khugepaged.c | 4 +- mm/madvise.c | 24 ++++++- 5 files changed, 181 insertions(+), 9 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 5d861905df46..fc2813db5e2e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -226,9 +226,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); -int madvise_collapse(struct vm_area_struct *vma, - struct vm_area_struct **prev, - unsigned long start, unsigned long end); +int madvise_collapse_thp(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -373,9 +373,9 @@ static inline int hugepage_madvise(struct vm_area_struct *vma, return -EINVAL; } -static inline int madvise_collapse(struct vm_area_struct *vma, - struct vm_area_struct **prev, - unsigned long start, unsigned long end) +static inline int madvise_collapse_thp(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) { return -EINVAL; } diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 00c22a84a1c6..5378b98cc7b8 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1276,6 +1276,8 @@ int enable_hugetlb_hgm(struct vm_area_struct *vma); int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, struct vm_area_struct *vma, unsigned long start, unsigned long end); +int hugetlb_collapse(struct mm_struct *mm, struct 
vm_area_struct *vma, + unsigned long start, unsigned long end); #else static inline bool hugetlb_hgm_enabled(struct vm_area_struct *vma) { @@ -1297,6 +1299,12 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, { return -EINVAL; } +static inline +int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + return -EINVAL; +} #endif static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/hugetlb.c b/mm/hugetlb.c index c25d3cd73ac9..d80db81a1fa5 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -7921,6 +7921,148 @@ int hugetlb_alloc_largest_pte(struct hugetlb_pte *hpte, struct mm_struct *mm, return 0; } +/* + * Collapse the address range from @start to @end to be mapped optimally. + * + * This is only valid for shared mappings. The main use case for this function + * is following UFFDIO_CONTINUE. If a user UFFDIO_CONTINUEs an entire hugepage + * by calling UFFDIO_CONTINUE once for each 4K region, the kernel doesn't know + * to collapse the mapping after the final UFFDIO_CONTINUE. Instead, we leave + * it up to userspace to tell us to do so, via MADV_COLLAPSE. + * + * Any holes in the mapping will be filled. If there is no page in the + * pagecache for a region we're collapsing, the PTEs will be cleared. + * + * If high-granularity PTEs are uffd-wp markers, those markers will be dropped. 
+ */ +int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long start, unsigned long end) +{ + struct hstate *h = hstate_vma(vma); + struct address_space *mapping = vma->vm_file->f_mapping; + struct mmu_notifier_range range; + struct mmu_gather tlb; + unsigned long curr = start; + int ret = 0; + struct page *hpage, *subpage; + pgoff_t idx; + bool writable = vma->vm_flags & VM_WRITE; + bool shared = vma->vm_flags & VM_SHARED; + struct hugetlb_pte hpte; + pte_t entry; + + /* + * This is only supported for shared VMAs, because we need to look up + * the page to use for any PTEs we end up creating. + */ + if (!shared) + return -EINVAL; + + if (!hugetlb_hgm_enabled(vma)) + return 0; + + mmu_notifier_range_init(&range, MMU_NOTIFY_CLEAR, 0, vma, mm, + start, end); + mmu_notifier_invalidate_range_start(&range); + tlb_gather_mmu(&tlb, mm); + + /* + * Grab the VMA lock for writing. This will prevent concurrent + * high-granularity page table walks, so that we can safely collapse + * and free page tables. + */ + hugetlb_vma_lock_write(vma); + + while (curr < end) { + ret = hugetlb_alloc_largest_pte(&hpte, mm, vma, curr, end); + if (ret) + goto out; + + entry = huge_ptep_get(hpte.ptep); + + /* + * There is no work to do if the PTE doesn't point to page + * tables. + */ + if (!pte_present(entry)) + goto next_hpte; + if (hugetlb_pte_present_leaf(&hpte, entry)) + goto next_hpte; + + idx = vma_hugecache_offset(h, vma, curr); + hpage = find_get_page(mapping, idx); + + if (hpage && !HPageMigratable(hpage)) { + /* + * Don't collapse a mapping to a page that is pending + * a migration. Migration swap entries may have been + * placed in the page table. + */ + ret = -EBUSY; + put_page(hpage); + goto out; + } + + if (hpage && PageHWPoison(hpage)) { + /* + * Don't collapse a mapping to a page that is + * hwpoisoned. 
+ */ + ret = -EHWPOISON; + put_page(hpage); + /* + * By setting ret to -EHWPOISON, if nothing else + * happens, we will tell userspace that we couldn't + * fully collapse everything due to poison. + * + * Skip this page, and continue to collapse the rest + * of the mapping. + */ + curr = (curr & huge_page_mask(h)) + huge_page_size(h); + continue; + } + + /* + * Clear all the PTEs, and drop ref/mapcounts + * (on tlb_finish_mmu). + */ + __unmap_hugepage_range(&tlb, vma, curr, + curr + hugetlb_pte_size(&hpte), + NULL, + ZAP_FLAG_DROP_MARKER); + /* Free the PTEs. */ + hugetlb_free_pgd_range(&tlb, + curr, curr + hugetlb_pte_size(&hpte), + curr, curr + hugetlb_pte_size(&hpte)); + if (!hpage) { + huge_pte_clear(mm, curr, hpte.ptep, + hugetlb_pte_size(&hpte)); + goto next_hpte; + } + + page_dup_file_rmap(hpage, true); + + subpage = hugetlb_find_subpage(h, hpage, curr); + entry = make_huge_pte_with_shift(vma, subpage, + writable, hpte.shift); + set_huge_pte_at(mm, curr, hpte.ptep, entry); +next_hpte: + curr += hugetlb_pte_size(&hpte); + + if (curr < end) { + /* Don't hold the VMA lock for too long. 
*/ + hugetlb_vma_unlock_write(vma); + cond_resched(); + hugetlb_vma_lock_write(vma); + } + } +out: + hugetlb_vma_unlock_write(vma); + tlb_finish_mmu(&tlb); + mmu_notifier_invalidate_range_end(&range); + return ret; +} + #endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */ /* diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 4734315f7940..70796824e9d2 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -2555,8 +2555,8 @@ static int madvise_collapse_errno(enum scan_result r) } } -int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, - unsigned long start, unsigned long end) +int madvise_collapse_thp(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) { struct collapse_control *cc; struct mm_struct *mm = vma->vm_mm; diff --git a/mm/madvise.c b/mm/madvise.c index 2baa93ca2310..6aed9bd68476 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -986,6 +986,24 @@ static long madvise_remove(struct vm_area_struct *vma, return error; } +static int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + /* Only allow collapsing for HGM-enabled, shared mappings. */ + if (is_vm_hugetlb_page(vma)) { + *prev = vma; + if (!hugetlb_hgm_eligible(vma)) + return -EINVAL; + if (!hugetlb_hgm_enabled(vma)) + return 0; + return hugetlb_collapse(vma->vm_mm, vma, start, end); + } + + return madvise_collapse_thp(vma, prev, start, end); + +} + /* * Apply an madvise behavior to a region of a vma. 
madvise_update_vma * will handle splitting a vm area into separate areas, each area with its own @@ -1157,6 +1175,9 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_TRANSPARENT_HUGEPAGE case MADV_HUGEPAGE: case MADV_NOHUGEPAGE: +#endif +#if defined(CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING) || \ + defined(CONFIG_TRANSPARENT_HUGEPAGE) case MADV_COLLAPSE: #endif case MADV_DONTDUMP: @@ -1347,7 +1368,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * MADV_NOHUGEPAGE - mark the given range as not worth being backed by * transparent huge pages so the existing pages will not be * coalesced into THP and new pages will not be allocated as THP. - * MADV_COLLAPSE - synchronously coalesce pages into new THP. + * MADV_COLLAPSE - synchronously coalesce pages into new THP, or, for HugeTLB + * pages, collapse the mapping. * MADV_DONTDUMP - the application wants to prevent pages in the given range * from being included in its core dump. * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump. 
From patchwork Fri Oct 21 16:36:53 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015103
Date: Fri, 21 Oct 2022 16:36:53 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-38-jthoughton@google.com>
Subject: [RFC PATCH v2 37/47] hugetlb: remove huge_pte_lock and huge_pte_lockptr
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . 
David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
huge_pte_lock and huge_pte_lockptr are replaced with
hugetlb_pte_lock{,ptr}. The callers that have not already been
converted are never reached when HGM is in use, so we handle them by
populating hugetlb_ptes with the standard, hstate-sized huge PTEs.

Signed-off-by: James Houghton --- include/linux/hugetlb.h | 28 +++------------------------- mm/hugetlb.c | 15 ++++++++++----- 2 files changed, 13 insertions(+), 30 deletions(-) diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 5378b98cc7b8..e6dc25b15403 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1015,14 +1015,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return modified_mask; } -static inline spinlock_t *huge_pte_lockptr(unsigned int shift, - struct mm_struct *mm, pte_t *pte) -{ - if (shift == PMD_SHIFT) - return pmd_lockptr(mm, (pmd_t *) pte); - return &mm->page_table_lock; -} - #ifndef hugepages_supported /* * Some platform decide whether they support huge pages at boot @@ -1226,12 +1218,6 @@ static inline gfp_t htlb_modify_alloc_mask(struct hstate *h, gfp_t gfp_mask) return 0; } -static inline spinlock_t *huge_pte_lockptr(unsigned int shift, - struct mm_struct *mm, pte_t *pte) -{ - return &mm->page_table_lock; -} - static inline void hugetlb_count_init(struct mm_struct *mm) { } @@ -1307,16 +1293,6 @@ int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, } #endif -static inline spinlock_t *huge_pte_lock(struct hstate *h, - struct mm_struct *mm, pte_t *pte) -{ - spinlock_t *ptl; - - ptl = huge_pte_lockptr(huge_page_shift(h), mm, pte); - spin_lock(ptl); - return ptl; -} - static inline spinlock_t *hugetlb_pte_lockptr(struct mm_struct *mm, struct hugetlb_pte *hpte) { @@ -1324,7 +1300,9 @@ spinlock_t *hugetlb_pte_lockptr(struct 
mm_struct *mm, struct hugetlb_pte *hpte) BUG_ON(!hpte->ptep); if (hpte->ptl) return hpte->ptl; - return huge_pte_lockptr(hugetlb_pte_shift(hpte), mm, hpte->ptep); + if (hugetlb_pte_level(hpte) == HUGETLB_LEVEL_PMD) + return pmd_lockptr(mm, (pmd_t *) hpte->ptep); + return &mm->page_table_lock; } static inline diff --git a/mm/hugetlb.c b/mm/hugetlb.c index d80db81a1fa5..9d4e41c41f78 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -5164,9 +5164,8 @@ int copy_hugetlb_page_range(struct mm_struct *dst, struct mm_struct *src, put_page(hpage); /* Install the new huge page if src pte stable */ - dst_ptl = huge_pte_lock(h, dst, dst_pte); - src_ptl = huge_pte_lockptr(huge_page_shift(h), - src, src_pte); + dst_ptl = hugetlb_pte_lock(dst, &dst_hpte); + src_ptl = hugetlb_pte_lockptr(src, &src_hpte); spin_lock_nested(src_ptl, SINGLE_DEPTH_NESTING); entry = huge_ptep_get(src_pte); if (!pte_same(src_pte_old, entry)) { @@ -7465,6 +7464,7 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, pte_t *spte = NULL; pte_t *pte; spinlock_t *ptl; + struct hugetlb_pte hpte; i_mmap_lock_read(mapping); vma_interval_tree_foreach(svma, &mapping->i_mmap, idx, idx) { @@ -7485,7 +7485,8 @@ pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma, if (!spte) goto out; - ptl = huge_pte_lock(hstate_vma(vma), mm, spte); + hugetlb_pte_populate(&hpte, (pte_t *)pud, PUD_SHIFT, HUGETLB_LEVEL_PUD); + ptl = hugetlb_pte_lock(mm, &hpte); if (pud_none(*pud)) { pud_populate(mm, pud, (pmd_t *)((unsigned long)spte & PAGE_MASK)); @@ -8179,6 +8180,7 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) unsigned long address, start, end; spinlock_t *ptl; pte_t *ptep; + struct hugetlb_pte hpte; if (!(vma->vm_flags & VM_MAYSHARE)) return; @@ -8203,7 +8205,10 @@ void hugetlb_unshare_all_pmds(struct vm_area_struct *vma) ptep = huge_pte_offset(mm, address, sz); if (!ptep) continue; - ptl = huge_pte_lock(h, mm, ptep); + + hugetlb_pte_populate(&hpte, ptep, huge_page_shift(h), + 
hpage_size_to_level(sz)); + ptl = hugetlb_pte_lock(mm, &hpte); huge_pmd_unshare(mm, vma, address, ptep); spin_unlock(ptl); }
From patchwork Fri Oct 21 16:36:54 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015104
Date: Fri, 21 Oct 2022 16:36:54 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-39-jthoughton@google.com>
Subject: [RFC PATCH v2 38/47] hugetlb: replace make_huge_pte with make_huge_pte_with_shift
From: James Houghton
To: Mike Kravetz , Muchun Song , Peter Xu
Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach 
O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
smtp.mailfrom=338pSYwoKCOALVJQWIJVQPIQQING.EQONKPWZ-OOMXCEM.QTI@flex--jthoughton.bounces.google.com X-Rspamd-Server: rspam07 X-HE-Tag: 1666370272-870268 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This removes the old definition of make_huge_pte, where now we always require the shift to be explicitly given. All callsites are cleaned up. Signed-off-by: James Houghton --- mm/hugetlb.c | 31 ++++++++++++------------------- 1 file changed, 12 insertions(+), 19 deletions(-) diff --git a/mm/hugetlb.c b/mm/hugetlb.c index 9d4e41c41f78..b26142bec4fe 100644 --- a/mm/hugetlb.c +++ b/mm/hugetlb.c @@ -4908,9 +4908,9 @@ const struct vm_operations_struct hugetlb_vm_ops = { .pagesize = hugetlb_vm_op_pagesize, }; -static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma, - struct page *page, int writable, - int shift) +static pte_t make_huge_pte(struct vm_area_struct *vma, + struct page *page, int writable, + int shift) { pte_t entry; @@ -4926,14 +4926,6 @@ static pte_t make_huge_pte_with_shift(struct vm_area_struct *vma, return entry; } -static pte_t make_huge_pte(struct vm_area_struct *vma, struct page *page, - int writable) -{ - unsigned int shift = huge_page_shift(hstate_vma(vma)); - - return make_huge_pte_with_shift(vma, page, writable, shift); -} - static void set_huge_ptep_writable(struct vm_area_struct *vma, unsigned long address, pte_t *ptep) { @@ -4974,10 +4966,12 @@ static void hugetlb_install_page(struct vm_area_struct *vma, pte_t *ptep, unsigned long addr, struct page *new_page) { + struct hstate *h = hstate_vma(vma); __SetPageUptodate(new_page); hugepage_add_new_anon_rmap(new_page, vma, addr); - set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1)); - hugetlb_count_add(pages_per_huge_page(hstate_vma(vma)), vma->vm_mm); + set_huge_pte_at(vma->vm_mm, addr, ptep, make_huge_pte(vma, new_page, 1, + huge_page_shift(h))); + 
hugetlb_count_add(pages_per_huge_page(h), vma->vm_mm); ClearHPageRestoreReserve(new_page); SetHPageMigratable(new_page); } @@ -5737,7 +5731,8 @@ static vm_fault_t hugetlb_wp(struct mm_struct *mm, struct vm_area_struct *vma, page_remove_rmap(old_page, vma, true); hugepage_add_new_anon_rmap(new_page, vma, haddr); set_huge_pte_at(mm, haddr, ptep, - make_huge_pte(vma, new_page, !unshare)); + make_huge_pte(vma, new_page, !unshare, + huge_page_shift(h))); SetHPageMigratable(new_page); /* Make the old page be freed below */ new_page = old_page; @@ -6033,7 +6028,7 @@ static vm_fault_t hugetlb_no_page(struct mm_struct *mm, page_dup_file_rmap(page, true); subpage = hugetlb_find_subpage(h, page, haddr_hgm); - new_pte = make_huge_pte_with_shift(vma, subpage, + new_pte = make_huge_pte(vma, subpage, ((vma->vm_flags & VM_WRITE) && (vma->vm_flags & VM_SHARED)), hpte->shift); @@ -6481,8 +6476,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, subpage = hugetlb_find_subpage(h, page, dst_addr); WARN_ON_ONCE(subpage != page && !hugetlb_hgm_enabled(dst_vma)); - _dst_pte = make_huge_pte_with_shift(dst_vma, subpage, writable, - dst_hpte->shift); + _dst_pte = make_huge_pte(dst_vma, subpage, writable, dst_hpte->shift); /* * Always mark UFFDIO_COPY page dirty; note that this may not be * extremely important for hugetlbfs for now since swapping is not @@ -8044,8 +8038,7 @@ int hugetlb_collapse(struct mm_struct *mm, struct vm_area_struct *vma, page_dup_file_rmap(hpage, true); subpage = hugetlb_find_subpage(h, hpage, curr); - entry = make_huge_pte_with_shift(vma, subpage, - writable, hpte.shift); + entry = make_huge_pte(vma, subpage, writable, hpte.shift); set_huge_pte_at(mm, curr, hpte.ptep, entry); next_hpte: curr += hugetlb_pte_size(&hpte); From patchwork Fri Oct 21 16:36:55 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: James Houghton X-Patchwork-Id: 13015105 Return-Path: X-Spam-Checker-Version: SpamAssassin 
Date: Fri, 21 Oct 2022 16:36:55 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-40-jthoughton@google.com>
Subject: [RFC PATCH v2 39/47] mm: smaps: add stats for HugeTLB mapping size
From: James Houghton

When the kernel is compiled with CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING,
smaps may provide HugetlbPudMapped, HugetlbPmdMapped, and HugetlbPteMapped.
Folded levels are not output.

Signed-off-by: James Houghton
---
 fs/proc/task_mmu.c | 101 +++++++++++++++++++++++++++++++++------------
 1 file changed, 75 insertions(+), 26 deletions(-)

diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index be78cdb7677e..16288d6dbf1d 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -405,6 +405,15 @@ struct mem_size_stats {
 	unsigned long swap;
 	unsigned long shared_hugetlb;
 	unsigned long private_hugetlb;
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+#ifndef __PAGETABLE_PUD_FOLDED
+	unsigned long hugetlb_pud_mapped;
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	unsigned long hugetlb_pmd_mapped;
+#endif
+	unsigned long hugetlb_pte_mapped;
+#endif
 	u64 pss;
 	u64 pss_anon;
 	u64 pss_file;
@@ -720,6 +729,35 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
 }
 
 #ifdef CONFIG_HUGETLB_PAGE
+
+static void smaps_hugetlb_hgm_account(struct mem_size_stats *mss,
+				      struct hugetlb_pte *hpte)
+{
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+	unsigned long size = hugetlb_pte_size(hpte);
+
+	switch (hpte->level) {
+#ifndef __PAGETABLE_PUD_FOLDED
+	case HUGETLB_LEVEL_PUD:
+		mss->hugetlb_pud_mapped += size;
+		break;
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	case HUGETLB_LEVEL_PMD:
+		mss->hugetlb_pmd_mapped += size;
+		break;
+#endif
+	case HUGETLB_LEVEL_PTE:
+		mss->hugetlb_pte_mapped += size;
+		break;
+	default:
+		break;
+	}
+#else
+	return;
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+}
+
 static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 				unsigned long addr,
 				struct mm_walk *walk)
@@ -753,6 +791,8 @@ static int smaps_hugetlb_range(struct hugetlb_pte *hpte,
 			mss->shared_hugetlb += hugetlb_pte_size(hpte);
 		else
 			mss->private_hugetlb += hugetlb_pte_size(hpte);
+
+		smaps_hugetlb_hgm_account(mss, hpte);
 	}
 	return 0;
 }
@@ -822,38 +862,47 @@ static void smap_gather_stats(struct vm_area_struct *vma,
 static void __show_smap(struct seq_file *m, const struct mem_size_stats *mss,
 	bool rollup_mode)
 {
-	SEQ_PUT_DEC("Rss: ", mss->resident);
-	SEQ_PUT_DEC(" kB\nPss: ", mss->pss >> PSS_SHIFT);
-	SEQ_PUT_DEC(" kB\nPss_Dirty: ", mss->pss_dirty >> PSS_SHIFT);
+	SEQ_PUT_DEC("Rss: ", mss->resident);
+	SEQ_PUT_DEC(" kB\nPss: ", mss->pss >> PSS_SHIFT);
+	SEQ_PUT_DEC(" kB\nPss_Dirty: ", mss->pss_dirty >> PSS_SHIFT);
 	if (rollup_mode) {
 		/*
 		 * These are meaningful only for smaps_rollup, otherwise two of
 		 * them are zero, and the other one is the same as Pss.
 		 */
-		SEQ_PUT_DEC(" kB\nPss_Anon: ",
+		SEQ_PUT_DEC(" kB\nPss_Anon: ",
 			mss->pss_anon >> PSS_SHIFT);
-		SEQ_PUT_DEC(" kB\nPss_File: ",
+		SEQ_PUT_DEC(" kB\nPss_File: ",
 			mss->pss_file >> PSS_SHIFT);
-		SEQ_PUT_DEC(" kB\nPss_Shmem: ",
+		SEQ_PUT_DEC(" kB\nPss_Shmem: ",
 			mss->pss_shmem >> PSS_SHIFT);
 	}
-	SEQ_PUT_DEC(" kB\nShared_Clean: ", mss->shared_clean);
-	SEQ_PUT_DEC(" kB\nShared_Dirty: ", mss->shared_dirty);
-	SEQ_PUT_DEC(" kB\nPrivate_Clean: ", mss->private_clean);
-	SEQ_PUT_DEC(" kB\nPrivate_Dirty: ", mss->private_dirty);
-	SEQ_PUT_DEC(" kB\nReferenced: ", mss->referenced);
-	SEQ_PUT_DEC(" kB\nAnonymous: ", mss->anonymous);
-	SEQ_PUT_DEC(" kB\nLazyFree: ", mss->lazyfree);
-	SEQ_PUT_DEC(" kB\nAnonHugePages: ", mss->anonymous_thp);
-	SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp);
-	SEQ_PUT_DEC(" kB\nFilePmdMapped: ", mss->file_thp);
-	SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb);
-	seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb: ",
+	SEQ_PUT_DEC(" kB\nShared_Clean: ", mss->shared_clean);
+	SEQ_PUT_DEC(" kB\nShared_Dirty: ", mss->shared_dirty);
+	SEQ_PUT_DEC(" kB\nPrivate_Clean: ", mss->private_clean);
+	SEQ_PUT_DEC(" kB\nPrivate_Dirty: ", mss->private_dirty);
+	SEQ_PUT_DEC(" kB\nReferenced: ", mss->referenced);
+	SEQ_PUT_DEC(" kB\nAnonymous: ", mss->anonymous);
+	SEQ_PUT_DEC(" kB\nLazyFree: ", mss->lazyfree);
+	SEQ_PUT_DEC(" kB\nAnonHugePages: ", mss->anonymous_thp);
+	SEQ_PUT_DEC(" kB\nShmemPmdMapped: ", mss->shmem_thp);
+	SEQ_PUT_DEC(" kB\nFilePmdMapped: ", mss->file_thp);
+	SEQ_PUT_DEC(" kB\nShared_Hugetlb: ", mss->shared_hugetlb);
+	seq_put_decimal_ull_width(m, " kB\nPrivate_Hugetlb: ",
 			    mss->private_hugetlb >> 10, 7);
-	SEQ_PUT_DEC(" kB\nSwap: ", mss->swap);
-	SEQ_PUT_DEC(" kB\nSwapPss: ",
+#ifdef CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING
+#ifndef __PAGETABLE_PUD_FOLDED
+	SEQ_PUT_DEC(" kB\nHugetlbPudMapped: ", mss->hugetlb_pud_mapped);
+#endif
+#ifndef __PAGETABLE_PMD_FOLDED
+	SEQ_PUT_DEC(" kB\nHugetlbPmdMapped: ", mss->hugetlb_pmd_mapped);
+#endif
+	SEQ_PUT_DEC(" kB\nHugetlbPteMapped: ", mss->hugetlb_pte_mapped);
+#endif /* CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING */
+	SEQ_PUT_DEC(" kB\nSwap: ", mss->swap);
+	SEQ_PUT_DEC(" kB\nSwapPss: ",
 		    mss->swap_pss >> PSS_SHIFT);
-	SEQ_PUT_DEC(" kB\nLocked: ",
+	SEQ_PUT_DEC(" kB\nLocked: ",
 		    mss->pss_locked >> PSS_SHIFT);
 	seq_puts(m, " kB\n");
 }
@@ -869,18 +918,18 @@ static int show_smap(struct seq_file *m, void *v)
 	show_map_vma(m, vma);
 
-	SEQ_PUT_DEC("Size: ", vma->vm_end - vma->vm_start);
-	SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma));
-	SEQ_PUT_DEC(" kB\nMMUPageSize: ", vma_mmu_pagesize(vma));
+	SEQ_PUT_DEC("Size: ", vma->vm_end - vma->vm_start);
+	SEQ_PUT_DEC(" kB\nKernelPageSize: ", vma_kernel_pagesize(vma));
+	SEQ_PUT_DEC(" kB\nMMUPageSize: ", vma_mmu_pagesize(vma));
 	seq_puts(m, " kB\n");
 
 	__show_smap(m, &mss, false);
 
-	seq_printf(m, "THPeligible: %d\n",
+	seq_printf(m, "THPeligible: %d\n",
 		   hugepage_vma_check(vma, vma->vm_flags, true, false, true));
 
 	if (arch_pkeys_enabled())
-		seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
+		seq_printf(m, "ProtectionKey: %8u\n", vma_pkey(vma));
 	show_smap_vma_flags(m, vma);
 	return 0;

From patchwork Fri Oct 21 16:36:56 2022
Date: Fri, 21 Oct 2022 16:36:56 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-41-jthoughton@google.com>
Subject: [RFC PATCH v2 40/47] hugetlb: x86: enable high-granularity mapping
From: James Houghton

Now that HGM is fully supported for GENERAL_HUGETLB, x86 can enable it.
The x86 KVM MMU already properly handles HugeTLB HGM pages (it does a
page table walk to determine which size to use in the second-stage page
table instead of, for example, checking vma_mmu_pagesize, like arm64
does).

We could also enable HugeTLB HGM for arm (32-bit) at this point, as it
also uses GENERAL_HUGETLB and I don't see anything else that is needed
for it. However, I haven't tested on arm at all, so I won't enable it.

Signed-off-by: James Houghton
---
 arch/x86/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 6d1879ef933a..6d7103266e61 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -124,6 +124,7 @@ config X86
 	select ARCH_WANT_GENERAL_HUGETLB
 	select ARCH_WANT_HUGE_PMD_SHARE
 	select ARCH_WANT_HUGETLB_PAGE_OPTIMIZE_VMEMMAP if X86_64
+	select ARCH_WANT_HUGETLB_HIGH_GRANULARITY_MAPPING
 	select ARCH_WANT_LD_ORPHAN_WARN
 	select ARCH_WANTS_THP_SWAP if X86_64
 	select ARCH_HAS_PARANOID_L1D_FLUSH

From patchwork Fri Oct 21 16:36:57 2022
Date: Fri, 21 Oct 2022 16:36:57 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-42-jthoughton@google.com>
Subject: [RFC PATCH v2 41/47] docs: hugetlb: update hugetlb and userfaultfd admin-guides with HGM info
From: James Houghton

This adds information about how UFFD_FEATURE_MINOR_HUGETLBFS_HGM should
be used and when MADV_COLLAPSE should be used with it.

Signed-off-by: James Houghton
---
 Documentation/admin-guide/mm/hugetlbpage.rst |  4 ++++
 Documentation/admin-guide/mm/userfaultfd.rst | 16 +++++++++++++++-
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/Documentation/admin-guide/mm/hugetlbpage.rst b/Documentation/admin-guide/mm/hugetlbpage.rst
index 19f27c0d92e0..ca7db15ae768 100644
--- a/Documentation/admin-guide/mm/hugetlbpage.rst
+++ b/Documentation/admin-guide/mm/hugetlbpage.rst
@@ -454,6 +454,10 @@ errno set to EINVAL or exclude hugetlb pages that extend beyond the length if
 not hugepage aligned. For example, munmap(2) will fail if memory is backed by
 a hugetlb page and the length is smaller than the hugepage size.
 
+It is possible for users to map HugeTLB pages at a higher granularity than
+normal using HugeTLB high-granularity mapping (HGM). For example, when using
+1G pages on x86, a user could map that page with 4K PTEs, 2M PMDs, or a
+combination of the two. See Documentation/admin-guide/mm/userfaultfd.rst.
 
 Examples
 ========
diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 83f31919ebb3..19877aaad61b 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -115,6 +115,14 @@ events, except page fault notifications, may be generated:
   areas. ``UFFD_FEATURE_MINOR_SHMEM`` is the analogous feature indicating
   support for shmem virtual memory areas.
 
+- ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` indicates that the kernel supports
+  small-page-aligned regions for ``UFFDIO_CONTINUE`` in HugeTLB-backed
+  virtual memory areas. ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` and
+  ``UFFD_FEATURE_EXACT_ADDRESS`` must both be specified explicitly to enable
+  this behavior. If ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM`` is specified but
+  ``UFFD_FEATURE_EXACT_ADDRESS`` is not, then ``UFFDIO_API`` will fail with
+  ``EINVAL``.
+
 The userland application should set the feature flags it intends to use
 when invoking the ``UFFDIO_API`` ioctl, to request that those features be
 enabled if supported.
@@ -169,7 +177,13 @@ like to do to resolve it:
   the page cache). Userspace has the option of modifying the page's
   contents before resolving the fault. Once the contents are correct
   (modified or not), userspace asks the kernel to map the page and let the
-  faulting thread continue with ``UFFDIO_CONTINUE``.
+  faulting thread continue with ``UFFDIO_CONTINUE``. If this is done at the
+  base-page size in a transparent-hugepage-eligible VMA or in a HugeTLB VMA
+  (requires ``UFFD_FEATURE_MINOR_HUGETLBFS_HGM``), then userspace may want to
+  use ``MADV_COLLAPSE`` when a hugepage is fully populated to inform the
+  kernel that it may be able to collapse the mapping. ``MADV_COLLAPSE`` may
+  undo the effect of any ``UFFDIO_WRITEPROTECT`` calls on the collapsed
+  address range.
 
 Notes:

From patchwork Fri Oct 21 16:36:58 2022
Date: Fri, 21 Oct 2022 16:36:58 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-43-jthoughton@google.com>
Subject: [RFC PATCH v2 42/47] docs: proc: include information about HugeTLB HGM
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This includes the updates that have been made to smaps, specifically, the addition of Hugetlb[Pud,Pmd,Pte]Mapped. Signed-off-by: James Houghton --- Documentation/filesystems/proc.rst | 56 +++++++++++++++++------------- 1 file changed, 32 insertions(+), 24 deletions(-) diff --git a/Documentation/filesystems/proc.rst b/Documentation/filesystems/proc.rst index ec6cfdf1796a..807d6c0694c2 100644 --- a/Documentation/filesystems/proc.rst +++ b/Documentation/filesystems/proc.rst @@ -444,29 +444,32 @@ Memory Area, or VMA) there is a series of lines such as the following:: 08048000-080bc000 r-xp 00000000 03:02 13130 /bin/bash - Size: 1084 kB - KernelPageSize: 4 kB - MMUPageSize: 4 kB - Rss: 892 kB - Pss: 374 kB - Pss_Dirty: 0 kB - Shared_Clean: 892 kB - Shared_Dirty: 0 kB - Private_Clean: 0 kB - Private_Dirty: 0 kB - Referenced: 892 kB - Anonymous: 0 kB - LazyFree: 0 kB - AnonHugePages: 0 kB - ShmemPmdMapped: 0 kB - Shared_Hugetlb: 0 kB - Private_Hugetlb: 0 kB - Swap: 0 kB - SwapPss: 0 kB - KernelPageSize: 4 kB - MMUPageSize: 4 kB - Locked: 0 kB - THPeligible: 0 + Size: 1084 kB + KernelPageSize: 4 kB + MMUPageSize: 4 kB + Rss: 892 kB + Pss: 374 kB + Pss_Dirty: 0 kB + Shared_Clean: 892 kB + Shared_Dirty: 0 kB + Private_Clean: 0 kB + Private_Dirty: 0 kB + Referenced: 892 kB + Anonymous: 0 kB + LazyFree: 0 kB + AnonHugePages: 0 kB + ShmemPmdMapped: 0 kB + Shared_Hugetlb: 0 kB + Private_Hugetlb: 0 kB + HugetlbPudMapped: 0 kB + HugetlbPmdMapped: 0 kB + HugetlbPteMapped: 0 kB + Swap: 0 kB + SwapPss: 0 kB + KernelPageSize: 4 kB + MMUPageSize: 4 kB + Locked: 0 kB + THPeligible: 0 VmFlags: rd ex mr mw me dw The first of these lines shows the same information as is displayed for the @@ -507,10 +510,15 @@ implementation.
If this is not desirable please file a bug report. "ShmemPmdMapped" shows the ammount of shared (shmem/tmpfs) memory backed by huge pages. -"Shared_Hugetlb" and "Private_Hugetlb" show the ammounts of memory backed by +"Shared_Hugetlb" and "Private_Hugetlb" show the amounts of memory backed by hugetlbfs page which is *not* counted in "RSS" or "PSS" field for historical reasons. And these are not included in {Shared,Private}_{Clean,Dirty} field. +If the kernel was compiled with ``CONFIG_HUGETLB_HIGH_GRANULARITY_MAPPING``, +"HugetlbPudMapped", "HugetlbPmdMapped", and "HugetlbPteMapped" will appear and +show the amount of HugeTLB memory mapped with PUDs, PMDs, and PTEs respectively. +See Documentation/admin-guide/mm/hugetlbpage.rst. + "Swap" shows how much would-be-anonymous memory is also used, but out on swap. For shmem mappings, "Swap" includes also the size of the mapped (and not

From patchwork Fri Oct 21 16:36:59 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015109
Date: Fri, 21 Oct 2022 16:36:59 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-44-jthoughton@google.com>
Subject: [RFC PATCH v2 43/47] selftests/vm: add HugeTLB HGM to userfaultfd selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This test case behaves similarly to the regular shared HugeTLB configuration, except that it uses 4K instead of hugepages, and that we ignore the UFFDIO_COPY tests, as UFFDIO_CONTINUE is the only ioctl that supports PAGE_SIZE-aligned regions. This doesn't test MADV_COLLAPSE. Other tests are added later to exercise MADV_COLLAPSE.
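The new test mode's setup reduces to two pure calculations that the diff below performs inside `uffd_minor_feature()` and `main()`: which userfaultfd features to request, and how to trim the per-CPU base-page count so the test area covers a whole number of hugepages. A standalone sketch follows; the first two feature bit values match the uapi header of that era, but `UFFD_FEATURE_MINOR_HUGETLBFS_HGM` is a placeholder, since the series was unmerged and its real bit value is not given here.

```c
/* Feature bits: MINOR_HUGETLBFS and EXACT_ADDRESS as in
 * include/uapi/linux/userfaultfd.h; the HGM bit is a placeholder. */
#define UFFD_FEATURE_MINOR_HUGETLBFS		(1ULL << 9)
#define UFFD_FEATURE_EXACT_ADDRESS		(1ULL << 11)
#define UFFD_FEATURE_MINOR_HUGETLBFS_HGM	(1ULL << 14)	/* placeholder */

/* Features to request for shared-hugetlb minor faults; HGM additionally
 * needs exact fault addresses so userspace can do its own alignment. */
static unsigned long long minor_features(int hgm)
{
	unsigned long long features = UFFD_FEATURE_MINOR_HUGETLBFS;

	if (hgm)
		features |= UFFD_FEATURE_MINOR_HUGETLBFS_HGM |
			    UFFD_FEATURE_EXACT_ADDRESS;
	return features;
}

/* Trim the per-CPU base-page count so nr_pages spans whole hugepages,
 * mirroring the alignment the selftest applies for hugetlb_shared_hgm. */
static unsigned long align_nr_pages_per_cpu(unsigned long nr_pages_per_cpu,
					    unsigned long hpage_size,
					    unsigned long page_size)
{
	return nr_pages_per_cpu -
	       nr_pages_per_cpu % (hpage_size / page_size);
}
```

With 2 MiB hugepages and 4 KiB base pages, for instance, a per-CPU count of 1000 pages is trimmed down to 512 so each CPU's region is exactly one hugepage.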
Signed-off-by: James Houghton --- tools/testing/selftests/vm/userfaultfd.c | 90 +++++++++++++++++++----- 1 file changed, 74 insertions(+), 16 deletions(-) diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c index 7f22844ed704..c9cdfb20f292 100644 --- a/tools/testing/selftests/vm/userfaultfd.c +++ b/tools/testing/selftests/vm/userfaultfd.c @@ -73,9 +73,10 @@ static unsigned long nr_cpus, nr_pages, nr_pages_per_cpu, page_size, hpage_size; #define BOUNCE_POLL (1<<3) static int bounces; -#define TEST_ANON 1 -#define TEST_HUGETLB 2 -#define TEST_SHMEM 3 +#define TEST_ANON 1 +#define TEST_HUGETLB 2 +#define TEST_HUGETLB_HGM 3 +#define TEST_SHMEM 4 static int test_type; #define UFFD_FLAGS (O_CLOEXEC | O_NONBLOCK | UFFD_USER_MODE_ONLY) @@ -93,6 +94,8 @@ static volatile bool test_uffdio_zeropage_eexist = true; static bool test_uffdio_wp = true; /* Whether to test uffd minor faults */ static bool test_uffdio_minor = false; +static bool test_uffdio_copy = true; + static bool map_shared; static int mem_fd; static unsigned long long *count_verify; @@ -151,7 +154,7 @@ static void usage(void) fprintf(stderr, "\nUsage: ./userfaultfd " "[hugetlbfs_file]\n\n"); fprintf(stderr, "Supported : anon, hugetlb, " - "hugetlb_shared, shmem\n\n"); + "hugetlb_shared, hugetlb_shared_hgm, shmem\n\n"); fprintf(stderr, "'Test mods' can be joined to the test type string with a ':'. " "Supported mods:\n"); fprintf(stderr, "\tsyscall - Use userfaultfd(2) (default)\n"); @@ -167,6 +170,11 @@ static void usage(void) exit(1); } +static bool test_is_hugetlb(void) +{ + return test_type == TEST_HUGETLB || test_type == TEST_HUGETLB_HGM; +} + #define _err(fmt, ...) 
\ do { \ int ret = errno; \ @@ -381,8 +389,12 @@ static struct uffd_test_ops *uffd_test_ops; static inline uint64_t uffd_minor_feature(void) { - if (test_type == TEST_HUGETLB && map_shared) - return UFFD_FEATURE_MINOR_HUGETLBFS; + if (test_is_hugetlb() && map_shared) + return UFFD_FEATURE_MINOR_HUGETLBFS | + (test_type == TEST_HUGETLB_HGM + ? (UFFD_FEATURE_MINOR_HUGETLBFS_HGM | + UFFD_FEATURE_EXACT_ADDRESS) + : 0); else if (test_type == TEST_SHMEM) return UFFD_FEATURE_MINOR_SHMEM; else @@ -393,7 +405,7 @@ static uint64_t get_expected_ioctls(uint64_t mode) { uint64_t ioctls = UFFD_API_RANGE_IOCTLS; - if (test_type == TEST_HUGETLB) + if (test_is_hugetlb()) ioctls &= ~(1 << _UFFDIO_ZEROPAGE); if (!((mode & UFFDIO_REGISTER_MODE_WP) && test_uffdio_wp)) @@ -500,13 +512,16 @@ static void uffd_test_ctx_clear(void) static void uffd_test_ctx_init(uint64_t features) { unsigned long nr, cpu; + uint64_t enabled_features = features; uffd_test_ctx_clear(); uffd_test_ops->allocate_area((void **)&area_src, true); uffd_test_ops->allocate_area((void **)&area_dst, false); - userfaultfd_open(&features); + userfaultfd_open(&enabled_features); + if ((enabled_features & features) != features) + err("couldn't enable all features"); count_verify = malloc(nr_pages * sizeof(unsigned long long)); if (!count_verify) @@ -726,13 +741,21 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, struct uffd_stats *stats) { unsigned long offset; + unsigned long address; if (msg->event != UFFD_EVENT_PAGEFAULT) err("unexpected msg event %u", msg->event); + /* + * Round down address to nearest page_size. + * We do this manually because we specified UFFD_FEATURE_EXACT_ADDRESS + * to support UFFD_FEATURE_MINOR_HUGETLBFS_HGM. 
+ */ + address = msg->arg.pagefault.address & ~(page_size - 1); + if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) { /* Write protect page faults */ - wp_range(uffd, msg->arg.pagefault.address, page_size, false); + wp_range(uffd, address, page_size, false); stats->wp_faults++; } else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) { uint8_t *area; @@ -751,11 +774,10 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, */ area = (uint8_t *)(area_dst + - ((char *)msg->arg.pagefault.address - - area_dst_alias)); + ((char *)address - area_dst_alias)); for (b = 0; b < page_size; ++b) area[b] = ~area[b]; - continue_range(uffd, msg->arg.pagefault.address, page_size); + continue_range(uffd, address, page_size); stats->minor_faults++; } else { /* @@ -782,7 +804,7 @@ static void uffd_handle_page_fault(struct uffd_msg *msg, if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WRITE) err("unexpected write fault"); - offset = (char *)(unsigned long)msg->arg.pagefault.address - area_dst; + offset = (char *)address - area_dst; offset &= ~(page_size-1); if (copy_page(uffd, offset)) @@ -1192,6 +1214,12 @@ static int userfaultfd_events_test(void) char c; struct uffd_stats stats = { 0 }; + if (!test_uffdio_copy) { + printf("Skipping userfaultfd events test " + "(test_uffdio_copy=false)\n"); + return 0; + } + printf("testing events (fork, remap, remove): "); fflush(stdout); @@ -1245,6 +1273,12 @@ static int userfaultfd_sig_test(void) char c; struct uffd_stats stats = { 0 }; + if (!test_uffdio_copy) { + printf("Skipping userfaultfd signal test " + "(test_uffdio_copy=false)\n"); + return 0; + } + printf("testing signal delivery: "); fflush(stdout); @@ -1538,6 +1572,12 @@ static int userfaultfd_stress(void) pthread_attr_init(&attr); pthread_attr_setstacksize(&attr, 16*1024*1024); + if (!test_uffdio_copy) { + printf("Skipping userfaultfd stress test " + "(test_uffdio_copy=false)\n"); + bounces = 0; + } + while (bounces--) { printf("bounces: %d, mode:", bounces); if 
(bounces & BOUNCE_RANDOM) @@ -1696,6 +1736,16 @@ static void set_test_type(const char *type) uffd_test_ops = &hugetlb_uffd_test_ops; /* Minor faults require shared hugetlb; only enable here. */ test_uffdio_minor = true; + } else if (!strcmp(type, "hugetlb_shared_hgm")) { + map_shared = true; + test_type = TEST_HUGETLB_HGM; + uffd_test_ops = &hugetlb_uffd_test_ops; + /* + * HugeTLB HGM only changes UFFDIO_CONTINUE, so don't test + * UFFDIO_COPY. + */ + test_uffdio_minor = true; + test_uffdio_copy = false; } else if (!strcmp(type, "shmem")) { map_shared = true; test_type = TEST_SHMEM; @@ -1731,6 +1781,7 @@ static void parse_test_type_arg(const char *raw_type) err("Unsupported test: %s", raw_type); if (test_type == TEST_HUGETLB) + /* TEST_HUGETLB_HGM gets small pages. */ page_size = hpage_size; else page_size = sysconf(_SC_PAGE_SIZE); @@ -1813,22 +1864,29 @@ int main(int argc, char **argv) nr_cpus = x < y ? x : y; } nr_pages_per_cpu = bytes / page_size / nr_cpus; + if (test_type == TEST_HUGETLB_HGM) + /* + * `page_size` refers to the page_size we can use in + * UFFDIO_CONTINUE. We still need nr_pages to be appropriately + * aligned, so align it here. 
+ nr_pages_per_cpu -= nr_pages_per_cpu % (hpage_size / page_size); if (!nr_pages_per_cpu) { _err("invalid MiB"); usage(); } + nr_pages = nr_pages_per_cpu * nr_cpus; bounces = atoi(argv[3]); if (bounces <= 0) { _err("invalid bounces"); usage(); } - nr_pages = nr_pages_per_cpu * nr_cpus; - if (test_type == TEST_SHMEM || test_type == TEST_HUGETLB) { + if (test_type == TEST_SHMEM || test_is_hugetlb()) { unsigned int memfd_flags = 0; - if (test_type == TEST_HUGETLB) + if (test_is_hugetlb()) memfd_flags = MFD_HUGETLB; mem_fd = memfd_create(argv[0], memfd_flags); if (mem_fd < 0)

From patchwork Fri Oct 21 16:37:00 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015110
Date: Fri, 21 Oct 2022 16:37:00 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-45-jthoughton@google.com>
Subject: [RFC PATCH v2 44/47] selftests/kvm: add HugeTLB HGM to KVM demand paging selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, "Zach O'Keefe", Manish Mishra, Naoya Horiguchi, "Dr. David Alan Gilbert", "Matthew Wilcox (Oracle)", Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton
This test exercises the GUP paths for HGM. MADV_COLLAPSE is not tested. Signed-off-by: James Houghton --- .../selftests/kvm/demand_paging_test.c | 20 ++++++++++++++++--- .../testing/selftests/kvm/include/test_util.h | 2 ++ tools/testing/selftests/kvm/lib/kvm_util.c | 2 +- tools/testing/selftests/kvm/lib/test_util.c | 14 +++++++++++++ 4 files changed, 34 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/kvm/demand_paging_test.c b/tools/testing/selftests/kvm/demand_paging_test.c index 779ae54f89c4..67ca8703c6b7 100644 --- a/tools/testing/selftests/kvm/demand_paging_test.c +++ b/tools/testing/selftests/kvm/demand_paging_test.c @@ -76,6 +76,12 @@ static int handle_uffd_page_request(int uffd_mode, int uffd, uint64_t addr) clock_gettime(CLOCK_MONOTONIC, &start); + /* + * We're using UFFD_FEATURE_EXACT_ADDRESS, so round down the address. + * This is needed to support HugeTLB high-granularity mapping.
+ */ + addr &= ~(demand_paging_size - 1); + if (uffd_mode == UFFDIO_REGISTER_MODE_MISSING) { struct uffdio_copy copy; @@ -214,7 +220,8 @@ static void setup_demand_paging(struct kvm_vm *vm, pthread_t *uffd_handler_thread, int pipefd, int uffd_mode, useconds_t uffd_delay, struct uffd_handler_args *uffd_args, - void *hva, void *alias, uint64_t len) + void *hva, void *alias, uint64_t len, + enum vm_mem_backing_src_type src_type) { bool is_minor = (uffd_mode == UFFDIO_REGISTER_MODE_MINOR); int uffd; @@ -244,9 +251,15 @@ static void setup_demand_paging(struct kvm_vm *vm, TEST_ASSERT(uffd >= 0, __KVM_SYSCALL_ERROR("userfaultfd()", uffd)); uffdio_api.api = UFFD_API; - uffdio_api.features = 0; + uffdio_api.features = is_minor + ? UFFD_FEATURE_EXACT_ADDRESS | UFFD_FEATURE_MINOR_HUGETLBFS_HGM + : 0; ret = ioctl(uffd, UFFDIO_API, &uffdio_api); TEST_ASSERT(ret != -1, __KVM_SYSCALL_ERROR("UFFDIO_API", ret)); + if (src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM) + TEST_ASSERT(uffdio_api.features & + UFFD_FEATURE_MINOR_HUGETLBFS_HGM, + "UFFD_FEATURE_MINOR_HUGETLBFS_HGM not present"); uffdio_register.range.start = (uint64_t)hva; uffdio_register.range.len = len; @@ -329,7 +342,8 @@ static void run_test(enum vm_guest_mode mode, void *arg) pipefds[i * 2], p->uffd_mode, p->uffd_delay, &uffd_args[i], vcpu_hva, vcpu_alias, - vcpu_args->pages * perf_test_args.guest_page_size); + vcpu_args->pages * perf_test_args.guest_page_size, + p->src_type); } } diff --git a/tools/testing/selftests/kvm/include/test_util.h b/tools/testing/selftests/kvm/include/test_util.h index befc754ce9b3..0410326dbc18 100644 --- a/tools/testing/selftests/kvm/include/test_util.h +++ b/tools/testing/selftests/kvm/include/test_util.h @@ -96,6 +96,7 @@ enum vm_mem_backing_src_type { VM_MEM_SRC_ANONYMOUS_HUGETLB_16GB, VM_MEM_SRC_SHMEM, VM_MEM_SRC_SHARED_HUGETLB, + VM_MEM_SRC_SHARED_HUGETLB_HGM, NUM_SRC_TYPES, }; @@ -114,6 +115,7 @@ size_t get_def_hugetlb_pagesz(void); const struct vm_mem_backing_src_alias 
*vm_mem_backing_src_alias(uint32_t i); size_t get_backing_src_pagesz(uint32_t i); bool is_backing_src_hugetlb(uint32_t i); +bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type); void backing_src_help(const char *flag); enum vm_mem_backing_src_type parse_backing_src_type(const char *type_name); long get_run_delay(void); diff --git a/tools/testing/selftests/kvm/lib/kvm_util.c b/tools/testing/selftests/kvm/lib/kvm_util.c index f1cb1627161f..7d769a117e14 100644 --- a/tools/testing/selftests/kvm/lib/kvm_util.c +++ b/tools/testing/selftests/kvm/lib/kvm_util.c @@ -896,7 +896,7 @@ void vm_userspace_mem_region_add(struct kvm_vm *vm, region->fd = -1; if (backing_src_is_shared(src_type)) region->fd = kvm_memfd_alloc(region->mmap_size, - src_type == VM_MEM_SRC_SHARED_HUGETLB); + is_backing_src_shared_hugetlb(src_type)); region->mmap_start = mmap(NULL, region->mmap_size, PROT_READ | PROT_WRITE, diff --git a/tools/testing/selftests/kvm/lib/test_util.c b/tools/testing/selftests/kvm/lib/test_util.c index 6d23878bbfe1..710dc42077fe 100644 --- a/tools/testing/selftests/kvm/lib/test_util.c +++ b/tools/testing/selftests/kvm/lib/test_util.c @@ -254,6 +254,13 @@ const struct vm_mem_backing_src_alias *vm_mem_backing_src_alias(uint32_t i) */ .flag = MAP_SHARED, }, + [VM_MEM_SRC_SHARED_HUGETLB_HGM] = { + /* + * Identical to shared_hugetlb except for the name. 
+ .name = "shared_hugetlb_hgm", + .flag = MAP_SHARED, + }, }; _Static_assert(ARRAY_SIZE(aliases) == NUM_SRC_TYPES, "Missing new backing src types?"); @@ -272,6 +279,7 @@ size_t get_backing_src_pagesz(uint32_t i) switch (i) { case VM_MEM_SRC_ANONYMOUS: case VM_MEM_SRC_SHMEM: + case VM_MEM_SRC_SHARED_HUGETLB_HGM: return getpagesize(); case VM_MEM_SRC_ANONYMOUS_THP: return get_trans_hugepagesz(); @@ -288,6 +296,12 @@ bool is_backing_src_hugetlb(uint32_t i) return !!(vm_mem_backing_src_alias(i)->flag & MAP_HUGETLB); } +bool is_backing_src_shared_hugetlb(enum vm_mem_backing_src_type src_type) +{ + return src_type == VM_MEM_SRC_SHARED_HUGETLB || + src_type == VM_MEM_SRC_SHARED_HUGETLB_HGM; +} + static void print_available_backing_src_types(const char *prefix) { int i;

From patchwork Fri Oct 21 16:37:01 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015111
16:37:59 +0000 (UTC) X-FDA: 80045513478.05.4A8C8FF Received: from mail-yb1-f201.google.com (mail-yb1-f201.google.com [209.85.219.201]) by imf05.hostedemail.com (Postfix) with ESMTP id 5CDA1100018 for ; Fri, 21 Oct 2022 16:37:59 +0000 (UTC) Received: by mail-yb1-f201.google.com with SMTP id v17-20020a259d91000000b006b4c31c0640so3723468ybp.18 for ; Fri, 21 Oct 2022 09:37:59 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:from:to:cc:subject:date:message-id:reply-to; bh=rgu86dTBraBMEbo7EYK3wnE2QExNEIbm0lf0S3p+RAM=; b=CDWV7vuGy8k4HSVoqHUUqHq4k/ljtoqD3xxO+wejMuxAHTGuNtdRUbXNgGj+V+78Zx oU99fdRPWJDzHYJJwVoaXcgYCaOBiPK/e2LKKQV+PqTKnIRDuYGoo28iGCk8HwJwicT7 +Q6T6t9e35ieUG8SKhA6GWk9Yngm0X5rtw33d4A+66eY3fHNfiCumY+W+3EeR+pd+k2g EJ/M2S1Kfe/MJZEOuP4oko60RqTpq+UKJsdxFeoYbx/3dIB+RmMCT6+Z8zWR1IyMbdHb V6aR+BivHNTlLmnlyRdniwOEWMZQbRMfTL1+xWOhSK0EfGWdjiwxTjY0tFQre39sDFt5 TSmQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=cc:to:from:subject:message-id:references:mime-version:in-reply-to :date:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=rgu86dTBraBMEbo7EYK3wnE2QExNEIbm0lf0S3p+RAM=; b=hTjT86zYCM0r2sM0lE4KLTozEhXF8TH262kUJrzSiogfJYPnpV8enHjYzu0ICKtnEu nadX8BrUfEVhxDtDn2Yr+AZuRogA1+ouAd48hSpjke+M0OCnzIa1POguJTp5+NTicXtp 4KdjQPCaBXyHOanfw6QgpJeaIMEYWM8Q3Uvy8cqotw0swehO4mgGXPij9U2uWH+mOW4U CaY6zKR8QekQOgr8QhzJzWSWth5bELU8PEUlKkIKDVh5u/c2x+EBOSqItX50OmEIzpra 3lcZ6e0tNE7p50iCq5VXK3Ytcq7Ag8tMguK/Kk98pZotnp28rNUO1V/litM6oC0dmz8q PG8w== X-Gm-Message-State: ACrzQf21/6Jv3C9PFq6YkGCUIgqgb3VRciwiWDzAB0NCENHA5mGSWAB/ CFkGqXIlq3CK/tuFGoujtGx6U1HEk1cumXjX X-Google-Smtp-Source: AMsMyM4PvTvrmhRgENKB7MFbK/ytPZdbawGZknlyGSBuoh8Jx8BOkToygX7N4aket6Bhv6WXDKkhj4XIOO1VetbE X-Received: from jthoughton.c.googlers.com ([fda3:e722:ac3:cc00:14:4d90:c0a8:2a4f]) (user=jthoughton job=sendgmr) by 2002:a25:f448:0:b0:6ca:22e1:638c with SMTP 
id p8-20020a25f448000000b006ca22e1638cmr10364996ybe.252.1666370278696; Fri, 21 Oct 2022 09:37:58 -0700 (PDT) Date: Fri, 21 Oct 2022 16:37:01 +0000 In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com> Mime-Version: 1.0 References: <20221021163703.3218176-1-jthoughton@google.com> X-Mailer: git-send-email 2.38.0.135.g90850a2211-goog Message-ID: <20221021163703.3218176-46-jthoughton@google.com> Subject: [RFC PATCH v2 45/47] selftests/vm: add anon and shared hugetlb to migration test From: James Houghton To: Mike Kravetz , Muchun Song , Peter Xu Cc: David Hildenbrand , David Rientjes , Axel Rasmussen , Mina Almasry , "Zach O'Keefe" , Manish Mishra , Naoya Horiguchi , "Dr . David Alan Gilbert" , "Matthew Wilcox (Oracle)" , Vlastimil Babka , Baolin Wang , Miaohe Lin , Yang Shi , Andrew Morton , linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton ARC-Authentication-Results: i=1; imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=CDWV7vuG; spf=pass (imf05.hostedemail.com: domain of 35spSYwoKCOcScQXdPQcXWPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--jthoughton.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=35spSYwoKCOcScQXdPQcXWPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--jthoughton.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com ARC-Seal: i=1; s=arc-20220608; d=hostedemail.com; t=1666370279; a=rsa-sha256; cv=none; b=U9DRTcem9ZAwKVCYGjOrzspGBPBdgdF7koThK8JHCpNasU8Z5R0C1BlnJ+83/uRDTihQ7+ QdtApbUWWbY5qD3d6+9K5KAoTnOL1RtxGxI85mWy07B2LCLwRtDhztpetJJwQTjYBTpA2A JGvGImCyySgnJVGkfo61iI2hfjPKRec= ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=hostedemail.com; s=arc-20220608; t=1666370279; h=from:from:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type:content-transfer-encoding: in-reply-to:in-reply-to:references:references:dkim-signature; bh=rgu86dTBraBMEbo7EYK3wnE2QExNEIbm0lf0S3p+RAM=; 
b=bXxzChkgdw68RjxsqjLa2eA6NY+xPkGIApEW7X59Y2aG4Z9CzI2OIX9QClBktvSQrz3ziP glyA6F46IuBM2yObZmZsa1DpbN0Jf9aPl/oSRU/1jrF3A8/PKaXCpcCtezzSQT8MkYkMd2 YHjK62s5qTHnehJ4Xu223B99b+MPEBI= Authentication-Results: imf05.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=CDWV7vuG; spf=pass (imf05.hostedemail.com: domain of 35spSYwoKCOcScQXdPQcXWPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--jthoughton.bounces.google.com designates 209.85.219.201 as permitted sender) smtp.mailfrom=35spSYwoKCOcScQXdPQcXWPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--jthoughton.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Stat-Signature: hmqk8dgitf1qddo9sa5gaq7ngxajcq78 X-Rspamd-Queue-Id: 5CDA1100018 X-Rspamd-Server: rspam02 X-Rspam-User: X-HE-Tag: 1666370279-743709 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Shared HugeTLB mappings are migrated best-effort. Sometimes, due to being unable to grab the VMA lock for writing, migration may just randomly fail. To allow for that, we allow retries. 
Signed-off-by: James Houghton --- tools/testing/selftests/vm/migration.c | 83 ++++++++++++++++++++++++-- 1 file changed, 79 insertions(+), 4 deletions(-) diff --git a/tools/testing/selftests/vm/migration.c b/tools/testing/selftests/vm/migration.c index 1cec8425e3ca..21577a84d7e4 100644 --- a/tools/testing/selftests/vm/migration.c +++ b/tools/testing/selftests/vm/migration.c @@ -13,6 +13,7 @@ #include #include #include +#include #define TWOMEG (2<<20) #define RUNTIME (60) @@ -59,11 +60,12 @@ FIXTURE_TEARDOWN(migration) free(self->pids); } -int migrate(uint64_t *ptr, int n1, int n2) +int migrate(uint64_t *ptr, int n1, int n2, int retries) { int ret, tmp; int status = 0; struct timespec ts1, ts2; + int failed = 0; if (clock_gettime(CLOCK_MONOTONIC, &ts1)) return -1; @@ -78,6 +80,9 @@ int migrate(uint64_t *ptr, int n1, int n2) ret = move_pages(0, 1, (void **) &ptr, &n2, &status, MPOL_MF_MOVE_ALL); if (ret) { + if (++failed < retries) + continue; + if (ret > 0) printf("Didn't migrate %d pages\n", ret); else @@ -88,6 +93,7 @@ int migrate(uint64_t *ptr, int n1, int n2) tmp = n2; n2 = n1; n1 = tmp; + failed = 0; } return 0; @@ -128,7 +134,7 @@ TEST_F_TIMEOUT(migration, private_anon, 2*RUNTIME) if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) perror("Couldn't create thread"); - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(pthread_cancel(self->threads[i]), 0); } @@ -158,7 +164,7 @@ TEST_F_TIMEOUT(migration, shared_anon, 2*RUNTIME) self->pids[i] = pid; } - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(kill(self->pids[i], SIGTERM), 0); } @@ -185,9 +191,78 @@ TEST_F_TIMEOUT(migration, private_anon_thp, 2*RUNTIME) if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) perror("Couldn't create thread"); - ASSERT_EQ(migrate(ptr, self->n1, self->n2), 0); 
+ ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); + for (i = 0; i < self->nthreads - 1; i++) + ASSERT_EQ(pthread_cancel(self->threads[i]), 0); +} + +/* + * Tests the anon hugetlb migration entry paths. + */ +TEST_F_TIMEOUT(migration, private_anon_hugetlb, 2*RUNTIME) +{ + uint64_t *ptr; + int i; + + if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) + SKIP(return, "Not enough threads or NUMA nodes available"); + + ptr = mmap(NULL, TWOMEG, PROT_READ | PROT_WRITE, + MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0); + if (ptr == MAP_FAILED) + SKIP(return, "Could not allocate hugetlb pages"); + + memset(ptr, 0xde, TWOMEG); + for (i = 0; i < self->nthreads - 1; i++) + if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) + perror("Couldn't create thread"); + + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 1), 0); for (i = 0; i < self->nthreads - 1; i++) ASSERT_EQ(pthread_cancel(self->threads[i]), 0); } +/* + * Tests the shared hugetlb migration entry paths. + */ +TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME) +{ + uint64_t *ptr; + int i; + int fd; + unsigned long sz; + struct statfs filestat; + + if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) + SKIP(return, "Not enough threads or NUMA nodes available"); + + fd = memfd_create("tmp_hugetlb", MFD_HUGETLB); + if (fd < 0) + SKIP(return, "Couldn't create hugetlb memfd"); + + if (fstatfs(fd, &filestat) < 0) + SKIP(return, "Couldn't fstatfs hugetlb file"); + + sz = filestat.f_bsize; + + if (ftruncate(fd, sz)) + SKIP(return, "Couldn't allocate hugetlb pages"); + ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (ptr == MAP_FAILED) + SKIP(return, "Could not map hugetlb pages"); + + memset(ptr, 0xde, sz); + for (i = 0; i < self->nthreads - 1; i++) + if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) + perror("Couldn't create thread"); + + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0); + for (i = 0; i < self->nthreads - 1; i++) { + 
ASSERT_EQ(pthread_cancel(self->threads[i]), 0); + pthread_join(self->threads[i], NULL); + } + ftruncate(fd, 0); + close(fd); +} + TEST_HARNESS_MAIN

From patchwork Fri Oct 21 16:37:02 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015112
Date: Fri, 21 Oct 2022 16:37:02 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-47-jthoughton@google.com>
Subject: [RFC PATCH v2 46/47] selftests/vm: add hugetlb HGM test to migration selftest
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This is mostly the same as the shared HugeTLB case, but instead of mapping the page with a regular page fault, we map it with lots of UFFDIO_CONTINUE operations. We also verify that the contents haven't changed after the migration; they would have changed if the post-migration PTEs pointed to the wrong page.

Signed-off-by: James Houghton --- tools/testing/selftests/vm/migration.c | 139 +++++++++++++++++++++++++ 1 file changed, 139 insertions(+) diff --git a/tools/testing/selftests/vm/migration.c b/tools/testing/selftests/vm/migration.c index 21577a84d7e4..89cb5934f139 --- a/tools/testing/selftests/vm/migration.c +++ b/tools/testing/selftests/vm/migration.c @@ -14,6 +14,11 @@ #include #include #include +#include +#include +#include +#include +#include #define TWOMEG (2<<20) #define RUNTIME (60) @@ -265,4 +270,138 @@ TEST_F_TIMEOUT(migration, shared_hugetlb, 2*RUNTIME) close(fd); } +#ifdef __NR_userfaultfd +static int map_at_high_granularity(char *mem, size_t length) +{ + int i; + int ret; + int uffd = syscall(__NR_userfaultfd, 0); + struct uffdio_api api; + struct uffdio_register reg; + int pagesize = getpagesize(); + + if (uffd < 0) { + perror("couldn't create uffd"); + return uffd; + } + + api.api = UFFD_API; + api.features = UFFD_FEATURE_MISSING_HUGETLBFS + | UFFD_FEATURE_MINOR_HUGETLBFS + | UFFD_FEATURE_MINOR_HUGETLBFS_HGM; + + ret = ioctl(uffd, UFFDIO_API, &api); + if (ret || api.api != UFFD_API) { + perror("UFFDIO_API failed"); + goto out; + } + + reg.range.start = (unsigned long)mem; + reg.range.len = length; + + reg.mode = UFFDIO_REGISTER_MODE_MISSING | UFFDIO_REGISTER_MODE_MINOR; + + ret = ioctl(uffd, UFFDIO_REGISTER, &reg); + if (ret) { + perror("UFFDIO_REGISTER failed"); + goto
out; + } + + /* UFFDIO_CONTINUE each 4K segment of the 2M page. */ + for (i = 0; i < length/pagesize; ++i) { + struct uffdio_continue cont; + + cont.range.start = (unsigned long long)mem + i * pagesize; + cont.range.len = pagesize; + cont.mode = 0; + ret = ioctl(uffd, UFFDIO_CONTINUE, &cont); + if (ret) { + fprintf(stderr, "UFFDIO_CONTINUE failed " + "for %llx -> %llx: %d\n", + cont.range.start, + cont.range.start + cont.range.len, + errno); + goto out; + } + } + ret = 0; +out: + close(uffd); + return ret; +} +#else +static int map_at_high_granularity(char *mem, size_t length) +{ + fprintf(stderr, "Userfaultfd missing\n"); + return -1; +} +#endif /* __NR_userfaultfd */ + +/* + * Tests the high-granularity hugetlb migration entry paths. + */ +TEST_F_TIMEOUT(migration, shared_hugetlb_hgm, 2*RUNTIME) +{ + uint64_t *ptr; + int i; + int fd; + unsigned long sz; + struct statfs filestat; + + if (self->nthreads < 2 || self->n1 < 0 || self->n2 < 0) + SKIP(return, "Not enough threads or NUMA nodes available"); + + fd = memfd_create("tmp_hugetlb", MFD_HUGETLB); + if (fd < 0) + SKIP(return, "Couldn't create hugetlb memfd"); + + if (fstatfs(fd, &filestat) < 0) + SKIP(return, "Couldn't fstatfs hugetlb file"); + + sz = filestat.f_bsize; + + if (ftruncate(fd, sz)) + SKIP(return, "Couldn't allocate hugetlb pages"); + + if (fallocate(fd, 0, 0, sz) < 0) { + perror("fallocate failed"); + SKIP(return, "fallocate failed"); + } + + ptr = mmap(NULL, sz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (ptr == MAP_FAILED) + SKIP(return, "Could not allocate hugetlb pages"); + + /* + * We have to map_at_high_granularity before we memset, otherwise + * memset will map everything at the hugepage size. + */ + if (map_at_high_granularity((char *)ptr, sz) < 0) + SKIP(return, "Could not map HugeTLB range at high granularity"); + + /* Populate the page we're migrating. 
*/ + for (i = 0; i < sz/sizeof(*ptr); ++i) + ptr[i] = i; + + for (i = 0; i < self->nthreads - 1; i++) + if (pthread_create(&self->threads[i], NULL, access_mem, ptr)) + perror("Couldn't create thread"); + + ASSERT_EQ(migrate(ptr, self->n1, self->n2, 10), 0); + for (i = 0; i < self->nthreads - 1; i++) { + ASSERT_EQ(pthread_cancel(self->threads[i]), 0); + pthread_join(self->threads[i], NULL); + } + + /* Check that the contents didn't change. */ + for (i = 0; i < sz/sizeof(*ptr); ++i) { + ASSERT_EQ(ptr[i], i); + if (ptr[i] != i) + break; + } + + ftruncate(fd, 0); + close(fd); +} + TEST_HARNESS_MAIN

From patchwork Fri Oct 21 16:37:03 2022
X-Patchwork-Submitter: James Houghton
X-Patchwork-Id: 13015113
Date: Fri, 21 Oct 2022 16:37:03 +0000
In-Reply-To: <20221021163703.3218176-1-jthoughton@google.com>
References: <20221021163703.3218176-1-jthoughton@google.com>
Message-ID: <20221021163703.3218176-48-jthoughton@google.com>
Subject: [RFC PATCH v2 47/47] selftests/vm: add HGM UFFDIO_CONTINUE and hwpoison tests
From: James Houghton
To: Mike Kravetz, Muchun Song, Peter Xu
Cc: David Hildenbrand, David Rientjes, Axel Rasmussen, Mina Almasry, Zach O'Keefe, Manish Mishra, Naoya Horiguchi, Dr. David Alan Gilbert, Matthew Wilcox (Oracle), Vlastimil Babka, Baolin Wang, Miaohe Lin, Yang Shi, Andrew Morton, linux-mm@kvack.org, linux-kernel@vger.kernel.org, James Houghton

This tests that high-granularity CONTINUEs at all sizes work (exercising contiguous PTE sizes for arm64, when support is added). This also tests that collapse works and hwpoison works correctly (although we aren't yet testing high-granularity poison).
Signed-off-by: James Houghton --- tools/testing/selftests/vm/Makefile | 1 + tools/testing/selftests/vm/hugetlb-hgm.c | 326 +++++++++++++++++++++++ 2 files changed, 327 insertions(+) create mode 100644 tools/testing/selftests/vm/hugetlb-hgm.c diff --git a/tools/testing/selftests/vm/Makefile b/tools/testing/selftests/vm/Makefile index 00920cb8b499..da1e01a5ac9b 100644 --- a/tools/testing/selftests/vm/Makefile +++ b/tools/testing/selftests/vm/Makefile @@ -32,6 +32,7 @@ TEST_GEN_FILES += compaction_test TEST_GEN_FILES += gup_test TEST_GEN_FILES += hmm-tests TEST_GEN_FILES += hugetlb-madvise +TEST_GEN_FILES += hugetlb-hgm TEST_GEN_FILES += hugepage-mmap TEST_GEN_FILES += hugepage-mremap TEST_GEN_FILES += hugepage-shm diff --git a/tools/testing/selftests/vm/hugetlb-hgm.c b/tools/testing/selftests/vm/hugetlb-hgm.c new file mode 100644 index 000000000000..e36a1c988bb4 --- /dev/null +++ b/tools/testing/selftests/vm/hugetlb-hgm.c @@ -0,0 +1,326 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * Test uncommon cases in HugeTLB high-granularity mapping: + * 1. Test all supported high-granularity page sizes (with MADV_COLLAPSE). + * 2. Test MADV_HWPOISON behavior. + */ + +#define _GNU_SOURCE +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#define PAGE_MASK ~(4096 - 1) + +#ifndef MADV_COLLAPSE +#define MADV_COLLAPSE 25 +#endif + +#define PREFIX " ... 
" + +int userfaultfd(int flags) +{ + return syscall(__NR_userfaultfd, flags); +} + +int map_range(int uffd, char *addr, uint64_t length) +{ + struct uffdio_continue cont = { + .range = (struct uffdio_range) { + .start = (uint64_t)addr, + .len = length, + }, + .mode = 0, + .mapped = 0, + }; + + if (ioctl(uffd, UFFDIO_CONTINUE, &cont) < 0) { + perror("UFFDIO_CONTINUE failed"); + return -1; + } + return 0; +} + +int check_equal(char *mapping, size_t length, char value) +{ + size_t i; + + for (i = 0; i < length; ++i) + if (mapping[i] != value) { + printf("mismatch at %p (%d != %d)\n", &mapping[i], + mapping[i], value); + return -1; + } + + return 0; +} + +int test_continues(int uffd, char *primary_map, char *secondary_map, size_t len, + bool verify) +{ + size_t offset = 0; + unsigned char iter = 0; + unsigned long pagesize = getpagesize(); + uint64_t size; + + for (size = len/2; size >= pagesize; + offset += size, size /= 2) { + iter++; + memset(secondary_map + offset, iter, size); + printf(PREFIX "UFFDIO_CONTINUE: %p -> %p = %d%s\n", + primary_map + offset, + primary_map + offset + size, + iter, + verify ? " (and verify)" : ""); + if (map_range(uffd, primary_map + offset, size)) + return -1; + if (verify && check_equal(primary_map + offset, size, iter)) + return -1; + } + return 0; +} + +int test_collapse(char *primary_map, size_t len, bool hwpoison) +{ + size_t offset; + int i; + uint64_t size; + + printf(PREFIX "collapsing %p -> %p\n", primary_map, primary_map + len); + if (madvise(primary_map, len, MADV_COLLAPSE) < 0) { + if (errno == EHWPOISON && hwpoison) { + /* this is expected for the hwpoison test. 
*/ + printf(PREFIX "could not collapse due to poison\n"); + return 0; + } + perror("collapse failed"); + return -1; + } + + printf(PREFIX "verifying %p -> %p\n", primary_map, primary_map + len); + + offset = 0; + i = 0; + for (size = len/2; size > 4096; offset += size, size /= 2) { + if (check_equal(primary_map + offset, size, ++i)) + return -1; + } + /* expect the last 4K to be zero. */ + if (check_equal(primary_map + len - 4096, 4096, 0)) + return -1; + + return 0; +} + +static void *poisoned_addr; + +void sigbus_handler(int signo, siginfo_t *info, void *context) +{ + if (info->si_code != BUS_MCEERR_AR) + goto kill; + poisoned_addr = info->si_addr; +kill: + pthread_exit(NULL); +} + +void *access_mem(void *addr) +{ + volatile char *ptr = addr; + + *ptr; + return NULL; +} + +int test_poison_sigbus(char *addr) +{ + int ret; + pthread_t pthread; + + poisoned_addr = (void *)0xBADBADBAD; + ret = pthread_create(&pthread, NULL, &access_mem, addr); + if (ret) { + printf("failed to create thread: %s\n", strerror(ret)); + return ret; + } + + pthread_join(pthread, NULL); + if (poisoned_addr != addr) { + printf("got incorrect poisoned address: %p vs %p\n", + poisoned_addr, addr); + return -1; + } + return 0; +} + +int test_hwpoison(char *primary_map, size_t len) +{ + const unsigned long pagesize = getpagesize(); + const int num_poison_checks = 512; + unsigned long bytes_per_check = len/num_poison_checks; + struct sigaction new, old; + int i; + + printf(PREFIX "poisoning %p -> %p\n", primary_map, primary_map + len); + if (madvise(primary_map, len, MADV_HWPOISON) < 0) { + perror("MADV_HWPOISON failed"); + return -1; + } + + printf(PREFIX "checking that it was poisoned " + "(%d addresses within %p -> %p)\n", + num_poison_checks, primary_map, primary_map + len); + + new.sa_sigaction = &sigbus_handler; + new.sa_flags = SA_SIGINFO; + if (sigaction(SIGBUS, &new, &old) < 0) { + perror("could not set up SIGBUS handler"); + return -1; + 
} + + if (pagesize > bytes_per_check) + bytes_per_check = pagesize; + + for (i = 0; i < len; i += bytes_per_check) + if (test_poison_sigbus(primary_map + i) < 0) + return -1; + /* check very last byte, because we left it unmapped */ + if (test_poison_sigbus(primary_map + len - 1)) + return -1; + + return 0; +} + +int test_hgm(int fd, size_t hugepagesize, size_t len, bool hwpoison) +{ + int ret = 0; + int uffd; + char *primary_map, *secondary_map; + struct uffdio_api api; + struct uffdio_register reg; + + if (ftruncate(fd, len) < 0) { + perror("ftruncate failed"); + return -1; + } + + uffd = userfaultfd(O_CLOEXEC | O_NONBLOCK); + if (uffd < 0) { + perror("uffd not created"); + return -1; + } + + primary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (primary_map == MAP_FAILED) { + perror("mmap for primary mapping failed"); + ret = -1; + goto close_uffd; + } + secondary_map = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0); + if (secondary_map == MAP_FAILED) { + perror("mmap for secondary mapping failed"); + ret = -1; + goto unmap_primary; + } + + printf(PREFIX "primary mapping: %p\n", primary_map); + printf(PREFIX "secondary mapping: %p\n", secondary_map); + + api.api = UFFD_API; + api.features = UFFD_FEATURE_MINOR_HUGETLBFS | + UFFD_FEATURE_MISSING_HUGETLBFS | + UFFD_FEATURE_MINOR_HUGETLBFS_HGM | UFFD_FEATURE_SIGBUS | + UFFD_FEATURE_EXACT_ADDRESS; + if (ioctl(uffd, UFFDIO_API, &api) == -1) { + perror("UFFDIO_API failed"); + ret = -1; + goto out; + } + if (!(api.features & UFFD_FEATURE_MINOR_HUGETLBFS_HGM)) { + puts("UFFD_FEATURE_MINOR_HUGETLBFS_HGM not present"); + ret = -1; + goto out; + } + + reg.range.start = (unsigned long)primary_map; + reg.range.len = len; + reg.mode = UFFDIO_REGISTER_MODE_MINOR | UFFDIO_REGISTER_MODE_MISSING; + reg.ioctls = 0; + if (ioctl(uffd, UFFDIO_REGISTER, &reg) == -1) { + perror("register failed"); + ret = -1; + goto out; + } + + if (test_continues(uffd, primary_map, secondary_map, len, !hwpoison) + || 
(hwpoison && test_hwpoison(primary_map, len)) + || test_collapse(primary_map, len, hwpoison)) { + ret = -1; + } + + if (ftruncate(fd, 0) < 0) { + perror("ftruncate back to 0 failed"); + ret = -1; + } + +out: + munmap(secondary_map, len); +unmap_primary: + munmap(primary_map, len); +close_uffd: + close(uffd); + return ret; +} + +int main(void) +{ + int fd; + struct statfs file_stat; + size_t hugepagesize; + size_t len; + + fd = memfd_create("hugetlb_tmp", MFD_HUGETLB); + if (fd < 0) { + perror("could not open hugetlbfs file"); + return -1; + } + + memset(&file_stat, 0, sizeof(file_stat)); + if (fstatfs(fd, &file_stat)) { + perror("fstatfs failed"); + goto close; + } + if (file_stat.f_type != HUGETLBFS_MAGIC) { + printf("not hugetlbfs file\n"); + goto close; + } + + hugepagesize = file_stat.f_bsize; + len = 2 * hugepagesize; + printf("HGM regular test...\n"); + printf("HGM regular test: %s\n", + test_hgm(fd, hugepagesize, len, false) + ? "FAILED" : "PASSED"); + printf("HGM hwpoison test...\n"); + printf("HGM hwpoison test: %s\n", + test_hgm(fd, hugepagesize, len, true) + ? "FAILED" : "PASSED"); +close: + close(fd); + + return 0; +}