From patchwork Sun Apr 10 13:54:34 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808145
From: "Zach O'Keefe"
Date: Sun, 10 Apr 2022 06:54:34 -0700
Message-Id: <20220410135445.3897054-2-zokeefe@google.com>
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Subject: [PATCH 01/12] mm/khugepaged: record SCAN_PMD_MAPPED when scan_pmd()
 finds THP

When scanning an anon pmd to see if it's eligible for collapse, return
SCAN_PMD_MAPPED if the pmd already maps a THP. Note that SCAN_PMD_MAPPED
is different from SCAN_PAGE_COMPOUND used in the file-collapse path,
since the latter might identify pte-mapped compound pages.

This is required by MADV_COLLAPSE, which needs to know which
hugepage-aligned/sized regions are already pmd-mapped.
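To make the new tri-state concrete, here is a minimal caller sketch. It
is illustration only, not part of the patch: find_pmd_or_thp_or_none()
and the SCAN_* values come from the hunks below, while
example_collapse_pmd() and do_pte_scan() are invented names standing in
for a later MADV_COLLAPSE-style user.

	/* Hypothetical caller: "already a THP" is done, not a failure. */
	static int example_collapse_pmd(struct mm_struct *mm, unsigned long addr)
	{
		pmd_t *pmd;

		switch (find_pmd_or_thp_or_none(mm, addr, &pmd)) {
		case SCAN_SUCCEED:
			/* pmd maps a pte table: scan ptes for collapse */
			return do_pte_scan(mm, addr, pmd);	/* invented */
		case SCAN_PMD_MAPPED:
			/* region is already backed by a THP: nothing to do */
			return 0;
		default:
			/* no present pmd to operate on */
			return -EINVAL;
		}
	}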
Signed-off-by: Zach O'Keefe
---
 include/trace/events/huge_memory.h |  3 ++-
 mm/internal.h                      |  1 +
 mm/khugepaged.c                    | 30 ++++++++++++++++++++++++++----
 mm/rmap.c                          | 15 +++++++++++++--
 4 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h
index d651f3437367..9faa678e0a5b 100644
--- a/include/trace/events/huge_memory.h
+++ b/include/trace/events/huge_memory.h
@@ -33,7 +33,8 @@
 	EM( SCAN_ALLOC_HUGE_PAGE_FAIL,	"alloc_huge_page_failed")	\
 	EM( SCAN_CGROUP_CHARGE_FAIL,	"ccgroup_charge_failed")	\
 	EM( SCAN_TRUNCATED,		"truncated")			\
-	EMe(SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EM( SCAN_PAGE_HAS_PRIVATE,	"page_has_private")		\
+	EMe(SCAN_PMD_MAPPED,		"page_pmd_mapped")		\

 #undef EM
 #undef EMe
diff --git a/mm/internal.h b/mm/internal.h
index 1d3fb3c0f971..db594d611925 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -172,6 +172,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
 /*
  * in mm/rmap.c:
  */
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address);
 extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);

 /*
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 0cde4b44d799..b403f056a847 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -51,6 +51,7 @@ enum scan_result {
 	SCAN_CGROUP_CHARGE_FAIL,
 	SCAN_TRUNCATED,
 	SCAN_PAGE_HAS_PRIVATE,
+	SCAN_PMD_MAPPED,
 };

 #define CREATE_TRACE_POINTS
@@ -987,6 +988,29 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return 0;
 }

+static int find_pmd_or_thp_or_none(struct mm_struct *mm,
+				   unsigned long address,
+				   pmd_t **pmd)
+{
+	pmd_t pmde;
+
+	*pmd = mm_find_pmd_raw(mm, address);
+	if (!*pmd)
+		return SCAN_PMD_NULL;
+
+	pmde = pmd_read_atomic(*pmd);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
+	barrier();
+#endif
+	if (!pmd_present(pmde) || pmd_none(pmde))
+		return SCAN_PMD_NULL;
+	if (pmd_trans_huge(pmde))
+		return SCAN_PMD_MAPPED;
+	return SCAN_SUCCEED;
+}
+
 /*
  * Bring missing pages in from swap, to complete THP collapse.
  * Only done if khugepaged_scan_pmd believes it is worthwhile.
@@ -1238,11 +1262,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,

 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);

-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
+	result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (result != SCAN_SUCCEED)
 		goto out;
-	}

 	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
diff --git a/mm/rmap.c b/mm/rmap.c
index a1211fa879cf..fb47443f44c6 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -758,13 +758,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 	return vma_address(page, vma);
 }

-pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd = NULL;
-	pmd_t pmde;

 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -779,6 +778,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 		goto out;

 	pmd = pmd_offset(pud, address);
+out:
+	return pmd;
+}
+
+pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+{
+	pmd_t pmde;
+	pmd_t *pmd;
+
+	pmd = mm_find_pmd_raw(mm, address);
+	if (!pmd)
+		goto out;
 	/*
 	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
 	 * without holding anon_vma lock for write. So when looking for a

From patchwork Sun Apr 10 13:54:35 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808146
From: "Zach O'Keefe"
Date: Sun, 10 Apr 2022 06:54:35 -0700
Message-Id: <20220410135445.3897054-3-zokeefe@google.com>
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Subject: [PATCH 02/12] mm/khugepaged: add struct collapse_control

Modularize hugepage collapse by introducing struct collapse_control.
This structure describes the properties of the requested collapse and
also serves as a local scratch pad to use during the collapse itself.
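To illustrate the design choice, a short sketch (illustration only: the
struct and khugepaged_find_target_node() are from this patch,
example_scan() is an invented name). Each request now owns its
scratchpad, so concurrent collapse contexts no longer share khugepaged's
global node statistics:

	static int example_scan(struct collapse_control *cc)
	{
		/* per-request reset; previously a memset of a global array */
		memset(cc->node_load, 0, sizeof(cc->node_load));
		/* ... scan, incrementing cc->node_load[page_to_nid(page)] ... */
		return khugepaged_find_target_node(cc);	/* reads only cc */
	}

	/* Each caller keeps its own context, e.g. on its stack: */
	struct collapse_control cc = {
		.last_target_node = NUMA_NO_NODE,
	};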
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 79 ++++++++++++++++++++++++++++---------------
 1 file changed, 46 insertions(+), 33 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b403f056a847..eca61eb88dda 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -86,6 +86,14 @@ static struct kmem_cache *mm_slot_cache __read_mostly;

 #define MAX_PTE_MAPPED_THP 8

+struct collapse_control {
+	/* Num pages scanned per node */
+	int node_load[MAX_NUMNODES];
+
+	/* Last target selected in khugepaged_find_target_node() for this scan */
+	int last_target_node;
+};
+
 /**
  * struct mm_slot - hash lookup from mm to mm_slot
  * @hash: hash collision list
@@ -796,9 +804,7 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }

-static int khugepaged_node_load[MAX_NUMNODES];
-
-static bool khugepaged_scan_abort(int nid)
+static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;

@@ -810,11 +816,11 @@ static bool khugepaged_scan_abort(int nid)
 		return false;

 	/* If there is a count for this node already, it must be acceptable */
-	if (khugepaged_node_load[nid])
+	if (cc->node_load[nid])
 		return false;

 	for (i = 0; i < MAX_NUMNODES; i++) {
-		if (!khugepaged_node_load[i])
+		if (!cc->node_load[i])
 			continue;
 		if (node_distance(nid, i) > node_reclaim_distance)
 			return true;
@@ -829,28 +835,28 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }

 #ifdef CONFIG_NUMA
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
-	static int last_khugepaged_target_node = NUMA_NO_NODE;
 	int nid, target_node = 0, max_value = 0;

 	/* find first node with max normal pages hit */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
-		if (khugepaged_node_load[nid] > max_value) {
-			max_value = khugepaged_node_load[nid];
+		if (cc->node_load[nid] > max_value) {
+			max_value = cc->node_load[nid];
 			target_node = nid;
 		}

 	/* do some balance if several nodes have the same hit record */
-	if (target_node <= last_khugepaged_target_node)
-		for (nid = last_khugepaged_target_node + 1; nid < MAX_NUMNODES;
-		     nid++)
-			if (max_value == khugepaged_node_load[nid]) {
+	if (target_node <= cc->last_target_node)
+		for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
+		     nid++) {
+			if (max_value == cc->node_load[nid]) {
 				target_node = nid;
 				break;
 			}
+		}

-	last_khugepaged_target_node = target_node;
+	cc->last_target_node = target_node;
 	return target_node;
 }

@@ -888,7 +894,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	return *hpage;
 }
 #else
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -1248,7 +1254,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage)
+			       struct page **hpage,
+			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1266,7 +1273,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	if (result != SCAN_SUCCEED)
 		goto out;

-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
@@ -1332,16 +1339,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,

 		/*
 		 * Record which node the original page is from and save this
-		 * information to khugepaged_node_load[].
+		 * information to cc->node_load[].
 		 * Khugepaged will allocate hugepage from the node has the max
 		 * hit record.
 		 */
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
 			goto out_unmap;
@@ -1392,7 +1399,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node();
+		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		collapse_huge_page(mm, address, hpage, node,
 				   referenced, unmapped);
@@ -2032,7 +2039,8 @@ static void collapse_file(struct mm_struct *mm,
 }

 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2043,7 +2051,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,

 	present = 0;
 	swap = 0;
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	rcu_read_lock();
 	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, page))
@@ -2068,11 +2076,11 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		}

 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			break;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;

 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
@@ -2105,7 +2113,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node();
+			node = khugepaged_find_target_node(cc);
 			collapse_file(mm, file, start, hpage, node);
 		}
 	}
@@ -2114,7 +2122,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2125,7 +2134,8 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif

 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage)
+					    struct page **hpage,
+					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
 {
@@ -2201,12 +2211,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage);
+				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
 						khugepaged_scan.address,
-						hpage);
+						hpage, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2262,7 +2272,7 @@ static int khugepaged_wait_event(void)
 		kthread_should_stop();
 }

-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan(struct collapse_control *cc)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2286,7 +2296,7 @@ static void khugepaged_do_scan(void)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage);
+							    &hpage, cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
@@ -2325,12 +2335,15 @@ static void khugepaged_wait_work(void)
 static int khugepaged(void *none)
 {
 	struct mm_slot *mm_slot;
+	struct collapse_control cc = {
+		.last_target_node = NUMA_NO_NODE,
+	};

 	set_freezable();
 	set_user_nice(current, MAX_NICE);

 	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
+		khugepaged_do_scan(&cc);
 		khugepaged_wait_work();
 	}

From patchwork Sun Apr 10 13:54:36 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808147
From: "Zach O'Keefe"
Date: Sun, 10 Apr 2022 06:54:36 -0700
Message-Id: <20220410135445.3897054-4-zokeefe@google.com>
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Subject: [PATCH 03/12] mm/khugepaged: make hugepage allocation
 context-specific

Add a hugepage allocation context to struct collapse_control, allowing
different collapse contexts to allocate hugepages differently. For
example, khugepaged decides to allocate differently in NUMA and UMA
configurations, and other collapse contexts shouldn't be coupled to
that decision. Additionally, move the [pre]allocated hugepage pointer
into struct collapse_control.
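As an illustration of the new hook, a sketch of a context-specific
allocator (illustration only: cc->hpage and ->alloc_hpage are the fields
this patch adds; example_alloc_page() is an invented allocator for some
future non-khugepaged context):

	static struct page *example_alloc_page(struct collapse_control *cc,
					       gfp_t gfp, int node)
	{
		/* allocation policy can differ per context */
		cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
		if (unlikely(!cc->hpage)) {
			cc->hpage = ERR_PTR(-ENOMEM);
			return NULL;
		}
		prep_transhuge_page(cc->hpage);
		return cc->hpage;
	}

	struct collapse_control cc = {
		.last_target_node = NUMA_NO_NODE,
		.alloc_hpage = &example_alloc_page,	/* context-specific hook */
	};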
Signed-off-by: Zach O'Keefe
Reported-by: kernel test robot
---
 mm/khugepaged.c | 96 ++++++++++++++++++++++++-------------------------
 1 file changed, 48 insertions(+), 48 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eca61eb88dda..180d99a6b571 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,6 +92,10 @@ struct collapse_control {

 	/* Last target selected in khugepaged_find_target_node() for this scan */
 	int last_target_node;
+
+	struct page *hpage;
+	struct page* (*alloc_hpage)(struct collapse_control *cc, gfp_t gfp,
+				    int node);
 };

 /**
@@ -877,21 +881,21 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }

-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static struct page *khugepaged_alloc_page(struct collapse_control *cc,
+					  gfp_t gfp, int node)
 {
-	VM_BUG_ON_PAGE(*hpage, *hpage);
+	VM_BUG_ON_PAGE(cc->hpage, cc->hpage);

-	*hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
-	if (unlikely(!*hpage)) {
+	cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER);
+	if (unlikely(!cc->hpage)) {
 		count_vm_event(THP_COLLAPSE_ALLOC_FAILED);
-		*hpage = ERR_PTR(-ENOMEM);
+		cc->hpage = ERR_PTR(-ENOMEM);
 		return NULL;
 	}

-	prep_transhuge_page(*hpage);
+	prep_transhuge_page(cc->hpage);
 	count_vm_event(THP_COLLAPSE_ALLOC);
-	return *hpage;
+	return cc->hpage;
 }
 #else
 static int khugepaged_find_target_node(struct collapse_control *cc)
@@ -953,12 +957,12 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait)
 	return true;
 }

-static struct page *
-khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
+static struct page *khugepaged_alloc_page(struct collapse_control *cc,
+					  gfp_t gfp)
 {
-	VM_BUG_ON(!*hpage);
+	VM_BUG_ON(!cc->hpage);

-	return *hpage;
+	return cc->hpage;
 }
 #endif

@@ -1080,10 +1084,9 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }

-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       int node, int referenced, int unmapped)
+static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
+			       struct collapse_control *cc, int referenced,
+			       int unmapped)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1096,6 +1099,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	struct mmu_notifier_range range;
 	gfp_t gfp;
 	const struct cpumask *cpumask;
+	int node;

 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);

@@ -1110,13 +1114,14 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	mmap_read_unlock(mm);

+	node = khugepaged_find_target_node(cc);
 	/* sched to specified node before huage page memory copy */
 	if (task_node(current) != node) {
 		cpumask = cpumask_of_node(node);
 		if (!cpumask_empty(cpumask))
 			set_cpus_allowed_ptr(current, cpumask);
 	}
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
+	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out_nolock;
@@ -1238,15 +1243,15 @@ static void collapse_huge_page(struct mm_struct *mm,
 	update_mmu_cache_pmd(vma, address, pmd);
 	spin_unlock(pmd_ptl);

-	*hpage = NULL;
+	cc->hpage = NULL;

 	khugepaged_pages_collapsed++;
 	result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	trace_mm_collapse_huge_page(mm, isolated, result);
 	return;
 }
@@ -1254,7 +1259,6 @@ static void collapse_huge_page(struct mm_struct *mm,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage,
 			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
@@ -1399,10 +1403,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, node,
-				   referenced, unmapped);
+		collapse_huge_page(mm, address, cc, referenced, unmapped);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1655,8 +1657,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  * @mm: process address space where collapse happens
  * @file: file that collapse on
  * @start: collapse start address
- * @hpage: new allocated huge page for collapse
- * @node: appointed node the new huge page allocate from
+ * @collapse_control: collapse context and scratchpad
  *
  * Basic scheme is simple, details are more complex:
  *  - allocate and lock a new huge page;
@@ -1674,8 +1675,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  *    + unlock and free huge page;
  */
 static void collapse_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start,
-		struct page **hpage, int node)
+			  struct file *file, pgoff_t start,
+			  struct collapse_control *cc)
 {
 	struct address_space *mapping = file->f_mapping;
 	gfp_t gfp;
@@ -1685,15 +1686,16 @@ static void collapse_file(struct mm_struct *mm,
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
 	int nr_none = 0, result = SCAN_SUCCEED;
 	bool is_shmem = shmem_file(file);
-	int nr;
+	int nr, node;

 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));

 	/* Only allocate from the target node */
 	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+	node = khugepaged_find_target_node(cc);

-	new_page = khugepaged_alloc_page(hpage, gfp, node);
+	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
 		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out;
@@ -1986,7 +1988,7 @@ static void collapse_file(struct mm_struct *mm,
 		 * Remove pte page tables, so we can re-fault the page as huge.
 		 */
 		retract_page_tables(mapping, start);
-		*hpage = NULL;
+		cc->hpage = NULL;

 		khugepaged_pages_collapsed++;
 	} else {
@@ -2033,14 +2035,14 @@ static void collapse_file(struct mm_struct *mm,
 	unlock_page(new_page);
 out:
 	VM_BUG_ON(!list_empty(&pagelist));
-	if (!IS_ERR_OR_NULL(*hpage))
-		mem_cgroup_uncharge(page_folio(*hpage));
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		mem_cgroup_uncharge(page_folio(cc->hpage));
 	/* TODO: tracepoints */
 }

 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+				 struct file *file, pgoff_t start,
+				 struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2113,8 +2115,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node(cc);
-			collapse_file(mm, file, start, hpage, node);
+			collapse_file(mm, file, start, cc);
 		}
 	}

@@ -2122,8 +2123,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage,
-		struct collapse_control *cc)
+				 struct file *file, pgoff_t start,
+				 struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2134,7 +2135,6 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif

 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage,
 					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
@@ -2211,12 +2211,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
+				khugepaged_scan_file(mm, file, pgoff, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
-						khugepaged_scan.address,
-						hpage, cc);
+						khugepaged_scan.address, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2274,15 +2273,15 @@ static int khugepaged_wait_event(void)

 static void khugepaged_do_scan(struct collapse_control *cc)
 {
-	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
 	unsigned int pages = READ_ONCE(khugepaged_pages_to_scan);
 	bool wait = true;

+	cc->hpage = NULL;
 	lru_add_drain_all();

 	while (progress < pages) {
-		if (!khugepaged_prealloc_page(&hpage, &wait))
+		if (!khugepaged_prealloc_page(&cc->hpage, &wait))
 			break;

 		cond_resched();
@@ -2296,14 +2295,14 @@ static void khugepaged_do_scan(struct collapse_control *cc)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage, cc);
+							    cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
 	}

-	if (!IS_ERR_OR_NULL(hpage))
-		put_page(hpage);
+	if (!IS_ERR_OR_NULL(cc->hpage))
+		put_page(cc->hpage);
 }

 static bool khugepaged_should_wakeup(void)
@@ -2337,6 +2336,7 @@ static int khugepaged(void *none)
 	struct mm_slot *mm_slot;
 	struct collapse_control cc = {
 		.last_target_node = NUMA_NO_NODE,
+		.alloc_hpage = &khugepaged_alloc_page,
 	};

 	set_freezable();

From patchwork Sun Apr 10 13:54:37 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808148
From: "Zach O'Keefe"
Date: Sun, 10 Apr 2022 06:54:37 -0700
Message-Id: <20220410135445.3897054-5-zokeefe@google.com>
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Subject: [PATCH 04/12] mm/khugepaged: add struct collapse_result

Add struct collapse_result, which aggregates data from a single
khugepaged_scan_pmd() or khugepaged_scan_file() request. Change
khugepaged to take action based on this returned data instead of deep
within the collapsing functions themselves.
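The caller-side pattern this enables, condensed from the
khugepaged_scan_mm_slot() hunk below (sketch only; the surrounding scan
loop is elided):

	struct collapse_result cr = {0};

	khugepaged_scan_pmd(mm, vma, khugepaged_scan.address, cc, &cr);
	if (cr.result == SCAN_SUCCEED)
		++khugepaged_pages_collapsed;	/* bookkeeping now in caller */
	if (cr.dropped_mmap_lock)
		goto breakouterloop_mmap_lock;	/* mmap_lock was released */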
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 186 ++++++++++++++++++++++++------------------------
 1 file changed, 100 insertions(+), 86 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 180d99a6b571..ed025dbbd7e6 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -98,6 +98,14 @@ struct collapse_control {
 			    int node);
 };

+/* Gather information from one khugepaged_scan_[pmd|file]() request */
+struct collapse_result {
+	enum scan_result result;
+
+	/* Was mmap_lock dropped during request? */
+	bool dropped_mmap_lock;
+};
+
 /**
  * struct mm_slot - hash lookup from mm to mm_slot
  * @hash: hash collision list
@@ -742,13 +750,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		result = SCAN_SUCCEED;
 		trace_mm_collapse_huge_page_isolate(page, none_or_zero,
 						    referenced, writable, result);
-		return 1;
+		return SCAN_SUCCEED;
 	}
 out:
 	release_pte_pages(pte, _pte, compound_pagelist);
 	trace_mm_collapse_huge_page_isolate(page, none_or_zero,
 					    referenced, writable, result);
-	return 0;
+	return result;
 }

 static void __collapse_huge_page_copy(pte_t *pte, struct page *page,
@@ -1086,7 +1094,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,

 static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 			       struct collapse_control *cc, int referenced,
-			       int unmapped)
+			       int unmapped, struct collapse_result *cr)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1094,7 +1102,6 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	pgtable_t pgtable;
 	struct page *new_page;
 	spinlock_t *pmd_ptl, *pte_ptl;
-	int isolated = 0, result = 0;
 	struct vm_area_struct *vma;
 	struct mmu_notifier_range range;
 	gfp_t gfp;
@@ -1102,6 +1109,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	int node;

 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+	cr->result = SCAN_FAIL;

 	/* Only allocate from the target node */
 	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
@@ -1113,6 +1121,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * that. We will recheck the vma after taking it again in write mode.
 	 */
 	mmap_read_unlock(mm);
+	cr->dropped_mmap_lock = true;

 	node = khugepaged_find_target_node(cc);
 	/* sched to specified node before huage page memory copy */
@@ -1123,26 +1132,26 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	}
 	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+		cr->result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out_nolock;
 	}

 	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
+		cr->result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out_nolock;
 	}
 	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);

 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma);
-	if (result) {
+	cr->result = hugepage_vma_revalidate(mm, address, &vma);
+	if (cr->result) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}

 	pmd = mm_find_pmd(mm, address);
 	if (!pmd) {
-		result = SCAN_PMD_NULL;
+		cr->result = SCAN_PMD_NULL;
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
@@ -1165,8 +1174,8 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	 * handled by the anon_vma lock + PG_lock.
 	 */
 	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma);
-	if (result)
+	cr->result = hugepage_vma_revalidate(mm, address, &vma);
+	if (cr->result)
 		goto out_up_write;
 	/* check if the pmd is still valid */
 	if (mm_find_pmd(mm, address) != pmd)
@@ -1193,11 +1202,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 	mmu_notifier_invalidate_range_end(&range);

 	spin_lock(pte_ptl);
-	isolated = __collapse_huge_page_isolate(vma, address, pte,
-						&compound_pagelist);
+	cr->result = __collapse_huge_page_isolate(vma, address, pte,
+						  &compound_pagelist);
 	spin_unlock(pte_ptl);

-	if (unlikely(!isolated)) {
+	if (unlikely(cr->result != SCAN_SUCCEED)) {
 		pte_unmap(pte);
 		spin_lock(pmd_ptl);
 		BUG_ON(!pmd_none(*pmd));
@@ -1209,7 +1218,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,
 		pmd_populate(mm, pmd, pmd_pgtable(_pmd));
 		spin_unlock(pmd_ptl);
 		anon_vma_unlock_write(vma->anon_vma);
-		result = SCAN_FAIL;
+		cr->result = SCAN_FAIL;
 		goto out_up_write;
 	}

@@ -1245,25 +1254,25 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address,

 	cc->hpage = NULL;

-	khugepaged_pages_collapsed++;
-	result = SCAN_SUCCEED;
+	cr->result = SCAN_SUCCEED;
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
 	if (!IS_ERR_OR_NULL(cc->hpage))
 		mem_cgroup_uncharge(page_folio(cc->hpage));
-	trace_mm_collapse_huge_page(mm, isolated, result);
+	trace_mm_collapse_huge_page(mm, cr->result == SCAN_SUCCEED, cr->result);
 	return;
 }

-static int khugepaged_scan_pmd(struct mm_struct *mm,
-			       struct vm_area_struct *vma,
-			       unsigned long address,
-			       struct collapse_control *cc)
+static void khugepaged_scan_pmd(struct mm_struct *mm,
+				struct vm_area_struct *vma,
+				unsigned long address,
+				struct collapse_control *cc,
+				struct collapse_result *cr)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int ret = 0, result = 0, referenced = 0;
+	int referenced = 0;
 	int none_or_zero = 0, shared = 0;
 	struct page *page = NULL;
 	unsigned long _address;
@@ -1272,9 +1281,10 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	bool writable = false;

 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+	cr->result = SCAN_FAIL;

-	result = find_pmd_or_thp_or_none(mm, address, &pmd);
-	if (result != SCAN_SUCCEED)
+	cr->result = find_pmd_or_thp_or_none(mm, address, &pmd);
+	if (cr->result != SCAN_SUCCEED)
 		goto out;

 	memset(cc->node_load, 0, sizeof(cc->node_load));
@@ -1290,12 +1300,12 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			 * comment below for pte_uffd_wp().
 			 */
 			if (pte_swp_uffd_wp(pteval)) {
-				result = SCAN_PTE_UFFD_WP;
+				cr->result = SCAN_PTE_UFFD_WP;
 				goto out_unmap;
 			}
 			continue;
 		} else {
-			result = SCAN_EXCEED_SWAP_PTE;
+			cr->result = SCAN_EXCEED_SWAP_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 			goto out_unmap;
 		}
@@ -1305,7 +1315,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			    ++none_or_zero <= khugepaged_max_ptes_none) {
 				continue;
 			} else {
-				result = SCAN_EXCEED_NONE_PTE;
+				cr->result = SCAN_EXCEED_NONE_PTE;
 				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 				goto out_unmap;
 			}
@@ -1320,7 +1330,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			 * userfault messages that falls outside of
 			 * the registered range.  So, just be simple.
 			 */
-			result = SCAN_PTE_UFFD_WP;
+			cr->result = SCAN_PTE_UFFD_WP;
 			goto out_unmap;
 		}
 		if (pte_write(pteval))
@@ -1328,13 +1338,13 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,

 		page = vm_normal_page(vma, _address, pteval);
 		if (unlikely(!page)) {
-			result = SCAN_PAGE_NULL;
+			cr->result = SCAN_PAGE_NULL;
 			goto out_unmap;
 		}

 		if (page_mapcount(page) > 1 &&
 				++shared > khugepaged_max_ptes_shared) {
-			result = SCAN_EXCEED_SHARED_PTE;
+			cr->result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out_unmap;
 		}
@@ -1349,20 +1359,20 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 */
 		node = page_to_nid(page);
 		if (khugepaged_scan_abort(node, cc)) {
-			result = SCAN_SCAN_ABORT;
+			cr->result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
 		cc->node_load[node]++;
 		if (!PageLRU(page)) {
-			result = SCAN_PAGE_LRU;
+			cr->result = SCAN_PAGE_LRU;
 			goto out_unmap;
 		}
 		if (PageLocked(page)) {
-			result = SCAN_PAGE_LOCK;
+			cr->result = SCAN_PAGE_LOCK;
 			goto out_unmap;
 		}
 		if (!PageAnon(page)) {
-			result = SCAN_PAGE_ANON;
+			cr->result = SCAN_PAGE_ANON;
 			goto out_unmap;
 		}

@@ -1384,7 +1394,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 * will be done again later the risk seems low.
 		 */
 		if (!is_refcount_suitable(page)) {
-			result = SCAN_PAGE_COUNT;
+			cr->result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
 		if (pte_young(pteval) ||
@@ -1393,23 +1403,20 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			referenced++;
 	}
 	if (!writable) {
-		result = SCAN_PAGE_RO;
+		cr->result = SCAN_PAGE_RO;
 	} else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
-		result = SCAN_LACK_REFERENCED_PAGE;
+		cr->result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
-		result = SCAN_SUCCEED;
-		ret = 1;
+		cr->result = SCAN_SUCCEED;
 	}
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
-	if (ret) {
+	if (cr->result == SCAN_SUCCEED)
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, cc, referenced, unmapped);
-	}
+		collapse_huge_page(mm, address, cc, referenced, unmapped, cr);
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
-				     none_or_zero, result, unmapped);
-	return ret;
+				     none_or_zero, cr->result, unmapped);
 }

 static void collect_mm_slot(struct mm_slot *mm_slot)
@@ -1676,7 +1683,9 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
  */
 static void collapse_file(struct mm_struct *mm,
 			  struct file *file, pgoff_t start,
-			  struct collapse_control *cc)
+			  struct collapse_control *cc,
+			  struct collapse_result *cr)
+
 {
 	struct address_space *mapping = file->f_mapping;
 	gfp_t gfp;
@@ -1684,25 +1693,27 @@ static void collapse_file(struct mm_struct *mm,
 	pgoff_t index, end = start + HPAGE_PMD_NR;
 	LIST_HEAD(pagelist);
 	XA_STATE_ORDER(xas, &mapping->i_pages, start, HPAGE_PMD_ORDER);
-	int nr_none = 0, result = SCAN_SUCCEED;
+	int nr_none = 0;
 	bool is_shmem = shmem_file(file);
 	int nr, node;

 	VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem);
 	VM_BUG_ON(start & (HPAGE_PMD_NR - 1));

+	cr->result = SCAN_SUCCEED;
+
 	/* Only allocate from the target node */
 	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
 	node = khugepaged_find_target_node(cc);

 	new_page = cc->alloc_hpage(cc, gfp, node);
 	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+		cr->result = SCAN_ALLOC_HUGE_PAGE_FAIL;
 		goto out;
 	}

 	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
+		cr->result = SCAN_CGROUP_CHARGE_FAIL;
 		goto out;
 	}
 	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
@@ -1718,7 +1729,7 @@ static void collapse_file(struct mm_struct *mm,
 			break;
 		xas_unlock_irq(&xas);
 		if (!xas_nomem(&xas, GFP_KERNEL)) {
-			result = SCAN_FAIL;
+			cr->result = SCAN_FAIL;
 			goto out;
 		}
 	} while (1);
@@ -1749,13 +1760,13 @@ static void collapse_file(struct mm_struct *mm,
 				 */
 				if (index == start) {
 					if (!xas_next_entry(&xas, end - 1)) {
-						result = SCAN_TRUNCATED;
+						cr->result = SCAN_TRUNCATED;
 						goto xa_locked;
 					}
 					xas_set(&xas, index);
 				}
 				if (!shmem_charge(mapping->host, 1)) {
-					result = SCAN_FAIL;
+					cr->result = SCAN_FAIL;
 					goto xa_locked;
 				}
 				xas_store(&xas, new_page);
@@ -1768,14 +1779,14 @@ static void collapse_file(struct mm_struct *mm,
 				/* swap in or instantiate fallocated page */
 				if (shmem_getpage(mapping->host, index, &page,
 						  SGP_NOALLOC)) {
-					result = SCAN_FAIL;
+					cr->result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
 			} else if (trylock_page(page)) {
 				get_page(page);
 				xas_unlock_irq(&xas);
 			} else {
-				result = SCAN_PAGE_LOCK;
+				cr->result = SCAN_PAGE_LOCK;
 				goto xa_locked;
 			}
 		} else {	/* !is_shmem */
@@ -1788,7 +1799,7 @@ static void collapse_file(struct mm_struct *mm,
 				lru_add_drain();
 				page = find_lock_page(mapping, index);
 				if (unlikely(page == NULL)) {
-					result = SCAN_FAIL;
+					cr->result = SCAN_FAIL;
 					goto xa_unlocked;
 				}
 			} else if (PageDirty(page)) {
@@ -1807,17 +1818,17 @@ static void collapse_file(struct mm_struct *mm,
 				 */
 				xas_unlock_irq(&xas);
 				filemap_flush(mapping);
-				result = SCAN_FAIL;
+				cr->result = SCAN_FAIL;
 				goto xa_unlocked;
 			} else if (PageWriteback(page)) {
 				xas_unlock_irq(&xas);
-				result = SCAN_FAIL;
+				cr->result = SCAN_FAIL;
 				goto xa_unlocked;
 			} else if (trylock_page(page)) {
 				get_page(page);
 				xas_unlock_irq(&xas);
 			} else {
-				result = SCAN_PAGE_LOCK;
+				cr->result = SCAN_PAGE_LOCK;
 				goto xa_locked;
 			}
 		}
@@ -1830,7 +1841,7 @@ static void collapse_file(struct mm_struct *mm,

 		/* make sure the page is up to date */
 		if (unlikely(!PageUptodate(page))) {
-			result = SCAN_FAIL;
+			cr->result = SCAN_FAIL;
 			goto out_unlock;
 		}

@@ -1839,12 +1850,12 @@ static void collapse_file(struct mm_struct *mm,
 		 * we locked the first page, then a THP might be there already.
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			cr->result = SCAN_PAGE_COMPOUND;
 			goto out_unlock;
 		}

 		if (page_mapping(page) != mapping) {
-			result = SCAN_TRUNCATED;
+			cr->result = SCAN_TRUNCATED;
 			goto out_unlock;
 		}

@@ -1855,18 +1866,18 @@ static void collapse_file(struct mm_struct *mm,
 			 * page is dirty because it hasn't been flushed
 			 * since first write.
 			 */
-			result = SCAN_FAIL;
+			cr->result = SCAN_FAIL;
 			goto out_unlock;
 		}

 		if (isolate_lru_page(page)) {
-			result = SCAN_DEL_PAGE_LRU;
+			cr->result = SCAN_DEL_PAGE_LRU;
 			goto out_unlock;
 		}

 		if (page_has_private(page) &&
 		    !try_to_release_page(page, GFP_KERNEL)) {
-			result = SCAN_PAGE_HAS_PRIVATE;
+			cr->result = SCAN_PAGE_HAS_PRIVATE;
 			putback_lru_page(page);
 			goto out_unlock;
 		}
@@ -1887,7 +1898,7 @@ static void collapse_file(struct mm_struct *mm,
 		 * - one from isolate_lru_page;
 		 */
 		if (!page_ref_freeze(page, 3)) {
-			result = SCAN_PAGE_COUNT;
+			cr->result = SCAN_PAGE_COUNT;
 			xas_unlock_irq(&xas);
 			putback_lru_page(page);
 			goto out_unlock;
@@ -1922,7 +1933,7 @@ static void collapse_file(struct mm_struct *mm,
 	 */
 	smp_mb();
 	if (inode_is_open_for_write(mapping->host)) {
-		result = SCAN_FAIL;
+		cr->result = SCAN_FAIL;
 		__mod_lruvec_page_state(new_page, NR_FILE_THPS, -nr);
 		filemap_nr_thps_dec(mapping);
 		goto xa_locked;
@@ -1949,7 +1960,7 @@ static void collapse_file(struct mm_struct *mm,
 	 */
 	try_to_unmap_flush();

-	if (result == SCAN_SUCCEED) {
+	if (cr->result == SCAN_SUCCEED) {
 		struct page *page, *tmp;

 		/*
@@ -1989,8 +2000,6 @@ static void collapse_file(struct mm_struct *mm,
 		 */
 		retract_page_tables(mapping, start);
 		cc->hpage = NULL;
-
-		khugepaged_pages_collapsed++;
 	} else {
 		struct page *page;

@@ -2042,15 +2051,16 @@ static void collapse_file(struct mm_struct *mm,

 static void khugepaged_scan_file(struct mm_struct *mm,
 				 struct file *file, pgoff_t start,
-				 struct collapse_control *cc)
+				 struct collapse_control *cc,
+				 struct collapse_result *cr)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
 	XA_STATE(xas, &mapping->i_pages, start);
 	int present, swap;
 	int node = NUMA_NO_NODE;
-	int result = SCAN_SUCCEED;

+	cr->result = SCAN_SUCCEED;
 	present = 0;
 	swap = 0;
 	memset(cc->node_load, 0, sizeof(cc->node_load));
@@ -2061,7 +2071,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,

 		if (xa_is_value(page)) {
 			if (++swap > khugepaged_max_ptes_swap) {
-				result = SCAN_EXCEED_SWAP_PTE;
+				cr->result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				break;
 			}
@@ -2073,25 +2083,25 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		 * into a PMD sized page
 		 */
 		if (PageTransCompound(page)) {
-			result = SCAN_PAGE_COMPOUND;
+			cr->result = SCAN_PAGE_COMPOUND;
 			break;
 		}

 		node = page_to_nid(page);
 		if (khugepaged_scan_abort(node, cc)) {
-			result = SCAN_SCAN_ABORT;
+			cr->result = SCAN_SCAN_ABORT;
 			break;
 		}
 		cc->node_load[node]++;

 		if (!PageLRU(page)) {
-			result = SCAN_PAGE_LRU;
+			cr->result = SCAN_PAGE_LRU;
 			break;
 		}

 		if (page_count(page) !=
 		    1 + page_mapcount(page) + page_has_private(page)) {
-			result = SCAN_PAGE_COUNT;
+			cr->result = SCAN_PAGE_COUNT;
 			break;
 		}

@@ -2110,12 +2120,12 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 	}
 	rcu_read_unlock();

-	if (result == SCAN_SUCCEED) {
+	if (cr->result == SCAN_SUCCEED) {
 		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
-			result = SCAN_EXCEED_NONE_PTE;
+			cr->result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			collapse_file(mm, file, start, cc);
+			collapse_file(mm, file, start, cc, cr);
 		}
 	}

@@ -2124,7 +2134,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
 				 struct file *file, pgoff_t start,
-				 struct collapse_control *cc)
+				 struct collapse_control *cc,
+				 struct collapse_result *cr)
 {
 	BUILD_BUG();
 }
@@ -2196,7 +2207,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 			goto skip;

 		while (khugepaged_scan.address < hend) {
-			int ret;
+			struct collapse_result cr = {0};
 			cond_resched();
 			if (unlikely(khugepaged_test_exit(mm)))
 				goto breakouterloop;
@@ -2210,17 +2221,20 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 						khugepaged_scan.address);

 				mmap_read_unlock(mm);
-				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, cc);
+				cr.dropped_mmap_lock = true;
+				khugepaged_scan_file(mm, file, pgoff, cc, &cr);
 				fput(file);
 			} else {
-				khugepaged_scan_pmd(mm, vma,
-						khugepaged_scan.address, cc);
+				khugepaged_scan_pmd(mm, vma,
+						    khugepaged_scan.address,
+						    cc, &cr);
 			}
+			if (cr.result == SCAN_SUCCEED)
+				++khugepaged_pages_collapsed;
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
 			progress += HPAGE_PMD_NR;
-			if (ret)
+			if (cr.dropped_mmap_lock)
 				/* we released mmap_lock so break loop */
 				goto breakouterloop_mmap_lock;
 			if (progress >= pages)

From patchwork Sun Apr 10 13:54:38 2022
mail-pg1-f201.google.com with SMTP id c32-20020a631c60000000b0039cec64e9f1so4401715pgm.3 for ; Sun, 10 Apr 2022 06:55:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=fwnSWvm3YlxUgYWdNYj6DCq1g/Vd5grd9Eg89xs0iDE=; b=iYaPOIZvs3ha6pHpFh2GA87iFxUV/6veZMKpNxH/4NucFMkKwPYSumsQ3t4nYNauO0 8O6fJ/ArRCseUQYbJO0btu9gCT5YWzDOep4f+lql+ohH2lhiEz0PGbDg6vTbb8/MD0Tc N8h5euUfg1DTFJhbTs1eUIsX+meu5nlBoWWz+zd5p3rzvehJ0drKb3sCont/XDqNfTKL oTNWkOoq1aqUwE0yzF1Bt5S7z4CgsoXYSBPjJOAB1Z9buTQtd6MsQ/2WPvmHiW1MYclj dg3IG5l9jVKMLPJajWGkF3c7GUVIKCCss4ygBx+/XeGC2LqEcWHETIZAojgTl4tst9cd EU3g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=fwnSWvm3YlxUgYWdNYj6DCq1g/Vd5grd9Eg89xs0iDE=; b=54a8P4S6Ga4dC2wW5Wzvv3kQI77gpgfH0GrA8r3KDfwaikuIZo+F7ILzYSrYU6Z56Y bmCtbZYXM9LxQRuZovQpQikidD7sav5j52mJLlajjn/GMRoih2rMshT7B7S09zS6Bujz rt9vENuSZEDRxqwmfzy0Bxbg5XpfCGsacNi4d3F8QO3veD/u6KUmH2psLF3hJLlh28Yy +s7AmoxJhXjSOM2+xtO0ZnfTajjEyOmHBrg2b0I1L3XWKR1KCgBoSVoWjgiO9vUJS4IF GYR7wSzdtJGay0CtQRfx5xz4h47xGR5sW9+8lR1PbS2EBAgR134kWPsPwPLQeacJODS9 QaUg== X-Gm-Message-State: AOAM531adJ88IQ3Om9fJcdq0X0TlyOFzHIt2jLvC26+lCvYfchzxC6Yo aynRIkl1TmmTUIsame2d9U9nTSfz/Dpk X-Google-Smtp-Source: ABdhPJyPERbW1LIA30FZXpWAnIXSvvA6PLoWGndlB/+a57ZygXe9nOM6G+lFL9Dwc7zVTG0CfwaKsu/3elAr X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a17:90a:858b:b0:1c6:5bc8:781a with SMTP id m11-20020a17090a858b00b001c65bc8781amr954412pjn.0.1649598907260; Sun, 10 Apr 2022 06:55:07 -0700 (PDT) Date: Sun, 10 Apr 2022 06:54:38 -0700 In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com> Message-Id: <20220410135445.3897054-6-zokeefe@google.com> Mime-Version: 1.0 References: <20220410135445.3897054-1-zokeefe@google.com> X-Mailer: git-send-email 2.35.1.1178.g4f1659d476-goog Subject: [PATCH 05/12] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Peter Xu , Thomas Bogendoerfer , "Zach O'Keefe" X-Rspam-User: X-Stat-Signature: kux63fc6zefq1fywnz4ecdy8gasoeum6 Authentication-Results: imf22.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=iYaPOIZv; spf=pass (imf22.hostedemail.com: domain of 3u-FSYgcKCOIdSOIIJIKSSKPI.GSQPMRYb-QQOZEGO.SVK@flex--zokeefe.bounces.google.com designates 209.85.215.201 as permitted sender) smtp.mailfrom=3u-FSYgcKCOIdSOIIJIKSSKPI.GSQPMRYb-QQOZEGO.SVK@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: A8868C0003 X-HE-Tag: 1649598908-651913 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This idea was introduced by David Rientjes[1], and the semantics and implementation were introduced and discussed in a previous PATCH RFC[2]. 
Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are: * CPU is charged to the process that wants to spend the cycles for the THP * the unpredictable timing of khugepaged collapse is avoided Immediate users of this new functionality include: * immediately backing executable text with hugepages. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take too long on a large system. * malloc implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED, zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory with THP to regain TLB performance. Allocation semantics are the same as khugepaged's, and depend on (1) the active sysfs settings /sys/kernel/mm/transparent_hugepage/enabled and /sys/kernel/mm/transparent_hugepage/khugepaged/defrag, and (2) the VMA flags of the memory range being collapsed. Only privately-mapped anon memory is supported for now. [1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://lore.kernel.org/linux-mm/20220308213417.1407042-1-zokeefe@google.com/ Suggested-by: David Rientjes Signed-off-by: Zach O'Keefe Reported-by: kernel test robot Reported-by: kernel test robot --- include/linux/huge_mm.h | 12 ++ include/uapi/asm-generic/mman-common.h | 2 + mm/khugepaged.c | 151 ++++++++++++++++++++++--- mm/madvise.c | 5 + 4 files changed, 157 insertions(+), 13 deletions(-) diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 816a9937f30e..ddad7c7af44e 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -236,6 +236,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); +int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -392,6 +395,15 @@ static inline int hugepage_madvise(struct vm_area_struct *vma, BUG(); return 0; } + +static inline int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + BUG(); + return 0; +} + static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6c1aa92a92e4..6ce1f1ceb432 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -77,6 +77,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index ed025dbbd7e6..c5c484b7e394 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -846,7 +846,6 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void) return khugepaged_defrag() ?
GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT; } -#ifdef CONFIG_NUMA static int khugepaged_find_target_node(struct collapse_control *cc) { int nid, target_node = 0, max_value = 0; @@ -872,6 +871,24 @@ static int khugepaged_find_target_node(struct collapse_control *cc) return target_node; } +static struct page *alloc_hpage(struct collapse_control *cc, gfp_t gfp, + int node) +{ + VM_BUG_ON_PAGE(cc->hpage, cc->hpage); + + cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); + if (unlikely(!cc->hpage)) { + count_vm_event(THP_COLLAPSE_ALLOC_FAILED); + cc->hpage = ERR_PTR(-ENOMEM); + return NULL; + } + + prep_transhuge_page(cc->hpage); + count_vm_event(THP_COLLAPSE_ALLOC); + return cc->hpage; +} + +#ifdef CONFIG_NUMA static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) { if (IS_ERR(*hpage)) { @@ -892,18 +909,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) static struct page *khugepaged_alloc_page(struct collapse_control *cc, gfp_t gfp, int node) { - VM_BUG_ON_PAGE(cc->hpage, cc->hpage); - - cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); - if (unlikely(!cc->hpage)) { - count_vm_event(THP_COLLAPSE_ALLOC_FAILED); - cc->hpage = ERR_PTR(-ENOMEM); - return NULL; - } - - prep_transhuge_page(cc->hpage); - count_vm_event(THP_COLLAPSE_ALLOC); - return cc->hpage; + return alloc_hpage(cc, gfp, node); } #else static int khugepaged_find_target_node(struct collapse_control *cc) @@ -2456,3 +2462,122 @@ void khugepaged_min_free_kbytes_update(void) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } + +static void madvise_collapse_cleanup_page(struct page **hpage) +{ + if (!IS_ERR(*hpage) && *hpage) + put_page(*hpage); + *hpage = NULL; +} + +int madvise_collapse_errno(enum scan_result r) +{ + switch (r) { + case SCAN_PMD_NULL: + case SCAN_ADDRESS_RANGE: + case SCAN_VMA_NULL: + case SCAN_PTE_NON_PRESENT: + case SCAN_PAGE_NULL: + /* + * Addresses in the specified range are not currently mapped, + * or are outside the AS of the process. + */ + return -ENOMEM; + case SCAN_ALLOC_HUGE_PAGE_FAIL: + case SCAN_CGROUP_CHARGE_FAIL: + /* A kernel resource was temporarily unavailable. 
+ */ + return -EAGAIN; + default: + return -EINVAL; + } +} + +int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + struct collapse_control cc = { + .last_target_node = NUMA_NO_NODE, + .hpage = NULL, + .alloc_hpage = &alloc_hpage, + }; + struct mm_struct *mm = vma->vm_mm; + struct collapse_result cr; + unsigned long hstart, hend, addr; + int thps = 0, nr_hpages = 0; + + BUG_ON(vma->vm_start > start); + BUG_ON(vma->vm_end < end); + + *prev = vma; + + if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) + return -EINVAL; + + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; + hend = end & HPAGE_PMD_MASK; + nr_hpages = (hend - hstart) >> HPAGE_PMD_SHIFT; + + if (hstart >= hend || !transparent_hugepage_active(vma)) + return -EINVAL; + + mmgrab(mm); + lru_add_drain(); + + for (addr = hstart; ; ) { + mmap_assert_locked(mm); + cond_resched(); + memset(&cr, 0, sizeof(cr)); + + if (unlikely(khugepaged_test_exit(mm))) + break; + + memset(cc.node_load, 0, sizeof(cc.node_load)); + khugepaged_scan_pmd(mm, vma, addr, &cc, &cr); + if (cr.dropped_mmap_lock) + *prev = NULL; /* tell madvise we dropped mmap_lock */ + + switch (cr.result) { + /* Whitelisted set of results where continuing OK */ + case SCAN_SUCCEED: + case SCAN_PMD_MAPPED: + ++thps; + fallthrough; + case SCAN_PMD_NULL: + case SCAN_PTE_NON_PRESENT: + case SCAN_PTE_UFFD_WP: + case SCAN_PAGE_RO: + case SCAN_LACK_REFERENCED_PAGE: + case SCAN_PAGE_NULL: + case SCAN_PAGE_COUNT: + case SCAN_PAGE_LOCK: + case SCAN_PAGE_COMPOUND: + break; + case SCAN_PAGE_LRU: + lru_add_drain_all(); + goto retry; + default: + /* Other error, exit */ + goto break_loop; + } + addr += HPAGE_PMD_SIZE; + if (addr >= hend) + break; +retry: + if (cr.dropped_mmap_lock) { + mmap_read_lock(mm); + if (hugepage_vma_revalidate(mm, addr, &vma)) + goto out; + } + madvise_collapse_cleanup_page(&cc.hpage); + } + +break_loop: + /* madvise_walk_vmas() expects us to hold mmap_lock on return */ + if (cr.dropped_mmap_lock) + mmap_read_lock(mm); +out: + mmap_assert_locked(mm); + madvise_collapse_cleanup_page(&cc.hpage); + mmdrop(mm); + + return thps == nr_hpages ? 0 : madvise_collapse_errno(cr.result); +} diff --git a/mm/madvise.c b/mm/madvise.c index ec03a76244b7..7ad53e5311cf 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -59,6 +59,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_FREE: case MADV_POPULATE_READ: case MADV_POPULATE_WRITE: + case MADV_COLLAPSE: return 0; default: /* be safe, default to 1. list exceptions explicitly */ @@ -1051,6 +1052,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma, if (error) goto out; break; + case MADV_COLLAPSE: + return madvise_collapse(vma, prev, start, end); } anon_name = anon_vma_name(vma); @@ -1144,6 +1147,7 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_TRANSPARENT_HUGEPAGE case MADV_HUGEPAGE: case MADV_NOHUGEPAGE: + case MADV_COLLAPSE: #endif case MADV_DONTDUMP: case MADV_DODUMP: @@ -1333,6 +1337,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * MADV_NOHUGEPAGE - mark the given range as not worth being backed by * transparent huge pages so the existing pages will not be * coalesced into THP and new pages will not be allocated as THP. + * MADV_COLLAPSE - synchronously coalesce pages into new THP. * MADV_DONTDUMP - the application wants to prevent pages in the given range * from being included in its core dump. * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
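For context, a minimal userspace sketch of the intended usage (an editorial illustration, not part of the patch; it assumes the MADV_COLLAPSE value defined above, a 2MiB PMD size, and a THP-capable kernel config):

#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value added by this patch */
#endif

#define HPAGE_SIZE (2UL << 20)	/* assumed PMD size */

int main(void)
{
	/* Over-allocate so a PMD-aligned start address can be carved out. */
	char *raw = mmap(NULL, 2 * HPAGE_SIZE, PROT_READ | PROT_WRITE,
			 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (raw == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	char *p = (char *)(((uintptr_t)raw + HPAGE_SIZE - 1) &
			   ~(HPAGE_SIZE - 1));

	/* Fault the range in with base pages, then collapse synchronously. */
	memset(p, 1, HPAGE_SIZE);
	if (madvise(p, HPAGE_SIZE, MADV_COLLAPSE))
		perror("madvise(MADV_COLLAPSE)");	/* e.g. ENOMEM, EAGAIN */
	return 0;
}

On success the whole [p, p + HPAGE_SIZE) range is backed by a pmd-mapped THP; on failure the errno follows madvise_collapse_errno() above.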
From patchwork Sun Apr 10 13:54:39 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808150
Date: Sun, 10 Apr 2022 06:54:39 -0700
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Message-Id: <20220410135445.3897054-7-zokeefe@google.com>
Subject: [PATCH 06/12] mm/khugepaged: remove khugepaged prefix from shared collapse functions
From: "Zach O'Keefe"
To: linux-mm@kvack.org

The following functions/tracepoints are shared between khugepaged and madvise collapse contexts. Remove the khugepaged prefixes. tracepoint:mm_khugepaged_scan_pmd -> tracepoint:mm_scan_pmd khugepaged_test_exit() -> test_exit() khugepaged_scan_abort() -> scan_abort() khugepaged_scan_pmd() -> scan_pmd() khugepaged_find_target_node() -> find_target_node() Signed-off-by: Zach O'Keefe Reported-by: kernel test robot --- include/trace/events/huge_memory.h | 2 +- mm/khugepaged.c | 68 ++++++++++++++---------------- 2 files changed, 33 insertions(+), 37 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index 9faa678e0a5b..09be0e2f76b1 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -48,7 +48,7 @@ SCAN_STATUS #define EM(a, b) {a, b}, #define EMe(a, b) {a, b} -TRACE_EVENT(mm_khugepaged_scan_pmd, +TRACE_EVENT(mm_scan_pmd, TP_PROTO(struct mm_struct *mm, struct page *page, bool writable, int referenced, int none_or_zero, int status, int unmapped), diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c5c484b7e394..2717262d1832 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -90,7 +90,7 @@ struct collapse_control { /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; - /* Last target selected in khugepaged_find_target_node() for this scan */ + /* Last target selected in find_target_node() for this scan */ int last_target_node; struct page *hpage; @@ -453,7 +453,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm, hash_add(mm_slots_hash, &mm_slot->hash, (long)mm); } -static inline int khugepaged_test_exit(struct mm_struct *mm) +static inline int test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) == 0; } @@ -505,7 +505,7 @@ void __khugepaged_enter(struct mm_struct *mm) return; /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(khugepaged_test_exit(mm), mm); + VM_BUG_ON_MM(test_exit(mm), mm); if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) { free_mm_slot(mm_slot); return; @@ -557,12 +557,11 @@ void __khugepaged_exit(struct mm_struct *mm) mmdrop(mm); } else if (mm_slot) { /* - * This is required to serialize against
- * khugepaged_test_exit() (which is guaranteed to run - * under mmap sem read mode). Stop here (after we - * return all pagetables will be destroyed) until - * khugepaged has finished working on the pagetables - * under the mmap_lock. + * This is required to serialize against test_exit() (which is + * guaranteed to run under mmap sem read mode). Stop here + * (after we return all pagetables will be destroyed) until + * khugepaged has finished working on the pagetables under + * the mmap_lock. */ mmap_write_lock(mm); mmap_write_unlock(mm); @@ -816,7 +815,7 @@ static void khugepaged_alloc_sleep(void) remove_wait_queue(&khugepaged_wait, &wait); } -static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) +static bool scan_abort(int nid, struct collapse_control *cc) { int i; @@ -846,7 +845,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void) return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT; } -static int khugepaged_find_target_node(struct collapse_control *cc) +static int find_target_node(struct collapse_control *cc) { int nid, target_node = 0, max_value = 0; @@ -993,7 +992,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, struct vm_area_struct *vma; unsigned long hstart, hend; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(test_exit(mm))) return SCAN_ANY_PROCESS; *vmap = vma = find_vma(mm, address); @@ -1037,7 +1036,7 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm, /* * Bring missing pages in from swap, to complete THP collapse. - * Only done if khugepaged_scan_pmd believes it is worthwhile. + * Only done if scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held, * but with mmap_lock held to protect against vma changes. @@ -1129,7 +1128,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, mmap_read_unlock(mm); cr->dropped_mmap_lock = true; - node = khugepaged_find_target_node(cc); + node = find_target_node(cc); /* sched to specified node before huge page memory copy */ if (task_node(current) != node) { cpumask = cpumask_of_node(node); @@ -1270,11 +1269,9 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, return; } -static void khugepaged_scan_pmd(struct mm_struct *mm, - struct vm_area_struct *vma, - unsigned long address, - struct collapse_control *cc, - struct collapse_result *cr) +static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long address, struct collapse_control *cc, + struct collapse_result *cr) { pmd_t *pmd; pte_t *pte, *_pte; @@ -1364,7 +1361,7 @@ static void khugepaged_scan_pmd(struct mm_struct *mm, * hit record.
*/ node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (scan_abort(node, cc)) { cr->result = SCAN_SCAN_ABORT; goto out_unmap; } @@ -1421,8 +1418,8 @@ static void khugepaged_scan_pmd(struct mm_struct *mm, /* collapse_huge_page will return with the mmap_lock released */ collapse_huge_page(mm, address, cc, referenced, unmapped, cr); out: - trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, - none_or_zero, cr->result, unmapped); + trace_mm_scan_pmd(mm, page, writable, referenced, none_or_zero, + cr->result, unmapped); } static void collect_mm_slot(struct mm_slot *mm_slot) @@ -1431,7 +1428,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot) lockdep_assert_held(&khugepaged_mm_lock); - if (khugepaged_test_exit(mm)) { + if (test_exit(mm)) { /* free mm_slot */ hash_del(&mm_slot->hash); list_del(&mm_slot->mm_node); @@ -1598,7 +1595,7 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot) if (!mmap_write_trylock(mm)) return; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(test_exit(mm))) goto out; for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++) @@ -1653,7 +1650,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * reverse order. Trylock is a way to avoid deadlock. */ if (mmap_write_trylock(mm)) { - if (!khugepaged_test_exit(mm)) + if (!test_exit(mm)) collapse_and_free_pmd(mm, vma, addr, pmd); mmap_write_unlock(mm); } else { @@ -1710,7 +1707,7 @@ static void collapse_file(struct mm_struct *mm, /* Only allocate from the target node */ gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; - node = khugepaged_find_target_node(cc); + node = find_target_node(cc); new_page = cc->alloc_hpage(cc, gfp, node); if (!new_page) { @@ -2094,7 +2091,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, } node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (scan_abort(node, cc)) { cr->result = SCAN_SCAN_ABORT; break; } @@ -2183,7 +2180,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, vma = NULL; if (unlikely(!mmap_read_trylock(mm))) goto breakouterloop_mmap_lock; - if (likely(!khugepaged_test_exit(mm))) + if (likely(!test_exit(mm))) vma = find_vma(mm, khugepaged_scan.address); progress++; @@ -2191,7 +2188,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, unsigned long hstart, hend; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) { + if (unlikely(test_exit(mm))) { progress++; break; } @@ -2215,7 +2212,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, while (khugepaged_scan.address < hend) { struct collapse_result cr = {0}; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(test_exit(mm))) goto breakouterloop; VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2231,9 +2228,8 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, khugepaged_scan_file(mm, file, pgoff, cc, &cr); fput(file); } else { - khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, - cc, &cr); + scan_pmd(mm, vma, khugepaged_scan.address, cc, + &cr); } if (cr.result == SCAN_SUCCEED) ++khugepaged_pages_collapsed; @@ -2257,7 +2253,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. 
*/ - if (khugepaged_test_exit(mm) || !vma) { + if (test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find @@ -2528,11 +2524,11 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, cond_resched(); memset(&cr, 0, sizeof(cr)); - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(test_exit(mm))) break; memset(cc.node_load, 0, sizeof(cc.node_load)); - khugepaged_scan_pmd(mm, vma, addr, &cc, &cr); + scan_pmd(mm, vma, addr, &cc, &cr); if (cr.dropped_mmap_lock) *prev = NULL; /* tell madvise we dropped mmap_lock */

From patchwork Sun Apr 10 13:54:40 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808151
Date: Sun, 10 Apr 2022 06:54:40 -0700
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Message-Id: <20220410135445.3897054-8-zokeefe@google.com>
Subject: [PATCH 07/12] mm/khugepaged: add flag to ignore khugepaged_max_ptes_*
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Add an enforce_pte_scan_limits flag to struct collapse_control that allows a context to ignore the sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared]. Set this flag in the khugepaged collapse context to preserve existing khugepaged behavior, and clear it in the madvise collapse context, since the user presumably has reason to believe the collapse will be beneficial.
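To make the new control concrete, the check pattern threaded through the scan paths behaves like the sketch below (illustrative only; exceeds_pte_limit is a made-up helper, and the hunks that follow are the real change):

/*
 * A sysfs limit only trips when the collapse context asked for
 * enforcement: khugepaged sets enforce_pte_scan_limits = true, while
 * MADV_COLLAPSE sets it false so a limit can never fail the scan.
 */
static bool exceeds_pte_limit(int count, int limit, bool enforce)
{
	return enforce && count > limit;
}

So a condition such as "++none_or_zero <= khugepaged_max_ptes_none || !cc->enforce_pte_scan_limits" reads as "under the limit, or limits not enforced".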
Signed-off-by: Zach O'Keefe --- mm/khugepaged.c | 32 ++++++++++++++++++++++---------- 1 file changed, 22 insertions(+), 10 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 2717262d1832..7f555da26fdc 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -87,6 +87,9 @@ static struct kmem_cache *mm_slot_cache __read_mostly; #define MAX_PTE_MAPPED_THP 8 struct collapse_control { + /* Respect khugepaged_max_ptes_[none|swap|shared] */ + bool enforce_pte_scan_limits; + /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; @@ -631,6 +634,7 @@ static bool is_refcount_suitable(struct page *page) static int __collapse_huge_page_isolate(struct vm_area_struct *vma, unsigned long address, pte_t *pte, + struct collapse_control *cc, struct list_head *compound_pagelist) { struct page *page = NULL; @@ -644,7 +648,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (pte_none(pteval) || (pte_present(pteval) && is_zero_pfn(pte_pfn(pteval)))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->enforce_pte_scan_limits)) { continue; } else { result = SCAN_EXCEED_NONE_PTE; @@ -664,8 +669,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageAnon(page), page); - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->enforce_pte_scan_limits && page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; @@ -1207,7 +1212,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - cr->result = __collapse_huge_page_isolate(vma, address, pte, + cr->result = __collapse_huge_page_isolate(vma, address, pte, cc, &compound_pagelist); spin_unlock(pte_ptl); @@ -1296,7 +1301,8 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, _pte++, _address += PAGE_SIZE) { pte_t pteval = *_pte; if (is_swap_pte(pteval)) { - if (++unmapped <= khugepaged_max_ptes_swap) { + if (++unmapped <= khugepaged_max_ptes_swap || + !cc->enforce_pte_scan_limits) { /* * Always be strict with uffd-wp * enabled swap entries. 
Please see @@ -1315,7 +1321,8 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->enforce_pte_scan_limits)) { continue; } else { cr->result = SCAN_EXCEED_NONE_PTE; @@ -1345,8 +1352,9 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, goto out_unmap; } - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->enforce_pte_scan_limits && + page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { cr->result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out_unmap; @@ -2073,7 +2081,8 @@ static void khugepaged_scan_file(struct mm_struct *mm, continue; if (xa_is_value(page)) { - if (++swap > khugepaged_max_ptes_swap) { + if (cc->enforce_pte_scan_limits && + ++swap > khugepaged_max_ptes_swap) { cr->result = SCAN_EXCEED_SWAP_PTE; count_vm_event(THP_SCAN_EXCEED_SWAP_PTE); break; @@ -2124,7 +2133,8 @@ static void khugepaged_scan_file(struct mm_struct *mm, rcu_read_unlock(); if (cr->result == SCAN_SUCCEED) { - if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) { + if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none && + cc->enforce_pte_scan_limits) { cr->result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { @@ -2351,6 +2361,7 @@ static int khugepaged(void *none) { struct mm_slot *mm_slot; struct collapse_control cc = { + .enforce_pte_scan_limits = true, .last_target_node = NUMA_NO_NODE, .alloc_hpage = &khugepaged_alloc_page, }; @@ -2492,6 +2503,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, unsigned long start, unsigned long end) { struct collapse_control cc = { + .enforce_pte_scan_limits = false, .last_target_node = NUMA_NO_NODE, .hpage = NULL, .alloc_hpage = &alloc_hpage,

From patchwork Sun Apr 10 13:54:41 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808152
Date: Sun, 10 Apr 2022 06:54:41 -0700
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Message-Id: <20220410135445.3897054-9-zokeefe@google.com>
Subject: [PATCH 08/12] mm/khugepaged: add flag to ignore page young/referenced requirement
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Add an enforce_young flag to struct collapse_control that allows a context to ignore the requirement that some pages in the region being collapsed be young or referenced.
Set this flag in the khugepaged collapse context to preserve existing khugepaged behavior, and clear it in the madvise collapse context, since the user presumably has reason to believe the collapse will be beneficial. Signed-off-by: Zach O'Keefe --- mm/khugepaged.c | 24 ++++++++++++++++-------- 1 file changed, 16 insertions(+), 8 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 7f555da26fdc..8e5e45355c6d 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -90,6 +90,9 @@ struct collapse_control { /* Respect khugepaged_max_ptes_[none|swap|shared] */ bool enforce_pte_scan_limits; + /* Require memory to be young */ + bool enforce_young; + /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; @@ -737,9 +740,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, list_add_tail(&page->lru, compound_pagelist); next: /* There should be enough young pte to collapse the page */ - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + if (cc->enforce_young && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; if (pte_write(pteval)) @@ -748,7 +752,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (unlikely(!writable)) { result = SCAN_PAGE_RO; - } else if (unlikely(!referenced)) { + } else if (unlikely(cc->enforce_young && !referenced)) { result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; @@ -1408,14 +1412,16 @@ static void scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, cr->result = SCAN_PAGE_COUNT; goto out_unmap; } - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + if (cc->enforce_young && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; } if (!writable) { cr->result = SCAN_PAGE_RO; - } else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) { + } else if (cc->enforce_young && (!referenced || (unmapped && referenced + < HPAGE_PMD_NR / 2))) { cr->result = SCAN_LACK_REFERENCED_PAGE; } else { cr->result = SCAN_SUCCEED; @@ -2362,6 +2368,7 @@ static int khugepaged(void *none) struct mm_slot *mm_slot; struct collapse_control cc = { .enforce_pte_scan_limits = true, + .enforce_young = true, .last_target_node = NUMA_NO_NODE, .alloc_hpage = &khugepaged_alloc_page, }; @@ -2504,6 +2511,7 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, { struct collapse_control cc = { .enforce_pte_scan_limits = false, + .enforce_young = false, .last_target_node = NUMA_NO_NODE, .hpage = NULL, .alloc_hpage = &alloc_hpage,

From patchwork Sun Apr 10 13:54:42 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808153
Date: Sun, 10 Apr 2022 06:54:42 -0700
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Message-Id: <20220410135445.3897054-10-zokeefe@google.com>
Subject: [PATCH 09/12] mm/madvise: add MADV_COLLAPSE to process_madvise()
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Allow MADV_COLLAPSE behavior for process_madvise(2) if the caller has CAP_SYS_ADMIN or is requesting collapse of its own memory.

Signed-off-by: Zach O'Keefe --- mm/madvise.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 7ad53e5311cf..a5c82fa7972b 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1165,13 +1165,15 @@ madvise_behavior_valid(int behavior) } static bool -process_madvise_behavior_valid(int behavior) +process_madvise_behavior_valid(int behavior, struct task_struct *task) { switch (behavior) { case MADV_COLD: case MADV_PAGEOUT: case MADV_WILLNEED: return true; + case MADV_COLLAPSE: + return task == current || capable(CAP_SYS_ADMIN); default: return false; } @@ -1449,7 +1451,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, goto free_iov; } - if (!process_madvise_behavior_valid(behavior)) { + if (!process_madvise_behavior_valid(behavior, task)) { ret = -EINVAL; goto release_task; }
From patchwork Sun Apr 10 13:54:43 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12808154
Date: Sun, 10 Apr 2022 06:54:43 -0700
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Message-Id: <20220410135445.3897054-11-zokeefe@google.com>
Subject: [PATCH 10/12] selftests/vm: modularize collapse selftests
From: "Zach O'Keefe"
To: linux-mm@kvack.org

Modularize the collapse action of the khugepaged collapse selftests by introducing a struct collapse_context which specifies how to collapse a given memory range and the expected semantics of the collapse. This can be reused later to test other collapse contexts.
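For example, a later patch in the series could plug in an madvise-backed context along these lines (a sketch against this file's existing helpers; madvise_collapse_func is a hypothetical name, and MADV_COLLAPSE comes from the uapi change earlier in the series):

static void madvise_collapse_func(const char *msg, char *p, bool expect)
{
	printf("%s...", msg);
	/* MADV_COLLAPSE returning 0 means the range is now pmd-mapped. */
	if (madvise(p, hpage_pmd_size, MADV_COLLAPSE) && expect)
		fail("madvise(MADV_COLLAPSE)");
	else if (!!check_huge(p) == expect)
		success("OK");
	else
		fail("Fail");
}

paired with a contexts[] entry such as { .name = "madvise", .collapse = &madvise_collapse_func, .enforce_pte_scan_limits = false }.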
Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 257 +++++++++++------------- 1 file changed, 116 insertions(+), 141 deletions(-) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index 155120b67a16..c59d832fee96 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -23,6 +23,12 @@ static int hpage_pmd_nr; #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/" #define PID_SMAPS "/proc/self/smaps" +struct collapse_context { + const char *name; + void (*collapse)(const char *msg, char *p, bool expect); + bool enforce_pte_scan_limits; +}; + enum thp_enabled { THP_ALWAYS, THP_MADVISE, @@ -528,53 +534,39 @@ static void alloc_at_fault(void) munmap(p, hpage_pmd_size); } -static void collapse_full(void) +static void collapse_full(struct collapse_context *context) { void *p; p = alloc_mapping(); fill_memory(p, 0, hpage_pmd_size); - if (wait_for_scan("Collapse fully populated PTE table", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse fully populated PTE table", p, true); validate_memory(p, 0, hpage_pmd_size); munmap(p, hpage_pmd_size); } -static void collapse_empty(void) +static void collapse_empty(struct collapse_context *context) { void *p; p = alloc_mapping(); - if (wait_for_scan("Do not collapse empty PTE table", p)) - fail("Timeout"); - else if (check_huge(p)) - fail("Fail"); - else - success("OK"); + context->collapse("Do not collapse empty PTE table", p, false); munmap(p, hpage_pmd_size); } -static void collapse_single_pte_entry(void) +static void collapse_single_pte_entry(struct collapse_context *context) { void *p; p = alloc_mapping(); fill_memory(p, 0, page_size); - if (wait_for_scan("Collapse PTE table with single PTE entry present", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); + context->collapse("Collapse PTE table with single PTE entry present", p, + true); validate_memory(p, 0, page_size); munmap(p, hpage_pmd_size); } -static void collapse_max_ptes_none(void) +static void collapse_max_ptes_none(struct collapse_context *context) { int max_ptes_none = hpage_pmd_nr / 2; struct settings settings = default_settings; @@ -586,28 +578,23 @@ static void collapse_max_ptes_none(void) p = alloc_mapping(); fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); - if (wait_for_scan("Do not collapse with max_ptes_none exceeded", p)) - fail("Timeout"); - else if (check_huge(p)) - fail("Fail"); - else - success("OK"); + context->collapse("Maybe collapse with max_ptes_none exceeded", p, + !context->enforce_pte_scan_limits); validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size); - fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); - if (wait_for_scan("Collapse with max_ptes_none PTEs empty", p)) - fail("Timeout"); - else if (check_huge(p)) - success("OK"); - else - fail("Fail"); - validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); + if (context->enforce_pte_scan_limits) { + fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size); + context->collapse("Collapse with max_ptes_none PTEs empty", p, + true); + validate_memory(p, 0, + (hpage_pmd_nr - max_ptes_none) * page_size); + } munmap(p, hpage_pmd_size); write_settings(&default_settings); } -static void collapse_swapin_single_pte(void) +static void collapse_swapin_single_pte(struct collapse_context *context) { void *p; p = alloc_mapping(); @@ -625,18 +612,14 @@ 
                 goto out;
         }
 
-        if (wait_for_scan("Collapse with swapping in single PTE entry", p))
-                fail("Timeout");
-        else if (check_huge(p))
-                success("OK");
-        else
-                fail("Fail");
+        context->collapse("Collapse with swapping in single PTE entry",
+                          p, true);
         validate_memory(p, 0, hpage_pmd_size);
 out:
         munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_swap(void)
+static void collapse_max_ptes_swap(struct collapse_context *context)
 {
         int max_ptes_swap = read_num("khugepaged/max_ptes_swap");
         void *p;
@@ -656,39 +639,34 @@ static void collapse_max_ptes_swap(void)
                 goto out;
         }
 
-        if (wait_for_scan("Do not collapse with max_ptes_swap exceeded", p))
-                fail("Timeout");
-        else if (check_huge(p))
-                fail("Fail");
-        else
-                success("OK");
+        context->collapse("Maybe collapse with max_ptes_swap exceeded",
+                          p, !context->enforce_pte_scan_limits);
         validate_memory(p, 0, hpage_pmd_size);
 
-        fill_memory(p, 0, hpage_pmd_size);
-        printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr);
-        if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
-                perror("madvise(MADV_PAGEOUT)");
-                exit(EXIT_FAILURE);
-        }
-        if (check_swap(p, max_ptes_swap * page_size)) {
-                success("OK");
-        } else {
-                fail("Fail");
-                goto out;
-        }
+        if (context->enforce_pte_scan_limits) {
+                fill_memory(p, 0, hpage_pmd_size);
+                printf("Swapout %d of %d pages...", max_ptes_swap,
+                       hpage_pmd_nr);
+                if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
+                        perror("madvise(MADV_PAGEOUT)");
+                        exit(EXIT_FAILURE);
+                }
+                if (check_swap(p, max_ptes_swap * page_size)) {
+                        success("OK");
+                } else {
+                        fail("Fail");
+                        goto out;
+                }
 
-        if (wait_for_scan("Collapse with max_ptes_swap pages swapped out", p))
-                fail("Timeout");
-        else if (check_huge(p))
-                success("OK");
-        else
-                fail("Fail");
-        validate_memory(p, 0, hpage_pmd_size);
+                context->collapse("Collapse with max_ptes_swap pages swapped out",
+                                  p, true);
+                validate_memory(p, 0, hpage_pmd_size);
+        }
 
 out:
         munmap(p, hpage_pmd_size);
 }
 
-static void collapse_single_pte_entry_compound(void)
+static void collapse_single_pte_entry_compound(struct collapse_context *context)
 {
         void *p;
 
@@ -710,17 +688,13 @@ static void collapse_single_pte_entry_compound(void)
         else
                 fail("Fail");
 
-        if (wait_for_scan("Collapse PTE table with single PTE mapping compound page", p))
-                fail("Timeout");
-        else if (check_huge(p))
-                success("OK");
-        else
-                fail("Fail");
+        context->collapse("Collapse PTE table with single PTE mapping compound page",
+                          p, true);
         validate_memory(p, 0, page_size);
         munmap(p, hpage_pmd_size);
 }
 
-static void collapse_full_of_compound(void)
+static void collapse_full_of_compound(struct collapse_context *context)
 {
         void *p;
 
@@ -742,17 +716,12 @@ static void collapse_full_of_compound(void)
         else
                 fail("Fail");
 
-        if (wait_for_scan("Collapse PTE table full of compound pages", p))
-                fail("Timeout");
-        else if (check_huge(p))
-                success("OK");
-        else
-                fail("Fail");
+        context->collapse("Collapse PTE table full of compound pages", p, true);
         validate_memory(p, 0, hpage_pmd_size);
         munmap(p, hpage_pmd_size);
 }
 
-static void collapse_compound_extreme(void)
+static void collapse_compound_extreme(struct collapse_context *context)
 {
         void *p;
         int i;
@@ -798,18 +767,14 @@ static void collapse_compound_extreme(void)
         else
                 fail("Fail");
 
-        if (wait_for_scan("Collapse PTE table full of different compound pages", p))
-                fail("Timeout");
-        else if (check_huge(p))
-                success("OK");
-        else
-                fail("Fail");
+        context->collapse("Collapse PTE table full of different compound pages",
+                          p, true);
         validate_memory(p, 0, hpage_pmd_size);
         munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork(void)
+static void collapse_fork(struct collapse_context *context)
 {
         int wstatus;
         void *p;
@@ -835,13 +800,8 @@ static void collapse_fork(void)
                         fail("Fail");
 
                 fill_memory(p, page_size, 2 * page_size);
-
-                if (wait_for_scan("Collapse PTE table with single page shared with parent process", p))
-                        fail("Timeout");
-                else if (check_huge(p))
-                        success("OK");
-                else
-                        fail("Fail");
+                context->collapse("Collapse PTE table with single page shared with parent process",
+                                  p, true);
 
                 validate_memory(p, 0, page_size);
                 munmap(p, hpage_pmd_size);
@@ -860,7 +820,7 @@ static void collapse_fork(void)
         munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork_compound(void)
+static void collapse_fork_compound(struct collapse_context *context)
 {
         int wstatus;
         void *p;
@@ -896,14 +856,10 @@ static void collapse_fork_compound(void)
                 fill_memory(p, 0, page_size);
 
                 write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1);
-                if (wait_for_scan("Collapse PTE table full of compound pages in child", p))
-                        fail("Timeout");
-                else if (check_huge(p))
-                        success("OK");
-                else
-                        fail("Fail");
+                context->collapse("Collapse PTE table full of compound pages in child",
+                                  p, true);
                 write_num("khugepaged/max_ptes_shared",
-                        default_settings.khugepaged.max_ptes_shared);
+                          default_settings.khugepaged.max_ptes_shared);
 
                 validate_memory(p, 0, hpage_pmd_size);
                 munmap(p, hpage_pmd_size);
@@ -922,7 +878,7 @@ static void collapse_fork_compound(void)
         munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_shared()
+static void collapse_max_ptes_shared(struct collapse_context *context)
 {
         int max_ptes_shared = read_num("khugepaged/max_ptes_shared");
         int wstatus;
@@ -957,28 +913,22 @@ static void collapse_max_ptes_shared()
                 else
                         fail("Fail");
 
-                if (wait_for_scan("Do not collapse with max_ptes_shared exceeded", p))
-                        fail("Timeout");
-                else if (!check_huge(p))
-                        success("OK");
-                else
-                        fail("Fail");
-
-                printf("Trigger CoW on page %d of %d...",
-                                hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
-                fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size);
-                if (!check_huge(p))
-                        success("OK");
-                else
-                        fail("Fail");
-
-
-                if (wait_for_scan("Collapse with max_ptes_shared PTEs shared", p))
-                        fail("Timeout");
-                else if (check_huge(p))
-                        success("OK");
-                else
-                        fail("Fail");
+                context->collapse("Maybe collapse with max_ptes_shared exceeded",
+                                  p, !context->enforce_pte_scan_limits);
+
+                if (context->enforce_pte_scan_limits) {
+                        printf("Trigger CoW on page %d of %d...",
+                               hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
+                        fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) *
+                                    page_size);
+                        if (!check_huge(p))
+                                success("OK");
+                        else
+                                fail("Fail");
+
+                        context->collapse("Collapse with max_ptes_shared PTEs shared",
+                                          p, true);
+                }
 
                 validate_memory(p, 0, hpage_pmd_size);
                 munmap(p, hpage_pmd_size);
@@ -997,8 +947,27 @@ static void collapse_max_ptes_shared()
         munmap(p, hpage_pmd_size);
 }
 
+static void khugepaged_collapse(const char *msg, char *p, bool expect)
+{
+        if (wait_for_scan(msg, p))
+                fail("Timeout");
+        else if (check_huge(p) == expect)
+                success("OK");
+        else
+                fail("Fail");
+}
+
 int main(void)
 {
+        struct collapse_context contexts[] = {
+                {
+                        .name = "khugepaged",
+                        .collapse = &khugepaged_collapse,
+                        .enforce_pte_scan_limits = true,
+                },
+        };
+        int i;
+
         setbuf(stdout, NULL);
 
         page_size = getpagesize();
@@ -1014,18 +983,24 @@ int main(void)
         adjust_settings();
 
         alloc_at_fault();
-        collapse_full();
-        collapse_empty();
-        collapse_single_pte_entry();
-        collapse_max_ptes_none();
-        collapse_swapin_single_pte();
-        collapse_max_ptes_swap();
-        collapse_single_pte_entry_compound();
-        collapse_full_of_compound();
-        collapse_compound_extreme();
-        collapse_fork();
-        collapse_fork_compound();
-        collapse_max_ptes_shared();
+
+        for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) {
+                struct collapse_context *c = &contexts[i];
+
+                printf("\n*** Testing context: %s ***\n", c->name);
+                collapse_full(c);
+                collapse_empty(c);
+                collapse_single_pte_entry(c);
+                collapse_max_ptes_none(c);
+                collapse_swapin_single_pte(c);
+                collapse_max_ptes_swap(c);
+                collapse_single_pte_entry_compound(c);
+                collapse_full_of_compound(c);
+                collapse_compound_extreme(c);
+                collapse_fork(c);
+                collapse_fork_compound(c);
+                collapse_max_ptes_shared(c);
+        }
 
         restore_settings(0);
 }

From patchwork Sun Apr 10 13:54:44 2022
X-Patchwork-Submitter: Zach O'Keefe <zokeefe@google.com>
X-Patchwork-Id: 12808155
Date: Sun, 10 Apr 2022 06:54:44 -0700
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Message-Id: <20220410135445.3897054-12-zokeefe@google.com>
References: <20220410135445.3897054-1-zokeefe@google.com>
Subject: [PATCH 11/12] selftests/vm: add MADV_COLLAPSE collapse context to
 selftests
From: "Zach O'Keefe" <zokeefe@google.com>

Add MADV_COLLAPSE selftests. Extend struct collapse_context to support
context initialization/cleanup. This is used by the madvise collapse
context to "disable" and "enable" khugepaged, since it would otherwise
interfere with the tests.

The mechanism used to "disable" khugepaged is a hack: it sets
/sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to
a large value and feeds khugepaged enough suitable VMAs/pages to keep
khugepaged sleeping for the duration of the madvise collapse tests.
Since khugepaged is woken when this file is written, enough VMAs must
be queued to put khugepaged back to sleep when the tests write to this
file in write_settings().
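The sizing behind the hack can be sketched as follows; the values match
the diff below, and only the comments are additions:

        /* Keep khugepaged asleep for ~10 minutes between scan rounds... */
        default_settings.khugepaged.scan_sleep_millisecs = 1000 * 60 * 10;
        /* ...and let each wakeup scan at most one page before sleeping. */
        default_settings.khugepaged.pages_to_scan = 1;

        /*
         * One hugepage-sized VMA for khugepaged to scan and then sleep on,
         * plus one more for every time a later write_settings() call wakes
         * it up again (counted statically via __COUNTER__, see below).
         */
        size_t map_size = (num_khugepaged_wakeups + 1) * hpage_pmd_size;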
Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 tools/testing/selftests/vm/khugepaged.c | 133 ++++++++++++++++++++++--
 1 file changed, 125 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index c59d832fee96..e0ccc9443f78 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -14,17 +14,23 @@
 #ifndef MADV_PAGEOUT
 #define MADV_PAGEOUT 21
 #endif
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
 
 #define BASE_ADDR ((void *)(1UL << 30))
 static unsigned long hpage_pmd_size;
 static unsigned long page_size;
 static int hpage_pmd_nr;
+static int num_khugepaged_wakeups;
 
 #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/"
 #define PID_SMAPS "/proc/self/smaps"
 
 struct collapse_context {
         const char *name;
+        bool (*init_context)(void);
+        bool (*cleanup_context)(void);
         void (*collapse)(const char *msg, char *p, bool expect);
         bool enforce_pte_scan_limits;
 };
@@ -264,6 +270,17 @@ static void write_num(const char *name, unsigned long num)
         }
 }
 
+/*
+ * Use this macro instead of write_settings() inside tests; it must be
+ * called at most once per callsite.
+ *
+ * Hack to statically count the number of times khugepaged is woken up
+ * due to writes to
+ * /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs,
+ * so that the total ends up in __COUNTER__.
+ */
+#define WRITE_SETTINGS(s) do { __COUNTER__; write_settings(s); } while (0)
+
 static void write_settings(struct settings *settings)
 {
         struct khugepaged_settings *khugepaged = &settings->khugepaged;
@@ -332,7 +349,7 @@ static void adjust_settings(void)
 {
         printf("Adjust settings...");
-        write_settings(&default_settings);
+        WRITE_SETTINGS(&default_settings);
         success("OK");
 }
@@ -440,20 +457,25 @@ static bool check_swap(void *addr, unsigned long size)
         return swap;
 }
 
-static void *alloc_mapping(void)
+static void *alloc_mapping_at(void *at, size_t size)
 {
         void *p;
 
-        p = mmap(BASE_ADDR, hpage_pmd_size, PROT_READ | PROT_WRITE,
-                 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-        if (p != BASE_ADDR) {
-                printf("Failed to allocate VMA at %p\n", BASE_ADDR);
+        p = mmap(at, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE,
+                 -1, 0);
+        if (p != at) {
+                printf("Failed to allocate VMA at %p\n", at);
                 exit(EXIT_FAILURE);
         }
 
         return p;
 }
 
+static void *alloc_mapping(void)
+{
+        return alloc_mapping_at(BASE_ADDR, hpage_pmd_size);
+}
+
 static void fill_memory(int *p, unsigned long start, unsigned long end)
 {
         int i;
@@ -573,7 +595,7 @@ static void collapse_max_ptes_none(struct collapse_context *context)
         void *p;
 
         settings.khugepaged.max_ptes_none = max_ptes_none;
-        write_settings(&settings);
+        WRITE_SETTINGS(&settings);
 
         p = alloc_mapping();
 
@@ -591,7 +613,7 @@ static void collapse_max_ptes_none(struct collapse_context *context)
         }
 
         munmap(p, hpage_pmd_size);
-        write_settings(&default_settings);
+        WRITE_SETTINGS(&default_settings);
 }
 
 static void collapse_swapin_single_pte(struct collapse_context *context)
@@ -947,6 +969,87 @@ static void collapse_max_ptes_shared(struct collapse_context *context)
         munmap(p, hpage_pmd_size);
 }
 
+static void madvise_collapse(const char *msg, char *p, bool expect)
+{
+        int ret;
+
+        printf("%s...", msg);
+        /* Sanity check */
+        if (check_huge(p)) {
+                printf("Unexpected huge page\n");
+                exit(EXIT_FAILURE);
+        }
+
+        madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+        ret = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+        if (((bool)ret) == expect)
+                fail("Fail: Bad return value");
+        else if (check_huge(p) != expect)
+                fail("Fail: check_huge()");
+        else
+                success("OK");
+}
+
+static struct khugepaged_disable_state {
+        void *p;
+        size_t map_size;
+} khugepaged_disable_state;
+
+static bool disable_khugepaged(void)
+{
+        /*
+         * Hack to "disable" khugepaged by setting
+         * /transparent_hugepage/khugepaged/scan_sleep_millisecs to some large
+         * value, then feeding it enough suitable VMAs to scan and subsequently
+         * sleep.
+         *
+         * khugepaged is woken up on writes to
+         * /transparent_hugepage/khugepaged/scan_sleep_millisecs, so care must
+         * be taken to not inadvertently wake khugepaged in these tests.
+         *
+         * Feed khugepaged 1 hugepage-sized VMA to scan and sleep on, then
+         * N more for each time khugepaged would be woken up.
+         */
+        size_t map_size = (num_khugepaged_wakeups + 1) * hpage_pmd_size;
+        void *p;
+        bool ret = true;
+        int full_scans;
+        int timeout = 6;        /* 3 seconds */
+
+        default_settings.khugepaged.scan_sleep_millisecs = 1000 * 60 * 10;
+        default_settings.khugepaged.pages_to_scan = 1;
+        write_settings(&default_settings);
+
+        p = alloc_mapping_at(((char *)BASE_ADDR) + (1UL << 30), map_size);
+        fill_memory(p, 0, map_size);
+
+        full_scans = read_num("khugepaged/full_scans") + 2;
+
+        printf("disabling khugepaged...");
+        while (timeout--) {
+                if (read_num("khugepaged/full_scans") >= full_scans) {
+                        fail("Fail");
+                        ret = false;
+                        break;
+                }
+                printf(".");
+                usleep(TICK);
+        }
+        if (ret)
+                success("OK");
+        khugepaged_disable_state.p = p;
+        khugepaged_disable_state.map_size = map_size;
+        return ret;
+}
+
+static bool enable_khugepaged(void)
+{
+        printf("enabling khugepaged...");
+        munmap(khugepaged_disable_state.p, khugepaged_disable_state.map_size);
+        write_settings(&saved_settings);
+        success("OK");
+        return true;
+}
+
 static void khugepaged_collapse(const char *msg, char *p, bool expect)
 {
         if (wait_for_scan(msg, p))
@@ -962,9 +1065,18 @@ int main(void)
         struct collapse_context contexts[] = {
                 {
                         .name = "khugepaged",
+                        .init_context = NULL,
+                        .cleanup_context = NULL,
                         .collapse = &khugepaged_collapse,
                         .enforce_pte_scan_limits = true,
                 },
+                {
+                        .name = "madvise",
+                        .init_context = &disable_khugepaged,
+                        .cleanup_context = &enable_khugepaged,
+                        .collapse = &madvise_collapse,
+                        .enforce_pte_scan_limits = false,
+                },
         };
         int i;
 
@@ -973,6 +1085,7 @@ int main(void)
         page_size = getpagesize();
         hpage_pmd_size = read_num("hpage_pmd_size");
         hpage_pmd_nr = hpage_pmd_size / page_size;
+        num_khugepaged_wakeups = __COUNTER__;
 
         default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1;
         default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8;
@@ -988,6 +1101,8 @@ int main(void)
                 struct collapse_context *c = &contexts[i];
 
                 printf("\n*** Testing context: %s ***\n", c->name);
+                if (c->init_context && !c->init_context())
+                        continue;
                 collapse_full(c);
                 collapse_empty(c);
                 collapse_single_pte_entry(c);
@@ -1000,6 +1115,8 @@ int main(void)
                 collapse_fork(c);
                 collapse_fork_compound(c);
                 collapse_max_ptes_shared(c);
+                if (c->cleanup_context && !c->cleanup_context())
+                        break;
         }
 
         restore_settings(0);

From patchwork Sun Apr 10 13:54:45 2022
X-Patchwork-Submitter: Zach O'Keefe <zokeefe@google.com>
X-Patchwork-Id: 12808156
Date: Sun, 10 Apr 2022 06:54:45 -0700
In-Reply-To: <20220410135445.3897054-1-zokeefe@google.com>
Message-Id: <20220410135445.3897054-13-zokeefe@google.com>
References: <20220410135445.3897054-1-zokeefe@google.com>
Subject: [PATCH 12/12] selftests/vm: add test to verify recollapse of THPs
From: "Zach O'Keefe" <zokeefe@google.com>
Add a selftest, specific to the madvise collapse context, that tests
that MADV_COLLAPSE is "successful" if a hugepage-aligned/sized region
is already pmd-mapped.

Signed-off-by: Zach O'Keefe <zokeefe@google.com>
---
 tools/testing/selftests/vm/khugepaged.c | 32 +++++++++++++++++++++++++
 1 file changed, 32 insertions(+)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index e0ccc9443f78..c36d04218083 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -969,6 +969,32 @@ static void collapse_max_ptes_shared(struct collapse_context *context)
         munmap(p, hpage_pmd_size);
 }
 
+static void madvise_collapse_existing_thps(void)
+{
+        void *p;
+        int err;
+
+        p = alloc_mapping();
+        fill_memory(p, 0, hpage_pmd_size);
+
+        printf("Collapse fully populated PTE table...");
+        madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+        err = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+        if (err == 0 && check_huge(p)) {
+                success("OK");
+                printf("Re-collapse PMD-mapped hugepage...");
+                err = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+                if (err == 0 && check_huge(p))
+                        success("OK");
+                else
+                        fail("Fail");
+        } else {
+                fail("Fail");
+        }
+        validate_memory(p, 0, hpage_pmd_size);
+        munmap(p, hpage_pmd_size);
+}
+
 static void madvise_collapse(const char *msg, char *p, bool expect)
 {
         int ret;
@@ -1097,6 +1123,7 @@ int main(void)
 
         alloc_at_fault();
 
+        /* Shared tests */
         for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) {
                 struct collapse_context *c = &contexts[i];
 
@@ -1119,5 +1146,10 @@
                         break;
         }
 
+        /* madvise-specific tests */
+        disable_khugepaged();
+        madvise_collapse_existing_thps();
+        enable_khugepaged();
+
         restore_settings(0);
 }
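For reference, the recollapse behavior this patch tests can be poked at
outside the harness with a standalone sketch like the one below. It
assumes a kernel carrying this series, a 2MB PMD size, and the
MADV_COLLAPSE value defined in patch 11; both madvise() calls are
expected to return 0:

        #include <stdio.h>
        #include <string.h>
        #include <sys/mman.h>

        #ifndef MADV_COLLAPSE
        #define MADV_COLLAPSE 25
        #endif

        int main(void)
        {
                size_t len = 2UL << 20;        /* assumed PMD-sized hugepage */
                char *p = mmap(NULL, 2 * len, PROT_READ | PROT_WRITE,
                               MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
                char *a;

                if (p == MAP_FAILED) {
                        perror("mmap");
                        return 1;
                }
                /* Round up to a hugepage-aligned address in the mapping. */
                a = (char *)(((unsigned long)p + len - 1) & ~(len - 1));

                memset(a, 1, len);        /* populate the PTE table */
                if (madvise(a, len, MADV_COLLAPSE) ||   /* first collapse */
                    madvise(a, len, MADV_COLLAPSE))     /* recollapse of THP */
                        perror("madvise(MADV_COLLAPSE)");
                else
                        printf("OK\n");
                return 0;
        }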