From patchwork Wed May 4 21:44:25 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12838663 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id DC854C4332F for ; Wed, 4 May 2022 21:45:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 68CF86B0073; Wed, 4 May 2022 17:45:02 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 610F86B0074; Wed, 4 May 2022 17:45:02 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 48A5C6B0075; Wed, 4 May 2022 17:45:02 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 3C3456B0073 for ; Wed, 4 May 2022 17:45:02 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay02.hostedemail.com (Postfix) with ESMTP id 0C3142BDBD for ; Wed, 4 May 2022 21:45:02 +0000 (UTC) X-FDA: 79429391244.13.4839CBA Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) by imf28.hostedemail.com (Postfix) with ESMTP id 82C64C0085 for ; Wed, 4 May 2022 21:44:47 +0000 (UTC) Received: by mail-pj1-f73.google.com with SMTP id m8-20020a17090aab0800b001cb1320ef6eso3503271pjq.3 for ; Wed, 04 May 2022 14:45:01 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=feb3hdgUhXYlLWlaSi+Zt7sm/tCIhpJou4kzEHLcVAI=; b=e50YKsCYdONwWzNuK0DWVivQw+G7Wqbx2VV5qJQ1MPDjU7NKsMQsBmiCVmZiRlPmgx JewHmHA+6Ud4Rxyp2I36DNkdsZcyY+EQp70ZXVj14Fi8lU4uQNEHeXAycrgTsGmKQuMX qWwmBwvCU983Oh+bufIrmUQYQq7h7lgZGXNqGdA6I/pdKdDcX+THelaIlXbkupErp6pr MwxTBb+Bhd5zTe3fAwlwUArtOiOVEvKP+804en1zFDJbjgtqYVO9TOfscjCB2VxOBD4E AabxWgyyg6Gn3J3bPHU60tu/VWqp3KF3BSTbJ1im1A+U4W5GELcGC4lmbpwF0oJ5D+tA 2Bxg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=feb3hdgUhXYlLWlaSi+Zt7sm/tCIhpJou4kzEHLcVAI=; b=WktTEoZC32H1KWnYXy14K9LAqH/miEbFPfMUqntXcxa3Ezefo+vj8pY1tfl9XMs/IQ mrq1MBV3LnC8op+W6M6He4lGExOJqlwDYfPEAijP73RqR3JNrYOa7ztt9KRzA/5k5HPS 7akIMRG1Sz74yrtcmiSsPa4I8UJ9m8ka1FicarU1lKpweEc0a/JFV4VmRqtEPy1VArr8 nan1hXiUnMCGD3+ie1qmKKah1TnaRXLNwufbE9KJznHlbit41wkdUMqIPwrqLkf7qWNa 0BGmEAOVYFfx9T1hWXv0d4k2FgkpsKgOuWE2PvFtjE541AJWZNgFBo5vRg1PJAMocxde dR6w== X-Gm-Message-State: AOAM531SOy3ZvURxmTwFYcH4nhVY47SP3ECTuCJZ8wRzIlsbRFTxS+ry 4tWXCRoLAfdIUAfy13sX0lYNpM4jgToc X-Google-Smtp-Source: ABdhPJxkmRwVhZd0W9qmjPkfULiJmDVmcaqHtvN5F814iMrfoQn2PPN6vPwEcdN+2vjz5RBebkSgOnx2E3QP X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a17:90a:cc0a:b0:1dc:9a3a:6eef with SMTP id b10-20020a17090acc0a00b001dc9a3a6eefmr1912657pju.127.1651700700177; Wed, 04 May 2022 14:45:00 -0700 (PDT) Date: Wed, 4 May 2022 14:44:25 -0700 In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com> Message-Id: <20220504214437.2850685-2-zokeefe@google.com> Mime-Version: 1.0 References: <20220504214437.2850685-1-zokeefe@google.com> X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog Subject: [PATCH v5 01/13] mm/khugepaged: record 
SCAN_PMD_MAPPED when scan_pmd() finds THP From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Rspamd-Queue-Id: 82C64C0085 X-Stat-Signature: y5yja9rqm68watut6rpscek7j71yygdg X-Rspam-User: Authentication-Results: imf28.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=e50YKsCY; spf=pass (imf28.hostedemail.com: domain of 33PNyYgcKCKkiXTNNONPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--zokeefe.bounces.google.com designates 209.85.216.73 as permitted sender) smtp.mailfrom=33PNyYgcKCKkiXTNNONPXXPUN.LXVURWdg-VVTeJLT.XaP@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam09 X-HE-Tag: 1651700687-165126 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: When scanning an anon pmd to see if it's eligible for collapse, return SCAN_PMD_MAPPED if the pmd already maps a THP. Note that SCAN_PMD_MAPPED is different from SCAN_PAGE_COMPOUND used in the file-collapse path, since the latter might identify pte-mapped compound pages. This is required by MADV_COLLAPSE which necessarily needs to know what hugepage-aligned/sized regions are already pmd-mapped. Signed-off-by: Zach O'Keefe --- include/trace/events/huge_memory.h | 1 + mm/internal.h | 1 + mm/khugepaged.c | 30 ++++++++++++++++++++++++++---- mm/rmap.c | 15 +++++++++++++-- 4 files changed, 41 insertions(+), 6 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index d651f3437367..55392bf30a03 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -11,6 +11,7 @@ EM( SCAN_FAIL, "failed") \ EM( SCAN_SUCCEED, "succeeded") \ EM( SCAN_PMD_NULL, "pmd_null") \ + EM( SCAN_PMD_MAPPED, "page_pmd_mapped") \ EM( SCAN_EXCEED_NONE_PTE, "exceed_none_pte") \ EM( SCAN_EXCEED_SWAP_PTE, "exceed_swap_pte") \ EM( SCAN_EXCEED_SHARED_PTE, "exceed_shared_pte") \ diff --git a/mm/internal.h b/mm/internal.h index 0667abd57634..51ae9f71a2a3 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -172,6 +172,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason /* * in mm/rmap.c: */ +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address); extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address); /* diff --git a/mm/khugepaged.c b/mm/khugepaged.c index eb444fd45568..2c2ed6b4d96c 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -28,6 +28,7 @@ enum scan_result { SCAN_FAIL, SCAN_SUCCEED, SCAN_PMD_NULL, + SCAN_PMD_MAPPED, SCAN_EXCEED_NONE_PTE, SCAN_EXCEED_SWAP_PTE, SCAN_EXCEED_SHARED_PTE, @@ -977,6 +978,29 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, return 0; } +static int find_pmd_or_thp_or_none(struct mm_struct *mm, + unsigned long address, + pmd_t **pmd) +{ + pmd_t pmde; + + *pmd = mm_find_pmd_raw(mm, address); + if (!*pmd) + return SCAN_PMD_NULL; + + pmde = pmd_read_atomic(*pmd); + +#ifdef CONFIG_TRANSPARENT_HUGEPAGE + /* See comments in 
pmd_none_or_trans_huge_or_clear_bad() */ + barrier(); +#endif + if (!pmd_present(pmde)) + return SCAN_PMD_NULL; + if (pmd_trans_huge(pmde)) + return SCAN_PMD_MAPPED; + return SCAN_SUCCEED; +} + /* * Bring missing pages in from swap, to complete THP collapse. * Only done if khugepaged_scan_pmd believes it is worthwhile. @@ -1228,11 +1252,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, VM_BUG_ON(address & ~HPAGE_PMD_MASK); - pmd = mm_find_pmd(mm, address); - if (!pmd) { - result = SCAN_PMD_NULL; + result = find_pmd_or_thp_or_none(mm, address, &pmd); + if (result != SCAN_SUCCEED) goto out; - } memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load)); pte = pte_offset_map_lock(mm, pmd, address, &ptl); diff --git a/mm/rmap.c b/mm/rmap.c index 94d6b24a1ac2..6980b4011bf8 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -759,13 +759,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma) return vma_address(page, vma); } -pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) +pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address) { pgd_t *pgd; p4d_t *p4d; pud_t *pud; pmd_t *pmd = NULL; - pmd_t pmde; pgd = pgd_offset(mm, address); if (!pgd_present(*pgd)) @@ -780,6 +779,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) goto out; pmd = pmd_offset(pud, address); +out: + return pmd; +} + +pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address) +{ + pmd_t pmde; + pmd_t *pmd; + + pmd = mm_find_pmd_raw(mm, address); + if (!pmd) + goto out; /* * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at() * without holding anon_vma lock for write. So when looking for a From patchwork Wed May 4 21:44:26 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12838671 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id C252FC433EF for ; Wed, 4 May 2022 21:45:22 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 4DA996B0081; Wed, 4 May 2022 17:45:22 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 4895F6B0082; Wed, 4 May 2022 17:45:22 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 28FFF6B0083; Wed, 4 May 2022 17:45:22 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 1A8316B0081 for ; Wed, 4 May 2022 17:45:22 -0400 (EDT) Received: from smtpin13.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay09.hostedemail.com (Postfix) with ESMTP id 04BE42B142 for ; Wed, 4 May 2022 21:45:03 +0000 (UTC) X-FDA: 79429391328.13.2410CD3 Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) by imf01.hostedemail.com (Postfix) with ESMTP id 2F36E40095 for ; Wed, 4 May 2022 21:44:55 +0000 (UTC) Received: by mail-pj1-f73.google.com with SMTP id r16-20020a17090b051000b001db302efed7so1025322pjz.2 for ; Wed, 04 May 2022 14:45:03 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=xJPycQ5a1KbRgGdIdWaJxym2glfy9e1p0q154K69+2U=; b=amfV2fWBhfexwDvw4exF9OV9cUsO6Repq2FCGb6CnY1QHj4vnfxU3v9r7UG/ppl8a8 
QdxUdPovlaAnsH4PkwkXuJ3qfLiV/D3E0eShZ2rcMqMJXlDXUI1JcvLkysXNKsH7ywOi 4Ro7TGrLoOY30UQYBFUO1NDed2npoJnT/ERav1Vk5peC6FzwZBUcZZeATijImyqmrUmP 6VEnzP4Zf+zc7gFAEAjeFX/yNN9AFHZgTyudIpxKNeGqlODtMTHyqsQPZxKsym9+HIbQ ioM453X+WWe3Qaalu3HbqpUKi7dYwZX30bmlvq/+eLxYdNsHHKjk2nBKAlqOrnjzGsjO DGyQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=xJPycQ5a1KbRgGdIdWaJxym2glfy9e1p0q154K69+2U=; b=2vhxm67wq1dN5l1TILgsqyWFGN7uFxhFZJwOITFTcGGcNLm96D5WXAHL2cdGmp4fUj 4/9tLGOOeAwRKTmYS75VMctluSeYuuMAF/AH+EfEwbkFKw5pfLdXcKk777bn1dpUyjJ9 GM2Iu/Yk4GWSG4SqDHeP5L2oxnpuVGxq7jK4+LJTHdtXhwlulLyDdY5ft6DYDhM4kyPM 8J7lfA58AB4quvKRdaquQM4xNyTntUMYo7oD6YwD7VL0pK74jP/Vk6AYQGYbMGNe9p9d YJHP7Y/+1OIbJ8M3Suklsd3j9KtbjSkMrSP0/iIa75zFUTEkSCOSFrDMhb5P4rZjKsWl kgeQ== X-Gm-Message-State: AOAM53267FZtyX+uGLt3Veie2yQFVltKHwyt6r7zrwhMjV2wqDU4mJYn EsYVMufZR3uMnvmLSnL8K1xJ58AEqqwA X-Google-Smtp-Source: ABdhPJx10QNHrl9XkdW5/GNJLhagC7P8gAR/NOD5b7RrmeVt6uoUUL4MhJbaXWGkwpSKvDP49Si81Du8eavB X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a17:90b:4c06:b0:1dc:861c:1cfb with SMTP id na6-20020a17090b4c0600b001dc861c1cfbmr1909951pjb.85.1651700702630; Wed, 04 May 2022 14:45:02 -0700 (PDT) Date: Wed, 4 May 2022 14:44:26 -0700 In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com> Message-Id: <20220504214437.2850685-3-zokeefe@google.com> Mime-Version: 1.0 References: <20220504214437.2850685-1-zokeefe@google.com> X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog Subject: [PATCH v5 02/13] mm/khugepaged: add struct collapse_control From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Stat-Signature: 3tw654qd34dimd4qbac1cob4iy9h989f Authentication-Results: imf01.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=amfV2fWB; spf=pass (imf01.hostedemail.com: domain of 33vNyYgcKCKskZVPPQPRZZRWP.NZXWTYfi-XXVgLNV.ZcR@flex--zokeefe.bounces.google.com designates 209.85.216.73 as permitted sender) smtp.mailfrom=33vNyYgcKCKskZVPPQPRZZRWP.NZXWTYfi-XXVgLNV.ZcR@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-Rspamd-Server: rspam01 X-Rspamd-Queue-Id: 2F36E40095 X-HE-Tag: 1651700695-707160 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Modularize hugepage collapse by introducing struct collapse_control. This structure serves to describe the properties of the requested collapse, as well as serve as a local scratch pad to use during the collapse itself. Start by moving global per-node khugepaged statistics into this new structure, and stack allocate one for khugepaged collapse context. 
Signed-off-by: Zach O'Keefe Acked-by: David Rientjes Reviewed-by: Peter Xu --- mm/khugepaged.c | 87 ++++++++++++++++++++++++++++--------------------- 1 file changed, 49 insertions(+), 38 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 2c2ed6b4d96c..d3cb670921cd 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -86,6 +86,14 @@ static struct kmem_cache *mm_slot_cache __read_mostly; #define MAX_PTE_MAPPED_THP 8 +struct collapse_control { + /* Num pages scanned per node */ + int node_load[MAX_NUMNODES]; + + /* Last target selected in khugepaged_find_target_node() */ + int last_target_node; +}; + /** * struct mm_slot - hash lookup from mm to mm_slot * @hash: hash collision list @@ -786,9 +794,7 @@ static void khugepaged_alloc_sleep(void) remove_wait_queue(&khugepaged_wait, &wait); } -static int khugepaged_node_load[MAX_NUMNODES]; - -static bool khugepaged_scan_abort(int nid) +static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) { int i; @@ -800,11 +806,11 @@ static bool khugepaged_scan_abort(int nid) return false; /* If there is a count for this node already, it must be acceptable */ - if (khugepaged_node_load[nid]) + if (cc->node_load[nid]) return false; for (i = 0; i < MAX_NUMNODES; i++) { - if (!khugepaged_node_load[i]) + if (!cc->node_load[i]) continue; if (node_distance(nid, i) > node_reclaim_distance) return true; @@ -819,28 +825,27 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void) } #ifdef CONFIG_NUMA -static int khugepaged_find_target_node(void) +static int khugepaged_find_target_node(struct collapse_control *cc) { - static int last_khugepaged_target_node = NUMA_NO_NODE; int nid, target_node = 0, max_value = 0; /* find first node with max normal pages hit */ for (nid = 0; nid < MAX_NUMNODES; nid++) - if (khugepaged_node_load[nid] > max_value) { - max_value = khugepaged_node_load[nid]; + if (cc->node_load[nid] > max_value) { + max_value = cc->node_load[nid]; target_node = nid; } /* do some balance if several nodes have the same hit record */ - if (target_node <= last_khugepaged_target_node) - for (nid = last_khugepaged_target_node + 1; nid < MAX_NUMNODES; - nid++) - if (max_value == khugepaged_node_load[nid]) { + if (target_node <= cc->last_target_node) + for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES; + nid++) + if (max_value == cc->node_load[nid]) { target_node = nid; break; } - last_khugepaged_target_node = target_node; + cc->last_target_node = target_node; return target_node; } @@ -878,7 +883,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) return *hpage; } #else -static int khugepaged_find_target_node(void) +static int khugepaged_find_target_node(struct collapse_control *cc) { return 0; } @@ -1235,10 +1240,9 @@ static void collapse_huge_page(struct mm_struct *mm, return; } -static int khugepaged_scan_pmd(struct mm_struct *mm, - struct vm_area_struct *vma, - unsigned long address, - struct page **hpage) +static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, + unsigned long address, struct page **hpage, + struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; @@ -1256,7 +1260,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, if (result != SCAN_SUCCEED) goto out; - memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load)); + memset(cc->node_load, 0, sizeof(cc->node_load)); pte = pte_offset_map_lock(mm, pmd, address, &ptl); for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR; _pte++, _address += PAGE_SIZE) { @@ -1322,16 +1326,16 @@ static int 
khugepaged_scan_pmd(struct mm_struct *mm, /* * Record which node the original page is from and save this - * information to khugepaged_node_load[]. + * information to cc->node_load[]. * Khugepaged will allocate hugepage from the node has the max * hit record. */ node = page_to_nid(page); - if (khugepaged_scan_abort(node)) { + if (khugepaged_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; goto out_unmap; } - khugepaged_node_load[node]++; + cc->node_load[node]++; if (!PageLRU(page)) { result = SCAN_PAGE_LRU; goto out_unmap; @@ -1382,7 +1386,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, out_unmap: pte_unmap_unlock(pte, ptl); if (ret) { - node = khugepaged_find_target_node(); + node = khugepaged_find_target_node(cc); /* collapse_huge_page will return with the mmap_lock released */ collapse_huge_page(mm, address, hpage, node, referenced, unmapped); @@ -2033,8 +2037,9 @@ static void collapse_file(struct mm_struct *mm, /* TODO: tracepoints */ } -static void khugepaged_scan_file(struct mm_struct *mm, - struct file *file, pgoff_t start, struct page **hpage) +static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct page **hpage, + struct collapse_control *cc) { struct page *page = NULL; struct address_space *mapping = file->f_mapping; @@ -2045,7 +2050,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, present = 0; swap = 0; - memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load)); + memset(cc->node_load, 0, sizeof(cc->node_load)); rcu_read_lock(); xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) { if (xas_retry(&xas, page)) @@ -2070,11 +2075,11 @@ static void khugepaged_scan_file(struct mm_struct *mm, } node = page_to_nid(page); - if (khugepaged_scan_abort(node)) { + if (khugepaged_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; break; } - khugepaged_node_load[node]++; + cc->node_load[node]++; if (!PageLRU(page)) { result = SCAN_PAGE_LRU; @@ -2107,7 +2112,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { - node = khugepaged_find_target_node(); + node = khugepaged_find_target_node(cc); collapse_file(mm, file, start, hpage, node); } } @@ -2115,8 +2120,9 @@ static void khugepaged_scan_file(struct mm_struct *mm, /* TODO: tracepoints */ } #else -static void khugepaged_scan_file(struct mm_struct *mm, - struct file *file, pgoff_t start, struct page **hpage) +static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct page **hpage, + struct collapse_control *cc) { BUILD_BUG(); } @@ -2127,7 +2133,8 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot) #endif static unsigned int khugepaged_scan_mm_slot(unsigned int pages, - struct page **hpage) + struct page **hpage, + struct collapse_control *cc) __releases(&khugepaged_mm_lock) __acquires(&khugepaged_mm_lock) { @@ -2203,12 +2210,13 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, mmap_read_unlock(mm); ret = 1; - khugepaged_scan_file(mm, file, pgoff, hpage); + khugepaged_scan_file(mm, file, pgoff, hpage, + cc); fput(file); } else { ret = khugepaged_scan_pmd(mm, vma, khugepaged_scan.address, - hpage); + hpage, cc); } /* move to next address */ khugepaged_scan.address += HPAGE_PMD_SIZE; @@ -2264,7 +2272,7 @@ static int khugepaged_wait_event(void) kthread_should_stop(); } -static void khugepaged_do_scan(void) +static void khugepaged_do_scan(struct collapse_control *cc) { struct page *hpage = NULL; unsigned int progress = 
0, pass_through_head = 0; @@ -2288,7 +2296,7 @@ static void khugepaged_do_scan(void) if (khugepaged_has_work() && pass_through_head < 2) progress += khugepaged_scan_mm_slot(pages - progress, - &hpage); + &hpage, cc); else progress = pages; spin_unlock(&khugepaged_mm_lock); @@ -2327,12 +2335,15 @@ static void khugepaged_wait_work(void) static int khugepaged(void *none) { struct mm_slot *mm_slot; + struct collapse_control cc = { + .last_target_node = NUMA_NO_NODE, + }; set_freezable(); set_user_nice(current, MAX_NICE); while (!kthread_should_stop()) { - khugepaged_do_scan(); + khugepaged_do_scan(&cc); khugepaged_wait_work(); } From patchwork Wed May 4 21:44:27 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12838664 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3D01DC433F5 for ; Wed, 4 May 2022 21:45:07 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id CCA7E6B0074; Wed, 4 May 2022 17:45:06 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id C78BD6B0075; Wed, 4 May 2022 17:45:06 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id B40656B0078; Wed, 4 May 2022 17:45:06 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id A7DD76B0074 for ; Wed, 4 May 2022 17:45:06 -0400 (EDT) Received: from smtpin29.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 8A229608DE for ; Wed, 4 May 2022 21:45:06 +0000 (UTC) X-FDA: 79429391412.29.17E6ACB Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) by imf13.hostedemail.com (Postfix) with ESMTP id B2D0220089 for ; Wed, 4 May 2022 21:44:53 +0000 (UTC) Received: by mail-pj1-f74.google.com with SMTP id d64-20020a17090a6f4600b001da3937032fso3496006pjk.5 for ; Wed, 04 May 2022 14:45:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=yZ31JGTvgfOlGTmtkhpjekkkxapYjOkK+nff1sh1eRM=; b=chbJhrF9DlqD9ZKgf0v7IcrRKMQbrKWUZRqlI610pu7VUgEmZBIwnfP0albDK5OUBA Chu9K3KDdWgwZqqvVoa0+BMBzUM9DiWZ6aNDTZmcl9EmFeuMNwryfld9EjCTSOex3bE+ 1x6MCVhe3FRpZhvMrO6OV+I0z3scxywHsGMhDK5cgxnbMXg3Gu56ik762hwBd+xRDP/S TLKq2+E+RODrpGNxBqhGLa2lAiJENTPrklbgN3x8RQjdbxVABv2vTjiZt66Aw1N4ENMr 5zp0+OOn/tU6gchm/WSMGmb601cxTCDbCgdCE51DaH/Ae5TY5+3SNagup6tVqYoduy3U qgbA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=yZ31JGTvgfOlGTmtkhpjekkkxapYjOkK+nff1sh1eRM=; b=uCpc46526i+rg7NX3Xup/t99ioI9b2arUUdSWBbM5EscTxu5fTH3w5R9zQDY1LKIea Wejo/CnYY48b0lHMWRPGQstvgcq7jhxn9xKaCSF+lLU1/aO+8yJZBFxJO0JEjV58uHYS RCmaIiGE+S0z/2S9AzJNdKdxEho1WcxU7f2Lc18TyUHr1Vme964eYY8I0E5/wt7x4gyL Xxq3ozLTEDEWYfG6MSSCLJwkMivTx6dRb65KhjqcJ7bHjMz8xnJI8iDjicf6N8NLrILk tbFW5FBKVSEw2VqQpuCZPqy2Skb57zP/6W+z788/wI8w6zDmuRYu5XmHAKgh9yOpkVfP Fe5Q== X-Gm-Message-State: AOAM533t0UtTZub+hKsZ++5ysd8BourcfFXQ4aebGcm3WLJyV7LpTySa 6SXLl8y86ty661+pYCSmHjdbxV/eHxZ2 X-Google-Smtp-Source: 
ABdhPJzstN4c2mKX9U/geFdIC7UB+h9d/tlTfYcE+ARPgx4w6AmIC3slfUa1u7ARdskzQ61CA1xZ5Di9LvB+ X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a17:90a:c986:b0:1d9:56e7:4e83 with SMTP id w6-20020a17090ac98600b001d956e74e83mr121566pjt.1.1651700704549; Wed, 04 May 2022 14:45:04 -0700 (PDT) Date: Wed, 4 May 2022 14:44:27 -0700 In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com> Message-Id: <20220504214437.2850685-4-zokeefe@google.com> Mime-Version: 1.0 References: <20220504214437.2850685-1-zokeefe@google.com> X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog Subject: [PATCH v5 03/13] mm/khugepaged: dedup and simplify hugepage alloc and charging From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" Authentication-Results: imf13.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=chbJhrF9; spf=pass (imf13.hostedemail.com: domain of 34PNyYgcKCK0mbXRRSRTbbTYR.PbZYVahk-ZZXiNPX.beT@flex--zokeefe.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=34PNyYgcKCK0mbXRRSRTbbTYR.PbZYVahk-ZZXiNPX.beT@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspam-User: X-Rspamd-Server: rspam05 X-Rspamd-Queue-Id: B2D0220089 X-Stat-Signature: riq1pe9zwaaugzbw5zfy697cdgykonrx X-HE-Tag: 1651700693-490693 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: The following code is duplicated in collapse_huge_page() and collapse_file(): /* Only allocate from the target node */ gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; new_page = khugepaged_alloc_page(hpage, gfp, node); if (!new_page) { result = SCAN_ALLOC_HUGE_PAGE_FAIL; goto out; } if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) { result = SCAN_CGROUP_CHARGE_FAIL; goto out; } count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC); Also, "node" is passed as an argument to both collapse_huge_page() and collapse_file() and obtained the same way, via khugepaged_find_target_node(). Move all this into a new helper, alloc_charge_hpage(), and remove the duplicate code from collapse_huge_page() and collapse_file(). Also, simplify khugepaged_alloc_page() by returning a bool indicating allocation success instead of a copy of the (possibly) allocated struct page. Suggested-by: Peter Xu Signed-off-by: Zach O'Keefe --- This patch currently depends on 'mm/khugepaged: sched to numa node when collapse huge page' currently being discussed upstream[1], and anticipates that this functionality would be equally applicable to file-backed collapse. It also goes ahead and wraps this code in a CONFIF_NUMA #ifdef. 
[1] https://lore.kernel.org/linux-mm/20220317065024.2635069-1-maobibo@loongson.cn/ mm/khugepaged.c | 99 +++++++++++++++++++++++-------------------------- 1 file changed, 46 insertions(+), 53 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index d3cb670921cd..c94bc43dff3e 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -866,8 +866,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) return true; } -static struct page * -khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) +static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) { VM_BUG_ON_PAGE(*hpage, *hpage); @@ -875,12 +874,12 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) if (unlikely(!*hpage)) { count_vm_event(THP_COLLAPSE_ALLOC_FAILED); *hpage = ERR_PTR(-ENOMEM); - return NULL; + return false; } prep_transhuge_page(*hpage); count_vm_event(THP_COLLAPSE_ALLOC); - return *hpage; + return true; } #else static int khugepaged_find_target_node(struct collapse_control *cc) @@ -942,12 +941,11 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) return true; } -static struct page * -khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) +static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) { VM_BUG_ON(!*hpage); - return *hpage; + return true; } #endif @@ -1069,10 +1067,34 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, return true; } -static void collapse_huge_page(struct mm_struct *mm, - unsigned long address, - struct page **hpage, - int node, int referenced, int unmapped) +static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, + struct collapse_control *cc) +{ +#ifdef CONFIG_NUMA + const struct cpumask *cpumask; +#endif + gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; + int node = khugepaged_find_target_node(cc); + +#ifdef CONFIG_NUMA + /* sched to specified node before huge page memory copy */ + if (task_node(current) != node) { + cpumask = cpumask_of_node(node); + if (!cpumask_empty(cpumask)) + set_cpus_allowed_ptr(current, cpumask); + } +#endif + if (!khugepaged_alloc_page(hpage, gfp, node)) + return SCAN_ALLOC_HUGE_PAGE_FAIL; + if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp))) + return SCAN_CGROUP_CHARGE_FAIL; + count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC); + return SCAN_SUCCEED; +} + +static void collapse_huge_page(struct mm_struct *mm, unsigned long address, + struct page **hpage, int referenced, + int unmapped, struct collapse_control *cc) { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; @@ -1083,14 +1105,9 @@ static void collapse_huge_page(struct mm_struct *mm, int isolated = 0, result = 0; struct vm_area_struct *vma; struct mmu_notifier_range range; - gfp_t gfp; - const struct cpumask *cpumask; VM_BUG_ON(address & ~HPAGE_PMD_MASK); - /* Only allocate from the target node */ - gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; - /* * Before allocating the hugepage, release the mmap_lock read lock. 
* The allocation can take potentially a long time if it involves @@ -1099,23 +1116,11 @@ static void collapse_huge_page(struct mm_struct *mm, */ mmap_read_unlock(mm); - /* sched to specified node before huage page memory copy */ - if (task_node(current) != node) { - cpumask = cpumask_of_node(node); - if (!cpumask_empty(cpumask)) - set_cpus_allowed_ptr(current, cpumask); - } - new_page = khugepaged_alloc_page(hpage, gfp, node); - if (!new_page) { - result = SCAN_ALLOC_HUGE_PAGE_FAIL; + result = alloc_charge_hpage(hpage, mm, cc); + if (result != SCAN_SUCCEED) goto out_nolock; - } - if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) { - result = SCAN_CGROUP_CHARGE_FAIL; - goto out_nolock; - } - count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC); + new_page = *hpage; mmap_read_lock(mm); result = hugepage_vma_revalidate(mm, address, &vma); @@ -1386,10 +1391,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, out_unmap: pte_unmap_unlock(pte, ptl); if (ret) { - node = khugepaged_find_target_node(cc); /* collapse_huge_page will return with the mmap_lock released */ - collapse_huge_page(mm, address, hpage, node, - referenced, unmapped); + collapse_huge_page(mm, address, hpage, referenced, unmapped, + cc); } out: trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, @@ -1655,7 +1659,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * @file: file that collapse on * @start: collapse start address * @hpage: new allocated huge page for collapse - * @node: appointed node the new huge page allocate from + * @cc: collapse context and scratchpad * * Basic scheme is simple, details are more complex: * - allocate and lock a new huge page; @@ -1672,12 +1676,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * + restore gaps in the page cache; * + unlock and free huge page; */ -static void collapse_file(struct mm_struct *mm, - struct file *file, pgoff_t start, - struct page **hpage, int node) +static void collapse_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct page **hpage, + struct collapse_control *cc) { struct address_space *mapping = file->f_mapping; - gfp_t gfp; struct page *new_page; pgoff_t index, end = start + HPAGE_PMD_NR; LIST_HEAD(pagelist); @@ -1689,20 +1692,11 @@ static void collapse_file(struct mm_struct *mm, VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); - /* Only allocate from the target node */ - gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; - - new_page = khugepaged_alloc_page(hpage, gfp, node); - if (!new_page) { - result = SCAN_ALLOC_HUGE_PAGE_FAIL; + result = alloc_charge_hpage(hpage, mm, cc); + if (result != SCAN_SUCCEED) goto out; - } - if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) { - result = SCAN_CGROUP_CHARGE_FAIL; - goto out; - } - count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC); + new_page = *hpage; /* * Ensure we have slots for all the pages in the range. 
This is @@ -2112,8 +2106,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { - node = khugepaged_find_target_node(cc); - collapse_file(mm, file, start, hpage, node); + collapse_file(mm, file, start, hpage, cc); } } From patchwork Wed May 4 21:44:28 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12838665 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 3095BC433EF for ; Wed, 4 May 2022 21:45:09 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id BB81A6B0075; Wed, 4 May 2022 17:45:08 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id B66DD6B0078; Wed, 4 May 2022 17:45:08 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id A08AD6B007B; Wed, 4 May 2022 17:45:08 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id 93C016B0075 for ; Wed, 4 May 2022 17:45:08 -0400 (EDT) Received: from smtpin21.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay01.hostedemail.com (Postfix) with ESMTP id 4A938609FC for ; Wed, 4 May 2022 21:45:08 +0000 (UTC) X-FDA: 79429391496.21.E0FC458 Received: from mail-pj1-f74.google.com (mail-pj1-f74.google.com [209.85.216.74]) by imf12.hostedemail.com (Postfix) with ESMTP id 98C7840025 for ; Wed, 4 May 2022 21:44:51 +0000 (UTC) Received: by mail-pj1-f74.google.com with SMTP id i6-20020a17090a718600b001dc87aca289so1017800pjk.5 for ; Wed, 04 May 2022 14:45:07 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=YV+5blkKMPhHApKELd/OUVF7srg6niYmbYw8y83Xey0=; b=kqr9O7goq1m4GzEMcswiqwNaQu7CaX+t8xrBSlg7H69MqmSoT3+dp/8Sahd4dlmOAI wMV+saEDJqRgPQQImQ0lr35ylZUvtHuwto6NGJtG8u7cnDv2zsUt4bZbfsHhY/iZZTi9 kcW74LHstIpqP/5cLBZ4yFThu0Uq70vh/0B3PYa/4ohY3e1Cn3NrwxEToZLQKpslhiyi wAgDd/LQgyaf72AbOXGj/l6kSEpl3yGlCfnRFoa6DiWELLIcVaEtUbYRphMFEyw01+5t Ywfuo2Lg3dLv/ZR9pEMBAOTQMWG18+VAQboHBZljH/Tr1741KvCVxAJwTOLqSLqTsQfT R7vg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=YV+5blkKMPhHApKELd/OUVF7srg6niYmbYw8y83Xey0=; b=OHohLgaEBthAL0BgjzF2jmnoYskr/h6AGTDe34lUiJKvPtTvwVbvoKrmIm15CW9okQ SxWxd4waeBfa7Xw2rhqxlTH/7MwJ43kSRU06EFgOWt5nLW6gw5snzWesWLuU6iCZWjm2 KDoZ8HNs8Kao3pzupHNkeEFzYPp3jiB4pdotvePKNG/CpXikxycJYalp12XJ0yIYdLqu nxhFVckv6u3Wtq1sr1WbJuaqlGy9JMI0/4MCrFPZTEX6cIQZLI2LIxlNDaKcEFAXJIcz henVqMMIEqQVJhT5fNuU1pTqEv7IERG+9+mHdoSNzY9vBclZa6ZVrnAmtiqvS6SUGVOn XoGA== X-Gm-Message-State: AOAM531u+nJnYKI/7Va1DddbTly64u1dCVrudhdPpu2iMkIMnoDyT7Nr rBGpbD6LwicyhgX856/10NYrdonEzk1v X-Google-Smtp-Source: ABdhPJwo4y7+BKzZ9FtyXYSnR+dOadIveuGIUP3Dyh502NWEiLjgNV5MYcm2bOvgGLjtz62mn6BYRoUw0eCI X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a17:902:da91:b0:15e:d22f:cfd7 with SMTP id j17-20020a170902da9100b0015ed22fcfd7mr2564899plx.85.1651700706919; Wed, 04 May 2022 14:45:06 -0700 (PDT) Date: Wed, 4 May 
2022 14:44:28 -0700 In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com> Message-Id: <20220504214437.2850685-5-zokeefe@google.com> Mime-Version: 1.0 References: <20220504214437.2850685-1-zokeefe@google.com> X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog Subject: [PATCH v5 04/13] mm/khugepaged: make hugepage allocation context-specific From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Rspamd-Queue-Id: 98C7840025 X-Stat-Signature: s1unut6b7ppmrmimywgjsbuyfju735dq X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=kqr9O7go; spf=pass (imf12.hostedemail.com: domain of 34vNyYgcKCK8odZTTUTVddVaT.RdbaXcjm-bbZkPRZ.dgV@flex--zokeefe.bounces.google.com designates 209.85.216.74 as permitted sender) smtp.mailfrom=34vNyYgcKCK8odZTTUTVddVaT.RdbaXcjm-bbZkPRZ.dgV@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-Rspamd-Server: rspam09 X-HE-Tag: 1651700691-297036 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add a hook to struct collapse_context that allows contexts to define their own allocation semantics and charging logic. For example, khugepaged has specific NUMA and UMA implementations as well as gfp flags tied to /sys/kernel/mm/transparent_hugepage/khugepaged/defrag. Additionally, move [pre]allocated hugepage pointer into struct collapse_context. 
Signed-off-by: Zach O'Keefe --- mm/khugepaged.c | 90 ++++++++++++++++++++++++------------------------- 1 file changed, 44 insertions(+), 46 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index c94bc43dff3e..6095fcb3f07c 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -92,6 +92,10 @@ struct collapse_control { /* Last target selected in khugepaged_find_target_node() */ int last_target_node; + + struct page *hpage; + int (*alloc_charge_hpage)(struct mm_struct *mm, + struct collapse_control *cc); }; /** @@ -866,18 +870,19 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) return true; } -static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) +static bool khugepaged_alloc_page(gfp_t gfp, int node, + struct collapse_control *cc) { - VM_BUG_ON_PAGE(*hpage, *hpage); + VM_BUG_ON_PAGE(cc->hpage, cc->hpage); - *hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); - if (unlikely(!*hpage)) { + cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); + if (unlikely(!cc->hpage)) { count_vm_event(THP_COLLAPSE_ALLOC_FAILED); - *hpage = ERR_PTR(-ENOMEM); + cc->hpage = ERR_PTR(-ENOMEM); return false; } - prep_transhuge_page(*hpage); + prep_transhuge_page(cc->hpage); count_vm_event(THP_COLLAPSE_ALLOC); return true; } @@ -941,9 +946,10 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) return true; } -static bool khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node) +static bool khugepaged_alloc_page(gfp_t gfp, int node, + struct collapse_control *cc) { - VM_BUG_ON(!*hpage); + VM_BUG_ON(!cc->hpage); return true; } @@ -1067,8 +1073,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm, return true; } -static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, - struct collapse_control *cc) +static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc) { #ifdef CONFIG_NUMA const struct cpumask *cpumask; @@ -1084,17 +1089,17 @@ static int alloc_charge_hpage(struct page **hpage, struct mm_struct *mm, set_cpus_allowed_ptr(current, cpumask); } #endif - if (!khugepaged_alloc_page(hpage, gfp, node)) + if (!khugepaged_alloc_page(gfp, node, cc)) return SCAN_ALLOC_HUGE_PAGE_FAIL; - if (unlikely(mem_cgroup_charge(page_folio(*hpage), mm, gfp))) + if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, gfp))) return SCAN_CGROUP_CHARGE_FAIL; - count_memcg_page_event(*hpage, THP_COLLAPSE_ALLOC); + count_memcg_page_event(cc->hpage, THP_COLLAPSE_ALLOC); return SCAN_SUCCEED; } static void collapse_huge_page(struct mm_struct *mm, unsigned long address, - struct page **hpage, int referenced, - int unmapped, struct collapse_control *cc) + int referenced, int unmapped, + struct collapse_control *cc) { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; @@ -1116,11 +1121,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, */ mmap_read_unlock(mm); - result = alloc_charge_hpage(hpage, mm, cc); + result = cc->alloc_charge_hpage(mm, cc); if (result != SCAN_SUCCEED) goto out_nolock; - new_page = *hpage; + new_page = cc->hpage; mmap_read_lock(mm); result = hugepage_vma_revalidate(mm, address, &vma); @@ -1232,21 +1237,21 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, update_mmu_cache_pmd(vma, address, pmd); spin_unlock(pmd_ptl); - *hpage = NULL; + cc->hpage = NULL; khugepaged_pages_collapsed++; result = SCAN_SUCCEED; out_up_write: mmap_write_unlock(mm); out_nolock: - if (!IS_ERR_OR_NULL(*hpage)) - mem_cgroup_uncharge(page_folio(*hpage)); + 
if (!IS_ERR_OR_NULL(cc->hpage)) + mem_cgroup_uncharge(page_folio(cc->hpage)); trace_mm_collapse_huge_page(mm, isolated, result); return; } static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, struct page **hpage, + unsigned long address, struct collapse_control *cc) { pmd_t *pmd; @@ -1392,8 +1397,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, pte_unmap_unlock(pte, ptl); if (ret) { /* collapse_huge_page will return with the mmap_lock released */ - collapse_huge_page(mm, address, hpage, referenced, unmapped, - cc); + collapse_huge_page(mm, address, referenced, unmapped, cc); } out: trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, @@ -1658,7 +1662,6 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * @mm: process address space where collapse happens * @file: file that collapse on * @start: collapse start address - * @hpage: new allocated huge page for collapse * @cc: collapse context and scratchpad * * Basic scheme is simple, details are more complex: @@ -1677,8 +1680,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * + unlock and free huge page; */ static void collapse_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct page **hpage, - struct collapse_control *cc) + pgoff_t start, struct collapse_control *cc) { struct address_space *mapping = file->f_mapping; struct page *new_page; @@ -1692,11 +1694,11 @@ static void collapse_file(struct mm_struct *mm, struct file *file, VM_BUG_ON(!IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS) && !is_shmem); VM_BUG_ON(start & (HPAGE_PMD_NR - 1)); - result = alloc_charge_hpage(hpage, mm, cc); + result = cc->alloc_charge_hpage(mm, cc); if (result != SCAN_SUCCEED) goto out; - new_page = *hpage; + new_page = cc->hpage; /* * Ensure we have slots for all the pages in the range. This is @@ -1979,7 +1981,7 @@ static void collapse_file(struct mm_struct *mm, struct file *file, * Remove pte page tables, so we can re-fault the page as huge. 
*/ retract_page_tables(mapping, start); - *hpage = NULL; + cc->hpage = NULL; khugepaged_pages_collapsed++; } else { @@ -2026,14 +2028,13 @@ static void collapse_file(struct mm_struct *mm, struct file *file, unlock_page(new_page); out: VM_BUG_ON(!list_empty(&pagelist)); - if (!IS_ERR_OR_NULL(*hpage)) - mem_cgroup_uncharge(page_folio(*hpage)); + if (!IS_ERR_OR_NULL(cc->hpage)) + mem_cgroup_uncharge(page_folio(cc->hpage)); /* TODO: tracepoints */ } static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct page **hpage, - struct collapse_control *cc) + pgoff_t start, struct collapse_control *cc) { struct page *page = NULL; struct address_space *mapping = file->f_mapping; @@ -2106,7 +2107,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { - collapse_file(mm, file, start, hpage, cc); + collapse_file(mm, file, start, cc); } } @@ -2114,8 +2115,7 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, } #else static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct page **hpage, - struct collapse_control *cc) + pgoff_t start, struct collapse_control *cc) { BUILD_BUG(); } @@ -2126,7 +2126,6 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot) #endif static unsigned int khugepaged_scan_mm_slot(unsigned int pages, - struct page **hpage, struct collapse_control *cc) __releases(&khugepaged_mm_lock) __acquires(&khugepaged_mm_lock) @@ -2203,13 +2202,11 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, mmap_read_unlock(mm); ret = 1; - khugepaged_scan_file(mm, file, pgoff, hpage, - cc); + khugepaged_scan_file(mm, file, pgoff, cc); fput(file); } else { ret = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, - hpage, cc); + khugepaged_scan.address, cc); } /* move to next address */ khugepaged_scan.address += HPAGE_PMD_SIZE; @@ -2267,15 +2264,15 @@ static int khugepaged_wait_event(void) static void khugepaged_do_scan(struct collapse_control *cc) { - struct page *hpage = NULL; unsigned int progress = 0, pass_through_head = 0; unsigned int pages = READ_ONCE(khugepaged_pages_to_scan); bool wait = true; + cc->hpage = NULL; lru_add_drain_all(); while (progress < pages) { - if (!khugepaged_prealloc_page(&hpage, &wait)) + if (!khugepaged_prealloc_page(&cc->hpage, &wait)) break; cond_resched(); @@ -2289,14 +2286,14 @@ static void khugepaged_do_scan(struct collapse_control *cc) if (khugepaged_has_work() && pass_through_head < 2) progress += khugepaged_scan_mm_slot(pages - progress, - &hpage, cc); + cc); else progress = pages; spin_unlock(&khugepaged_mm_lock); } - if (!IS_ERR_OR_NULL(hpage)) - put_page(hpage); + if (!IS_ERR_OR_NULL(cc->hpage)) + put_page(cc->hpage); } static bool khugepaged_should_wakeup(void) @@ -2330,6 +2327,7 @@ static int khugepaged(void *none) struct mm_slot *mm_slot; struct collapse_control cc = { .last_target_node = NUMA_NO_NODE, + .alloc_charge_hpage = &alloc_charge_hpage, }; set_freezable(); From patchwork Wed May 4 21:44:29 2022 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Zach O'Keefe X-Patchwork-Id: 12838666 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by smtp.lore.kernel.org (Postfix) with ESMTP id 835D1C433EF for ; Wed, 4 May 2022 
21:45:11 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 130AA6B0078; Wed, 4 May 2022 17:45:11 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 107E26B007B; Wed, 4 May 2022 17:45:11 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id F11BB6B007D; Wed, 4 May 2022 17:45:10 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from relay.hostedemail.com (smtprelay0011.hostedemail.com [216.40.44.11]) by kanga.kvack.org (Postfix) with ESMTP id E2B1C6B0078 for ; Wed, 4 May 2022 17:45:10 -0400 (EDT) Received: from smtpin17.hostedemail.com (a10.router.float.18 [10.200.18.1]) by unirelay10.hostedemail.com (Postfix) with ESMTP id C87AC869 for ; Wed, 4 May 2022 21:45:10 +0000 (UTC) X-FDA: 79429391580.17.DC3342F Received: from mail-pj1-f73.google.com (mail-pj1-f73.google.com [209.85.216.73]) by imf02.hostedemail.com (Postfix) with ESMTP id 012FA8009B for ; Wed, 4 May 2022 21:45:04 +0000 (UTC) Received: by mail-pj1-f73.google.com with SMTP id g11-20020a17090a640b00b001dca0c276e7so1248550pjj.4 for ; Wed, 04 May 2022 14:45:10 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=20210112; h=date:in-reply-to:message-id:mime-version:references:subject:from:to :cc; bh=0CryALte+9KIobVi1eR/iam4hpx8sIVk2MsWB6ofw6A=; b=kHHaX5iS51swqhtsgYYWmXI3ctsZ0qm/Cx90N8hSAnNFvIWPIsWS8MZWJb/3LsoveS GDdE7EZjyAIg4LDYS5IPMFLpksCZXOCCneI/Hwytlz3oJ6nSZoDM3/FDt7RpB8S0jEg/ 6OkSsyJ3ltpQkrU3yaWd7hiJmuMeLl6CQ2vXeTUpiNbR+pURcmGTUM2/0MPeFUgvEoKr n6MHzLUEsaifPbidZrMzT5TsfNiQtLPmg01DKkJEk6yuL7GrbH8bkxIVPDA7f/03onV6 L7Z2XM8lcpw2NaAhmCIFcV9TsWv1fzFOvHzTOOjCNjVLyO5CUKG6JxHKUaIIJakD3KUr 0MHA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:date:in-reply-to:message-id:mime-version :references:subject:from:to:cc; bh=0CryALte+9KIobVi1eR/iam4hpx8sIVk2MsWB6ofw6A=; b=Bv4eF+4gKRBaCgwT6yamIHqOMvWFf+ax1AI+r0Zz6dOWN8nlDYOa1nLo816lqeZjLj U4K8q0YddNb2sw1dh8kUkTAAhPQBrRGgvje/7w5IHDv2Gws5ZmR5EGUcX0swhfvbj0pk LqEg6Om4sL9HC1nrRXX17NQnom8uM67YRPh96R3G7rvLMQlSAlPZPIO7VygLAhJ4+rXS adJ1gfmXJv1kiTdbc0VZr91n5MyXGWdgHakodLyeJME8eRRvtOFWCjIhUJUhn/86oIRE 4tZAjwDybg8zs/eFYU/0+cdYGBVplJBzjnjEXqy4f3sKL9hsl0/8R3IL+AZ2486pLPa+ HauA== X-Gm-Message-State: AOAM5338oelw1ZW0XkTGLAKyb3DvpYqKLNUmVR+WIeFZbeZVFpOVYabX 8IfLdkglVVde3K0eglweYALegjDiYzbB X-Google-Smtp-Source: ABdhPJyG/ppqIs1noVxxijfvSai2/ztzd/URaiLGt9tpytXKm6220snu91SFq5HVWe992QsVrtEOB4d1z9Oh X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a05:6a00:c8b:b0:50f:d589:e7b8 with SMTP id a11-20020a056a000c8b00b0050fd589e7b8mr6683408pfv.42.1651700709115; Wed, 04 May 2022 14:45:09 -0700 (PDT) Date: Wed, 4 May 2022 14:44:29 -0700 In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com> Message-Id: <20220504214437.2850685-6-zokeefe@google.com> Mime-Version: 1.0 References: <20220504214437.2850685-1-zokeefe@google.com> X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog Subject: [PATCH v5 05/13] mm/khugepaged: pipe enum scan_result codes back to callers From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. 
Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=kHHaX5iS; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf02.hostedemail.com: domain of 35fNyYgcKCLIrgcWWXWYggYdW.Ugedafmp-eecnSUc.gjY@flex--zokeefe.bounces.google.com designates 209.85.216.73 as permitted sender) smtp.mailfrom=35fNyYgcKCLIrgcWWXWYggYdW.Ugedafmp-eecnSUc.gjY@flex--zokeefe.bounces.google.com X-Rspamd-Server: rspam06 X-Rspamd-Queue-Id: 012FA8009B X-Rspam-User: X-Stat-Signature: q4ko3jjtmx7x36k5higw1wdsh7sdt55r X-HE-Tag: 1651700704-457284 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Pipe enum scan_result codes back through return values of functions downstream of khugepaged_scan_file() and khugepaged_scan_pmd() to inform callers if the operation was successful, and if not, why. Since khugepaged_scan_pmd()'s return value already has a specific meaning (whether mmap_lock was unlocked or not), add a bool* argument to khugepaged_scan_pmd() to retrieve this information. Change khugepaged to take action based on the return values of khugepaged_scan_file() and khugepaged_scan_pmd() instead of acting deep within the collapsing functions themselves. Signed-off-by: Zach O'Keefe Acked-by: David Rientjes --- mm/khugepaged.c | 72 ++++++++++++++++++++++++++----------------------- 1 file changed, 39 insertions(+), 33 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 6095fcb3f07c..1314caed65b0 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -732,13 +732,13 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_SUCCEED; trace_mm_collapse_huge_page_isolate(page, none_or_zero, referenced, writable, result); - return 1; + return SCAN_SUCCEED; } out: release_pte_pages(pte, _pte, compound_pagelist); trace_mm_collapse_huge_page_isolate(page, none_or_zero, referenced, writable, result); - return 0; + return result; } static void __collapse_huge_page_copy(pte_t *pte, struct page *page, @@ -1097,9 +1097,9 @@ static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc) return SCAN_SUCCEED; } -static void collapse_huge_page(struct mm_struct *mm, unsigned long address, - int referenced, int unmapped, - struct collapse_control *cc) +static int collapse_huge_page(struct mm_struct *mm, unsigned long address, + int referenced, int unmapped, + struct collapse_control *cc) { LIST_HEAD(compound_pagelist); pmd_t *pmd, _pmd; @@ -1107,7 +1107,7 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, pgtable_t pgtable; struct page *new_page; spinlock_t *pmd_ptl, *pte_ptl; - int isolated = 0, result = 0; + int result = SCAN_FAIL; struct vm_area_struct *vma; struct mmu_notifier_range range; @@ -1187,11 +1187,11 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - isolated = __collapse_huge_page_isolate(vma, address, pte, - &compound_pagelist); + result = __collapse_huge_page_isolate(vma, address, pte, + &compound_pagelist); spin_unlock(pte_ptl); - if (unlikely(!isolated)) { + if (unlikely(result != SCAN_SUCCEED)) { pte_unmap(pte); spin_lock(pmd_ptl); BUG_ON(!pmd_none(*pmd)); @@ -1239,24 +1239,23 @@ static void collapse_huge_page(struct mm_struct *mm, unsigned long 
address, cc->hpage = NULL; - khugepaged_pages_collapsed++; result = SCAN_SUCCEED; out_up_write: mmap_write_unlock(mm); out_nolock: if (!IS_ERR_OR_NULL(cc->hpage)) mem_cgroup_uncharge(page_folio(cc->hpage)); - trace_mm_collapse_huge_page(mm, isolated, result); - return; + trace_mm_collapse_huge_page(mm, result == SCAN_SUCCEED, result); + return result; } static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, + unsigned long address, bool *mmap_locked, struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; - int ret = 0, result = 0, referenced = 0; + int result = SCAN_FAIL, referenced = 0; int none_or_zero = 0, shared = 0; struct page *page = NULL; unsigned long _address; @@ -1391,18 +1390,19 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; - ret = 1; } out_unmap: pte_unmap_unlock(pte, ptl); - if (ret) { + if (result == SCAN_SUCCEED) { /* collapse_huge_page will return with the mmap_lock released */ - collapse_huge_page(mm, address, referenced, unmapped, cc); + *mmap_locked = false; + result = collapse_huge_page(mm, address, referenced, + unmapped, cc); } out: trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, none_or_zero, result, unmapped); - return ret; + return result; } static void collect_mm_slot(struct mm_slot *mm_slot) @@ -1679,8 +1679,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * + restore gaps in the page cache; * + unlock and free huge page; */ -static void collapse_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct collapse_control *cc) +static int collapse_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct collapse_control *cc) { struct address_space *mapping = file->f_mapping; struct page *new_page; @@ -1982,8 +1982,6 @@ static void collapse_file(struct mm_struct *mm, struct file *file, */ retract_page_tables(mapping, start); cc->hpage = NULL; - - khugepaged_pages_collapsed++; } else { struct page *page; @@ -2031,10 +2029,11 @@ static void collapse_file(struct mm_struct *mm, struct file *file, if (!IS_ERR_OR_NULL(cc->hpage)) mem_cgroup_uncharge(page_folio(cc->hpage)); /* TODO: tracepoints */ + return result; } -static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct collapse_control *cc) +static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, + pgoff_t start, struct collapse_control *cc) { struct page *page = NULL; struct address_space *mapping = file->f_mapping; @@ -2107,15 +2106,16 @@ static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { - collapse_file(mm, file, start, cc); + result = collapse_file(mm, file, start, cc); } } /* TODO: tracepoints */ + return result; } #else -static void khugepaged_scan_file(struct mm_struct *mm, struct file *file, - pgoff_t start, struct collapse_control *cc) +static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, pgoff_t start, + struct collapse_control *cc) { BUILD_BUG(); } @@ -2187,7 +2187,9 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, goto skip; while (khugepaged_scan.address < hend) { - int ret; + int result; + bool mmap_locked = true; + cond_resched(); if (unlikely(khugepaged_test_exit(mm))) goto breakouterloop; @@ -2201,17 +2203,21 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int 
pages, khugepaged_scan.address); mmap_read_unlock(mm); - ret = 1; - khugepaged_scan_file(mm, file, pgoff, cc); + mmap_locked = false; + result = khugepaged_scan_file(mm, file, pgoff, + cc); fput(file); } else { - ret = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, cc); + result = khugepaged_scan_pmd(mm, vma, + khugepaged_scan.address, + &mmap_locked, cc); } + if (result == SCAN_SUCCEED) + ++khugepaged_pages_collapsed; /* move to next address */ khugepaged_scan.address += HPAGE_PMD_SIZE; progress += HPAGE_PMD_NR; - if (ret) + if (!mmap_locked) /* we released mmap_lock so break loop */ goto breakouterloop_mmap_lock; if (progress >= pages)
From patchwork Wed May 4 21:44:30 2022
Date: Wed, 4 May 2022 14:44:30 -0700
Message-Id: <20220504214437.2850685-7-zokeefe@google.com>
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 06/13] mm/khugepaged: add flag to ignore khugepaged_max_ptes_*
From: "Zach O'Keefe"

Add an enforce_pte_scan_limits flag to struct collapse_control that allows a context to ignore the sysfs-controlled knobs khugepaged_max_ptes_[none|swap|shared], and set this flag in the khugepaged collapse context to preserve existing khugepaged behavior. This flag will be used (unset) when introducing the madvise collapse context, since there the user presumably has reason to believe the collapse will be beneficial, and khugepaged heuristics shouldn't tell the user they are wrong.
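For illustration only, a minimal standalone sketch of the gating pattern this patch introduces: a context with enforce_pte_scan_limits unset passes the max_ptes_none check unconditionally. The struct and helper names here are simplified stand-ins, not the kernel code itself (that lives in the diff below).

	#include <stdbool.h>

	/* Simplified stand-in for the kernel's collapse_control. */
	struct collapse_control_sketch {
		bool enforce_pte_scan_limits;
	};

	static bool none_ptes_within_limit(int none_or_zero, int max_ptes_none,
					   const struct collapse_control_sketch *cc)
	{
		/* With limits unenforced (the later madvise context), always pass. */
		return none_or_zero <= max_ptes_none || !cc->enforce_pte_scan_limits;
	}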
Signed-off-by: Zach O'Keefe Acked-by: David Rientjes --- mm/khugepaged.c | 31 +++++++++++++++++++++---------- 1 file changed, 21 insertions(+), 10 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 1314caed65b0..ca730aec0e3e 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -87,6 +87,9 @@ static struct kmem_cache *mm_slot_cache __read_mostly; #define MAX_PTE_MAPPED_THP 8 struct collapse_control { + /* Respect khugepaged_max_ptes_[none|swap|shared] */ + bool enforce_pte_scan_limits; + /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; @@ -614,6 +617,7 @@ static bool is_refcount_suitable(struct page *page) static int __collapse_huge_page_isolate(struct vm_area_struct *vma, unsigned long address, pte_t *pte, + struct collapse_control *cc, struct list_head *compound_pagelist) { struct page *page = NULL; @@ -627,7 +631,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (pte_none(pteval) || (pte_present(pteval) && is_zero_pfn(pte_pfn(pteval)))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->enforce_pte_scan_limits)) { continue; } else { result = SCAN_EXCEED_NONE_PTE; @@ -647,8 +652,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, VM_BUG_ON_PAGE(!PageAnon(page), page); - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->enforce_pte_scan_limits && page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out; @@ -1187,7 +1192,7 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, mmu_notifier_invalidate_range_end(&range); spin_lock(pte_ptl); - result = __collapse_huge_page_isolate(vma, address, pte, + result = __collapse_huge_page_isolate(vma, address, pte, cc, &compound_pagelist); spin_unlock(pte_ptl); @@ -1275,7 +1280,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, _pte++, _address += PAGE_SIZE) { pte_t pteval = *_pte; if (is_swap_pte(pteval)) { - if (++unmapped <= khugepaged_max_ptes_swap) { + if (++unmapped <= khugepaged_max_ptes_swap || + !cc->enforce_pte_scan_limits) { /* * Always be strict with uffd-wp * enabled swap entries. 
Please see @@ -1294,7 +1300,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, } if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) { if (!userfaultfd_armed(vma) && - ++none_or_zero <= khugepaged_max_ptes_none) { + (++none_or_zero <= khugepaged_max_ptes_none || + !cc->enforce_pte_scan_limits)) { continue; } else { result = SCAN_EXCEED_NONE_PTE; @@ -1324,8 +1331,9 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, goto out_unmap; } - if (page_mapcount(page) > 1 && - ++shared > khugepaged_max_ptes_shared) { + if (cc->enforce_pte_scan_limits && + page_mapcount(page) > 1 && + ++shared > khugepaged_max_ptes_shared) { result = SCAN_EXCEED_SHARED_PTE; count_vm_event(THP_SCAN_EXCEED_SHARED_PTE); goto out_unmap; @@ -2051,7 +2059,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, continue; if (xa_is_value(page)) { - if (++swap > khugepaged_max_ptes_swap) { + if (cc->enforce_pte_scan_limits && + ++swap > khugepaged_max_ptes_swap) { result = SCAN_EXCEED_SWAP_PTE; count_vm_event(THP_SCAN_EXCEED_SWAP_PTE); break; @@ -2102,7 +2111,8 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, rcu_read_unlock(); if (result == SCAN_SUCCEED) { - if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) { + if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none && + cc->enforce_pte_scan_limits) { result = SCAN_EXCEED_NONE_PTE; count_vm_event(THP_SCAN_EXCEED_NONE_PTE); } else { @@ -2332,6 +2342,7 @@ static int khugepaged(void *none) { struct mm_slot *mm_slot; struct collapse_control cc = { + .enforce_pte_scan_limits = true, .last_target_node = NUMA_NO_NODE, .alloc_charge_hpage = &alloc_charge_hpage, };
From patchwork Wed May 4 21:44:31 2022
Date: Wed, 4 May 2022 14:44:31 -0700
Message-Id: <20220504214437.2850685-8-zokeefe@google.com>
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 07/13] mm/khugepaged: add flag to ignore page young/referenced requirement
From: "Zach O'Keefe"

Add an enforce_young flag to struct collapse_control that allows a context to ignore the requirement that some pages in the region being collapsed be young or referenced. Set this flag in the khugepaged collapse context to preserve existing khugepaged behavior.
This flag will be used (unset) when introducing the madvise collapse context, since there the user presumably has reason to believe the collapse will be beneficial, and khugepaged heuristics shouldn't tell the user they are wrong. Signed-off-by: Zach O'Keefe Acked-by: David Rientjes --- mm/khugepaged.c | 23 +++++++++++++++-------- 1 file changed, 15 insertions(+), 8 deletions(-) diff --git a/mm/khugepaged.c b/mm/khugepaged.c index ca730aec0e3e..b14807b7002e 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -90,6 +90,9 @@ struct collapse_control { /* Respect khugepaged_max_ptes_[none|swap|shared] */ bool enforce_pte_scan_limits; + /* Require memory to be young */ + bool enforce_young; + /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; @@ -720,9 +723,10 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, list_add_tail(&page->lru, compound_pagelist); next: /* There should be enough young pte to collapse the page */ - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + if (cc->enforce_young && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; if (pte_write(pteval)) @@ -731,7 +735,7 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, if (unlikely(!writable)) { result = SCAN_PAGE_RO; - } else if (unlikely(!referenced)) { + } else if (unlikely(cc->enforce_young && !referenced)) { result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; @@ -1387,14 +1391,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, result = SCAN_PAGE_COUNT; goto out_unmap; } - if (pte_young(pteval) || - page_is_young(page) || PageReferenced(page) || - mmu_notifier_test_young(vma->vm_mm, address)) + if (cc->enforce_young && + (pte_young(pteval) || page_is_young(page) || + PageReferenced(page) || mmu_notifier_test_young(vma->vm_mm, + address))) referenced++; } if (!writable) { result = SCAN_PAGE_RO; - } else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) { + } else if (cc->enforce_young && (!referenced || (unmapped && referenced + < HPAGE_PMD_NR / 2))) { result = SCAN_LACK_REFERENCED_PAGE; } else { result = SCAN_SUCCEED; @@ -2343,6 +2349,7 @@ static int khugepaged(void *none) struct mm_slot *mm_slot; struct collapse_control cc = { .enforce_pte_scan_limits = true, + .enforce_young = true, .last_target_node = NUMA_NO_NODE, .alloc_charge_hpage = &alloc_charge_hpage, };
From patchwork Wed May 4 21:44:32 2022
Date: Wed, 4 May 2022 14:44:32 -0700
Message-Id: <20220504214437.2850685-9-zokeefe@google.com>
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 08/13] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
From: "Zach O'Keefe"
Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: 12EA312009D X-Stat-Signature: ig8o6s8gwwi11bw7unyp9w6cwbp4n7ko X-Rspam-User: Authentication-Results: imf29.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=NkpmaCze; spf=pass (imf29.hostedemail.com: domain of 36_NyYgcKCLgxmiccdcemmejc.amkjglsv-kkitYai.mpe@flex--zokeefe.bounces.google.com designates 209.85.215.201 as permitted sender) smtp.mailfrom=36_NyYgcKCLgxmiccdcemmejc.amkjglsv-kkitYai.mpe@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1651700711-835736 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: This idea was introduced by David Rientjes[1]. Introduce a new madvise mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of memory at their own expense. The benefits of this approach are: * CPU is charged to the process that wants to spend the cycles for the THP * Avoid unpredictable timing of khugepaged collapse An immediate user of this new functionality are malloc() implementations that manage memory in hugepage-sized chunks, but sometimes subrelease memory back to the system in native-sized chunks via MADV_DONTNEED; zapping the pmd. Later, when the memory is hot, the implementation could madvise(MADV_COLLAPSE) to re-back the memory by THPs to regain hugepage coverage and dTLB performance. TCMalloc is such an implementation that could benefit from this[2]. Only privately-mapped anon memory is supported for now, but it is expected that file and shmem support will be added later to support the use-case of backing executable text by THPs. Current support provided by CONFIG_READ_ONLY_THP_FOR_FS may take a long time on a large system which might impair services from serving at their full rated load after (re)starting. Tricks like mremap(2)'ing text onto anonymous memory to immediately realize iTLB performance prevents page sharing and demand paging, both of which increase steady state memory footprint. With MADV_COLLAPSE, we get the best of both worlds: Peak upfront performance and lower RAM footprints. This call respects THP eligibility as determined by the system-wide /sys/kernel/mm/transparent_hugepage/enabled sysfs settings and the VMA flags for the memory range being collapsed. THP allocation may enter direct reclaim and/or compaction. 
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/ [2] https://github.com/google/tcmalloc/tree/master/tcmalloc Suggested-by: David Rientjes Signed-off-by: Zach O'Keefe --- arch/alpha/include/uapi/asm/mman.h | 2 + arch/mips/include/uapi/asm/mman.h | 2 + arch/parisc/include/uapi/asm/mman.h | 2 + arch/xtensa/include/uapi/asm/mman.h | 2 + include/linux/huge_mm.h | 12 ++ include/uapi/asm-generic/mman-common.h | 2 + mm/khugepaged.c | 167 +++++++++++++++++++++++-- mm/madvise.c | 5 + 8 files changed, 182 insertions(+), 12 deletions(-) diff --git a/arch/alpha/include/uapi/asm/mman.h b/arch/alpha/include/uapi/asm/mman.h index 4aa996423b0d..763929e814e9 100644 --- a/arch/alpha/include/uapi/asm/mman.h +++ b/arch/alpha/include/uapi/asm/mman.h @@ -76,6 +76,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/mips/include/uapi/asm/mman.h b/arch/mips/include/uapi/asm/mman.h index 1be428663c10..c6e1fc77c996 100644 --- a/arch/mips/include/uapi/asm/mman.h +++ b/arch/mips/include/uapi/asm/mman.h @@ -103,6 +103,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/arch/parisc/include/uapi/asm/mman.h b/arch/parisc/include/uapi/asm/mman.h index a7ea3204a5fa..22133a6a506e 100644 --- a/arch/parisc/include/uapi/asm/mman.h +++ b/arch/parisc/include/uapi/asm/mman.h @@ -70,6 +70,8 @@ #define MADV_WIPEONFORK 71 /* Zero memory on fork, child only */ #define MADV_KEEPONFORK 72 /* Undo MADV_WIPEONFORK */ +#define MADV_COLLAPSE 73 /* Synchronous hugepage collapse */ + #define MADV_HWPOISON 100 /* poison a page for testing */ #define MADV_SOFT_OFFLINE 101 /* soft offline page for testing */ diff --git a/arch/xtensa/include/uapi/asm/mman.h b/arch/xtensa/include/uapi/asm/mman.h index 7966a58af472..1ff0c858544f 100644 --- a/arch/xtensa/include/uapi/asm/mman.h +++ b/arch/xtensa/include/uapi/asm/mman.h @@ -111,6 +111,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h index 9a26bd10e083..4a2ea1b5437c 100644 --- a/include/linux/huge_mm.h +++ b/include/linux/huge_mm.h @@ -222,6 +222,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud, int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags, int advice); +int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end); void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, long adjust_next); spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma); @@ -378,6 +381,15 @@ static inline int hugepage_madvise(struct vm_area_struct *vma, BUG(); return 0; } + +static inline int madvise_collapse(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + BUG(); + return 0; +} + static inline void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start, unsigned long end, diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index 6c1aa92a92e4..6ce1f1ceb432 100644 --- a/include/uapi/asm-generic/mman-common.h 
+++ b/include/uapi/asm-generic/mman-common.h @@ -77,6 +77,8 @@ #define MADV_DONTNEED_LOCKED 24 /* like DONTNEED, but drop locked pages too */ +#define MADV_COLLAPSE 25 /* Synchronous hugepage collapse */ + /* compatibility flags */ #define MAP_FILE 0 diff --git a/mm/khugepaged.c b/mm/khugepaged.c index b14807b7002e..165c646ddb8f 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -837,6 +837,22 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void) return khugepaged_defrag() ? GFP_TRANSHUGE : GFP_TRANSHUGE_LIGHT; } +static bool alloc_hpage(gfp_t gfp, int node, struct collapse_control *cc) +{ + VM_BUG_ON_PAGE(cc->hpage, cc->hpage); + + cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); + if (unlikely(!cc->hpage)) { + count_vm_event(THP_COLLAPSE_ALLOC_FAILED); + cc->hpage = ERR_PTR(-ENOMEM); + return false; + } + + prep_transhuge_page(cc->hpage); + count_vm_event(THP_COLLAPSE_ALLOC); + return true; +} + #ifdef CONFIG_NUMA static int khugepaged_find_target_node(struct collapse_control *cc) { @@ -882,18 +898,7 @@ static bool khugepaged_prealloc_page(struct page **hpage, bool *wait) static bool khugepaged_alloc_page(gfp_t gfp, int node, struct collapse_control *cc) { - VM_BUG_ON_PAGE(cc->hpage, cc->hpage); - - cc->hpage = __alloc_pages_node(node, gfp, HPAGE_PMD_ORDER); - if (unlikely(!cc->hpage)) { - count_vm_event(THP_COLLAPSE_ALLOC_FAILED); - cc->hpage = ERR_PTR(-ENOMEM); - return false; - } - - prep_transhuge_page(cc->hpage); - count_vm_event(THP_COLLAPSE_ALLOC); - return true; + return alloc_hpage(gfp, node, cc); } #else static int khugepaged_find_target_node(struct collapse_control *cc) @@ -2457,3 +2462,141 @@ void khugepaged_min_free_kbytes_update(void) set_recommended_min_free_kbytes(); mutex_unlock(&khugepaged_mutex); } + +static void madvise_collapse_cleanup_page(struct page **hpage) +{ + if (!IS_ERR(*hpage) && *hpage) + put_page(*hpage); + *hpage = NULL; +} + +static int madvise_collapse_errno(enum scan_result r) +{ + switch (r) { + case SCAN_PMD_NULL: + case SCAN_ADDRESS_RANGE: + case SCAN_VMA_NULL: + case SCAN_PTE_NON_PRESENT: + case SCAN_PAGE_NULL: + /* + * Addresses in the specified range are not currently mapped, + * or are outside the AS of the process. + */ + return -ENOMEM; + case SCAN_ALLOC_HUGE_PAGE_FAIL: + case SCAN_CGROUP_CHARGE_FAIL: + /* A kernel resource was temporarily unavailable. 
*/ + return -EAGAIN; + default: + return -EINVAL; + } +} + +static int madvise_alloc_charge_hpage(struct mm_struct *mm, + struct collapse_control *cc) +{ + if (!alloc_hpage(GFP_TRANSHUGE, khugepaged_find_target_node(cc), cc)) + return SCAN_ALLOC_HUGE_PAGE_FAIL; + if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, + GFP_TRANSHUGE))) + return SCAN_CGROUP_CHARGE_FAIL; + count_memcg_page_event(cc->hpage, THP_COLLAPSE_ALLOC); + return SCAN_SUCCEED; +} + +int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, + unsigned long start, unsigned long end) +{ + struct collapse_control cc = { + .enforce_pte_scan_limits = false, + .enforce_young = false, + .last_target_node = NUMA_NO_NODE, + .hpage = NULL, + .alloc_charge_hpage = &madvise_alloc_charge_hpage, + }; + struct mm_struct *mm = vma->vm_mm; + unsigned long hstart, hend, addr; + int thps = 0, nr_hpages = 0, result = SCAN_FAIL; + bool mmap_locked = true; + + BUG_ON(vma->vm_start > start); + BUG_ON(vma->vm_end < end); + + *prev = vma; + + if (IS_ENABLED(CONFIG_SHMEM) && vma->vm_file) + return -EINVAL; + + hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK; + hend = end & HPAGE_PMD_MASK; + nr_hpages = (hend - hstart) >> HPAGE_PMD_SHIFT; + + if (hstart >= hend || !transparent_hugepage_active(vma)) + return -EINVAL; + + mmgrab(mm); + lru_add_drain(); + + for (addr = hstart; ; ) { + mmap_assert_locked(mm); + cond_resched(); + result = SCAN_FAIL; + + if (unlikely(khugepaged_test_exit(mm))) { + result = SCAN_ANY_PROCESS; + break; + } + + memset(cc.node_load, 0, sizeof(cc.node_load)); + result = khugepaged_scan_pmd(mm, vma, addr, &mmap_locked, &cc); + if (!mmap_locked) + *prev = NULL; /* tell madvise we dropped mmap_lock */ + + switch (result) { + /* Whitelisted set of results where continuing OK */ + case SCAN_SUCCEED: + case SCAN_PMD_MAPPED: + ++thps; + fallthrough; + case SCAN_PMD_NULL: + case SCAN_PTE_NON_PRESENT: + case SCAN_PTE_UFFD_WP: + case SCAN_PAGE_RO: + case SCAN_LACK_REFERENCED_PAGE: + case SCAN_PAGE_NULL: + case SCAN_PAGE_COUNT: + case SCAN_PAGE_LOCK: + case SCAN_PAGE_COMPOUND: + break; + case SCAN_PAGE_LRU: + lru_add_drain_all(); + goto retry; + default: + /* Other error, exit */ + goto break_loop; + } + addr += HPAGE_PMD_SIZE; + if (addr >= hend) + break; +retry: + if (!mmap_locked) { + mmap_read_lock(mm); + mmap_locked = true; + result = hugepage_vma_revalidate(mm, addr, &vma); + if (result) + goto out; + } + madvise_collapse_cleanup_page(&cc.hpage); + } + +break_loop: + /* madvise_walk_vmas() expects us to hold mmap_lock on return */ + if (!mmap_locked) + mmap_read_lock(mm); +out: + mmap_assert_locked(mm); + madvise_collapse_cleanup_page(&cc.hpage); + mmdrop(mm); + + return thps == nr_hpages ? 0 : madvise_collapse_errno(result); +} diff --git a/mm/madvise.c b/mm/madvise.c index 5f4537511532..638517952bd2 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -59,6 +59,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_FREE: case MADV_POPULATE_READ: case MADV_POPULATE_WRITE: + case MADV_COLLAPSE: return 0; default: /* be safe, default to 1. 
list exceptions explicitly */ @@ -1054,6 +1055,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma, if (error) goto out; break; + case MADV_COLLAPSE: + return madvise_collapse(vma, prev, start, end); } anon_name = anon_vma_name(vma); @@ -1147,6 +1150,7 @@ madvise_behavior_valid(int behavior) #ifdef CONFIG_TRANSPARENT_HUGEPAGE case MADV_HUGEPAGE: case MADV_NOHUGEPAGE: + case MADV_COLLAPSE: #endif case MADV_DONTDUMP: case MADV_DODUMP: @@ -1336,6 +1340,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start, * MADV_NOHUGEPAGE - mark the given range as not worth being backed by * transparent huge pages so the existing pages will not be * coalesced into THP and new pages will not be allocated as THP. + * MADV_COLLAPSE - synchronously coalesce pages into new THP. * MADV_DONTDUMP - the application wants to prevent pages in the given range * from being included in its core dump. * MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
From patchwork Wed May 4 21:44:33 2022
Date: Wed, 4 May 2022 14:44:33 -0700
Message-Id: <20220504214437.2850685-10-zokeefe@google.com>
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 09/13] mm/khugepaged: rename prefix of shared collapse functions
From: "Zach O'Keefe"

The following functions/tracepoints are shared between khugepaged and madvise collapse contexts.
Replace the "khugepaged_" prefixe with generic "hpage_collapse_" prefix in such cases: huge_memory:mm_khugepaged_scan_pmd -> huge_memory:mm_hpage_collapse_scan_pmd khugepaged_test_exit() -> hpage_collapse_test_exit() khugepaged_scan_abort() -> hpage_collapse_scan_abort() khugepaged_scan_pmd() -> hpage_collapse_scan_pmd() khugepaged_find_target_node() -> hpage_collapse_find_target_node() Signed-off-by: Zach O'Keefe Acked-by: David Rientjes --- include/trace/events/huge_memory.h | 2 +- mm/khugepaged.c | 72 ++++++++++++++++-------------- 2 files changed, 39 insertions(+), 35 deletions(-) diff --git a/include/trace/events/huge_memory.h b/include/trace/events/huge_memory.h index 55392bf30a03..fb6c73632ff3 100644 --- a/include/trace/events/huge_memory.h +++ b/include/trace/events/huge_memory.h @@ -48,7 +48,7 @@ SCAN_STATUS #define EM(a, b) {a, b}, #define EMe(a, b) {a, b} -TRACE_EVENT(mm_khugepaged_scan_pmd, +TRACE_EVENT(mm_hpage_collapse_scan_pmd, TP_PROTO(struct mm_struct *mm, struct page *page, bool writable, int referenced, int none_or_zero, int status, int unmapped), diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 165c646ddb8f..44e31e072124 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -96,7 +96,7 @@ struct collapse_control { /* Num pages scanned per node */ int node_load[MAX_NUMNODES]; - /* Last target selected in khugepaged_find_target_node() */ + /* Last target selected in hpage_collapse_find_target_node() */ int last_target_node; struct page *hpage; @@ -453,7 +453,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm, hash_add(mm_slots_hash, &mm_slot->hash, (long)mm); } -static inline int khugepaged_test_exit(struct mm_struct *mm) +static inline int hpage_collapse_test_exit(struct mm_struct *mm) { return atomic_read(&mm->mm_users) == 0; } @@ -502,7 +502,7 @@ int __khugepaged_enter(struct mm_struct *mm) return -ENOMEM; /* __khugepaged_exit() must not run from under us */ - VM_BUG_ON_MM(khugepaged_test_exit(mm), mm); + VM_BUG_ON_MM(hpage_collapse_test_exit(mm), mm); if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) { free_mm_slot(mm_slot); return 0; @@ -566,11 +566,10 @@ void __khugepaged_exit(struct mm_struct *mm) } else if (mm_slot) { /* * This is required to serialize against - * khugepaged_test_exit() (which is guaranteed to run - * under mmap sem read mode). Stop here (after we - * return all pagetables will be destroyed) until - * khugepaged has finished working on the pagetables - * under the mmap_lock. + * hpage_collapse_test_exit() (which is guaranteed to run + * under mmap sem read mode). Stop here (after we return all + * pagetables will be destroyed) until khugepaged has finished + * working on the pagetables under the mmap_lock. 
*/ mmap_write_lock(mm); mmap_write_unlock(mm); @@ -807,7 +806,7 @@ static void khugepaged_alloc_sleep(void) remove_wait_queue(&khugepaged_wait, &wait); } -static bool khugepaged_scan_abort(int nid, struct collapse_control *cc) +static bool hpage_collapse_scan_abort(int nid, struct collapse_control *cc) { int i; @@ -854,7 +853,7 @@ static bool alloc_hpage(gfp_t gfp, int node, struct collapse_control *cc) } #ifdef CONFIG_NUMA -static int khugepaged_find_target_node(struct collapse_control *cc) +static int hpage_collapse_find_target_node(struct collapse_control *cc) { int nid, target_node = 0, max_value = 0; @@ -901,7 +900,7 @@ static bool khugepaged_alloc_page(gfp_t gfp, int node, return alloc_hpage(gfp, node, cc); } #else -static int khugepaged_find_target_node(struct collapse_control *cc) +static int hpage_collapse_find_target_node(struct collapse_control *cc) { return 0; } @@ -982,7 +981,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address, struct vm_area_struct *vma; unsigned long hstart, hend; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) return SCAN_ANY_PROCESS; *vmap = vma = find_vma(mm, address); @@ -1026,7 +1025,7 @@ static int find_pmd_or_thp_or_none(struct mm_struct *mm, /* * Bring missing pages in from swap, to complete THP collapse. - * Only done if khugepaged_scan_pmd believes it is worthwhile. + * Only done if hpage_collapse_scan_pmd believes it is worthwhile. * * Called and returns without pte mapped or spinlocks held, * but with mmap_lock held to protect against vma changes. @@ -1093,7 +1092,7 @@ static int alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc) const struct cpumask *cpumask; #endif gfp_t gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE; - int node = khugepaged_find_target_node(cc); + int node = hpage_collapse_find_target_node(cc); #ifdef CONFIG_NUMA /* sched to specified node before huge page memory copy */ @@ -1263,9 +1262,10 @@ static int collapse_huge_page(struct mm_struct *mm, unsigned long address, return result; } -static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, - unsigned long address, bool *mmap_locked, - struct collapse_control *cc) +static int hpage_collapse_scan_pmd(struct mm_struct *mm, + struct vm_area_struct *vma, + unsigned long address, bool *mmap_locked, + struct collapse_control *cc) { pmd_t *pmd; pte_t *pte, *_pte; @@ -1357,7 +1357,7 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, * hit record. 
*/ node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (hpage_collapse_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; goto out_unmap; } @@ -1419,8 +1419,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm, struct vm_area_struct *vma, unmapped, cc); } out: - trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced, - none_or_zero, result, unmapped); + trace_mm_hpage_collapse_scan_pmd(mm, page, writable, referenced, + none_or_zero, result, unmapped); return result; } @@ -1430,7 +1430,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot) lockdep_assert_held(&khugepaged_mm_lock); - if (khugepaged_test_exit(mm)) { + if (hpage_collapse_test_exit(mm)) { /* free mm_slot */ hash_del(&mm_slot->hash); list_del(&mm_slot->mm_node); @@ -1601,7 +1601,7 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot) if (!mmap_write_trylock(mm)) return; - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) goto out; for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++) @@ -1664,7 +1664,8 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff) * it'll always mapped in small page size for uffd-wp * registered ranges. */ - if (!khugepaged_test_exit(mm) && !userfaultfd_wp(vma)) + if (!hpage_collapse_test_exit(mm) && + !userfaultfd_wp(vma)) collapse_and_free_pmd(mm, vma, addr, pmd); mmap_write_unlock(mm); } else { @@ -2089,7 +2090,7 @@ static int khugepaged_scan_file(struct mm_struct *mm, struct file *file, } node = page_to_nid(page); - if (khugepaged_scan_abort(node, cc)) { + if (hpage_collapse_scan_abort(node, cc)) { result = SCAN_SCAN_ABORT; break; } @@ -2178,7 +2179,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, vma = NULL; if (unlikely(!mmap_read_trylock(mm))) goto breakouterloop_mmap_lock; - if (likely(!khugepaged_test_exit(mm))) + if (likely(!hpage_collapse_test_exit(mm))) vma = find_vma(mm, khugepaged_scan.address); progress++; @@ -2186,7 +2187,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, unsigned long hstart, hend; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) { + if (unlikely(hpage_collapse_test_exit(mm))) { progress++; break; } @@ -2212,7 +2213,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, bool mmap_locked = true; cond_resched(); - if (unlikely(khugepaged_test_exit(mm))) + if (unlikely(hpage_collapse_test_exit(mm))) goto breakouterloop; VM_BUG_ON(khugepaged_scan.address < hstart || @@ -2229,9 +2230,10 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, cc); fput(file); } else { - result = khugepaged_scan_pmd(mm, vma, - khugepaged_scan.address, - &mmap_locked, cc); + result = hpage_collapse_scan_pmd(mm, vma, + khugepaged_scan.address, + &mmap_locked, + cc); } if (result == SCAN_SUCCEED) ++khugepaged_pages_collapsed; @@ -2255,7 +2257,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages, * Release the current mm_slot if this mm is about to die, or * if we scanned all vmas of this mm. 
*/ - if (khugepaged_test_exit(mm) || !vma) { + if (hpage_collapse_test_exit(mm) || !vma) { /* * Make sure that if mm_users is reaching zero while * khugepaged runs here, khugepaged_exit will find @@ -2495,7 +2497,8 @@ static int madvise_collapse_errno(enum scan_result r) static int madvise_alloc_charge_hpage(struct mm_struct *mm, struct collapse_control *cc) { - if (!alloc_hpage(GFP_TRANSHUGE, khugepaged_find_target_node(cc), cc)) + if (!alloc_hpage(GFP_TRANSHUGE, hpage_collapse_find_target_node(cc), + cc)) return SCAN_ALLOC_HUGE_PAGE_FAIL; if (unlikely(mem_cgroup_charge(page_folio(cc->hpage), mm, GFP_TRANSHUGE))) @@ -2542,13 +2545,14 @@ int madvise_collapse(struct vm_area_struct *vma, struct vm_area_struct **prev, cond_resched(); result = SCAN_FAIL; - if (unlikely(khugepaged_test_exit(mm))) { + if (unlikely(hpage_collapse_test_exit(mm))) { result = SCAN_ANY_PROCESS; break; } memset(cc.node_load, 0, sizeof(cc.node_load)); - result = khugepaged_scan_pmd(mm, vma, addr, &mmap_locked, &cc); + result = hpage_collapse_scan_pmd(mm, vma, addr, &mmap_locked, + &cc); if (!mmap_locked) *prev = NULL; /* tell madvise we dropped mmap_lock */
From patchwork Wed May 4 21:44:34 2022
Date: Wed, 4 May 2022 14:44:34 -0700
Message-Id: <20220504214437.2850685-11-zokeefe@google.com>
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 10/13] mm/madvise: add MADV_COLLAPSE to process_madvise()
From: "Zach O'Keefe"

Allow MADV_COLLAPSE behavior for process_madvise(2) if the caller has CAP_SYS_ADMIN or is requesting collapse of its own memory.
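For illustration, a hedged userspace sketch of this capability check in action (not from the patch): collapsing a range in a target process via a pidfd. The raw syscall is used on the assumption that SYS_process_madvise is defined by the installed headers and that no libc wrapper is available; MADV_COLLAPSE's value comes from the uapi additions earlier in the series.

	#include <sys/syscall.h>
	#include <sys/uio.h>
	#include <unistd.h>

	#ifndef MADV_COLLAPSE
	#define MADV_COLLAPSE 25	/* from the uapi additions in patch 08 */
	#endif

	/* Hypothetical helper: needs CAP_SYS_ADMIN unless pidfd refers to the
	 * calling process itself, per process_madvise_behavior_valid() below. */
	static long collapse_in_process(int pidfd, void *addr, size_t len)
	{
		struct iovec iov = { .iov_base = addr, .iov_len = len };

		/* flags argument must be 0; returns bytes advised or -1 */
		return syscall(SYS_process_madvise, pidfd, &iov, 1,
			       MADV_COLLAPSE, 0);
	}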
Signed-off-by: Zach O'Keefe Acked-by: David Rientjes --- mm/madvise.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mm/madvise.c b/mm/madvise.c index 638517952bd2..08c11217025a 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -1168,13 +1168,15 @@ madvise_behavior_valid(int behavior) } static bool -process_madvise_behavior_valid(int behavior) +process_madvise_behavior_valid(int behavior, struct task_struct *task) { switch (behavior) { case MADV_COLD: case MADV_PAGEOUT: case MADV_WILLNEED: return true; + case MADV_COLLAPSE: + return task == current || capable(CAP_SYS_ADMIN); default: return false; } @@ -1452,7 +1454,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec, goto free_iov; } - if (!process_madvise_behavior_valid(behavior)) { + if (!process_madvise_behavior_valid(behavior, task)) { ret = -EINVAL; goto release_task; }
From patchwork Wed May 4 21:44:35 2022
Date: Wed, 4 May 2022 14:44:35 -0700
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Message-Id: <20220504214437.2850685-12-zokeefe@google.com>
References: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 11/13] selftests/vm: modularize collapse selftests
From: "Zach O'Keefe"

Modularize the collapse action of the khugepaged collapse selftests by
introducing a struct collapse_context, which specifies how to collapse a
given memory range and the expected semantics of the collapse. This can
be reused later to test other collapse contexts.
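[ As an editorial sketch of the intended extensibility, not part of the
  patch: a hypothetical new collapse backend only needs a name, a callback
  matching the struct's collapse signature, and a statement of whether it
  honors the max_ptes_* scan limits. check_huge(), success(), and fail()
  are the selftest helpers used throughout the diff below. ]

    /* Hypothetical backend plugged into the modularized tests: trigger the
     * mechanism under test, then report based on whether the region ended
     * up huge as expected. */
    static void my_backend_collapse(const char *msg, char *p, bool expect)
    {
    	printf("%s...", msg);
    	/* ... trigger collapse of the PMD-sized region at p ... */
    	if (check_huge(p) == expect)
    		success("OK");
    	else
    		fail("Fail");
    }

    static struct collapse_context my_backend_context = {
    	.name = "my-backend",
    	.collapse = &my_backend_collapse,
    	.enforce_pte_scan_limits = false,	/* backend ignores max_ptes_* */
    };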
Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 257 +++++++++++-------------
 1 file changed, 116 insertions(+), 141 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index 155120b67a16..c59d832fee96 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -23,6 +23,12 @@ static int hpage_pmd_nr;
 #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/"
 #define PID_SMAPS "/proc/self/smaps"
 
+struct collapse_context {
+	const char *name;
+	void (*collapse)(const char *msg, char *p, bool expect);
+	bool enforce_pte_scan_limits;
+};
+
 enum thp_enabled {
 	THP_ALWAYS,
 	THP_MADVISE,
@@ -528,53 +534,39 @@ static void alloc_at_fault(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_full(void)
+static void collapse_full(struct collapse_context *context)
 {
 	void *p;
 
 	p = alloc_mapping();
 	fill_memory(p, 0, hpage_pmd_size);
-	if (wait_for_scan("Collapse fully populated PTE table", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse fully populated PTE table", p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_empty(void)
+static void collapse_empty(struct collapse_context *context)
 {
 	void *p;
 
 	p = alloc_mapping();
-	if (wait_for_scan("Do not collapse empty PTE table", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	context->collapse("Do not collapse empty PTE table", p, false);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_single_pte_entry(void)
+static void collapse_single_pte_entry(struct collapse_context *context)
 {
 	void *p;
 
 	p = alloc_mapping();
 	fill_memory(p, 0, page_size);
-	if (wait_for_scan("Collapse PTE table with single PTE entry present", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table with single PTE entry present", p,
			  true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_none(void)
+static void collapse_max_ptes_none(struct collapse_context *context)
 {
 	int max_ptes_none = hpage_pmd_nr / 2;
 	struct settings settings = default_settings;
@@ -586,28 +578,23 @@ static void collapse_max_ptes_none(void)
 
 	p = alloc_mapping();
 	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
-	if (wait_for_scan("Do not collapse with max_ptes_none exceeded", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	context->collapse("Maybe collapse with max_ptes_none exceeded", p,
+			  !context->enforce_pte_scan_limits);
 	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none - 1) * page_size);
 
-	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
-	if (wait_for_scan("Collapse with max_ptes_none PTEs empty", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-	validate_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+	if (context->enforce_pte_scan_limits) {
+		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_none) * page_size);
+		context->collapse("Collapse with max_ptes_none PTEs empty", p,
+				  true);
+		validate_memory(p, 0,
+				(hpage_pmd_nr - max_ptes_none) * page_size);
+	}
 
 	munmap(p, hpage_pmd_size);
 	write_settings(&default_settings);
}
 
-static void collapse_swapin_single_pte(void)
+static void collapse_swapin_single_pte(struct collapse_context *context)
 {
 	void *p;
 	p = alloc_mapping();
@@ -625,18 +612,14 @@ static void collapse_swapin_single_pte(void)
 		goto out;
 	}
 
-	if (wait_for_scan("Collapse with swapping in single PTE entry", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse with swapping in single PTE entry",
+			  p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 out:
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_swap(void)
+static void collapse_max_ptes_swap(struct collapse_context *context)
 {
 	int max_ptes_swap = read_num("khugepaged/max_ptes_swap");
 	void *p;
@@ -656,39 +639,34 @@ static void collapse_max_ptes_swap(void)
 		goto out;
 	}
 
-	if (wait_for_scan("Do not collapse with max_ptes_swap exceeded", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		fail("Fail");
-	else
-		success("OK");
+	context->collapse("Maybe collapse with max_ptes_swap exceeded",
+			  p, !context->enforce_pte_scan_limits);
 	validate_memory(p, 0, hpage_pmd_size);
 
-	fill_memory(p, 0, hpage_pmd_size);
-	printf("Swapout %d of %d pages...", max_ptes_swap, hpage_pmd_nr);
-	if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
-		perror("madvise(MADV_PAGEOUT)");
-		exit(EXIT_FAILURE);
-	}
-	if (check_swap(p, max_ptes_swap * page_size)) {
-		success("OK");
-	} else {
-		fail("Fail");
-		goto out;
-	}
+	if (context->enforce_pte_scan_limits) {
+		fill_memory(p, 0, hpage_pmd_size);
+		printf("Swapout %d of %d pages...", max_ptes_swap,
+		       hpage_pmd_nr);
+		if (madvise(p, max_ptes_swap * page_size, MADV_PAGEOUT)) {
+			perror("madvise(MADV_PAGEOUT)");
+			exit(EXIT_FAILURE);
+		}
+		if (check_swap(p, max_ptes_swap * page_size)) {
+			success("OK");
+		} else {
+			fail("Fail");
+			goto out;
+		}
 
-	if (wait_for_scan("Collapse with max_ptes_swap pages swapped out", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-	validate_memory(p, 0, hpage_pmd_size);
+		context->collapse("Collapse with max_ptes_swap pages swapped out",
+				  p, true);
+		validate_memory(p, 0, hpage_pmd_size);
+	}
out:
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_single_pte_entry_compound(void)
+static void collapse_single_pte_entry_compound(struct collapse_context *context)
 {
 	void *p;
 
@@ -710,17 +688,13 @@ static void collapse_single_pte_entry_compound(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table with single PTE mapping compound page", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table with single PTE mapping compound page",
+			  p, true);
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_full_of_compound(void)
+static void collapse_full_of_compound(struct collapse_context *context)
 {
 	void *p;
 
@@ -742,17 +716,12 @@ static void collapse_full_of_compound(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table full of compound pages", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table full of compound pages", p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_compound_extreme(void)
+static void collapse_compound_extreme(struct collapse_context *context)
 {
 	void *p;
 	int i;
@@ -798,18 +767,14 @@ static void collapse_compound_extreme(void)
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Collapse PTE table full of different compound pages", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table full of different compound pages",
+			  p, true);
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork(void)
+static void collapse_fork(struct collapse_context *context)
 {
 	int wstatus;
 	void *p;
@@ -835,13 +800,8 @@ static void collapse_fork(void)
 		fail("Fail");
 
 	fill_memory(p, page_size, 2 * page_size);
-
-	if (wait_for_scan("Collapse PTE table with single page shared with parent process", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table with single page shared with parent process",
+			  p, true);
 
 	validate_memory(p, 0, page_size);
 	munmap(p, hpage_pmd_size);
@@ -860,7 +820,7 @@ static void collapse_fork(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_fork_compound(void)
+static void collapse_fork_compound(struct collapse_context *context)
 {
 	int wstatus;
 	void *p;
@@ -896,14 +856,10 @@ static void collapse_fork_compound(void)
 	fill_memory(p, 0, page_size);
 
 	write_num("khugepaged/max_ptes_shared", hpage_pmd_nr - 1);
-	if (wait_for_scan("Collapse PTE table full of compound pages in child", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Collapse PTE table full of compound pages in child",
+			  p, true);
 	write_num("khugepaged/max_ptes_shared",
-		default_settings.khugepaged.max_ptes_shared);
+		  default_settings.khugepaged.max_ptes_shared);
 
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
@@ -922,7 +878,7 @@ static void collapse_fork_compound(void)
 	munmap(p, hpage_pmd_size);
 }
 
-static void collapse_max_ptes_shared()
+static void collapse_max_ptes_shared(struct collapse_context *context)
 {
 	int max_ptes_shared = read_num("khugepaged/max_ptes_shared");
 	int wstatus;
@@ -957,28 +913,22 @@ static void collapse_max_ptes_shared()
 	else
 		fail("Fail");
 
-	if (wait_for_scan("Do not collapse with max_ptes_shared exceeded", p))
-		fail("Timeout");
-	else if (!check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-
-	printf("Trigger CoW on page %d of %d...",
-	       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
-	fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) * page_size);
-	if (!check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
-
-
-	if (wait_for_scan("Collapse with max_ptes_shared PTEs shared", p))
-		fail("Timeout");
-	else if (check_huge(p))
-		success("OK");
-	else
-		fail("Fail");
+	context->collapse("Maybe collapse with max_ptes_shared exceeded",
+			  p, !context->enforce_pte_scan_limits);
+
+	if (context->enforce_pte_scan_limits) {
+		printf("Trigger CoW on page %d of %d...",
+		       hpage_pmd_nr - max_ptes_shared, hpage_pmd_nr);
+		fill_memory(p, 0, (hpage_pmd_nr - max_ptes_shared) *
+			    page_size);
+		if (!check_huge(p))
+			success("OK");
+		else
+			fail("Fail");
+
+		context->collapse("Collapse with max_ptes_shared PTEs shared",
+				  p, true);
+	}
 
 	validate_memory(p, 0, hpage_pmd_size);
 	munmap(p, hpage_pmd_size);
@@ -997,8 +947,27 @@ static void collapse_max_ptes_shared()
 	munmap(p, hpage_pmd_size);
 }
 
+static void khugepaged_collapse(const char *msg, char *p, bool expect)
+{
+	if (wait_for_scan(msg, p))
+		fail("Timeout");
+	else if (check_huge(p) == expect)
+		success("OK");
+	else
+		fail("Fail");
+}
+
 int main(void)
 {
+	struct collapse_context contexts[] = {
+		{
+			.name = "khugepaged",
+			.collapse = &khugepaged_collapse,
+			.enforce_pte_scan_limits = true,
+		},
+	};
+	int i;
+
 	setbuf(stdout, NULL);
 
 	page_size = getpagesize();
@@ -1014,18 +983,24 @@ int main(void)
 	adjust_settings();
 
 	alloc_at_fault();
-	collapse_full();
-	collapse_empty();
-	collapse_single_pte_entry();
-	collapse_max_ptes_none();
-	collapse_swapin_single_pte();
-	collapse_max_ptes_swap();
-	collapse_single_pte_entry_compound();
-	collapse_full_of_compound();
-	collapse_compound_extreme();
-	collapse_fork();
-	collapse_fork_compound();
-	collapse_max_ptes_shared();
+
+	for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) {
+		struct collapse_context *c = &contexts[i];
+
+		printf("\n*** Testing context: %s ***\n", c->name);
+		collapse_full(c);
+		collapse_empty(c);
+		collapse_single_pte_entry(c);
+		collapse_max_ptes_none(c);
+		collapse_swapin_single_pte(c);
+		collapse_max_ptes_swap(c);
+		collapse_single_pte_entry_compound(c);
+		collapse_full_of_compound(c);
+		collapse_compound_extreme(c);
+		collapse_fork(c);
+		collapse_fork_compound(c);
+		collapse_max_ptes_shared(c);
+	}
 
 	restore_settings(0);
 }

From patchwork Wed May 4 21:44:36 2022
X-Gm-Message-State: AOAM532pBC0wvjyasG2DC2jS4KYPIvI9LgYq6pyffhdUZ2YJA96BB8tw GF6Z4S/XMLVfIp8MBh6wBkdla5ffpvYj X-Google-Smtp-Source: ABdhPJy59f17KuCsTXsfdNS84v2lYYHGLYA7G6vD6O8G6XMs2NS4Wm7iVuWEKqRlij49CBJW2O0ZI1amVJYv X-Received: from zokeefe3.c.googlers.com ([fda3:e722:ac3:cc00:7f:e700:c0a8:1b6]) (user=zokeefe job=sendgmr) by 2002:a17:902:e806:b0:15e:a456:e528 with SMTP id u6-20020a170902e80600b0015ea456e528mr17388593plg.114.1651700724991; Wed, 04 May 2022 14:45:24 -0700 (PDT) Date: Wed, 4 May 2022 14:44:36 -0700 In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com> Message-Id: <20220504214437.2850685-13-zokeefe@google.com> Mime-Version: 1.0 References: <20220504214437.2850685-1-zokeefe@google.com> X-Mailer: git-send-email 2.36.0.464.gb9c8b46e94-goog Subject: [PATCH v5 12/13] selftests/vm: add MADV_COLLAPSE collapse context to selftests From: "Zach O'Keefe" To: Alex Shi , David Hildenbrand , David Rientjes , Matthew Wilcox , Michal Hocko , Pasha Tatashin , Peter Xu , SeongJae Park , Song Liu , Vlastimil Babka , Yang Shi , Zi Yan , linux-mm@kvack.org Cc: Andrea Arcangeli , Andrew Morton , Arnd Bergmann , Axel Rasmussen , Chris Kennelly , Chris Zankel , Helge Deller , Hugh Dickins , Ivan Kokshaysky , "James E.J. Bottomley" , Jens Axboe , "Kirill A. Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Rspamd-Queue-Id: 358AA140081 X-Stat-Signature: czt5r9ra9mfdydp3tyqs87q7se8jbp4a Authentication-Results: imf09.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=F0T9mdOV; dmarc=pass (policy=reject) header.from=google.com; spf=pass (imf09.hostedemail.com: domain of 39PNyYgcKCME6vrllmlnvvnsl.jvtspu14-ttr2hjr.vyn@flex--zokeefe.bounces.google.com designates 209.85.214.202 as permitted sender) smtp.mailfrom=39PNyYgcKCME6vrllmlnvvnsl.jvtspu14-ttr2hjr.vyn@flex--zokeefe.bounces.google.com X-Rspam-User: X-Rspamd-Server: rspam08 X-HE-Tag: 1651700720-218992 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add MADV_COLLAPSE selftests. Extend struct collapse_context to support context initialization/cleanup. This is used by madvise collapse context to "disable" and "enable" khugepaged, since it would otherwise interfere with the tests. The mechanism used to "disable" khugepaged is a hack: it sets /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs to a large value and feeds khugepaged enough suitable VMAs/pages to keep khugepaged sleeping for the duration of the madvise collapse tests. Since khugepaged is woken when this file is written, enough VMAs must be queued to put khugepaged back to sleep when the tests write to this file in write_settings(). 
Signed-off-by: Zach O'Keefe
---
 tools/testing/selftests/vm/khugepaged.c | 133 ++++++++++++++++++++++--
 1 file changed, 125 insertions(+), 8 deletions(-)

diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c
index c59d832fee96..e0ccc9443f78 100644
--- a/tools/testing/selftests/vm/khugepaged.c
+++ b/tools/testing/selftests/vm/khugepaged.c
@@ -14,17 +14,23 @@
 #ifndef MADV_PAGEOUT
 #define MADV_PAGEOUT 21
 #endif
+#ifndef MADV_COLLAPSE
+#define MADV_COLLAPSE 25
+#endif
 
 #define BASE_ADDR ((void *)(1UL << 30))
 static unsigned long hpage_pmd_size;
 static unsigned long page_size;
 static int hpage_pmd_nr;
+static int num_khugepaged_wakeups;
 
 #define THP_SYSFS "/sys/kernel/mm/transparent_hugepage/"
 #define PID_SMAPS "/proc/self/smaps"
 
 struct collapse_context {
 	const char *name;
+	bool (*init_context)(void);
+	bool (*cleanup_context)(void);
 	void (*collapse)(const char *msg, char *p, bool expect);
 	bool enforce_pte_scan_limits;
 };
@@ -264,6 +270,17 @@ static void write_num(const char *name, unsigned long num)
 	}
 }
 
+/*
+ * Use this macro instead of write_settings inside tests, and should
+ * be called at most once per callsite.
+ *
+ * Hack to statically count the number of times khugepaged is woken up due to
+ * writes to
+ * /sys/kernel/mm/transparent_hugepage/khugepaged/scan_sleep_millisecs,
+ * and is stored in __COUNTER__.
+ */
+#define WRITE_SETTINGS(s) do { __COUNTER__; write_settings(s); } while (0)
+
 static void write_settings(struct settings *settings)
 {
 	struct khugepaged_settings *khugepaged = &settings->khugepaged;
@@ -332,7 +349,7 @@ static void adjust_settings(void)
 {
 
 	printf("Adjust settings...");
-	write_settings(&default_settings);
+	WRITE_SETTINGS(&default_settings);
 	success("OK");
 }
 
@@ -440,20 +457,25 @@ static bool check_swap(void *addr, unsigned long size)
 	return swap;
 }
 
-static void *alloc_mapping(void)
+static void *alloc_mapping_at(void *at, size_t size)
 {
 	void *p;
 
-	p = mmap(BASE_ADDR, hpage_pmd_size, PROT_READ | PROT_WRITE,
-		 MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
-	if (p != BASE_ADDR) {
-		printf("Failed to allocate VMA at %p\n", BASE_ADDR);
+	p = mmap(at, size, PROT_READ | PROT_WRITE, MAP_ANONYMOUS | MAP_PRIVATE,
+		 -1, 0);
+	if (p != at) {
+		printf("Failed to allocate VMA at %p\n", at);
 		exit(EXIT_FAILURE);
 	}
 
 	return p;
 }
 
+static void *alloc_mapping(void)
+{
+	return alloc_mapping_at(BASE_ADDR, hpage_pmd_size);
+}
+
 static void fill_memory(int *p, unsigned long start, unsigned long end)
 {
 	int i;
@@ -573,7 +595,7 @@ static void collapse_max_ptes_none(struct collapse_context *context)
 	void *p;
 
 	settings.khugepaged.max_ptes_none = max_ptes_none;
-	write_settings(&settings);
+	WRITE_SETTINGS(&settings);
 
 	p = alloc_mapping();
 
@@ -591,7 +613,7 @@ static void collapse_max_ptes_none(struct collapse_context *context)
 	}
 
 	munmap(p, hpage_pmd_size);
-	write_settings(&default_settings);
+	WRITE_SETTINGS(&default_settings);
 }
 
 static void collapse_swapin_single_pte(struct collapse_context *context)
@@ -947,6 +969,87 @@ static void collapse_max_ptes_shared(struct collapse_context *context)
 	munmap(p, hpage_pmd_size);
 }
 
+static void madvise_collapse(const char *msg, char *p, bool expect)
+{
+	int ret;
+
+	printf("%s...", msg);
+	/* Sanity check */
+	if (check_huge(p)) {
+		printf("Unexpected huge page\n");
+		exit(EXIT_FAILURE);
+	}
+
+	madvise(p, hpage_pmd_size, MADV_HUGEPAGE);
+	ret = madvise(p, hpage_pmd_size, MADV_COLLAPSE);
+	if (((bool)ret) == expect)
+		fail("Fail: Bad return value");
+	else if (check_huge(p) != expect)
+		fail("Fail: check_huge()");
+	else
+		success("OK");
+}
+
+static struct khugepaged_disable_state {
+	void *p;
+	size_t map_size;
+} khugepaged_disable_state;
+
+static bool disable_khugepaged(void)
+{
+	/*
+	 * Hack to "disable" khugepaged by setting
+	 * /transparent_hugepage/khugepaged/scan_sleep_millisecs to some large
+	 * value, then feeding it enough suitable VMAs to scan and subsequently
+	 * sleep.
+	 *
+	 * khugepaged is woken up on writes to
+	 * /transparent_hugepage/khugepaged/scan_sleep_millisecs, so care must
+	 * be taken to not inadvertently wake khugepaged in these tests.
+	 *
+	 * Feed khugepaged 1 hugepage-sized VMA to scan and sleep on, then
+	 * N more for each time khugepaged would be woken up.
+	 */
+	size_t map_size = (num_khugepaged_wakeups + 1) * hpage_pmd_size;
+	void *p;
+	bool ret = true;
+	int full_scans;
+	int timeout = 6; /* 3 seconds */
+
+	default_settings.khugepaged.scan_sleep_millisecs = 1000 * 60 * 10;
+	default_settings.khugepaged.pages_to_scan = 1;
+	write_settings(&default_settings);
+
+	p = alloc_mapping_at(((char *)BASE_ADDR) + (1UL << 30), map_size);
+	fill_memory(p, 0, map_size);
+
+	full_scans = read_num("khugepaged/full_scans") + 2;
+
+	printf("disabling khugepaged...");
+	while (timeout--) {
+		if (read_num("khugepaged/full_scans") >= full_scans) {
+			fail("Fail");
+			ret = false;
+			break;
+		}
+		printf(".");
+		usleep(TICK);
+	}
+	success("OK");
+	khugepaged_disable_state.p = p;
+	khugepaged_disable_state.map_size = map_size;
+	return ret;
+}
+
+static bool enable_khugepaged(void)
+{
+	printf("enabling khugepaged...");
+	munmap(khugepaged_disable_state.p, khugepaged_disable_state.map_size);
+	write_settings(&saved_settings);
+	success("OK");
+	return true;
+}
+
 static void khugepaged_collapse(const char *msg, char *p, bool expect)
 {
 	if (wait_for_scan(msg, p))
@@ -962,9 +1065,18 @@ int main(void)
 	struct collapse_context contexts[] = {
 		{
 			.name = "khugepaged",
+			.init_context = NULL,
+			.cleanup_context = NULL,
 			.collapse = &khugepaged_collapse,
 			.enforce_pte_scan_limits = true,
 		},
+		{
+			.name = "madvise",
+			.init_context = &disable_khugepaged,
+			.cleanup_context = &enable_khugepaged,
+			.collapse = &madvise_collapse,
+			.enforce_pte_scan_limits = false,
+		},
 	};
 	int i;
 
@@ -973,6 +1085,7 @@ int main(void)
 	page_size = getpagesize();
 	hpage_pmd_size = read_num("hpage_pmd_size");
 	hpage_pmd_nr = hpage_pmd_size / page_size;
+	num_khugepaged_wakeups = __COUNTER__;
 
 	default_settings.khugepaged.max_ptes_none = hpage_pmd_nr - 1;
 	default_settings.khugepaged.max_ptes_swap = hpage_pmd_nr / 8;
@@ -988,6 +1101,8 @@ int main(void)
 		struct collapse_context *c = &contexts[i];
 
 		printf("\n*** Testing context: %s ***\n", c->name);
+		if (c->init_context && !c->init_context())
+			continue;
 		collapse_full(c);
 		collapse_empty(c);
 		collapse_single_pte_entry(c);
@@ -1000,6 +1115,8 @@ int main(void)
 		collapse_fork(c);
 		collapse_fork_compound(c);
 		collapse_max_ptes_shared(c);
+		if (c->cleanup_context && !c->cleanup_context())
+			break;
 	}
 
 	restore_settings(0);
 }

From patchwork Wed May 4 21:44:37 2022
Date: Wed, 4 May 2022 14:44:37 -0700
In-Reply-To: <20220504214437.2850685-1-zokeefe@google.com>
Message-Id: <20220504214437.2850685-14-zokeefe@google.com>
References: <20220504214437.2850685-1-zokeefe@google.com>
Subject: [PATCH v5 13/13] selftests/vm: add test to verify recollapse of THPs
From: "Zach O'Keefe"
Shutemov" , Matt Turner , Max Filippov , Miaohe Lin , Minchan Kim , Patrick Xia , Pavel Begunkov , Thomas Bogendoerfer , "Zach O'Keefe" X-Rspamd-Server: rspam04 X-Rspamd-Queue-Id: C33954007E X-Stat-Signature: kbk35ug7gs7ji5fxzigcwp593kp73i1u X-Rspam-User: Authentication-Results: imf12.hostedemail.com; dkim=pass header.d=google.com header.s=20210112 header.b=PwS+NtKc; spf=pass (imf12.hostedemail.com: domain of 39_NyYgcKCMQ9yuoopoqyyqvo.mywvsx47-wwu5kmu.y1q@flex--zokeefe.bounces.google.com designates 209.85.214.201 as permitted sender) smtp.mailfrom=39_NyYgcKCMQ9yuoopoqyyqvo.mywvsx47-wwu5kmu.y1q@flex--zokeefe.bounces.google.com; dmarc=pass (policy=reject) header.from=google.com X-HE-Tag: 1651700711-425305 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: Add selftest specific to madvise collapse context that tests MADV_COLLAPSE is "successful" if a hugepage-algined/sized region is already pmd-mapped. Signed-off-by: Zach O'Keefe --- tools/testing/selftests/vm/khugepaged.c | 32 +++++++++++++++++++++++++ 1 file changed, 32 insertions(+) diff --git a/tools/testing/selftests/vm/khugepaged.c b/tools/testing/selftests/vm/khugepaged.c index e0ccc9443f78..c36d04218083 100644 --- a/tools/testing/selftests/vm/khugepaged.c +++ b/tools/testing/selftests/vm/khugepaged.c @@ -969,6 +969,32 @@ static void collapse_max_ptes_shared(struct collapse_context *context) munmap(p, hpage_pmd_size); } +static void madvise_collapse_existing_thps(void) +{ + void *p; + int err; + + p = alloc_mapping(); + fill_memory(p, 0, hpage_pmd_size); + + printf("Collapse fully populated PTE table..."); + madvise(p, hpage_pmd_size, MADV_HUGEPAGE); + err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); + if (err == 0 && check_huge(p)) { + success("OK"); + printf("Re-collapse PMD-mapped hugepage"); + err = madvise(p, hpage_pmd_size, MADV_COLLAPSE); + if (err == 0 && check_huge(p)) + success("OK"); + else + fail("Fail"); + } else { + fail("Fail"); + } + validate_memory(p, 0, hpage_pmd_size); + munmap(p, hpage_pmd_size); +} + static void madvise_collapse(const char *msg, char *p, bool expect) { int ret; @@ -1097,6 +1123,7 @@ int main(void) alloc_at_fault(); + /* Shared tests */ for (i = 0; i < sizeof(contexts) / sizeof(contexts[0]); ++i) { struct collapse_context *c = &contexts[i]; @@ -1119,5 +1146,10 @@ int main(void) break; } + /* madvise-specific tests */ + disable_khugepaged(); + madvise_collapse_existing_thps(); + enable_khugepaged(); + restore_settings(0); }