From patchwork Tue Mar 8 21:34:04 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774390
Date: Tue, 8 Mar 2022 13:34:04 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-2-zokeefe@google.com>
Subject: [RFC PATCH 01/14] mm/rmap: add mm_find_pmd_raw helper
From: "Zach O'Keefe"
To: Alex Shi, David Hildenbrand, David Rientjes, Michal Hocko,
    Pasha Tatashin, SeongJae Park, Song Liu, Vlastimil Babka, Zi Yan,
    linux-mm@kvack.org
Cc: Andrea Arcangeli, Andrew Morton, Arnd Bergmann, Axel Rasmussen,
    Chris Kennelly, Chris Zankel, Helge Deller, Hugh Dickins,
    Ivan Kokshaysky, "James E.J. Bottomley", Jens Axboe,
    "Kirill A. Shutemov", Matthew Wilcox, Matt Turner, Max Filippov,
    Miaohe Lin, Minchan Kim, Patrick Xia, Pavel Begunkov, Peter Xu,
    Richard Henderson, Thomas Bogendoerfer, Yang Shi, "Zach O'Keefe"

Later in the series, we want to find a pmd and take different actions
depending on whether the pmd maps a thp or not. Currently, mm_find_pmd()
returns NULL if a valid pmd maps a thp, so we can't use it directly.

Split mm_find_pmd() into two parts: mm_find_pmd_raw(), which returns a raw
pmd pointer, and the logic that filters out non-present, none, or huge
pmds. mm_find_pmd_raw() can then be reused later in the series.

Signed-off-by: Zach O'Keefe
---
 mm/internal.h |  1 +
 mm/rmap.c     | 15 +++++++++++++--
 2 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/mm/internal.h b/mm/internal.h
index 86277d90a5e2..aaea25bb9096 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -166,6 +166,7 @@ extern void reclaim_throttle(pg_data_t *pgdat, enum vmscan_throttle_state reason
 /*
  * in mm/rmap.c:
  */
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address);
 extern pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address);
 
 /*
diff --git a/mm/rmap.c b/mm/rmap.c
index 70375c331083..0ae99affcb27 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -758,13 +758,12 @@ unsigned long page_address_in_vma(struct page *page, struct vm_area_struct *vma)
 	return vma_address(page, vma);
 }
 
-pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+pmd_t *mm_find_pmd_raw(struct mm_struct *mm, unsigned long address)
 {
 	pgd_t *pgd;
 	p4d_t *p4d;
 	pud_t *pud;
 	pmd_t *pmd = NULL;
-	pmd_t pmde;
 
 	pgd = pgd_offset(mm, address);
 	if (!pgd_present(*pgd))
@@ -779,6 +778,18 @@ pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
 		goto out;
 
 	pmd = pmd_offset(pud, address);
+out:
+	return pmd;
+}
+
+pmd_t *mm_find_pmd(struct mm_struct *mm, unsigned long address)
+{
+	pmd_t pmde;
+	pmd_t *pmd;
+
+	pmd = mm_find_pmd_raw(mm, address);
+	if (!pmd)
+		goto out;
 	/*
 	 * Some THP functions use the sequence pmdp_huge_clear_flush(), set_pmd_at()
 	 * without holding anon_vma lock for write.  So when looking for a
 	 * genuine pmde (in which to find pte), test present and !THP together.
 	 */
 	pmde = *pmd;
 	barrier();
 	if (!pmd_present(pmde) || pmd_trans_huge(pmde))
 		pmd = NULL;
 out:
 	return pmd;
 }
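Not part of the patch: a minimal sketch of how a later series user is
expected to consume the raw variant, taking a different action when the
pmd already maps a thp (the branch bodies here are illustrative
assumptions, not code from this series):

	pmd_t *pmd = mm_find_pmd_raw(mm, address);

	if (pmd) {
		pmd_t pmde = *pmd;	/* simplified; pair with barrier() as in mm_find_pmd() */

		if (pmd_trans_huge(pmde)) {
			/* already a thp: mm_find_pmd() would return NULL here */
		} else if (pmd_present(pmde)) {
			/* maps a pte table: the normal collapse candidate */
		}
	}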
From patchwork Tue Mar 8 21:34:05 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774391
Date: Tue, 8 Mar 2022 13:34:05 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-3-zokeefe@google.com>
Subject: [RFC PATCH 02/14] mm/khugepaged: add struct collapse_control
Modularize huge page collapse by introducing struct collapse_control.
This structure describes the properties of the requested collapse and
serves as a local scratch pad for use during the collapse itself.

Later in the series, when the madvise collapse context is introduced, we
will want to ignore khugepaged_max_ptes_[none|swap|shared] in that
context, so whether to enforce those limits is included here as a
property of the requested collapse.
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 120 ++++++++++++++++++++++++++++++------------------
 1 file changed, 76 insertions(+), 44 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a4e5eaf3eb01..36fc0099c445 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -85,6 +85,24 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
 
 #define MAX_PTE_MAPPED_THP 8
 
+struct collapse_control {
+	/* Respect khugepaged_max_ptes_[none|swap|shared] */
+	bool enforce_pte_scan_limits;
+
+	/* Num pages scanned per node */
+	int node_load[MAX_NUMNODES];
+
+	/* Last target selected in khugepaged_find_target_node() for this scan */
+	int last_target_node;
+};
+
+static void collapse_control_init(struct collapse_control *cc,
+				  bool enforce_pte_scan_limits)
+{
+	cc->enforce_pte_scan_limits = enforce_pte_scan_limits;
+	cc->last_target_node = NUMA_NO_NODE;
+}
+
 /**
  * struct mm_slot - hash lookup from mm to mm_slot
  * @hash: hash collision list
@@ -601,6 +619,7 @@ static bool is_refcount_suitable(struct page *page)
 static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 					unsigned long address,
 					pte_t *pte,
+					bool enforce_pte_scan_limits,
 					struct list_head *compound_pagelist)
 {
 	struct page *page = NULL;
@@ -614,7 +633,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 		if (pte_none(pteval) || (pte_present(pteval) &&
 				is_zero_pfn(pte_pfn(pteval)))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !enforce_pte_scan_limits)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -634,8 +654,8 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 
 		VM_BUG_ON_PAGE(!PageAnon(page), page);
 
-		if (page_mapcount(page) > 1 &&
-				++shared > khugepaged_max_ptes_shared) {
+		if (page_mapcount(page) > 1 && enforce_pte_scan_limits &&
+		    ++shared > khugepaged_max_ptes_shared) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out;
@@ -785,9 +805,7 @@ static void khugepaged_alloc_sleep(void)
 	remove_wait_queue(&khugepaged_wait, &wait);
 }
 
-static int khugepaged_node_load[MAX_NUMNODES];
-
-static bool khugepaged_scan_abort(int nid)
+static bool khugepaged_scan_abort(int nid, struct collapse_control *cc)
 {
 	int i;
 
@@ -799,11 +817,11 @@ static bool khugepaged_scan_abort(int nid)
 		return false;
 
 	/* If there is a count for this node already, it must be acceptable */
-	if (khugepaged_node_load[nid])
+	if (cc->node_load[nid])
 		return false;
 
 	for (i = 0; i < MAX_NUMNODES; i++) {
-		if (!khugepaged_node_load[i])
+		if (!cc->node_load[i])
 			continue;
 		if (node_distance(nid, i) > node_reclaim_distance)
 			return true;
@@ -818,28 +836,28 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }
 
 #ifdef CONFIG_NUMA
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
-	static int last_khugepaged_target_node = NUMA_NO_NODE;
 	int nid, target_node = 0, max_value = 0;
 
 	/* find first node with max normal pages hit */
 	for (nid = 0; nid < MAX_NUMNODES; nid++)
-		if (khugepaged_node_load[nid] > max_value) {
-			max_value = khugepaged_node_load[nid];
+		if (cc->node_load[nid] > max_value) {
+			max_value = cc->node_load[nid];
 			target_node = nid;
 		}
 
 	/* do some balance if several nodes have the same hit record */
-	if (target_node <= last_khugepaged_target_node)
-		for (nid = last_khugepaged_target_node + 1; nid < MAX_NUMNODES;
-		     nid++)
-			if (max_value == khugepaged_node_load[nid]) {
+	if (target_node <= cc->last_target_node)
+		for (nid = cc->last_target_node + 1; nid < MAX_NUMNODES;
+		     nid++) {
+			if (max_value == cc->node_load[nid]) {
 				target_node = nid;
 				break;
 			}
+		}
 
-	last_khugepaged_target_node = target_node;
+	cc->last_target_node = target_node;
 	return target_node;
 }
 
@@ -877,7 +895,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	return *hpage;
 }
 #else
-static int khugepaged_find_target_node(void)
+static int khugepaged_find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -1043,7 +1061,8 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 static void collapse_huge_page(struct mm_struct *mm,
 			       unsigned long address,
 			       struct page **hpage,
-			       int node, int referenced, int unmapped)
+			       int node, int referenced, int unmapped,
+			       int enforce_pte_scan_limits)
 {
 	LIST_HEAD(compound_pagelist);
 	pmd_t *pmd, _pmd;
@@ -1141,7 +1160,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 
 	spin_lock(pte_ptl);
 	isolated = __collapse_huge_page_isolate(vma, address, pte,
-			&compound_pagelist);
+			enforce_pte_scan_limits, &compound_pagelist);
 	spin_unlock(pte_ptl);
 
 	if (unlikely(!isolated)) {
@@ -1206,7 +1225,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 static int khugepaged_scan_pmd(struct mm_struct *mm,
 			       struct vm_area_struct *vma,
 			       unsigned long address,
-			       struct page **hpage)
+			       struct page **hpage,
+			       struct collapse_control *cc)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
@@ -1226,13 +1246,14 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		goto out;
 	}
 
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
 	for (_address = address, _pte = pte; _pte < pte+HPAGE_PMD_NR;
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
-			if (++unmapped <= khugepaged_max_ptes_swap) {
+			if (++unmapped <= khugepaged_max_ptes_swap ||
+			    !cc->enforce_pte_scan_limits) {
 				/*
 				 * Always be strict with uffd-wp
 				 * enabled swap entries.  Please see
@@ -1251,7 +1272,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			if (!userfaultfd_armed(vma) &&
-			    ++none_or_zero <= khugepaged_max_ptes_none) {
+			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			     !cc->enforce_pte_scan_limits)) {
 				continue;
 			} else {
 				result = SCAN_EXCEED_NONE_PTE;
@@ -1282,7 +1304,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		}
 
 		if (page_mapcount(page) > 1 &&
-				++shared > khugepaged_max_ptes_shared) {
+				++shared > khugepaged_max_ptes_shared &&
+				cc->enforce_pte_scan_limits) {
 			result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out_unmap;
@@ -1292,16 +1315,16 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 
 		/*
 		 * Record which node the original page is from and save this
-		 * information to khugepaged_node_load[].
+		 * information to cc->node_load[].
 		 * Khugepaged will allocate hugepage from the node has the max
 		 * hit record.
 		 */
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
 			goto out_unmap;
@@ -1352,10 +1375,11 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
 	if (ret) {
-		node = khugepaged_find_target_node();
+		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
 		collapse_huge_page(mm, address, hpage, node,
-				referenced, unmapped);
+				referenced, unmapped,
+				cc->enforce_pte_scan_limits);
 	}
 out:
 	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
@@ -1992,7 +2016,8 @@ static void collapse_file(struct mm_struct *mm,
 }
 
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	struct page *page = NULL;
 	struct address_space *mapping = file->f_mapping;
@@ -2003,14 +2028,15 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 
 	present = 0;
 	swap = 0;
-	memset(khugepaged_node_load, 0, sizeof(khugepaged_node_load));
+	memset(cc->node_load, 0, sizeof(cc->node_load));
 	rcu_read_lock();
 	xas_for_each(&xas, page, start + HPAGE_PMD_NR - 1) {
 		if (xas_retry(&xas, page))
 			continue;
 
 		if (xa_is_value(page)) {
-			if (++swap > khugepaged_max_ptes_swap) {
+			if (cc->enforce_pte_scan_limits &&
+			    ++swap > khugepaged_max_ptes_swap) {
 				result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				break;
@@ -2028,11 +2054,11 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 		}
 
 		node = page_to_nid(page);
-		if (khugepaged_scan_abort(node)) {
+		if (khugepaged_scan_abort(node, cc)) {
 			result = SCAN_SCAN_ABORT;
 			break;
 		}
-		khugepaged_node_load[node]++;
+		cc->node_load[node]++;
 
 		if (!PageLRU(page)) {
 			result = SCAN_PAGE_LRU;
@@ -2061,11 +2087,12 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 	rcu_read_unlock();
 
 	if (result == SCAN_SUCCEED) {
-		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none) {
+		if (present < HPAGE_PMD_NR - khugepaged_max_ptes_none &&
+		    cc->enforce_pte_scan_limits) {
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node();
+			node = khugepaged_find_target_node(cc);
 			collapse_file(mm, file, start, hpage, node);
 		}
 	}
@@ -2074,7 +2101,8 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 }
 #else
 static void khugepaged_scan_file(struct mm_struct *mm,
-		struct file *file, pgoff_t start, struct page **hpage)
+		struct file *file, pgoff_t start, struct page **hpage,
+		struct collapse_control *cc)
 {
 	BUILD_BUG();
 }
@@ -2085,7 +2113,8 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 #endif
 
 static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
-					    struct page **hpage)
+					    struct page **hpage,
+					    struct collapse_control *cc)
 	__releases(&khugepaged_mm_lock)
 	__acquires(&khugepaged_mm_lock)
 {
@@ -2161,12 +2190,12 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 
 				mmap_read_unlock(mm);
 				ret = 1;
-				khugepaged_scan_file(mm, file, pgoff, hpage);
+				khugepaged_scan_file(mm, file, pgoff, hpage, cc);
 				fput(file);
 			} else {
 				ret = khugepaged_scan_pmd(mm, vma,
 						khugepaged_scan.address,
-						hpage);
+						hpage, cc);
 			}
 			/* move to next address */
 			khugepaged_scan.address += HPAGE_PMD_SIZE;
@@ -2222,7 +2251,7 @@ static int khugepaged_wait_event(void)
 		kthread_should_stop();
 }
 
-static void khugepaged_do_scan(void)
+static void khugepaged_do_scan(struct collapse_control *cc)
 {
 	struct page *hpage = NULL;
 	unsigned int progress = 0, pass_through_head = 0;
@@ -2246,7 +2275,7 @@ static void khugepaged_do_scan(void)
 		if (khugepaged_has_work() &&
 		    pass_through_head < 2)
 			progress += khugepaged_scan_mm_slot(pages - progress,
-							    &hpage);
+							    &hpage, cc);
 		else
 			progress = pages;
 		spin_unlock(&khugepaged_mm_lock);
@@ -2285,12 +2314,15 @@ static void khugepaged_wait_work(void)
 static int khugepaged(void *none)
 {
 	struct mm_slot *mm_slot;
+	struct collapse_control cc;
+
+	collapse_control_init(&cc, /* enforce_pte_scan_limits= */ 1);
 
 	set_freezable();
 	set_user_nice(current, MAX_NICE);
 
 	while (!kthread_should_stop()) {
-		khugepaged_do_scan();
+		khugepaged_do_scan(&cc);
 		khugepaged_wait_work();
 	}
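A sketch of what the new knob is for (the 'false' caller is an assumption
about the madvise context introduced later in the series; khugepaged
itself passes 1 as above):

	struct collapse_control cc;

	/* a non-khugepaged context can opt out of the sysfs-tunable limits */
	collapse_control_init(&cc, /* enforce_pte_scan_limits= */ false);

	/* cc.node_load[] is zeroed by each scan (memset in the scan paths) */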
From patchwork Tue Mar 8 21:34:06 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774392
Date: Tue, 8 Mar 2022 13:34:06 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-4-zokeefe@google.com>
Subject: [RFC PATCH 03/14] mm/khugepaged: add __do_collapse_huge_page() helper

collapse_huge_page() currently does: (1) possibly allocates a hugepage,
(2) charges the owning memcg, (3) swaps in swapped-out pages, (4) does
the actual collapse (copying of pages, installation of the huge pmd), and
(5) does some final memcg accounting on the error path. Separate out (4)
so that it can be reused by itself later in the series.
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 178 +++++++++++++++++++++++++++---------------------
 1 file changed, 100 insertions(+), 78 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 36fc0099c445..e3399a451662 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1058,85 +1058,23 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	return true;
 }
 
-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       int node, int referenced, int unmapped,
-			       int enforce_pte_scan_limits)
-{
-	LIST_HEAD(compound_pagelist);
-	pmd_t *pmd, _pmd;
+static int __do_collapse_huge_page(struct mm_struct *mm,
+				   struct vm_area_struct *vma,
+				   unsigned long address, pmd_t *pmd,
+				   struct page *new_page,
+				   int enforce_pte_scan_limits,
+				   int *isolated_out)
+{
+	pmd_t _pmd;
 	pte_t *pte;
 	pgtable_t pgtable;
-	struct page *new_page;
 	spinlock_t *pmd_ptl, *pte_ptl;
-	int isolated = 0, result = 0;
-	struct vm_area_struct *vma;
+	int isolated = 0, result = SCAN_SUCCEED;
 	struct mmu_notifier_range range;
-	gfp_t gfp;
-
-	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
-
-	/* Only allocate from the target node */
-	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
-
-	/*
-	 * Before allocating the hugepage, release the mmap_lock read lock.
-	 * The allocation can take potentially a long time if it involves
-	 * sync compaction, and we do not need to hold the mmap_lock during
-	 * that. We will recheck the vma after taking it again in write mode.
-	 */
-	mmap_read_unlock(mm);
-	new_page = khugepaged_alloc_page(hpage, gfp, node);
-	if (!new_page) {
-		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
-		goto out_nolock;
-	}
-
-	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
-		result = SCAN_CGROUP_CHARGE_FAIL;
-		goto out_nolock;
-	}
-	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
-
-	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma);
-	if (result) {
-		mmap_read_unlock(mm);
-		goto out_nolock;
-	}
-
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		result = SCAN_PMD_NULL;
-		mmap_read_unlock(mm);
-		goto out_nolock;
-	}
-
-	/*
-	 * __collapse_huge_page_swapin always returns with mmap_lock locked.
-	 * If it fails, we release mmap_lock and jump out_nolock.
-	 * Continuing to collapse causes inconsistency.
-	 */
-	if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
-						     pmd, referenced)) {
-		mmap_read_unlock(mm);
-		goto out_nolock;
-	}
+	LIST_HEAD(compound_pagelist);
 
-	mmap_read_unlock(mm);
-	/*
-	 * Prevent all access to pagetables with the exception of
-	 * gup_fast later handled by the ptep_clear_flush and the VM
-	 * handled by the anon_vma lock + PG_lock.
-	 */
-	mmap_write_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma);
-	if (result)
-		goto out_up_write;
-	/* check if the pmd is still valid */
-	if (mm_find_pmd(mm, address) != pmd)
-		goto out_up_write;
+	VM_BUG_ON(!new_page);
+	mmap_assert_write_locked(mm);
 
 	anon_vma_lock_write(vma->anon_vma);
 
@@ -1176,7 +1114,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 		spin_unlock(pmd_ptl);
 		anon_vma_unlock_write(vma->anon_vma);
 		result = SCAN_FAIL;
-		goto out_up_write;
+		goto out;
 	}
 
 	/*
@@ -1208,11 +1146,95 @@ static void collapse_huge_page(struct mm_struct *mm,
 	set_pmd_at(mm, address, pmd, _pmd);
 	update_mmu_cache_pmd(vma, address, pmd);
 	spin_unlock(pmd_ptl);
+out:
+	if (isolated_out)
+		*isolated_out = isolated;
+	return result;
+}
 
-	*hpage = NULL;
-	khugepaged_pages_collapsed++;
-	result = SCAN_SUCCEED;
+static void collapse_huge_page(struct mm_struct *mm,
+			       unsigned long address,
+			       struct page **hpage,
+			       int node, int referenced, int unmapped,
+			       int enforce_pte_scan_limits)
+{
+	pmd_t *pmd;
+	struct page *new_page;
+	int isolated = 0, result = 0;
+	struct vm_area_struct *vma;
+	gfp_t gfp;
+
+	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
+
+	/* Only allocate from the target node */
+	gfp = alloc_hugepage_khugepaged_gfpmask() | __GFP_THISNODE;
+
+	/*
+	 * Before allocating the hugepage, release the mmap_lock read lock.
+	 * The allocation can take potentially a long time if it involves
+	 * sync compaction, and we do not need to hold the mmap_lock during
+	 * that. We will recheck the vma after taking it again in write mode.
+	 */
+	mmap_read_unlock(mm);
+	new_page = khugepaged_alloc_page(hpage, gfp, node);
+	if (!new_page) {
+		result = SCAN_ALLOC_HUGE_PAGE_FAIL;
+		goto out_nolock;
+	}
+
+	if (unlikely(mem_cgroup_charge(page_folio(new_page), mm, gfp))) {
+		result = SCAN_CGROUP_CHARGE_FAIL;
+		goto out_nolock;
+	}
+	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
+
+	mmap_read_lock(mm);
+	result = hugepage_vma_revalidate(mm, address, &vma);
+	if (result) {
+		mmap_read_unlock(mm);
+		goto out_nolock;
+	}
+
+	pmd = mm_find_pmd(mm, address);
+	if (!pmd) {
+		result = SCAN_PMD_NULL;
+		mmap_read_unlock(mm);
+		goto out_nolock;
+	}
+
+	/*
+	 * __collapse_huge_page_swapin always returns with mmap_lock locked.
+	 * If it fails, we release mmap_lock and jump out_nolock.
+	 * Continuing to collapse causes inconsistency.
+	 */
+	if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
+						     pmd, referenced)) {
+		mmap_read_unlock(mm);
+		goto out_nolock;
+	}
+
+	mmap_read_unlock(mm);
+	/*
+	 * Prevent all access to pagetables with the exception of
+	 * gup_fast later handled by the ptep_clear_flush and the VM
+	 * handled by the anon_vma lock + PG_lock.
+	 */
+	mmap_write_lock(mm);
+
+	result = hugepage_vma_revalidate(mm, address, &vma);
+	if (result)
+		goto out_up_write;
+	/* check if the pmd is still valid */
+	if (mm_find_pmd(mm, address) != pmd)
+		goto out_up_write;
+
+	result = __do_collapse_huge_page(mm, vma, address, pmd, new_page,
+					 enforce_pte_scan_limits, &isolated);
+	if (result == SCAN_SUCCEED) {
+		*hpage = NULL;
+		khugepaged_pages_collapsed++;
+	}
 out_up_write:
 	mmap_write_unlock(mm);
 out_nolock:
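Condensed from the new collapse_huge_page() above (no new API, just the
contract restated): the extracted helper runs entirely under the
write-mode mmap_lock, after revalidation.

	mmap_write_lock(mm);
	result = hugepage_vma_revalidate(mm, address, &vma);
	if (!result && mm_find_pmd(mm, address) == pmd)
		result = __do_collapse_huge_page(mm, vma, address, pmd,
						 new_page,
						 enforce_pte_scan_limits,
						 &isolated);
	mmap_write_unlock(mm);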
From patchwork Tue Mar 8 21:34:07 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774393
Date: Tue, 8 Mar 2022 13:34:07 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-5-zokeefe@google.com>
Subject: [RFC PATCH 04/14] mm/khugepaged: separate khugepaged_scan_pmd() scan and collapse

khugepaged_scan_pmd() currently does: (1) scan the pmd to see if it's
suitable for collapse, then (2) do the collapse if the scan succeeds.
Separate out (1) so that it can be reused by itself later in the series,
and introduce struct scan_pmd_result to gather data about the scan.
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 107 ++++++++++++++++++++++++++++++------------------
 1 file changed, 67 insertions(+), 40 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e3399a451662..b204bc1eefa7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1244,27 +1244,34 @@ static void collapse_huge_page(struct mm_struct *mm,
 	return;
 }
 
-static int khugepaged_scan_pmd(struct mm_struct *mm,
-			       struct vm_area_struct *vma,
-			       unsigned long address,
-			       struct page **hpage,
-			       struct collapse_control *cc)
+struct scan_pmd_result {
+	int result;
+	bool writable;
+	int referenced;
+	int unmapped;
+	int none_or_zero;
+	struct page *head;
+};
+
+static void scan_pmd(struct mm_struct *mm,
+		     struct vm_area_struct *vma,
+		     unsigned long address,
+		     struct collapse_control *cc,
+		     struct scan_pmd_result *scan_result)
 {
 	pmd_t *pmd;
 	pte_t *pte, *_pte;
-	int ret = 0, result = 0, referenced = 0;
-	int none_or_zero = 0, shared = 0;
+	int shared = 0;
 	struct page *page = NULL;
 	unsigned long _address;
 	spinlock_t *ptl;
-	int node = NUMA_NO_NODE, unmapped = 0;
-	bool writable = false;
+	int node = NUMA_NO_NODE;
 
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	pmd = mm_find_pmd(mm, address);
 	if (!pmd) {
-		result = SCAN_PMD_NULL;
+		scan_result->result = SCAN_PMD_NULL;
 		goto out;
 	}
 
@@ -1274,7 +1281,8 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	     _pte++, _address += PAGE_SIZE) {
 		pte_t pteval = *_pte;
 		if (is_swap_pte(pteval)) {
-			if (++unmapped <= khugepaged_max_ptes_swap ||
+			if (++scan_result->unmapped <=
+			    khugepaged_max_ptes_swap ||
 			    !cc->enforce_pte_scan_limits) {
 				/*
 				 * Always be strict with uffd-wp
@@ -1282,23 +1290,24 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 				 * comment below for pte_uffd_wp().
 				 */
 				if (pte_swp_uffd_wp(pteval)) {
-					result = SCAN_PTE_UFFD_WP;
+					scan_result->result = SCAN_PTE_UFFD_WP;
 					goto out_unmap;
 				}
 				continue;
 			} else {
-				result = SCAN_EXCEED_SWAP_PTE;
+				scan_result->result = SCAN_EXCEED_SWAP_PTE;
 				count_vm_event(THP_SCAN_EXCEED_SWAP_PTE);
 				goto out_unmap;
 			}
 		}
 		if (pte_none(pteval) || is_zero_pfn(pte_pfn(pteval))) {
 			if (!userfaultfd_armed(vma) &&
-			    (++none_or_zero <= khugepaged_max_ptes_none ||
+			    (++scan_result->none_or_zero <=
+			     khugepaged_max_ptes_none ||
 			     !cc->enforce_pte_scan_limits)) {
 				continue;
 			} else {
-				result = SCAN_EXCEED_NONE_PTE;
+				scan_result->result = SCAN_EXCEED_NONE_PTE;
 				count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 				goto out_unmap;
 			}
@@ -1313,22 +1322,22 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 			 * userfault messages that falls outside of
 			 * the registered range.  So, just be simple.
 			 */
-			result = SCAN_PTE_UFFD_WP;
+			scan_result->result = SCAN_PTE_UFFD_WP;
 			goto out_unmap;
 		}
 		if (pte_write(pteval))
-			writable = true;
+			scan_result->writable = true;
 
 		page = vm_normal_page(vma, _address, pteval);
 		if (unlikely(!page)) {
-			result = SCAN_PAGE_NULL;
+			scan_result->result = SCAN_PAGE_NULL;
 			goto out_unmap;
 		}
 
 		if (page_mapcount(page) > 1 &&
 				++shared > khugepaged_max_ptes_shared &&
 				cc->enforce_pte_scan_limits) {
-			result = SCAN_EXCEED_SHARED_PTE;
+			scan_result->result = SCAN_EXCEED_SHARED_PTE;
 			count_vm_event(THP_SCAN_EXCEED_SHARED_PTE);
 			goto out_unmap;
 		}
@@ -1338,25 +1347,25 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		/*
 		 * Record which node the original page is from and save this
 		 * information to cc->node_load[].
-		 * Khugepaged will allocate hugepage from the node has the max
+		 * Caller should allocate hugepage from the node has the max
 		 * hit record.
 		 */
 		node = page_to_nid(page);
 		if (khugepaged_scan_abort(node, cc)) {
-			result = SCAN_SCAN_ABORT;
+			scan_result->result = SCAN_SCAN_ABORT;
 			goto out_unmap;
 		}
 		cc->node_load[node]++;
 		if (!PageLRU(page)) {
-			result = SCAN_PAGE_LRU;
+			scan_result->result = SCAN_PAGE_LRU;
 			goto out_unmap;
 		}
 		if (PageLocked(page)) {
-			result = SCAN_PAGE_LOCK;
+			scan_result->result = SCAN_PAGE_LOCK;
 			goto out_unmap;
 		}
 		if (!PageAnon(page)) {
-			result = SCAN_PAGE_ANON;
+			scan_result->result = SCAN_PAGE_ANON;
 			goto out_unmap;
 		}
 
@@ -1378,35 +1387,53 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 		 * will be done again later the risk seems low.
 		 */
 		if (!is_refcount_suitable(page)) {
-			result = SCAN_PAGE_COUNT;
+			scan_result->result = SCAN_PAGE_COUNT;
 			goto out_unmap;
 		}
 		if (pte_young(pteval) ||
 		    page_is_young(page) || PageReferenced(page) ||
 		    mmu_notifier_test_young(vma->vm_mm, address))
-			referenced++;
+			scan_result->referenced++;
 	}
-	if (!writable) {
-		result = SCAN_PAGE_RO;
-	} else if (!referenced || (unmapped && referenced < HPAGE_PMD_NR/2)) {
-		result = SCAN_LACK_REFERENCED_PAGE;
+	if (!scan_result->writable) {
+		scan_result->result = SCAN_PAGE_RO;
+	} else if (!scan_result->referenced ||
+		   (scan_result->unmapped &&
+		    scan_result->referenced < HPAGE_PMD_NR / 2)) {
+		scan_result->result = SCAN_LACK_REFERENCED_PAGE;
 	} else {
-		result = SCAN_SUCCEED;
-		ret = 1;
+		scan_result->result = SCAN_SUCCEED;
 	}
 out_unmap:
 	pte_unmap_unlock(pte, ptl);
-	if (ret) {
+out:
+	scan_result->head = page;
+}
+
+static int khugepaged_scan_pmd(struct mm_struct *mm,
+			       struct vm_area_struct *vma,
+			       unsigned long address,
+			       struct page **hpage,
+			       struct collapse_control *cc)
+{
+	int node;
+	struct scan_pmd_result scan_result = {};
+
+	scan_pmd(mm, vma, address, cc, &scan_result);
+	if (scan_result.result == SCAN_SUCCEED) {
 		node = khugepaged_find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, address, hpage, node,
-				referenced, unmapped,
-				cc->enforce_pte_scan_limits);
+		collapse_huge_page(mm, khugepaged_scan.address, hpage, node,
+				   scan_result.referenced, scan_result.unmapped,
+				   cc->enforce_pte_scan_limits);
 	}
-out:
-	trace_mm_khugepaged_scan_pmd(mm, page, writable, referenced,
-				     none_or_zero, result, unmapped);
-	return ret;
+
+	trace_mm_khugepaged_scan_pmd(mm, scan_result.head, scan_result.writable,
+				     scan_result.referenced,
+				     scan_result.none_or_zero,
+				     scan_result.result, scan_result.unmapped);
+
+	return scan_result.result == SCAN_SUCCEED;
 }
 
 static void collect_mm_slot(struct mm_slot *mm_slot)
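With scan and collapse decoupled, another context can run the scan alone;
a minimal sketch (hypothetical caller, mirroring the wrapper above):

	struct scan_pmd_result sr = {};

	scan_pmd(mm, vma, address, cc, &sr);	/* fills sr; never collapses */
	if (sr.result == SCAN_SUCCEED) {
		/* caller decides: collapse now, batch for later, or just
		 * report sr.referenced / sr.unmapped / sr.none_or_zero */
	}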
From patchwork Tue Mar 8 21:34:08 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774394
Date: Tue, 8 Mar 2022 13:34:08 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-6-zokeefe@google.com>
Subject: [RFC PATCH 05/14] mm/khugepaged: add mmap_assert_locked() checks to scan_pmd()

scan_pmd() requires the mmap_lock to be held in read mode. Add a lockdep
assertion to guard this condition, as scan_pmd() will be called from
other contexts later in the series.

Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index b204bc1eefa7..56f2ef7146c7 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1253,6 +1253,7 @@ struct scan_pmd_result {
 	struct page *head;
 };
 
+/* Called with mmap_lock held and does not drop it. */
 static void scan_pmd(struct mm_struct *mm,
 		     struct vm_area_struct *vma,
 		     unsigned long address,
@@ -1267,6 +1268,7 @@ static void scan_pmd(struct mm_struct *mm,
 	spinlock_t *ptl;
 	int node = NUMA_NO_NODE;
 
+	mmap_assert_locked(mm);
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
 	pmd = mm_find_pmd(mm, address);
From patchwork Tue Mar 8 21:34:09 2022
X-Patchwork-Submitter: Zach O'Keefe
X-Patchwork-Id: 12774395
Date: Tue, 8 Mar 2022 13:34:09 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-7-zokeefe@google.com>
Subject: [RFC PATCH 06/14] mm/khugepaged: add hugepage_vma_revalidate_pmd_count()

The madvise collapse context operates on pmds in batch, so we will want
to be able to revalidate a region that spans multiple pmds in the same
vma. Add hugepage_vma_revalidate_pmd_count(), which extends
hugepage_vma_revalidate() with the number of pmds to revalidate.
hugepage_vma_revalidate() now calls through this.
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 26 ++++++++++++++++++--------
 1 file changed, 18 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 56f2ef7146c7..1d20be47bcea 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -964,18 +964,17 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 #endif
 
 /*
- * If mmap_lock temporarily dropped, revalidate vma
- * before taking mmap_lock.
- * Return 0 if succeeds, otherwise return none-zero
- * value (scan code).
+ * Revalidate a vma's eligibility to collapse nr hugepages.
  */
-
-static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
-		struct vm_area_struct **vmap)
+static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
+					     unsigned long address, int nr,
+					     struct vm_area_struct **vmap)
 {
 	struct vm_area_struct *vma;
 	unsigned long hstart, hend;
 
+	mmap_assert_locked(mm);
+
 	if (unlikely(khugepaged_test_exit(mm)))
 		return SCAN_ANY_PROCESS;
 
@@ -985,7 +984,7 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 
 	hstart = (vma->vm_start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
 	hend = vma->vm_end & HPAGE_PMD_MASK;
-	if (address < hstart || address + HPAGE_PMD_SIZE > hend)
+	if (address < hstart || (address + nr * HPAGE_PMD_SIZE) > hend)
 		return SCAN_ADDRESS_RANGE;
 	if (!hugepage_vma_check(vma, vma->vm_flags))
 		return SCAN_VMA_CHECK;
@@ -995,6 +994,17 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 	return 0;
 }
 
+/*
+ * If mmap_lock temporarily dropped, revalidate vma before taking mmap_lock.
+ * Return 0 if succeeds, otherwise return none-zero value (scan code).
+ */
+
+static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
+				   struct vm_area_struct **vmap)
+{
+	return hugepage_vma_revalidate_pmd_count(mm, address, 1, vmap);
+}
+
 /*
  * Bring missing pages in from swap, to complete THP collapse.
  * Only done if khugepaged_scan_pmd believes it is worthwhile.
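A sketch of the batch revalidation this enables (hypothetical caller;
start is assumed hugepage-aligned and end > start, both chosen for
illustration):

	int nr = (end - start) >> HPAGE_PMD_SHIFT;
	int result;

	/* one call checks that [start, end) lies inside the vma's
	 * thp-aligned bounds and that the vma is still eligible */
	result = hugepage_vma_revalidate_pmd_count(mm, start, nr, &vma);
	if (result)
		return result;	/* e.g. SCAN_ADDRESS_RANGE or SCAN_VMA_CHECK */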
From patchwork Tue Mar 8 21:34:10 2022
Date: Tue, 8 Mar 2022 13:34:10 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-8-zokeefe@google.com>
Subject: [RFC PATCH 07/14] mm/khugepaged: add vm_flags_ignore to hugepage_vma_revalidate_pmd_count()
From: "Zach O'Keefe"
In madvise collapse context, we optionally want to be able to ignore advice
from MADV_NOHUGEPAGE-marked regions. Add a vm_flags_ignore argument to
hugepage_vma_revalidate_pmd_count() which can be used to mask off vm flags
from vma->vm_flags when considering thp eligibility.

Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 18 ++++++++++++------
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 1d20be47bcea..ecbd3fc41c80 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -964,10 +964,14 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 #endif
 
 /*
- * Revalidate a vma's eligibility to collapse nr hugepages.
+ * Revalidate a vma's eligibility to collapse nr hugepages. vm_flags_ignore
+ * can be used to ignore certain vma_flags that would otherwise be checked -
+ * the principal example being VM_NOHUGEPAGE which is ignored in madvise
+ * collapse context.
  */
 static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
 					     unsigned long address, int nr,
+					     unsigned long vm_flags_ignore,
 					     struct vm_area_struct **vmap)
 {
 	struct vm_area_struct *vma;
@@ -986,7 +990,7 @@ static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
 	hend = vma->vm_end & HPAGE_PMD_MASK;
 	if (address < hstart || (address + nr * HPAGE_PMD_SIZE) > hend)
 		return SCAN_ADDRESS_RANGE;
-	if (!hugepage_vma_check(vma, vma->vm_flags))
+	if (!hugepage_vma_check(vma, vma->vm_flags & ~vm_flags_ignore))
 		return SCAN_VMA_CHECK;
 	/* Anon VMA expected */
 	if (!vma->anon_vma || vma->vm_ops)
@@ -1000,9 +1004,11 @@ static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
  */
 
 static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
+				   unsigned long vm_flags_ignore,
 				   struct vm_area_struct **vmap)
 {
-	return hugepage_vma_revalidate_pmd_count(mm, address, 1, vmap);
+	return hugepage_vma_revalidate_pmd_count(mm, address, 1,
+						 vm_flags_ignore, vmap);
 }
 
 /*
@@ -1043,7 +1049,7 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 	/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
 	if (ret & VM_FAULT_RETRY) {
 		mmap_read_lock(mm);
-		if (hugepage_vma_revalidate(mm, haddr, &vma)) {
+		if (hugepage_vma_revalidate(mm, haddr, VM_NONE, &vma)) {
 			/* vma is no longer available, don't continue to swapin */
 			trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
 			return false;
@@ -1200,7 +1206,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	count_memcg_page_event(new_page, THP_COLLAPSE_ALLOC);
 
 	mmap_read_lock(mm);
-	result = hugepage_vma_revalidate(mm, address, &vma);
+	result = hugepage_vma_revalidate(mm, address, VM_NONE, &vma);
 	if (result) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
@@ -1232,7 +1238,7 @@ static void collapse_huge_page(struct mm_struct *mm,
 	 */
 	mmap_write_lock(mm);
 
-	result = hugepage_vma_revalidate(mm, address, &vma);
+	result = hugepage_vma_revalidate(mm, address, VM_NONE, &vma);
 	if (result)
 		goto out_up_write;
 	/* check if the pmd is still valid */
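A hedged sketch of the two call patterns the new argument enables, not taken
from the patch itself (the mm/haddr/vma variables are hypothetical):
khugepaged-style callers preserve existing behavior by passing VM_NONE, while
a madvise collapse caller can mask off VM_NOHUGEPAGE, as later patches in the
series do.

/* Existing behavior: honor all of the vma's flags. */
result = hugepage_vma_revalidate(mm, haddr, VM_NONE, &vma);

/* Madvise collapse context: collapse even in MADV_NOHUGEPAGE regions. */
result = hugepage_vma_revalidate(mm, haddr, VM_NOHUGEPAGE, &vma);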
From patchwork Tue Mar 8 21:34:11 2022
Date: Tue, 8 Mar 2022 13:34:11 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-9-zokeefe@google.com>
Subject: [RFC PATCH 08/14] mm/thp: add madv_thp_vm_flags to __transparent_hugepage_enabled()
From: "Zach O'Keefe"
Later in the series, in madvise collapse context, we will want to optionally
ignore MADV_NOHUGEPAGE.
However, we'd also like to standardize on __transparent_hugepage_enabled()
for determining anon thp eligibility. Add a new argument to
__transparent_hugepage_enabled() which represents the vma flags to be used
instead of those in vma->vm_flags for VM_[NO]HUGEPAGE checks. That is, checks
inside __transparent_hugepage_enabled() that previously didn't care about
madvise settings, such as the dax and stack checks, are unaffected.

Signed-off-by: Zach O'Keefe
---
 include/linux/huge_mm.h | 14 ++++++++++----
 mm/huge_memory.c        |  2 +-
 mm/memory.c             |  6 ++++--
 3 files changed, 15 insertions(+), 7 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 2999190adc22..fd905b0b2c71 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -143,8 +143,13 @@ static inline bool transhuge_vma_enabled(struct vm_area_struct *vma,
 /*
  * to be used on vmas which are known to support THP.
  * Use transparent_hugepage_active otherwise
+ *
+ * madv_thp_vm_flags are used instead of vma->vm_flags for VM_NOHUGEPAGE
+ * and VM_HUGEPAGE. Principal use is ignoring VM_NOHUGEPAGE when in madvise
+ * collapse context.
  */
-static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
+static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma,
+						  unsigned long madv_thp_vm_flags)
 {
 
 	/*
@@ -153,7 +158,7 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 	if (transparent_hugepage_flags & (1 << TRANSPARENT_HUGEPAGE_NEVER_DAX))
 		return false;
 
-	if (!transhuge_vma_enabled(vma, vma->vm_flags))
+	if (!transhuge_vma_enabled(vma, madv_thp_vm_flags))
 		return false;
 
 	if (vma_is_temporary_stack(vma))
@@ -167,7 +172,7 @@ static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
 
 	if (transparent_hugepage_flags &
 				(1 << TRANSPARENT_HUGEPAGE_REQ_MADV_FLAG))
-		return !!(vma->vm_flags & VM_HUGEPAGE);
+		return !!(madv_thp_vm_flags & VM_HUGEPAGE);
 
 	return false;
 }
@@ -316,7 +321,8 @@ static inline bool folio_test_pmd_mappable(struct folio *folio)
 	return false;
 }
 
-static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma)
+static inline bool __transparent_hugepage_enabled(struct vm_area_struct *vma,
+						  unsigned long madv_thp_vm_flags)
 {
 	return false;
 }
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 3557aabe86fe..25b7590b9846 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -83,7 +83,7 @@ bool transparent_hugepage_active(struct vm_area_struct *vma)
 	if (!transhuge_vma_suitable(vma, addr))
 		return false;
 	if (vma_is_anonymous(vma))
-		return __transparent_hugepage_enabled(vma);
+		return __transparent_hugepage_enabled(vma, vma->vm_flags);
 	if (vma_is_shmem(vma))
 		return shmem_huge_enabled(vma);
 	if (IS_ENABLED(CONFIG_READ_ONLY_THP_FOR_FS))
diff --git a/mm/memory.c b/mm/memory.c
index 4499cf09c21f..a6f2a8a20329 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -4695,7 +4695,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	if (!vmf.pud)
 		return VM_FAULT_OOM;
 retry_pud:
-	if (pud_none(*vmf.pud) && __transparent_hugepage_enabled(vma)) {
+	if (pud_none(*vmf.pud) &&
+	    __transparent_hugepage_enabled(vma, vma->vm_flags)) {
 		ret = create_huge_pud(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
@@ -4726,7 +4727,8 @@ static vm_fault_t __handle_mm_fault(struct vm_area_struct *vma,
 	if (pud_trans_unstable(vmf.pud))
 		goto retry_pud;
 
-	if (pmd_none(*vmf.pmd) && __transparent_hugepage_enabled(vma)) {
+	if (pmd_none(*vmf.pmd) &&
+	    __transparent_hugepage_enabled(vma, vma->vm_flags)) {
 		ret = create_huge_pmd(&vmf);
 		if (!(ret & VM_FAULT_FALLBACK))
 			return ret;
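For illustration, a hedged sketch of the two call patterns after this change
(the surrounding variables are hypothetical; the fault-path form appears
verbatim in the mm/memory.c hunks above, and the masked form is how a later
patch in this series uses it):

/* Fault path (see mm/memory.c hunks above): unchanged semantics. */
bool fault_ok = __transparent_hugepage_enabled(vma, vma->vm_flags);

/* Madvise collapse path: same checks, with MADV_NOHUGEPAGE masked off. */
bool madv_ok = __transparent_hugepage_enabled(vma,
					      vma->vm_flags & ~VM_NOHUGEPAGE);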
From patchwork Tue Mar 8 21:34:12 2022
Date: Tue, 8 Mar 2022 13:34:12 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-10-zokeefe@google.com>
Subject: [RFC PATCH 09/14] mm/khugepaged: record SCAN_PAGE_COMPOUND when scan_pmd() finds THP
From: "Zach O'Keefe"
When scanning an anon pmd to see if it's eligible for collapse, return
SCAN_PAGE_COMPOUND if the pmd already maps a thp. This is consistent with
the handling when scanning file-backed memory.

Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 41 +++++++++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 6 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ecbd3fc41c80..403578161a3b 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -1011,6 +1011,38 @@ static int hugepage_vma_revalidate(struct mm_struct *mm, unsigned long address,
 					 vm_flags_ignore, vmap);
 }
 
+/*
+ * If returning NULL (meaning the pmd isn't mapped, isn't present, or thp),
+ * write the reason to *result.
+ */
+static pmd_t *find_pmd_or_thp_or_none(struct mm_struct *mm,
+				      unsigned long address,
+				      int *result)
+{
+	pmd_t *pmd = mm_find_pmd_raw(mm, address);
+	pmd_t pmde;
+
+	if (!pmd) {
+		*result = SCAN_PMD_NULL;
+		return NULL;
+	}
+
+	pmde = pmd_read_atomic(pmd);
+
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
+	barrier();
+#endif
+	if (!pmd_present(pmde) || !pmd_none(pmde)) {
+		*result = SCAN_PMD_NULL;
+		return NULL;
+	} else if (pmd_trans_huge(pmde)) {
+		*result = SCAN_PAGE_COMPOUND;
+		return NULL;
+	}
+	return pmd;
+}
+
 /*
  * Bring missing pages in from swap, to complete THP collapse.
  * Only done if khugepaged_scan_pmd believes it is worthwhile.
@@ -1212,9 +1244,8 @@ static void collapse_huge_page(struct mm_struct *mm,
 		goto out_nolock;
 	}
 
-	pmd = mm_find_pmd(mm, address);
+	pmd = find_pmd_or_thp_or_none(mm, address, &result);
 	if (!pmd) {
-		result = SCAN_PMD_NULL;
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
@@ -1287,11 +1318,9 @@ static void scan_pmd(struct mm_struct *mm,
 	mmap_assert_locked(mm);
 	VM_BUG_ON(address & ~HPAGE_PMD_MASK);
 
-	pmd = mm_find_pmd(mm, address);
-	if (!pmd) {
-		scan_result->result = SCAN_PMD_NULL;
+	pmd = find_pmd_or_thp_or_none(mm, address, &scan_result->result);
+	if (!pmd)
 		goto out;
-	}
 
 	memset(cc->node_load, 0, sizeof(cc->node_load));
 	pte = pte_offset_map_lock(mm, pmd, address, &ptl);
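A hedged sketch of how a caller distinguishes the three outcomes of the new
helper (the mm/address variables are hypothetical; the same pattern appears
in the collapse_huge_page() and scan_pmd() hunks above):

int result;
pmd_t *pmd = find_pmd_or_thp_or_none(mm, address, &result);

if (pmd) {
	/* pte-mapped pmd: a candidate for collapse */
} else if (result == SCAN_PAGE_COMPOUND) {
	/* already maps a thp: nothing left to collapse */
} else {
	/* SCAN_PMD_NULL: pmd missing or not present */
}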
From patchwork Tue Mar 8 21:34:13 2022
Date: Tue, 8 Mar 2022 13:34:13 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-11-zokeefe@google.com>
Subject: [RFC PATCH 10/14] mm/khugepaged: rename khugepaged-specific/not functions
From: "Zach O'Keefe"
In preparation for introducing a new collapse context, rename functions so
that it is clear which are khugepaged-specific and which are shared by all
collapse contexts. There is no functional change here.
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 50 +++++++++++++++++++++++++------------------------
 1 file changed, 26 insertions(+), 24 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 403578161a3b..12ae765c5c32 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -92,7 +92,7 @@ struct collapse_control {
 	/* Num pages scanned per node */
 	int node_load[MAX_NUMNODES];
 
-	/* Last target selected in khugepaged_find_target_node() for this scan */
+	/* Last target selected in find_target_node() for this scan */
 	int last_target_node;
 };
 
@@ -452,7 +452,7 @@ static void insert_to_mm_slots_hash(struct mm_struct *mm,
 	hash_add(mm_slots_hash, &mm_slot->hash, (long)mm);
 }
 
-static inline int khugepaged_test_exit(struct mm_struct *mm)
+static inline int test_exit(struct mm_struct *mm)
 {
 	return atomic_read(&mm->mm_users) == 0;
 }
@@ -501,7 +501,7 @@ int __khugepaged_enter(struct mm_struct *mm)
 		return -ENOMEM;
 
 	/* __khugepaged_exit() must not run from under us */
-	VM_BUG_ON_MM(khugepaged_test_exit(mm), mm);
+	VM_BUG_ON_MM(test_exit(mm), mm);
 	if (unlikely(test_and_set_bit(MMF_VM_HUGEPAGE, &mm->flags))) {
 		free_mm_slot(mm_slot);
 		return 0;
@@ -565,7 +565,7 @@ void __khugepaged_exit(struct mm_struct *mm)
 	} else if (mm_slot) {
 		/*
 		 * This is required to serialize against
-		 * khugepaged_test_exit() (which is guaranteed to run
+		 * test_exit() (which is guaranteed to run
 		 * under mmap sem read mode). Stop here (after we
 		 * return all pagetables will be destroyed) until
 		 * khugepaged has finished working on the pagetables
@@ -836,7 +836,7 @@ static inline gfp_t alloc_hugepage_khugepaged_gfpmask(void)
 }
 
 #ifdef CONFIG_NUMA
-static int khugepaged_find_target_node(struct collapse_control *cc)
+static int find_target_node(struct collapse_control *cc)
 {
 	int nid, target_node = 0, max_value = 0;
 
@@ -895,7 +895,7 @@ khugepaged_alloc_page(struct page **hpage, gfp_t gfp, int node)
 	return *hpage;
 }
 #else
-static int khugepaged_find_target_node(struct collapse_control *cc)
+static int find_target_node(struct collapse_control *cc)
 {
 	return 0;
 }
@@ -979,7 +979,7 @@ static int hugepage_vma_revalidate_pmd_count(struct mm_struct *mm,
 
 	mmap_assert_locked(mm);
 
-	if (unlikely(khugepaged_test_exit(mm)))
+	if (unlikely(test_exit(mm)))
 		return SCAN_ANY_PROCESS;
 
 	*vmap = vma = find_vma(mm, address);
@@ -1201,11 +1201,11 @@ static int __do_collapse_huge_page(struct mm_struct *mm,
 
 }
 
-static void collapse_huge_page(struct mm_struct *mm,
-			       unsigned long address,
-			       struct page **hpage,
-			       int node, int referenced, int unmapped,
-			       int enforce_pte_scan_limits)
+static void khugepaged_collapse_huge_page(struct mm_struct *mm,
+					  unsigned long address,
+					  struct page **hpage,
+					  int node, int referenced, int unmapped,
+					  int enforce_pte_scan_limits)
 {
 	pmd_t *pmd;
 	struct page *new_page;
@@ -1468,11 +1468,13 @@ static int khugepaged_scan_pmd(struct mm_struct *mm,
 	scan_pmd(mm, vma, address, cc, &scan_result);
 
 	if (scan_result.result == SCAN_SUCCEED) {
-		node = khugepaged_find_target_node(cc);
+		node = find_target_node(cc);
 		/* collapse_huge_page will return with the mmap_lock released */
-		collapse_huge_page(mm, khugepaged_scan.address, hpage, node,
-				   scan_result.referenced, scan_result.unmapped,
-				   cc->enforce_pte_scan_limits);
+		khugepaged_collapse_huge_page(mm, khugepaged_scan.address,
+					      hpage, node,
+					      scan_result.referenced,
+					      scan_result.unmapped,
+					      cc->enforce_pte_scan_limits);
 	}
 
 	trace_mm_khugepaged_scan_pmd(mm, scan_result.head, scan_result.writable,
@@ -1489,7 +1491,7 @@ static void collect_mm_slot(struct mm_slot *mm_slot)
 	lockdep_assert_held(&khugepaged_mm_lock);
 
-	if (khugepaged_test_exit(mm)) {
+	if (test_exit(mm)) {
 		/* free mm_slot */
 		hash_del(&mm_slot->hash);
 		list_del(&mm_slot->mm_node);
@@ -1656,7 +1658,7 @@ static void khugepaged_collapse_pte_mapped_thps(struct mm_slot *mm_slot)
 	if (!mmap_write_trylock(mm))
 		return;
 
-	if (unlikely(khugepaged_test_exit(mm)))
+	if (unlikely(test_exit(mm)))
 		goto out;
 
 	for (i = 0; i < mm_slot->nr_pte_mapped_thp; i++)
@@ -1711,7 +1713,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
 		 * reverse order. Trylock is a way to avoid deadlock.
 		 */
 		if (mmap_write_trylock(mm)) {
-			if (!khugepaged_test_exit(mm))
+			if (!test_exit(mm))
 				collapse_and_free_pmd(mm, vma, addr, pmd);
 			mmap_write_unlock(mm);
 		} else {
@@ -2188,7 +2190,7 @@ static void khugepaged_scan_file(struct mm_struct *mm,
 			result = SCAN_EXCEED_NONE_PTE;
 			count_vm_event(THP_SCAN_EXCEED_NONE_PTE);
 		} else {
-			node = khugepaged_find_target_node(cc);
+			node = find_target_node(cc);
 			collapse_file(mm, file, start, hpage, node);
 		}
 	}
@@ -2241,7 +2243,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 	vma = NULL;
 	if (unlikely(!mmap_read_trylock(mm)))
 		goto breakouterloop_mmap_lock;
-	if (likely(!khugepaged_test_exit(mm)))
+	if (likely(!test_exit(mm)))
 		vma = find_vma(mm, khugepaged_scan.address);
 
 	progress++;
@@ -2249,7 +2251,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 		unsigned long hstart, hend;
 
 		cond_resched();
-		if (unlikely(khugepaged_test_exit(mm))) {
+		if (unlikely(test_exit(mm))) {
 			progress++;
 			break;
 		}
@@ -2273,7 +2275,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 		while (khugepaged_scan.address < hend) {
 			int ret;
 			cond_resched();
-			if (unlikely(khugepaged_test_exit(mm)))
+			if (unlikely(test_exit(mm)))
 				goto breakouterloop;
 
 			VM_BUG_ON(khugepaged_scan.address < hstart ||
@@ -2313,7 +2315,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
 	 * Release the current mm_slot if this mm is about to die, or
 	 * if we scanned all vmas of this mm.
 	 */
-	if (khugepaged_test_exit(mm) || !vma) {
+	if (test_exit(mm) || !vma) {
 		/*
 		 * Make sure that if mm_users is reaching zero while
 		 * khugepaged runs here, khugepaged_exit will find
From patchwork Tue Mar 8 21:34:14 2022
Date: Tue, 8 Mar 2022 13:34:14 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-12-zokeefe@google.com>
Subject: [RFC PATCH 11/14] mm/madvise: introduce MADV_COLLAPSE sync hugepage collapse
From: "Zach O'Keefe"
The idea of hugepage collapse in process context was previously introduced
by David Rientjes on linux-mm[1]. The idea is to introduce a new madvise
mode, MADV_COLLAPSE, that allows users to request a synchronous collapse of
memory.

The benefits of this approach are:

* cpu is charged to the process that wants to spend the cycles for the THP
* avoid unpredictable timing of khugepaged collapse
* flexible separation of sync userspace and async khugepaged THP collapse
  policies

Immediate users of this new functionality include:

* malloc implementations that manage memory in hugepage-sized chunks, but
  sometimes subrelease memory back to the system in native-sized chunks via
  MADV_DONTNEED, zapping the pmd. Later, when the memory is hot, the
  implementation could madvise(MADV_COLLAPSE) to re-back the memory by THP
  to regain TLB performance.
* immediately backing executable text by hugepages. The current support
  provided by CONFIG_READ_ONLY_THP_FOR_FS may take too long on a large
  system.

To keep patches digestible, introduce MADV_COLLAPSE in a few stages. This
patch adds plumbing to the existing madvise infrastructure and populates the
uapi header files, leaving the actual madvise(MADV_COLLAPSE) handler stubbed
out. Only privately-mapped anon memory is supported for now.
[1] https://lore.kernel.org/linux-mm/d098c392-273a-36a4-1a29-59731cdf5d3d@google.com/

Signed-off-by: Zach O'Keefe
---
 include/linux/huge_mm.h                | 12 +++++++
 include/uapi/asm-generic/mman-common.h |  2 ++
 mm/khugepaged.c                        | 46 ++++++++++++++++++++++++++
 mm/madvise.c                           |  5 +++
 4 files changed, 65 insertions(+)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index fd905b0b2c71..407b63ab4185 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -226,6 +226,9 @@ void __split_huge_pud(struct vm_area_struct *vma, pud_t *pud,
 
 int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
 		     int advice);
+int madvise_collapse(struct vm_area_struct *vma,
+		     struct vm_area_struct **prev,
+		     unsigned long start, unsigned long end);
 void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
 			   unsigned long end, long adjust_next);
 spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
@@ -383,6 +386,15 @@ static inline int hugepage_madvise(struct vm_area_struct *vma,
 	BUG();
 	return 0;
 }
+
+static inline int madvise_collapse(struct vm_area_struct *vma,
+				   struct vm_area_struct **prev,
+				   unsigned long start, unsigned long end)
+{
+	BUG();
+	return 0;
+}
+
 static inline void vma_adjust_trans_huge(struct vm_area_struct *vma,
 					 unsigned long start,
 					 unsigned long end,
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6c1aa92a92e4..6ce1f1ceb432 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -77,6 +77,8 @@
 
 #define MADV_DONTNEED_LOCKED	24	/* like DONTNEED, but drop locked pages too */
 
+#define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index 12ae765c5c32..ca1e523086ed 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2519,3 +2519,49 @@ void khugepaged_min_free_kbytes_update(void)
 		set_recommended_min_free_kbytes();
 	mutex_unlock(&khugepaged_mutex);
 }
+
+/*
+ * Returns 0 if successfully able to collapse range into THPs (or range already
+ * backed by THPs). Due to implementation detail, THPs collapsed here may be
+ * split again before this function returns.
+ */
+static int _madvise_collapse(struct mm_struct *mm,
+			     struct vm_area_struct *vma,
+			     struct vm_area_struct **prev,
+			     unsigned long start,
+			     unsigned long end, gfp_t gfp,
+			     struct collapse_control *cc)
+{
+	/* Implemented in later patch */
+	return -ENOSYS;
+}
+
+int madvise_collapse(struct vm_area_struct *vma,
+		     struct vm_area_struct **prev, unsigned long start,
+		     unsigned long end)
+{
+	struct collapse_control cc;
+	gfp_t gfp;
+	int error;
+	struct mm_struct *mm = vma->vm_mm;
+
+	/* Requested to hold mmap_lock in read */
+	mmap_assert_locked(mm);
+
+	mmgrab(mm);
+	collapse_control_init(&cc, /* enforce_pte_scan_limits= */ false);
+	gfp = vma_thp_gfp_mask(vma);
+	lru_add_drain();	/* lru_add_drain_all() too heavy here */
+	error = _madvise_collapse(mm, vma, prev, start, end, gfp, &cc);
+	mmap_assert_locked(mm);
+	mmdrop(mm);
+
+	/*
+	 * madvise() returns EAGAIN if kernel resources are temporarily
+	 * unavailable.
+	 */
+	if (error == -ENOMEM)
+		error = -EAGAIN;
+
+	return error;
+}
diff --git a/mm/madvise.c b/mm/madvise.c
index 5b6d796e55de..292aa017c150 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -58,6 +58,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_FREE:
 	case MADV_POPULATE_READ:
 	case MADV_POPULATE_WRITE:
+	case MADV_COLLAPSE:
 		return 0;
 	default:
 		/* be safe, default to 1. list exceptions explicitly */
@@ -1046,6 +1047,8 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 		if (error)
 			goto out;
 		break;
+	case MADV_COLLAPSE:
+		return madvise_collapse(vma, prev, start, end);
 	}
 
 	anon_name = anon_vma_name(vma);
@@ -1139,6 +1142,7 @@ madvise_behavior_valid(int behavior)
 #ifdef CONFIG_TRANSPARENT_HUGEPAGE
 	case MADV_HUGEPAGE:
 	case MADV_NOHUGEPAGE:
+	case MADV_COLLAPSE:
 #endif
 	case MADV_DONTDUMP:
 	case MADV_DODUMP:
@@ -1328,6 +1332,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 *  MADV_NOHUGEPAGE - mark the given range as not worth being backed by
 *		transparent huge pages so the existing pages will not be
 *		coalesced into THP and new pages will not be allocated as THP.
+*  MADV_COLLAPSE - synchronously coalesce pages into new THP.
 *  MADV_DONTDUMP - the application wants to prevent pages in the given range
 *		from being included in its core dump.
 *  MADV_DODUMP - cancel MADV_DONTDUMP: no longer exclude from core dump.
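For illustration, a minimal userspace caller on a kernel with this series
applied might look as follows. This is a hedged sketch, not part of the
patch: the 2MiB pmd size is an x86-64 assumption, and MADV_COLLAPSE's value
of 25 comes from the uapi hunk above.

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* from the uapi hunk above */
#endif

int main(void)
{
	size_t len = 8 * (2UL << 20);	/* 8 pmds, assuming 2MiB pmd size */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	memset(p, 1, len);		/* fault the range in first */

	/* Synchronously collapse the range into THPs, if supported. */
	if (madvise(p, len, MADV_COLLAPSE))
		perror("madvise(MADV_COLLAPSE)");	/* EAGAIN, EINVAL, ... */

	munmap(p, len);
	return 0;
}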
From patchwork Tue Mar 8 21:34:15 2022
Date: Tue, 8 Mar 2022 13:34:15 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-13-zokeefe@google.com>
Subject: [RFC PATCH 12/14] mm/madvise: introduce batched madvise(MADV_COLLAPSE) collapse
From: "Zach O'Keefe"
Introduce the main madvise collapse batched logic, including the overall
locking strategy. The individual batched actions, such as scanning pmds in
batch, are stubbed out here and will be added later in the series.

Note that the main benefit of doing all this work in a batched manner is
that __madvise_collapse_pmd_batch() (stubbed out) can be called inside a
single mmap_lock write.

Per-batch data is stored in a struct madvise_collapse_data array, with an
entry for each pmd to collapse, and is shared between the various *_batch
actions. This allows for partial success of collapsing a range of pmds - we
continue as long as some pmds can be successfully collapsed. A "success"
here is when all pmds can be (or already are) collapsed.
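To make the batching arithmetic concrete, here is a small standalone sketch.
It is illustrative only: it assumes 2MiB pmds and mirrors the batch_size
computation in the _madvise_collapse() loop in the patch below.

#include <stdio.h>

#define HPAGE_PMD_SHIFT 21	/* assume 2MiB pmds (x86-64) */
#define BATCH_SIZE 8		/* MADVISE_COLLAPSE_BATCH_SIZE below */

int main(void)
{
	unsigned long hstart = 0;
	unsigned long hend = 20UL << HPAGE_PMD_SHIFT;	/* 20 pmds */
	unsigned long addr;

	for (addr = hstart; addr < hend;
	     addr += (unsigned long)BATCH_SIZE << HPAGE_PMD_SHIFT) {
		int batch = (hend - addr) >> HPAGE_PMD_SHIFT;

		if (batch > BATCH_SIZE)
			batch = BATCH_SIZE;
		/* prints batches of 8, 8, and 4 pmds */
		printf("batch at %#lx: %d pmd(s)\n", addr, batch);
	}
	return 0;
}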
On failure, the caller will need to verify what, if any, partial successes
occurred, via smaps or otherwise.

Also note that, where possible, if collapse fails for a particular pmd after
a hugepage has already been allocated, said hugepage is kept on a per-node
free list for the purpose of backing subsequent pmd collapses. All unused
hugepages are returned before _madvise_collapse() returns.

Note that a bisect at this patch won't break; madvise(MADV_COLLAPSE) will
simply always return -1.

Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 279 ++++++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 273 insertions(+), 6 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ca1e523086ed..ea53c706602e 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -86,6 +86,9 @@ static struct kmem_cache *mm_slot_cache __read_mostly;
 #define MAX_PTE_MAPPED_THP 8
 
 struct collapse_control {
+	/* Used by MADV_COLLAPSE batch collapse */
+	struct list_head free_hpages[MAX_NUMNODES];
+
 	/* Respect khugepaged_max_ptes_[none|swap|shared] */
 	bool enforce_pte_scan_limits;
 
@@ -99,8 +102,13 @@ struct collapse_control {
 static void collapse_control_init(struct collapse_control *cc,
 				  bool enforce_pte_scan_limits)
 {
+	int i;
+
 	cc->enforce_pte_scan_limits = enforce_pte_scan_limits;
 	cc->last_target_node = NUMA_NO_NODE;
+
+	for (i = 0; i < MAX_NUMNODES; ++i)
+		INIT_LIST_HEAD(cc->free_hpages + i);
 }
 
 /**
@@ -1033,7 +1041,7 @@ static pmd_t *find_pmd_or_thp_or_none(struct mm_struct *mm,
 	/* See comments in pmd_none_or_trans_huge_or_clear_bad() */
 	barrier();
 #endif
-	if (!pmd_present(pmde) || !pmd_none(pmde)) {
+	if (!pmd_present(pmde) || pmd_none(pmde)) {
 		*result = SCAN_PMD_NULL;
 		return NULL;
 	} else if (pmd_trans_huge(pmde)) {
@@ -1054,12 +1062,16 @@ static pmd_t *find_pmd_or_thp_or_none(struct mm_struct *mm,
 static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 					struct vm_area_struct *vma,
 					unsigned long haddr, pmd_t *pmd,
-					int referenced)
+					int referenced,
+					unsigned long vm_flags_ignored,
+					bool *mmap_lock_dropped)
 {
 	int swapped_in = 0;
 	vm_fault_t ret = 0;
 	unsigned long address, end = haddr + (HPAGE_PMD_NR * PAGE_SIZE);
 
+	if (mmap_lock_dropped)
+		*mmap_lock_dropped = false;
 	for (address = haddr; address < end; address += PAGE_SIZE) {
 		struct vm_fault vmf = {
 			.vma = vma,
@@ -1080,8 +1092,10 @@ static bool __collapse_huge_page_swapin(struct mm_struct *mm,
 
 		/* do_swap_page returns VM_FAULT_RETRY with released mmap_lock */
 		if (ret & VM_FAULT_RETRY) {
+			if (mmap_lock_dropped)
+				*mmap_lock_dropped = true;
 			mmap_read_lock(mm);
-			if (hugepage_vma_revalidate(mm, haddr, VM_NONE, &vma)) {
+			if (hugepage_vma_revalidate(mm, haddr, vm_flags_ignored, &vma)) {
 				/* vma is no longer available, don't continue to swapin */
 				trace_mm_collapse_huge_page_swapin(mm, swapped_in, referenced, 0);
 				return false;
@@ -1256,7 +1270,8 @@ static void khugepaged_collapse_huge_page(struct mm_struct *mm,
 	 * Continuing to collapse causes inconsistency.
 	 */
 	if (unmapped && !__collapse_huge_page_swapin(mm, vma, address,
-						     pmd, referenced)) {
+						     pmd, referenced, VM_NONE,
+						     NULL)) {
 		mmap_read_unlock(mm);
 		goto out_nolock;
 	}
@@ -2520,6 +2535,128 @@ void khugepaged_min_free_kbytes_update(void)
 	mutex_unlock(&khugepaged_mutex);
 }
 
+struct madvise_collapse_data {
+	struct page *hpage;	/* Preallocated THP */
+	bool continue_collapse;	/* Should we attempt / continue collapse? */
+
+	struct scan_pmd_result scan_result;
+	pmd_t *pmd;
+};
+
+static int
+madvise_collapse_vma_revalidate_pmd_count(struct mm_struct *mm,
+					  unsigned long address, int nr,
+					  struct vm_area_struct **vmap)
+{
+	/* madvise_collapse() ignores MADV_NOHUGEPAGE */
+	return hugepage_vma_revalidate_pmd_count(mm, address, nr, VM_NOHUGEPAGE,
+						 vmap);
+}
+
+/*
+ * Scan pmd to see which we can collapse, and to determine node to allocate on.
+ *
+ * Must be called with mmap_lock in read, and returns with the lock held in
+ * read. Does not drop the lock.
+ *
+ * Set batch_data[i]->continue_collapse to false for any pmd that can't be
+ * collapsed.
+ *
+ * Return the number of existing THPs in batch.
+ */
+static int
+__madvise_collapse_scan_pmd_batch(struct mm_struct *mm,
+				  struct vm_area_struct *vma,
+				  unsigned long batch_start,
+				  struct madvise_collapse_data *batch_data,
+				  int batch_size,
+				  struct collapse_control *cc)
+{
+	/* Implemented in later patch */
+	return 0;
+}
+
+/*
+ * Preallocate and charge huge page for each pmd in the batch, store the
+ * new page in batch_data[i]->hpage.
+ *
+ * Return the number of huge pages allocated.
+ */
+static int
+__madvise_collapse_prealloc_hpages_batch(struct mm_struct *mm,
+					 gfp_t gfp,
+					 int node,
+					 struct madvise_collapse_data *batch_data,
+					 int batch_size,
+					 struct collapse_control *cc)
+{
+	/* Implemented in later patch */
+	return 0;
+}
+
+/*
+ * Do swapin for all ranges in batch, returns true iff successful.
+ *
+ * Called with mmap_lock held in read, and returns with it held in read.
+ * Might drop the lock.
+ *
+ * Set batch_data[i]->continue_collapse to false for any pmd that can't be
+ * collapsed. Else, set batch_data[i]->pmd to the found pmd.
+ */
+static bool
+__madvise_collapse_swapin_pmd_batch(struct mm_struct *mm,
+				    int node,
+				    unsigned long batch_start,
+				    struct madvise_collapse_data *batch_data,
+				    int batch_size,
+				    struct collapse_control *cc)
+
+{
+	/* Implemented in later patch */
+	return true;
+}
+
+/*
+ * Do the collapse operation. Return number of THPs collapsed successfully.
+ *
+ * Called with mmap_lock held in write, and returns with it held. Does not
+ * drop the lock.
+ */
+static int
+__madvise_collapse_pmd_batch(struct mm_struct *mm,
+			     unsigned long batch_start,
+			     int batch_size,
+			     struct madvise_collapse_data *batch_data,
+			     int node,
+			     struct collapse_control *cc)
+{
+	/* Implemented in later patch */
+	return 0;
+}
+
+static bool continue_collapse(struct madvise_collapse_data *batch_data,
+			      int batch_size)
+{
+	int i;
+
+	for (i = 0; i < batch_size; ++i)
+		if (batch_data[i].continue_collapse)
+			return true;
+	return false;
+}
+
+static bool madvise_transparent_hugepage_enabled(struct vm_area_struct *vma)
+{
+	if (vma_is_anonymous(vma))
+		/* madvise_collapse() ignores MADV_NOHUGEPAGE */
+		return __transparent_hugepage_enabled(vma, vma->vm_flags &
+						      ~VM_NOHUGEPAGE);
+	/* TODO: Support file-backed memory */
+	return false;
+}
+
+#define MADVISE_COLLAPSE_BATCH_SIZE 8
+
 /*
  * Returns 0 if successfully able to collapse range into THPs (or range already
 * Due to implementation detail, THPs collapsed here may be
@@ -2532,8 +2669,138 @@ static int _madvise_collapse(struct mm_struct *mm,
 			     unsigned long end, gfp_t gfp,
 			     struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return -ENOSYS;
+	unsigned long hstart, hend, batch_addr;
+	int ret = -EINVAL, collapsed = 0, nr_hpages = 0, i;
+	struct madvise_collapse_data batch_data[MADVISE_COLLAPSE_BATCH_SIZE];
+
+	mmap_assert_locked(mm);
+	BUG_ON(vma->vm_start > start);
+	BUG_ON(vma->vm_end < end);
+	VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
+
+	hstart = (start + ~HPAGE_PMD_MASK) & HPAGE_PMD_MASK;
+	hend = end & HPAGE_PMD_MASK;
+	nr_hpages = (hend - hstart) >> HPAGE_PMD_SHIFT;
+	if (hstart >= hend)
+		goto out;
+
+	if (!madvise_transparent_hugepage_enabled(vma))
+		goto out;
+
+	/*
+	 * The request might cover multiple hugepages. The strategy is to
+	 * batch allocation and collapse operations so that we do more work
+	 * while mmap_lock is held exclusively.
+	 *
+	 * While processing the batch, mmap_lock is locked/unlocked many times
+	 * for the supplied VMA. It's possible that the original VMA was split
+	 * while the lock was dropped. If THP collapse is still possible in
+	 * the context of the (possibly new) VMA, we continue.
+	 */
+	for (batch_addr = hstart;
+	     batch_addr < hend;
+	     batch_addr += HPAGE_PMD_SIZE * MADVISE_COLLAPSE_BATCH_SIZE) {
+		int node, batch_size;
+		int thps;	/* Number of existing THPs in range */
+
+		batch_size = (hend - batch_addr) >> HPAGE_PMD_SHIFT;
+		batch_size = min_t(int, batch_size,
+				   MADVISE_COLLAPSE_BATCH_SIZE);
+
+		BUG_ON(batch_size <= 0);
+		memset(batch_data, 0, sizeof(batch_data));
+		cond_resched();
+		VM_BUG_ON_MM(atomic_read(&mm->mm_users) == 0, mm);
+
+		/*
+		 * If this is the first batch, we still hold mmap_lock from
+		 * the madvise call and haven't dropped it since checking the
+		 * VMA. Else, we've dropped the lock and need to revalidate.
+		 */
+		if (batch_addr != hstart) {
+			mmap_read_lock(mm);
+			if (madvise_collapse_vma_revalidate_pmd_count(mm,
+								      batch_addr,
+								      batch_size,
+								      &vma))
+				goto loop_unlock_break;
+		}
+
+		mmap_assert_locked(mm);
+
+		thps = __madvise_collapse_scan_pmd_batch(mm, vma, batch_addr,
+							 batch_data, batch_size,
+							 cc);
+		mmap_read_unlock(mm);
+
+		/* Count existing THPs as if we had collapsed them */
+		collapsed += thps;
+		if (thps == batch_size || !continue_collapse(batch_data,
+							     batch_size))
+			continue;
+
+		node = find_target_node(cc);
+		if (!__madvise_collapse_prealloc_hpages_batch(mm, gfp, node,
+							      batch_data,
+							      batch_size, cc)) {
+			/* No more THPs available - so give up */
+			ret = -ENOMEM;
+			break;
+		}
+
+		mmap_read_lock(mm);
+		if (!__madvise_collapse_swapin_pmd_batch(mm, node, batch_addr,
+							 batch_data, batch_size,
+							 cc))
+			goto loop_unlock_break;
+		mmap_read_unlock(mm);
+		mmap_write_lock(mm);
+		collapsed += __madvise_collapse_pmd_batch(mm,
+				batch_addr, batch_size, batch_data,
+				node, cc);
+		mmap_write_unlock(mm);
+
+		for (i = 0; i < batch_size; ++i) {
+			struct page *page = batch_data[i].hpage;
+
+			if (page && !IS_ERR(page)) {
+				list_add_tail(&page->lru,
+					      &cc->free_hpages[node]);
+				batch_data[i].hpage = NULL;
+			}
+		}
+		/* mmap_lock is unlocked here */
+		continue;
+loop_unlock_break:
+		mmap_read_unlock(mm);
+		break;
+	}
+	/* mmap_lock is unlocked here */
+
+	for (i = 0; i < MADVISE_COLLAPSE_BATCH_SIZE; ++i) {
+		struct page *page = batch_data[i].hpage;
+
+		if (page && !IS_ERR(page)) {
+			mem_cgroup_uncharge(page_folio(page));
+			put_page(page);
+		}
+	}
+	for (i = 0; i < MAX_NUMNODES; ++i) {
+		struct page *page, *tmp;
+
+		list_for_each_entry_safe(page, tmp, cc->free_hpages + i, lru) {
+			list_del(&page->lru);
+			mem_cgroup_uncharge(page_folio(page));
+			put_page(page);
+		}
+	}
+	ret = collapsed == nr_hpages ? 0 : -1;
+	vma = NULL;		/* tell sys_madvise we dropped mmap_lock */
+	mmap_read_lock(mm);	/* sys_madvise expects us to have mmap_lock */
+out:
+	*prev = vma;		/* we didn't drop mmap_lock, so this holds */
+
+	return ret;
 }
 
 int madvise_collapse(struct vm_area_struct *vma,
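For context, a userspace sketch of how this interface might be exercised, including the smaps check suggested above for diagnosing partial success on failure. The MADV_COLLAPSE value (25) is taken from this series' mman-common.h and is not part of any released uapi; the program is illustrative only:

#include <stdio.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_COLLAPSE
#define MADV_COLLAPSE 25	/* value proposed in this series */
#endif

int main(void)
{
	size_t len = 4UL << 20;	/* two pmds' worth on x86_64 */
	char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return 1;
	memset(p, 1, len);	/* fault in every page first */

	if (madvise(p, len, MADV_COLLAPSE)) {
		/* Partial success is possible; see what actually stuck. */
		char line[256];
		FILE *f = fopen("/proc/self/smaps", "r");

		while (f && fgets(line, sizeof(line), f))
			if (strstr(line, "AnonHugePages:"))
				fputs(line, stdout);
		if (f)
			fclose(f);
	}
	munmap(p, len);
	return 0;
}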
From patchwork Tue Mar 8 21:34:16 2022
Date: Tue, 8 Mar 2022 13:34:16 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-14-zokeefe@google.com>
Subject: [RFC PATCH 13/14] mm/madvise: add __madvise_collapse_*_batch() actions
From: "Zach O'Keefe" <zokeefe@google.com>
To: linux-mm@kvack.org

Add implementations for the following batch actions:

scan_pmd:
	Iterate over the batch and scan each pmd for eligibility. Note that
	this function is called with mmap_lock held in read, and does not
	drop it before returning.

	If a batch entry fails, the ->continue_collapse field of its
	madvise_collapse_data is set to 'false' so that later _batch actions
	know to ignore it.

	Return the number of THPs already in the batch, which is needed by
	_madvise_collapse() to determine the overall "success" criteria
	(all pmds either collapsed successfully, or already THP-backed).

prealloc_hpages:
	Iterate over the batch and allocate / charge hugepages. Before
	allocating a new page, check the local free hugepage list.
	Similarly, if, after allocating a hugepage, charging the memcg
	fails, save the hugepage on a local free list for future use.

swapin_pmd:
	Iterate over the batch and attempt to swap in pages that are
	currently swapped out. Called with mmap_lock held in read, and
	returns with it held; however, it might drop and reacquire the lock
	internally.

	Specifically, __collapse_huge_page_swapin() might drop + reacquire
	the mmap_lock. When it does so, it only revalidates the vma/address
	for a single pmd. Since we need to revalidate the vma for the entire
	region covered by the batch, we need to be notified when the lock is
	dropped so that we can perform the required revalidation. As such,
	add an argument to __collapse_huge_page_swapin() to notify the
	caller when mmap_lock is dropped.

collapse_pmd:
	Iterate over the batch and perform the actual collapse for each pmd.
	Note that this is done while holding the mmap_lock in write for the
	entire batch action.
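The lock-dropped notification described under swapin_pmd is just an out-parameter protocol. In miniature, with a pthread rwlock standing in for mmap_lock and all names hypothetical, the caller-side pattern looks like this:

#include <pthread.h>
#include <stdbool.h>

static pthread_rwlock_t lock = PTHREAD_RWLOCK_INITIALIZER;

/*
 * Stand-in for __collapse_huge_page_swapin(): may drop and reacquire
 * 'lock' while working; reports that via *lock_dropped so the caller
 * knows any validation done under the lock is now stale.
 */
static bool swapin_one(int pmd, bool *lock_dropped)
{
	if (lock_dropped)
		*lock_dropped = false;
	if (pmd & 1) {			/* pretend this pmd needed swapin */
		pthread_rwlock_unlock(&lock);
		/* ... blocking work without the lock ... */
		pthread_rwlock_rdlock(&lock);
		if (lock_dropped)
			*lock_dropped = true;
	}
	return true;
}

int main(void)
{
	/* true initially: validate once right after taking the lock */
	bool need_revalidate = true;
	int pmd;

	pthread_rwlock_rdlock(&lock);
	for (pmd = 0; pmd < 8; ++pmd) {
		if (need_revalidate) {
			/* re-check the whole batched range here */
			need_revalidate = false;
		}
		swapin_one(pmd, &need_revalidate);
	}
	pthread_rwlock_unlock(&lock);
	return 0;
}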
Signed-off-by: Zach O'Keefe
---
 mm/khugepaged.c | 153 +++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 145 insertions(+), 8 deletions(-)

diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index ea53c706602e..e8156f15a3da 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2572,8 +2572,23 @@ __madvise_collapse_scan_pmd_batch(struct mm_struct *mm,
 				  int batch_size,
 				  struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return 0;
+	unsigned long addr, i;
+	int thps = 0;
+
+	mmap_assert_locked(mm);
+
+	for (addr = batch_start, i = 0; i < batch_size;
+	     addr += HPAGE_PMD_SIZE, ++i) {
+		struct madvise_collapse_data *data = batch_data + i;
+
+		scan_pmd(mm, vma, addr, cc, &data->scan_result);
+		data->continue_collapse =
+			data->scan_result.result == SCAN_SUCCEED;
+		if (data->scan_result.result == SCAN_PAGE_COMPOUND)
+			++thps;
+	}
+	mmap_assert_locked(mm);
+	return thps;
 }
 
 /*
@@ -2590,8 +2605,39 @@ __madvise_collapse_prealloc_hpages_batch(struct mm_struct *mm,
 					 int batch_size,
 					 struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return 0;
+	int nr_hpages = 0;
+	int i;
+
+	for (i = 0; i < batch_size; ++i) {
+		struct madvise_collapse_data *data = batch_data + i;
+
+		if (!data->continue_collapse)
+			continue;
+
+		if (!list_empty(&cc->free_hpages[node])) {
+			data->hpage = list_first_entry(&cc->free_hpages[node],
+						       struct page, lru);
+			list_del(&data->hpage->lru);
+		} else {
+			data->hpage = __alloc_pages_node(node, gfp,
+							 HPAGE_PMD_ORDER);
+			if (unlikely(!data->hpage))
+				break;
+
+			prep_transhuge_page(data->hpage);
+
+			if (unlikely(mem_cgroup_charge(page_folio(data->hpage),
+						       mm, gfp))) {
+				/* No use reusing page, so give it back */
+				put_page(data->hpage);
+				data->hpage = NULL;
+				data->continue_collapse = false;
+				break;
+			}
+		}
+		++nr_hpages;
+	}
+	return nr_hpages;
 }
 
 /*
@@ -2612,8 +2658,67 @@ __madvise_collapse_swapin_pmd_batch(struct mm_struct *mm,
 				    struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return true;
+	unsigned long addr;
+	struct vm_area_struct *vma;
+	int i;
+	bool ret = true;
+
+	/*
+	 * This function is called with mmap_lock held, and returns with it
+	 * held. However, __collapse_huge_page_swapin() may internally drop
+	 * and reacquire the lock. When it does, it only revalidates the
+	 * single pmd provided to it. We need to know when it drops the lock
+	 * so that we can revalidate the batch of pmds we are operating on.
+	 *
+	 * Initially set to 'true': the caller has just taken mmap_lock, so
+	 * we must revalidate before doing anything else.
+	 */
+	bool need_revalidate_pmd_count = true;
+
+	for (addr = batch_start, i = 0;
+	     i < batch_size;
+	     addr += HPAGE_PMD_SIZE, ++i) {
+		struct madvise_collapse_data *data = batch_data + i;
+
+		mmap_assert_locked(mm);
+
+		/*
+		 * We might have dropped the lock during the previous
+		 * iteration. It's acceptable to exit this function without
+		 * revalidating the vma since the caller immediately unlocks
+		 * mmap_lock anyway.
+		 */
+		if (!data->continue_collapse)
+			continue;
+
+		if (need_revalidate_pmd_count) {
+			if (madvise_collapse_vma_revalidate_pmd_count(mm,
+								      batch_start,
+								      batch_size,
+								      &vma)) {
+				ret = false;
+				break;
+			}
+			need_revalidate_pmd_count = false;
+		}
+
+		data->pmd = mm_find_pmd(mm, addr);
+
+		if (!data->pmd ||
+		    (data->scan_result.unmapped &&
+		     !__collapse_huge_page_swapin(mm, vma, addr, data->pmd,
+						  data->scan_result.referenced,
+						  VM_NOHUGEPAGE,
+						  &need_revalidate_pmd_count))) {
+			/* Hold on to the THP until we know we don't need it. */
+			data->continue_collapse = false;
+			list_add_tail(&data->hpage->lru,
+				      &cc->free_hpages[node]);
+			data->hpage = NULL;
+		}
+	}
+	mmap_assert_locked(mm);
+	return ret;
 }
 
 /*
@@ -2630,8 +2735,40 @@ __madvise_collapse_pmd_batch(struct mm_struct *mm,
 			     int node,
 			     struct collapse_control *cc)
 {
-	/* Implemented in later patch */
-	return 0;
+	unsigned long addr;
+	struct vm_area_struct *vma;
+	int i, ret = 0;
+
+	mmap_assert_write_locked(mm);
+
+	if (madvise_collapse_vma_revalidate_pmd_count(mm, batch_start,
+						      batch_size, &vma))
+		goto out;
+
+	for (addr = batch_start, i = 0;
+	     i < batch_size;
+	     addr += HPAGE_PMD_SIZE, ++i) {
+		int result;
+		struct madvise_collapse_data *data = batch_data + i;
+
+		if (!data->continue_collapse ||
+		    (mm_find_pmd(mm, addr) != data->pmd))
+			continue;
+
+		result = __do_collapse_huge_page(mm, vma, addr, data->pmd,
+						 data->hpage,
+						 cc->enforce_pte_scan_limits,
+						 NULL);
+
+		if (result == SCAN_SUCCEED)
+			++ret;
+		else
+			list_add_tail(&data->hpage->lru,
+				      &cc->free_hpages[node]);
+		data->hpage = NULL;
+	}
+out:
+	return ret;
 }
 
 static bool continue_collapse(struct madvise_collapse_data *batch_data,
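The per-node free_hpages recycling that prealloc and the failure paths above rely on is, at its core, a free-list cache. A userspace miniature of the pattern, with hypothetical names and malloc standing in for the hugepage allocator:

#include <stdlib.h>

struct buf {
	struct buf *next;
	char payload[1 << 12];
};

static struct buf *free_list;	/* stand-in for cc->free_hpages[node] */

static struct buf *buf_get(void)
{
	struct buf *b = free_list;

	if (b)
		free_list = b->next;	/* reuse a cached buffer */
	else
		b = malloc(sizeof(*b));	/* else pay for a fresh one */
	return b;
}

static void buf_put(struct buf *b)
{
	b->next = free_list;		/* keep it for the next batch */
	free_list = b;
}

int main(void)
{
	struct buf *b = buf_get();

	if (!b)
		return 1;
	buf_put(b);		/* this batch didn't need it after all */
	b = buf_get();		/* the next batch reuses it at no cost */
	free(b);
	return 0;
}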
From patchwork Tue Mar 8 21:34:17 2022
Date: Tue, 8 Mar 2022 13:34:17 -0800
In-Reply-To: <20220308213417.1407042-1-zokeefe@google.com>
Message-Id: <20220308213417.1407042-15-zokeefe@google.com>
Subject: [RFC PATCH 14/14] mm/madvise: add process_madvise(MADV_COLLAPSE)
From: "Zach O'Keefe" <zokeefe@google.com>
To: linux-mm@kvack.org

This is the first madvise behavior to make use of process_madvise() flags.
Add the necessary plumbing to make the flags available from do_madvise()
handlers.

For MADV_COLLAPSE, the added flags are:

* MADV_F_COLLAPSE_LIMITS - controls whether the khugepaged/max_ptes_* sysfs
  limits are respected. Clearing it (i.e. ignoring the limits) requires
  CAP_SYS_ADMIN when acting on a process other than the caller.

* MADV_F_COLLAPSE_DEFRAG - force-enable defrag, regardless of vma flags or
  system settings.

Together, these flags give userspace the flexibility to define separate
policies for synchronous userspace-directed collapse and asynchronous kernel
(khugepaged) collapse.
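Under these proposed semantics, a privileged or self-targeting caller might drive a collapse in another process like this. The advice and flag values are the ones proposed in this series (not released uapi), and the syscall numbers are the x86_64 ones; sketch only:

#include <stdio.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <sys/uio.h>
#include <unistd.h>

#define MADV_COLLAPSE		25	/* from this series */
#define MADV_F_COLLAPSE_LIMITS	0x1	/* respect khugepaged sysfs limits */
#define MADV_F_COLLAPSE_DEFRAG	0x2	/* force sync collapse + reclaim */

#ifndef SYS_pidfd_open
#define SYS_pidfd_open 434		/* x86_64; adjust per-arch */
#endif
#ifndef SYS_process_madvise
#define SYS_process_madvise 440		/* x86_64; adjust per-arch */
#endif

int main(int argc, char **argv)
{
	struct iovec iov;
	int pidfd;
	long ret;

	if (argc != 4) {
		fprintf(stderr, "usage: %s <pid> <addr> <len>\n", argv[0]);
		return 1;
	}
	iov.iov_base = (void *)strtoull(argv[2], NULL, 0);
	iov.iov_len = strtoull(argv[3], NULL, 0);

	pidfd = syscall(SYS_pidfd_open, atoi(argv[1]), 0);
	if (pidfd < 0)
		return 1;
	/*
	 * Respect the system limits (so no CAP_SYS_ADMIN is needed for
	 * another process), but insist on direct reclaim so the collapse
	 * doesn't fail on transient fragmentation.
	 */
	ret = syscall(SYS_process_madvise, pidfd, &iov, 1, MADV_COLLAPSE,
		      MADV_F_COLLAPSE_LIMITS | MADV_F_COLLAPSE_DEFRAG);
	printf("process_madvise: %ld\n", ret);
	close(pidfd);
	return ret < 0;
}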
Signed-off-by: Zach O'Keefe
---
 fs/io_uring.c                          |  3 +-
 include/linux/huge_mm.h                |  3 +-
 include/linux/mm.h                     |  3 +-
 include/uapi/asm-generic/mman-common.h |  8 +++++
 mm/khugepaged.c                        |  7 +++--
 mm/madvise.c                           | 42 ++++++++++++++------------
 6 files changed, 41 insertions(+), 25 deletions(-)

diff --git a/fs/io_uring.c b/fs/io_uring.c
index 23e7f93d3956..8558b7549431 100644
--- a/fs/io_uring.c
+++ b/fs/io_uring.c
@@ -4720,7 +4720,8 @@ static int io_madvise(struct io_kiocb *req, unsigned int issue_flags)
 	if (issue_flags & IO_URING_F_NONBLOCK)
 		return -EAGAIN;
 
-	ret = do_madvise(current->mm, ma->addr, ma->len, ma->advice);
+	ret = do_madvise(current->mm, ma->addr, ma->len, ma->advice,
+			 MADV_F_NONE);
 	if (ret < 0)
 		req_set_fail(req);
 	io_req_complete(req, ret);
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 407b63ab4185..31f514ff36be 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -228,7 +228,8 @@ int hugepage_madvise(struct vm_area_struct *vma, unsigned long *vm_flags,
 		     int advice);
 int madvise_collapse(struct vm_area_struct *vma,
 		     struct vm_area_struct **prev,
-		     unsigned long start, unsigned long end);
+		     unsigned long start, unsigned long end,
+		     unsigned int flags);
 void vma_adjust_trans_huge(struct vm_area_struct *vma, unsigned long start,
 			   unsigned long end, long adjust_next);
 spinlock_t *__pmd_trans_huge_lock(pmd_t *pmd, struct vm_area_struct *vma);
diff --git a/include/linux/mm.h b/include/linux/mm.h
index dc69d2a69912..f4776f4cda48 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2690,7 +2690,8 @@ extern int __do_munmap(struct mm_struct *, unsigned long, size_t,
 		       struct list_head *uf, bool downgrade);
 extern int do_munmap(struct mm_struct *, unsigned long, size_t,
 		     struct list_head *uf);
-extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior);
+extern int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in,
+		      int behavior, unsigned int flags);
 
 #ifdef CONFIG_MMU
 extern int __mm_populate(unsigned long addr, unsigned long len,
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 6ce1f1ceb432..b81f4b1b18ba 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -79,6 +79,14 @@
 
 #define MADV_COLLAPSE	25		/* Synchronous hugepage collapse */
 
+/* process_madvise() flags */
+#define MADV_F_NONE	0x0
+
+/* process_madvise(MADV_COLLAPSE) flags */
+#define MADV_F_COLLAPSE_LIMITS	0x1	/* respect system khugepaged/max_ptes_* sysfs limits */
+#define MADV_F_COLLAPSE_DEFRAG	0x2	/* force enable sync collapse + reclaim */
+#define MADV_F_COLLAPSE_MASK	(MADV_F_COLLAPSE_LIMITS | MADV_F_COLLAPSE_DEFRAG)
+
 /* compatibility flags */
 #define MAP_FILE	0
 
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index e8156f15a3da..993de0c6eaa9 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -2942,7 +2942,7 @@ static int _madvise_collapse(struct mm_struct *mm,
 
 int madvise_collapse(struct vm_area_struct *vma,
 		     struct vm_area_struct **prev, unsigned long start,
-		     unsigned long end)
+		     unsigned long end, unsigned int flags)
 {
 	struct collapse_control cc;
 	gfp_t gfp;
@@ -2953,8 +2953,9 @@ int madvise_collapse(struct vm_area_struct *vma,
 	mmap_assert_locked(mm);
 	mmgrab(mm);
-	collapse_control_init(&cc, /* enforce_pte_scan_limits= */ false);
-	gfp = vma_thp_gfp_mask(vma);
+	collapse_control_init(&cc, flags & MADV_F_COLLAPSE_LIMITS);
+	gfp = vma_thp_gfp_mask(vma) | (flags & MADV_F_COLLAPSE_DEFRAG ?
+				       __GFP_DIRECT_RECLAIM : 0);
 	lru_add_drain();	/* lru_add_drain_all() too heavy here */
 	error = _madvise_collapse(mm, vma, prev, start, end, gfp, &cc);
 	mmap_assert_locked(mm);
diff --git a/mm/madvise.c b/mm/madvise.c
index 292aa017c150..7d094d86d2f1 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -979,7 +979,7 @@ static long madvise_remove(struct vm_area_struct *vma,
 static int madvise_vma_behavior(struct vm_area_struct *vma,
 				struct vm_area_struct **prev,
 				unsigned long start, unsigned long end,
-				unsigned long behavior)
+				unsigned long behavior, unsigned int flags)
 {
 	int error;
 	struct anon_vma_name *anon_name;
@@ -1048,7 +1048,7 @@ static int madvise_vma_behavior(struct vm_area_struct *vma,
 			goto out;
 		break;
 	case MADV_COLLAPSE:
-		return madvise_collapse(vma, prev, start, end);
+		return madvise_collapse(vma, prev, start, end, flags);
 	}
 
 	anon_name = anon_vma_name(vma);
@@ -1160,13 +1160,19 @@ madvise_behavior_valid(int behavior)
 }
 
 static bool
-process_madvise_behavior_valid(int behavior)
+process_madvise_behavior_valid(int behavior, struct task_struct *task,
+			       unsigned int flags)
 {
 	switch (behavior) {
 	case MADV_COLD:
 	case MADV_PAGEOUT:
 	case MADV_WILLNEED:
-		return true;
+		return flags == 0;
+	case MADV_COLLAPSE:
+		return (flags & ~MADV_F_COLLAPSE_MASK) == 0 &&
+			(capable(CAP_SYS_ADMIN) ||
+			 (task == current) ||
+			 (flags & MADV_F_COLLAPSE_LIMITS));
 	default:
 		return false;
 	}
@@ -1182,10 +1188,11 @@ process_madvise_behavior_valid(int behavior)
  */
 static
 int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
-		      unsigned long end, unsigned long arg,
+		      unsigned long end, unsigned long arg, unsigned int flags,
 		      int (*visit)(struct vm_area_struct *vma,
 				   struct vm_area_struct **prev, unsigned long start,
-				   unsigned long end, unsigned long arg))
+				   unsigned long end, unsigned long arg,
+				   unsigned int flags))
 {
 	struct vm_area_struct *vma;
 	struct vm_area_struct *prev;
@@ -1222,7 +1229,7 @@ int madvise_walk_vmas(struct mm_struct *mm, unsigned long start,
 			tmp = end;
 
 		/* Here vma->vm_start <= start < tmp <= (end|vma->vm_end). */
-		error = visit(vma, &prev, start, tmp, arg);
+		error = visit(vma, &prev, start, tmp, arg, flags);
 		if (error)
 			return error;
 		start = tmp;
@@ -1285,7 +1292,7 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
 		return 0;
 
 	return madvise_walk_vmas(mm, start, end, (unsigned long)anon_name,
-				 madvise_vma_anon_name);
+				 MADV_F_NONE, madvise_vma_anon_name);
 }
 #endif /* CONFIG_ANON_VMA_NAME */
 /*
@@ -1359,7 +1366,8 @@ int madvise_set_anon_name(struct mm_struct *mm, unsigned long start,
  *  -EBADF  - map exists, but area maps something that isn't a file.
  *  -EAGAIN - a kernel resource was temporarily unavailable.
 */
-int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int behavior)
+int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in,
+	       int behavior, unsigned int flags)
 {
 	unsigned long end;
 	int error;
@@ -1401,8 +1409,8 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 	}
 
 	blk_start_plug(&plug);
-	error = madvise_walk_vmas(mm, start, end, behavior,
-				  madvise_vma_behavior);
+	error = madvise_walk_vmas(mm, start, end, behavior, flags,
+				  madvise_vma_behavior);
 	blk_finish_plug(&plug);
 	if (write)
 		mmap_write_unlock(mm);
@@ -1414,7 +1422,8 @@ int do_madvise(struct mm_struct *mm, unsigned long start, size_t len_in, int beh
 
 SYSCALL_DEFINE3(madvise, unsigned long, start, size_t, len_in, int, behavior)
 {
-	return do_madvise(current->mm, start, len_in, behavior);
+	return do_madvise(current->mm, start, len_in, behavior,
+			  MADV_F_NONE);
 }
 
 SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
@@ -1429,11 +1438,6 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 	size_t total_len;
 	unsigned int f_flags;
 
-	if (flags != 0) {
-		ret = -EINVAL;
-		goto out;
-	}
-
 	ret = import_iovec(READ, vec, vlen, ARRAY_SIZE(iovstack), &iov, &iter);
 	if (ret < 0)
 		goto out;
@@ -1444,7 +1448,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 		goto free_iov;
 	}
 
-	if (!process_madvise_behavior_valid(behavior)) {
+	if (!process_madvise_behavior_valid(behavior, task, flags)) {
 		ret = -EINVAL;
 		goto release_task;
 	}
@@ -1470,7 +1474,7 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const struct iovec __user *, vec,
 	while (iov_iter_count(&iter)) {
 		iovec = iov_iter_iovec(&iter);
 		ret = do_madvise(mm, (unsigned long)iovec.iov_base,
-				 iovec.iov_len, behavior);
+				 iovec.iov_len, behavior, flags);
 		if (ret < 0)
 			break;
 		iov_iter_advance(&iter, iovec.iov_len);