From patchwork Mon Jun 10 11:12:48 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 10984685
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
 Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
 Daniel Colascione, Shakeel Butt, Sonny Rao, Brian Geffon,
 jannh@google.com, oleg@redhat.com, christian@brauner.io,
 oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Minchan Kim
Subject: [PATCH v2 1/5] mm: introduce MADV_COLD
Date: Mon, 10 Jun 2019 20:12:48 +0900
Message-Id: <20190610111252.239156-2-minchan@kernel.org>
In-Reply-To: <20190610111252.239156-1-minchan@kernel.org>
References: <20190610111252.239156-1-minchan@kernel.org>

When a process expects no accesses to a certain memory range, it can
give the kernel a hint that the pages may be reclaimed when memory
pressure happens, but that the data should be preserved for future use.
This can reduce workingset eviction, so it ends up increasing
performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not
expected to be used in the near future. The hint can help the kernel
decide which pages to evict early during memory pressure.

It works for every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive
file LRU's head, because MADV_COLD has slightly different semantics.
MADV_FREE means it is okay to discard the pages under memory pressure
because their content is *garbage*, so freeing such pages has almost
zero overhead: there is no need to swap them out, and a later access
causes just a minor fault. Thus, it makes sense to put those freeable
pages on the inactive file LRU to compete with other used-once pages.
It also makes sense from an implementation point of view, because the
memory is no longer swap-backed until it is re-dirtied. As a bonus,
such pages can be reclaimed even on a swapless system.

However, MADV_COLD doesn't mean garbage, so reclaiming those pages
eventually requires swap-out/in, which is a bigger cost. Since we have
designed VM LRU aging based on a cost model, anonymous cold pages are
better placed on the inactive anon LRU list, not the file LRU.
Furthermore, that helps to avoid unnecessary scanning if the system
doesn't have a swap device. Let's start with the simpler way, without
adding complexity at this moment.

All error rules are the same as for MADV_DONTNEED.
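For illustration only, here is a minimal userspace sketch of how a
process might use the hint described above. It is not part of the patch.
The helper names mark_range_cold() and cold_hint_demo() are hypothetical,
and the fallback value 5 is simply the number proposed in this revision;
a released kernel may use a different one, so the definition from the
system headers should win when it exists.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Assumption: if the installed headers do not define MADV_COLD, fall back
 * to the value proposed in this patch revision. Released kernels may use
 * a different number.
 */
#ifndef MADV_COLD
#define MADV_COLD 5
#endif

/*
 * Hypothetical helper: hint that [addr, addr + len) is cold.
 * Returns 0 if the kernel accepted the hint and 1 if it rejected it
 * (for example EINVAL on a kernel without MADV_COLD, or on a locked
 * mapping). The hint is advisory, so a rejection is not fatal; errno is
 * left set for callers that care about the reason.
 */
static int mark_range_cold(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);

	return madvise(addr, len, MADV_COLD) == 0 ? 0 : 1;
}

/*
 * Hypothetical demo: map anonymous memory, dirty it, mark it cold, then
 * verify that the contents are still intact. Returns 0 or 1 as reported
 * by mark_range_cold(), or -1 if mapping failed or the data changed.
 */
static int cold_hint_demo(void)
{
	const size_t len = 16 * 4096;
	unsigned char *buf;
	size_t i;
	int ret;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xA5, len);	/* fault the pages in and dirty them */
	ret = mark_range_cold(buf, len);

	/* Unlike MADV_DONTNEED or MADV_FREE, the data must be preserved. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0xA5) {
			ret = -1;
			break;
		}
	}

	munmap(buf, len);
	return ret;
}
```

Calling cold_hint_demo() from a test program should never return -1 for
a data mismatch: whether or not the kernel knows the hint, the range
keeps its contents, which is the property that separates MADV_COLD from
MADV_DONTNEED and MADV_FREE.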
* v1
 * remove page_mapcount filter - hannes, mhocko
 * fix idle page handling - joelaf

* RFCv2
 * add more description - mhocko

* RFCv1
 * renaming from MADV_COOL to MADV_COLD - hannes

* internal review
 * use clear_page_young in deactivate_page - joelaf
 * Revise the description - surenb
 * Renaming from MADV_WARM to MADV_COOL - surenb

Signed-off-by: Minchan Kim
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/internal.h                          |   2 +-
 mm/madvise.c                           | 151 ++++++++++++++++++++++++-
 mm/oom_kill.c                          |   2 +-
 mm/swap.c                              |  42 +++++++
 6 files changed, 195 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e..0ce997edb8bb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
 
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index ef4623f03156..d7b4231eea63 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -47,6 +47,7 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
+#define MADV_COLD	5		/* deactivate these pages */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_FREE	8		/* free pages only if memory pressure */
diff --git a/mm/internal.h b/mm/internal.h
index e32390802fd3..0d5f720c75ab 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -39,7 +39,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
-static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
+static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
 	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 628022e674a7..67c0379f64a7 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -40,6 +40,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_COLD:
 	case MADV_FREE:
 		return 0;
 	default:
@@ -307,6 +308,149 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = tlb->mm;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	struct page *page;
+	unsigned long next;
+
+	next = pmd_addr_end(addr, end);
+	if (pmd_trans_huge(*pmd)) {
+		pmd_t orig_pmd;
+
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		orig_pmd = *pmd;
+		if (is_huge_zero_pmd(orig_pmd))
+			goto huge_unlock;
+
+		if (unlikely(!pmd_present(orig_pmd))) {
+			VM_BUG_ON(thp_migration_supported() &&
+					!is_pmd_migration_entry(orig_pmd));
+			goto huge_unlock;
+		}
+
+		page = pmd_page(orig_pmd);
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			if (page_mapcount(page) != 1)
+				goto huge_unlock;
+
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (pmd_young(orig_pmd)) {
+			pmdp_invalidate(vma, addr, pmd);
+			orig_pmd = pmd_mkold(orig_pmd);
+
+			set_pmd_at(mm, addr, pmd, orig_pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+		}
+
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+huge_unlock:
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+regular_page:
+	tlb_change_page_size(tlb, PAGE_SIZE);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	flush_tlb_batched_pending(mm);
+	arch_enter_lazy_mmu_mode();
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+
+		if (pte_none(ptent))
+			continue;
+
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		if (pte_young(ptent)) {
+			ptent = ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			ptent = pte_mkold(ptent);
+			set_pte_at(mm, addr, pte, ptent);
+			tlb_remove_tlb_entry(tlb, pte, addr);
+		}
+
+		/*
+		 * We are deactivating a page to accelerate its reclaim.
+		 * The VM can't reclaim the page unless we clear PG_young.
+		 * As a side effect, this confuses idle-page tracking,
+		 * which will miss the recent reference history.
+		 */
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(orig_pte, ptl);
+	cond_resched();
+
+	return 0;
+}
+
+static void madvise_cold_page_range(struct mmu_gather *tlb,
+			     struct vm_area_struct *vma,
+			     unsigned long addr, unsigned long end)
+{
+	struct mm_walk cold_walk = {
+		.pmd_entry = madvise_cold_pte_range,
+		.mm = vma->vm_mm,
+		.private = tlb,
+	};
+
+	tlb_start_vma(tlb, vma);
+	walk_page_range(addr, end, &cold_walk);
+	tlb_end_vma(tlb, vma);
+}
+
+static long madvise_cold(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start_addr, unsigned long end_addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
+
+	*prev = vma;
+	if (!can_madv_lru_vma(vma))
+		return -EINVAL;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+	tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+	return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
@@ -519,7 +663,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  int behavior)
 {
 	*prev = vma;
-	if (!can_madv_dontneed_vma(vma))
+	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -541,7 +685,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
@@ -695,6 +839,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return madvise_remove(vma, prev, start, end);
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -716,6 +862,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
 	case MADV_FREE:
+	case MADV_COLD:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 5a58778c91d4..f73d5f5145f0 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -515,7 +515,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 	set_bit(MMF_UNSTABLE, &mm->flags);
 
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			continue;
 
 		/*
diff --git a/mm/swap.c b/mm/swap.c
index 6d153ce4cb8c..7e44f5b50774 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -47,6 +47,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
@@ -538,6 +539,22 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 	update_page_reclaim_stat(lruvec, file, 0);
 }
 
+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+			    void *arg)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		int file = page_is_file_cache(page);
+		int lru = page_lru_base_type(page);
+
+		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+		ClearPageActive(page);
+		ClearPageReferenced(page);
+		add_page_to_lru_list(page, lruvec, lru);
+
+		__count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
+		update_page_reclaim_stat(lruvec, file, 0);
+	}
+}
 
 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
@@ -590,6 +607,10 @@ void lru_add_drain_cpu(int cpu)
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	if (pagevec_count(pvec))
+		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
 	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
@@ -623,6 +644,26 @@ void deactivate_file_page(struct page *page)
 	}
 }
 
+/*
+ * deactivate_page - deactivate a page
+ * @page: page to deactivate
+ *
+ * deactivate_page() moves @page to the inactive list if @page was on the active
+ * list and was not an unevictable page. This is done to accelerate the reclaim
+ * of @page.
+ */
+void deactivate_page(struct page *page)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+		get_page(page);
+		if (!pagevec_add(pvec, page) || PageCompound(page))
+			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+		put_cpu_var(lru_deactivate_pvecs);
+	}
+}
+
 /**
  * mark_page_lazyfree - make an anon page lazyfree
  * @page: page to deactivate
@@ -687,6 +728,7 @@ void lru_add_drain_all(void)
 		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
 		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
+		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);

From patchwork Mon Jun 10 11:12:49 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 10984687
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
 Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
 Daniel Colascione, Shakeel Butt, Sonny Rao, Brian Geffon,
 jannh@google.com, oleg@redhat.com, christian@brauner.io,
 oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Minchan Kim
Subject: [PATCH v2 2/5] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
Date: Mon, 10 Jun 2019 20:12:49 +0900
Message-Id: <20190610111252.239156-3-minchan@kernel.org>
In-Reply-To: <20190610111252.239156-1-minchan@kernel.org>
References: <20190610111252.239156-1-minchan@kernel.org>

The local variable references in shrink_page_list defaults to
PAGEREF_RECLAIM_CLEAN. That default is meant to prevent reclaiming dirty
pages when CMA tries to migrate pages. Strictly speaking, we don't need
it, because CMA already disallows writeout via .may_writepage = 0 in
reclaim_clean_pages_from_list.

Moreover, it becomes a problem in an upcoming patch: it prevents
swap-out of anonymous pages even when force_reclaim = true in
shrink_page_list. So this patch changes the default value of references
to PAGEREF_RECLAIM and renames force_reclaim to ignore_references to
make its meaning clearer.

This is preparatory work for the next patch.
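As a rough illustration of why the default matters, the following toy
model (not kernel code) mimics the decision described above. The enum
names mirror mm/vmscan.c, but struct toy_page, toy_check_references()
and toy_would_reclaim() are hypothetical, and the handling of
PAGEREF_RECLAIM_CLEAN is reduced to the assumption that dirty pages are
kept rather than written or swapped out.

```c
#include <assert.h>
#include <stdbool.h>

/* Names mirror the kernel enum; the rest of this file is a toy model. */
enum page_references {
	PAGEREF_RECLAIM,
	PAGEREF_RECLAIM_CLEAN,
	PAGEREF_KEEP,
	PAGEREF_ACTIVATE,
};

/* Hypothetical stand-in for struct page: only the bits the model needs. */
struct toy_page {
	bool referenced;
	bool dirty;	/* e.g. an anonymous page that needs swap-out */
};

/* Toy stand-in for page_check_references(): keep referenced pages. */
static enum page_references toy_check_references(const struct toy_page *p)
{
	return p->referenced ? PAGEREF_KEEP : PAGEREF_RECLAIM;
}

/*
 * Returns true if the toy page would be reclaimed. default_ref models the
 * initial value of the local variable "references" in shrink_page_list.
 */
static bool toy_would_reclaim(const struct toy_page *p,
			      enum page_references default_ref,
			      bool ignore_references)
{
	enum page_references references = default_ref;

	if (!ignore_references)
		references = toy_check_references(p);

	switch (references) {
	case PAGEREF_ACTIVATE:
	case PAGEREF_KEEP:
		return false;
	case PAGEREF_RECLAIM_CLEAN:
		/* Assumed simplification: dirty pages are not reclaimed. */
		return !p->dirty;
	case PAGEREF_RECLAIM:
	default:
		return true;
	}
}
```

With the old default, a dirty page stays put even when the caller passes
ignore_references = true; with PAGEREF_RECLAIM as the default, the same
call reclaims it, which is the behaviour the later patches in the series
rely on.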
* RFCv1 * use ignore_referecnes as parameter name - hannes Acked-by: Johannes Weiner Signed-off-by: Minchan Kim Acked-by: Michal Hocko --- mm/vmscan.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/mm/vmscan.c b/mm/vmscan.c index 84dcb651d05c..0973a46a0472 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1102,7 +1102,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, struct scan_control *sc, enum ttu_flags ttu_flags, struct reclaim_stat *stat, - bool force_reclaim) + bool ignore_references) { LIST_HEAD(ret_pages); LIST_HEAD(free_pages); @@ -1116,7 +1116,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, struct address_space *mapping; struct page *page; int may_enter_fs; - enum page_references references = PAGEREF_RECLAIM_CLEAN; + enum page_references references = PAGEREF_RECLAIM; bool dirty, writeback; unsigned int nr_pages; @@ -1247,7 +1247,7 @@ static unsigned long shrink_page_list(struct list_head *page_list, } } - if (!force_reclaim) + if (!ignore_references) references = page_check_references(page, sc); switch (references) { From patchwork Mon Jun 10 11:12:50 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Minchan Kim X-Patchwork-Id: 10984689 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 468726C5 for ; Mon, 10 Jun 2019 11:13:26 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 33CAE2873B for ; Mon, 10 Jun 2019 11:13:26 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 26E6628741; Mon, 10 Jun 2019 11:13:26 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, Brian Geffon, jannh@google.com, oleg@redhat.com, christian@brauner.io, oleksandr@redhat.com, hdanton@sina.com,
 lizeb@google.com, Minchan Kim
Subject: [PATCH v2 3/5] mm: account nr_isolated_xxx in [isolate|putback]_lru_page
Date: Mon, 10 Jun 2019 20:12:50 +0900
Message-Id: <20190610111252.239156-4-minchan@kernel.org>
X-Mailer: git-send-email 2.22.0.rc2.383.gf4fbbf30c2-goog
In-Reply-To: <20190610111252.239156-1-minchan@kernel.org>
References: <20190610111252.239156-1-minchan@kernel.org>
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org

The isolated-page counters are per-CPU counters, so updating them in
batches brings little gain. Rather than complicating the callers to batch
the updates, make the accounting more straightforward by moving the
counting logic into the [isolate|putback]_lru_page API.

* v1
  * fix accounting bug - Hillf

Link: http://lkml.kernel.org/r/20190531165927.GA20067@cmpxchg.org
Suggested-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 mm/compaction.c     |  2 --
 mm/gup.c            |  7 +------
 mm/khugepaged.c     |  3 ---
 mm/memory-failure.c |  3 ---
 mm/memory_hotplug.c |  4 ----
 mm/mempolicy.c      |  6 +-----
 mm/migrate.c        | 37 ++++++++-----------------------------
 mm/vmscan.c         | 22 ++++++++++++++++------
 8 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..c6591682deda 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -982,8 +982,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
 		/* Successfully isolated */
 		del_page_from_lru_list(page, lruvec, page_lru(page));
-		inc_node_page_state(page,
-				NR_ISOLATED_ANON + page_is_file_cache(page));
 
 isolate_success:
 		list_add(&page->lru, &cc->migratepages);
diff --git a/mm/gup.c b/mm/gup.c
index 63ac50e48072..2d9a9bc358c7 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1360,13 +1360,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
 					drain_allow = false;
 				}
 
-				if (!isolate_lru_page(head)) {
+				if (!isolate_lru_page(head))
 					list_add_tail(&head->lru, &cma_page_list);
-					mod_node_page_state(page_pgdat(head),
-							    NR_ISOLATED_ANON +
-							    page_is_file_cache(head),
-							    hpage_nr_pages(head));
-				}
 			}
 		}
 	}
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index a335f7c1fac4..3359df994fb4 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -503,7 +503,6 @@ void __khugepaged_exit(struct mm_struct *mm)
 
 static void release_pte_page(struct page *page)
 {
-	dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));
 	unlock_page(page);
 	putback_lru_page(page);
 }
@@ -602,8 +601,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
 			result = SCAN_DEL_PAGE_LRU;
 			goto out;
 		}
-		inc_node_page_state(page,
-				NR_ISOLATED_ANON + page_is_file_cache(page));
 		VM_BUG_ON_PAGE(!PageLocked(page), page);
 		VM_BUG_ON_PAGE(PageLRU(page), page);
 
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index b9cc36a284f9..430946cf9c8a 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1796,9 +1796,6 @@ static int __soft_offline_page(struct page *page, int flags)
 		 * so use !__PageMovable instead for LRU page's mapping
 		 * cannot have PAGE_MAPPING_MOVABLE.
 		 */
-		if (!__PageMovable(page))
-			inc_node_page_state(page, NR_ISOLATED_ANON +
-						page_is_file_cache(page));
 		list_add(&page->lru, &pagelist);
 		ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
 					MIGRATE_SYNC, MR_MEMORY_FAILURE);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index a88c5f334e5a..a41bea24d0c9 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1390,10 +1390,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
 			ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
 		if (!ret) { /* Success */
 			list_add_tail(&page->lru, &source);
-			if (!__PageMovable(page))
-				inc_node_page_state(page, NR_ISOLATED_ANON +
-						    page_is_file_cache(page));
-
 		} else {
 			pr_warn("failed to isolate pfn %lx\n", pfn);
 			dump_page(page, "isolation failed");
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index fdcb73536319..89bb25fe7553 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,12 +948,8 @@ static void migrate_page_add(struct page *page, struct list_head *pagelist,
 	 * Avoid migrating a page that is shared with others.
 	 */
 	if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) {
-		if (!isolate_lru_page(head)) {
+		if (!isolate_lru_page(head))
 			list_add_tail(&head->lru, pagelist);
-			mod_node_page_state(page_pgdat(head),
-				NR_ISOLATED_ANON + page_is_file_cache(head),
-				hpage_nr_pages(head));
-		}
 	}
 }
 
diff --git a/mm/migrate.c b/mm/migrate.c
index 572b4bc85d76..5583324c01e7 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -190,8 +190,6 @@ void putback_movable_pages(struct list_head *l)
 			unlock_page(page);
 			put_page(page);
 		} else {
-			mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-					page_is_file_cache(page), -hpage_nr_pages(page));
 			putback_lru_page(page);
 		}
 	}
@@ -1181,10 +1179,17 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		return -ENOMEM;
 
 	if (page_count(page) == 1) {
+		bool is_lru = !__PageMovable(page);
+
 		/* page was freed from under us. So we are done. */
 		ClearPageActive(page);
 		ClearPageUnevictable(page);
-		if (unlikely(__PageMovable(page))) {
+		if (likely(is_lru))
+			mod_node_page_state(page_pgdat(page),
+						NR_ISOLATED_ANON +
+						page_is_file_cache(page),
+						-hpage_nr_pages(page));
+		else {
 			lock_page(page);
 			if (!PageMovable(page))
 				__ClearPageIsolated(page);
@@ -1210,15 +1215,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
 		 * restored.
 		 */
 		list_del(&page->lru);
-
-		/*
-		 * Compaction can migrate also non-LRU pages which are
-		 * not accounted to NR_ISOLATED_*. They can be recognized
-		 * as __PageMovable
-		 */
-		if (likely(!__PageMovable(page)))
-			mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-					page_is_file_cache(page), -hpage_nr_pages(page));
 	}
 
 	/*
@@ -1572,9 +1568,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 
 		err = 0;
 		list_add_tail(&head->lru, pagelist);
-		mod_node_page_state(page_pgdat(head),
-			NR_ISOLATED_ANON + page_is_file_cache(head),
-			hpage_nr_pages(head));
 	}
 out_putpage:
 	/*
@@ -1890,8 +1883,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 
 static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 {
-	int page_lru;
-
 	VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page);
 
 	/* Avoid migrating to a node that is nearly full */
@@ -1913,10 +1904,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 		return 0;
 	}
 
-	page_lru = page_is_file_cache(page);
-	mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
-				hpage_nr_pages(page));
-
 	/*
 	 * Isolating the page has taken another reference, so the
 	 * caller's reference can be safely dropped without the page
@@ -1971,8 +1958,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
 	if (nr_remaining) {
 		if (!list_empty(&migratepages)) {
 			list_del(&page->lru);
-			dec_node_page_state(page, NR_ISOLATED_ANON +
-					page_is_file_cache(page));
 			putback_lru_page(page);
 		}
 		isolated = 0;
@@ -2002,7 +1987,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	pg_data_t *pgdat = NODE_DATA(node);
 	int isolated = 0;
 	struct page *new_page = NULL;
-	int page_lru = page_is_file_cache(page);
 	unsigned long start = address & HPAGE_PMD_MASK;
 
 	new_page = alloc_pages_node(node,
@@ -2048,8 +2032,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 		/* Retake the callers reference and putback on LRU */
 		get_page(page);
 		putback_lru_page(page);
-		mod_node_page_state(page_pgdat(page),
-			 NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR);
 
 		goto out_unlock;
 	}
@@ -2099,9 +2081,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
 	count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
 	count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
 
-	mod_node_page_state(page_pgdat(page),
-			NR_ISOLATED_ANON + page_lru,
-			-HPAGE_PMD_NR);
 	return isolated;
 
 out_fail:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 0973a46a0472..56df55e8afcd 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -999,6 +999,9 @@ int remove_mapping(struct address_space *mapping, struct page *page)
 void putback_lru_page(struct page *page)
 {
 	lru_cache_add(page);
+	mod_node_page_state(page_pgdat(page),
+			NR_ISOLATED_ANON + page_is_file_cache(page),
+			-hpage_nr_pages(page));
 	put_page(page);		/* drop ref from isolate */
 }
 
@@ -1464,6 +1467,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		 */
 		nr_reclaimed += nr_pages;
 
+		mod_node_page_state(pgdat, NR_ISOLATED_ANON +
+						page_is_file_cache(page),
+						-nr_pages);
 		/*
 		 * Is there need to periodically free_page_list? It would
 		 * appear not as the counts should be low
@@ -1539,7 +1545,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
 	ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
 			TTU_IGNORE_ACCESS, &dummy_stat, true);
 	list_splice(&clean_pages, page_list);
-	mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
 	return ret;
 }
 
@@ -1615,6 +1620,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
 		 */
 		ClearPageLRU(page);
 		ret = 0;
+		__mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
+						page_is_file_cache(page),
+						hpage_nr_pages(page));
 	}
 
 	return ret;
@@ -1746,6 +1754,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
 	trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
 				    total_scan, skipped, nr_taken, mode, lru);
 	update_lru_sizes(lruvec, lru, nr_zone_taken);
+
 	return nr_taken;
 }
 
@@ -1794,6 +1803,9 @@ int isolate_lru_page(struct page *page)
 			ClearPageLRU(page);
 			del_page_from_lru_list(page, lruvec, lru);
 			ret = 0;
+			mod_node_page_state(pgdat, NR_ISOLATED_ANON +
+						page_is_file_cache(page),
+						hpage_nr_pages(page));
 		}
 		spin_unlock_irq(&pgdat->lru_lock);
 	}
@@ -1885,6 +1897,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
 		update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
 		list_move(&page->lru, &lruvec->lists[lru]);
 
+		__mod_node_page_state(pgdat, NR_ISOLATED_ANON +
+						page_is_file_cache(page),
+						-hpage_nr_pages(page));
 		if (put_page_testzero(page)) {
 			__ClearPageLRU(page);
 			__ClearPageActive(page);
@@ -1962,7 +1977,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
 				     &nr_scanned, sc, lru);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
@@ -1988,8 +2002,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
 	move_pages_to_lru(lruvec, &page_list);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	mem_cgroup_uncharge_list(&page_list);
@@ -2048,7 +2060,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
 				     &nr_scanned, sc, lru);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
 	reclaim_stat->recent_scanned[file] += nr_taken;
 
 	__count_vm_events(PGREFILL, nr_scanned);
@@ -2117,7 +2128,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	__count_vm_events(PGDEACTIVATE, nr_deactivate);
 	__count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate);
 
-	__mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
 	spin_unlock_irq(&pgdat->lru_lock);
 
 	mem_cgroup_uncharge_list(&l_active);

From patchwork Mon Jun 10 11:12:51 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 10984691
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, Brian Geffon, jannh@google.com, oleg@redhat.com, christian@brauner.io, oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com,
 Minchan Kim
Subject: [PATCH v2 4/5] mm: introduce MADV_PAGEOUT
Date: Mon, 10 Jun 2019 20:12:51 +0900
Message-Id: <20190610111252.239156-5-minchan@kernel.org>
X-Mailer: git-send-email 2.22.0.rc2.383.gf4fbbf30c2-goog
In-Reply-To: <20190610111252.239156-1-minchan@kernel.org>
References: <20190610111252.239156-1-minchan@kernel.org>
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org

When a process expects no accesses to a certain memory range for a long
time, it can hint to the kernel that the pages may be reclaimed
immediately while their data must be preserved for future use. This can
reduce workingset eviction and thus improve performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2)
syscall. A process can use MADV_PAGEOUT to mark a memory range as not
expected to be used for a long time, so that the kernel reclaims *any LRU*
pages in it instantly. The hint helps the kernel decide which pages to
evict proactively.

The error rules are the same as for MADV_DONTNEED.
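For illustration only (this is not part of the patch), a userspace caller
might use the hint roughly as sketched below. The helper names
pageout_range() and demo_pageout() are hypothetical. The fallback value 6
for MADV_PAGEOUT is the one this v2 patch adds to mman-common.h; it is used
only when the libc headers do not already define the constant, and a kernel
without this series is expected to reject the hint.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_PAGEOUT
#define MADV_PAGEOUT 6	/* assumption: value from this patch's mman-common.h hunk */
#endif

/* Hypothetical helper: 0 on success, negative errno if the hint is rejected. */
static int pageout_range(void *addr, size_t len)
{
	assert(addr != NULL && len > 0);
	if (madvise(addr, len, MADV_PAGEOUT) != 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical demo: returns 0 if the hint was accepted, 1 if the kernel
 * rejected it (for example EINVAL on a kernel without this series), and
 * -1 on mmap failure or if the data did not survive.
 */
static int demo_pageout(void)
{
	const size_t len = 64 * 4096;
	unsigned char *buf;
	size_t i;
	int ret;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0x5a, len);		/* fault the pages in */
	ret = pageout_range(buf, len) ? 1 : 0;

	/* Reclaim must not lose data: touching the range faults it back in. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0x5a) {
			ret = -1;
			break;
		}
	}

	munmap(buf, len);
	return ret;
}
```

Calling demo_pageout() from a main() is enough to try it. Unlike
MADV_DONTNEED, the contents must still read back as written afterwards,
which is what the final loop checks.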

* v1
  * change pte to old and rely on the other's reference - hannes
  * remove page_mapcount to check shared page - mhocko

* RFC v2
  * make reclaim_pages simple via factoring out isolate logic - hannes

* RFCv1
  * rename from MADV_COLD to MADV_PAGEOUT - hannes
  * bail out if process is being killed - Hillf
  * fix reclaim_pages bugs - Hillf

Signed-off-by: Minchan Kim
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/madvise.c                           | 161 +++++++++++++++++++++++++
 mm/vmscan.c                            |  58 +++++++++
 4 files changed, 221 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 0ce997edb8bb..063c0c1e112b 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -365,6 +365,7 @@ extern int vm_swappiness;
 extern int remove_mapping(struct address_space *mapping, struct page *page);
 extern unsigned long vm_total_pages;
 
+extern unsigned long reclaim_pages(struct list_head *page_list);
 #ifdef CONFIG_NUMA
 extern int node_reclaim_mode;
 extern int sysctl_min_unmapped_ratio;
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index d7b4231eea63..f545e159b472 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -48,6 +48,7 @@
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
 #define MADV_COLD	5		/* deactivatie these pages */
+#define MADV_PAGEOUT	6		/* reclaim these pages */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_FREE	8		/* free pages only if memory pressure */
diff --git a/mm/madvise.c b/mm/madvise.c
index 67c0379f64a7..3b9d2ba421b1 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -11,6 +11,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 #include
@@ -41,6 +42,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
 	case MADV_COLD:
+	case MADV_PAGEOUT:
 	case MADV_FREE:
 		return 0;
 	default:
@@ -451,6 +453,162 @@ static long madvise_cold(struct vm_area_struct *vma,
 	return 0;
 }
 
+static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = tlb->mm;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	LIST_HEAD(page_list);
+	struct page *page;
+	int isolated = 0;
+	unsigned long next;
+
+	if (fatal_signal_pending(current))
+		return -EINTR;
+
+	next = pmd_addr_end(addr, end);
+	if (pmd_trans_huge(*pmd)) {
+		pmd_t orig_pmd;
+
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		orig_pmd = *pmd;
+		if (is_huge_zero_pmd(orig_pmd))
+			goto huge_unlock;
+
+		if (unlikely(!pmd_present(orig_pmd))) {
+			VM_BUG_ON(thp_migration_supported() &&
+					!is_pmd_migration_entry(orig_pmd));
+			goto huge_unlock;
+		}
+
+		page = pmd_page(orig_pmd);
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			if (page_mapcount(page) != 1)
+				goto huge_unlock;
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (isolate_lru_page(page))
+			goto huge_unlock;
+
+		if (pmd_young(orig_pmd)) {
+			pmdp_invalidate(vma, addr, pmd);
+			orig_pmd = pmd_mkold(orig_pmd);
+
+			set_pmd_at(mm, addr, pmd, orig_pmd);
+			tlb_remove_tlb_entry(tlb, pmd, addr);
+		}
+
+		ClearPageReferenced(page);
+		test_and_clear_page_young(page);
+		list_add(&page->lru, &page_list);
+huge_unlock:
+		spin_unlock(ptl);
+		reclaim_pages(&page_list);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+regular_page:
+	tlb_change_page_size(tlb, PAGE_SIZE);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	flush_tlb_batched_pending(mm);
+	arch_enter_lazy_mmu_mode();
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		if (isolate_lru_page(page))
+			continue;
+
+		isolated++;
+		if (pte_young(ptent)) {
+			ptent = ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			ptent = pte_mkold(ptent);
+			set_pte_at(mm, addr, pte, ptent);
+			tlb_remove_tlb_entry(tlb, pte, addr);
+		}
+		ClearPageReferenced(page);
+		test_and_clear_page_young(page);
+		list_add(&page->lru, &page_list);
+		if (isolated >= SWAP_CLUSTER_MAX) {
+			arch_leave_lazy_mmu_mode();
+			pte_unmap_unlock(orig_pte, ptl);
+			reclaim_pages(&page_list);
+			isolated = 0;
+			pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+			arch_enter_lazy_mmu_mode();
+			orig_pte = pte;
+		}
+	}
+
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(orig_pte, ptl);
+	reclaim_pages(&page_list);
+	cond_resched();
+
+	return 0;
+}
+
+static void madvise_pageout_page_range(struct mmu_gather *tlb,
+			     struct vm_area_struct *vma,
+			     unsigned long addr, unsigned long end)
+{
+	struct mm_walk pageout_walk = {
+		.pmd_entry = madvise_pageout_pte_range,
+		.mm = vma->vm_mm,
+		.private = tlb,
+	};
+
+	tlb_start_vma(tlb, vma);
+	walk_page_range(addr, end, &pageout_walk);
+	tlb_end_vma(tlb, vma);
+}
+
+
+static long madvise_pageout(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start_addr, unsigned long end_addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
+
+	*prev = vma;
+	if (!can_madv_lru_vma(vma))
+		return -EINVAL;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
+	tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+	return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
@@ -841,6 +999,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return madvise_willneed(vma, prev, start, end);
 	case MADV_COLD:
 		return madvise_cold(vma, prev, start, end);
+	case MADV_PAGEOUT:
+		return madvise_pageout(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -863,6 +1023,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_DONTNEED:
 	case MADV_FREE:
 	case MADV_COLD:
+	case MADV_PAGEOUT:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 56df55e8afcd..04061185677f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2136,6 +2136,64 @@ static void shrink_active_list(unsigned long nr_to_scan,
 			nr_deactivate, nr_rotated, sc->priority, file);
 }
 
+unsigned long reclaim_pages(struct list_head *page_list)
+{
+	int nid = -1;
+	unsigned long nr_reclaimed = 0;
+	LIST_HEAD(node_page_list);
+	struct reclaim_stat dummy_stat;
+	struct scan_control sc = {
+		.gfp_mask = GFP_KERNEL,
+		.priority = DEF_PRIORITY,
+		.may_writepage = 1,
+		.may_unmap = 1,
+		.may_swap = 1,
+	};
+
+	while (!list_empty(page_list)) {
+		struct page *page;
+
+		page = lru_to_page(page_list);
+		if (nid == -1) {
+			nid = page_to_nid(page);
+			INIT_LIST_HEAD(&node_page_list);
+		}
+
+		if (nid == page_to_nid(page)) {
+			list_move(&page->lru, &node_page_list);
+			continue;
+		}
+
+		nr_reclaimed += shrink_page_list(&node_page_list,
+						NODE_DATA(nid),
+						&sc, 0,
+						&dummy_stat, false);
+		while (!list_empty(&node_page_list)) {
+			struct page *page = lru_to_page(&node_page_list);
+
+			list_del(&page->lru);
+			putback_lru_page(page);
+		}
+
+		nid = -1;
+	}
+
+	if (!list_empty(&node_page_list)) {
+		nr_reclaimed += shrink_page_list(&node_page_list,
+						NODE_DATA(nid),
+						&sc, 0,
+						&dummy_stat, false);
+		while (!list_empty(&node_page_list)) {
+			struct page *page = lru_to_page(&node_page_list);
+
+			list_del(&page->lru);
+			putback_lru_page(page);
+		}
+	}
+
+	return nr_reclaimed;
+}
+
 /*
  * The inactive anon list should be small enough that the VM never has
  * to do too much work.

From patchwork Mon Jun 10 11:12:52 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 10984693
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, Brian Geffon, jannh@google.com, oleg@redhat.com, christian@brauner.io, oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Minchan Kim, "Kirill A. Shutemov", Christopher Lameter
Subject: [PATCH v2 5/5] mm: factor out pmd young/dirty bit handling and THP split
Date: Mon, 10 Jun 2019 20:12:52 +0900
Message-Id: <20190610111252.239156-6-minchan@kernel.org>
X-Mailer: git-send-email 2.22.0.rc2.383.gf4fbbf30c2-goog
In-Reply-To: <20190610111252.239156-1-minchan@kernel.org>
References: <20190610111252.239156-1-minchan@kernel.org>
MIME-Version: 1.0
Sender: owner-linux-mm@kvack.org

MADV_COLD, MADV_PAGEOUT and MADV_FREE now share common logic: resetting
the access/dirty bits, and splitting a THP so that only some of its
subpages are handled. This patch factors out the common part.

Cc: "Kirill A.
Shutemov"
Cc: Christopher Lameter
Signed-off-by: Minchan Kim
---
 include/linux/huge_mm.h |   3 -
 mm/huge_memory.c        |  74 -------------
 mm/madvise.c            | 234 +++++++++++++++++++++++-----------------
 3 files changed, 135 insertions(+), 176 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..2667e1aa3ce5 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -29,9 +29,6 @@ extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 					  unsigned long addr,
 					  pmd_t *pmd,
 					  unsigned int flags);
-extern bool madvise_free_huge_pmd(struct mmu_gather *tlb,
-			struct vm_area_struct *vma,
-			pmd_t *pmd, unsigned long addr, unsigned long next);
 extern int zap_huge_pmd(struct mmu_gather *tlb,
 			struct vm_area_struct *vma,
 			pmd_t *pmd, unsigned long addr);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 9f8bce9a6b32..22e20f929463 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1668,80 +1668,6 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	return 0;
 }
 
-/*
- * Return true if we do MADV_FREE successfully on entire pmd page.
- * Otherwise, return false.
- */
-bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
-		pmd_t *pmd, unsigned long addr, unsigned long next)
-{
-	spinlock_t *ptl;
-	pmd_t orig_pmd;
-	struct page *page;
-	struct mm_struct *mm = tlb->mm;
-	bool ret = false;
-
-	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
-
-	ptl = pmd_trans_huge_lock(pmd, vma);
-	if (!ptl)
-		goto out_unlocked;
-
-	orig_pmd = *pmd;
-	if (is_huge_zero_pmd(orig_pmd))
-		goto out;
-
-	if (unlikely(!pmd_present(orig_pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-				  !is_pmd_migration_entry(orig_pmd));
-		goto out;
-	}
-
-	page = pmd_page(orig_pmd);
-	/*
-	 * If other processes are mapping this page, we couldn't discard
-	 * the page unless they all do MADV_FREE so let's skip the page.
-	 */
-	if (page_mapcount(page) != 1)
-		goto out;
-
-	if (!trylock_page(page))
-		goto out;
-
-	/*
-	 * If user want to discard part-pages of THP, split it so MADV_FREE
-	 * will deactivate only them.
-	 */
-	if (next - addr != HPAGE_PMD_SIZE) {
-		get_page(page);
-		spin_unlock(ptl);
-		split_huge_page(page);
-		unlock_page(page);
-		put_page(page);
-		goto out_unlocked;
-	}
-
-	if (PageDirty(page))
-		ClearPageDirty(page);
-	unlock_page(page);
-
-	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
-		pmdp_invalidate(vma, addr, pmd);
-		orig_pmd = pmd_mkold(orig_pmd);
-		orig_pmd = pmd_mkclean(orig_pmd);
-
-		set_pmd_at(mm, addr, pmd, orig_pmd);
-		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-	}
-
-	mark_page_lazyfree(page);
-	ret = true;
-out:
-	spin_unlock(ptl);
-out_unlocked:
-	return ret;
-}
-
 static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 {
 	pgtable_t pgtable;
diff --git a/mm/madvise.c b/mm/madvise.c
index 3b9d2ba421b1..bb1906bb75fd 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -310,6 +310,91 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+enum madv_pmdp_reset_t {
+	MADV_PMDP_RESET,	/* pmd was reset successfully */
+	MADV_PMDP_SPLIT,	/* pmd was split */
+	MADV_PMDP_ERROR,
+};
+
+static enum madv_pmdp_reset_t madvise_pmdp_reset_or_split(struct mm_walk *walk,
+				pmd_t *pmd, spinlock_t *ptl,
+				unsigned long addr, unsigned long end,
+				bool young, bool dirty)
+{
+	pmd_t orig_pmd;
+	unsigned long next;
+	struct page *page;
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = walk->mm;
+	struct vm_area_struct *vma = walk->vma;
+	bool reset_young = false;
+	bool reset_dirty = false;
+	enum madv_pmdp_reset_t ret = MADV_PMDP_ERROR;
+
+	orig_pmd = *pmd;
+	if (is_huge_zero_pmd(orig_pmd))
+		return ret;
+
+	if (unlikely(!pmd_present(orig_pmd))) {
+		VM_BUG_ON(thp_migration_supported() &&
+				!is_pmd_migration_entry(orig_pmd));
+		return ret;
+	}
+
+	next = pmd_addr_end(addr, end);
+	page = pmd_page(orig_pmd);
+	if (next - addr != HPAGE_PMD_SIZE) {
+		/*
+		 * THP collapsing is not cheap so only split the page if it is
+		 * private to this process.
+		 */
+		if (page_mapcount(page) != 1)
+			return ret;
+		get_page(page);
+		spin_unlock(ptl);
+		lock_page(page);
+		if (!split_huge_page(page))
+			ret = MADV_PMDP_SPLIT;
+		unlock_page(page);
+		put_page(page);
+		return ret;
+	}
+
+	if (young && pmd_young(orig_pmd))
+		reset_young = true;
+	if (dirty && pmd_dirty(orig_pmd))
+		reset_dirty = true;
+
+	/*
+	 * Other process could rely on the PG_dirty for data consistency,
+	 * not pte_dirty so we could reset PG_dirty only when we are owner
+	 * of the page.
+	 */
+	if (reset_dirty) {
+		if (page_mapcount(page) != 1)
+			goto out;
+		if (!trylock_page(page))
+			goto out;
+		if (PageDirty(page))
+			ClearPageDirty(page);
+		unlock_page(page);
+	}
+
+	ret = MADV_PMDP_RESET;
+	if (reset_young || reset_dirty) {
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		pmdp_invalidate(vma, addr, pmd);
+		if (reset_young)
+			orig_pmd = pmd_mkold(orig_pmd);
+		if (reset_dirty)
+			orig_pmd = pmd_mkclean(orig_pmd);
+		set_pmd_at(mm, addr, pmd, orig_pmd);
+		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+	}
+out:
+	return ret;
+}
+
 static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
@@ -319,64 +404,31 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *orig_pte, *pte, ptent;
 	spinlock_t *ptl;
 	struct page *page;
-	unsigned long next;
 
-	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
-		pmd_t orig_pmd;
-
-		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 		ptl = pmd_trans_huge_lock(pmd, vma);
 		if (!ptl)
 			return 0;
 
-		orig_pmd = *pmd;
-		if (is_huge_zero_pmd(orig_pmd))
-			goto huge_unlock;
-
-		if (unlikely(!pmd_present(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-					!is_pmd_migration_entry(orig_pmd));
-			goto huge_unlock;
-		}
-
-		page = pmd_page(orig_pmd);
-		if (next - addr != HPAGE_PMD_SIZE) {
-			int err;
-
-			if (page_mapcount(page) != 1)
-				goto huge_unlock;
-
-			get_page(page);
+		switch (madvise_pmdp_reset_or_split(walk, pmd, ptl, addr, end,
+							true, false)) {
+		case MADV_PMDP_RESET:
 			spin_unlock(ptl);
-			lock_page(page);
-			err = split_huge_page(page);
-			unlock_page(page);
-			put_page(page);
-			if (!err)
-				goto regular_page;
-			return 0;
-		}
-
-		if (pmd_young(orig_pmd)) {
-			pmdp_invalidate(vma, addr, pmd);
-			orig_pmd = pmd_mkold(orig_pmd);
-
-			set_pmd_at(mm, addr, pmd, orig_pmd);
-			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+			page = pmd_page(*pmd);
+			test_and_clear_page_young(page);
+			deactivate_page(page);
+			goto next;
+		case MADV_PMDP_ERROR:
+			spin_unlock(ptl);
+			goto next;
+		case MADV_PMDP_SPLIT:
+			; /* go through */
 		}
-
-		test_and_clear_page_young(page);
-		deactivate_page(page);
-huge_unlock:
-		spin_unlock(ptl);
-		return 0;
 	}
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
-regular_page:
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
@@ -414,6 +466,7 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 
 	arch_enter_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
+next:
 	cond_resched();
 
 	return 0;
@@ -464,70 +517,38 @@ static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
 	LIST_HEAD(page_list);
 	struct page *page;
 	int isolated = 0;
-	unsigned long next;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
-	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
-		pmd_t orig_pmd;
-
-		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 		ptl = pmd_trans_huge_lock(pmd, vma);
 		if (!ptl)
 			return 0;
 
-		orig_pmd = *pmd;
-		if (is_huge_zero_pmd(orig_pmd))
-			goto huge_unlock;
-
-		if (unlikely(!pmd_present(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-					!is_pmd_migration_entry(orig_pmd));
-			goto huge_unlock;
-		}
-
-		page = pmd_page(orig_pmd);
-		if (next - addr != HPAGE_PMD_SIZE) {
-			int err;
-
-			if (page_mapcount(page) != 1)
-				goto huge_unlock;
-			get_page(page);
+		switch (madvise_pmdp_reset_or_split(walk, pmd, ptl, addr, end,
+							true, false)) {
+		case MADV_PMDP_RESET:
+			page = pmd_page(*pmd);
 			spin_unlock(ptl);
-			lock_page(page);
-			err = split_huge_page(page);
-			unlock_page(page);
-			put_page(page);
-			if (!err)
-				goto regular_page;
-			return 0;
-		}
-
-		if (isolate_lru_page(page))
-			goto huge_unlock;
-
-		if (pmd_young(orig_pmd)) {
-			pmdp_invalidate(vma, addr, pmd);
-			orig_pmd = pmd_mkold(orig_pmd);
-
-			set_pmd_at(mm, addr, pmd, orig_pmd);
-			tlb_remove_tlb_entry(tlb, pmd, addr);
+			if (isolate_lru_page(page))
+				return 0;
+			ClearPageReferenced(page);
+			test_and_clear_page_young(page);
+			list_add(&page->lru, &page_list);
+			reclaim_pages(&page_list);
+			goto next;
+		case MADV_PMDP_ERROR:
+			spin_unlock(ptl);
+			goto next;
+		case MADV_PMDP_SPLIT:
+			; /* go through */
 		}
-
-		ClearPageReferenced(page);
-		test_and_clear_page_young(page);
-		list_add(&page->lru, &page_list);
-huge_unlock:
-		spin_unlock(ptl);
-		reclaim_pages(&page_list);
-		return 0;
 	}
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
-regular_page:
+
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
@@ -569,6 +590,7 @@ static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
 	reclaim_pages(&page_list);
+next:
 	cond_resched();
 
 	return 0;
@@ -620,12 +642,26 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *orig_pte, *pte, ptent;
 	struct page *page;
 	int nr_swap = 0;
-	unsigned long next;
 
-	next = pmd_addr_end(addr, end);
-	if (pmd_trans_huge(*pmd))
-		if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
+	if (pmd_trans_huge(*pmd)) {
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		switch (madvise_pmdp_reset_or_split(walk, pmd, ptl, addr, end,
+							true, true)) {
+		case MADV_PMDP_RESET:
+			page = pmd_page(*pmd);
+			spin_unlock(ptl);
+			mark_page_lazyfree(page);
 			goto next;
+		case MADV_PMDP_ERROR:
+			spin_unlock(ptl);
+			goto next;
+		case MADV_PMDP_SPLIT:
+			; /* go through */
+		}
+	}
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
@@ -737,8 +773,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	}
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
-	cond_resched();
 next:
+	cond_resched();
 	return 0;
 }
 