From patchwork Tue Jul 23 06:25:35 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Minchan Kim X-Patchwork-Id: 11053907 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 307266C5 for ; Tue, 23 Jul 2019 06:25:59 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1BE7C285E1 for ; Tue, 23 Jul 2019 06:25:59 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 0CDC028573; Tue, 23 Jul 2019 06:25:59 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id D34E628573 for ; Tue, 23 Jul 2019 06:25:57 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id E04386B0008; Tue, 23 Jul 2019 02:25:56 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id D8E708E0003; Tue, 23 Jul 2019 02:25:56 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id BE13D8E0001; Tue, 23 Jul 2019 02:25:56 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pg1-f200.google.com (mail-pg1-f200.google.com [209.85.215.200]) by kanga.kvack.org (Postfix) with ESMTP id 7F10E6B0008 for ; Tue, 23 Jul 2019 02:25:56 -0400 (EDT) Received: by mail-pg1-f200.google.com with SMTP id h3so25291934pgc.19 for ; Mon, 22 Jul 2019 23:25:56 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; 
c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:sender:from:to:cc:subject:date :message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=wW44akBTWHeVt4h6UpXEbo4n6ZM+o8/8SIOBHtwi49k=; b=th+HP/un0qhhqZrcEuNsgwRcIcgEWTR8JlQLQ7mssmJxe7LBYmKBqmr16qBgEfdIf/ fdGLVWdRE6OlQ0sppxgPwiM8gNjPlPBjdJ8bxJI4Eesd46yNOm8nG8ybDRUEI6OELP4B xRtLwpOV/MlNHgqgtbrXDn93PU/g05quJjzal0FkmfmO6hbHBRYtqTAHHKaGS+yQ/uXL Qoc2hdIiEQ9mf/AbSoKPPETztAqyNuW1nzygAqxsq0mwmZrLjEwFBgNjBdmcEO4jT/+g 0YuqRhqbrwRLgYaXcCxZCY83KmQHKc0VfBDmnOub61xwa/JtxU9Cy979si9KtYSfUw+J 9P+w== X-Gm-Message-State: APjAAAUhfWwYIK/l687ek125hRwRX3SJKg/fqPMD9tMClqIiL2kCE5hU WgZ8bArgoMYEtshLJ1cuBD3mX9Vr8+xWc2QhVtYFZ2OjiXu0CgB+p4aNOzTHiDvLeBMlYH2nzgY kYGzGGVydzdCm6Js2PMb3w/xV325vTGfgRKCNe4JP4lom10jkBX8H9GzxJpich+g= X-Received: by 2002:a17:902:b206:: with SMTP id t6mr80226172plr.195.1563863156068; Mon, 22 Jul 2019 23:25:56 -0700 (PDT) X-Received: by 2002:a17:902:b206:: with SMTP id t6mr80226115plr.195.1563863154952; Mon, 22 Jul 2019 23:25:54 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563863154; cv=none; d=google.com; s=arc-20160816; b=qx6psCSQoM2KaqDX6E9U+9J493Hf9fXEiAqMlWRw/hjomF3TobM9B/tXb1qoVTCwjr r1qq/hP1PrWA9Yksi3L8mvTs7k5n1eBJxWEP1SNALC4yU0R69voMTd/ZrZxLmfbtZpHu qfeuoZAkI4vMz6mpiKhsjg5HcLGc5WQYKiN/6DNUhvZvHblxcPsoKIVrVET+qk6CIrYL K4YnkIcLXDNEfNxvBXQnEQ1+kSnDYCRuO69hmbsrOxDA2dV8LDdeRoSw/MVr2I/+l0YC FNLdsswjkpJsRaOT3FSmLzrViP7v0TsWzp/QuoMFLzcTPxKDjPEt+T2WSeetPJT75mRB nRmg== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:dkim-signature; bh=wW44akBTWHeVt4h6UpXEbo4n6ZM+o8/8SIOBHtwi49k=; b=pOm7nmDairEd5FOHV4A6lry9h+BEbnPYlA6H2NUYRkcIeimpFGHtJSPMEaFBvn8z+z bPOfy9/ZUDbO4blzZULRRo5jiobTK+LaxyaIOJ1h2p3NrYVKnf/TzDfXAXfA37oMrGoL nJvrHHx58GiOJ89MPaQdQkNeltKqR0QgS2/itcLz/jny873OQ/PY81B0aSfvcpoK7FgW 
0qjYPa2zxUflGGhCk5JMdDxgFLJYZq+y+naNVD+7q0LY6xFzxJffej3jF7hdG0o9+stP MfxHcEzfiMVM1oJWFs/npweLbKJf7MsteJcdHhZ+jupQZ8HVMRg9XzD3FuA82SMX/kKT CVxA== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="OOP/Qpti"; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65]) by mx.google.com with SMTPS id e16sor52005524pjp.20.2019.07.22.23.25.54 for (Google Transport Security); Mon, 22 Jul 2019 23:25:54 -0700 (PDT) Received-SPF: pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) client-ip=209.85.220.65; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b="OOP/Qpti"; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=wW44akBTWHeVt4h6UpXEbo4n6ZM+o8/8SIOBHtwi49k=; b=OOP/QptiBcfrMWI9sO7VjPEHRNwBL6fdgVM0kN/1DnHmlljKVhkbIYc0cimaW1YrhM 0oLs7CZ9epZNdFdp5eP8pvUSokAOEizSdQjTUlh7gIMY9clXFiSD/I44d5yMQ84+wfgh d+2C7oFP2w8q4daO/WN364n8FJYSjVUzemL6D+zadLM1GL+p14QZQ0FgwzuOkOYXsmS6 zl7+8K/MPVB07atO+9+JP1Zh9INM6xLsvviZ1nRgMqxMQHolYjZuZbpW7YZvZ8aEZWsB 7+zXiVNLDHPxH9gPv4AgMOI/3SsNrPzDd+/eYdZuc5eontHID5AByP//MKw/SSQ6JHkW wP5A== X-Google-Smtp-Source: APXvYqz38xDn2wzqB3iOdTq7V+kVIfuuId64iwhy7EBsG7WriLn4HRy2WgZ2hqM8X9DzxOpsNw01wQ== X-Received: by 2002:a17:90a:270f:: with SMTP id o15mr80783232pje.56.1563863154488; Mon, 22 Jul 2019 23:25:54 -0700 (PDT) Received: from bbox-2.seo.corp.google.com 
	([2401:fa00:d:0:98f1:8b3d:1f37:3e8]) by smtp.gmail.com with ESMTPSA id s66sm44630376pfs.8.2019.07.22.23.25.49 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Mon, 22 Jul 2019 23:25:53 -0700 (PDT)
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A . Shutemov", Minchan Kim
Subject: [PATCH v6 1/5] mm: introduce MADV_COLD
Date: Tue, 23 Jul 2019 15:25:35 +0900
Message-Id: <20190723062539.198697-2-minchan@kernel.org>
X-Mailer: git-send-email 2.22.0.657.g960e92d24f-goog
In-Reply-To: <20190723062539.198697-1-minchan@kernel.org>
References: <20190723062539.198697-1-minchan@kernel.org>
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

When a process expects no accesses to a certain memory range, it could
give a hint to the kernel that the pages can be reclaimed when memory
pressure happens but the data should be preserved for future use. This
could reduce workingset eviction, so it ends up increasing performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not expected
to be used in the near future. The hint can help the kernel in deciding
which pages to evict early during memory pressure.

It works for every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the inactive
file LRU's head because MADV_COLD has slightly different semantics.
MADV_FREE means it is okay to discard the pages under memory pressure
because their content is *garbage*. Freeing such pages is almost zero
overhead since we don't need to swap them out, and a later access causes
just a minor fault. Thus, it makes sense to put those freeable pages on
the inactive file LRU to compete with other used-once pages. It makes
sense from an implementation point of view, too, because the memory is no
longer swap-backed until it is re-dirtied. It even gives a bonus: the
pages can be reclaimed on a swapless system.

However, MADV_COLD doesn't mean garbage, so reclaiming those pages
requires swap-out/in in the end, which is a bigger cost. Since we have
designed VM LRU aging based on a cost model, anonymous cold pages are
better positioned on the inactive anon LRU list, not the file LRU.
Furthermore, it helps to avoid unnecessary scanning if the system doesn't
have a swap device. Let's start with the simpler way without adding
complexity at this moment. However, keep in mind the caveat that
workloads with a lot of page cache are likely to ignore MADV_COLD on
anonymous memory because we rarely age the anonymous LRU lists.

* man-page material

MADV_COLD (since Linux x.x)

Pages in the specified regions will be treated as less-recently-accessed
compared to pages in the system with similar access frequencies.
In contrast to MADV_FREE, the contents of the region are preserved
regardless of subsequent writes to pages.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP
pages.
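
For illustration only, a userspace caller of the hint described above
might look like the sketch below. mark_range_cold() and
cold_preserves_contents() are hypothetical names, not part of this series.
The sketch assumes MADV_COLD comes from the libc headers; the fallback
value 20 is an assumption taken from kernels that later shipped the hint,
and it differs from the value 5 proposed in this revision.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Assumption: fall back to 20 when the libc headers predate the hint.
 * This revision of the patch proposes 5 instead.
 */
#ifndef MADV_COLD
#define MADV_COLD 20
#endif

/*
 * Hypothetical helper: hint that [addr, addr + len) is cold.
 * Returns 0 on success or a negative errno value.
 */
static int mark_range_cold(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_COLD) != 0)
		return -errno;
	return 0;
}

/*
 * Hypothetical demo: map anonymous memory, fill it, mark it cold and
 * verify that the contents are still readable afterwards.
 * Returns 1 if the data survived, 0 otherwise (including mmap failure).
 */
static int cold_preserves_contents(size_t len)
{
	unsigned char *buf;
	size_t i;
	int ok = 1;

	assert(len > 0);

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return 0;

	memset(buf, 0xA5, len);

	/* Only a hint: a failure here is deliberately not treated as fatal. */
	(void)mark_range_cold(buf, len);

	for (i = 0; i < len; i++)
		if (buf[i] != 0xA5)
			ok = 0;

	munmap(buf, len);
	return ok;
}
```

Because the call is only a hint, the sketch carries on when it fails, for
example with -EINVAL on a kernel without the hint or on a locked, hugetlb
or VM_PFNMAP mapping.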
* v5
 * Fix typo and correct wrong lazy_mmu_mode pair use - surenb
* v2
 * add up the warn with lots of page cache workload - mhocko
 * add man page stuff - dave
* v1
 * remove page_mapcount filter - hannes, mhocko
 * remove idle page handling - joelaf
* RFCv2
 * add more description - mhocko
* RFCv1
 * renaming from MADV_COOL to MADV_COLD - hannes
* internal review
 * use clear_page_youn in deactivate_page - joelaf
 * Revise the description - surenb
 * Renaming from MADV_WARM to MADV_COOL - surenb

Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/internal.h                          |   2 +-
 mm/madvise.c                           | 180 ++++++++++++++++++++++++-
 mm/oom_kill.c                          |   2 +-
 mm/swap.c                              |  42 ++++++
 6 files changed, 224 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e7..0ce997edb8bbc 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 
 extern void swap_setup(void);
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 63b1f506ea678..e9aeda400af3a 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -45,6 +45,7 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
+#define MADV_COLD	5		/* deactivate these pages */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_FREE	8		/* free pages only if memory pressure */
diff --git a/mm/internal.h b/mm/internal.h
index e32390802fd3f..0d5f720c75abf 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -39,7 +39,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
-static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
+static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
 	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 968df3aa069fd..10255bb23aa73 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -40,6 +40,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_COLD:
 	case MADV_FREE:
 		return 0;
 	default:
@@ -307,6 +308,178 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = tlb->mm;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	struct page *page;
+	unsigned long next;
+
+	next = pmd_addr_end(addr, end);
+	if (pmd_trans_huge(*pmd)) {
+		pmd_t orig_pmd;
+
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		orig_pmd = *pmd;
+		if (is_huge_zero_pmd(orig_pmd))
+			goto huge_unlock;
+
+		if (unlikely(!pmd_present(orig_pmd))) {
+			VM_BUG_ON(thp_migration_supported() &&
+					!is_pmd_migration_entry(orig_pmd));
+			goto huge_unlock;
+		}
+
+		page = pmd_page(orig_pmd);
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			if (page_mapcount(page) != 1)
+				goto huge_unlock;
+
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (pmd_young(orig_pmd)) {
+			pmdp_invalidate(vma, addr, pmd);
+			orig_pmd = pmd_mkold(orig_pmd);
+
+			set_pmd_at(mm, addr, pmd, orig_pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+		}
+
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+huge_unlock:
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+regular_page:
+	tlb_change_page_size(tlb, PAGE_SIZE);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	flush_tlb_batched_pending(mm);
+	arch_enter_lazy_mmu_mode();
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+
+		if (pte_none(ptent))
+			continue;
+
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		/*
+		 * Creating a THP page is expensive so split it only if we
+		 * are sure it's worth. Split it if we are only owner.
+		 */
+		if (PageTransCompound(page)) {
+			if (page_mapcount(page) != 1)
+				break;
+			get_page(page);
+			if (!trylock_page(page)) {
+				put_page(page);
+				break;
+			}
+			pte_unmap_unlock(orig_pte, ptl);
+			if (split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				pte_offset_map_lock(mm, pmd, addr, &ptl);
+				break;
+			}
+			unlock_page(page);
+			put_page(page);
+			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte--;
+			addr -= PAGE_SIZE;
+			continue;
+		}
+
+		VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+		if (pte_young(ptent)) {
+			ptent = ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			ptent = pte_mkold(ptent);
+			set_pte_at(mm, addr, pte, ptent);
+			tlb_remove_tlb_entry(tlb, pte, addr);
+		}
+
+		/*
+		 * We are deactivating a page for accelerating reclaiming.
+		 * VM couldn't reclaim the page unless we clear PG_young.
+		 * As a side effect, it makes confuse idle-page tracking
+		 * because they will miss recent referenced history.
+		 */
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(orig_pte, ptl);
+	cond_resched();
+
+	return 0;
+}
+
+static void madvise_cold_page_range(struct mmu_gather *tlb,
+			     struct vm_area_struct *vma,
+			     unsigned long addr, unsigned long end)
+{
+	struct mm_walk cold_walk = {
+		.pmd_entry = madvise_cold_pte_range,
+		.mm = vma->vm_mm,
+		.private = tlb,
+	};
+
+	tlb_start_vma(tlb, vma);
+	walk_page_range(addr, end, &cold_walk);
+	tlb_end_vma(tlb, vma);
+}
+
+static long madvise_cold(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start_addr, unsigned long end_addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
+
+	*prev = vma;
+	if (!can_madv_lru_vma(vma))
+		return -EINVAL;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+	tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+	return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
@@ -519,7 +692,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  int behavior)
 {
 	*prev = vma;
-	if (!can_madv_dontneed_vma(vma))
+	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -541,7 +714,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
@@ -695,6 +868,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return madvise_remove(vma, prev, start, end);
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -716,6 +891,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
 	case MADV_FREE:
+	case MADV_COLD:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 95872bdfec4e0..c8f0ec6b0e80c 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -523,7 +523,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 	set_bit(MMF_UNSTABLE, &mm->flags);
 
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			continue;
 
 		/*
diff --git a/mm/swap.c b/mm/swap.c
index ae300397dfdac..cd492435395d5 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -47,6 +47,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
@@ -538,6 +539,22 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 	update_page_reclaim_stat(lruvec, file, 0);
 }
 
+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+			    void *arg)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		int file = page_is_file_cache(page);
+		int lru = page_lru_base_type(page);
+
+		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+		ClearPageActive(page);
+		ClearPageReferenced(page);
+		add_page_to_lru_list(page, lruvec, lru);
+
+		__count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
+		update_page_reclaim_stat(lruvec, file, 0);
+	}
+}
 
 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
@@ -590,6 +607,10 @@ void lru_add_drain_cpu(int cpu)
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	if (pagevec_count(pvec))
+		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
 	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
@@ -623,6 +644,26 @@ void deactivate_file_page(struct page *page)
 	}
 }
 
+/*
+ * deactivate_page - deactivate a page
+ * @page: page to deactivate
+ *
+ * deactivate_page() moves @page to the inactive list if @page was on the active
+ * list and was not an unevictable page. This is done to accelerate the reclaim
+ * of @page.
+ */
+void deactivate_page(struct page *page)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+		get_page(page);
+		if (!pagevec_add(pvec, page) || PageCompound(page))
+			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+		put_cpu_var(lru_deactivate_pvecs);
+	}
+}
+
 /**
  * mark_page_lazyfree - make an anon page lazyfree
  * @page: page to deactivate
@@ -687,6 +728,7 @@ void lru_add_drain_all(void)
 		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
 		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
+		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);

From patchwork Tue Jul 23 06:25:36 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11053909
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 46C10138D for ; Tue, 23 Jul 2019 06:26:03 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 35B84285E1 for ; Tue, 23 Jul 2019 06:26:03 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 28F79285EB; Tue, 23 Jul 2019 06:26:03 +0000 (UTC)
X-Spam-Checker-Version:
SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 85A37285E1 for ; Tue, 23 Jul 2019 06:26:02 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 74B956B000A; Tue, 23 Jul 2019 02:26:01 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 6FF448E0003; Tue, 23 Jul 2019 02:26:01 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 529788E0001; Tue, 23 Jul 2019 02:26:01 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id 141B36B000A for ; Tue, 23 Jul 2019 02:26:01 -0400 (EDT) Received: by mail-pf1-f197.google.com with SMTP id x10so25511827pfa.23 for ; Mon, 22 Jul 2019 23:26:01 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:sender:from:to:cc:subject:date :message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=bnIASPZxrrGOWF/G5OfEZ1yV57NTr/ruSeiXKsRM3nk=; b=VTRvdm0gy/I8WmRLo9wO4E57HzGM+/1cF1XEcEs89Tu/5MdbIKtvnzF+XO2E+66L5K MfZ6TSCnZUhroKUiNaXTzCRd2na8E2617giurOawnfSfTqIwpFXU7nUaJ566D/Q/VWcA KlGOf7vpsHyDLsXTeobvZ7QTI5Qvnn7GYKpcF4JBQU8PA+CwUghvYhiP3m0ZKoXHhr0E M7NIVEGBEsf3ynTREtj6m79YmKz9fxJh8mcgCQ+8M4AXPVY/utl8LIQYevwFIPqZ2nOj 2LaUgJ6RddbePJm5qCnNHRo7lhR10aWC7pmZEnm5nA1tc0DX3EDVPAMORCWH3J0ODUqb cjCA== X-Gm-Message-State: APjAAAW99+83OM3YgdvjMT7lOmz6jye2AcG1mD0qTKVQfhqW6tOoKUgv C1gA69i9Y+dGHskDgpstfuRQeTBNJwOWE3BkNjXoXUogISD/1m84JH5/DscRJNsmLlbrhE/RoUH 
b1ryDJRzvH+Qm/T+BwVst18ZwwBRDULE6VNqVABDHHpR7UlFMn76HKcpI70DeA7w= X-Received: by 2002:a17:90a:2343:: with SMTP id f61mr81742463pje.130.1563863160751; Mon, 22 Jul 2019 23:26:00 -0700 (PDT) X-Received: by 2002:a17:90a:2343:: with SMTP id f61mr81742423pje.130.1563863159985; Mon, 22 Jul 2019 23:25:59 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563863159; cv=none; d=google.com; s=arc-20160816; b=Uic0uNxrEhx2+trq4RGv4yJPHlDcjsazUUDfhzZI/n8Gjv72s5CfeCfVpiXqbZFn4c Fq4uSfphLaPJFJTzawKkMDTO7aXGcWp3zI15G3EaD1XEXWpsMkMDgai9pcspJ8pFJau4 ZdT9v5E1YszN4eYtgy2xdePhqv0Chp660qJbQiF4Js4gHMMJm/QxjtP3PO2aSFqrhNMR QL2IBtDY1WQT8rm8IXnCTOLaIn0juTd7DAFmp6mQJNx+vkyLMnAQUAj3CfY47wc72MaX QN8VGanbsGsCi89sE/1+cOZF+KBIp2NjQLlmQw5wbr/TPeJjySHtenM9T0vjtqOCGZaQ cGmQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:dkim-signature; bh=bnIASPZxrrGOWF/G5OfEZ1yV57NTr/ruSeiXKsRM3nk=; b=mC800lAmd4nFY5iJueuxDZxuX/oJjWzAwFnZTmLx1KuvUWExfZcmY5ldzV09VxNC+P WB65+7sxrmwY3FK/+nPbQ5iST0z45xl4b1hmJ8WqSU9XmmDrqqdmxT8xIZfxeaEWoAPE J1uYF0OqwrUYHuukfRUZP+JBeTXaenDqtvhha89XecBUJlbPNHehfJynU6A1asao/Uoq byg3lihcpoMfubqf74pbSJhqj7oSHBBtjf3VDzEYGuVEqaUh9/DLI8TZPJ/78rLaOByw elsw6S2SUh2n2mTEGc7VXdEOP24stLSsmbO6HzXuO5dp2+btXMpqq5wBK3IMd+P/sg5N yYcQ== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=uuhLoA0B; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. 
[209.85.220.65]) by mx.google.com with SMTPS id m39sor50663953plg.49.2019.07.22.23.25.59 for (Google Transport Security); Mon, 22 Jul 2019 23:25:59 -0700 (PDT) Received-SPF: pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) client-ip=209.85.220.65; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=uuhLoA0B; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=bnIASPZxrrGOWF/G5OfEZ1yV57NTr/ruSeiXKsRM3nk=; b=uuhLoA0BBoERKypNBgrn/ZlmpMF92srWO/6MUGEN3xX1GzOTomN23OaW/P/sopLIED 2Xzsuj6Co6dKvk0D5MbrA8TfP4zStCUOHX6h5EBFSgLyStisdYhYvIEJEqB78ObpfvQx CIGiXaI8bQsfzUcHQORkIa8aacyWCE9dLGwtRaXBrzmU5hzTk6jGGgYCl9PXzw3/0bkL xe3svs11oSO6Hq0wAytjsddI9jOX0atWt4ZiE7429kLMyXjLHClGvJP6kuBdw9t/NZ/n Pj8j5fZXo7JbPhPs7ets5aP15dM9d29pdU+nhxzKURWL9WBl7Y2iQjcanm0fOfDBvhjW LBnw== X-Google-Smtp-Source: APXvYqyOx95R0nQcXqx5AoqfhCHYUJ6uEq79frQnW+ZOvXT1NvMLzarqUl004ZEy3UbXm7PLzVCYcQ== X-Received: by 2002:a17:902:290b:: with SMTP id g11mr77632984plb.26.1563863159650; Mon, 22 Jul 2019 23:25:59 -0700 (PDT) Received: from bbox-2.seo.corp.google.com ([2401:fa00:d:0:98f1:8b3d:1f37:3e8]) by smtp.gmail.com with ESMTPSA id s66sm44630376pfs.8.2019.07.22.23.25.54 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Mon, 22 Jul 2019 23:25:58 -0700 (PDT) From: Minchan Kim To: Andrew Morton Cc: linux-mm , LKML , linux-api@vger.kernel.org, Michal Hocko , Johannes Weiner , Tim Murray , Joel Fernandes , Suren Baghdasaryan , Daniel Colascione , Shakeel Butt , Sonny Rao , oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen , "Kirill A . 
	Shutemov", Minchan Kim
Subject: [PATCH v6 2/5] mm: change PAGEREF_RECLAIM_CLEAN with PAGE_REFRECLAIM
Date: Tue, 23 Jul 2019 15:25:36 +0900
Message-Id: <20190723062539.198697-3-minchan@kernel.org>
X-Mailer: git-send-email 2.22.0.657.g960e92d24f-goog
In-Reply-To: <20190723062539.198697-1-minchan@kernel.org>
References: <20190723062539.198697-1-minchan@kernel.org>
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

The local variable references in shrink_page_list defaults to
PAGEREF_RECLAIM_CLEAN. That default exists to prevent reclaiming dirty
pages when CMA tries to migrate pages. Strictly speaking, we don't need it
because CMA already disallows writeout via .may_writepage = 0 in
reclaim_clean_pages_from_list.

Moreover, it has the problem of preventing swap-out of anonymous pages
even though force_reclaim = true in shrink_page_list in the upcoming
patch. So this patch changes the default value of references to
PAGEREF_RECLAIM and renames force_reclaim to ignore_references to make it
clearer.

This is preparatory work for the next patch.
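
The effect of the new default can be sketched in a small standalone model.
This is not kernel code: pick_references(), dirty_page_may_be_written() and
references_model_self_check() are hypothetical names, the "checked"
argument stands in for the result of page_check_references(), and the
dirty-page rule is a simplified assumption about how shrink_page_list
treats PAGEREF_RECLAIM_CLEAN.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the order of the kernel's enum page_references. */
enum page_references {
	PAGEREF_RECLAIM,
	PAGEREF_RECLAIM_CLEAN,
	PAGEREF_KEEP,
	PAGEREF_ACTIVATE,
};

/*
 * Model of the start of the per-page decision in shrink_page_list().
 * "checked" is only consulted when the caller does not ask to ignore
 * references; otherwise the new default, PAGEREF_RECLAIM, is used.
 */
static enum page_references
pick_references(bool ignore_references, enum page_references checked)
{
	enum page_references references = PAGEREF_RECLAIM;

	if (!ignore_references)
		references = checked;
	return references;
}

/*
 * Simplified assumption about the dirty-page path: a dirty page is only
 * handed to writeout (swap-out for anonymous pages) when references is
 * not PAGEREF_RECLAIM_CLEAN.
 */
static bool dirty_page_may_be_written(enum page_references references)
{
	return references != PAGEREF_RECLAIM_CLEAN;
}

/* Hypothetical self-check of the behaviour the changelog describes. */
static void references_model_self_check(void)
{
	/* Ignoring references now yields PAGEREF_RECLAIM, so writeout is allowed. */
	assert(dirty_page_may_be_written(pick_references(true, PAGEREF_KEEP)));
	/* Without the flag, the reference check still decides. */
	assert(pick_references(false, PAGEREF_KEEP) == PAGEREF_KEEP);
	/* The old default would have blocked writeout of a dirty page. */
	assert(!dirty_page_may_be_written(PAGEREF_RECLAIM_CLEAN));
}
```

In this model the only behavioural difference from the old code is the
ignore_references == true case, which matches the claim that the change is
preparatory.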
* RFCv1
 * use ignore_references as parameter name - hannes

Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 mm/vmscan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index f4fd02ae233ef..f68449ce0c44c 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1124,7 +1124,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
 				      struct reclaim_stat *stat,
-				      bool force_reclaim)
+				      bool ignore_references)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -1138,7 +1138,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
-		enum page_references references = PAGEREF_RECLAIM_CLEAN;
+		enum page_references references = PAGEREF_RECLAIM;
 		bool dirty, writeback;
 		unsigned int nr_pages;
 
@@ -1269,7 +1269,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-		if (!force_reclaim)
+		if (!ignore_references)
 			references = page_check_references(page, sc);
 
 		switch (references) {

From patchwork Tue Jul 23 06:25:37 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11053911
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 47011138D for ; Tue, 23 Jul 2019 06:26:09 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 35FDA285E1 for ; Tue, 23 Jul 2019 06:26:09 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 29AEE285EB; Tue, 23 Jul 2019 06:26:09 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.7 required=2.0
tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 30F55285E1 for ; Tue, 23 Jul 2019 06:26:08 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 243548E0003; Tue, 23 Jul 2019 02:26:07 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 1CD408E0001; Tue, 23 Jul 2019 02:26:07 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 070298E0003; Tue, 23 Jul 2019 02:26:07 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pf1-f197.google.com (mail-pf1-f197.google.com [209.85.210.197]) by kanga.kvack.org (Postfix) with ESMTP id C12A18E0001 for ; Tue, 23 Jul 2019 02:26:06 -0400 (EDT) Received: by mail-pf1-f197.google.com with SMTP id i26so25442679pfo.22 for ; Mon, 22 Jul 2019 23:26:06 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:sender:from:to:cc:subject:date :message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=xr8cwtVAFqpMs6Az1B0omCnRc5Dl1COt3ClHqKPUTb0=; b=moh9rHA9nHVX7dCj+Kc0PHWSl4yx4c964Pq8ZlxQtMEZTyBBS4Wxdl/etQFDRlfuqQ IHgi1LAOLb2+Eqb2uIwbjImURAG5OR0w2mGi5vFlLjh8rUZzNRkrh4/yBjybXsggcSdA reUEJkct3zTawxcVzjUFCnr49libbVmW7XT57r2T08wlWcINkiDb+6oDApsIVAQ7xvlM NbMJehpf9LTgC4hU8ylXIDe84FQqIopR2lM9Jp29yZd52v0LoDjzt1TnfrFIdLAUKw9P fQEVwzZp0GQ3uNWR3hUsbX3yJ94hog51cCcAiSC6YAmnsnW8SPyx1ed5Az/0O1Uvip4a j2ew== X-Gm-Message-State: APjAAAUnV4CuzqtigEiVd/lGkr/VGrMNVprHz6stNFMRAbGpxbSeG6zq 2iYBp9fYwyAvhKQgh7LJjrINJK+MoCmJDZWl8zW3fV6rKBr2vp4LH64/dDcZ18Pd+H8aOVSLeW5 tqHw3C7rJHBHoQQ3FSBUSVsi9guJhE2XZv9g8x4pW3T6FJUlmWNtPREl5cGbGSMY= X-Received: by 2002:a17:90a:8a84:: with SMTP id 
x4mr79465966pjn.105.1563863166422; Mon, 22 Jul 2019 23:26:06 -0700 (PDT) X-Received: by 2002:a17:90a:8a84:: with SMTP id x4mr79465899pjn.105.1563863165294; Mon, 22 Jul 2019 23:26:05 -0700 (PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563863165; cv=none; d=google.com; s=arc-20160816; b=SUwvikr5xt16xqO1je4qhkgcPLEXsAKt4ryPCVholSZJsv8MMnQAlBq/wtrAqzUeuP oRt0XMpnoX7v0bz/7iIwF0Hp5kYMeJ1EWtK6C23jQ+z7DodzqPTG4vn9A3bQEmghcKDG nCvb9QYER065n7LqlFTxnzbumncRPM9uxKppwwAZaSIlEyy6/Jx/MPW0druZ+u5IehaK WJmu3vFsSTGZNne7rrKGdySbOh7wePI1uvHtYyxeyoeSQGRam4HUBppzVBzTYN67QDyg NNbzIRrFLtomVMcVP3sQS0KtkdNUqp7hU061p+tDXxkf5VBQ+tBZAhUDgnFP56tgqhD/ WYOQ== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:dkim-signature; bh=xr8cwtVAFqpMs6Az1B0omCnRc5Dl1COt3ClHqKPUTb0=; b=xR7ht3rpp42JP0L7g9TYuFI+vNjRo+Zqt3wV8ETGZgzfCseFehAPmIrhAIs0nfe8Wf dj6kTs3cc4derJKeiy8oik09aacwjsTXncTXjLvqW1KhfKAd6vdeBCi6Y65cHwedbgNR 4WHuwHyuvPcaV5+Ri98Np+g+bHByuodC0CR67VcOErxD4eyYgtBmgBDDHrw0Lw5rYgTR lOXkB483ccOwS9iUlIwwBTo2FqmdsLuM9nyFLXk2PHO6JBve/rI/2QsKB4jifnzqEjTV 3m38W0MsgrbRzj8+NVcGsFNiZjJYAU2B17uFc444E+/FIVp+72QSkZ+DOTmeIS9rmfRq /Q/Q== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=cmL1+InM; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. 
[209.85.220.65]) by mx.google.com with SMTPS id b64sor52015187pjc.24.2019.07.22.23.26.05 for (Google Transport Security); Mon, 22 Jul 2019 23:26:05 -0700 (PDT) Received-SPF: pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) client-ip=209.85.220.65; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=cmL1+InM; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=xr8cwtVAFqpMs6Az1B0omCnRc5Dl1COt3ClHqKPUTb0=; b=cmL1+InM3mkcUz2zDDDyJRITha2V/7Rd8iwv+E1hV4xWxw3bm/WF6h5BVMClUgEvVn 66oS1lHbyJ+8UIfuPXPAVjxqqUIpzZl2DuPnRuJXWr7GgBT5srsqpRCit5/5ame8HVfG DF90SM1618EeOQJZ26t4Pic/dJQt8dW0+hEHbVK7gAmZxyWKrktsuvNLItNKdsEk2uno pr+uwP0MZYECsS6HPNco8k2jZBDGffHXAoytjxANUnkkf+jIj9lm8EZq/salckMn93f2 L4cug0wZEPWUN/ER/t5ZJsZqUAxAnQoMZ2RHkDienL6bNL01khE3mG0XmELWBBKvvQ7q 2TYw== X-Google-Smtp-Source: APXvYqyYQVyWb5NekZSfzb2hyCygQblEI1kBz45ohHpfqzxwwSNZgRSbErPKUB/YTrfc5L/+EwHqXw== X-Received: by 2002:a17:90a:3344:: with SMTP id m62mr81071242pjb.135.1563863164805; Mon, 22 Jul 2019 23:26:04 -0700 (PDT) Received: from bbox-2.seo.corp.google.com ([2401:fa00:d:0:98f1:8b3d:1f37:3e8]) by smtp.gmail.com with ESMTPSA id s66sm44630376pfs.8.2019.07.22.23.25.59 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Mon, 22 Jul 2019 23:26:03 -0700 (PDT) From: Minchan Kim To: Andrew Morton Cc: linux-mm , LKML , linux-api@vger.kernel.org, Michal Hocko , Johannes Weiner , Tim Murray , Joel Fernandes , Suren Baghdasaryan , Daniel Colascione , Shakeel Butt , Sonny Rao , oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen , "Kirill A . 
Shutemov" , Minchan Kim Subject: [PATCH v6 3/5] mm: account nr_isolated_xxx in [isolate|putback]_lru_page Date: Tue, 23 Jul 2019 15:25:37 +0900 Message-Id: <20190723062539.198697-4-minchan@kernel.org> X-Mailer: git-send-email 2.22.0.657.g960e92d24f-goog In-Reply-To: <20190723062539.198697-1-minchan@kernel.org> References: <20190723062539.198697-1-minchan@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP The isolated-page count is a per-CPU counter, so updating it in batches would not be a huge gain. Rather than complicating the code to batch the updates, let's make it more straightforward by adding the counting logic into the [isolate|putback]_lru_page API. * v1 * fix accounting bug - Hillf Link: http://lkml.kernel.org/r/20190531165927.GA20067@cmpxchg.org Suggested-by: Johannes Weiner Acked-by: Johannes Weiner Acked-by: Michal Hocko Signed-off-by: Minchan Kim --- mm/compaction.c | 2 -- mm/gup.c | 7 +------ mm/khugepaged.c | 3 --- mm/memory-failure.c | 3 --- mm/memory_hotplug.c | 4 ---- mm/mempolicy.c | 6 +----- mm/migrate.c | 37 ++++++++----------------------------- mm/vmscan.c | 22 ++++++++++++++++------ 8 files changed, 26 insertions(+), 58 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 952dc2fb24e50..3e6b5acdaaffc 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -984,8 +984,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, /* Successfully isolated */ del_page_from_lru_list(page, lruvec, page_lru(page)); - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); isolate_success: list_add(&page->lru, &cc->migratepages); diff --git a/mm/gup.c b/mm/gup.c index 98f13ab37bacc..11d0634ce6137 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1475,13 +1475,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, drain_allow = false; } - if
(!isolate_lru_page(head)) { + if (!isolate_lru_page(head)) list_add_tail(&head->lru, &cma_page_list); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + - page_is_file_cache(head), - hpage_nr_pages(head)); - } } } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index eaaa21b232156..a8b517d6df4ab 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -503,7 +503,6 @@ void __khugepaged_exit(struct mm_struct *mm) static void release_pte_page(struct page *page) { - dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); unlock_page(page); putback_lru_page(page); } @@ -602,8 +601,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_DEL_PAGE_LRU; goto out; } - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageLRU(page), page); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7ef849da8278c..9900bb95d7740 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1791,9 +1791,6 @@ static int __soft_offline_page(struct page *page, int flags) * so use !__PageMovable instead for LRU page's mapping * cannot have PAGE_MAPPING_MOVABLE. 
*/ - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); list_add(&page->lru, &pagelist); ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 2a9bbddb0e554..e92103a13545b 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1383,10 +1383,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE); if (!ret) { /* Success */ list_add_tail(&page->lru, &source); - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - } else { pr_warn("failed to isolate pfn %lx\n", pfn); dump_page(page, "isolation failed"); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 4acc2d14bc779..a5685eee6d1db 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -994,12 +994,8 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist, * Avoid migrating a page that is shared with others. */ if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { - if (!isolate_lru_page(head)) { + if (!isolate_lru_page(head)) list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); - } } return 0; diff --git a/mm/migrate.c b/mm/migrate.c index 515718392b249..2ccab4add6471 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -190,8 +190,6 @@ void putback_movable_pages(struct list_head *l) unlock_page(page); put_page(page); } else { - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); putback_lru_page(page); } } @@ -1177,10 +1175,17 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, return -ENOMEM; if (page_count(page) == 1) { + bool is_lru = !__PageMovable(page); + /* page was freed from under us. So we are done. 
*/ ClearPageActive(page); ClearPageUnevictable(page); - if (unlikely(__PageMovable(page))) { + if (likely(is_lru)) + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); + else { lock_page(page); if (!PageMovable(page)) __ClearPageIsolated(page); @@ -1206,15 +1211,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, * restored. */ list_del(&page->lru); - - /* - * Compaction can migrate also non-LRU pages which are - * not accounted to NR_ISOLATED_*. They can be recognized - * as __PageMovable - */ - if (likely(!__PageMovable(page))) - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); } /* @@ -1568,9 +1564,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, err = 0; list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); } out_putpage: /* @@ -1886,8 +1879,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page, static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) { - int page_lru; - VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page); /* Avoid migrating to a node that is nearly full */ @@ -1909,10 +1900,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) return 0; } - page_lru = page_is_file_cache(page); - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru, - hpage_nr_pages(page)); - /* * Isolating the page has taken another reference, so the * caller's reference can be safely dropped without the page @@ -1967,8 +1954,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, if (nr_remaining) { if (!list_empty(&migratepages)) { list_del(&page->lru); - dec_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); putback_lru_page(page); } isolated = 0; @@ -1998,7 +1983,6 @@ int 
migrate_misplaced_transhuge_page(struct mm_struct *mm, pg_data_t *pgdat = NODE_DATA(node); int isolated = 0; struct page *new_page = NULL; - int page_lru = page_is_file_cache(page); unsigned long start = address & HPAGE_PMD_MASK; new_page = alloc_pages_node(node, @@ -2044,8 +2028,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, /* Retake the callers reference and putback on LRU */ get_page(page); putback_lru_page(page); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR); goto out_unlock; } @@ -2095,9 +2077,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR); count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, - -HPAGE_PMD_NR); return isolated; out_fail: diff --git a/mm/vmscan.c b/mm/vmscan.c index f68449ce0c44c..c693585c3facd 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1021,6 +1021,9 @@ int remove_mapping(struct address_space *mapping, struct page *page) void putback_lru_page(struct page *page) { lru_cache_add(page); + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + page_is_file_cache(page), + -hpage_nr_pages(page)); put_page(page); /* drop ref from isolate */ } @@ -1486,6 +1489,9 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ nr_reclaimed += nr_pages; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -nr_pages); /* * Is there need to periodically free_page_list? 
It would * appear not as the counts should be low @@ -1561,7 +1567,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc, TTU_IGNORE_ACCESS, &dummy_stat, true); list_splice(&clean_pages, page_list); - mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret); return ret; } @@ -1637,6 +1642,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) */ ClearPageLRU(page); ret = 0; + __mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } return ret; @@ -1768,6 +1776,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, total_scan, skipped, nr_taken, mode, lru); update_lru_sizes(lruvec, lru, nr_zone_taken); + return nr_taken; } @@ -1816,6 +1825,9 @@ int isolate_lru_page(struct page *page) ClearPageLRU(page); del_page_from_lru_list(page, lruvec, lru); ret = 0; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } spin_unlock_irq(&pgdat->lru_lock); } @@ -1907,6 +1919,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); list_move(&page->lru, &lruvec->lists[lru]); + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); if (put_page_testzero(page)) { __ClearPageLRU(page); __ClearPageActive(page); @@ -1984,7 +1999,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; item = current_is_kswapd() ? 
PGSCAN_KSWAPD : PGSCAN_DIRECT; @@ -2010,8 +2024,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, move_pages_to_lru(lruvec, &page_list); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&page_list); @@ -2070,7 +2082,6 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; __count_vm_events(PGREFILL, nr_scanned); @@ -2139,7 +2150,6 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGDEACTIVATE, nr_deactivate); __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&l_active); From patchwork Tue Jul 23 06:25:38 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Minchan Kim X-Patchwork-Id: 11053913 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 98CD36C5 for ; Tue, 23 Jul 2019 06:26:15 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 8687A285B9 for ; Tue, 23 Jul 2019 06:26:15 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 79E48285E1; Tue, 23 Jul 2019 06:26:15 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by 
mail.wl.linuxfoundation.org (Postfix) with ESMTP id 82FF828573 for ; Tue, 23 Jul 2019 06:26:14 +0000 (UTC) Received: by kanga.kvack.org (Postfix) id 8AAB88E0005; Tue, 23 Jul 2019 02:26:13 -0400 (EDT) Delivered-To: linux-mm-outgoing@kvack.org Received: by kanga.kvack.org (Postfix, from userid 40) id 836408E0001; Tue, 23 Jul 2019 02:26:13 -0400 (EDT) X-Original-To: int-list-linux-mm@kvack.org X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 6D5958E0005; Tue, 23 Jul 2019 02:26:13 -0400 (EDT) X-Original-To: linux-mm@kvack.org X-Delivered-To: linux-mm@kvack.org Received: from mail-pl1-f199.google.com (mail-pl1-f199.google.com [209.85.214.199]) by kanga.kvack.org (Postfix) with ESMTP id 2FEEB8E0001 for ; Tue, 23 Jul 2019 02:26:13 -0400 (EDT) Received: by mail-pl1-f199.google.com with SMTP id 91so21339727pla.7 for ; Mon, 22 Jul 2019 23:26:13 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:dkim-signature:sender:from:to:cc:subject:date :message-id:in-reply-to:references:mime-version :content-transfer-encoding; bh=bfQQP9NHAJxmbSqu89eS8TN5Wu7mwb+7DHz3Y7mBEH8=; b=Qv+MwqJzDrzaZcvscetRqvpBPt6rSMFc3yJr8Gmxtyx0bfQRBfLfwHhdFi50EdZkJL 89SJdDHBWYbC2/kiCcKdhCLbrw8ty+SZeM2t8ywVjJ3oIRZ+KkQQgWYdSH1O9UduHKI9 80X0LzS+OJVIX+IviNEjMtMg3Wgd918zp849TbEZJR6Aag9Lxxsz81KNXYDOjK38OQqr 2wwWYtNPmBXNffYd/wJ8NeCrfOhslqvG1TJBMPJUR7BsslvVkR+iyu+bHWVwGlKt1NmC 43HMD5uFWc4CNVmJ2y3NhSGpBbESXKldkhoXYub5+9bUm3iQsScVxtx/ZxZu7ox4RbhB rLcg== X-Gm-Message-State: APjAAAXkoShxK+RJ7cQRv/loDc7IWKjRQegL19yEozX+G7nwjS8pf8zH 4Aab5a8HXF/LU3dh0LYUNdvhMvUrhUV36ER2mQtPxwEfifFHOT1LXuOHgfVVRu8UkzwRFssL4bQ /QE9MimuiK8wviaAbMftFlTvfQEfeq7xQgPhMdiDOWXp9xVoKWHhvXrdm+wEWPtA= X-Received: by 2002:a63:6154:: with SMTP id v81mr42110044pgb.296.1563863172601; Mon, 22 Jul 2019 23:26:12 -0700 (PDT) X-Received: by 2002:a63:6154:: with SMTP id v81mr42109985pgb.296.1563863171358; Mon, 22 Jul 2019 23:26:11 -0700 
(PDT) ARC-Seal: i=1; a=rsa-sha256; t=1563863171; cv=none; d=google.com; s=arc-20160816; b=XPrwp6Sta0WzVLHTphz5fozrnM1VGqcKLHI2X0sOQBzOQ3C2vJPdfi+MRjiQca77cA Ef8IZqGIOW7377VvqdM8Vf2k/nPloH9E+p8W/GOeFPxmfWlJVVq5Ff337oZotXiaYaA+ X30Sw3tv1Am8mFIxPzIUzruVZWFmoYyjvrlnPBl9t7y3/e9OF7FnwAJvqndo/DWy/Ql7 YKTiWxq7gIUW7oZzxMZNGoVyb2yi0eNgfCLTx6lp8BNwgxd6tQ9KJOUDQah6XN+bpdjY 5OBaesO2oPXVFr986tDlElsk4uepXM8sxslJpB+SvG5mOG+0ADJCh18nfssrsFiOXL8R V0Cw== ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com; s=arc-20160816; h=content-transfer-encoding:mime-version:references:in-reply-to :message-id:date:subject:cc:to:from:sender:dkim-signature; bh=bfQQP9NHAJxmbSqu89eS8TN5Wu7mwb+7DHz3Y7mBEH8=; b=Ohz62Ti1AQzzxKMT3CIUKIDvwXsxBrxWRhWvcOVWQMfVn9Mo3tVzG/HI4G8pY07RoZ 8ZxY2TC8zToFF8MfKp5jFll37BYwesPWa9BTc+9pmi0aQXC5dJPlsQmgTUMc3P+Feav+ sSv0Szv7af6LR48Jkkaq9TZkWAsCd+EJLsNeu9Ux3Aok14hHxTDIUPVAkY8pluaOI5ei YzKX19C7irUnWLd3nQgc/aAWkdpqIHRNL2I6b1uLuCV9jFzxBPsXROKVAWdJNsEVfbY2 qOJ6ddjtmHD5zPd1mEgRjd0HMUqSL+M6OD5FLynIXRrIXFvgUxzTs6rGcjClppPL8mnh BP5g== ARC-Authentication-Results: i=1; mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=S6N72sPt; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. 
[209.85.220.65]) by mx.google.com with SMTPS id b64sor52015433pjc.24.2019.07.22.23.26.11 for (Google Transport Security); Mon, 22 Jul 2019 23:26:11 -0700 (PDT) Received-SPF: pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) client-ip=209.85.220.65; Authentication-Results: mx.google.com; dkim=pass header.i=@gmail.com header.s=20161025 header.b=S6N72sPt; spf=pass (google.com: domain of minchan.kim@gmail.com designates 209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com; dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=sender:from:to:cc:subject:date:message-id:in-reply-to:references :mime-version:content-transfer-encoding; bh=bfQQP9NHAJxmbSqu89eS8TN5Wu7mwb+7DHz3Y7mBEH8=; b=S6N72sPtSLT5BRbZkdM7yLT3KExklat/g0/M44vLcLXzTOmhpazeQPD6qTDZxZI+bq tOhuIAReOaMz2Z0OZIRW40cU6cA1iD7WQx/951IkNbNwE4/D1F4A+HG7/F144NqlHmA8 nqF8Mxb4eYZX/AWt3D+y2o3GBL0P0iQTMkUyPUjmDdi3dpGbj6Ss20rX4TykbwN+7qpF zXOW2GdXWSgODEB5naQOgr6pstUNYHFkv2re13znj4gbj6YhC25RWc8qo8oDw5ZwMhNh rkzgsHolS/aE1+HKaNZRefLcDFVoRmd/Ga4IdoHXk/H6RDwZ9c69onc0c9H2AYXbddcx XWCg== X-Google-Smtp-Source: APXvYqzb7z49tOq+c3J6HBU0OTLCnyy3BYi10kL+NmLUNZO2UTErjgIy0BnBNxMhCLHmTvwhOXpq1g== X-Received: by 2002:a17:90a:346c:: with SMTP id o99mr79200719pjb.20.1563863170869; Mon, 22 Jul 2019 23:26:10 -0700 (PDT) Received: from bbox-2.seo.corp.google.com ([2401:fa00:d:0:98f1:8b3d:1f37:3e8]) by smtp.gmail.com with ESMTPSA id s66sm44630376pfs.8.2019.07.22.23.26.05 (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256); Mon, 22 Jul 2019 23:26:09 -0700 (PDT) From: Minchan Kim To: Andrew Morton Cc: linux-mm , LKML , linux-api@vger.kernel.org, Michal Hocko , Johannes Weiner , Tim Murray , Joel Fernandes , Suren Baghdasaryan , Daniel Colascione , Shakeel Butt , Sonny Rao , oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen , "Kirill A . 
Shutemov" , Minchan Kim Subject: [PATCH v6 4/5] mm: introduce MADV_PAGEOUT Date: Tue, 23 Jul 2019 15:25:38 +0900 Message-Id: <20190723062539.198697-5-minchan@kernel.org> X-Mailer: git-send-email 2.22.0.657.g960e92d24f-goog In-Reply-To: <20190723062539.198697-1-minchan@kernel.org> References: <20190723062539.198697-1-minchan@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When a process expects no accesses to a certain memory range for a long time, it can hint to the kernel that the pages can be reclaimed immediately while their data is preserved for future use. This can reduce workingset eviction and so end up improving performance. This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time, so that the kernel reclaims *any LRU* pages in it immediately. The hint helps the kernel decide which pages to evict proactively. A note: it intentionally does not apply the SWAP_CLUSTER_MAX limit on LRU page isolation, because it is automatically bounded by the PMD size. If the PMD size (e.g., 256) causes trouble, we could fix it later by limiting it to SWAP_CLUSTER_MAX[1]. - man-page material MADV_PAGEOUT (since Linux x.x) Do not expect access in the near future, so pages in the specified regions can be reclaimed immediately regardless of memory pressure. Thus, an access in the range after a successful operation can cause a major page fault, but never loses the up-to-date contents, unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages.
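To illustrate the man-page material above, here is a minimal userspace sketch of how a process might issue the hint. It is an illustration under assumptions, not part of the patch: the helper names pageout_range() and pageout_preserves_contents() are hypothetical, and the code relies on MADV_PAGEOUT being provided by the installed headers rather than hard-coding a value (this series assigns 6, but the value in a released kernel may differ). On a kernel or libc without the hint, the call simply fails and the data stays resident.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask the kernel to page out [addr, addr + len).
 * Returns 0 on success, -1 with errno set otherwise. If the headers do
 * not define MADV_PAGEOUT, report EINVAL instead of guessing a value.
 */
static int pageout_range(void *addr, size_t len)
{
#ifdef MADV_PAGEOUT
	return madvise(addr, len, MADV_PAGEOUT);
#else
	(void)addr;
	(void)len;
	errno = EINVAL;
	return -1;
#endif
}

/*
 * Hypothetical check: fill an anonymous private mapping, hint it as
 * reclaimable, then read it back. Unlike MADV_DONTNEED, the contents
 * must survive whether or not the hint was honoured.
 * Returns 1 if intact, 0 if not, -1 if the mapping could not be created.
 */
static int pageout_preserves_contents(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len = 16 * (size_t)page;
	unsigned char *buf;
	int intact = 1;
	size_t i;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xA5, len);

	/* The hint is advisory; older kernels may reject it with EINVAL. */
	(void)pageout_range(buf, len);

	/* Touching the range again may take major faults, but data is kept. */
	for (i = 0; i < len; i++) {
		if (buf[i] != 0xA5) {
			intact = 0;
			break;
		}
	}

	munmap(buf, len);
	return intact;
}
```

A caller would typically invoke pageout_range() on a buffer it does not expect to touch for a while and treat a failure as non-fatal, since the hint only affects where the data lives, not whether it is kept.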
* v4 * clear young bit regardless of success of page isolation - hannes * v3 * man page material modification - mhocko * remove using SWAP_CLUSTER_MAX - mhocko * v2 * add comment about SWAP_CLUSTER_MAX - mhocko * add permission check to prevent sidechannel attack - mhocko * add man page stuff - dave * v1 * change pte to old and rely on the other's reference - hannes * remove page_mapcount to check shared page - mhocko * RFC v2 * make reclaim_pages simple via factoring out isolate logic - hannes * RFCv1 * rename from MADV_COLD to MADV_PAGEOUT - hannes * bail out if process is being killed - Hillf * fix reclaim_pages bugs - Hillf [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/ Acked-by: Michal Hocko Signed-off-by: Minchan Kim --- include/linux/swap.h | 1 + include/uapi/asm-generic/mman-common.h | 1 + mm/madvise.c | 195 +++++++++++++++++++++++++ mm/vmscan.c | 55 +++++++ 4 files changed, 252 insertions(+) diff --git a/include/linux/swap.h b/include/linux/swap.h index 0ce997edb8bbc..063c0c1e112bd 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -365,6 +365,7 @@ extern int vm_swappiness; extern int remove_mapping(struct address_space *mapping, struct page *page); extern unsigned long vm_total_pages; +extern unsigned long reclaim_pages(struct list_head *page_list); #ifdef CONFIG_NUMA extern int node_reclaim_mode; extern int sysctl_min_unmapped_ratio; diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index e9aeda400af3a..be83a785d1cc4 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -46,6 +46,7 @@ #define MADV_WILLNEED 3 /* will need these pages */ #define MADV_DONTNEED 4 /* don't need these pages */ #define MADV_COLD 5 /* deactivate these pages */ +#define MADV_PAGEOUT 6 /* reclaim these pages */ /* common parameters: try to keep these consistent across architectures */ #define MADV_FREE 8 /* free pages only if memory pressure */ diff --git 
a/mm/madvise.c b/mm/madvise.c index 10255bb23aa73..24ded9f9e0fab 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -41,6 +42,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_WILLNEED: case MADV_DONTNEED: case MADV_COLD: + case MADV_PAGEOUT: case MADV_FREE: return 0; default: @@ -480,6 +482,196 @@ static long madvise_cold(struct vm_area_struct *vma, return 0; } +static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct mmu_gather *tlb = walk->private; + struct mm_struct *mm = tlb->mm; + struct vm_area_struct *vma = walk->vma; + pte_t *orig_pte, *pte, ptent; + spinlock_t *ptl; + LIST_HEAD(page_list); + struct page *page; + unsigned long next; + + if (fatal_signal_pending(current)) + return -EINTR; + + next = pmd_addr_end(addr, end); + if (pmd_trans_huge(*pmd)) { + pmd_t orig_pmd; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + ptl = pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + + orig_pmd = *pmd; + if (is_huge_zero_pmd(orig_pmd)) + goto huge_unlock; + + if (unlikely(!pmd_present(orig_pmd))) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(orig_pmd)); + goto huge_unlock; + } + + page = pmd_page(orig_pmd); + if (next - addr != HPAGE_PMD_SIZE) { + int err; + + if (page_mapcount(page) != 1) + goto huge_unlock; + get_page(page); + spin_unlock(ptl); + lock_page(page); + err = split_huge_page(page); + unlock_page(page); + put_page(page); + if (!err) + goto regular_page; + return 0; + } + + if (pmd_young(orig_pmd)) { + pmdp_invalidate(vma, addr, pmd); + orig_pmd = pmd_mkold(orig_pmd); + + set_pmd_at(mm, addr, pmd, orig_pmd); + tlb_remove_tlb_entry(tlb, pmd, addr); + } + + ClearPageReferenced(page); + test_and_clear_page_young(page); + + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); +huge_unlock: + spin_unlock(ptl); + reclaim_pages(&page_list); + return 0; + } + + if 
(pmd_trans_unstable(pmd)) + return 0; +regular_page: + tlb_change_page_size(tlb, PAGE_SIZE); + orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + flush_tlb_batched_pending(mm); + arch_enter_lazy_mmu_mode(); + for (; addr < end; pte++, addr += PAGE_SIZE) { + ptent = *pte; + if (!pte_present(ptent)) + continue; + + page = vm_normal_page(vma, addr, ptent); + if (!page) + continue; + + /* + * creating a THP page is expensive so split it only if we + * are sure it's worth. Split it if we are only owner. + */ + if (PageTransCompound(page)) { + if (page_mapcount(page) != 1) + break; + get_page(page); + if (!trylock_page(page)) { + put_page(page); + break; + } + pte_unmap_unlock(orig_pte, ptl); + if (split_huge_page(page)) { + unlock_page(page); + put_page(page); + pte_offset_map_lock(mm, pmd, addr, &ptl); + break; + } + unlock_page(page); + put_page(page); + pte = pte_offset_map_lock(mm, pmd, addr, &ptl); + pte--; + addr -= PAGE_SIZE; + continue; + } + + VM_BUG_ON_PAGE(PageTransCompound(page), page); + + if (pte_young(ptent)) { + ptent = ptep_get_and_clear_full(mm, addr, pte, + tlb->fullmm); + ptent = pte_mkold(ptent); + set_pte_at(mm, addr, pte, ptent); + tlb_remove_tlb_entry(tlb, pte, addr); + } + ClearPageReferenced(page); + test_and_clear_page_young(page); + + if (!isolate_lru_page(page)) + list_add(&page->lru, &page_list); + } + + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(orig_pte, ptl); + reclaim_pages(&page_list); + cond_resched(); + + return 0; +} + +static void madvise_pageout_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end) +{ + struct mm_walk pageout_walk = { + .pmd_entry = madvise_pageout_pte_range, + .mm = vma->vm_mm, + .private = tlb, + }; + + tlb_start_vma(tlb, vma); + walk_page_range(addr, end, &pageout_walk); + tlb_end_vma(tlb, vma); +} + +static inline bool can_do_pageout(struct vm_area_struct *vma) +{ + if (vma_is_anonymous(vma)) + return true; + if (!vma->vm_file) + return 
false; + /* + * paging out pagecache only for non-anonymous mappings that correspond + * to the files the calling process could (if tried) open for writing; + * otherwise we'd be including shared non-exclusive mappings, which + * opens a side channel. + */ + return inode_owner_or_capable(file_inode(vma->vm_file)) || + inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; +} + +static long madvise_pageout(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start_addr, unsigned long end_addr) +{ + struct mm_struct *mm = vma->vm_mm; + struct mmu_gather tlb; + + *prev = vma; + if (!can_madv_lru_vma(vma)) + return -EINVAL; + + if (!can_do_pageout(vma)) + return 0; + + lru_add_drain(); + tlb_gather_mmu(&tlb, mm, start_addr, end_addr); + madvise_pageout_page_range(&tlb, vma, start_addr, end_addr); + tlb_finish_mmu(&tlb, start_addr, end_addr); + + return 0; +} + static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) @@ -870,6 +1062,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, return madvise_willneed(vma, prev, start, end); case MADV_COLD: return madvise_cold(vma, prev, start, end); + case MADV_PAGEOUT: + return madvise_pageout(vma, prev, start, end); case MADV_FREE: case MADV_DONTNEED: return madvise_dontneed_free(vma, prev, start, end, behavior); @@ -892,6 +1086,7 @@ madvise_behavior_valid(int behavior) case MADV_DONTNEED: case MADV_FREE: case MADV_COLD: + case MADV_PAGEOUT: #ifdef CONFIG_KSM case MADV_MERGEABLE: case MADV_UNMERGEABLE: diff --git a/mm/vmscan.c b/mm/vmscan.c index c693585c3facd..9f4aa350b020d 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2158,6 +2158,61 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_deactivate, nr_rotated, sc->priority, file); } +unsigned long reclaim_pages(struct list_head *page_list) +{ + int nid = -1; + unsigned long nr_reclaimed = 0; + LIST_HEAD(node_page_list); + struct reclaim_stat dummy_stat; + struct page 
*page; + struct scan_control sc = { + .gfp_mask = GFP_KERNEL, + .priority = DEF_PRIORITY, + .may_writepage = 1, + .may_unmap = 1, + .may_swap = 1, + }; + + while (!list_empty(page_list)) { + page = lru_to_page(page_list); + if (nid == -1) { + nid = page_to_nid(page); + INIT_LIST_HEAD(&node_page_list); + } + + if (nid == page_to_nid(page)) { + list_move(&page->lru, &node_page_list); + continue; + } + + nr_reclaimed += shrink_page_list(&node_page_list, + NODE_DATA(nid), + &sc, 0, + &dummy_stat, false); + while (!list_empty(&node_page_list)) { + page = lru_to_page(&node_page_list); + list_del(&page->lru); + putback_lru_page(page); + } + + nid = -1; + } + + if (!list_empty(&node_page_list)) { + nr_reclaimed += shrink_page_list(&node_page_list, + NODE_DATA(nid), + &sc, 0, + &dummy_stat, false); + while (!list_empty(&node_page_list)) { + page = lru_to_page(&node_page_list); + list_del(&page->lru); + putback_lru_page(page); + } + } + + return nr_reclaimed; +} + /* * The inactive anon list should be small enough that the VM never has * to do too much work. 
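For readers following reclaim_pages() above, the per-node batching can be modelled in plain userspace C. This is only a sketch of the control flow, under assumptions: batch_by_node(), flush_batch() and struct batch_stats are hypothetical names, an array of non-negative node ids stands in for the page list, and flush_batch() stands in for the shrink_page_list() call plus the putback loop. As in the patch, entries are taken from the tail, consecutive entries on the same node are gathered, and a change of node flushes the gathered batch before the differing entry is looked at again.

```c
#include <stddef.h>

/* Hypothetical bookkeeping in place of real reclaim. */
struct batch_stats {
	size_t batches;	/* number of per-node flushes */
	size_t pages;	/* total entries handed to a flush */
};

/* Stand-in for shrink_page_list() followed by the putback loop. */
static void flush_batch(struct batch_stats *st, size_t nr_pages)
{
	st->batches++;
	st->pages += nr_pages;
}

/*
 * Model of the loop in reclaim_pages(): nids[] holds the node id of
 * each queued page (ids are assumed non-negative, -1 is the sentinel).
 */
static struct batch_stats batch_by_node(const int *nids, size_t n)
{
	struct batch_stats st = { 0, 0 };
	int nid = -1;
	size_t pending = 0;
	size_t i = n;

	while (i > 0) {
		int cur = nids[i - 1];	/* tail first, like lru_to_page() */

		if (nid == -1)
			nid = cur;

		if (cur == nid) {
			pending++;	/* same node: gather and move on */
			i--;
			continue;
		}

		/* Node changed: flush the batch, then revisit this entry. */
		flush_batch(&st, pending);
		pending = 0;
		nid = -1;
	}

	/* Flush whatever is left once the input list is empty. */
	if (pending)
		flush_batch(&st, pending);

	return st;
}
```

With node ids {0, 0, 1, 1, 0} the model performs three flushes covering five entries, which shows that the batching follows runs of consecutive same-node pages rather than sorting the whole list by node first.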
From patchwork Tue Jul 23 06:25:39 2019
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11053915
Return-Path:
Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125])
        by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 2E376138D
        for ; Tue, 23 Jul 2019 06:26:20 +0000 (UTC)
Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1])
        by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 1D94228573
        for ; Tue, 23 Jul 2019 06:26:20 +0000 (UTC)
Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486)
        id 118FE285E1; Tue, 23 Jul 2019 06:26:20 +0000 (UTC)
X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on
        pdx-wl-mail.web.codeaurora.org
X-Spam-Level:
X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID,
        DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham
        version=3.3.1
Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17])
        by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 4B68928573
        for ; Tue, 23 Jul 2019 06:26:19 +0000 (UTC)
Received: by kanga.kvack.org (Postfix)
        id 3A1C08E0001; Tue, 23 Jul 2019 02:26:18 -0400 (EDT)
Delivered-To: linux-mm-outgoing@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 40)
        id 32B958E0006; Tue, 23 Jul 2019 02:26:18 -0400 (EDT)
X-Original-To: int-list-linux-mm@kvack.org
X-Delivered-To: int-list-linux-mm@kvack.org
Received: by kanga.kvack.org (Postfix, from userid 63042)
        id 189E78E0001; Tue, 23 Jul 2019 02:26:18 -0400 (EDT)
X-Original-To: linux-mm@kvack.org
X-Delivered-To: linux-mm@kvack.org
Received: from mail-pf1-f200.google.com (mail-pf1-f200.google.com [209.85.210.200])
        by kanga.kvack.org (Postfix) with ESMTP id D24658E0001
        for ; Tue, 23 Jul 2019 02:26:17 -0400 (EDT)
Received: by mail-pf1-f200.google.com with SMTP id u21so25501008pfn.15
        for ; Mon, 22 Jul 2019 23:26:17 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20161025;
        h=x-gm-message-state:dkim-signature:sender:from:to:cc:subject:date
         :message-id:in-reply-to:references:mime-version
         :content-transfer-encoding;
        bh=HWfaGSGIcQ2TlhH4i7ElyoGsDhVzg5vFtyNIv3y8QK8=;
        b=YfWBkwGkBFhj4g0uN6woVYDOVQm6Y/aoTlWcd1bfWdB1fSPZMXqDjxOpDI3hEwfQlw
         BVRTtVQF5ZPFsnnVsvxS7RT0mVNTifnSXz3Ir6dIAuhkNwmnKYWdpNnnc9JCXVwWUHFg
         e4g/8Vt/5bX+9uvjtViwdEMO26SIuIQOPMaF4GkMXGkH65w0PIlzeO40uuPUza8ZQ6DB
         SF7TPcvnvtFGro7lhpv8jkimi3nCCnMuWFkVugpfCzmodIaXu9z6dO7LNGrENOxP/Z7Z
         qz2oQBCWCDk4e/RJHRiA28fGxqio9fGampoZKiEdd98IZMgAP1noQnlBpih3q4LuCahM
         /nRQ==
X-Gm-Message-State: APjAAAXBa/zYX0F8/YvtFguNsp9tXNm4x36dFUZA/7pMnxKirvRIT4sO
        Z0Ncw74qXHWBajLy71sy4eOIHbdFYRPFf5upA4+MeQ5CYSdnSyHR3XyNxZmFHJ1bZW7RvK09F/4
        lUIHKm55HbeERvK23AceSuRDcQEq+6SgtpMT3FbLOeqf2PjxpStcWnqcpXDhb/z0=
X-Received: by 2002:aa7:843c:: with SMTP id q28mr4228630pfn.152.1563863177512;
        Mon, 22 Jul 2019 23:26:17 -0700 (PDT)
X-Received: by 2002:aa7:843c:: with SMTP id q28mr4228578pfn.152.1563863176651;
        Mon, 22 Jul 2019 23:26:16 -0700 (PDT)
ARC-Seal: i=1; a=rsa-sha256; t=1563863176; cv=none;
        d=google.com; s=arc-20160816;
        b=h+ugiv3gPVdTQxp4Zenf32C5Z2BDg7SrPHEj7JevC2BGKqf5WGChZ/FytY+s4P2p8f
         XyAsykaMKon4hkyzEa4rI4NOrx+BhzfVsSXZ+WEVB0LeqY5D3tAS7qpcvz6QdxUtoniw
         fHPLdSdIqtpgfslV/m+vVsab+u+uRw7SwGEj0/gwrsIZf7pwHy9ffKxVUYvLN4RaWBC+
         wih8JjeKClfYwBqLCuUfCpYImXQyhn4TnyL1iOD7ZrlR23eWsRomz+Qfxjn57IfASQZ/
         Lio4vi/VaOE2pPQqpeq7aLW600KzOEju/FI8LaTv+tPN3z5Ill/E4q8tK2gIfLI8glZf
         BhcA==
ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=google.com;
        s=arc-20160816;
        h=content-transfer-encoding:mime-version:references:in-reply-to
         :message-id:date:subject:cc:to:from:sender:dkim-signature;
        bh=HWfaGSGIcQ2TlhH4i7ElyoGsDhVzg5vFtyNIv3y8QK8=;
        b=t8oUAJLP7Bx5PIw2+9CndTplQ/fVTaNGQKEWk0V22n60sr/rlhhInNbR7Y3GQ9h6Jo
         xW8d6u9dp8SW9+qjLUVNDPkO/kYfrtm0Ok6Ayfa3R5sIGypXg7cr6qQYg3DYXn6EXLAF
         WzMjuoqHwrHZnpsQW/X1hbKaxBW3Z5KMGnqH3apOATBSAjL2we5HaM2yEF0K+T0GX+VK
         9kE9oz2JKaOr34LMBtssMGmTpggcDoGA1vNP4W/XYCQeehEnazxUye/pqXQUSj8owZ4L
         TcP2vUnJQ1c0LSlra2dgcFxY7thkj54B4JHGFeCxlLaUYF8lUsCn/dYJHz/YwkcyoBPF
         W3sg==
ARC-Authentication-Results: i=1; mx.google.com;
        dkim=pass header.i=@gmail.com header.s=20161025 header.b=ESc5AccD;
        spf=pass (google.com: domain of minchan.kim@gmail.com designates
         209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com;
        dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org
Received: from mail-sor-f65.google.com (mail-sor-f65.google.com. [209.85.220.65])
        by mx.google.com with SMTPS id r7sor50789241pli.56.2019.07.22.23.26.16
        for (Google Transport Security);
        Mon, 22 Jul 2019 23:26:16 -0700 (PDT)
Received-SPF: pass (google.com: domain of minchan.kim@gmail.com designates
        209.85.220.65 as permitted sender) client-ip=209.85.220.65;
Authentication-Results: mx.google.com;
        dkim=pass header.i=@gmail.com header.s=20161025 header.b=ESc5AccD;
        spf=pass (google.com: domain of minchan.kim@gmail.com designates
         209.85.220.65 as permitted sender) smtp.mailfrom=minchan.kim@gmail.com;
        dmarc=fail (p=NONE sp=NONE dis=NONE) header.from=kernel.org
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20161025;
        h=sender:from:to:cc:subject:date:message-id:in-reply-to:references
         :mime-version:content-transfer-encoding;
        bh=HWfaGSGIcQ2TlhH4i7ElyoGsDhVzg5vFtyNIv3y8QK8=;
        b=ESc5AccD3fIY+3GLvmiAB0kep7+tfzAbjWyIwC8m0DV2gPhz1xDNb1b/gD95eYmOhC
         a4QqxTdoyeWeZ60pl615dQiMsUh0zICwi4+9xxYb+6M08mGulaz2XGZhtfsUJpO7C9Rd
         WdFr8V18O/v9nAWvxksmW0c+Q8avtZBKDwOTQ7i2G1eJnxU+gTSRHfG//NR5OQKYeA1P
         IlLuSIleWH3wy4M1cb8rte6uyzhWyJYX48HsO+2wsw0ItMAfmogWoaXROgkBOH6GKVVJ
         zudhs/4cejJlVbUnUjKWeKMNO/6WiskFxdoUgjZmg8ThfqDjVznGMtOBCY6Y5THMugVi
         0oRg==
X-Google-Smtp-Source: APXvYqzxpTQSdvmtEP25pudnB7+4N4RCNywSv31s7Zvji+D9HZedbBwT3mlGwaWkgSnoDjdv0i3NLg==
X-Received: by 2002:a17:902:42d:: with SMTP id 42mr75384177ple.228.1563863176244;
        Mon, 22 Jul 2019 23:26:16 -0700 (PDT)
Received: from bbox-2.seo.corp.google.com ([2401:fa00:d:0:98f1:8b3d:1f37:3e8])
        by smtp.gmail.com with ESMTPSA id s66sm44630376pfs.8.2019.07.22.23.26.11
        (version=TLS1_3 cipher=AEAD-AES256-GCM-SHA384 bits=256/256);
        Mon, 22 Jul 2019 23:26:15 -0700 (PDT)
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org,
        Michal Hocko, Johannes Weiner, Tim Murray, Joel Fernandes,
        Suren Baghdasaryan, Daniel Colascione, Shakeel Butt, Sonny Rao,
        oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com,
        Dave Hansen, "Kirill A . Shutemov", Minchan Kim
Subject: [PATCH v6 5/5] mm: factor out common parts between MADV_COLD and MADV_PAGEOUT
Date: Tue, 23 Jul 2019 15:25:39 +0900
Message-Id: <20190723062539.198697-6-minchan@kernel.org>
X-Mailer: git-send-email 2.22.0.657.g960e92d24f-goog
In-Reply-To: <20190723062539.198697-1-minchan@kernel.org>
References: <20190723062539.198697-1-minchan@kernel.org>
MIME-Version: 1.0
X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4
Sender: owner-linux-mm@kvack.org
Precedence: bulk
X-Loop: owner-majordomo@kvack.org
List-ID:
X-Virus-Scanned: ClamAV using ClamSMTP

There are many common parts between MADV_COLD and MADV_PAGEOUT.
This patch factors them out to reduce code duplication.

Suggested-by: Johannes Weiner
Acked-by: Michal Hocko
Signed-off-by: Minchan Kim
---
 mm/madvise.c | 193 ++++++++++++---------------------------------------
 1 file changed, 46 insertions(+), 147 deletions(-)

diff --git a/mm/madvise.c b/mm/madvise.c
index 24ded9f9e0fab..22be197c7cc9b 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -30,6 +30,11 @@
 
 #include "internal.h"
 
+struct madvise_walk_private {
+	struct mmu_gather *tlb;
+	bool pageout;
+};
+
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
  * take mmap_sem for writing. Others, which simply traverse vmas, need
@@ -310,16 +315,23 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
-static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
+static int madvise_cold_or_pageout_pte_range(pmd_t *pmd,
+				unsigned long addr, unsigned long end,
+				struct mm_walk *walk)
 {
-	struct mmu_gather *tlb = walk->private;
+	struct madvise_walk_private *private = walk->private;
+	struct mmu_gather *tlb = private->tlb;
+	bool pageout = private->pageout;
 	struct mm_struct *mm = tlb->mm;
 	struct vm_area_struct *vma = walk->vma;
 	pte_t *orig_pte, *pte, ptent;
 	spinlock_t *ptl;
-	struct page *page;
 	unsigned long next;
+	struct page *page = NULL;
+	LIST_HEAD(page_list);
+
+	if (fatal_signal_pending(current))
+		return -EINTR;
 
 	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
@@ -366,10 +378,17 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
 		}
 
+		ClearPageReferenced(page);
 		test_and_clear_page_young(page);
-		deactivate_page(page);
+		if (pageout) {
+			if (!isolate_lru_page(page))
+				list_add(&page->lru, &page_list);
+		} else
+			deactivate_page(page);
 huge_unlock:
 		spin_unlock(ptl);
+		if (pageout)
+			reclaim_pages(&page_list);
 		return 0;
 	}
 
@@ -437,12 +456,19 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 		 * As a side effect, it makes confuse idle-page tracking
 		 * because they will miss recent referenced history.
 		 */
+		ClearPageReferenced(page);
 		test_and_clear_page_young(page);
-		deactivate_page(page);
+		if (pageout) {
+			if (!isolate_lru_page(page))
+				list_add(&page->lru, &page_list);
+		} else
+			deactivate_page(page);
 	}
 
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
+	if (pageout)
+		reclaim_pages(&page_list);
 	cond_resched();
 
 	return 0;
@@ -452,10 +478,15 @@ static void madvise_cold_page_range(struct mmu_gather *tlb,
 			     struct vm_area_struct *vma,
 			     unsigned long addr, unsigned long end)
 {
+	struct madvise_walk_private walk_private = {
+		.tlb = tlb,
+		.pageout = false,
+	};
+
 	struct mm_walk cold_walk = {
-		.pmd_entry = madvise_cold_pte_range,
+		.pmd_entry = madvise_cold_or_pageout_pte_range,
 		.mm = vma->vm_mm,
-		.private = tlb,
+		.private = &walk_private,
 	};
 
 	tlb_start_vma(tlb, vma);
@@ -482,151 +513,19 @@ static long madvise_cold(struct vm_area_struct *vma,
 	return 0;
 }
 
-static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
-				unsigned long end, struct mm_walk *walk)
-{
-	struct mmu_gather *tlb = walk->private;
-	struct mm_struct *mm = tlb->mm;
-	struct vm_area_struct *vma = walk->vma;
-	pte_t *orig_pte, *pte, ptent;
-	spinlock_t *ptl;
-	LIST_HEAD(page_list);
-	struct page *page;
-	unsigned long next;
-
-	if (fatal_signal_pending(current))
-		return -EINTR;
-
-	next = pmd_addr_end(addr, end);
-	if (pmd_trans_huge(*pmd)) {
-		pmd_t orig_pmd;
-
-		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
-		ptl = pmd_trans_huge_lock(pmd, vma);
-		if (!ptl)
-			return 0;
-
-		orig_pmd = *pmd;
-		if (is_huge_zero_pmd(orig_pmd))
-			goto huge_unlock;
-
-		if (unlikely(!pmd_present(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-					!is_pmd_migration_entry(orig_pmd));
-			goto huge_unlock;
-		}
-
-		page = pmd_page(orig_pmd);
-		if (next - addr != HPAGE_PMD_SIZE) {
-			int err;
-
-			if (page_mapcount(page) != 1)
-				goto huge_unlock;
-			get_page(page);
-			spin_unlock(ptl);
-			lock_page(page);
-			err = split_huge_page(page);
-			unlock_page(page);
-			put_page(page);
-			if (!err)
-				goto regular_page;
-			return 0;
-		}
-
-		if (pmd_young(orig_pmd)) {
-			pmdp_invalidate(vma, addr, pmd);
-			orig_pmd = pmd_mkold(orig_pmd);
-
-			set_pmd_at(mm, addr, pmd, orig_pmd);
-			tlb_remove_tlb_entry(tlb, pmd, addr);
-		}
-
-		ClearPageReferenced(page);
-		test_and_clear_page_young(page);
-
-		if (!isolate_lru_page(page))
-			list_add(&page->lru, &page_list);
-huge_unlock:
-		spin_unlock(ptl);
-		reclaim_pages(&page_list);
-		return 0;
-	}
-
-	if (pmd_trans_unstable(pmd))
-		return 0;
-regular_page:
-	tlb_change_page_size(tlb, PAGE_SIZE);
-	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
-	flush_tlb_batched_pending(mm);
-	arch_enter_lazy_mmu_mode();
-	for (; addr < end; pte++, addr += PAGE_SIZE) {
-		ptent = *pte;
-		if (!pte_present(ptent))
-			continue;
-
-		page = vm_normal_page(vma, addr, ptent);
-		if (!page)
-			continue;
-
-		/*
-		 * creating a THP page is expensive so split it only if we
-		 * are sure it's worth. Split it if we are only owner.
-		 */
-		if (PageTransCompound(page)) {
-			if (page_mapcount(page) != 1)
-				break;
-			get_page(page);
-			if (!trylock_page(page)) {
-				put_page(page);
-				break;
-			}
-			pte_unmap_unlock(orig_pte, ptl);
-			if (split_huge_page(page)) {
-				unlock_page(page);
-				put_page(page);
-				pte_offset_map_lock(mm, pmd, addr, &ptl);
-				break;
-			}
-			unlock_page(page);
-			put_page(page);
-			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
-			pte--;
-			addr -= PAGE_SIZE;
-			continue;
-		}
-
-		VM_BUG_ON_PAGE(PageTransCompound(page), page);
-
-		if (pte_young(ptent)) {
-			ptent = ptep_get_and_clear_full(mm, addr, pte,
-							tlb->fullmm);
-			ptent = pte_mkold(ptent);
-			set_pte_at(mm, addr, pte, ptent);
-			tlb_remove_tlb_entry(tlb, pte, addr);
-		}
-		ClearPageReferenced(page);
-		test_and_clear_page_young(page);
-
-		if (!isolate_lru_page(page))
-			list_add(&page->lru, &page_list);
-	}
-
-	arch_leave_lazy_mmu_mode();
-	pte_unmap_unlock(orig_pte, ptl);
-	reclaim_pages(&page_list);
-	cond_resched();
-
-	return 0;
-}
-
 static void madvise_pageout_page_range(struct mmu_gather *tlb,
 			     struct vm_area_struct *vma,
 			     unsigned long addr, unsigned long end)
 {
+	struct madvise_walk_private walk_private = {
+		.pageout = true,
+		.tlb = tlb,
+	};
+
 	struct mm_walk pageout_walk = {
-		.pmd_entry = madvise_pageout_pte_range,
+		.pmd_entry = madvise_cold_or_pageout_pte_range,
 		.mm = vma->vm_mm,
-		.private = tlb,
+		.private = &walk_private,
 	};
 
 	tlb_start_vma(tlb, vma);