From patchwork Thu Jul 11 01:25:25 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11039171
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
    Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
    Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
    hdanton@sina.com, lizeb@google.com, Dave Hansen,
Shutemov" , Minchan Kim Subject: [PATCH v4 1/4] mm: introduce MADV_COLD Date: Thu, 11 Jul 2019 10:25:25 +0900 Message-Id: <20190711012528.176050-2-minchan@kernel.org> X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog In-Reply-To: <20190711012528.176050-1-minchan@kernel.org> References: <20190711012528.176050-1-minchan@kernel.org> MIME-Version: 1.0 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: X-Virus-Scanned: ClamAV using ClamSMTP When a process expects no accesses to a certain memory range, it could give a hint to kernel that the pages can be reclaimed when memory pressure happens but data should be preserved for future use. This could reduce workingset eviction so it ends up increasing performance. This patch introduces the new MADV_COLD hint to madvise(2) syscall. MADV_COLD can be used by a process to mark a memory range as not expected to be used in the near future. The hint can help kernel in deciding which pages to evict early during memory pressure. It works for every LRU pages like MADV_[DONTNEED|FREE]. IOW, It moves active file page -> inactive file LRU active anon page -> inacdtive anon LRU Unlike MADV_FREE, it doesn't move active anonymous pages to inactive file LRU's head because MADV_COLD is a little bit different symantic. MADV_FREE means it's okay to discard when the memory pressure because the content of the page is *garbage* so freeing such pages is almost zero overhead since we don't need to swap out and access afterward causes just minor fault. Thus, it would make sense to put those freeable pages in inactive file LRU to compete other used-once pages. It makes sense for implmentaion point of view, too because it's not swapbacked memory any longer until it would be re-dirtied. Even, it could give a bonus to make them be reclaimed on swapless system. However, MADV_COLD doesn't mean garbage so reclaiming them requires swap-out/in in the end so it's bigger cost. Since we have designed VM LRU aging based on cost-model, anonymous cold pages would be better to position inactive anon's LRU list, not file LRU. Furthermore, it would help to avoid unnecessary scanning if system doesn't have a swap device. Let's start simpler way without adding complexity at this moment. However, keep in mind, too that it's a caveat that workloads with a lot of pages cache are likely to ignore MADV_COLD on anonymous memory because we rarely age anonymous LRU lists. * man-page material MADV_COLD (since Linux x.x) Pages in the specified regions will be treated as less-recently-accessed compared to pages in the system with similar access frequencies. In contrast to MADV_FREE, the contents of the region are preserved regardless of subsequent writes to pages. MADV_COLD cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. 
* v2
  * add the warning about workloads with lots of page cache - mhocko
  * add man page stuff - dave
* v1
  * remove page_mapcount filter - hannes, mhocko
  * remove idle page handling - joelaf
* RFCv2
  * add more description - mhocko
* RFCv1
  * renaming from MADV_COOL to MADV_COLD - hannes
* internal review
  * use clear_page_young in deactivate_page - joelaf
  * Revise the description - surenb
  * Renaming from MADV_WARM to MADV_COOL - surenb

Acked-by: Michal Hocko
Signed-off-by: Minchan Kim
Acked-by: Johannes Weiner
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/internal.h                          |   2 +-
 mm/madvise.c                           | 180 ++++++++++++++++++++++++-
 mm/oom_kill.c                          |   2 +-
 mm/swap.c                              |  42 ++++++
 6 files changed, 224 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e..0ce997edb8bb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 
 extern void swap_setup(void);
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index 63b1f506ea67..ef8a56927b12 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -45,6 +45,7 @@
 #define MADV_SEQUENTIAL 2       /* expect sequential page references */
 #define MADV_WILLNEED   3       /* will need these pages */
 #define MADV_DONTNEED   4       /* don't need these pages */
+#define MADV_COLD       5       /* deactivate these pages */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_FREE       8       /* free pages only if memory pressure */
diff --git a/mm/internal.h b/mm/internal.h
index f53a14d67538..c61b215ff265 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -39,7 +39,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
                 unsigned long floor, unsigned long ceiling);
 
-static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
+static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
         return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 968df3aa069f..bae0055f9724 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -40,6 +40,7 @@ static int madvise_need_mmap_write(int behavior)
         case MADV_REMOVE:
         case MADV_WILLNEED:
         case MADV_DONTNEED:
+        case MADV_COLD:
         case MADV_FREE:
                 return 0;
         default:
@@ -307,6 +308,178 @@ static long madvise_willneed(struct vm_area_struct *vma,
         return 0;
 }
 
+static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
+                                unsigned long end, struct mm_walk *walk)
+{
+        struct mmu_gather *tlb = walk->private;
+        struct mm_struct *mm = tlb->mm;
+        struct vm_area_struct *vma = walk->vma;
+        pte_t *orig_pte, *pte, ptent;
+        spinlock_t *ptl;
+        struct page *page;
+        unsigned long next;
+
+        next = pmd_addr_end(addr, end);
+        if (pmd_trans_huge(*pmd)) {
+                pmd_t orig_pmd;
+
+                tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+                ptl = pmd_trans_huge_lock(pmd, vma);
+                if (!ptl)
+                        return 0;
+
+                orig_pmd = *pmd;
+                if (is_huge_zero_pmd(orig_pmd))
+                        goto huge_unlock;
+
+                if (unlikely(!pmd_present(orig_pmd))) {
+                        VM_BUG_ON(thp_migration_supported() &&
+                                        !is_pmd_migration_entry(orig_pmd));
+                        goto huge_unlock;
+                }
+
+                page = pmd_page(orig_pmd);
+                if (next - addr != HPAGE_PMD_SIZE) {
+                        int err;
+
+                        if (page_mapcount(page) != 1)
+                                goto huge_unlock;
+
+                        get_page(page);
+                        spin_unlock(ptl);
+                        lock_page(page);
+                        err = split_huge_page(page);
+                        unlock_page(page);
+                        put_page(page);
+                        if (!err)
+                                goto regular_page;
+                        return 0;
+                }
+
+                if (pmd_young(orig_pmd)) {
+                        pmdp_invalidate(vma, addr, pmd);
+                        orig_pmd = pmd_mkold(orig_pmd);
+
+                        set_pmd_at(mm, addr, pmd, orig_pmd);
+                        tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+                }
+
+                test_and_clear_page_young(page);
+                deactivate_page(page);
+huge_unlock:
+                spin_unlock(ptl);
+                return 0;
+        }
+
+        if (pmd_trans_unstable(pmd))
+                return 0;
+
+regular_page:
+        tlb_change_page_size(tlb, PAGE_SIZE);
+        orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+        flush_tlb_batched_pending(mm);
+        arch_enter_lazy_mmu_mode();
+        for (; addr < end; pte++, addr += PAGE_SIZE) {
+                ptent = *pte;
+
+                if (pte_none(ptent))
+                        continue;
+
+                if (!pte_present(ptent))
+                        continue;
+
+                page = vm_normal_page(vma, addr, ptent);
+                if (!page)
+                        continue;
+
+                /*
+                 * Creating a THP page is expensive, so split it only if
+                 * we are sure it's worth it. Split it if we are the only
+                 * owner.
+                 */
+                if (PageTransCompound(page)) {
+                        if (page_mapcount(page) != 1)
+                                break;
+                        get_page(page);
+                        if (!trylock_page(page)) {
+                                put_page(page);
+                                break;
+                        }
+                        pte_unmap_unlock(orig_pte, ptl);
+                        if (split_huge_page(page)) {
+                                unlock_page(page);
+                                put_page(page);
+                                pte_offset_map_lock(mm, pmd, addr, &ptl);
+                                break;
+                        }
+                        unlock_page(page);
+                        put_page(page);
+                        pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+                        pte--;
+                        addr -= PAGE_SIZE;
+                        continue;
+                }
+
+                VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+                if (pte_young(ptent)) {
+                        ptent = ptep_get_and_clear_full(mm, addr, pte,
+                                                        tlb->fullmm);
+                        ptent = pte_mkold(ptent);
+                        set_pte_at(mm, addr, pte, ptent);
+                        tlb_remove_tlb_entry(tlb, pte, addr);
+                }
+
+                /*
+                 * We are deactivating the page to accelerate its reclaim.
+                 * The VM couldn't reclaim the page unless we clear
+                 * PG_young. As a side effect, this confuses idle-page
+                 * tracking, which will miss the recent reference history.
+                 */
+                test_and_clear_page_young(page);
+                deactivate_page(page);
+        }
+
+        arch_leave_lazy_mmu_mode();
+        pte_unmap_unlock(orig_pte, ptl);
+        cond_resched();
+
+        return 0;
+}
+
+static void madvise_cold_page_range(struct mmu_gather *tlb,
+                        struct vm_area_struct *vma,
+                        unsigned long addr, unsigned long end)
+{
+        struct mm_walk cold_walk = {
+                .pmd_entry = madvise_cold_pte_range,
+                .mm = vma->vm_mm,
+                .private = tlb,
+        };
+
+        tlb_start_vma(tlb, vma);
+        walk_page_range(addr, end, &cold_walk);
+        tlb_end_vma(tlb, vma);
+}
+
+static long madvise_cold(struct vm_area_struct *vma,
+                        struct vm_area_struct **prev,
+                        unsigned long start_addr, unsigned long end_addr)
+{
+        struct mm_struct *mm = vma->vm_mm;
+        struct mmu_gather tlb;
+
+        *prev = vma;
+        if (!can_madv_lru_vma(vma))
+                return -EINVAL;
+
+        lru_add_drain();
+        tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+        madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+        tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+        return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
                                 unsigned long end, struct mm_walk *walk)
 
@@ -519,7 +692,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
                                   int behavior)
 {
         *prev = vma;
-        if (!can_madv_dontneed_vma(vma))
+        if (!can_madv_lru_vma(vma))
                 return -EINVAL;
 
         if (!userfaultfd_remove(vma, start, end)) {
@@ -541,7 +714,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
                  */
                 return -ENOMEM;
         }
-        if (!can_madv_dontneed_vma(vma))
+        if (!can_madv_lru_vma(vma))
                 return -EINVAL;
         if (end > vma->vm_end) {
                 /*
@@ -695,6 +868,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
                 return madvise_remove(vma, prev, start, end);
         case MADV_WILLNEED:
                 return madvise_willneed(vma, prev, start, end);
+        case MADV_COLD:
+                return madvise_cold(vma, prev, start, end);
         case MADV_FREE:
         case MADV_DONTNEED:
                 return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -716,6 +891,7 @@ madvise_behavior_valid(int behavior)
         case MADV_WILLNEED:
         case MADV_DONTNEED:
         case MADV_FREE:
+        case MADV_COLD:
 #ifdef CONFIG_KSM
         case MADV_MERGEABLE:
         case MADV_UNMERGEABLE:
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 95872bdfec4e..c8f0ec6b0e80 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -523,7 +523,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
         set_bit(MMF_UNSTABLE, &mm->flags);
 
         for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-                if (!can_madv_dontneed_vma(vma))
+                if (!can_madv_lru_vma(vma))
                         continue;
 
                 /*
diff --git a/mm/swap.c b/mm/swap.c
index 55899c1f54af..b501ad6eb091 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -47,6 +47,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
@@ -538,6 +539,22 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
         update_page_reclaim_stat(lruvec, file, 0);
 }
 
+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+                            void *arg)
+{
+        if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+                int file = page_is_file_cache(page);
+                int lru = page_lru_base_type(page);
+
+                del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+                ClearPageActive(page);
+                ClearPageReferenced(page);
+                add_page_to_lru_list(page, lruvec, lru);
+
+                __count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
+                update_page_reclaim_stat(lruvec, file, 0);
+        }
+}
 
 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
                             void *arg)
@@ -590,6 +607,10 @@ void lru_add_drain_cpu(int cpu)
         if (pagevec_count(pvec))
                 pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
+        pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+        if (pagevec_count(pvec))
+                pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
         pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
         if (pagevec_count(pvec))
                 pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
@@ -623,6 +644,26 @@ void deactivate_file_page(struct page *page)
         }
 }
 
+/*
+ * deactivate_page - deactivate a page
+ * @page: page to deactivate
+ *
+ * deactivate_page() moves @page to the inactive list if @page was on the
+ * active list and was not an unevictable page. This is done to accelerate
+ * the reclaim of @page.
+ */
+void deactivate_page(struct page *page)
+{
+        if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+                struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+                get_page(page);
+                if (!pagevec_add(pvec, page) || PageCompound(page))
+                        pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+                put_cpu_var(lru_deactivate_pvecs);
+        }
+}
+
 /**
  * mark_page_lazyfree - make an anon page lazyfree
  * @page: page to deactivate
@@ -687,6 +728,7 @@ void lru_add_drain_all(void)
                 if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
                     pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
                     pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
+                    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
                     pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
                     need_activate_page_drain(cpu)) {
                         INIT_WORK(work, lru_add_drain_per_cpu);

From patchwork Thu Jul 11 01:25:26 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11039173
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
    Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
    Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
    hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A. Shutemov"
Subject: [PATCH v4 2/4] mm: change PAGEREF_RECLAIM_CLEAN with PAGEREF_RECLAIM
Date: Thu, 11 Jul 2019 10:25:26 +0900
Message-Id: <20190711012528.176050-3-minchan@kernel.org>
In-Reply-To: <20190711012528.176050-1-minchan@kernel.org>
References: <20190711012528.176050-1-minchan@kernel.org>

The local variable "references" in shrink_page_list() defaults to
PAGEREF_RECLAIM_CLEAN, which prevents dirty pages from being reclaimed
when CMA tries to migrate pages. Strictly speaking, we don't need that
protection, because CMA already forbids writeback via
.may_writepage = 0 in reclaim_clean_pages_from_list(). Worse, the
default prevents anonymous pages from being swapped out even when
force_reclaim = true in shrink_page_list(), which an upcoming patch
depends on. So this patch changes the default value of "references" to
PAGEREF_RECLAIM and renames force_reclaim to ignore_references to make
the intent clearer.

This is preparatory work for the next patch.
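For orientation, the two kinds of call sites after this patch look
roughly as follows (a sketch, not the literal kernel code; only the
reclaim_clean_pages_from_list() call appears verbatim in a later
patch):

    /* Regular reclaim paths: honor the page reference checks. */
    nr_reclaimed = shrink_page_list(&page_list, pgdat, sc,
                                    ttu_flags, &stat, false);

    /* CMA's clean-page reclaim: skip the reference check entirely. */
    ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
                           TTU_IGNORE_ACCESS, &dummy_stat, true);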
* RFCv1
  * use ignore_references as parameter name - hannes

Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 mm/vmscan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index a0301edd8d03..b4fa04d10ba6 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1119,7 +1119,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                                       struct scan_control *sc,
                                       enum ttu_flags ttu_flags,
                                       struct reclaim_stat *stat,
-                                      bool force_reclaim)
+                                      bool ignore_references)
 {
         LIST_HEAD(ret_pages);
         LIST_HEAD(free_pages);
@@ -1133,7 +1133,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                 struct address_space *mapping;
                 struct page *page;
                 int may_enter_fs;
-                enum page_references references = PAGEREF_RECLAIM_CLEAN;
+                enum page_references references = PAGEREF_RECLAIM;
                 bool dirty, writeback;
                 unsigned int nr_pages;
 
@@ -1264,7 +1264,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                         }
                 }
 
-                if (!force_reclaim)
+                if (!ignore_references)
                         references = page_check_references(page, sc);
 
                 switch (references) {

From patchwork Thu Jul 11 01:25:27 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11039177
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
    Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
    Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
    hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A. Shutemov"
Subject: [PATCH v4 3/4] mm: account nr_isolated_xxx in [isolate|putback]_lru_page
Date: Thu, 11 Jul 2019 10:25:27 +0900
Message-Id: <20190711012528.176050-4-minchan@kernel.org>
In-Reply-To: <20190711012528.176050-1-minchan@kernel.org>
References: <20190711012528.176050-1-minchan@kernel.org>

The isolate counting is a percpu counter, so batching the updates would
not be a huge gain. Rather than complicating the callers to batch them,
let's make it more straightforward by moving the counting logic into
the [isolate|putback]_lru_page API itself.
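To make the consolidation concrete, here is a before/after sketch
modeled on the gup.c hunk below (illustrative, not code taken from the
patch):

    /* Before: every caller paired isolation with manual accounting. */
    if (!isolate_lru_page(head)) {
            list_add_tail(&head->lru, &cma_page_list);
            mod_node_page_state(page_pgdat(head),
                                NR_ISOLATED_ANON + page_is_file_cache(head),
                                hpage_nr_pages(head));
    }

    /* After: isolate_lru_page() accounts NR_ISOLATED_* internally. */
    if (!isolate_lru_page(head))
            list_add_tail(&head->lru, &cma_page_list);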
* v1
  * fix accounting bug - Hillf

Link: http://lkml.kernel.org/r/20190531165927.GA20067@cmpxchg.org
Acked-by: Michal Hocko
Suggested-by: Johannes Weiner
Signed-off-by: Minchan Kim
Acked-by: Johannes Weiner
---
 mm/compaction.c     |  2 --
 mm/gup.c            |  7 +------
 mm/khugepaged.c     |  3 ---
 mm/memory-failure.c |  3 ---
 mm/memory_hotplug.c |  4 ----
 mm/mempolicy.c      |  6 +-----
 mm/migrate.c        | 37 ++++++++-----------------------------
 mm/vmscan.c         | 22 ++++++++++++++------
 8 files changed, 26 insertions(+), 58 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9e1b9acb116b..c6591682deda 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -982,8 +982,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
 
                 /* Successfully isolated */
                 del_page_from_lru_list(page, lruvec, page_lru(page));
-                inc_node_page_state(page,
-                                NR_ISOLATED_ANON + page_is_file_cache(page));
 
 isolate_success:
                 list_add(&page->lru, &cc->migratepages);
diff --git a/mm/gup.c b/mm/gup.c
index 98f13ab37bac..11d0634ce613 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1475,13 +1475,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk,
                                         drain_allow = false;
                                 }
 
-                                if (!isolate_lru_page(head)) {
+                                if (!isolate_lru_page(head))
                                         list_add_tail(&head->lru, &cma_page_list);
-                                        mod_node_page_state(page_pgdat(head),
-                                                            NR_ISOLATED_ANON +
-                                                            page_is_file_cache(head),
-                                                            hpage_nr_pages(head));
-                                }
                         }
                 }
diff --git a/mm/khugepaged.c b/mm/khugepaged.c
index eaaa21b23215..a8b517d6df4a 100644
--- a/mm/khugepaged.c
+++ b/mm/khugepaged.c
@@ -503,7 +503,6 @@ void __khugepaged_exit(struct mm_struct *mm)
 
 static void release_pte_page(struct page *page)
 {
-        dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page));
         unlock_page(page);
         putback_lru_page(page);
 }
@@ -602,8 +601,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma,
                         result = SCAN_DEL_PAGE_LRU;
                         goto out;
                 }
-                inc_node_page_state(page,
-                                NR_ISOLATED_ANON + page_is_file_cache(page));
                 VM_BUG_ON_PAGE(!PageLocked(page), page);
                 VM_BUG_ON_PAGE(PageLRU(page), page);
diff --git a/mm/memory-failure.c b/mm/memory-failure.c
index 7ef849da8278..9900bb95d774 100644
--- a/mm/memory-failure.c
+++ b/mm/memory-failure.c
@@ -1791,9 +1791,6 @@ static int __soft_offline_page(struct page *page, int flags)
                  * so use !__PageMovable instead for LRU page's mapping
                  * cannot have PAGE_MAPPING_MOVABLE.
                  */
-                if (!__PageMovable(page))
-                        inc_node_page_state(page, NR_ISOLATED_ANON +
-                                                page_is_file_cache(page));
                 list_add(&page->lru, &pagelist);
                 ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL,
                                         MIGRATE_SYNC, MR_MEMORY_FAILURE);
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index b9ba5b85f9f7..15bad2043b41 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1383,10 +1383,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn)
                         ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE);
                 if (!ret) { /* Success */
                         list_add_tail(&page->lru, &source);
-                        if (!__PageMovable(page))
-                                inc_node_page_state(page, NR_ISOLATED_ANON +
-                                                    page_is_file_cache(page));
-
                 } else {
                         pr_warn("failed to isolate pfn %lx\n", pfn);
                         dump_page(page, "isolation failed");
diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 4acc2d14bc77..a5685eee6d1d 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -994,12 +994,8 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist,
          * Avoid migrating a page that is shared with others.
          */
         if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) {
-                if (!isolate_lru_page(head)) {
+                if (!isolate_lru_page(head))
                         list_add_tail(&head->lru, pagelist);
-                        mod_node_page_state(page_pgdat(head),
-                                NR_ISOLATED_ANON + page_is_file_cache(head),
-                                hpage_nr_pages(head));
-                }
         }
 
         return 0;
diff --git a/mm/migrate.c b/mm/migrate.c
index 8992741f10aa..f54f449a99f5 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -190,8 +190,6 @@ void putback_movable_pages(struct list_head *l)
                         unlock_page(page);
                         put_page(page);
                 } else {
-                        mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-                                        page_is_file_cache(page), -hpage_nr_pages(page));
                         putback_lru_page(page);
                 }
         }
@@ -1175,10 +1173,17 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
                 return -ENOMEM;
 
         if (page_count(page) == 1) {
+                bool is_lru = !__PageMovable(page);
+
                 /* page was freed from under us. So we are done. */
                 ClearPageActive(page);
                 ClearPageUnevictable(page);
-                if (unlikely(__PageMovable(page))) {
+                if (likely(is_lru))
+                        mod_node_page_state(page_pgdat(page),
+                                                NR_ISOLATED_ANON +
+                                                page_is_file_cache(page),
+                                                -hpage_nr_pages(page));
+                else {
                         lock_page(page);
                         if (!PageMovable(page))
                                 __ClearPageIsolated(page);
@@ -1204,15 +1209,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page,
                  * restored.
                  */
                 list_del(&page->lru);
-
-                /*
-                 * Compaction can migrate also non-LRU pages which are
-                 * not accounted to NR_ISOLATED_*. They can be recognized
-                 * as __PageMovable
-                 */
-                if (likely(!__PageMovable(page)))
-                        mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
-                                        page_is_file_cache(page), -hpage_nr_pages(page));
         }
 
         /*
@@ -1566,9 +1562,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr,
 
                 err = 0;
                 list_add_tail(&head->lru, pagelist);
-                mod_node_page_state(page_pgdat(head),
-                        NR_ISOLATED_ANON + page_is_file_cache(head),
-                        hpage_nr_pages(head));
         }
 out_putpage:
         /*
@@ -1884,8 +1877,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page,
 
 static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
 {
-        int page_lru;
-
         VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page);
 
         /* Avoid migrating to a node that is nearly full */
@@ -1907,10 +1898,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page)
                 return 0;
         }
 
-        page_lru = page_is_file_cache(page);
-        mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru,
-                                hpage_nr_pages(page));
-
         /*
          * Isolating the page has taken another reference, so the
          * caller's reference can be safely dropped without the page
@@ -1965,8 +1952,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma,
         if (nr_remaining) {
                 if (!list_empty(&migratepages)) {
                         list_del(&page->lru);
-                        dec_node_page_state(page, NR_ISOLATED_ANON +
-                                        page_is_file_cache(page));
                         putback_lru_page(page);
                 }
                 isolated = 0;
@@ -1996,7 +1981,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
         pg_data_t *pgdat = NODE_DATA(node);
         int isolated = 0;
         struct page *new_page = NULL;
-        int page_lru = page_is_file_cache(page);
         unsigned long start = address & HPAGE_PMD_MASK;
 
         new_page = alloc_pages_node(node,
@@ -2042,8 +2026,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
         /* Retake the callers reference and putback on LRU */
         get_page(page);
         putback_lru_page(page);
-        mod_node_page_state(page_pgdat(page),
-                        NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR);
 
         goto out_unlock;
 }
@@ -2093,9 +2075,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm,
         count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR);
         count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR);
-        mod_node_page_state(page_pgdat(page),
-                        NR_ISOLATED_ANON + page_lru,
-                        -HPAGE_PMD_NR);
 
         return isolated;
 
 out_fail:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index b4fa04d10ba6..ca192b792d4f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1016,6 +1016,9 @@ int remove_mapping(struct address_space *mapping, struct page *page)
 void putback_lru_page(struct page *page)
 {
         lru_cache_add(page);
+        mod_node_page_state(page_pgdat(page),
+                        NR_ISOLATED_ANON + page_is_file_cache(page),
+                        -hpage_nr_pages(page));
         put_page(page);         /* drop ref from isolate */
 }
 
@@ -1481,6 +1484,9 @@ static unsigned long shrink_page_list(struct list_head *page_list,
                  */
                 nr_reclaimed += nr_pages;
 
+                mod_node_page_state(pgdat, NR_ISOLATED_ANON +
+                                page_is_file_cache(page),
+                                -nr_pages);
                 /*
                  * Is there need to periodically free_page_list? It would
                  * appear not as the counts should be low
@@ -1556,7 +1562,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone,
         ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc,
                         TTU_IGNORE_ACCESS, &dummy_stat, true);
         list_splice(&clean_pages, page_list);
-        mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret);
         return ret;
 }
 
@@ -1632,6 +1637,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode)
                  */
                 ClearPageLRU(page);
                 ret = 0;
+                __mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON +
+                                page_is_file_cache(page),
+                                hpage_nr_pages(page));
         }
 
         return ret;
@@ -1763,6 +1771,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan,
         trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan,
                                     total_scan, skipped, nr_taken, mode, lru);
         update_lru_sizes(lruvec, lru, nr_zone_taken);
+
         return nr_taken;
 }
 
@@ -1811,6 +1820,9 @@ int isolate_lru_page(struct page *page)
                         ClearPageLRU(page);
                         del_page_from_lru_list(page, lruvec, lru);
                         ret = 0;
+                        mod_node_page_state(pgdat, NR_ISOLATED_ANON +
+                                        page_is_file_cache(page),
+                                        hpage_nr_pages(page));
                 }
                 spin_unlock_irq(&pgdat->lru_lock);
         }
@@ -1902,6 +1914,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec,
                 update_lru_size(lruvec, lru, page_zonenum(page), nr_pages);
                 list_move(&page->lru, &lruvec->lists[lru]);
+                __mod_node_page_state(pgdat, NR_ISOLATED_ANON +
+                                page_is_file_cache(page),
+                                -hpage_nr_pages(page));
 
                 if (put_page_testzero(page)) {
                         __ClearPageLRU(page);
                         __ClearPageActive(page);
@@ -1979,7 +1994,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
         nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list,
                                      &nr_scanned, sc, lru);
 
-        __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
         reclaim_stat->recent_scanned[file] += nr_taken;
 
         item = current_is_kswapd() ? PGSCAN_KSWAPD : PGSCAN_DIRECT;
@@ -2005,8 +2019,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec,
 
         move_pages_to_lru(lruvec, &page_list);
 
-        __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
-
         spin_unlock_irq(&pgdat->lru_lock);
 
         mem_cgroup_uncharge_list(&page_list);
@@ -2065,7 +2077,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
         nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold,
                                      &nr_scanned, sc, lru);
 
-        __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken);
         reclaim_stat->recent_scanned[file] += nr_taken;
 
         __count_vm_events(PGREFILL, nr_scanned);
@@ -2134,7 +2145,6 @@ static void shrink_active_list(unsigned long nr_to_scan,
         __count_vm_events(PGDEACTIVATE, nr_deactivate);
         __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE,
                         nr_deactivate);
-        __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken);
         spin_unlock_irq(&pgdat->lru_lock);
 
         mem_cgroup_uncharge_list(&l_active);

From patchwork Thu Jul 11 01:25:28 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11039179
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
    Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
    Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
    hdanton@sina.com, lizeb@google.com, Dave Hansen, "Kirill A. Shutemov"
Subject: [PATCH v4 4/4] mm: introduce MADV_PAGEOUT
Date: Thu, 11 Jul 2019 10:25:28 +0900
Message-Id: <20190711012528.176050-5-minchan@kernel.org>
In-Reply-To: <20190711012528.176050-1-minchan@kernel.org>
References: <20190711012528.176050-1-minchan@kernel.org>

When a process expects no accesses to a certain memory range for a
long time, it can hint the kernel that the pages can be reclaimed
instantly while their data is preserved for future use. This can
reduce working-set eviction and so ends up increasing performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2)
syscall. A process can use MADV_PAGEOUT to mark a memory range as not
expected to be used for a long time, so that the kernel reclaims *any
LRU* pages in it instantly. The hint helps the kernel evict pages
proactively.

A note: this intentionally does not apply the SWAP_CLUSTER_MAX limit
on LRU page isolation, because the batch is already bounded by the PMD
size. If the PMD size (e.g., 256 pages) turns out to cause trouble, we
can fix it later by limiting the batch to SWAP_CLUSTER_MAX [1].

* man-page material

MADV_PAGEOUT (since Linux x.x)

Do not expect access in the near future, so pages in the specified
regions can be reclaimed instantly regardless of memory pressure.
Thus, access in the range after successful operation could cause major page fault but never lose the up-to-date contents unlike MADV_DONTNEED. Pages belonging to a shared mapping are only processed if a write access is allowed for the calling process. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages. * v3 * man page material modification - mhocko * remove using SWAP_CLUSTER_MAX - mhocko * v2 * add comment about SWAP_CLUSTER_MAX - mhocko * add permission check to prevent sidechannel attack - mhocko * add man page stuff - dave * v1 * change pte to old and rely on the other's reference - hannes * remove page_mapcount to check shared page - mhocko * RFC v2 * make reclaim_pages simple via factoring out isolate logic - hannes * RFCv1 * rename from MADV_COLD to MADV_PAGEOUT - hannes * bail out if process is being killed - Hillf * fix reclaim_pages bugs - Hillf [1] https://lore.kernel.org/lkml/20190710194719.GS29695@dhcp22.suse.cz/ Acked-by: Michal Hocko Signed-off-by: Minchan Kim Signed-off-by: Minchan Kim Acked-by: Michal Hocko --- include/linux/swap.h | 1 + include/uapi/asm-generic/mman-common.h | 1 + mm/madvise.c | 197 +++++++++++++++++++++++++ mm/vmscan.c | 55 +++++++ 4 files changed, 254 insertions(+) diff --git a/include/linux/swap.h b/include/linux/swap.h index 0ce997edb8bb..063c0c1e112b 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -365,6 +365,7 @@ extern int vm_swappiness; extern int remove_mapping(struct address_space *mapping, struct page *page); extern unsigned long vm_total_pages; +extern unsigned long reclaim_pages(struct list_head *page_list); #ifdef CONFIG_NUMA extern int node_reclaim_mode; extern int sysctl_min_unmapped_ratio; diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index ef8a56927b12..c613abdb7284 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -46,6 +46,7 @@ #define MADV_WILLNEED 3 /* will need these pages */ #define MADV_DONTNEED 4 /* don't need these pages */ #define MADV_COLD 5 /* deactivatie these pages */ +#define MADV_PAGEOUT 6 /* reclaim these pages */ /* common parameters: try to keep these consistent across architectures */ #define MADV_FREE 8 /* free pages only if memory pressure */ diff --git a/mm/madvise.c b/mm/madvise.c index bae0055f9724..bc2f0138982e 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -41,6 +42,7 @@ static int madvise_need_mmap_write(int behavior) case MADV_WILLNEED: case MADV_DONTNEED: case MADV_COLD: + case MADV_PAGEOUT: case MADV_FREE: return 0; default: @@ -480,6 +482,198 @@ static long madvise_cold(struct vm_area_struct *vma, return 0; } +static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct mmu_gather *tlb = walk->private; + struct mm_struct *mm = tlb->mm; + struct vm_area_struct *vma = walk->vma; + pte_t *orig_pte, *pte, ptent; + spinlock_t *ptl; + LIST_HEAD(page_list); + struct page *page; + unsigned long next; + + if (fatal_signal_pending(current)) + return -EINTR; + + next = pmd_addr_end(addr, end); + if (pmd_trans_huge(*pmd)) { + pmd_t orig_pmd; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + ptl = pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + + orig_pmd = *pmd; + if (is_huge_zero_pmd(orig_pmd)) + goto huge_unlock; + + if (unlikely(!pmd_present(orig_pmd))) { + VM_BUG_ON(thp_migration_supported() && + 
+					!is_pmd_migration_entry(orig_pmd));
+			goto huge_unlock;
+		}
+
+		page = pmd_page(orig_pmd);
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			if (page_mapcount(page) != 1)
+				goto huge_unlock;
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (isolate_lru_page(page))
+			goto huge_unlock;
+
+		if (pmd_young(orig_pmd)) {
+			pmdp_invalidate(vma, addr, pmd);
+			orig_pmd = pmd_mkold(orig_pmd);
+
+			set_pmd_at(mm, addr, pmd, orig_pmd);
+			tlb_remove_tlb_entry(tlb, pmd, addr);
+		}
+
+		ClearPageReferenced(page);
+		test_and_clear_page_young(page);
+		list_add(&page->lru, &page_list);
+huge_unlock:
+		spin_unlock(ptl);
+		reclaim_pages(&page_list);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+regular_page:
+	tlb_change_page_size(tlb, PAGE_SIZE);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	flush_tlb_batched_pending(mm);
+	arch_enter_lazy_mmu_mode();
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		/*
+		 * Creating a THP page is expensive, so split it only if we
+		 * are sure it's worth it. Split it if we are the only owner.
+		 */
+		if (PageTransCompound(page)) {
+			if (page_mapcount(page) != 1)
+				break;
+			get_page(page);
+			if (!trylock_page(page)) {
+				put_page(page);
+				break;
+			}
+			pte_unmap_unlock(orig_pte, ptl);
+			if (split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				pte_offset_map_lock(mm, pmd, addr, &ptl);
+				break;
+			}
+			unlock_page(page);
+			put_page(page);
+			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte--;
+			addr -= PAGE_SIZE;
+			continue;
+		}
+
+		VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+		if (isolate_lru_page(page))
+			continue;
+
+		if (pte_young(ptent)) {
+			ptent = ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			ptent = pte_mkold(ptent);
+			set_pte_at(mm, addr, pte, ptent);
+			tlb_remove_tlb_entry(tlb, pte, addr);
+		}
+		ClearPageReferenced(page);
+		test_and_clear_page_young(page);
+		list_add(&page->lru, &page_list);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(orig_pte, ptl);
+	reclaim_pages(&page_list);
+	cond_resched();
+
+	return 0;
+}
+
+static void madvise_pageout_page_range(struct mmu_gather *tlb,
+			struct vm_area_struct *vma,
+			unsigned long addr, unsigned long end)
+{
+	struct mm_walk pageout_walk = {
+		.pmd_entry = madvise_pageout_pte_range,
+		.mm = vma->vm_mm,
+		.private = tlb,
+	};
+
+	tlb_start_vma(tlb, vma);
+	walk_page_range(addr, end, &pageout_walk);
+	tlb_end_vma(tlb, vma);
+}
+
+static inline bool can_do_pageout(struct vm_area_struct *vma)
+{
+	if (vma_is_anonymous(vma))
+		return true;
+	if (!vma->vm_file)
+		return false;
+	/*
+	 * paging out pagecache only for non-anonymous mappings that correspond
+	 * to the files the calling process could (if tried) open for writing;
+	 * otherwise we'd be including shared non-exclusive mappings, which
+	 * opens a side channel.
+	 */
+	return inode_owner_or_capable(file_inode(vma->vm_file)) ||
+		inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0;
+}
+
+static long madvise_pageout(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start_addr, unsigned long end_addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
+
+	*prev = vma;
+	if (!can_madv_lru_vma(vma))
+		return -EINVAL;
+
+	if (!can_do_pageout(vma))
+		return 0;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+	madvise_pageout_page_range(&tlb, vma, start_addr, end_addr);
+	tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+	return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
@@ -870,6 +1064,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return madvise_willneed(vma, prev, start, end);
 	case MADV_COLD:
 		return madvise_cold(vma, prev, start, end);
+	case MADV_PAGEOUT:
+		return madvise_pageout(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -892,6 +1088,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_DONTNEED:
 	case MADV_FREE:
 	case MADV_COLD:
+	case MADV_PAGEOUT:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:
diff --git a/mm/vmscan.c b/mm/vmscan.c
index ca192b792d4f..bda3c41de767 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2153,6 +2153,61 @@ static void shrink_active_list(unsigned long nr_to_scan,
 			nr_deactivate, nr_rotated, sc->priority, file);
 }
 
+unsigned long reclaim_pages(struct list_head *page_list)
+{
+	int nid = -1;
+	unsigned long nr_reclaimed = 0;
+	LIST_HEAD(node_page_list);
+	struct reclaim_stat dummy_stat;
+	struct page *page;
+	struct scan_control sc = {
+		.gfp_mask = GFP_KERNEL,
+		.priority = DEF_PRIORITY,
+		.may_writepage = 1,
+		.may_unmap = 1,
+		.may_swap = 1,
+	};
+
+	while (!list_empty(page_list)) {
+		page = lru_to_page(page_list);
+		if (nid == -1) {
+			nid = page_to_nid(page);
+			INIT_LIST_HEAD(&node_page_list);
+		}
+
+		if (nid == page_to_nid(page)) {
+			list_move(&page->lru, &node_page_list);
+			continue;
+		}
+
+		nr_reclaimed += shrink_page_list(&node_page_list,
+						NODE_DATA(nid),
+						&sc, 0,
+						&dummy_stat, false);
+		while (!list_empty(&node_page_list)) {
+			page = lru_to_page(&node_page_list);
+			list_del(&page->lru);
+			putback_lru_page(page);
+		}
+
+		nid = -1;
+	}
+
+	if (!list_empty(&node_page_list)) {
+		nr_reclaimed += shrink_page_list(&node_page_list,
+						NODE_DATA(nid),
+						&sc, 0,
+						&dummy_stat, false);
+		while (!list_empty(&node_page_list)) {
+			page = lru_to_page(&node_page_list);
+			list_del(&page->lru);
+			putback_lru_page(page);
+		}
+	}
+
+	return nr_reclaimed;
+}
+
 /*
  * The inactive anon list should be small enough that the VM never has
  * to do too much work.
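A usage note, not part of the patch: can_do_pageout() silently skips
file-backed ranges the caller could not open for writing instead of
failing the syscall. A sketch of that behavior, assuming it runs as a
regular user against a root-owned, world-readable file such as
/etc/hosts (MADV_PAGEOUT fallback value 6 again taken from this
patch's mman-common.h hunk):

	#define _GNU_SOURCE
	#include <fcntl.h>
	#include <stdio.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#ifndef MADV_PAGEOUT
	#define MADV_PAGEOUT 6	/* value from this patch's mman-common.h hunk */
	#endif

	int main(void)
	{
		long page = sysconf(_SC_PAGESIZE);
		int fd = open("/etc/hosts", O_RDONLY);
		void *map;

		if (fd < 0) {
			perror("open");
			return 1;
		}
		map = mmap(NULL, page, PROT_READ, MAP_SHARED, fd, 0);
		if (map == MAP_FAILED) {
			perror("mmap");
			return 1;
		}

		/*
		 * A regular user neither owns the file nor may write it, so
		 * can_do_pageout() rejects the VMA: madvise() still returns 0,
		 * but no pagecache is paged out (avoiding the side channel).
		 */
		if (madvise(map, page, MADV_PAGEOUT) == 0)
			printf("madvise returned 0; file-backed range left intact\n");
		else
			perror("madvise(MADV_PAGEOUT)");

		munmap(map, page);
		close(fd);
		return 0;
	}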