From patchwork Thu Jun 27 11:54:01 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11019477
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
 Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
 Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
 hdanton@sina.com, lizeb@google.com, Dave Hansen,
 "Kirill A . Shutemov", Minchan Kim
Subject: [PATCH v3 1/5] mm: introduce MADV_COLD
Date: Thu, 27 Jun 2019 20:54:01 +0900
Message-Id: <20190627115405.255259-2-minchan@kernel.org>
In-Reply-To: <20190627115405.255259-1-minchan@kernel.org>
References: <20190627115405.255259-1-minchan@kernel.org>

When a process expects no accesses to a certain memory range, it could
give the kernel a hint that the pages can be reclaimed when memory
pressure happens but that their data should be preserved for future use.
This could reduce workingset eviction and so end up improving
performance.

This patch introduces the new MADV_COLD hint to the madvise(2) syscall.
MADV_COLD can be used by a process to mark a memory range as not expected
to be used in the near future. The hint can help the kernel decide which
pages to evict early during memory pressure.

It works for every LRU page, like MADV_[DONTNEED|FREE]. IOW, it moves

	active file page -> inactive file LRU
	active anon page -> inactive anon LRU

Unlike MADV_FREE, it doesn't move active anonymous pages to the head of
the inactive file LRU, because MADV_COLD has slightly different semantics.
MADV_FREE means it's okay to discard the pages under memory pressure
because their content is *garbage*, so freeing them has almost zero
overhead: we don't need to swap them out, and a later access causes only
a minor fault. Thus, it makes sense to put those freeable pages on the
inactive file LRU to compete with other used-once pages. It also makes
sense from the implementation point of view, because the memory is no
longer swap-backed until it is re-dirtied. It can even give a bonus by
making such pages reclaimable on a swapless system. However, MADV_COLD
doesn't mean garbage, so reclaiming the pages requires swap-out/in in the
end, which is a bigger cost. Since we have designed VM LRU aging based on
a cost model, anonymous cold pages are better positioned on the inactive
anon LRU list, not the file LRU. Furthermore, this helps to avoid
unnecessary scanning if the system doesn't have a swap device. Let's
start with the simpler approach without adding complexity at this moment.
However, keep in mind the caveat that workloads with a lot of page cache
are likely to ignore MADV_COLD on anonymous memory, because we rarely age
anonymous LRU lists.

* man-page material

MADV_COLD (since Linux x.x)

Do not expect access in the near future, so under memory pressure, pages
in the specified regions could be reclaimed more aggressively compared
to other pages in the system.

The difference from MADV_DONTNEED is that it doesn't change the semantics
of memory access in the specified regions. Thus, it will keep the
up-to-date contents of the region.

MADV_COLD cannot be applied to locked pages, Huge TLB pages, or
VM_PFNMAP pages.
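As an illustration of the userspace side, here is a minimal sketch, assuming Linux and a C library whose <sys/mman.h> defines MADV_COLD. The helper names hint_cold() and demo_cold_keeps_data() are hypothetical. The sketch deliberately takes MADV_COLD from the system header instead of hard-coding the value 5 proposed in this patch, since the number used by the installed headers may differ. It also checks the property the man-page text promises: unlike MADV_DONTNEED, the hint leaves the contents of the range intact.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: hint that [addr, addr + len) will not be accessed
 * soon. Returns 0 on success, -errno if madvise() fails (e.g. -EINVAL on a
 * kernel without MADV_COLD or for a locked/hugetlb/VM_PFNMAP range), or
 * -ENOSYS if the installed headers do not define MADV_COLD at all.
 */
static int hint_cold(void *addr, size_t len)
{
#ifdef MADV_COLD
	if (madvise(addr, len, MADV_COLD) == 0)
		return 0;
	return -errno;
#else
	(void)addr;
	(void)len;
	return -ENOSYS;
#endif
}

/*
 * Hypothetical demo: map anonymous memory, fill it, hint it cold and
 * verify the data is still there afterwards. Returns the hint_cold()
 * result, -ENOMEM if mmap() fails, or -EIO if the contents changed.
 */
static int demo_cold_keeps_data(void)
{
	long page = sysconf(_SC_PAGESIZE);
	assert(page > 0);

	size_t len = 4 * (size_t)page;
	unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -ENOMEM;

	memset(buf, 0xab, len);
	int ret = hint_cold(buf, len);

	/* Unlike MADV_DONTNEED, the hint must not zero the pages. */
	for (size_t i = 0; i < len; i++) {
		if (buf[i] != 0xab) {
			ret = -EIO;
			break;
		}
	}
	munmap(buf, len);
	return ret;
}
```

A caller can treat -EINVAL and -ENOSYS as "hint not applied" and carry on, since the hint is advisory and is documented to preserve the contents of the range either way.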
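The list movement described in this message can be pictured with a small userspace model. It is a sketch, not kernel code: struct toy_page, toy_deactivate() and toy_lazyfree() are hypothetical names, and the enum only mirrors the kernel's LRU list layout (LRU_ACTIVE = 1, LRU_FILE = 2). toy_deactivate() follows the checks in the lru_deactivate_fn() added by this patch; toy_lazyfree() is a simplified reading of the MADV_FREE behaviour described above, where the anonymous page stops being swap-backed and lands on the inactive file LRU.

```c
#include <assert.h>
#include <stdbool.h>

/* Mirrors the kernel's LRU index layout; values assumed for the model. */
enum lru_list {
	LRU_INACTIVE_ANON = 0,
	LRU_ACTIVE_ANON = 1,
	LRU_INACTIVE_FILE = 2,
	LRU_ACTIVE_FILE = 3,
	LRU_UNEVICTABLE = 4,
};

/* Hypothetical stand-in for the page flags the patch tests. */
struct toy_page {
	bool on_lru;       /* PageLRU() */
	bool active;       /* PageActive() */
	bool referenced;   /* PageReferenced() */
	bool unevictable;  /* PageUnevictable() */
	bool file;         /* page_is_file_cache(), i.e. !PageSwapBacked() */
	enum lru_list lru; /* list the page currently sits on */
};

/* Inactive list of the page's own type, like page_lru_base_type(). */
static enum lru_list base_type(const struct toy_page *p)
{
	return p->file ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON;
}

/* MADV_COLD: active -> inactive list of the same type (anon stays anon). */
static void toy_deactivate(struct toy_page *p)
{
	if (p->on_lru && p->active && !p->unevictable) {
		p->active = false;
		p->referenced = false;
		p->lru = base_type(p);
	}
}

/* MADV_FREE on anon memory, simplified: no longer swap-backed, so it
 * competes with used-once file pages on the inactive file LRU. */
static void toy_lazyfree(struct toy_page *p)
{
	if (p->on_lru && !p->file && !p->unevictable) {
		p->active = false;
		p->referenced = false;
		p->file = true; /* models ClearPageSwapBacked() */
		p->lru = LRU_INACTIVE_FILE;
	}
}
```

The model makes the trade-off visible: a cold anonymous page stays on an anon list, so reclaiming it still costs a swap-out, which is why it should not compete with cheap used-once file pages.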
* v2
 * add a warning about workloads with lots of page cache - mhocko
 * add man page material - dave

* v1
 * remove page_mapcount filter - hannes, mhocko
 * remove idle page handling - joelaf

* RFCv2
 * add more description - mhocko

* RFCv1
 * rename MADV_COOL to MADV_COLD - hannes

* internal review
 * use clear_page_young in deactivate_page - joelaf
 * revise the description - surenb
 * rename MADV_WARM to MADV_COOL - surenb

Signed-off-by: Minchan Kim
Signed-off-by: Minchan Kim
Acked-by: Michal Hocko
---
 include/linux/swap.h                   |   1 +
 include/uapi/asm-generic/mman-common.h |   1 +
 mm/internal.h                          |   2 +-
 mm/madvise.c                           | 180 ++++++++++++++++++++++++-
 mm/oom_kill.c                          |   2 +-
 mm/swap.c                              |  42 ++++++
 6 files changed, 224 insertions(+), 4 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a33b7e..0ce997edb8bb 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -340,6 +340,7 @@ extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_all(void);
 extern void rotate_reclaimable_page(struct page *page);
 extern void deactivate_file_page(struct page *page);
+extern void deactivate_page(struct page *page);
 extern void mark_page_lazyfree(struct page *page);
 extern void swap_setup(void);
 
diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h
index ef4623f03156..d7b4231eea63 100644
--- a/include/uapi/asm-generic/mman-common.h
+++ b/include/uapi/asm-generic/mman-common.h
@@ -47,6 +47,7 @@
 #define MADV_SEQUENTIAL	2		/* expect sequential page references */
 #define MADV_WILLNEED	3		/* will need these pages */
 #define MADV_DONTNEED	4		/* don't need these pages */
+#define MADV_COLD	5		/* deactivate these pages */
 
 /* common parameters: try to keep these consistent across architectures */
 #define MADV_FREE	8		/* free pages only if memory pressure */
diff --git a/mm/internal.h b/mm/internal.h
index f53a14d67538..c61b215ff265 100644
--- a/mm/internal.h
+++ b/mm/internal.h
@@ -39,7 +39,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf);
 void free_pgtables(struct mmu_gather *tlb, struct vm_area_struct *start_vma,
 		unsigned long floor, unsigned long ceiling);
 
-static inline bool can_madv_dontneed_vma(struct vm_area_struct *vma)
+static inline bool can_madv_lru_vma(struct vm_area_struct *vma)
 {
 	return !(vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP));
 }
diff --git a/mm/madvise.c b/mm/madvise.c
index 628022e674a7..7abb8e54bc7a 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -40,6 +40,7 @@ static int madvise_need_mmap_write(int behavior)
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_COLD:
 	case MADV_FREE:
 		return 0;
 	default:
@@ -307,6 +308,178 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
+				unsigned long end, struct mm_walk *walk)
+{
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = tlb->mm;
+	struct vm_area_struct *vma = walk->vma;
+	pte_t *orig_pte, *pte, ptent;
+	spinlock_t *ptl;
+	struct page *page;
+	unsigned long next;
+
+	next = pmd_addr_end(addr, end);
+	if (pmd_trans_huge(*pmd)) {
+		pmd_t orig_pmd;
+
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		orig_pmd = *pmd;
+		if (is_huge_zero_pmd(orig_pmd))
+			goto huge_unlock;
+
+		if (unlikely(!pmd_present(orig_pmd))) {
+			VM_BUG_ON(thp_migration_supported() &&
+					!is_pmd_migration_entry(orig_pmd));
+			goto huge_unlock;
+		}
+
+		page = pmd_page(orig_pmd);
+		if (next - addr != HPAGE_PMD_SIZE) {
+			int err;
+
+			if (page_mapcount(page) != 1)
+				goto huge_unlock;
+
+			get_page(page);
+			spin_unlock(ptl);
+			lock_page(page);
+			err = split_huge_page(page);
+			unlock_page(page);
+			put_page(page);
+			if (!err)
+				goto regular_page;
+			return 0;
+		}
+
+		if (pmd_young(orig_pmd)) {
+			pmdp_invalidate(vma, addr, pmd);
+			orig_pmd = pmd_mkold(orig_pmd);
+
+			set_pmd_at(mm, addr, pmd, orig_pmd);
+			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+		}
+
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+huge_unlock:
+		spin_unlock(ptl);
+		return 0;
+	}
+
+	if (pmd_trans_unstable(pmd))
+		return 0;
+
+regular_page:
+	tlb_change_page_size(tlb, PAGE_SIZE);
+	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
+	flush_tlb_batched_pending(mm);
+	arch_enter_lazy_mmu_mode();
+	for (; addr < end; pte++, addr += PAGE_SIZE) {
+		ptent = *pte;
+
+		if (pte_none(ptent))
+			continue;
+
+		if (!pte_present(ptent))
+			continue;
+
+		page = vm_normal_page(vma, addr, ptent);
+		if (!page)
+			continue;
+
+		/*
+		 * Creating a THP page is expensive, so split it only if we
+		 * are sure it's worth it: that is, only if we are the sole owner.
+		 */
+		if (PageTransCompound(page)) {
+			if (page_mapcount(page) != 1)
+				break;
+			get_page(page);
+			if (!trylock_page(page)) {
+				put_page(page);
+				break;
+			}
+			pte_unmap_unlock(orig_pte, ptl);
+			if (split_huge_page(page)) {
+				unlock_page(page);
+				put_page(page);
+				pte_offset_map_lock(mm, pmd, addr, &ptl);
+				break;
+			}
+			unlock_page(page);
+			put_page(page);
+			pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
+			pte--;
+			addr -= PAGE_SIZE;
+			continue;
+		}
+
+		VM_BUG_ON_PAGE(PageTransCompound(page), page);
+
+		if (pte_young(ptent)) {
+			ptent = ptep_get_and_clear_full(mm, addr, pte,
+							tlb->fullmm);
+			ptent = pte_mkold(ptent);
+			set_pte_at(mm, addr, pte, ptent);
+			tlb_remove_tlb_entry(tlb, pte, addr);
+		}
+
+		/*
+		 * We are deactivating a page to accelerate reclaim.
+		 * The VM can't reclaim the page unless we clear PG_young.
+		 * As a side effect, this confuses idle-page tracking,
+		 * because it will miss the recent reference history.
+		 */
+		test_and_clear_page_young(page);
+		deactivate_page(page);
+	}
+
+	arch_leave_lazy_mmu_mode();
+	pte_unmap_unlock(orig_pte, ptl);
+	cond_resched();
+
+	return 0;
+}
+
+static void madvise_cold_page_range(struct mmu_gather *tlb,
+			     struct vm_area_struct *vma,
+			     unsigned long addr, unsigned long end)
+{
+	struct mm_walk cold_walk = {
+		.pmd_entry = madvise_cold_pte_range,
+		.mm = vma->vm_mm,
+		.private = tlb,
+	};
+
+	tlb_start_vma(tlb, vma);
+	walk_page_range(addr, end, &cold_walk);
+	tlb_end_vma(tlb, vma);
+}
+
+static long madvise_cold(struct vm_area_struct *vma,
+			struct vm_area_struct **prev,
+			unsigned long start_addr, unsigned long end_addr)
+{
+	struct mm_struct *mm = vma->vm_mm;
+	struct mmu_gather tlb;
+
+	*prev = vma;
+	if (!can_madv_lru_vma(vma))
+		return -EINVAL;
+
+	lru_add_drain();
+	tlb_gather_mmu(&tlb, mm, start_addr, end_addr);
+	madvise_cold_page_range(&tlb, vma, start_addr, end_addr);
+	tlb_finish_mmu(&tlb, start_addr, end_addr);
+
+	return 0;
+}
+
 static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 
@@ -519,7 +692,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 				  int behavior)
 {
 	*prev = vma;
-	if (!can_madv_dontneed_vma(vma))
+	if (!can_madv_lru_vma(vma))
 		return -EINVAL;
 
 	if (!userfaultfd_remove(vma, start, end)) {
@@ -541,7 +714,7 @@ static long madvise_dontneed_free(struct vm_area_struct *vma,
 			 */
 			return -ENOMEM;
 		}
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			return -EINVAL;
 		if (end > vma->vm_end) {
 			/*
@@ -695,6 +868,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		return madvise_remove(vma, prev, start, end);
 	case MADV_WILLNEED:
 		return madvise_willneed(vma, prev, start, end);
+	case MADV_COLD:
+		return madvise_cold(vma, prev, start, end);
 	case MADV_FREE:
 	case MADV_DONTNEED:
 		return madvise_dontneed_free(vma, prev, start, end, behavior);
@@ -716,6 +891,7 @@ madvise_behavior_valid(int behavior)
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
 	case MADV_FREE:
+	case MADV_COLD:
 #ifdef CONFIG_KSM
 	case MADV_MERGEABLE:
 	case MADV_UNMERGEABLE:
diff --git a/mm/oom_kill.c b/mm/oom_kill.c
index 6de5c354d6ca..2140a6f8db63 100644
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -523,7 +523,7 @@ bool __oom_reap_task_mm(struct mm_struct *mm)
 	set_bit(MMF_UNSTABLE, &mm->flags);
 
 	for (vma = mm->mmap ; vma; vma = vma->vm_next) {
-		if (!can_madv_dontneed_vma(vma))
+		if (!can_madv_lru_vma(vma))
 			continue;
 
 		/*
diff --git a/mm/swap.c b/mm/swap.c
index 607c48229a1d..a91859d061f3 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -47,6 +47,7 @@ int page_cluster;
 static DEFINE_PER_CPU(struct pagevec, lru_add_pvec);
 static DEFINE_PER_CPU(struct pagevec, lru_rotate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_deactivate_file_pvecs);
+static DEFINE_PER_CPU(struct pagevec, lru_deactivate_pvecs);
 static DEFINE_PER_CPU(struct pagevec, lru_lazyfree_pvecs);
 #ifdef CONFIG_SMP
 static DEFINE_PER_CPU(struct pagevec, activate_page_pvecs);
@@ -538,6 +539,22 @@ static void lru_deactivate_file_fn(struct page *page, struct lruvec *lruvec,
 	update_page_reclaim_stat(lruvec, file, 0);
 }
 
+static void lru_deactivate_fn(struct page *page, struct lruvec *lruvec,
+			    void *arg)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		int file = page_is_file_cache(page);
+		int lru = page_lru_base_type(page);
+
+		del_page_from_lru_list(page, lruvec, lru + LRU_ACTIVE);
+		ClearPageActive(page);
+		ClearPageReferenced(page);
+		add_page_to_lru_list(page, lruvec, lru);
+
+		__count_vm_events(PGDEACTIVATE, hpage_nr_pages(page));
+		update_page_reclaim_stat(lruvec, file, 0);
+	}
+}
 
 static void lru_lazyfree_fn(struct page *page, struct lruvec *lruvec,
 			    void *arg)
@@ -590,6 +607,10 @@ void lru_add_drain_cpu(int cpu)
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_deactivate_file_fn, NULL);
 
+	pvec = &per_cpu(lru_deactivate_pvecs, cpu);
+	if (pagevec_count(pvec))
+		pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+
 	pvec = &per_cpu(lru_lazyfree_pvecs, cpu);
 	if (pagevec_count(pvec))
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn, NULL);
@@ -623,6 +644,26 @@ void deactivate_file_page(struct page *page)
 	}
 }
 
+/*
+ * deactivate_page - deactivate a page
+ * @page: page to deactivate
+ *
+ * deactivate_page() moves @page to the inactive list if @page was on the active
+ * list and was not an unevictable page.  This is done to accelerate the reclaim
+ * of @page.
+ */
+void deactivate_page(struct page *page)
+{
+	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
+		struct pagevec *pvec = &get_cpu_var(lru_deactivate_pvecs);
+
+		get_page(page);
+		if (!pagevec_add(pvec, page) || PageCompound(page))
+			pagevec_lru_move_fn(pvec, lru_deactivate_fn, NULL);
+		put_cpu_var(lru_deactivate_pvecs);
+	}
+}
+
 /**
  * mark_page_lazyfree - make an anon page lazyfree
  * @page: page to deactivate
@@ -687,6 +728,7 @@ void lru_add_drain_all(void)
 		if (pagevec_count(&per_cpu(lru_add_pvec, cpu)) ||
 		    pagevec_count(&per_cpu(lru_rotate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_deactivate_file_pvecs, cpu)) ||
+		    pagevec_count(&per_cpu(lru_deactivate_pvecs, cpu)) ||
 		    pagevec_count(&per_cpu(lru_lazyfree_pvecs, cpu)) ||
 		    need_activate_page_drain(cpu)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);

From patchwork Thu Jun 27 11:54:02 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11019479
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
 Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
 Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
 hdanton@sina.com, lizeb@google.com, Dave Hansen,
 "Kirill A . Shutemov", Minchan Kim
Subject: [PATCH v3 2/5] mm: change PAGEREF_RECLAIM_CLEAN with PAGEREF_RECLAIM
Date: Thu, 27 Jun 2019 20:54:02 +0900
Message-Id: <20190627115405.255259-3-minchan@kernel.org>
In-Reply-To: <20190627115405.255259-1-minchan@kernel.org>
References: <20190627115405.255259-1-minchan@kernel.org>

The local variable references in shrink_page_list() defaults to
PAGEREF_RECLAIM_CLEAN. It is there to prevent reclaiming dirty pages when
CMA tries to migrate pages. Strictly speaking, we don't need it, because
CMA already disallows writeout via .may_writepage = 0 in
reclaim_clean_pages_from_list().

Moreover, in the upcoming patch it would prevent anonymous pages from
being swapped out even when force_reclaim = true is passed to
shrink_page_list().

So this patch changes the default value of references to PAGEREF_RECLAIM
and renames force_reclaim to ignore_references to make its meaning
clearer.

This is preparatory work for the next patch.
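To make the effect of the default concrete, here is a heavily simplified userspace model of the relevant decisions in shrink_page_list(). Only the PAGEREF_* names and the roles of references, ignore_references and may_writepage come from the patch and its description; toy_shrink_one() and enum toy_outcome are hypothetical. The model treats an anonymous page headed for swap as dirty and ignores the real function's other checks (file-page writeback throttling, may_enter_fs, swap slot allocation, and so on).

```c
#include <assert.h>
#include <stdbool.h>

enum page_references {
	PAGEREF_RECLAIM,
	PAGEREF_RECLAIM_CLEAN,
	PAGEREF_KEEP,
	PAGEREF_ACTIVATE,
};

/* Hypothetical summary of what happens to one page in the model. */
enum toy_outcome { TOY_KEEP, TOY_ACTIVATE, TOY_WRITEOUT, TOY_FREE };

/*
 * checked:     what page_check_references() would have returned
 * default_ref: initial value of the local "references" variable
 */
static enum toy_outcome toy_shrink_one(bool ignore_references,
				       enum page_references checked,
				       enum page_references default_ref,
				       bool dirty, bool may_writepage)
{
	enum page_references references = default_ref;

	if (!ignore_references)
		references = checked;

	switch (references) {
	case PAGEREF_ACTIVATE:
		return TOY_ACTIVATE;
	case PAGEREF_KEEP:
		return TOY_KEEP;
	case PAGEREF_RECLAIM:
	case PAGEREF_RECLAIM_CLEAN:
		break; /* try to reclaim below */
	}

	if (!dirty)
		return TOY_FREE;
	/* Old default: dirty pages are kept even when reclaim is forced. */
	if (references == PAGEREF_RECLAIM_CLEAN)
		return TOY_KEEP;
	/* How a CMA-style caller (.may_writepage = 0) still avoids writeout. */
	if (!may_writepage)
		return TOY_KEEP;
	return TOY_WRITEOUT;
}
```

Under this model, switching the default to PAGEREF_RECLAIM only changes the forced-reclaim case for dirty pages; a caller that sets may_writepage to 0 still never writes them out, which is the argument made above.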
* RFCv1
 * use ignore_references as parameter name - hannes

Acked-by: Michal Hocko
Acked-by: Johannes Weiner
Signed-off-by: Minchan Kim
---
 mm/vmscan.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/mm/vmscan.c b/mm/vmscan.c
index 9e3292ee5c7c..49e9ee4d771d 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1117,7 +1117,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 				      struct scan_control *sc,
 				      enum ttu_flags ttu_flags,
 				      struct reclaim_stat *stat,
-				      bool force_reclaim)
+				      bool ignore_references)
 {
 	LIST_HEAD(ret_pages);
 	LIST_HEAD(free_pages);
@@ -1131,7 +1131,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 		struct address_space *mapping;
 		struct page *page;
 		int may_enter_fs;
-		enum page_references references = PAGEREF_RECLAIM_CLEAN;
+		enum page_references references = PAGEREF_RECLAIM;
 		bool dirty, writeback;
 		unsigned int nr_pages;
 
@@ -1262,7 +1262,7 @@ static unsigned long shrink_page_list(struct list_head *page_list,
 			}
 		}
 
-		if (!force_reclaim)
+		if (!ignore_references)
 			references = page_check_references(page, sc);
 
 		switch (references) {

From patchwork Thu Jun 27 11:54:03 2019
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 11019481
From: Minchan Kim To: Andrew Morton Cc: linux-mm , LKML , linux-api@vger.kernel.org, Michal Hocko , Johannes Weiner , Tim Murray , Joel Fernandes , Suren Baghdasaryan , Daniel Colascione , Shakeel Butt , Sonny Rao , oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen , "Kirill A .
Shutemov" , Minchan Kim Subject: [PATCH v3 3/5] mm: account nr_isolated_xxx in [isolate|putback]_lru_page Date: Thu, 27 Jun 2019 20:54:03 +0900 Message-Id: <20190627115405.255259-4-minchan@kernel.org> In-Reply-To: <20190627115405.255259-1-minchan@kernel.org> References: <20190627115405.255259-1-minchan@kernel.org>

The isolation counters are per-cpu counters, so there is little to gain from updating them in batches. Rather than complicating the code to batch them, let's make it more straightforward by adding the counting logic into the [isolate|putback]_lru_page API.

* v1
 * fix accounting bug - Hillf

Link: http://lkml.kernel.org/r/20190531165927.GA20067@cmpxchg.org
Suggested-by: Johannes Weiner
Signed-off-by: Minchan Kim
Acked-by: Michal Hocko
---
mm/compaction.c | 2 -- mm/gup.c | 7 +------ mm/khugepaged.c | 3 --- mm/memory-failure.c | 3 --- mm/memory_hotplug.c | 4 ---- mm/mempolicy.c | 6 +----- mm/migrate.c | 37 ++++++++----------------------------- mm/vmscan.c | 22 ++++++++++++++++------ 8 files changed, 26 insertions(+), 58 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 9e1b9acb116b..c6591682deda 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -982,8 +982,6 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn, /* Successfully isolated */ del_page_from_lru_list(page, lruvec, page_lru(page)); - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); isolate_success: list_add(&page->lru, &cc->migratepages); diff --git a/mm/gup.c b/mm/gup.c index 7dde2e3a1963..aec3a2b7e61b 100644 --- a/mm/gup.c +++ b/mm/gup.c @@ -1473,13 +1473,8 @@ static long check_and_migrate_cma_pages(struct task_struct *tsk, drain_allow = false; } - if (!isolate_lru_page(head)) { +
if (!isolate_lru_page(head)) list_add_tail(&head->lru, &cma_page_list); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + - page_is_file_cache(head), - hpage_nr_pages(head)); - } } } } diff --git a/mm/khugepaged.c b/mm/khugepaged.c index 0f7419938008..7da34e198ec5 100644 --- a/mm/khugepaged.c +++ b/mm/khugepaged.c @@ -503,7 +503,6 @@ void __khugepaged_exit(struct mm_struct *mm) static void release_pte_page(struct page *page) { - dec_node_page_state(page, NR_ISOLATED_ANON + page_is_file_cache(page)); unlock_page(page); putback_lru_page(page); } @@ -602,8 +601,6 @@ static int __collapse_huge_page_isolate(struct vm_area_struct *vma, result = SCAN_DEL_PAGE_LRU; goto out; } - inc_node_page_state(page, - NR_ISOLATED_ANON + page_is_file_cache(page)); VM_BUG_ON_PAGE(!PageLocked(page), page); VM_BUG_ON_PAGE(PageLRU(page), page); diff --git a/mm/memory-failure.c b/mm/memory-failure.c index 7e08cbf3ba49..3586e8226e4e 100644 --- a/mm/memory-failure.c +++ b/mm/memory-failure.c @@ -1795,9 +1795,6 @@ static int __soft_offline_page(struct page *page, int flags) * so use !__PageMovable instead for LRU page's mapping * cannot have PAGE_MAPPING_MOVABLE. 
*/ - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); list_add(&page->lru, &pagelist); ret = migrate_pages(&pagelist, new_page, NULL, MPOL_MF_MOVE_ALL, MIGRATE_SYNC, MR_MEMORY_FAILURE); diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index dfab21dc33dc..68577c677b46 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1384,10 +1384,6 @@ do_migrate_range(unsigned long start_pfn, unsigned long end_pfn) ret = isolate_movable_page(page, ISOLATE_UNEVICTABLE); if (!ret) { /* Success */ list_add_tail(&page->lru, &source); - if (!__PageMovable(page)) - inc_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); - } else { pr_warn("failed to isolate pfn %lx\n", pfn); dump_page(page, "isolation failed"); diff --git a/mm/mempolicy.c b/mm/mempolicy.c index 64562809bf3b..03081f3404ca 100644 --- a/mm/mempolicy.c +++ b/mm/mempolicy.c @@ -994,12 +994,8 @@ static int migrate_page_add(struct page *page, struct list_head *pagelist, * Avoid migrating a page that is shared with others. */ if ((flags & MPOL_MF_MOVE_ALL) || page_mapcount(head) == 1) { - if (!isolate_lru_page(head)) { + if (!isolate_lru_page(head)) list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); - } } return 0; diff --git a/mm/migrate.c b/mm/migrate.c index 572b4bc85d76..5583324c01e7 100644 --- a/mm/migrate.c +++ b/mm/migrate.c @@ -190,8 +190,6 @@ void putback_movable_pages(struct list_head *l) unlock_page(page); put_page(page); } else { - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); putback_lru_page(page); } } @@ -1181,10 +1179,17 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, return -ENOMEM; if (page_count(page) == 1) { + bool is_lru = !__PageMovable(page); + /* page was freed from under us. So we are done. 
*/ ClearPageActive(page); ClearPageUnevictable(page); - if (unlikely(__PageMovable(page))) { + if (likely(is_lru)) + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); + else { lock_page(page); if (!PageMovable(page)) __ClearPageIsolated(page); @@ -1210,15 +1215,6 @@ static ICE_noinline int unmap_and_move(new_page_t get_new_page, * restored. */ list_del(&page->lru); - - /* - * Compaction can migrate also non-LRU pages which are - * not accounted to NR_ISOLATED_*. They can be recognized - * as __PageMovable - */ - if (likely(!__PageMovable(page))) - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + - page_is_file_cache(page), -hpage_nr_pages(page)); } /* @@ -1572,9 +1568,6 @@ static int add_page_for_migration(struct mm_struct *mm, unsigned long addr, err = 0; list_add_tail(&head->lru, pagelist); - mod_node_page_state(page_pgdat(head), - NR_ISOLATED_ANON + page_is_file_cache(head), - hpage_nr_pages(head)); } out_putpage: /* @@ -1890,8 +1883,6 @@ static struct page *alloc_misplaced_dst_page(struct page *page, static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) { - int page_lru; - VM_BUG_ON_PAGE(compound_order(page) && !PageTransHuge(page), page); /* Avoid migrating to a node that is nearly full */ @@ -1913,10 +1904,6 @@ static int numamigrate_isolate_page(pg_data_t *pgdat, struct page *page) return 0; } - page_lru = page_is_file_cache(page); - mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + page_lru, - hpage_nr_pages(page)); - /* * Isolating the page has taken another reference, so the * caller's reference can be safely dropped without the page @@ -1971,8 +1958,6 @@ int migrate_misplaced_page(struct page *page, struct vm_area_struct *vma, if (nr_remaining) { if (!list_empty(&migratepages)) { list_del(&page->lru); - dec_node_page_state(page, NR_ISOLATED_ANON + - page_is_file_cache(page)); putback_lru_page(page); } isolated = 0; @@ -2002,7 +1987,6 @@ int 
migrate_misplaced_transhuge_page(struct mm_struct *mm, pg_data_t *pgdat = NODE_DATA(node); int isolated = 0; struct page *new_page = NULL; - int page_lru = page_is_file_cache(page); unsigned long start = address & HPAGE_PMD_MASK; new_page = alloc_pages_node(node, @@ -2048,8 +2032,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, /* Retake the callers reference and putback on LRU */ get_page(page); putback_lru_page(page); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, -HPAGE_PMD_NR); goto out_unlock; } @@ -2099,9 +2081,6 @@ int migrate_misplaced_transhuge_page(struct mm_struct *mm, count_vm_events(PGMIGRATE_SUCCESS, HPAGE_PMD_NR); count_vm_numa_events(NUMA_PAGE_MIGRATE, HPAGE_PMD_NR); - mod_node_page_state(page_pgdat(page), - NR_ISOLATED_ANON + page_lru, - -HPAGE_PMD_NR); return isolated; out_fail: diff --git a/mm/vmscan.c b/mm/vmscan.c index 49e9ee4d771d..223ce5da08f0 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -1014,6 +1014,9 @@ int remove_mapping(struct address_space *mapping, struct page *page) void putback_lru_page(struct page *page) { lru_cache_add(page); + mod_node_page_state(page_pgdat(page), + NR_ISOLATED_ANON + page_is_file_cache(page), + -hpage_nr_pages(page)); put_page(page); /* drop ref from isolate */ } @@ -1479,6 +1482,9 @@ static unsigned long shrink_page_list(struct list_head *page_list, */ nr_reclaimed += nr_pages; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -nr_pages); /* * Is there need to periodically free_page_list? 
It would * appear not as the counts should be low @@ -1554,7 +1560,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, ret = shrink_page_list(&clean_pages, zone->zone_pgdat, &sc, TTU_IGNORE_ACCESS, &dummy_stat, true); list_splice(&clean_pages, page_list); - mod_node_page_state(zone->zone_pgdat, NR_ISOLATED_FILE, -ret); return ret; } @@ -1630,6 +1635,9 @@ int __isolate_lru_page(struct page *page, isolate_mode_t mode) */ ClearPageLRU(page); ret = 0; + __mod_node_page_state(page_pgdat(page), NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } return ret; @@ -1761,6 +1769,7 @@ static unsigned long isolate_lru_pages(unsigned long nr_to_scan, trace_mm_vmscan_lru_isolate(sc->reclaim_idx, sc->order, nr_to_scan, total_scan, skipped, nr_taken, mode, lru); update_lru_sizes(lruvec, lru, nr_zone_taken); + return nr_taken; } @@ -1809,6 +1818,9 @@ int isolate_lru_page(struct page *page) ClearPageLRU(page); del_page_from_lru_list(page, lruvec, lru); ret = 0; + mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + hpage_nr_pages(page)); } spin_unlock_irq(&pgdat->lru_lock); } @@ -1900,6 +1912,9 @@ static unsigned noinline_for_stack move_pages_to_lru(struct lruvec *lruvec, update_lru_size(lruvec, lru, page_zonenum(page), nr_pages); list_move(&page->lru, &lruvec->lists[lru]); + __mod_node_page_state(pgdat, NR_ISOLATED_ANON + + page_is_file_cache(page), + -hpage_nr_pages(page)); if (put_page_testzero(page)) { __ClearPageLRU(page); __ClearPageActive(page); @@ -1977,7 +1992,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &page_list, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; item = current_is_kswapd() ? 
PGSCAN_KSWAPD : PGSCAN_DIRECT; @@ -2003,8 +2017,6 @@ shrink_inactive_list(unsigned long nr_to_scan, struct lruvec *lruvec, move_pages_to_lru(lruvec, &page_list); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); - spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&page_list); @@ -2063,7 +2075,6 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_taken = isolate_lru_pages(nr_to_scan, lruvec, &l_hold, &nr_scanned, sc, lru); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, nr_taken); reclaim_stat->recent_scanned[file] += nr_taken; __count_vm_events(PGREFILL, nr_scanned); @@ -2132,7 +2143,6 @@ static void shrink_active_list(unsigned long nr_to_scan, __count_vm_events(PGDEACTIVATE, nr_deactivate); __count_memcg_events(lruvec_memcg(lruvec), PGDEACTIVATE, nr_deactivate); - __mod_node_page_state(pgdat, NR_ISOLATED_ANON + file, -nr_taken); spin_unlock_irq(&pgdat->lru_lock); mem_cgroup_uncharge_list(&l_active);
From patchwork Thu Jun 27 11:54:04 2019 X-Patchwork-Submitter: Minchan Kim X-Patchwork-Id: 11019483
From: Minchan Kim To: Andrew Morton Cc: linux-mm , LKML , linux-api@vger.kernel.org, Michal Hocko , Johannes Weiner , Tim Murray , Joel Fernandes , Suren Baghdasaryan , Daniel Colascione , Shakeel Butt , Sonny Rao , oleksandr@redhat.com, hdanton@sina.com, lizeb@google.com, Dave Hansen , "Kirill A .
Shutemov" , Minchan Kim Subject: [PATCH v3 4/5] mm: introduce MADV_PAGEOUT Date: Thu, 27 Jun 2019 20:54:04 +0900 Message-Id: <20190627115405.255259-5-minchan@kernel.org> In-Reply-To: <20190627115405.255259-1-minchan@kernel.org> References: <20190627115405.255259-1-minchan@kernel.org>

When a process expects no accesses to a certain memory range for a long time, it can hint the kernel that the pages can be reclaimed instantly but that their data should be preserved for future use. This can reduce workingset eviction and thereby improve performance.

This patch introduces the new MADV_PAGEOUT hint to the madvise(2) syscall. MADV_PAGEOUT can be used by a process to mark a memory range as not expected to be used for a long time, so that the kernel reclaims *any LRU* pages in it instantly. The hint can help the kernel decide which pages to evict proactively.

- man-page material

MADV_PAGEOUT (since Linux x.x)

Do not expect access in the near future, so pages in the specified regions could be reclaimed instantly regardless of memory pressure. Thus, accessing the range after a successful operation could cause a major page fault, but, unlike with MADV_DONTNEED, the up-to-date contents are never lost. It works only for private anonymous mappings and for non-anonymous mappings that belong to files the calling process could successfully open for writing; otherwise, it could be used for a side-channel attack. MADV_PAGEOUT cannot be applied to locked pages, Huge TLB pages, or VM_PFNMAP pages.
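The man-page semantics can be exercised from user space with a small program. This is a hypothetical sketch, not part of the patch: it maps private anonymous memory, fills it, requests MADV_PAGEOUT, and then checks that the contents survived, which is the property that distinguishes this hint from MADV_DONTNEED. It assumes Linux; if `<sys/mman.h>` does not define MADV_PAGEOUT, it falls back to the value 6 introduced by this v3 patch. Mainline kernels (5.4 and later) define MADV_PAGEOUT as 21, so the fallback only matches a kernel carrying this exact revision. A kernel that does not recognize the advice value makes madvise() fail (typically with EINVAL), which the sketch records as "hint not applied" rather than an error. Whether anonymous pages are actually swapped out also depends on swap being available, so only the guaranteed part (the data is intact) is checked. The helper name `pageout_roundtrip()` is illustrative.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_PAGEOUT
/* Assumption: value from this v3 patch; mainline (5.4+) defines it as 21. */
#define MADV_PAGEOUT 6
#endif

/*
 * Hypothetical helper: fill npages of a private anonymous mapping,
 * hint MADV_PAGEOUT, then verify the contents are still intact.
 * Returns 0 if the data survived, -1 on mmap failure or data loss.
 * *hinted is set to 1 if madvise() accepted the hint, 0 otherwise.
 */
int pageout_roundtrip(size_t npages, int *hinted)
{
	long psz = sysconf(_SC_PAGESIZE);
	size_t len, i;
	unsigned char *buf;
	int ret = 0;

	*hinted = 0;
	assert(npages > 0 && psz > 0);
	len = npages * (size_t)psz;
	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	/* Touch every page so the range is backed by LRU pages. */
	for (i = 0; i < len; i++)
		buf[i] = (unsigned char)(i % 251);

	/* Advisory only: a kernel without MADV_PAGEOUT rejects the call. */
	*hinted = (madvise(buf, len, MADV_PAGEOUT) == 0);

	/*
	 * Unlike MADV_DONTNEED, the data must survive; reading it back may
	 * take major faults if the pages were actually swapped out.
	 */
	for (i = 0; i < len; i++) {
		if (buf[i] != (unsigned char)(i % 251)) {
			ret = -1;
			break;
		}
	}

	munmap(buf, len);
	return ret;
}
```

For example, `int hinted; int rc = pageout_roundtrip(64, &hinted);` should leave `rc` at 0 whether or not the running kernel supports the hint, while `hinted` reports which of the two cases applied.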
* v2 * add comment about SWAP_CLUSTER_MAX - mhocko * add permission check to prevent sidechannel attack - mhocko * add man page stuff - dave * v1 * change pte to old and rely on the other's reference - hannes * remove page_mapcount to check shared page - mhocko * RFC v2 * make reclaim_pages simple via factoring out isolate logic - hannes * RFCv1 * rename from MADV_COLD to MADV_PAGEOUT - hannes * bail out if process is being killed - Hillf * fix reclaim_pages bugs - Hillf Signed-off-by: Minchan Kim Acked-by: Michal Hocko --- include/linux/swap.h | 1 + include/uapi/asm-generic/mman-common.h | 1 + mm/madvise.c | 212 +++++++++++++++++++++++++ mm/vmscan.c | 58 +++++++ 4 files changed, 272 insertions(+) diff --git a/include/linux/swap.h b/include/linux/swap.h index 0ce997edb8bb..063c0c1e112b 100644 --- a/include/linux/swap.h +++ b/include/linux/swap.h @@ -365,6 +365,7 @@ extern int vm_swappiness; extern int remove_mapping(struct address_space *mapping, struct page *page); extern unsigned long vm_total_pages; +extern unsigned long reclaim_pages(struct list_head *page_list); #ifdef CONFIG_NUMA extern int node_reclaim_mode; extern int sysctl_min_unmapped_ratio; diff --git a/include/uapi/asm-generic/mman-common.h b/include/uapi/asm-generic/mman-common.h index d7b4231eea63..f545e159b472 100644 --- a/include/uapi/asm-generic/mman-common.h +++ b/include/uapi/asm-generic/mman-common.h @@ -48,6 +48,7 @@ #define MADV_WILLNEED 3 /* will need these pages */ #define MADV_DONTNEED 4 /* don't need these pages */ #define MADV_COLD 5 /* deactivatie these pages */ +#define MADV_PAGEOUT 6 /* reclaim these pages */ /* common parameters: try to keep these consistent across architectures */ #define MADV_FREE 8 /* free pages only if memory pressure */ diff --git a/mm/madvise.c b/mm/madvise.c index 7abb8e54bc7a..ee210473f639 100644 --- a/mm/madvise.c +++ b/mm/madvise.c @@ -11,6 +11,7 @@ #include #include #include +#include #include #include #include @@ -41,6 +42,7 @@ static int 
madvise_need_mmap_write(int behavior) case MADV_WILLNEED: case MADV_DONTNEED: case MADV_COLD: + case MADV_PAGEOUT: case MADV_FREE: return 0; default: @@ -480,6 +482,213 @@ static long madvise_cold(struct vm_area_struct *vma, return 0; } +static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr, + unsigned long end, struct mm_walk *walk) +{ + struct mmu_gather *tlb = walk->private; + struct mm_struct *mm = tlb->mm; + struct vm_area_struct *vma = walk->vma; + pte_t *orig_pte, *pte, ptent; + spinlock_t *ptl; + LIST_HEAD(page_list); + struct page *page; + int isolated = 0; + unsigned long next; + + if (fatal_signal_pending(current)) + return -EINTR; + + next = pmd_addr_end(addr, end); + if (pmd_trans_huge(*pmd)) { + pmd_t orig_pmd; + + tlb_change_page_size(tlb, HPAGE_PMD_SIZE); + ptl = pmd_trans_huge_lock(pmd, vma); + if (!ptl) + return 0; + + orig_pmd = *pmd; + if (is_huge_zero_pmd(orig_pmd)) + goto huge_unlock; + + if (unlikely(!pmd_present(orig_pmd))) { + VM_BUG_ON(thp_migration_supported() && + !is_pmd_migration_entry(orig_pmd)); + goto huge_unlock; + } + + page = pmd_page(orig_pmd); + if (next - addr != HPAGE_PMD_SIZE) { + int err; + + if (page_mapcount(page) != 1) + goto huge_unlock; + get_page(page); + spin_unlock(ptl); + lock_page(page); + err = split_huge_page(page); + unlock_page(page); + put_page(page); + if (!err) + goto regular_page; + return 0; + } + + if (isolate_lru_page(page)) + goto huge_unlock; + + if (pmd_young(orig_pmd)) { + pmdp_invalidate(vma, addr, pmd); + orig_pmd = pmd_mkold(orig_pmd); + + set_pmd_at(mm, addr, pmd, orig_pmd); + tlb_remove_tlb_entry(tlb, pmd, addr); + } + + ClearPageReferenced(page); + test_and_clear_page_young(page); + list_add(&page->lru, &page_list); +huge_unlock: + spin_unlock(ptl); + reclaim_pages(&page_list); + return 0; + } + + if (pmd_trans_unstable(pmd)) + return 0; +regular_page: + tlb_change_page_size(tlb, PAGE_SIZE); + orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + 
flush_tlb_batched_pending(mm); + arch_enter_lazy_mmu_mode(); + for (; addr < end; pte++, addr += PAGE_SIZE) { + ptent = *pte; + if (!pte_present(ptent)) + continue; + + page = vm_normal_page(vma, addr, ptent); + if (!page) + continue; + + /* + * creating a THP page is expensive so split it only if we + * are sure it's worth. Split it if we are only owner. + */ + if (PageTransCompound(page)) { + if (page_mapcount(page) != 1) + break; + get_page(page); + if (!trylock_page(page)) { + put_page(page); + break; + } + pte_unmap_unlock(orig_pte, ptl); + if (split_huge_page(page)) { + unlock_page(page); + put_page(page); + pte_offset_map_lock(mm, pmd, addr, &ptl); + break; + } + unlock_page(page); + put_page(page); + pte = pte_offset_map_lock(mm, pmd, addr, &ptl); + pte--; + addr -= PAGE_SIZE; + continue; + } + + VM_BUG_ON_PAGE(PageTransCompound(page), page); + + if (isolate_lru_page(page)) + continue; + + isolated++; + if (pte_young(ptent)) { + ptent = ptep_get_and_clear_full(mm, addr, pte, + tlb->fullmm); + ptent = pte_mkold(ptent); + set_pte_at(mm, addr, pte, ptent); + tlb_remove_tlb_entry(tlb, pte, addr); + } + ClearPageReferenced(page); + test_and_clear_page_young(page); + list_add(&page->lru, &page_list); + /* + * Prevent early OOM kill since it could isolate too many LRU + * pages concurrently. 
+ */ + if (isolated >= SWAP_CLUSTER_MAX) { + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(orig_pte, ptl); + reclaim_pages(&page_list); + isolated = 0; + pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl); + arch_enter_lazy_mmu_mode(); + orig_pte = pte; + } + } + + arch_leave_lazy_mmu_mode(); + pte_unmap_unlock(orig_pte, ptl); + reclaim_pages(&page_list); + cond_resched(); + + return 0; +} + +static void madvise_pageout_page_range(struct mmu_gather *tlb, + struct vm_area_struct *vma, + unsigned long addr, unsigned long end) +{ + struct mm_walk pageout_walk = { + .pmd_entry = madvise_pageout_pte_range, + .mm = vma->vm_mm, + .private = tlb, + }; + + tlb_start_vma(tlb, vma); + walk_page_range(addr, end, &pageout_walk); + tlb_end_vma(tlb, vma); +} + +static inline bool can_do_pageout(struct vm_area_struct *vma) +{ + if (vma_is_anonymous(vma)) + return true; + if (!vma->vm_file) + return false; + /* + * paging out pagecache only for non-anonymous mappings that correspond + * to the files the calling process could (if tried) open for writing; + * otherwise we'd be including shared non-exclusive mappings, which + * opens a side channel. 
+ */ + return inode_owner_or_capable(file_inode(vma->vm_file)) || + inode_permission(file_inode(vma->vm_file), MAY_WRITE) == 0; +} + +static long madvise_pageout(struct vm_area_struct *vma, + struct vm_area_struct **prev, + unsigned long start_addr, unsigned long end_addr) +{ + struct mm_struct *mm = vma->vm_mm; + struct mmu_gather tlb; + + *prev = vma; + if (!can_madv_lru_vma(vma)) + return -EINVAL; + + if (!can_do_pageout(vma)) + return 0; + + lru_add_drain(); + tlb_gather_mmu(&tlb, mm, start_addr, end_addr); + madvise_pageout_page_range(&tlb, vma, start_addr, end_addr); + tlb_finish_mmu(&tlb, start_addr, end_addr); + + return 0; +} + static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, unsigned long end, struct mm_walk *walk) @@ -870,6 +1079,8 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev, return madvise_willneed(vma, prev, start, end); case MADV_COLD: return madvise_cold(vma, prev, start, end); + case MADV_PAGEOUT: + return madvise_pageout(vma, prev, start, end); case MADV_FREE: case MADV_DONTNEED: return madvise_dontneed_free(vma, prev, start, end, behavior); @@ -892,6 +1103,7 @@ madvise_behavior_valid(int behavior) case MADV_DONTNEED: case MADV_FREE: case MADV_COLD: + case MADV_PAGEOUT: #ifdef CONFIG_KSM case MADV_MERGEABLE: case MADV_UNMERGEABLE: diff --git a/mm/vmscan.c b/mm/vmscan.c index 223ce5da08f0..b521770c4314 100644 --- a/mm/vmscan.c +++ b/mm/vmscan.c @@ -2151,6 +2151,64 @@ static void shrink_active_list(unsigned long nr_to_scan, nr_deactivate, nr_rotated, sc->priority, file); } +unsigned long reclaim_pages(struct list_head *page_list) +{ + int nid = -1; + unsigned long nr_reclaimed = 0; + LIST_HEAD(node_page_list); + struct reclaim_stat dummy_stat; + struct scan_control sc = { + .gfp_mask = GFP_KERNEL, + .priority = DEF_PRIORITY, + .may_writepage = 1, + .may_unmap = 1, + .may_swap = 1, + }; + + while (!list_empty(page_list)) { + struct page *page; + + page = lru_to_page(page_list); + if (nid == -1) { + nid = 
page_to_nid(page); + INIT_LIST_HEAD(&node_page_list); + } + + if (nid == page_to_nid(page)) { + list_move(&page->lru, &node_page_list); + continue; + } + + nr_reclaimed += shrink_page_list(&node_page_list, + NODE_DATA(nid), + &sc, 0, + &dummy_stat, false); + while (!list_empty(&node_page_list)) { + struct page *page = lru_to_page(&node_page_list); + + list_del(&page->lru); + putback_lru_page(page); + } + + nid = -1; + } + + if (!list_empty(&node_page_list)) { + nr_reclaimed += shrink_page_list(&node_page_list, + NODE_DATA(nid), + &sc, 0, + &dummy_stat, false); + while (!list_empty(&node_page_list)) { + struct page *page = lru_to_page(&node_page_list); + + list_del(&page->lru); + putback_lru_page(page); + } + } + + return nr_reclaimed; +} + /* * The inactive anon list should be small enough that the VM never has * to do too much work. From patchwork Thu Jun 27 11:54:05 2019 Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit X-Patchwork-Submitter: Minchan Kim X-Patchwork-Id: 11019487 Return-Path: Received: from mail.wl.linuxfoundation.org (pdx-wl-mail.web.codeaurora.org [172.30.200.125]) by pdx-korg-patchwork-2.web.codeaurora.org (Postfix) with ESMTP id 845D51575 for ; Thu, 27 Jun 2019 11:54:46 +0000 (UTC) Received: from mail.wl.linuxfoundation.org (localhost [127.0.0.1]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 751B92881C for ; Thu, 27 Jun 2019 11:54:46 +0000 (UTC) Received: by mail.wl.linuxfoundation.org (Postfix, from userid 486) id 68DF328AC0; Thu, 27 Jun 2019 11:54:46 +0000 (UTC) X-Spam-Checker-Version: SpamAssassin 3.3.1 (2010-03-16) on pdx-wl-mail.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.7 required=2.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,MAILING_LIST_MULTI,RCVD_IN_DNSWL_NONE autolearn=ham version=3.3.1 Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.wl.linuxfoundation.org (Postfix) with ESMTP id 7F1C028ADC for ; Thu, 27 Jun 2019 11:54:45 +0000 (UTC) 
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, linux-api@vger.kernel.org, Michal Hocko,
 Johannes Weiner, Tim Murray, Joel Fernandes, Suren Baghdasaryan,
 Daniel Colascione, Shakeel Butt, Sonny Rao, oleksandr@redhat.com,
 hdanton@sina.com, lizeb@google.com, Dave Hansen,
 "Kirill A . Shutemov", Minchan Kim
Subject: [PATCH v3 5/5] mm: factor out pmd young/dirty bit handling and THP split
Date: Thu, 27 Jun 2019 20:54:05 +0900
Message-Id: <20190627115405.255259-6-minchan@kernel.org>
X-Mailer: git-send-email 2.22.0.410.gd8fdbe21b5-goog
In-Reply-To: <20190627115405.255259-1-minchan@kernel.org>
References: <20190627115405.255259-1-minchan@kernel.org>

MADV_COLD, MADV_PAGEOUT and MADV_FREE share common code for a PMD-mapped
THP: each either resets the access/dirty bits of the pmd or, when the
range covers only part of the THP, splits it so that the subpages can be
handled individually. This patch factors out that common part.

Signed-off-by: Minchan Kim
---
 include/linux/huge_mm.h |   3 -
 mm/huge_memory.c        |  74 -------------
 mm/madvise.c            | 234 +++++++++++++++++++++++-----------------
 3 files changed, 135 insertions(+), 176 deletions(-)

diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 7cd5c150c21d..2667e1aa3ce5 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -29,9 +29,6 @@ extern struct page *follow_trans_huge_pmd(struct vm_area_struct *vma,
 					  unsigned long addr,
 					  pmd_t *pmd,
 					  unsigned int flags);
-extern bool madvise_free_huge_pmd(struct mmu_gather *tlb,
-			struct vm_area_struct *vma,
-			pmd_t *pmd, unsigned long addr, unsigned long next);
 extern int zap_huge_pmd(struct mmu_gather *tlb,
 			struct vm_area_struct *vma,
 			pmd_t *pmd, unsigned long addr);
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 93f531b63a45..e4b9a06788f3 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1671,80 +1671,6 @@ vm_fault_t do_huge_pmd_numa_page(struct vm_fault *vmf, pmd_t pmd)
 	return 0;
 }
 
-/*
- * Return true if we do MADV_FREE successfully on entire pmd page.
- * Otherwise, return false.
- */
-bool madvise_free_huge_pmd(struct mmu_gather *tlb, struct vm_area_struct *vma,
-		pmd_t *pmd, unsigned long addr, unsigned long next)
-{
-	spinlock_t *ptl;
-	pmd_t orig_pmd;
-	struct page *page;
-	struct mm_struct *mm = tlb->mm;
-	bool ret = false;
-
-	tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
-
-	ptl = pmd_trans_huge_lock(pmd, vma);
-	if (!ptl)
-		goto out_unlocked;
-
-	orig_pmd = *pmd;
-	if (is_huge_zero_pmd(orig_pmd))
-		goto out;
-
-	if (unlikely(!pmd_present(orig_pmd))) {
-		VM_BUG_ON(thp_migration_supported() &&
-			  !is_pmd_migration_entry(orig_pmd));
-		goto out;
-	}
-
-	page = pmd_page(orig_pmd);
-	/*
-	 * If other processes are mapping this page, we couldn't discard
-	 * the page unless they all do MADV_FREE so let's skip the page.
-	 */
-	if (page_mapcount(page) != 1)
-		goto out;
-
-	if (!trylock_page(page))
-		goto out;
-
-	/*
-	 * If user want to discard part-pages of THP, split it so MADV_FREE
-	 * will deactivate only them.
-	 */
-	if (next - addr != HPAGE_PMD_SIZE) {
-		get_page(page);
-		spin_unlock(ptl);
-		split_huge_page(page);
-		unlock_page(page);
-		put_page(page);
-		goto out_unlocked;
-	}
-
-	if (PageDirty(page))
-		ClearPageDirty(page);
-	unlock_page(page);
-
-	if (pmd_young(orig_pmd) || pmd_dirty(orig_pmd)) {
-		pmdp_invalidate(vma, addr, pmd);
-		orig_pmd = pmd_mkold(orig_pmd);
-		orig_pmd = pmd_mkclean(orig_pmd);
-
-		set_pmd_at(mm, addr, pmd, orig_pmd);
-		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
-	}
-
-	mark_page_lazyfree(page);
-	ret = true;
-out:
-	spin_unlock(ptl);
-out_unlocked:
-	return ret;
-}
-
 static inline void zap_deposited_table(struct mm_struct *mm, pmd_t *pmd)
 {
 	pgtable_t pgtable;
diff --git a/mm/madvise.c b/mm/madvise.c
index ee210473f639..13b06dc8d402 100644
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -310,6 +310,91 @@ static long madvise_willneed(struct vm_area_struct *vma,
 	return 0;
 }
 
+enum madv_pmdp_reset_t {
+	MADV_PMDP_RESET,	/* pmd was reset successfully */
+	MADV_PMDP_SPLIT,	/* pmd was split */
+	MADV_PMDP_ERROR,
+};
+
+static enum madv_pmdp_reset_t madvise_pmdp_reset_or_split(struct mm_walk *walk,
+				pmd_t *pmd, spinlock_t *ptl,
+				unsigned long addr, unsigned long end,
+				bool young, bool dirty)
+{
+	pmd_t orig_pmd;
+	unsigned long next;
+	struct page *page;
+	struct mmu_gather *tlb = walk->private;
+	struct mm_struct *mm = walk->mm;
+	struct vm_area_struct *vma = walk->vma;
+	bool reset_young = false;
+	bool reset_dirty = false;
+	enum madv_pmdp_reset_t ret = MADV_PMDP_ERROR;
+
+	orig_pmd = *pmd;
+	if (is_huge_zero_pmd(orig_pmd))
+		return ret;
+
+	if (unlikely(!pmd_present(orig_pmd))) {
+		VM_BUG_ON(thp_migration_supported() &&
+			  !is_pmd_migration_entry(orig_pmd));
+		return ret;
+	}
+
+	next = pmd_addr_end(addr, end);
+	page = pmd_page(orig_pmd);
+	if (next - addr != HPAGE_PMD_SIZE) {
+		/*
+		 * THP collapsing is not cheap, so only split the page if it is
+		 * private to this process.
+		 */
+		if (page_mapcount(page) != 1)
+			return ret;
+		get_page(page);
+		spin_unlock(ptl);
+		lock_page(page);
+		if (!split_huge_page(page))
+			ret = MADV_PMDP_SPLIT;
+		unlock_page(page);
+		put_page(page);
+		return ret;
+	}
+
+	if (young && pmd_young(orig_pmd))
+		reset_young = true;
+	if (dirty && pmd_dirty(orig_pmd))
+		reset_dirty = true;
+
+	/*
+	 * Other processes could rely on PG_dirty, not pte_dirty, for data
+	 * consistency, so we can reset PG_dirty only when we are the sole
+	 * owner of the page.
+	 */
+	if (reset_dirty) {
+		if (page_mapcount(page) != 1)
+			goto out;
+		if (!trylock_page(page))
+			goto out;
+		if (PageDirty(page))
+			ClearPageDirty(page);
+		unlock_page(page);
+	}
+
+	ret = MADV_PMDP_RESET;
+	if (reset_young || reset_dirty) {
+		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
+		pmdp_invalidate(vma, addr, pmd);
+		if (reset_young)
+			orig_pmd = pmd_mkold(orig_pmd);
+		if (reset_dirty)
+			orig_pmd = pmd_mkclean(orig_pmd);
+		set_pmd_at(mm, addr, pmd, orig_pmd);
+		tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+	}
+out:
+	return ret;
+}
+
 static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 				unsigned long end, struct mm_walk *walk)
 {
@@ -319,64 +404,31 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *orig_pte, *pte, ptent;
 	spinlock_t *ptl;
 	struct page *page;
-	unsigned long next;
 
-	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
-		pmd_t orig_pmd;
-
-		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 		ptl = pmd_trans_huge_lock(pmd, vma);
 		if (!ptl)
 			return 0;
 
-		orig_pmd = *pmd;
-		if (is_huge_zero_pmd(orig_pmd))
-			goto huge_unlock;
-
-		if (unlikely(!pmd_present(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-				  !is_pmd_migration_entry(orig_pmd));
-			goto huge_unlock;
-		}
-
-		page = pmd_page(orig_pmd);
-		if (next - addr != HPAGE_PMD_SIZE) {
-			int err;
-
-			if (page_mapcount(page) != 1)
-				goto huge_unlock;
-
-			get_page(page);
+		switch (madvise_pmdp_reset_or_split(walk, pmd, ptl, addr, end,
+							true, false)) {
+		case MADV_PMDP_RESET:
 			spin_unlock(ptl);
-			lock_page(page);
-			err = split_huge_page(page);
-			unlock_page(page);
-			put_page(page);
-			if (!err)
-				goto regular_page;
-			return 0;
-		}
-
-		if (pmd_young(orig_pmd)) {
-			pmdp_invalidate(vma, addr, pmd);
-			orig_pmd = pmd_mkold(orig_pmd);
-
-			set_pmd_at(mm, addr, pmd, orig_pmd);
-			tlb_remove_pmd_tlb_entry(tlb, pmd, addr);
+			page = pmd_page(*pmd);
+			test_and_clear_page_young(page);
+			deactivate_page(page);
+			goto next;
+		case MADV_PMDP_ERROR:
+			spin_unlock(ptl);
+			goto next;
+		case MADV_PMDP_SPLIT:
+			; /* fall through */
 		}
-
-		test_and_clear_page_young(page);
-		deactivate_page(page);
-huge_unlock:
-		spin_unlock(ptl);
-		return 0;
 	}
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
 
-regular_page:
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
@@ -443,6 +495,7 @@ static int madvise_cold_pte_range(pmd_t *pmd, unsigned long addr,
 
 	arch_enter_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
+next:
 	cond_resched();
 
 	return 0;
@@ -493,70 +546,38 @@ static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
 	LIST_HEAD(page_list);
 	struct page *page;
 	int isolated = 0;
-	unsigned long next;
 
 	if (fatal_signal_pending(current))
 		return -EINTR;
 
-	next = pmd_addr_end(addr, end);
 	if (pmd_trans_huge(*pmd)) {
-		pmd_t orig_pmd;
-
-		tlb_change_page_size(tlb, HPAGE_PMD_SIZE);
 		ptl = pmd_trans_huge_lock(pmd, vma);
 		if (!ptl)
 			return 0;
 
-		orig_pmd = *pmd;
-		if (is_huge_zero_pmd(orig_pmd))
-			goto huge_unlock;
-
-		if (unlikely(!pmd_present(orig_pmd))) {
-			VM_BUG_ON(thp_migration_supported() &&
-				  !is_pmd_migration_entry(orig_pmd));
-			goto huge_unlock;
-		}
-
-		page = pmd_page(orig_pmd);
-		if (next - addr != HPAGE_PMD_SIZE) {
-			int err;
-
-			if (page_mapcount(page) != 1)
-				goto huge_unlock;
-			get_page(page);
+		switch (madvise_pmdp_reset_or_split(walk, pmd, ptl, addr, end,
+							true, false)) {
+		case MADV_PMDP_RESET:
+			page = pmd_page(*pmd);
 			spin_unlock(ptl);
-			lock_page(page);
-			err = split_huge_page(page);
-			unlock_page(page);
-			put_page(page);
-			if (!err)
-				goto regular_page;
-			return 0;
-		}
-
-		if (isolate_lru_page(page))
-			goto huge_unlock;
-
-		if (pmd_young(orig_pmd)) {
-			pmdp_invalidate(vma, addr, pmd);
-			orig_pmd = pmd_mkold(orig_pmd);
-
-			set_pmd_at(mm, addr, pmd, orig_pmd);
-			tlb_remove_tlb_entry(tlb, pmd, addr);
+			if (isolate_lru_page(page))
+				return 0;
+			ClearPageReferenced(page);
+			test_and_clear_page_young(page);
+			list_add(&page->lru, &page_list);
+			reclaim_pages(&page_list);
+			goto next;
+		case MADV_PMDP_ERROR:
+			spin_unlock(ptl);
+			goto next;
+		case MADV_PMDP_SPLIT:
+			; /* fall through */
 		}
-
-		ClearPageReferenced(page);
-		test_and_clear_page_young(page);
-		list_add(&page->lru, &page_list);
-huge_unlock:
-		spin_unlock(ptl);
-		reclaim_pages(&page_list);
-		return 0;
 	}
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
-regular_page:
+
 	tlb_change_page_size(tlb, PAGE_SIZE);
 	orig_pte = pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
 	flush_tlb_batched_pending(mm);
@@ -631,6 +652,7 @@ static int madvise_pageout_pte_range(pmd_t *pmd, unsigned long addr,
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
 	reclaim_pages(&page_list);
+next:
 	cond_resched();
 
 	return 0;
@@ -700,12 +722,26 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	pte_t *orig_pte, *pte, ptent;
 	struct page *page;
 	int nr_swap = 0;
-	unsigned long next;
 
-	next = pmd_addr_end(addr, end);
-	if (pmd_trans_huge(*pmd))
-		if (madvise_free_huge_pmd(tlb, vma, pmd, addr, next))
+	if (pmd_trans_huge(*pmd)) {
+		ptl = pmd_trans_huge_lock(pmd, vma);
+		if (!ptl)
+			return 0;
+
+		switch (madvise_pmdp_reset_or_split(walk, pmd, ptl, addr, end,
+							true, true)) {
+		case MADV_PMDP_RESET:
+			page = pmd_page(*pmd);
+			spin_unlock(ptl);
+			mark_page_lazyfree(page);
 			goto next;
+		case MADV_PMDP_ERROR:
+			spin_unlock(ptl);
+			goto next;
+		case MADV_PMDP_SPLIT:
+			; /* fall through */
+		}
+	}
 
 	if (pmd_trans_unstable(pmd))
 		return 0;
@@ -817,8 +853,8 @@ static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
 	}
 	arch_leave_lazy_mmu_mode();
 	pte_unmap_unlock(orig_pte, ptl);
-	cond_resched();
 next:
+	cond_resched();
 	return 0;
 }
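To make the decision tree of madvise_pmdp_reset_or_split() easier to review, it can be modelled in userspace. This is a sketch, not kernel code: struct fake_thp and model_reset_or_split() are hypothetical names, a 2 MiB PMD size (x86-64 with 4 KiB base pages) is assumed, and locking, TLB flushing, trylock_page() failure and the PG_dirty update are not modelled. The local pmd_addr_end() mirrors the arithmetic of the kernel's generic definition.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 2 MiB PMD-mapped THPs, as on x86-64 with 4 KiB pages. */
#define PMD_SHIFT	21
#define PMD_SIZE	(1UL << PMD_SHIFT)
#define PMD_MASK	(~(PMD_SIZE - 1))
#define HPAGE_PMD_SIZE	PMD_SIZE

/*
 * Same arithmetic as the generic pmd_addr_end(): the end of the PMD that
 * contains addr, clamped to end. Comparing "x - 1" handles wrap-around
 * at the top of the address space.
 */
static unsigned long pmd_addr_end(unsigned long addr, unsigned long end)
{
	unsigned long boundary = (addr + PMD_SIZE) & PMD_MASK;

	return (boundary - 1 < end - 1) ? boundary : end;
}

enum madv_pmdp_reset_t {
	MADV_PMDP_RESET,	/* pmd was reset successfully */
	MADV_PMDP_SPLIT,	/* pmd was split */
	MADV_PMDP_ERROR,
};

/* Hypothetical stand-in for the pmd and struct page state the helper reads. */
struct fake_thp {
	bool huge_zero;		/* is_huge_zero_pmd() */
	bool present;		/* pmd_present() */
	bool young;		/* pmd_young() */
	bool dirty;		/* pmd_dirty() */
	int mapcount;		/* page_mapcount() */
	bool split_ok;		/* split_huge_page() succeeds */
};

/*
 * Hypothetical userspace model of madvise_pmdp_reset_or_split().
 * On MADV_PMDP_RESET the modelled young/dirty bits are cleared in place.
 */
static enum madv_pmdp_reset_t model_reset_or_split(struct fake_thp *thp,
		unsigned long addr, unsigned long end, bool young, bool dirty)
{
	bool reset_young, reset_dirty;

	assert(addr < end);	/* the page walk never passes an empty range */

	if (thp->huge_zero || !thp->present)
		return MADV_PMDP_ERROR;

	/* Range covers only part of this THP: split it only if private. */
	if (pmd_addr_end(addr, end) - addr != HPAGE_PMD_SIZE) {
		if (thp->mapcount != 1)
			return MADV_PMDP_ERROR;
		return thp->split_ok ? MADV_PMDP_SPLIT : MADV_PMDP_ERROR;
	}

	reset_young = young && thp->young;
	reset_dirty = dirty && thp->dirty;

	/* Dirty state of a shared THP is left alone, and nothing is reset. */
	if (reset_dirty && thp->mapcount != 1)
		return MADV_PMDP_ERROR;

	if (reset_young)
		thp->young = false;
	if (reset_dirty)
		thp->dirty = false;
	return MADV_PMDP_RESET;
}
```

In this model, the MADV_COLD-style flags (young only) still clear the access bit of a THP mapped by several processes, whereas the MADV_FREE-style flags (young and dirty) make a dirty shared THP return MADV_PMDP_ERROR, so the whole huge page is skipped.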