From patchwork Fri Mar 19 17:51:25 2021
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 12151631
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, joaodias@google.com, surenb@google.com,
    cgoldswo@codeaurora.org, willy@infradead.org, mhocko@suse.com,
    david@redhat.com,
    vbabka@suse.cz, linux-fsdevel@vger.kernel.org, oliver.sang@intel.com,
    Minchan Kim
Subject: [PATCH v4 1/3] mm: disable LRU pagevec during the migration temporarily
Date: Fri, 19 Mar 2021 10:51:25 -0700
Message-Id: <20210319175127.886124-1-minchan@kernel.org>

An LRU pagevec holds a refcount on its pages until the pagevec is
drained.  That can prevent page migration, because the refcount of such
a page is greater than what the migration logic expects.  To mitigate
the issue, callers of migrate_pages() drain the LRU pagevecs via
migrate_prep() or lru_add_drain_all() before calling migrate_pages().
However, that is not enough: pages that enter a pagevec after the drain
call can still sit in the pagevec and keep preventing page migration.
Some callers of migrate_pages() have retry logic with LRU draining, so
the page would migrate on the next try, but this is still fragile in
that it does not close the fundamental race between pages newly
arriving in a pagevec and migration, so the migration failure can
ultimately cause a contiguous memory allocation failure.

To close the race, this patch disables the LRU caches (i.e., pagevecs)
while migration is ongoing, until migration is done.

Since the race is really hard to reproduce, I measured how many times
migrate_pages() retried in force mode (roughly, a fallback to
synchronous migration) with the debug code below.

int migrate_pages(struct list_head *from, new_page_t get_new_page,
	..
	..
	if (rc && reason == MR_CONTIG_RANGE && pass > 2) {
		printk(KERN_ERR "pfn 0x%lx reason %d\n", page_to_pfn(page), rc);
		dump_page(page, "fail to migrate");
	}

The test repeatedly launched Android apps with CMA allocation running
in the background every five seconds.  The total CMA allocation count
was about 500 during the test.  With this patch, the dump_page() count
was reduced from 400 to 30.

The new interface is also useful for memory hotplug, which currently
drains the LRU pcp caches after each migration failure.  This is rather
suboptimal, as it has to disrupt other CPUs repeatedly during the
operation.  With the new interface, the draining happens only once.
This is also in line with the pcp allocator caches, which are likewise
disabled for the offlining.
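For illustration, a minimal caller sketch of the new pair (hypothetical
helper: migrate_range() and its callback parameters are made-up names
for this sketch, while lru_cache_disable/enable() and migrate_pages()
are the real interfaces this patch touches):

static int migrate_range(struct list_head *pages, new_page_t get_new,
			 free_page_t put_new, unsigned long private)
{
	int ret;

	/* Drain every CPU's pagevecs and keep the caches disabled. */
	lru_cache_disable();

	/*
	 * No page can hide in a per-CPU pagevec between the drain and
	 * the migration, so isolation no longer races with caching.
	 */
	ret = migrate_pages(pages, get_new, put_new, private,
			    MIGRATE_SYNC, MR_CONTIG_RANGE);

	/* Must pair with lru_cache_disable(). */
	lru_cache_enable();
	return ret;
}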
Reviewed-by: Chris Goldsworthy
Acked-by: Michal Hocko
Signed-off-by: Minchan Kim
---
 include/linux/migrate.h |  2 ++
 include/linux/swap.h    | 14 +++++++++
 mm/memory_hotplug.c     |  3 +-
 mm/mempolicy.c          |  4 +++
 mm/migrate.c            | 11 ++++---
 mm/page_alloc.c         |  2 ++
 mm/swap.c               | 64 +++++++++++++++++++++++++++++++++++------
 7 files changed, 86 insertions(+), 14 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 3a389633b68f..9e4a2dc8622c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -46,6 +46,7 @@ extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
 extern void migrate_prep(void);
+extern void migrate_finish(void);
 extern void migrate_prep_local(void);
 extern void migrate_page_states(struct page *newpage, struct page *page);
 extern void migrate_page_copy(struct page *newpage, struct page *page);
@@ -67,6 +68,7 @@ static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
 	{ return -EBUSY; }
 
 static inline int migrate_prep(void) { return -ENOSYS; }
+static inline int migrate_finish(void) { return -ENOSYS; }
 static inline int migrate_prep_local(void) { return -ENOSYS; }
 
 static inline void migrate_page_states(struct page *newpage, struct page *page)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 32f665b1ee85..85cf022072a0 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -339,6 +339,20 @@ extern void lru_note_cost(struct lruvec *lruvec, bool file,
 extern void lru_note_cost_page(struct page *);
 extern void lru_cache_add(struct page *);
 extern void mark_page_accessed(struct page *);
+
+extern atomic_t lru_disable_count;
+
+static inline bool lru_cache_disabled(void)
+{
+	return atomic_read(&lru_disable_count);
+}
+
+static inline void lru_cache_enable(void)
+{
+	atomic_dec(&lru_disable_count);
+}
+
+extern void lru_cache_disable(void);
 extern void lru_add_drain(void);
 extern void lru_add_drain_cpu(int cpu);
 extern void lru_add_drain_cpu_zone(struct zone *zone);

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 5ba51a8bdaeb..959f659ef085 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1611,6 +1611,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	 * in a way that pages from isolated pageblock are left on pcplists.
 	 */
 	zone_pcp_disable(zone);
+	lru_cache_disable();
 
 	/* set above range as isolated */
 	ret = start_isolate_page_range(start_pfn, end_pfn,
@@ -1642,7 +1643,6 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 		}
 
 		cond_resched();
-		lru_add_drain_all();
 
 		ret = scan_movable_pages(pfn, end_pfn, &pfn);
 		if (!ret) {
@@ -1687,6 +1687,7 @@ int __ref offline_pages(unsigned long start_pfn, unsigned long nr_pages)
 	zone->nr_isolate_pageblock -= nr_pages / pageblock_nr_pages;
 	spin_unlock_irqrestore(&zone->lock, flags);
 
+	lru_cache_enable();
 	zone_pcp_enable(zone);
 
 	/* removal success */

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index ab51132547b8..495b43a4b0f8 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1208,6 +1208,8 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 		break;
 	}
 	mmap_read_unlock(mm);
+
+	migrate_finish();
 	if (err < 0)
 		return err;
 	return busy;
@@ -1371,6 +1373,8 @@ static long do_mbind(unsigned long start, unsigned long len,
 	mmap_write_unlock(mm);
 mpol_out:
 	mpol_put(new);
+	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
+		migrate_finish();
 	return err;
 }

diff --git a/mm/migrate.c b/mm/migrate.c
index 62b81d5257aa..4d6c306d41c6 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -66,11 +66,13 @@ void migrate_prep(void)
 {
 	/*
 	 * Clear the LRU lists so pages can be isolated.
-	 * Note that pages may be moved off the LRU after we have
-	 * drained them. Those pages will fail to migrate like other
-	 * pages that may be busy.
 	 */
-	lru_add_drain_all();
+	lru_cache_disable();
+}
+
+void migrate_finish(void)
+{
+	lru_cache_enable();
 }
 
 /* Do the necessary work of migrate_prep but not if it involves other CPUs */
@@ -1838,6 +1840,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	if (err >= 0)
 		err = err1;
 out:
+	migrate_finish();
 	return err;
 }

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 2e8348936df8..af5d4eeb2999 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8495,6 +8495,8 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 		ret = migrate_pages(&cc->migratepages, alloc_migration_target,
 				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
 	}
+
+	migrate_finish();
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
 		return ret;

diff --git a/mm/swap.c b/mm/swap.c
index 31b844d4ed94..c94f55e7b649 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -235,6 +235,18 @@ static void pagevec_move_tail_fn(struct page *page, struct lruvec *lruvec)
 	}
 }
 
+/* return true if pagevec needs to drain */
+static bool pagevec_add_and_need_flush(struct pagevec *pvec, struct page *page)
+{
+	bool ret = false;
+
+	if (!pagevec_add(pvec, page) || PageCompound(page) ||
+			lru_cache_disabled())
+		ret = true;
+
+	return ret;
+}
+
 /*
  * Writeback is about to end against a page which has been marked for immediate
  * reclaim.
 * If it still appears to be reclaimable, move it to the tail of the
@@ -252,7 +264,7 @@ void rotate_reclaimable_page(struct page *page)
 		get_page(page);
 		local_lock_irqsave(&lru_rotate.lock, flags);
 		pvec = this_cpu_ptr(&lru_rotate.pvec);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
 		local_unlock_irqrestore(&lru_rotate.lock, flags);
 	}
@@ -343,7 +355,7 @@ static void activate_page(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.activate_page);
 		get_page(page);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, __activate_page);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -458,7 +470,7 @@ void lru_cache_add(struct page *page)
 	get_page(page);
 	local_lock(&lru_pvecs.lock);
 	pvec = this_cpu_ptr(&lru_pvecs.lru_add);
-	if (!pagevec_add(pvec, page) || PageCompound(page))
+	if (pagevec_add_and_need_flush(pvec, page))
 		__pagevec_lru_add(pvec);
 	local_unlock(&lru_pvecs.lock);
 }
@@ -654,7 +666,7 @@ void deactivate_file_page(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file);
 
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -676,7 +688,7 @@ void deactivate_page(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate);
 		get_page(page);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_fn);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -698,7 +710,7 @@ void mark_page_lazyfree(struct page *page)
 		local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree);
 		get_page(page);
-		if (!pagevec_add(pvec, page) || PageCompound(page))
+		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
 		local_unlock(&lru_pvecs.lock);
 	}
@@ -735,7 +747,7 @@ static void lru_add_drain_per_cpu(struct work_struct *dummy)
  * Calling this function with cpu hotplug locks held can actually lead
  * to obscure indirect dependencies via WQ context.
  */
-void lru_add_drain_all(void)
+inline void __lru_add_drain_all(bool force_all_cpus)
 {
 	/*
 	 * lru_drain_gen - Global pages generation number
@@ -780,7 +792,7 @@ void lru_add_drain_all(void)
 	 * (C) Exit the draining operation if a newer generation, from another
 	 * lru_add_drain_all(), was already scheduled for draining. Check (A).
 	 */
-	if (unlikely(this_gen != lru_drain_gen))
+	if (unlikely(this_gen != lru_drain_gen && !force_all_cpus))
 		goto done;
 
 	/*
@@ -810,7 +822,8 @@ void lru_add_drain_all(void)
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
-		if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
+		if (force_all_cpus ||
+		    pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
 		    data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
@@ -828,6 +841,11 @@ void lru_add_drain_all(void)
 done:
 	mutex_unlock(&lock);
 }
+
+void lru_add_drain_all(void)
+{
+	__lru_add_drain_all(false);
+}
 #else
 void lru_add_drain_all(void)
 {
@@ -835,6 +853,34 @@ void lru_add_drain_all(void)
 }
 #endif /* CONFIG_SMP */
 
+atomic_t lru_disable_count = ATOMIC_INIT(0);
+
+/*
+ * lru_cache_disable() needs to be called before we start compiling
+ * a list of pages to be migrated using isolate_lru_page().
+ * It drains pages on the LRU caches and then disables the caches on
+ * all CPUs until lru_cache_enable() is called.
+ *
+ * Must be paired with a call to lru_cache_enable().
+ */
+void lru_cache_disable(void)
+{
+	atomic_inc(&lru_disable_count);
+#ifdef CONFIG_SMP
+	/*
+	 * lru_add_drain_all in the force mode will schedule draining on
+	 * all online CPUs so any calls of lru_cache_disabled wrapped by
+	 * local_lock or preemption disabled would be ordered by that.
+	 * The atomic operation doesn't need to have stronger ordering
+	 * requirements because that is enforced by the scheduling
+	 * guarantees.
+	 */
+	__lru_add_drain_all(true);
+#else
+	lru_add_drain();
+#endif
+}
+
 /**
  * release_pages - batched put_page()
  * @pages: array of pages to release

From patchwork Fri Mar 19 17:51:26 2021
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 12151633
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, joaodias@google.com, surenb@google.com,
    cgoldswo@codeaurora.org, willy@infradead.org, mhocko@suse.com,
    david@redhat.com, vbabka@suse.cz, linux-fsdevel@vger.kernel.org,
    oliver.sang@intel.com, Minchan Kim
Subject: [PATCH v4 2/3] mm: replace migrate_[prep|finish] with
 lru_cache_[disable|enable]
Date: Fri, 19 Mar 2021 10:51:26 -0700
Message-Id: <20210319175127.886124-2-minchan@kernel.org>
In-Reply-To: <20210319175127.886124-1-minchan@kernel.org>
References: <20210319175127.886124-1-minchan@kernel.org>

Currently, migrate_[prep|finish] is merely a wrapper around
lru_cache_[disable|enable].  There is not much to gain from the extra
layer of abstraction.  Use lru_cache_[disable|enable] directly instead
of migrate_[prep|finish]; the names are more descriptive.

Note: migrate_prep_local() in compaction.c is changed to
lru_add_drain() to keep the old behavior while avoiding the scheduling
cost of involving many other CPUs.
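To make the note above concrete, here is a hedged sketch of the two
draining strategies this series distinguishes (the helper names are
invented for illustration; lru_add_drain() and lru_cache_disable() are
the real calls):

static void drain_for_compaction(void)
{
	/*
	 * Cheap: flush only the calling CPU's pagevecs.  Pages cached
	 * on other CPUs may fail isolation, which compaction tolerates
	 * by simply skipping them.
	 */
	lru_add_drain();
}

static void drain_for_migration(void)
{
	/*
	 * Thorough: schedule a drain on every online CPU and keep the
	 * caches disabled so no page can re-enter a pagevec while the
	 * migration is in flight.  Must pair with lru_cache_enable().
	 */
	lru_cache_disable();
}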
Acked-by: Michal Hocko
Reviewed-by: David Hildenbrand
Signed-off-by: Minchan Kim
---
 include/linux/migrate.h |  7 -------
 mm/compaction.c         |  3 ++-
 mm/mempolicy.c          |  8 ++++----
 mm/migrate.c            | 28 ++--------------------------
 mm/page_alloc.c         |  4 ++--
 5 files changed, 10 insertions(+), 40 deletions(-)

diff --git a/include/linux/migrate.h b/include/linux/migrate.h
index 9e4a2dc8622c..6155d97ec76c 100644
--- a/include/linux/migrate.h
+++ b/include/linux/migrate.h
@@ -45,9 +45,6 @@ extern struct page *alloc_migration_target(struct page *page, unsigned long priv
 extern int isolate_movable_page(struct page *page, isolate_mode_t mode);
 extern void putback_movable_page(struct page *page);
 
-extern void migrate_prep(void);
-extern void migrate_finish(void);
-extern void migrate_prep_local(void);
 extern void migrate_page_states(struct page *newpage, struct page *page);
 extern void migrate_page_copy(struct page *newpage, struct page *page);
 extern int migrate_huge_page_move_mapping(struct address_space *mapping,
@@ -67,10 +64,6 @@ static inline struct page *alloc_migration_target(struct page *page,
 static inline int isolate_movable_page(struct page *page, isolate_mode_t mode)
 	{ return -EBUSY; }
 
-static inline int migrate_prep(void) { return -ENOSYS; }
-static inline int migrate_finish(void) { return -ENOSYS; }
-static inline int migrate_prep_local(void) { return -ENOSYS; }
-
 static inline void migrate_page_states(struct page *newpage, struct page *page)
 {
 }

diff --git a/mm/compaction.c b/mm/compaction.c
index e04f4476e68e..3be017ececc0 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -2319,7 +2319,8 @@ compact_zone(struct compact_control *cc, struct capture_control *capc)
 	trace_mm_compaction_begin(start_pfn, cc->migrate_pfn,
 				cc->free_pfn, end_pfn, sync);
 
-	migrate_prep_local();
+	/* lru_add_drain_all could be expensive with involving other CPUs */
+	lru_add_drain();
 
 	while ((ret = compact_finished(cc)) == COMPACT_CONTINUE) {
 		int err;

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 495b43a4b0f8..6daf9cc4c843 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -1124,7 +1124,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 	int err = 0;
 	nodemask_t tmp;
 
-	migrate_prep();
+	lru_cache_disable();
 
 	mmap_read_lock(mm);
 
@@ -1209,7 +1209,7 @@ int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 	}
 	mmap_read_unlock(mm);
 
-	migrate_finish();
+	lru_cache_enable();
 	if (err < 0)
 		return err;
 	return busy;
@@ -1325,7 +1325,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL)) {
 
-		migrate_prep();
+		lru_cache_disable();
 	}
 	{
 		NODEMASK_SCRATCH(scratch);
@@ -1374,7 +1374,7 @@ static long do_mbind(unsigned long start, unsigned long len,
 	mmap_write_unlock(mm);
 mpol_out:
 	mpol_put(new);
 	if (flags & (MPOL_MF_MOVE | MPOL_MF_MOVE_ALL))
-		migrate_finish();
+		lru_cache_enable();
 	return err;
 }

diff --git a/mm/migrate.c b/mm/migrate.c
index 4d6c306d41c6..acc9913e4303 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -57,30 +57,6 @@
 
 #include "internal.h"
 
-/*
- * migrate_prep() needs to be called before we start compiling a list of pages
- * to be migrated using isolate_lru_page(). If scheduling work on other CPUs is
- * undesirable, use migrate_prep_local()
- */
-void migrate_prep(void)
-{
-	/*
-	 * Clear the LRU lists so pages can be isolated.
-	 */
-	lru_cache_disable();
-}
-
-void migrate_finish(void)
-{
-	lru_cache_enable();
-}
-
-/* Do the necessary work of migrate_prep but not if it involves other CPUs */
-void migrate_prep_local(void)
-{
-	lru_add_drain();
-}
-
 int isolate_movable_page(struct page *page, isolate_mode_t mode)
 {
 	struct address_space *mapping;
@@ -1771,7 +1747,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	int start, i;
 	int err = 0, err1;
 
-	migrate_prep();
+	lru_cache_disable();
 
 	for (i = start = 0; i < nr_pages; i++) {
 		const void __user *p;
@@ -1840,7 +1816,7 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
 	if (err >= 0)
 		err = err1;
 out:
-	migrate_finish();
+	lru_cache_enable();
 	return err;
 }

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index af5d4eeb2999..bf1606c7965a 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -8467,7 +8467,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 		.gfp_mask = GFP_USER | __GFP_MOVABLE | __GFP_RETRY_MAYFAIL,
 	};
 
-	migrate_prep();
+	lru_cache_disable();
 
 	while (pfn < end || !list_empty(&cc->migratepages)) {
 		if (fatal_signal_pending(current)) {
@@ -8496,7 +8496,7 @@ static int __alloc_contig_migrate_range(struct compact_control *cc,
 				NULL, (unsigned long)&mtc, cc->mode, MR_CONTIG_RANGE);
 	}
 
-	migrate_finish();
+	lru_cache_enable();
 	if (ret < 0) {
 		putback_movable_pages(&cc->migratepages);
 		return ret;

From patchwork Fri Mar 19 17:51:27 2021
X-Patchwork-Submitter: Minchan Kim
X-Patchwork-Id: 12151635
From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, joaodias@google.com, surenb@google.com,
    cgoldswo@codeaurora.org, willy@infradead.org, mhocko@suse.com,
    david@redhat.com, vbabka@suse.cz, linux-fsdevel@vger.kernel.org,
    oliver.sang@intel.com, Minchan Kim
Subject: [PATCH v4 3/3] mm: fs: Invalidate BH LRU during page migration
Date: Fri, 19 Mar 2021 10:51:27 -0700
Message-Id: <20210319175127.886124-3-minchan@kernel.org>
In-Reply-To: <20210319175127.886124-1-minchan@kernel.org>
References: <20210319175127.886124-1-minchan@kernel.org>

Pages containing buffer_heads that are in one of the per-CPU
buffer_head LRU caches will be pinned and thus cannot be migrated.
This can prevent CMA allocations from succeeding, which are often used
on platforms with co-processors (such as a DSP) that can only use
physically contiguous memory.  It can also prevent memory
hot-unplugging from succeeding, which involves migrating at least
MIN_MEMORY_BLOCK_SIZE bytes of memory, ranging from 8 MiB to 1 GiB
depending on the architecture in use.

Correspondingly, invalidate the BH LRU caches before a migration
starts, and keep any buffer_head from being cached in the LRU caches
until migration has finished.

Reported-by: kernel test robot
Reported-by: Laura Abbott
Reported-by: Chris Goldsworthy
Tested-by: Oliver Sang
Signed-off-by: Chris Goldsworthy
Signed-off-by: Minchan Kim
---
 fs/buffer.c                 | 36 ++++++++++++++++++++++++++++++------
 include/linux/buffer_head.h |  4 ++++
 mm/swap.c                   |  5 ++++-
 3 files changed, 38 insertions(+), 7 deletions(-)

diff --git a/fs/buffer.c b/fs/buffer.c
index 0cb7ffd4977c..e9872d0dcbf1 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1264,6 +1264,15 @@ static void bh_lru_install(struct buffer_head *bh)
 	int i;
 
 	check_irqs_on();
+	/*
+	 * The refcount of a buffer_head in the bh_lru prevents dropping
+	 * the attached page (i.e., try_to_free_buffers() fails), which
+	 * can make page migration fail.  Skip putting upcoming bhs into
+	 * the bh_lru until migration is done.
+	 */
+	if (lru_cache_disabled())
+		return;
+
 	bh_lru_lock();
 
 	b = this_cpu_ptr(&bh_lrus);
@@ -1404,6 +1413,15 @@ __bread_gfp(struct block_device *bdev, sector_t block,
 }
 EXPORT_SYMBOL(__bread_gfp);
 
+static void __invalidate_bh_lrus(struct bh_lru *b)
+{
+	int i;
+
+	for (i = 0; i < BH_LRU_SIZE; i++) {
+		brelse(b->bhs[i]);
+		b->bhs[i] = NULL;
+	}
+}
 /*
  * invalidate_bh_lrus() is called rarely - but not only at unmount.
  * This doesn't race because it runs in each cpu either in irq
@@ -1412,16 +1430,12 @@ EXPORT_SYMBOL(__bread_gfp);
 static void invalidate_bh_lru(void *arg)
 {
 	struct bh_lru *b = &get_cpu_var(bh_lrus);
-	int i;
 
-	for (i = 0; i < BH_LRU_SIZE; i++) {
-		brelse(b->bhs[i]);
-		b->bhs[i] = NULL;
-	}
+	__invalidate_bh_lrus(b);
 	put_cpu_var(bh_lrus);
 }
 
-static bool has_bh_in_lru(int cpu, void *dummy)
+bool has_bh_in_lru(int cpu, void *dummy)
 {
 	struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
 	int i;
@@ -1440,6 +1454,16 @@ void invalidate_bh_lrus(void)
 }
 EXPORT_SYMBOL_GPL(invalidate_bh_lrus);
 
+void invalidate_bh_lrus_cpu(int cpu)
+{
+	struct bh_lru *b;
+
+	bh_lru_lock();
+	b = per_cpu_ptr(&bh_lrus, cpu);
+	__invalidate_bh_lrus(b);
+	bh_lru_unlock();
+}
+
 void set_bh_page(struct buffer_head *bh,
 		struct page *page, unsigned long offset)
 {

diff --git a/include/linux/buffer_head.h b/include/linux/buffer_head.h
index 6b47f94378c5..e7e99da31349 100644
--- a/include/linux/buffer_head.h
+++ b/include/linux/buffer_head.h
@@ -194,6 +194,8 @@ void __breadahead_gfp(struct block_device *, sector_t block, unsigned int size,
 struct buffer_head *__bread_gfp(struct block_device *,
 				sector_t block, unsigned size, gfp_t gfp);
 void invalidate_bh_lrus(void);
+void invalidate_bh_lrus_cpu(int cpu);
+bool has_bh_in_lru(int cpu, void *dummy);
 struct buffer_head *alloc_buffer_head(gfp_t gfp_flags);
 void free_buffer_head(struct buffer_head * bh);
 void unlock_buffer(struct buffer_head *bh);
@@ -406,6 +408,8 @@ static inline int inode_has_buffers(struct inode *inode) { return 0; }
 static inline void invalidate_inode_buffers(struct inode *inode) {}
 static inline int remove_inode_buffers(struct inode *inode) { return 1; }
 static inline int sync_mapping_buffers(struct address_space *mapping) { return 0; }
+static inline void invalidate_bh_lrus_cpu(int cpu) {}
+static inline bool has_bh_in_lru(int cpu, void *dummy) { return 0; }
 #define buffer_heads_over_limit 0
 
 #endif /* CONFIG_BLOCK */

diff --git a/mm/swap.c b/mm/swap.c
index c94f55e7b649..a75a8265302b 100644
--- a/mm/swap.c
+++ b/mm/swap.c
@@ -36,6 +36,7 @@
 #include <linux/hugetlb.h>
 #include <linux/page_idle.h>
 #include <linux/local_lock.h>
+#include <linux/buffer_head.h>
 
 #include "internal.h"
 
@@ -641,6 +642,7 @@ void lru_add_drain_cpu(int cpu)
 		pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
 
 	activate_page_drain(cpu);
+	invalidate_bh_lrus_cpu(cpu);
 }
 
 /**
@@ -828,7 +830,8 @@ inline void __lru_add_drain_all(bool force_all_cpus)
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_lazyfree, cpu)) ||
-		    need_activate_page_drain(cpu)) {
+		    need_activate_page_drain(cpu) ||
+		    has_bh_in_lru(cpu, NULL)) {
 			INIT_WORK(work, lru_add_drain_per_cpu);
 			queue_work_on(cpu, mm_percpu_wq, work);
 			__cpumask_set_cpu(cpu, &has_work);
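For readers following the series as a whole, a rough call-graph sketch
of the end state (simplified and illustrative, not part of the diff):

/*
 *   lru_cache_disable()                    <- patch 1
 *     atomic_inc(&lru_disable_count)
 *     __lru_add_drain_all(true)            <- force mode: every online CPU
 *       queue lru_add_drain_per_cpu() on each CPU whose pagevecs hold
 *       pages or, with this patch, where has_bh_in_lru(cpu, NULL) is true
 *         lru_add_drain_cpu(cpu)
 *           ...drain the per-CPU pagevecs...
 *           invalidate_bh_lrus_cpu(cpu)    <- this patch: drop cached bhs
 *
 * While lru_disable_count > 0, pagevec_add_and_need_flush() drains
 * immediately and bh_lru_install() bails out early, so neither a
 * pagevec nor a BH LRU entry can pin a page until lru_cache_enable()
 * lifts the restriction.
 */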