From patchwork Tue Feb 22 14:47:07 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 12755245
Message-ID: <20220222144907.023121407@redhat.com>
Date: Tue, 22 Feb 2022 11:47:07 -0300
From: Marcelo Tosatti
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Minchan Kim, Matthew Wilcox, Mel Gorman,
    Nicolas Saenz Julienne, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, "Paul E. McKenney", Marcelo Tosatti
McKenney" , Marcelo Tosatti Subject: [patch 1/2] mm: protect local lock sections with rcu_read_lock (on RT) References: <20220222144706.937848439@redhat.com> MIME-Version: 1.0 X-Scanned-By: MIMEDefang 2.84 on 10.5.11.23 X-Rspam-User: X-Rspamd-Server: rspam12 X-Rspamd-Queue-Id: EF40180011 X-Stat-Signature: w44foyucdopycgfi7zbhi7n8bqnjxrn8 Authentication-Results: imf02.hostedemail.com; dkim=pass header.d=redhat.com header.s=mimecast20190719 header.b=CaAakIg4; spf=none (imf02.hostedemail.com: domain of mtosatti@redhat.com has no SPF policy when checking 170.10.129.124) smtp.mailfrom=mtosatti@redhat.com; dmarc=pass (policy=none) header.from=redhat.com X-HE-Tag: 1645541468-699606 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: For the per-CPU LRU page vectors, augment the local lock protected code sections with rcu_read_lock. This makes it possible to replace the queueing of work items on all CPUs by synchronize_rcu (which is necessary to run FIFO:1 applications uninterrupted on isolated CPUs). Signed-off-by: Marcelo Tosatti Index: linux-rt-devel/mm/swap.c =================================================================== --- linux-rt-devel.orig/mm/swap.c +++ linux-rt-devel/mm/swap.c @@ -73,6 +73,48 @@ static DEFINE_PER_CPU(struct lru_pvecs, .lock = INIT_LOCAL_LOCK(lock), }; +#ifdef CONFIG_PREEMPT_RT + +#define lru_local_lock(lock) \ + do { \ + rcu_read_lock(); \ + local_lock(lock); \ + } while (0) + +#define lru_local_unlock(lock) \ + do { \ + local_unlock(lock); \ + rcu_read_unlock(); \ + } while (0) + +#define lru_local_lock_irqsave(lock, flags) \ + do { \ + rcu_read_lock(); \ + local_lock_irqsave(lock, flags); \ + } while (0) + +#define lru_local_unlock_irqrestore(lock, flags) \ + do { \ + local_unlock_irqrestore(lock, flags); \ + rcu_read_unlock(); \ + } while (0) + +#else + +#define lru_local_lock(lock) \ + local_lock(lock) + +#define lru_local_unlock(lock) \ + local_unlock(lock) + +#define lru_local_lock_irqsave(lock, flag) \ + local_lock_irqsave(lock, flags) + +#define lru_local_unlock_irqrestore(lock, flags) \ + local_unlock_irqrestore(lock, flags) + +#endif + /* * This path almost never happens for VM activity - pages are normally * freed via pagevecs. But it gets used by networking. 
@@ -255,11 +297,11 @@ void folio_rotate_reclaimable(struct fol
 		unsigned long flags;
 
 		folio_get(folio);
-		local_lock_irqsave(&lru_rotate.lock, flags);
+		lru_local_lock_irqsave(&lru_rotate.lock, flags);
 		pvec = this_cpu_ptr(&lru_rotate.pvec);
 		if (pagevec_add_and_need_flush(pvec, &folio->page))
 			pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
-		local_unlock_irqrestore(&lru_rotate.lock, flags);
+		lru_local_unlock_irqrestore(&lru_rotate.lock, flags);
 	}
 }
 
@@ -351,11 +393,11 @@ static void folio_activate(struct folio
 		struct pagevec *pvec;
 
 		folio_get(folio);
-		local_lock(&lru_pvecs.lock);
+		lru_local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.activate_page);
 		if (pagevec_add_and_need_flush(pvec, &folio->page))
 			pagevec_lru_move_fn(pvec, __activate_page);
-		local_unlock(&lru_pvecs.lock);
+		lru_local_unlock(&lru_pvecs.lock);
 	}
 }
 
@@ -382,7 +424,7 @@ static void __lru_cache_activate_folio(s
 	struct pagevec *pvec;
 	int i;
 
-	local_lock(&lru_pvecs.lock);
+	lru_local_lock(&lru_pvecs.lock);
 	pvec = this_cpu_ptr(&lru_pvecs.lru_add);
 
 	/*
@@ -404,7 +446,7 @@ static void __lru_cache_activate_folio(s
 		}
 	}
 
-	local_unlock(&lru_pvecs.lock);
+	lru_local_unlock(&lru_pvecs.lock);
 }
 
 /*
@@ -463,11 +505,11 @@ void folio_add_lru(struct folio *folio)
 	VM_BUG_ON_FOLIO(folio_test_lru(folio), folio);
 
 	folio_get(folio);
-	local_lock(&lru_pvecs.lock);
+	lru_local_lock(&lru_pvecs.lock);
 	pvec = this_cpu_ptr(&lru_pvecs.lru_add);
 	if (pagevec_add_and_need_flush(pvec, &folio->page))
 		__pagevec_lru_add(pvec);
-	local_unlock(&lru_pvecs.lock);
+	lru_local_unlock(&lru_pvecs.lock);
 }
 EXPORT_SYMBOL(folio_add_lru);
 
@@ -618,9 +660,9 @@ void lru_add_drain_cpu(int cpu)
 		unsigned long flags;
 
 		/* No harm done if a racing interrupt already did this */
-		local_lock_irqsave(&lru_rotate.lock, flags);
+		lru_local_lock_irqsave(&lru_rotate.lock, flags);
 		pagevec_lru_move_fn(pvec, pagevec_move_tail_fn);
-		local_unlock_irqrestore(&lru_rotate.lock, flags);
+		lru_local_unlock_irqrestore(&lru_rotate.lock, flags);
 	}
 
 	pvec = &per_cpu(lru_pvecs.lru_deactivate_file, cpu);
@@ -658,12 +700,12 @@ void deactivate_file_page(struct page *p
 	if (likely(get_page_unless_zero(page))) {
 		struct pagevec *pvec;
 
-		local_lock(&lru_pvecs.lock);
+		lru_local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate_file);
 
 		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_file_fn);
-		local_unlock(&lru_pvecs.lock);
+		lru_local_unlock(&lru_pvecs.lock);
 	}
 }
 
@@ -680,12 +722,12 @@ void deactivate_page(struct page *page)
 	if (PageLRU(page) && PageActive(page) && !PageUnevictable(page)) {
 		struct pagevec *pvec;
 
-		local_lock(&lru_pvecs.lock);
+		lru_local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_deactivate);
 		get_page(page);
 		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_deactivate_fn);
-		local_unlock(&lru_pvecs.lock);
+		lru_local_unlock(&lru_pvecs.lock);
 	}
 }
 
@@ -702,20 +744,20 @@ void mark_page_lazyfree(struct page *pag
 	    !PageSwapCache(page) && !PageUnevictable(page)) {
 		struct pagevec *pvec;
 
-		local_lock(&lru_pvecs.lock);
+		lru_local_lock(&lru_pvecs.lock);
 		pvec = this_cpu_ptr(&lru_pvecs.lru_lazyfree);
 		get_page(page);
 		if (pagevec_add_and_need_flush(pvec, page))
 			pagevec_lru_move_fn(pvec, lru_lazyfree_fn);
-		local_unlock(&lru_pvecs.lock);
+		lru_local_unlock(&lru_pvecs.lock);
 	}
 }
 
 void lru_add_drain(void)
 {
-	local_lock(&lru_pvecs.lock);
+	lru_local_lock(&lru_pvecs.lock);
 	lru_add_drain_cpu(smp_processor_id());
-	local_unlock(&lru_pvecs.lock);
+	lru_local_unlock(&lru_pvecs.lock);
 }
 
 /*
@@ -726,18 +768,18 @@ void lru_add_drain(void)
 */
 static void lru_add_and_bh_lrus_drain(void)
 {
-	local_lock(&lru_pvecs.lock);
+	lru_local_lock(&lru_pvecs.lock);
 	lru_add_drain_cpu(smp_processor_id());
-	local_unlock(&lru_pvecs.lock);
+	lru_local_unlock(&lru_pvecs.lock);
 	invalidate_bh_lrus_cpu();
 }
 
 void lru_add_drain_cpu_zone(struct zone *zone)
 {
-	local_lock(&lru_pvecs.lock);
+	lru_local_lock(&lru_pvecs.lock);
 	lru_add_drain_cpu(smp_processor_id());
 	drain_local_pages(zone);
-	local_unlock(&lru_pvecs.lock);
+	lru_local_unlock(&lru_pvecs.lock);
 }
 
 #ifdef CONFIG_SMP
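
To make the ordering this wrapper establishes easier to see, here is a
minimal user-space sketch (an illustration only: the kernel primitives
local_lock()/local_unlock() and rcu_read_lock()/rcu_read_unlock() are
mocked as plain functions, the file name demo.c is arbitrary, and
CONFIG_PREEMPT_RT is just a compile-time define). On RT the pagevec
section doubles as an RCU read-side section; on !RT, local_lock()
already runs with preemption disabled, which RCU grace periods also
wait for, so no extra marking is needed:

/* Build with: gcc demo.c                        (plain local_lock)
 *         or: gcc -DCONFIG_PREEMPT_RT demo.c    (RCU-marked section) */
#include <stdio.h>

/* Mocked stand-ins for the kernel primitives (illustration only). */
static void local_lock(void)      { puts("local_lock"); }
static void local_unlock(void)    { puts("local_unlock"); }
static void rcu_read_lock(void)   { puts("rcu_read_lock"); }
static void rcu_read_unlock(void) { puts("rcu_read_unlock"); }

#ifdef CONFIG_PREEMPT_RT
/* On RT, the local-lock section is also an RCU read-side section. */
#define lru_local_lock()	do { rcu_read_lock(); local_lock(); } while (0)
#define lru_local_unlock()	do { local_unlock(); rcu_read_unlock(); } while (0)
#else
/* On !RT, local_lock() disables preemption, which RCU already waits for. */
#define lru_local_lock()	local_lock()
#define lru_local_unlock()	local_unlock()
#endif

int main(void)
{
	lru_local_lock();
	/* ... operate on the per-CPU pagevec here ... */
	lru_local_unlock();
	return 0;
}
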
From patchwork Tue Feb 22 14:47:08 2022
X-Patchwork-Submitter: Marcelo Tosatti
X-Patchwork-Id: 12755244
Message-ID: <20220222144907.056089321@redhat.com>
Date: Tue, 22 Feb 2022 11:47:08 -0300
From: Marcelo Tosatti
To: linux-kernel@vger.kernel.org
Cc: linux-mm@kvack.org, Minchan Kim, Matthew Wilcox, Mel Gorman,
    Nicolas Saenz Julienne, Juri Lelli, Thomas Gleixner,
    Sebastian Andrzej Siewior, "Paul E. McKenney", Marcelo Tosatti
Subject: [patch 2/2] mm: lru_cache_disable: replace work queue synchronization
 with synchronize_rcu
References: <20220222144706.937848439@redhat.com>

On systems running FIFO:1 applications that busy-loop on isolated CPUs,
executing tasks on those CPUs at a lower priority is undesirable: it either
hangs the system or prolongs the interruption of the FIFO task, because the
lower priority task only gets very small scheduling slices.

Commit d479960e44f2 ("mm: disable LRU pagevec during the migration
temporarily") relies on queueing work items on all online CPUs to ensure
visibility of lru_disable_count.

However, it's possible to use synchronize_rcu, which provides the same
guarantees:

    * synchronize_rcu() waits for preemption disabled
    * and RCU read side critical sections
    * For the users of lru_disable_count:
    *
    * preempt_disable, local_irq_disable()	[bh_lru_lock()]
    * rcu_read_lock				[lru_pvecs CONFIG_PREEMPT_RT]
    * preempt_disable				[lru_pvecs !CONFIG_PREEMPT_RT]
    *
    * so any calls of lru_cache_disabled wrapped by
    * local_lock+rcu_read_lock or preemption disabled would be
    * ordered by that.

Fixes the following hung task report:

[ 1873.243925] INFO: task kworker/u160:0:9 blocked for more than 622 seconds.
[ 1873.243927]       Tainted: G          I      --------- ---  5.14.0-31.rt21.31.el9.x86_64 #1
[ 1873.243929] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 1873.243929] task:kworker/u160:0  state:D stack:    0 pid:    9 ppid:     2 flags:0x00004000
[ 1873.243932] Workqueue: cpuset_migrate_mm cpuset_migrate_mm_workfn
[ 1873.243936] Call Trace:
[ 1873.243938]  __schedule+0x21b/0x5b0
[ 1873.243941]  schedule+0x43/0xe0
[ 1873.243943]  schedule_timeout+0x14d/0x190
[ 1873.243946]  ? resched_curr+0x20/0xe0
[ 1873.243953]  ? __prepare_to_swait+0x4b/0x70
[ 1873.243958]  wait_for_completion+0x84/0xe0
[ 1873.243962]  __flush_work.isra.0+0x146/0x200
[ 1873.243966]  ? flush_workqueue_prep_pwqs+0x130/0x130
[ 1873.243971]  __lru_add_drain_all+0x158/0x1f0
[ 1873.243978]  do_migrate_pages+0x3d/0x2d0
[ 1873.243985]  ? pick_next_task_fair+0x39/0x3b0
[ 1873.243989]  ? put_prev_task_fair+0x1e/0x30
[ 1873.243992]  ? pick_next_task+0xb30/0xbd0
[ 1873.243995]  ? __tick_nohz_task_switch+0x1e/0x70
[ 1873.244000]  ? raw_spin_rq_unlock+0x18/0x60
[ 1873.244002]  ? finish_task_switch.isra.0+0xc1/0x2d0
[ 1873.244005]  ? __switch_to+0x12f/0x510
[ 1873.244013]  cpuset_migrate_mm_workfn+0x22/0x40
[ 1873.244016]  process_one_work+0x1e0/0x410
[ 1873.244019]  worker_thread+0x50/0x3b0
[ 1873.244022]  ? process_one_work+0x410/0x410
[ 1873.244024]  kthread+0x173/0x190
[ 1873.244027]  ? set_kthread_struct+0x40/0x40
[ 1873.244031]  ret_from_fork+0x1f/0x30

Signed-off-by: Marcelo Tosatti

Index: linux-rt-devel/mm/swap.c
===================================================================
--- linux-rt-devel.orig/mm/swap.c
+++ linux-rt-devel/mm/swap.c
@@ -873,8 +873,7 @@ inline void __lru_add_drain_all(bool for
 	for_each_online_cpu(cpu) {
 		struct work_struct *work = &per_cpu(lru_add_drain_work, cpu);
 
-		if (force_all_cpus ||
-		    pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
+		if (pagevec_count(&per_cpu(lru_pvecs.lru_add, cpu)) ||
 		    data_race(pagevec_count(&per_cpu(lru_rotate.pvec, cpu))) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate_file, cpu)) ||
 		    pagevec_count(&per_cpu(lru_pvecs.lru_deactivate, cpu)) ||
@@ -918,14 +917,23 @@ atomic_t lru_disable_count = ATOMIC_INIT
 void lru_cache_disable(void)
 {
 	atomic_inc(&lru_disable_count);
+	synchronize_rcu();
 #ifdef CONFIG_SMP
 	/*
-	 * lru_add_drain_all in the force mode will schedule draining on
-	 * all online CPUs so any calls of lru_cache_disabled wrapped by
-	 * local_lock or preemption disabled would be ordered by that.
-	 * The atomic operation doesn't need to have stronger ordering
-	 * requirements because that is enforced by the scheduling
-	 * guarantees.
+	 * synchronize_rcu() waits for preemption disabled
+	 * and RCU read side critical sections
+	 * For the users of lru_disable_count:
+	 *
+	 * preempt_disable, local_irq_disable()	[bh_lru_lock()]
+	 * rcu_read_lock			[lru_pvecs CONFIG_PREEMPT_RT]
+	 * preempt_disable			[lru_pvecs !CONFIG_PREEMPT_RT]
+	 *
+	 * so any calls of lru_cache_disabled wrapped by
+	 * local_lock+rcu_read_lock or preemption disabled would be
+	 * ordered by that. The atomic operation doesn't need to have
+	 * stronger ordering requirements because that is enforced
+	 * by the scheduling guarantees.
	 */
 	__lru_add_drain_all(true);
 #else
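
The visibility argument above can also be sketched in user space. The
analogue below is not the kernel implementation: reader(), readers_in_cs
and lru_cache_disable_analogue() are made-up names, and the busy-wait
loop stands in for synchronize_rcu(). It shows why a single grace-period
wait gives the same guarantee as queueing drain work on every CPU: any
critical section that could have sampled the old lru_disable_count value
is waited for, and every section that starts afterwards sees the counter
set and bypasses the per-CPU batching:

/* Build with: gcc -pthread analogue.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int lru_disable_count;	/* analogue of lru_disable_count */
static atomic_int readers_in_cs;	/* readers inside a "read-side" section */

static void *reader(void *arg)
{
	(void)arg;
	for (int i = 0; i < 100000; i++) {
		atomic_fetch_add(&readers_in_cs, 1);	/* "rcu_read_lock()" */
		if (atomic_load(&lru_disable_count) == 0) {
			/* fast path: batch the page on the per-CPU pagevec */
		} else {
			/* caches disabled: bypass the pagevec */
		}
		atomic_fetch_sub(&readers_in_cs, 1);	/* "rcu_read_unlock()" */
	}
	return NULL;
}

static void lru_cache_disable_analogue(void)
{
	atomic_fetch_add(&lru_disable_count, 1);
	/*
	 * Stand-in for synchronize_rcu(): wait until no critical section
	 * that may have sampled the old counter value is still running.
	 * Any section entered after this point sees the counter set.
	 */
	while (atomic_load(&readers_in_cs) != 0)
		usleep(100);
}

int main(void)
{
	pthread_t t;

	pthread_create(&t, NULL, reader, NULL);
	lru_cache_disable_analogue();
	pthread_join(t, NULL);
	puts("disabled; no stale batching in flight");
	return 0;
}

The trade-off is the same as in the patch: the disabling side blocks for
one grace period instead of waking a worker on every CPU, which leaves
isolated CPUs undisturbed.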