From patchwork Wed Dec 22 11:41:09 2021
From: Sebastian Andrzej Siewior
To: cgroups@vger.kernel.org, linux-mm@kvack.org
Cc: Johannes Weiner, Michal Hocko, Vladimir Davydov, Andrew Morton, Thomas Gleixner, Waiman Long, Peter Zijlstra, Sebastian Andrzej Siewior
Subject: [RFC PATCH 1/3] mm/memcg: Protect per-CPU counter by disabling preemption on PREEMPT_RT
Date: Wed, 22 Dec 2021 12:41:09 +0100
Message-Id: <20211222114111.2206248-2-bigeasy@linutronix.de>
In-Reply-To: <20211222114111.2206248-1-bigeasy@linutronix.de>
References: <20211222114111.2206248-1-bigeasy@linutronix.de>

The per-CPU counters are modified with the non-atomic modifier. Consistency is
ensured by disabling interrupts for the update.
This breaks on PREEMPT_RT because some sections additionally acquire a
spinlock_t lock (which becomes sleeping on PREEMPT_RT and therefore must not
be acquired with disabled interrupts). Another problem is that
mem_cgroup_swapout() expects to be invoked with disabled interrupts because
the caller has to acquire a spinlock_t, which is itself acquired with disabled
interrupts. Since spinlock_t never disables interrupts on PREEMPT_RT, the
interrupts are never disabled at this point.

The code is never called from in_irq() context on PREEMPT_RT, so disabling
preemption during the update is sufficient there. The sections with disabled
preemption must exclude memcg_check_events() so that spinlock_t locks can
still be acquired (for instance in eventfd_signal()).

Don't disable interrupts during updates of the per-CPU variables; instead use
shorter sections which disable preemption.

Signed-off-by: Sebastian Andrzej Siewior
---
 mm/memcontrol.c | 74 +++++++++++++++++++++++++++++++++++++++----------
 1 file changed, 60 insertions(+), 14 deletions(-)

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 2ed5f2a0879d3..d2687d5ed544b 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -671,8 +671,14 @@ void __mod_memcg_state(struct mem_cgroup *memcg, int idx, int val)
 	if (mem_cgroup_disabled())
 		return;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
+
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 	memcg_rstat_updated(memcg);
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 /* idx can be of type enum memcg_stat_item or node_stat_item. */
@@ -699,6 +705,9 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	pn = container_of(lruvec, struct mem_cgroup_per_node, lruvec);
 	memcg = pn->memcg;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
+
 	/* Update memcg */
 	__this_cpu_add(memcg->vmstats_percpu->state[idx], val);
 
@@ -706,6 +715,9 @@ void __mod_memcg_lruvec_state(struct lruvec *lruvec, enum node_stat_item idx,
 	__this_cpu_add(pn->lruvec_stats_percpu->state[idx], val);
 
 	memcg_rstat_updated(memcg);
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 /**
@@ -788,8 +800,13 @@ void __count_memcg_events(struct mem_cgroup *memcg, enum vm_event_item idx,
 	if (mem_cgroup_disabled())
 		return;
 
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
+
 	__this_cpu_add(memcg->vmstats_percpu->events[idx], count);
 	memcg_rstat_updated(memcg);
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 static unsigned long memcg_events(struct mem_cgroup *memcg, int event)
@@ -810,6 +827,9 @@ static unsigned long memcg_events_local(struct mem_cgroup *memcg, int event)
 static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 					 int nr_pages)
 {
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
+
 	/* pagein of a big page is an event. So, ignore page size */
 	if (nr_pages > 0)
 		__count_memcg_events(memcg, PGPGIN, 1);
@@ -819,12 +839,19 @@ static void mem_cgroup_charge_statistics(struct mem_cgroup *memcg,
 	}
 
 	__this_cpu_add(memcg->vmstats_percpu->nr_page_events, nr_pages);
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
 }
 
 static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 				       enum mem_cgroup_events_target target)
 {
 	unsigned long val, next;
+	bool ret = false;
+
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_disable();
 
 	val = __this_cpu_read(memcg->vmstats_percpu->nr_page_events);
 	next = __this_cpu_read(memcg->vmstats_percpu->targets[target]);
@@ -841,9 +868,11 @@ static bool mem_cgroup_event_ratelimit(struct mem_cgroup *memcg,
 			break;
 		}
 		__this_cpu_write(memcg->vmstats_percpu->targets[target], next);
-		return true;
+		ret = true;
 	}
-	return false;
+	if (IS_ENABLED(CONFIG_PREEMPT_RT))
+		preempt_enable();
+	return ret;
 }
 
 /*
@@ -5645,12 +5674,14 @@ static int mem_cgroup_move_account(struct page *page,
 	ret = 0;
 	nid = folio_nid(folio);
 
-	local_irq_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 	mem_cgroup_charge_statistics(to, nr_pages);
 	memcg_check_events(to, nid);
 	mem_cgroup_charge_statistics(from, -nr_pages);
 	memcg_check_events(from, nid);
-	local_irq_enable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
 out_unlock:
 	folio_unlock(folio);
 out:
@@ -6670,10 +6701,12 @@ static int charge_memcg(struct folio *folio, struct mem_cgroup *memcg,
 	css_get(&memcg->css);
 	commit_charge(folio, memcg);
 
-	local_irq_disable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_disable();
 	mem_cgroup_charge_statistics(memcg, nr_pages);
 	memcg_check_events(memcg, folio_nid(folio));
-	local_irq_enable();
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_enable();
 out:
 	return ret;
 }
@@ -6785,11 +6818,20 @@ static void uncharge_batch(const struct uncharge_gather *ug)
 		memcg_oom_recover(ug->memcg);
 	}
 
-	local_irq_save(flags);
-	__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
-	__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
-	memcg_check_events(ug->memcg, ug->nid);
-	local_irq_restore(flags);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT)) {
+		local_irq_save(flags);
+		__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
+		__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
+		memcg_check_events(ug->memcg, ug->nid);
+		local_irq_restore(flags);
+	} else {
+		preempt_disable();
+		__count_memcg_events(ug->memcg, PGPGOUT, ug->pgpgout);
+		__this_cpu_add(ug->memcg->vmstats_percpu->nr_page_events, ug->nr_memory);
+		preempt_enable();
+
+		memcg_check_events(ug->memcg, ug->nid);
+	}
 
 	/* drop reference from uncharge_folio */
 	css_put(&ug->memcg->css);
@@ -6930,10 +6972,12 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
 	css_get(&memcg->css);
 	commit_charge(new, memcg);
 
-	local_irq_save(flags);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_save(flags);
 	mem_cgroup_charge_statistics(memcg, nr_pages);
 	memcg_check_events(memcg, folio_nid(new));
-	local_irq_restore(flags);
+	if (!IS_ENABLED(CONFIG_PREEMPT_RT))
+		local_irq_restore(flags);
 }
 
 DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
@@ -7157,8 +7201,10 @@ void mem_cgroup_swapout(struct page *page, swp_entry_t entry)
 	 * i_pages lock which is taken with interrupts-off. It is
 	 * important here to have the interrupts disabled because it is the
 	 * only synchronisation we have for updating the per-CPU variables.
+	 * On PREEMPT_RT interrupts are never disabled and the updates to per-CPU
+	 * variables are synchronised by keeping preemption disabled.
 	 */
-	VM_BUG_ON(!irqs_disabled());
+	VM_BUG_ON(!IS_ENABLED(CONFIG_PREEMPT_RT) && !irqs_disabled());
 	mem_cgroup_charge_statistics(memcg, -nr_entries);
 	memcg_check_events(memcg, page_to_nid(page));