From patchwork Fri Sep 30 14:06:48 2022
X-Patchwork-Submitter: Alexander Fedorov
X-Patchwork-Id: 12995474
Message-Id: <1664546131660.1777662787.1655319815@gmail.com>
From: Alexander Fedorov
To: Johannes Weiner, Michal Hocko, Roman Gushchin, Shakeel Butt,
    Vladimir Davydov, Muchun Song, Sebastian Andrzej Siewior
Cc: cgroups@vger.kernel.org, linux-mm@kvack.org
Subject: Possible race in obj_stock_flush_required() vs drain_obj_stock()
Date: Fri, 30 Sep 2022 14:06:48 +0000

Hi, reposting this to the mainline list as requested and with an updated patch.

I've encountered a race on kernel version 5.10.131-rt72 when running the LTP
cgroup_fj_stress_memory* tests and need help understanding the synchronization
in mm/memcontrol.c; it is really non-trivial. I have also checked the patches
in the latest mainline kernel but do not see anything that addresses this
problem. The realtime patch set does not seem to be the culprit either: it
only changes preemption disabling to migration disabling and irq_disable to
local_lock.

The race goes as follows:

1) First CPU:
   css_killed_work_fn() -> mem_cgroup_css_offline() -> drain_all_stock() ->
   obj_stock_flush_required()

	if (stock->cached_objcg) {

   This check sees a non-NULL pointer for *another* CPU's `memcg_stock`
   instance.

2) Second CPU:
   css_free_rwork_fn() -> __mem_cgroup_free() -> free_percpu() ->
   obj_cgroup_uncharge() -> drain_obj_stock()

   It frees the `cached_objcg` pointer in its own `memcg_stock` instance:

	struct obj_cgroup *old = stock->cached_objcg;
	< ... >
	obj_cgroup_put(old);
	stock->cached_objcg = NULL;

3) First CPU continues past the 'if' check and re-reads the pointer, which is
   now NULL, and dereferencing it leads to a kernel panic:

	static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
					     struct mem_cgroup *root_memcg)
	{
		< ... >
		if (stock->cached_objcg) {
			memcg = obj_cgroup_memcg(stock->cached_objcg);
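To make the check-then-use pattern easier to see in isolation, here is a
minimal user-space sketch of the same two-reads race (ordinary C11 + pthreads,
NOT kernel code; the struct and function names are made up for illustration).
The writer thread plays the role of drain_obj_stock() clearing the pointer;
the racy reader checks the pointer and then dereferences a second, fresh read,
while the safe reader uses a single snapshot:

/*
 * Minimal user-space sketch of the check-then-use race (not kernel code;
 * all names here are made up).  Build: gcc -O2 -pthread race.c
 * Run with any argument to use the racy reader instead of the safe one.
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

struct objcg { int id; };

static struct objcg the_obj = { .id = 42 };
static _Atomic(struct objcg *) shared_objcg = &the_obj;
static volatile long sink;	/* keeps the reads from being optimized away */

static void *writer(void *arg)		/* plays the drain_obj_stock() role */
{
	for (long i = 0; i < 10000000; i++) {
		atomic_store(&shared_objcg, (struct objcg *)NULL);
		atomic_store(&shared_objcg, &the_obj);
	}
	return NULL;
}

static void *racy_reader(void *arg)	/* two reads: check one value, use another */
{
	long sum = 0;

	for (long i = 0; i < 10000000; i++) {
		if (atomic_load(&shared_objcg))			/* sees non-NULL ... */
			sum += atomic_load(&shared_objcg)->id;	/* ... may be NULL now */
	}
	sink = sum;
	return NULL;
}

static void *safe_reader(void *arg)	/* one snapshot, checked and then used */
{
	long sum = 0;

	for (long i = 0; i < 10000000; i++) {
		struct objcg *snap = atomic_load(&shared_objcg);

		if (snap)
			sum += snap->id;
	}
	sink = sum;
	return NULL;
}

int main(int argc, char **argv)
{
	pthread_t w, r;
	void *(*reader)(void *) = argc > 1 ? racy_reader : safe_reader;

	pthread_create(&w, NULL, writer, NULL);
	pthread_create(&r, NULL, reader, NULL);
	pthread_join(w, NULL);
	pthread_join(r, NULL);
	puts("done");
	return 0;
}

In my understanding the racy variant can segfault (typically within a few
runs) exactly because the value that passed the NULL check is not the value
that gets dereferenced, while the snapshot variant never does.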
Since it seems that the `cached_objcg` field is supposed to be protected by
RCU, I've also tried the patch below (against 5.10.131-rt72) and confirmed
that it appears to fix the problem (at least the same LTP tests now finish
successfully), but I am not sure it is the right fix. The patch does not apply
cleanly to the mainline kernel, but I can try rebasing it if needed.

mm/memcg: fix race in obj_stock_flush_required() vs drain_obj_stock()

When obj_stock_flush_required() is called from drain_all_stock() it reads
the `memcg_stock->cached_objcg` field twice for another CPU's per-cpu
variable, leading to a TOCTTOU race: that other CPU can simultaneously
enter drain_obj_stock() and clear its own instance of
`memcg_stock->cached_objcg`.

To fix it, mark `cached_objcg` as an RCU-protected field and use the
rcu_*() accessors everywhere.

Signed-off-by: Alexander Fedorov

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index c1152f8747..2ab205f529 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -2215,7 +2215,7 @@ struct memcg_stock_pcp {
 	unsigned int nr_pages;
 
 #ifdef CONFIG_MEMCG_KMEM
-	struct obj_cgroup *cached_objcg;
+	struct obj_cgroup __rcu *cached_objcg;
 	unsigned int nr_bytes;
 #endif
 
@@ -3148,7 +3148,7 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	local_lock_irqsave(&memcg_stock.lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	if (objcg == stock->cached_objcg && stock->nr_bytes >= nr_bytes) {
+	if (objcg == rcu_access_pointer(stock->cached_objcg) && stock->nr_bytes >= nr_bytes) {
 		stock->nr_bytes -= nr_bytes;
 		ret = true;
 	}
@@ -3160,7 +3160,8 @@ static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 
 static void drain_obj_stock(struct memcg_stock_pcp *stock)
 {
-	struct obj_cgroup *old = stock->cached_objcg;
+	struct obj_cgroup *old = rcu_replace_pointer(stock->cached_objcg, NULL,
+						     lockdep_is_held(&stock->lock));
 
 	if (!old)
 		return;
@@ -3198,16 +3199,16 @@ static void drain_obj_stock(struct memcg_stock_pcp *stock)
 	}
 
 	obj_cgroup_put(old);
-	stock->cached_objcg = NULL;
 }
 
 static bool obj_stock_flush_required(struct memcg_stock_pcp *stock,
 				     struct mem_cgroup *root_memcg)
 {
 	struct mem_cgroup *memcg;
+	struct obj_cgroup *cached_objcg = rcu_dereference(stock->cached_objcg);
 
-	if (stock->cached_objcg) {
-		memcg = obj_cgroup_memcg(stock->cached_objcg);
+	if (cached_objcg) {
+		memcg = obj_cgroup_memcg(cached_objcg);
 		if (memcg && mem_cgroup_is_descendant(memcg, root_memcg))
 			return true;
 	}
@@ -3223,11 +3224,11 @@ static void refill_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
 	local_lock_irqsave(&memcg_stock.lock, flags);
 
 	stock = this_cpu_ptr(&memcg_stock);
-	if (stock->cached_objcg != objcg) { /* reset if necessary */
+	if (rcu_access_pointer(stock->cached_objcg) != objcg) { /* reset if necessary */
 		drain_obj_stock(stock);
 		obj_cgroup_get(objcg);
-		stock->cached_objcg = objcg;
 		stock->nr_bytes = atomic_xchg(&objcg->nr_charged_bytes, 0);
+		rcu_assign_pointer(stock->cached_objcg, objcg);
 	}
 	stock->nr_bytes += nr_bytes;
 
@@ -7162,6 +7163,7 @@ static int __init mem_cgroup_init(void)
 
 		stock = per_cpu_ptr(&memcg_stock, cpu);
 		INIT_WORK(&stock->work, drain_local_stock);
+		RCU_INIT_POINTER(stock->cached_objcg, NULL);
 		local_lock_init(&stock->lock);
 	}
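For reference, since the patch leans on several different rcu_*() helpers,
here is a schematic of how I understand the accessors are meant to pair up
(kernel-style C, not buildable on its own; the example_* names are made up,
only the helpers themselves are the real APIs). The read side relies on
obj_stock_flush_required() already being called under rcu_read_lock() from
drain_all_stock():

/*
 * Schematic only: the RCU pointer-accessor pattern the patch applies to
 * memcg_stock_pcp::cached_objcg.  example_* names are invented for this
 * sketch; rcu_replace_pointer(), rcu_assign_pointer(), rcu_access_pointer(),
 * rcu_dereference() and RCU_INIT_POINTER() are the real kernel helpers.
 */
struct example_stock {
	struct obj_cgroup __rcu *cached_objcg;	/* __rcu: checked by sparse */
	local_lock_t lock;
};

/* Update side: runs with the per-cpu stock lock held. */
static void example_drain(struct example_stock *st)
{
	struct obj_cgroup *old;

	/* Atomically take the old value out and leave NULL behind. */
	old = rcu_replace_pointer(st->cached_objcg, NULL,
				  lockdep_is_held(&st->lock));
	if (old)
		obj_cgroup_put(old);
}

static void example_refill(struct example_stock *st, struct obj_cgroup *objcg)
{
	/* Compare the pointer value only; no dereference, no rcu_read_lock(). */
	if (rcu_access_pointer(st->cached_objcg) != objcg) {
		example_drain(st);
		obj_cgroup_get(objcg);
		/* Publish with release ordering once the entry is set up. */
		rcu_assign_pointer(st->cached_objcg, objcg);
	}
}

/* Read side: may look at another CPU's instance, inside rcu_read_lock(). */
static bool example_flush_required(struct example_stock *st,
				   struct mem_cgroup *root)
{
	struct obj_cgroup *objcg = rcu_dereference(st->cached_objcg);

	/* One snapshot: the value that is checked is the value that is used. */
	if (objcg) {
		struct mem_cgroup *memcg = obj_cgroup_memcg(objcg);

		if (memcg && mem_cgroup_is_descendant(memcg, root))
			return true;
	}
	return false;
}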