From patchwork Fri Jun 21 01:08:41 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: Tetsuo Handa
X-Patchwork-Id: 13706639
Date: Fri, 21 Jun 2024 10:08:41 +0900
To: Axel Rasmussen, linux-mm
Cc: Andrew Morton, Nicolas Saenz Julienne, LKML
From: Tetsuo Handa
Subject: [PATCH] mm: mmap_lock: replace get_memcg_path_buf() with on-stack buffer

Commit 2b5067a8143e ("mm: mmap_lock: add tracepoints around lock
acquisition") introduced TRACE_MMAP_LOCK_EVENT() macro using
preempt_disable() in order to let get_mm_memcg_path() return a percpu
buffer exclusively used by normal, softirq, irq and NMI contexts
respectively.

Commit 832b50725373 ("mm: mmap_lock: use local locks instead of disabling
preemption") replaced preempt_disable() with local_lock(&memcg_paths.lock)
based on an argument that preempt_disable() has to be avoided because
get_mm_memcg_path() might sleep if PREEMPT_RT=y.

But syzbot started reporting

  inconsistent {HARDIRQ-ON-W} -> {IN-HARDIRQ-W} usage.

and

  inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.

messages, for local_lock() does not disable IRQ.

We could replace local_lock() with local_lock_irqsave() in order to
suppress these messages. But this patch instead replaces percpu buffers
with on-stack buffer, for the size of each buffer returned by
get_memcg_path_buf() is only 256 bytes which is tolerable for allocating
from current thread's kernel stack memory.

Reported-by: syzbot
Closes: https://syzkaller.appspot.com/bug?extid=40905bca570ae6784745
Fixes: 832b50725373 ("mm: mmap_lock: use local locks instead of disabling preemption")
Signed-off-by: Tetsuo Handa
Reviewed-by: Axel Rasmussen
---
Only compile tested.

 mm/mmap_lock.c | 175 ++++++-------------------------------------------
 1 file changed, 20 insertions(+), 155 deletions(-)

diff --git a/mm/mmap_lock.c b/mm/mmap_lock.c
index 1854850b4b89..368b840e7508 100644
--- a/mm/mmap_lock.c
+++ b/mm/mmap_lock.c
@@ -19,14 +19,7 @@ EXPORT_TRACEPOINT_SYMBOL(mmap_lock_released);
 
 #ifdef CONFIG_MEMCG
 
-/*
- * Our various events all share the same buffer (because we don't want or need
- * to allocate a set of buffers *per event type*), so we need to protect against
- * concurrent _reg() and _unreg() calls, and count how many _reg() calls have
- * been made.
- */
-static DEFINE_MUTEX(reg_lock);
-static int reg_refcount; /* Protected by reg_lock. */
+static atomic_t reg_refcount;
 
 /*
  * Size of the buffer for memcg path names. Ignoring stack trace support,
@@ -34,136 +27,22 @@ static int reg_refcount; /* Protected by reg_lock. */
  */
 #define MEMCG_PATH_BUF_SIZE MAX_FILTER_STR_VAL
 
-/*
- * How many contexts our trace events might be called in: normal, softirq, irq,
- * and NMI.
- */
-#define CONTEXT_COUNT 4
-
-struct memcg_path {
-	local_lock_t lock;
-	char __rcu *buf;
-	local_t buf_idx;
-};
-static DEFINE_PER_CPU(struct memcg_path, memcg_paths) = {
-	.lock = INIT_LOCAL_LOCK(lock),
-	.buf_idx = LOCAL_INIT(0),
-};
-
-static char **tmp_bufs;
-
-/* Called with reg_lock held. */
-static void free_memcg_path_bufs(void)
-{
-	struct memcg_path *memcg_path;
-	int cpu;
-	char **old = tmp_bufs;
-
-	for_each_possible_cpu(cpu) {
-		memcg_path = per_cpu_ptr(&memcg_paths, cpu);
-		*(old++) = rcu_dereference_protected(memcg_path->buf,
-			lockdep_is_held(&reg_lock));
-		rcu_assign_pointer(memcg_path->buf, NULL);
-	}
-
-	/* Wait for inflight memcg_path_buf users to finish. */
-	synchronize_rcu();
-
-	old = tmp_bufs;
-	for_each_possible_cpu(cpu) {
-		kfree(*(old++));
-	}
-
-	kfree(tmp_bufs);
-	tmp_bufs = NULL;
-}
-
 int trace_mmap_lock_reg(void)
 {
-	int cpu;
-	char *new;
-
-	mutex_lock(&reg_lock);
-
-	/* If the refcount is going 0->1, proceed with allocating buffers. */
-	if (reg_refcount++)
-		goto out;
-
-	tmp_bufs = kmalloc_array(num_possible_cpus(), sizeof(*tmp_bufs),
-				 GFP_KERNEL);
-	if (tmp_bufs == NULL)
-		goto out_fail;
-
-	for_each_possible_cpu(cpu) {
-		new = kmalloc(MEMCG_PATH_BUF_SIZE * CONTEXT_COUNT, GFP_KERNEL);
-		if (new == NULL)
-			goto out_fail_free;
-		rcu_assign_pointer(per_cpu_ptr(&memcg_paths, cpu)->buf, new);
-		/* Don't need to wait for inflights, they'd have gotten NULL. */
-	}
-
-out:
-	mutex_unlock(&reg_lock);
+	atomic_inc(&reg_refcount);
 	return 0;
-
-out_fail_free:
-	free_memcg_path_bufs();
-out_fail:
-	/* Since we failed, undo the earlier ref increment. */
-	--reg_refcount;
-
-	mutex_unlock(&reg_lock);
-	return -ENOMEM;
 }
 
 void trace_mmap_lock_unreg(void)
 {
-	mutex_lock(&reg_lock);
-
-	/* If the refcount is going 1->0, proceed with freeing buffers. */
-	if (--reg_refcount)
-		goto out;
-
-	free_memcg_path_bufs();
-
-out:
-	mutex_unlock(&reg_lock);
-}
-
-static inline char *get_memcg_path_buf(void)
-{
-	struct memcg_path *memcg_path = this_cpu_ptr(&memcg_paths);
-	char *buf;
-	int idx;
-
-	rcu_read_lock();
-	buf = rcu_dereference(memcg_path->buf);
-	if (buf == NULL) {
-		rcu_read_unlock();
-		return NULL;
-	}
-	idx = local_add_return(MEMCG_PATH_BUF_SIZE, &memcg_path->buf_idx) -
-	      MEMCG_PATH_BUF_SIZE;
-	return &buf[idx];
+	atomic_dec(&reg_refcount);
 }
 
-static inline void put_memcg_path_buf(void)
-{
-	local_sub(MEMCG_PATH_BUF_SIZE, &this_cpu_ptr(&memcg_paths)->buf_idx);
-	rcu_read_unlock();
-}
-
-#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)                                   \
-	do {                                                                   \
-		const char *memcg_path;                                        \
-		local_lock(&memcg_paths.lock);                                 \
-		memcg_path = get_mm_memcg_path(mm);                            \
-		trace_mmap_lock_##type(mm,                                     \
-				       memcg_path != NULL ? memcg_path : "",   \
-				       ##__VA_ARGS__);                         \
-		if (likely(memcg_path != NULL))                                \
-			put_memcg_path_buf();                                  \
-		local_unlock(&memcg_paths.lock);                               \
+#define TRACE_MMAP_LOCK_EVENT(type, mm, ...)				\
+	do {								\
+		char buf[MEMCG_PATH_BUF_SIZE];				\
+		get_mm_memcg_path(mm, buf, sizeof(buf));		\
+		trace_mmap_lock_##type(mm, buf, ##__VA_ARGS__);		\
 	} while (0)
 
 #else /* !CONFIG_MEMCG */
@@ -185,37 +64,23 @@ void trace_mmap_lock_unreg(void)
 #ifdef CONFIG_TRACING
 #ifdef CONFIG_MEMCG
 /*
- * Write the given mm_struct's memcg path to a percpu buffer, and return a
- * pointer to it. If the path cannot be determined, or no buffer was available
- * (because the trace event is being unregistered), NULL is returned.
- *
- * Note: buffers are allocated per-cpu to avoid locking, so preemption must be
- * disabled by the caller before calling us, and re-enabled only after the
- * caller is done with the pointer.
- *
- * The caller must call put_memcg_path_buf() once the buffer is no longer
- * needed. This must be done while preemption is still disabled.
+ * Write the given mm_struct's memcg path to a buffer. If the path cannot be
+ * determined or the trace event is being unregistered, empty string is written.
  */
-static const char *get_mm_memcg_path(struct mm_struct *mm)
+static void get_mm_memcg_path(struct mm_struct *mm, char *buf, size_t buflen)
 {
-	char *buf = NULL;
-	struct mem_cgroup *memcg = get_mem_cgroup_from_mm(mm);
+	struct mem_cgroup *memcg;
 
+	buf[0] = '\0';
+	/* No need to get path if no trace event is registered. */
+	if (!atomic_read(&reg_refcount))
+		return;
+	memcg = get_mem_cgroup_from_mm(mm);
 	if (memcg == NULL)
-		goto out;
-	if (unlikely(memcg->css.cgroup == NULL))
-		goto out_put;
-
-	buf = get_memcg_path_buf();
-	if (buf == NULL)
-		goto out_put;
-
-	cgroup_path(memcg->css.cgroup, buf, MEMCG_PATH_BUF_SIZE);
-
-out_put:
+		return;
+	if (memcg->css.cgroup)
+		cgroup_path(memcg->css.cgroup, buf, buflen);
 	css_put(&memcg->css);
-out:
-	return buf;
 }
 #endif /* CONFIG_MEMCG */
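
Note for readers: the pattern this patch moves to can be sketched in
userspace C. The sketch below is only an analogy under stated assumptions,
not kernel code. PATH_BUF_SIZE, consumer_reg(), consumer_unreg(), get_path(),
emit_event() and the "/demo/" prefix are hypothetical names made up for
illustration; C11 atomic_int stands in for the kernel's atomic_t, and
snprintf() stands in for cgroup_path().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for MEMCG_PATH_BUF_SIZE (256 bytes per the commit message). */
#define PATH_BUF_SIZE 256

/* Counts registered consumers; plays the role of reg_refcount in the patch. */
static atomic_int reg_refcount;

void consumer_reg(void)
{
	atomic_fetch_add(&reg_refcount, 1);
}

void consumer_unreg(void)
{
	atomic_fetch_sub(&reg_refcount, 1);
}

/*
 * Write a path for `name` into the caller-supplied buffer. The buffer is
 * always NUL-terminated on return; it holds an empty string when no
 * consumer is registered or when `name` is NULL.
 */
void get_path(const char *name, char *buf, size_t buflen)
{
	assert(buf != NULL && buflen > 0);
	buf[0] = '\0';
	/* Skip the work entirely if nobody is listening. */
	if (atomic_load(&reg_refcount) == 0)
		return;
	if (name == NULL)
		return;
	/* snprintf() truncates to buflen - 1 characters plus the NUL. */
	snprintf(buf, buflen, "/demo/%s", name);
}

/*
 * Each call owns a buffer in its own stack frame, so no lock or per-context
 * index is needed to keep nested callers from overwriting each other.
 * Returns the length of the path that would be reported.
 */
size_t emit_event(const char *name)
{
	char buf[PATH_BUF_SIZE];

	get_path(name, buf, sizeof(buf));
	return strlen(buf);
}
```

Because the buffer lives in the caller's frame, there is nothing shared to
protect, which is why the sketch needs no lock at all. The cost is
PATH_BUF_SIZE bytes of stack per call, which the commit message above judges
tolerable for a 256-byte buffer.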