From patchwork Fri Nov 8 13:31:39 2024
From: Adrian Huang
To: Ingo Molnar, Peter Zijlstra, Juri Lelli, Vincent Guittot, Andrew Morton
Cc: Dietmar Eggemann, Steven Rostedt, Ben Segall, Mel Gorman,
    Valentin Schneider, Raghavendra K T, linux-mm@kvack.org,
    linux-kernel@vger.kernel.org, Adrian Huang, Jiwei Sun
Subject: [PATCH 1/1] sched/numa: Fix memory leak due to the overwritten vma->numab_state
Date: Fri, 8 Nov 2024 21:31:39 +0800
Message-Id: <20241108133139.25326-1-ahuang12@lenovo.com>

[Problem Description]
When running the hackbench program of LTP, the following memory leak is
reported by kmemleak.

  # /opt/ltp/testcases/bin/hackbench 20 thread 1000
  Running with 20*40 (== 800) tasks.

  # dmesg | grep kmemleak
  ...
  kmemleak: 480 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
  kmemleak: 665 new suspected memory leaks (see /sys/kernel/debug/kmemleak)

  # cat /sys/kernel/debug/kmemleak
  unreferenced object 0xffff888cd8ca2c40 (size 64):
    comm "hackbench", pid 17142, jiffies 4299780315
    hex dump (first 32 bytes):
      ac 74 49 00 01 00 00 00 4c 84 49 00 01 00 00 00  .tI.....L.I.....
      00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    backtrace (crc bff18fd4):
      [] __kmalloc_cache_noprof+0x2f9/0x3f0
      [] task_numa_work+0x725/0xa00
      [] task_work_run+0x58/0x90
      [] syscall_exit_to_user_mode+0x1c8/0x1e0
      [] do_syscall_64+0x85/0x150
      [] entry_SYSCALL_64_after_hwframe+0x76/0x7e
  ...

This issue can be consistently reproduced on three different servers:
  * a 448-core server
  * a 256-core server
  * a 192-core server

[Root Cause]
Since the hackbench program creates multiple threads when run with the
'thread' argument, a shared vma might be accessed by two or more cores
simultaneously. When two or more cores observe that vma->numab_state is
NULL at the same time, each of them allocates a new vma_numab_state and
stores it, so vma->numab_state is overwritten and the earlier allocations
are leaked.

Note that the command `/opt/ltp/testcases/bin/hackbench 50 process 1000`
cannot reproduce the issue because of fork() and COW: each process gets
its own mm and its own copies of the vmas, so no vma is shared between
tasks. This was verified with 200+ test runs.

[Solution]
Introduce a lock to ensure that the check and the allocation of
vma->numab_state are performed atomically.
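To make the root cause and the fix concrete, here is a minimal userspace sketch in C. It is a model under stated assumptions, not kernel code: `struct fake_vma`, `struct numab_state`, `scan_worker()` and the other helpers are hypothetical names, `pthread_mutex_trylock()` stands in for the kernel's `mutex_trylock()`, and `calloc()` stands in for `kzalloc()`. `leaked_by_racy_interleaving()` replays the unlocked interleaving described above step by step (so it is deterministic), while `allocations_with_trylock()` runs real threads through the trylock, check, allocate, unlock sequence that the patch adds.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MAX_WORKERS 64

/* Hypothetical stand-ins for struct vma_numab_state and struct vm_area_struct. */
struct numab_state {
	unsigned long start_scan_seq;
};

struct fake_vma {
	struct numab_state *numab_state;  /* lazily allocated, like vma->numab_state */
	pthread_mutex_t numab_state_lock; /* plays the role of the patch's mutex */
};

/* Replays the unlocked interleaving: both "cores" test the pointer before
 * either one stores, so the second store overwrites the first allocation.
 * Returns 1 if the first allocation became unreachable (i.e. leaked). */
static int leaked_by_racy_interleaving(void)
{
	struct numab_state *slot = NULL;     /* the shared vma->numab_state */
	struct numab_state *first = NULL, *second = NULL;

	int core0_saw_null = (slot == NULL); /* core 0: check */
	int core1_saw_null = (slot == NULL); /* core 1: check at the same moment */

	if (core0_saw_null) {                /* core 0: allocate and store */
		first = calloc(1, sizeof(*first));
		slot = first;
	}
	if (core1_saw_null) {                /* core 1: allocate and overwrite */
		second = calloc(1, sizeof(*second));
		slot = second;
	}

	int leaked = (first != NULL && slot != first);

	/* The demo frees both objects so that it does not leak itself. */
	free(first);
	free(second);
	return leaked;
}

static atomic_int allocations; /* successful allocations of numab_state */

/* One scanning thread, following the patch: trylock, check, allocate, unlock.
 * A thread that loses the trylock skips the VMA, like the patch's "continue". */
static void *scan_worker(void *arg)
{
	struct fake_vma *vma = arg;

	if (pthread_mutex_trylock(&vma->numab_state_lock) != 0)
		return NULL;

	if (!vma->numab_state) {
		vma->numab_state = calloc(1, sizeof(*vma->numab_state));
		if (vma->numab_state)
			atomic_fetch_add(&allocations, 1);
	}

	pthread_mutex_unlock(&vma->numab_state_lock);
	return NULL;
}

/* Runs nthreads workers against one shared VMA and returns how many
 * numab_state objects were allocated (1 means nothing was overwritten). */
static int allocations_with_trylock(int nthreads)
{
	struct fake_vma vma = { .numab_state = NULL };
	pthread_t tids[MAX_WORKERS];
	int created = 0;

	assert(nthreads > 0 && nthreads <= MAX_WORKERS);
	pthread_mutex_init(&vma.numab_state_lock, NULL);
	atomic_store(&allocations, 0);

	for (int i = 0; i < nthreads; i++)
		if (pthread_create(&tids[created], NULL, scan_worker, &vma) == 0)
			created++;
	for (int i = 0; i < created; i++)
		pthread_join(tids[i], NULL);

	pthread_mutex_destroy(&vma.numab_state_lock);
	free(vma.numab_state);
	return atomic_load(&allocations);
}
```

As in the patch, a thread that finds the lock already held does not wait; it skips the VMA for the current pass, mirroring the `continue` in task_numa_work(). Whichever thread wins the lock allocates, and every later winner sees a non-NULL pointer, so `allocations_with_trylock(8)` is expected to return 1. Build with a C11 compiler; older glibc versions may also need `-pthread`.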

Fixes: ef6a22b70f6d ("sched/numa: apply the scan delay to every new vma")
Reported-by: Jiwei Sun
Signed-off-by: Adrian Huang
---
 include/linux/mm.h       |  1 +
 include/linux/mm_types.h |  1 +
 kernel/sched/fair.c      | 17 ++++++++++++++++-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 61fff5d34ed5..a08e31ac53de 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -673,6 +673,7 @@ struct vm_operations_struct {
 static inline void vma_numab_state_init(struct vm_area_struct *vma)
 {
 	vma->numab_state = NULL;
+	mutex_init(&vma->numab_state_lock);
 }
 static inline void vma_numab_state_free(struct vm_area_struct *vma)
 {
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 6e3bdf8e38bc..77eee89a89f5 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -768,6 +768,7 @@ struct vm_area_struct {
 #endif
 #ifdef CONFIG_NUMA_BALANCING
 	struct vma_numab_state *numab_state;	/* NUMA Balancing state */
+	struct mutex numab_state_lock;		/* NUMA Balancing state lock */
 #endif
 	struct vm_userfaultfd_ctx vm_userfaultfd_ctx;
 } __randomize_layout;
diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index c157d4860a3b..53e6383cd94e 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -3397,12 +3397,24 @@ static void task_numa_work(struct callback_head *work)
 			continue;
 		}
 
+		/*
+		 * In case of the shared vma, the vma->numab_state will be
+		 * overwritten if two or more cores observe vma->numab_state
+		 * is NULL at the same time. Make sure that only one core
+		 * allocates memory for vma->numab_state. This can prevent
+		 * the memory leak.
+		 */
+		if (!mutex_trylock(&vma->numab_state_lock))
+			continue;
+
 		/* Initialise new per-VMA NUMAB state. */
 		if (!vma->numab_state) {
 			vma->numab_state = kzalloc(sizeof(struct vma_numab_state),
 				GFP_KERNEL);
-			if (!vma->numab_state)
+			if (!vma->numab_state) {
+				mutex_unlock(&vma->numab_state_lock);
 				continue;
+			}
 
 			vma->numab_state->start_scan_seq = mm->numa_scan_seq;
 
@@ -3428,6 +3440,7 @@ static void task_numa_work(struct callback_head *work)
 		if (mm->numa_scan_seq && time_before(jiffies,
 						vma->numab_state->next_scan)) {
 			trace_sched_skip_vma_numa(mm, vma, NUMAB_SKIP_SCAN_DELAY);
+			mutex_unlock(&vma->numab_state_lock);
 			continue;
 		}
 
@@ -3440,6 +3453,8 @@ static void task_numa_work(struct callback_head *work)
 			vma->numab_state->pids_active[1] = 0;
 		}
 
+		mutex_unlock(&vma->numab_state_lock);
+
 		/* Do not rescan VMAs twice within the same sequence. */
 		if (vma->numab_state->prev_scan_seq == mm->numa_scan_seq) {
 			mm->numa_scan_offset = vma->vm_end;