From patchwork Tue Nov 12 08:34:14 2024
X-Patchwork-Submitter: Kairui Song
X-Patchwork-Id: 13871885
From: Kairui Song
To: linux-mm@kvack.org
Cc: Andrew Morton, Chris Li, "Huang, Ying", Barry Song, Ryan Roberts,
 Hugh Dickins, Kalesh Singh, linux-kernel@vger.kernel.org, Kairui Song
Subject: [PATCH] mm, swap: fix allocation and scanning race with swapoff
Date: Tue, 12 Nov 2024 16:34:14 +0800
Message-ID: <20241112083414.78174-1-ryncsn@gmail.com>
X-Mailer: git-send-email 2.47.0
Reply-To: Kairui Song
MIME-Version: 1.0
From: Kairui Song

There are two flags used to synchronize allocation and scanning with
swapoff: SWP_WRITEOK and SWP_SCANNING.

SWP_WRITEOK: Swapoff first unsets this flag; from that point on, any
further swap allocation or scanning on this device should simply abort,
so no new entries will reference this device. Swapoff then unuses all
existing swap entries.

SWP_SCANNING: This flag is set while the device is being scanned.
Swapoff waits for all scanners to stop before the final release of the
swap device structures, to avoid a use-after-free. Note this flag is
the highest used bit of si->flags, so it can be added arithmetically
when there are multiple scanners.

Commit 5f843a9a3a1e ("mm: swap: separate SSD allocation from
scan_swap_map_slots()") ignored the SWP_SCANNING and SWP_WRITEOK flags
while separating the cluster allocation path from the old allocation
path. Add the flags back to fix the swapoff race. The race is hard to
trigger because si->lock prevents most parallel operations, but
si->lock can be dropped for reclaim or discard. This issue was found
during code review.

This commit fixes the problem. For SWP_SCANNING, just as before, set
the flag before scanning and clear it afterwards.

For SWP_WRITEOK, there are several places where si->lock could be
dropped; covering them one by one would be error-prone and make the
code hard to follow. So just do one check before the real allocation,
which is also very similar to the old behavior. With the new cluster
allocator this may waste a bit of time iterating the clusters, but it
won't take long, and swapoff is not performance sensitive.
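To illustrate the SWP_SCANNING accounting described above, here is a
small standalone sketch (userspace C; the flag values below are made up
for illustration, the real definitions live in include/linux/swap.h,
and the real code additionally runs under si->lock): because
SWP_SCANNING is the highest used bit of si->flags, each scanner can add
it arithmetically without clobbering the lower flag bits, and swapoff
only needs to wait until si->flags drops back below SWP_SCANNING.

/*
 * Standalone illustration only -- not kernel code. The flag values are
 * assumptions chosen for the example.
 */
#include <assert.h>
#include <stdio.h>

#define FAKE_SWP_WRITEOK	(1UL << 1)
#define FAKE_SWP_SCANNING	(1UL << 14)	/* assumed highest used bit */

int main(void)
{
	unsigned long flags = FAKE_SWP_WRITEOK;

	/* Two scanners enter: the high bit acts as an arithmetic refcount. */
	flags += FAKE_SWP_SCANNING;
	flags += FAKE_SWP_SCANNING;
	assert(flags >= FAKE_SWP_SCANNING);	/* swapoff must keep waiting */

	/* One scanner leaves, one is still active. */
	flags -= FAKE_SWP_SCANNING;
	assert(flags >= FAKE_SWP_SCANNING);

	/* Last scanner leaves; swapoff may release the device structures. */
	flags -= FAKE_SWP_SCANNING;
	assert(flags < FAKE_SWP_SCANNING);

	/* Lower flag bits are untouched throughout. */
	assert(flags == FAKE_SWP_WRITEOK);

	printf("scanner accounting via the highest flag bit works\n");
	return 0;
}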
Reported-by: "Huang, Ying" Closes: https://lore.kernel.org/linux-mm/87a5es3f1f.fsf@yhuang6-desk2.ccr.corp.intel.com/ Fixes: 5f843a9a3a1e ("mm: swap: separate SSD allocation from scan_swap_map_slots()") Signed-off-by: Kairui Song Reviewed-by: "Huang, Ying" --- mm/swapfile.c | 22 +++++++++++++++++++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/mm/swapfile.c b/mm/swapfile.c index 9c85bd46ab7f..b0a9071cfe1d 100644 --- a/mm/swapfile.c +++ b/mm/swapfile.c @@ -664,12 +664,15 @@ static bool cluster_scan_range(struct swap_info_struct *si, return true; } -static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster_info *ci, +static bool cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster_info *ci, unsigned int start, unsigned char usage, unsigned int order) { unsigned int nr_pages = 1 << order; + if (!(si->flags & SWP_WRITEOK)) + return false; + if (cluster_is_free(ci)) { if (nr_pages < SWAPFILE_CLUSTER) { list_move_tail(&ci->list, &si->nonfull_clusters[order]); @@ -690,6 +693,8 @@ static void cluster_alloc_range(struct swap_info_struct *si, struct swap_cluster list_move_tail(&ci->list, &si->full_clusters); ci->flags = CLUSTER_FLAG_FULL; } + + return true; } static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigned long offset, @@ -713,7 +718,10 @@ static unsigned int alloc_swap_scan_cluster(struct swap_info_struct *si, unsigne while (offset <= end) { if (cluster_scan_range(si, ci, offset, nr_pages)) { - cluster_alloc_range(si, ci, offset, usage, order); + if (!cluster_alloc_range(si, ci, offset, usage, order)) { + offset = SWAP_NEXT_INVALID; + goto done; + } *foundp = offset; if (ci->count == SWAPFILE_CLUSTER) { offset = SWAP_NEXT_INVALID; @@ -805,7 +813,11 @@ static unsigned long cluster_alloc_swap_entry(struct swap_info_struct *si, int o if (!list_empty(&si->free_clusters)) { ci = list_first_entry(&si->free_clusters, struct swap_cluster_info, list); offset = alloc_swap_scan_cluster(si, cluster_offset(si, ci), &found, order, usage); - VM_BUG_ON(!found); + /* + * Either we didn't touch the cluster due to swapoff, + * or the allocation must success. + */ + VM_BUG_ON((si->flags & SWP_WRITEOK) && !found); goto done; } @@ -1041,6 +1053,8 @@ static int cluster_alloc_swap(struct swap_info_struct *si, VM_BUG_ON(!si->cluster_info); + si->flags += SWP_SCANNING; + while (n_ret < nr) { unsigned long offset = cluster_alloc_swap_entry(si, order, usage); @@ -1049,6 +1063,8 @@ static int cluster_alloc_swap(struct swap_info_struct *si, slots[n_ret++] = swp_entry(si->type, offset); } + si->flags -= SWP_SCANNING; + return n_ret; }