From patchwork Sat Mar 23 16:11:19 2024
X-Patchwork-Id: 13600685
From: Christian Brauner
To: Christoph Hellwig, Jan Kara, Jens Axboe
Cc: Christian Brauner, Matthew Wilcox, linux-block@vger.kernel.org
Subject: [PATCH 1/2] block: handle BLK_OPEN_RESTRICT_WRITES correctly
Date: Sat, 23 Mar 2024 17:11:19 +0100
Message-ID: <20240323-zielbereich-mittragen-6fdf14876c3e@brauner>
In-Reply-To: <20240323-seide-erbrachten-5c60873fadc1@brauner>
References: <20240323-seide-erbrachten-5c60873fadc1@brauner>
X-Mailing-List: linux-block@vger.kernel.org

In the last kernel release we introduced CONFIG_BLK_DEV_WRITE_MOUNTED.
This option is enabled by default. When it is enabled, the long-standing
behavior of being able to write to mounted block devices is preserved.

But in order to guard against unintended corruption caused by writing
to the block device buffer cache, CONFIG_BLK_DEV_WRITE_MOUNTED can be
turned off. In that case it is no longer possible to write to mounted
block devices.

A filesystem may open its block devices with BLK_OPEN_RESTRICT_WRITES,
which disallows concurrent BLK_OPEN_WRITE access.
When we still had the bdev handle around, we could recognize
BLK_OPEN_RESTRICT_WRITES because the mode was passed around. Since we
got rid of the bdev handle, we changed that logic to recognize
BLK_OPEN_RESTRICT_WRITES based on whether the file was opened writable
and writes to that block device are blocked. That logic doesn't work
because BLK_OPEN_RESTRICT_WRITES may be specified without
BLK_OPEN_WRITE: such a file lacks FMODE_WRITE, so
bdev_yield_write_access() never unblocks writes when it is released.

So fix the detection logic. Use O_EXCL as an indicator that
BLK_OPEN_RESTRICT_WRITES has been requested. We do the exact same thing
for pidfds, where O_EXCL marks a pidfd that refers to a thread. O_EXCL
is never retained on userspace open paths, but for internal opens of
files that are never installed into a file descriptor table this is
fine.

Note that BLK_OPEN_RESTRICT_WRITES is an internal-only flag that
userspace cannot raise directly. It is raised implicitly during
mounting.

Passes xfstests and blktests with CONFIG_BLK_DEV_WRITE_MOUNTED both set
and unset.

Fixes: 321de651fa56 ("block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access")
Reported-by: Matthew Wilcox
Link: https://lore.kernel.org/r/ZfyyEwu9Uq5Pgb94@casper.infradead.org
Signed-off-by: Christian Brauner
Reviewed-by: Yu Kuai
Reviewed-by: Jan Kara
---
 block/bdev.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index 7a5f611c3d2e..f819f3086905 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -821,13 +821,12 @@ static void bdev_yield_write_access(struct file *bdev_file)
 		return;
 
 	bdev = file_bdev(bdev_file);
-	/* Yield exclusive or shared write access. */
-	if (bdev_file->f_mode & FMODE_WRITE) {
-		if (bdev_writes_blocked(bdev))
-			bdev_unblock_writes(bdev);
-		else
-			bdev->bd_writers--;
-	}
+
+	/* O_EXCL is only set for internal BLK_OPEN_RESTRICT_WRITES. */
+	if (bdev_file->f_flags & O_EXCL)
+		bdev_unblock_writes(bdev);
+	else if (bdev_file->f_mode & FMODE_WRITE)
+		bdev->bd_writers--;
 }
 
 /**
@@ -946,6 +945,13 @@ static unsigned blk_to_file_flags(blk_mode_t mode)
 	else
 		WARN_ON_ONCE(true);
 
+	/*
+	 * BLK_OPEN_RESTRICT_WRITES is never set from userspace and
+	 * O_EXCL is stripped from userspace.
+	 */
+	if (mode & BLK_OPEN_RESTRICT_WRITES)
+		flags |= O_EXCL;
+
 	if (mode & BLK_OPEN_NDELAY)
 		flags |= O_NDELAY;
 

From patchwork Sat Mar 23 16:11:20 2024
X-Patchwork-Id: 13600686
From: Christian Brauner
To: Christoph Hellwig, Jan Kara, Jens Axboe
Cc: Christian Brauner, Matthew Wilcox, linux-block@vger.kernel.org
Subject: [PATCH 2/2] [RFC]: block: count BLK_OPEN_RESTRICT_WRITES openers
Date: Sat, 23 Mar 2024 17:11:20 +0100
Message-ID: <20240323-abtauchen-klauen-c2953810082d@brauner>
In-Reply-To: <20240323-zielbereich-mittragen-6fdf14876c3e@brauner>
References: <20240323-seide-erbrachten-5c60873fadc1@brauner> <20240323-zielbereich-mittragen-6fdf14876c3e@brauner>

The original changes in v6.8 allow a block device to be reopened with
BLK_OPEN_RESTRICT_WRITES provided the same holder is used, as per
bdev_may_open(). I think that logic may have a bug.

The first opener @f1 of that block device will set bdev->bd_writers to
-1. The second opener @f2 using the same holder will pass the check in
bdev_may_open() that bdev->bd_writers must not be greater than zero.

The first opener @f1 now closes the block device, and bdev_release()
ends up calling bdev_yield_write_access(), which calls
bdev_writes_blocked() and then resets bdev->bd_writers to 0.

Now @f2 holds a file to that block device which was opened with
exclusive write access, but bdev->bd_writers has been reset to 0.

So now @f3 comes along and succeeds in opening the block device with
BLK_OPEN_WRITE, betraying @f2's request for exclusive write access.

This isn't a practical issue yet because AFAICT there's no codepath
inside the kernel that reopens the same block device with
BLK_OPEN_RESTRICT_WRITES, but it will become one if such a codepath is
ever added.

If that's right, fix this by counting the number of
BLK_OPEN_RESTRICT_WRITES openers, so that writes are only allowed again
once all BLK_OPEN_RESTRICT_WRITES openers are done.

Fixes: ed5cc702d311 ("block: Add config option to not allow writing to mounted devices")
Signed-off-by: Christian Brauner
Reviewed-by: Jan Kara
---
 block/bdev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/block/bdev.c b/block/bdev.c
index f819f3086905..42f84692404c 100644
--- a/block/bdev.c
+++ b/block/bdev.c
@@ -776,17 +776,17 @@ void blkdev_put_no_open(struct block_device *bdev)
 
 static bool bdev_writes_blocked(struct block_device *bdev)
 {
-	return bdev->bd_writers == -1;
+	return bdev->bd_writers < 0;
 }
 
 static void bdev_block_writes(struct block_device *bdev)
 {
-	bdev->bd_writers = -1;
+	bdev->bd_writers--;
 }
 
 static void bdev_unblock_writes(struct block_device *bdev)
 {
-	bdev->bd_writers = 0;
+	bdev->bd_writers++;
 }
 
 static bool bdev_may_open(struct block_device *bdev, blk_mode_t mode)