[1/2] xfs: flush inodegc before swapon


Message ID

20250206061507.2320090-2-hch@lst.de (mailing list archive)

State

Accepted, archived

Headers

From: Christoph Hellwig <hch@lst.de>
To: Carlos Maiolino <cem@kernel.org>
Cc: "Darrick J. Wong" <djwong@kernel.org>,
	Dave Chinner <david@fromorbit.com>,
	Zorro Lang <zlang@kernel.org>,
	linux-xfs@vger.kernel.org
Subject: [PATCH 1/2] xfs: flush inodegc before swapon
Date: Thu,  6 Feb 2025 07:15:00 +0100
Message-ID: <20250206061507.2320090-2-hch@lst.de>
In-Reply-To: <20250206061507.2320090-1-hch@lst.de>
References: <20250206061507.2320090-1-hch@lst.de>
Precedence: bulk
MIME-Version: 1.0
Content-Transfer-Encoding: 8bit

Series

[1/2] xfs: flush inodegc before swapon
[2/2] xfs: rename xfs_iomap_swapfile_activate to xfs_vm_swap_activate

Commit Message

hch Feb. 6, 2025, 6:15 a.m. UTC

Fix the brand new xfstest that tries to swapon on a recently unshared
file and use the chance to document the other bit of magic in this
function.

The big comment is taken from a mailing list post by Dave Chinner.

Fixes: 5e672cd69f0a53 ("xfs: introduce xfs_inodegc_push()")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: "Darrick J. Wong" <djwong@kernel.org>
---
 fs/xfs/xfs_aops.c | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)
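A toy user-space model may help make the race behind this fix concrete. It is a sketch under stated assumptions, not XFS code: `struct toy_fs`, `toy_clone()`, `toy_unlink_clone()`, `toy_inodegc_flush()` and `toy_swapon()` are hypothetical names, an integer array stands in for the refcount btree, and a small queue stands in for background inodegc. Cloning gives every extent a second owner; unlinking the clone only queues the refcount decrements, so a swapon check that runs before the queue drains still sees shared extents, while flushing first (as the patch does with `xfs_inodegc_flush()`) lets the check pass.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical toy model, not XFS code: an array of per-extent reference
 * counts stands in for the refcount btree, and a small queue of extent
 * indices stands in for deferred (background) inodegc work.
 */
#define TOY_NEXTENTS 4
#define TOY_MAXPENDING 16

struct toy_fs {
	unsigned int refcount[TOY_NEXTENTS]; /* 1 = exclusively owned */
	size_t pending[TOY_MAXPENDING];      /* extents awaiting a deferred decrement */
	size_t npending;
};

/* Reflink-style clone: every extent of the file gains a second owner. */
void toy_clone(struct toy_fs *fs)
{
	for (size_t i = 0; i < TOY_NEXTENTS; i++)
		fs->refcount[i]++;
}

/*
 * unlink() of the clone returns at once; the refcount decrements are only
 * queued, mimicking extent removal deferred to background inodegc.
 */
void toy_unlink_clone(struct toy_fs *fs)
{
	assert(fs->npending + TOY_NEXTENTS <= TOY_MAXPENDING);
	for (size_t i = 0; i < TOY_NEXTENTS; i++)
		fs->pending[fs->npending++] = i;
}

/* Drain the queue, applying every deferred refcount decrement. */
void toy_inodegc_flush(struct toy_fs *fs)
{
	for (size_t i = 0; i < fs->npending; i++)
		fs->refcount[fs->pending[i]]--;
	fs->npending = 0;
}

/*
 * Like the swapfile mapping walk: refuse activation if any extent is
 * still shared.  flush_first models the xfs_inodegc_flush() call.
 */
bool toy_swapon(struct toy_fs *fs, bool flush_first)
{
	if (flush_first)
		toy_inodegc_flush(fs);
	for (size_t i = 0; i < TOY_NEXTENTS; i++) {
		if (fs->refcount[i] > 1)
			return false;
	}
	return true;
}
```

Calling `toy_swapon(fs, false)` right after a clone and unlink returns false because every refcount is still 2; passing `flush_first = true` applies the queued decrements first and returns true. The real fix flushes unconditionally, accepting a possible wait on queued background work in exchange for a swapon whose result matches what userspace already believes about the file.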

Comments

Carlos Maiolino Feb. 11, 2025, 8:44 a.m. UTC

On Thu, 06 Feb 2025 07:15:00 +0100, Christoph Hellwig wrote:
> Fix the brand new xfstest that tries to swapon on a recently unshared
> file and use the chance to document the other bit of magic in this
> function.
> 
> The big comment is taken from a mailing list post by Dave Chinner.
> 
> 
> [...]

Applied to for-next, thanks!

[1/2] xfs: flush inodegc before swapon
      commit: 35010cc72acc468c98962f1056480a0a363eb1c3
[2/2] xfs: rename xfs_iomap_swapfile_activate to xfs_vm_swap_activate
      commit: 6f7ce473cca4952e4ac673f0fdf6dad2fac40324

Best regards,

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 69b8c2d1937d..9ff1a0b07303 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -21,6 +21,7 @@ 
 #include "xfs_error.h"
 #include "xfs_zone_alloc.h"
 #include "xfs_rtgroup.h"
+#include "xfs_icache.h"
 
 struct xfs_writepage_ctx {
 	struct iomap_writepage_ctx ctx;
@@ -685,7 +686,39 @@  xfs_iomap_swapfile_activate(
 	struct file			*swap_file,
 	sector_t			*span)
 {
-	sis->bdev = xfs_inode_buftarg(XFS_I(file_inode(swap_file)))->bt_bdev;
+	struct xfs_inode		*ip = XFS_I(file_inode(swap_file));
+
+	/*
+	 * Swap file activation can race against concurrent shared extent
+	 * removal in files that have been cloned.  If this happens,
+	 * iomap_swapfile_iter() can fail because it encountered a shared
+	 * extent even though an operation is in progress to remove those
+	 * shared extents.
+	 *
+	 * This race becomes problematic when we defer extent removal
+	 * operations beyond the end of a syscall (i.e. use async background
+	 * processing algorithms).  Users think the extents are no longer
+	 * shared, but iomap_swapfile_iter() still sees them as shared
+	 * because the refcountbt entries for the extents being removed have
+	 * not yet been updated.  Hence the swapon call fails unexpectedly.
+	 *
+	 * The race condition is currently most obvious from the unlink()
+	 * operation as extent removal is deferred until after the last
+	 * reference to the inode goes away.  We then process the extent
+	 * removal asynchronously, hence triggers the "syscall completed but
+	 * work not done" condition mentioned above.  To close this race
+	 * window, we need to flush any pending inodegc operations to ensure
+	 * they have updated the refcountbt records before we try to map the
+	 * swapfile.
+	 */
+	xfs_inodegc_flush(ip->i_mount);
+
+	/*
+	 * Direct the swap code to the correct block device when this file
+	 * sits on the RT device.
+	 */
+	sis->bdev = xfs_inode_buftarg(ip)->bt_bdev;
+
 	return iomap_swapfile_activate(sis, swap_file, span,
 			&xfs_read_iomap_ops);
 }
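The second comment in the patch concerns device selection. As far as I understand it, `xfs_inode_buftarg()` returns the realtime device's buffer target for realtime inodes and the data device's target otherwise. Swap I/O is issued straight to `sis->bdev` using the block numbers gathered at activation time, so a swap file whose blocks live on the RT device has to point at that device rather than at the data device. The sketch below only illustrates that selection; `struct toy_bdev`, `struct toy_mount`, `struct toy_inode` and `toy_inode_bdev()` are hypothetical simplifications, not the kernel types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures involved. */
struct toy_bdev {
	const char *name;
};

struct toy_mount {
	struct toy_bdev *data_dev;   /* roughly mp->m_ddev_targp->bt_bdev */
	struct toy_bdev *rt_dev;     /* roughly mp->m_rtdev_targp->bt_bdev */
};

struct toy_inode {
	struct toy_mount *mount;
	bool realtime;               /* roughly XFS_IS_REALTIME_INODE(ip) */
};

/*
 * Pick the block device that holds this inode's data blocks, mirroring
 * the idea behind xfs_inode_buftarg(ip)->bt_bdev in the patch.
 */
struct toy_bdev *toy_inode_bdev(const struct toy_inode *ip)
{
	assert(ip->mount != NULL);
	return ip->realtime ? ip->mount->rt_dev : ip->mount->data_dev;
}
```

For a regular inode this returns the data device and for a realtime inode the RT device, which is why the patch computes `sis->bdev` from the inode instead of assuming a single device per filesystem.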
