From patchwork Thu Jun 20 23:13:39 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13706532
Date: Thu, 20 Jun 2024 16:13:39 -0700
Subject: [PATCH 1/6] xfs: fix freeing speculative preallocations for preallocated files
From: "Darrick J. Wong"
To: hch@lst.de, chandanbabu@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <171892459249.3192151.18355050838100323730.stgit@frogsfrogsfrogs>
In-Reply-To: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
References: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Christoph Hellwig

xfs_can_free_eofblocks returns false for files that have persistent
preallocations unless the force flag is passed and there are delayed
blocks. This means it won't free delalloc reservations for files
with persistent preallocations unless the force flag is set, and it
will also free the persistent preallocations if the force flag is
set and the file happens to have delayed allocations.

Both of these are bad, so do away with the force flag and always free
only post-EOF delayed allocations for files with the XFS_DIFLAG_PREALLOC
or APPEND flags set.

Signed-off-by: Christoph Hellwig
Reviewed-by: Darrick J. Wong
Signed-off-by: Darrick J. Wong
---
 fs/xfs/xfs_bmap_util.c |   30 ++++++++++++++++++++++--------
 fs/xfs/xfs_bmap_util.h |    2 +-
 fs/xfs/xfs_icache.c    |    2 +-
 fs/xfs/xfs_inode.c     |   14 ++++----------
 4 files changed, 28 insertions(+), 20 deletions(-)

diff --git a/fs/xfs/xfs_bmap_util.c b/fs/xfs/xfs_bmap_util.c
index ac2e77ebb54c..a4d9fbc21b83 100644
--- a/fs/xfs/xfs_bmap_util.c
+++ b/fs/xfs/xfs_bmap_util.c
@@ -486,13 +486,11 @@ xfs_bmap_punch_delalloc_range(
 
 /*
  * Test whether it is appropriate to check an inode for and free post EOF
- * blocks. The 'force' parameter determines whether we should also consider
- * regular files that are marked preallocated or append-only.
+ * blocks.
  */
 bool
 xfs_can_free_eofblocks(
-	struct xfs_inode	*ip,
-	bool			force)
+	struct xfs_inode	*ip)
 {
 	struct xfs_bmbt_irec	imap;
 	struct xfs_mount	*mp = ip->i_mount;
@@ -526,11 +524,11 @@ xfs_can_free_eofblocks(
 		return false;
 
 	/*
-	 * Do not free real preallocated or append-only files unless the file
-	 * has delalloc blocks and we are forced to remove them.
+	 * Only free real extents for inodes with persistent preallocations or
+	 * the append-only flag.
 	 */
 	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND))
-		if (!force || ip->i_delayed_blks == 0)
+		if (ip->i_delayed_blks == 0)
 			return false;
 
 	/*
@@ -584,6 +582,22 @@ xfs_free_eofblocks(
 	/* Wait on dio to ensure i_size has settled. */
 	inode_dio_wait(VFS_I(ip));
 
+	/*
+	 * For preallocated files only free delayed allocations.
+	 *
+	 * Note that this means we also leave speculative preallocations in
+	 * place for preallocated files.
+	 */
+	if (ip->i_diflags & (XFS_DIFLAG_PREALLOC | XFS_DIFLAG_APPEND)) {
+		if (ip->i_delayed_blks) {
+			xfs_bmap_punch_delalloc_range(ip,
+				round_up(XFS_ISIZE(ip), mp->m_sb.sb_blocksize),
+				LLONG_MAX);
+		}
+		xfs_inode_clear_eofblocks_tag(ip);
+		return 0;
+	}
+
 	error = xfs_trans_alloc(mp, &M_RES(mp)->tr_itruncate, 0, 0, 0, &tp);
 	if (error) {
 		ASSERT(xfs_is_shutdown(mp));
@@ -891,7 +905,7 @@ xfs_prepare_shift(
 	 * Trim eofblocks to avoid shifting uninitialized post-eof preallocation
 	 * into the accessible region of the file.
 	 */
-	if (xfs_can_free_eofblocks(ip, true)) {
+	if (xfs_can_free_eofblocks(ip)) {
 		error = xfs_free_eofblocks(ip);
 		if (error)
 			return error;
diff --git a/fs/xfs/xfs_bmap_util.h b/fs/xfs/xfs_bmap_util.h
index 51f84d8ff372..eb0895bfb9da 100644
--- a/fs/xfs/xfs_bmap_util.h
+++ b/fs/xfs/xfs_bmap_util.h
@@ -63,7 +63,7 @@ int	xfs_insert_file_space(struct xfs_inode *, xfs_off_t offset,
 				xfs_off_t len);
 
 /* EOF block manipulation functions */
-bool	xfs_can_free_eofblocks(struct xfs_inode *ip, bool force);
+bool	xfs_can_free_eofblocks(struct xfs_inode *ip);
 int	xfs_free_eofblocks(struct xfs_inode *ip);
 
 int	xfs_swap_extents(struct xfs_inode *ip, struct xfs_inode *tip,
diff --git a/fs/xfs/xfs_icache.c b/fs/xfs/xfs_icache.c
index 0953163a2d84..9967334ea99f 100644
--- a/fs/xfs/xfs_icache.c
+++ b/fs/xfs/xfs_icache.c
@@ -1155,7 +1155,7 @@ xfs_inode_free_eofblocks(
 	}
 	*lockflags |= XFS_IOLOCK_EXCL;
 
-	if (xfs_can_free_eofblocks(ip, false))
+	if (xfs_can_free_eofblocks(ip))
 		return xfs_free_eofblocks(ip);
 
 	/* inode could be preallocated or append-only */
diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 58fb7a5062e1..b699fa6ee3b6 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1595,7 +1595,7 @@ xfs_release(
 	if (!xfs_ilock_nowait(ip, XFS_IOLOCK_EXCL))
 		return 0;
 
-	if (xfs_can_free_eofblocks(ip, false)) {
+	if (xfs_can_free_eofblocks(ip)) {
 		/*
 		 * Check if the inode is being opened, written and closed
 		 * frequently and we have delayed allocation blocks outstanding
@@ -1856,15 +1856,13 @@ xfs_inode_needs_inactive(
 
 	/*
 	 * This file isn't being freed, so check if there are post-eof blocks
-	 * to free. @force is true because we are evicting an inode from the
-	 * cache. Post-eof blocks must be freed, lest we end up with broken
-	 * free space accounting.
+	 * to free.
 	 *
 	 * Note: don't bother with iolock here since lockdep complains about
 	 * acquiring it in reclaim context. We have the only reference to the
 	 * inode at this point anyways.
 	 */
-	return xfs_can_free_eofblocks(ip, true);
+	return xfs_can_free_eofblocks(ip);
 }
 
 /*
@@ -1947,15 +1945,11 @@ xfs_inactive(
 
 	if (VFS_I(ip)->i_nlink != 0) {
 		/*
-		 * force is true because we are evicting an inode from the
-		 * cache. Post-eof blocks must be freed, lest we end up with
-		 * broken free space accounting.
-		 *
 		 * Note: don't bother with iolock here since lockdep complains
 		 * about acquiring it in reclaim context. We have the only
 		 * reference to the inode at this point anyways.
 		 */
-		if (xfs_can_free_eofblocks(ip, true))
+		if (xfs_can_free_eofblocks(ip))
 			error = xfs_free_eofblocks(ip);
 
 		goto out;

From patchwork Thu Jun 20 23:13:55 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13706533
Date: Thu, 20 Jun 2024 16:13:55 -0700
Subject: [PATCH 2/6] xfs: restrict when we try to align cow fork delalloc to cowextsz hints
From: "Darrick J. Wong"
To: hch@lst.de, chandanbabu@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <171892459267.3192151.207856272423876675.stgit@frogsfrogsfrogs>
In-Reply-To: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
References: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

xfs/205 produces the following failure when always_cow is enabled:

--- a/tests/xfs/205.out	2024-02-28 16:20:24.437887970 -0800
+++ b/tests/xfs/205.out.bad	2024-06-03 21:13:40.584000000 -0700
@@ -1,4 +1,5 @@
 QA output created by 205
 *** one file
+    !!! disk full (expected)
 *** one file, a few bytes at a time
 *** done

This is the result of overly aggressive attempts to align cow fork
delalloc reservations to the CoW extent size hint. Looking at the trace
data, we're trying to append a single fsblock to the "fred" file.
Trying to create a speculative post-eof reservation fails because
there's not enough space.

We then set @prealloc_blocks to zero and try again, but the cowextsz
alignment code triggers, which expands our request for a 1-fsblock
reservation into a 39-block reservation. There's not enough space for
that, so the whole write fails with ENOSPC even though there's
sufficient space in the filesystem to allocate the single block that we
need to land the write.

There are two things wrong here -- first, we shouldn't be attempting
speculative preallocations beyond what was requested when we're low on
space.
Second, if we've already computed a posteof preallocation, we
shouldn't bother trying to align that to the cowextsize hint.

Fix both of these problems by adding a flag that only enables the
expansion of the delalloc reservation to the cowextsize if we're doing a
non-extending write, and only if we're not doing an ENOSPC retry. This
requires us to move the ENOSPC retry logic to xfs_bmapi_reserve_delalloc.

I probably should have caught this six years ago when 6ca30729c206d was
being reviewed, but oh well. Update the comments to reflect what the
code does now.

Fixes: 6ca30729c206d ("xfs: bmap code cleanup")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_bmap.c |   31 +++++++++++++++++++++++++++----
 fs/xfs/xfs_iomap.c       |   34 ++++++++++++----------------------
 2 files changed, 39 insertions(+), 26 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_bmap.c b/fs/xfs/libxfs/xfs_bmap.c
index c101cf266bc4..6af6f744fdd6 100644
--- a/fs/xfs/libxfs/xfs_bmap.c
+++ b/fs/xfs/libxfs/xfs_bmap.c
@@ -4058,20 +4058,32 @@ xfs_bmapi_reserve_delalloc(
 	xfs_extlen_t		indlen;
 	uint64_t		fdblocks;
 	int			error;
-	xfs_fileoff_t		aoff = off;
+	xfs_fileoff_t		aoff;
+	bool			use_cowextszhint =
+					whichfork == XFS_COW_FORK && !prealloc;
 
+retry:
 	/*
 	 * Cap the alloc length. Keep track of prealloc so we know whether to
 	 * tag the inode before we return.
 	 */
+	aoff = off;
 	alen = XFS_FILBLKS_MIN(len + prealloc, XFS_MAX_BMBT_EXTLEN);
 	if (!eof)
 		alen = XFS_FILBLKS_MIN(alen, got->br_startoff - aoff);
 	if (prealloc && alen >= len)
 		prealloc = alen - len;
 
-	/* Figure out the extent size, adjust alen */
-	if (whichfork == XFS_COW_FORK) {
+	/*
+	 * If we're targetting the COW fork but aren't creating a speculative
+	 * posteof preallocation, try to expand the reservation to align with
+	 * the COW extent size hint if there's sufficient free space.
+	 *
+	 * Unlike the data fork, the CoW cancellation functions will free all
+	 * the reservations at inactivation, so we don't require that every
+	 * delalloc reservation have a dirty pagecache.
+	 */
+	if (use_cowextszhint) {
 		struct xfs_bmbt_irec	prev;
 		xfs_extlen_t		extsz = xfs_get_cowextsz_hint(ip);
 
@@ -4090,7 +4102,7 @@ xfs_bmapi_reserve_delalloc(
 	 */
 	error = xfs_quota_reserve_blkres(ip, alen);
 	if (error)
-		return error;
+		goto out;
 
 	/*
 	 * Split changing sb for alen and indlen since they could be coming
@@ -4140,6 +4152,17 @@ xfs_bmapi_reserve_delalloc(
 out_unreserve_quota:
 	if (XFS_IS_QUOTA_ON(mp))
 		xfs_quota_unreserve_blkres(ip, alen);
+out:
+	if (error == -ENOSPC || error == -EDQUOT) {
+		trace_xfs_delalloc_enospc(ip, off, len);
+
+		if (prealloc || use_cowextszhint) {
+			/* retry without any preallocation */
+			use_cowextszhint = false;
+			prealloc = 0;
+			goto retry;
+		}
+	}
 	return error;
 }
 
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 378342673925..414903885ab9 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -1148,33 +1148,23 @@ xfs_buffered_write_iomap_begin(
 		}
 	}
 
-retry:
-	error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
-			end_fsb - offset_fsb, prealloc_blocks,
-			allocfork == XFS_DATA_FORK ? &imap : &cmap,
-			allocfork == XFS_DATA_FORK ? &icur : &ccur,
-			allocfork == XFS_DATA_FORK ? eof : cow_eof);
-	switch (error) {
-	case 0:
-		break;
-	case -ENOSPC:
-	case -EDQUOT:
-		/* retry without any preallocation */
-		trace_xfs_delalloc_enospc(ip, offset, count);
-		if (prealloc_blocks) {
-			prealloc_blocks = 0;
-			goto retry;
-		}
-		fallthrough;
-	default:
-		goto out_unlock;
-	}
-
 	if (allocfork == XFS_COW_FORK) {
+		error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
+				end_fsb - offset_fsb, prealloc_blocks, &cmap,
+				&ccur, cow_eof);
+		if (error)
+			goto out_unlock;
+
 		trace_xfs_iomap_alloc(ip, offset, count, allocfork, &cmap);
 		goto found_cow;
 	}
 
+	error = xfs_bmapi_reserve_delalloc(ip, allocfork, offset_fsb,
+			end_fsb - offset_fsb, prealloc_blocks, &imap, &icur,
+			eof);
+	if (error)
+		goto out_unlock;
+
 	/*
 	 * Flag newly allocated delalloc blocks with IOMAP_F_NEW so we punch
 	 * them out if the write happens to fail.

From patchwork Thu Jun 20 23:14:11 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13706534
Date: Thu, 20 Jun 2024 16:14:11 -0700
Subject: [PATCH 3/6] xfs: allow unlinked symlinks and dirs with zero size
From: "Darrick J. Wong"
To: hch@lst.de, chandanbabu@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <171892459283.3192151.16340949131696263994.stgit@frogsfrogsfrogs>
In-Reply-To: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
References: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

For a very very long time, inode inactivation has set the inode size to
zero before unmapping the extents associated with the data fork.
Unfortunately, commit 3c6f46eacd876 changed the inode verifier to
prohibit zero-length symlinks and directories. If an inode happens to
get logged in this state and the system crashes before freeing the
inode, log recovery will also fail on the broken inode.

Therefore, allow zero-size symlinks and directories as long as the link
count is zero; nobody will be able to open these files by handle so
there isn't any risk of data exposure.

Fixes: 3c6f46eacd876 ("xfs: sanity check directory inode di_size")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_inode_buf.c |   23 ++++++++++++++++++-----
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index e7a7bfbe75b4..513b50da6215 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -379,10 +379,13 @@ xfs_dinode_verify_fork(
 	/*
 	 * A directory small enough to fit in the inode must be stored
 	 * in local format. The directory sf <-> extents conversion
-	 * code updates the directory size accordingly.
+	 * code updates the directory size accordingly. Directories
+	 * being truncated have zero size and are not subject to this
+	 * check.
 	 */
 	if (S_ISDIR(mode)) {
-		if (be64_to_cpu(dip->di_size) <= fork_size &&
+		if (dip->di_size &&
+		    be64_to_cpu(dip->di_size) <= fork_size &&
 		    fork_format != XFS_DINODE_FMT_LOCAL)
 			return __this_address;
 	}
@@ -528,9 +531,19 @@ xfs_dinode_verify(
 	if (mode && xfs_mode_to_ftype(mode) == XFS_DIR3_FT_UNKNOWN)
 		return __this_address;
 
-	/* No zero-length symlinks/dirs. */
-	if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0)
-		return __this_address;
+	/*
+	 * No zero-length symlinks/dirs unless they're unlinked and hence being
+	 * inactivated.
+	 */
+	if ((S_ISLNK(mode) || S_ISDIR(mode)) && di_size == 0) {
+		if (dip->di_version > 1) {
+			if (dip->di_nlink)
+				return __this_address;
+		} else {
+			if (dip->di_onlink)
+				return __this_address;
+		}
+	}
 
 	fa = xfs_dinode_verify_nrext64(mp, dip);
 	if (fa)

From patchwork Thu Jun 20 23:14:26 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13706535
Date: Thu, 20 Jun 2024 16:14:26 -0700
Subject: [PATCH 4/6] xfs: verify buffer, inode, and dquot items every tx commit
From: "Darrick J. Wong"
To: hch@lst.de, chandanbabu@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <171892459300.3192151.2332029602549409027.stgit@frogsfrogsfrogs>
In-Reply-To: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
References: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org

From: Darrick J. Wong

generic/388 has an annoying tendency to fail like this during log
recovery:

XFS (sda4): Unmounting Filesystem 435fe39b-82b6-46ef-be56-819499585130
XFS (sda4): Mounting V5 Filesystem 435fe39b-82b6-46ef-be56-819499585130
XFS (sda4): Starting recovery (logdev: internal)
00000000: 49 4e 81 b6 03 02 00 00 00 00 00 07 00 00 00 07  IN..............
00000010: 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 10  ................
00000020: 35 9a 8b c1 3e 6e 81 00 35 9a 8b c1 3f dc b7 00  5...>n..5...?...
00000030: 35 9a 8b c1 3f dc b7 00 00 00 00 00 00 3c 86 4f  5...?........<.O
00000040: 00 00 00 00 00 00 02 f3 00 00 00 00 00 00 00 00  ................
00000050: 00 00 1f 01 00 00 00 00 00 00 00 02 b2 74 c9 0b  .............t..
00000060: ff ff ff ff d7 45 73 10 00 00 00 00 00 00 00 2d  .....Es........-
00000070: 00 00 07 92 00 01 fe 30 00 00 00 00 00 00 00 1a  .......0........
00000080: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
00000090: 35 9a 8b c1 3b 55 0c 00 00 00 00 00 04 27 b2 d1  5...;U.......'..
000000a0: 43 5f e3 9b 82 b6 46 ef be 56 81 94 99 58 51 30  C_....F..V...XQ0
XFS (sda4): Internal error Bad dinode after recovery at line 539 of file fs/xfs/xfs_inode_item_recover.c. Caller xlog_recover_items_pass2+0x4e/0xc0 [xfs]
CPU: 0 PID: 2189311 Comm: mount Not tainted 6.9.0-rc4-djwx #rc4
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20171121_152543-x86-ol7-builder-01.us.oracle.com-4.el7.1 04/01/2014
Call Trace:
 dump_stack_lvl+0x4f/0x60
 xfs_corruption_error+0x90/0xa0
 xlog_recover_inode_commit_pass2+0x5f1/0xb00
 xlog_recover_items_pass2+0x4e/0xc0
 xlog_recover_commit_trans+0x2db/0x350
 xlog_recovery_process_trans+0xab/0xe0
 xlog_recover_process_data+0xa7/0x130
 xlog_do_recovery_pass+0x398/0x840
 xlog_do_log_recovery+0x62/0xc0
 xlog_do_recover+0x34/0x1d0
 xlog_recover+0xe9/0x1a0
 xfs_log_mount+0xff/0x260
 xfs_mountfs+0x5d9/0xb60
 xfs_fs_fill_super+0x76b/0xa30
 get_tree_bdev+0x124/0x1d0
 vfs_get_tree+0x17/0xa0
 path_mount+0x72b/0xa90
 __x64_sys_mount+0x112/0x150
 do_syscall_64+0x49/0x100
 entry_SYSCALL_64_after_hwframe+0x4b/0x53
XFS (sda4): Corruption detected. Unmount and run xfs_repair
XFS (sda4): Metadata corruption detected at xfs_dinode_verify.part.0+0x739/0x920 [xfs], inode 0x427b2d1
XFS (sda4): Filesystem has been shut down due to log error (0x2).
XFS (sda4): Please unmount the filesystem and rectify the problem(s).
XFS (sda4): log mount/recovery failed: error -117
XFS (sda4): log mount failed

This is inode log item recovery failing the dinode verifier after
replaying the contents of the inode log item into the ondisk inode.
Looking back into what the kernel was doing at the time of the fs
shutdown, a thread was in the middle of running a series of
transactions, each of which committed changes to the inode.

At some point in the middle of that chain, an invalid (at least
according to the verifier) change was committed. Had the filesystem not
shut down in the middle of the chain, a subsequent transaction would
have corrected the invalid state and nobody would have noticed. But
that's not what happened here. Instead, the invalid inode state was
committed to the ondisk log, so log recovery tripped over it.

The actual defect here was an overzealous inode verifier, which was
fixed in a separate patch. This patch adds some transaction precommit
functions for CONFIG_XFS_DEBUG=y mode so that we can detect these kinds
of transient errors at transaction commit time, where it's much easier
to find the root cause.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/Kconfig          |   12 ++++++++++++
 fs/xfs/xfs.h            |    4 ++++
 fs/xfs/xfs_buf_item.c   |   32 ++++++++++++++++++++++++++++++++
 fs/xfs/xfs_dquot_item.c |   31 +++++++++++++++++++++++++++++++
 fs/xfs/xfs_inode_item.c |   32 ++++++++++++++++++++++++++++++++
 5 files changed, 111 insertions(+)

diff --git a/fs/xfs/Kconfig b/fs/xfs/Kconfig
index d41edd30388b..fffd6fffdce0 100644
--- a/fs/xfs/Kconfig
+++ b/fs/xfs/Kconfig
@@ -217,6 +217,18 @@ config XFS_DEBUG
 
 	  Say N unless you are an XFS developer, or you play one on TV.
 
+config XFS_DEBUG_EXPENSIVE
+	bool "XFS expensive debugging checks"
+	depends on XFS_FS && XFS_DEBUG
+	help
+	  Say Y here to get an XFS build with expensive debugging checks
+	  enabled. These checks may affect performance significantly.
+
+	  Note that the resulting code will be HUGER and SLOWER, and probably
+	  not useful unless you are debugging a particular problem.
+
+	  Say N unless you are an XFS developer, or you play one on TV.
+ config XFS_ASSERT_FATAL bool "XFS fatal asserts" default y diff --git a/fs/xfs/xfs.h b/fs/xfs/xfs.h index f6ffb4f248f7..9355ccad9503 100644 --- a/fs/xfs/xfs.h +++ b/fs/xfs/xfs.h @@ -10,6 +10,10 @@ #define DEBUG 1 #endif +#ifdef CONFIG_XFS_DEBUG_EXPENSIVE +#define DEBUG_EXPENSIVE 1 +#endif + #ifdef CONFIG_XFS_ASSERT_FATAL #define XFS_ASSERT_FATAL 1 #endif diff --git a/fs/xfs/xfs_buf_item.c b/fs/xfs/xfs_buf_item.c index 43031842341a..47549cfa61cd 100644 --- a/fs/xfs/xfs_buf_item.c +++ b/fs/xfs/xfs_buf_item.c @@ -22,6 +22,7 @@ #include "xfs_trace.h" #include "xfs_log.h" #include "xfs_log_priv.h" +#include "xfs_error.h" struct kmem_cache *xfs_buf_item_cache; @@ -781,8 +782,39 @@ xfs_buf_item_committed( return lsn; } +#ifdef DEBUG_EXPENSIVE +static int +xfs_buf_item_precommit( + struct xfs_trans *tp, + struct xfs_log_item *lip) +{ + struct xfs_buf_log_item *bip = BUF_ITEM(lip); + struct xfs_buf *bp = bip->bli_buf; + struct xfs_mount *mp = bp->b_mount; + xfs_failaddr_t fa; + + if (!bp->b_ops || !bp->b_ops->verify_struct) + return 0; + if (bip->bli_flags & XFS_BLI_STALE) + return 0; + + fa = bp->b_ops->verify_struct(bp); + if (fa) { + xfs_buf_verifier_error(bp, -EFSCORRUPTED, bp->b_ops->name, + bp->b_addr, BBTOB(bp->b_length), fa); + xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE); + ASSERT(fa == NULL); + } + + return 0; +} +#else +# define xfs_buf_item_precommit NULL +#endif + static const struct xfs_item_ops xfs_buf_item_ops = { .iop_size = xfs_buf_item_size, + .iop_precommit = xfs_buf_item_precommit, .iop_format = xfs_buf_item_format, .iop_pin = xfs_buf_item_pin, .iop_unpin = xfs_buf_item_unpin, diff --git a/fs/xfs/xfs_dquot_item.c b/fs/xfs/xfs_dquot_item.c index 6a1aae799cf1..7d19091215b0 100644 --- a/fs/xfs/xfs_dquot_item.c +++ b/fs/xfs/xfs_dquot_item.c @@ -17,6 +17,7 @@ #include "xfs_trans_priv.h" #include "xfs_qm.h" #include "xfs_log.h" +#include "xfs_error.h" static inline struct xfs_dq_logitem *DQUOT_ITEM(struct xfs_log_item *lip) { @@ -193,8 +194,38 @@ 
xfs_qm_dquot_logitem_committing(
 	return xfs_qm_dquot_logitem_release(lip);
 }
 
+#ifdef DEBUG_EXPENSIVE
+static int
+xfs_qm_dquot_logitem_precommit(
+	struct xfs_trans	*tp,
+	struct xfs_log_item	*lip)
+{
+	struct xfs_dquot	*dqp = DQUOT_ITEM(lip)->qli_dquot;
+	struct xfs_mount	*mp = dqp->q_mount;
+	struct xfs_disk_dquot	ddq = { };
+	xfs_failaddr_t		fa;
+
+	xfs_dquot_to_disk(&ddq, dqp);
+	fa = xfs_dquot_verify(mp, &ddq, dqp->q_id);
+	if (fa) {
+		XFS_CORRUPTION_ERROR("Bad dquot during logging",
+				XFS_ERRLEVEL_LOW, mp, &ddq, sizeof(ddq));
+		xfs_alert(mp,
+			"Metadata corruption detected at %pS, dquot 0x%x",
+				fa, dqp->q_id);
+		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+		ASSERT(fa == NULL);
+	}
+
+	return 0;
+}
+#else
+# define xfs_qm_dquot_logitem_precommit	NULL
+#endif
+
 static const struct xfs_item_ops xfs_dquot_item_ops = {
 	.iop_size	= xfs_qm_dquot_logitem_size,
+	.iop_precommit	= xfs_qm_dquot_logitem_precommit,
 	.iop_format	= xfs_qm_dquot_logitem_format,
 	.iop_pin	= xfs_qm_dquot_logitem_pin,
 	.iop_unpin	= xfs_qm_dquot_logitem_unpin,
diff --git a/fs/xfs/xfs_inode_item.c b/fs/xfs/xfs_inode_item.c
index f28d653300d1..ef05cbbe116c 100644
--- a/fs/xfs/xfs_inode_item.c
+++ b/fs/xfs/xfs_inode_item.c
@@ -37,6 +37,36 @@ xfs_inode_item_sort(
 	return INODE_ITEM(lip)->ili_inode->i_ino;
 }
 
+#ifdef DEBUG_EXPENSIVE
+static void
+xfs_inode_item_precommit_check(
+	struct xfs_inode	*ip)
+{
+	struct xfs_mount	*mp = ip->i_mount;
+	struct xfs_dinode	*dip;
+	xfs_failaddr_t		fa;
+
+	dip = kzalloc(mp->m_sb.sb_inodesize, GFP_KERNEL | GFP_NOFS);
+	if (!dip) {
+		ASSERT(dip != NULL);
+		return;
+	}
+
+	xfs_inode_to_disk(ip, dip, 0);
+	xfs_dinode_calc_crc(mp, dip);
+	fa = xfs_dinode_verify(mp, ip->i_ino, dip);
+	if (fa) {
+		xfs_inode_verifier_error(ip, -EFSCORRUPTED, __func__, dip,
+				sizeof(*dip), fa);
+		xfs_force_shutdown(mp, SHUTDOWN_CORRUPT_INCORE);
+		ASSERT(fa == NULL);
+	}
+	kfree(dip);
+}
+#else
+# define xfs_inode_item_precommit_check(ip)	((void)0)
+#endif
+
 /*
  * Prior to finally logging the inode, we have to ensure that all the
  * per-modification inode state changes are applied. This includes VFS inode
@@ -169,6 +199,8 @@ xfs_inode_item_precommit(
 	iip->ili_fields |= (flags | iip->ili_last_fields);
 	spin_unlock(&iip->ili_lock);
 
+	xfs_inode_item_precommit_check(ip);
+
 	/*
 	 * We are done with the log item transaction dirty state, so clear it so
 	 * that it doesn't pollute future transactions.

From patchwork Thu Jun 20 23:14:42 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13706536
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 0F4D914F9CC
	for ; Thu, 20 Jun 2024 23:14:43 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1718925283; cv=none;
	b=XF2SG3BAmwN6Y28YEZ1zAI4hQqTBWyFRk0u24xsNQu2x4vvn1g4vMEZH1zo1PvEtEiblOnKKzs72dnia1jBmfX8FyIpOnmzpsnBBUzeJjm5skk7eNH77lF0caEvhHCnncZJ/zzAWbeTGDrQrWaShMr5BYCvUa2UmGMquqT15yTY=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1718925283; c=relaxed/simple;
	bh=BNI9+s72cbwfRYMugMXo+ZKBxHEwTWN2KXkTOBlDV6g=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=aLVK3S4E0IOt1SFcZqwOiqnzA5kos7MGXh0WGrrFxqEl5DIiveiDi/E3+s77l98OSgPO3TFL71XSPmsj5MKPcIKWF/88Cv6AYoOjFJudtCx2EDYMFT5rmHKLfUzQuU/D/xyBVkMZ9CXPb0kqKTXhpt53niMRNJa1+Xq0Ssc1csU=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=BokC0QPQ;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
	header.b="BokC0QPQ"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id D031BC2BD10;
	Thu, 20 Jun 2024 23:14:42 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1718925282;
	bh=BNI9+s72cbwfRYMugMXo+ZKBxHEwTWN2KXkTOBlDV6g=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=BokC0QPQidQuMVM4t7ZOfGvolWqTidPX7kAQ9vV6RwrKse6uxtdXQbesQXrGonbwx
	 dBU+s88RQ/47BKF1XZg5Sh1AZkPk0KG+cqdJ7i/knCMxCAmsIIFGDkzuXxtX08TZkd
	 7xA0q2K2UGn8NL/wLZJvWvpRbQfuE8HlBAdt5+4N9R3N+hF5um20wE6bSVFLOwTO9H
	 BAJMdD9FKJPQPc2VIWV314rEytwFhbhBQSz3fTAEKYMl7aimiWP5kSRORhGf2TEnqz
	 Ao4DrS5VhIpo18Rw5JTrauC4w3f2w1KATWSFzgzc1seRLU8xNrbGTsBl+2qrgm2sFH
	 bYTKCX7rjzXpA==
Date: Thu, 20 Jun 2024 16:14:42 -0700
Subject: [PATCH 5/6] xfs: fix direction in XFS_IOC_EXCHANGE_RANGE
From: "Darrick J. Wong"
To: hch@lst.de, chandanbabu@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <171892459317.3192151.14305994848485102437.stgit@frogsfrogsfrogs>
In-Reply-To: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
References: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

The kernel reads userspace's buffer but does not write it back.
Therefore this is really an _IOW ioctl.  Change this before 6.10 final
releases.

Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/libxfs/xfs_fs.h |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/xfs/libxfs/xfs_fs.h b/fs/xfs/libxfs/xfs_fs.h
index 97996cb79aaa..454b63ef7201 100644
--- a/fs/xfs/libxfs/xfs_fs.h
+++ b/fs/xfs/libxfs/xfs_fs.h
@@ -996,7 +996,7 @@ struct xfs_getparents_by_handle {
 #define XFS_IOC_FSGEOMETRY	_IOR ('X', 126, struct xfs_fsop_geom)
 #define XFS_IOC_BULKSTAT	_IOR ('X', 127, struct xfs_bulkstat_req)
 #define XFS_IOC_INUMBERS	_IOR ('X', 128, struct xfs_inumbers_req)
-#define XFS_IOC_EXCHANGE_RANGE	_IOWR('X', 129, struct xfs_exchange_range)
+#define XFS_IOC_EXCHANGE_RANGE	_IOW ('X', 129, struct xfs_exchange_range)
 /*	XFS_IOC_GETFSUUID ---------- deprecated 140	 */

From patchwork Thu Jun 20 23:14:58 2024
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
X-Patchwork-Submitter: "Darrick J. Wong"
X-Patchwork-Id: 13706537
Received: from smtp.kernel.org (aws-us-west-2-korg-mail-1.web.codeaurora.org [10.30.226.201])
	(using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
	(No client certificate requested)
	by smtp.subspace.kernel.org (Postfix) with ESMTPS id 040A2143746
	for ; Thu, 20 Jun 2024 23:14:58 +0000 (UTC)
Authentication-Results: smtp.subspace.kernel.org;
	arc=none smtp.client-ip=10.30.226.201
ARC-Seal: i=1; a=rsa-sha256; d=subspace.kernel.org; s=arc-20240116;
	t=1718925299; cv=none;
	b=CWKSnhYZNLxYFamW96qWlzhHfbdad9DerQC6C71Z3JeYX4l9OKumkIgeycOFtiU15pKqeqgmm8UZoSO4dOuyzBYL1mLv1P0y4N2kepKjKi3cE5aQu0FOsC/E8hQeOfDJgpN6T2TUftxQylXH4JNXSec5Iqg/5zey7RRaz0+YMY0=
ARC-Message-Signature: i=1; a=rsa-sha256; d=subspace.kernel.org;
	s=arc-20240116; t=1718925299; c=relaxed/simple;
	bh=PYAsbZHpUYq3rmnmpeEGsbiuFMumhVIEI8YJErzyRhc=;
	h=Date:Subject:From:To:Cc:Message-ID:In-Reply-To:References:
	 MIME-Version:Content-Type;
	b=bP4sraxQ+x2atMF8qcAdjYIAWuCEJkHNXVMUOVAqt9qRZ8IZM6zuB5a5wEj4EAKwfTkVhxa/3nNXwNC5ECJDAlW4RODZyZXw8E7DqyXyrWv93XeS6/5hASJfgWU1p4l7m52f9b50/8ydUU/EI3wuL9KFU/UaTBZF+8PevNj1nVk=
ARC-Authentication-Results: i=1; smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org header.b=JSZ2gtdg;
	arc=none smtp.client-ip=10.30.226.201
Authentication-Results: smtp.subspace.kernel.org;
	dkim=pass (2048-bit key) header.d=kernel.org header.i=@kernel.org
	header.b="JSZ2gtdg"
Received: by smtp.kernel.org (Postfix) with ESMTPSA id 7A357C2BD10;
	Thu, 20 Jun 2024 23:14:58 +0000 (UTC)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=kernel.org;
	s=k20201202; t=1718925298;
	bh=PYAsbZHpUYq3rmnmpeEGsbiuFMumhVIEI8YJErzyRhc=;
	h=Date:Subject:From:To:Cc:In-Reply-To:References:From;
	b=JSZ2gtdgSeIR2gYXOAUNHCShfOgB2jMhJbk/A761JVp+ErvDmTD04SpmtU5RDiNIT
	 pYBHDBRYisYUFXzPs9yae7NKnfroGt0tEE5eGm5eLv6AM/j+tOIwDj1hR2nkAUB1yr
	 vdKqaDYGp07zhzPC7aokJu56dSs44CQLU7WGd3PADbB89cWqw+Nmvxu187yQd3usYs
	 D9BpGtKV+3VG8V79Iar4x4OKc7cJBjjTQmopmfoY/qHqFwnkcoqZ93iEuc4uFwp6gO
	 EeIVX5fYxOqZ5ZE+xY8Ybfg5yxTn/cBqspdtpNEZ9ai2p+ebcY4zV/0pnTv08AbAst
	 fmQL+mbJEGSUw==
Date: Thu, 20 Jun 2024 16:14:58 -0700
Subject: [PATCH 6/6] xfs: honor init_xattrs in xfs_init_new_inode for !ATTR fs
From: "Darrick J. Wong"
To: hch@lst.de, chandanbabu@kernel.org, djwong@kernel.org
Cc: linux-xfs@vger.kernel.org
Message-ID: <171892459334.3192151.413694580283882579.stgit@frogsfrogsfrogs>
In-Reply-To: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
References: <171892459218.3192151.10366641366672957906.stgit@frogsfrogsfrogs>
User-Agent: StGit/0.19
Precedence: bulk
X-Mailing-List: linux-xfs@vger.kernel.org
List-Id:
List-Subscribe:
List-Unsubscribe:
MIME-Version: 1.0

From: Darrick J. Wong

xfs_init_new_inode ignores the init_xattrs parameter for filesystems
that do not have ATTR enabled.  As a result, the first init_xattrs file
to be created by the kernel will not have an attr fork created to store
acls.  Storing that first acl will add ATTR to the superblock flags, so
subsequent files will be created with attr forks.  The overhead of this
is so small that chances are that nobody has noticed this behavior.

However, this is disastrous on a filesystem with parent pointers because
it requires that a new linkable file /must/ have a pre-existing attr
fork, and the parent pointers code uses init_xattrs to create that fork.
The preproduction version of mkfs.xfs used to set this, but the V5 sb
verifier only requires ATTR2, not ATTR.  There is no guard for
filesystems with (PARENT && !ATTR).

It turns out that I misunderstood the two flags -- ATTR means that we at
some point created an attr fork to store xattrs in a file; ATTR2
apparently means only that inodes have dynamic fork offsets or that the
filesystem was mounted with the "attr2" option.

Fixes: 2442ee15bb1e ("xfs: eager inode attr fork init needs attr feature awareness")
Signed-off-by: Darrick J. Wong
Reviewed-by: Christoph Hellwig
---
 fs/xfs/xfs_inode.c |   10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index b699fa6ee3b6..aa134687027c 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -42,6 +42,7 @@
 #include "xfs_pnfs.h"
 #include "xfs_parent.h"
 #include "xfs_xattr.h"
+#include "xfs_sb.h"
 
 struct kmem_cache *xfs_inode_cache;
 
@@ -870,9 +871,16 @@ xfs_init_new_inode(
 	 * this saves us from needing to run a separate transaction to set the
 	 * fork offset in the immediate future.
 	 */
-	if (init_xattrs && xfs_has_attr(mp)) {
+	if (init_xattrs) {
 		ip->i_forkoff = xfs_default_attroffset(ip) >> 3;
 		xfs_ifork_init_attr(ip, XFS_DINODE_FMT_EXTENTS, 0);
+
+		if (!xfs_has_attr(mp)) {
+			spin_lock(&mp->m_sb_lock);
+			xfs_add_attr(mp);
+			spin_unlock(&mp->m_sb_lock);
+			xfs_log_sb(tp);
+		}
 	}
 
 	/*
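
For anyone following the init_xattrs change in the hunk above without a
kernel tree at hand, here is a rough userspace sketch of the before and
after control flow.  Everything named toy_* is a hypothetical stand-in
invented for illustration and is not a kernel API: has_attr models the
ATTR superblock feature bit, has_attr_fork models the inode's attr fork,
and sb_logged merely counts where the patched code would call
xfs_log_sb().  Locking and transactions are left out on purpose.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct xfs_mount (assumption, not kernel code). */
struct toy_mount {
	bool	has_attr;	/* models the ATTR superblock feature bit */
	int	sb_logged;	/* counts simulated superblock log calls */
};

/* Hypothetical stand-in for struct xfs_inode. */
struct toy_inode {
	bool	has_attr_fork;	/* models a pre-created attr fork */
};

/* Old behaviour: init_xattrs is silently ignored when ATTR is not set. */
static void
toy_init_old(struct toy_mount *mp, struct toy_inode *ip, bool init_xattrs)
{
	if (init_xattrs && mp->has_attr)
		ip->has_attr_fork = true;
}

/*
 * New behaviour: init_xattrs always creates the fork, and the feature bit
 * is set (and the superblock logged) the first time that happens.
 */
static void
toy_init_new(struct toy_mount *mp, struct toy_inode *ip, bool init_xattrs)
{
	if (!init_xattrs)
		return;

	ip->has_attr_fork = true;
	if (!mp->has_attr) {
		mp->has_attr = true;
		mp->sb_logged++;
	}
}

/* Walks both paths on a mount that starts without ATTR. */
static void
toy_selfcheck(void)
{
	struct toy_mount	mp = { .has_attr = false, .sb_logged = 0 };
	struct toy_inode	a = { false }, b = { false }, c = { false };

	toy_init_old(&mp, &a, true);
	assert(!a.has_attr_fork);	/* the bug: no fork despite the request */

	toy_init_new(&mp, &b, true);
	assert(b.has_attr_fork && mp.has_attr && mp.sb_logged == 1);

	toy_init_new(&mp, &c, true);
	assert(c.has_attr_fork && mp.sb_logged == 1);	/* logged only once */
}
```

Calling toy_selfcheck() from a small main() exercises both paths.  The
point of the model is only that, after the patch, the first init_xattrs
inode on a !ATTR filesystem gets its fork and flips the feature bit in
the same step, so later inodes take the cheap path.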