diff mbox series

[3/4] xfs: allow symlinks with short remote targets

Message ID 171635763423.2619960.122476714020756620.stgit@frogsfrogsfrogs (mailing list archive)
State New
Headers show
Series [1/4] xfs: drop xfarray sortinfo folio on error | expand

Commit Message

Darrick J. Wong May 22, 2024, 6:02 a.m. UTC
From: Darrick J. Wong <djwong@kernel.org>

An internal user complained about log recovery failing on a symlink
("Bad dinode after recovery") with the following (excerpted) format:

core.magic = 0x494e
core.mode = 0120777
core.version = 3
core.format = 2 (extents)
core.nlinkv2 = 1
core.nextents = 1
core.size = 297
core.nblocks = 1
core.naextents = 0
core.forkoff = 0
core.aformat = 2 (extents)
u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
0:[0,12,1,0]

This is a symbolic link with a 297-byte target stored in a disk block,
which is to say this is a symlink with a remote target.  The forkoff is
0, which is to say that there's 512 - 176 == 336 bytes in the inode core
to store the data fork.

Eventually, testing of generic/388 failed with the same inode corruption
message during inode recovery.  In writing a debugging patch to call
xfs_dinode_verify on dirty inode log items when we're committing
transactions, I observed that xfs/298 can reproduce the problem quite
quickly.

xfs/298 creates a symbolic link, adds some extended attributes, then
deletes them all.  The test failure occurs when the final removexattr
also deletes the attr fork because that does not convert the remote
symlink back into a shortform symlink.  That is how we trip this test.
The only reason why xfs/298 only triggers with the debug patch added is
that it deletes the symlink, so the final iflush shows the inode as
free.

I wrote a quick fstest to emulate the behavior of xfs/298, except that
it leaves the symlinks on the filesystem after inducing the "corrupt"
state.  Kernels going back at least as far as 4.18 have written out
symlink inodes in this manner and prior to 1eb70f54c445f they did not
object to reading them back in.

Because we've been writing out inodes this way for quite some time, the
only way to fix this is to relax the check for symbolic links.
Directories don't have this problem because di_size is bumped to
blocksize during the sf->data conversion.

Fixes: 1eb70f54c445f ("xfs: validate inode fork size against fork format")
Signed-off-by: Darrick J. Wong <djwong@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/libxfs/xfs_inode_buf.c |   28 ++++++++++++++++++++++++----
 1 file changed, 24 insertions(+), 4 deletions(-)

Comments

Chandan Babu R May 29, 2024, 7:10 a.m. UTC | #1
On Tue, May 21, 2024 at 11:02:16 PM -0700, Darrick J. Wong wrote:
> From: Darrick J. Wong <djwong@kernel.org>
>
> An internal user complained about log recovery failing on a symlink
> ("Bad dinode after recovery") with the following (excerpted) format:
>
> core.magic = 0x494e
> core.mode = 0120777
> core.version = 3
> core.format = 2 (extents)
> core.nlinkv2 = 1
> core.nextents = 1
> core.size = 297
> core.nblocks = 1
> core.naextents = 0
> core.forkoff = 0
> core.aformat = 2 (extents)
> u3.bmx[0] = [startoff,startblock,blockcount,extentflag]
> 0:[0,12,1,0]
>
> This is a symbolic link with a 297-byte target stored in a disk block,
> which is to say this is a symlink with a remote target.  The forkoff is
> 0, which is to say that there's 512 - 176 == 336 bytes in the inode core
> to store the data fork.
>
> Eventually, testing of generic/388 failed with the same inode corruption
> message during inode recovery.  In writing a debugging patch to call
> xfs_dinode_verify on dirty inode log items when we're committing
> transactions, I observed that xfs/298 can reproduce the problem quite
> quickly.
>
> xfs/298 creates a symbolic link, adds some extended attributes, then
> deletes them all.  The test failure occurs when the final removexattr
> also deletes the attr fork because that does not convert the remote
> symlink back into a shortform symlink.  That is how we trip this test.
> The only reason why xfs/298 only triggers with the debug patch added is
> that it deletes the symlink, so the final iflush shows the inode as
> free.
>
> I wrote a quick fstest to emulate the behavior of xfs/298, except that
> it leaves the symlinks on the filesystem after inducing the "corrupt"
> state.  Kernels going back at least as far as 4.18 have written out
> symlink inodes in this manner and prior to 1eb70f54c445f they did not
> object to reading them back in.
>
> Because we've been writing out inodes this way for quite some time, the
> only way to fix this is to relax the check for symbolic links.
> Directories don't have this problem because di_size is bumped to
> blocksize during the sf->data conversion.
>

Darrick, This patch causes xfs/348 to fail. To be precise, the case where the
test sets the file type of all inodes to 12 (a symbolic link) causes the DATA
inode (i.e. a regular file) to be interpreted as a symbolic link. The test
however expected 'stat' to fail with EUCLEAN. Hence the test needs to be
fixed.
diff mbox series

Patch

diff --git a/fs/xfs/libxfs/xfs_inode_buf.c b/fs/xfs/libxfs/xfs_inode_buf.c
index d79002343d0b..e7a7bfbe75b4 100644
--- a/fs/xfs/libxfs/xfs_inode_buf.c
+++ b/fs/xfs/libxfs/xfs_inode_buf.c
@@ -374,17 +374,37 @@  xfs_dinode_verify_fork(
 	/*
 	 * For fork types that can contain local data, check that the fork
 	 * format matches the size of local data contained within the fork.
-	 *
-	 * For all types, check that when the size says the should be in extent
-	 * or btree format, the inode isn't claiming it is in local format.
 	 */
 	if (whichfork == XFS_DATA_FORK) {
-		if (S_ISDIR(mode) || S_ISLNK(mode)) {
+		/*
+		 * A directory small enough to fit in the inode must be stored
+		 * in local format.  The directory sf <-> extents conversion
+		 * code updates the directory size accordingly.
+		 */
+		if (S_ISDIR(mode)) {
 			if (be64_to_cpu(dip->di_size) <= fork_size &&
 			    fork_format != XFS_DINODE_FMT_LOCAL)
 				return __this_address;
 		}
 
+		/*
+		 * A symlink with a target small enough to fit in the inode can
+		 * be stored in extents format if xattrs were added (thus
+		 * converting the data fork from shortform to remote format)
+		 * and then removed.
+		 */
+		if (S_ISLNK(mode)) {
+			if (be64_to_cpu(dip->di_size) <= fork_size &&
+			    fork_format != XFS_DINODE_FMT_EXTENTS &&
+			    fork_format != XFS_DINODE_FMT_LOCAL)
+				return __this_address;
+		}
+
+		/*
+		 * For all types, check that when the size says the fork should
+		 * be in extent or btree format, the inode isn't claiming to be
+		 * in local format.
+		 */
 		if (be64_to_cpu(dip->di_size) > fork_size &&
 		    fork_format == XFS_DINODE_FMT_LOCAL)
 			return __this_address;