diff mbox series

[2/3] xfs: don't ever put nlink > 0 inodes on the unlinked list

Message ID 155009105350.32028.13101526675073908023.stgit@magnolia (mailing list archive)
State Accepted, archived
Headers show
Series [1/3] xfs: don't overflow xattr listent buffer | expand

Commit Message

Darrick J. Wong Feb. 13, 2019, 8:50 p.m. UTC
From: Darrick J. Wong <darrick.wong@oracle.com>

When XFS creates an O_TMPFILE file, the inode is created with nlink = 1,
put on the unlinked list, and then the VFS sets nlink = 0 in d_tmpfile.
If we crash before anything logs the inode (it's dirty incore but the
vfs doesn't tell us it's dirty so we never log that change), the iunlink
processing part of recovery will then explode with a pile of:

XFS: Assertion failed: VFS_I(ip)->i_nlink == 0, file:
fs/xfs/xfs_log_recover.c, line: 5072

Worse yet, since nlink is nonzero, the inodes also don't get cleaned up
and they just leak until the next xfs_repair run.

Therefore, change xfs_iunlink to require that inodes being put on the
unlinked list have nlink == 0, change the tmpfile callers to instantiate
nodes that way, and set the nlink to 1 just prior to calling d_tmpfile.
Fix the comment for xfs_iunlink while we're at it.

Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
---
 fs/xfs/xfs_inode.c |   16 ++++++----------
 fs/xfs/xfs_iops.c  |   13 +++++++++++--
 2 files changed, 17 insertions(+), 12 deletions(-)

Comments

Christoph Hellwig Feb. 14, 2019, 8:15 a.m. UTC | #1
On Wed, Feb 13, 2019 at 12:50:53PM -0800, Darrick J. Wong wrote:
> +	if (tmpfile) {
> +		/*
> +		 * The VFS requires that any inode fed to d_tmpfile must have
> +		 * nlink == 1 so that it can decrement the nlink in d_tmpfile.
> +		 * However, we created the temp file with nlink == 0 because
> +		 * we're not allowed to put an inode with nlink > 0 on the
> +		 * unlinked list.  Therefore we have to set nlink to 1 so that
> +		 * d_tmpfile can immediately set it back to zero.
> +		 */
> +		set_nlink(inode, 1);
>  		d_tmpfile(dentry, inode);
> +	} else

At least btrtfs has to work around these d_tmpfile assumptions as well.
Instead of piling hacks over hacks I'd rather move the call to
inode_dec_link_count from d_tmpfile, which should lead to a saner
interface.
Darrick J. Wong Feb. 14, 2019, 4:03 p.m. UTC | #2
On Thu, Feb 14, 2019 at 12:15:56AM -0800, Christoph Hellwig wrote:
> On Wed, Feb 13, 2019 at 12:50:53PM -0800, Darrick J. Wong wrote:
> > +	if (tmpfile) {
> > +		/*
> > +		 * The VFS requires that any inode fed to d_tmpfile must have
> > +		 * nlink == 1 so that it can decrement the nlink in d_tmpfile.
> > +		 * However, we created the temp file with nlink == 0 because
> > +		 * we're not allowed to put an inode with nlink > 0 on the
> > +		 * unlinked list.  Therefore we have to set nlink to 1 so that
> > +		 * d_tmpfile can immediately set it back to zero.
> > +		 */
> > +		set_nlink(inode, 1);
> >  		d_tmpfile(dentry, inode);
> > +	} else
> 
> At least btrtfs has to work around these d_tmpfile assumptions as well.
> Instead of piling hacks over hacks I'd rather move the call to
> inode_dec_link_count from d_tmpfile, which should lead to a saner
> interface.

I'm working on a bigger change to fix the d_tmpfile behavior, but that's
a complex multi-fs change that may or may not make it for 5.1. :(

In the meantime this prevents leaking inodes during unlink recovery by
ensuring that we never put linked inodes on the unlink list so I would
still like to get this one reviewed. :)

--D
Christoph Hellwig Feb. 14, 2019, 9:41 p.m. UTC | #3
On Wed, Feb 13, 2019 at 12:50:53PM -0800, Darrick J. Wong wrote:
> From: Darrick J. Wong <darrick.wong@oracle.com>
> 
> When XFS creates an O_TMPFILE file, the inode is created with nlink = 1,
> put on the unlinked list, and then the VFS sets nlink = 0 in d_tmpfile.
> If we crash before anything logs the inode (it's dirty incore but the
> vfs doesn't tell us it's dirty so we never log that change), the iunlink
> processing part of recovery will then explode with a pile of:
> 
> XFS: Assertion failed: VFS_I(ip)->i_nlink == 0, file:
> fs/xfs/xfs_log_recover.c, line: 5072
> 
> Worse yet, since nlink is nonzero, the inodes also don't get cleaned up
> and they just leak until the next xfs_repair run.
> 
> Therefore, change xfs_iunlink to require that inodes being put on the
> unlinked list have nlink == 0, change the tmpfile callers to instantiate
> nodes that way, and set the nlink to 1 just prior to calling d_tmpfile.
> Fix the comment for xfs_iunlink while we're at it.

Looks good for a quick fix, even if I think we should fix it different
in the long term:

Reviewed-by: Christoph Hellwig <hch@lst.de>
diff mbox series

Patch

diff --git a/fs/xfs/xfs_inode.c b/fs/xfs/xfs_inode.c
index 9aaa3143a277..9d683b455e01 100644
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@ -1332,7 +1332,7 @@  xfs_create_tmpfile(
 	if (error)
 		goto out_trans_cancel;
 
-	error = xfs_dir_ialloc(&tp, dp, mode, 1, 0, prid, &ip);
+	error = xfs_dir_ialloc(&tp, dp, mode, 0, 0, prid, &ip);
 	if (error)
 		goto out_trans_cancel;
 
@@ -2231,11 +2231,8 @@  xfs_iunlink_update_inode(
 }
 
 /*
- * This is called when the inode's link count goes to 0 or we are creating a
- * tmpfile via O_TMPFILE. In the case of a tmpfile, @ignore_linkcount will be
- * set to true as the link count is dropped to zero by the VFS after we've
- * created the file successfully, so we have to add it to the unlinked list
- * while the link count is non-zero.
+ * This is called when the inode's link count has gone to 0 or we are creating
+ * a tmpfile via O_TMPFILE.  The inode @ip must have nlink == 0.
  *
  * We place the on-disk inode on a list in the AGI.  It will be pulled from this
  * list when the inode is freed.
@@ -2254,6 +2251,7 @@  xfs_iunlink(
 	short			bucket_index = agino % XFS_AGI_UNLINKED_BUCKETS;
 	int			error;
 
+	ASSERT(VFS_I(ip)->i_nlink == 0);
 	ASSERT(VFS_I(ip)->i_mode != 0);
 	trace_xfs_iunlink(ip);
 
@@ -3184,11 +3182,9 @@  xfs_rename_alloc_whiteout(
 
 	/*
 	 * Prepare the tmpfile inode as if it were created through the VFS.
-	 * Otherwise, the link increment paths will complain about nlink 0->1.
-	 * Drop the link count as done by d_tmpfile(), complete the inode setup
-	 * and flag it as linkable.
+	 * Complete the inode setup and flag it as linkable.  nlink is already
+	 * zero, so we can skip the drop_nlink.
 	 */
-	drop_nlink(VFS_I(tmpfile));
 	xfs_setup_iops(tmpfile);
 	xfs_finish_inode_setup(tmpfile);
 	VFS_I(tmpfile)->i_state |= I_LINKABLE;
diff --git a/fs/xfs/xfs_iops.c b/fs/xfs/xfs_iops.c
index f48ffd7a8d3e..1efef69a7f1c 100644
--- a/fs/xfs/xfs_iops.c
+++ b/fs/xfs/xfs_iops.c
@@ -191,9 +191,18 @@  xfs_generic_create(
 
 	xfs_setup_iops(ip);
 
-	if (tmpfile)
+	if (tmpfile) {
+		/*
+		 * The VFS requires that any inode fed to d_tmpfile must have
+		 * nlink == 1 so that it can decrement the nlink in d_tmpfile.
+		 * However, we created the temp file with nlink == 0 because
+		 * we're not allowed to put an inode with nlink > 0 on the
+		 * unlinked list.  Therefore we have to set nlink to 1 so that
+		 * d_tmpfile can immediately set it back to zero.
+		 */
+		set_nlink(inode, 1);
 		d_tmpfile(dentry, inode);
-	else
+	} else
 		d_instantiate(dentry, inode);
 
 	xfs_finish_inode_setup(ip);