diff mbox series

[11/12] xfs: drop s_umount over opening the log and RT devices

Message ID 20230802154131.2221419-12-hch@lst.de (mailing list archive)
State New, archived
Headers show
Series [01/12] fs: export setup_bdev_super | expand

Commit Message

Christoph Hellwig Aug. 2, 2023, 3:41 p.m. UTC
Just like get_tree_bdev needs to drop s_umount when opening the main
device, we need to do the same for the xfs log and RT devices to avoid a
potential lock order reversal with s_unmount for the mark_dead path.

It might be preferable to just drop s_umount over ->fill_super entirely,
but that will require a fairly massive audit first, so we'll do the easy
version here first.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_super.c | 15 +++++++++++----
 1 file changed, 11 insertions(+), 4 deletions(-)

Comments

Darrick J. Wong Aug. 2, 2023, 4:32 p.m. UTC | #1
On Wed, Aug 02, 2023 at 05:41:30PM +0200, Christoph Hellwig wrote:
> Just like get_tree_bdev needs to drop s_umount when opening the main
> device, we need to do the same for the xfs log and RT devices to avoid a
> potential lock order reversal with s_unmount for the mark_dead path.
> 
> It might be preferable to just drop s_umount over ->fill_super entirely,
> but that will require a fairly massive audit first, so we'll do the easy
> version here first.
> 
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_super.c | 15 +++++++++++----
>  1 file changed, 11 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index 8185102431301d..d5042419ed9997 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -448,17 +448,21 @@ STATIC int
>  xfs_open_devices(
>  	struct xfs_mount	*mp)
>  {
> -	struct block_device	*ddev = mp->m_super->s_bdev;
> +	struct super_block	*sb = mp->m_super;
> +	struct block_device	*ddev = sb->s_bdev;
>  	struct block_device	*logdev = NULL, *rtdev = NULL;
>  	int			error;
>  
> +	/* see get_tree_bdev why this is needed and safe */

Which part of get_tree_bdev?  Is it this?

		/*
		 * s_umount nests inside open_mutex during
		 * __invalidate_device().  blkdev_put() acquires
		 * open_mutex and can't be called under s_umount.  Drop
		 * s_umount temporarily.  This is safe as we're
		 * holding an active reference.
		 */
		up_write(&s->s_umount);
		blkdev_put(bdev, fc->fs_type);
		down_write(&s->s_umount);

<confused>

> +	up_write(&sb->s_umount);
> +
>  	/*
>  	 * Open real time and log devices - order is important.
>  	 */
>  	if (mp->m_logname) {
>  		error = xfs_blkdev_get(mp, mp->m_logname, &logdev);
>  		if (error)
> -			return error;
> +			goto out_unlock;
>  	}
>  
>  	if (mp->m_rtname) {
> @@ -496,7 +500,10 @@ xfs_open_devices(
>  		mp->m_logdev_targp = mp->m_ddev_targp;
>  	}
>  
> -	return 0;
> +	error = 0;
> +out_unlock:
> +	down_write(&sb->s_umount);

Isn't down_write taking s_umount?  I think the label should be
out_relock or something less misleading.

--D

> +	return error;
>  
>   out_free_rtdev_targ:
>  	if (mp->m_rtdev_targp)
> @@ -508,7 +515,7 @@ xfs_open_devices(
>   out_close_logdev:
>  	if (logdev && logdev != ddev)
>  		xfs_blkdev_put(mp, logdev);
> -	return error;
> +	goto out_unlock;
>  }
>  
>  /*
> -- 
> 2.39.2
>
Christoph Hellwig Aug. 5, 2023, 8:32 a.m. UTC | #2
On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > +	/* see get_tree_bdev why this is needed and safe */
> 
> Which part of get_tree_bdev?  Is it this?
> 
> 		/*
> 		 * s_umount nests inside open_mutex during
> 		 * __invalidate_device().  blkdev_put() acquires
> 		 * open_mutex and can't be called under s_umount.  Drop
> 		 * s_umount temporarily.  This is safe as we're
> 		 * holding an active reference.
> 		 */
> 		up_write(&s->s_umount);
> 		blkdev_put(bdev, fc->fs_type);
> 		down_write(&s->s_umount);

Yes.  With the refactoring earlier in the series get_tree_bdev should
be trivial enough to not need a more specific reference.  If you
think there's a better way to refer to it I can update the comment,
though.

> >  		mp->m_logdev_targp = mp->m_ddev_targp;
> >  	}
> >  
> > -	return 0;
> > +	error = 0;
> > +out_unlock:
> > +	down_write(&sb->s_umount);
> 
> Isn't down_write taking s_umount?  I think the label should be
> out_relock or something less misleading.

Agreed.  Christian, can you just change this in your branch, or should
I send an incremental patch?
Christian Brauner Aug. 5, 2023, 10:39 a.m. UTC | #3
On Sat, Aug 05, 2023 at 10:32:39AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > > +	/* see get_tree_bdev why this is needed and safe */
> > 
> > Which part of get_tree_bdev?  Is it this?
> > 
> > 		/*
> > 		 * s_umount nests inside open_mutex during
> > 		 * __invalidate_device().  blkdev_put() acquires
> > 		 * open_mutex and can't be called under s_umount.  Drop
> > 		 * s_umount temporarily.  This is safe as we're
> > 		 * holding an active reference.
> > 		 */
> > 		up_write(&s->s_umount);
> > 		blkdev_put(bdev, fc->fs_type);
> > 		down_write(&s->s_umount);
> 
> Yes.  With the refactoring earlier in the series get_tree_bdev should
> be trivial enough to not need a more specific reference.  If you
> think there's a better way to refer to it I can update the comment,
> though.
> 
> > >  		mp->m_logdev_targp = mp->m_ddev_targp;
> > >  	}
> > >  
> > > -	return 0;
> > > +	error = 0;
> > > +out_unlock:
> > > +	down_write(&sb->s_umount);
> > 
> > Isn't down_write taking s_umount?  I think the label should be
> > out_relock or something less misleading.
> 
> Agreed.  Christian, can you just change this in your branch, or should
> I send an incremental patch?

No need to send an incremental patch. I just s/out_unlock/out_relock/g
in-tree. Thanks!
Darrick J. Wong Aug. 5, 2023, 4:19 p.m. UTC | #4
On Sat, Aug 05, 2023 at 10:32:39AM +0200, Christoph Hellwig wrote:
> On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > > +	/* see get_tree_bdev why this is needed and safe */
> > 
> > Which part of get_tree_bdev?  Is it this?
> > 
> > 		/*
> > 		 * s_umount nests inside open_mutex during
> > 		 * __invalidate_device().  blkdev_put() acquires
> > 		 * open_mutex and can't be called under s_umount.  Drop
> > 		 * s_umount temporarily.  This is safe as we're
> > 		 * holding an active reference.
> > 		 */
> > 		up_write(&s->s_umount);
> > 		blkdev_put(bdev, fc->fs_type);
> > 		down_write(&s->s_umount);
> 
> Yes.  With the refactoring earlier in the series get_tree_bdev should
> be trivial enough to not need a more specific reference.  If you
> think there's a better way to refer to it I can update the comment,
> though.

How about:

	/*
	 * blkdev_put can't be called under s_umount, see the comment in
	 * get_tree_bdev for more details
	 */

with that and the label name change,
Reviewed-by: Darrick J. Wong <djwong@kernel.org>

--D


> > >  		mp->m_logdev_targp = mp->m_ddev_targp;
> > >  	}
> > >  
> > > -	return 0;
> > > +	error = 0;
> > > +out_unlock:
> > > +	down_write(&sb->s_umount);
> > 
> > Isn't down_write taking s_umount?  I think the label should be
> > out_relock or something less misleading.
> 
> Agreed.  Christian, can you just change this in your branch, or should
> I send an incremental patch?
>
Christian Brauner Aug. 5, 2023, 5:13 p.m. UTC | #5
On Sat, Aug 05, 2023 at 09:19:04AM -0700, Darrick J. Wong wrote:
> On Sat, Aug 05, 2023 at 10:32:39AM +0200, Christoph Hellwig wrote:
> > On Wed, Aug 02, 2023 at 09:32:19AM -0700, Darrick J. Wong wrote:
> > > > +	/* see get_tree_bdev why this is needed and safe */
> > > 
> > > Which part of get_tree_bdev?  Is it this?
> > > 
> > > 		/*
> > > 		 * s_umount nests inside open_mutex during
> > > 		 * __invalidate_device().  blkdev_put() acquires
> > > 		 * open_mutex and can't be called under s_umount.  Drop
> > > 		 * s_umount temporarily.  This is safe as we're
> > > 		 * holding an active reference.
> > > 		 */
> > > 		up_write(&s->s_umount);
> > > 		blkdev_put(bdev, fc->fs_type);
> > > 		down_write(&s->s_umount);
> > 
> > Yes.  With the refactoring earlier in the series get_tree_bdev should
> > be trivial enough to not need a more specific reference.  If you
> > think there's a better way to refer to it I can update the comment,
> > though.
> 
> How about:
> 
> 	/*
> 	 * blkdev_put can't be called under s_umount, see the comment in
> 	 * get_tree_bdev for more details
> 	 */
> 
> with that and the label name change,
> Reviewed-by: Darrick J. Wong <djwong@kernel.org>

Added that comment and you rvb in-tree.
diff mbox series

Patch

diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8185102431301d..d5042419ed9997 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -448,17 +448,21 @@  STATIC int
 xfs_open_devices(
 	struct xfs_mount	*mp)
 {
-	struct block_device	*ddev = mp->m_super->s_bdev;
+	struct super_block	*sb = mp->m_super;
+	struct block_device	*ddev = sb->s_bdev;
 	struct block_device	*logdev = NULL, *rtdev = NULL;
 	int			error;
 
+	/* see get_tree_bdev why this is needed and safe */
+	up_write(&sb->s_umount);
+
 	/*
 	 * Open real time and log devices - order is important.
 	 */
 	if (mp->m_logname) {
 		error = xfs_blkdev_get(mp, mp->m_logname, &logdev);
 		if (error)
-			return error;
+			goto out_unlock;
 	}
 
 	if (mp->m_rtname) {
@@ -496,7 +500,10 @@  xfs_open_devices(
 		mp->m_logdev_targp = mp->m_ddev_targp;
 	}
 
-	return 0;
+	error = 0;
+out_unlock:
+	down_write(&sb->s_umount);
+	return error;
 
  out_free_rtdev_targ:
 	if (mp->m_rtdev_targp)
@@ -508,7 +515,7 @@  xfs_open_devices(
  out_close_logdev:
 	if (logdev && logdev != ddev)
 		xfs_blkdev_put(mp, logdev);
-	return error;
+	goto out_unlock;
 }
 
 /*