[4/8] iomap: Move lockdep_assert_held() to iomap_dio_rw() calls
diff mbox series

Message ID 20191212003043.31093-5-rgoldwyn@suse.de
State New
Headers show
Series
  • btrfs direct-io using iomap
Related show

Commit Message

Goldwyn Rodrigues Dec. 12, 2019, 12:30 a.m. UTC
From: Goldwyn Rodrigues <rgoldwyn@suse.com>

Filesystems such as btrfs can perform direct I/O without holding the
inode->i_rwsem in some of the cases like writing within i_size.
So, remove the check for lockdep_assert_held() in iomap_dio_rw()
and move it to where iomap_dio_rw() is called so the checks are
in place.

Signed-off-by: Goldwyn Rodrigues <rgoldwyn@suse.com>
---
 fs/gfs2/file.c       | 4 ++++
 fs/iomap/direct-io.c | 2 --
 fs/xfs/xfs_file.c    | 9 ++++++++-
 3 files changed, 12 insertions(+), 3 deletions(-)

Comments

Christoph Hellwig Dec. 12, 2019, 9:50 a.m. UTC | #1
On Wed, Dec 11, 2019 at 06:30:39PM -0600, Goldwyn Rodrigues wrote:
> From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> 
> Filesystems such as btrfs can perform direct I/O without holding the
> inode->i_rwsem in some of the cases like writing within i_size.

How is that safe?  

> +	lockdep_assert_held(&file_inode(file)->i_rwsem);

Having the asserts in the callers is pointless.  The assert is inside
the iomap helper to ensure the expected calling conventions, as the
code is written under the assumption that we have i_rwsem.
Goldwyn Rodrigues Dec. 12, 2019, 10:24 p.m. UTC | #2
On  1:50 12/12, Christoph Hellwig wrote:
> On Wed, Dec 11, 2019 at 06:30:39PM -0600, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > Filesystems such as btrfs can perform direct I/O without holding the
> > inode->i_rwsem in some of the cases like writing within i_size.
> 
> How is that safe? 

This (inode_lock release) is only done for writes within i_size.
We only have to safeguard write against truncates, which is done by
inode_dio_wait() call in the truncate sequence (I had mistakenly removed
it in patch 8/8, I shall reinstate that). The commit that introduced this
optimization is 38851cc19adb ("Btrfs: implement unlocked dio write")


> 
> > +	lockdep_assert_held(&file_inode(file)->i_rwsem);
> 
> Having the asserts in the callers is pointless.  The assert is inside
> the iomap helper to ensure the expected calling conventions, as the
> code is written under the assumption that we have i_rwsem.

Hmm, conflicting opinions from you and Dave. Anyways, I have removed it
in individual filesystems.
Dave Chinner Dec. 12, 2019, 10:46 p.m. UTC | #3
On Thu, Dec 12, 2019 at 01:50:44AM -0800, Christoph Hellwig wrote:
> On Wed, Dec 11, 2019 at 06:30:39PM -0600, Goldwyn Rodrigues wrote:
> > From: Goldwyn Rodrigues <rgoldwyn@suse.com>
> > 
> > Filesystems such as btrfs can perform direct I/O without holding the
> > inode->i_rwsem in some of the cases like writing within i_size.
> 
> How is that safe?  
> 
> > +	lockdep_assert_held(&file_inode(file)->i_rwsem);
> 
> Having the asserts in the callers is pointless.  The assert is inside
> the iomap helper to ensure the expected calling conventions, as the
> code is written under the assumption that we have i_rwsem.

It's written under the assumption that the caller has already
performed the appropriate locking they require for serialisation
against other operations on that inode.

The fact that the filesystems up to this point all used the i_rwsem
is largely irrelevant, and filesystems don't have to use the i_rwsem
to serialise their IO. e.g. go back a handful of years and this
would have needed to take into account an XFS specific rwsem, not
the VFS inode mutex...

Indeed, the IO range locking patches I have for XFS get rid of this
lockdep assert in iomap because we no longer use the i_rwsem for IO
serialisation in XFS - we go back to using an internal XFS construct
for IO serialisation and don't use the i_rwsem at all.

Cheers,

Dave.

Patch
diff mbox series

diff --git a/fs/gfs2/file.c b/fs/gfs2/file.c
index 9d58295ccf7a..d0517a78640b 100644
--- a/fs/gfs2/file.c
+++ b/fs/gfs2/file.c
@@ -771,6 +771,8 @@  static ssize_t gfs2_file_direct_read(struct kiocb *iocb, struct iov_iter *to)
 	if (ret)
 		goto out_uninit;
 
+	lockdep_assert_held(&file_inode(file)->i_rwsem);
+
 	ret = iomap_dio_rw(iocb, to, &gfs2_iomap_ops, NULL,
 			   is_sync_kiocb(iocb));
 
@@ -807,6 +809,8 @@  static ssize_t gfs2_file_direct_write(struct kiocb *iocb, struct iov_iter *from)
 	if (offset + len > i_size_read(&ip->i_inode))
 		goto out;
 
+	lockdep_assert_held(&inode->i_rwsem);
+
 	ret = iomap_dio_rw(iocb, from, &gfs2_iomap_ops, NULL,
 			   is_sync_kiocb(iocb));
 
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 1a3bf3bd86fb..41c1e7c20a1f 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -415,8 +415,6 @@  iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 	struct blk_plug plug;
 	struct iomap_dio *dio;
 
-	lockdep_assert_held(&inode->i_rwsem);
-
 	if (!count)
 		return 0;
 
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index c93250108952..dfaccd4d6e6c 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -176,7 +176,8 @@  xfs_file_dio_aio_read(
 	struct kiocb		*iocb,
 	struct iov_iter		*to)
 {
-	struct xfs_inode	*ip = XFS_I(file_inode(iocb->ki_filp));
+	struct inode		*inode = file_inode(iocb->ki_filp);
+	struct xfs_inode	*ip = XFS_I(inode);
 	size_t			count = iov_iter_count(to);
 	ssize_t			ret;
 
@@ -188,6 +189,9 @@  xfs_file_dio_aio_read(
 	file_accessed(iocb->ki_filp);
 
 	xfs_ilock(ip, XFS_IOLOCK_SHARED);
+
+	lockdep_assert_held(&inode->i_rwsem);
+
 	ret = iomap_dio_rw(iocb, to, &xfs_read_iomap_ops, NULL,
 			is_sync_kiocb(iocb));
 	xfs_iunlock(ip, XFS_IOLOCK_SHARED);
@@ -547,6 +551,9 @@  xfs_file_dio_aio_write(
 	}
 
 	trace_xfs_file_direct_write(ip, count, iocb->ki_pos);
+
+	lockdep_assert_held(&inode->i_rwsem);
+
 	/*
 	 * If unaligned, this is the only IO in-flight. Wait on it before we
 	 * release the iolock to prevent subsequent overlapping IO.