direct-io: use read lock for DIO_LOCKING flag

Message ID: 20210415094332.37231-1-yuchao0@huawei.com
State: New, archived
Series: direct-io: use read lock for DIO_LOCKING flag

Commit Message

Chao Yu April 15, 2021, 9:43 a.m. UTC
Commit 9902af79c01a ("parallel lookups: actual switch to rwsem") changed
the inode lock from a mutex to an rwsem, but the locking done for the
DIO_LOCKING flag in do_blockdev_direct_IO() was never adjusted. Take the
inode lock in shared (read) mode instead, to mitigate the performance
regression when DIO reads on the same inode run concurrently, while still
preserving the original purpose of keeping buffered access and direct
access from racing.

Signed-off-by: Chao Yu <yuchao0@huawei.com>
---
 fs/direct-io.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
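
For context, the locking helpers this patch switches between are thin
wrappers around the inode's i_rwsem; roughly, as defined in
include/linux/fs.h:

static inline void inode_lock(struct inode *inode)
{
        down_write(&inode->i_rwsem);
}

static inline void inode_lock_shared(struct inode *inode)
{
        down_read(&inode->i_rwsem);
}

static inline void inode_unlock(struct inode *inode)
{
        up_write(&inode->i_rwsem);
}

static inline void inode_unlock_shared(struct inode *inode)
{
        up_read(&inode->i_rwsem);
}

So the patch turns the exclusive down_write() taken for DIO reads under
DIO_LOCKING into a shared down_read(), which lets concurrent DIO readers of
the same inode proceed in parallel while still excluding holders of the
write side of i_rwsem (e.g. buffered writers, which take inode_lock()).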

Comments

Jan Kara April 15, 2021, 10:24 a.m. UTC | #1
On Thu 15-04-21 17:43:32, Chao Yu wrote:
> Commit 9902af79c01a ("parallel lookups: actual switch to rwsem") changed
> the inode lock from a mutex to an rwsem, but the locking done for the
> DIO_LOCKING flag in do_blockdev_direct_IO() was never adjusted. Take the
> inode lock in shared (read) mode instead, to mitigate the performance
> regression when DIO reads on the same inode run concurrently, while still
> preserving the original purpose of keeping buffered access and direct
> access from racing.
> 
> Signed-off-by: Chao Yu <yuchao0@huawei.com>

Thanks for the patch, but this is not safe. Originally we had an exclusive
lock (with i_mutex); switching to rwsem doesn't change that requirement. It
may be OK for some filesystems to actually use shared acquisition of the
rwsem for DIO reads, but it is not clear that is fine for all filesystems
(and I suspect the filesystems that actually do care either don't use the
DIO_LOCKING flag or were already converted to iomap_dio_rw()). So unless
you audit all filesystems using do_blockdev_direct_IO() with the
DIO_LOCKING flag and make sure they are all fine with the inode lock in
shared mode, this is a no-go.

								Honza

> ---
>  fs/direct-io.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/fs/direct-io.c b/fs/direct-io.c
> index b2e86e739d7a..93ff912f2749 100644
> --- a/fs/direct-io.c
> +++ b/fs/direct-io.c
> @@ -1166,7 +1166,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	dio->flags = flags;
>  	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
>  		/* will be released by direct_io_worker */
> -		inode_lock(inode);
> +		inode_lock_shared(inode);
>  	}
>  
>  	/* Once we sampled i_size check for reads beyond EOF */
> @@ -1316,7 +1316,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  	 * of protecting us from looking up uninitialized blocks.
>  	 */
>  	if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))
> -		inode_unlock(dio->inode);
> +		inode_unlock_shared(dio->inode);
>  
>  	/*
>  	 * The only time we want to leave bios in flight is when a successful
> @@ -1341,7 +1341,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
>  
>  fail_dio:
>  	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
> -		inode_unlock(inode);
> +		inode_unlock_shared(inode);
>  
>  	kmem_cache_free(dio_cache, dio);
>  	return retval;
> -- 
> 2.29.2
>
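
To make the concern concrete: for a DIO read under DIO_LOCKING,
do_blockdev_direct_IO() holds the inode lock across the filesystem's
get_block callback for every extent it needs to map, and a filesystem may
rely on that lock being exclusive to serialize its own in-core bookkeeping.
A purely hypothetical sketch (the myfs_* names and the cached-extent fields
are invented for illustration) of a callback that would become racy under a
shared lock:

/*
 * Hypothetical ->get_block() callback.  The cached_* fields stand in for
 * per-inode state that used to be serialized by the exclusive inode lock
 * taken under DIO_LOCKING; with a shared lock, two concurrent DIO readers
 * may update them without any protection.
 */
static int myfs_get_block(struct inode *inode, sector_t iblock,
                          struct buffer_head *bh_result, int create)
{
        struct myfs_inode_info *mi = MYFS_I(inode);     /* hypothetical */

        /* Fast path: hit in the last-extent cache. */
        if (iblock >= mi->cached_lblk &&
            iblock < mi->cached_lblk + mi->cached_len) {
                map_bh(bh_result, inode->i_sb,
                       mi->cached_pblk + (iblock - mi->cached_lblk));
                return 0;
        }

        /* Slow path: look the extent up on disk and refresh the cache. */
        return myfs_lookup_extent(inode, iblock, bh_result, create,
                                  &mi->cached_lblk, &mi->cached_pblk,
                                  &mi->cached_len);
}

Whether any of the remaining DIO_LOCKING users actually keeps state like
this is exactly what the audit asked for above would have to establish.
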
Al Viro April 16, 2021, 12:43 a.m. UTC | #2
On Thu, Apr 15, 2021 at 12:24:13PM +0200, Jan Kara wrote:
> On Thu 15-04-21 17:43:32, Chao Yu wrote:
> > Commit 9902af79c01a ("parallel lookups: actual switch to rwsem") changed
> > the inode lock from a mutex to an rwsem, but the locking done for the
> > DIO_LOCKING flag in do_blockdev_direct_IO() was never adjusted.

The change in question had nothing to do with the use of ->i_mutex for
regular files data access.

> > Take the inode lock in shared (read) mode instead, to mitigate the
> > performance regression when DIO reads on the same inode run concurrently,
> > while still preserving the original purpose of keeping buffered access
> > and direct access from racing.
> > 
> > Signed-off-by: Chao Yu <yuchao0@huawei.com>
> 
> Thanks for the patch, but this is not safe. Originally we had an exclusive
> lock (with i_mutex); switching to rwsem doesn't change that requirement. It
> may be OK for some filesystems to actually use shared acquisition of the
> rwsem for DIO reads, but it is not clear that is fine for all filesystems
> (and I suspect the filesystems that actually do care either don't use the
> DIO_LOCKING flag or were already converted to iomap_dio_rw()). So unless
> you audit all filesystems using do_blockdev_direct_IO() with the
> DIO_LOCKING flag and make sure they are all fine with the inode lock in
> shared mode, this is a no-go.

Aye.  Frankly, I would expect that anyone bothering with that kind of
analysis for a given filesystem (and there are fairly unpleasant ones in the
list) would just use the fruits of those efforts to convert it over to
iomap.

"Read DIO" does not mean that accesses to private in-core data structures used
by a given filesystem can be safely done in parallel.  So a blanket patch like
that is not safe at all.
Chao Yu April 16, 2021, 2:19 a.m. UTC | #3
On 2021/4/16 8:43, Al Viro wrote:
> On Thu, Apr 15, 2021 at 12:24:13PM +0200, Jan Kara wrote:
>> On Thu 15-04-21 17:43:32, Chao Yu wrote:
>>> Commit 9902af79c01a ("parallel lookups: actual switch to rwsem") changed
>>> the inode lock from a mutex to an rwsem, but the locking done for the
>>> DIO_LOCKING flag in do_blockdev_direct_IO() was never adjusted.
> 
> The change in question had nothing to do with the use of ->i_mutex for
> regular files data access.
> 
>>> Take the inode lock in shared (read) mode instead, to mitigate the
>>> performance regression when DIO reads on the same inode run concurrently,
>>> while still preserving the original purpose of keeping buffered access
>>> and direct access from racing.
>>>
>>> Signed-off-by: Chao Yu <yuchao0@huawei.com>
>>
>> Thanks for the patch, but this is not safe. Originally we had an exclusive
>> lock (with i_mutex); switching to rwsem doesn't change that requirement. It
>> may be OK for some filesystems to actually use shared acquisition of the
>> rwsem for DIO reads, but it is not clear that is fine for all filesystems
>> (and I suspect the filesystems that actually do care either don't use the
>> DIO_LOCKING flag or were already converted to iomap_dio_rw()). So unless
>> you audit all filesystems using do_blockdev_direct_IO() with the
>> DIO_LOCKING flag and make sure they are all fine with the inode lock in
>> shared mode, this is a no-go.
> 
> Aye.  Frankly, I would expect that anyone bothering with that kind of
> analysis for a given filesystem (and there are fairly unpleasant ones in the
> list) would just use the fruits of those efforts to convert it over to
> iomap.

Actually, I was more or less misled by the DIO_LOCKING comment; it looks
like it was introduced only to avoid races between buffered IO and DIO:

	/* need locking between buffered and direct access */
	DIO_LOCKING	= 0x01,

I don't think it is easy for me to analyse the usage scenarios and
restrictions of all DIO_LOCKING users and to get their developers' acks for
this change.

Converting filesystems to use iomap_dio_rw() looks like a better option to
me (a rough sketch of that kind of read path follows after this reply).
Thanks, Jan and Al. :)

Thanks,

> 
> "Read DIO" does not mean that accesses to private in-core data structures used
> by a given filesystem can be safely done in parallel.  So a blanket patch like
> that is not safe at all.
> .
>
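
For reference, the iomap_dio_rw() conversion mentioned above means the
filesystem's own read_iter path takes the inode lock (in shared mode, where
that is known to be safe for that particular filesystem) and supplies its
iomap ops, instead of relying on DIO_LOCKING. A rough sketch, modelled
loosely on the ext4-style DIO read path (the myfs_* names are placeholders,
and the exact iomap_dio_rw() argument list has varied across kernel
versions):

static ssize_t myfs_dio_read_iter(struct kiocb *iocb, struct iov_iter *to)
{
        struct inode *inode = file_inode(iocb->ki_filp);
        ssize_t ret;

        /* The filesystem decides the lock mode, not the generic DIO code. */
        if (iocb->ki_flags & IOCB_NOWAIT) {
                if (!inode_trylock_shared(inode))
                        return -EAGAIN;
        } else {
                inode_lock_shared(inode);
        }

        /* myfs_iomap_ops maps file offsets to extents for the iomap DIO code;
         * trailing arguments of iomap_dio_rw() differ between kernel versions. */
        ret = iomap_dio_rw(iocb, to, &myfs_iomap_ops, NULL,
                           is_sync_kiocb(iocb));

        inode_unlock_shared(inode);
        file_accessed(iocb->ki_filp);
        return ret;
}

Here the shared lock is a per-filesystem decision made with knowledge of
what that filesystem's iomap ops touch, which is precisely the audit a
blanket change in do_blockdev_direct_IO() cannot provide.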

Patch

diff --git a/fs/direct-io.c b/fs/direct-io.c
index b2e86e739d7a..93ff912f2749 100644
--- a/fs/direct-io.c
+++ b/fs/direct-io.c
@@ -1166,7 +1166,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	dio->flags = flags;
 	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ) {
 		/* will be released by direct_io_worker */
-		inode_lock(inode);
+		inode_lock_shared(inode);
 	}
 
 	/* Once we sampled i_size check for reads beyond EOF */
@@ -1316,7 +1316,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 	 * of protecting us from looking up uninitialized blocks.
 	 */
 	if (iov_iter_rw(iter) == READ && (dio->flags & DIO_LOCKING))
-		inode_unlock(dio->inode);
+		inode_unlock_shared(dio->inode);
 
 	/*
 	 * The only time we want to leave bios in flight is when a successful
@@ -1341,7 +1341,7 @@ do_blockdev_direct_IO(struct kiocb *iocb, struct inode *inode,
 
 fail_dio:
 	if (dio->flags & DIO_LOCKING && iov_iter_rw(iter) == READ)
-		inode_unlock(inode);
+		inode_unlock_shared(inode);
 
 	kmem_cache_free(dio_cache, dio);
 	return retval;