Message ID: 20210122164643.620257-3-hch@lst.de (mailing list archive)
State:      Accepted
Series:     [1/2] xfs: refactor xfs_file_fsync
On Fri, Jan 22, 2021 at 05:46:43PM +0100, Christoph Hellwig wrote:
> If the inode is not pinned by the time fsync is called we don't need the
> ilock to protect against concurrent clearing of ili_fsync_fields as the
> inode won't need a log flush or clearing of these fields. Not taking
> the iolock allows for full concurrency of fsync and thus O_DSYNC
> completions with io_uring/aio write submissions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>

Code looks good, so

Reviewed-by: Dave Chinner <dchinner@redhat.com>

But it makes me wonder... That is, we already elide the call to
generic_write_sync() in direct IO in the case that the device supports
FUA and it's a pure overwrite with no dirty metadata on the inode.
Hence for a lot of storage and AIO/io_uring+DIO w/ O_DSYNC workloads
we're already eliding this fsync-based lock cycle.

In the case where we can't do a REQ_FUA IO because it is not supported
by the device, then don't we really only need a cache flush at IO
completion rather than the full generic_write_sync() call path? That
would provide this optimisation to all the filesystems using
iomap_dio_rw(), not just XFS....

In fact, I wonder if we need to do anything other than just use REQ_FUA
unconditionally in iomap for this situation, as the block layer will
translate REQ_FUA to a write+post-flush if the device doesn't support
FUA writes directly.

Your thoughts on that, Christoph?

Cheers,

Dave.
On Sat, Jan 23, 2021 at 08:08:01AM +1100, Dave Chinner wrote:
> That is, we already elide the call to generic_write_sync() in direct
> IO in the case that the device supports FUA and it's a pure
> overwrite with no dirty metadata on the inode. Hence for a lot of
> storage and AIO/io_uring+DIO w/ O_DSYNC workloads we're already
> eliding this fsync-based lock cycle.
>
> In the case where we can't do a REQ_FUA IO because it is not
> supported by the device, then don't we really only need a cache
> flush at IO completion rather than the full generic_write_sync()
> call path? That would provide this optimisation to all the
> filesystems using iomap_dio_rw(), not just XFS....
>
> In fact, I wonder if we need to do anything other than just use
> REQ_FUA unconditionally in iomap for this situation, as the block
> layer will translate REQ_FUA to a write+post-flush if the device
> doesn't support FUA writes directly.
>
> Your thoughts on that, Christoph?

For the pure overwrite O_DIRECT + O_DSYNC case we'd get away with just
a flush. And using REQ_FUA will get us there, so it might be worth a
try.
On Fri, Jan 22, 2021 at 05:46:43PM +0100, Christoph Hellwig wrote:
> If the inode is not pinned by the time fsync is called we don't need the
> ilock to protect against concurrent clearing of ili_fsync_fields as the
> inode won't need a log flush or clearing of these fields. Not taking
> the iolock allows for full concurrency of fsync and thus O_DSYNC
> completions with io_uring/aio write submissions.
>
> Signed-off-by: Christoph Hellwig <hch@lst.de>
> ---
>  fs/xfs/xfs_file.c | 9 ++++++++-
>  1 file changed, 8 insertions(+), 1 deletion(-)
>
> diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
> index 588232c77f11e0..ffe2d7c37e26cd 100644
> --- a/fs/xfs/xfs_file.c
> +++ b/fs/xfs/xfs_file.c
> @@ -200,7 +200,14 @@ xfs_file_fsync(
>  	else if (mp->m_logdev_targp != mp->m_ddev_targp)
>  		xfs_blkdev_issue_flush(mp->m_ddev_targp);
>
> -	error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
> +	/*
> +	 * Any inode that has dirty modifications in the log is pinned.  The
> +	 * racy check here for a pinned inode while not catch modifications

s/while/will/ ?

Otherwise looks good:

Reviewed-by: Brian Foster <bfoster@redhat.com>

> +	 * that happen concurrently to the fsync call, but fsync semantics
> +	 * only require to sync previously completed I/O.
> +	 */
> +	if (xfs_ipincount(ip))
> +		error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
>
>  	/*
>  	 * If we only have a single device, and the log force about was
> --
> 2.29.2
>
On Sat, Jan 23, 2021 at 07:41:39AM +0100, Christoph Hellwig wrote:
> > In fact, I wonder if we need to do anything other than just use
> > REQ_FUA unconditionally in iomap for this situation, as the block
> > layer will translate REQ_FUA to a write+post-flush if the device
> > doesn't support FUA writes directly.
> >
> > Your thoughts on that, Christoph?
>
> For the pure overwrite O_DIRECT + O_DSYNC case we'd get away with just
> a flush. And using REQ_FUA will get us there, so it might be worth
> a try.

And looking at this a little more, while just using REQ_FUA would work
it would be rather suboptimal for many cases, as the block layer flush
state machine would do a flush for every bio. So for each O_DIRECT +
O_DSYNC write that generates more than one bio we'd grow extra flushes.
On Mon, Jan 25, 2021 at 08:16:18AM -0500, Brian Foster wrote:
> > -	error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
> > +	/*
> > +	 * Any inode that has dirty modifications in the log is pinned.  The
> > +	 * racy check here for a pinned inode while not catch modifications
>
> s/while/will/ ?

Yes. Darrick, can you fix this up when applying the patch, or do you
want me to resend?
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 588232c77f11e0..ffe2d7c37e26cd 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -200,7 +200,14 @@ xfs_file_fsync(
 	else if (mp->m_logdev_targp != mp->m_ddev_targp)
 		xfs_blkdev_issue_flush(mp->m_ddev_targp);
 
-	error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
+	/*
+	 * Any inode that has dirty modifications in the log is pinned.  The
+	 * racy check here for a pinned inode while not catch modifications
+	 * that happen concurrently to the fsync call, but fsync semantics
+	 * only require to sync previously completed I/O.
+	 */
+	if (xfs_ipincount(ip))
+		error = xfs_fsync_flush_log(ip, datasync, &log_flushed);
 
 	/*
 	 * If we only have a single device, and the log force about was
If the inode is not pinned by the time fsync is called we don't need the
ilock to protect against concurrent clearing of ili_fsync_fields as the
inode won't need a log flush or clearing of these fields. Not taking
the iolock allows for full concurrency of fsync and thus O_DSYNC
completions with io_uring/aio write submissions.

Signed-off-by: Christoph Hellwig <hch@lst.de>
---
 fs/xfs/xfs_file.c | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)