Message ID: 20250310183946.932054-11-john.g.garry@oracle.com (mailing list archive)
State:      New
Series:     large atomic writes for xfs with CoW
On Mon, Mar 10, 2025 at 06:39:46PM +0000, John Garry wrote:
> Dave Chinner thought that names IOMAP_ATOMIC_HW and IOMAP_ATOMIC_SW were
> not appropriate. Specifically because IOMAP_ATOMIC_HW could actually be
> realised with a SW-based method in the block or md/dm layers.
>
> So rename to IOMAP_ATOMIC_BIO and IOMAP_ATOMIC_FS.

Looking over the entire series and the already merged iomap code:
there should be no reason at all for having IOMAP_ATOMIC_FS.  The only
thing it does is to branch out to xfs_atomic_write_sw_iomap_begin from
xfs_atomic_write_iomap_begin.

You can do that in a much simpler and nicer way by just having
different iomap_ops for the bio based vs file system based atomics.

I agree with Dave that bio is a better term for the bio based atomic,
but please use the IOMAP_ATOMIC_BIO name above instead of the
IOMAP_BIO_ATOMIC actually used in the code if you change it.

>  */
>  static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> -		const struct iomap *iomap, bool use_fua, bool atomic_hw)
> +		const struct iomap *iomap, bool use_fua, bool bio_atomic)

Not new here, but these two bools are pretty ugly.

I'd rather have a

	blk_opf_t extra_flags;

in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed, and
then just clear

> -	if (atomic_hw && length != iter->len)
> +	if (bio_atomic && length != iter->len)
> 		return -EINVAL;

This could really use a comment explaining what the check is for.

> -	if (WARN_ON_ONCE(atomic_hw && n != length)) {
> +	if (WARN_ON_ONCE(bio_atomic && n != length)) {

Same.
> -#define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
> -#define IOMAP_DONTCACHE		(1 << 10)
> -#define IOMAP_ATOMIC_SW		(1 << 11)/* SW-based torn-write protection */
> +#define IOMAP_DONTCACHE		(1 << 9)
> +#define IOMAP_BIO_ATOMIC	(1 << 10) /* Use REQ_ATOMIC on single bio */
> +#define IOMAP_FS_ATOMIC		(1 << 11) /* FS-based torn-write protection */

Please also fix the overly long lines here by moving the comments above
the definitions.  That should also give you enough space to expand the
comment into a full sentence describing the flag fully.
On Wed, Mar 12, 2025 at 12:13:42AM -0700, Christoph Hellwig wrote:
> On Mon, Mar 10, 2025 at 06:39:46PM +0000, John Garry wrote:
> > Dave Chinner thought that names IOMAP_ATOMIC_HW and IOMAP_ATOMIC_SW were
> > not appropriate. Specifically because IOMAP_ATOMIC_HW could actually be
> > realised with a SW-based method in the block or md/dm layers.
> >
> > So rename to IOMAP_ATOMIC_BIO and IOMAP_ATOMIC_FS.
>
> Looking over the entire series and the already merged iomap code:
> there should be no reason at all for having IOMAP_ATOMIC_FS.
> The only thing it does is to branch out to
> xfs_atomic_write_sw_iomap_begin from xfs_atomic_write_iomap_begin.
>
> You can do that in a much simpler and nicer way by just having
> different iomap_ops for the bio based vs file system based atomics.

Agreed - I was going to suggest that, but got distracted by something
else and then forgot about it when I got back to writing the email...

> I agree with Dave that bio is a better term for the bio based
> atomic, but please use the IOMAP_ATOMIC_BIO name above instead
> of the IOMAP_BIO_ATOMIC actually used in the code if you change
> it.

Works for me.

> >  */
> >  static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> > -		const struct iomap *iomap, bool use_fua, bool atomic_hw)
> > +		const struct iomap *iomap, bool use_fua, bool bio_atomic)
>
> Not new here, but these two bools are pretty ugly.
>
> I'd rather have a
>
> 	blk_opf_t extra_flags;
>
> in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
> and then just clear

Yep, that is cleaner..

-Dave.
iomap_dio_bio_iter()

On 12/03/2025 23:59, Dave Chinner wrote:
>>>  */
>>>  static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
>>> -		const struct iomap *iomap, bool use_fua, bool atomic_hw)
>>> +		const struct iomap *iomap, bool use_fua, bool bio_atomic)
>> Not new here, but these two bools are pretty ugly.
>>
>> I'd rather have a
>>
>> 	blk_opf_t extra_flags;
>>
>> in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
>> and then just clear
> Yep, that is cleaner..

This suggestion is not clear to me.

Is it that iomap_dio_bio_iter() [the only caller of
iomap_dio_bio_opflags()] sets REQ_FUA and REQ_ATOMIC in extra_flags,
and then we extra_flags | bio_opf?

Note that iomap_dio_bio_opflags() does still use use_fua for clearing
IOMAP_DIO_WRITE_THROUGH.

And to me it seems nicer to set all the REQ_ flags in one place.
On Thu, Mar 13, 2025 at 06:28:03AM +0000, John Garry wrote:
> > > I'd rather have a
> > >
> > > 	blk_opf_t extra_flags;
> > >
> > > in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
> > > and then just clear
> > Yep, that is cleaner..
>
> This suggestion is not clear to me.
>
> Is it that iomap_dio_bio_iter() [the only caller of iomap_dio_bio_opflags()]
> sets REQ_FUA and REQ_ATOMIC in extra_flags, and then we extra_flags |
> bio_opf?

Yes.

> Note that iomap_dio_bio_opflags() does still use use_fua for clearing
> IOMAP_DIO_WRITE_THROUGH.

You can check for REQ_FUA in extra_flags (or the entire op).

> And to me it seems nicer to set all the REQ_ flags in one place.

Passing multiple bool arguments just loses way too much context.  But
if you really want everything in one place you could probably build
the entire blk_opf_t in iomap_dio_bio_iter, and avoid having to
recalculate it for every bio.
On 13/03/2025 07:02, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 06:28:03AM +0000, John Garry wrote:
>>>> I'd rather have a
>>>>
>>>> 	blk_opf_t extra_flags;
>>>>
>>>> in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
>>>> and then just clear
>>> Yep, that is cleaner..
>>
>> This suggestion is not clear to me.
>>
>> Is it that iomap_dio_bio_iter() [the only caller of
>> iomap_dio_bio_opflags()] sets REQ_FUA and REQ_ATOMIC in extra_flags,
>> and then we extra_flags | bio_opf?
>
> Yes.
>
>> Note that iomap_dio_bio_opflags() does still use use_fua for clearing
>> IOMAP_DIO_WRITE_THROUGH.
>
> You can check for REQ_FUA in extra_flags (or the entire op).
>
>> And to me it seems nicer to set all the REQ_ flags in one place.
>
> Passing multiple bool arguments just loses way too much context.  But
> if you really want everything in one place you could probably build
> the entire blk_opf_t in iomap_dio_bio_iter, and avoid having to
> recalculate it for every bio.

Yeah, when we start taking the use_fua and atomic_hw args from
iomap_dio_bio_opflags(), then iomap_dio_bio_opflags() becomes a shell
of a function.

So how about this (I would re-add the write-through comment):

--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -311,30 +311,6 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 	return 0;
 }
 
-/*
- * Figure out the bio's operation flags from the dio request, the
- * mapping, and whether or not we want FUA.  Note that we can end up
- * clearing the WRITE_THROUGH flag in the dio request.
- */
-static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
-{
-	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
-
-	if (!(dio->flags & IOMAP_DIO_WRITE))
-		return REQ_OP_READ;
-
-	opflags |= REQ_OP_WRITE;
-	if (use_fua)
-		opflags |= REQ_FUA;
-	else
-		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
-		opflags |= REQ_ATOMIC;
-
-	return opflags;
-}
-
 static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 {
 	const struct iomap *iomap = &iter->iomap;
@@ -346,13 +322,20 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	blk_opf_t bio_opf;
 	struct bio *bio;
 	bool need_zeroout = false;
-	bool use_fua = false;
 	int nr_pages, ret = 0;
 	u64 copied = 0;
 	size_t orig_count;
 
-	if (atomic_hw && length != iter->len)
-		return -EINVAL;
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		bio_opf = REQ_OP_WRITE;
+		if (atomic_hw) {
+			if (length != iter->len)
+				return -EINVAL;
+			bio_opf |= REQ_ATOMIC;
+		}
+	} else {
+		bio_opf = REQ_OP_READ;
+	}
 
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
 	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
@@ -382,10 +365,12 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		 */
 		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
 		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
-			use_fua = true;
-		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev))) {
+			bio_opf |= REQ_FUA; //reads as well?
+			dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+		} else if (dio->flags & IOMAP_DIO_NEED_SYNC) {
 			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+		}
 	}
 
 	/*
@@ -407,7 +392,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	 * during completion processing.
 	 */
 	if (need_zeroout ||
-	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
+	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !(bio_opf & REQ_FUA)) ||
 	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
 		dio->flags &= ~IOMAP_DIO_CALLER_COMP;
 
@@ -428,8 +413,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
-
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
 		size_t n;
--
On Thu, Mar 13, 2025 at 07:41:11AM +0000, John Garry wrote:
> So how about this (I would re-add the write-through comment):

This looks roughly sane.  You'd probably want to turn the
iomap_dio_bio_opflags removal into a prep patch, though.

> -	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;

This got lost and should move to the bio_opf declaration now.

> +	    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev))) {
> +		bio_opf |= REQ_FUA; //reads as well?

REQ_FUA is not defined for reads in Linux.  Some of the storage
standards define it for reads, but the semantics are pretty
nonsensical.
On 13/03/2025 07:49, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 07:41:11AM +0000, John Garry wrote:
>> So how about this (I would re-add the write-through comment):
>
> This looks roughly sane.  You'd probably want to turn the
> iomap_dio_bio_opflags removal into a prep patch, though.

Sure

>> -	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
>
> This got lost and should move to the bio_opf declaration now.

ok

>> +	    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev))) {
>> +		bio_opf |= REQ_FUA; //reads as well?
>
> REQ_FUA is not defined for reads in Linux.  Some of the storage
> standards define it for reads, but the semantics are pretty
> nonsensical.

ok, so I will need to check for writes when setting that (as it was
previously)
On Thu, Mar 13, 2025 at 07:53:22AM +0000, John Garry wrote:
> > REQ_FUA is not defined for reads in Linux.  Some of the storage
> > standards define it for reads, but the semantics are pretty
> > nonsensical.
>
> ok, so I will need to check for writes when setting that (as it was
> previously)

Yes.  That entire IOMAP_MAPPED branch should just grow an additional
IOMAP_DIO_WRITE check, as all the code in it is only relevant to
writes.  In fact that also includes most of the other checks before,
so this could use more refactoring if we want to.
Something like this (untested):

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5299f70428ef..3fa21445906a 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -312,27 +312,20 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 }
 
 /*
- * Figure out the bio's operation flags from the dio request, the
- * mapping, and whether or not we want FUA.  Note that we can end up
- * clearing the WRITE_THROUGH flag in the dio request.
+ * Use a FUA write if we need datasync semantics and this is a pure data I/O
+ * that doesn't require any metadata updates (including after I/O completion
+ * such as unwritten extent conversion) and the underlying device either
+ * doesn't have a volatile write cache or supports FUA.
+ * This allows us to avoid cache flushes on I/O completion.
  */
-static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
+static inline bool iomap_dio_can_use_fua(const struct iomap *iomap,
+		struct iomap_dio *dio)
 {
-	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
-
-	if (!(dio->flags & IOMAP_DIO_WRITE))
-		return REQ_OP_READ;
-
-	opflags |= REQ_OP_WRITE;
-	if (use_fua)
-		opflags |= REQ_FUA;
-	else
-		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
-		opflags |= REQ_ATOMIC;
-
-	return opflags;
+	if (iomap->flags & (IOMAP_F_SHARED | IOMAP_F_DIRTY))
+		return false;
+	if (!(dio->flags & IOMAP_DIO_WRITE_THROUGH))
+		return false;
+	return !bdev_write_cache(iomap->bdev) || bdev_fua(iomap->bdev);
 }
 
 static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
@@ -343,49 +336,53 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
 	const loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
-	blk_opf_t bio_opf;
+	blk_opf_t bio_opf = REQ_SYNC | REQ_IDLE;
 	struct bio *bio;
 	bool need_zeroout = false;
-	bool use_fua = false;
 	int nr_pages, ret = 0;
 	u64 copied = 0;
 	size_t orig_count;
 
-	if (atomic_hw && length != iter->len)
-		return -EINVAL;
-
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
 	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
 		return -EINVAL;
 
-	if (iomap->type == IOMAP_UNWRITTEN) {
-		dio->flags |= IOMAP_DIO_UNWRITTEN;
-		need_zeroout = true;
-	}
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		bio_opf |= REQ_OP_WRITE;
+
+		if (atomic_hw) {
+			if (length != iter->len)
+				return -EINVAL;
+			bio_opf |= REQ_ATOMIC;
+		}
 
-	if (iomap->flags & IOMAP_F_SHARED)
-		dio->flags |= IOMAP_DIO_COW;
+		if (iomap->type == IOMAP_UNWRITTEN) {
+			dio->flags |= IOMAP_DIO_UNWRITTEN;
+			need_zeroout = true;
+		}
 
-	if (iomap->flags & IOMAP_F_NEW) {
-		need_zeroout = true;
-	} else if (iomap->type == IOMAP_MAPPED) {
-		/*
-		 * Use a FUA write if we need datasync semantics, this is a pure
-		 * data IO that doesn't require any metadata updates (including
-		 * after IO completion such as unwritten extent conversion) and
-		 * the underlying device either supports FUA or doesn't have
-		 * a volatile write cache.  This allows us to avoid cache flushes
-		 * on IO completion.  If we can't use writethrough and need to
-		 * sync, disable in-task completions as dio completion will
-		 * need to call generic_write_sync() which will do a blocking
-		 * fsync / cache flush call.
-		 */
-		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
-			use_fua = true;
-		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
-			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+		if (iomap->flags & IOMAP_F_SHARED)
+			dio->flags |= IOMAP_DIO_COW;
+
+		if (iomap->flags & IOMAP_F_NEW) {
+			need_zeroout = true;
+		} else if (iomap->type == IOMAP_MAPPED) {
+			if (iomap_dio_can_use_fua(iomap, dio)) {
+				bio_opf |= REQ_FUA;
+			} else {
+				dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+				/*
+				 * Disable in-task completions if we can't use
+				 * writethrough and need to sync as the I/O
+				 * completion handler has to force a (blocking)
+				 * cache flush.
+				 */
+				if (dio->flags & IOMAP_DIO_NEED_SYNC)
+					dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+			}
+		}
+	} else {
+		bio_opf |= REQ_OP_READ;
 	}
 
 	/*
@@ -407,7 +404,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	 * during completion processing.
 	 */
 	if (need_zeroout ||
-	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
+	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !(bio_opf & REQ_FUA)) ||
 	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
 		dio->flags &= ~IOMAP_DIO_CALLER_COMP;
 
@@ -428,8 +425,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
-
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
 		size_t n;
On 13/03/2025 08:18, Christoph Hellwig wrote:
> Something like this (untested):
looks sane, I'll check it further and pick it up.
Thanks
On Thu, Mar 13, 2025 at 01:18:26AM -0700, Christoph Hellwig wrote:
> Something like this (untested):
In fact this can be further simplified, as the clearing of
IOMAP_DIO_CALLER_COMP has duplicate logic, and only applies to writes.
So something like this would do even better:
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5299f70428ef..e9b03b9dae9e 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -312,27 +312,20 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
}
/*
- * Figure out the bio's operation flags from the dio request, the
- * mapping, and whether or not we want FUA. Note that we can end up
- * clearing the WRITE_THROUGH flag in the dio request.
+ * Use a FUA write if we need datasync semantics and this is a pure data I/O
+ * that doesn't require any metadata updates (including after I/O completion
+ * such as unwritten extent conversion) and the underlying device either
+ * doesn't have a volatile write cache or supports FUA.
+ * This allows us to avoid cache flushes on I/O completion.
*/
-static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
- const struct iomap *iomap, bool use_fua, bool atomic_hw)
+static inline bool iomap_dio_can_use_fua(const struct iomap *iomap,
+ struct iomap_dio *dio)
{
- blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
-
- if (!(dio->flags & IOMAP_DIO_WRITE))
- return REQ_OP_READ;
-
- opflags |= REQ_OP_WRITE;
- if (use_fua)
- opflags |= REQ_FUA;
- else
- dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
- if (atomic_hw)
- opflags |= REQ_ATOMIC;
-
- return opflags;
+ if (iomap->flags & (IOMAP_F_SHARED | IOMAP_F_DIRTY))
+ return false;
+ if (!(dio->flags & IOMAP_DIO_WRITE_THROUGH))
+ return false;
+ return !bdev_write_cache(iomap->bdev) || bdev_fua(iomap->bdev);
}
static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
@@ -340,52 +333,59 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
const struct iomap *iomap = &iter->iomap;
struct inode *inode = iter->inode;
unsigned int fs_block_size = i_blocksize(inode), pad;
- bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
const loff_t length = iomap_length(iter);
loff_t pos = iter->pos;
- blk_opf_t bio_opf;
+ blk_opf_t bio_opf = REQ_SYNC | REQ_IDLE;
struct bio *bio;
bool need_zeroout = false;
- bool use_fua = false;
int nr_pages, ret = 0;
u64 copied = 0;
size_t orig_count;
- if (atomic_hw && length != iter->len)
- return -EINVAL;
-
if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
!bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
return -EINVAL;
- if (iomap->type == IOMAP_UNWRITTEN) {
- dio->flags |= IOMAP_DIO_UNWRITTEN;
- need_zeroout = true;
- }
+ if (dio->flags & IOMAP_DIO_WRITE) {
+ bio_opf |= REQ_OP_WRITE;
+
+ if (iter->flags & IOMAP_ATOMIC_HW) {
+ if (length != iter->len)
+ return -EINVAL;
+ bio_opf |= REQ_ATOMIC;
+ }
+
+ if (iomap->type == IOMAP_UNWRITTEN) {
+ dio->flags |= IOMAP_DIO_UNWRITTEN;
+ need_zeroout = true;
+ }
- if (iomap->flags & IOMAP_F_SHARED)
- dio->flags |= IOMAP_DIO_COW;
+ if (iomap->flags & IOMAP_F_SHARED)
+ dio->flags |= IOMAP_DIO_COW;
- if (iomap->flags & IOMAP_F_NEW) {
- need_zeroout = true;
- } else if (iomap->type == IOMAP_MAPPED) {
+ if (iomap->flags & IOMAP_F_NEW) {
+ need_zeroout = true;
+ } else if (iomap->type == IOMAP_MAPPED) {
+ if (iomap_dio_can_use_fua(iomap, dio))
+ bio_opf |= REQ_FUA;
+ else
+ dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+ }
+
/*
- * Use a FUA write if we need datasync semantics, this is a pure
- * data IO that doesn't require any metadata updates (including
- * after IO completion such as unwritten extent conversion) and
- * the underlying device either supports FUA or doesn't have
- * a volatile write cache. This allows us to avoid cache flushes
- * on IO completion. If we can't use writethrough and need to
- * sync, disable in-task completions as dio completion will
- * need to call generic_write_sync() which will do a blocking
- * fsync / cache flush call.
+ * We can only do deferred completion for pure overwrites that
+ * don't require additional I/O at completion time.
+ *
+ * This rules out writes that need zeroing or extent conversion,
+ * extend the file size, or issue metadata I/O or cache flushes
+ * during completion processing.
*/
- if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
- (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
- (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
- use_fua = true;
- else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+ if (need_zeroout || (pos >= i_size_read(inode)) ||
+ ((dio->flags & IOMAP_DIO_NEED_SYNC) &&
+ !(bio_opf & REQ_FUA)))
dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+ } else {
+ bio_opf |= REQ_OP_READ;
}
/*
@@ -399,18 +399,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
if (!iov_iter_count(dio->submit.iter))
goto out;
- /*
- * We can only do deferred completion for pure overwrites that
- * don't require additional IO at completion. This rules out
- * writes that need zeroing or extent conversion, extend
- * the file size, or issue journal IO or cache flushes
- * during completion processing.
- */
- if (need_zeroout ||
- ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
- ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
- dio->flags &= ~IOMAP_DIO_CALLER_COMP;
-
/*
* The rules for polled IO completions follow the guidelines as the
* ones we set for inline and deferred completions. If none of those
@@ -428,8 +416,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
goto out;
}
- bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
-
nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
do {
size_t n;
@@ -461,7 +447,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
}
n = bio->bi_iter.bi_size;
- if (WARN_ON_ONCE(atomic_hw && n != length)) {
+ if (WARN_ON_ONCE((bio_opf & REQ_ATOMIC) && n != length)) {
/*
* This bio should have covered the complete length,
* which it doesn't, so error. We may need to zero out
diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index b08a79d11d9f..f1d9aa767d30 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -514,7 +514,7 @@ IOMAP_WRITE`` with any combination of the following enhancements:
    if the mapping is unwritten and the filesystem cannot handle zeroing
    the unaligned regions without exposing stale contents.
 
- * ``IOMAP_ATOMIC_HW``: This write is being issued with torn-write
+ * ``IOMAP_BIO_ATOMIC``: This write is being issued with torn-write
    protection based on HW-offload support.
    Only a single bio can be created for the write, and the write must
    not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
@@ -530,10 +530,10 @@ IOMAP_WRITE`` with any combination of the following enhancements:
    the mapping start disk block must have at least the same alignment as
    the write offset.
 
- * ``IOMAP_ATOMIC_SW``: This write is being issued with torn-write
+ * ``IOMAP_FS_ATOMIC``: This write is being issued with torn-write
    protection via a software mechanism provided by the filesystem.
    All the disk block alignment and single bio restrictions which apply
-   to IOMAP_ATOMIC_HW do not apply here.
+   to IOMAP_BIO_ATOMIC do not apply here.
    SW-based untorn writes would typically be used as a fallback when
    HW-based untorn writes may not be issued, e.g. the range of the write
    covers multiple extents, meaning that it is not possible to issue
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ba2f1e3db7c7..da385862c1b5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3467,7 +3467,7 @@ static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
 		return false;
 
 	/* atomic writes are all-or-nothing */
-	if (flags & IOMAP_ATOMIC_HW)
+	if (flags & IOMAP_BIO_ATOMIC)
 		return false;
 
 	/* can only try again if we wrote nothing */
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5299f70428ef..d728d894bd90 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -317,7 +317,7 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
  * clearing the WRITE_THROUGH flag in the dio request.
  */
 static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
+		const struct iomap *iomap, bool use_fua, bool bio_atomic)
 {
 	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
 
@@ -329,7 +329,7 @@ static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
 		opflags |= REQ_FUA;
 	else
 		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
+	if (bio_atomic)
 		opflags |= REQ_ATOMIC;
 
 	return opflags;
@@ -340,7 +340,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	const struct iomap *iomap = &iter->iomap;
 	struct inode *inode = iter->inode;
 	unsigned int fs_block_size = i_blocksize(inode), pad;
-	bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
+	bool bio_atomic = iter->flags & IOMAP_BIO_ATOMIC;
 	const loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
 	blk_opf_t bio_opf;
@@ -351,7 +351,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	u64 copied = 0;
 	size_t orig_count;
 
-	if (atomic_hw && length != iter->len)
+	if (bio_atomic && length != iter->len)
 		return -EINVAL;
 
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
@@ -428,7 +428,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, bio_atomic);
 
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
@@ -461,7 +461,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		}
 
 		n = bio->bi_iter.bi_size;
-		if (WARN_ON_ONCE(atomic_hw && n != length)) {
+		if (WARN_ON_ONCE(bio_atomic && n != length)) {
 			/*
 			 * This bio should have covered the complete length,
 			 * which it doesn't, so error.  We may need to zero out
@@ -686,10 +686,10 @@ __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		iomi.flags |= IOMAP_OVERWRITE_ONLY;
 	}
 
-	if (dio_flags & IOMAP_DIO_ATOMIC_SW)
-		iomi.flags |= IOMAP_ATOMIC_SW;
+	if (dio_flags & IOMAP_DIO_FS_ATOMIC)
+		iomi.flags |= IOMAP_FS_ATOMIC;
 	else if (iocb->ki_flags & IOCB_ATOMIC)
-		iomi.flags |= IOMAP_ATOMIC_HW;
+		iomi.flags |= IOMAP_BIO_ATOMIC;
 
 	/* for data sync or sync, we need sync completion processing */
 	if (iocb_is_dsync(iocb)) {
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 69af89044ebd..4b71f1711b69 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -99,7 +99,8 @@ DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
 	{ IOMAP_FAULT,		"FAULT" }, \
 	{ IOMAP_DIRECT,		"DIRECT" }, \
 	{ IOMAP_NOWAIT,		"NOWAIT" }, \
-	{ IOMAP_ATOMIC_HW,	"ATOMIC_HW" }
+	{ IOMAP_BIO_ATOMIC,	"ATOMIC_BIO" }, \
+	{ IOMAP_FS_ATOMIC,	"ATOMIC_FS" }
 
 #define IOMAP_F_FLAGS_STRINGS \
 	{ IOMAP_F_NEW,		"NEW" }, \
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 16739c408af3..4ad80179173a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -755,9 +755,9 @@ xfs_file_dio_write_atomic(
 			&xfs_dio_write_ops, dio_flags, NULL, 0);
 
 	if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) &&
-	    !(dio_flags & IOMAP_DIO_ATOMIC_SW)) {
+	    !(dio_flags & IOMAP_DIO_FS_ATOMIC)) {
 		xfs_iunlock(ip, iolock);
-		dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT;
+		dio_flags = IOMAP_DIO_FS_ATOMIC | IOMAP_DIO_FORCE_WAIT;
 		iolock = XFS_IOLOCK_EXCL;
 		goto retry;
 	}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6c963786530d..71ddd5979091 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -830,7 +830,7 @@ xfs_direct_write_iomap_begin(
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	xfs_fileoff_t		orig_end_fsb = end_fsb;
-	bool			atomic_hw = flags & IOMAP_ATOMIC_HW;
+	bool			atomic_bio = flags & IOMAP_BIO_ATOMIC;
 	int			nimaps = 1, error = 0;
 	unsigned int		reflink_flags = 0;
 	bool			shared = false;
@@ -895,7 +895,7 @@ xfs_direct_write_iomap_begin(
 		if (error)
 			goto out_unlock;
 		if (shared) {
-			if (atomic_hw &&
+			if (atomic_bio &&
 			    !xfs_bmap_valid_for_atomic_write(&cmap,
 					offset_fsb, end_fsb)) {
 				error = -EAGAIN;
@@ -909,7 +909,7 @@ xfs_direct_write_iomap_begin(
 
 	needs_alloc = imap_needs_alloc(inode, flags, &imap, nimaps);
 
-	if (atomic_hw) {
+	if (atomic_bio) {
 		error = -EAGAIN;
 		/*
 		 * Use CoW method for when we need to alloc > 1 block,
@@ -1141,11 +1141,11 @@ xfs_atomic_write_iomap_begin(
 	ASSERT(flags & IOMAP_WRITE);
 	ASSERT(flags & IOMAP_DIRECT);
 
-	if (flags & IOMAP_ATOMIC_SW)
+	if (flags & IOMAP_FS_ATOMIC)
 		return xfs_atomic_write_sw_iomap_begin(inode, offset, length,
 				flags, iomap, srcmap);
 
-	ASSERT(flags & IOMAP_ATOMIC_HW);
+	ASSERT(flags & IOMAP_BIO_ATOMIC);
 	return xfs_direct_write_iomap_begin(inode, offset, length, flags,
 			iomap, srcmap);
 }
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9cd93530013c..5e44ca17a64a 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -189,9 +189,9 @@ struct iomap_folio_ops {
 #else
 #define IOMAP_DAX		0
 #endif /* CONFIG_FS_DAX */
-#define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
-#define IOMAP_DONTCACHE		(1 << 10)
-#define IOMAP_ATOMIC_SW		(1 << 11)/* SW-based torn-write protection */
+#define IOMAP_DONTCACHE		(1 << 9)
+#define IOMAP_BIO_ATOMIC	(1 << 10) /* Use REQ_ATOMIC on single bio */
+#define IOMAP_FS_ATOMIC		(1 << 11) /* FS-based torn-write protection */
 
 struct iomap_ops {
 	/*
@@ -504,9 +504,9 @@ struct iomap_dio_ops {
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
 /*
- * Use software-based torn-write protection.
+ * Use FS-based torn-write protection.
  */
-#define IOMAP_DIO_ATOMIC_SW		(1 << 3)
+#define IOMAP_DIO_FS_ATOMIC		(1 << 3)
 
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,
Dave Chinner thought that names IOMAP_ATOMIC_HW and IOMAP_ATOMIC_SW were
not appropriate. Specifically because IOMAP_ATOMIC_HW could actually be
realised with a SW-based method in the block or md/dm layers.

So rename to IOMAP_ATOMIC_BIO and IOMAP_ATOMIC_FS.

Also renumber the flags so that the atomic flags are adjacent.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 Documentation/filesystems/iomap/operations.rst |  6 +++---
 fs/ext4/inode.c                                |  2 +-
 fs/iomap/direct-io.c                           | 18 +++++++++---------
 fs/iomap/trace.h                               |  3 ++-
 fs/xfs/xfs_file.c                              |  4 ++--
 fs/xfs/xfs_iomap.c                             | 10 +++++-----
 include/linux/iomap.h                          | 10 +++++-----
 7 files changed, 27 insertions(+), 26 deletions(-)