[RFC,v5,10/10] iomap: Rename ATOMIC flags again

Message ID 20250310183946.932054-11-john.g.garry@oracle.com (mailing list archive)
State New
Series: large atomic writes for xfs with CoW

Commit Message

John Garry March 10, 2025, 6:39 p.m. UTC
Dave Chinner thought that the names IOMAP_ATOMIC_HW and IOMAP_ATOMIC_SW were
not appropriate, specifically because IOMAP_ATOMIC_HW could actually be
realised with a SW-based method in the block or md/dm layers.

So rename to IOMAP_ATOMIC_BIO and IOMAP_ATOMIC_FS.

Also renumber the flags so that the atomic flags are adjacent.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 Documentation/filesystems/iomap/operations.rst |  6 +++---
 fs/ext4/inode.c                                |  2 +-
 fs/iomap/direct-io.c                           | 18 +++++++++---------
 fs/iomap/trace.h                               |  3 ++-
 fs/xfs/xfs_file.c                              |  4 ++--
 fs/xfs/xfs_iomap.c                             | 10 +++++-----
 include/linux/iomap.h                          | 10 +++++-----
 7 files changed, 27 insertions(+), 26 deletions(-)

Comments

Christoph Hellwig March 12, 2025, 7:13 a.m. UTC | #1
On Mon, Mar 10, 2025 at 06:39:46PM +0000, John Garry wrote:
> Dave Chinner thought that the names IOMAP_ATOMIC_HW and IOMAP_ATOMIC_SW were
> not appropriate, specifically because IOMAP_ATOMIC_HW could actually be
> realised with a SW-based method in the block or md/dm layers.
> 
> So rename to IOMAP_ATOMIC_BIO and IOMAP_ATOMIC_FS.

Looking over the entire series and the already merged iomap code:
there should be no reason at all for having IOMAP_ATOMIC_FS.
The only thing it does is to branch out to
xfs_atomic_write_sw_iomap_begin from xfs_atomic_write_iomap_begin.

You can do that in a much simpler and nicer way by just having
different iomap_ops for the bio based vs file system based atomics.
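For illustration, the separate-ops dispatch might look roughly like the following minimal, self-contained sketch; `struct iomap_ops`, the begin callbacks, and `pick_atomic_ops()` here are simplified stand-ins, not the kernel definitions:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for the kernel's struct iomap_ops: just a begin callback. */
struct iomap_ops {
	int (*iomap_begin)(bool *used_cow_path);
};

/* Bio-based (REQ_ATOMIC) path: maps the range for a single atomic bio. */
static int atomic_bio_iomap_begin(bool *used_cow_path)
{
	*used_cow_path = false;
	return 0;
}

/* FS-based fallback path: uses the filesystem's CoW mechanism instead. */
static int atomic_fs_iomap_begin(bool *used_cow_path)
{
	*used_cow_path = true;
	return 0;
}

static const struct iomap_ops atomic_bio_ops = {
	.iomap_begin = atomic_bio_iomap_begin,
};
static const struct iomap_ops atomic_fs_ops = {
	.iomap_begin = atomic_fs_iomap_begin,
};

/*
 * Instead of passing an IOMAP_ATOMIC_FS flag down and branching inside a
 * single ->iomap_begin, the caller picks the ops table up front.
 */
static const struct iomap_ops *pick_atomic_ops(bool fs_fallback)
{
	return fs_fallback ? &atomic_fs_ops : &atomic_bio_ops;
}
```

With this shape, the flag only has to exist at the call site that chooses the ops table, not in every `->iomap_begin` implementation.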

I agree with dave that bio is a better term for the bio based
atomic, but please use the IOMAP_ATOMIC_BIO name above instead
of the IOMAP_BIO_ATOMIC actually used in the code if you change
it.

>   */
>  static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> -		const struct iomap *iomap, bool use_fua, bool atomic_hw)
> +		const struct iomap *iomap, bool use_fua, bool bio_atomic)

Not new here, but these two bools are pretty ugly.

I'd rather have a

    blk_opf_t extra_flags;

in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
and then just clear 

>  
> -	if (atomic_hw && length != iter->len)
> +	if (bio_atomic && length != iter->len)
>  		return -EINVAL;

This could really use a comment explaining what the check is for.

> -		if (WARN_ON_ONCE(atomic_hw && n != length)) {
> +		if (WARN_ON_ONCE(bio_atomic && n != length)) {

Same.

> -#define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
> -#define IOMAP_DONTCACHE		(1 << 10)
> -#define IOMAP_ATOMIC_SW		(1 << 11)/* SW-based torn-write protection */
> +#define IOMAP_DONTCACHE		(1 << 9)
> +#define IOMAP_BIO_ATOMIC	(1 << 10) /* Use REQ_ATOMIC on single bio */
> +#define IOMAP_FS_ATOMIC		(1 << 11) /* FS-based torn-write protection */

Please also fix the overly long lines here by moving the comments
above the definitions.  That should also give you enough space to
expand the comment into a full sentence describing the flag fully.
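The comment-above-definition layout might look something like this; the expanded comment wording below is only a guess at the fuller sentences being asked for, and the bit values simply match the patch:

```c
#include <assert.h>

/*
 * Issue the write as a single bio with REQ_ATOMIC set, so the block layer
 * (or md/dm) writes the whole range as one untorn unit.
 */
#define IOMAP_BIO_ATOMIC	(1 << 10)

/*
 * Torn-write protection is provided by a filesystem-level mechanism
 * (e.g. CoW), so the single-bio alignment restrictions do not apply.
 */
#define IOMAP_FS_ATOMIC		(1 << 11)
```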
Dave Chinner March 12, 2025, 11:59 p.m. UTC | #2
On Wed, Mar 12, 2025 at 12:13:42AM -0700, Christoph Hellwig wrote:
> On Mon, Mar 10, 2025 at 06:39:46PM +0000, John Garry wrote:
> > Dave Chinner thought that the names IOMAP_ATOMIC_HW and IOMAP_ATOMIC_SW were
> > not appropriate, specifically because IOMAP_ATOMIC_HW could actually be
> > realised with a SW-based method in the block or md/dm layers.
> > 
> > So rename to IOMAP_ATOMIC_BIO and IOMAP_ATOMIC_FS.
> 
> Looking over the entire series and the already merged iomap code:
> there should be no reason at all for having IOMAP_ATOMIC_FS.
> The only thing it does is to branch out to
> xfs_atomic_write_sw_iomap_begin from xfs_atomic_write_iomap_begin.
> 
> You can do that in a much simpler and nicer way by just having
> different iomap_ops for the bio based vs file system based atomics.

Agreed - I was going to suggest that, but got distracted by
something else and then forgot about it when I got back to writing
the email...

> I agree with dave that bio is a better term for the bio based
> atomic, but please use the IOMAP_ATOMIC_BIO name above instead
> of the IOMAP_BIO_ATOMIC actually used in the code if you change
> it.

Works for me.

> >   */
> >  static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
> > -		const struct iomap *iomap, bool use_fua, bool atomic_hw)
> > +		const struct iomap *iomap, bool use_fua, bool bio_atomic)
> 
> Not new here, but these two bools are pretty ugly.
> 
> I'd rather have a
> 
>     blk_opf_t extra_flags;
> 
> in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
> and then just clear 

Yep, that is cleaner..

-Dave.
John Garry March 13, 2025, 6:28 a.m. UTC | #3
On 12/03/2025 23:59, Dave Chinner wrote:
>>>    */
>>>   static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
>>> -		const struct iomap *iomap, bool use_fua, bool atomic_hw)
>>> +		const struct iomap *iomap, bool use_fua, bool bio_atomic)
>> Not new here, but these two bools are pretty ugly.
>>
>> I'd rather have a
>>
>>      blk_opf_t extra_flags;
>>
>> in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
>> and then just clear
> Yep, that is cleaner..

This suggestion is not clear to me.

Is it that iomap_dio_bio_iter() [the only caller of
iomap_dio_bio_opflags()] sets REQ_FUA and REQ_ATOMIC in extra_flags, and
then we OR extra_flags into bio_opf?

Note that iomap_dio_bio_opflags() does still use use_fua for clearing 
IOMAP_DIO_WRITE_THROUGH.

And to me it seems nicer to set all the REQ_ flags in one place.
Christoph Hellwig March 13, 2025, 7:02 a.m. UTC | #4
On Thu, Mar 13, 2025 at 06:28:03AM +0000, John Garry wrote:
> > > I'd rather have a
> > > 
> > >      blk_opf_t extra_flags;
> > > 
> > > in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
> > > and then just clear
> > Yep, that is cleaner..
> 
> This suggestion is not clear to me.
> 
> Is it that iomap_dio_bio_iter() [the only caller of iomap_dio_bio_opflags()]
> sets REQ_FUA and REQ_ATOMIC in extra_flags, and then we extra_flags |
> bio_opf?

Yes.

> Note that iomap_dio_bio_opflags() does still use use_fua for clearing
> IOMAP_DIO_WRITE_THROUGH.

You can check for REQ_FUA in extra_flags (or the entire op).

> And to me it seems nicer to set all the REQ_ flags in one place.

Passing multiple bool arguments just loses way too much context.  But
if you really want everything in one place you could probably build
the entire blk_opf_t in iomap_dio_bio_iter, and avoid having to
recalculate it for every bio.
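Sketched with stand-in types and flag values (the `blk_opf_t` typedef and REQ_* constants below are simplified placeholders, not the block-layer definitions), the "build it once in the caller" shape is roughly:

```c
#include <assert.h>
#include <stdbool.h>

typedef unsigned int blk_opf_t;	/* stand-in for the kernel typedef */

/* Illustrative placeholder values, not the real REQ_* encoding. */
#define REQ_OP_READ	0u
#define REQ_OP_WRITE	1u
#define REQ_SYNC	(1u << 8)
#define REQ_IDLE	(1u << 9)
#define REQ_FUA		(1u << 10)
#define REQ_ATOMIC	(1u << 11)

/*
 * Compute the op flags once, up front, instead of threading use_fua and
 * bio_atomic bools into a helper for every bio.
 */
static blk_opf_t build_bio_opf(bool is_write, bool use_fua, bool bio_atomic)
{
	blk_opf_t bio_opf = REQ_SYNC | REQ_IDLE;

	if (!is_write)
		return bio_opf | REQ_OP_READ;

	bio_opf |= REQ_OP_WRITE;
	if (use_fua)
		bio_opf |= REQ_FUA;
	if (bio_atomic)
		bio_opf |= REQ_ATOMIC;
	return bio_opf;
}
```

Later checks can then test the flags directly, e.g. `bio_opf & REQ_FUA`, rather than carrying a `use_fua` bool around.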
John Garry March 13, 2025, 7:41 a.m. UTC | #5
On 13/03/2025 07:02, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 06:28:03AM +0000, John Garry wrote:
>>>> I'd rather have a
>>>>
>>>>       blk_opf_t extra_flags;
>>>>
>>>> in the caller that gets REQ_FUA and REQ_ATOMIC assigned as needed,
>>>> and then just clear
>>> Yep, that is cleaner..
>>
>> This suggestion is not clear to me.
>>
>> Is it that iomap_dio_bio_iter() [the only caller of iomap_dio_bio_opflags()]
>> sets REQ_FUA and REQ_ATOMIC in extra_flags, and then we extra_flags |
>> bio_opf?
> 
> Yes.
> 
>> Note that iomap_dio_bio_opflags() does still use use_fua for clearing
>> IOMAP_DIO_WRITE_THROUGH.
> 
> You can check for REQ_FUA in extra_flags (or the entire op).
> 
>> And to me it seems nicer to set all the REQ_ flags in one place.
> 
> Passing multiple bool arguments just loses way too much context.  But
> if you really want everything in one place you could probably build
> the entire blk_opf_t in iomap_dio_bio_iter, and avoid having to
> recalculate it for every bio.
> 

Yeah, once we take the use_fua and atomic_bio args away from
iomap_dio_bio_opflags(), it becomes a shell of a function.

So how about this (I would re-add the write through comment):

--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -311,30 +311,6 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
  	return 0;
  }

-/*
- * Figure out the bio's operation flags from the dio request, the
- * mapping, and whether or not we want FUA.  Note that we can end up
- * clearing the WRITE_THROUGH flag in the dio request.
- */
-static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
-{
-	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
-
-	if (!(dio->flags & IOMAP_DIO_WRITE))
-		return REQ_OP_READ;
-
-	opflags |= REQ_OP_WRITE;
-	if (use_fua)
-		opflags |= REQ_FUA;
-	else
-		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
-		opflags |= REQ_ATOMIC;
-
-	return opflags;
-}
-
  static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
  {
  	const struct iomap *iomap = &iter->iomap;
@@ -346,13 +322,20 @@ static int iomap_dio_bio_iter(struct iomap_iter 
*iter, struct iomap_dio *dio)
  	blk_opf_t bio_opf;
  	struct bio *bio;
  	bool need_zeroout = false;
-	bool use_fua = false;
  	int nr_pages, ret = 0;
  	u64 copied = 0;
  	size_t orig_count;

-	if (atomic_hw && length != iter->len)
-		return -EINVAL;
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		bio_opf = REQ_OP_WRITE;
+		if (atomic_hw)  {
+			if (length != iter->len)
+				return -EINVAL;
+			bio_opf |= REQ_ATOMIC;
+		}
+	} else {
+		bio_opf = REQ_OP_READ;
+	}

  	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
  	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
@@ -382,10 +365,12 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
  		 */
  		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
  		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
-			use_fua = true;
-		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev))) {
+			bio_opf |= REQ_FUA; //reads as well?
+			dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+		} else if (dio->flags & IOMAP_DIO_NEED_SYNC) {
  			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+		}
  	}

  	/*
@@ -407,7 +392,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
  	 * during completion processing.
  	 */
  	if (need_zeroout ||
-	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
+	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !(bio_opf & REQ_FUA)) ||
  	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
  		dio->flags &= ~IOMAP_DIO_CALLER_COMP;

@@ -428,8 +413,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
  			goto out;
  	}

-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
-
  	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
  	do {
  		size_t n;
--
Christoph Hellwig March 13, 2025, 7:49 a.m. UTC | #6
On Thu, Mar 13, 2025 at 07:41:11AM +0000, John Garry wrote:
> So how about this (I would re-add the write through comment):

This looks roughly sane.  You'd probably want to turn the
iomap_dio_bio_opflags removal into a prep patch, though.

> -     blk_opf_t opflags = REQ_SYNC | REQ_IDLE;

This got lost and should move to the bio_opf declaration now.

> +		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev))) {
> +			bio_opf |= REQ_FUA; //reads as well?

REQ_FUA is not defined for reads in Linux.  Some of the storage standards
define it for reads, but the semantics are pretty nonsensical.
John Garry March 13, 2025, 7:53 a.m. UTC | #7
On 13/03/2025 07:49, Christoph Hellwig wrote:
> On Thu, Mar 13, 2025 at 07:41:11AM +0000, John Garry wrote:
>> So how about this (I would re-add the write through comment):
> 
> This looks roughly sane.  You'd probably want to turn the
> iomap_dio_bio_opflags removal into a prep patch, though.

Sure

> 
>> -     blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
> 
> This got lost and should move to the bio_opf declaration now.

ok

> 
>> +		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev))) {
>> +			bio_opf |= REQ_FUA; //reads as well?
> 
> REQ_FUA is not defined for reads in Linux.  Some of the storage standards
> define it for reads, but the semantics are pretty nonsensical.
> 

ok, so I will need to check for writes when setting that (as it was
previously).
Christoph Hellwig March 13, 2025, 8:09 a.m. UTC | #8
On Thu, Mar 13, 2025 at 07:53:22AM +0000, John Garry wrote:
> > REQ_FUA is not defined for reads in Linux.  Some of the storage standards
> > define it for reads, but the semantics are pretty nonsensical.
> > 
> 
> ok, so I will need to check for writes when setting that (as it was
> previously)

Yes.  That entire IOMAP_MAPPED branch should just grow an additional
IOMAP_DIO_WRITE check, as all the code in it is only relevant to writes.
In fact that also applies to most of the other checks before it,
so this could use more refactoring if we want to.
Christoph Hellwig March 13, 2025, 8:18 a.m. UTC | #9
Something like this (untested):

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5299f70428ef..3fa21445906a 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -312,27 +312,20 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 }
 
 /*
- * Figure out the bio's operation flags from the dio request, the
- * mapping, and whether or not we want FUA.  Note that we can end up
- * clearing the WRITE_THROUGH flag in the dio request.
+ * Use a FUA write if we need datasync semantics and this is a pure data I/O
+ * that doesn't require any metadata updates (including after I/O completion
+ * such as unwritten extent conversion) and the underlying device either
+ * doesn't have a volatile write cache or supports FUA.
+ * This allows us to avoid cache flushes on I/O completion.
  */
-static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
+static inline bool iomap_dio_can_use_fua(const struct iomap *iomap,
+		struct iomap_dio *dio)
 {
-	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
-
-	if (!(dio->flags & IOMAP_DIO_WRITE))
-		return REQ_OP_READ;
-
-	opflags |= REQ_OP_WRITE;
-	if (use_fua)
-		opflags |= REQ_FUA;
-	else
-		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
-		opflags |= REQ_ATOMIC;
-
-	return opflags;
+	if (iomap->flags & (IOMAP_F_SHARED | IOMAP_F_DIRTY))
+		return false;
+	if (!(dio->flags & IOMAP_DIO_WRITE_THROUGH))
+		return false;
+	return !bdev_write_cache(iomap->bdev) || bdev_fua(iomap->bdev);
 }
 
 static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
@@ -343,49 +336,53 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
 	const loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
-	blk_opf_t bio_opf;
+	blk_opf_t bio_opf = REQ_SYNC | REQ_IDLE;
 	struct bio *bio;
 	bool need_zeroout = false;
-	bool use_fua = false;
 	int nr_pages, ret = 0;
 	u64 copied = 0;
 	size_t orig_count;
 
-	if (atomic_hw && length != iter->len)
-		return -EINVAL;
-
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
 	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
 		return -EINVAL;
 
-	if (iomap->type == IOMAP_UNWRITTEN) {
-		dio->flags |= IOMAP_DIO_UNWRITTEN;
-		need_zeroout = true;
-	}
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		bio_opf |= REQ_OP_WRITE;
+
+		if (atomic_hw) {
+			if (length != iter->len)
+				return -EINVAL;
+			bio_opf |= REQ_ATOMIC;
+		}
 
-	if (iomap->flags & IOMAP_F_SHARED)
-		dio->flags |= IOMAP_DIO_COW;
+		if (iomap->type == IOMAP_UNWRITTEN) {
+			dio->flags |= IOMAP_DIO_UNWRITTEN;
+			need_zeroout = true;
+		}
 
-	if (iomap->flags & IOMAP_F_NEW) {
-		need_zeroout = true;
-	} else if (iomap->type == IOMAP_MAPPED) {
-		/*
-		 * Use a FUA write if we need datasync semantics, this is a pure
-		 * data IO that doesn't require any metadata updates (including
-		 * after IO completion such as unwritten extent conversion) and
-		 * the underlying device either supports FUA or doesn't have
-		 * a volatile write cache. This allows us to avoid cache flushes
-		 * on IO completion. If we can't use writethrough and need to
-		 * sync, disable in-task completions as dio completion will
-		 * need to call generic_write_sync() which will do a blocking
-		 * fsync / cache flush call.
-		 */
-		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
-			use_fua = true;
-		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
-			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+		if (iomap->flags & IOMAP_F_SHARED)
+			dio->flags |= IOMAP_DIO_COW;
+
+		if (iomap->flags & IOMAP_F_NEW) {
+			need_zeroout = true;
+		} else if (iomap->type == IOMAP_MAPPED) {
+			if (iomap_dio_can_use_fua(iomap, dio)) {
+				bio_opf |= REQ_FUA;
+			} else {
+				dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+				/*
+				 * Disable in-task completions if we can't use
+				 * writethrough and need to sync as the I/O
+				 * completion handler has to force a (blocking)
+				 * cache flush.
+				 */
+				if (dio->flags & IOMAP_DIO_NEED_SYNC)
+					dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+			}
+		}
+	} else {
+		bio_opf |= REQ_OP_READ;
 	}
 
 	/*
@@ -407,7 +404,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	 * during completion processing.
 	 */
 	if (need_zeroout ||
-	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
+	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !(bio_opf & REQ_FUA)) ||
 	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
 		dio->flags &= ~IOMAP_DIO_CALLER_COMP;
 
@@ -428,8 +425,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 			goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
-
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
 		size_t n;
John Garry March 13, 2025, 8:24 a.m. UTC | #10
On 13/03/2025 08:18, Christoph Hellwig wrote:
> Something like this (untested):

looks sane, I'll check it further and pick it up.

Thanks
Christoph Hellwig March 13, 2025, 8:28 a.m. UTC | #11
On Thu, Mar 13, 2025 at 01:18:26AM -0700, Christoph Hellwig wrote:
> Something like this (untested):

In fact this can be further simplified, as the clearing of
IOMAP_DIO_CALLER_COMP has duplicate logic, and only applies to writes.

So something like this would do even better:

diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5299f70428ef..e9b03b9dae9e 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -312,27 +312,20 @@ static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
 }
 
 /*
- * Figure out the bio's operation flags from the dio request, the
- * mapping, and whether or not we want FUA.  Note that we can end up
- * clearing the WRITE_THROUGH flag in the dio request.
+ * Use a FUA write if we need datasync semantics and this is a pure data I/O
+ * that doesn't require any metadata updates (including after I/O completion
+ * such as unwritten extent conversion) and the underlying device either
+ * doesn't have a volatile write cache or supports FUA.
+ * This allows us to avoid cache flushes on I/O completion.
  */
-static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
+static inline bool iomap_dio_can_use_fua(const struct iomap *iomap,
+		struct iomap_dio *dio)
 {
-	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
-
-	if (!(dio->flags & IOMAP_DIO_WRITE))
-		return REQ_OP_READ;
-
-	opflags |= REQ_OP_WRITE;
-	if (use_fua)
-		opflags |= REQ_FUA;
-	else
-		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
-		opflags |= REQ_ATOMIC;
-
-	return opflags;
+	if (iomap->flags & (IOMAP_F_SHARED | IOMAP_F_DIRTY))
+		return false;
+	if (!(dio->flags & IOMAP_DIO_WRITE_THROUGH))
+		return false;
+	return !bdev_write_cache(iomap->bdev) || bdev_fua(iomap->bdev);
 }
 
 static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
@@ -340,52 +333,59 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	const struct iomap *iomap = &iter->iomap;
 	struct inode *inode = iter->inode;
 	unsigned int fs_block_size = i_blocksize(inode), pad;
-	bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
 	const loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
-	blk_opf_t bio_opf;
+	blk_opf_t bio_opf = REQ_SYNC | REQ_IDLE;
 	struct bio *bio;
 	bool need_zeroout = false;
-	bool use_fua = false;
 	int nr_pages, ret = 0;
 	u64 copied = 0;
 	size_t orig_count;
 
-	if (atomic_hw && length != iter->len)
-		return -EINVAL;
-
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
 	    !bdev_iter_is_aligned(iomap->bdev, dio->submit.iter))
 		return -EINVAL;
 
-	if (iomap->type == IOMAP_UNWRITTEN) {
-		dio->flags |= IOMAP_DIO_UNWRITTEN;
-		need_zeroout = true;
-	}
+	if (dio->flags & IOMAP_DIO_WRITE) {
+		bio_opf |= REQ_OP_WRITE;
+
+		if (iter->flags & IOMAP_ATOMIC_HW) {
+			if (length != iter->len)
+				return -EINVAL;
+			bio_opf |= REQ_ATOMIC;
+		}
+
+		if (iomap->type == IOMAP_UNWRITTEN) {
+			dio->flags |= IOMAP_DIO_UNWRITTEN;
+			need_zeroout = true;
+		}
 
-	if (iomap->flags & IOMAP_F_SHARED)
-		dio->flags |= IOMAP_DIO_COW;
+		if (iomap->flags & IOMAP_F_SHARED)
+			dio->flags |= IOMAP_DIO_COW;
 
-	if (iomap->flags & IOMAP_F_NEW) {
-		need_zeroout = true;
-	} else if (iomap->type == IOMAP_MAPPED) {
+		if (iomap->flags & IOMAP_F_NEW) {
+			need_zeroout = true;
+		} else if (iomap->type == IOMAP_MAPPED) {
+			if (iomap_dio_can_use_fua(iomap, dio))
+				bio_opf |= REQ_FUA;
+			else
+				dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
+		}
+	
 		/*
-		 * Use a FUA write if we need datasync semantics, this is a pure
-		 * data IO that doesn't require any metadata updates (including
-		 * after IO completion such as unwritten extent conversion) and
-		 * the underlying device either supports FUA or doesn't have
-		 * a volatile write cache. This allows us to avoid cache flushes
-		 * on IO completion. If we can't use writethrough and need to
-		 * sync, disable in-task completions as dio completion will
-		 * need to call generic_write_sync() which will do a blocking
-		 * fsync / cache flush call.
+		 * We can only do deferred completion for pure overwrites that
+		 * don't require additional I/O at completion time.
+		 *
+		 * This rules out writes that need zeroing or extent conversion,
+		 * extend the file size, or issue metadata I/O or cache flushes
+		 * during completion processing.
 		 */
-		if (!(iomap->flags & (IOMAP_F_SHARED|IOMAP_F_DIRTY)) &&
-		    (dio->flags & IOMAP_DIO_WRITE_THROUGH) &&
-		    (bdev_fua(iomap->bdev) || !bdev_write_cache(iomap->bdev)))
-			use_fua = true;
-		else if (dio->flags & IOMAP_DIO_NEED_SYNC)
+		if (need_zeroout || (pos >= i_size_read(inode)) ||
+		    ((dio->flags & IOMAP_DIO_NEED_SYNC) &&
+		     !(bio_opf & REQ_FUA)))
 			dio->flags &= ~IOMAP_DIO_CALLER_COMP;
+	} else {
+		bio_opf |= REQ_OP_READ;
 	}
 
 	/*
@@ -399,18 +399,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	if (!iov_iter_count(dio->submit.iter))
 		goto out;
 
-	/*
-	 * We can only do deferred completion for pure overwrites that
-	 * don't require additional IO at completion. This rules out
-	 * writes that need zeroing or extent conversion, extend
-	 * the file size, or issue journal IO or cache flushes
-	 * during completion processing.
-	 */
-	if (need_zeroout ||
-	    ((dio->flags & IOMAP_DIO_NEED_SYNC) && !use_fua) ||
-	    ((dio->flags & IOMAP_DIO_WRITE) && pos >= i_size_read(inode)))
-		dio->flags &= ~IOMAP_DIO_CALLER_COMP;
-
 	/*
 	 * The rules for polled IO completions follow the guidelines as the
 	 * ones we set for inline and deferred completions. If none of those
@@ -428,8 +416,6 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 			goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
-
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
 		size_t n;
@@ -461,7 +447,7 @@ static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		}
 
 		n = bio->bi_iter.bi_size;
-		if (WARN_ON_ONCE(atomic_hw && n != length)) {
+		if (WARN_ON_ONCE((bio_opf & REQ_ATOMIC) && n != length)) {
 			/*
 			 * This bio should have covered the complete length,
 			 * which it doesn't, so error. We may need to zero out
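Distilled out of the diff above into a standalone sketch (the struct fields and flag values below are simplified stand-ins for the kernel's iomap and bdev interfaces, purely for illustration), the FUA decision reduces to:

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder flag values, not the kernel encoding. */
#define IOMAP_F_SHARED		(1u << 2)
#define IOMAP_F_DIRTY		(1u << 6)
#define IOMAP_DIO_WRITE_THROUGH	(1u << 3)

/* Stand-ins collapsing the bdev queries into plain fields. */
struct iomap { unsigned int flags; bool bdev_write_cache; bool bdev_fua; };
struct iomap_dio { unsigned int flags; };

/*
 * FUA is usable only for pure overwrites (no shared extents, no dirty
 * metadata), when the caller asked for write-through, and when the device
 * either has no volatile write cache or supports FUA natively.
 */
static bool iomap_dio_can_use_fua(const struct iomap *iomap,
		const struct iomap_dio *dio)
{
	if (iomap->flags & (IOMAP_F_SHARED | IOMAP_F_DIRTY))
		return false;
	if (!(dio->flags & IOMAP_DIO_WRITE_THROUGH))
		return false;
	return !iomap->bdev_write_cache || iomap->bdev_fua;
}
```

Each early return maps to one clause of the original combined conditional, which is what makes the refactored version easier to audit.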

Patch

diff --git a/Documentation/filesystems/iomap/operations.rst b/Documentation/filesystems/iomap/operations.rst
index b08a79d11d9f..f1d9aa767d30 100644
--- a/Documentation/filesystems/iomap/operations.rst
+++ b/Documentation/filesystems/iomap/operations.rst
@@ -514,7 +514,7 @@  IOMAP_WRITE`` with any combination of the following enhancements:
    if the mapping is unwritten and the filesystem cannot handle zeroing
    the unaligned regions without exposing stale contents.
 
- * ``IOMAP_ATOMIC_HW``: This write is being issued with torn-write
+ * ``IOMAP_BIO_ATOMIC``: This write is being issued with torn-write
    protection based on HW-offload support.
    Only a single bio can be created for the write, and the write must
    not be split into multiple I/O requests, i.e. flag REQ_ATOMIC must be
@@ -530,10 +530,10 @@  IOMAP_WRITE`` with any combination of the following enhancements:
    the mapping start disk block must have at least the same alignment as
    the write offset.
 
- * ``IOMAP_ATOMIC_SW``: This write is being issued with torn-write
+ * ``IOMAP_FS_ATOMIC``: This write is being issued with torn-write
    protection via a software mechanism provided by the filesystem.
    All the disk block alignment and single bio restrictions which apply
-   to IOMAP_ATOMIC_HW do not apply here.
+   to IOMAP_BIO_ATOMIC do not apply here.
    SW-based untorn writes would typically be used as a fallback when
    HW-based untorn writes may not be issued, e.g. the range of the write
    covers multiple extents, meaning that it is not possible to issue
diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index ba2f1e3db7c7..da385862c1b5 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -3467,7 +3467,7 @@  static inline bool ext4_want_directio_fallback(unsigned flags, ssize_t written)
 		return false;
 
 	/* atomic writes are all-or-nothing */
-	if (flags & IOMAP_ATOMIC_HW)
+	if (flags & IOMAP_BIO_ATOMIC)
 		return false;
 
 	/* can only try again if we wrote nothing */
diff --git a/fs/iomap/direct-io.c b/fs/iomap/direct-io.c
index 5299f70428ef..d728d894bd90 100644
--- a/fs/iomap/direct-io.c
+++ b/fs/iomap/direct-io.c
@@ -317,7 +317,7 @@  static int iomap_dio_zero(const struct iomap_iter *iter, struct iomap_dio *dio,
  * clearing the WRITE_THROUGH flag in the dio request.
  */
 static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
-		const struct iomap *iomap, bool use_fua, bool atomic_hw)
+		const struct iomap *iomap, bool use_fua, bool bio_atomic)
 {
 	blk_opf_t opflags = REQ_SYNC | REQ_IDLE;
 
@@ -329,7 +329,7 @@  static inline blk_opf_t iomap_dio_bio_opflags(struct iomap_dio *dio,
 		opflags |= REQ_FUA;
 	else
 		dio->flags &= ~IOMAP_DIO_WRITE_THROUGH;
-	if (atomic_hw)
+	if (bio_atomic)
 		opflags |= REQ_ATOMIC;
 
 	return opflags;
@@ -340,7 +340,7 @@  static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	const struct iomap *iomap = &iter->iomap;
 	struct inode *inode = iter->inode;
 	unsigned int fs_block_size = i_blocksize(inode), pad;
-	bool atomic_hw = iter->flags & IOMAP_ATOMIC_HW;
+	bool bio_atomic = iter->flags & IOMAP_BIO_ATOMIC;
 	const loff_t length = iomap_length(iter);
 	loff_t pos = iter->pos;
 	blk_opf_t bio_opf;
@@ -351,7 +351,7 @@  static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 	u64 copied = 0;
 	size_t orig_count;
 
-	if (atomic_hw && length != iter->len)
+	if (bio_atomic && length != iter->len)
 		return -EINVAL;
 
 	if ((pos | length) & (bdev_logical_block_size(iomap->bdev) - 1) ||
@@ -428,7 +428,7 @@  static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 			goto out;
 	}
 
-	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, atomic_hw);
+	bio_opf = iomap_dio_bio_opflags(dio, iomap, use_fua, bio_atomic);
 
 	nr_pages = bio_iov_vecs_to_alloc(dio->submit.iter, BIO_MAX_VECS);
 	do {
@@ -461,7 +461,7 @@  static int iomap_dio_bio_iter(struct iomap_iter *iter, struct iomap_dio *dio)
 		}
 
 		n = bio->bi_iter.bi_size;
-		if (WARN_ON_ONCE(atomic_hw && n != length)) {
+		if (WARN_ON_ONCE(bio_atomic && n != length)) {
 			/*
 			 * This bio should have covered the complete length,
 			 * which it doesn't, so error. We may need to zero out
@@ -686,10 +686,10 @@  __iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 			iomi.flags |= IOMAP_OVERWRITE_ONLY;
 		}
 
-		if (dio_flags & IOMAP_DIO_ATOMIC_SW)
-			iomi.flags |= IOMAP_ATOMIC_SW;
+		if (dio_flags & IOMAP_DIO_FS_ATOMIC)
+			iomi.flags |= IOMAP_FS_ATOMIC;
 		else if (iocb->ki_flags & IOCB_ATOMIC)
-			iomi.flags |= IOMAP_ATOMIC_HW;
+			iomi.flags |= IOMAP_BIO_ATOMIC;
 
 		/* for data sync or sync, we need sync completion processing */
 		if (iocb_is_dsync(iocb)) {
diff --git a/fs/iomap/trace.h b/fs/iomap/trace.h
index 69af89044ebd..4b71f1711b69 100644
--- a/fs/iomap/trace.h
+++ b/fs/iomap/trace.h
@@ -99,7 +99,8 @@  DEFINE_RANGE_EVENT(iomap_dio_rw_queued);
 	{ IOMAP_FAULT,		"FAULT" }, \
 	{ IOMAP_DIRECT,		"DIRECT" }, \
 	{ IOMAP_NOWAIT,		"NOWAIT" }, \
-	{ IOMAP_ATOMIC_HW,	"ATOMIC_HW" }
+	{ IOMAP_BIO_ATOMIC,	"ATOMIC_BIO" }, \
+	{ IOMAP_FS_ATOMIC,	"ATOMIC_FS" }
 
 #define IOMAP_F_FLAGS_STRINGS \
 	{ IOMAP_F_NEW,		"NEW" }, \
diff --git a/fs/xfs/xfs_file.c b/fs/xfs/xfs_file.c
index 16739c408af3..4ad80179173a 100644
--- a/fs/xfs/xfs_file.c
+++ b/fs/xfs/xfs_file.c
@@ -755,9 +755,9 @@  xfs_file_dio_write_atomic(
 			&xfs_dio_write_ops, dio_flags, NULL, 0);
 
 	if (ret == -EAGAIN && !(iocb->ki_flags & IOCB_NOWAIT) &&
-	    !(dio_flags & IOMAP_DIO_ATOMIC_SW)) {
+	    !(dio_flags & IOMAP_DIO_FS_ATOMIC)) {
 		xfs_iunlock(ip, iolock);
-		dio_flags = IOMAP_DIO_ATOMIC_SW | IOMAP_DIO_FORCE_WAIT;
+		dio_flags = IOMAP_DIO_FS_ATOMIC | IOMAP_DIO_FORCE_WAIT;
 		iolock = XFS_IOLOCK_EXCL;
 		goto retry;
 	}
diff --git a/fs/xfs/xfs_iomap.c b/fs/xfs/xfs_iomap.c
index 6c963786530d..71ddd5979091 100644
--- a/fs/xfs/xfs_iomap.c
+++ b/fs/xfs/xfs_iomap.c
@@ -830,7 +830,7 @@  xfs_direct_write_iomap_begin(
 	xfs_fileoff_t		offset_fsb = XFS_B_TO_FSBT(mp, offset);
 	xfs_fileoff_t		end_fsb = xfs_iomap_end_fsb(mp, offset, length);
 	xfs_fileoff_t		orig_end_fsb = end_fsb;
-	bool			atomic_hw = flags & IOMAP_ATOMIC_HW;
+	bool			atomic_bio = flags & IOMAP_BIO_ATOMIC;
 	int			nimaps = 1, error = 0;
 	unsigned int		reflink_flags = 0;
 	bool			shared = false;
@@ -895,7 +895,7 @@  xfs_direct_write_iomap_begin(
 		if (error)
 			goto out_unlock;
 		if (shared) {
-			if (atomic_hw &&
+			if (atomic_bio &&
 			    !xfs_bmap_valid_for_atomic_write(&cmap,
 					offset_fsb, end_fsb)) {
 				error = -EAGAIN;
@@ -909,7 +909,7 @@  xfs_direct_write_iomap_begin(
 
 	needs_alloc = imap_needs_alloc(inode, flags, &imap, nimaps);
 
-	if (atomic_hw) {
+	if (atomic_bio) {
 		error = -EAGAIN;
 		/*
 		 * Use CoW method for when we need to alloc > 1 block,
@@ -1141,11 +1141,11 @@  xfs_atomic_write_iomap_begin(
 	ASSERT(flags & IOMAP_WRITE);
 	ASSERT(flags & IOMAP_DIRECT);
 
-	if (flags & IOMAP_ATOMIC_SW)
+	if (flags & IOMAP_FS_ATOMIC)
 		return xfs_atomic_write_sw_iomap_begin(inode, offset, length,
 				flags, iomap, srcmap);
 
-	ASSERT(flags & IOMAP_ATOMIC_HW);
+	ASSERT(flags & IOMAP_BIO_ATOMIC);
 	return xfs_direct_write_iomap_begin(inode, offset, length, flags,
 			iomap, srcmap);
 }
diff --git a/include/linux/iomap.h b/include/linux/iomap.h
index 9cd93530013c..5e44ca17a64a 100644
--- a/include/linux/iomap.h
+++ b/include/linux/iomap.h
@@ -189,9 +189,9 @@  struct iomap_folio_ops {
 #else
 #define IOMAP_DAX		0
 #endif /* CONFIG_FS_DAX */
-#define IOMAP_ATOMIC_HW		(1 << 9) /* HW-based torn-write protection */
-#define IOMAP_DONTCACHE		(1 << 10)
-#define IOMAP_ATOMIC_SW		(1 << 11)/* SW-based torn-write protection */
+#define IOMAP_DONTCACHE		(1 << 9)
+#define IOMAP_BIO_ATOMIC	(1 << 10) /* Use REQ_ATOMIC on single bio */
+#define IOMAP_FS_ATOMIC		(1 << 11) /* FS-based torn-write protection */
 
 struct iomap_ops {
 	/*
@@ -504,9 +504,9 @@  struct iomap_dio_ops {
 #define IOMAP_DIO_PARTIAL		(1 << 2)
 
 /*
- * Use software-based torn-write protection.
+ * Use FS-based torn-write protection.
  */
-#define IOMAP_DIO_ATOMIC_SW		(1 << 3)
+#define IOMAP_DIO_FS_ATOMIC		(1 << 3)
 
 ssize_t iomap_dio_rw(struct kiocb *iocb, struct iov_iter *iter,
 		const struct iomap_ops *ops, const struct iomap_dio_ops *dops,