diff mbox series

[RFC,4/4] md/raid0: Atomic write support

Message ID 20240903150748.2179966-5-john.g.garry@oracle.com (mailing list archive)
State New, archived
Headers show
Series RAID0 atomic write support | expand

Commit Message

John Garry Sept. 3, 2024, 3:07 p.m. UTC
Set BLK_FEAT_ATOMIC_WRITES to enable atomic writes. All other stacked
device request queue limits should automatically be set properly. With
regards to atomic write max bytes limit, this will be set at
hw_max_sectors and this is limited by the stripe width, which we want.

Atomic requests may not be split, so error an attempt to do so.

It is noted that returning false from .make_request CB results in
bio_io_error() being called for the bio, which results in BLK_STS_IOERR.
This is not suitable for atomic writes, as BLK_STS_INVAL should be
returned.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid0.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Sept. 12, 2024, 1:18 p.m. UTC | #1
On Tue, Sep 03, 2024 at 03:07:48PM +0000, John Garry wrote:
>  	if (sectors < bio_sectors(bio)) {
> -		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
> +		struct bio *split;
> +
> +		if (bio->bi_opf & REQ_ATOMIC)
> +			return false;

I guess this is the erroring out when attempting to split the request.
Can you add a comment to explain that and why it can't happen for the
normal I/O patterns?
John Garry Sept. 12, 2024, 2:48 p.m. UTC | #2
On 12/09/2024 14:18, Christoph Hellwig wrote:
> On Tue, Sep 03, 2024 at 03:07:48PM +0000, John Garry wrote:
>>   	if (sectors < bio_sectors(bio)) {
>> -		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
>> +		struct bio *split;
>> +
>> +		if (bio->bi_opf & REQ_ATOMIC)
>> +			return false;
> I guess this is the erroring out when attempting to split the request.

I actually now think that I should change bio_split() to return NULL for 
splitting a REQ_ATOMIC, like what do for ZONE_APPEND - calling 
bio_split() like this is a common pattern in md RAID personalities. 
However, none of the md RAID code check for a NULL split, which they 
really should, so I can make that change also.

> Can you add a comment to explain that and why it can't happen for the
> normal I/O patterns?

ok, will do.

Cheers,
John
Christoph Hellwig Sept. 12, 2024, 3:10 p.m. UTC | #3
On Thu, Sep 12, 2024 at 03:48:09PM +0100, John Garry wrote:
>
> I actually now think that I should change bio_split() to return NULL for 
> splitting a REQ_ATOMIC, like what do for ZONE_APPEND - calling bio_split() 
> like this is a common pattern in md RAID personalities. However, none of 
> the md RAID code check for a NULL split, which they really should, so I can 
> make that change also.

bio_split is a bit of a mess - even NULL isn't very good at returning
what caused it to fail.  Maybe a switch to ERR_PTR and an audit of
all callers might be a good idea.
John Garry Sept. 12, 2024, 3:38 p.m. UTC | #4
On 12/09/2024 16:10, Christoph Hellwig wrote:
> On Thu, Sep 12, 2024 at 03:48:09PM +0100, John Garry wrote:
>>
>> I actually now think that I should change bio_split() to return NULL for
>> splitting a REQ_ATOMIC, like what do for ZONE_APPEND - calling bio_split()
>> like this is a common pattern in md RAID personalities. However, none of
>> the md RAID code check for a NULL split, which they really should, so I can
>> make that change also.
> 
> bio_split is a bit of a mess - even NULL isn't very good at returning
> what caused it to fail.  Maybe a switch to ERR_PTR and an audit of
> all callers might be a good idea.
> 

So for bio_split() I guess that we would change as follows:

--->8----


diff --git a/block/bio.c b/block/bio.c
index c4053d49679a..36ddf458753f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1671,11 +1671,11 @@ struct bio *bio_split(struct bio *bio, int sectors,

  	/* Zone append commands cannot be split */
  	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
-		return NULL;
+		return ERR_PTR(-EOPNOTSUPP);

  	split = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs);
  	if (!split)
-		return NULL;
+		return ERR_PTR(-ENOMEM);

  	split->bi_iter.bi_size = sectors << 9;


---8<----

And then fix up the callers to use IS_ERR().

Or also change bio_alloc_clone() to return an ERR_PTR()? I don't see 
much point in that, as we will only ever return ENOMEM (from the 
callees) anyway.
diff mbox series

Patch

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 32d587524778..caf1ecb55d11 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -384,6 +384,7 @@  static int raid0_set_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * mddev->raid_disks;
+	lim.features |= BLK_FEAT_ATOMIC_WRITES;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
@@ -606,7 +607,12 @@  static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 		 : sector_div(sector, chunk_sects));
 
 	if (sectors < bio_sectors(bio)) {
-		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
+		struct bio *split;
+
+		if (bio->bi_opf & REQ_ATOMIC)
+			return false;
+
+		split = bio_split(bio, sectors, GFP_NOIO,
 					      &mddev->bio_set);
 		bio_chain(split, bio);
 		raid0_map_submit_bio(mddev, bio);