Message ID: 20240903150748.2179966-5-john.g.garry@oracle.com
State:      New, archived
Series:     RAID0 atomic write support
On Tue, Sep 03, 2024 at 03:07:48PM +0000, John Garry wrote:
>  	if (sectors < bio_sectors(bio)) {
> -		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
> +		struct bio *split;
> +
> +		if (bio->bi_opf & REQ_ATOMIC)
> +			return false;

I guess this is the erroring out when attempting to split the request.
Can you add a comment to explain that and why it can't happen for the
normal I/O patterns?
On 12/09/2024 14:18, Christoph Hellwig wrote:
> On Tue, Sep 03, 2024 at 03:07:48PM +0000, John Garry wrote:
>>  	if (sectors < bio_sectors(bio)) {
>> -		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
>> +		struct bio *split;
>> +
>> +		if (bio->bi_opf & REQ_ATOMIC)
>> +			return false;
>
> I guess this is the erroring out when attempting to split the request.

I actually now think that I should change bio_split() to return NULL for
splitting a REQ_ATOMIC, like what we do for ZONE_APPEND - calling
bio_split() like this is a common pattern in md RAID personalities.
However, none of the md RAID code checks for a NULL split, which it
really should, so I can make that change also.

> Can you add a comment to explain that and why it can't happen for the
> normal I/O patterns?

ok, will do.

Cheers,
John
On Thu, Sep 12, 2024 at 03:48:09PM +0100, John Garry wrote:
> I actually now think that I should change bio_split() to return NULL for
> splitting a REQ_ATOMIC, like what we do for ZONE_APPEND - calling
> bio_split() like this is a common pattern in md RAID personalities.
> However, none of the md RAID code checks for a NULL split, which it
> really should, so I can make that change also.

bio_split is a bit of a mess - even NULL isn't very good at returning
what caused it to fail.  Maybe a switch to ERR_PTR and an audit of all
callers might be a good idea.
On 12/09/2024 16:10, Christoph Hellwig wrote:
> On Thu, Sep 12, 2024 at 03:48:09PM +0100, John Garry wrote:
>> I actually now think that I should change bio_split() to return NULL for
>> splitting a REQ_ATOMIC, like what we do for ZONE_APPEND - calling
>> bio_split() like this is a common pattern in md RAID personalities.
>> However, none of the md RAID code checks for a NULL split, which it
>> really should, so I can make that change also.
>
> bio_split is a bit of a mess - even NULL isn't very good at returning
> what caused it to fail. Maybe a switch to ERR_PTR and an audit of
> all callers might be a good idea.

So for bio_split() I guess that we would change as follows:

--->8----
diff --git a/block/bio.c b/block/bio.c
index c4053d49679a..36ddf458753f 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -1671,11 +1671,11 @@ struct bio *bio_split(struct bio *bio, int sectors,

 	/* Zone append commands cannot be split */
 	if (WARN_ON_ONCE(bio_op(bio) == REQ_OP_ZONE_APPEND))
-		return NULL;
+		return ERR_PTR(-EOPNOTSUPP);

 	split = bio_alloc_clone(bio->bi_bdev, bio, gfp, bs);
 	if (!split)
-		return NULL;
+		return ERR_PTR(-ENOMEM);

 	split->bi_iter.bi_size = sectors << 9;
---8<----

And then fix up the callers to use IS_ERR().

Or also change bio_alloc_clone() to return an ERR_PTR()? I don't see
much point in that, as we will only ever return ENOMEM (from the
callees) anyway.
diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index 32d587524778..caf1ecb55d11 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -384,6 +384,7 @@ static int raid0_set_limits(struct mddev *mddev)
 	lim.max_write_zeroes_sectors = mddev->chunk_sectors;
 	lim.io_min = mddev->chunk_sectors << 9;
 	lim.io_opt = lim.io_min * mddev->raid_disks;
+	lim.features |= BLK_FEAT_ATOMIC_WRITES;
 	err = mddev_stack_rdev_limits(mddev, &lim, MDDEV_STACK_INTEGRITY);
 	if (err) {
 		queue_limits_cancel_update(mddev->gendisk->queue);
@@ -606,7 +607,12 @@ static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 			: sector_div(sector, chunk_sects));

 	if (sectors < bio_sectors(bio)) {
-		struct bio *split = bio_split(bio, sectors, GFP_NOIO,
+		struct bio *split;
+
+		if (bio->bi_opf & REQ_ATOMIC)
+			return false;
+
+		split = bio_split(bio, sectors, GFP_NOIO,
 					&mddev->bio_set);
 		bio_chain(split, bio);
 		raid0_map_submit_bio(mddev, bio);
Set BLK_FEAT_ATOMIC_WRITES to enable atomic writes. All other stacked
device request queue limits should automatically be set properly.

With regard to the atomic write max bytes limit, this will be set to
hw_max_sectors, which is limited by the stripe width - which is what we
want.

Atomic writes may not be split, so error out any attempt to do so.

It is noted that returning false from the .make_request callback results
in bio_io_error() being called for the bio, which gives BLK_STS_IOERR.
This is not suitable for atomic writes, as BLK_STS_INVAL should be
returned instead.

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid0.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)