Message ID | 20240919063048.2887579-1-linan666@huaweicloud.com (mailing list archive) |
---|---|
State | Accepted |
Series | md: ensure child flush IO does not affect origin bio->bi_status |
Context | Check | Description |
---|---|---|
mdraidci/vmtest-md-6_12-PR | success | PR summary |
mdraidci/vmtest-md-6_12-VM_Test-0 | success | Logs for per-patch-testing |
Hi,

On 2024/09/19 14:30, linan666@huaweicloud.com wrote:
> From: Li Nan <linan122@huawei.com>
>
> When a flush is issued to an RAID array, a child flush IO is created and
> issued for each member disk in the RAID array. Since commit b75197e86e6d
> ("md: Remove flush handling"), each child flush IO has been chained with
> the original bio. As a result, the failure of any child IO could modify
> the bi_status of the original bio, potentially impacting the upper-layer
> filesystem.
>
> Fix the issue by preventing child flush IO from altering the original
> bio->bi_status as before. However, this design introduces a known
> issue: in the event of a power failure, if a flush IO on a member
> disk fails, the upper layers may not be informed. This issue is not easy
> to fix and will not be addressed for the time being in this issue.
>
> Fixes: b75197e86e6d ("md: Remove flush handling")
> Signed-off-by: Li Nan <linan122@huawei.com>
> ---
>  drivers/md/md.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)

LGTM
Reviewed-by: Yu Kuai <yukuai3@huawei.com>

>
> diff --git a/drivers/md/md.c b/drivers/md/md.c
> index 179ee4afe937..67108c397c5a 100644
> --- a/drivers/md/md.c
> +++ b/drivers/md/md.c
> @@ -546,6 +546,26 @@ static int mddev_set_closing_and_sync_blockdev(struct mddev *mddev, int opener_n
>  	return 0;
>  }
>
> +/*
> + * The only difference from bio_chain_endio() is that the current
> + * bi_status of bio does not affect the bi_status of parent.
> + */
> +static void md_end_flush(struct bio *bio)
> +{
> +	struct bio *parent = bio->bi_private;
> +
> +	/*
> +	 * If any flush io error before the power failure,
> +	 * disk data may be lost.
> +	 */

The only solution I can think of is to treat flush IO the same as meta
IO: just call md_error() on this rdev.

Thanks,
Kuai

> +	if (bio->bi_status)
> +		pr_err("md: %pg flush io error %d\n", bio->bi_bdev,
> +			blk_status_to_errno(bio->bi_status));
> +
> +	bio_put(bio);
> +	bio_endio(parent);
> +}
> +
>  bool md_flush_request(struct mddev *mddev, struct bio *bio)
>  {
>  	struct md_rdev *rdev;
> @@ -565,7 +585,9 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
>  		new = bio_alloc_bioset(rdev->bdev, 0,
>  				       REQ_OP_WRITE | REQ_PREFLUSH, GFP_NOIO,
>  				       &mddev->bio_set);
> -		bio_chain(new, bio);
> +		new->bi_private = bio;
> +		new->bi_end_io = md_end_flush;
> +		bio_inc_remaining(bio);
>  		submit_bio(new);
>  	}
>
>
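The suggestion above (handle a failed member flush the way a failed metadata write is handled, by calling md_error() on the rdev) can be pictured with a small userspace model. This is only a sketch under loud assumptions: `struct model_rdev`, `model_md_error()`, `model_flush_done()` and `model_count_faulty()` are hypothetical names invented for illustration, the "faulty" flag is a stand-in for whatever the real md_error() path does for a given personality, and -5 stands in for -EIO. It is not the kernel implementation and was not part of the posted patch.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical stand-in for a RAID member device; NOT struct md_rdev. */
struct model_rdev {
	const char *name;
	int faulty;	/* set once the member has been failed */
};

/* Models the effect assumed for md_error(): fail the member once. */
static void model_md_error(struct model_rdev *rdev)
{
	assert(rdev != NULL);
	if (!rdev->faulty) {
		rdev->faulty = 1;
		fprintf(stderr, "model: %s marked faulty\n", rdev->name);
	}
}

/*
 * Models a flush completion under the suggested policy: the error is
 * not passed to the original bio, but the member is failed so the
 * lost flush is not silently ignored.
 */
static void model_flush_done(struct model_rdev *rdev, int status)
{
	if (status)
		model_md_error(rdev);
}

/* Count members that have been failed so far. */
static int model_count_faulty(const struct model_rdev *rdevs, int n)
{
	int i, count = 0;

	for (i = 0; i < n; i++)
		count += rdevs[i].faulty;
	return count;
}
```

In this model a flush error on one member leaves the array with one faulty member, while the other members and the original request are unaffected.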
On Wed, Sep 18, 2024 at 11:33 PM <linan666@huaweicloud.com> wrote:
>
> From: Li Nan <linan122@huawei.com>
>
> When a flush is issued to an RAID array, a child flush IO is created and
> issued for each member disk in the RAID array. Since commit b75197e86e6d
> ("md: Remove flush handling"), each child flush IO has been chained with
> the original bio. As a result, the failure of any child IO could modify
> the bi_status of the original bio, potentially impacting the upper-layer
> filesystem.
>
> Fix the issue by preventing child flush IO from altering the original
> bio->bi_status as before. However, this design introduces a known
> issue: in the event of a power failure, if a flush IO on a member
> disk fails, the upper layers may not be informed. This issue is not easy
> to fix and will not be addressed for the time being in this issue.
>
> Fixes: b75197e86e6d ("md: Remove flush handling")
> Signed-off-by: Li Nan <linan122@huawei.com>

Applied to md-6.12. Thanks for the fix!

Song
diff --git a/drivers/md/md.c b/drivers/md/md.c
index 179ee4afe937..67108c397c5a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -546,6 +546,26 @@ static int mddev_set_closing_and_sync_blockdev(struct mddev *mddev, int opener_n
 	return 0;
 }
 
+/*
+ * The only difference from bio_chain_endio() is that the current
+ * bi_status of bio does not affect the bi_status of parent.
+ */
+static void md_end_flush(struct bio *bio)
+{
+	struct bio *parent = bio->bi_private;
+
+	/*
+	 * If any flush io error before the power failure,
+	 * disk data may be lost.
+	 */
+	if (bio->bi_status)
+		pr_err("md: %pg flush io error %d\n", bio->bi_bdev,
+			blk_status_to_errno(bio->bi_status));
+
+	bio_put(bio);
+	bio_endio(parent);
+}
+
 bool md_flush_request(struct mddev *mddev, struct bio *bio)
 {
 	struct md_rdev *rdev;
@@ -565,7 +585,9 @@ bool md_flush_request(struct mddev *mddev, struct bio *bio)
 		new = bio_alloc_bioset(rdev->bdev, 0,
 				       REQ_OP_WRITE | REQ_PREFLUSH, GFP_NOIO,
 				       &mddev->bio_set);
-		bio_chain(new, bio);
+		new->bi_private = bio;
+		new->bi_end_io = md_end_flush;
+		bio_inc_remaining(bio);
 		submit_bio(new);
 	}
 
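The behavioural difference between the chained completion and md_end_flush() can be modelled in userspace. The following is a minimal sketch, not kernel code: `struct model_bio` and every `model_*` function are hypothetical names made up for this illustration, the reference counting is reduced to a plain integer, the final `model_endio()` call by the submitter is a modelling simplification, and -5 stands in for -EIO. It assumes the chained path copies the first child error into the parent, which is what the commit message describes.

```c
#include <assert.h>
#include <stdio.h>

/* Simplified stand-in for struct bio; NOT the kernel structure. */
struct model_bio {
	int status;			/* 0 means success */
	int remaining;			/* completions still outstanding */
	int done;			/* set when the last completion ran */
	struct model_bio *parent;	/* plays the role of bi_private */
};

/* Drop one outstanding completion; finish the bio on the last one. */
static void model_endio(struct model_bio *bio)
{
	assert(bio->remaining > 0);
	if (--bio->remaining == 0)
		bio->done = 1;
}

/* Chained completion: the first child error is copied to the parent. */
static void model_chain_endio(struct model_bio *child)
{
	struct model_bio *parent = child->parent;

	if (child->status && !parent->status)
		parent->status = child->status;
	model_endio(parent);
}

/* md_end_flush()-style completion: log, leave parent status alone. */
static void model_md_end_flush(struct model_bio *child)
{
	struct model_bio *parent = child->parent;

	if (child->status)
		fprintf(stderr, "model: flush io error %d\n", child->status);
	model_endio(parent);
}

/*
 * Issue one child flush per member and complete it immediately.
 * child_status[i] is the result reported by member i.
 * Returns the status the upper layer would see on the original bio.
 */
static int model_flush(const int *child_status, int n, int chained)
{
	struct model_bio parent = { .status = 0, .remaining = 1 };
	int i;

	for (i = 0; i < n; i++) {
		struct model_bio child = {
			.status = child_status[i],
			.parent = &parent,
		};

		parent.remaining++;	/* plays the role of bio_inc_remaining() */
		if (chained)
			model_chain_endio(&child);
		else
			model_md_end_flush(&child);
	}
	model_endio(&parent);		/* submitter drops its own reference */
	assert(parent.done);
	return parent.status;
}
```

With member results `{0, -5, 0}`, `model_flush(st, 3, 1)` returns -5 in this model, while `model_flush(st, 3, 0)` returns 0 and only logs the error, which mirrors the trade-off stated in the commit message: the filesystem is no longer affected by a member flush failure, but it is also not informed of it.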