
[RFC,5/6] md/raid1: Handle bio_split() errors

Message ID 20240919092302.3094725-6-john.g.garry@oracle.com (mailing list archive)
State New
Series bio_split() error handling rework

Checks

Context: mdraidci/vmtest-md-6_12-PR
Check: fail
Description: merge-conflict

Commit Message

John Garry Sept. 19, 2024, 9:23 a.m. UTC
Add proper bio_split() error handling. For any error, call
raid_end_bio_io() and return;

Signed-off-by: John Garry <john.g.garry@oracle.com>
---
 drivers/md/raid1.c | 8 ++++++++
 1 file changed, 8 insertions(+)

Comments

Yu Kuai Sept. 20, 2024, 6:58 a.m. UTC | #1
Hi,

On 2024/09/19 17:23, John Garry wrote:
> Add proper bio_split() error handling. For any error, call
> raid_end_bio_io() and return;
> 
> Signed-off-by: John Garry <john.g.garry@oracle.com>
> ---
>   drivers/md/raid1.c | 8 ++++++++
>   1 file changed, 8 insertions(+)
> 
> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
> index 6c9d24203f39..c561e2d185e2 100644
> --- a/drivers/md/raid1.c
> +++ b/drivers/md/raid1.c
> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev *mddev, struct bio *bio,
>   	if (max_sectors < bio_sectors(bio)) {
>   		struct bio *split = bio_split(bio, max_sectors,
>   					      gfp, &conf->bio_split);
> +		if (IS_ERR(split)) {
> +			raid_end_bio_io(r1_bio);
> +			return;
> +		}

This way, BLK_STS_IOERR will always be returned; perhaps what you want
is to return the error code from bio_split()?

Thanks,
Kuai

>   		bio_chain(split, bio);
>   		submit_bio_noacct(bio);
>   		bio = split;
> @@ -1576,6 +1580,10 @@ static void raid1_write_request(struct mddev *mddev, struct bio *bio,
>   	if (max_sectors < bio_sectors(bio)) {
>   		struct bio *split = bio_split(bio, max_sectors,
>   					      GFP_NOIO, &conf->bio_split);
> +		if (IS_ERR(split)) {
> +			raid_end_bio_io(r1_bio);
> +			return;
> +		}
>   		bio_chain(split, bio);
>   		submit_bio_noacct(bio);
>   		bio = split;
>
John Garry Sept. 20, 2024, 10:04 a.m. UTC | #2
On 20/09/2024 07:58, Yu Kuai wrote:
> Hi,
> 
> On 2024/09/19 17:23, John Garry wrote:
>> Add proper bio_split() error handling. For any error, call
>> raid_end_bio_io() and return;
>>
>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>> ---
>>   drivers/md/raid1.c | 8 ++++++++
>>   1 file changed, 8 insertions(+)
>>
>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>> index 6c9d24203f39..c561e2d185e2 100644
>> --- a/drivers/md/raid1.c
>> +++ b/drivers/md/raid1.c
>> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev 
>> *mddev, struct bio *bio,
>>       if (max_sectors < bio_sectors(bio)) {
>>           struct bio *split = bio_split(bio, max_sectors,
>>                             gfp, &conf->bio_split);
>> +        if (IS_ERR(split)) {
>> +            raid_end_bio_io(r1_bio);
>> +            return;
>> +        }
> 
> This way, BLK_STS_IOERR will always be returned, perhaps what you want
> is to return the error code from bio_split()?

Yeah, I would like to return that error code, so maybe I can encode it
in the master_bio directly or pass it as an arg to raid_end_bio_io().

Thanks,
John
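
A minimal sketch of the "encode it in the master_bio" idea, for
illustration only (this is not the patch under review, and the final fix
may look different): record the real error on the original bio and mark
the r1_bio uptodate so that call_bio_endio() does not overwrite bi_status
with BLK_STS_IOERR. errno_to_blk_status() and R1BIO_Uptodate are existing
block-layer/raid1 code.

	if (max_sectors < bio_sectors(bio)) {
		struct bio *split = bio_split(bio, max_sectors,
					      gfp, &conf->bio_split);
		if (IS_ERR(split)) {
			/* Keep the real error (-ENOMEM, -EINVAL, ...) on the
			 * parent bio; R1BIO_Uptodate stops call_bio_endio()
			 * from forcing BLK_STS_IOERR. */
			bio->bi_status = errno_to_blk_status(PTR_ERR(split));
			set_bit(R1BIO_Uptodate, &r1_bio->state);
			raid_end_bio_io(r1_bio);
			return;
		}
		bio_chain(split, bio);
		submit_bio_noacct(bio);
		bio = split;
	}
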
Yu Kuai Sept. 23, 2024, 6:15 a.m. UTC | #3
On 2024/09/20 18:04, John Garry wrote:
> On 20/09/2024 07:58, Yu Kuai wrote:
>> Hi,
>>
>> On 2024/09/19 17:23, John Garry wrote:
>>> Add proper bio_split() error handling. For any error, call
>>> raid_end_bio_io() and return;
>>>
>>> Signed-off-by: John Garry <john.g.garry@oracle.com>
>>> ---
>>>   drivers/md/raid1.c | 8 ++++++++
>>>   1 file changed, 8 insertions(+)
>>>
>>> diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
>>> index 6c9d24203f39..c561e2d185e2 100644
>>> --- a/drivers/md/raid1.c
>>> +++ b/drivers/md/raid1.c
>>> @@ -1383,6 +1383,10 @@ static void raid1_read_request(struct mddev 
>>> *mddev, struct bio *bio,
>>>       if (max_sectors < bio_sectors(bio)) {
>>>           struct bio *split = bio_split(bio, max_sectors,
>>>                             gfp, &conf->bio_split);
>>> +        if (IS_ERR(split)) {
>>> +            raid_end_bio_io(r1_bio);
>>> +            return;
>>> +        }
>>
>> This way, BLK_STS_IOERR will always be returned, perhaps what you want
>> is to return the error code from bio_split()?
> 
> Yeah, I would like to return that error code, so maybe I can encode it 
> in the master_bio directly or pass as an arg to raid_end_bio_io().

That's fine; however, I think the change can introduce problems in some
corner cases, for example when there is an rdev with badblocks and a slow
rdev with a full copy. Currently raid1_read_request() will split this bio
to read part of it from the fast rdev, and read the badblocks region from
the slow rdev.

We need a new branch in read_balance() to choose an rdev with a full copy.

Thanks,
Kuai

> 
> Thanks,
> John
> 
> .
>
John Garry Sept. 23, 2024, 7:44 a.m. UTC | #4
On 23/09/2024 07:15, Yu Kuai wrote:
>>>
>>> This way, BLK_STS_IOERR will always be returned, perhaps what you want
>>> is to return the error code from bio_split()?
>>
>> Yeah, I would like to return that error code, so maybe I can encode it 
>> in the master_bio directly or pass as an arg to raid_end_bio_io().
> 
> That's fine, however, I think the change can introduce problems in some
> corner cases, for example there is a rdev with badblocks and a slow rdev
> with full copy. Currently raid1_read_request() will split this bio to
> read some from fast rdev, and read the badblocks region from slow rdev.
> 
> We need a new branch in read_balance() to choose a rdev with full copy.

Sure, I do realize that the mirroring personalities need more
sophisticated error handling changes (than what I presented).

However, in raid1_read_request() we do the read_balance() and then the
bio_split() attempt. So what are you suggesting we do for the
bio_split() error? Is it to retry without the bio_split()?

To me, bio_split() should not fail. If it does, it is likely ENOMEM or
some other bug being exposed, so I am not sure that retrying while
skipping bio_split() is the right approach (if that is what you are
suggesting).

Thanks,
John
Yu Kuai Sept. 23, 2024, 8:18 a.m. UTC | #5
Hi,

On 2024/09/23 15:44, John Garry wrote:
> On 23/09/2024 07:15, Yu Kuai wrote:
>>>>
>>>> This way, BLK_STS_IOERR will always be returned, perhaps what you want
>>>> is to return the error code from bio_split()?
>>>
>>> Yeah, I would like to return that error code, so maybe I can encode 
>>> it in the master_bio directly or pass as an arg to raid_end_bio_io().
>>
>> That's fine, however, I think the change can introduce problems in some
>> corner cases, for example there is a rdev with badblocks and a slow rdev
>> with full copy. Currently raid1_read_request() will split this bio to
>> read some from fast rdev, and read the badblocks region from slow rdev.
>>
>> We need a new branch in read_balance() to choose a rdev with full copy.
> 
> Sure, I do realize that the mirror'ing personalities need more 
> sophisticated error handling changes (than what I presented).
> 
> However, in raid1_read_request() we do the read_balance() and then the 
> bio_split() attempt. So what are you suggesting we do for the 
> bio_split() error? Is it to retry without the bio_split()?
> 
> To me bio_split() should not fail. If it does, it is likely ENOMEM or 
> some other bug being exposed, so I am not sure that retrying with 
> skipping bio_split() is the right approach (if that is what you are 
> suggesting).

bio_split_to_limits() is already called from md_submit_bio(), so here
the bio should only be split because of badblocks or resync. We have to
return an error for resync; however, for badblocks, we can still try to
find an rdev without badblocks so that bio_split() is not needed. And we
need to retry and inform read_balance() to skip rdevs with badblocks in
this case.

This can only happen if the full copy exists only on slow disks. That
really is a corner case, and it is not related to your new error path for
atomic writes. I don't mind this version for now; it is just something I
noticed given that bio_split() can now fail.

Thanks,
Kuai

> 
> Thanks,
> John
> 
> .
>
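
For reference, the retry that Kuai describes might look roughly like the
following in raid1_read_request(). It is purely illustrative:
read_balance_full_copy() is an invented helper name, and the in-tree
read_balance() currently has no way to be asked to skip rdevs with
badblocks.

	rdisk = read_balance(conf, r1_bio, &max_sectors);
	...
	if (max_sectors < bio_sectors(bio)) {
		/* A split here is only due to badblocks or resync; for the
		 * badblocks case, first try an rdev that can serve the whole
		 * range (hypothetical helper, not existing code). */
		int full_disk = read_balance_full_copy(conf, r1_bio);

		if (full_disk >= 0) {
			rdisk = full_disk;
			max_sectors = bio_sectors(bio);	/* no split needed */
		}
		/* otherwise fall back to bio_split() as before */
	}
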
John Garry Sept. 23, 2024, 9:21 a.m. UTC | #6
On 23/09/2024 09:18, Yu Kuai wrote:
>>>
>>> We need a new branch in read_balance() to choose a rdev with full copy.
>>
>> Sure, I do realize that the mirror'ing personalities need more 
>> sophisticated error handling changes (than what I presented).
>>
>> However, in raid1_read_request() we do the read_balance() and then the 
>> bio_split() attempt. So what are you suggesting we do for the 
>> bio_split() error? Is it to retry without the bio_split()?
>>
>> To me bio_split() should not fail. If it does, it is likely ENOMEM or 
>> some other bug being exposed, so I am not sure that retrying with 
>> skipping bio_split() is the right approach (if that is what you are 
>> suggesting).
> 
> bio_split_to_limits() is already called from md_submit_bio(), so here
> bio should only be splitted because of badblocks or resync. We have to
> return error for resync, however, for badblocks, we can still try to
> find a rdev without badblocks so bio_split() is not needed. And we need
> to retry and inform read_balance() to skip rdev with badblocks in this
> case.
> 
> This can only happen if the full copy only exist in slow disks. This
> really is corner case, and this is not related to your new error path by
> atomic write. I don't mind this version for now, just something
> I noticed if bio_spilit() can fail.

Are you saying that some improvement needs to be made to the current
code for badblocks handling, like initially trying to skip bio_split()?

Apart from that, what about the change in raid10_write_request(), w.r.t.
error handling?

There, for an error in bio_split(), I think that we need to do some
tidy-up, i.e. undo the increase in rdev->nr_pending done when looping
over conf->copies.

BTW, feel free to comment in patch 6/6 for that.

Thanks,
John
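
One possible shape for that tidy-up in raid10_write_request(), sketched
against the existing raid10 structures (this illustrates the idea only,
it is not the code in patch 6/6): on a bio_split() error, drop the
nr_pending references taken while choosing targets in the conf->copies
loop, then complete the r10_bio.

	if (IS_ERR(split)) {
		int d;

		/* Undo the atomic_inc(&rdev->nr_pending) taken for each
		 * chosen mirror/replacement when looping conf->copies. */
		for (d = 0; d < conf->copies; d++) {
			int devnum = r10_bio->devs[d].devnum;

			if (r10_bio->devs[d].bio)
				rdev_dec_pending(conf->mirrors[devnum].rdev,
						 mddev);
			if (r10_bio->devs[d].repl_bio)
				rdev_dec_pending(conf->mirrors[devnum].replacement,
						 mddev);
		}
		raid_end_bio_io(r10_bio);
		return;
	}
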
Yu Kuai Sept. 23, 2024, 9:38 a.m. UTC | #7
Hi,

On 2024/09/23 17:21, John Garry wrote:
> On 23/09/2024 09:18, Yu Kuai wrote:
>>>>
>>>> We need a new branch in read_balance() to choose a rdev with full copy.
>>>
>>> Sure, I do realize that the mirror'ing personalities need more 
>>> sophisticated error handling changes (than what I presented).
>>>
>>> However, in raid1_read_request() we do the read_balance() and then 
>>> the bio_split() attempt. So what are you suggesting we do for the 
>>> bio_split() error? Is it to retry without the bio_split()?
>>>
>>> To me bio_split() should not fail. If it does, it is likely ENOMEM or 
>>> some other bug being exposed, so I am not sure that retrying with 
>>> skipping bio_split() is the right approach (if that is what you are 
>>> suggesting).
>>
>> bio_split_to_limits() is already called from md_submit_bio(), so here
>> bio should only be splitted because of badblocks or resync. We have to
>> return error for resync, however, for badblocks, we can still try to
>> find a rdev without badblocks so bio_split() is not needed. And we need
>> to retry and inform read_balance() to skip rdev with badblocks in this
>> case.
>>
>> This can only happen if the full copy only exist in slow disks. This
>> really is corner case, and this is not related to your new error path by
>> atomic write. I don't mind this version for now, just something
>> I noticed if bio_spilit() can fail.
> 
> Are you saying that some improvement needs to be made to the current 
> code for badblocks handling, like initially try to skip bio_split()?
> 
> Apart from that, what about the change in raid10_write_request(), w.r.t 
> error handling?
> 
> There, for an error in bio_split(), I think that we need to do some 
> tidy-up if bio_split() fails, i.e. undo increase in rdev->nr_pending 
> when looping conf->copies
> 
> BTW, feel free to comment in patch 6/6 for that.

Yes, raid1/raid10 writes are the same. If you want to enable atomic
writes for raid1/raid10, you must add a new branch to handle badblocks
now; otherwise, as long as one copy contains any badblocks, the atomic
write will fail, while theoretically I think it could work.

Thanks,
Kuai

> 
> Thanks,
> John
> 
> .
>
John Garry Sept. 23, 2024, 10:40 a.m. UTC | #8
On 23/09/2024 10:38, Yu Kuai wrote:
>>
>> Are you saying that some improvement needs to be made to the current 
>> code for badblocks handling, like initially try to skip bio_split()?
>>
>> Apart from that, what about the change in raid10_write_request(), 
>> w.r.t error handling?
>>
>> There, for an error in bio_split(), I think that we need to do some 
>> tidy-up if bio_split() fails, i.e. undo increase in rdev->nr_pending 
>> when looping conf->copies
>>
>> BTW, feel free to comment in patch 6/6 for that.
> 
> Yes, raid1/raid10 write are the same. If you want to enable atomic write
> for raid1/raid10, you must add a new branch to handle badblocks now,
> otherwise, as long as one copy contain any badblocks, atomic write will
> fail while theoretically I think it can work.

OK, I'll check the badblocks code further to understand this.

The real point for atomic write support is that we should simply not be
attempting to split an atomic write bio, and we should handle an attempt
to split one like any other bio split failure, i.e. if it does happen we
either have a software bug or are out of resources (-ENOMEM). Properly
stacked atomic write queue limits should ensure that we never get into a
situation where we do need to split, and the new checking in bio_split()
is just an insurance policy.

Thanks,
John
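
To make the "insurance policy" concrete: the checking being referred to
is bio_split() returning an error pointer instead of ever splitting a bio
it must not touch, such as an atomic write bio. The snippet below is
conceptual only; the exact conditions added by the earlier patches in
this series may differ.

	/* Inside bio_split(): refuse rather than split, and let the
	 * caller fail the I/O. */
	if (WARN_ON_ONCE(bio->bi_opf & REQ_ATOMIC))
		return ERR_PTR(-EINVAL);
	if (WARN_ON_ONCE(sectors <= 0 || sectors >= bio_sectors(bio)))
		return ERR_PTR(-EINVAL);

Callers then check IS_ERR(split) and end the parent bio with an error, as
the raid1 hunks in this patch do.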

Patch

diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 6c9d24203f39..c561e2d185e2 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -1383,6 +1383,10 @@  static void raid1_read_request(struct mddev *mddev, struct bio *bio,
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
 					      gfp, &conf->bio_split);
+		if (IS_ERR(split)) {
+			raid_end_bio_io(r1_bio);
+			return;
+		}
 		bio_chain(split, bio);
 		submit_bio_noacct(bio);
 		bio = split;
@@ -1576,6 +1580,10 @@  static void raid1_write_request(struct mddev *mddev, struct bio *bio,
 	if (max_sectors < bio_sectors(bio)) {
 		struct bio *split = bio_split(bio, max_sectors,
 					      GFP_NOIO, &conf->bio_split);
+		if (IS_ERR(split)) {
+			raid_end_bio_io(r1_bio);
+			return;
+		}
 		bio_chain(split, bio);
 		submit_bio_noacct(bio);
 		bio = split;