diff mbox series

md/raid0: avoid RAID0 data corruption due to layout confusion.

Message ID 87pnkaardl.fsf@notabene.neil.brown.name (mailing list archive)
State New, archived
Headers show
Series md/raid0: avoid RAID0 data corruption due to layout confusion. | expand

Commit Message

NeilBrown Sept. 9, 2019, 6:57 a.m. UTC
If the drives in a RAID0 are not all the same size, the array is
divided into zones.
The first zone covers all drives, to the size of the smallest.
The second zone covers all drives larger than the smallest, up to
the size of the second smallest - etc.

A change in Linux 3.14 unintentionally changed the layout for the
second and subsequent zones.  All the correct data is still stored, but
each chunk may be assigned to a different device than in pre-3.14 kernels.
This can lead to data corruption.

It is not possible to determine what layout to use - it depends which
kernel the data was written by.
So we add a module parameter to allow the old (0) or new (1) layout to be
specified, and refused to assemble an affected array if that parameter is
not set.

Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
cc: stable@vger.kernel.org (3.14+)
Signed-off-by: NeilBrown <neilb@suse.de>
---

This and the next patch are my proposal for how to address
this problem.  I haven't actually tested .....

NeilBrown

 drivers/md/raid0.c | 28 +++++++++++++++++++++++++++-
 drivers/md/raid0.h | 14 ++++++++++++++
 2 files changed, 41 insertions(+), 1 deletion(-)

Comments

Song Liu Sept. 9, 2019, 2:56 p.m. UTC | #1
Hi Neil,

> On Sep 9, 2019, at 7:57 AM, NeilBrown <neilb@suse.de> wrote:
> 
> 
> If the drives in a RAID0 are not all the same size, the array is
> divided into zones.
> The first zone covers all drives, to the size of the smallest.
> The second zone covers all drives larger than the smallest, up to
> the size of the second smallest - etc.
> 
> A change in Linux 3.14 unintentionally changed the layout for the
> second and subsequent zones.  All the correct data is still stored, but
> each chunk may be assigned to a different device than in pre-3.14 kernels.
> This can lead to data corruption.
> 
> It is not possible to determine what layout to use - it depends which
> kernel the data was written by.
> So we add a module parameter to allow the old (0) or new (1) layout to be
> specified, and refused to assemble an affected array if that parameter is
> not set.
> 
> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
> cc: stable@vger.kernel.org (3.14+)
> Signed-off-by: NeilBrown <neilb@suse.de>

Thanks for the patches. They look great. However, I am having problem
apply them (not sure whether it is a problem on my side). Could you 
please push it somewhere so I can use cherry-pick instead?

Thanks,
Song
Guoqing Jiang Sept. 9, 2019, 3:09 p.m. UTC | #2
On 9/9/19 8:57 AM, NeilBrown wrote:
> 
> If the drives in a RAID0 are not all the same size, the array is
> divided into zones.
> The first zone covers all drives, to the size of the smallest.
> The second zone covers all drives larger than the smallest, up to
> the size of the second smallest - etc.
> 
> A change in Linux 3.14 unintentionally changed the layout for the
> second and subsequent zones.  All the correct data is still stored, but
> each chunk may be assigned to a different device than in pre-3.14 kernels.
> This can lead to data corruption.
> 
> It is not possible to determine what layout to use - it depends which
> kernel the data was written by.
> So we add a module parameter to allow the old (0) or new (1) layout to be
> specified, and refused to assemble an affected array if that parameter is
> not set.
> 
> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
> cc: stable@vger.kernel.org (3.14+)
> Signed-off-by: NeilBrown <neilb@suse.de>
> ---
> 
> This and the next patch are my proposal for how to address
> this problem.  I haven't actually tested .....
> 
> NeilBrown
> 
>   drivers/md/raid0.c | 28 +++++++++++++++++++++++++++-
>   drivers/md/raid0.h | 14 ++++++++++++++
>   2 files changed, 41 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
> index bf5cf184a260..a8888c12308a 100644
> --- a/drivers/md/raid0.c
> +++ b/drivers/md/raid0.c
> @@ -19,6 +19,9 @@
>   #include "raid0.h"
>   #include "raid5.h"
>   
> +static int default_layout = -1;
> +module_param(default_layout, int, 0644);
> +
>   #define UNSUPPORTED_MDDEV_FLAGS		\
>   	((1L << MD_HAS_JOURNAL) |	\
>   	 (1L << MD_JOURNAL_CLEAN) |	\
> @@ -139,6 +142,19 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
>   	}
>   	pr_debug("md/raid0:%s: FINAL %d zones\n",
>   		 mdname(mddev), conf->nr_strip_zones);
> +
> +	if (conf->nr_strip_zones == 1) {
> +		conf->layout = RAID0_ORIG_LAYOUT;
> +	} else if (default_layout == RAID0_ORIG_LAYOUT ||
> +		   default_layout == RAID0_ALT_MULTIZONE_LAYOUT) {
> +		conf->layout = default_layout;
> +	} else {
> +		pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n",
> +		       mdname(mddev));
> +		pr_err("md/raid0: please set raid.default_layout to 0 or 1\n");

Maybe "1 or 2" to align with the definition of below r0layout?

[snip]

> +enum r0layout {
> +	RAID0_ORIG_LAYOUT = 1,
> +	RAID0_ALT_MULTIZONE_LAYOUT = 2,
> +};
>   struct r0conf {
>   	struct strip_zone	*strip_zone;
>   	struct md_rdev		**devlist; /* lists of rdevs, pointed to
>   					    * by strip_zone->dev */
>   	int			nr_strip_zones;
> +	enum r0layout		layout;
>   };
>   
>   #endif
> 

Thanks,
Guoqing
NeilBrown Sept. 9, 2019, 11:33 p.m. UTC | #3
On Mon, Sep 09 2019, Song Liu wrote:

> Hi Neil,
>
>> On Sep 9, 2019, at 7:57 AM, NeilBrown <neilb@suse.de> wrote:
>> 
>> 
>> If the drives in a RAID0 are not all the same size, the array is
>> divided into zones.
>> The first zone covers all drives, to the size of the smallest.
>> The second zone covers all drives larger than the smallest, up to
>> the size of the second smallest - etc.
>> 
>> A change in Linux 3.14 unintentionally changed the layout for the
>> second and subsequent zones.  All the correct data is still stored, but
>> each chunk may be assigned to a different device than in pre-3.14 kernels.
>> This can lead to data corruption.
>> 
>> It is not possible to determine what layout to use - it depends which
>> kernel the data was written by.
>> So we add a module parameter to allow the old (0) or new (1) layout to be
>> specified, and refused to assemble an affected array if that parameter is
>> not set.
>> 
>> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
>> cc: stable@vger.kernel.org (3.14+)
>> Signed-off-by: NeilBrown <neilb@suse.de>
>
> Thanks for the patches. They look great. However, I am having problem
> apply them (not sure whether it is a problem on my side). Could you 
> please push it somewhere so I can use cherry-pick instead?

I rebased them on block/for-next, fixed the problems that Guoqing found,
and pushed them to 
  https://github.com/neilbrown/linux md/raid0

NeilBrown
NeilBrown Sept. 9, 2019, 11:34 p.m. UTC | #4
On Mon, Sep 09 2019, Guoqing Jiang wrote:

> On 9/9/19 8:57 AM, NeilBrown wrote:
>> 
>> If the drives in a RAID0 are not all the same size, the array is
>> divided into zones.
>> The first zone covers all drives, to the size of the smallest.
>> The second zone covers all drives larger than the smallest, up to
>> the size of the second smallest - etc.
>> 
>> A change in Linux 3.14 unintentionally changed the layout for the
>> second and subsequent zones.  All the correct data is still stored, but
>> each chunk may be assigned to a different device than in pre-3.14 kernels.
>> This can lead to data corruption.
>> 
>> It is not possible to determine what layout to use - it depends which
>> kernel the data was written by.
>> So we add a module parameter to allow the old (0) or new (1) layout to be
>> specified, and refused to assemble an affected array if that parameter is
>> not set.
>> 
>> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
>> cc: stable@vger.kernel.org (3.14+)
>> Signed-off-by: NeilBrown <neilb@suse.de>
>> ---
>> 
>> This and the next patch are my proposal for how to address
>> this problem.  I haven't actually tested .....
>> 
>> NeilBrown
>> 
>>   drivers/md/raid0.c | 28 +++++++++++++++++++++++++++-
>>   drivers/md/raid0.h | 14 ++++++++++++++
>>   2 files changed, 41 insertions(+), 1 deletion(-)
>> 
>> diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
>> index bf5cf184a260..a8888c12308a 100644
>> --- a/drivers/md/raid0.c
>> +++ b/drivers/md/raid0.c
>> @@ -19,6 +19,9 @@
>>   #include "raid0.h"
>>   #include "raid5.h"
>>   
>> +static int default_layout = -1;
>> +module_param(default_layout, int, 0644);
>> +
>>   #define UNSUPPORTED_MDDEV_FLAGS		\
>>   	((1L << MD_HAS_JOURNAL) |	\
>>   	 (1L << MD_JOURNAL_CLEAN) |	\
>> @@ -139,6 +142,19 @@ static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
>>   	}
>>   	pr_debug("md/raid0:%s: FINAL %d zones\n",
>>   		 mdname(mddev), conf->nr_strip_zones);
>> +
>> +	if (conf->nr_strip_zones == 1) {
>> +		conf->layout = RAID0_ORIG_LAYOUT;
>> +	} else if (default_layout == RAID0_ORIG_LAYOUT ||
>> +		   default_layout == RAID0_ALT_MULTIZONE_LAYOUT) {
>> +		conf->layout = default_layout;
>> +	} else {
>> +		pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n",
>> +		       mdname(mddev));
>> +		pr_err("md/raid0: please set raid.default_layout to 0 or 1\n");
>
> Maybe "1 or 2" to align with the definition of below r0layout?

Thanks.  Fixed.

NeilBrown

>
> [snip]
>
>> +enum r0layout {
>> +	RAID0_ORIG_LAYOUT = 1,
>> +	RAID0_ALT_MULTIZONE_LAYOUT = 2,
>> +};
>>   struct r0conf {
>>   	struct strip_zone	*strip_zone;
>>   	struct md_rdev		**devlist; /* lists of rdevs, pointed to
>>   					    * by strip_zone->dev */
>>   	int			nr_strip_zones;
>> +	enum r0layout		layout;
>>   };
>>   
>>   #endif
>> 
>
> Thanks,
> Guoqing
Song Liu Sept. 10, 2019, 3:45 p.m. UTC | #5
> On Sep 10, 2019, at 12:33 AM, NeilBrown <neilb@suse.de> wrote:
> 
> On Mon, Sep 09 2019, Song Liu wrote:
> 
>> Hi Neil,
>> 
>>> On Sep 9, 2019, at 7:57 AM, NeilBrown <neilb@suse.de> wrote:
>>> 
>>> 
>>> If the drives in a RAID0 are not all the same size, the array is
>>> divided into zones.
>>> The first zone covers all drives, to the size of the smallest.
>>> The second zone covers all drives larger than the smallest, up to
>>> the size of the second smallest - etc.
>>> 
>>> A change in Linux 3.14 unintentionally changed the layout for the
>>> second and subsequent zones.  All the correct data is still stored, but
>>> each chunk may be assigned to a different device than in pre-3.14 kernels.
>>> This can lead to data corruption.
>>> 
>>> It is not possible to determine what layout to use - it depends which
>>> kernel the data was written by.
>>> So we add a module parameter to allow the old (0) or new (1) layout to be
>>> specified, and refused to assemble an affected array if that parameter is
>>> not set.
>>> 
>>> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
>>> cc: stable@vger.kernel.org (3.14+)
>>> Signed-off-by: NeilBrown <neilb@suse.de>
>> 
>> Thanks for the patches. They look great. However, I am having problem
>> apply them (not sure whether it is a problem on my side). Could you 
>> please push it somewhere so I can use cherry-pick instead?
> 
> I rebased them on block/for-next, fixed the problems that Guoqing found,
> and pushed them to 
>  https://github.com/neilbrown/linux md/raid0
> 
> NeilBrown

Thanks Neil!

Guoqing, if this looks good, please reply with your Reviewed-by
or Acked-by. 

Thanks,
Song
Guoqing Jiang Sept. 10, 2019, 4:01 p.m. UTC | #6
On 9/10/19 5:45 PM, Song Liu wrote:
> 
> 
>> On Sep 10, 2019, at 12:33 AM, NeilBrown <neilb@suse.de> wrote:
>>
>> On Mon, Sep 09 2019, Song Liu wrote:
>>
>>> Hi Neil,
>>>
>>>> On Sep 9, 2019, at 7:57 AM, NeilBrown <neilb@suse.de> wrote:
>>>>
>>>>
>>>> If the drives in a RAID0 are not all the same size, the array is
>>>> divided into zones.
>>>> The first zone covers all drives, to the size of the smallest.
>>>> The second zone covers all drives larger than the smallest, up to
>>>> the size of the second smallest - etc.
>>>>
>>>> A change in Linux 3.14 unintentionally changed the layout for the
>>>> second and subsequent zones.  All the correct data is still stored, but
>>>> each chunk may be assigned to a different device than in pre-3.14 kernels.
>>>> This can lead to data corruption.
>>>>
>>>> It is not possible to determine what layout to use - it depends which
>>>> kernel the data was written by.
>>>> So we add a module parameter to allow the old (0) or new (1) layout to be
>>>> specified, and refused to assemble an affected array if that parameter is
>>>> not set.
>>>>
>>>> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
>>>> cc: stable@vger.kernel.org (3.14+)
>>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>>
>>> Thanks for the patches. They look great. However, I am having problem
>>> apply them (not sure whether it is a problem on my side). Could you
>>> please push it somewhere so I can use cherry-pick instead?
>>
>> I rebased them on block/for-next, fixed the problems that Guoqing found,
>> and pushed them to
>>   https://github.com/neilbrown/linux md/raid0
>>
>> NeilBrown
> 
> Thanks Neil!

Thanks for the explanation about set the flag.

> 
> Guoqing, if this looks good, please reply with your Reviewed-by
> or Acked-by.

No more comments from my side, but I am not sure if it is better/possible to use one
sysfs node to control the behavior instead of module parameter, then we can support
different raid0 layout dynamically.

Anyway, Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>

Thanks,
Guoqing
NeilBrown Sept. 10, 2019, 11:08 p.m. UTC | #7
On Tue, Sep 10 2019, Guoqing Jiang wrote:

> On 9/10/19 5:45 PM, Song Liu wrote:
>> 
>> 
>>> On Sep 10, 2019, at 12:33 AM, NeilBrown <neilb@suse.de> wrote:
>>>
>>> On Mon, Sep 09 2019, Song Liu wrote:
>>>
>>>> Hi Neil,
>>>>
>>>>> On Sep 9, 2019, at 7:57 AM, NeilBrown <neilb@suse.de> wrote:
>>>>>
>>>>>
>>>>> If the drives in a RAID0 are not all the same size, the array is
>>>>> divided into zones.
>>>>> The first zone covers all drives, to the size of the smallest.
>>>>> The second zone covers all drives larger than the smallest, up to
>>>>> the size of the second smallest - etc.
>>>>>
>>>>> A change in Linux 3.14 unintentionally changed the layout for the
>>>>> second and subsequent zones.  All the correct data is still stored, but
>>>>> each chunk may be assigned to a different device than in pre-3.14 kernels.
>>>>> This can lead to data corruption.
>>>>>
>>>>> It is not possible to determine what layout to use - it depends which
>>>>> kernel the data was written by.
>>>>> So we add a module parameter to allow the old (0) or new (1) layout to be
>>>>> specified, and refused to assemble an affected array if that parameter is
>>>>> not set.
>>>>>
>>>>> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
>>>>> cc: stable@vger.kernel.org (3.14+)
>>>>> Signed-off-by: NeilBrown <neilb@suse.de>
>>>>
>>>> Thanks for the patches. They look great. However, I am having problem
>>>> apply them (not sure whether it is a problem on my side). Could you
>>>> please push it somewhere so I can use cherry-pick instead?
>>>
>>> I rebased them on block/for-next, fixed the problems that Guoqing found,
>>> and pushed them to
>>>   https://github.com/neilbrown/linux md/raid0
>>>
>>> NeilBrown
>> 
>> Thanks Neil!
>
> Thanks for the explanation about set the flag.
>
>> 
>> Guoqing, if this looks good, please reply with your Reviewed-by
>> or Acked-by.
>
> No more comments from my side, but I am not sure if it is better/possible to use one
> sysfs node to control the behavior instead of module parameter, then we can support
> different raid0 layout dynamically.

A strength of module parameters is that you can set them in
  /etc/modprobe.d/00-local.conf
and then they are automatically set on boot.
For sysfs, you need some tool to set them.

>
> Anyway, Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>

Thanks,
NeilBrown


> Thanks,
> Guoqing
Song Liu Sept. 11, 2019, 9:56 a.m. UTC | #8
On Wed, Sep 11, 2019 at 12:10 AM NeilBrown <neilb@suse.de> wrote:
>
> On Tue, Sep 10 2019, Guoqing Jiang wrote:
>
> > On 9/10/19 5:45 PM, Song Liu wrote:
> >>
> >>
> >>> On Sep 10, 2019, at 12:33 AM, NeilBrown <neilb@suse.de> wrote:
> >>>
> >>> On Mon, Sep 09 2019, Song Liu wrote:
> >>>
> >>>> Hi Neil,
> >>>>
> >>>>> On Sep 9, 2019, at 7:57 AM, NeilBrown <neilb@suse.de> wrote:
> >>>>>
> >>>>>
> >>>>> If the drives in a RAID0 are not all the same size, the array is
> >>>>> divided into zones.
> >>>>> The first zone covers all drives, to the size of the smallest.
> >>>>> The second zone covers all drives larger than the smallest, up to
> >>>>> the size of the second smallest - etc.
> >>>>>
> >>>>> A change in Linux 3.14 unintentionally changed the layout for the
> >>>>> second and subsequent zones.  All the correct data is still stored, but
> >>>>> each chunk may be assigned to a different device than in pre-3.14 kernels.
> >>>>> This can lead to data corruption.
> >>>>>
> >>>>> It is not possible to determine what layout to use - it depends which
> >>>>> kernel the data was written by.
> >>>>> So we add a module parameter to allow the old (0) or new (1) layout to be
> >>>>> specified, and refused to assemble an affected array if that parameter is
> >>>>> not set.
> >>>>>
> >>>>> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
> >>>>> cc: stable@vger.kernel.org (3.14+)
> >>>>> Signed-off-by: NeilBrown <neilb@suse.de>
> >>>>
> >>>> Thanks for the patches. They look great. However, I am having problem
> >>>> apply them (not sure whether it is a problem on my side). Could you
> >>>> please push it somewhere so I can use cherry-pick instead?
> >>>
> >>> I rebased them on block/for-next, fixed the problems that Guoqing found,
> >>> and pushed them to
> >>>   https://github.com/neilbrown/linux md/raid0
> >>>
> >>> NeilBrown
> >>
> >> Thanks Neil!
> >
> > Thanks for the explanation about set the flag.
> >
> >>
> >> Guoqing, if this looks good, please reply with your Reviewed-by
> >> or Acked-by.
> >
> > No more comments from my side, but I am not sure if it is better/possible to use one
> > sysfs node to control the behavior instead of module parameter, then we can support
> > different raid0 layout dynamically.
>
> A strength of module parameters is that you can set them in
>   /etc/modprobe.d/00-local.conf
> and then they are automatically set on boot.
> For sysfs, you need some tool to set them.
>
> >
> > Anyway, Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
> >

I am adding the following change to the 1/2. Please let me know if it doesn't
make sense.

Thanks,
Song

diff --git i/drivers/md/raid0.c w/drivers/md/raid0.c
index a9fcff50bbfc..54d0064787a8 100644
--- i/drivers/md/raid0.c
+++ w/drivers/md/raid0.c
@@ -615,6 +615,10 @@ static bool raid0_make_request(struct mddev
*mddev, struct bio *bio)
        case RAID0_ALT_MULTIZONE_LAYOUT:
                tmp_dev = map_sector(mddev, zone, sector, &sector);
                break;
+       default:
+               WARN("md/raid0:%s: Invalid layout\n", mdname(mddev));
+               bio_io_error(bio);
+               return true;
        }

        if (unlikely(is_mddev_broken(tmp_dev, "raid0"))) {
NeilBrown Sept. 11, 2019, 10:48 p.m. UTC | #9
On Wed, Sep 11 2019, Song Liu wrote:

> On Wed, Sep 11, 2019 at 12:10 AM NeilBrown <neilb@suse.de> wrote:
>>
>> On Tue, Sep 10 2019, Guoqing Jiang wrote:
>>
>> > On 9/10/19 5:45 PM, Song Liu wrote:
>> >>
>> >>
>> >>> On Sep 10, 2019, at 12:33 AM, NeilBrown <neilb@suse.de> wrote:
>> >>>
>> >>> On Mon, Sep 09 2019, Song Liu wrote:
>> >>>
>> >>>> Hi Neil,
>> >>>>
>> >>>>> On Sep 9, 2019, at 7:57 AM, NeilBrown <neilb@suse.de> wrote:
>> >>>>>
>> >>>>>
>> >>>>> If the drives in a RAID0 are not all the same size, the array is
>> >>>>> divided into zones.
>> >>>>> The first zone covers all drives, to the size of the smallest.
>> >>>>> The second zone covers all drives larger than the smallest, up to
>> >>>>> the size of the second smallest - etc.
>> >>>>>
>> >>>>> A change in Linux 3.14 unintentionally changed the layout for the
>> >>>>> second and subsequent zones.  All the correct data is still stored, but
>> >>>>> each chunk may be assigned to a different device than in pre-3.14 kernels.
>> >>>>> This can lead to data corruption.
>> >>>>>
>> >>>>> It is not possible to determine what layout to use - it depends which
>> >>>>> kernel the data was written by.
>> >>>>> So we add a module parameter to allow the old (0) or new (1) layout to be
>> >>>>> specified, and refused to assemble an affected array if that parameter is
>> >>>>> not set.
>> >>>>>
>> >>>>> Fixes: 20d0189b1012 ("block: Introduce new bio_split()")
>> >>>>> cc: stable@vger.kernel.org (3.14+)
>> >>>>> Signed-off-by: NeilBrown <neilb@suse.de>
>> >>>>
>> >>>> Thanks for the patches. They look great. However, I am having problem
>> >>>> apply them (not sure whether it is a problem on my side). Could you
>> >>>> please push it somewhere so I can use cherry-pick instead?
>> >>>
>> >>> I rebased them on block/for-next, fixed the problems that Guoqing found,
>> >>> and pushed them to
>> >>>   https://github.com/neilbrown/linux md/raid0
>> >>>
>> >>> NeilBrown
>> >>
>> >> Thanks Neil!
>> >
>> > Thanks for the explanation about set the flag.
>> >
>> >>
>> >> Guoqing, if this looks good, please reply with your Reviewed-by
>> >> or Acked-by.
>> >
>> > No more comments from my side, but I am not sure if it is better/possible to use one
>> > sysfs node to control the behavior instead of module parameter, then we can support
>> > different raid0 layout dynamically.
>>
>> A strength of module parameters is that you can set them in
>>   /etc/modprobe.d/00-local.conf
>> and then they are automatically set on boot.
>> For sysfs, you need some tool to set them.
>>
>> >
>> > Anyway, Acked-by: Guoqing Jiang <guoqing.jiang@cloud.ionos.com>
>> >
>
> I am adding the following change to the 1/2. Please let me know if it doesn't
> make sense.

I don't object, through with the current code it is impossible for that
warning to fire.
Code might change in the future though, and it's better to be safe than
sorry.

Thanks,
NeilBrown

>
> Thanks,
> Song
>
> diff --git i/drivers/md/raid0.c w/drivers/md/raid0.c
> index a9fcff50bbfc..54d0064787a8 100644
> --- i/drivers/md/raid0.c
> +++ w/drivers/md/raid0.c
> @@ -615,6 +615,10 @@ static bool raid0_make_request(struct mddev
> *mddev, struct bio *bio)
>         case RAID0_ALT_MULTIZONE_LAYOUT:
>                 tmp_dev = map_sector(mddev, zone, sector, &sector);
>                 break;
> +       default:
> +               WARN("md/raid0:%s: Invalid layout\n", mdname(mddev));
> +               bio_io_error(bio);
> +               return true;
>         }
>
>         if (unlikely(is_mddev_broken(tmp_dev, "raid0"))) {
diff mbox series

Patch

diff --git a/drivers/md/raid0.c b/drivers/md/raid0.c
index bf5cf184a260..a8888c12308a 100644
--- a/drivers/md/raid0.c
+++ b/drivers/md/raid0.c
@@ -19,6 +19,9 @@ 
 #include "raid0.h"
 #include "raid5.h"
 
+static int default_layout = -1;
+module_param(default_layout, int, 0644);
+
 #define UNSUPPORTED_MDDEV_FLAGS		\
 	((1L << MD_HAS_JOURNAL) |	\
 	 (1L << MD_JOURNAL_CLEAN) |	\
@@ -139,6 +142,19 @@  static int create_strip_zones(struct mddev *mddev, struct r0conf **private_conf)
 	}
 	pr_debug("md/raid0:%s: FINAL %d zones\n",
 		 mdname(mddev), conf->nr_strip_zones);
+
+	if (conf->nr_strip_zones == 1) {
+		conf->layout = RAID0_ORIG_LAYOUT;
+	} else if (default_layout == RAID0_ORIG_LAYOUT ||
+		   default_layout == RAID0_ALT_MULTIZONE_LAYOUT) {
+		conf->layout = default_layout;
+	} else {
+		pr_err("md/raid0:%s: cannot assemble multi-zone RAID0 with default_layout setting\n",
+		       mdname(mddev));
+		pr_err("md/raid0: please set raid.default_layout to 0 or 1\n");
+		err = -ENOTSUPP;
+		goto abort;
+	}
 	/*
 	 * now since we have the hard sector sizes, we can make sure
 	 * chunk size is a multiple of that sector size
@@ -547,10 +563,12 @@  static void raid0_handle_discard(struct mddev *mddev, struct bio *bio)
 
 static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 {
+	struct r0conf *conf = mddev->private;
 	struct strip_zone *zone;
 	struct md_rdev *tmp_dev;
 	sector_t bio_sector;
 	sector_t sector;
+	sector_t orig_sector;
 	unsigned chunk_sects;
 	unsigned sectors;
 
@@ -584,8 +602,16 @@  static bool raid0_make_request(struct mddev *mddev, struct bio *bio)
 		bio = split;
 	}
 
+	orig_sector = sector;
 	zone = find_zone(mddev->private, &sector);
-	tmp_dev = map_sector(mddev, zone, sector, &sector);
+	switch (conf->layout) {
+	case RAID0_ORIG_LAYOUT:
+		tmp_dev = map_sector(mddev, zone, orig_sector, &sector);
+		break;
+	case RAID0_ALT_MULTIZONE_LAYOUT:
+		tmp_dev = map_sector(mddev, zone, sector, &sector);
+		break;
+	}
 	bio_set_dev(bio, tmp_dev->bdev);
 	bio->bi_iter.bi_sector = sector + zone->dev_start +
 		tmp_dev->data_offset;
diff --git a/drivers/md/raid0.h b/drivers/md/raid0.h
index 540e65d92642..3816e5477db1 100644
--- a/drivers/md/raid0.h
+++ b/drivers/md/raid0.h
@@ -8,11 +8,25 @@  struct strip_zone {
 	int	 nb_dev;	/* # of devices attached to the zone */
 };
 
+/* Linux 3.14 (20d0189b101) made an unintended change to
+ * the RAID0 layout for multi-zone arrays (where devices aren't all
+ * the same size.
+ * RAID0_ORIG_LAYOUT restores the original layout
+ * RAID0_ALT_MULTIZONE_LAYOUT uses the altered layout
+ * The layouts are identical when there is only one zone (all
+ * devices the same size).
+ */
+
+enum r0layout {
+	RAID0_ORIG_LAYOUT = 1,
+	RAID0_ALT_MULTIZONE_LAYOUT = 2,
+};
 struct r0conf {
 	struct strip_zone	*strip_zone;
 	struct md_rdev		**devlist; /* lists of rdevs, pointed to
 					    * by strip_zone->dev */
 	int			nr_strip_zones;
+	enum r0layout		layout;
 };
 
 #endif