Message ID | 1461812780-538-2-git-send-email-anand.jain@oracle.com (mailing list archive)
---|---
State | New, archived
On Thu, Apr 28, 2016 at 11:06:19AM +0800, Anand Jain wrote:
> When RAID1 is degraded, newer chunks should be degraded-RAID1
> chunks instead of single chunks.
>
> The bug is that the devs_min for raid1 was wrong; it should
> be 1, not 2.
>
> Signed-off-by: Anand Jain <anand.jain@oracle.com>
> ---
>  fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++-----
>  1 file changed, 33 insertions(+), 5 deletions(-)
>
> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
> index e2b54d546b7c..8b87ed6eb381 100644
> --- a/fs/btrfs/volumes.c
> +++ b/fs/btrfs/volumes.c
> @@ -56,7 +56,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>  		.sub_stripes	= 1,
>  		.dev_stripes	= 1,
>  		.devs_max	= 2,
> -		.devs_min	= 2,
> +		.devs_min	= 1,

I think we should introduce another way to determine the lower limit
for the degraded mounts. We need the proper raidX constraints and should
use the degraded limits only in the case of a degraded mount.

>  		.tolerated_failures = 1,

Which is exactly the tolerated_failures:

    degraded_devs_min == devs_min - tolerated_failures

which works for all raid levels with redundancy.

>  		.devs_increment	= 2,
>  		.ncopies	= 2,

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Thanks for comments, more below..

On 04/30/2016 12:42 AM, David Sterba wrote:
> On Thu, Apr 28, 2016 at 11:06:19AM +0800, Anand Jain wrote:
>> When RAID1 is degraded, newer chunks should be degraded-RAID1
>> chunks instead of single chunks.
>>
>> The bug is that the devs_min for raid1 was wrong; it should
>> be 1, not 2.
>>
>> Signed-off-by: Anand Jain <anand.jain@oracle.com>
>> ---
>>  fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++-----
>>  1 file changed, 33 insertions(+), 5 deletions(-)
>>
>> diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
>> index e2b54d546b7c..8b87ed6eb381 100644
>> --- a/fs/btrfs/volumes.c
>> +++ b/fs/btrfs/volumes.c
>> @@ -56,7 +56,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
>>  		.sub_stripes	= 1,
>>  		.dev_stripes	= 1,
>>  		.devs_max	= 2,
>> -		.devs_min	= 2,
>> +		.devs_min	= 1,
>
> I think we should introduce another way to determine the lower limit
> for the degraded mounts. We need the proper raidX constraints and should
> use the degraded limits only in the case of a degraded mount.
>
>>  		.tolerated_failures = 1,
>
> Which is exactly the tolerated_failures:
>
>     degraded_devs_min == devs_min - tolerated_failures

  That is, devs_min is actually healthy_devs_min.

> which works for all raid levels with redundancy.

  But not for RAID5 and RAID6.

  Here is a (simulation?) tool which gives some ready answers.
  I have added devs_min - tolerated_failures to it.

    https://github.com/asj/btrfs-raid-cal.git

  I see the problem as this:
  RAID5/6 devs_min values are in the context of a degraded volume.
  RAID1/10 devs_min values are in the context of a healthy volume.

  RAID56 is correct; we already have devs_max to know the number
  of devices in a healthy volume. RAID1's devs_min is wrong, so
  it ended up being the same as devs_max.

Thanks, Anand

>>  		.devs_increment	= 2,
>>  		.ncopies	= 2,
>>> -		.devs_min	= 2,
>>> +		.devs_min	= 1,
>>
>> I think we should introduce another way to determine the lower limit
>> for the degraded mounts. We need the proper raidX constraints and should
>> use the degraded limits only in the case of a degraded mount.
>>
>>> 		.tolerated_failures = 1,
>>
>> Which is exactly the tolerated_failures:
>>
>>     degraded_devs_min == devs_min - tolerated_failures
>
> That is, devs_min is actually healthy_devs_min.
>
>> which works for all raid levels with redundancy.
>
> But not for RAID5 and RAID6.
>
> Here is a (simulation?) tool which gives some ready answers.
> I have added devs_min - tolerated_failures to it.
>
>   https://github.com/asj/btrfs-raid-cal.git

I have copied the state table from the above repo and modified it to add
the above equation.

[x1 = devs_increment * sub_stripes]
[ndevs' = ndevs - ndevs % devs_increment]
[num_stripes = ndevs' * dev_stripes]
[data_stripes = num_stripes / ncopies]
[missing_mirror_dev = x1 - ndevs']
[Y = devs_min - tolerated_failures]

                   R10  R1  DUP  R0  Sn  R5  R6
.sub_stripes     =   2,  1,   1,  1,  1,  1,  1
.dev_stripes     =   1,  1,   2,  1,  1,  1,  1
.devs_max        =   0,  2,   1,  0,  1,  0,  0
.devs_min        =   4,  1,   1,  2,  1,  2,  3
.tolerated_fails =   1,  1,   0,  0,  0,  1,  2
.devs_increment  =   2,  2,   1,  1,  1,  1,  1
.ncopies         =   2,  2,   2,  1,  1,  2,  3
x1               =   4,  2,   1,  1,  1,  1,  1
Y                =   3,  0,   1,  2,  1,  1,  1

[ndevs = 9]
ndevs            =   9,  9,   9,  9,  9,  9,  9
ndevs'           =   8,  2,   1,  9,  9,  9,  9
num_stripes      =   8,  2,   2,  9,  1,  9,  9
data_stripes     =   4,  1,   1,  9,  1,  8,  7

[ndevs = tolerated_fails + devs_min]
ndevs            =   5,  2,   1,  2,  1,  3,  5
ndevs'           =   4,  2,   1,  2,  1,  3,  5
num_stripes      =   4,  2,   2,  2,  1,  3,  5
data_stripes     =   2,  1,   1,  2,  1,  1,  1

[ndevs = devs_min]
ndevs            =   3,  1,   1,  2,  1,  2,  3
ndevs'           =   2,  0,   1,  2,  1,  2,  3
num_stripes      =   -,  -,   2,  2,  1,  2,  3
data_stripes     =   -,  -,   1,  2,  1,  1,  1

[ndevs = devs_min, with RAID1 patch fix]
ndevs            =   3,  1,   1,  2,  1,  2,  3
ndevs'           =   3,  1,   1,  2,  1,  2,  3
num_stripes      =   3,  1,   2,  2,  1,  2,  3
data_stripes     =   ?,  1,   1,  2,  1,  1,  1

> I see the problem as this:
> RAID5/6 devs_min values are in the context of a degraded volume.
> RAID1/10 devs_min values are in the context of a healthy volume.
>
> RAID56 is correct; we already have devs_max to know the number
> of devices in a healthy volume. RAID1's devs_min is wrong, so
> it ended up being the same as devs_max.

Thanks, Anand
diff --git a/fs/btrfs/volumes.c b/fs/btrfs/volumes.c
index e2b54d546b7c..8b87ed6eb381 100644
--- a/fs/btrfs/volumes.c
+++ b/fs/btrfs/volumes.c
@@ -56,7 +56,7 @@ const struct btrfs_raid_attr btrfs_raid_array[BTRFS_NR_RAID_TYPES] = {
 		.sub_stripes	= 1,
 		.dev_stripes	= 1,
 		.devs_max	= 2,
-		.devs_min	= 2,
+		.devs_min	= 1,
 		.tolerated_failures = 1,
 		.devs_increment	= 2,
 		.ncopies	= 2,
@@ -4513,6 +4513,7 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	int i;
 	int j;
 	int index;
+	int missing_dev = 0;
 
 	BUG_ON(!alloc_profile_is_valid(type, 0));
 
@@ -4627,14 +4628,35 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	sort(devices_info, ndevs, sizeof(struct btrfs_device_info),
 	     btrfs_cmp_device_info, NULL);
 
-	/* round down to number of usable stripes */
-	ndevs -= ndevs % devs_increment;
-
-	if (ndevs < devs_increment * sub_stripes || ndevs < devs_min) {
+	/*
+	 * For raid1 and raid10; ndevs = devs_min, is less than
+	 * (devs_increment * sub_stripes) = x in degraded mode.
+	 * For rest of the RAIDs, x is 1
+	 */
+	if (ndevs >= (devs_increment * sub_stripes)) {
+		/* round down to number of usable stripes */
+		ndevs -= ndevs % devs_increment;
+	}
+
+	/*
+	 * Fix devs_min for RAID10
+	 */
+	if (ndevs < devs_min) {
+		/* todo: look for better error reporting */
 		ret = -ENOSPC;
 		goto error;
 	}
 
+	/*
+	 * For RAID1 and RAID10 if there is no sufficient dev for mirror
+	 * or stripe, let the chunks be created in degraded mode.
+	 * And for rest of RAIDs its fine as (devs_increment * sub_stripes)
+	 * is 1.
+	 */
+	if (ndevs < (devs_increment * sub_stripes))
+		missing_dev = (devs_increment * sub_stripes) - ndevs;
+
 	if (devs_max && ndevs > devs_max)
 		ndevs = devs_max;
 	/*
@@ -4645,11 +4667,17 @@ static int __btrfs_alloc_chunk(struct btrfs_trans_handle *trans,
 	num_stripes = ndevs * dev_stripes;
 
 	/*
-	 * this will have to be fixed for RAID1 and RAID10 over
+	 * This will have to be fixed for RAID1 and RAID10 over
 	 * more drives
 	 */
 	data_stripes = num_stripes / ncopies;
 
+	if (type & BTRFS_BLOCK_GROUP_RAID1) {
+		if (missing_dev) {
+			/* For RAID1 data_stripes is always 1 rather */
+			data_stripes = num_stripes;
+		}
+	}
 	if (type & BTRFS_BLOCK_GROUP_RAID5) {
 		raid_stripe_len = find_raid56_stripe_len(ndevs - 1,
 				btrfs_super_stripesize(info->super_copy));
When RAID1 is degraded, newer chunks should be degraded-RAID1
chunks instead of single chunks.

The bug is that the devs_min for raid1 was wrong; it should
be 1, not 2.

Signed-off-by: Anand Jain <anand.jain@oracle.com>
---
 fs/btrfs/volumes.c | 38 +++++++++++++++++++++++++++++++++-----
 1 file changed, 33 insertions(+), 5 deletions(-)