Message ID | mi2sgj$urs$1@ger.gmane.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote: > Currently BTRFS allows you to make bad choices of data and > metadata levels. For example -d raid1 -m raid0 means you can > only use half your total disk space, but will loose everything > if 1 disk fails. This patch prevents you creating the situation > another will be need to prevent rebalancing in to it. > > When making a filesystem check that metadata mode is at least > as redundant as the data mode. For example don't allow: > -d raid1 -m raid0 This is enforcing some policty that makes sense for some usecases, but I think that the tool should be flexible enough to create any kind of raid profiles. It's up to the user. I'm willing to add a warning that the profiles seem fishy, but failing mkfs without any way to override that is IMHO not a good thing. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On 05/05/15 15:54, David Sterba wrote: > On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote: >> Currently BTRFS allows you to make bad choices of data and >> metadata levels. For example -d raid1 -m raid0 means you can >> only use half your total disk space, but will loose everything >> if 1 disk fails. This patch prevents you creating the situation >> another will be need to prevent rebalancing in to it. >> >> When making a filesystem check that metadata mode is at least >> as redundant as the data mode. For example don't allow: >> -d raid1 -m raid0 > > This is enforcing some policty that makes sense for some usecases, but I > think that the tool should be flexible enough to create any kind of raid > profiles. It's up to the user. I'm willing to add a warning that the > profiles seem fishy, but failing mkfs without any way to override that > is IMHO not a good thing. There already seems to be policy in test_num_disk_vs_raid() disallowing DUP for multiple devices. Is there really a useful case better protected data than metadata? In btrfs_balance() fs/btrfs/volumes.c, operations that reduce integrity require a 'force' option. Would that be a good way of handling questionable data/metadata combinations? If so should it overload the existing for option, or additional one, e.g. --force-raid-level? Otherwise I could redo it as just a warning. If wrote a similar check for rebalancing is there a way to share the group_profile_max_safe_loss() function between the kernel and btrfs-progs? Thanks, Sam -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
> -----Original Message----- > From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs- > owner@vger.kernel.org] On Behalf Of sam tygier > Sent: Wednesday, 6 May 2015 7:18 AM > To: linux-btrfs@vger.kernel.org > Subject: Re: [PATCH] btrfs-progs: check metadata redundancy > > On 05/05/15 15:54, David Sterba wrote: > > On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote: > >> Currently BTRFS allows you to make bad choices of data and metadata > >> levels. For example -d raid1 -m raid0 means you can only use half > >> your total disk space, but will loose everything if 1 disk fails. > >> This patch prevents you creating the situation another will be need > >> to prevent rebalancing in to it. > >> > >> When making a filesystem check that metadata mode is at least as > >> redundant as the data mode. For example don't allow: > >> -d raid1 -m raid0 > > > > This is enforcing some policty that makes sense for some usecases, but > > I think that the tool should be flexible enough to create any kind of > > raid profiles. It's up to the user. I'm willing to add a warning that > > the profiles seem fishy, but failing mkfs without any way to override > > that is IMHO not a good thing. > > There already seems to be policy in test_num_disk_vs_raid() disallowing > DUP for multiple devices. Is there really a useful case better protected data > than metadata? I would appreciate being able to use the DUP profile for data on a single disk - at the moment I just resort to partitioning the disk in two and creating a raid1. Usecase is portable disk backups - I take a master backup at site A with slow internet, and upload it to backup server from site B. Then I use rsync and snapshots. I start over every 2 months as one of the backups is an incremental backup of a few windows system images. Paul. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Paul Jones posted on Wed, 06 May 2015 03:40:12 +0000 as excerpted: > I would appreciate being able to use the DUP profile for data on a > single disk - at the moment I just resort to partitioning the disk in > two and creating a raid1. Usecase is portable disk backups Rabbit-trailing a bit from the original discussion to address this point... First of all, DUP data has been on the wishlist for some users for some time, generally for those who wish to capitalize on btrfs data checksumming and validation and actually be able to correct and scrub invalid data when using btrfs on a single device, which would seem to be what your use-case is about, at its root. Meanwhile, second, good news! There is actually one way to achieve DUP data on a single device, without resorting to partitioning a single device to create a raid1 on it: When doing the mkfs.btrfs, set the -M/--mixed chunk mode option. Mixed-chunk mode (aka mixed-block-group aka mixed-bg) is used by default on btrfs under 1 GiB, and there have been discussions about upping that to say 16 GiB or so, but the reason it isn't the default on normal larger btrfs is because (as the mkfs.btrfs manpage states) mixed-chunk mode incurs a performance penalty on larger btrfs. But partitioning up a single physical device to make it two logical devices, just to be able to setup raid1 data, surely incurs a larger penalty, at least on spinning rust, where the single set of write heads will have to seek back and forth between partitions. And obviously, if people are going to that extreme (and you and others have demonstrated that they are and do!), they're willing to pay the relatively smaller mixed-bg penalty. Since mixed-bg mode mixes data and metadata in the same block-groups (chunks), DUP mode was made an option there (and I think the default, tho I always set it specifically for both data and metadata, setting just one in the case of mixed-bg is an error), in ordered to maintain metadata DUP protection. But since as the name suggests they're mixed-bgs, that also has the effect of setting data DUP mode! =:^) When asked specifically about this, Chris Mason confirmed that wasn't intentional, simply an implementation accident, with the purpose of mixed- bg mode being, as the manpage and sub-1-GiB-default suggests, to allow more efficient usage of space on small btrfs, which was really needed as pre-mixed-bg, chunk allocation on small devices /was/ really inefficient, and mixed-bg really did solve that problem. However, accident or not, it's not something that can really be dropped now, and I've seen absolutely no suggestions to do so. So mixed-bg does seem to be the workaround for lack of DUP mode data, and even with its inefficiencies compared to separate data/metadata, compared to "virtual" raid1 on a single partitioned physical device, it should be /quite/ efficient indeed, at least on spinning rust. Which just leaves some loose ends to tie up... * I don't know that anyone has benchmarked, or even made claims based on logic, of performance on single SSD, partitioned raid1 mode against mixed- bg dup mode. * If the concern is failing media itself, again on spinning rust (ssd firmwares do weird things with consecutive sector addresses anyway), raid1 mode should still be slightly more reliable, simply because the two copies are going to be in partitions on opposite ends of the disc. Mixed- bg mode will tend to allocate chunks closer to each other, such that at least in theory, it's more likely that a flaky media section will damage both copies. * Many people would still like to have true DUP data chunks as an option, and personally, I think it'll likely eventually happen, because I think the reason people want that option is valid and it shouldn't be hard to implement -- I think mostly just letting that be a data option too, tho there's likely a few corner-case races that might expose that have been hidden so far as it's not an option. However, I also believe that as a practical matter, the existence of the mixed-bg workaround has more or less silenced the requests, and there have been bigger fish (well, bugs, and features like raid56 mode, not fish) to fry, so it hasn't been the issue that it would have been otherwise, and that has probably delayed the lifting of the DUP mode data restriction in the more general case.
On Tue, May 05, 2015 at 10:18:07PM +0100, sam tygier wrote: > On 05/05/15 15:54, David Sterba wrote: > > On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote: > >> Currently BTRFS allows you to make bad choices of data and > >> metadata levels. For example -d raid1 -m raid0 means you can > >> only use half your total disk space, but will loose everything > >> if 1 disk fails. This patch prevents you creating the situation > >> another will be need to prevent rebalancing in to it. > >> > >> When making a filesystem check that metadata mode is at least > >> as redundant as the data mode. For example don't allow: > >> -d raid1 -m raid0 > > > > This is enforcing some policty that makes sense for some usecases, but I > > think that the tool should be flexible enough to create any kind of raid > > profiles. It's up to the user. I'm willing to add a warning that the > > profiles seem fishy, but failing mkfs without any way to override that > > is IMHO not a good thing. > > There already seems to be policy in test_num_disk_vs_raid() disallowing > DUP for multiple devices. Is there really a useful case better protected > data than metadata? In case of DUP/data and single device it's not a policy but lack of implementation. And not a simple change to make it work AFAIK. DUP/metadata on multiple devices can exist only if a new device is added to an existing filesystem until it's balanced. Here it is a policy that multiple devices need RAID1. > In btrfs_balance() fs/btrfs/volumes.c, operations that reduce integrity > require a 'force' option. Would that be a good way of handling > questionable data/metadata combinations? If so should it overload the > existing for option, or additional one, e.g. --force-raid-level? I think changing the integrity is something different than the mkfs profile setup. The force flag prevents irreversible changes (overwriting an existing filesystem). Overloading it for the raid profiles does not sound good to me, it would have to be another flag. But, I still think that the user hould be aware of the properties of the respective raid levels, so the warning is IMHO enough. > Otherwise I could redo it as just a warning. Yes please. > If wrote a similar check for rebalancing is there a way to share the > group_profile_max_safe_loss() function between the kernel and btrfs-progs? No, the source code is not shared now, both have to be patched separately. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
On Wed, May 06, 2015 at 03:40:12AM +0000, Paul Jones wrote: > I would appreciate being able to use the DUP profile for data on a > single disk - at the moment I just resort to partitioning the disk in > two and creating a raid1. As you can read elsewhere, it's possible with the mixed profiles and AFAICT this will be the only way for the forseeable future. -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/utils.c b/utils.c index b175b01..1136a78 100644 --- a/utils.c +++ b/utils.c @@ -2391,6 +2391,24 @@ static int group_profile_devs_min(u64 flag) } } +static int group_profile_max_safe_loss(u64 flag) +{ + switch (flag & BTRFS_BLOCK_GROUP_PROFILE_MASK) { + case 0: /* single */ + case BTRFS_BLOCK_GROUP_DUP: + case BTRFS_BLOCK_GROUP_RAID0: + return 0; + case BTRFS_BLOCK_GROUP_RAID1: + case BTRFS_BLOCK_GROUP_RAID5: + case BTRFS_BLOCK_GROUP_RAID10: + return 1; + case BTRFS_BLOCK_GROUP_RAID6: + return 2; + default: + return -1; + } +} + int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile, u64 dev_cnt, int mixed, char *estr) { @@ -2439,6 +2457,13 @@ int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile, "dup for data is allowed only in mixed mode"); return 1; } + + if (group_profile_max_safe_loss(metadata_profile) < + group_profile_max_safe_loss(data_profile)){ + snprintf(estr, sz, + "metatdata has lower redundancy than data"); + return 1; + } return 0; }
Currently BTRFS allows you to make bad choices of data and metadata levels. For example -d raid1 -m raid0 means you can only use half your total disk space, but will loose everything if 1 disk fails. This patch prevents you creating the situation another will be need to prevent rebalancing in to it. When making a filesystem check that metadata mode is at least as redundant as the data mode. For example don't allow: -d raid1 -m raid0 Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk> --- utils.c | 25 +++++++++++++++++++++++++ 1 file changed, 25 insertions(+) -- 2.1.4 -- To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html