btrfs-progs: check metadata redundancy
diff mbox

Message ID mi2sgj$urs$1@ger.gmane.org
State New
Headers show

Commit Message

Sam Tygier May 2, 2015, 4:03 p.m. UTC
Currently BTRFS allows you to make bad choices of data and 
metadata levels. For example -d raid1 -m raid0 means you can
only use half your total disk space, but will loose everything
if 1 disk fails. This patch prevents you creating the situation
another will be need to prevent rebalancing in to it. 

When making a filesystem check that metadata mode is at least
as redundant as the data mode. For example don't allow:
	-d raid1 -m raid0

Signed-off-by: Sam Tygier <samtygier@yahoo.co.uk>
---
 utils.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

-- 2.1.4


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

David Sterba May 5, 2015, 2:54 p.m. UTC | #1
On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
> Currently BTRFS allows you to make bad choices of data and 
> metadata levels. For example -d raid1 -m raid0 means you can
> only use half your total disk space, but will loose everything
> if 1 disk fails. This patch prevents you creating the situation
> another will be need to prevent rebalancing in to it. 
> 
> When making a filesystem check that metadata mode is at least
> as redundant as the data mode. For example don't allow:
> 	-d raid1 -m raid0

This is enforcing some policty that makes sense for some usecases, but I
think that the tool should be flexible enough to create any kind of raid
profiles. It's up to the user. I'm willing to add a warning that the
profiles seem fishy, but failing mkfs without any way to override that
is IMHO not a good thing.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sam Tygier May 5, 2015, 9:18 p.m. UTC | #2
On 05/05/15 15:54, David Sterba wrote:
> On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
>> Currently BTRFS allows you to make bad choices of data and
>> metadata levels. For example -d raid1 -m raid0 means you can
>> only use half your total disk space, but will loose everything
>> if 1 disk fails. This patch prevents you creating the situation
>> another will be need to prevent rebalancing in to it.
>>
>> When making a filesystem check that metadata mode is at least
>> as redundant as the data mode. For example don't allow:
>> 	-d raid1 -m raid0
> 
> This is enforcing some policty that makes sense for some usecases, but I
> think that the tool should be flexible enough to create any kind of raid
> profiles. It's up to the user. I'm willing to add a warning that the
> profiles seem fishy, but failing mkfs without any way to override that
> is IMHO not a good thing.

There already seems to be policy in test_num_disk_vs_raid() disallowing
DUP for multiple devices. Is there really a useful case better protected
data than metadata?

In btrfs_balance() fs/btrfs/volumes.c, operations that reduce integrity
require a 'force' option. Would that be a good way of handling
questionable data/metadata combinations? If so should it overload the
existing for option, or additional one, e.g. --force-raid-level?

Otherwise I could redo it as just a warning.

If wrote a similar check for rebalancing is there a way to share the
group_profile_max_safe_loss() function between the kernel and btrfs-progs?

Thanks,
Sam

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paul Jones May 6, 2015, 3:40 a.m. UTC | #3
> -----Original Message-----
> From: linux-btrfs-owner@vger.kernel.org [mailto:linux-btrfs-
> owner@vger.kernel.org] On Behalf Of sam tygier
> Sent: Wednesday, 6 May 2015 7:18 AM
> To: linux-btrfs@vger.kernel.org
> Subject: Re: [PATCH] btrfs-progs: check metadata redundancy
> 
> On 05/05/15 15:54, David Sterba wrote:
> > On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
> >> Currently BTRFS allows you to make bad choices of data and metadata
> >> levels. For example -d raid1 -m raid0 means you can only use half
> >> your total disk space, but will loose everything if 1 disk fails.
> >> This patch prevents you creating the situation another will be need
> >> to prevent rebalancing in to it.
> >>
> >> When making a filesystem check that metadata mode is at least as
> >> redundant as the data mode. For example don't allow:
> >> 	-d raid1 -m raid0
> >
> > This is enforcing some policty that makes sense for some usecases, but
> > I think that the tool should be flexible enough to create any kind of
> > raid profiles. It's up to the user. I'm willing to add a warning that
> > the profiles seem fishy, but failing mkfs without any way to override
> > that is IMHO not a good thing.
> 
> There already seems to be policy in test_num_disk_vs_raid() disallowing
> DUP for multiple devices. Is there really a useful case better protected data
> than metadata?

I would appreciate being able to use the DUP profile for data on a single disk - at the moment I just resort to partitioning the disk in two and creating a raid1.
Usecase is portable disk backups - I take a master backup at site A with slow internet, and upload it to backup server from site B. Then I use rsync and snapshots. I start over every 2 months as one of the backups is an incremental backup of a few windows system images.

Paul.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Duncan May 6, 2015, 10:14 a.m. UTC | #4
Paul Jones posted on Wed, 06 May 2015 03:40:12 +0000 as excerpted:

> I would appreciate being able to use the DUP profile for data on a
> single disk - at the moment I just resort to partitioning the disk in
> two and creating a raid1. Usecase is portable disk backups

Rabbit-trailing a bit from the original discussion to address this 
point...

First of all, DUP data has been on the wishlist for some users for some 
time, generally for those who wish to capitalize on btrfs data 
checksumming and validation and actually be able to correct and scrub 
invalid data when using btrfs on a single device, which would seem to be 
what your use-case is about, at its root.

Meanwhile, second, good news!  There is actually one way to achieve DUP 
data on a single device, without resorting to partitioning a single 
device to create a raid1 on it:

When doing the mkfs.btrfs, set the -M/--mixed chunk mode option.  

Mixed-chunk mode (aka mixed-block-group aka mixed-bg) is used by default 
on btrfs under 1 GiB, and there have been discussions about upping that 
to say 16 GiB or so, but the reason it isn't the default on normal larger 
btrfs is because (as the mkfs.btrfs manpage states) mixed-chunk mode 
incurs a performance penalty on larger btrfs.

But partitioning up a single physical device to make it two logical 
devices, just to be able to setup raid1 data, surely incurs a larger 
penalty, at least on spinning rust, where the single set of write heads 
will have to seek back and forth between partitions.  And obviously, if 
people are going to that extreme (and you and others have demonstrated 
that they are and do!), they're willing to pay the relatively smaller 
mixed-bg penalty.

Since mixed-bg mode mixes data and metadata in the same block-groups 
(chunks), DUP mode was made an option there (and I think the default, tho 
I always set it specifically for both data and metadata, setting just one 
in the case of mixed-bg is an error), in ordered to maintain metadata DUP 
protection.  But since as the name suggests they're mixed-bgs, that also 
has the effect of setting data DUP mode! =:^)

When asked specifically about this, Chris Mason confirmed that wasn't 
intentional, simply an implementation accident, with the purpose of mixed-
bg mode being, as the manpage and sub-1-GiB-default suggests, to allow 
more efficient usage of space on small btrfs, which was really needed as 
pre-mixed-bg, chunk allocation on small devices /was/ really inefficient, 
and mixed-bg really did solve that problem.  However, accident or not, 
it's not something that can really be dropped now, and I've seen 
absolutely no suggestions to do so.

So mixed-bg does seem to be the workaround for lack of DUP mode data, and 
even with its inefficiencies compared to separate data/metadata, compared 
to "virtual" raid1 on a single partitioned physical device, it should be 
/quite/ efficient indeed, at least on spinning rust.

Which just leaves some loose ends to tie up...

* I don't know that anyone has benchmarked, or even made claims based on 
logic, of performance on single SSD, partitioned raid1 mode against mixed-
bg dup mode.

* If the concern is failing media itself, again on spinning rust (ssd 
firmwares do weird things with consecutive sector addresses anyway), 
raid1 mode should still be slightly more reliable, simply because the two 
copies are going to be in partitions on opposite ends of the disc.  Mixed-
bg mode will tend to allocate chunks closer to each other, such that at 
least in theory, it's more likely that a flaky media section will damage 
both copies.

* Many people would still like to have true DUP data chunks as an option, 
and personally, I think it'll likely eventually happen, because I think 
the reason people want that option is valid and it shouldn't be hard to 
implement -- I think mostly just letting that be a data option too, tho 
there's likely a few corner-case races that might expose that have been 
hidden so far as it's not an option.  However, I also believe that as a 
practical matter, the existence of the mixed-bg workaround has more or 
less silenced the requests, and there have been bigger fish (well, bugs, 
and features like raid56 mode, not fish) to fry, so it hasn't been the 
issue that it would have been otherwise, and that has probably delayed 
the lifting of the DUP mode data restriction in the more general case.
David Sterba May 12, 2015, 3:02 p.m. UTC | #5
On Tue, May 05, 2015 at 10:18:07PM +0100, sam tygier wrote:
> On 05/05/15 15:54, David Sterba wrote:
> > On Sat, May 02, 2015 at 05:03:31PM +0100, sam tygier wrote:
> >> Currently BTRFS allows you to make bad choices of data and
> >> metadata levels. For example -d raid1 -m raid0 means you can
> >> only use half your total disk space, but will loose everything
> >> if 1 disk fails. This patch prevents you creating the situation
> >> another will be need to prevent rebalancing in to it.
> >>
> >> When making a filesystem check that metadata mode is at least
> >> as redundant as the data mode. For example don't allow:
> >> 	-d raid1 -m raid0
> > 
> > This is enforcing some policty that makes sense for some usecases, but I
> > think that the tool should be flexible enough to create any kind of raid
> > profiles. It's up to the user. I'm willing to add a warning that the
> > profiles seem fishy, but failing mkfs without any way to override that
> > is IMHO not a good thing.
> 
> There already seems to be policy in test_num_disk_vs_raid() disallowing
> DUP for multiple devices. Is there really a useful case better protected
> data than metadata?

In case of DUP/data and single device it's not a policy but lack of
implementation. And not a simple change to make it work AFAIK.

DUP/metadata on multiple devices can exist only if a new device is added
to an existing filesystem until it's balanced. Here it is a policy that
multiple devices need RAID1.

> In btrfs_balance() fs/btrfs/volumes.c, operations that reduce integrity
> require a 'force' option. Would that be a good way of handling
> questionable data/metadata combinations? If so should it overload the
> existing for option, or additional one, e.g. --force-raid-level?

I think changing the integrity is something different than the mkfs
profile setup.

The force flag prevents irreversible changes (overwriting an existing
filesystem). Overloading it for the raid profiles does not sound good to
me, it would have to be another flag. But, I still think that the user
hould be aware of the properties of the respective raid levels, so the
warning is IMHO enough.

> Otherwise I could redo it as just a warning.

Yes please.

> If wrote a similar check for rebalancing is there a way to share the
> group_profile_max_safe_loss() function between the kernel and btrfs-progs?

No, the source code is not shared now, both have to be patched
separately.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
David Sterba May 12, 2015, 3:04 p.m. UTC | #6
On Wed, May 06, 2015 at 03:40:12AM +0000, Paul Jones wrote:
> I would appreciate being able to use the DUP profile for data on a
> single disk - at the moment I just resort to partitioning the disk in
> two and creating a raid1.

As you can read elsewhere, it's possible with the mixed profiles and
AFAICT this will be the only way for the forseeable future.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Patch
diff mbox

diff --git a/utils.c b/utils.c
index b175b01..1136a78 100644
--- a/utils.c
+++ b/utils.c
@@ -2391,6 +2391,24 @@  static int group_profile_devs_min(u64 flag)
 	}
 }
 
+static int group_profile_max_safe_loss(u64 flag)
+{
+	switch (flag & BTRFS_BLOCK_GROUP_PROFILE_MASK) {
+	case 0: /* single */
+	case BTRFS_BLOCK_GROUP_DUP:
+	case BTRFS_BLOCK_GROUP_RAID0:
+		return 0;
+	case BTRFS_BLOCK_GROUP_RAID1:
+	case BTRFS_BLOCK_GROUP_RAID5:
+	case BTRFS_BLOCK_GROUP_RAID10:
+		return 1;
+	case BTRFS_BLOCK_GROUP_RAID6:
+		return 2;
+	default:
+		return -1;
+	}
+}
+
 int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
 	u64 dev_cnt, int mixed, char *estr)
 {
@@ -2439,6 +2457,13 @@  int test_num_disk_vs_raid(u64 metadata_profile, u64 data_profile,
 			"dup for data is allowed only in mixed mode");
 		return 1;
 	}
+
+	if (group_profile_max_safe_loss(metadata_profile) <
+		group_profile_max_safe_loss(data_profile)){
+		snprintf(estr, sz,
+			"metatdata has lower redundancy than data");
+		return 1;
+	}
 	return 0;
 }