
btrfs raid5

Message ID or7gd59qcf.fsf@livre.home (mailing list archive)
State New, archived

Commit Message

Alexandre Oliva Oct. 22, 2013, 5:18 p.m. UTC
On Oct 22, 2013, Duncan <1i5t5.duncan@cox.net> wrote:

> This is because there's a hole in the recovery process in case of a
> lost device, making it dangerous to use except for the pure test-case.

It's not just that; any I/O error in raid56 chunks will trigger a BUG
and make the filesystem unusable until the next reboot, because the
mirror number is zero.  I wrote this patch last week, just before
leaving on a trip, and I was happy to find out it enabled a
frequently-failing disk to hold a filesystem that turned out to be
surprisingly reliable!


btrfs: some progress in raid56 recovery

From: Alexandre Oliva <oliva@gnu.org>

This patch is WIP, but it has enabled a raid6 filesystem on a bad disk
(frequent read failures at random blocks) to work flawlessly for a
couple of weeks, instead of the entire filesystem hanging upon the
first read error.

One of the problems is that we have the mirror number set to zero on
most raid56 reads.  That's unexpected, since mirror numbers start at
one.  I couldn't quite figure out where to fix the mirror number in
the bio construction, but by simply refraining from failing when the
mirror number is zero, I found out we end up retrying the read with
the next mirror, a retry that, on my bad disk, often succeeds.  So
that was the first win.
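
A minimal sketch of that retry idea (the names below are illustrative,
not the exact extent_io.c code path): once the zero mirror number no
longer trips a BUG, the failure record just advances to the next copy
and the read is resubmitted, which on a flaky disk amounts to a plain
retry:

	/* Sketch only; pick_retry_mirror() and num_copies are illustrative. */
	static int pick_retry_mirror(struct io_failure_record *failrec,
				     int failed_mirror, int num_copies)
	{
		/* mirror numbers are 1-based; 0 just means no copy tried yet */
		failrec->this_mirror = failed_mirror + 1;
		if (failrec->this_mirror > num_copies)
			return -EIO;	/* every copy tried, give up */
		return 0;		/* resubmit against this_mirror */
	}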

After that, I had to make a few further tweaks so that other BUG_ONs
wouldn't hit and we'd instead fail the read altogether: in the
extent_io layer, we still don't repair/rewrite the raid56 blocks, nor
do we attempt to rebuild bad blocks out of the other blocks in the
stripe.  In the few cases where the read retry didn't succeed, I'd
get an extent cksum verify failure, which I regarded as ok.
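
The general shape of those tweaks shows up in the clean_io_failure()
hunk below: an "impossible" condition that used to be a hard BUG_ON
becomes a soft failure, so a bad stripe costs one failed read (which
the csum verification then reports) rather than the whole filesystem:

	if (!failrec->this_mirror) {
		/* repair never ran for this record; give up on it and
		 * let the csum verification report the bad read */
		goto out;
	}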

What did surprise me was that, for some of these failures, but not
all, the raid56 recovery code would kick in and rebuild the bad block,
so that we'd get the correct data back in spite of the cksum failure
and the bad block.  I'm still puzzled by that; I can't explain what
I'm observing, but surely the correct data is coming out of somewhere
;-)

Another oddity I noticed is that sometimes the mirror numbers appear
to be totally out of range; I suspect there might be some type
mismatch or out-of-range memory access that causes some other piece
of information to be read as a mirror number from the bios or some
such.  I haven't been able to track that down yet.

As it stands, although I know this still doesn't invoke the recovery
or repair code in the right place, the patch is usable on its own, and
it is surely an improvement over the current state of raid56 in btrfs,
so it might be a good idea to put it in.  So far, I've put more than
1TB of data on that failing disk with 16 partitions on raid6, and
somehow I got all the data back successfully: every file passed an
md5sum check, in spite of tons of I/O errors in the process.

Signed-off-by: Alexandre Oliva <oliva@gnu.org>
---
 fs/btrfs/extent_io.c |   17 ++++++++++++-----
 fs/btrfs/raid56.c    |   18 ++++++++++++++----
 2 files changed, 26 insertions(+), 9 deletions(-)

Comments

Brendan Hide Oct. 22, 2013, 5:40 p.m. UTC | #1
On 2013/10/22 07:18 PM, Alexandre Oliva wrote:
> ... and
> it is surely an improvement over the current state of raid56 in btrfs,
> so it might be a good idea to put it in.
I suspect the issue is that, while it sortof works, we don't really want 
to push people to use it half-baked. This is reassuring work, however. 
Maybe it would be nice to have some half-baked code *anyway*, even if 
Chris doesn't put it in his pull requests juuust yet. ;)
> So far, I've put more than
> 1TB of data on that failing disk with 16 partitions on raid6, and
> somehow I got all the data back successfully: every file passed an
> md5sum check, in spite of tons of I/O errors in the process.
Is this all on a single disk? If so it must be seeking like mad! haha
Alexandre Oliva Oct. 22, 2013, 7:24 p.m. UTC | #2
On Oct 22, 2013, Brendan Hide <brendan@swiftspirit.co.za> wrote:

> On 2013/10/22 07:18 PM, Alexandre Oliva wrote:
>> ... and
>> it is surely an improvement over the current state of raid56 in btrfs,
>> so it might be a good idea to put it in.
> I suspect the issue is that, while it sortof works, we don't really
> want to push people to use it half-baked.

I don't think the current state of the implementation upstream is
compatible with that statement ;-)

One can create and run a glorified raid0 that computes and updates
parity blocks it won't use for anything, while the name gives the
illusion of a more reliable filesystem than it actually is, and it will
freeze when encountering any of the failures the name suggests it would
protect from.
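
(To be concrete about what goes unused: the P stripe a raid5-style
profile maintains is just the byte-wise XOR of the data stripes, so
any single lost stripe could in principle be rebuilt from the
remaining ones.  The toy sketch below is not btrfs code, only an
illustration of the parity that currently gets computed and then
ignored on failure.)

	#include <stddef.h>
	#include <stdint.h>

	/* Toy illustration only: P parity is the XOR of the data stripes. */
	static void compute_p_parity(const uint8_t *const *data, int nstripes,
				     uint8_t *parity, size_t len)
	{
		size_t i;
		int s;

		for (i = 0; i < len; i++) {
			uint8_t p = 0;

			for (s = 0; s < nstripes; s++)
				p ^= data[s][i];	/* XOR across stripes */
			parity[i] = p;
		}
	}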

If we didn't have any raid56 support at all, or if it was configured
separately and disabled by default, I'd concur with your statement.  But
as things stand, any improvement to the raid56 implementation that
brings at least some of the safety net raid56 are meant to provide makes
things better, without giving users an idea that the implementation is
any more full-featured than it currently is.

> Maybe it would be nice to have some half-baked code *anyway*,
> even if Chris doesn't put it in his pull requests juuust yet. ;)

Why, sure, that's why I posted the patch; even if it didn't make it to
the repository, others might find it useful ;-)

>> So far, I've put more than
>> 1TB of data on that failing disk with 16 partitions on raid6, and
>> somehow I got all the data back successfully: every file passed an
>> md5sum check, in spite of tons of I/O errors in the process.

> Is this all on a single disk? If so it must be seeking like mad! haha

Yeah.  It probably is, but the access pattern is mostly random access
to smallish files, so that won't be a problem.  I
considered doing raid1 on the data, to get some more reliability out of
the broken disk, but then I recalled there was this raid56
implementation that, in raid6, would theoretically bring about
additional reliability and be far more space efficient, so I decided to
give it a try.  Only after I'd put in most of the data did the errors
start popping out.  Then I decided to try and fix them instead of
moving data out.  It was some happy hacking ;-)
Duncan Oct. 23, 2013, 12:50 a.m. UTC | #3
Alexandre Oliva posted on Tue, 22 Oct 2013 17:24:37 -0200 as excerpted:

> On Oct 22, 2013, Brendan Hide <brendan@swiftspirit.co.za> wrote:
> 
>> On 2013/10/22 07:18 PM, Alexandre Oliva wrote:
>>> ... and it is surely an improvement over the current state of raid56
>>> in btrfs,
>>> so it might be a good idea to put it in.
>> I suspect the issue is that, while it sortof works, we don't really
>> want to push people to use it half-baked.
> 
> I don't think the current state of the implementation upstream is
> compatible with that statement ;-)
> 
> One can create and run a glorified raid0 that computes and updates
> parity blocks it won't use for anything, while the name gives the
> illusion of a more reliable filesystem than it actually is, and it will
> freeze when encountering any of the failures the name suggests it would
> protect from.
> 
> If we didn't have any raid56 support at all, or if it was configured
> separately and disabled by default, I'd concur with your statement.  But
> as things stand, any improvement to the raid56 implementation that
> brings at least some of the safety net raid56 are meant to provide makes
> things better, without giving users an idea that the implementation is
> any more full-featured than it currently is.

The thing is, btrfs /doesn't/ have any raid56 support at all, in the
practical sense of the word.  There is a preliminary, partial
implementation, exactly as announced/warned when the feature went in,
on a filesystem that is itself still of experimental/testing status, so
even for the features that are generally working, make and test your
backups and keep 'em handy!

Anyone running btrfs at this point should know its status and be keeping
up with upstream /because/ of that status, or they shouldn't be testing/
using it at all as it's not yet considered a stable filesystem.  If 
they're already aware of upstream status and are deliberately testing, by 
definition they'll already know the preliminary/partial nature of the 
current raid56 implementation and there won't be an issue.  If they 
aren't already keeping up with developments on a declared experimental 
filesystem, that's the base problem right there, and the quick failure 
should they try raid56 in its current state simply alerts them to the 
problem they already had.
Alexandre Oliva Oct. 26, 2013, 7:21 a.m. UTC | #4
On Oct 22, 2013, Duncan <1i5t5.duncan@cox.net> wrote:

> the quick failure should they try raid56 in its current state simply
> alerts them to the problem they already had.

What quick failure?  There's no such thing in place AFAIK.  It seems to
do all the work properly; the limitations in the current implementation
only show up when an I/O error kicks in.  I can't see any
indication, in existing announcements, that recovery from I/O errors in
raid56 is missing, let alone that it's so utterly and completely broken
that it will freeze the entire filesystem and require a forced reboot to
unmount the filesystem and make any other data in it accessible again.

That's far, far worse than the general state of btrfs, and that's not a
documented limitation of raid56, so how would someone be expected to
know about it?  It certainly isn't obvious from a cursory look at the
code either.

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fe443fe..4a592a3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2061,11 +2061,11 @@  int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
 	struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
 	int ret;
 
-	BUG_ON(!mirror_num);
-
 	/* we can't repair anything in raid56 yet */
 	if (btrfs_is_parity_mirror(map_tree, logical, length, mirror_num))
-		return 0;
+		return -EIO;
+
+	BUG_ON(!mirror_num);
 
 	bio = btrfs_io_bio_alloc(GFP_NOFS, 1);
 	if (!bio)
@@ -2157,7 +2157,6 @@  static int clean_io_failure(u64 start, struct page *page)
 		return 0;
 
 	failrec = (struct io_failure_record *)(unsigned long) private_failure;
-	BUG_ON(!failrec->this_mirror);
 
 	if (failrec->in_validation) {
 		/* there was no real error, just free the record */
@@ -2167,6 +2166,12 @@  static int clean_io_failure(u64 start, struct page *page)
 		goto out;
 	}
 
+	if (!failrec->this_mirror) {
+		pr_debug("clean_io_failure: failrec->this_mirror not set, assuming %llu not repaired\n",
+			   failrec->start);
+		goto out;
+	}
+
 	spin_lock(&BTRFS_I(inode)->io_tree.lock);
 	state = find_first_extent_bit_state(&BTRFS_I(inode)->io_tree,
 					    failrec->start,
@@ -2338,7 +2343,9 @@  static int bio_readpage_error(struct bio *failed_bio, struct page *page,
 		 * everything for repair_io_failure to do the rest for us.
 		 */
 		if (failrec->in_validation) {
-			BUG_ON(failrec->this_mirror != failed_mirror);
+			if (failrec->this_mirror != failed_mirror)
+				pr_debug("bio_readpage_error: this_mirror does not match failed_mirror: %i\n",
+					 failed_mirror);
 			failrec->in_validation = 0;
 			failrec->this_mirror = 0;
 		}
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0525e13..2d1a960 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1732,6 +1732,8 @@  static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
 	int err;
 	int i;
 
+	pr_debug("__raid_recover_end_io: attempting error recovery\n");
+
 	pointers = kzalloc(rbio->bbio->num_stripes * sizeof(void *),
 			   GFP_NOFS);
 	if (!pointers) {
@@ -1886,17 +1888,22 @@  cleanup:
 cleanup_io:
 
 	if (rbio->read_rebuild) {
-		if (err == 0)
+		if (err == 0) {
+			pr_debug("__raid_recover_end_io: successful read_rebuild\n");
 			cache_rbio_pages(rbio);
-		else
+		} else {
+			pr_debug("__raid_recover_end_io: failed read_rebuild\n");
 			clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
+		}
 
 		rbio_orig_end_io(rbio, err, err == 0);
 	} else if (err == 0) {
+		pr_debug("__raid_recover_end_io: successful recovery, on to finish_rmw\n");
 		rbio->faila = -1;
 		rbio->failb = -1;
 		finish_rmw(rbio);
 	} else {
+		pr_debug("__raid_recover_end_io: failed recovery\n");
 		rbio_orig_end_io(rbio, err, 0);
 	}
 }
@@ -1922,10 +1929,13 @@  static void raid_recover_end_io(struct bio *bio, int err)
 	if (!atomic_dec_and_test(&rbio->bbio->stripes_pending))
 		return;
 
-	if (atomic_read(&rbio->bbio->error) > rbio->bbio->max_errors)
+	if (atomic_read(&rbio->bbio->error) > rbio->bbio->max_errors) {
+		pr_debug("raid_recover_end_io: unrecoverable error\n");
 		rbio_orig_end_io(rbio, -EIO, 0);
-	else
+	} else {
+		pr_debug("raid_recover_end_io: attempting error recovery\n");
 		__raid_recover_end_io(rbio);
+	}
 }
 
 /*