Message ID: or7gd59qcf.fsf@livre.home (mailing list archive)
State: New, archived
On 2013/10/22 07:18 PM, Alexandre Oliva wrote:
> ... and
> it is surely an improvement over the current state of raid56 in btrfs,
> so it might be a good idea to put it in.

I suspect the issue is that, while it sortof works, we don't really want
to push people to use it half-baked. This is reassuring work, however.
Maybe it would be nice to have some half-baked code *anyway*, even if
Chris doesn't put it in his pull requests juuust yet. ;)

> So far, I've put more than
> 1TB of data on that failing disk with 16 partitions on raid6, and
> somehow I got all the data back successfully: every file passed an
> md5sum check, in spite of tons of I/O errors in the process.

Is this all on a single disk? If so it must be seeking like mad! haha
On Oct 22, 2013, Brendan Hide <brendan@swiftspirit.co.za> wrote:
> On 2013/10/22 07:18 PM, Alexandre Oliva wrote:
>> ... and
>> it is surely an improvement over the current state of raid56 in btrfs,
>> so it might be a good idea to put it in.
> I suspect the issue is that, while it sortof works, we don't really
> want to push people to use it half-baked.

I don't think the current state of the implementation upstream is
compatible with that statement ;-)

One can create and run a glorified raid0 that computes and updates
parity blocks it won't use for anything, while the name gives the
illusion of a more reliable filesystem than it actually is, and it will
freeze when encountering any of the failures the name suggests it would
protect from.

If we didn't have any raid56 support at all, or if it was configured
separately and disabled by default, I'd concur with your statement. But
as things stand, any improvement to the raid56 implementation that
brings at least some of the safety net raid56 are meant to provide makes
things better, without giving users an idea that the implementation is
any more full-featured than it currently is.

> Maybe it would be nice to have some half-baked code *anyway*,
> even if Chris doesn't put it in his pull requests juuust yet. ;)

Why, sure, that's why I posted the patch; even if it didn't make it to
the repository, others might find it useful ;-)

>> So far, I've put more than
>> 1TB of data on that failing disk with 16 partitions on raid6, and
>> somehow I got all the data back successfully: every file passed an
>> md5sum check, in spite of tons of I/O errors in the process.

> Is this all on a single disk? If so it must be seeking like mad! haha

Yeah. It probably is, but the access pattern most of the time is mostly
random access to smallish files, so that won't be a problem. I
considered doing raid1 on the data, to get some more reliability out of
the broken disk, but then I recalled there was this raid56
implementation that, in raid6, would theoretically bring about
additional reliability and be far more space efficient, so I decided to
give it a try. Only after I'd put in most of the data did the errors
start popping up. Then I decided to try and fix them instead of moving
data out. It was some happy hacking ;-)
Alexandre Oliva posted on Tue, 22 Oct 2013 17:24:37 -0200 as excerpted:

> On Oct 22, 2013, Brendan Hide <brendan@swiftspirit.co.za> wrote:
>
>> On 2013/10/22 07:18 PM, Alexandre Oliva wrote:
>>> ... and it is surely an improvement over the current state of raid56
>>> in btrfs,
>>> so it might be a good idea to put it in.
>> I suspect the issue is that, while it sortof works, we don't really
>> want to push people to use it half-baked.
>
> I don't think the current state of the implementation upstream is
> compatible with that statement ;-)
>
> One can create and run a glorified raid0 that computes and updates
> parity blocks it won't use for anything, while the name gives the
> illusion of a more reliable filesystem than it actually is, and it will
> freeze when encountering any of the failures the name suggests it would
> protect from.
>
> If we didn't have any raid56 support at all, or if it was configured
> separately and disabled by default, I'd concur with your statement. But
> as things stand, any improvement to the raid56 implementation that
> brings at least some of the safety net raid56 are meant to provide makes
> things better, without giving users an idea that the implementation is
> any more full-featured than it currently is.

The thing is, btrfs /doesn't/ have any raid56 support at all, in the
practical sense of the word. There is a preliminary partial
implementation, exactly as announced/warned when the feature went in, on
a filesystem that itself is still at experimental/testing status, so even
for the features that are in general working, make and test your backups
and keep 'em handy!

Anyone running btrfs at this point should know its status and be keeping
up with upstream /because/ of that status, or they shouldn't be
testing/using it at all, as it's not yet considered a stable filesystem.
If they're already aware of upstream status and are deliberately testing,
by definition they'll already know the preliminary/partial nature of the
current raid56 implementation and there won't be an issue. If they aren't
already keeping up with developments on a declared experimental
filesystem, that's the base problem right there, and the quick failure
should they try raid56 in its current state simply alerts them to the
problem they already had.
On Oct 22, 2013, Duncan <1i5t5.duncan@cox.net> wrote:
> the quick failure should they try raid56 in its current state simply
> alerts them to the problem they already had.

What quick failure? There's no such thing in place AFAIK. It seems to do
all the work properly; the limitations in the current implementation will
only show up when an I/O error kicks in.

I can't see any indication, in existing announcements, that recovery from
I/O errors in raid56 is missing, let alone that it's so utterly and
completely broken that it will freeze the entire filesystem and require a
forced reboot to unmount the filesystem and make any other data in it
accessible again. That's far, far worse than the general state of btrfs,
and it's not a documented limitation of raid56, so how would someone be
expected to know about it? It certainly isn't obvious from a cursory look
at the code either.
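
To make that last point concrete, here is a condensed sketch of the
pre-patch repair path. It is not a verbatim copy of the upstream
function: the raid56 early return and the BUG_ON are taken from the
pre-patch lines visible in the diff below, while the full parameter list
is reconstructed from the 3.12-era code and the rest is elided. A block
that lives on a raid5/6 ("parity mirror") stripe is reported as repaired
without any work being done, which is why a cursory read of the code
gives no hint that the recovery path is missing.

/*
 * Condensed sketch of repair_io_failure() in fs/btrfs/extent_io.c as it
 * stood before the patch below; only the raid56 handling is shown.
 */
int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
                      u64 length, u64 logical, struct page *page,
                      int mirror_num)
{
        struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;

        BUG_ON(!mirror_num);

        /* we can't repair anything in raid56 yet */
        if (btrfs_is_parity_mirror(map_tree, logical, length, mirror_num))
                return 0;       /* reports success; no repair was attempted */

        /* ... the non-raid56 path rewrites the bad copy from a good mirror ... */
        return 0;
}

The patch below turns that silent 0 into -EIO, so the read path at least
learns that nothing was repaired, and adds pr_debug() breadcrumbs along
the raid56 recovery path.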
diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index fe443fe..4a592a3 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2061,11 +2061,11 @@ int repair_io_failure(struct btrfs_fs_info *fs_info, u64 start,
         struct btrfs_mapping_tree *map_tree = &fs_info->mapping_tree;
         int ret;
 
-        BUG_ON(!mirror_num);
-
         /* we can't repair anything in raid56 yet */
         if (btrfs_is_parity_mirror(map_tree, logical, length, mirror_num))
-                return 0;
+                return -EIO;
+
+        BUG_ON(!mirror_num);
 
         bio = btrfs_io_bio_alloc(GFP_NOFS, 1);
         if (!bio)
@@ -2157,7 +2157,6 @@ static int clean_io_failure(u64 start, struct page *page)
                 return 0;
 
         failrec = (struct io_failure_record *)(unsigned long) private_failure;
-        BUG_ON(!failrec->this_mirror);
 
         if (failrec->in_validation) {
                 /* there was no real error, just free the record */
@@ -2167,6 +2166,12 @@
                 goto out;
         }
 
+        if (!failrec->this_mirror) {
+                pr_debug("clean_io_failure: failrec->this_mirror not set, assuming %llu not repaired\n",
+                         failrec->start);
+                goto out;
+        }
+
         spin_lock(&BTRFS_I(inode)->io_tree.lock);
         state = find_first_extent_bit_state(&BTRFS_I(inode)->io_tree,
                                             failrec->start,
@@ -2338,7 +2343,9 @@ static int bio_readpage_error(struct bio *failed_bio, struct page *page,
          * everything for repair_io_failure to do the rest for us.
          */
         if (failrec->in_validation) {
-                BUG_ON(failrec->this_mirror != failed_mirror);
+                if (failrec->this_mirror != failed_mirror)
+                        pr_debug("bio_readpage_error: this_mirror does not match failed_mirror: %i\n",
+                                 failed_mirror);
                 failrec->in_validation = 0;
                 failrec->this_mirror = 0;
         }
diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 0525e13..2d1a960 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1732,6 +1732,8 @@ static void __raid_recover_end_io(struct btrfs_raid_bio *rbio)
         int err;
         int i;
 
+        pr_debug("__raid_recover_end_io: attempting error recovery\n");
+
         pointers = kzalloc(rbio->bbio->num_stripes * sizeof(void *),
                            GFP_NOFS);
         if (!pointers) {
@@ -1886,17 +1888,22 @@ cleanup:
 
 cleanup_io:
         if (rbio->read_rebuild) {
-                if (err == 0)
+                if (err == 0) {
+                        pr_debug("__raid_recover_end_io: successful read_rebuild\n");
                         cache_rbio_pages(rbio);
-                else
+                } else {
+                        pr_debug("__raid_recover_end_io: failed read_rebuild\n");
                         clear_bit(RBIO_CACHE_READY_BIT, &rbio->flags);
+                }
 
                 rbio_orig_end_io(rbio, err, err == 0);
         } else if (err == 0) {
+                pr_debug("__raid_recover_end_io: successful recovery, on to finish_rmw\n");
                 rbio->faila = -1;
                 rbio->failb = -1;
                 finish_rmw(rbio);
         } else {
+                pr_debug("__raid_recover_end_io: failed recovery\n");
                 rbio_orig_end_io(rbio, err, 0);
         }
 }
@@ -1922,10 +1929,13 @@ static void raid_recover_end_io(struct bio *bio, int err)
         if (!atomic_dec_and_test(&rbio->bbio->stripes_pending))
                 return;
 
-        if (atomic_read(&rbio->bbio->error) > rbio->bbio->max_errors)
+        if (atomic_read(&rbio->bbio->error) > rbio->bbio->max_errors) {
+                pr_debug("raid_recover_end_io: unrecoverable error\n");
                 rbio_orig_end_io(rbio, -EIO, 0);
-        else
+        } else {
+                pr_debug("raid_recover_end_io: attempting error recovery\n");
                 __raid_recover_end_io(rbio);
+        }
 }
 
 /*
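
A side note on the pr_debug() calls added above: they compile to no-ops
unless debugging is enabled for these files. A minimal sketch of the
usual kernel conventions for turning them on follows; the DEBUG define
and the helper function are illustrative only and are not part of the
patch itself.

/*
 * Sketch: making the pr_debug() output visible.  Defining DEBUG before
 * the printk header is included turns pr_debug() into a plain
 * printk(KERN_DEBUG ...) for this file; with CONFIG_DYNAMIC_DEBUG the
 * same messages can instead be switched on at run time through
 * /sys/kernel/debug/dynamic_debug/control.
 */
#define DEBUG                   /* hypothetical per-file build-time switch */
#include <linux/printk.h>

static void raid56_debug_smoke_test(void)       /* hypothetical helper */
{
        /* With DEBUG defined (or the site enabled dynamically), this
         * message reaches the kernel log at KERN_DEBUG level. */
        pr_debug("raid56: recovery debug output enabled\n");
}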