
Raid1 won't mount degraded

Message ID 20170129201650.wvpzaz45r5uzsvrb@angband.pl (mailing list archive)
State New, archived

Commit Message

Adam Borowski Jan. 29, 2017, 8:16 p.m. UTC
On Sun, Jan 29, 2017 at 08:12:56AM -0500, Subscription Account wrote:
> I had to remove one disk from raid 1 and I rebooted the system and was
> able to mount in degraded mode. Then I powered off the system added a
> new disk and when I am trying to mount the btrfs filesystem in
> degraded mode it will no longer mount it read-write. I can mount
> read-only though.
> 
> [ 2506.816795] BTRFS: missing devices(1) exceeds the limit(0),
> writeable mount is not allowed
> 
> In the read-only mode I am not able to add a new device or replace :(.
> Please help.

A known problem: you can mount rw degraded only once.  If you don't fix the
degradation somehow (by adding a device or converting down), you can't
mount rw again.
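
For reference, once you can get a rw mount again (e.g. with the patch
below), fixing the degradation usually looks roughly like this -- a
sketch only, the device names, devid and mount point are placeholders
for whatever your setup uses:

  # mount the remaining device writable in degraded mode
  mount -o degraded /dev/sdb /mnt

  # either replace the missing device (devid 2 here is just an example) ...
  btrfs replace start 2 /dev/sdc /mnt

  # ... or add a new device and drop the missing one
  btrfs device add /dev/sdc /mnt
  btrfs device delete missing /mnt

  # then convert chunks that were written as "single" back to raid1
  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt

Converting down instead (staying on one device) would be a balance with
-dconvert=single -mconvert=dup on the remaining disk.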

If you know how to build a kernel, here's a crude patch.
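
Applying it is the usual routine -- a rough sketch, assuming you save
this mail to a file (the filename is just an example) and build from
your kernel source tree:

  # in a checkout of your kernel sources
  git am degraded-mount.patch        # or: patch -p1 < degraded-mount.patch
  make olddefconfig
  make -j$(nproc)
  sudo make modules_install install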


Meow

Comments

Subscription Account Jan. 30, 2017, 1:26 a.m. UTC | #1
On Sun, Jan 29, 2017 at 3:16 PM, Adam Borowski <kilobyte@angband.pl> wrote:
> On Sun, Jan 29, 2017 at 08:12:56AM -0500, Subscription Account wrote:
>> I had to remove one disk from raid 1 and I rebooted the system and was
>> able to mount in degraded mode. Then I powered off the system added a
>> new disk and when I am trying to mount the btrfs filesystem in
>> degraded mode it will no longer mount it read-write. I can mount
>> read-only though.
>>
>> [ 2506.816795] BTRFS: missing devices(1) exceeds the limit(0),
>> writeable mount is not allowed
>>
>> In the read-only mode I am not able to add a new device or replace :(.
>> Please help.
>
> A known problem; you can mount rw degraded only once, if you don't fix the
> degradation somehow (by adding a device or converting down), you can't mount
> rw again.

Uh oh! I wish I had known that I only had one shot :(. Just to be clear:
once the rw mount is done the first time, and a replace operation or an
add followed by 'delete missing' is executed, then irrespective of the
raid rebuild status, if the machine reboots it will still mount without
issues and the rebuild/balance will continue? My concern is that there
could still be blocks with only a single copy if the balance has not
completed.


>
> If you know how to build a kernel, here's a crude patch.
>
>

I am feeling a little lucky because I still have the other disk, and I
am assuming that if I remove my current disk and put the other disk in,
I would be able to mount it once again?

Also, since I have already tried btrfs repair etc. a couple of times,
can I trust the current disk anyway?

If I get into this issue again, I will definitely use the patch to
recompile the kernel; thanks a lot for it.

Thanks again,

--
Raj

> Meow
> --
> Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
>   ./configure --host=zx-spectrum --build=pdp11
Adam Borowski Jan. 30, 2017, 1:49 a.m. UTC | #2
On Sun, Jan 29, 2017 at 08:26:54PM -0500, Subscription Account wrote:
> On Sun, Jan 29, 2017 at 3:16 PM, Adam Borowski <kilobyte@angband.pl> wrote:
> > On Sun, Jan 29, 2017 at 08:12:56AM -0500, Subscription Account wrote:
> >> [ 2506.816795] BTRFS: missing devices(1) exceeds the limit(0),
> >> writeable mount is not allowed
> >>
> >> In the read-only mode I am not able to add a new device or replace :(.
> >> Please help.
> >
> > A known problem; you can mount rw degraded only once, if you don't fix the
> > degradation somehow (by adding a device or converting down), you can't mount
> > rw again.
> 
> Uh oh! I wish I know that I only had one shot :(.

No data is actually lost; it was raid1, and it's raid1+single now.  The
mount check is naive and in this case wrong.
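
You can see the mix for yourself -- a rough sketch, the mount point is
just an example:

  btrfs filesystem df /mnt
  # Data, RAID1: ...     <- the old chunks
  # Data, single: ...    <- chunks written during the degraded rw mount

  # once the fs is mounted rw again, converting those back is a filtered balance:
  btrfs balance start -dconvert=raid1,soft -mconvert=raid1,soft /mnt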

> > If you know how to build a kernel, here's a crude patch.
> 
> I am feeling a little lucky because I still have the other disk and I
> am assuming if I remove my current disk and put the other disk in, I
> would be able to mount it once again?

Not sure if that's a good idea if the copies are out of sync.

> Also, since I have couple times already tried btrfs repair etc, can I
> trust the current disk anyways?
> 
> if I get into issue again, I would definitely use the patch to
> recompile the kernel and thanks a lot for the same.

No, the patch doesn't prevent the issue.  It lets you recover -- i.e., it
should help in the situation you're in right now.


Meow!
Subscription Account Jan. 30, 2017, 1:53 a.m. UTC | #3
On Sun, Jan 29, 2017 at 8:49 PM, Adam Borowski <kilobyte@angband.pl> wrote:
> On Sun, Jan 29, 2017 at 08:26:54PM -0500, Subscription Account wrote:
>> On Sun, Jan 29, 2017 at 3:16 PM, Adam Borowski <kilobyte@angband.pl> wrote:
>> > On Sun, Jan 29, 2017 at 08:12:56AM -0500, Subscription Account wrote:
>> >> [ 2506.816795] BTRFS: missing devices(1) exceeds the limit(0),
>> >> writeable mount is not allowed
>> >>
>> >> In the read-only mode I am not able to add a new device or replace :(.
>> >> Please help.
>> >
>> > A known problem; you can mount rw degraded only once, if you don't fix the
>> > degradation somehow (by adding a device or converting down), you can't mount
>> > rw again.
>>
>> Uh oh! I wish I know that I only had one shot :(.
>
> No data is actually lost, it was raid1, it's raid1+single now.  The mount
> check is naive and in this case wrong.
>
>> > If you know how to build a kernel, here's a crude patch.
>>
>> I am feeling a little lucky because I still have the other disk and I
>> am assuming if I remove my current disk and put the other disk in, I
>> would be able to mount it once again?
>
> Not sure if that's a good idea if the copies are out of sync.
>

Not sure I follow. I will be removing the current disk, which may have
some changes, but I don't really care about those changes. When I put
the other disk in, I will be removing the first one, so there is only
one disk at a time; it is just that I will go back to the state from a
few days ago? Am I missing something here?

Thanks,

--
Raj

>> Also, since I have couple times already tried btrfs repair etc, can I
>> trust the current disk anyways?
>>
>> if I get into issue again, I would definitely use the patch to
>> recompile the kernel and thanks a lot for the same.
>
> No, the patch doesn't prevent the issue.  It lets you recover -- ie, it
> should help in the situation you're in right now.
>
>
> Meow!
> --
> Autotools hint: to do a zx-spectrum build on a pdp11 host, type:
>   ./configure --host=zx-spectrum --build=pdp11

Patch

From 1367d3da6b0189797f6090b11d8716a1cc136593 Mon Sep 17 00:00:00 2001
From: Adam Borowski <kilobyte@angband.pl>
Date: Mon, 23 Jan 2017 19:03:20 +0100
Subject: [PATCH] [NOT-FOR-MERGING] btrfs: make "too many missing devices"
 check non-fatal

This check breaks degraded mounts of multi-device filesystems that have any
single-profile blocks, which are naturally created if the filesystem has been
mounted degraded before.  Obviously, any further device loss will result in
data loss, but the user has already specified -odegraded so that's understood.

For a real fix, we'd want to check whether any of the single-profile blocks
are missing, as that would allow telling apart broken JBOD filesystems from
bona-fide degraded RAIDs.

(This patch is for the benefit of folks who'd have to recreate a filesystem
just because it got degraded.)
---
 fs/btrfs/disk-io.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 18004169552c..1b25b9e24662 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -3060,10 +3060,9 @@  int open_ctree(struct super_block *sb,
 	     fs_info->num_tolerated_disk_barrier_failures &&
 	    !(sb->s_flags & MS_RDONLY)) {
 		btrfs_warn(fs_info,
-"missing devices (%llu) exceeds the limit (%d), writeable mount is not allowed",
+"missing devices (%llu) exceeds the limit (%d), add more or risk data loss",
 			fs_info->fs_devices->missing_devices,
 			fs_info->num_tolerated_disk_barrier_failures);
-		goto fail_sysfs;
 	}
 
 	fs_info->cleaner_kthread = kthread_run(cleaner_kthread, tree_root,
-- 
2.11.0