Btrfs: fix crash when mounting raid5 btrfs with missing disks

Message ID 1403595556-32753-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State Accepted

Commit Message

Liu Bo June 24, 2014, 7:39 a.m. UTC
The reproducer is

$ mkfs.btrfs D1 D2 D3 -mraid5
$ mkfs.ext4 D2 && mkfs.ext4 D3
$ mount D1 /btrfs -odegraded

-------------------

[   87.672992] ------------[ cut here ]------------
[   87.673845] kernel BUG at fs/btrfs/raid56.c:1828!
...
[   87.673845] RIP: 0010:[<ffffffff813efc7e>]  [<ffffffff813efc7e>] __raid_recover_end_io+0x4ae/0x4d0
...
[   87.673845] Call Trace:
[   87.673845]  [<ffffffff8116bbc6>] ? mempool_free+0x36/0xa0
[   87.673845]  [<ffffffff813f0255>] raid_recover_end_io+0x75/0xa0
[   87.673845]  [<ffffffff81447c5b>] bio_endio+0x5b/0xa0
[   87.673845]  [<ffffffff81447cb2>] bio_endio_nodec+0x12/0x20
[   87.673845]  [<ffffffff81374621>] end_workqueue_fn+0x41/0x50
[   87.673845]  [<ffffffff813ad2aa>] normal_work_helper+0xca/0x2c0
[   87.673845]  [<ffffffff8108ba2b>] process_one_work+0x1eb/0x530
[   87.673845]  [<ffffffff8108b9c9>] ? process_one_work+0x189/0x530
[   87.673845]  [<ffffffff8108c15b>] worker_thread+0x11b/0x4f0
[   87.673845]  [<ffffffff8108c040>] ? rescuer_thread+0x290/0x290
[   87.673845]  [<ffffffff810939c4>] kthread+0xe4/0x100
[   87.673845]  [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220
[   87.673845]  [<ffffffff817e7c7c>] ret_from_fork+0x7c/0xb0
[   87.673845]  [<ffffffff810938e0>] ? kthread_create_on_node+0x220/0x220

-------------------

This happens because we miscalculate @rbio->bbio->error, so the count does
not reach the maximum number of tolerable errors when it should.

Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/raid56.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Satoru Takeuchi June 25, 2014, 7:25 a.m. UTC | #1
Hi Liu,

(2014/06/24 16:39), Liu Bo wrote:
> The reproducer is
> 
> $ mkfs.btrfs D1 D2 D3 -mraid5
> $ mkfs.ext4 D2 && mkfs.ext4 D3
> $ mount D1 /btrfs -odegraded

Tested-by: Satoru Takeuchi <takeuchi_satoru@jp.fujitsu.com>

Here is the result of the last mount.

===
...
mount: wrong fs type, bad option, bad superblock on /dev/vdb1,
       missing codepage or helper program, or other error

       In some cases useful info is found in syslog - try
       dmesg | tail or so.
===

It "correctly" failed :-)

Thanks,
Satoru


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Liu Bo June 25, 2014, 7:34 a.m. UTC | #2
Hi Satoru,

On Wed, Jun 25, 2014 at 04:25:01PM +0900, Satoru Takeuchi wrote:
> Hi Liu,
> 
> (2014/06/24 16:39), Liu Bo wrote:
> > The reproducer is
> > 
> > $ mkfs.btrfs D1 D2 D3 -mraid5
> > $ mkfs.ext4 D2 && mkfs.ext4 D3
> > $ mount D1 /btrfs -odegraded
> 
> Tested-by: Satoru Takeuchi<takeuchi_satoru@jp.fujitsu.com>
> 
> Here is the result of the last mount.
> 
> ===
> ...
> mount: wrong fs type, bad option, bad superblock on /dev/vdb1,
>        missing codepage or helper program, or other error
> 
>        In some cases useful info is found in syslog - try
>        dmesg | tail or so.
> ===
> 
> It "correctly" failed :-)

Thanks for testing it :)

thanks,
-liubo


Patch

diff --git a/fs/btrfs/raid56.c b/fs/btrfs/raid56.c
index 4055291..4a88f07 100644
--- a/fs/btrfs/raid56.c
+++ b/fs/btrfs/raid56.c
@@ -1956,9 +1956,10 @@  static int __raid56_parity_recover(struct btrfs_raid_bio *rbio)
 	 * pages are going to be uptodate.
 	 */
 	for (stripe = 0; stripe < bbio->num_stripes; stripe++) {
-		if (rbio->faila == stripe ||
-		    rbio->failb == stripe)
+		if (rbio->faila == stripe || rbio->failb == stripe) {
+			atomic_inc(&rbio->bbio->error);
 			continue;
+		}
 
 		for (pagenr = 0; pagenr < nr_pages; pagenr++) {
 			struct page *p;
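
The effect of the added atomic_inc() can be illustrated with a small userspace model. This is only a sketch under stated assumptions, not the kernel code: struct model_rbio, model_recover() and the count_known_failures flag are hypothetical names, a plain int stands in for the atomic counter, and the final comparison assumes the recovery path gives up once the error count exceeds max_errors.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for the rbio/bbio state. */
struct model_rbio {
	int num_stripes;
	int max_errors;	/* tolerable failures, assumed 1 for RAID5 */
	int faila;	/* stripe already known to have failed, or -1 */
	int failb;	/* second known failed stripe, or -1 */
	int error;	/* plain int standing in for bbio->error */
};

/*
 * Walk the stripes the way the patched loop does. Stripes already
 * recorded in faila/failb are skipped; the fix is to count them as
 * errors before skipping. Returns true if the model would go on to
 * parity reconstruction, false if it would give up.
 */
bool model_recover(struct model_rbio *rbio, const bool *dev_missing,
		   bool count_known_failures)
{
	int stripe;

	assert(rbio->num_stripes > 0);
	rbio->error = 0;

	for (stripe = 0; stripe < rbio->num_stripes; stripe++) {
		if (rbio->faila == stripe || rbio->failb == stripe) {
			if (count_known_failures)
				rbio->error++;	/* models the added atomic_inc() */
			continue;
		}
		/* Assumption: a read sent to a missing device ends in error. */
		if (dev_missing[stripe])
			rbio->error++;
	}

	return rbio->error <= rbio->max_errors;
}
```

For a three-stripe layout with max_errors = 1, faila = 1, failb = -1 and stripes 1 and 2 missing, the model without the extra increment counts one error and proceeds to reconstruction with two stripes actually gone. With the increment it counts two errors and gives up, which matches the mount failing cleanly instead of hitting the BUG.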