Btrfs: fix crash on endio of reading corrupted block

Message ID 1408462393-3291-1-git-send-email-bo.li.liu@oracle.com (mailing list archive)
State Superseded

Commit Message

Liu Bo Aug. 19, 2014, 3:33 p.m. UTC
The crash is

------------[ cut here ]------------
kernel BUG at fs/btrfs/extent_io.c:2124!
[...]
Workqueue: btrfs-endio normal_work_helper [btrfs]
RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]

This is in fact a regression.

It happens because we forget to advance @offset when a read hits a corrupted
block. @offset keeps its old value, which leads to checksum errors on the
remaining blocks queued up in the same bio, and we end up hitting the above
BUG_ON.

Reported-by: Chris Murphy <lists@colorremedies.com>
Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
---
 fs/btrfs/extent_io.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Chris Mason Aug. 19, 2014, 7:49 p.m. UTC | #1
On 08/19/2014 11:33 AM, Liu Bo wrote:
> The crash is
> 
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:2124!
> [...]
> Workqueue: btrfs-endio normal_work_helper [btrfs]
> RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
> 
> This is in fact a regression.
> 
> It is because we forgot to increase @offset properly in reading corrupted block,
> so that the @offset remains, and this leads to checksum errors while reading
> left blocks queued up in the same bio, and then ends up with hitting the above
> BUG_ON.

Thanks Chris and Liu, this is queued.

-chris
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Eric Sandeen Aug. 19, 2014, 9:42 p.m. UTC | #2
On 8/19/14, 10:33 AM, Liu Bo wrote:
> The crash is
> 
> ------------[ cut here ]------------
> kernel BUG at fs/btrfs/extent_io.c:2124!
> [...]
> Workqueue: btrfs-endio normal_work_helper [btrfs]
> RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
> 
> This is in fact a regression.

It'd be helpful to identify the commit, or at least kernel release, which caused
the regression.

> It is because we forgot to increase @offset properly in reading corrupted block,
> so that the @offset remains, and this leads to checksum errors while reading
> left blocks queued up in the same bio, and then ends up with hitting the above
> BUG_ON.

So does that mean that any checksum error on this path will crash the kernel?

That sounds like this bug has exposed a more fundamental problem, no?

Thanks,
-Eric

> Reported-by: Chris Murphy <lists@colorremedies.com>
> Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> ---
>  fs/btrfs/extent_io.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> index 3af4966..be41e4d 100644
> --- a/fs/btrfs/extent_io.c
> +++ b/fs/btrfs/extent_io.c
> @@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
>  					test_bit(BIO_UPTODATE, &bio->bi_flags);
>  				if (err)
>  					uptodate = 0;
> +				offset += len;
>  				continue;
>  			}
>  		}
> 

Liu Bo Aug. 20, 2014, 8:20 a.m. UTC | #3
On Tue, Aug 19, 2014 at 04:42:42PM -0500, Eric Sandeen wrote:
> On 8/19/14, 10:33 AM, Liu Bo wrote:
> > The crash is
> > 
> > ------------[ cut here ]------------
> > kernel BUG at fs/btrfs/extent_io.c:2124!
> > [...]
> > Workqueue: btrfs-endio normal_work_helper [btrfs]
> > RIP: 0010:[<ffffffffa02d6055>]  [<ffffffffa02d6055>] end_bio_extent_readpage+0xb45/0xcd0 [btrfs]
> > 
> > This is in fact a regression.
> 
> It'd be helpful to identify the commit, or at least kernel release, which caused
> the regression.

Okay, got it.

> 
> > It is because we forgot to increase @offset properly in reading corrupted block,
> > so that the @offset remains, and this leads to checksum errors while reading
> > left blocks queued up in the same bio, and then ends up with hitting the above
> > BUG_ON.
> 
> So does that mean that any checksum error on this path will crash the kernel?
> 
> That sounds like this bug has exposed a more fundamental problem, no?

Eric, you're right, I was hiding some details, now writing a new commit log...

thanks,
-liubo

> 
> Thanks,
> -Eric
> 
> > Reported-by: Chris Murphy <lists@colorremedies.com>
> > Signed-off-by: Liu Bo <bo.li.liu@oracle.com>
> > ---
> >  fs/btrfs/extent_io.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
> > index 3af4966..be41e4d 100644
> > --- a/fs/btrfs/extent_io.c
> > +++ b/fs/btrfs/extent_io.c
> > @@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
> >  					test_bit(BIO_UPTODATE, &bio->bi_flags);
> >  				if (err)
> >  					uptodate = 0;
> > +				offset += len;
> >  				continue;
> >  			}
> >  		}
> > 
> 

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index 3af4966..be41e4d 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -2602,6 +2602,7 @@ static void end_bio_extent_readpage(struct bio *bio, int err)
 					test_bit(BIO_UPTODATE, &bio->bi_flags);
 				if (err)
 					uptodate = 0;
+				offset += len;
 				continue;
 			}
 		}
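
For readers who want to see why a missing "offset += len" before "continue"
matters, here is a minimal userspace sketch of the pattern. It is not the
kernel code: struct segment, count_misaligned() and the advance_on_error flag
are hypothetical names made up for this illustration, and the only thing it
models is a running offset that every loop iteration is expected to advance.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one block queued up in a bio. */
struct segment {
	unsigned int len;	/* bytes covered by this block */
	int corrupted;		/* nonzero: this block takes the error path */
};

/*
 * Walk the blocks the way a read-completion loop might, keeping a running
 * byte offset that is used to locate per-block data (the checksum, in the
 * case discussed in this thread).
 *
 * Returns how many blocks were processed with a stale offset.
 */
static size_t count_misaligned(const struct segment *segs, size_t n,
			       int advance_on_error)
{
	unsigned int offset = 0;	/* running offset kept by the loop */
	unsigned int expected = 0;	/* where the block really starts */
	size_t misaligned = 0;

	assert(segs != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		unsigned int len = segs[i].len;

		/* A stale offset means this block is checked at the wrong place. */
		if (offset != expected)
			misaligned++;
		expected += len;

		if (segs[i].corrupted) {
			/* Error path: the one-line fix advances offset here. */
			if (advance_on_error)
				offset += len;
			continue;
		}

		offset += len;
	}

	return misaligned;
}
```

With four 4096-byte blocks where only the second one is corrupted,
count_misaligned(segs, 4, 0) reports that both blocks after the corrupted one
are processed with a stale offset, while count_misaligned(segs, 4, 1), which
mirrors the patched loop, reports none.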