diff mbox

btrfs GPF in read_extent_buffer() while scrubbing with kernel 3.4.2

Message ID 4FF42876.8010309@jan-o-sch.net (mailing list archive)
State New, archived
Headers show

Commit Message

Jan Schmidt July 4, 2012, 11:26 a.m. UTC
On 04.07.2012 02:17, Sami Liedes wrote:
> On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
>> I've seen this before: An overly long "Modules linked in:" line causes
>> a large gap in netconsole output.
> 
> I managed to capture the entire output using netconsole by modifying
> the kernel to not output the list of modules.

Okay, thanks for the output. Can you please apply the patch below and capture
especially the line printed before the "cut here" line?

Thanks!
-Jan

--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Sami Liedes July 4, 2012, 4:03 p.m. UTC | #1
On Wed, Jul 04, 2012 at 01:26:46PM +0200, Jan Schmidt wrote:
> On 04.07.2012 02:17, Sami Liedes wrote:
> > On Wed, Jul 04, 2012 at 01:47:56AM +0300, Sami Liedes wrote:
> >> I've seen this before: An overly long "Modules linked in:" line causes
> >> a large gap in netconsole output.
> > 
> > I managed to capture the entire output using netconsole by modifying
> > the kernel to not output the list of modules.
> 
> Okay, thanks for the output. Can you please apply the patch below and capture
> especially the line printed before the "cut here" line?

Here you go.

	Sami

------------------------------------------------------------
[  121.524803] netpoll: netconsole: local port 4444
[  121.524831] netpoll: netconsole: local IP 192.168.1.2
[  121.524853] netpoll: netconsole: interface 'eth0'
[  121.524874] netpoll: netconsole: remote port 1194
[  121.524894] netpoll: netconsole: remote IP 192.168.1.73
[  121.524917] netpoll: netconsole: remote ethernet address 00:1c:10:44:47:2c
[  121.525055] console [netcon0] enabled
[  121.525074] netconsole: network logging started
[  200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2243489562624, level 26, generation 3144240307695375391, nritems 620178657. len param 17. debug 2/989/620178657/3144240307695375391
[  200.980594] ------------[ cut here ]------------
[  200.980644] WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]()
[  200.980681] Hardware name: System Product Name
[  200.980701] Modules linked in: <omitted>  [last unloaded: scsi_wait_scan]
[  200.980739] Pid: 1145, comm: btrfs-endio-met Tainted: G        W    3.4.4-modded-oops+ #2
[  200.980774] Call Trace:
[  200.980792]  [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0
[  200.980821]  [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20
[  200.980860]  [<ffffffffa01d6e07>] read_extent_buffer+0x167/0x1a0 [btrfs]
[  200.980902]  [<ffffffffa01ce03d>] btrfs_node_key+0x1d/0x20 [btrfs]
[  200.980941]  [<ffffffffa020459f>] __readahead_hook.isra.5+0x3ff/0x460 [btrfs]
[  200.980982]  [<ffffffffa020492f>] btree_readahead_hook+0x1f/0x40 [btrfs]
[  200.981022]  [<ffffffffa01aa841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs]
[  200.981065]  [<ffffffffa01d2442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs]
[  200.981109]  [<ffffffffa01d308b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
[  200.981150]  [<ffffffffa01a9e61>] ? end_workqueue_fn+0x31/0x50 [btrfs]
[  200.981182]  [<ffffffff81158988>] bio_endio+0x18/0x30
[  200.981214]  [<ffffffffa01a9e6c>] end_workqueue_fn+0x3c/0x50 [btrfs]
[  200.981253]  [<ffffffffa01e08d7>] worker_loop+0x157/0x560 [btrfs]
[  200.981291]  [<ffffffffa01e0780>] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[  200.981323]  [<ffffffff81058e5e>] kthread+0x8e/0xa0
[  200.981348]  [<ffffffff81419064>] kernel_thread_helper+0x4/0x10
[  200.981377]  [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70
[  200.981406]  [<ffffffff81419060>] ? gs_change+0x13/0x13
[  200.981430] ---[ end trace e93713a9d40cd06e ]---
[  200.981459] general protection fault: 0000 [#1] SMP 
[  200.981487] CPU 2 
[  200.981498] Modules linked in: <omitted>  [last unloaded: scsi_wait_scan]
[  200.981540] 
[  200.981550] Pid: 1145, comm: btrfs-endio-met Tainted: G        W    3.4.4-modded-oops+ #2 System manufacturer System Product Name/P8P67 EVO
[  200.981612] RIP: 0010:[<ffffffff811e83ed>]  [<ffffffff811e83ed>] memcpy+0xd/0x110
[  200.981650] RSP: 0000:ffff8801f4bf7b68  EFLAGS: 00010202
[  200.981675] RAX: ffff8801f4bf7c8f RBX: 0000000000000011 RCX: 0000000000000002
[  200.981707] RDX: 0000000000000001 RSI: 0005080000000003 RDI: ffff8801f4bf7c8f
[  200.981738] RBP: ffff8801f4bf7be0 R08: 0000000000000000 R09: 0000000000000000
[  200.981769] R10: ffff8801f4bf7c8f R11: ffff8801f3930780 R12: ffff8801f4bf7ca0
[  200.981800] R13: ffff8801f7286178 R14: 0000000000000048 R15: 0000000000000011
[  200.981832] FS:  0000000000000000(0000) GS:ffff88021ec80000(0000) knlGS:0000000000000000
[  200.981868] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  200.981894] CR2: 00000000f773a000 CR3: 000000020d0de000 CR4: 00000000000407e0
[  200.981925] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  200.981956] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  200.981988] Process btrfs-endio-met (pid: 1145, threadinfo ffff8801f4bf6000, task ffff8801f4ff42f0)
[  200.982026] Stack:
[  200.982038]  ffffffffa01d6d5b ffff880124f72ce1 0000000000000011 0000000000000002
[  200.982084]  00000000000003dd 0000000024f72ce1 2ba295e6a4a5d41f ffff88020dc65000
[  200.982130]  ffff8801f4bf7c8f 0000000000001000 ffff8801f4bf7c58 00000000000003dd
[  200.982175] Call Trace:
[  200.982201]  [<ffffffffa01d6d5b>] ? read_extent_buffer+0xbb/0x1a0 [btrfs]
[  200.982243]  [<ffffffffa01ce03d>] btrfs_node_key+0x1d/0x20 [btrfs]
[  200.982281]  [<ffffffffa020459f>] __readahead_hook.isra.5+0x3ff/0x460 [btrfs]
[  200.982322]  [<ffffffffa020492f>] btree_readahead_hook+0x1f/0x40 [btrfs]
[  200.982362]  [<ffffffffa01aa841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs]
[  200.982405]  [<ffffffffa01d2442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs]
[  200.982448]  [<ffffffffa01d308b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
[  200.982489]  [<ffffffffa01a9e61>] ? end_workqueue_fn+0x31/0x50 [btrfs]
[  200.982519]  [<ffffffff81158988>] bio_endio+0x18/0x30
[  200.982552]  [<ffffffffa01a9e6c>] end_workqueue_fn+0x3c/0x50 [btrfs]
[  200.982591]  [<ffffffffa01e08d7>] worker_loop+0x157/0x560 [btrfs]
[  200.982628]  [<ffffffffa01e0780>] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[  200.982660]  [<ffffffff81058e5e>] kthread+0x8e/0xa0
[  200.982684]  [<ffffffff81419064>] kernel_thread_helper+0x4/0x10
[  200.982713]  [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70
[  200.982742]  [<ffffffff81419060>] ? gs_change+0x13/0x13
[  200.982766] Code: 4e 48 83 c4 08 5b 5d c3 66 0f 1f 44 00 00 e8 eb fb ff ff eb e1 90 90 90 90 90 90 90 90 90 48 89 f8 48 89 d1 48 c1 e9 03 83 e2 07 <f3> 48 a5 89 d1 f3 a4 c3 20 4c 8b 06 4c 8b 4e 08 4c 8b 56 10 4c 
[  200.983037] RIP  [<ffffffff811e83ed>] memcpy+0xd/0x110
[  200.983065]  RSP <ffff8801f4bf7b68>
[  200.983087] ---[ end trace e93713a9d40cd06f ]---
[  361.453631] SysRq : Resetting
[  361.453663] ACPI MEMORY or I/O RESET_REG.
Jan Schmidt July 4, 2012, 4:38 p.m. UTC | #2
On 04.07.2012 18:03, Sami Liedes wrote:
> Here you go.
> 
> 	Sami
> [...]
> [  200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2243489562624, level 26, generation 3144240307695375391, nritems 620178657. len param 17. debug 2/989/620178657/3144240307695375391

Wow, that's strange. Can you repeat your test once or twice and paste that line,
please? I'd like to get a feeling if the values are completely random.

Reading more of the readahead code now...

Thanks,
-Jan
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Sami Liedes July 4, 2012, 8:24 p.m. UTC | #3
On Wed, Jul 04, 2012 at 06:38:00PM +0200, Jan Schmidt wrote:
> > [  200.980496] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2243489562624, level 26, generation 3144240307695375391, nritems 620178657. len param 17. debug 2/989/620178657/3144240307695375391

Let's call this try 1. I ran it three more times, so below we have
tries 2, 3 and 4.

> Wow, that's strange. Can you repeat your test once or twice and paste that line,
> please? I'd like to get a feeling if the values are completely random.

Curiously, it clearly takes longer for it to crash after starting the
scrub each time I run it. Also on try 4 I got an entirely different
crash (backtrace below). Now it scrubs maybe the first 200G or so of
both devices of the (raid-1) 2.2T filesystem before it crashes.

start and eb->len seem to be the same (32771 and 32768) every time. eb
start varies, but there's some pattern if you view them in hex:

  Try 1  20a5a660000
  Try 2  20bb0018000
  Try 3  20a8bc28000
  Try 4  (no output, different crash)

The rest of the values seem to me to be completely different every time.

	Sami


Try 2:

------------------------------------------------------------
[12961.870107] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2249220784128, level 14, generation 2242260605927040034, nritems 117835525. len param 17. debug 2/989/117835525/2242260605927040034
[12961.870204] ------------[ cut here ]------------
[12961.870264] WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]()
[12961.870302] Hardware name: System Product Name
[12961.870322] Modules linked in: <omitted>  [last unloaded: scsi_wait_scan]
[12961.870367] Pid: 1144, comm: btrfs-endio-met Tainted: G        W    3.4.4-modded-oops+ #2
[12961.870403] Call Trace:
[12961.870421]  [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0
[12961.870449]  [<ffffffff810390a5>] warn_slowpath_null+0x15/0x20
[...]
------------------------------------------------------------

Try 3:

------------------------------------------------------------
[  531.770984] btrfs: invalid parameters for read_extent_buffer: start (32771) > eb->len (32768). eb start is 2244317708288, level 170, generation 13639284858109917187, nritems 6171943. len param 17. debug 2/989/6171943/13639284858109917187
[  531.771081] ------------[ cut here ]------------
[  531.771133] WARNING: at fs/btrfs/extent_io.c:4528 read_extent_buffer+0x167/0x1a0 [btrfs]()
[  531.771169] Hardware name: System Product Name
[  531.771191] Modules linked in: <omitted>  [last unloaded: scsi_wait_scan]
[  531.771229] Pid: 1132, comm: btrfs-endio-met Tainted: G        W    3.4.4-modded-oops+ #2
[  531.771265] Call Trace:
[  531.771282]  [<ffffffff8103905a>] warn_slowpath_common+0x7a/0xb0
[...]
------------------------------------------------------------

Try 4:

------------------------------------------------------------
[   95.933108] netconsole: network logging started
[  982.651987] unable to find logical 691402650139365534 len 32768
[  982.652060] ------------[ cut here ]------------
[  982.652085] kernel BUG at fs/btrfs/volumes.c:3725!
[  982.652109] invalid opcode: 0000 [#1] SMP 
[  982.652138] CPU 4 
[  982.652149] Modules linked in: <omitted>  [last unloaded: scsi_wait_scan]
[  982.652190] 
[  982.652201] Pid: 1127, comm: btrfs-endio-met Tainted: G        W    3.4.4-modded-oops+ #2 System manufacturer System Product Name/P8P67 EVO
[  982.652264] RIP: 0010:[<ffffffffa0257778>]  [<ffffffffa0257778>] __btrfs_map_block+0x668/0x680 [btrfs]
[  982.652323] RSP: 0000:ffff8801f43cfa70  EFLAGS: 00010286
[  982.652347] RAX: 0000000000000049 RBX: 09985b0c0e52109e RCX: 0000000000000082
[  982.652379] RDX: 00000000000000e8 RSI: 0000000000000046 RDI: 0000000000000246
[  982.652411] RBP: ffff8801f43cfb10 R08: 0000000000000000 R09: 0000000000000000
[  982.652443] R10: ffff8801f1fbf680 R11: ffff8801f3b00780 R12: ffff8801f0026108
[  982.652475] R13: 0000000000008000 R14: ffff8801f0026fe0 R15: ffff8801f43cfbb0
[  982.652507] FS:  0000000000000000(0000) GS:ffff88021ed00000(0000) knlGS:0000000000000000
[  982.652542] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  982.652568] CR2: 00000000f775c000 CR3: 00000002110a1000 CR4: 00000000000407e0
[  982.652600] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  982.652632] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  982.652664] Process btrfs-endio-met (pid: 1127, threadinfo ffff8801f43ce000, task ffff8801f401c2f0)
[  982.652703] Stack:
[  982.652715]  ffff8801f43cfa90 ffffffff811dc553 ffff88021ec0ce88 ffff88021ed0ce80
[  982.652761]  ffff8801f43cfb00 0000000000000086 ffff8801f43cfaf0 0000000000000086
[  982.652807]  000000fdf43cfad0 ffff8801f43cfba8 0000002000000004 0000000000000004
[  982.652852] Call Trace:
[  982.652869]  [<ffffffff811dc553>] ? cpumask_next_and+0x23/0x40
[  982.652897]  [<ffffffff8111d7e2>] ? kmem_cache_alloc_trace+0xc2/0x100
[  982.652940]  [<ffffffffa025aac9>] btrfs_map_block+0x9/0x10 [btrfs]
[  982.652979]  [<ffffffffa0280ad2>] reada_add_block+0x1c2/0x890 [btrfs]
[  982.653010]  [<ffffffff8140a4a2>] ? __slab_free+0xde/0x24e
[  982.653045]  [<ffffffffa0281447>] __readahead_hook.isra.5+0x2a7/0x460 [btrfs]
[  982.653086]  [<ffffffffa028192f>] btree_readahead_hook+0x1f/0x40 [btrfs]
[  982.653126]  [<ffffffffa0227841>] btree_readpage_end_io_hook+0x111/0x260 [btrfs]
[  982.653168]  [<ffffffffa024f442>] ? find_first_extent_bit_state+0x22/0x80 [btrfs]
[  982.653211]  [<ffffffffa025008b>] end_bio_extent_readpage+0xcb/0xa30 [btrfs]
[  982.653252]  [<ffffffffa0226e61>] ? end_workqueue_fn+0x31/0x50 [btrfs]
[  982.653284]  [<ffffffff81158988>] bio_endio+0x18/0x30
[  982.653317]  [<ffffffffa0226e6c>] end_workqueue_fn+0x3c/0x50 [btrfs]
[  982.653355]  [<ffffffffa025d8d7>] worker_loop+0x157/0x560 [btrfs]
[  982.653393]  [<ffffffffa025d780>] ? btrfs_queue_worker+0x310/0x310 [btrfs]
[  982.653427]  [<ffffffff81058e5e>] kthread+0x8e/0xa0
[  982.653451]  [<ffffffff81419064>] kernel_thread_helper+0x4/0x10
[  982.653480]  [<ffffffff81058dd0>] ? flush_kthread_worker+0x70/0x70
[  982.653509]  [<ffffffff81419060>] ? gs_change+0x13/0x13
[  982.653533] Code: e6 89 c7 eb a4 0f 0b c7 45 c4 01 00 00 00 31 db e9 08 fd ff ff 0f 0b 49 8b 17 48 89 de 48 c7 c7 08 0c 29 a0 31 c0 e8 e3 09 1b e1 <0f> 0b 0f 0b 89 df e9 73 ff ff ff 66 66 66 66 2e 0f 1f 84 00 00 
[  982.653807] RIP  [<ffffffffa0257778>] __btrfs_map_block+0x668/0x680 [btrfs]
[  982.653853]  RSP <ffff8801f43cfa70>
[  982.653877] ---[ end trace e93713a9d40cd06e ]---
------------------------------------------------------------
diff mbox

Patch

diff --git a/fs/btrfs/extent_io.c b/fs/btrfs/extent_io.c
index c9018a0..beabe99 100644
--- a/fs/btrfs/extent_io.c
+++ b/fs/btrfs/extent_io.c
@@ -4519,7 +4519,14 @@  void read_extent_buffer(struct extent_buffer *eb, void *dstv,
        size_t start_offset = eb->start & ((u64)PAGE_CACHE_SIZE - 1);
        unsigned long i = (start_offset + start) >> PAGE_CACHE_SHIFT;
 
-       WARN_ON(start > eb->len);
+       if (start > eb->len) {
+               printk(KERN_ERR "btrfs: invalid parameters for read_extent_buffer: start (%lu) > eb->len (%lu). eb start is %llu, level %d, generation %llu, nritems %d. len param %lu. debug %llu/%llu/%llu/%llu\n",
+                       start, eb->len, eb->start, btrfs_header_level(eb),
+                       btrfs_header_generation(eb), btrfs_header_nritems(eb),
+                       len,
+                       eb->debug[0], eb->debug[1], eb->debug[2], eb->debug[3]);
+               WARN_ON(1);
+       }
        WARN_ON(start + len > eb->start + eb->len);
 
        offset = (start_offset + start) & ((unsigned long)PAGE_CACHE_SIZE - 1);
diff --git a/fs/btrfs/extent_io.h b/fs/btrfs/extent_io.h
index b516c3b..1bbf823 100644
--- a/fs/btrfs/extent_io.h
+++ b/fs/btrfs/extent_io.h
@@ -164,6 +164,8 @@  struct extent_buffer {
        wait_queue_head_t lock_wq;
        struct page *inline_pages[INLINE_EXTENT_BUFFER_PAGES];
        struct page **pages;
+
+       u64 debug[4];
 };
 
 static inline void extent_set_compress_type(unsigned long *bio_flags,
diff --git a/fs/btrfs/reada.c b/fs/btrfs/reada.c
index ac5d010..d9c1146 100644
--- a/fs/btrfs/reada.c
+++ b/fs/btrfs/reada.c
@@ -168,10 +168,15 @@  static int __readahead_hook(struct btrfs_root *root, struct extent_buffer *eb,
                struct btrfs_key key;
                struct btrfs_key next_key;
 
+               eb->debug[0] = 1;
+               eb->debug[1] = i;
+               eb->debug[2] = nritems;
+               eb->debug[3] = generation;
                btrfs_node_key_to_cpu(eb, &key, i);
-               if (i + 1 < nritems)
+               if (i + 1 < nritems) {
+                       eb->debug[0] = 2;
                        btrfs_node_key_to_cpu(eb, &next_key, i + 1);
-               else
+               } else
                        next_key = re->top;
                bytenr = btrfs_node_blockptr(eb, i);
                n_gen = btrfs_node_ptr_generation(eb, i);