Message ID: cover.1689143654.git.wqu@suse.com
Series: btrfs: preparation patches for the incoming metadata folio conversion
On 7/12/23 02:37, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Define write_extent_buffer_fsid/chunk_tree_uuid() as inline helpers
>
> [BACKGROUND]
>
> Recently I have been checking the feasibility of converting the metadata
> handling to a folio based solution.
>
> The best part of using a single folio for metadata is that we can get
> rid of the complexity of cross-page handling; everything becomes a
> single memory operation on a contiguous memory range.
>
> [PITFALLS]
>
> One of the biggest problems for the metadata folio conversion is that
> we still need the current page based solution (or folios with order 0)
> as a fallback when we can not get a high order folio.
>
> In that case it would be hell to handle the four different combinations
> (folio/folio, folio/page, page/folio, page/page) for extent buffer
> helpers involving two extent buffers.
>
> Although there are some new ideas on how to handle metadata memory
> (e.g. going fully vmalloc-ed memory), reducing the open-coded memory
> handling for metadata should always be a good starting point.
>
> [OBJECTIVE]
>
> So this patchset is the preparation to reduce direct page operations
> for metadata.
>
> The patchset does this mostly by concentrating the operations into the
> common helpers, write_extent_buffer() and read_extent_buffer().
>
> For bitmap operations it's much more complex, thus this patchset
> refactors them completely into a 3 part solution:
>
> - Handle the first byte
> - Handle the byte aligned ranges
> - Handle the last byte
>
> This needs more thorough testing (which I failed several times during
> development) to prevent regressions.
>
> Finally there is only one function which can not be properly migrated,
> memmove_extent_buffer(), which has to use memmove() calls and thus must
> keep per-page mapping handling.
>
> Thankfully, if we go folio in the end, the folio based handling would
> just be a single memmove(), so it won't be too much of a burden.
>
> Qu Wenruo (6):
>   btrfs: tests: enhance extent buffer bitmap tests
>   btrfs: refactor extent buffer bitmaps operations
>   btrfs: use write_extent_buffer() to implement write_extent_buffer_*id()
>   btrfs: refactor memcpy_extent_buffer()
>   btrfs: refactor copy_extent_buffer_full()
>   btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer()
>
>  fs/btrfs/extent_io.c             | 224 +++++++++++++------------------
>  fs/btrfs/extent_io.h             |  19 ++-
>  fs/btrfs/tests/extent-io-tests.c | 161 ++++++++++++++--------
>  3 files changed, 215 insertions(+), 189 deletions(-)

For the series:

Reviewed-by: Sweet Tea Dorminy <sweettea-kernel@dorminy.me>
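The 3 part split described in the cover letter (partial first byte, byte-aligned middle handled with a plain memory operation, partial last byte) can be sketched in plain userspace C. The helper name `bitmap_set_range` and the layout are illustrative only, not the btrfs helpers themselves:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical sketch of the 3-part bitmap approach: handle the
 * (possibly partial) first byte, then whole bytes via memset(),
 * then the (possibly partial) last byte.
 */
static void bitmap_set_range(uint8_t *map, unsigned long start,
			     unsigned long nbits)
{
	unsigned long end = start + nbits;	/* exclusive */
	unsigned long first_byte = start / 8;
	unsigned long last_byte = (end - 1) / 8;

	if (first_byte == last_byte) {
		/* The whole range fits inside one byte. */
		uint8_t mask = (uint8_t)(((1u << nbits) - 1) << (start % 8));

		map[first_byte] |= mask;
		return;
	}

	/* 1) Partial first byte: bits (start % 8) .. 7. */
	map[first_byte] |= (uint8_t)(0xffu << (start % 8));

	/* 2) Byte-aligned middle: whole bytes can use plain memset(). */
	if (last_byte - first_byte > 1)
		memset(map + first_byte + 1, 0xff, last_byte - first_byte - 1);

	/* 3) Partial last byte: bits 0 .. (end - 1) % 8. */
	map[last_byte] |= (uint8_t)(0xffu >> (7 - ((end - 1) % 8)));
}
```

The same split applies to clearing and to the set/clear helpers operating on extent buffer pages; only the middle part changes from memset() to a cross-page write helper.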
On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> One of the biggest problems for the metadata folio conversion is that
> we still need the current page based solution (or folios with order 0)
> as a fallback when we can not get a high order folio.

Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with a
maximum of 64k (order 4).  IIRC we should be able to get them pretty
reliably.

If not, the best thing is to just use a virtually contiguous allocation
as the fallback, i.e. use vm_map_ram.  That's what XFS uses in its
buffer cache, and it already did so before it stopped using the page
cache to back its buffer cache, something I plan to do for the btrfs
buffer cache as well, as the page cache algorithms tend to not work very
well for buffer based metadata, never mind that there is an incredible
amount of complex code just working around the interactions.
On 2023/7/13 00:41, Christoph Hellwig wrote:
> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
>> One of the biggest problems for the metadata folio conversion is that
>> we still need the current page based solution (or folios with order 0)
>> as a fallback when we can not get a high order folio.
>
> Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with a
> maximum of 64k (order 4).  IIRC we should be able to get them pretty
> reliably.

If it can be done as reliably as order 0 with NOFAIL, I'm totally fine
with that.

> If not, the best thing is to just use a virtually contiguous allocation
> as the fallback, i.e. use vm_map_ram.

That's also what Sweet Tea Dorminy mentioned, and I believe it's the
correct way to go (as the fallback).

Although my concern is my lack of experience with MM code, and whether
those pages can still be attached to an address space (with PagePrivate
set).

> That's what XFS uses in its buffer cache, and it already did so before
> it stopped using the page cache to back its buffer cache, something I
> plan to do for the btrfs buffer cache as well, as the page cache
> algorithms tend to not work very well for buffer based metadata, never
> mind that there is an incredible amount of complex code just working
> around the interactions.

Thus we have this preparation patchset as the first step.  It should
help no matter which next step we take.

Thanks,
Qu
On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
>> Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with a
>> maximum of 64k (order 4).  IIRC we should be able to get them pretty
>> reliably.
>
> If it can be done as reliably as order 0 with NOFAIL, I'm totally fine
> with that.

I think that is the aim.  I'm not entirely sure if we are entirely there
yet, thus the Ccs.

>> If not, the best thing is to just use a virtually contiguous allocation
>> as the fallback, i.e. use vm_map_ram.
>
> That's also what Sweet Tea Dorminy mentioned, and I believe it's the
> correct way to go (as the fallback).
>
> Although my concern is my lack of experience with MM code, and whether
> those pages can still be attached to an address space (with PagePrivate
> set).

At least they could back in the day when XFS did exactly that.  In fact
that was the use case for which I originally added vmap back in 2002.
On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
> On 2023/7/13 00:41, Christoph Hellwig wrote:
>> Do we?  btrfs by default uses a 16k nodesize (order 2 on x86), with a
>> maximum of 64k (order 4).  IIRC we should be able to get them pretty
>> reliably.
>
> If it can be done as reliably as order 0 with NOFAIL, I'm totally fine
> with that.

I have mentioned my concerns about the allocation problems with orders
higher than 0 in the past.  The allocator gives some guarantees about
not failing for certain levels, currently order 1 (mm/fail_page_alloc.c,
fail_page_alloc.min_order = 1).

Per the comment in page_alloc.c:rmqueue():

2814         /*
2815          * We most definitely don't want callers attempting to
2816          * allocate greater than order-1 page units with __GFP_NOFAIL.
2817          */
2818         WARN_ON_ONCE((gfp_flags & __GFP_NOFAIL) && (order > 1));

For allocations with higher order, eg. 4 to match the default 16K nodes,
this increases memory pressure and can trigger compaction (the logic
around PAGE_ALLOC_COSTLY_ORDER, which is 3).

>> If not, the best thing is to just use a virtually contiguous allocation
>> as the fallback, i.e. use vm_map_ram.

So we can allocate order-0 pages and then map them to virtual addresses,
which needs manipulation of the PTEs (page table entries) and requires
additional memory.  This is what xfs does in
fs/xfs/xfs_buf.c:_xfs_buf_map_pages(); it needs some care with aliased
memory, so vm_unmap_aliases() is required and brings some overhead, and
at the end vm_unmap_ram() needs to be called, another overhead but
probably bearable.

With all that in place there would be a contiguous memory range
representing the metadata, so a simple memcpy() can be done.  Sure, with
higher overhead and decreased reliability due to potentially failing
memory allocations - for metadata operations.

Compare that to what we have:

Pages are allocated as order 0, so there's a much higher chance to get
them under pressure, without increasing the pressure otherwise.  We
don't need any virtual mappings.  The cost is that we have to iterate
the pages and do the partial copying ourselves, but this is hidden in
helpers.

We have a different usage pattern for the metadata buffers than xfs, so
what it does with vmapped contiguous buffers may not be easily
transferable to btrfs and could bring us new problems.

The conversion to folios will happen eventually, though I don't want to
sacrifice reliability just for API convenience.  First the conversion
should be done 1:1 with pages and folios both order 0, before switching
to some higher order allocations hidden behind API calls.
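The "iterate the pages and do the partial copying ourselves" cost mentioned above can be sketched in userspace C. This is a hypothetical minimal analogue of what helpers like read_extent_buffer() hide (names and `PG_SIZE` are stand-ins, not the btrfs code): a buffer backed by discontiguous pages forces every copy to be split at page boundaries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PG_SIZE 4096u	/* stand-in for PAGE_SIZE */

/*
 * Hypothetical sketch of a cross-page read: the extent buffer is backed
 * by an array of discontiguous pages, so the copy must be split at
 * every page boundary instead of being a single memcpy().
 */
static void eb_read(char *const *pages, unsigned long start,
		    void *dst, unsigned long len)
{
	char *out = dst;

	while (len > 0) {
		unsigned long idx = start / PG_SIZE;
		unsigned long off = start % PG_SIZE;
		/* Copy at most up to the end of the current page. */
		unsigned long cur = PG_SIZE - off < len ? PG_SIZE - off : len;

		memcpy(out, pages[idx] + off, cur);
		out += cur;
		start += cur;
		len -= cur;
	}
}
```

With a single high order folio (or a vmapped range) the loop collapses to one memcpy(), which is exactly the simplification both approaches are after.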
On 2023/7/13 19:26, David Sterba wrote:
> On Thu, Jul 13, 2023 at 07:58:17AM +0800, Qu Wenruo wrote:
>> If it can be done as reliably as order 0 with NOFAIL, I'm totally fine
>> with that.
[...]
> The conversion to folios will happen eventually, though I don't want to
> sacrifice reliability just for API convenience.  First the conversion
> should be done 1:1 with pages and folios both order 0, before switching
> to some higher order allocations hidden behind API calls.

In fact, I have another solution as a middle ground before bringing
folios into the situation.

Check if the pages are already physically contiguous.  If so, everything
can go without any cross-page handling.

If not, we can either keep the current cross-page handling, or migrate
to virtually contiguous mapped pages.

Currently around 50~66% of eb pages are already allocated physically
contiguous.  If we can get rid of the cross-page handling for more than
half of the ebs, it's already a win.

For the vmapped pages, I'm not sure about the overhead, but I can try to
go that path and check the result.

Thanks,
Qu
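The contiguity check Qu proposes can be sketched as follows. This is a hypothetical userspace stand-in: in the kernel the comparison would be on `page_to_pfn()` of consecutive pages, here the pfns are passed in as a plain array.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical sketch of the middle-ground check: if every page frame
 * number follows the previous one, the backing memory is physically
 * contiguous and the whole range can be handled with a single
 * memcpy()/memset() instead of cross-page handling.
 */
static bool pages_physically_contiguous(const unsigned long *pfns,
					unsigned int nr_pages)
{
	for (unsigned int i = 1; i < nr_pages; i++)
		if (pfns[i] != pfns[i - 1] + 1)
			return false;
	return true;
}
```

A buffer that passes this check can take the fast single-operation path; the cross-page loop (or a vmapped range) remains the fallback for the rest.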
On Thu, Jul 13, 2023 at 07:41:53PM +0800, Qu Wenruo wrote:
> In fact, I have another solution as a middle ground before bringing
> folios into the situation.
>
> Check if the pages are already physically contiguous.  If so, everything
> can go without any cross-page handling.
>
> If not, we can either keep the current cross-page handling, or migrate
> to virtually contiguous mapped pages.
>
> Currently around 50~66% of eb pages are already allocated physically
> contiguous.

Memory fragmentation becomes a problem over time on systems running for
weeks/months, then the contiguous ranges will become scarce.  So if you
measure that on a system with a lot of memory and for a short time, then
of course this will reach a high rate of contiguous pages.
On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> [CHANGELOG]
> v2:
> - Define write_extent_buffer_fsid/chunk_tree_uuid() as inline helpers
[...]
> Qu Wenruo (6):
>   btrfs: tests: enhance extent buffer bitmap tests
>   btrfs: refactor extent buffer bitmaps operations
>   btrfs: use write_extent_buffer() to implement write_extent_buffer_*id()
>   btrfs: refactor memcpy_extent_buffer()
>   btrfs: refactor copy_extent_buffer_full()
>   btrfs: call copy_extent_buffer_full() inside btrfs_clone_extent_buffer()

Added to misc-next, with some fixups, thanks.  How far we'll get with
the folio conversions or other page contiguity improvements depends, but
this patchset is fairly independent.
On Thu, Jul 13, 2023 at 02:09:35PM +0200, David Sterba wrote:
> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
> Added to misc-next

And removed again, it explodes right before the first test:

BTRFS: device fsid 4e9cf0f7-cdc4-4e38-9e59-de4d88122ee9 devid 1 transid 6 /dev/vdb scanned by mkfs.btrfs (13714)
BTRFS info (device vdb): using crc32c (crc32c-generic) checksum algorithm
BTRFS info (device vdb): using free space tree
BTRFS info (device vdb): auto enabling async discard
BTRFS info (device vdb): checking UUID tree
------------[ cut here ]------------
WARNING: CPU: 3 PID: 13739 at fs/btrfs/extent-tree.c:3026 __btrfs_free_extent+0x9ac/0x1280 [btrfs]
Modules linked in: btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
CPU: 3 PID: 13739 Comm: umount Not tainted 6.5.0-rc1-default+ #2126
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
RIP: 0010:__btrfs_free_extent+0x9ac/0x1280 [btrfs]
RSP: 0018:ffff8880031c78a8 EFLAGS: 00010246
RAX: 0000000000000000 RBX: ffff88802ec71708 RCX: ffffffffc065e9ba
RDX: dffffc0000000000 RSI: ffffffffc063f610 RDI: ffff888026511130
RBP: ffff888002734000 R08: 0000000000000000 R09: ffffed1000638eff
R10: ffff8880031c77ff R11: 0000000000000001 R12: 0000000000000001
R13: ffff8880058522b8 R14: ffff8880265110e0 R15: 0000000001d24000
FS: 00007fb5c22e9800(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2b194fcfc4 CR3: 000000002f32a000 CR4: 00000000000006a0
Call Trace:
 <TASK>
 ? __warn+0xa1/0x200
 ? __btrfs_free_extent+0x9ac/0x1280 [btrfs]
 ? report_bug+0x207/0x270
 ? handle_bug+0x65/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? __btrfs_free_extent+0x39a/0x1280 [btrfs]
 ? unlock_up+0x160/0x370 [btrfs]
 ? __btrfs_free_extent+0x9ac/0x1280 [btrfs]
 ? __btrfs_free_extent+0x39a/0x1280 [btrfs]
 ? lookup_extent_backref+0xd0/0xd0 [btrfs]
 ? __lock_release.isra.0+0x14e/0x510
 ? reacquire_held_locks+0x280/0x280
 run_delayed_tree_ref+0x10b/0x2d0 [btrfs]
 btrfs_run_delayed_refs_for_head+0x630/0x960 [btrfs]
 __btrfs_run_delayed_refs+0xce/0x160 [btrfs]
 btrfs_run_delayed_refs+0xe7/0x2a0 [btrfs]
 commit_cowonly_roots+0x3f1/0x4c0 [btrfs]
 ? trace_btrfs_transaction_commit+0xd0/0xd0 [btrfs]
 ? btrfs_commit_transaction+0xbbe/0x17e0 [btrfs]
 btrfs_commit_transaction+0xc13/0x17e0 [btrfs]
 ? cleanup_transaction+0x640/0x640 [btrfs]
 ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
 sync_filesystem+0xd3/0x100
 generic_shutdown_super+0x44/0x1f0
 kill_anon_super+0x1e/0x40
 btrfs_kill_super+0x25/0x30 [btrfs]
 deactivate_locked_super+0x4c/0xc0
 cleanup_mnt+0x13a/0x1f0
 task_work_run+0xf2/0x170
 ? task_work_cancel+0x20/0x20
 ? mark_held_locks+0x1a/0x80
 exit_to_user_mode_prepare+0x16c/0x170
 syscall_exit_to_user_mode+0x19/0x50
 do_syscall_64+0x49/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb5c250f4bb
RSP: 002b:00007ffeee578518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055bf227429f0 RCX: 00007fb5c250f4bb
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055bf22742c20
RBP: 000055bf22742b08 R08: 0000000000000073 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000055bf22742c20 R14: 0000000000000000 R15: 00007ffeee57b084
 </TASK>
irq event stamp: 11109
hardirqs last enabled at (11119): [<ffffffff841678a2>] __up_console_sem+0x52/0x60
hardirqs last disabled at (11130): [<ffffffff84167887>] __up_console_sem+0x37/0x60
softirqs last enabled at (11084): [<ffffffff84cc910b>] __do_softirq+0x31b/0x5ae
softirqs last disabled at (11079): [<ffffffff840b5b09>] irq_exit_rcu+0xa9/0x100
---[ end trace 0000000000000000 ]---
------------[ cut here ]------------
BTRFS: Transaction aborted (error -117)
WARNING: CPU: 3 PID: 13739 at fs/btrfs/extent-tree.c:3027 __btrfs_free_extent+0x10ff/0x1280 [btrfs]
Modules linked in: btrfs blake2b_generic libcrc32c xor lzo_compress lzo_decompress raid6_pq zstd_decompress zstd_compress xxhash zstd_common loop
CPU: 3 PID: 13739 Comm: umount Tainted: G        W          6.5.0-rc1-default+ #2126
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552-rebuilt.opensuse.org 04/01/2014
RIP: 0010:__btrfs_free_extent+0x10ff/0x1280 [btrfs]
RSP: 0018:ffff8880031c78a8 EFLAGS: 00010282
RAX: 0000000000000000 RBX: ffff88802ec71708 RCX: 0000000000000000
RDX: 0000000000000002 RSI: ffffffff841007a8 RDI: ffffffff87c9e0e0
RBP: ffff888002734000 R08: 0000000000000001 R09: ffffed1000638eba
R10: ffff8880031c75d7 R11: 0000000000000001 R12: 0000000000000001
R13: ffff8880058522b8 R14: ffff8880265110e0 R15: 0000000001d24000
FS: 00007fb5c22e9800(0000) GS:ffff88806d200000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f2b194fcfc4 CR3: 000000002f32a000 CR4: 00000000000006a0
Call Trace:
 <TASK>
 ? __warn+0xa1/0x200
 ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
 ? report_bug+0x207/0x270
 ? handle_bug+0x65/0x90
 ? exc_invalid_op+0x13/0x40
 ? asm_exc_invalid_op+0x16/0x20
 ? preempt_count_sub+0x18/0xc0
 ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
 ? __btrfs_free_extent+0x10ff/0x1280 [btrfs]
 ? lookup_extent_backref+0xd0/0xd0 [btrfs]
 ? __lock_release.isra.0+0x14e/0x510
 ? reacquire_held_locks+0x280/0x280
 run_delayed_tree_ref+0x10b/0x2d0 [btrfs]
 btrfs_run_delayed_refs_for_head+0x630/0x960 [btrfs]
 __btrfs_run_delayed_refs+0xce/0x160 [btrfs]
 btrfs_run_delayed_refs+0xe7/0x2a0 [btrfs]
 commit_cowonly_roots+0x3f1/0x4c0 [btrfs]
 ? trace_btrfs_transaction_commit+0xd0/0xd0 [btrfs]
 ? btrfs_commit_transaction+0xbbe/0x17e0 [btrfs]
 btrfs_commit_transaction+0xc13/0x17e0 [btrfs]
 ? cleanup_transaction+0x640/0x640 [btrfs]
 ? btrfs_attach_transaction_barrier+0x1e/0x50 [btrfs]
 sync_filesystem+0xd3/0x100
 generic_shutdown_super+0x44/0x1f0
 kill_anon_super+0x1e/0x40
 btrfs_kill_super+0x25/0x30 [btrfs]
 deactivate_locked_super+0x4c/0xc0
 cleanup_mnt+0x13a/0x1f0
 task_work_run+0xf2/0x170
 ? task_work_cancel+0x20/0x20
 ? mark_held_locks+0x1a/0x80
 exit_to_user_mode_prepare+0x16c/0x170
 syscall_exit_to_user_mode+0x19/0x50
 do_syscall_64+0x49/0x90
 entry_SYSCALL_64_after_hwframe+0x46/0xb0
RIP: 0033:0x7fb5c250f4bb
RSP: 002b:00007ffeee578518 EFLAGS: 00000246 ORIG_RAX: 00000000000000a6
RAX: 0000000000000000 RBX: 000055bf227429f0 RCX: 00007fb5c250f4bb
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 000055bf22742c20
RBP: 000055bf22742b08 R08: 0000000000000073 R09: 0000000000000001
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 000055bf22742c20 R14: 0000000000000000 R15: 00007ffeee57b084
 </TASK>
irq event stamp: 11925
hardirqs last enabled at (11935): [<ffffffff841678a2>] __up_console_sem+0x52/0x60
hardirqs last disabled at (11946): [<ffffffff84167887>] __up_console_sem+0x37/0x60
softirqs last enabled at (11084): [<ffffffff84cc910b>] __do_softirq+0x31b/0x5ae
softirqs last disabled at (11079): [<ffffffff840b5b09>] irq_exit_rcu+0xa9/0x100
---[ end trace 0000000000000000 ]---
BTRFS: error (device vdb: state A) in __btrfs_free_extent:3027: errno=-117 Filesystem corrupted
BTRFS info (device vdb: state EA): forced readonly
BTRFS info (device vdb: state EA): leaf 30474240 gen 7 total ptrs 16 free space 15382 owner 2
BTRFS info (device vdb: state EA): refs 3 lock_owner 13739 current 13739
	item 0 key (13631488 192 8388608) itemoff 16259 itemsize 24
		block group used 0 chunk_objectid 256 flags 1
	item 1 key (22020096 192 8388608) itemoff 16235 itemsize 24
		block group used 16384 chunk_objectid 256 flags 34
	item 2 key (22036480 169 0) itemoff 16202 itemsize 33
		extent refs 1 gen 6 flags 2
		ref#0: tree block backref root 3
	item 3 key (30408704 169 0) itemoff 16169 itemsize 33
		extent refs 1 gen 6 flags 2
		ref#0: tree block backref root 2
	item 4 key (30408704 192 268435456) itemoff 16145 itemsize 24
		block group used 131072 chunk_objectid 256 flags 36
	item 5 key (30425088 169 0) itemoff 16112 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 5
	item 6 key (30441472 169 0) itemoff 16079 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 1
	item 7 key (30457856 169 0) itemoff 16046 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 4
	item 8 key (30474240 169 0) itemoff 16013 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 2
	item 9 key (30490624 169 0) itemoff 15980 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 10 key (30507008 169 0) itemoff 15947 itemsize 33
		extent refs 1 gen 7 flags 2
		ref#0: tree block backref root 10
	item 11 key (30523392 169 0) itemoff 15914 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 12 key (30539776 169 0) itemoff 15881 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 13 key (30556160 169 0) itemoff 15848 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 14 key (30572544 169 0) itemoff 15815 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
	item 15 key (30588928 169 0) itemoff 15782 itemsize 33
		extent refs 1 gen 5 flags 2
		ref#0: tree block backref root 7
BTRFS critical (device vdb: state EA): unable to find ref byte nr 30556160 parent 0 root 4 owner 0 offset 0 slot 14
BTRFS error (device vdb: state EA): failed to run delayed ref for logical 30556160 num_bytes 16384 type 176 action 2 ref_mod 1: -2
BTRFS: error (device vdb: state EA) in btrfs_run_delayed_refs:2102: errno=-2 No such entry
BTRFS warning (device vdb: state EA): Skipping commit of aborted transaction.
BTRFS: error (device vdb: state EA) in cleanup_transaction:1977: errno=-2 No such entry
On 2023/7/14 00:39, David Sterba wrote:
> On Thu, Jul 13, 2023 at 02:09:35PM +0200, David Sterba wrote:
>> On Wed, Jul 12, 2023 at 02:37:40PM +0800, Qu Wenruo wrote:
>> Added to misc-next
>
> And removed again, it explodes right before the first test:

Weird, it passed my local btrfs/* tests.

> BTRFS: device fsid 4e9cf0f7-cdc4-4e38-9e59-de4d88122ee9 devid 1 transid 6 /dev/vdb scanned by mkfs.btrfs (13714)
> ------------[ cut here ]------------
> WARNING: CPU: 3 PID: 13739 at fs/btrfs/extent-tree.c:3026 __btrfs_free_extent+0x9ac/0x1280 [btrfs]
[...]
2 > ref#0: tree block backref root 3 > item 3 key (30408704 169 0) itemoff 16169 itemsize 33 > extent refs 1 gen 6 flags 2 > ref#0: tree block backref root 2 > item 4 key (30408704 192 268435456) itemoff 16145 itemsize 24 > block group used 131072 chunk_objectid 256 flags 36 > item 5 key (30425088 169 0) itemoff 16112 itemsize 33 > extent refs 1 gen 5 flags 2 > ref#0: tree block backref root 5 > item 6 key (30441472 169 0) itemoff 16079 itemsize 33 > extent refs 1 gen 7 flags 2 > ref#0: tree block backref root 1 > item 7 key (30457856 169 0) itemoff 16046 itemsize 33 > extent refs 1 gen 7 flags 2 > ref#0: tree block backref root 4 > item 8 key (30474240 169 0) itemoff 16013 itemsize 33 > extent refs 1 gen 7 flags 2 > ref#0: tree block backref root 2 > item 9 key (30490624 169 0) itemoff 15980 itemsize 33 > extent refs 1 gen 5 flags 2 > ref#0: tree block backref root 7 > item 10 key (30507008 169 0) itemoff 15947 itemsize 33 > extent refs 1 gen 7 flags 2 > ref#0: tree block backref root 10 > item 11 key (30523392 169 0) itemoff 15914 itemsize 33 > extent refs 1 gen 5 flags 2 > ref#0: tree block backref root 7 > item 12 key (30539776 169 0) itemoff 15881 itemsize 33 > extent refs 1 gen 5 flags 2 > ref#0: tree block backref root 7 > item 13 key (30556160 169 0) itemoff 15848 itemsize 33 > extent refs 1 gen 5 flags 2 > ref#0: tree block backref root 7 > item 14 key (30572544 169 0) itemoff 15815 itemsize 33 > extent refs 1 gen 5 flags 2 > ref#0: tree block backref root 7 > item 15 key (30588928 169 0) itemoff 15782 itemsize 33 > extent refs 1 gen 5 flags 2 > ref#0: tree block backref root 7 This looks like an error in memmove_extent_buffer() which I intentionally didn't touch. Anyway I'll try rebase and more tests. Can you put your modified commits in an external branch so I can inherit all your modifications? 
Thanks, Qu > BTRFS critical (device vdb: state EA): unable to find ref byte nr 30556160 parent 0 root 4 owner 0 offset 0 slot 14 > BTRFS error (device vdb: state EA): failed to run delayed ref for logical 30556160 num_bytes 16384 type 176 action 2 ref_mod 1: -2 > BTRFS: error (device vdb: state EA) in btrfs_run_delayed_refs:2102: errno=-2 No such entry > BTRFS warning (device vdb: state EA): Skipping commit of aborted transaction. > BTRFS: error (device vdb: state EA) in cleanup_transaction:1977: errno=-2 No such entry
On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote: > On 2023/7/14 00:39, David Sterba wrote: > > ref#0: tree block backref root 7 > > item 14 key (30572544 169 0) itemoff 15815 itemsize 33 > > extent refs 1 gen 5 flags 2 > > ref#0: tree block backref root 7 > > item 15 key (30588928 169 0) itemoff 15782 itemsize 33 > > extent refs 1 gen 5 flags 2 > > ref#0: tree block backref root 7 > > This looks like an error in memmove_extent_buffer() which I > intentionally didn't touch. > > Anyway I'll try rebase and more tests. > > Can you put your modified commits in an external branch so I can inherit > all your modifications? First I saw the crashes with the modified patches but the report is from what you sent to the mailinglist so I can eliminate error on my side.
On 2023/7/14 06:03, David Sterba wrote: > On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote: >> On 2023/7/14 00:39, David Sterba wrote: >>> ref#0: tree block backref root 7 >>> item 14 key (30572544 169 0) itemoff 15815 itemsize 33 >>> extent refs 1 gen 5 flags 2 >>> ref#0: tree block backref root 7 >>> item 15 key (30588928 169 0) itemoff 15782 itemsize 33 >>> extent refs 1 gen 5 flags 2 >>> ref#0: tree block backref root 7 >> >> This looks like an error in memmove_extent_buffer() which I >> intentionally didn't touch. >> >> Anyway I'll try rebase and more tests. >> >> Can you put your modified commits in an external branch so I can inherit >> all your modifications? > > First I saw the crashes with the modified patches but the report is from > what you sent to the mailinglist so I can eliminate error on my side. Still, a branch would help a lot, since you won't want to re-do the usual modifications (grammar, comments, etc.). Thanks, Qu
On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote: > > > On 2023/7/14 06:03, David Sterba wrote: > > On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote: > >> On 2023/7/14 00:39, David Sterba wrote: > >>> ref#0: tree block backref root 7 > >>> item 14 key (30572544 169 0) itemoff 15815 itemsize 33 > >>> extent refs 1 gen 5 flags 2 > >>> ref#0: tree block backref root 7 > >>> item 15 key (30588928 169 0) itemoff 15782 itemsize 33 > >>> extent refs 1 gen 5 flags 2 > >>> ref#0: tree block backref root 7 > >> > >> This looks like an error in memmove_extent_buffer() which I > >> intentionally didn't touch. > >> > >> Anyway I'll try rebase and more tests. > >> > >> Can you put your modified commits in an external branch so I can inherit > >> all your modifications? > > > > First I saw the crashes with the modified patches but the report is from > > what you sent to the mailinglist so I can eliminate error on my side. > > Still a branch would help a lot, as you won't want to re-do the usual > modification (like grammar, comments etc). Branch ext/qu-eb-page-clanups-updated-broken at github.
On 2023/7/14 08:26, David Sterba wrote: > On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote: >> >> >> On 2023/7/14 06:03, David Sterba wrote: >>> On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote: >>>> On 2023/7/14 00:39, David Sterba wrote: >>>>> ref#0: tree block backref root 7 >>>>> item 14 key (30572544 169 0) itemoff 15815 itemsize 33 >>>>> extent refs 1 gen 5 flags 2 >>>>> ref#0: tree block backref root 7 >>>>> item 15 key (30588928 169 0) itemoff 15782 itemsize 33 >>>>> extent refs 1 gen 5 flags 2 >>>>> ref#0: tree block backref root 7 >>>> >>>> This looks like an error in memmove_extent_buffer() which I >>>> intentionally didn't touch. >>>> >>>> Anyway I'll try rebase and more tests. >>>> >>>> Can you put your modified commits in an external branch so I can inherit >>>> all your modifications? >>> >>> First I saw the crashes with the modified patches but the report is from >>> what you sent to the mailinglist so I can eliminate error on my side. >> >> Still a branch would help a lot, as you won't want to re-do the usual >> modification (like grammar, comments etc). > > Branch ext/qu-eb-page-clanups-updated-broken at github. Already running the auto group with that branch, and no explosion so far (btrfs/004 failed to mount with -o atime though). Any extra setup needed to trigger the failure? Thanks, Qu
On Fri, Jul 14, 2023 at 09:58:00AM +0800, Qu Wenruo wrote: > > > On 2023/7/14 08:26, David Sterba wrote: > > On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote: > >> > >> > >> On 2023/7/14 06:03, David Sterba wrote: > >>> On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote: > >>>> On 2023/7/14 00:39, David Sterba wrote: > >>>>> ref#0: tree block backref root 7 > >>>>> item 14 key (30572544 169 0) itemoff 15815 itemsize 33 > >>>>> extent refs 1 gen 5 flags 2 > >>>>> ref#0: tree block backref root 7 > >>>>> item 15 key (30588928 169 0) itemoff 15782 itemsize 33 > >>>>> extent refs 1 gen 5 flags 2 > >>>>> ref#0: tree block backref root 7 > >>>> > >>>> This looks like an error in memmove_extent_buffer() which I > >>>> intentionally didn't touch. > >>>> > >>>> Anyway I'll try rebase and more tests. > >>>> > >>>> Can you put your modified commits in an external branch so I can inherit > >>>> all your modifications? > >>> > >>> First I saw the crashes with the modified patches but the report is from > >>> what you sent to the mailinglist so I can eliminate error on my side. > >> > >> Still a branch would help a lot, as you won't want to re-do the usual > >> modification (like grammar, comments etc). > > > > Branch ext/qu-eb-page-clanups-updated-broken at github. > > Already running the auto group with that branch, and no explosion so far > (btrfs/004 failed to mount with -o atime though). > > Any extra setup needed to trigger the failure? I'm not aware of anything different than usual. Patches applied to git, built, updated VM and started. I had another branch built and tested and it finished the fstests. I can at least bisect which patch does it.
On 2023/7/14 18:03, David Sterba wrote: > On Fri, Jul 14, 2023 at 09:58:00AM +0800, Qu Wenruo wrote: >> >> >> On 2023/7/14 08:26, David Sterba wrote: >>> On Fri, Jul 14, 2023 at 08:09:16AM +0800, Qu Wenruo wrote: >>>> >>>> >>>> On 2023/7/14 06:03, David Sterba wrote: >>>>> On Fri, Jul 14, 2023 at 05:30:33AM +0800, Qu Wenruo wrote: >>>>>> On 2023/7/14 00:39, David Sterba wrote: >>>>>>> ref#0: tree block backref root 7 >>>>>>> item 14 key (30572544 169 0) itemoff 15815 itemsize 33 >>>>>>> extent refs 1 gen 5 flags 2 >>>>>>> ref#0: tree block backref root 7 >>>>>>> item 15 key (30588928 169 0) itemoff 15782 itemsize 33 >>>>>>> extent refs 1 gen 5 flags 2 >>>>>>> ref#0: tree block backref root 7 >>>>>> >>>>>> This looks like an error in memmove_extent_buffer() which I >>>>>> intentionally didn't touch. >>>>>> >>>>>> Anyway I'll try rebase and more tests. >>>>>> >>>>>> Can you put your modified commits in an external branch so I can inherit >>>>>> all your modifications? >>>>> >>>>> First I saw the crashes with the modified patches but the report is from >>>>> what you sent to the mailinglist so I can eliminate error on my side. >>>> >>>> Still a branch would help a lot, as you won't want to re-do the usual >>>> modification (like grammar, comments etc). >>> >>> Branch ext/qu-eb-page-clanups-updated-broken at github. >> >> Already running the auto group with that branch, and no explosion so far >> (btrfs/004 failed to mount with -o atime though). >> >> Any extra setup needed to trigger the failure? > > I'm not aware of anything different than usual. Patches applied to git, > built, updated VM and started. I had another branch built and tested and > it finished the fstests. I can at least bisect which patch does it. A bisection would be much appreciated. I guess it's the memcpy_extent_buffer() patch, although I don't see anything obvious right now... Thanks, Qu
On Fri, Jul 14, 2023 at 06:32:27PM +0800, Qu Wenruo wrote: > >> Already running the auto group with that branch, and no explosion so far > >> (btrfs/004 failed to mount with -o atime though). > >> > >> Any extra setup needed to trigger the failure? > > > > I'm not aware of anything different than usual. Patches applied to git, > > built, updated VM and started. I had another branch built and tested and > > it finished the fstests. I can at least bisect which patch does it. > > A bisection would be very appreciated. > > Although I guess it should be the memcpy_extent_buffer() patch, I didn't > see something obvious right now... 5ebf7593abb81ec1993f31e90a7573b75aff4db4 is the first bad commit btrfs: refactor main loop in memcpy_extent_buffer() $ git bisect log # bad: [5c6c140622dd7107acb13da404f0c682f1f954a6] btrfs: copy all pages at once at the end of btrfs_clone_extent_buffer() # good: [72c15cf7e64769ca9273a825fff8495d99975c9c] btrfs: deprecate integrity checker feature git bisect start 'ext/qu-eb-page-clanups-updated-broken' '72c15cf7e64769ca9273a825fff8495d99975c9c' # good: [85ab525a6a63c477b92099835d6b05eaebd4ad4b] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id() git bisect good 85ab525a6a63c477b92099835d6b05eaebd4ad4b # bad: [cd6668ef43a224b3f8130b78f4e3b922a7175a05] btrfs: refactor main loop in copy_extent_buffer_full() git bisect bad cd6668ef43a224b3f8130b78f4e3b922a7175a05 # bad: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer() git bisect bad 5ebf7593abb81ec1993f31e90a7573b75aff4db4 # first bad commit: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer()
On 2023/7/14 18:41, David Sterba wrote: > On Fri, Jul 14, 2023 at 06:32:27PM +0800, Qu Wenruo wrote: >>>> Already running the auto group with that branch, and no explosion so far >>>> (btrfs/004 failed to mount with -o atime though). >>>> >>>> Any extra setup needed to trigger the failure? >>> >>> I'm not aware of anything different than usual. Patches applied to git, >>> built, updated VM and started. I had another branch built and tested and >>> it finished the fstests. I can at least bisect which patch does it. >> >> A bisection would be very appreciated. >> >> Although I guess it should be the memcpy_extent_buffer() patch, I didn't >> see something obvious right now... > > 5ebf7593abb81ec1993f31e90a7573b75aff4db4 is the first bad commit > btrfs: refactor main loop in memcpy_extent_buffer() Is there anything special about the system where you can reproduce the bug? I checked the overall code; it behaves a little differently from the original. The original code puts a double limit on the cross-page case, while the new code only handles crossing a page boundary on the source side, and lets write_extent_buffer() handle the cross-page situation on the destination. Considering memcpy() is also called for the memmove() case, that could explain the corrupted tree block in your report. Although I cannot see an obvious problem, I guess there may be some hidden corner cases that would finally be exposed if we eventually move to folio/vmalloc-ed memory. If I can reproduce it locally, the turnaround time would be greatly reduced. 
Thanks, Qu > > $ git bisect log > # bad: [5c6c140622dd7107acb13da404f0c682f1f954a6] btrfs: copy all pages at once at the end of btrfs_clone_extent_buffer() > # good: [72c15cf7e64769ca9273a825fff8495d99975c9c] btrfs: deprecate integrity checker feature > git bisect start 'ext/qu-eb-page-clanups-updated-broken' '72c15cf7e64769ca9273a825fff8495d99975c9c' > # good: [85ab525a6a63c477b92099835d6b05eaebd4ad4b] btrfs: use write_extent_buffer() to implement write_extent_buffer_*id() > git bisect good 85ab525a6a63c477b92099835d6b05eaebd4ad4b > # bad: [cd6668ef43a224b3f8130b78f4e3b922a7175a05] btrfs: refactor main loop in copy_extent_buffer_full() > git bisect bad cd6668ef43a224b3f8130b78f4e3b922a7175a05 > # bad: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer() > git bisect bad 5ebf7593abb81ec1993f31e90a7573b75aff4db4 > # first bad commit: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main loop in memcpy_extent_buffer()
On 2023/7/15 08:39, Qu Wenruo wrote: > > > On 2023/7/14 18:41, David Sterba wrote: >> On Fri, Jul 14, 2023 at 06:32:27PM +0800, Qu Wenruo wrote: >>>>> Already running the auto group with that branch, and no explosion >>>>> so far >>>>> (btrfs/004 failed to mount with -o atime though). >>>>> >>>>> Any extra setup needed to trigger the failure? >>>> >>>> I'm not aware of anything different than usual. Patches applied to git, >>>> built, updated VM and started. I had another branch built and tested >>>> and >>>> it finished the fstests. I can at least bisect which patch does it. >>> >>> A bisection would be very appreciated. >>> >>> Although I guess it should be the memcpy_extent_buffer() patch, I didn't >>> see something obvious right now... >> >> 5ebf7593abb81ec1993f31e90a7573b75aff4db4 is the first bad commit >> btrfs: refactor main loop in memcpy_extent_buffer() > > Anything special on the system that you can reproduce the bug? > > I checked the overall code, it's a little different than the original > behavior. > > The original behavior has double limits on the cross-page case, while > the new code only handles the cross-page on the source, and let > write_extent_buffer() to handle the cross-page situation on the > destination. OK, I found the cause. It is indeed the memcpy_extent_buffer() rework. memcpy() itself is not safe if the ranges overlap, and the old code did proper overlap checks for both memcpy and memmove through the copy_pages() helper. Unfortunately I didn't go through that copy_pages() helper and thus triggered the problem. Let me find a better solution for this case. Thanks, Qu > > Considering memcpy() is called for memmove() case, it can explain the > corrupted tree block we see in your report. > > Although I can not see the obvious problem, I guess there may be some > hidden corner cases that would be finally exposed if we move to > folio/vmallocated memory eventually. > > If I can reproduce it locally the turnover time can be reduced greatly. 
> > Thanks, > Qu >> >> $ git bisect log >> # bad: [5c6c140622dd7107acb13da404f0c682f1f954a6] btrfs: copy all >> pages at once at the end of btrfs_clone_extent_buffer() >> # good: [72c15cf7e64769ca9273a825fff8495d99975c9c] btrfs: deprecate >> integrity checker feature >> git bisect start 'ext/qu-eb-page-clanups-updated-broken' >> '72c15cf7e64769ca9273a825fff8495d99975c9c' >> # good: [85ab525a6a63c477b92099835d6b05eaebd4ad4b] btrfs: use >> write_extent_buffer() to implement write_extent_buffer_*id() >> git bisect good 85ab525a6a63c477b92099835d6b05eaebd4ad4b >> # bad: [cd6668ef43a224b3f8130b78f4e3b922a7175a05] btrfs: refactor main >> loop in copy_extent_buffer_full() >> git bisect bad cd6668ef43a224b3f8130b78f4e3b922a7175a05 >> # bad: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: refactor main >> loop in memcpy_extent_buffer() >> git bisect bad 5ebf7593abb81ec1993f31e90a7573b75aff4db4 >> # first bad commit: [5ebf7593abb81ec1993f31e90a7573b75aff4db4] btrfs: >> refactor main loop in memcpy_extent_buffer()