Message ID: 20181108054919.18253-1-wqu@suse.com (mailing list archive)
Series: btrfs: qgroup: Delay subtree scan to reduce overhead
On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> This patchset can be fetched from github:
> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
>
> Which is based on v4.20-rc1.

Thanks, I'll add it to for-next soon.
On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> > This patchset can be fetched from github:
> > https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> >
> > Which is based on v4.20-rc1.
>
> Thanks, I'll add it to for-next soon.

During test generic/517, the logs were full of the warning below. The
reference test on current master, effectively misc-4.20 which was used as
the base of your branch, did not get the warning.

[11540.167829] BTRFS: end < start 2519039 2519040
[11540.170513] WARNING: CPU: 1 PID: 539 at fs/btrfs/extent_io.c:436 insert_state+0xd8/0x100 [btrfs]
[11540.174411] Modules linked in: dm_thin_pool dm_persistent_data dm_bufio dm_bio_prison btrfs libcrc32c xor zstd_decompress zstd_compress xxhash raid6_pq dm_mod loop [last unloaded: libcrc32c]
[11540.178279] CPU: 1 PID: 539 Comm: xfs_io Tainted: G D W 4.20.0-rc1-default+ #329
[11540.180616] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.11.2-0-gf9626cc-prebuilt.qemu-project.org 04/01/2014
[11540.183754] RIP: 0010:insert_state+0xd8/0x100 [btrfs]
[11540.189173] RSP: 0018:ffffa0d245eafb20 EFLAGS: 00010282
[11540.189885] RAX: 0000000000000000 RBX: ffff9f0bb3267320 RCX: 0000000000000000
[11540.191646] RDX: 0000000000000002 RSI: 0000000000000001 RDI: ffffffffa40c400d
[11540.192942] RBP: 0000000000266fff R08: 0000000000000001 R09: 0000000000000000
[11540.193871] R10: 0000000000000000 R11: ffffffffa629da2d R12: ffff9f0ba0281c60
[11540.195527] R13: 0000000000267000 R14: ffffa0d245eafb98 R15: ffffa0d245eafb90
[11540.197026] FS: 00007fa338eb4b80(0000) GS:ffff9f0bbd600000(0000) knlGS:0000000000000000
[11540.198251] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[11540.199698] CR2: 00007fa33873bfb8 CR3: 000000006fb6e000 CR4: 00000000000006e0
[11540.201428] Call Trace:
[11540.202164]  __set_extent_bit+0x43b/0x5b0 [btrfs]
[11540.203223]  lock_extent_bits+0x5d/0x210 [btrfs]
[11540.204346]  ? _raw_spin_unlock+0x24/0x40
[11540.205381]  ? test_range_bit+0xdf/0x130 [btrfs]
[11540.206573]  lock_extent_range+0xb8/0x150 [btrfs]
[11540.207696]  btrfs_double_extent_lock+0x78/0xb0 [btrfs]
[11540.208988]  btrfs_extent_same_range+0x131/0x4e0 [btrfs]
[11540.210237]  btrfs_remap_file_range+0x337/0x350 [btrfs]
[11540.211448]  vfs_dedupe_file_range_one+0x141/0x150
[11540.212622]  vfs_dedupe_file_range+0x146/0x1a0
[11540.213795]  do_vfs_ioctl+0x520/0x6c0
[11540.214711]  ? __fget+0x109/0x1e0
[11540.215616]  ksys_ioctl+0x3a/0x70
[11540.216233]  __x64_sys_ioctl+0x16/0x20
[11540.216860]  do_syscall_64+0x54/0x180
[11540.217409]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[11540.218126] RIP: 0033:0x7fa338a4daa7
On Tue, Nov 13, 2018 at 5:08 PM David Sterba <dsterba@suse.cz> wrote:
>
> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> > On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> > > This patchset can be fetched from github:
> > > https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> > >
> > > Which is based on v4.20-rc1.
> >
> > Thanks, I'll add it to for-next soon.
>
> During test generic/517, the logs were full of the warning below. The reference
> test on current master, effectively misc-4.20 which was used as base of your
> branch did not get the warning.
>
> [11540.167829] BTRFS: end < start 2519039 2519040
> [11540.170513] WARNING: CPU: 1 PID: 539 at fs/btrfs/extent_io.c:436 insert_state+0xd8/0x100 [btrfs]
> [ ... same warning and call trace as quoted in full above ... ]

That's the infinite loop issue fixed by one of the patches submitted
for 4.20-rc2:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587

The branch you used for testing doesn't have that fix?
On 2018/11/14 1:58 AM, Filipe Manana wrote:
> On Tue, Nov 13, 2018 at 5:08 PM David Sterba <dsterba@suse.cz> wrote:
>>
>> During test generic/517, the logs were full of the warning below. The reference
>> test on current master, effectively misc-4.20 which was used as base of your
>> branch did not get the warning.
>>
>> [11540.167829] BTRFS: end < start 2519039 2519040
>> [11540.170513] WARNING: CPU: 1 PID: 539 at fs/btrfs/extent_io.c:436 insert_state+0xd8/0x100 [btrfs]
>> [ ... same warning and call trace as quoted in full above ... ]
>
> That's the infinite loop issue fixed by one of the patches submitted
> for 4.20-rc2:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587
>
> The branch you used for testing doesn't have that fix?

Yep, I tried the v4.20-rc1 tag, which hits tons of such warnings even
without my patchset.

So it shouldn't be my patches causing the problem.

Thanks,
Qu
On Tue, Nov 13, 2018 at 05:58:14PM +0000, Filipe Manana wrote:
> That's the infinite loop issue fixed by one of the patches submitted
> for 4.20-rc2:
>
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587
>
> The branch you used for testing doesn't have that fix?

That explains it, thanks. The branch was based on 4.20-rc1 as I took it
from Qu's repository but did not check the base.
On 2018/11/15 3:05 AM, David Sterba wrote:
> On Tue, Nov 13, 2018 at 05:58:14PM +0000, Filipe Manana wrote:
>> That's the infinite loop issue fixed by one of the patches submitted
>> for 4.20-rc2:
>>
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?h=v4.20-rc2&id=11023d3f5fdf89bba5e1142127701ca6e6014587
>>
>> The branch you used for testing doesn't have that fix?
>
> That explains it, thanks. The branch was based on 4.20-rc1 as I took it
> from Qu's repository but did not check the base.

BTW, should I always rebase my patches onto misc-next or misc-4.20?

IMHO basing on -rc tags should make it easier for David to rebase/apply,
but if it's causing problems like this, I could definitely go misc-* based.

Thanks,
Qu
On Thu, Nov 15, 2018 at 01:23:25PM +0800, Qu Wenruo wrote:
> On 2018/11/15 3:05 AM, David Sterba wrote:
> > That explains it, thanks. The branch was based on 4.20-rc1 as I took it
> > from Qu's repository but did not check the base.
>
> BTW should I always rebase my patches to misc-next or misc-4.20?

The misc-next branch can rebase as I add the tags or need to remove a patch
etc., so using the last rc or the last pulled branch (i.e. the exact commit
from misc-4.20, not the head itself) should be the better option for you, so
you don't need to catch up and rebase constantly. I can handle rebases of
your branches to current misc-next as long as there are no major conflicts.

> IMHO based on -rc tags should make it easier for David to rebase/apply,
> but if it's causing problem like this, I could definitely go misc-* based.

No, that was my fault. I review and rebase all topic branches at each rc
release, but taking the branch from your repo was a bit different step in
the workflow.
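[ For illustration, the workflow suggested above -- pinning a topic branch
to the exact commit that was last pulled, rather than to a moving branch
head -- can be sketched with plain git. The repository layout and branch
names below are made up for the demo; they are not the real kdave repos. ]

```shell
# Sketch: a topic branch pinned to a fixed base commit survives upstream
# rebases of the moving branch. All names here are illustrative.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "base"
base=$(git rev-parse HEAD)          # the exact commit that was "pulled"
git branch misc-next                # stand-in for the maintainer's moving branch
git checkout -q -b qgroup-topic "$base"   # topic work starts from the pinned commit
git -c user.name=dev -c user.email=dev@example.com \
    commit -q --allow-empty -m "patch 1"
# The topic branch keeps a stable merge base with misc-next, so the
# maintainer can rebase it onto current misc-next whenever convenient:
git merge-base qgroup-topic misc-next
```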
On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
> > This patchset can be fetched from github:
> > https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
> >
> > Which is based on v4.20-rc1.
>
> Thanks, I'll add it to for-next soon.

The branch was there for some time but then not for at least a week (my
mistake, I did not notice in time). I've rebased it on top of recent
misc-next, but without the delayed refs patchset from Josef.

At the moment I'm considering it for merge to 4.21; there's still some time
to pull it out in case it shows up to be too problematic. I'm mostly worried
about the unknown interactions with the enospc updates, or generally because
of the lack of qgroup and reloc code reviews.

I'm going to do some testing of the rebased branch before I add it to
for-next. The branch is ext/qu/qgroup-delay-scan in my devel repos, please
check if everything is still ok there. Thanks.
On 2018/12/7 3:35 AM, David Sterba wrote:
> The branch was there for some time but not for at least a week (my
> mistake I did not notice in time). I've rebased it on top of recent
> misc-next, but without the delayed refs patchset from Josef.
>
> At the moment I'm considering it for merge to 4.21, there's still some
> time to pull it out in case it shows up to be too problematic. I'm
> mostly worried about the unknown interactions with the enospc updates or

For that part, I don't think it would cause any obvious problem for the
enospc updates.

As the only user-noticeable effect is the delay of reloc tree deletion,
it's mostly transparent to extent allocation.

> generally because of lack of qgroup and reloc code reviews.

That's the biggest problem.

However, most of the current qgroup + balance optimization is done inside
the qgroup code (to skip certain qgroup records), so if we're going to hit
some problem, this patchset has the highest possibility of hitting it.

Later patches will just keep tweaking qgroup, mostly without affecting any
other parts.

So I'm fine if you decide to pull it out for now.

Thanks,
Qu

> I'm going to do some testing of the rebased branch before I add it to
> for-next. The branch is ext/qu/qgroup-delay-scan in my devel repos,
> please check if everything is still ok there. Thanks.
On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
> However most of the current qgroup + balance optimization is done inside
> qgroup code (to skip certain qgroup record), if we're going to hit some
> problem then this patchset would have the highest possibility to hit
> problem.
>
> Later patches will just keep tweaking qgroup to without affecting any
> other parts mostly.
>
> So I'm fine if you decide to pull it out for now.

I've adapted a stress test that unpacks a large tarball, snapshots every 20
seconds, deletes a random snapshot every 50 seconds, and deletes files from
the original subvolume, now enhanced with qgroups just for the new snapshots
inheriting the toplevel subvolume. Lockup.

It gets stuck in a snapshot call with the following stacktrace:

[<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
[<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
[<0>] do_walk_down+0x681/0xb20 [btrfs]
[<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
[<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
[<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
[<0>] cleaner_kthread+0xf8/0x170 [btrfs]
[<0>] kthread+0x121/0x140
[<0>] ret_from_fork+0x27/0x50

and that's like the 10th snapshot and ~3rd deletion. This is qgroup show:

qgroupid         rfer         excl parent
--------         ----         ---- ------
0/5         865.27MiB      1.66MiB ---
0/257           0.00B        0.00B ---
0/259           0.00B        0.00B ---
0/260       806.58MiB    637.25MiB ---
0/262           0.00B        0.00B ---
0/263           0.00B        0.00B ---
0/264           0.00B        0.00B ---
0/265           0.00B        0.00B ---
0/266           0.00B        0.00B ---
0/267           0.00B        0.00B ---
0/268           0.00B        0.00B ---
0/269           0.00B        0.00B ---
0/270       989.04MiB      1.22MiB ---
0/271           0.00B        0.00B ---
0/272       922.25MiB    416.00KiB ---
0/273       931.02MiB      1.50MiB ---
0/274       910.94MiB      1.52MiB ---
1/1           1.64GiB      1.64GiB 0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274

No IO or cpu activity at this point; the stacktrace and show output remain
the same.

So, considering this, I'm not going to add the patchset to 4.21 but will
keep it in for-next for testing; any fixups or updates will be applied.
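[ For reference, a stress loop of the shape described above might look
roughly like the following. This is a hypothetical reconstruction, not
David's actual script: the mount point, tarball path, runtime and file
deletion counts are all assumptions; only the 20s/50s snapshot cadence and
the qgroup inheritance follow the description. ]

```shell
#!/bin/bash
# Sketch of the snapshot stress test described above (assumptions: MNT,
# TARBALL, DURATION and the per-round deletion counts are made up).
MNT=${MNT:-/mnt/scratch}            # assumed dedicated btrfs mount point
TARBALL=${TARBALL:-/tmp/data.tar}   # assumed source archive
DURATION=${DURATION:-600}           # total runtime in seconds

stress() {
    btrfs quota enable "$MNT"
    btrfs subvolume create "$MNT/subvol"
    btrfs qgroup create 1/1 "$MNT"

    tar xf "$TARBALL" -C "$MNT/subvol" &   # background writer

    local end=$((SECONDS + DURATION)) i=0
    while [ "$SECONDS" -lt "$end" ]; do
        sleep 20
        i=$((i + 1))
        # new snapshot every 20s, inherited into qgroup 1/1 via -i
        btrfs subvolume snapshot -i 1/1 "$MNT/subvol" "$MNT/snap$i"
        # roughly every 50s: delete a random earlier snapshot and some files
        if [ $((i % 3)) -eq 0 ] && [ "$i" -gt 1 ]; then
            btrfs subvolume delete "$MNT/snap$((RANDOM % (i - 1) + 1))" \
                2>/dev/null
            find "$MNT/subvol" -type f | shuf -n 50 | xargs -r rm -f
        fi
    done
    wait
}

# Only run against a real, dedicated btrfs mount -- this deletes data.
if command -v btrfs >/dev/null && mountpoint -q "$MNT"; then
    stress
else
    echo "skipping: need btrfs-progs and a mounted \$MNT" >&2
fi
```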
On 2018/12/8 8:47 AM, David Sterba wrote:
> I've adapted a stress tests that unpacks a large tarball, snaphosts
> every 20 seconds, deletes a random snapshot every 50 seconds, deletes
> file from the original subvolume, now enhanced with qgroups just for the
> new snapshots inherigin the toplevel subvolume. Lockup.
>
> It gets stuck in a snapshot call with the follwin stacktrace
>
> [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]

This looks like the original subtree tracing has something wrong.

Thanks for the report, I'll investigate it.

Qu

> [<0>] do_walk_down+0x681/0xb20 [btrfs]
> [ ... rest of trace and qgroup show output quoted in full above ... ]
>
> No IO or cpu activity at this point, the stacktrace and show output
> remains the same.
>
> So, considering this, I'm not going to add the patchset to 4.21 but will
> keep it in for-next for testing, any fixups or updates will be applied.
On Sat, Dec 08, 2018 at 08:50:32AM +0800, Qu Wenruo wrote:
> > I've adapted a stress tests that unpacks a large tarball, snaphosts
> > every 20 seconds, deletes a random snapshot every 50 seconds, deletes
> > file from the original subvolume, now enhanced with qgroups just for the
> > new snapshots inherigin the toplevel subvolume. Lockup.
> >
> > It gets stuck in a snapshot call with the follwin stacktrace
> >
> > [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> > [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
>
> This looks like the original subtree tracing has something wrong.

Yes, I ran the test on current master and it locked up too, so it's not
due to your patchset.

> Thanks for the report, I'll investigate it.

Thanks.
On 2018/12/8 8:47 AM, David Sterba wrote:
> I've adapted a stress tests that unpacks a large tarball, snaphosts
> every 20 seconds, deletes a random snapshot every 50 seconds, deletes
> file from the original subvolume, now enhanced with qgroups just for the
> new snapshots inherigin the toplevel subvolume. Lockup.

Could you please provide the test script? As I can't reproduce it in my
environment.

I crafted my own test with some simplifications, namely no qgroup inherit.
However I can't reproduce the problem even with more snapshot
creation/deletion and more data.

In my test script, I created around 35 snapshots, deleted 6 snapshots,
with around 1000 regular data extents and 1000 2K inline extents.

My test script can be found at:
https://gist.github.com/adam900710/4109fa23fc5ba8fc6b37a9c8e52353c1

Thanks,
Qu

> It gets stuck in a snapshot call with the follwin stacktrace
>
> [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
> [ ... rest of trace and qgroup show output quoted in full above ... ]
On Sat, Dec 8, 2018 at 12:51 AM Qu Wenruo <wqu@suse.de> wrote:
> On 2018/12/8 8:47 AM, David Sterba wrote:
> > It gets stuck in a snapshot call with the follwin stacktrace
> >
> > [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
> > [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
>
> This looks like the original subtree tracing has something wrong.
>
> Thanks for the report, I'll investigate it.

Btw, there's another deadlock with qgroups. I don't recall if I ever
reported it, but I still hit it with fstests (it rarely happens) for at
least a year iirc:

[29845.732448] INFO: task kworker/u8:8:3898 blocked for more than 120 seconds.
[29845.732852]       Not tainted 4.20.0-rc5-btrfs-next-40 #1
[29845.733248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[29845.733558] kworker/u8:8    D    0  3898      2 0x80000000
[29845.733878] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
[29845.734183] Call Trace:
[29845.734499]  ? __schedule+0x3d4/0xbc0
[29845.734818]  schedule+0x39/0x90
[29845.735131]  btrfs_tree_read_lock+0xe7/0x140 [btrfs]
[29845.735430]  ? remove_wait_queue+0x60/0x60
[29845.735731]  find_parent_nodes+0x25e/0xe30 [btrfs]
[29845.736037]  btrfs_find_all_roots_safe+0xc6/0x140 [btrfs]
[29845.736342]  btrfs_find_all_roots+0x52/0x70 [btrfs]
[29845.736710]  btrfs_qgroup_trace_extent_post+0x37/0x80 [btrfs]
[29845.737046]  btrfs_add_delayed_data_ref+0x240/0x3d0 [btrfs]
[29845.737362]  btrfs_inc_extent_ref+0xb7/0x140 [btrfs]
[29845.737678]  __btrfs_mod_ref+0x174/0x250 [btrfs]
[29845.737999]  ? add_pinned_bytes+0x60/0x60 [btrfs]
[29845.738298]  update_ref_for_cow+0x26b/0x340 [btrfs]
[29845.738592]  __btrfs_cow_block+0x221/0x5b0 [btrfs]
[29845.738899]  btrfs_cow_block+0xf4/0x210 [btrfs]
[29845.739200]  btrfs_search_slot+0x583/0xa40 [btrfs]
[29845.739527]  ? init_object+0x6b/0x80
[29845.739823]  btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
[29845.740119]  __btrfs_drop_extents+0x157/0xd70 [btrfs]
[29845.740524]  insert_reserved_file_extent.constprop.66+0x97/0x2f0 [btrfs]
[29845.740853]  ? start_transaction+0xa2/0x490 [btrfs]
[29845.741166]  btrfs_finish_ordered_io+0x344/0x810 [btrfs]
[29845.741489]  normal_work_helper+0xea/0x530 [btrfs]
[29845.741880]  process_one_work+0x22f/0x5d0
[29845.742174]  worker_thread+0x4f/0x3b0
[29845.742462]  ? rescuer_thread+0x360/0x360
[29845.742759]  kthread+0x103/0x140
[29845.743044]  ? kthread_create_worker_on_cpu+0x70/0x70
[29845.743336]  ret_from_fork+0x3a/0x50

It happened last Friday again on 4.20-rcX. It's caused by a change from
2017 (commit fb235dc06fac9eaa4408ade9c8b20d45d63c89b7 "btrfs: qgroup:
Move half of the qgroup accounting time out of commit trans"). The task
is deadlocking with itself.

thanks

> > [<0>] do_walk_down+0x681/0xb20 [btrfs]
> > [ ... rest of trace and qgroup show output quoted in full above ... ]
On 2018/12/10 at 6:45 PM, Filipe Manana wrote:
> On Sat, Dec 8, 2018 at 12:51 AM Qu Wenruo <wqu@suse.de> wrote:
>>
>>
>>
>> On 2018/12/8 at 8:47 AM, David Sterba wrote:
>>> On Fri, Dec 07, 2018 at 06:51:21AM +0800, Qu Wenruo wrote:
>>>>
>>>>
>>>> On 2018/12/7 at 3:35 AM, David Sterba wrote:
>>>>> On Mon, Nov 12, 2018 at 10:33:33PM +0100, David Sterba wrote:
>>>>>> On Thu, Nov 08, 2018 at 01:49:12PM +0800, Qu Wenruo wrote:
>>>>>>> This patchset can be fetched from github:
>>>>>>> https://github.com/adam900710/linux/tree/qgroup_delayed_subtree_rebased
>>>>>>>
>>>>>>> Which is based on v4.20-rc1.
>>>>>>
>>>>>> Thanks, I'll add it to for-next soon.
>>>>>
>>>>> The branch was there for some time but not for at least a week (my
>>>>> mistake, I did not notice in time). I've rebased it on top of recent
>>>>> misc-next, but without the delayed refs patchset from Josef.
>>>>>
>>>>> At the moment I'm considering it for merge to 4.21; there's still some
>>>>> time to pull it out in case it turns out to be too problematic. I'm
>>>>> mostly worried about the unknown interactions with the enospc updates or
>>>>
>>>> For that part, I don't think it would cause any obvious problem for
>>>> the enospc updates.
>>>>
>>>> The only user-noticeable effect is the delayed deletion of reloc trees.
>>>>
>>>> Apart from that, it's mostly transparent to extent allocation.
>>>>
>>>>> generally because of lack of qgroup and reloc code reviews.
>>>>
>>>> That's the biggest problem.
>>>>
>>>> However, most of the current qgroup + balance optimization is done
>>>> inside the qgroup code (to skip certain qgroup records), so if we're
>>>> going to hit a problem, this patchset has the highest chance of
>>>> hitting it.
>>>>
>>>> Later patches will mostly just keep tweaking qgroup without affecting
>>>> other parts.
>>>>
>>>> So I'm fine if you decide to pull it out for now.
>>> I've adapted a stress test that unpacks a large tarball, snapshots
>>> every 20 seconds, deletes a random snapshot every 50 seconds, and
>>> deletes files from the original subvolume, now enhanced with qgroups
>>> just for the new snapshots inheriting the toplevel subvolume. Lockup.
>>>
>>> It gets stuck in a snapshot call with the following stacktrace:
>>>
>>> [<0>] btrfs_tree_read_lock+0xf3/0x150 [btrfs]
>>> [<0>] btrfs_qgroup_trace_subtree+0x280/0x7b0 [btrfs]
>>
>> This looks like something is wrong in the original subtree tracing.
>>
>> Thanks for the report, I'll investigate it.
>
> Btw, there's another deadlock with qgroups. I don't recall if I ever
> reported it, but I still hit it with fstests (it rarely happens) for at
> least 1 year iirc:
>
> [29845.732448] INFO: task kworker/u8:8:3898 blocked for more than 120 seconds.
> [29845.732852]       Not tainted 4.20.0-rc5-btrfs-next-40 #1
> [29845.733248] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs"
> disables this message.
> [29845.733558] kworker/u8:8    D    0  3898      2 0x80000000
> [29845.733878] Workqueue: btrfs-endio-write btrfs_endio_write_helper [btrfs]
> [29845.734183] Call Trace:
> [29845.734499]  ? __schedule+0x3d4/0xbc0
> [29845.734818]  schedule+0x39/0x90
> [29845.735131]  btrfs_tree_read_lock+0xe7/0x140 [btrfs]
> [29845.735430]  ? remove_wait_queue+0x60/0x60
> [29845.735731]  find_parent_nodes+0x25e/0xe30 [btrfs]
> [29845.736037]  btrfs_find_all_roots_safe+0xc6/0x140 [btrfs]
> [29845.736342]  btrfs_find_all_roots+0x52/0x70 [btrfs]
> [29845.736710]  btrfs_qgroup_trace_extent_post+0x37/0x80 [btrfs]
> [29845.737046]  btrfs_add_delayed_data_ref+0x240/0x3d0 [btrfs]
> [29845.737362]  btrfs_inc_extent_ref+0xb7/0x140 [btrfs]
> [29845.737678]  __btrfs_mod_ref+0x174/0x250 [btrfs]
> [29845.737999]  ?
add_pinned_bytes+0x60/0x60 [btrfs]
> [29845.738298]  update_ref_for_cow+0x26b/0x340 [btrfs]
> [29845.738592]  __btrfs_cow_block+0x221/0x5b0 [btrfs]
> [29845.738899]  btrfs_cow_block+0xf4/0x210 [btrfs]
> [29845.739200]  btrfs_search_slot+0x583/0xa40 [btrfs]
> [29845.739527]  ? init_object+0x6b/0x80
> [29845.739823]  btrfs_lookup_file_extent+0x4a/0x70 [btrfs]
> [29845.740119]  __btrfs_drop_extents+0x157/0xd70 [btrfs]
> [29845.740524]  insert_reserved_file_extent.constprop.66+0x97/0x2f0 [btrfs]
> [29845.740853]  ? start_transaction+0xa2/0x490 [btrfs]
> [29845.741166]  btrfs_finish_ordered_io+0x344/0x810 [btrfs]
> [29845.741489]  normal_work_helper+0xea/0x530 [btrfs]
> [29845.741880]  process_one_work+0x22f/0x5d0
> [29845.742174]  worker_thread+0x4f/0x3b0
> [29845.742462]  ? rescuer_thread+0x360/0x360
> [29845.742759]  kthread+0x103/0x140
> [29845.743044]  ? kthread_create_worker_on_cpu+0x70/0x70
> [29845.743336]  ret_from_fork+0x3a/0x50
>
> It happened again last Friday on 4.20-rcX. It's caused by a change
> from 2017 (commit fb235dc06fac9eaa4408ade9c8b20d45d63c89b7 "btrfs:
> qgroup: Move half of the qgroup accounting time out of commit trans").

I have to admit this commit doesn't really save much critical-section
time, but it causes a lot of problems because of its ability to trigger
this backward tree-locking behavior. Especially since its original
objective was to reduce balance + qgroup overhead, yet it did a poor
job compared to the recent optimizations.

I'll revert it, just as we did in the SLE kernels.

Thanks,
Qu

> The task is deadlocking with itself.
>
> thanks
>
>
>> Qu
>>
>>> [<0>] do_walk_down+0x681/0xb20 [btrfs]
>>> [<0>] walk_down_tree+0xf5/0x1c0 [btrfs]
>>> [<0>] btrfs_drop_snapshot+0x43b/0xb60 [btrfs]
>>> [<0>] btrfs_clean_one_deleted_snapshot+0xc1/0x120 [btrfs]
>>> [<0>] cleaner_kthread+0xf8/0x170 [btrfs]
>>> [<0>] kthread+0x121/0x140
>>> [<0>] ret_from_fork+0x27/0x50
>>>
>>> and that's like the 10th snapshot and ~3rd deletion.
>>> This is qgroup show:
>>>
>>> qgroupid         rfer         excl parent
>>> --------         ----         ---- ------
>>> 0/5         865.27MiB      1.66MiB ---
>>> 0/257           0.00B        0.00B ---
>>> 0/259           0.00B        0.00B ---
>>> 0/260       806.58MiB    637.25MiB ---
>>> 0/262           0.00B        0.00B ---
>>> 0/263           0.00B        0.00B ---
>>> 0/264           0.00B        0.00B ---
>>> 0/265           0.00B        0.00B ---
>>> 0/266           0.00B        0.00B ---
>>> 0/267           0.00B        0.00B ---
>>> 0/268           0.00B        0.00B ---
>>> 0/269           0.00B        0.00B ---
>>> 0/270       989.04MiB      1.22MiB ---
>>> 0/271           0.00B        0.00B ---
>>> 0/272       922.25MiB    416.00KiB ---
>>> 0/273       931.02MiB      1.50MiB ---
>>> 0/274       910.94MiB      1.52MiB ---
>>> 1/1           1.64GiB      1.64GiB
>>> 0/5,0/257,0/259,0/260,0/262,0/263,0/264,0/265,0/266,0/267,0/268,0/269,0/270,0/271,0/272,0/273,0/274
>>>
>>> No IO or cpu activity at this point; the stacktrace and show output
>>> remain the same.
>>>
>>> So, considering this, I'm not going to add the patchset to 4.21 but will
>>> keep it in for-next for testing; any fixups or updates will be applied.
>>>
>>
>
>