diff mbox series

btrfs: fix setting last_trans for reloc roots

Message ID 20200410154248.2646406-1-josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix setting last_trans for reloc roots | expand

Commit Message

Josef Bacik April 10, 2020, 3:42 p.m. UTC
I made a mistake with my previous fix, I assumed that we didn't need to
mess with the reloc roots once we were out of the part of relocation
where we are actually moving the extents.

The subtle thing that I missed is that btrfs_init_reloc_root() also
updates the last_trans for the reloc root when we do
btrfs_record_root_in_trans() for the corresponding fs_root.  I've added
a comment to make sure future me doesn't make this mistake again.

This showed up as a WARN_ON() in btrfs_copy_root() because our
last_trans didn't == the current transid.  This could happen if we
snapshotted a fs root with a reloc root after we set
rc->create_reloc_tree = 0, but before we actually merge the reloc root.

Fixes: 2abc726ab4b8 ("btrfs: do not init a reloc root if we aren't relocating")
Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 fs/btrfs/relocation.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

Comments

Filipe Manana April 14, 2020, 4:24 p.m. UTC | #1
On Fri, Apr 10, 2020 at 4:44 PM Josef Bacik <josef@toxicpanda.com> wrote:
>
> I made a mistake with my previous fix, I assumed that we didn't need to
> mess with the reloc roots once we were out of the part of relocation
> where we are actually moving the extents.
>
> The subtle thing that I missed is that btrfs_init_reloc_root() also
> updates the last_trans for the reloc root when we do
> btrfs_record_root_in_trans() for the corresponding fs_root.  I've added
> a comment to make sure future me doesn't make this mistake again.
>
> This showed up as a WARN_ON() in btrfs_copy_root() because our
> last_trans didn't == the current transid.  This could happen if we
> snapshotted a fs root with a reloc root after we set
> rc->create_reloc_tree = 0, but before we actually merge the reloc root.
>
> Fixes: 2abc726ab4b8 ("btrfs: do not init a reloc root if we aren't relocating")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Reviewed-by: Filipe Manana <fdmanana@suse.com>

Worth mentioning that the regression produced the following warning
when running snapshot creation and balance in parallel:

85256.545808] BTRFS info (device sdc): relocating block group 30408704
flags metadata|dup
[85256.551956] ------------[ cut here ]------------
[85256.552852] WARNING: CPU: 0 PID: 12823 at fs/btrfs/ctree.c:191
btrfs_copy_root+0x26f/0x430 [btrfs]
[85256.554407] Modules linked in: btrfs dm_log_writes dm_mod
blake2b_generic xor raid6_pq libcrc32c intel_rapl_msr
intel_rapl_common kvm_intel kvm irqbypass bochs_drm drm_vram_helper
crct10dif_pclmul drm_ttm_helper crc32_pclmul ghash_clmulni_intel ttm
drm_kms_helper aesni_intel crypto_simd cryptd glue_helper drm sg
joydev evdev serio_raw qemu_fw_cfg pcspkr button parport_pc ppdev lp
parport ip_tables x_tables autofs4 ext4 crc32c_generic crc16 mbcache
jbd2 sd_mod t10_pi virtio_scsi ata_generic ata_piix libata
crc32c_intel i2c_piix4 scsi_mod virtio_pci psmouse virtio_ring virtio
e1000 [last unloaded: btrfs]
[85256.563623] CPU: 0 PID: 12823 Comm: btrfs Tainted: G        W
  5.6.0-rc7-btrfs-next-58 #1
[85256.565139] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.12.0-59-gc9ba5276e321-prebuilt.qemu.org 04/01/2014
[85256.567134] RIP: 0010:btrfs_copy_root+0x26f/0x430 [btrfs]
[85256.568068] Code: 5f c3 31 d2 4c 89 fe 48 89 ef e8 6c e9 04 00 44
8b 4c 24 08 e9 4d fe ff ff 48 8b 83 60 05 00 00 49 39 04 24 0f 84 e3
fd ff ff <0f> 0b e9 dc fd ff ff 49 8b 86 b0 04 00 00 48 8b 00 48 39 07
0f 84
[85256.571282] RSP: 0018:ffffb96e044279b8 EFLAGS: 00010202
[85256.572193] RAX: 0000000000000009 RBX: ffff9da70bf61000 RCX: ffffb96e04427a48
[85256.573422] RDX: ffff9da733a770c8 RSI: ffff9da70bf61000 RDI: ffff9da694163818
[85256.574657] RBP: ffff9da733a770c8 R08: fffffffffffffff8 R09: 0000000000000002
[85256.575887] R10: ffffb96e044279a0 R11: 0000000000000000 R12: ffff9da694163818
[85256.577122] R13: fffffffffffffff8 R14: ffff9da6d2512000 R15: ffff9da714cdac00
[85256.578352] FS:  00007fdeacf328c0(0000) GS:ffff9da735e00000(0000)
knlGS:0000000000000000
[85256.579741] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[85256.580739] CR2: 000055a2a5b8a118 CR3: 00000001eed78002 CR4: 00000000003606f0
[85256.581971] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[85256.583200] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[85256.584431] Call Trace:
[85256.584887]  ? create_reloc_root+0x49/0x2b0 [btrfs]
[85256.585734]  ? kmem_cache_alloc_trace+0xe5/0x200
[85256.586552]  create_reloc_root+0x8b/0x2b0 [btrfs]
[85256.587387]  btrfs_reloc_post_snapshot+0x96/0x5b0 [btrfs]
[85256.588340]  create_pending_snapshot+0x610/0x1010 [btrfs]
[85256.589293]  create_pending_snapshots+0xa8/0xd0 [btrfs]
[85256.590212]  btrfs_commit_transaction+0x4c7/0xc50 [btrfs]
[85256.591166]  ? btrfs_mksubvol+0x3cd/0x560 [btrfs]
[85256.592006]  btrfs_mksubvol+0x455/0x560 [btrfs]
[85256.592640]  __btrfs_ioctl_snap_create+0x15f/0x190 [btrfs]
[85256.593336]  btrfs_ioctl_snap_create_v2+0xa4/0xf0 [btrfs]
[85256.594008]  ? mem_cgroup_commit_charge+0x6e/0x540
[85256.594615]  btrfs_ioctl+0x12d8/0x3760 [btrfs]
[85256.595171]  ? do_raw_spin_unlock+0x49/0xc0
[85256.595695]  ? _raw_spin_unlock+0x29/0x40
[85256.596198]  ? __handle_mm_fault+0x11b3/0x14b0
[85256.596757]  ? ksys_ioctl+0x92/0xb0
[85256.597194]  ksys_ioctl+0x92/0xb0
[85256.597610]  ? trace_hardirqs_off_thunk+0x1a/0x1c
[85256.598194]  __x64_sys_ioctl+0x16/0x20
[85256.598662]  do_syscall_64+0x5c/0x280
[85256.599120]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
[85256.599752] RIP: 0033:0x7fdeabd3bdd7

This is actually very easy to trigger after the patchset I just sent
for fstests, that fixes btrfs/014.

Thanks.

> ---
>  fs/btrfs/relocation.c | 19 +++++++++++++++++--
>  1 file changed, 17 insertions(+), 2 deletions(-)
>
> diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
> index d4734337127a..76bfb524bf3e 100644
> --- a/fs/btrfs/relocation.c
> +++ b/fs/btrfs/relocation.c
> @@ -830,8 +830,7 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
>         int clear_rsv = 0;
>         int ret;
>
> -       if (!rc || !rc->create_reloc_tree ||
> -           root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
> +       if (!rc)
>                 return 0;
>
>         /*
> @@ -841,12 +840,28 @@ int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
>         if (reloc_root_is_dead(root))
>                 return 0;
>
> +       /*
> +        * This is subtle but important.  We do not do
> +        * record_root_in_transaction for reloc roots, instead we record their
> +        * corresponding fs root, and then here we update the last trans for the
> +        * reloc root.  This means that we have to do this for the entire life
> +        * of the reloc root, regardless of which stage of the relocation we are
> +        * in.
> +        */
>         if (root->reloc_root) {
>                 reloc_root = root->reloc_root;
>                 reloc_root->last_trans = trans->transid;
>                 return 0;
>         }
>
> +       /*
> +        * We are merging reloc roots, we do not need new reloc trees.  Also
> +        * reloc trees never need their own reloc tree.
> +        */
> +       if (!rc->create_reloc_tree ||
> +           root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
> +               return 0;
> +
>         if (!trans->reloc_reserved) {
>                 rsv = trans->block_rsv;
>                 trans->block_rsv = rc->block_rsv;
> --
> 2.24.1
>
David Sterba April 16, 2020, 12:24 p.m. UTC | #2
On Fri, Apr 10, 2020 at 11:42:48AM -0400, Josef Bacik wrote:
> I made a mistake with my previous fix, I assumed that we didn't need to
> mess with the reloc roots once we were out of the part of relocation
> where we are actually moving the extents.
> 
> The subtle thing that I missed is that btrfs_init_reloc_root() also
> updates the last_trans for the reloc root when we do
> btrfs_record_root_in_trans() for the corresponding fs_root.  I've added
> a comment to make sure future me doesn't make this mistake again.
> 
> This showed up as a WARN_ON() in btrfs_copy_root() because our
> last_trans didn't == the current transid.  This could happen if we
> snapshotted a fs root with a reloc root after we set
> rc->create_reloc_tree = 0, but before we actually merge the reloc root.
> 
> Fixes: 2abc726ab4b8 ("btrfs: do not init a reloc root if we aren't relocating")
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>

Added to misc-next with the stacktrace.
Johannes Thumshirn April 16, 2020, 12:37 p.m. UTC | #3
This fixes a kmemleak complaint from btrfs/074, complete re-run of 
xfstests is pending, but one down again.

Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Johannes Thumshirn April 16, 2020, 3:34 p.m. UTC | #4
On 16/04/2020 14:38, Johannes Thumshirn wrote:
> This fixes a kmemleak complaint from btrfs/074, complete re-run of
> xfstests is pending, but one down again.
> 
> Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> 

I'll take that back, I still see a leak of 'cur_trans' allocated in 
join_transaction() for btrfs/074 on a full xfstests run. The same leak 
is reported for btrfs/072 and generic/127.
David Sterba April 20, 2020, 11:20 p.m. UTC | #5
On Thu, Apr 16, 2020 at 03:34:30PM +0000, Johannes Thumshirn wrote:
> On 16/04/2020 14:38, Johannes Thumshirn wrote:
> > This fixes a kmemleak complaint from btrfs/074, complete re-run of
> > xfstests is pending, but one down again.
> > 
> > Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> > 
> 
> I'll take that back, I still see a leak of 'cur_trans' allocated in 
> join_transaction() for btrfs/074 on a full xfstests run. The same leak 
> is reported for btrfs/072 and generic/127.

I'm not sure, but the patch "btrfs: drop logs when we've aborted a
transaction" is fixing transaction handle leaks. I've added it to
misc-next, the effects have been observed in test generic/475 so it's
only a weak link, tests btrfs/074 do not stress the transaction cleanup
that much.
Filipe Manana April 21, 2020, 8:04 a.m. UTC | #6
On Tue, Apr 21, 2020 at 12:22 AM David Sterba <dsterba@suse.cz> wrote:
>
> On Thu, Apr 16, 2020 at 03:34:30PM +0000, Johannes Thumshirn wrote:
> > On 16/04/2020 14:38, Johannes Thumshirn wrote:
> > > This fixes a kmemleak complaint from btrfs/074, complete re-run of
> > > xfstests is pending, but one down again.
> > >
> > > Tested-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
> > >
> >
> > I'll take that back, I still see a leak of 'cur_trans' allocated in
> > join_transaction() for btrfs/074 on a full xfstests run. The same leak
> > is reported for btrfs/072 and generic/127.
>
> I'm not sure, but the patch "btrfs: drop logs when we've aborted a
> transaction" is fixing transaction handle leaks. I've added it to
> misc-next, the effects have been observed in test generic/475 so it's
> only a weak link, tests btrfs/074 do not stress the transaction cleanup
> that much.

Hum?
That patch fixes a use-after-crash during unmount after a transaction
was aborted - it doesn't fix transaction leaks as far as I can see.
Perhaps you meant patch "btrfs: fix memory leak of transaction when
deleting unused block group", which fixes a regression introduced in
5.7-rc1.
btrfs/074 (and tests 060 to 073) often triggers unused block group
deletion due to fsstress and other operations in parallel.
diff mbox series

Patch

diff --git a/fs/btrfs/relocation.c b/fs/btrfs/relocation.c
index d4734337127a..76bfb524bf3e 100644
--- a/fs/btrfs/relocation.c
+++ b/fs/btrfs/relocation.c
@@ -830,8 +830,7 @@  int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 	int clear_rsv = 0;
 	int ret;
 
-	if (!rc || !rc->create_reloc_tree ||
-	    root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+	if (!rc)
 		return 0;
 
 	/*
@@ -841,12 +840,28 @@  int btrfs_init_reloc_root(struct btrfs_trans_handle *trans,
 	if (reloc_root_is_dead(root))
 		return 0;
 
+	/*
+	 * This is subtle but important.  We do not do
+	 * record_root_in_transaction for reloc roots, instead we record their
+	 * corresponding fs root, and then here we update the last trans for the
+	 * reloc root.  This means that we have to do this for the entire life
+	 * of the reloc root, regardless of which stage of the relocation we are
+	 * in.
+	 */
 	if (root->reloc_root) {
 		reloc_root = root->reloc_root;
 		reloc_root->last_trans = trans->transid;
 		return 0;
 	}
 
+	/*
+	 * We are merging reloc roots, we do not need new reloc trees.  Also
+	 * reloc trees never need their own reloc tree.
+	 */
+	if (!rc->create_reloc_tree ||
+	    root->root_key.objectid == BTRFS_TREE_RELOC_OBJECTID)
+		return 0;
+
 	if (!trans->reloc_reserved) {
 		rsv = trans->block_rsv;
 		trans->block_rsv = rc->block_rsv;