Message ID | 498b58b794722c70eca58bf3fe46003c43e60aff.1741636986.git.boris@bur.io (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | btrfs: block_group refcounting fixes | expand |
On Mon, Mar 10, 2025 at 01:07:01PM -0700, Boris Burkov wrote: > Block group creation is done in two phases, which results in a slightly > unintiuitive property: a block group can be allocated/deallocated from > after btrfs_make_block_group adds it to the space_info with > btrfs_add_bg_to_space_info, but before creation is completely completed Please write function references with the (), this makes it clear when there are other identifiers like struct names or variables. > in btrfs_create_pending_block_groups. As a result, it is possible for a > block group to go unused and have 'btrfs_mark_bg_unused' called on it > concurrently with 'btrfs_create_pending_block_groups'. This causes a > number of issues, which were fixed with the block group flag > 'BLOCK_GROUP_FLAG_NEW'. > > However, this fix is not quite complete. Since it does not use the > unused_bg_lock, it is possible for the following race to occur: > > btrfs_create_pending_block_groups btrfs_mark_bg_unused > if list_empty // false > list_del_init > clear_bit > else if (test_bit) // true > list_move_tail > > And we get into the exact same broken ref count and invalid new_bgs > state for transaction cleanup that BLOCK_GROUP_FLAG_NEW was designed to > prevent. > > The broken refcount aspect will result in a warning like: > [ 1272.943113] ------------[ cut here ]------------ > [ 1272.943527] refcount_t: underflow; use-after-free. > [ 1272.943967] WARNING: CPU: 1 PID: 61 at lib/refcount.c:28 refcount_warn_saturate+0xba/0x110 > [ 1272.944731] Modules linked in: btrfs virtio_net xor zstd_compress raid6_pq null_blk [last unloaded: btrfs] > [ 1272.945550] CPU: 1 UID: 0 PID: 61 Comm: kworker/u32:1 Kdump: loaded Tainted: G W 6.14.0-rc5+ #108 > [ 1272.946368] Tainted: [W]=WARN > [ 1272.946585] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Arch Linux 1.16.3-1-1 04/01/2014 > [ 1272.947273] Workqueue: btrfs_discard btrfs_discard_workfn [btrfs] > [ 1272.947788] RIP: 0010:refcount_warn_saturate+0xba/0x110 > [ 1272.948180] Code: 01 01 e8 e9 c7 a9 ff 0f 0b c3 cc cc cc cc 80 3d 3f 4a de 01 00 75 85 48 c7 c7 00 9b 9f 8f c6 05 2f 4a de 01 01 e8 c6 c7 a9 ff <0f> 0b c3 cc cc cc cc 80 3d 1d 4a de 01 00 0f 85 5e ff ff ff 48 c7 Please remove the Code: line unless it's really needed for the fix. I've fixed it for-next.
diff --git a/fs/btrfs/block-group.c b/fs/btrfs/block-group.c index 64f0268dcf02..2db1497b58d9 100644 --- a/fs/btrfs/block-group.c +++ b/fs/btrfs/block-group.c @@ -2797,8 +2797,11 @@ void btrfs_create_pending_block_groups(struct btrfs_trans_handle *trans) /* Already aborted the transaction if it failed. */ next: btrfs_dec_delayed_refs_rsv_bg_inserts(fs_info); + + spin_lock(&fs_info->unused_bgs_lock); list_del_init(&block_group->bg_list); clear_bit(BLOCK_GROUP_FLAG_NEW, &block_group->runtime_flags); + spin_unlock(&fs_info->unused_bgs_lock); /* * If the block group is still unused, add it to the list of