diff mbox

btrfs: extent-tree: Check if the newly reserved tree block is already in use

Message ID 20180717074658.22331-1-wqu@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Qu Wenruo July 17, 2018, 7:46 a.m. UTC
[BUG]
For certain fuzzed btrfs image, if we create any csum data, it would
cause the following kernel warning and deadlock when trying to update
csum tree:
------
[  278.113360] WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
[  278.113737] CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
[  278.113745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
[  278.113753] Workqueue: btrfs-endio-write btrfs_endio_write_helper
[  278.113761] RIP: 0010:btrfs_tree_lock+0x3e2/0x400
[  278.113762] Code: 00 48 c7 40 08 00 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 20 48 81 c4 a0 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b e9 d4 fc ff ff 0f 0b e9 61 ff ff ff e8 ab f4 87 ff 90 66 2e
[  278.113818] RSP: 0018:ffff8801f407f488 EFLAGS: 00010246
[  278.113865] Call Trace:
[  278.113936]  btrfs_alloc_tree_block+0x39f/0x770
[  278.113988]  __btrfs_cow_block+0x285/0x9e0
[  278.114029]  btrfs_cow_block+0x191/0x2e0
[  278.114035]  btrfs_search_slot+0x492/0x1160
[  278.114146]  btrfs_lookup_csum+0xec/0x280
[  278.114182]  btrfs_csum_file_blocks+0x2be/0xa60
[  278.114232]  add_pending_csums+0xaf/0xf0
[  278.114238]  btrfs_finish_ordered_io+0x74b/0xc90
[  278.114281]  finish_ordered_fn+0x15/0x20
[  278.114285]  normal_work_helper+0xf6/0x500
[  278.114305]  btrfs_endio_write_helper+0x12/0x20
[  278.114310]  process_one_work+0x302/0x770
[  278.114315]  worker_thread+0x81/0x6d0
[  278.114321]  kthread+0x180/0x1d0
[  278.114334]  ret_from_fork+0x35/0x40
[  278.114339] ---[ end trace 2e85051acb5f6dc1 ]---
------

[CAUSE]
The fuzzed image has corrupted EXTENT_ITEM for csum tree root:
------
extent tree key (EXTENT_TREE ROOT_ITEM 0)
	item 4 key (29364224 METADATA_ITEM 0) itemoff 3857 itemsize 33
		refs 1 gen 6 flags TREE_BLOCK
		tree block skinny level 0
		tree block backref root UUID_TREE
	item 5 key (29376512 UNKNOWN.0 0) itemoff 3824 itemsize 33
		    ^^^^^^^^^^^^^^^^^^^^ Corrupted METADATA_ITEM
	item 6 key (29380608 METADATA_ITEM 0) itemoff 3791 itemsize 33
		refs 1 gen 4 flags TREE_BLOCK
		tree block skinny level 0
		tree block backref root DATA_RELOC_TREE

checksum tree key (CSUM_TREE ROOT_ITEM 0)
leaf 29376512 items 0 free space 3995 generation 4 owner CSUM_TREE
     ^^^^^^^^ bytenr matches above item.
------

So when btrfs_alloc_tree_blocks() calls btrfs_reserve_extent(), since
there is not METADATA_ITEM/EXTENT_ITEM for bytenr 29376512, btrfs thinks
it's free space, and reserve it.

However in fact it's already been used by csum tree, and later
btrfs_init_new_buffer() will try to call btrfs_tree_lock(), whose
WARN_ON() detects lock nest on the same extent buffer.

Finally the wait_event() on the eb->read/write_lock_wq will never exit
since we're holding the lock by ourselves and deadlock.

[FIX]
The fix here is to ensure at least the reserved extent buffer is not
cached.
Any used extent buffer should be cached in the global radix tree
(fs_info->buffer_radix).

So before calling btrfs_init_new_buffer() in btrfs_alloc_tree_block(),
we call find_extent_buffer() explicitly to verify it's not used by
ourselves.

Please note this is just a basic check, it is not and will never be as
good as btrfs check on detecting extent tree corruption, but at least we
won't dead lock so easily.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
Reported-by: Xu Wen <wen.xu@gatech.edu>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 fs/btrfs/extent-tree.c | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Nikolay Borisov July 17, 2018, 8:01 a.m. UTC | #1
On 17.07.2018 10:46, Qu Wenruo wrote:
> [BUG]
> For certain fuzzed btrfs image, if we create any csum data, it would
> cause the following kernel warning and deadlock when trying to update
> csum tree:
> ------
> [  278.113360] WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
> [  278.113737] CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
> [  278.113745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
> [  278.113753] Workqueue: btrfs-endio-write btrfs_endio_write_helper
> [  278.113761] RIP: 0010:btrfs_tree_lock+0x3e2/0x400
> [  278.113762] Code: 00 48 c7 40 08 00 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 20 48 81 c4 a0 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b e9 d4 fc ff ff 0f 0b e9 61 ff ff ff e8 ab f4 87 ff 90 66 2e
> [  278.113818] RSP: 0018:ffff8801f407f488 EFLAGS: 00010246
> [  278.113865] Call Trace:
> [  278.113936]  btrfs_alloc_tree_block+0x39f/0x770
> [  278.113988]  __btrfs_cow_block+0x285/0x9e0
> [  278.114029]  btrfs_cow_block+0x191/0x2e0
> [  278.114035]  btrfs_search_slot+0x492/0x1160
> [  278.114146]  btrfs_lookup_csum+0xec/0x280
> [  278.114182]  btrfs_csum_file_blocks+0x2be/0xa60
> [  278.114232]  add_pending_csums+0xaf/0xf0
> [  278.114238]  btrfs_finish_ordered_io+0x74b/0xc90
> [  278.114281]  finish_ordered_fn+0x15/0x20
> [  278.114285]  normal_work_helper+0xf6/0x500
> [  278.114305]  btrfs_endio_write_helper+0x12/0x20
> [  278.114310]  process_one_work+0x302/0x770
> [  278.114315]  worker_thread+0x81/0x6d0
> [  278.114321]  kthread+0x180/0x1d0
> [  278.114334]  ret_from_fork+0x35/0x40
> [  278.114339] ---[ end trace 2e85051acb5f6dc1 ]---
> ------
> 
> [CAUSE]
> The fuzzed image has corrupted EXTENT_ITEM for csum tree root:
> ------
> extent tree key (EXTENT_TREE ROOT_ITEM 0)
> 	item 4 key (29364224 METADATA_ITEM 0) itemoff 3857 itemsize 33
> 		refs 1 gen 6 flags TREE_BLOCK
> 		tree block skinny level 0
> 		tree block backref root UUID_TREE
> 	item 5 key (29376512 UNKNOWN.0 0) itemoff 3824 itemsize 33
> 		    ^^^^^^^^^^^^^^^^^^^^ Corrupted METADATA_ITEM
> 	item 6 key (29380608 METADATA_ITEM 0) itemoff 3791 itemsize 33
> 		refs 1 gen 4 flags TREE_BLOCK
> 		tree block skinny level 0
> 		tree block backref root DATA_RELOC_TREE
> 
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> leaf 29376512 items 0 free space 3995 generation 4 owner CSUM_TREE
>      ^^^^^^^^ bytenr matches above item.
> ------
> 
> So when btrfs_alloc_tree_blocks() calls btrfs_reserve_extent(), since
> there is not METADATA_ITEM/EXTENT_ITEM for bytenr 29376512, btrfs thinks
> it's free space, and reserve it.
> 
> However in fact it's already been used by csum tree, and later
> btrfs_init_new_buffer() will try to call btrfs_tree_lock(), whose
> WARN_ON() detects lock nest on the same extent buffer.
> 
> Finally the wait_event() on the eb->read/write_lock_wq will never exit
> since we're holding the lock by ourselves and deadlock.
> 
> [FIX]
> The fix here is to ensure at least the reserved extent buffer is not
> cached.
> Any used extent buffer should be cached in the global radix tree
> (fs_info->buffer_radix).
> 
> So before calling btrfs_init_new_buffer() in btrfs_alloc_tree_block(),
> we call find_extent_buffer() explicitly to verify it's not used by
> ourselves.
> 
> Please note this is just a basic check, it is not and will never be as
> good as btrfs check on detecting extent tree corruption, but at least we
> won't dead lock so easily.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
> Reported-by: Xu Wen <wen.xu@gatech.edu>
> Signed-off-by: Qu Wenruo <wqu@suse.com>
> ---
>  fs/btrfs/extent-tree.c | 14 ++++++++++++++
>  1 file changed, 14 insertions(+)
> 
> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
> index 3578fa5b30ef..782dd96b7c5e 100644
> --- a/fs/btrfs/extent-tree.c
> +++ b/fs/btrfs/extent-tree.c
> @@ -8435,6 +8435,20 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>  	if (ret)
>  		goto out_unuse;
>  
> +	/*
> +	 * Newly allocated tree block should never be cached in radix tree,
> +	 * Or we have a corrupted extent tree.
> +	 */
> +	buf = find_extent_buffer(fs_info, ins.objectid);
> +	if (buf) {
> +		btrfs_err_rl(fs_info,
> +	"tree block %llu is already in use, extent tree may be corrupted",
> +			     ins.objectid);
> +		ret = -EUCLEAN;
> +		free_extent_buffer(buf);
> +		goto out_unuse;
> +	}

The code makes sense but I have the feeling it needs to have some sort
of assert guard because this check will likely trigger only on severly
corrupted filesystemd and yet we introduce it for everyone.

> +
>  	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
>  	if (IS_ERR(buf)) {
>  		ret = PTR_ERR(buf);
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Su Yue July 17, 2018, 8:11 a.m. UTC | #2
On 07/17/2018 04:01 PM, Nikolay Borisov wrote:
> 
> 
> On 17.07.2018 10:46, Qu Wenruo wrote:
>> [BUG]
>> For certain fuzzed btrfs image, if we create any csum data, it would
>> cause the following kernel warning and deadlock when trying to update
>> csum tree:
>> ------
>> [  278.113360] WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
>> [  278.113737] CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
>> [  278.113745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
>> [  278.113753] Workqueue: btrfs-endio-write btrfs_endio_write_helper
>> [  278.113761] RIP: 0010:btrfs_tree_lock+0x3e2/0x400
>> [  278.113762] Code: 00 48 c7 40 08 00 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 20 48 81 c4 a0 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b e9 d4 fc ff ff 0f 0b e9 61 ff ff ff e8 ab f4 87 ff 90 66 2e
>> [  278.113818] RSP: 0018:ffff8801f407f488 EFLAGS: 00010246
>> [  278.113865] Call Trace:
>> [  278.113936]  btrfs_alloc_tree_block+0x39f/0x770
>> [  278.113988]  __btrfs_cow_block+0x285/0x9e0
>> [  278.114029]  btrfs_cow_block+0x191/0x2e0
>> [  278.114035]  btrfs_search_slot+0x492/0x1160
>> [  278.114146]  btrfs_lookup_csum+0xec/0x280
>> [  278.114182]  btrfs_csum_file_blocks+0x2be/0xa60
>> [  278.114232]  add_pending_csums+0xaf/0xf0
>> [  278.114238]  btrfs_finish_ordered_io+0x74b/0xc90
>> [  278.114281]  finish_ordered_fn+0x15/0x20
>> [  278.114285]  normal_work_helper+0xf6/0x500
>> [  278.114305]  btrfs_endio_write_helper+0x12/0x20
>> [  278.114310]  process_one_work+0x302/0x770
>> [  278.114315]  worker_thread+0x81/0x6d0
>> [  278.114321]  kthread+0x180/0x1d0
>> [  278.114334]  ret_from_fork+0x35/0x40
>> [  278.114339] ---[ end trace 2e85051acb5f6dc1 ]---
>> ------
>>
>> [CAUSE]
>> The fuzzed image has corrupted EXTENT_ITEM for csum tree root:
>> ------
>> extent tree key (EXTENT_TREE ROOT_ITEM 0)
>> 	item 4 key (29364224 METADATA_ITEM 0) itemoff 3857 itemsize 33
>> 		refs 1 gen 6 flags TREE_BLOCK
>> 		tree block skinny level 0
>> 		tree block backref root UUID_TREE
>> 	item 5 key (29376512 UNKNOWN.0 0) itemoff 3824 itemsize 33
>> 		    ^^^^^^^^^^^^^^^^^^^^ Corrupted METADATA_ITEM
>> 	item 6 key (29380608 METADATA_ITEM 0) itemoff 3791 itemsize 33
>> 		refs 1 gen 4 flags TREE_BLOCK
>> 		tree block skinny level 0
>> 		tree block backref root DATA_RELOC_TREE
>>
>> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> leaf 29376512 items 0 free space 3995 generation 4 owner CSUM_TREE
>>       ^^^^^^^^ bytenr matches above item.
>> ------
>>
>> So when btrfs_alloc_tree_blocks() calls btrfs_reserve_extent(), since
>> there is not METADATA_ITEM/EXTENT_ITEM for bytenr 29376512, btrfs thinks
>> it's free space, and reserve it.
>>
>> However in fact it's already been used by csum tree, and later
>> btrfs_init_new_buffer() will try to call btrfs_tree_lock(), whose
>> WARN_ON() detects lock nest on the same extent buffer.
>>
>> Finally the wait_event() on the eb->read/write_lock_wq will never exit
>> since we're holding the lock by ourselves and deadlock.
>>
>> [FIX]
>> The fix here is to ensure at least the reserved extent buffer is not
>> cached.
>> Any used extent buffer should be cached in the global radix tree
>> (fs_info->buffer_radix).
>>
>> So before calling btrfs_init_new_buffer() in btrfs_alloc_tree_block(),
>> we call find_extent_buffer() explicitly to verify it's not used by
>> ourselves.
>>
>> Please note this is just a basic check, it is not and will never be as
>> good as btrfs check on detecting extent tree corruption, but at least we
>> won't dead lock so easily.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>   fs/btrfs/extent-tree.c | 14 ++++++++++++++
>>   1 file changed, 14 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 3578fa5b30ef..782dd96b7c5e 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -8435,6 +8435,20 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>   	if (ret)
>>   		goto out_unuse;
>>   
>> +	/*
>> +	 * Newly allocated tree block should never be cached in radix tree,
>> +	 * Or we have a corrupted extent tree.
>> +	 */
>> +	buf = find_extent_buffer(fs_info, ins.objectid);
>> +	if (buf) {
>> +		btrfs_err_rl(fs_info,
>> +	"tree block %llu is already in use, extent tree may be corrupted",
>> +			     ins.objectid);
>> +		ret = -EUCLEAN;
>> +		free_extent_buffer(buf);
>> +		goto out_unuse;
>> +	}
> 
> The code makes sense but I have the feeling it needs to have some sort
> of assert guard because this check will likely trigger only on severly
> corrupted filesystemd and yet we introduce it for everyone.
> 
Agreed, the function is called so frequently.
>> +
>>   	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
>>   	if (IS_ERR(buf)) {
>>   		ret = PTR_ERR(buf);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo July 17, 2018, 8:24 a.m. UTC | #3
On 2018年07月17日 16:01, Nikolay Borisov wrote:
> 
> 
> On 17.07.2018 10:46, Qu Wenruo wrote:
>> [BUG]
>> For certain fuzzed btrfs image, if we create any csum data, it would
>> cause the following kernel warning and deadlock when trying to update
>> csum tree:
>> ------
>> [  278.113360] WARNING: CPU: 1 PID: 41 at fs/btrfs/locking.c:230 btrfs_tree_lock+0x3e2/0x400
>> [  278.113737] CPU: 1 PID: 41 Comm: kworker/u4:1 Not tainted 4.18.0-rc1+ #8
>> [  278.113745] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu1 04/01/2014
>> [  278.113753] Workqueue: btrfs-endio-write btrfs_endio_write_helper
>> [  278.113761] RIP: 0010:btrfs_tree_lock+0x3e2/0x400
>> [  278.113762] Code: 00 48 c7 40 08 00 00 00 00 48 8b 45 d0 65 48 33 04 25 28 00 00 00 75 20 48 81 c4 a0 00 00 00 5b 41 5c 41 5d 41 5e 41 5f 5d c3 <0f> 0b e9 d4 fc ff ff 0f 0b e9 61 ff ff ff e8 ab f4 87 ff 90 66 2e
>> [  278.113818] RSP: 0018:ffff8801f407f488 EFLAGS: 00010246
>> [  278.113865] Call Trace:
>> [  278.113936]  btrfs_alloc_tree_block+0x39f/0x770
>> [  278.113988]  __btrfs_cow_block+0x285/0x9e0
>> [  278.114029]  btrfs_cow_block+0x191/0x2e0
>> [  278.114035]  btrfs_search_slot+0x492/0x1160
>> [  278.114146]  btrfs_lookup_csum+0xec/0x280
>> [  278.114182]  btrfs_csum_file_blocks+0x2be/0xa60
>> [  278.114232]  add_pending_csums+0xaf/0xf0
>> [  278.114238]  btrfs_finish_ordered_io+0x74b/0xc90
>> [  278.114281]  finish_ordered_fn+0x15/0x20
>> [  278.114285]  normal_work_helper+0xf6/0x500
>> [  278.114305]  btrfs_endio_write_helper+0x12/0x20
>> [  278.114310]  process_one_work+0x302/0x770
>> [  278.114315]  worker_thread+0x81/0x6d0
>> [  278.114321]  kthread+0x180/0x1d0
>> [  278.114334]  ret_from_fork+0x35/0x40
>> [  278.114339] ---[ end trace 2e85051acb5f6dc1 ]---
>> ------
>>
>> [CAUSE]
>> The fuzzed image has corrupted EXTENT_ITEM for csum tree root:
>> ------
>> extent tree key (EXTENT_TREE ROOT_ITEM 0)
>> 	item 4 key (29364224 METADATA_ITEM 0) itemoff 3857 itemsize 33
>> 		refs 1 gen 6 flags TREE_BLOCK
>> 		tree block skinny level 0
>> 		tree block backref root UUID_TREE
>> 	item 5 key (29376512 UNKNOWN.0 0) itemoff 3824 itemsize 33
>> 		    ^^^^^^^^^^^^^^^^^^^^ Corrupted METADATA_ITEM
>> 	item 6 key (29380608 METADATA_ITEM 0) itemoff 3791 itemsize 33
>> 		refs 1 gen 4 flags TREE_BLOCK
>> 		tree block skinny level 0
>> 		tree block backref root DATA_RELOC_TREE
>>
>> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> leaf 29376512 items 0 free space 3995 generation 4 owner CSUM_TREE
>>      ^^^^^^^^ bytenr matches above item.
>> ------
>>
>> So when btrfs_alloc_tree_blocks() calls btrfs_reserve_extent(), since
>> there is not METADATA_ITEM/EXTENT_ITEM for bytenr 29376512, btrfs thinks
>> it's free space, and reserve it.
>>
>> However in fact it's already been used by csum tree, and later
>> btrfs_init_new_buffer() will try to call btrfs_tree_lock(), whose
>> WARN_ON() detects lock nest on the same extent buffer.
>>
>> Finally the wait_event() on the eb->read/write_lock_wq will never exit
>> since we're holding the lock by ourselves and deadlock.
>>
>> [FIX]
>> The fix here is to ensure at least the reserved extent buffer is not
>> cached.
>> Any used extent buffer should be cached in the global radix tree
>> (fs_info->buffer_radix).
>>
>> So before calling btrfs_init_new_buffer() in btrfs_alloc_tree_block(),
>> we call find_extent_buffer() explicitly to verify it's not used by
>> ourselves.
>>
>> Please note this is just a basic check, it is not and will never be as
>> good as btrfs check on detecting extent tree corruption, but at least we
>> won't dead lock so easily.
>>
>> Link: https://bugzilla.kernel.org/show_bug.cgi?id=200405
>> Reported-by: Xu Wen <wen.xu@gatech.edu>
>> Signed-off-by: Qu Wenruo <wqu@suse.com>
>> ---
>>  fs/btrfs/extent-tree.c | 14 ++++++++++++++
>>  1 file changed, 14 insertions(+)
>>
>> diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
>> index 3578fa5b30ef..782dd96b7c5e 100644
>> --- a/fs/btrfs/extent-tree.c
>> +++ b/fs/btrfs/extent-tree.c
>> @@ -8435,6 +8435,20 @@ struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
>>  	if (ret)
>>  		goto out_unuse;
>>  
>> +	/*
>> +	 * Newly allocated tree block should never be cached in radix tree,
>> +	 * Or we have a corrupted extent tree.
>> +	 */
>> +	buf = find_extent_buffer(fs_info, ins.objectid);
>> +	if (buf) {
>> +		btrfs_err_rl(fs_info,
>> +	"tree block %llu is already in use, extent tree may be corrupted",
>> +			     ins.objectid);
>> +		ret = -EUCLEAN;
>> +		free_extent_buffer(buf);
>> +		goto out_unuse;
>> +	}
> 
> The code makes sense but I have the feeling it needs to have some sort
> of assert guard because this check will likely trigger only on severly
> corrupted filesystemd and yet we introduce it for everyone.

And it's causing problem for certain test cases.
Please ignore this (at least for now).

But on the other hand, we indeed have a lot of reports on corrupted
extent tree, it's possible to hit some corrupted extent tree (Su is
already exhausted by the corrupted tree reported by Marc)

So I'm not completely fine with current extent tree error handling.
I'll try to find some balance in next version.

Thanks,
Qu
> 
>> +
>>  	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
>>  	if (IS_ERR(buf)) {
>>  		ret = PTR_ERR(buf);
>>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Nikolay Borisov July 17, 2018, 8:28 a.m. UTC | #4
On 17.07.2018 11:24, Qu Wenruo wrote:
> And it's causing problem for certain test cases.
> Please ignore this (at least for now).
> 
> But on the other hand, we indeed have a lot of reports on corrupted
> extent tree, it's possible to hit some corrupted extent tree (Su is
> already exhausted by the corrupted tree reported by Marc)
> 
> So I'm not completely fine with current extent tree error handling.
> I'll try to find some balance in next version.


I agree we need a better OVERALL error handling/detection. Your
tree-checker work IMO is a step in the right direction. What I want is
to prevent ad-hoc checks being sprinkled in the code. Sorry, but that's
not fine. The thing with working on a lot of corruption reports is the
fact each one of them is looked at in isolation so it produces isolated
fixes. Whereas if a step back is taken and the overall error
handling/detection is considered it might turn out a whole class of
corruption could be detected by a single change, otherwise checks upon
checks will be added which just add technical debt.

Considering this, I'm more in favor of extending the tree-checker to be
the central place where errors are detected (of course this is easier
said than done).
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo July 17, 2018, 8:33 a.m. UTC | #5
On 2018年07月17日 16:28, Nikolay Borisov wrote:
> 
> 
> On 17.07.2018 11:24, Qu Wenruo wrote:
>> And it's causing problem for certain test cases.
>> Please ignore this (at least for now).
>>
>> But on the other hand, we indeed have a lot of reports on corrupted
>> extent tree, it's possible to hit some corrupted extent tree (Su is
>> already exhausted by the corrupted tree reported by Marc)
>>
>> So I'm not completely fine with current extent tree error handling.
>> I'll try to find some balance in next version.
> 
> 
> I agree we need a better OVERALL error handling/detection. Your
> tree-checker work IMO is a step in the right direction. What I want is
> to prevent ad-hoc checks being sprinkled in the code.

Yep, I also don't like what I'm doing.
But the problem and the limit of tree-checker is, it's static check.

For things doing tons of cross-check like extent tree, it's not really
as useful.

> Sorry, but that's
> not fine. The thing with working on a lot of corruption reports is the
> fact each one of them is looked at in isolation so it produces isolated
> fixes. Whereas if a step back is taken and the overall error
> handling/detection is considered it might turn out a whole class of
> corruption could be detected by a single change, otherwise checks upon
> checks will be added which just add technical debt.
> 
> Considering this, I'm more in favor of extending the tree-checker to be
> the central place where errors are detected (of course this is easier
> said than done).

For this report itself, tree checker can detect it indirectly by reject
the leaf for the unknown key type.

But one can easily create a valid image by just removing the valid
METADATA_ITEM, and tree-checker can't do anything to detect the problem.

So unfortunately, we will eventually need some runtime check anyway.

Thanks,
Qu

> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/fs/btrfs/extent-tree.c b/fs/btrfs/extent-tree.c
index 3578fa5b30ef..782dd96b7c5e 100644
--- a/fs/btrfs/extent-tree.c
+++ b/fs/btrfs/extent-tree.c
@@ -8435,6 +8435,20 @@  struct extent_buffer *btrfs_alloc_tree_block(struct btrfs_trans_handle *trans,
 	if (ret)
 		goto out_unuse;
 
+	/*
+	 * Newly allocated tree block should never be cached in radix tree,
+	 * Or we have a corrupted extent tree.
+	 */
+	buf = find_extent_buffer(fs_info, ins.objectid);
+	if (buf) {
+		btrfs_err_rl(fs_info,
+	"tree block %llu is already in use, extent tree may be corrupted",
+			     ins.objectid);
+		ret = -EUCLEAN;
+		free_extent_buffer(buf);
+		goto out_unuse;
+	}
+
 	buf = btrfs_init_new_buffer(trans, root, ins.objectid, level);
 	if (IS_ERR(buf)) {
 		ret = PTR_ERR(buf);