diff mbox series

[v2,02/12] btrfs-progs: do not infinite loop on corrupt keys with lowmem mode

Message ID aaaf2cadf66d9e573e2dbcc3e8fab7984ce42f99.1629322156.git.josef@toxicpanda.com (mailing list archive)
State New, archived
Headers show
Series btrfs-progs: make check handle invalid bg items | expand

Commit Message

Josef Bacik Aug. 18, 2021, 9:33 p.m. UTC
By enabling the lowmem checks properly I uncovered the case where test
007 will infinite loop at the detection stage.  This is because when
checking the inode item we will just btrfs_next_item(), and because we
ignore check tree block failures at read time we don't get an -EIO from
btrfs_next_leaf.  Generally what check usually does is validate the
leaves/nodes as we hit them, but in this case we're not doing that.  Fix
this by checking the leaf if we move to the next one and if it fails
bail.  This allows us to pass the 007 test with lowmem.

Signed-off-by: Josef Bacik <josef@toxicpanda.com>
---
 check/mode-lowmem.c | 9 +++++++++
 1 file changed, 9 insertions(+)

Comments

Qu Wenruo Aug. 19, 2021, 5:42 a.m. UTC | #1
On 2021/8/19 上午5:33, Josef Bacik wrote:
> By enabling the lowmem checks properly I uncovered the case where test
> 007 will infinite loop at the detection stage.  This is because when
> checking the inode item we will just btrfs_next_item(), and because we
> ignore check tree block failures at read time we don't get an -EIO from
> btrfs_next_leaf.  Generally what check usually does is validate the
> leaves/nodes as we hit them, but in this case we're not doing that.  Fix
> this by checking the leaf if we move to the next one and if it fails
> bail.  This allows us to pass the 007 test with lowmem.

Doesn't this mean btrfs_next_item() is not doing what it should do?

Normally we would expect btrfs_next_item() to return -EIO other than
manually checking the returned leaf.

Thanks,
Qu
>
> Signed-off-by: Josef Bacik <josef@toxicpanda.com>
> ---
>   check/mode-lowmem.c | 9 +++++++++
>   1 file changed, 9 insertions(+)
>
> diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
> index 323e66bc..7fc7d467 100644
> --- a/check/mode-lowmem.c
> +++ b/check/mode-lowmem.c
> @@ -2675,6 +2675,15 @@ static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path)
>   	while (1) {
>   		btrfs_item_key_to_cpu(path->nodes[0], &last_key, path->slots[0]);
>   		ret = btrfs_next_item(root, path);
> +
> +		/*
> +		 * New leaf, we need to check it and see if it's valid, if not
> +		 * we need to bail otherwise we could end up stuck.
> +		 */
> +		if (path->slots[0] == 0 &&
> +		    btrfs_check_leaf(gfs_info, NULL, path->nodes[0]))
> +			ret = -EIO;
> +
>   		if (ret < 0) {
>   			/* out will fill 'err' rusing current statistics */
>   			goto out;
>
David Sterba Aug. 23, 2021, 3:04 p.m. UTC | #2
On Thu, Aug 19, 2021 at 01:42:39PM +0800, Qu Wenruo wrote:
> 
> 
> On 2021/8/19 上午5:33, Josef Bacik wrote:
> > By enabling the lowmem checks properly I uncovered the case where test
> > 007 will infinite loop at the detection stage.  This is because when
> > checking the inode item we will just btrfs_next_item(), and because we
> > ignore check tree block failures at read time we don't get an -EIO from
> > btrfs_next_leaf.  Generally what check usually does is validate the
> > leaves/nodes as we hit them, but in this case we're not doing that.  Fix
> > this by checking the leaf if we move to the next one and if it fails
> > bail.  This allows us to pass the 007 test with lowmem.
> 
> Doesn't this mean btrfs_next_item() is not doing what it should do?
> 
> Normally we would expect btrfs_next_item() to return -EIO other than
> manually checking the returned leaf.

That's an interesting point, I think we rely on that behaviour
elsewhere too.
Josef Bacik Aug. 23, 2021, 6:44 p.m. UTC | #3
On 8/23/21 11:04 AM, David Sterba wrote:
> On Thu, Aug 19, 2021 at 01:42:39PM +0800, Qu Wenruo wrote:
>>
>>
>> On 2021/8/19 上午5:33, Josef Bacik wrote:
>>> By enabling the lowmem checks properly I uncovered the case where test
>>> 007 will infinite loop at the detection stage.  This is because when
>>> checking the inode item we will just btrfs_next_item(), and because we
>>> ignore check tree block failures at read time we don't get an -EIO from
>>> btrfs_next_leaf.  Generally what check usually does is validate the
>>> leaves/nodes as we hit them, but in this case we're not doing that.  Fix
>>> this by checking the leaf if we move to the next one and if it fails
>>> bail.  This allows us to pass the 007 test with lowmem.
>>
>> Doesn't this mean btrfs_next_item() is not doing what it should do?
>>
>> Normally we would expect btrfs_next_item() to return -EIO other than
>> manually checking the returned leaf.
> 
> That's an interesting point, I think we rely on that behaviour
> elsewhere too.
> 

This is the result of how we deal with corrupt blocks.  We will happily 
read corrupt blocks with check, because we expect check to do it's own 
btrfs_check_node/btrfs_check_leaf().  The problem here is that if the 
block is corrupt it's still in cache, and so btrfs_next_leaf() will 
return it because the buffer is marked uptodate.

It looks like search does the extra check_block() specifically to catch 
this case, so I'll fix next_leaf to do the same thing as well.  Thanks,

Josef
Qu Wenruo Aug. 23, 2021, 11:34 p.m. UTC | #4
On 2021/8/24 上午2:44, Josef Bacik wrote:
> On 8/23/21 11:04 AM, David Sterba wrote:
>> On Thu, Aug 19, 2021 at 01:42:39PM +0800, Qu Wenruo wrote:
>>>
>>>
>>> On 2021/8/19 上午5:33, Josef Bacik wrote:
>>>> By enabling the lowmem checks properly I uncovered the case where test
>>>> 007 will infinite loop at the detection stage.  This is because when
>>>> checking the inode item we will just btrfs_next_item(), and because we
>>>> ignore check tree block failures at read time we don't get an -EIO from
>>>> btrfs_next_leaf.  Generally what check usually does is validate the
>>>> leaves/nodes as we hit them, but in this case we're not doing that.
>>>> Fix
>>>> this by checking the leaf if we move to the next one and if it fails
>>>> bail.  This allows us to pass the 007 test with lowmem.
>>>
>>> Doesn't this mean btrfs_next_item() is not doing what it should do?
>>>
>>> Normally we would expect btrfs_next_item() to return -EIO other than
>>> manually checking the returned leaf.
>>
>> That's an interesting point, I think we rely on that behaviour
>> elsewhere too.
>>
>
> This is the result of how we deal with corrupt blocks.  We will happily
> read corrupt blocks with check, because we expect check to do it's own
> btrfs_check_node/btrfs_check_leaf().  The problem here is that if the
> block is corrupt it's still in cache, and so btrfs_next_leaf() will
> return it because the buffer is marked uptodate.

Shouldn't we prevent the corrupted block from entering the cache?

>
> It looks like search does the extra check_block() specifically to catch
> this case, so I'll fix next_leaf to do the same thing as well.  Thanks,

OK for now I think it's fine to have the extra check.

It won't cause any harm even if we solved the cache problem in the future.

Thanks,
Qu

>
> Josef
diff mbox series

Patch

diff --git a/check/mode-lowmem.c b/check/mode-lowmem.c
index 323e66bc..7fc7d467 100644
--- a/check/mode-lowmem.c
+++ b/check/mode-lowmem.c
@@ -2675,6 +2675,15 @@  static int check_inode_item(struct btrfs_root *root, struct btrfs_path *path)
 	while (1) {
 		btrfs_item_key_to_cpu(path->nodes[0], &last_key, path->slots[0]);
 		ret = btrfs_next_item(root, path);
+
+		/*
+		 * New leaf, we need to check it and see if it's valid, if not
+		 * we need to bail otherwise we could end up stuck.
+		 */
+		if (path->slots[0] == 0 &&
+		    btrfs_check_leaf(gfs_info, NULL, path->nodes[0]))
+			ret = -EIO;
+
 		if (ret < 0) {
 			/* out will fill 'err' rusing current statistics */
 			goto out;