[RFC] btrfs-progs: map-logical: look at next leaf if slot > items
diff mbox

Message ID CA+X5Wn5iV-q3jfDBPjGgvjtO4j+DCs0nifjFsya6GfLG9eNYWA@mail.gmail.com
State New
Headers show

Commit Message

james harvey June 7, 2018, 3:33 a.m. UTC
goto again;
--
2.17.0
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Su Yue June 7, 2018, 4:20 a.m. UTC | #1
On 06/07/2018 11:33 AM, james harvey wrote:
> =====[ NOTE ]=====
> 
> I think I found a buffer over-read error that will come up other places, unless
> EACH caller checks bounds themselves.  See "Bonus bug, LEFT FOR READER" below.
> 
> =====[ PROBLEM ]=====
> 
> Using btrfs-progs v4.16:
> 
> No extent found at range [10955980800,10955984896)
> 
> But, this extent exists.  btrfs-debug-tree shows:
> ...
> extent tree key (EXTENT_TREE ROOT_ITEM 0)
> ...
> leaf 316456960 items 203 free space 3697 generation 84225 owner EXTENT_TREE
> ...
>         item 197 key (10955931648 EXTENT_ITEM 28672) itemoff 8957 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 198 key (10955960320 EXTENT_ITEM 4096) itemoff 8920 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 199 key (10955964416 EXTENT_ITEM 4096) itemoff 8883 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 200 key (10955968512 EXTENT_ITEM 4096) itemoff 8846 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 201 key (10955972608 EXTENT_ITEM 4096) itemoff 8809 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142655488 count 1
>         item 202 key (10955976704 EXTENT_ITEM 4096) itemoff 8772 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142655488 count 1
> ...
> leaf 316489728 items 208 free space 3387 generation 84225 owner EXTENT_TREE
> ...
>         item 0 key (10955980800 EXTENT_ITEM 4096) itemoff 16246 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 128958464 count 1
> ...
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> ...
>         item 412 key (EXTENT_CSUM EXTENT_CSUM 10955980800) itemoff 10647
>                    itemsize 4
>                 range start 10955980800 end 10955984896 length 4096
> ....
> file tree key (3038 ROOT_ITEM 80009)
> ...
> leaf 128958464 items 37 free space 6032 generation 62656 owner FS_TREE
> ...
>         item 1 key (997645 INODE_ITEM 0) itemoff 15220 itemsize 160
>                 generation 62656 transid 62656 size 4943 nbytes 8192
>                 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
>                 sequence 5 flags 0x0(none)
>                 atime 1478246492.0 (2016-11-04 08:01:32)
>                 ctime 1478246493.129060242 (2016-11-04 08:01:33)
>                 mtime 1477487995.0 (2016-10-26 13:19:55)
>                 otime 1478246493.129060242 (2016-11-04 08:01:33)
>         item 2 key (997645 INODE_REF 299949) itemoff 15200 itemsize 20
>                 index 13 namelen 10 name: as_map.hpp
>         item 3 key (997645 EXTENT_DATA 0) itemoff 15147 itemsize 53
>                 generation 62656 type 1 (regular)
>                 extent data disk byte 10955980800 nr 4096
>                 extent data offset 0 nr 8192 ram 8192
>                 extent compression 2 (lzo)
> ...
> 
> =====[ CAUSE ]=====
> 
> In main's first call to map_one_extent(10955980800, 4096, 0):
> 
> * btrfs_search_slot() looks for (10955980800, 0, 0), and:
> ** returns 1 because it doesn't exist
> ** sets path->slots[0] to 203 (for leaf 316456960), where it should go if
>    inserted (pointing after the last existing item)
> * ret is reset to 0
> * btrfs_item_key_to_cpu sets key to (10955960320, BTRFS_EXTENT_ITEM_KEY, 4096)
> !!! Bonus bug, LEFT FOR READER.  Why is this item #197, 5 items before the 203
>     given?  I think no bounds checking causes a buffer over-read here.
>     btrfs_item_key_to_cpu() calls btrfs_item_key(), which uses the macro
>     read_eb_member() to call read_extent_buffer() which memcpy's using an out
>     of range index, at least for this leaf.
> * With (!search_forward && key.objectid > logical) being false, the code calling
>   btrfs_next_item() is not run.
> * logical is set to this too-low key.objectid
> * !ret, so set *logical_ret and *len_ret with the new values
> 
> Back in main:
> 
> * ret is 0, so don't print the first error
> * ret is still 0, so don't run map_one_extent() with search_forward=1
> * At the while loop, (10955960320 + 4096 >= 10955980800 && 10955960320 <
>   10955980800 + 4096) (10955964416 >= 10955980800 && 10955960320 < 10955984896)
>   (false && true) (false), so don't call map_one_extent() with search_forward=1
>   here, either.
> * In the while loop, now call map_one_extent() with search_forward=1
> * !found, so print (somewhat deceptive) error only mentioning the user-given
>   logical without mentioning it looked elsewhere, and give up.
> 
> =====[ FIX ]=====
> 
> btrfs-map-logical and I are not friends.  The "least code" fix for this
> situation is this patch.
> 
> Qu's "btrfs-progs: check: Also compare data between mirrors to detect corruption
> for NODATASUM extents" uses a simpler way, which makes me wonder if that should
> just be modified to replace how btrfs-map-logical works.  But, I'll admit I do
> not have my head around the entire way everything is done in btrfs-map-logical,
> and there could be reasons for it needing to be different here.
> 
> This doesn't touch what I think is the buffer over-read error described above.
> 
> With this fix, btrfs_item_key_to_cpu() is not asked to read out of bounds, so
> map_one_extent() leaves cur_logical and cur_len unmodified because they don't
> need to be.  (Both the first time it's ran with search_forward=0, and in the
> while loop when it's ran with search_forward=1.)
> 
> Also updated call from btrfs_next_item() to btrfs_next_extent_item().  I can't
> see any reason not to, while this area's being modified.  It looks for both
> BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY, which is what we need.
> (Granted, inside, it's just calling btrfs_next_item().)
> 
Make sense. IMP the commit message is too long to read.
Code wise is almost fine. Some nitpicks are below.

> Also fixed misspelling of "foward" to "forward".
> 
> Signed-off-by: James Harvey <jamespharvey20@gmail.com>
> ---
>  btrfs-map-logical.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
> index 7a8bcff9..1c30b22b 100644
> --- a/btrfs-map-logical.c
> +++ b/btrfs-map-logical.c
> @@ -39,7 +39,7 @@
>  static FILE *info_file;
> 
>  static int map_one_extent(struct btrfs_fs_info *fs_info,
> -                         u64 *logical_ret, u64 *len_ret, int search_foward)
> +                         u64 *logical_ret, u64 *len_ret, int search_forward)
>  {
>         struct btrfs_path *path;
>         struct btrfs_key key;
> @@ -65,17 +65,25 @@ static int map_one_extent(struct btrfs_fs_info *fs_info,
>         BUG_ON(ret == 0);
>         ret = 0;
> 
> +       if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
> +               ret = btrfs_next_leaf(fs_info->extent_root, path);
> +               if (ret != 0) {
> +                       ret = -1;

btrfs_next_leaf() may return -EIO, so keep the negative return code is
prefered.
btrfs_next_leaf may return > 0, here I'd like to set ret=-ENOENT.
You can refer callers of btrfs_next_leaf() how to handle the return codes.

> +                       goto out;
> +               }
> +       }
> +
>  again:
>         btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> -       if ((search_foward && key.objectid < logical) ||
> -           (!search_foward && key.objectid > logical) ||
> +       if ((search_forward && key.objectid < logical) ||
> +           (!search_forward && key.objectid > logical) ||
>             (key.type != BTRFS_EXTENT_ITEM_KEY &&
>              key.type != BTRFS_METADATA_ITEM_KEY)) {
> -               if (!search_foward)
> +               if (!search_forward)
>                         ret = btrfs_previous_extent_item(fs_info->extent_root,
>                                                          path, 0);
>                 else
> -                       ret = btrfs_next_item(fs_info->extent_root, path);
> +                       ret =
Nit:
Misclick here?

Thanks,
Su

> btrfs_next_extent_item(fs_info->extent_root, path);
>                 if (ret)
>                         goto out;
>                 goto again;
> --
> 2.17.0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 


--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo June 7, 2018, 4:44 a.m. UTC | #2
On 2018年06月07日 11:33, james harvey wrote:
> =====[ NOTE ]=====
> 
> I think I found a buffer over-read error that will come up other places, unless
> EACH caller checks bounds themselves.  See "Bonus bug, LEFT FOR READER" below.
> 
> =====[ PROBLEM ]=====
> 
> Using btrfs-progs v4.16:
> 
> No extent found at range [10955980800,10955984896)
> 
> But, this extent exists.  btrfs-debug-tree shows:
> ...
> extent tree key (EXTENT_TREE ROOT_ITEM 0)
> ...
> leaf 316456960 items 203 free space 3697 generation 84225 owner EXTENT_TREE
> ...
>         item 197 key (10955931648 EXTENT_ITEM 28672) itemoff 8957 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 198 key (10955960320 EXTENT_ITEM 4096) itemoff 8920 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 199 key (10955964416 EXTENT_ITEM 4096) itemoff 8883 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 200 key (10955968512 EXTENT_ITEM 4096) itemoff 8846 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142622720 count 1
>         item 201 key (10955972608 EXTENT_ITEM 4096) itemoff 8809 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142655488 count 1
>         item 202 key (10955976704 EXTENT_ITEM 4096) itemoff 8772 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 142655488 count 1
> ...
> leaf 316489728 items 208 free space 3387 generation 84225 owner EXTENT_TREE
> ...
>         item 0 key (10955980800 EXTENT_ITEM 4096) itemoff 16246 itemsize 37
>                 refs 1 gen 62656 flags DATA
>                 shared data backref parent 128958464 count 1
> ...
> checksum tree key (CSUM_TREE ROOT_ITEM 0)
> ...
>         item 412 key (EXTENT_CSUM EXTENT_CSUM 10955980800) itemoff 10647
>                    itemsize 4
>                 range start 10955980800 end 10955984896 length 4096
> ....
> file tree key (3038 ROOT_ITEM 80009)
> ...
> leaf 128958464 items 37 free space 6032 generation 62656 owner FS_TREE
> ...
>         item 1 key (997645 INODE_ITEM 0) itemoff 15220 itemsize 160
>                 generation 62656 transid 62656 size 4943 nbytes 8192
>                 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
>                 sequence 5 flags 0x0(none)
>                 atime 1478246492.0 (2016-11-04 08:01:32)
>                 ctime 1478246493.129060242 (2016-11-04 08:01:33)
>                 mtime 1477487995.0 (2016-10-26 13:19:55)
>                 otime 1478246493.129060242 (2016-11-04 08:01:33)
>         item 2 key (997645 INODE_REF 299949) itemoff 15200 itemsize 20
>                 index 13 namelen 10 name: as_map.hpp
>         item 3 key (997645 EXTENT_DATA 0) itemoff 15147 itemsize 53
>                 generation 62656 type 1 (regular)
>                 extent data disk byte 10955980800 nr 4096
>                 extent data offset 0 nr 8192 ram 8192
>                 extent compression 2 (lzo)
> ...
> 
> =====[ CAUSE ]=====
> 
> In main's first call to map_one_extent(10955980800, 4096, 0):
> 
> * btrfs_search_slot() looks for (10955980800, 0, 0), and:
> ** returns 1 because it doesn't exist
> ** sets path->slots[0] to 203 (for leaf 316456960), where it should go if
>    inserted (pointing after the last existing item)
> * ret is reset to 0
> * btrfs_item_key_to_cpu sets key to (10955960320, BTRFS_EXTENT_ITEM_KEY, 4096)
> !!! Bonus bug, LEFT FOR READER.  Why is this item #197, 5 items before the 203
>     given?  I think no bounds checking causes a buffer over-read here.
>     btrfs_item_key_to_cpu() calls btrfs_item_key(), which uses the macro
>     read_eb_member() to call read_extent_buffer() which memcpy's using an out
>     of range index, at least for this leaf.

Here the key you got could be garbage.
Either the leaf has enough free space, then you got pending zero into
the key, or the leaf is full, you got some item data into the key.

Either way, the key should be read/trusted at all.

> * With (!search_forward && key.objectid > logical) being false, the code calling
>   btrfs_next_item() is not run.
> * logical is set to this too-low key.objectid
> * !ret, so set *logical_ret and *len_ret with the new values

Personally speaking, the first paragraph of dump tree could already
explain the bug pretty well.
So the explanation is a little longer here.

> 
> Back in main:
> 
> * ret is 0, so don't print the first error
> * ret is still 0, so don't run map_one_extent() with search_forward=1
> * At the while loop, (10955960320 + 4096 >= 10955980800 && 10955960320 <
>   10955980800 + 4096) (10955964416 >= 10955980800 && 10955960320 < 10955984896)
>   (false && true) (false), so don't call map_one_extent() with search_forward=1
>   here, either.
> * In the while loop, now call map_one_extent() with search_forward=1
> * !found, so print (somewhat deceptive) error only mentioning the user-given
>   logical without mentioning it looked elsewhere, and give up.
> 
> =====[ FIX ]=====
> 
> btrfs-map-logical and I are not friends.  The "least code" fix for this
> situation is this patch.
> 
> Qu's "btrfs-progs: check: Also compare data between mirrors to detect corruption
> for NODATASUM extents" uses a simpler way, which makes me wonder if that should
> just be modified to replace how btrfs-map-logical works.

That only works for certain cases.

Thus btrfs-map-logical is still a pretty handy tool for education/debug
purpose.

>  But, I'll admit I do
> not have my head around the entire way everything is done in btrfs-map-logical,
> and there could be reasons for it needing to be different here.
> 
> This doesn't touch what I think is the buffer over-read error described above.
> 
> With this fix, btrfs_item_key_to_cpu() is not asked to read out of bounds, so
> map_one_extent() leaves cur_logical and cur_len unmodified because they don't
> need to be.  (Both the first time it's ran with search_forward=0, and in the
> while loop when it's ran with search_forward=1.)
> 
> Also updated call from btrfs_next_item() to btrfs_next_extent_item().  I can't
> see any reason not to, while this area's being modified.  It looks for both
> BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY, which is what we need.
> (Granted, inside, it's just calling btrfs_next_item().)
> 
> Also fixed misspelling of "foward" to "forward".
> 
> Signed-off-by: James Harvey <jamespharvey20@gmail.com>
> ---
>  btrfs-map-logical.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
> 
> diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
> index 7a8bcff9..1c30b22b 100644
> --- a/btrfs-map-logical.c
> +++ b/btrfs-map-logical.c
> @@ -39,7 +39,7 @@
>  static FILE *info_file;
> 
>  static int map_one_extent(struct btrfs_fs_info *fs_info,
> -                         u64 *logical_ret, u64 *len_ret, int search_foward)
> +                         u64 *logical_ret, u64 *len_ret, int search_forward)

OK, I checked twice and then I realized it's a typo...

Normally David would request you to split the patch, and put the typo
fix into a separate patch, to avoid jamming the real fix.

>  {
>         struct btrfs_path *path;
>         struct btrfs_key key;
> @@ -65,17 +65,25 @@ static int map_one_extent(struct btrfs_fs_info *fs_info,
>         BUG_ON(ret == 0);
>         ret = 0;
> 
> +       if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
> +               ret = btrfs_next_leaf(fs_info->extent_root, path);
> +               if (ret != 0) {
> +                       ret = -1;
> +                       goto out;
> +               }
> +       }
> +

Yes, we need such check especially when we search using key smaller than
what we really want.

So the fix here is correct.

And that's also why normally we search with larger key, and then use
btrfs_previous_item() to avoid such problem.


Thanks,
Qu

>  again:
>         btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
> -       if ((search_foward && key.objectid < logical) ||
> -           (!search_foward && key.objectid > logical) ||
> +       if ((search_forward && key.objectid < logical) ||
> +           (!search_forward && key.objectid > logical) ||
>             (key.type != BTRFS_EXTENT_ITEM_KEY &&
>              key.type != BTRFS_METADATA_ITEM_KEY)) {
> -               if (!search_foward)
> +               if (!search_forward)
>                         ret = btrfs_previous_extent_item(fs_info->extent_root,
>                                                          path, 0);
>                 else
> -                       ret = btrfs_next_item(fs_info->extent_root, path);
> +                       ret =
> btrfs_next_extent_item(fs_info->extent_root, path);
>                 if (ret)
>                         goto out;
>                 goto again;
> --
> 2.17.0
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>
Qu Wenruo June 7, 2018, 4:46 a.m. UTC | #3
On 2018年06月07日 12:20, Su Yue wrote:
> 
> 
> On 06/07/2018 11:33 AM, james harvey wrote:
>> =====[ NOTE ]=====
>>
>> I think I found a buffer over-read error that will come up other places, unless
>> EACH caller checks bounds themselves.  See "Bonus bug, LEFT FOR READER" below.
>>
>> =====[ PROBLEM ]=====
>>
>> Using btrfs-progs v4.16:
>>
>> No extent found at range [10955980800,10955984896)
>>
>> But, this extent exists.  btrfs-debug-tree shows:
>> ...
>> extent tree key (EXTENT_TREE ROOT_ITEM 0)
>> ...
>> leaf 316456960 items 203 free space 3697 generation 84225 owner EXTENT_TREE
>> ...
>>         item 197 key (10955931648 EXTENT_ITEM 28672) itemoff 8957 itemsize 37
>>                 refs 1 gen 62656 flags DATA
>>                 shared data backref parent 142622720 count 1
>>         item 198 key (10955960320 EXTENT_ITEM 4096) itemoff 8920 itemsize 37
>>                 refs 1 gen 62656 flags DATA
>>                 shared data backref parent 142622720 count 1
>>         item 199 key (10955964416 EXTENT_ITEM 4096) itemoff 8883 itemsize 37
>>                 refs 1 gen 62656 flags DATA
>>                 shared data backref parent 142622720 count 1
>>         item 200 key (10955968512 EXTENT_ITEM 4096) itemoff 8846 itemsize 37
>>                 refs 1 gen 62656 flags DATA
>>                 shared data backref parent 142622720 count 1
>>         item 201 key (10955972608 EXTENT_ITEM 4096) itemoff 8809 itemsize 37
>>                 refs 1 gen 62656 flags DATA
>>                 shared data backref parent 142655488 count 1
>>         item 202 key (10955976704 EXTENT_ITEM 4096) itemoff 8772 itemsize 37
>>                 refs 1 gen 62656 flags DATA
>>                 shared data backref parent 142655488 count 1
>> ...
>> leaf 316489728 items 208 free space 3387 generation 84225 owner EXTENT_TREE
>> ...
>>         item 0 key (10955980800 EXTENT_ITEM 4096) itemoff 16246 itemsize 37
>>                 refs 1 gen 62656 flags DATA
>>                 shared data backref parent 128958464 count 1
>> ...
>> checksum tree key (CSUM_TREE ROOT_ITEM 0)
>> ...
>>         item 412 key (EXTENT_CSUM EXTENT_CSUM 10955980800) itemoff 10647
>>                    itemsize 4
>>                 range start 10955980800 end 10955984896 length 4096
>> ....
>> file tree key (3038 ROOT_ITEM 80009)
>> ...
>> leaf 128958464 items 37 free space 6032 generation 62656 owner FS_TREE
>> ...
>>         item 1 key (997645 INODE_ITEM 0) itemoff 15220 itemsize 160
>>                 generation 62656 transid 62656 size 4943 nbytes 8192
>>                 block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
>>                 sequence 5 flags 0x0(none)
>>                 atime 1478246492.0 (2016-11-04 08:01:32)
>>                 ctime 1478246493.129060242 (2016-11-04 08:01:33)
>>                 mtime 1477487995.0 (2016-10-26 13:19:55)
>>                 otime 1478246493.129060242 (2016-11-04 08:01:33)
>>         item 2 key (997645 INODE_REF 299949) itemoff 15200 itemsize 20
>>                 index 13 namelen 10 name: as_map.hpp
>>         item 3 key (997645 EXTENT_DATA 0) itemoff 15147 itemsize 53
>>                 generation 62656 type 1 (regular)
>>                 extent data disk byte 10955980800 nr 4096
>>                 extent data offset 0 nr 8192 ram 8192
>>                 extent compression 2 (lzo)
>> ...
>>
>> =====[ CAUSE ]=====
>>
>> In main's first call to map_one_extent(10955980800, 4096, 0):
>>
>> * btrfs_search_slot() looks for (10955980800, 0, 0), and:
>> ** returns 1 because it doesn't exist
>> ** sets path->slots[0] to 203 (for leaf 316456960), where it should go if
>>    inserted (pointing after the last existing item)
>> * ret is reset to 0
>> * btrfs_item_key_to_cpu sets key to (10955960320, BTRFS_EXTENT_ITEM_KEY, 4096)
>> !!! Bonus bug, LEFT FOR READER.  Why is this item #197, 5 items before the 203
>>     given?  I think no bounds checking causes a buffer over-read here.
>>     btrfs_item_key_to_cpu() calls btrfs_item_key(), which uses the macro
>>     read_eb_member() to call read_extent_buffer() which memcpy's using an out
>>     of range index, at least for this leaf.
>> * With (!search_forward && key.objectid > logical) being false, the code calling
>>   btrfs_next_item() is not run.
>> * logical is set to this too-low key.objectid
>> * !ret, so set *logical_ret and *len_ret with the new values
>>
>> Back in main:
>>
>> * ret is 0, so don't print the first error
>> * ret is still 0, so don't run map_one_extent() with search_forward=1
>> * At the while loop, (10955960320 + 4096 >= 10955980800 && 10955960320 <
>>   10955980800 + 4096) (10955964416 >= 10955980800 && 10955960320 < 10955984896)
>>   (false && true) (false), so don't call map_one_extent() with search_forward=1
>>   here, either.
>> * In the while loop, now call map_one_extent() with search_forward=1
>> * !found, so print (somewhat deceptive) error only mentioning the user-given
>>   logical without mentioning it looked elsewhere, and give up.
>>
>> =====[ FIX ]=====
>>
>> btrfs-map-logical and I are not friends.  The "least code" fix for this
>> situation is this patch.
>>
>> Qu's "btrfs-progs: check: Also compare data between mirrors to detect corruption
>> for NODATASUM extents" uses a simpler way, which makes me wonder if that should
>> just be modified to replace how btrfs-map-logical works.  But, I'll admit I do
>> not have my head around the entire way everything is done in btrfs-map-logical,
>> and there could be reasons for it needing to be different here.
>>
>> This doesn't touch what I think is the buffer over-read error described above.
>>
>> With this fix, btrfs_item_key_to_cpu() is not asked to read out of bounds, so
>> map_one_extent() leaves cur_logical and cur_len unmodified because they don't
>> need to be.  (Both the first time it's ran with search_forward=0, and in the
>> while loop when it's ran with search_forward=1.)
>>
>> Also updated call from btrfs_next_item() to btrfs_next_extent_item().  I can't
>> see any reason not to, while this area's being modified.  It looks for both
>> BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY, which is what we need.
>> (Granted, inside, it's just calling btrfs_next_item().)
>>
> Make sense. IMP the commit message is too long to read.
> Code wise is almost fine. Some nitpicks are below.
> 
>> Also fixed misspelling of "foward" to "forward".
>>
>> Signed-off-by: James Harvey <jamespharvey20@gmail.com>
>> ---
>>  btrfs-map-logical.c | 18 +++++++++++++-----
>>  1 file changed, 13 insertions(+), 5 deletions(-)
>>
>> diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
>> index 7a8bcff9..1c30b22b 100644
>> --- a/btrfs-map-logical.c
>> +++ b/btrfs-map-logical.c
>> @@ -39,7 +39,7 @@
>>  static FILE *info_file;
>>
>>  static int map_one_extent(struct btrfs_fs_info *fs_info,
>> -                         u64 *logical_ret, u64 *len_ret, int search_foward)
>> +                         u64 *logical_ret, u64 *len_ret, int search_forward)
>>  {
>>         struct btrfs_path *path;
>>         struct btrfs_key key;
>> @@ -65,17 +65,25 @@ static int map_one_extent(struct btrfs_fs_info *fs_info,
>>         BUG_ON(ret == 0);
>>         ret = 0;
>>
>> +       if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
>> +               ret = btrfs_next_leaf(fs_info->extent_root, path);
>> +               if (ret != 0) {
>> +                       ret = -1;
> 
> btrfs_next_leaf() may return -EIO, so keep the negative return code is
> prefered.
> btrfs_next_leaf may return > 0, here I'd like to set ret=-ENOENT.
> You can refer callers of btrfs_next_leaf() how to handle the return codes.
> 
>> +                       goto out;
>> +               }
>> +       }
>> +
>>  again:
>>         btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
>> -       if ((search_foward && key.objectid < logical) ||
>> -           (!search_foward && key.objectid > logical) ||
>> +       if ((search_forward && key.objectid < logical) ||
>> +           (!search_forward && key.objectid > logical) ||
>>             (key.type != BTRFS_EXTENT_ITEM_KEY &&
>>              key.type != BTRFS_METADATA_ITEM_KEY)) {
>> -               if (!search_foward)
>> +               if (!search_forward)
>>                         ret = btrfs_previous_extent_item(fs_info->extent_root,
>>                                                          path, 0);
>>                 else
>> -                       ret = btrfs_next_item(fs_info->extent_root, path);
>> +                       ret =
> Nit:
> Misclick here?

That would be your (and my) mailer's fault.

If there is a real change line, it will be a leading '+'.

Thanks,
Qu

> 
> Thanks,
> Su
> 
>> btrfs_next_extent_item(fs_info->extent_root, path);
>>                 if (ret)
>>                         goto out;
>>                 goto again;
>> --
>> 2.17.0
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
>> the body of a message to majordomo@vger.kernel.org
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>
>>
> 
> 
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
james harvey June 7, 2018, 7:19 a.m. UTC | #4
On Thu, Jun 7, 2018 at 12:20 AM, Su Yue <suy.fnst@cn.fujitsu.com> wrote:
> On 06/07/2018 11:33 AM, james harvey wrote:
>> Using btrfs-progs v4.16:
>>
>> No extent found at range [10955980800,10955984896)
>>
>> But, this extent exists.  btrfs-debug-tree shows:
> Make sense. IMP the commit message is too long to read.
> Code wise is almost fine. Some nitpicks are below.

Thanks for response.  I'll try to work on that.

>> +       if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
>> +               ret = btrfs_next_leaf(fs_info->extent_root, path);
>> +               if (ret != 0) {
>> +                       ret = -1;
>
> btrfs_next_leaf() may return -EIO, so keep the negative return code is
> prefered.
> btrfs_next_leaf may return > 0, here I'd like to set ret=-ENOENT.
> You can refer callers of btrfs_next_leaf() how to handle the return codes.

Fixed in v2.

>>                 else
>> -                       ret = btrfs_next_item(fs_info->extent_root, path);
>> +                       ret =
> Nit:
> Misclick here?

Missed semicolon being 81st character, fixed in v2.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
james harvey June 7, 2018, 7:57 p.m. UTC | #5
"On Thu, Jun 7, 2018 at 3:26 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> On 2018年06月07日 15:19, james harvey wrote:
>> On Thu, Jun 7, 2018 at 12:44 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>> On 2018年06月07日 11:33, james harvey wrote:
>>>> * btrfs_item_key_to_cpu sets key to (10955960320, BTRFS_EXTENT_ITEM_KEY, 4096)
>>>> !!! Bonus bug, LEFT FOR READER.  Why is this item #197, 5 items before the 203
>>>>     given?  I think no bounds checking causes a buffer over-read here.
>>>>     btrfs_item_key_to_cpu() calls btrfs_item_key(), which uses the macro
>>>>     read_eb_member() to call read_extent_buffer() which memcpy's using an out
>>>>     of range index, at least for this leaf.
>>>
>>> Here the key you got could be garbage.
>>> Either the leaf has enough free space, then you got pending zero into
>>> the key, or the leaf is full, you got some item data into the key.
>>>
>>> Either way, the key should be read/trusted at all.
>>
>> Is the group's preference to leave things like bounds checking to the
>> caller?
>
> The problem here is, btrfs_search_slot() is not only called for
> read_only search, but also for insert.
>
> For insert usage, such returned path is completely valid and in fact
> could be pretty useful.

I don't mean for btrfs_search_slot() behavior to change, since it is
called for inserting.  I mean btrfs_item_key_to_cpu() behavior could
change.

>>  (I get the merit to that, avoiding redundant checks.)  My
>> normal style would be to have read_extent_buffer() fail if start + len
>> is out of bounds.
>
> In this case, it's not out-of-boundary at all, and the invalid key you
> read out if from the last item, which could contain a valid key.
>
> For leaf, btrfs puts btrfs items pointers at the head, and the items
> data at the tail.
>
> Thus for this case, your invalid key slot is just in the middle of the
> leaf, making read_extent_buffer() unable to detect anything wrong.
>
> Thanks,
> Qu

You're right.  read_extent_buffer() doesn't have the information to
look at this.

And, "buffer over-read error" was probably the wrong term.  I mean in
the sense that it has 203 valid values, and is being asked for the
204'th item.

btrfs_item_key() could error on

(nr > btrfs_header_nritems(eb))

This defensive programming would cause redundant checks, if the caller
checks this too as they are supposed to, so I understand if the group
doesn't swing that far on defensive programming.

Although, perhaps in such situation, btrfs_item_key() could
automatically call btrfs_next_leaf().  I'm thinking this might not be
desired behavior everywhere, but just in case it is, mentioning it
because then btrfs_item_key() could handle all of this, not allowing
the caller to shoot their own foot, without redundant checks.
--
To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Qu Wenruo June 8, 2018, 1:21 a.m. UTC | #6
On 2018年06月08日 03:57, james harvey wrote:
> "On Thu, Jun 7, 2018 at 3:26 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>> On 2018年06月07日 15:19, james harvey wrote:
>>> On Thu, Jun 7, 2018 at 12:44 AM, Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>> On 2018年06月07日 11:33, james harvey wrote:
>>>>> * btrfs_item_key_to_cpu sets key to (10955960320, BTRFS_EXTENT_ITEM_KEY, 4096)
>>>>> !!! Bonus bug, LEFT FOR READER.  Why is this item #197, 5 items before the 203
>>>>>     given?  I think no bounds checking causes a buffer over-read here.
>>>>>     btrfs_item_key_to_cpu() calls btrfs_item_key(), which uses the macro
>>>>>     read_eb_member() to call read_extent_buffer() which memcpy's using an out
>>>>>     of range index, at least for this leaf.
>>>>
>>>> Here the key you got could be garbage.
>>>> Either the leaf has enough free space, then you got pending zero into
>>>> the key, or the leaf is full, you got some item data into the key.
>>>>
>>>> Either way, the key should be read/trusted at all.
>>>
>>> Is the group's preference to leave things like bounds checking to the
>>> caller?
>>
>> The problem here is, btrfs_search_slot() is not only called for
>> read_only search, but also for insert.
>>
>> For insert usage, such returned path is completely valid and in fact
>> could be pretty useful.
> 
> I don't mean for btrfs_search_slot() behavior to change, since it is
> called for inserting.  I mean btrfs_item_key_to_cpu() behavior could
> change.

Makes sense!

An nice point to enhance!

> 
>>>  (I get the merit to that, avoiding redundant checks.)  My
>>> normal style would be to have read_extent_buffer() fail if start + len
>>> is out of bounds.
>>
>> In this case, it's not out-of-boundary at all, and the invalid key you
>> read out if from the last item, which could contain a valid key.
>>
>> For leaf, btrfs puts btrfs items pointers at the head, and the items
>> data at the tail.
>>
>> Thus for this case, your invalid key slot is just in the middle of the
>> leaf, making read_extent_buffer() unable to detect anything wrong.
>>
>> Thanks,
>> Qu
> 
> You're right.  read_extent_buffer() doesn't have the information to
> look at this.
> 
> And, "buffer over-read error" was probably the wrong term.  I mean in
> the sense that it has 203 valid values, and is being asked for the
> 204'th item.
> 
> btrfs_item_key() could error on
> 
> (nr > btrfs_header_nritems(eb))
> 
> This defensive programming would cause redundant checks, if the caller
> checks this too as they are supposed to, so I understand if the group
> doesn't swing that far on defensive programming.
> 
> Although, perhaps in such situation, btrfs_item_key() could
> automatically call btrfs_next_leaf().

This introduce extra return value, and for things like btrfs_item_key(),
we need to modify tons of call sites.

But at least, I could check if doing extra check (mostly a WARN_ON()) in
btrfs_item_key() is valid.

Thanks,
Qu

>  I'm thinking this might not be
> desired behavior everywhere, but just in case it is, mentioning it
> because then btrfs_item_key() could handle all of this, not allowing
> the caller to shoot their own foot, without redundant checks.
> --
> To unsubscribe from this list: send the line "unsubscribe linux-btrfs" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>

Patch
diff mbox

=====[ NOTE ]=====

I think I found a buffer over-read error that will come up other places, unless
EACH caller checks bounds themselves.  See "Bonus bug, LEFT FOR READER" below.

=====[ PROBLEM ]=====

Using btrfs-progs v4.16:

No extent found at range [10955980800,10955984896)

But, this extent exists.  btrfs-debug-tree shows:
...
extent tree key (EXTENT_TREE ROOT_ITEM 0)
...
leaf 316456960 items 203 free space 3697 generation 84225 owner EXTENT_TREE
...
        item 197 key (10955931648 EXTENT_ITEM 28672) itemoff 8957 itemsize 37
                refs 1 gen 62656 flags DATA
                shared data backref parent 142622720 count 1
        item 198 key (10955960320 EXTENT_ITEM 4096) itemoff 8920 itemsize 37
                refs 1 gen 62656 flags DATA
                shared data backref parent 142622720 count 1
        item 199 key (10955964416 EXTENT_ITEM 4096) itemoff 8883 itemsize 37
                refs 1 gen 62656 flags DATA
                shared data backref parent 142622720 count 1
        item 200 key (10955968512 EXTENT_ITEM 4096) itemoff 8846 itemsize 37
                refs 1 gen 62656 flags DATA
                shared data backref parent 142622720 count 1
        item 201 key (10955972608 EXTENT_ITEM 4096) itemoff 8809 itemsize 37
                refs 1 gen 62656 flags DATA
                shared data backref parent 142655488 count 1
        item 202 key (10955976704 EXTENT_ITEM 4096) itemoff 8772 itemsize 37
                refs 1 gen 62656 flags DATA
                shared data backref parent 142655488 count 1
...
leaf 316489728 items 208 free space 3387 generation 84225 owner EXTENT_TREE
...
        item 0 key (10955980800 EXTENT_ITEM 4096) itemoff 16246 itemsize 37
                refs 1 gen 62656 flags DATA
                shared data backref parent 128958464 count 1
...
checksum tree key (CSUM_TREE ROOT_ITEM 0)
...
        item 412 key (EXTENT_CSUM EXTENT_CSUM 10955980800) itemoff 10647
                   itemsize 4
                range start 10955980800 end 10955984896 length 4096
....
file tree key (3038 ROOT_ITEM 80009)
...
leaf 128958464 items 37 free space 6032 generation 62656 owner FS_TREE
...
        item 1 key (997645 INODE_ITEM 0) itemoff 15220 itemsize 160
                generation 62656 transid 62656 size 4943 nbytes 8192
                block group 0 mode 100644 links 1 uid 0 gid 0 rdev 0
                sequence 5 flags 0x0(none)
                atime 1478246492.0 (2016-11-04 08:01:32)
                ctime 1478246493.129060242 (2016-11-04 08:01:33)
                mtime 1477487995.0 (2016-10-26 13:19:55)
                otime 1478246493.129060242 (2016-11-04 08:01:33)
        item 2 key (997645 INODE_REF 299949) itemoff 15200 itemsize 20
                index 13 namelen 10 name: as_map.hpp
        item 3 key (997645 EXTENT_DATA 0) itemoff 15147 itemsize 53
                generation 62656 type 1 (regular)
                extent data disk byte 10955980800 nr 4096
                extent data offset 0 nr 8192 ram 8192
                extent compression 2 (lzo)
...

=====[ CAUSE ]=====

In main's first call to map_one_extent(10955980800, 4096, 0):

* btrfs_search_slot() looks for (10955980800, 0, 0), and:
** returns 1 because it doesn't exist
** sets path->slots[0] to 203 (for leaf 316456960), where it should go if
   inserted (pointing after the last existing item)
* ret is reset to 0
* btrfs_item_key_to_cpu sets key to (10955960320, BTRFS_EXTENT_ITEM_KEY, 4096)
!!! Bonus bug, LEFT FOR READER.  Why is this item #197, 5 items before the 203
    given?  I think no bounds checking causes a buffer over-read here.
    btrfs_item_key_to_cpu() calls btrfs_item_key(), which uses the macro
    read_eb_member() to call read_extent_buffer() which memcpy's using an out
    of range index, at least for this leaf.
* With (!search_forward && key.objectid > logical) being false, the code calling
  btrfs_next_item() is not run.
* logical is set to this too-low key.objectid
* !ret, so set *logical_ret and *len_ret with the new values

Back in main:

* ret is 0, so don't print the first error
* ret is still 0, so don't run map_one_extent() with search_forward=1
* At the while loop, (10955960320 + 4096 >= 10955980800 && 10955960320 <
  10955980800 + 4096) (10955964416 >= 10955980800 && 10955960320 < 10955984896)
  (false && true) (false), so don't call map_one_extent() with search_forward=1
  here, either.
* In the while loop, now call map_one_extent() with search_forward=1
* !found, so print (somewhat deceptive) error only mentioning the user-given
  logical without mentioning it looked elsewhere, and give up.

=====[ FIX ]=====

btrfs-map-logical and I are not friends.  The "least code" fix for this
situation is this patch.

Qu's "btrfs-progs: check: Also compare data between mirrors to detect corruption
for NODATASUM extents" uses a simpler way, which makes me wonder if that should
just be modified to replace how btrfs-map-logical works.  But, I'll admit I do
not have my head around the entire way everything is done in btrfs-map-logical,
and there could be reasons for it needing to be different here.

This doesn't touch what I think is the buffer over-read error described above.

With this fix, btrfs_item_key_to_cpu() is not asked to read out of bounds, so
map_one_extent() leaves cur_logical and cur_len unmodified because they don't
need to be.  (Both the first time it's ran with search_forward=0, and in the
while loop when it's ran with search_forward=1.)

Also updated call from btrfs_next_item() to btrfs_next_extent_item().  I can't
see any reason not to, while this area's being modified.  It looks for both
BTRFS_EXTENT_ITEM_KEY and BTRFS_METADATA_ITEM_KEY, which is what we need.
(Granted, inside, it's just calling btrfs_next_item().)

Also fixed misspelling of "foward" to "forward".

Signed-off-by: James Harvey <jamespharvey20@gmail.com>
---
 btrfs-map-logical.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/btrfs-map-logical.c b/btrfs-map-logical.c
index 7a8bcff9..1c30b22b 100644
--- a/btrfs-map-logical.c
+++ b/btrfs-map-logical.c
@@ -39,7 +39,7 @@ 
 static FILE *info_file;

 static int map_one_extent(struct btrfs_fs_info *fs_info,
-                         u64 *logical_ret, u64 *len_ret, int search_foward)
+                         u64 *logical_ret, u64 *len_ret, int search_forward)
 {
        struct btrfs_path *path;
        struct btrfs_key key;
@@ -65,17 +65,25 @@  static int map_one_extent(struct btrfs_fs_info *fs_info,
        BUG_ON(ret == 0);
        ret = 0;

+       if (path->slots[0] >= btrfs_header_nritems(path->nodes[0])) {
+               ret = btrfs_next_leaf(fs_info->extent_root, path);
+               if (ret != 0) {
+                       ret = -1;
+                       goto out;
+               }
+       }
+
 again:
        btrfs_item_key_to_cpu(path->nodes[0], &key, path->slots[0]);
-       if ((search_foward && key.objectid < logical) ||
-           (!search_foward && key.objectid > logical) ||
+       if ((search_forward && key.objectid < logical) ||
+           (!search_forward && key.objectid > logical) ||
            (key.type != BTRFS_EXTENT_ITEM_KEY &&
             key.type != BTRFS_METADATA_ITEM_KEY)) {
-               if (!search_foward)
+               if (!search_forward)
                        ret = btrfs_previous_extent_item(fs_info->extent_root,
                                                         path, 0);
                else
-                       ret = btrfs_next_item(fs_info->extent_root, path);
+                       ret =
btrfs_next_extent_item(fs_info->extent_root, path);
                if (ret)
                        goto out;