diff mbox series

[2/3] btrfs: print extent buffers when sibling keys check fails

Message ID 14f4089a9d26606c7a15e398536ca75f9c484c9c.1682505780.git.fdmanana@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: improve sibling keys check failure report | expand

Commit Message

Filipe Manana April 26, 2023, 10:51 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When trying to move keys from one node/leaf to another sibling node/leaf,
if the sibling keys check fails we just print an error message with the
last key of the left sibling and the first key of the right sibling.
However it's also useful to print all the keys of each sibling, as it
may provide some clues to what went wrong, which code path may be
inserting keys in an incorrect order. So just do that, print the siblings
with btrfs_print_tree(), as it works for both leaves and nodes.

Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/ctree.c | 4 ++++
 1 file changed, 4 insertions(+)

Comments

Qu Wenruo April 26, 2023, 11:20 a.m. UTC | #1
On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> When trying to move keys from one node/leaf to another sibling node/leaf,
> if the sibling keys check fails we just print an error message with the
> last key of the left sibling and the first key of the right sibling.
> However it's also useful to print all the keys of each sibling, as it
> may provide some clues to what went wrong, which code path may be
> inserting keys in an incorrect order. So just do that, print the siblings
> with btrfs_print_tree(), as it works for both leaves and nodes.
> 
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Qu Wenruo <wqu@suse.com>

However my concern is, printing two tree blocks may be a little too large.
Definitely not something can be capture by one screen.

Although dumping single tree block is already too large for a single 
screen, I don't have any better way...

Thanks,
Qu
> ---
>   fs/btrfs/ctree.c | 4 ++++
>   1 file changed, 4 insertions(+)
> 
> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> index a0b97a6d075a..a061d7092369 100644
> --- a/fs/btrfs/ctree.c
> +++ b/fs/btrfs/ctree.c
> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
>   	}
>   
>   	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> +		btrfs_crit(left->fs_info, "left extent buffer:");
> +		btrfs_print_tree(left, false);
> +		btrfs_crit(left->fs_info, "right extent buffer:");
> +		btrfs_print_tree(right, false);
>   		btrfs_crit(left->fs_info,
>   "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
>   			   left_last.objectid, left_last.type,
Filipe Manana April 26, 2023, 11:31 a.m. UTC | #2
On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When trying to move keys from one node/leaf to another sibling node/leaf,
> > if the sibling keys check fails we just print an error message with the
> > last key of the left sibling and the first key of the right sibling.
> > However it's also useful to print all the keys of each sibling, as it
> > may provide some clues to what went wrong, which code path may be
> > inserting keys in an incorrect order. So just do that, print the siblings
> > with btrfs_print_tree(), as it works for both leaves and nodes.
> >
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
>
> Reviewed-by: Qu Wenruo <wqu@suse.com>
>
> However my concern is, printing two tree blocks may be a little too large.
> Definitely not something can be capture by one screen.

What?
If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...

I suppose by "capture by one screen" you mean a screenshot?

>
> Although dumping single tree block is already too large for a single
> screen, I don't have any better way...
>
> Thanks,
> Qu
> > ---
> >   fs/btrfs/ctree.c | 4 ++++
> >   1 file changed, 4 insertions(+)
> >
> > diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> > index a0b97a6d075a..a061d7092369 100644
> > --- a/fs/btrfs/ctree.c
> > +++ b/fs/btrfs/ctree.c
> > @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
> >       }
> >
> >       if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> > +             btrfs_crit(left->fs_info, "left extent buffer:");
> > +             btrfs_print_tree(left, false);
> > +             btrfs_crit(left->fs_info, "right extent buffer:");
> > +             btrfs_print_tree(right, false);
> >               btrfs_crit(left->fs_info,
> >   "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
> >                          left_last.objectid, left_last.type,
Qu Wenruo April 26, 2023, 11:40 a.m. UTC | #3
On 2023/4/26 19:31, Filipe Manana wrote:
> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>
>>
>>
>> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When trying to move keys from one node/leaf to another sibling node/leaf,
>>> if the sibling keys check fails we just print an error message with the
>>> last key of the left sibling and the first key of the right sibling.
>>> However it's also useful to print all the keys of each sibling, as it
>>> may provide some clues to what went wrong, which code path may be
>>> inserting keys in an incorrect order. So just do that, print the siblings
>>> with btrfs_print_tree(), as it works for both leaves and nodes.
>>>
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>
>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>
>> However my concern is, printing two tree blocks may be a little too large.
>> Definitely not something can be capture by one screen.
> 
> What?
> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
> 
> I suppose by "capture by one screen" you mean a screenshot?

Yep, mostly a phone shot of a physical monitor, which a lot of end users 
choose to use to report their initial trans abort.

E.g. 
https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/

I guess the reason is, if the trans abort happens on the root fs, there 
will be no persistent log files anyway.
(Although it's still possible to pass the dmesg to another machine or go 
netconsole, but not everyone has such setup ready).

Thanks,
Qu
> 
>>
>> Although dumping single tree block is already too large for a single
>> screen, I don't have any better way...
>>
>> Thanks,
>> Qu
>>> ---
>>>    fs/btrfs/ctree.c | 4 ++++
>>>    1 file changed, 4 insertions(+)
>>>
>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>> index a0b97a6d075a..a061d7092369 100644
>>> --- a/fs/btrfs/ctree.c
>>> +++ b/fs/btrfs/ctree.c
>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
>>>        }
>>>
>>>        if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
>>> +             btrfs_crit(left->fs_info, "left extent buffer:");
>>> +             btrfs_print_tree(left, false);
>>> +             btrfs_crit(left->fs_info, "right extent buffer:");
>>> +             btrfs_print_tree(right, false);
>>>                btrfs_crit(left->fs_info,
>>>    "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
>>>                           left_last.objectid, left_last.type,
Filipe Manana April 26, 2023, 11:44 a.m. UTC | #4
On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote:
>
>
>
> On 2023/4/26 19:31, Filipe Manana wrote:
> > On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>
> >>
> >>
> >> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> >>> From: Filipe Manana <fdmanana@suse.com>
> >>>
> >>> When trying to move keys from one node/leaf to another sibling node/leaf,
> >>> if the sibling keys check fails we just print an error message with the
> >>> last key of the left sibling and the first key of the right sibling.
> >>> However it's also useful to print all the keys of each sibling, as it
> >>> may provide some clues to what went wrong, which code path may be
> >>> inserting keys in an incorrect order. So just do that, print the siblings
> >>> with btrfs_print_tree(), as it works for both leaves and nodes.
> >>>
> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>
> >> Reviewed-by: Qu Wenruo <wqu@suse.com>
> >>
> >> However my concern is, printing two tree blocks may be a little too large.
> >> Definitely not something can be capture by one screen.
> >
> > What?
> > If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
> >
> > I suppose by "capture by one screen" you mean a screenshot?
>
> Yep, mostly a phone shot of a physical monitor, which a lot of end users
> choose to use to report their initial trans abort.
>
> E.g.
> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/
>
> I guess the reason is, if the trans abort happens on the root fs, there
> will be no persistent log files anyway.
> (Although it's still possible to pass the dmesg to another machine or go
> netconsole, but not everyone has such setup ready).

So what?
I don't see how that invalidates printing extent buffers... Many users
are able to access dmesg/syslog
and paste what they get there... It's also useful for development
where we can certainly access everything...


>
> Thanks,
> Qu
> >
> >>
> >> Although dumping single tree block is already too large for a single
> >> screen, I don't have any better way...
> >>
> >> Thanks,
> >> Qu
> >>> ---
> >>>    fs/btrfs/ctree.c | 4 ++++
> >>>    1 file changed, 4 insertions(+)
> >>>
> >>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >>> index a0b97a6d075a..a061d7092369 100644
> >>> --- a/fs/btrfs/ctree.c
> >>> +++ b/fs/btrfs/ctree.c
> >>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
> >>>        }
> >>>
> >>>        if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> >>> +             btrfs_crit(left->fs_info, "left extent buffer:");
> >>> +             btrfs_print_tree(left, false);
> >>> +             btrfs_crit(left->fs_info, "right extent buffer:");
> >>> +             btrfs_print_tree(right, false);
> >>>                btrfs_crit(left->fs_info,
> >>>    "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
> >>>                           left_last.objectid, left_last.type,
Qu Wenruo April 26, 2023, 11:50 a.m. UTC | #5
On 2023/4/26 19:44, Filipe Manana wrote:
> On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote:
>>
>>
>>
>> On 2023/4/26 19:31, Filipe Manana wrote:
>>> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>>>>
>>>>
>>>>
>>>> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
>>>>> From: Filipe Manana <fdmanana@suse.com>
>>>>>
>>>>> When trying to move keys from one node/leaf to another sibling node/leaf,
>>>>> if the sibling keys check fails we just print an error message with the
>>>>> last key of the left sibling and the first key of the right sibling.
>>>>> However it's also useful to print all the keys of each sibling, as it
>>>>> may provide some clues to what went wrong, which code path may be
>>>>> inserting keys in an incorrect order. So just do that, print the siblings
>>>>> with btrfs_print_tree(), as it works for both leaves and nodes.
>>>>>
>>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>>>
>>>> Reviewed-by: Qu Wenruo <wqu@suse.com>
>>>>
>>>> However my concern is, printing two tree blocks may be a little too large.
>>>> Definitely not something can be capture by one screen.
>>>
>>> What?
>>> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
>>>
>>> I suppose by "capture by one screen" you mean a screenshot?
>>
>> Yep, mostly a phone shot of a physical monitor, which a lot of end users
>> choose to use to report their initial trans abort.
>>
>> E.g.
>> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/
>>
>> I guess the reason is, if the trans abort happens on the root fs, there
>> will be no persistent log files anyway.
>> (Although it's still possible to pass the dmesg to another machine or go
>> netconsole, but not everyone has such setup ready).
> 
> So what?
> I don't see how that invalidates printing extent buffers... Many users
> are able to access dmesg/syslog
> and paste what they get there... It's also useful for development
> where we can certainly access everything...

That's why I gave the reviewed-by tag.

Maybe we can shrink the output to the first several and last several 
items in the future.

But for now since it's just a trans abort, it should be mostly fine for 
the end uesrs to access the full dump.

Thanks,
Qu
> 
> 
>>
>> Thanks,
>> Qu
>>>
>>>>
>>>> Although dumping single tree block is already too large for a single
>>>> screen, I don't have any better way...
>>>>
>>>> Thanks,
>>>> Qu
>>>>> ---
>>>>>     fs/btrfs/ctree.c | 4 ++++
>>>>>     1 file changed, 4 insertions(+)
>>>>>
>>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
>>>>> index a0b97a6d075a..a061d7092369 100644
>>>>> --- a/fs/btrfs/ctree.c
>>>>> +++ b/fs/btrfs/ctree.c
>>>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
>>>>>         }
>>>>>
>>>>>         if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
>>>>> +             btrfs_crit(left->fs_info, "left extent buffer:");
>>>>> +             btrfs_print_tree(left, false);
>>>>> +             btrfs_crit(left->fs_info, "right extent buffer:");
>>>>> +             btrfs_print_tree(right, false);
>>>>>                 btrfs_crit(left->fs_info,
>>>>>     "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
>>>>>                            left_last.objectid, left_last.type,
Filipe Manana April 26, 2023, noon UTC | #6
On Wed, Apr 26, 2023 at 12:50 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> On 2023/4/26 19:44, Filipe Manana wrote:
> > On Wed, Apr 26, 2023 at 12:40 PM Qu Wenruo <wqu@suse.com> wrote:
> >>
> >>
> >>
> >> On 2023/4/26 19:31, Filipe Manana wrote:
> >>> On Wed, Apr 26, 2023 at 12:21 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> >>>>
> >>>>
> >>>>
> >>>> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> >>>>> From: Filipe Manana <fdmanana@suse.com>
> >>>>>
> >>>>> When trying to move keys from one node/leaf to another sibling node/leaf,
> >>>>> if the sibling keys check fails we just print an error message with the
> >>>>> last key of the left sibling and the first key of the right sibling.
> >>>>> However it's also useful to print all the keys of each sibling, as it
> >>>>> may provide some clues to what went wrong, which code path may be
> >>>>> inserting keys in an incorrect order. So just do that, print the siblings
> >>>>> with btrfs_print_tree(), as it works for both leaves and nodes.
> >>>>>
> >>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>>>
> >>>> Reviewed-by: Qu Wenruo <wqu@suse.com>
> >>>>
> >>>> However my concern is, printing two tree blocks may be a little too large.
> >>>> Definitely not something can be capture by one screen.
> >>>
> >>> What?
> >>> If I check syslog/dmesg, I can certainly access hundreds, thousands of lines...
> >>>
> >>> I suppose by "capture by one screen" you mean a screenshot?
> >>
> >> Yep, mostly a phone shot of a physical monitor, which a lot of end users
> >> choose to use to report their initial trans abort.
> >>
> >> E.g.
> >> https://lore.kernel.org/linux-btrfs/f057bdd1-bdd9-459f-b25f-6a2faa652499@betaapp.fastmail.com/
> >>
> >> I guess the reason is, if the trans abort happens on the root fs, there
> >> will be no persistent log files anyway.
> >> (Although it's still possible to pass the dmesg to another machine or go
> >> netconsole, but not everyone has such setup ready).
> >
> > So what?
> > I don't see how that invalidates printing extent buffers... Many users
> > are able to access dmesg/syslog
> > and paste what they get there... It's also useful for development
> > where we can certainly access everything...
>
> That's why I gave the reviewed-by tag.
>
> Maybe we can shrink the output to the first several and last several
> items in the future.

That's not really useful for most cases, and I've certainly run into key
ordering bugs in the past where having an entire leaf printed made the
problem easier to debug,
sometimes it made it immediately obvious where the bug happened and why.

We are avoiding printing useful stuff just because we assume people will take
a photo from their phone... For those with access to dmesg/syslog, we
lose useful
information.

>
> But for now since it's just a trans abort, it should be mostly fine for
> the end uesrs to access the full dump.
>
> Thanks,
> Qu
> >
> >
> >>
> >> Thanks,
> >> Qu
> >>>
> >>>>
> >>>> Although dumping single tree block is already too large for a single
> >>>> screen, I don't have any better way...
> >>>>
> >>>> Thanks,
> >>>> Qu
> >>>>> ---
> >>>>>     fs/btrfs/ctree.c | 4 ++++
> >>>>>     1 file changed, 4 insertions(+)
> >>>>>
> >>>>> diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
> >>>>> index a0b97a6d075a..a061d7092369 100644
> >>>>> --- a/fs/btrfs/ctree.c
> >>>>> +++ b/fs/btrfs/ctree.c
> >>>>> @@ -2708,6 +2708,10 @@ static bool check_sibling_keys(struct extent_buffer *left,
> >>>>>         }
> >>>>>
> >>>>>         if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
> >>>>> +             btrfs_crit(left->fs_info, "left extent buffer:");
> >>>>> +             btrfs_print_tree(left, false);
> >>>>> +             btrfs_crit(left->fs_info, "right extent buffer:");
> >>>>> +             btrfs_print_tree(right, false);
> >>>>>                 btrfs_crit(left->fs_info,
> >>>>>     "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
> >>>>>                            left_last.objectid, left_last.type,
David Sterba April 27, 2023, 11 p.m. UTC | #7
On Wed, Apr 26, 2023 at 07:20:57PM +0800, Qu Wenruo wrote:
> On 2023/4/26 18:51, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> > 
> > When trying to move keys from one node/leaf to another sibling node/leaf,
> > if the sibling keys check fails we just print an error message with the
> > last key of the left sibling and the first key of the right sibling.
> > However it's also useful to print all the keys of each sibling, as it
> > may provide some clues to what went wrong, which code path may be
> > inserting keys in an incorrect order. So just do that, print the siblings
> > with btrfs_print_tree(), as it works for both leaves and nodes.
> > 
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> 
> Reviewed-by: Qu Wenruo <wqu@suse.com>
> 
> However my concern is, printing two tree blocks may be a little too large.
> Definitely not something can be capture by one screen.
> 
> Although dumping single tree block is already too large for a single 
> screen, I don't have any better way...

For debugging with full logging we can add what we think would be
useful but for the screenshot debugging we can only choose the last
piece of information which I think is the stack trace.
diff mbox series

Patch

diff --git a/fs/btrfs/ctree.c b/fs/btrfs/ctree.c
index a0b97a6d075a..a061d7092369 100644
--- a/fs/btrfs/ctree.c
+++ b/fs/btrfs/ctree.c
@@ -2708,6 +2708,10 @@  static bool check_sibling_keys(struct extent_buffer *left,
 	}
 
 	if (btrfs_comp_cpu_keys(&left_last, &right_first) >= 0) {
+		btrfs_crit(left->fs_info, "left extent buffer:");
+		btrfs_print_tree(left, false);
+		btrfs_crit(left->fs_info, "right extent buffer:");
+		btrfs_print_tree(right, false);
 		btrfs_crit(left->fs_info,
 "bad key order, sibling blocks, left last (%llu %u %llu) right first (%llu %u %llu)",
 			   left_last.objectid, left_last.type,