Message ID | 20171002212654.rsm4uj3hfioddldd@treble (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Mon, Oct 2, 2017 at 2:26 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > The bisect is pointing to a commit which is almost 5 months old, so this > is pre-ORC. Kallsyms *is* enabled, but the unwinder dump isn't smart > enough to realize it's dumping misaligned stack addresses: Ahh, I didn't pick up on that "esp isn't aligned" part. That said, if %esp gets unaligned at some point, it's not clear exactly when we should align it. An unaligned stack pointer will continue to _work_ just potentially perform fairly badly. But more likely, we picked the wrong frame value to begin with. For example, maybe that decode_frame_pointer() logic really should check not that the low bit in bp is set, but instead check that it's a valid "unsigned long *" that has the low bit set. IOW, the difference would be that instead of checking if (!(regs & 0x1)) return NULL; if would check if ((regs & (sizeof(unsigned long)-1)) != 1) return NULL; but also maybe add logic to simply not trust a next frame pointer that isn't appropriately aligned. So I think adding PTR_ALIGN() there in the unwind dumper might be a bit late. By that time it has already accepted what looks like a garbage frame. No? Linus
On Mon, Oct 02, 2017 at 02:58:08PM -0700, Linus Torvalds wrote: > On Mon, Oct 2, 2017 at 2:26 PM, Josh Poimboeuf <jpoimboe@redhat.com> wrote: > > > > The bisect is pointing to a commit which is almost 5 months old, so this > > is pre-ORC. Kallsyms *is* enabled, but the unwinder dump isn't smart > > enough to realize it's dumping misaligned stack addresses: > > Ahh, I didn't pick up on that "esp isn't aligned" part. > > That said, if %esp gets unaligned at some point, it's not clear > exactly when we should align it. An unaligned stack pointer will > continue to _work_ just potentially perform fairly badly. True, but unaligned stacks seem to be very rare, and I don't remember ever seeing one with frame pointers enabled. I think the call return addresses will always be aligned, so I think it makes sense to align the starting address before dumping. > But more likely, we picked the wrong frame value to begin with. Agreed. > For example, maybe that decode_frame_pointer() logic really should > check not that the low bit in bp is set, but instead check that it's a > valid "unsigned long *" that has the low bit set. > > IOW, the difference would be that instead of checking > > if (!(regs & 0x1)) > return NULL; > > if would check > > if ((regs & (sizeof(unsigned long)-1)) != 1) > return NULL; Yeah, that would be an improvement. > but also maybe add logic to simply not trust a next frame pointer that > isn't appropriately aligned. Also a good idea. > So I think adding PTR_ALIGN() there in the unwind dumper might be a > bit late. By that time it has already accepted what looks like a > garbage frame. No? Yeah, maybe. I guess it really depends on what exactly the root cause is. It got a bad frame pointer somehow, but maybe not *too* bad because it still ended up in the task stack somehow. It's not clear how that happened, but it's possible the task stack had some clues about what led up to it. I'll implement your suggestions in addition to the patch to align sp in the dump code. Having all of them together would hopefully enable the unwinder to detect a bad condition as early as possible and also do a better job at dumping it.
diff --git a/arch/x86/kernel/unwind_frame.c b/arch/x86/kernel/unwind_frame.c index 82c6d7f1fd73..64282ec73eb8 100644 --- a/arch/x86/kernel/unwind_frame.c +++ b/arch/x86/kernel/unwind_frame.c @@ -42,7 +42,7 @@ static void unwind_dump(struct unwind_state *state) state->stack_info.type, state->stack_info.next_sp, state->stack_mask, state->graph_idx); - for (sp = state->orig_sp; sp; sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { + for (sp = PTR_ALIGN(state->orig_sp); sp; sp = PTR_ALIGN(stack_info.next_sp, sizeof(long))) { if (get_stack_info(sp, state->task, &stack_info, &visit_mask)) break;