[10/13] x86/dumpstack: Try harder to get a call trace on stack overflow
diff mbox

Message ID f3202c855a2a61ae4c37d8b3835e00fb384c56b9.1466036668.git.luto@kernel.org
State New
Headers show

Commit Message

Andy Lutomirski June 16, 2016, 12:28 a.m. UTC
If we overflow the stack, print_context_stack will abort.  Detect
this case and rewind back into the valid part of the stack so that
we can trace it.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/x86/kernel/dumpstack.c | 7 +++++++
 1 file changed, 7 insertions(+)

Comments

Josh Poimboeuf June 16, 2016, 6:16 p.m. UTC | #1
On Wed, Jun 15, 2016 at 05:28:32PM -0700, Andy Lutomirski wrote:
> If we overflow the stack, print_context_stack will abort.  Detect
> this case and rewind back into the valid part of the stack so that
> we can trace it.
> 
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/x86/kernel/dumpstack.c | 7 +++++++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> index d4d085e27d04..400a2e17c1d1 100644
> --- a/arch/x86/kernel/dumpstack.c
> +++ b/arch/x86/kernel/dumpstack.c
> @@ -100,6 +100,13 @@ print_context_stack(struct thread_info *tinfo,
>  {
>  	struct stack_frame *frame = (struct stack_frame *)bp;
>  
> +	/*
> +	 * If we overflowed the stack into a guard page, jump back to the
> +	 * bottom of the usable stack.
> +	 */
> +	if ((unsigned long)tinfo - (unsigned long)stack < PAGE_SIZE)
> +		stack = (unsigned long *)tinfo + 1;

That will start walking the stack in the middle of the thread_info
struct.

I think you meant:

		stack = (unsigned long *)(tinfo + 1)

However, thread_info will have been overwritten anyway.  So maybe it
should just be:

		stack = tinfo;

(Though that still wouldn't quite work because the valid_stack_ptr()
check would fail...)

> +
>  	while (valid_stack_ptr(tinfo, stack, sizeof(*stack), end)) {
>  		unsigned long addr;
Andy Lutomirski June 16, 2016, 6:22 p.m. UTC | #2
On Thu, Jun 16, 2016 at 11:16 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Wed, Jun 15, 2016 at 05:28:32PM -0700, Andy Lutomirski wrote:
>> If we overflow the stack, print_context_stack will abort.  Detect
>> this case and rewind back into the valid part of the stack so that
>> we can trace it.
>>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>>  arch/x86/kernel/dumpstack.c | 7 +++++++
>>  1 file changed, 7 insertions(+)
>>
>> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
>> index d4d085e27d04..400a2e17c1d1 100644
>> --- a/arch/x86/kernel/dumpstack.c
>> +++ b/arch/x86/kernel/dumpstack.c
>> @@ -100,6 +100,13 @@ print_context_stack(struct thread_info *tinfo,
>>  {
>>       struct stack_frame *frame = (struct stack_frame *)bp;
>>
>> +     /*
>> +      * If we overflowed the stack into a guard page, jump back to the
>> +      * bottom of the usable stack.
>> +      */
>> +     if ((unsigned long)tinfo - (unsigned long)stack < PAGE_SIZE)
>> +             stack = (unsigned long *)tinfo + 1;
>
> That will start walking the stack in the middle of the thread_info
> struct.
>
> I think you meant:
>
>                 stack = (unsigned long *)(tinfo + 1)
>
> However, thread_info will have been overwritten anyway.  So maybe it
> should just be:
>
>                 stack = tinfo;
>
> (Though that still wouldn't quite work because the valid_stack_ptr()
> check would fail...)

I did mean what I wrote, because I wanted to start at the bottom of
the validly allocated area.  IOW I wanted to do the minimum possible
backward jump to make the code display something.

Eventually I want to make thread_info empty, in which case the
distinction won't matter so much.

--Andy
Josh Poimboeuf June 16, 2016, 6:33 p.m. UTC | #3
On Thu, Jun 16, 2016 at 11:22:14AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 16, 2016 at 11:16 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Wed, Jun 15, 2016 at 05:28:32PM -0700, Andy Lutomirski wrote:
> >> If we overflow the stack, print_context_stack will abort.  Detect
> >> this case and rewind back into the valid part of the stack so that
> >> we can trace it.
> >>
> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> >> ---
> >>  arch/x86/kernel/dumpstack.c | 7 +++++++
> >>  1 file changed, 7 insertions(+)
> >>
> >> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> >> index d4d085e27d04..400a2e17c1d1 100644
> >> --- a/arch/x86/kernel/dumpstack.c
> >> +++ b/arch/x86/kernel/dumpstack.c
> >> @@ -100,6 +100,13 @@ print_context_stack(struct thread_info *tinfo,
> >>  {
> >>       struct stack_frame *frame = (struct stack_frame *)bp;
> >>
> >> +     /*
> >> +      * If we overflowed the stack into a guard page, jump back to the
> >> +      * bottom of the usable stack.
> >> +      */
> >> +     if ((unsigned long)tinfo - (unsigned long)stack < PAGE_SIZE)
> >> +             stack = (unsigned long *)tinfo + 1;
> >
> > That will start walking the stack in the middle of the thread_info
> > struct.
> >
> > I think you meant:
> >
> >                 stack = (unsigned long *)(tinfo + 1)
> >
> > However, thread_info will have been overwritten anyway.  So maybe it
> > should just be:
> >
> >                 stack = tinfo;
> >
> > (Though that still wouldn't quite work because the valid_stack_ptr()
> > check would fail...)
> 
> I did mean what I wrote, because I wanted to start at the bottom of
> the validly allocated area.  IOW I wanted to do the minimum possible
> backward jump to make the code display something.

But why the "+ 1"?  Is that a hack to make it pass the valid_stack_ptr()
check?
Andy Lutomirski June 16, 2016, 6:37 p.m. UTC | #4
On Thu, Jun 16, 2016 at 11:33 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> On Thu, Jun 16, 2016 at 11:22:14AM -0700, Andy Lutomirski wrote:
>> On Thu, Jun 16, 2016 at 11:16 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
>> > On Wed, Jun 15, 2016 at 05:28:32PM -0700, Andy Lutomirski wrote:
>> >> If we overflow the stack, print_context_stack will abort.  Detect
>> >> this case and rewind back into the valid part of the stack so that
>> >> we can trace it.
>> >>
>> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> >> ---
>> >>  arch/x86/kernel/dumpstack.c | 7 +++++++
>> >>  1 file changed, 7 insertions(+)
>> >>
>> >> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
>> >> index d4d085e27d04..400a2e17c1d1 100644
>> >> --- a/arch/x86/kernel/dumpstack.c
>> >> +++ b/arch/x86/kernel/dumpstack.c
>> >> @@ -100,6 +100,13 @@ print_context_stack(struct thread_info *tinfo,
>> >>  {
>> >>       struct stack_frame *frame = (struct stack_frame *)bp;
>> >>
>> >> +     /*
>> >> +      * If we overflowed the stack into a guard page, jump back to the
>> >> +      * bottom of the usable stack.
>> >> +      */
>> >> +     if ((unsigned long)tinfo - (unsigned long)stack < PAGE_SIZE)
>> >> +             stack = (unsigned long *)tinfo + 1;
>> >
>> > That will start walking the stack in the middle of the thread_info
>> > struct.
>> >
>> > I think you meant:
>> >
>> >                 stack = (unsigned long *)(tinfo + 1)
>> >
>> > However, thread_info will have been overwritten anyway.  So maybe it
>> > should just be:
>> >
>> >                 stack = tinfo;
>> >
>> > (Though that still wouldn't quite work because the valid_stack_ptr()
>> > check would fail...)
>>
>> I did mean what I wrote, because I wanted to start at the bottom of
>> the validly allocated area.  IOW I wanted to do the minimum possible
>> backward jump to make the code display something.
>
> But why the "+ 1"?  Is that a hack to make it pass the valid_stack_ptr()
> check?

Yes.

But hmm.  Maybe the right fix is to drop the + 1 and to change the
last line of valid_stck_ptr from:

    return p > t && p < t + THREAD_SIZE - size;

to:

    return p >= t && p < t + THREAD_SIZE - size;

The current definition of valid_stack_ptr is certainly nonsensical.
It should either be p >= t or p >= t + 1.

--Andy
Josh Poimboeuf June 16, 2016, 6:54 p.m. UTC | #5
On Thu, Jun 16, 2016 at 11:37:07AM -0700, Andy Lutomirski wrote:
> On Thu, Jun 16, 2016 at 11:33 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> > On Thu, Jun 16, 2016 at 11:22:14AM -0700, Andy Lutomirski wrote:
> >> On Thu, Jun 16, 2016 at 11:16 AM, Josh Poimboeuf <jpoimboe@redhat.com> wrote:
> >> > On Wed, Jun 15, 2016 at 05:28:32PM -0700, Andy Lutomirski wrote:
> >> >> If we overflow the stack, print_context_stack will abort.  Detect
> >> >> this case and rewind back into the valid part of the stack so that
> >> >> we can trace it.
> >> >>
> >> >> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> >> >> ---
> >> >>  arch/x86/kernel/dumpstack.c | 7 +++++++
> >> >>  1 file changed, 7 insertions(+)
> >> >>
> >> >> diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
> >> >> index d4d085e27d04..400a2e17c1d1 100644
> >> >> --- a/arch/x86/kernel/dumpstack.c
> >> >> +++ b/arch/x86/kernel/dumpstack.c
> >> >> @@ -100,6 +100,13 @@ print_context_stack(struct thread_info *tinfo,
> >> >>  {
> >> >>       struct stack_frame *frame = (struct stack_frame *)bp;
> >> >>
> >> >> +     /*
> >> >> +      * If we overflowed the stack into a guard page, jump back to the
> >> >> +      * bottom of the usable stack.
> >> >> +      */
> >> >> +     if ((unsigned long)tinfo - (unsigned long)stack < PAGE_SIZE)
> >> >> +             stack = (unsigned long *)tinfo + 1;
> >> >
> >> > That will start walking the stack in the middle of the thread_info
> >> > struct.
> >> >
> >> > I think you meant:
> >> >
> >> >                 stack = (unsigned long *)(tinfo + 1)
> >> >
> >> > However, thread_info will have been overwritten anyway.  So maybe it
> >> > should just be:
> >> >
> >> >                 stack = tinfo;
> >> >
> >> > (Though that still wouldn't quite work because the valid_stack_ptr()
> >> > check would fail...)
> >>
> >> I did mean what I wrote, because I wanted to start at the bottom of
> >> the validly allocated area.  IOW I wanted to do the minimum possible
> >> backward jump to make the code display something.
> >
> > But why the "+ 1"?  Is that a hack to make it pass the valid_stack_ptr()
> > check?
> 
> Yes.
> 
> But hmm.  Maybe the right fix is to drop the + 1 and to change the
> last line of valid_stck_ptr from:
> 
>     return p > t && p < t + THREAD_SIZE - size;
> 
> to:
> 
>     return p >= t && p < t + THREAD_SIZE - size;

Yeah, I think that would be much better.  Then it won't skip the first
value on the page.

Patch
diff mbox

diff --git a/arch/x86/kernel/dumpstack.c b/arch/x86/kernel/dumpstack.c
index d4d085e27d04..400a2e17c1d1 100644
--- a/arch/x86/kernel/dumpstack.c
+++ b/arch/x86/kernel/dumpstack.c
@@ -100,6 +100,13 @@  print_context_stack(struct thread_info *tinfo,
 {
 	struct stack_frame *frame = (struct stack_frame *)bp;
 
+	/*
+	 * If we overflowed the stack into a guard page, jump back to the
+	 * bottom of the usable stack.
+	 */
+	if ((unsigned long)tinfo - (unsigned long)stack < PAGE_SIZE)
+		stack = (unsigned long *)tinfo + 1;
+
 	while (valid_stack_ptr(tinfo, stack, sizeof(*stack), end)) {
 		unsigned long addr;