diff mbox

debian stretch dom0 + xen 4.9 fails to boot

Message ID d4add3563cbc4f2fbd6a7a6594f59400@AMSPEX02CL03.citrite.net (mailing list archive)
State New, archived
Headers show

Commit Message

Paul Durrant June 7, 2017, 11:06 a.m. UTC
> -----Original Message-----
> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
> Paul Durrant
> Sent: 07 June 2017 11:37
> To: Andrew Cooper <Andrew.Cooper3@citrix.com>; 'Juergen Gross'
> <jgross@suse.com>; Jan Beulich <JBeulich@suse.com>
> Cc: xen-devel (xen-devel@lists.xenproject.org) <xen-
> devel@lists.xenproject.org>; Julien Grall (julien.grall@arm.com)
> <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>
> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> > -----Original Message-----
> [snip]
> > >>
> > >> TBH: I really can't see what is wrong with that patch. The only change
> > >> which should be able to break something seems to be the reduction of
> > the
> > >> wakeup stack size to 3kB, but this shouldn't affect booting the system
> > >> at all...
> > >>
> > > Yeah, my next test is going to be increasing the size of the wakeup stack
> > again, but there is really nothing obviously wrong with the patch.
> >
> > My gut feeling is that there is some path through boot (tickled by these
> > two machines) which is clobbering the wrong piece of memory, which was
> > previously safe and is now not, because of the rearrangements here.
> >
> > Debugging these machines is very tricky, because they have no serial or
> > IMPI whatsoever.
> >
> 
> It does appear to be a layout issue. If I modify the code to just set
> wakeup_stack to wakeup_stack_start + PAGE_SIZE, so it has the full 4k then I
> still get the problem. However if I then move that code block that includes
> wakeup.S and move it to the end of trampoline.S so that wakup code and
> stack are once again located at the end then the problem goes away.
> 

It appears that it is just the code that needs to go at the end. The following patch is sufficient to avoid the problem. This may be preferable to a full reversion...

  Paul


>   Paul
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel

Comments

Juergen Gross June 7, 2017, 11:57 a.m. UTC | #1
On 07/06/17 13:06, Paul Durrant wrote:
>> -----Original Message-----
>> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
>> Paul Durrant
>> Sent: 07 June 2017 11:37
>> To: Andrew Cooper <Andrew.Cooper3@citrix.com>; 'Juergen Gross'
>> <jgross@suse.com>; Jan Beulich <JBeulich@suse.com>
>> Cc: xen-devel (xen-devel@lists.xenproject.org) <xen-
>> devel@lists.xenproject.org>; Julien Grall (julien.grall@arm.com)
>> <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>
>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>>
>>> -----Original Message-----
>> [snip]
>>>>>
>>>>> TBH: I really can't see what is wrong with that patch. The only change
>>>>> which should be able to break something seems to be the reduction of
>>> the
>>>>> wakeup stack size to 3kB, but this shouldn't affect booting the system
>>>>> at all...
>>>>>
>>>> Yeah, my next test is going to be increasing the size of the wakeup stack
>>> again, but there is really nothing obviously wrong with the patch.
>>>
>>> My gut feeling is that there is some path through boot (tickled by these
>>> two machines) which is clobbering the wrong piece of memory, which was
>>> previously safe and is now not, because of the rearrangements here.
>>>
>>> Debugging these machines is very tricky, because they have no serial or
>>> IMPI whatsoever.
>>>
>>
>> It does appear to be a layout issue. If I modify the code to just set
>> wakeup_stack to wakeup_stack_start + PAGE_SIZE, so it has the full 4k then I
>> still get the problem. However if I then move that code block that includes
>> wakeup.S and move it to the end of trampoline.S so that wakup code and
>> stack are once again located at the end then the problem goes away.
>>
> 
> It appears that it is just the code that needs to go at the end. The following patch is sufficient to avoid the problem. This may be preferable to a full reversion...

I believe this is wrong. You risk the wakeup_stack extending into wakeup
code and the main reason of the patch is gone, as now the permanent
trampoline no longer is on a single page.


Juergen

> 
>   Paul
> 
> diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
> index 4d640f3fcd..7709a782f9 100644
> --- a/xen/arch/x86/boot/trampoline.S
> +++ b/xen/arch/x86/boot/trampoline.S
> @@ -156,7 +156,7 @@ start64:
>          movabs  $__high_start,%rax
>          jmpq    *%rax
> 
> -#include "wakeup.S"
> +ENTRY(wakeup_stack_start)
> 
>  /* The first page of trampoline is permanent, the rest boot-time only. */
>  /* Reuse the boot trampoline on the 1st trampoline page as stack for wakeup. */
> @@ -280,3 +280,4 @@ rm_idt: .word   256*4-1, 0, 0
>  #include "mem.S"
>  #include "edd.S"
>  #include "video.S"
> +#include "wakeup.S"
> diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
> index f9632eef95..d4824b55d5 100644
> --- a/xen/arch/x86/boot/wakeup.S
> +++ b/xen/arch/x86/boot/wakeup.S
> @@ -173,5 +173,3 @@ bogus_saved_magic:
>          movw    $0x0e00 + 'S', 0xb8014
>          jmp     bogus_saved_magic
> 
> -/* Stack for wakeup: rest of first trampoline page. */
> -ENTRY(wakeup_stack_start)
> 
>>   Paul
>>
>> _______________________________________________
>> Xen-devel mailing list
>> Xen-devel@lists.xen.org
>> https://lists.xen.org/xen-devel
Paul Durrant June 7, 2017, 12:02 p.m. UTC | #2
> -----Original Message-----

> From: Juergen Gross [mailto:jgross@suse.com]

> Sent: 07 June 2017 12:57

> To: Paul Durrant <Paul.Durrant@citrix.com>; Andrew Cooper

> <Andrew.Cooper3@citrix.com>; Jan Beulich <JBeulich@suse.com>

> Cc: xen-devel (xen-devel@lists.xenproject.org) <xen-

> devel@lists.xenproject.org>; Julien Grall (julien.grall@arm.com)

> <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>

> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot

> 

> On 07/06/17 13:06, Paul Durrant wrote:

> >> -----Original Message-----

> >> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of

> >> Paul Durrant

> >> Sent: 07 June 2017 11:37

> >> To: Andrew Cooper <Andrew.Cooper3@citrix.com>; 'Juergen Gross'

> >> <jgross@suse.com>; Jan Beulich <JBeulich@suse.com>

> >> Cc: xen-devel (xen-devel@lists.xenproject.org) <xen-

> >> devel@lists.xenproject.org>; Julien Grall (julien.grall@arm.com)

> >> <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>

> >> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot

> >>

> >>> -----Original Message-----

> >> [snip]

> >>>>>

> >>>>> TBH: I really can't see what is wrong with that patch. The only change

> >>>>> which should be able to break something seems to be the reduction

> of

> >>> the

> >>>>> wakeup stack size to 3kB, but this shouldn't affect booting the system

> >>>>> at all...

> >>>>>

> >>>> Yeah, my next test is going to be increasing the size of the wakeup

> stack

> >>> again, but there is really nothing obviously wrong with the patch.

> >>>

> >>> My gut feeling is that there is some path through boot (tickled by these

> >>> two machines) which is clobbering the wrong piece of memory, which

> was

> >>> previously safe and is now not, because of the rearrangements here.

> >>>

> >>> Debugging these machines is very tricky, because they have no serial or

> >>> IMPI whatsoever.

> >>>

> >>

> >> It does appear to be a layout issue. If I modify the code to just set

> >> wakeup_stack to wakeup_stack_start + PAGE_SIZE, so it has the full 4k

> then I

> >> still get the problem. However if I then move that code block that includes

> >> wakeup.S and move it to the end of trampoline.S so that wakup code and

> >> stack are once again located at the end then the problem goes away.

> >>

> >

> > It appears that it is just the code that needs to go at the end. The following

> patch is sufficient to avoid the problem. This may be preferable to a full

> reversion...

> 

> I believe this is wrong. You risk the wakeup_stack extending into wakeup

> code and the main reason of the patch is gone, as now the permanent

> trampoline no longer is on a single page.

> 


I must be misunderstanding something then. The stack grows down from wakeup_stack towards wakeup_stack_start doesn't it? So why would there be an issue with the stack overwriting wakeup code?

  Paul

> 

> Juergen

> 

> >

> >   Paul

> >

> > diff --git a/xen/arch/x86/boot/trampoline.S

> b/xen/arch/x86/boot/trampoline.S

> > index 4d640f3fcd..7709a782f9 100644

> > --- a/xen/arch/x86/boot/trampoline.S

> > +++ b/xen/arch/x86/boot/trampoline.S

> > @@ -156,7 +156,7 @@ start64:

> >          movabs  $__high_start,%rax

> >          jmpq    *%rax

> >

> > -#include "wakeup.S"

> > +ENTRY(wakeup_stack_start)

> >

> >  /* The first page of trampoline is permanent, the rest boot-time only. */

> >  /* Reuse the boot trampoline on the 1st trampoline page as stack for

> wakeup. */

> > @@ -280,3 +280,4 @@ rm_idt: .word   256*4-1, 0, 0

> >  #include "mem.S"

> >  #include "edd.S"

> >  #include "video.S"

> > +#include "wakeup.S"

> > diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S

> > index f9632eef95..d4824b55d5 100644

> > --- a/xen/arch/x86/boot/wakeup.S

> > +++ b/xen/arch/x86/boot/wakeup.S

> > @@ -173,5 +173,3 @@ bogus_saved_magic:

> >          movw    $0x0e00 + 'S', 0xb8014

> >          jmp     bogus_saved_magic

> >

> > -/* Stack for wakeup: rest of first trampoline page. */

> > -ENTRY(wakeup_stack_start)

> >

> >>   Paul

> >>

> >> _______________________________________________

> >> Xen-devel mailing list

> >> Xen-devel@lists.xen.org

> >> https://lists.xen.org/xen-devel
Juergen Gross June 7, 2017, 12:13 p.m. UTC | #3
On 07/06/17 14:02, Paul Durrant wrote:
>> -----Original Message-----
>> From: Juergen Gross [mailto:jgross@suse.com]
>> Sent: 07 June 2017 12:57
>> To: Paul Durrant <Paul.Durrant@citrix.com>; Andrew Cooper
>> <Andrew.Cooper3@citrix.com>; Jan Beulich <JBeulich@suse.com>
>> Cc: xen-devel (xen-devel@lists.xenproject.org) <xen-
>> devel@lists.xenproject.org>; Julien Grall (julien.grall@arm.com)
>> <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>
>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>>
>> On 07/06/17 13:06, Paul Durrant wrote:
>>>> -----Original Message-----
>>>> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of
>>>> Paul Durrant
>>>> Sent: 07 June 2017 11:37
>>>> To: Andrew Cooper <Andrew.Cooper3@citrix.com>; 'Juergen Gross'
>>>> <jgross@suse.com>; Jan Beulich <JBeulich@suse.com>
>>>> Cc: xen-devel (xen-devel@lists.xenproject.org) <xen-
>>>> devel@lists.xenproject.org>; Julien Grall (julien.grall@arm.com)
>>>> <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>
>>>> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
>>>>
>>>>> -----Original Message-----
>>>> [snip]
>>>>>>>
>>>>>>> TBH: I really can't see what is wrong with that patch. The only change
>>>>>>> which should be able to break something seems to be the reduction
>> of
>>>>> the
>>>>>>> wakeup stack size to 3kB, but this shouldn't affect booting the system
>>>>>>> at all...
>>>>>>>
>>>>>> Yeah, my next test is going to be increasing the size of the wakeup
>> stack
>>>>> again, but there is really nothing obviously wrong with the patch.
>>>>>
>>>>> My gut feeling is that there is some path through boot (tickled by these
>>>>> two machines) which is clobbering the wrong piece of memory, which
>> was
>>>>> previously safe and is now not, because of the rearrangements here.
>>>>>
>>>>> Debugging these machines is very tricky, because they have no serial or
>>>>> IMPI whatsoever.
>>>>>
>>>>
>>>> It does appear to be a layout issue. If I modify the code to just set
>>>> wakeup_stack to wakeup_stack_start + PAGE_SIZE, so it has the full 4k
>> then I
>>>> still get the problem. However if I then move that code block that includes
>>>> wakeup.S and move it to the end of trampoline.S so that wakup code and
>>>> stack are once again located at the end then the problem goes away.
>>>>
>>>
>>> It appears that it is just the code that needs to go at the end. The following
>> patch is sufficient to avoid the problem. This may be preferable to a full
>> reversion...
>>
>> I believe this is wrong. You risk the wakeup_stack extending into wakeup
>> code and the main reason of the patch is gone, as now the permanent
>> trampoline no longer is on a single page.
>>
> 
> I must be misunderstanding something then. The stack grows down from wakeup_stack towards wakeup_stack_start doesn't it? So why would there be an issue with the stack overwriting wakeup code?

wakeup_stack is just defined to be trampoline_start + PAGE_SIZE,
without any real space reserved for the stack. So it may well be that
wakeup_start points somewhere into wakeup.S.

There must be no permanent trampoline coding after wakeup_stack_start.

Juergen

> 
>   Paul
> 
>>
>> Juergen
>>
>>>
>>>   Paul
>>>
>>> diff --git a/xen/arch/x86/boot/trampoline.S
>> b/xen/arch/x86/boot/trampoline.S
>>> index 4d640f3fcd..7709a782f9 100644
>>> --- a/xen/arch/x86/boot/trampoline.S
>>> +++ b/xen/arch/x86/boot/trampoline.S
>>> @@ -156,7 +156,7 @@ start64:
>>>          movabs  $__high_start,%rax
>>>          jmpq    *%rax
>>>
>>> -#include "wakeup.S"
>>> +ENTRY(wakeup_stack_start)
>>>
>>>  /* The first page of trampoline is permanent, the rest boot-time only. */
>>>  /* Reuse the boot trampoline on the 1st trampoline page as stack for
>> wakeup. */
>>> @@ -280,3 +280,4 @@ rm_idt: .word   256*4-1, 0, 0
>>>  #include "mem.S"
>>>  #include "edd.S"
>>>  #include "video.S"
>>> +#include "wakeup.S"
>>> diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
>>> index f9632eef95..d4824b55d5 100644
>>> --- a/xen/arch/x86/boot/wakeup.S
>>> +++ b/xen/arch/x86/boot/wakeup.S
>>> @@ -173,5 +173,3 @@ bogus_saved_magic:
>>>          movw    $0x0e00 + 'S', 0xb8014
>>>          jmp     bogus_saved_magic
>>>
>>> -/* Stack for wakeup: rest of first trampoline page. */
>>> -ENTRY(wakeup_stack_start)
>>>
>>>>   Paul
>>>>
>>>> _______________________________________________
>>>> Xen-devel mailing list
>>>> Xen-devel@lists.xen.org
>>>> https://lists.xen.org/xen-devel
> 
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> https://lists.xen.org/xen-devel
>
Jan Beulich June 7, 2017, 12:19 p.m. UTC | #4
>>> On 07.06.17 at 14:02, <Paul.Durrant@citrix.com> wrote:
>> From: Juergen Gross [mailto:jgross@suse.com]
>> Sent: 07 June 2017 12:57
>> 
>> On 07/06/17 13:06, Paul Durrant wrote:
>> > It appears that it is just the code that needs to go at the end. The following
>> patch is sufficient to avoid the problem. This may be preferable to a full
>> reversion...
>> 
>> I believe this is wrong. You risk the wakeup_stack extending into wakeup
>> code and the main reason of the patch is gone, as now the permanent
>> trampoline no longer is on a single page.
>> 
> 
> I must be misunderstanding something then. The stack grows down from 
> wakeup_stack towards wakeup_stack_start doesn't it? So why would there be an 
> issue with the stack overwriting wakeup code?

I think this is a pointless discussion: Once we know memory is being
corrupted, it doesn't help shuffling things around. By putting the
wakeup code at the end, it'll be that which gets corrupted, and
hence S3 wakeup would not work. We can really only try to figure
out what parts of memory we need to avoid touching _at all_.

Jan
Paul Durrant June 7, 2017, 12:26 p.m. UTC | #5
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 07 June 2017 13:19
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; xen-devel (xen-
> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris
> Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross
> <jgross@suse.com>
> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> >>> On 07.06.17 at 14:02, <Paul.Durrant@citrix.com> wrote:
> >> From: Juergen Gross [mailto:jgross@suse.com]
> >> Sent: 07 June 2017 12:57
> >>
> >> On 07/06/17 13:06, Paul Durrant wrote:
> >> > It appears that it is just the code that needs to go at the end. The
> following
> >> patch is sufficient to avoid the problem. This may be preferable to a full
> >> reversion...
> >>
> >> I believe this is wrong. You risk the wakeup_stack extending into wakeup
> >> code and the main reason of the patch is gone, as now the permanent
> >> trampoline no longer is on a single page.
> >>
> >
> > I must be misunderstanding something then. The stack grows down from
> > wakeup_stack towards wakeup_stack_start doesn't it? So why would
> there be an
> > issue with the stack overwriting wakeup code?
> 
> I think this is a pointless discussion: Once we know memory is being
> corrupted, it doesn't help shuffling things around. By putting the
> wakeup code at the end, it'll be that which gets corrupted, and
> hence S3 wakeup would not work. We can really only try to figure
> out what parts of memory we need to avoid touching _at all_.
> 

Yes, fair enough. I'm currently trying to figure out what the code in head.S just ahead of trampoline_setup is trying to do.

  Paul

> Jan
Jan Beulich June 7, 2017, 12:34 p.m. UTC | #6
>>> On 07.06.17 at 14:26, <Paul.Durrant@citrix.com> wrote:
> I'm currently trying to figure out what the code in head.S 
> just ahead of trampoline_setup is trying to do.

That's where we determine where to put the trampoline, by looking
at base memory size, EBDA location, and the MB provided low
memory size. Insane values are attempted to be ignored.

Jan
diff mbox

Patch

diff --git a/xen/arch/x86/boot/trampoline.S b/xen/arch/x86/boot/trampoline.S
index 4d640f3fcd..7709a782f9 100644
--- a/xen/arch/x86/boot/trampoline.S
+++ b/xen/arch/x86/boot/trampoline.S
@@ -156,7 +156,7 @@  start64:
         movabs  $__high_start,%rax
         jmpq    *%rax

-#include "wakeup.S"
+ENTRY(wakeup_stack_start)

 /* The first page of trampoline is permanent, the rest boot-time only. */
 /* Reuse the boot trampoline on the 1st trampoline page as stack for wakeup. */
@@ -280,3 +280,4 @@  rm_idt: .word   256*4-1, 0, 0
 #include "mem.S"
 #include "edd.S"
 #include "video.S"
+#include "wakeup.S"
diff --git a/xen/arch/x86/boot/wakeup.S b/xen/arch/x86/boot/wakeup.S
index f9632eef95..d4824b55d5 100644
--- a/xen/arch/x86/boot/wakeup.S
+++ b/xen/arch/x86/boot/wakeup.S
@@ -173,5 +173,3 @@  bogus_saved_magic:
         movw    $0x0e00 + 'S', 0xb8014
         jmp     bogus_saved_magic

-/* Stack for wakeup: rest of first trampoline page. */
-ENTRY(wakeup_stack_start)