debian stretch dom0 + xen 4.9 fails to boot

Message ID	94bf1caf8b95436fa7b3aed74a172ce1@AMSPEX02CL03.citrite.net (mailing list archive)
State	New, archived
Headers	show Return-Path: <xen-devel-bounces@lists.xen.org> From: Paul Durrant <Paul.Durrant@citrix.com> To: 'Jan Beulich' <JBeulich@suse.com> Thread-Topic: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot Thread-Index: AdLe0KoM3kANRDRYSf+PCIgyxC4kg///62kA///UqkD//530gIAAq+WA//7hiaAASaM11////+MQgAAfuwCAAAbUMIAAGXcAgAAQOnCAAEA9AIAAVefggADG/ACAAUkj0IACqxgAgAUy+oCACoPQgIATiwnQ Date: Thu, 8 Jun 2017 12:42:26 +0000 Message-ID: <94bf1caf8b95436fa7b3aed74a172ce1@AMSPEX02CL03.citrite.net> References: <bcc0d4330d9b463a9829af1327d895a9@AMSPEX02CL03.citrite.net> <5936E230020000780015FEDC@prv-mh.provo.novell.com> <fe6a9bb5eb8144bda27ef1532a153ab7@AMSPEX02CL03.citrite.net> <d45ae6fc34a94ab48213667073b37616@AMSPEX02CL03.citrite.net> <3a0eb227-8302-50db-84a6-b07893ab0e91@oracle.com> <d8a91bf2426a4fc99f6a0e633e853402@AMSPEX02CL03.citrite.net> <5937D4FF02000078001602F6@suse.com> <e9772a31-a3c0-6994-2745-219e6b0948f8@suse.com> <ad450ab0147147429a46cd7382a17c19@AMSPEX02CL03.citrite.net> <0e704f5e-5845-dc56-6058-d0451d43d842@citrix.com> <2baad09e48864a06873037240b8e39dd@AMSPEX02CL03.citrite.net> <5938048A02000078001604AE@prv-mh.provo.novell.com> <056e566f141c4715867e2bdbbe418977@AMSPEX02CL03.citrite.net> <593806FC02000078001604D1@prv-mh.provo.novell.com> <e4aae59cbd884e21ab1fb75a96f34d3d@AMSPEX02CL03.citrite.net> <593813E602000078001605F7@prv-mh.provo.novell.com> <ee93e45315794a3db4aa21beefb62e52@AMSPEX02CL03.citrite.net> <593838D20200007800160859@prv-mh.provo.novell.com> <fc6fa02bb41e497e8b1896b692e82123@AMSPEX02CL03.citrite.net> <59383D4F020000780016089F@prv-mh.provo.novell.com> In-Reply-To: <59383D4F020000780016089F@prv-mh.provo.novell.com> Accept-Language: en-GB, en-US Content-Language: en-US MIME-Version: 1.0 Cc: Juergen Gross <jgross@suse.com>, Andrew Cooper <Andrew.Cooper3@citrix.com>, "Julien Grall \(julien.grall@arm.com\)" <julien.grall@arm.com>, 'BorisOstrovsky' <boris.ostrovsky@oracle.com>, "xen-devel\(xen-devel@lists.xenproject.org\)" <xen-devel@lists.xenproject.org> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot Precedence: list Content-Type: text/plain; charset="utf-8" Content-Transfer-Encoding: base64 Errors-To: xen-devel-bounces@lists.xen.org Sender: "Xen-devel" <xen-devel-bounces@lists.xen.org>

Paul Durrant June 8, 2017, 12:42 p.m. UTC

For those following this...

By poking characters at the screen and bisecting where they stopped, I have narrowed the problem to the code in edd.S. I can successfully boot by setting opt_edd=off on the Xen cmd line and I can also boot with the following patch applied:


(based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive)

I need to investigate some more but I do wonder whether the EDD info should be read first to determine the appropriate size of memory buffer to use when issuing the read of the MBR. Hardcoding a 4k reservation seems like the wrong thing to do, even if it is sufficient for this BIOS.

  Paul

Jürgen Groß June 8, 2017, 12:46 p.m. UTC | #1

On 08/06/17 14:42, Paul Durrant wrote:
> For those following this...
> 
> By poking characters at the screen and bisecting where they stopped, I have narrowed the problem to the code in edd.S. I can successfully boot by setting opt_edd=off on the Xen cmd line and I can also boot with the following patch applied:
> 
> diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S
> index 73371f98b5..5409f1d9a1 100644
> --- a/xen/arch/x86/boot/edd.S
> +++ b/xen/arch/x86/boot/edd.S
> @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr)
>          .byte   0
>  GLOBAL(boot_mbr_signature)
>          .fill   EDD_MBR_SIG_MAX*8,1,0
> +       .align  4096
>  GLOBAL(boot_edd_info)
> -        .fill   512,1,0                         # big enough for a disc sector
> +        .fill   4096,1,0                         # big enough for a disc sector
> 
> (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive)
> 
> I need to investigate some more but I do wonder whether the EDD info should be read first to determine the appropriate size of memory buffer to use when issuing the read of the MBR. Hardcoding a 4k reservation seems like the wrong thing to do, even if it is sufficient for this BIOS
Thanks for rehabilitation of my patch. :-)


Juergen

Jan Beulich June 8, 2017, 1:18 p.m. UTC | #2

>>> On 08.06.17 at 14:42, <Paul.Durrant@citrix.com> wrote:
> For those following this...
> 
> By poking characters at the screen and bisecting where they stopped, I have 
> narrowed the problem to the code in edd.S. I can successfully boot by setting 
> opt_edd=off on the Xen cmd line and I can also boot with the following patch 
> applied:
> 
> diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S
> index 73371f98b5..5409f1d9a1 100644
> --- a/xen/arch/x86/boot/edd.S
> +++ b/xen/arch/x86/boot/edd.S
> @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr)
>          .byte   0
>  GLOBAL(boot_mbr_signature)
>          .fill   EDD_MBR_SIG_MAX*8,1,0
> +       .align  4096
>  GLOBAL(boot_edd_info)
> -        .fill   512,1,0                         # big enough for a disc 
> sector
> +        .fill   4096,1,0                         # big enough for a disc 
> sector
> 
> (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive)
> 
> I need to investigate some more but I do wonder whether the EDD info should 
> be read first to determine the appropriate size of memory buffer to use when 
> issuing the read of the MBR. Hardcoding a 4k reservation seems like the wrong 
> thing to do, even if it is sufficient for this BIOS.

boot_edd_info is being used for two things - reading the MBR of
each disk and storing data retrieved from INT 13 Fn 41 and 48.
The latter occupies 492 bytes (6 times 8+74). Which would make
me guess the system has a 4k disk, and the BIOS doesn't abstract
away this characteristic when handling INT 13 Fn 02 (which is
supposed to only act in multiples of 512-byte sectors, as opposed
to Fn 42).

The alternative of Fn 48 overflowing its buffer would seem less
likely, especially with the buffer holding a size on input.

Do you, btw, really need both the size and alignment increases?

Jan

Paul Durrant June 8, 2017, 1:24 p.m. UTC | #3

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 08 June 2017 14:19
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen-
> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>;
> 'BorisOstrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross
> <jgross@suse.com>
> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> >>> On 08.06.17 at 14:42, <Paul.Durrant@citrix.com> wrote:
> > For those following this...
> >
> > By poking characters at the screen and bisecting where they stopped, I
> have
> > narrowed the problem to the code in edd.S. I can successfully boot by
> setting
> > opt_edd=off on the Xen cmd line and I can also boot with the following
> patch
> > applied:
> >
> > diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S
> > index 73371f98b5..5409f1d9a1 100644
> > --- a/xen/arch/x86/boot/edd.S
> > +++ b/xen/arch/x86/boot/edd.S
> > @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr)
> >          .byte   0
> >  GLOBAL(boot_mbr_signature)
> >          .fill   EDD_MBR_SIG_MAX*8,1,0
> > +       .align  4096
> >  GLOBAL(boot_edd_info)
> > -        .fill   512,1,0                         # big enough for a disc
> > sector
> > +        .fill   4096,1,0                         # big enough for a disc
> > sector
> >
> > (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive)
> >
> > I need to investigate some more but I do wonder whether the EDD info
> should
> > be read first to determine the appropriate size of memory buffer to use
> when
> > issuing the read of the MBR. Hardcoding a 4k reservation seems like the
> wrong
> > thing to do, even if it is sufficient for this BIOS.
> 
> boot_edd_info is being used for two things - reading the MBR of
> each disk and storing data retrieved from INT 13 Fn 41 and 48.
> The latter occupies 492 bytes (6 times 8+74). Which would make
> me guess the system has a 4k disk, and the BIOS doesn't abstract
> away this characteristic when handling INT 13 Fn 02 (which is
> supposed to only act in multiples of 512-byte sectors, as opposed
> to Fn 42).
> 
> The alternative of Fn 48 overflowing its buffer would seem less
> likely, especially with the buffer holding a size on input.

Yes, I tested with edd=skipmbr on the command line (and no patch applied) and the system booted, so it's definitely the MBR read that is at fault.

> 
> Do you, btw, really need both the size and alignment increases?
> 

At first I tried just increasing the .fill to 4096 but that did not seem to work. I have not found anything that says int13 0x2 buffers need to be aligned... but the BIOS being buggy in this respect I guess it could easily require that.
I'm just testing some more code to try to see exactly how much memory the MBR read scribbles on.

  Paul

> Jan

Paul Durrant June 9, 2017, 12:19 p.m. UTC | #4

> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@suse.com]
> Sent: 08 June 2017 14:19
> To: Paul Durrant <Paul.Durrant@citrix.com>
> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen-
> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>;
> 'BorisOstrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross
> <jgross@suse.com>
> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> >>> On 08.06.17 at 14:42, <Paul.Durrant@citrix.com> wrote:
> > For those following this...
> >
> > By poking characters at the screen and bisecting where they stopped, I
> have
> > narrowed the problem to the code in edd.S. I can successfully boot by
> setting
> > opt_edd=off on the Xen cmd line and I can also boot with the following
> patch
> > applied:
> >
> > diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S
> > index 73371f98b5..5409f1d9a1 100644
> > --- a/xen/arch/x86/boot/edd.S
> > +++ b/xen/arch/x86/boot/edd.S
> > @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr)
> >          .byte   0
> >  GLOBAL(boot_mbr_signature)
> >          .fill   EDD_MBR_SIG_MAX*8,1,0
> > +       .align  4096
> >  GLOBAL(boot_edd_info)
> > -        .fill   512,1,0                         # big enough for a disc
> > sector
> > +        .fill   4096,1,0                         # big enough for a disc
> > sector
> >
> > (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive)
> >
> > I need to investigate some more but I do wonder whether the EDD info
> should
> > be read first to determine the appropriate size of memory buffer to use
> when
> > issuing the read of the MBR. Hardcoding a 4k reservation seems like the
> wrong
> > thing to do, even if it is sufficient for this BIOS.
> 
> boot_edd_info is being used for two things - reading the MBR of
> each disk and storing data retrieved from INT 13 Fn 41 and 48.
> The latter occupies 492 bytes (6 times 8+74). Which would make
> me guess the system has a 4k disk, and the BIOS doesn't abstract
> away this characteristic when handling INT 13 Fn 02 (which is
> supposed to only act in multiples of 512-byte sectors, as opposed
> to Fn 42).
> 
> The alternative of Fn 48 overflowing its buffer would seem less
> likely, especially with the buffer holding a size on input.
> 
> Do you, btw, really need both the size and alignment increases?
> 

More investigation has characterized the problem a little more but I still don't understand precisely what it happening. The trampoline code sets %ds to 0x86 and the image is loaded at offset 0 in that segment, i.e. it is located at 0x86000. The boot_edd_info 512 byte range ends up spanning the 0x87000 boundary and when this area is used for reading the MBR I see the lock-up. If I insert some bytes sufficient to push boot_edd_info up to or beyond 0x87000 then the system boots and I have verified that bytes located immediately before or after boot_edd_info are not scribbled on by the int13 call. However, if I arrange for boot_edd_info to be located even just one byte below 0x87000 then the system again fails to boot.
I am now attempting to grab some memory below the trampoline, pattern fill it and then try to figure out if there is any collateral damage from the int13 that is further afield from the actual value in es:bx, but all this has got me wondering why Xen bothers to read the MBR, or the EDD info for that matter? EDD or MBR signatures are returned by the XENPF_firmware_info hypercall, and Linux does seem to have code called early on in xen_start_kernel() that does make such hypercalls, but it also appears to be able to boot happily if I put edd=off on my Xen command line, so is this code really necessary?

  Paul

> Jan

Jan Beulich June 9, 2017, 1:05 p.m. UTC | #5

>>> On 09.06.17 at 14:19, <Paul.Durrant@citrix.com> wrote:
> ..., but all this has 
> got me wondering why Xen bothers to read the MBR, or the EDD info for that 
> matter? EDD or MBR signatures are returned by the XENPF_firmware_info 
> hypercall, and Linux does seem to have code called early on in 
> xen_start_kernel() that does make such hypercalls, but it also appears to be 
> able to boot happily if I put edd=off on my Xen command line, so is this code 
> really necessary?

Well, that's a question to the Linux folks. I would guess there's
management code around wanting that info, but I'm not sure. Us
doing this is simply because of Linux wanting it and having no
other way to get at least some of this information (it could surely
read the MBRs, but it wouldn't be able to associate them with
BIOS drive numbers used for the other EDD information obtained).

Jan

Boris Ostrovsky June 9, 2017, 1:52 p.m. UTC | #6

On 06/09/2017 09:05 AM, Jan Beulich wrote:
>>>> On 09.06.17 at 14:19, <Paul.Durrant@citrix.com> wrote:
>> ..., but all this has 
>> got me wondering why Xen bothers to read the MBR, or the EDD info for that 
>> matter? EDD or MBR signatures are returned by the XENPF_firmware_info 
>> hypercall, and Linux does seem to have code called early on in 
>> xen_start_kernel() that does make such hypercalls, but it also appears to be 
>> able to boot happily if I put edd=off on my Xen command line, so is this code 
>> really necessary?
> Well, that's a question to the Linux folks. I would guess there's
> management code around wanting that info, but I'm not sure. Us
> doing this is simply because of Linux wanting it and having no
> other way to get at least some of this information (it could surely
> read the MBRs, but it wouldn't be able to associate them with
> BIOS drive numbers used for the other EDD information obtained).

Not sure what it is for. Perhaps there are some tools that poke into sysfs?


commit 96f28bc66adb1414cfc9405ff80cfffdc44edd84
Author: David Vrabel <david.vrabel@citrix.com>
Date:   Wed Apr 3 17:31:50 2013 +0100

    x86/xen: populate boot_params with EDD data
   
    During early setup of a dom0 kernel, populate boot_params with the
    Enhanced Disk Drive (EDD) and MBR signature data.  This makes
    information on the BIOS boot device available in /sys/firmware/edd/.
   
    Signed-off-by: David Vrabel <david.vrabel@citrix.com>
    Acked-by: Jan Beulich <jbeulich@suse.com>
    Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Paul Durrant June 9, 2017, 3:14 p.m. UTC | #7

> -----Original Message-----
> From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com]
> Sent: 09 June 2017 14:52
> To: Jan Beulich <JBeulich@suse.com>; Paul Durrant
> <Paul.Durrant@citrix.com>
> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew
> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen-
> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; Juergen
> Gross <jgross@suse.com>
> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot
> 
> On 06/09/2017 09:05 AM, Jan Beulich wrote:
> >>>> On 09.06.17 at 14:19, <Paul.Durrant@citrix.com> wrote:
> >> ..., but all this has
> >> got me wondering why Xen bothers to read the MBR, or the EDD info for
> that
> >> matter? EDD or MBR signatures are returned by the
> XENPF_firmware_info
> >> hypercall, and Linux does seem to have code called early on in
> >> xen_start_kernel() that does make such hypercalls, but it also appears to
> be
> >> able to boot happily if I put edd=off on my Xen command line, so is this
> code
> >> really necessary?
> > Well, that's a question to the Linux folks. I would guess there's
> > management code around wanting that info, but I'm not sure. Us
> > doing this is simply because of Linux wanting it and having no
> > other way to get at least some of this information (it could surely
> > read the MBRs, but it wouldn't be able to associate them with
> > BIOS drive numbers used for the other EDD information obtained).
> 
> Not sure what it is for. Perhaps there are some tools that poke into sysfs?
> 
> 
> commit 96f28bc66adb1414cfc9405ff80cfffdc44edd84
> Author: David Vrabel <david.vrabel@citrix.com>
> Date:   Wed Apr 3 17:31:50 2013 +0100
> 
>     x86/xen: populate boot_params with EDD data
> 
>     During early setup of a dom0 kernel, populate boot_params with the
>     Enhanced Disk Drive (EDD) and MBR signature data.  This makes
>     information on the BIOS boot device available in /sys/firmware/edd/.
> 
>     Signed-off-by: David Vrabel <david.vrabel@citrix.com>
>     Acked-by: Jan Beulich <jbeulich@suse.com>
>     Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>

Interesting. The Xen side of things seems to have been there forever:

commit 79e96982cade240531d7d84fa5b966b2b64c04af
Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain>
Date:   Tue Jun 12 14:03:09 2007 +0100

    x86: Gather BIOS EDD info during boot.
    Still needs plumbing to dom0.
    Signed-off-by: Keir Fraser <keir@xensource.com>

I've characterised the issue some more and it appears to be an overflow inside the int13 handler if es:bx is less than 512 bytes below a 4k boundary. I modified the code to use a hardcoded segment, which I set at 0x6000, and all values of bx up to 0xe00 resulted in a good MBR signature. Values above 0xe00 but below 0xe20 resulted in the buffer not being identified as a valid MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20 resulted in either a hang (sometimes with a black screen) or a reboot.
This led me to believe that backing out all my debug code and adding a '.align 512' just before the definition of boot_edd_info should result in a successful boot. Alas this appears not to be the case... I seem to need at least 2k alignment. I wonder whether it may be more robust to go for 4k alignment though.

  Paul

Jan Beulich June 9, 2017, 3:41 p.m. UTC | #8

>>> On 09.06.17 at 17:14, <Paul.Durrant@citrix.com> wrote:
> I've characterised the issue some more and it appears to be an overflow 
> inside the int13 handler if es:bx is less than 512 bytes below a 4k boundary. 
> I modified the code to use a hardcoded segment, which I set at 0x6000, and 
> all values of bx up to 0xe00 resulted in a good MBR signature. Values above 
> 0xe00 but below 0xe20 resulted in the buffer not being identified as a valid 
> MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20 
> resulted in either a hang (sometimes with a black screen) or a reboot.
> This led me to believe that backing out all my debug code and adding a 
> '.align 512' just before the definition of boot_edd_info should result in a 
> successful boot. Alas this appears not to be the case... I seem to need at 
> least 2k alignment. I wonder whether it may be more robust to go for 4k 
> alignment though.

At least until we've seen (and merged) Jürgen's further trampoline
adjustments, we need to be careful with growing its overall size.
Memory below 1Mb is known to be scarce specifically on some EFI
systems, and we're currently still allocating space for all of the
trampoline instead of just its permanent part. Even on non-EFI
systems I'd prefer the trampoline to remain as small as possible.

With what you say about the requirements this buggy BIOS has
I wonder whether we couldn't help ourselves by doing I/O to
other than boot_edd_info. Especially if we did the EDD stuff last
(rather than before video), a good portion of the boot time only
trampoline space will no longer be needed.

Otoh I wonder where a system this buggy shouldn't be declared
unusable (until a suitable BIOS update becomes available). Did
you check what constraints Linux places on the buffer used for
I/O? IOW can you judge whether bare metal Linux just happens
to work (just like older Xen did), or has been fixed to cope with
such a situation?

Jan

Paul Durrant June 9, 2017, 3:47 p.m. UTC | #9

> -----Original Message-----

> From: Jan Beulich [mailto:JBeulich@suse.com]

> Sent: 09 June 2017 16:41

> To: Paul Durrant <Paul.Durrant@citrix.com>

> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew

> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen-

> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris

> Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross

> <jgross@suse.com>

> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot

> 

> >>> On 09.06.17 at 17:14, <Paul.Durrant@citrix.com> wrote:

> > I've characterised the issue some more and it appears to be an overflow

> > inside the int13 handler if es:bx is less than 512 bytes below a 4k boundary.

> > I modified the code to use a hardcoded segment, which I set at 0x6000, and

> > all values of bx up to 0xe00 resulted in a good MBR signature. Values above

> > 0xe00 but below 0xe20 resulted in the buffer not being identified as a valid

> > MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20

> > resulted in either a hang (sometimes with a black screen) or a reboot.

> > This led me to believe that backing out all my debug code and adding a

> > '.align 512' just before the definition of boot_edd_info should result in a

> > successful boot. Alas this appears not to be the case... I seem to need at

> > least 2k alignment. I wonder whether it may be more robust to go for 4k

> > alignment though.

> 

> At least until we've seen (and merged) Jürgen's further trampoline

> adjustments, we need to be careful with growing its overall size.

> Memory below 1Mb is known to be scarce specifically on some EFI

> systems, and we're currently still allocating space for all of the

> trampoline instead of just its permanent part. Even on non-EFI

> systems I'd prefer the trampoline to remain as small as possible.

> 

> With what you say about the requirements this buggy BIOS has

> I wonder whether we couldn't help ourselves by doing I/O to

> other than boot_edd_info. Especially if we did the EDD stuff last

> (rather than before video), a good portion of the boot time only

> trampoline space will no longer be needed.


I think that would be sensible, but I was looking for the simplest fix/workaround possible for 4.9 and setting the alignment seems to it.

> 

> Otoh I wonder where a system this buggy shouldn't be declared

> unusable (until a suitable BIOS update becomes available). Did

> you check what constraints Linux places on the buffer used for

> I/O? IOW can you judge whether bare metal Linux just happens

> to work (just like older Xen did), or has been fixed to cope with

> such a situation?

> 


I'll go have a look and the linux edd code. I'm also trying a BIOS update (which is proving to be trickier than I thought as it seems to have killed networking in some weird way).

  Paul

> Jan

Jan Beulich June 9, 2017, 3:58 p.m. UTC | #10

>>> On 09.06.17 at 17:47, <Paul.Durrant@citrix.com> wrote:
> I'll go have a look and the linux edd code. I'm also trying a BIOS update 
> (which is proving to be trickier than I thought as it seems to have killed 
> networking in some weird way).

Speaks for the quality of what that vendor delivers...

Jan

Paul Durrant June 12, 2017, 8:14 a.m. UTC | #11

> -----Original Message-----

> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of

> Paul Durrant

> Sent: 09 June 2017 16:47

> To: 'Jan Beulich' <JBeulich@suse.com>

> Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper

> <Andrew.Cooper3@citrix.com>; Julien Grall (julien.grall@arm.com)

> <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>;

> xen-devel(xen-devel@lists.xenproject.org) <xen-

> devel@lists.xenproject.org>

> Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot

> 

> > -----Original Message-----

> > From: Jan Beulich [mailto:JBeulich@suse.com]

> > Sent: 09 June 2017 16:41

> > To: Paul Durrant <Paul.Durrant@citrix.com>

> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew

> > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen-

> > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris

> > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross

> > <jgross@suse.com>

> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot

> >

> > >>> On 09.06.17 at 17:14, <Paul.Durrant@citrix.com> wrote:

> > > I've characterised the issue some more and it appears to be an overflow

> > > inside the int13 handler if es:bx is less than 512 bytes below a 4k

> boundary.

> > > I modified the code to use a hardcoded segment, which I set at 0x6000,

> and

> > > all values of bx up to 0xe00 resulted in a good MBR signature. Values

> above

> > > 0xe00 but below 0xe20 resulted in the buffer not being identified as a

> valid

> > > MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20

> > > resulted in either a hang (sometimes with a black screen) or a reboot.

> > > This led me to believe that backing out all my debug code and adding a

> > > '.align 512' just before the definition of boot_edd_info should result in a

> > > successful boot. Alas this appears not to be the case... I seem to need at

> > > least 2k alignment. I wonder whether it may be more robust to go for 4k

> > > alignment though.

> >

> > At least until we've seen (and merged) Jürgen's further trampoline

> > adjustments, we need to be careful with growing its overall size.

> > Memory below 1Mb is known to be scarce specifically on some EFI

> > systems, and we're currently still allocating space for all of the

> > trampoline instead of just its permanent part. Even on non-EFI

> > systems I'd prefer the trampoline to remain as small as possible.

> >

> > With what you say about the requirements this buggy BIOS has

> > I wonder whether we couldn't help ourselves by doing I/O to

> > other than boot_edd_info. Especially if we did the EDD stuff last

> > (rather than before video), a good portion of the boot time only

> > trampoline space will no longer be needed.

> 

> I think that would be sensible, but I was looking for the simplest

> fix/workaround possible for 4.9 and setting the alignment seems to it.

> 

> >

> > Otoh I wonder where a system this buggy shouldn't be declared

> > unusable (until a suitable BIOS update becomes available). Did

> > you check what constraints Linux places on the buffer used for

> > I/O? IOW can you judge whether bare metal Linux just happens

> > to work (just like older Xen did), or has been fixed to cope with

> > such a situation?

> >

> 

> I'll go have a look and the linux edd code. I'm also trying a BIOS update (which

> is proving to be trickier than I thought as it seems to have killed networking in

> some weird way).


Looking at the code in arch/x86/boot/edd.c in Linux, it sector aligns the buffer into which it reads the MBR and the sector size is pulled from the EDD which means, I believe, that the MBR read on the skull canyon would be 4k aligned.

What do you think it best to do for Xen 4.9? Hardcoding a 4k alignment is clearly easy and would work around this BIOS issue but, as you say, it does grow the image. Reverting Juergen's patch also works round the issue, but that is more by luck. Re-working the code is preferable, but I guess it's too late to introduce such code-churn in 4.9.

  Paul

> 

>   Paul

> 

> > Jan

> _______________________________________________

> Xen-devel mailing list

> Xen-devel@lists.xen.org

> https://lists.xen.org/xen-devel

debian stretch dom0 + xen 4.9 fails to boot

Commit Message

Comments

Patch