Message ID | 94bf1caf8b95436fa7b3aed74a172ce1@AMSPEX02CL03.citrite.net (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 08/06/17 14:42, Paul Durrant wrote: > For those following this... > > By poking characters at the screen and bisecting where they stopped, I have narrowed the problem to the code in edd.S. I can successfully boot by setting opt_edd=off on the Xen cmd line and I can also boot with the following patch applied: > > diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S > index 73371f98b5..5409f1d9a1 100644 > --- a/xen/arch/x86/boot/edd.S > +++ b/xen/arch/x86/boot/edd.S > @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr) > .byte 0 > GLOBAL(boot_mbr_signature) > .fill EDD_MBR_SIG_MAX*8,1,0 > + .align 4096 > GLOBAL(boot_edd_info) > - .fill 512,1,0 # big enough for a disc sector > + .fill 4096,1,0 # big enough for a disc sector > > (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive) > > I need to investigate some more but I do wonder whether the EDD info should be read first to determine the appropriate size of memory buffer to use when issuing the read of the MBR. Hardcoding a 4k reservation seems like the wrong thing to do, even if it is sufficient for this BIOS Thanks for rehabilitation of my patch. :-) Juergen
>>> On 08.06.17 at 14:42, <Paul.Durrant@citrix.com> wrote: > For those following this... > > By poking characters at the screen and bisecting where they stopped, I have > narrowed the problem to the code in edd.S. I can successfully boot by setting > opt_edd=off on the Xen cmd line and I can also boot with the following patch > applied: > > diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S > index 73371f98b5..5409f1d9a1 100644 > --- a/xen/arch/x86/boot/edd.S > +++ b/xen/arch/x86/boot/edd.S > @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr) > .byte 0 > GLOBAL(boot_mbr_signature) > .fill EDD_MBR_SIG_MAX*8,1,0 > + .align 4096 > GLOBAL(boot_edd_info) > - .fill 512,1,0 # big enough for a disc > sector > + .fill 4096,1,0 # big enough for a disc > sector > > (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive) > > I need to investigate some more but I do wonder whether the EDD info should > be read first to determine the appropriate size of memory buffer to use when > issuing the read of the MBR. Hardcoding a 4k reservation seems like the wrong > thing to do, even if it is sufficient for this BIOS. boot_edd_info is being used for two things - reading the MBR of each disk and storing data retrieved from INT 13 Fn 41 and 48. The latter occupies 492 bytes (6 times 8+74). Which would make me guess the system has a 4k disk, and the BIOS doesn't abstract away this characteristic when handling INT 13 Fn 02 (which is supposed to only act in multiples of 512-byte sectors, as opposed to Fn 42). The alternative of Fn 48 overflowing its buffer would seem less likely, especially with the buffer holding a size on input. Do you, btw, really need both the size and alignment increases? Jan
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 08 June 2017 14:19 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; > 'BorisOstrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >>> On 08.06.17 at 14:42, <Paul.Durrant@citrix.com> wrote: > > For those following this... > > > > By poking characters at the screen and bisecting where they stopped, I > have > > narrowed the problem to the code in edd.S. I can successfully boot by > setting > > opt_edd=off on the Xen cmd line and I can also boot with the following > patch > > applied: > > > > diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S > > index 73371f98b5..5409f1d9a1 100644 > > --- a/xen/arch/x86/boot/edd.S > > +++ b/xen/arch/x86/boot/edd.S > > @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr) > > .byte 0 > > GLOBAL(boot_mbr_signature) > > .fill EDD_MBR_SIG_MAX*8,1,0 > > + .align 4096 > > GLOBAL(boot_edd_info) > > - .fill 512,1,0 # big enough for a disc > > sector > > + .fill 4096,1,0 # big enough for a disc > > sector > > > > (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive) > > > > I need to investigate some more but I do wonder whether the EDD info > should > > be read first to determine the appropriate size of memory buffer to use > when > > issuing the read of the MBR. Hardcoding a 4k reservation seems like the > wrong > > thing to do, even if it is sufficient for this BIOS. > > boot_edd_info is being used for two things - reading the MBR of > each disk and storing data retrieved from INT 13 Fn 41 and 48. > The latter occupies 492 bytes (6 times 8+74). Which would make > me guess the system has a 4k disk, and the BIOS doesn't abstract > away this characteristic when handling INT 13 Fn 02 (which is > supposed to only act in multiples of 512-byte sectors, as opposed > to Fn 42). > > The alternative of Fn 48 overflowing its buffer would seem less > likely, especially with the buffer holding a size on input. Yes, I tested with edd=skipmbr on the command line (and no patch applied) and the system booted, so it's definitely the MBR read that is at fault. > > Do you, btw, really need both the size and alignment increases? > At first I tried just increasing the .fill to 4096 but that did not seem to work. I have not found anything that says int13 0x2 buffers need to be aligned... but the BIOS being buggy in this respect I guess it could easily require that. I'm just testing some more code to try to see exactly how much memory the MBR read scribbles on. Paul > Jan
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 08 June 2017 14:19 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; > 'BorisOstrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >>> On 08.06.17 at 14:42, <Paul.Durrant@citrix.com> wrote: > > For those following this... > > > > By poking characters at the screen and bisecting where they stopped, I > have > > narrowed the problem to the code in edd.S. I can successfully boot by > setting > > opt_edd=off on the Xen cmd line and I can also boot with the following > patch > > applied: > > > > diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S > > index 73371f98b5..5409f1d9a1 100644 > > --- a/xen/arch/x86/boot/edd.S > > +++ b/xen/arch/x86/boot/edd.S > > @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr) > > .byte 0 > > GLOBAL(boot_mbr_signature) > > .fill EDD_MBR_SIG_MAX*8,1,0 > > + .align 4096 > > GLOBAL(boot_edd_info) > > - .fill 512,1,0 # big enough for a disc > > sector > > + .fill 4096,1,0 # big enough for a disc > > sector > > > > (based on a hunch that the BIOS defaults to a 4K sector for my NVMe drive) > > > > I need to investigate some more but I do wonder whether the EDD info > should > > be read first to determine the appropriate size of memory buffer to use > when > > issuing the read of the MBR. Hardcoding a 4k reservation seems like the > wrong > > thing to do, even if it is sufficient for this BIOS. > > boot_edd_info is being used for two things - reading the MBR of > each disk and storing data retrieved from INT 13 Fn 41 and 48. > The latter occupies 492 bytes (6 times 8+74). Which would make > me guess the system has a 4k disk, and the BIOS doesn't abstract > away this characteristic when handling INT 13 Fn 02 (which is > supposed to only act in multiples of 512-byte sectors, as opposed > to Fn 42). > > The alternative of Fn 48 overflowing its buffer would seem less > likely, especially with the buffer holding a size on input. > > Do you, btw, really need both the size and alignment increases? > More investigation has characterized the problem a little more but I still don't understand precisely what it happening. The trampoline code sets %ds to 0x86 and the image is loaded at offset 0 in that segment, i.e. it is located at 0x86000. The boot_edd_info 512 byte range ends up spanning the 0x87000 boundary and when this area is used for reading the MBR I see the lock-up. If I insert some bytes sufficient to push boot_edd_info up to or beyond 0x87000 then the system boots and I have verified that bytes located immediately before or after boot_edd_info are not scribbled on by the int13 call. However, if I arrange for boot_edd_info to be located even just one byte below 0x87000 then the system again fails to boot. I am now attempting to grab some memory below the trampoline, pattern fill it and then try to figure out if there is any collateral damage from the int13 that is further afield from the actual value in es:bx, but all this has got me wondering why Xen bothers to read the MBR, or the EDD info for that matter? EDD or MBR signatures are returned by the XENPF_firmware_info hypercall, and Linux does seem to have code called early on in xen_start_kernel() that does make such hypercalls, but it also appears to be able to boot happily if I put edd=off on my Xen command line, so is this code really necessary? Paul > Jan
>>> On 09.06.17 at 14:19, <Paul.Durrant@citrix.com> wrote: > ..., but all this has > got me wondering why Xen bothers to read the MBR, or the EDD info for that > matter? EDD or MBR signatures are returned by the XENPF_firmware_info > hypercall, and Linux does seem to have code called early on in > xen_start_kernel() that does make such hypercalls, but it also appears to be > able to boot happily if I put edd=off on my Xen command line, so is this code > really necessary? Well, that's a question to the Linux folks. I would guess there's management code around wanting that info, but I'm not sure. Us doing this is simply because of Linux wanting it and having no other way to get at least some of this information (it could surely read the MBRs, but it wouldn't be able to associate them with BIOS drive numbers used for the other EDD information obtained). Jan
On 06/09/2017 09:05 AM, Jan Beulich wrote: >>>> On 09.06.17 at 14:19, <Paul.Durrant@citrix.com> wrote: >> ..., but all this has >> got me wondering why Xen bothers to read the MBR, or the EDD info for that >> matter? EDD or MBR signatures are returned by the XENPF_firmware_info >> hypercall, and Linux does seem to have code called early on in >> xen_start_kernel() that does make such hypercalls, but it also appears to be >> able to boot happily if I put edd=off on my Xen command line, so is this code >> really necessary? > Well, that's a question to the Linux folks. I would guess there's > management code around wanting that info, but I'm not sure. Us > doing this is simply because of Linux wanting it and having no > other way to get at least some of this information (it could surely > read the MBRs, but it wouldn't be able to associate them with > BIOS drive numbers used for the other EDD information obtained). Not sure what it is for. Perhaps there are some tools that poke into sysfs? commit 96f28bc66adb1414cfc9405ff80cfffdc44edd84 Author: David Vrabel <david.vrabel@citrix.com> Date: Wed Apr 3 17:31:50 2013 +0100 x86/xen: populate boot_params with EDD data During early setup of a dom0 kernel, populate boot_params with the Enhanced Disk Drive (EDD) and MBR signature data. This makes information on the BIOS boot device available in /sys/firmware/edd/. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: Jan Beulich <jbeulich@suse.com> Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
> -----Original Message----- > From: Boris Ostrovsky [mailto:boris.ostrovsky@oracle.com] > Sent: 09 June 2017 14:52 > To: Jan Beulich <JBeulich@suse.com>; Paul Durrant > <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; Juergen > Gross <jgross@suse.com> > Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > On 06/09/2017 09:05 AM, Jan Beulich wrote: > >>>> On 09.06.17 at 14:19, <Paul.Durrant@citrix.com> wrote: > >> ..., but all this has > >> got me wondering why Xen bothers to read the MBR, or the EDD info for > that > >> matter? EDD or MBR signatures are returned by the > XENPF_firmware_info > >> hypercall, and Linux does seem to have code called early on in > >> xen_start_kernel() that does make such hypercalls, but it also appears to > be > >> able to boot happily if I put edd=off on my Xen command line, so is this > code > >> really necessary? > > Well, that's a question to the Linux folks. I would guess there's > > management code around wanting that info, but I'm not sure. Us > > doing this is simply because of Linux wanting it and having no > > other way to get at least some of this information (it could surely > > read the MBRs, but it wouldn't be able to associate them with > > BIOS drive numbers used for the other EDD information obtained). > > Not sure what it is for. Perhaps there are some tools that poke into sysfs? > > > commit 96f28bc66adb1414cfc9405ff80cfffdc44edd84 > Author: David Vrabel <david.vrabel@citrix.com> > Date: Wed Apr 3 17:31:50 2013 +0100 > > x86/xen: populate boot_params with EDD data > > During early setup of a dom0 kernel, populate boot_params with the > Enhanced Disk Drive (EDD) and MBR signature data. This makes > information on the BIOS boot device available in /sys/firmware/edd/. > > Signed-off-by: David Vrabel <david.vrabel@citrix.com> > Acked-by: Jan Beulich <jbeulich@suse.com> > Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Interesting. The Xen side of things seems to have been there forever: commit 79e96982cade240531d7d84fa5b966b2b64c04af Author: kfraser@localhost.localdomain <kfraser@localhost.localdomain> Date: Tue Jun 12 14:03:09 2007 +0100 x86: Gather BIOS EDD info during boot. Still needs plumbing to dom0. Signed-off-by: Keir Fraser <keir@xensource.com> I've characterised the issue some more and it appears to be an overflow inside the int13 handler if es:bx is less than 512 bytes below a 4k boundary. I modified the code to use a hardcoded segment, which I set at 0x6000, and all values of bx up to 0xe00 resulted in a good MBR signature. Values above 0xe00 but below 0xe20 resulted in the buffer not being identified as a valid MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20 resulted in either a hang (sometimes with a black screen) or a reboot. This led me to believe that backing out all my debug code and adding a '.align 512' just before the definition of boot_edd_info should result in a successful boot. Alas this appears not to be the case... I seem to need at least 2k alignment. I wonder whether it may be more robust to go for 4k alignment though. Paul
>>> On 09.06.17 at 17:14, <Paul.Durrant@citrix.com> wrote: > I've characterised the issue some more and it appears to be an overflow > inside the int13 handler if es:bx is less than 512 bytes below a 4k boundary. > I modified the code to use a hardcoded segment, which I set at 0x6000, and > all values of bx up to 0xe00 resulted in a good MBR signature. Values above > 0xe00 but below 0xe20 resulted in the buffer not being identified as a valid > MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20 > resulted in either a hang (sometimes with a black screen) or a reboot. > This led me to believe that backing out all my debug code and adding a > '.align 512' just before the definition of boot_edd_info should result in a > successful boot. Alas this appears not to be the case... I seem to need at > least 2k alignment. I wonder whether it may be more robust to go for 4k > alignment though. At least until we've seen (and merged) Jürgen's further trampoline adjustments, we need to be careful with growing its overall size. Memory below 1Mb is known to be scarce specifically on some EFI systems, and we're currently still allocating space for all of the trampoline instead of just its permanent part. Even on non-EFI systems I'd prefer the trampoline to remain as small as possible. With what you say about the requirements this buggy BIOS has I wonder whether we couldn't help ourselves by doing I/O to other than boot_edd_info. Especially if we did the EDD stuff last (rather than before video), a good portion of the boot time only trampoline space will no longer be needed. Otoh I wonder where a system this buggy shouldn't be declared unusable (until a suitable BIOS update becomes available). Did you check what constraints Linux places on the buffer used for I/O? IOW can you judge whether bare metal Linux just happens to work (just like older Xen did), or has been fixed to cope with such a situation? Jan
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 09 June 2017 16:41 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >>> On 09.06.17 at 17:14, <Paul.Durrant@citrix.com> wrote: > > I've characterised the issue some more and it appears to be an overflow > > inside the int13 handler if es:bx is less than 512 bytes below a 4k boundary. > > I modified the code to use a hardcoded segment, which I set at 0x6000, and > > all values of bx up to 0xe00 resulted in a good MBR signature. Values above > > 0xe00 but below 0xe20 resulted in the buffer not being identified as a valid > > MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20 > > resulted in either a hang (sometimes with a black screen) or a reboot. > > This led me to believe that backing out all my debug code and adding a > > '.align 512' just before the definition of boot_edd_info should result in a > > successful boot. Alas this appears not to be the case... I seem to need at > > least 2k alignment. I wonder whether it may be more robust to go for 4k > > alignment though. > > At least until we've seen (and merged) Jürgen's further trampoline > adjustments, we need to be careful with growing its overall size. > Memory below 1Mb is known to be scarce specifically on some EFI > systems, and we're currently still allocating space for all of the > trampoline instead of just its permanent part. Even on non-EFI > systems I'd prefer the trampoline to remain as small as possible. > > With what you say about the requirements this buggy BIOS has > I wonder whether we couldn't help ourselves by doing I/O to > other than boot_edd_info. Especially if we did the EDD stuff last > (rather than before video), a good portion of the boot time only > trampoline space will no longer be needed. I think that would be sensible, but I was looking for the simplest fix/workaround possible for 4.9 and setting the alignment seems to it. > > Otoh I wonder where a system this buggy shouldn't be declared > unusable (until a suitable BIOS update becomes available). Did > you check what constraints Linux places on the buffer used for > I/O? IOW can you judge whether bare metal Linux just happens > to work (just like older Xen did), or has been fixed to cope with > such a situation? > I'll go have a look and the linux edd code. I'm also trying a BIOS update (which is proving to be trickier than I thought as it seems to have killed networking in some weird way). Paul > Jan
>>> On 09.06.17 at 17:47, <Paul.Durrant@citrix.com> wrote: > I'll go have a look and the linux edd code. I'm also trying a BIOS update > (which is proving to be trickier than I thought as it seems to have killed > networking in some weird way). Speaks for the quality of what that vendor delivers... Jan
> -----Original Message----- > From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of > Paul Durrant > Sent: 09 June 2017 16:47 > To: 'Jan Beulich' <JBeulich@suse.com> > Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper > <Andrew.Cooper3@citrix.com>; Julien Grall (julien.grall@arm.com) > <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>; > xen-devel(xen-devel@lists.xenproject.org) <xen- > devel@lists.xenproject.org> > Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > -----Original Message----- > > From: Jan Beulich [mailto:JBeulich@suse.com] > > Sent: 09 June 2017 16:41 > > To: Paul Durrant <Paul.Durrant@citrix.com> > > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > > <jgross@suse.com> > > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > > >>> On 09.06.17 at 17:14, <Paul.Durrant@citrix.com> wrote: > > > I've characterised the issue some more and it appears to be an overflow > > > inside the int13 handler if es:bx is less than 512 bytes below a 4k > boundary. > > > I modified the code to use a hardcoded segment, which I set at 0x6000, > and > > > all values of bx up to 0xe00 resulted in a good MBR signature. Values > above > > > 0xe00 but below 0xe20 resulted in the buffer not being identified as a > valid > > > MBR (I guess because the 0xAA55 fell off) and values of bx above 0xe20 > > > resulted in either a hang (sometimes with a black screen) or a reboot. > > > This led me to believe that backing out all my debug code and adding a > > > '.align 512' just before the definition of boot_edd_info should result in a > > > successful boot. Alas this appears not to be the case... I seem to need at > > > least 2k alignment. I wonder whether it may be more robust to go for 4k > > > alignment though. > > > > At least until we've seen (and merged) Jürgen's further trampoline > > adjustments, we need to be careful with growing its overall size. > > Memory below 1Mb is known to be scarce specifically on some EFI > > systems, and we're currently still allocating space for all of the > > trampoline instead of just its permanent part. Even on non-EFI > > systems I'd prefer the trampoline to remain as small as possible. > > > > With what you say about the requirements this buggy BIOS has > > I wonder whether we couldn't help ourselves by doing I/O to > > other than boot_edd_info. Especially if we did the EDD stuff last > > (rather than before video), a good portion of the boot time only > > trampoline space will no longer be needed. > > I think that would be sensible, but I was looking for the simplest > fix/workaround possible for 4.9 and setting the alignment seems to it. > > > > > Otoh I wonder where a system this buggy shouldn't be declared > > unusable (until a suitable BIOS update becomes available). Did > > you check what constraints Linux places on the buffer used for > > I/O? IOW can you judge whether bare metal Linux just happens > > to work (just like older Xen did), or has been fixed to cope with > > such a situation? > > > > I'll go have a look and the linux edd code. I'm also trying a BIOS update (which > is proving to be trickier than I thought as it seems to have killed networking in > some weird way). Looking at the code in arch/x86/boot/edd.c in Linux, it sector aligns the buffer into which it reads the MBR and the sector size is pulled from the EDD which means, I believe, that the MBR read on the skull canyon would be 4k aligned. What do you think it best to do for Xen 4.9? Hardcoding a 4k alignment is clearly easy and would work around this BIOS issue but, as you say, it does grow the image. Reverting Juergen's patch also works round the issue, but that is more by luck. Re-working the code is preferable, but I guess it's too late to introduce such code-churn in 4.9. Paul > > Paul > > > Jan > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
diff --git a/xen/arch/x86/boot/edd.S b/xen/arch/x86/boot/edd.S index 73371f98b5..5409f1d9a1 100644 --- a/xen/arch/x86/boot/edd.S +++ b/xen/arch/x86/boot/edd.S @@ -148,5 +148,6 @@ GLOBAL(boot_mbr_signature_nr) .byte 0 GLOBAL(boot_mbr_signature) .fill EDD_MBR_SIG_MAX*8,1,0 + .align 4096 GLOBAL(boot_edd_info) - .fill 512,1,0 # big enough for a disc sector + .fill 4096,1,0 # big enough for a disc sector