Message ID | 593E8BC70200007800161DEB@prv-mh.provo.novell.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 12 June 2017 11:41 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >>> On 12.06.17 at 10:14, <Paul.Durrant@citrix.com> wrote: > > Looking at the code in arch/x86/boot/edd.c in Linux, it sector aligns the > > buffer into which it reads the MBR and the sector size is pulled from the > EDD > > which means, I believe, that the MBR read on the skull canyon would be 4k > > aligned. > > > > What do you think it best to do for Xen 4.9? Hardcoding a 4k alignment is > > clearly easy and would work around this BIOS issue but, as you say, it does > > grow the image. Reverting Juergen's patch also works round the issue, but > > that is more by luck. Re-working the code is preferable, but I guess it's too > > late to introduce such code-churn in 4.9. > > Reverting Jürgen's code is out of question with all the information > you've gathered by now. I think re-working the EDD code slightly > is the best option. Would you mind giving the attached patch a > try? This still slightly grows the trampoline due to a few more > instructions being needed, but should still be far better than > embedding a whole 4k buffer (and then later finding a BIOS/disk > combination which wants even more). Note that I've left a tiny > bit of debugging code in there. > Sure, I'll give that a go now. Paul > Jan
> -----Original Message----- [snip] > > > > > > What do you think it best to do for Xen 4.9? Hardcoding a 4k alignment is > > > clearly easy and would work around this BIOS issue but, as you say, it > does > > > grow the image. Reverting Juergen's patch also works round the issue, > but > > > that is more by luck. Re-working the code is preferable, but I guess it's > too > > > late to introduce such code-churn in 4.9. > > > > Reverting Jürgen's code is out of question with all the information > > you've gathered by now. I think re-working the EDD code slightly > > is the best option. Would you mind giving the attached patch a > > try? This still slightly grows the trampoline due to a few more > > instructions being needed, but should still be far better than > > embedding a whole 4k buffer (and then later finding a BIOS/disk > > combination which wants even more). Note that I've left a tiny > > bit of debugging code in there. > > > > Sure, I'll give that a go now. > That worked fine: (XEN) MBR[80] @ 85e0 (86000) so you can add my Tested-by to that. Thanks, Paul > Paul > > > Jan > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
>>> On 12.06.17 at 12:53, <Paul.Durrant@citrix.com> wrote: >> -----Original Message----- > [snip] >> > > >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k alignment is >> > > clearly easy and would work around this BIOS issue but, as you say, it >> does >> > > grow the image. Reverting Juergen's patch also works round the issue, >> but >> > > that is more by luck. Re-working the code is preferable, but I guess it's >> too >> > > late to introduce such code-churn in 4.9. >> > >> > Reverting Jürgen's code is out of question with all the information >> > you've gathered by now. I think re-working the EDD code slightly >> > is the best option. Would you mind giving the attached patch a >> > try? This still slightly grows the trampoline due to a few more >> > instructions being needed, but should still be far better than >> > embedding a whole 4k buffer (and then later finding a BIOS/disk >> > combination which wants even more). Note that I've left a tiny >> > bit of debugging code in there. >> > >> >> Sure, I'll give that a go now. >> > > That worked fine: > > (XEN) MBR[80] @ 85e0 (86000) But that's contrary to your earlier findings: Didn't you say simply avoiding a 4k-boundary wasn't enough? And it certainly tells us that this isn't a 4k drive (or at least the BIOS doesn't surface 4k sectors) - I was really expecting a larger gap between the two logged values. > so you can add my Tested-by to that. I.e. I'm not sure about this, as I'm still uncertain whether some corruption didn't again occur. Of course APs coming up properly would already be a relatively good sign (as now the permanent part of the trampoline would be the predestined area for corruption to occur in). Jan
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 12 June 2017 12:12 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >>> On 12.06.17 at 12:53, <Paul.Durrant@citrix.com> wrote: > >> -----Original Message----- > > [snip] > >> > > > >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k alignment > is > >> > > clearly easy and would work around this BIOS issue but, as you say, it > >> does > >> > > grow the image. Reverting Juergen's patch also works round the issue, > >> but > >> > > that is more by luck. Re-working the code is preferable, but I guess it's > >> too > >> > > late to introduce such code-churn in 4.9. > >> > > >> > Reverting Jürgen's code is out of question with all the information > >> > you've gathered by now. I think re-working the EDD code slightly > >> > is the best option. Would you mind giving the attached patch a > >> > try? This still slightly grows the trampoline due to a few more > >> > instructions being needed, but should still be far better than > >> > embedding a whole 4k buffer (and then later finding a BIOS/disk > >> > combination which wants even more). Note that I've left a tiny > >> > bit of debugging code in there. > >> > > >> > >> Sure, I'll give that a go now. > >> > > > > That worked fine: > > > > (XEN) MBR[80] @ 85e0 (86000) > > But that's contrary to your earlier findings: Didn't you say simply > avoiding a 4k-boundary wasn't enough? And it certainly tells us > that this isn't a 4k drive (or at least the BIOS doesn't surface 4k > sectors) - I was really expecting a larger gap between the two > logged values. > I'll go dump out the edd and double check what it is saying. My findings indicated that the problem seemed to be doing a read that spanned a 4k boundary caused a problem, so using 0x85e00 would be safe. The anomaly was that simply aligning the edd_info buffer and a 512 byte boundary and continuing to use that for reading did not work. > > so you can add my Tested-by to that. > > I.e. I'm not sure about this, as I'm still uncertain whether some > corruption didn't again occur. Of course APs coming up properly > would already be a relatively good sign (as now the permanent > part of the trampoline would be the predestined area for > corruption to occur in). > None of my findings ever indicated memory corruption (although there, of course, may have been some that I happened to miss), but rather misbehaviour of the int13 handler itself - either locking up, having odd effects (e.g. black screen), or both. Paul > Jan
> -----Original Message----- > From: Paul Durrant > Sent: 12 June 2017 13:06 > To: 'Jan Beulich' <JBeulich@suse.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > -----Original Message----- > > From: Jan Beulich [mailto:JBeulich@suse.com] > > Sent: 12 June 2017 12:12 > > To: Paul Durrant <Paul.Durrant@citrix.com> > > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > > <jgross@suse.com> > > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > > >>> On 12.06.17 at 12:53, <Paul.Durrant@citrix.com> wrote: > > >> -----Original Message----- > > > [snip] > > >> > > > > >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k > alignment > > is > > >> > > clearly easy and would work around this BIOS issue but, as you say, it > > >> does > > >> > > grow the image. Reverting Juergen's patch also works round the > issue, > > >> but > > >> > > that is more by luck. Re-working the code is preferable, but I guess > it's > > >> too > > >> > > late to introduce such code-churn in 4.9. > > >> > > > >> > Reverting Jürgen's code is out of question with all the information > > >> > you've gathered by now. I think re-working the EDD code slightly > > >> > is the best option. Would you mind giving the attached patch a > > >> > try? This still slightly grows the trampoline due to a few more > > >> > instructions being needed, but should still be far better than > > >> > embedding a whole 4k buffer (and then later finding a BIOS/disk > > >> > combination which wants even more). Note that I've left a tiny > > >> > bit of debugging code in there. > > >> > > > >> > > >> Sure, I'll give that a go now. > > >> > > > > > > That worked fine: > > > > > > (XEN) MBR[80] @ 85e0 (86000) > > > > But that's contrary to your earlier findings: Didn't you say simply > > avoiding a 4k-boundary wasn't enough? And it certainly tells us > > that this isn't a 4k drive (or at least the BIOS doesn't surface 4k > > sectors) - I was really expecting a larger gap between the two > > logged values. > > > > I'll go dump out the edd and double check what it is saying. > I dumped a bit of the info: (XEN) device 0x80 version 0x30 (XEN) number_of_sectors = 0x1dcf32b0 (XEN) sectors_per_track = 0x3f (XEN) bytes_per_sector = 0x200 So it is indeed advertising a 512 byte sector. It is an SSD though so it'll be something much bigger underneath. Paul > My findings indicated that the problem seemed to be doing a read that > spanned a 4k boundary caused a problem, so using 0x85e00 would be safe. > The anomaly was that simply aligning the edd_info buffer and a 512 byte > boundary and continuing to use that for reading did not work. > > > > so you can add my Tested-by to that. > > > > I.e. I'm not sure about this, as I'm still uncertain whether some > > corruption didn't again occur. Of course APs coming up properly > > would already be a relatively good sign (as now the permanent > > part of the trampoline would be the predestined area for > > corruption to occur in). > > > > None of my findings ever indicated memory corruption (although there, of > course, may have been some that I happened to miss), but rather > misbehaviour of the int13 handler itself - either locking up, having odd > effects (e.g. black screen), or both. > > Paul > > > Jan
>>> On 12.06.17 at 14:05, <Paul.Durrant@citrix.com> wrote: >> -----Original Message----- >> From: Jan Beulich [mailto:JBeulich@suse.com] >> Sent: 12 June 2017 12:12 >> To: Paul Durrant <Paul.Durrant@citrix.com> >> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew >> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- >> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris >> Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross >> <jgross@suse.com> >> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot >> >> >>> On 12.06.17 at 12:53, <Paul.Durrant@citrix.com> wrote: >> >> -----Original Message----- >> > [snip] >> >> > > >> >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k alignment >> is >> >> > > clearly easy and would work around this BIOS issue but, as you say, it >> >> does >> >> > > grow the image. Reverting Juergen's patch also works round the issue, >> >> but >> >> > > that is more by luck. Re-working the code is preferable, but I guess it's >> >> too >> >> > > late to introduce such code-churn in 4.9. >> >> > >> >> > Reverting Jürgen's code is out of question with all the information >> >> > you've gathered by now. I think re-working the EDD code slightly >> >> > is the best option. Would you mind giving the attached patch a >> >> > try? This still slightly grows the trampoline due to a few more >> >> > instructions being needed, but should still be far better than >> >> > embedding a whole 4k buffer (and then later finding a BIOS/disk >> >> > combination which wants even more). Note that I've left a tiny >> >> > bit of debugging code in there. >> >> > >> >> >> >> Sure, I'll give that a go now. >> >> >> > >> > That worked fine: >> > >> > (XEN) MBR[80] @ 85e0 (86000) >> >> But that's contrary to your earlier findings: Didn't you say simply >> avoiding a 4k-boundary wasn't enough? And it certainly tells us >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k >> sectors) - I was really expecting a larger gap between the two >> logged values. >> > > I'll go dump out the edd and double check what it is saying. > > My findings indicated that the problem seemed to be doing a read that > spanned a 4k boundary caused a problem, so using 0x85e00 would be safe. The > anomaly was that simply aligning the edd_info buffer and a 512 byte boundary > and continuing to use that for reading did not work. But a 512-byte aligned 512-byte buffer can't possibly cross a page boundary. >> > so you can add my Tested-by to that. >> >> I.e. I'm not sure about this, as I'm still uncertain whether some >> corruption didn't again occur. Of course APs coming up properly >> would already be a relatively good sign (as now the permanent >> part of the trampoline would be the predestined area for >> corruption to occur in). >> > > None of my findings ever indicated memory corruption (although there, of > course, may have been some that I happened to miss), but rather misbehaviour > of the int13 handler itself - either locking up, having odd effects (e.g. > black screen), or both. Ah, I didn't understand it this way so far, and instead had implied that the handler did return, but corrupt our trampoline area in one way or another. Jan
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 12 June 2017 14:55 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >>> On 12.06.17 at 14:05, <Paul.Durrant@citrix.com> wrote: > >> -----Original Message----- > >> From: Jan Beulich [mailto:JBeulich@suse.com] > >> Sent: 12 June 2017 12:12 > >> To: Paul Durrant <Paul.Durrant@citrix.com> > >> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > >> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > >> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > >> Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > >> <jgross@suse.com> > >> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > >> > >> >>> On 12.06.17 at 12:53, <Paul.Durrant@citrix.com> wrote: > >> >> -----Original Message----- > >> > [snip] > >> >> > > > >> >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k > alignment > >> is > >> >> > > clearly easy and would work around this BIOS issue but, as you say, > it > >> >> does > >> >> > > grow the image. Reverting Juergen's patch also works round the > issue, > >> >> but > >> >> > > that is more by luck. Re-working the code is preferable, but I guess > it's > >> >> too > >> >> > > late to introduce such code-churn in 4.9. > >> >> > > >> >> > Reverting Jürgen's code is out of question with all the information > >> >> > you've gathered by now. I think re-working the EDD code slightly > >> >> > is the best option. Would you mind giving the attached patch a > >> >> > try? This still slightly grows the trampoline due to a few more > >> >> > instructions being needed, but should still be far better than > >> >> > embedding a whole 4k buffer (and then later finding a BIOS/disk > >> >> > combination which wants even more). Note that I've left a tiny > >> >> > bit of debugging code in there. > >> >> > > >> >> > >> >> Sure, I'll give that a go now. > >> >> > >> > > >> > That worked fine: > >> > > >> > (XEN) MBR[80] @ 85e0 (86000) > >> > >> But that's contrary to your earlier findings: Didn't you say simply > >> avoiding a 4k-boundary wasn't enough? And it certainly tells us > >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k > >> sectors) - I was really expecting a larger gap between the two > >> logged values. > >> > > > > I'll go dump out the edd and double check what it is saying. > > > > My findings indicated that the problem seemed to be doing a read that > > spanned a 4k boundary caused a problem, so using 0x85e00 would be safe. > The > > anomaly was that simply aligning the edd_info buffer and a 512 byte > boundary > > and continuing to use that for reading did not work. > > But a 512-byte aligned 512-byte buffer can't possibly cross a page > boundary. Indeed, which is why I was perplexed. I found that 0x60e00 was ok. Your patch chose 0x85e00, which was ok too, but for some reason a '.align 512' in front of boot_edd_info yielded an address which was not ok. I just checked what address that yielded though (by booting with edd=off to avoid the hang) and it was 0x86f40... which clearly means that '.align 512' is not doing what I thought it would do. Paul > > >> > so you can add my Tested-by to that. > >> > >> I.e. I'm not sure about this, as I'm still uncertain whether some > >> corruption didn't again occur. Of course APs coming up properly > >> would already be a relatively good sign (as now the permanent > >> part of the trampoline would be the predestined area for > >> corruption to occur in). > >> > > > > None of my findings ever indicated memory corruption (although there, of > > course, may have been some that I happened to miss), but rather > misbehaviour > > of the int13 handler itself - either locking up, having odd effects (e.g. > > black screen), or both. > > Ah, I didn't understand it this way so far, and instead had implied > that the handler did return, but corrupt our trampoline area in > one way or another. > > Jan
> -----Original Message----- > From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of > Paul Durrant > Sent: 12 June 2017 15:29 > To: 'Jan Beulich' <JBeulich@suse.com> > Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper > <Andrew.Cooper3@citrix.com>; Julien Grall (julien.grall@arm.com) > <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>; > xen-devel(xen-devel@lists.xenproject.org) <xen- > devel@lists.xenproject.org> > Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > -----Original Message----- > > From: Jan Beulich [mailto:JBeulich@suse.com] > > Sent: 12 June 2017 14:55 > > To: Paul Durrant <Paul.Durrant@citrix.com> > > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > > <jgross@suse.com> > > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > > >>> On 12.06.17 at 14:05, <Paul.Durrant@citrix.com> wrote: > > >> -----Original Message----- > > >> From: Jan Beulich [mailto:JBeulich@suse.com] > > >> Sent: 12 June 2017 12:12 > > >> To: Paul Durrant <Paul.Durrant@citrix.com> > > >> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > > >> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > > >> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > > >> Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > > >> <jgross@suse.com> > > >> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >> > > >> >>> On 12.06.17 at 12:53, <Paul.Durrant@citrix.com> wrote: > > >> >> -----Original Message----- > > >> > [snip] > > >> >> > > > > >> >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k > > alignment > > >> is > > >> >> > > clearly easy and would work around this BIOS issue but, as you > say, > > it > > >> >> does > > >> >> > > grow the image. Reverting Juergen's patch also works round the > > issue, > > >> >> but > > >> >> > > that is more by luck. Re-working the code is preferable, but I > guess > > it's > > >> >> too > > >> >> > > late to introduce such code-churn in 4.9. > > >> >> > > > >> >> > Reverting Jürgen's code is out of question with all the information > > >> >> > you've gathered by now. I think re-working the EDD code slightly > > >> >> > is the best option. Would you mind giving the attached patch a > > >> >> > try? This still slightly grows the trampoline due to a few more > > >> >> > instructions being needed, but should still be far better than > > >> >> > embedding a whole 4k buffer (and then later finding a BIOS/disk > > >> >> > combination which wants even more). Note that I've left a tiny > > >> >> > bit of debugging code in there. > > >> >> > > > >> >> > > >> >> Sure, I'll give that a go now. > > >> >> > > >> > > > >> > That worked fine: > > >> > > > >> > (XEN) MBR[80] @ 85e0 (86000) > > >> > > >> But that's contrary to your earlier findings: Didn't you say simply > > >> avoiding a 4k-boundary wasn't enough? And it certainly tells us > > >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k > > >> sectors) - I was really expecting a larger gap between the two > > >> logged values. > > >> > > > > > > I'll go dump out the edd and double check what it is saying. > > > > > > My findings indicated that the problem seemed to be doing a read that > > > spanned a 4k boundary caused a problem, so using 0x85e00 would be > safe. > > The > > > anomaly was that simply aligning the edd_info buffer and a 512 byte > > boundary > > > and continuing to use that for reading did not work. > > > > But a 512-byte aligned 512-byte buffer can't possibly cross a page > > boundary. > > Indeed, which is why I was perplexed. I found that 0x60e00 was ok. Your > patch chose 0x85e00, which was ok too, but for some reason a '.align 512' in > front of boot_edd_info yielded an address which was not ok. I just checked > what address that yielded though (by booting with edd=off to avoid the > hang) and it was 0x86f40... which clearly means that '.align 512' is not doing > what I thought it would do. No, the problem turns out to be the GLOBAL() macro which, in assembly files, contains an implicit .align 16! Paul > > Paul > > > > > >> > so you can add my Tested-by to that. > > >> > > >> I.e. I'm not sure about this, as I'm still uncertain whether some > > >> corruption didn't again occur. Of course APs coming up properly > > >> would already be a relatively good sign (as now the permanent > > >> part of the trampoline would be the predestined area for > > >> corruption to occur in). > > >> > > > > > > None of my findings ever indicated memory corruption (although there, > of > > > course, may have been some that I happened to miss), but rather > > misbehaviour > > > of the int13 handler itself - either locking up, having odd effects (e.g. > > > black screen), or both. > > > > Ah, I didn't understand it this way so far, and instead had implied > > that the handler did return, but corrupt our trampoline area in > > one way or another. > > > > Jan > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xen.org > https://lists.xen.org/xen-devel
> -----Original Message----- > From: Paul Durrant > Sent: 12 June 2017 15:43 > To: Paul Durrant <Paul.Durrant@citrix.com>; 'Jan Beulich' > <JBeulich@suse.com> > Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper > <Andrew.Cooper3@citrix.com>; Julien Grall (julien.grall@arm.com) > <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>; > xen-devel(xen-devel@lists.xenproject.org) <xen- > devel@lists.xenproject.org> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > -----Original Message----- > > From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of > > Paul Durrant > > Sent: 12 June 2017 15:29 > > To: 'Jan Beulich' <JBeulich@suse.com> > > Cc: Juergen Gross <jgross@suse.com>; Andrew Cooper > > <Andrew.Cooper3@citrix.com>; Julien Grall (julien.grall@arm.com) > > <julien.grall@arm.com>; 'Boris Ostrovsky' <boris.ostrovsky@oracle.com>; > > xen-devel(xen-devel@lists.xenproject.org) <xen- > > devel@lists.xenproject.org> > > Subject: Re: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > > > -----Original Message----- > > > From: Jan Beulich [mailto:JBeulich@suse.com] > > > Sent: 12 June 2017 14:55 > > > To: Paul Durrant <Paul.Durrant@citrix.com> > > > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > > > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > > > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > > > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > > > <jgross@suse.com> > > > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > > > > >>> On 12.06.17 at 14:05, <Paul.Durrant@citrix.com> wrote: > > > >> -----Original Message----- > > > >> From: Jan Beulich [mailto:JBeulich@suse.com] > > > >> Sent: 12 June 2017 12:12 > > > >> To: Paul Durrant <Paul.Durrant@citrix.com> > > > >> Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; > Andrew > > > >> Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > > > >> devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > > > >> Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > > > >> <jgross@suse.com> > > > >> Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > > >> > > > >> >>> On 12.06.17 at 12:53, <Paul.Durrant@citrix.com> wrote: > > > >> >> -----Original Message----- > > > >> > [snip] > > > >> >> > > > > > >> >> > > What do you think it best to do for Xen 4.9? Hardcoding a 4k > > > alignment > > > >> is > > > >> >> > > clearly easy and would work around this BIOS issue but, as you > > say, > > > it > > > >> >> does > > > >> >> > > grow the image. Reverting Juergen's patch also works round > the > > > issue, > > > >> >> but > > > >> >> > > that is more by luck. Re-working the code is preferable, but I > > guess > > > it's > > > >> >> too > > > >> >> > > late to introduce such code-churn in 4.9. > > > >> >> > > > > >> >> > Reverting Jürgen's code is out of question with all the information > > > >> >> > you've gathered by now. I think re-working the EDD code slightly > > > >> >> > is the best option. Would you mind giving the attached patch a > > > >> >> > try? This still slightly grows the trampoline due to a few more > > > >> >> > instructions being needed, but should still be far better than > > > >> >> > embedding a whole 4k buffer (and then later finding a BIOS/disk > > > >> >> > combination which wants even more). Note that I've left a tiny > > > >> >> > bit of debugging code in there. > > > >> >> > > > > >> >> > > > >> >> Sure, I'll give that a go now. > > > >> >> > > > >> > > > > >> > That worked fine: > > > >> > > > > >> > (XEN) MBR[80] @ 85e0 (86000) > > > >> > > > >> But that's contrary to your earlier findings: Didn't you say simply > > > >> avoiding a 4k-boundary wasn't enough? And it certainly tells us > > > >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k > > > >> sectors) - I was really expecting a larger gap between the two > > > >> logged values. > > > >> > > > > > > > > I'll go dump out the edd and double check what it is saying. > > > > > > > > My findings indicated that the problem seemed to be doing a read that > > > > spanned a 4k boundary caused a problem, so using 0x85e00 would be > > safe. > > > The > > > > anomaly was that simply aligning the edd_info buffer and a 512 byte > > > boundary > > > > and continuing to use that for reading did not work. > > > > > > But a 512-byte aligned 512-byte buffer can't possibly cross a page > > > boundary. > > > > Indeed, which is why I was perplexed. I found that 0x60e00 was ok. Your > > patch chose 0x85e00, which was ok too, but for some reason a '.align 512' in > > front of boot_edd_info yielded an address which was not ok. I just checked > > what address that yielded though (by booting with edd=off to avoid the > > hang) and it was 0x86f40... which clearly means that '.align 512' is not doing > > what I thought it would do. > > No, the problem turns out to be the GLOBAL() macro which, in assembly > files, contains an implicit .align 16! > No, I misread.. ENTRY() contains the implicit align. It's clearly even more subtle. Running objdump tells me the symbol is indeed 512 byte aligned, but when it ends up on memory it's clearly not. So I guess it must be down to how the trampoline is loaded. Thus, not using a buffer within the trampoline image is most definitely the best idea. Paul > Paul > > > > > Paul > > > > > > > > >> > so you can add my Tested-by to that. > > > >> > > > >> I.e. I'm not sure about this, as I'm still uncertain whether some > > > >> corruption didn't again occur. Of course APs coming up properly > > > >> would already be a relatively good sign (as now the permanent > > > >> part of the trampoline would be the predestined area for > > > >> corruption to occur in). > > > >> > > > > > > > > None of my findings ever indicated memory corruption (although > there, > > of > > > > course, may have been some that I happened to miss), but rather > > > misbehaviour > > > > of the int13 handler itself - either locking up, having odd effects (e.g. > > > > black screen), or both. > > > > > > Ah, I didn't understand it this way so far, and instead had implied > > > that the handler did return, but corrupt our trampoline area in > > > one way or another. > > > > > > Jan > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xen.org > > https://lists.xen.org/xen-devel
>>> On 12.06.17 at 16:43, <Paul.Durrant@citrix.com> wrote: >> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of >> Paul Durrant >> Sent: 12 June 2017 15:29 >> > From: Jan Beulich [mailto:JBeulich@suse.com] >> > Sent: 12 June 2017 14:55 >> > >> > That worked fine: >> > >> > >> > >> > (XEN) MBR[80] @ 85e0 (86000) >> > >> >> > >> But that's contrary to your earlier findings: Didn't you say simply >> > >> avoiding a 4k-boundary wasn't enough? And it certainly tells us >> > >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k >> > >> sectors) - I was really expecting a larger gap between the two >> > >> logged values. >> > >> >> > > >> > > I'll go dump out the edd and double check what it is saying. >> > > >> > > My findings indicated that the problem seemed to be doing a read that >> > > spanned a 4k boundary caused a problem, so using 0x85e00 would be >> safe. >> > The >> > > anomaly was that simply aligning the edd_info buffer and a 512 byte >> > boundary >> > > and continuing to use that for reading did not work. >> > >> > But a 512-byte aligned 512-byte buffer can't possibly cross a page >> > boundary. >> >> Indeed, which is why I was perplexed. I found that 0x60e00 was ok. Your >> patch chose 0x85e00, which was ok too, but for some reason a '.align 512' in >> front of boot_edd_info yielded an address which was not ok. I just checked >> what address that yielded though (by booting with edd=off to avoid the >> hang) and it was 0x86f40... which clearly means that '.align 512' is not doing >> what I thought it would do. > > No, the problem turns out to be the GLOBAL() macro which, in assembly files, > contains an implicit .align 16! No, I don't think so - two successive .align don't have any bad effect, the higher value will be it. Instead I think you're suffering from the copying of the trampoline space to low memory: What is aligned to a 512-byte boundary in the image won't necessarily be in low memory, unless trampoline_start is also aligned at least as much. But with this likely having been the problem in your experiments I'm not feeling sufficiently reassured to submit the patch "officially". Jan Jan
> -----Original Message----- > From: Jan Beulich [mailto:JBeulich@suse.com] > Sent: 12 June 2017 16:08 > To: Paul Durrant <Paul.Durrant@citrix.com> > Cc: Julien Grall (julien.grall@arm.com) <julien.grall@arm.com>; Andrew > Cooper <Andrew.Cooper3@citrix.com>; xen-devel(xen- > devel@lists.xenproject.org) <xen-devel@lists.xenproject.org>; 'Boris > Ostrovsky' <boris.ostrovsky@oracle.com>; Juergen Gross > <jgross@suse.com> > Subject: RE: [Xen-devel] debian stretch dom0 + xen 4.9 fails to boot > > >>> On 12.06.17 at 16:43, <Paul.Durrant@citrix.com> wrote: > >> From: Xen-devel [mailto:xen-devel-bounces@lists.xen.org] On Behalf Of > >> Paul Durrant > >> Sent: 12 June 2017 15:29 > >> > From: Jan Beulich [mailto:JBeulich@suse.com] > >> > Sent: 12 June 2017 14:55 > >> > >> > That worked fine: > >> > >> > > >> > >> > (XEN) MBR[80] @ 85e0 (86000) > >> > >> > >> > >> But that's contrary to your earlier findings: Didn't you say simply > >> > >> avoiding a 4k-boundary wasn't enough? And it certainly tells us > >> > >> that this isn't a 4k drive (or at least the BIOS doesn't surface 4k > >> > >> sectors) - I was really expecting a larger gap between the two > >> > >> logged values. > >> > >> > >> > > > >> > > I'll go dump out the edd and double check what it is saying. > >> > > > >> > > My findings indicated that the problem seemed to be doing a read > that > >> > > spanned a 4k boundary caused a problem, so using 0x85e00 would be > >> safe. > >> > The > >> > > anomaly was that simply aligning the edd_info buffer and a 512 byte > >> > boundary > >> > > and continuing to use that for reading did not work. > >> > > >> > But a 512-byte aligned 512-byte buffer can't possibly cross a page > >> > boundary. > >> > >> Indeed, which is why I was perplexed. I found that 0x60e00 was ok. Your > >> patch chose 0x85e00, which was ok too, but for some reason a '.align 512' > in > >> front of boot_edd_info yielded an address which was not ok. I just > checked > >> what address that yielded though (by booting with edd=off to avoid the > >> hang) and it was 0x86f40... which clearly means that '.align 512' is not > doing > >> what I thought it would do. > > > > No, the problem turns out to be the GLOBAL() macro which, in assembly > files, > > contains an implicit .align 16! > > No, I don't think so - two successive .align don't have any bad effect, > the higher value will be it. Instead I think you're suffering from the > copying of the trampoline space to low memory: What is aligned to a > 512-byte boundary in the image won't necessarily be in low memory, > unless trampoline_start is also aligned at least as much. > > But with this likely having been the problem in your experiments I'm > not feeling sufficiently reassured to submit the patch "officially". > I see you submitted the patch. I'm happy now because the anomaly in what I was seeing is explained. I was convinced that, at some stage, I had found that the image was 64k aligned in memory. I was clearly wrong. Paul > Jan > > Jan
--- unstable.orig/xen/arch/x86/boot/edd.S 2017-02-09 14:28:18.000000000 +0100 +++ unstable/xen/arch/x86/boot/edd.S 2017-06-12 12:33:36.353082705 +0200 @@ -26,46 +26,6 @@ get_edd: cmpb $2, bootsym(opt_edd) # edd=off ? je edd_done - cmpb $1, bootsym(opt_edd) # edd=skipmbr ? - je edd_start - -# Read the first sector of each BIOS disk device and store the 4-byte signature -edd_mbr_sig_start: - movb $0x80, %dl # from device 80 - movw $bootsym(boot_mbr_signature),%bx # store buffer ptr in bx -edd_mbr_sig_read: - pushw %bx - movb $0x02, %ah # 0x02 Read Sectors - movb $1, %al # read 1 sector - movb $0, %dh # at head 0 - movw $1, %cx # cylinder 0, sector 0 - pushw %es - pushw %ds - popw %es - movw $bootsym(boot_edd_info), %bx # disk's data goes into info - pushw %dx # work around buggy BIOSes - stc # work around buggy BIOSes - int $0x13 - sti # work around buggy BIOSes - popw %dx - popw %es - popw %bx - jc edd_mbr_sig_done # on failure, we're done. - cmpb $0, %ah # some BIOSes do not set CF - jne edd_mbr_sig_done # on failure, we're done. - cmpw $0xaa55, bootsym(boot_edd_info)+0x1fe - jne .Ledd_mbr_sig_next - movl bootsym(boot_edd_info)+EDD_MBR_SIG_OFFSET,%eax - movb %dl, (%bx) # store BIOS drive number - movl %eax, 4(%bx) # store signature from MBR - incb bootsym(boot_mbr_signature_nr) # note that we stored something - addw $8, %bx # increment sig buffer ptr -.Ledd_mbr_sig_next: - incb %dl # increment to next device - jz edd_mbr_sig_done - cmpb $EDD_MBR_SIG_MAX,bootsym(boot_mbr_signature_nr) - jb edd_mbr_sig_read -edd_mbr_sig_done: # Do the BIOS Enhanced Disk Drive calls # This consists of two calls: @@ -136,10 +96,72 @@ edd_legacy_done: edd_next: incb %dl # increment to next device + jz edd_done cmpb $EDD_INFO_MAX,bootsym(boot_edd_info_nr) jb edd_check_ext edd_done: + cmpb $1, bootsym(opt_edd) # edd=skipmbr ? + je .Ledd_mbr_sig_skip + +# Read the first sector of each BIOS disk device and store the 4-byte signature +.Ledd_mbr_sig_start: + pushw %es + movb $0x80, %dl # from device 80 + movw $bootsym(boot_mbr_signature), %bx # store buffer ptr in bx +.Ledd_mbr_sig_read: + pushw %bx + movw $bootsym(boot_edd_info), %bx + movzbw bootsym(boot_edd_info_nr), %cx + jcxz .Ledd_mbr_sig_default +.Ledd_mbr_sig_find_info: + cmpb %dl, (%bx) + ja .Ledd_mbr_sig_default + je .Ledd_mbr_sig_get_size + add $EDDEXTSIZE+EDDPARMSIZE, %bx + loop .Ledd_mbr_sig_find_info +.Ledd_mbr_sig_default: + movw $(512 >> 4), %bx + jmp .Ledd_mbr_sig_set_buf +.Ledd_mbr_sig_get_size: + movw EDDEXTSIZE+0x18(%bx), %bx # sector size + shr $4, %bx # convert to paragraphs + jz .Ledd_mbr_sig_default +.Ledd_mbr_sig_set_buf: + movw %ds, %ax + subw %bx, %ax # disk's data goes right ahead + movw %ax, %es # of trampoline + xorw %bx, %bx + movw %bx, %es:0x1fe(%bx) # clear BIOS magic just in case + pushw %dx # work around buggy BIOSes + stc # work around buggy BIOSes + movw $0x0201, %ax # read 1 sector + movb $0, %dh # at head 0 + movw $1, %cx # cylinder 0, sector 0 + int $0x13 + sti # work around buggy BIOSes + popw %dx + movw %es:0x1fe(%bx), %si + movl %es:EDD_MBR_SIG_OFFSET(%bx), %ecx + popw %bx + jc .Ledd_mbr_sig_done # on failure, we're done. + testb %ah, %ah # some BIOSes do not set CF + jnz .Ledd_mbr_sig_done # on failure, we're done. + cmpw $0xaa55, %si + jne .Ledd_mbr_sig_next + movb %dl, (%bx) # store BIOS drive number + movw %es,2(%bx)//temp + movl %ecx, 4(%bx) # store signature from MBR + incb bootsym(boot_mbr_signature_nr) # note that we stored something + addw $8, %bx # increment sig buffer ptr +.Ledd_mbr_sig_next: + incb %dl # increment to next device + jz .Ledd_mbr_sig_done + cmpb $EDD_MBR_SIG_MAX, bootsym(boot_mbr_signature_nr) + jb .Ledd_mbr_sig_read +.Ledd_mbr_sig_done: + popw %es +.Ledd_mbr_sig_skip: ret GLOBAL(boot_edd_info_nr) @@ -149,4 +171,4 @@ GLOBAL(boot_mbr_signature_nr) GLOBAL(boot_mbr_signature) .fill EDD_MBR_SIG_MAX*8,1,0 GLOBAL(boot_edd_info) - .fill 512,1,0 # big enough for a disc sector + .fill EDD_INFO_MAX * (EDDEXTSIZE + EDDPARMSIZE), 1, 0 --- unstable.orig/xen/arch/x86/platform_hypercall.c 2015-07-20 14:49:38.000000000 +0200 +++ unstable/xen/arch/x86/platform_hypercall.c 2017-06-12 12:22:01.658928095 +0200 @@ -376,6 +376,7 @@ ret_t do_platform_op(XEN_GUEST_HANDLE_PA break; sig = bootsym(boot_mbr_signature) + op->u.firmware_info.index; +printk("MBR[%02x] @ %02x%02x (%04lx)\n", sig->device, sig->pad[2], sig->pad[1], trampoline_phys);//temp op->u.firmware_info.u.disk_mbr_signature.device = sig->device; op->u.firmware_info.u.disk_mbr_signature.mbr_signature =