Message ID | 20191028085603.32037-1-sergey.dyasli@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [for,4.13] x86/shim: don't reserve 640k - 1M region in E820 | expand |
On 28.10.2019 09:56, Sergey Dyasli wrote: > Converting a guest from PV to PV-in-PVH makes the guest to have 384k > less memory, which may confuse guest's balloon driver. This happens > because Xen unconditionally reserves 640k - 1M region in E820 despite > the fact that it's really a usable RAM in PVH boot mode. This is a PVH property according to your description and according to my understanding. Why would you then ... > Fix this by skipping the region type change for pv-shim mode. ... alter behavior for shim mode only, rather than when booted in PVH mode in general? Jan
On 28/10/2019 09:06, Jan Beulich wrote: > On 28.10.2019 09:56, Sergey Dyasli wrote: >> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >> less memory, which may confuse guest's balloon driver. This happens >> because Xen unconditionally reserves 640k - 1M region in E820 despite >> the fact that it's really a usable RAM in PVH boot mode. > > This is a PVH property according to your description and according > to my understanding. Why would you then ... > >> Fix this by skipping the region type change for pv-shim mode. > > ... alter behavior for shim mode only, rather than when booted in > PVH mode in general? This is just me being cautious. My original email does have this text after --- "The condition can be relaxed to be just !pvh_boot if there are no plans to support VGA MMIO / ROMs for PVH guests." I wanted to check with the community about this first. Thanks, Sergey
On 28.10.2019 12:08, Sergey Dyasli wrote: > On 28/10/2019 09:06, Jan Beulich wrote: >> On 28.10.2019 09:56, Sergey Dyasli wrote: >>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >>> less memory, which may confuse guest's balloon driver. This happens >>> because Xen unconditionally reserves 640k - 1M region in E820 despite >>> the fact that it's really a usable RAM in PVH boot mode. >> >> This is a PVH property according to your description and according >> to my understanding. Why would you then ... >> >>> Fix this by skipping the region type change for pv-shim mode. >> >> ... alter behavior for shim mode only, rather than when booted in >> PVH mode in general? > > This is just me being cautious. > > My original email does have this text after --- > "The condition can be relaxed to be just !pvh_boot if there are no > plans to support VGA MMIO / ROMs for PVH guests." I doubt emulated ones are intended to be supported, but of course a graphics card could be passed through. Iirc passthrough is a pending work item still anyway for PVH, so dealing with the case right now may not even be necessary. Jan
On 28/10/2019 11:11, Jan Beulich wrote: > On 28.10.2019 12:08, Sergey Dyasli wrote: >> On 28/10/2019 09:06, Jan Beulich wrote: >>> On 28.10.2019 09:56, Sergey Dyasli wrote: >>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >>>> less memory, which may confuse guest's balloon driver. This happens >>>> because Xen unconditionally reserves 640k - 1M region in E820 despite >>>> the fact that it's really a usable RAM in PVH boot mode. >>> This is a PVH property according to your description and according >>> to my understanding. Why would you then ... >>> >>>> Fix this by skipping the region type change for pv-shim mode. >>> ... alter behavior for shim mode only, rather than when booted in >>> PVH mode in general? >> This is just me being cautious. >> >> My original email does have this text after --- >> "The condition can be relaxed to be just !pvh_boot if there are no >> plans to support VGA MMIO / ROMs for PVH guests." > I doubt emulated ones are intended to be supported, but of > course a graphics card could be passed through. Iirc passthrough > is a pending work item still anyway for PVH, so dealing with the > case right now may not even be necessary. The bug is Xen blindly presuming that a missing hole "must be a firmware bug". I can believe that there may have been systems decades ago which got this wrong, but tbh I doubt it applies to 64bit-capable systems, considering how standardised things were by then. I'd just delete this whole piece of logic and call it done. ~Andrew
On 28.10.2019 12:15, Andrew Cooper wrote: > On 28/10/2019 11:11, Jan Beulich wrote: >> On 28.10.2019 12:08, Sergey Dyasli wrote: >>> On 28/10/2019 09:06, Jan Beulich wrote: >>>> On 28.10.2019 09:56, Sergey Dyasli wrote: >>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >>>>> less memory, which may confuse guest's balloon driver. This happens >>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite >>>>> the fact that it's really a usable RAM in PVH boot mode. >>>> This is a PVH property according to your description and according >>>> to my understanding. Why would you then ... >>>> >>>>> Fix this by skipping the region type change for pv-shim mode. >>>> ... alter behavior for shim mode only, rather than when booted in >>>> PVH mode in general? >>> This is just me being cautious. >>> >>> My original email does have this text after --- >>> "The condition can be relaxed to be just !pvh_boot if there are no >>> plans to support VGA MMIO / ROMs for PVH guests." >> I doubt emulated ones are intended to be supported, but of >> course a graphics card could be passed through. Iirc passthrough >> is a pending work item still anyway for PVH, so dealing with the >> case right now may not even be necessary. > > The bug is Xen blindly presuming that a missing hole "must be a firmware > bug". > > I can believe that there may have been systems decades ago which got > this wrong, but tbh I doubt it applies to 64bit-capable systems, > considering how standardised things were by then. > > I'd just delete this whole piece of logic and call it done. Well, I could see use being less aggressive here, but firmware (if there is firmware, i.e. everything except PVH) claiming there to be RAM immediately below the 1M boundary is a pretty good sign of something being wrong. There _ought_ to be ROM at that address. Otoh there were even ways in older chipsets to make RAM appear at address ranges not used by option ROMs, so the logic we currently have goes too far potentially even on bare metal. So while I'm all for relaxing, I don't think I can support outright deletion. Jan
On 28/10/2019 11:30, Jan Beulich wrote: > On 28.10.2019 12:15, Andrew Cooper wrote: >> On 28/10/2019 11:11, Jan Beulich wrote: >>> On 28.10.2019 12:08, Sergey Dyasli wrote: >>>> On 28/10/2019 09:06, Jan Beulich wrote: >>>>> On 28.10.2019 09:56, Sergey Dyasli wrote: >>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >>>>>> less memory, which may confuse guest's balloon driver. This happens >>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite >>>>>> the fact that it's really a usable RAM in PVH boot mode. >>>>> This is a PVH property according to your description and according >>>>> to my understanding. Why would you then ... >>>>> >>>>>> Fix this by skipping the region type change for pv-shim mode. >>>>> ... alter behavior for shim mode only, rather than when booted in >>>>> PVH mode in general? >>>> This is just me being cautious. >>>> >>>> My original email does have this text after --- >>>> "The condition can be relaxed to be just !pvh_boot if there are no >>>> plans to support VGA MMIO / ROMs for PVH guests." >>> I doubt emulated ones are intended to be supported, but of >>> course a graphics card could be passed through. Iirc passthrough >>> is a pending work item still anyway for PVH, so dealing with the >>> case right now may not even be necessary. >> The bug is Xen blindly presuming that a missing hole "must be a firmware >> bug". >> >> I can believe that there may have been systems decades ago which got >> this wrong, but tbh I doubt it applies to 64bit-capable systems, >> considering how standardised things were by then. >> >> I'd just delete this whole piece of logic and call it done. > Well, I could see use being less aggressive here, but firmware (if > there is firmware, i.e. everything except PVH) claiming there to > be RAM immediately below the 1M boundary is a pretty good sign of > something being wrong. There _ought_ to be ROM at that address. > Otoh there were even ways in older chipsets to make RAM appear at > address ranges not used by option ROMs, so the logic we currently > have goes too far potentially even on bare metal. > > So while I'm all for relaxing, I don't think I can support > outright deletion. For now, how about cpu_has_hypervisor ? Whatever the virtual environment, we should trust the provided memory map. ~Andrew
On 28.10.2019 12:33, Andrew Cooper wrote: > On 28/10/2019 11:30, Jan Beulich wrote: >> On 28.10.2019 12:15, Andrew Cooper wrote: >>> On 28/10/2019 11:11, Jan Beulich wrote: >>>> On 28.10.2019 12:08, Sergey Dyasli wrote: >>>>> On 28/10/2019 09:06, Jan Beulich wrote: >>>>>> On 28.10.2019 09:56, Sergey Dyasli wrote: >>>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >>>>>>> less memory, which may confuse guest's balloon driver. This happens >>>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite >>>>>>> the fact that it's really a usable RAM in PVH boot mode. >>>>>> This is a PVH property according to your description and according >>>>>> to my understanding. Why would you then ... >>>>>> >>>>>>> Fix this by skipping the region type change for pv-shim mode. >>>>>> ... alter behavior for shim mode only, rather than when booted in >>>>>> PVH mode in general? >>>>> This is just me being cautious. >>>>> >>>>> My original email does have this text after --- >>>>> "The condition can be relaxed to be just !pvh_boot if there are no >>>>> plans to support VGA MMIO / ROMs for PVH guests." >>>> I doubt emulated ones are intended to be supported, but of >>>> course a graphics card could be passed through. Iirc passthrough >>>> is a pending work item still anyway for PVH, so dealing with the >>>> case right now may not even be necessary. >>> The bug is Xen blindly presuming that a missing hole "must be a firmware >>> bug". >>> >>> I can believe that there may have been systems decades ago which got >>> this wrong, but tbh I doubt it applies to 64bit-capable systems, >>> considering how standardised things were by then. >>> >>> I'd just delete this whole piece of logic and call it done. >> Well, I could see use being less aggressive here, but firmware (if >> there is firmware, i.e. everything except PVH) claiming there to >> be RAM immediately below the 1M boundary is a pretty good sign of >> something being wrong. There _ought_ to be ROM at that address. >> Otoh there were even ways in older chipsets to make RAM appear at >> address ranges not used by option ROMs, so the logic we currently >> have goes too far potentially even on bare metal. >> >> So while I'm all for relaxing, I don't think I can support >> outright deletion. > > For now, how about cpu_has_hypervisor ? > > Whatever the virtual environment, we should trust the provided memory map. Hmm, yes, this sounds reasonable. Jan
On 28/10/2019 11:33, Andrew Cooper wrote: > On 28/10/2019 11:30, Jan Beulich wrote: >> On 28.10.2019 12:15, Andrew Cooper wrote: >>> On 28/10/2019 11:11, Jan Beulich wrote: >>>> On 28.10.2019 12:08, Sergey Dyasli wrote: >>>>> On 28/10/2019 09:06, Jan Beulich wrote: >>>>>> On 28.10.2019 09:56, Sergey Dyasli wrote: >>>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >>>>>>> less memory, which may confuse guest's balloon driver. This happens >>>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite >>>>>>> the fact that it's really a usable RAM in PVH boot mode. >>>>>> This is a PVH property according to your description and according >>>>>> to my understanding. Why would you then ... >>>>>> >>>>>>> Fix this by skipping the region type change for pv-shim mode. >>>>>> ... alter behavior for shim mode only, rather than when booted in >>>>>> PVH mode in general? >>>>> This is just me being cautious. >>>>> >>>>> My original email does have this text after --- >>>>> "The condition can be relaxed to be just !pvh_boot if there are no >>>>> plans to support VGA MMIO / ROMs for PVH guests." >>>> I doubt emulated ones are intended to be supported, but of >>>> course a graphics card could be passed through. Iirc passthrough >>>> is a pending work item still anyway for PVH, so dealing with the >>>> case right now may not even be necessary. >>> The bug is Xen blindly presuming that a missing hole "must be a firmware >>> bug". >>> >>> I can believe that there may have been systems decades ago which got >>> this wrong, but tbh I doubt it applies to 64bit-capable systems, >>> considering how standardised things were by then. >>> >>> I'd just delete this whole piece of logic and call it done. >> Well, I could see use being less aggressive here, but firmware (if >> there is firmware, i.e. everything except PVH) claiming there to >> be RAM immediately below the 1M boundary is a pretty good sign of >> something being wrong. There _ought_ to be ROM at that address. >> Otoh there were even ways in older chipsets to make RAM appear at >> address ranges not used by option ROMs, so the logic we currently >> have goes too far potentially even on bare metal. >> >> So while I'm all for relaxing, I don't think I can support >> outright deletion. > > For now, how about cpu_has_hypervisor ? > > Whatever the virtual environment, we should trust the provided memory map. Unfortunately, this plan has failed: init_e820() is called way before early_cpu_init() and testing cpu_has_hypervisor is meaningless there. I guess I'll go for !pvh_boot check for now. -- Thanks, Sergey
On 29/10/2019 10:17, Sergey Dyasli wrote: > On 28/10/2019 11:33, Andrew Cooper wrote: >> On 28/10/2019 11:30, Jan Beulich wrote: >>> On 28.10.2019 12:15, Andrew Cooper wrote: >>>> On 28/10/2019 11:11, Jan Beulich wrote: >>>>> On 28.10.2019 12:08, Sergey Dyasli wrote: >>>>>> On 28/10/2019 09:06, Jan Beulich wrote: >>>>>>> On 28.10.2019 09:56, Sergey Dyasli wrote: >>>>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k >>>>>>>> less memory, which may confuse guest's balloon driver. This happens >>>>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite >>>>>>>> the fact that it's really a usable RAM in PVH boot mode. >>>>>>> This is a PVH property according to your description and according >>>>>>> to my understanding. Why would you then ... >>>>>>> >>>>>>>> Fix this by skipping the region type change for pv-shim mode. >>>>>>> ... alter behavior for shim mode only, rather than when booted in >>>>>>> PVH mode in general? >>>>>> This is just me being cautious. >>>>>> >>>>>> My original email does have this text after --- >>>>>> "The condition can be relaxed to be just !pvh_boot if there are no >>>>>> plans to support VGA MMIO / ROMs for PVH guests." >>>>> I doubt emulated ones are intended to be supported, but of >>>>> course a graphics card could be passed through. Iirc passthrough >>>>> is a pending work item still anyway for PVH, so dealing with the >>>>> case right now may not even be necessary. >>>> The bug is Xen blindly presuming that a missing hole "must be a firmware >>>> bug". >>>> >>>> I can believe that there may have been systems decades ago which got >>>> this wrong, but tbh I doubt it applies to 64bit-capable systems, >>>> considering how standardised things were by then. >>>> >>>> I'd just delete this whole piece of logic and call it done. >>> Well, I could see use being less aggressive here, but firmware (if >>> there is firmware, i.e. everything except PVH) claiming there to >>> be RAM immediately below the 1M boundary is a pretty good sign of >>> something being wrong. There _ought_ to be ROM at that address. >>> Otoh there were even ways in older chipsets to make RAM appear at >>> address ranges not used by option ROMs, so the logic we currently >>> have goes too far potentially even on bare metal. >>> >>> So while I'm all for relaxing, I don't think I can support >>> outright deletion. >> For now, how about cpu_has_hypervisor ? >> >> Whatever the virtual environment, we should trust the provided memory map. > Unfortunately, this plan has failed: init_e820() is called way before > early_cpu_init() and testing cpu_has_hypervisor is meaningless there. > > I guess I'll go for !pvh_boot check for now. Opencode it with cpuid_ecx(1) for now, as we've done elsewhere. I've got a patch in my not-progressing-very-quickly series to remove the boot time mapping at 0 which cleans up early CPUID handling so this can turn into cpu_has_hypervisor, but that is definitely post-4.13 material right now. ~Andrew
On 29/10/2019 10:19, Andrew Cooper wrote: > On 29/10/2019 10:17, Sergey Dyasli wrote: >> On 28/10/2019 11:33, Andrew Cooper wrote: >>> For now, how about cpu_has_hypervisor ? >>> >>> Whatever the virtual environment, we should trust the provided memory map. >> Unfortunately, this plan has failed: init_e820() is called way before >> early_cpu_init() and testing cpu_has_hypervisor is meaningless there. >> >> I guess I'll go for !pvh_boot check for now. > > Opencode it with cpuid_ecx(1) for now, as we've done elsewhere. > > I've got a patch in my not-progressing-very-quickly series to remove the > boot time mapping at 0 which cleans up early CPUID handling so this can > turn into cpu_has_hypervisor, but that is definitely post-4.13 material > right now. Looking closely at early_cpu_init(), I don't see anything there that would prevent placing it near the top of __start_xen(). It works for pv-shim, but I need to also do some bare metal testing with this change. Do you know of anything that would prevent it? -- Thanks, Sergey
On 29/10/2019 10:40, Sergey Dyasli wrote: > On 29/10/2019 10:19, Andrew Cooper wrote: >> On 29/10/2019 10:17, Sergey Dyasli wrote: >>> On 28/10/2019 11:33, Andrew Cooper wrote: >>>> For now, how about cpu_has_hypervisor ? >>>> >>>> Whatever the virtual environment, we should trust the provided memory map. >>> Unfortunately, this plan has failed: init_e820() is called way before >>> early_cpu_init() and testing cpu_has_hypervisor is meaningless there. >>> >>> I guess I'll go for !pvh_boot check for now. >> Opencode it with cpuid_ecx(1) for now, as we've done elsewhere. >> >> I've got a patch in my not-progressing-very-quickly series to remove the >> boot time mapping at 0 which cleans up early CPUID handling so this can >> turn into cpu_has_hypervisor, but that is definitely post-4.13 material >> right now. > Looking closely at early_cpu_init(), I don't see anything there that > would prevent placing it near the top of __start_xen(). It works for > pv-shim, but I need to also do some bare metal testing with this change. > > Do you know of anything that would prevent it? Yes - microcode loading. That sequence of things in __start_xen() is very position sensitive. ~Andrew
diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c index 8e8a2c4e1b..54c2ae70c4 100644 --- a/xen/arch/x86/e820.c +++ b/xen/arch/x86/e820.c @@ -318,9 +318,10 @@ static int __init copy_e820_map(struct e820entry * biosmap, unsigned int nr_map) /* * Some BIOSes claim RAM in the 640k - 1M region. - * Not right. Fix it up. + * Not right. Fix it up, except for pv-shim case. */ - if (type == E820_RAM) { + if ( !(pv_shim && pvh_boot) && type == E820_RAM ) + { if (start < 0x100000ULL && end > 0xA0000ULL) { if (start < 0xA0000ULL) add_memory_region(start, 0xA0000ULL-start, type);
Converting a guest from PV to PV-in-PVH makes the guest to have 384k less memory, which may confuse guest's balloon driver. This happens because Xen unconditionally reserves 640k - 1M region in E820 despite the fact that it's really a usable RAM in PVH boot mode. Fix this by skipping the region type change for pv-shim mode. Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com> --- The condition can be relaxed to be just !pvh_boot if there are no plans to support VGA MMIO / ROMs for PVH guests. CC: Jan Beulich <jbeulich@suse.com> CC: Andrew Cooper <andrew.cooper3@citrix.com> CC: Wei Liu <wl@xen.org> CC: "Roger Pau Monné" <roger.pau@citrix.com> CC: Juergen Gross <jgross@suse.com> --- xen/arch/x86/e820.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-)