diff mbox series

[for,4.13] x86/shim: don't reserve 640k - 1M region in E820

Message ID 20191028085603.32037-1-sergey.dyasli@citrix.com (mailing list archive)
State New, archived
Headers show
Series [for,4.13] x86/shim: don't reserve 640k - 1M region in E820 | expand

Commit Message

Sergey Dyasli Oct. 28, 2019, 8:56 a.m. UTC
Converting a guest from PV to PV-in-PVH makes the guest to have 384k
less memory, which may confuse guest's balloon driver. This happens
because Xen unconditionally reserves 640k - 1M region in E820 despite
the fact that it's really a usable RAM in PVH boot mode.

Fix this by skipping the region type change for pv-shim mode.

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
The condition can be relaxed to be just !pvh_boot if there are no
plans to support VGA MMIO / ROMs for PVH guests.

CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/e820.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

Comments

Jan Beulich Oct. 28, 2019, 9:06 a.m. UTC | #1
On 28.10.2019 09:56, Sergey Dyasli wrote:
> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
> less memory, which may confuse guest's balloon driver. This happens
> because Xen unconditionally reserves 640k - 1M region in E820 despite
> the fact that it's really a usable RAM in PVH boot mode.

This is a PVH property according to your description and according
to my understanding. Why would you then ...

> Fix this by skipping the region type change for pv-shim mode.

... alter behavior for shim mode only, rather than when booted in
PVH mode in general?

Jan
Sergey Dyasli Oct. 28, 2019, 11:08 a.m. UTC | #2
On 28/10/2019 09:06, Jan Beulich wrote:
> On 28.10.2019 09:56, Sergey Dyasli wrote:
>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>> less memory, which may confuse guest's balloon driver. This happens
>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>> the fact that it's really a usable RAM in PVH boot mode.
> 
> This is a PVH property according to your description and according
> to my understanding. Why would you then ...
> 
>> Fix this by skipping the region type change for pv-shim mode.
> 
> ... alter behavior for shim mode only, rather than when booted in
> PVH mode in general?

This is just me being cautious.

My original email does have this text after ---
"The condition can be relaxed to be just !pvh_boot if there are no
plans to support VGA MMIO / ROMs for PVH guests."

I wanted to check with the community about this first.

Thanks,
Sergey
Jan Beulich Oct. 28, 2019, 11:11 a.m. UTC | #3
On 28.10.2019 12:08, Sergey Dyasli wrote:
> On 28/10/2019 09:06, Jan Beulich wrote:
>> On 28.10.2019 09:56, Sergey Dyasli wrote:
>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>>> less memory, which may confuse guest's balloon driver. This happens
>>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>>> the fact that it's really a usable RAM in PVH boot mode.
>>
>> This is a PVH property according to your description and according
>> to my understanding. Why would you then ...
>>
>>> Fix this by skipping the region type change for pv-shim mode.
>>
>> ... alter behavior for shim mode only, rather than when booted in
>> PVH mode in general?
> 
> This is just me being cautious.
> 
> My original email does have this text after ---
> "The condition can be relaxed to be just !pvh_boot if there are no
> plans to support VGA MMIO / ROMs for PVH guests."

I doubt emulated ones are intended to be supported, but of
course a graphics card could be passed through. Iirc passthrough
is a pending work item still anyway for PVH, so dealing with the
case right now may not even be necessary.

Jan
Andrew Cooper Oct. 28, 2019, 11:15 a.m. UTC | #4
On 28/10/2019 11:11, Jan Beulich wrote:
> On 28.10.2019 12:08, Sergey Dyasli wrote:
>> On 28/10/2019 09:06, Jan Beulich wrote:
>>> On 28.10.2019 09:56, Sergey Dyasli wrote:
>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>>>> less memory, which may confuse guest's balloon driver. This happens
>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>>>> the fact that it's really a usable RAM in PVH boot mode.
>>> This is a PVH property according to your description and according
>>> to my understanding. Why would you then ...
>>>
>>>> Fix this by skipping the region type change for pv-shim mode.
>>> ... alter behavior for shim mode only, rather than when booted in
>>> PVH mode in general?
>> This is just me being cautious.
>>
>> My original email does have this text after ---
>> "The condition can be relaxed to be just !pvh_boot if there are no
>> plans to support VGA MMIO / ROMs for PVH guests."
> I doubt emulated ones are intended to be supported, but of
> course a graphics card could be passed through. Iirc passthrough
> is a pending work item still anyway for PVH, so dealing with the
> case right now may not even be necessary.

The bug is Xen blindly presuming that a missing hole "must be a firmware
bug".

I can believe that there may have been systems decades ago which got
this wrong, but tbh I doubt it applies to 64bit-capable systems,
considering how standardised things were by then.

I'd just delete this whole piece of logic and call it done.

~Andrew
Jan Beulich Oct. 28, 2019, 11:30 a.m. UTC | #5
On 28.10.2019 12:15, Andrew Cooper wrote:
> On 28/10/2019 11:11, Jan Beulich wrote:
>> On 28.10.2019 12:08, Sergey Dyasli wrote:
>>> On 28/10/2019 09:06, Jan Beulich wrote:
>>>> On 28.10.2019 09:56, Sergey Dyasli wrote:
>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>>>>> less memory, which may confuse guest's balloon driver. This happens
>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>>>>> the fact that it's really a usable RAM in PVH boot mode.
>>>> This is a PVH property according to your description and according
>>>> to my understanding. Why would you then ...
>>>>
>>>>> Fix this by skipping the region type change for pv-shim mode.
>>>> ... alter behavior for shim mode only, rather than when booted in
>>>> PVH mode in general?
>>> This is just me being cautious.
>>>
>>> My original email does have this text after ---
>>> "The condition can be relaxed to be just !pvh_boot if there are no
>>> plans to support VGA MMIO / ROMs for PVH guests."
>> I doubt emulated ones are intended to be supported, but of
>> course a graphics card could be passed through. Iirc passthrough
>> is a pending work item still anyway for PVH, so dealing with the
>> case right now may not even be necessary.
> 
> The bug is Xen blindly presuming that a missing hole "must be a firmware
> bug".
> 
> I can believe that there may have been systems decades ago which got
> this wrong, but tbh I doubt it applies to 64bit-capable systems,
> considering how standardised things were by then.
> 
> I'd just delete this whole piece of logic and call it done.

Well, I could see use being less aggressive here, but firmware (if
there is firmware, i.e. everything except PVH) claiming there to
be RAM immediately below the 1M boundary is a pretty good sign of
something being wrong. There _ought_ to be ROM at that address.
Otoh there were even ways in older chipsets to make RAM appear at
address ranges not used by option ROMs, so the logic we currently
have goes too far potentially even on bare metal.

So while I'm all for relaxing, I don't think I can support
outright deletion.

Jan
Andrew Cooper Oct. 28, 2019, 11:33 a.m. UTC | #6
On 28/10/2019 11:30, Jan Beulich wrote:
> On 28.10.2019 12:15, Andrew Cooper wrote:
>> On 28/10/2019 11:11, Jan Beulich wrote:
>>> On 28.10.2019 12:08, Sergey Dyasli wrote:
>>>> On 28/10/2019 09:06, Jan Beulich wrote:
>>>>> On 28.10.2019 09:56, Sergey Dyasli wrote:
>>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>>>>>> less memory, which may confuse guest's balloon driver. This happens
>>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>>>>>> the fact that it's really a usable RAM in PVH boot mode.
>>>>> This is a PVH property according to your description and according
>>>>> to my understanding. Why would you then ...
>>>>>
>>>>>> Fix this by skipping the region type change for pv-shim mode.
>>>>> ... alter behavior for shim mode only, rather than when booted in
>>>>> PVH mode in general?
>>>> This is just me being cautious.
>>>>
>>>> My original email does have this text after ---
>>>> "The condition can be relaxed to be just !pvh_boot if there are no
>>>> plans to support VGA MMIO / ROMs for PVH guests."
>>> I doubt emulated ones are intended to be supported, but of
>>> course a graphics card could be passed through. Iirc passthrough
>>> is a pending work item still anyway for PVH, so dealing with the
>>> case right now may not even be necessary.
>> The bug is Xen blindly presuming that a missing hole "must be a firmware
>> bug".
>>
>> I can believe that there may have been systems decades ago which got
>> this wrong, but tbh I doubt it applies to 64bit-capable systems,
>> considering how standardised things were by then.
>>
>> I'd just delete this whole piece of logic and call it done.
> Well, I could see use being less aggressive here, but firmware (if
> there is firmware, i.e. everything except PVH) claiming there to
> be RAM immediately below the 1M boundary is a pretty good sign of
> something being wrong. There _ought_ to be ROM at that address.
> Otoh there were even ways in older chipsets to make RAM appear at
> address ranges not used by option ROMs, so the logic we currently
> have goes too far potentially even on bare metal.
>
> So while I'm all for relaxing, I don't think I can support
> outright deletion.

For now, how about cpu_has_hypervisor ?

Whatever the virtual environment, we should trust the provided memory map.

~Andrew
Jan Beulich Oct. 28, 2019, 11:41 a.m. UTC | #7
On 28.10.2019 12:33, Andrew Cooper wrote:
> On 28/10/2019 11:30, Jan Beulich wrote:
>> On 28.10.2019 12:15, Andrew Cooper wrote:
>>> On 28/10/2019 11:11, Jan Beulich wrote:
>>>> On 28.10.2019 12:08, Sergey Dyasli wrote:
>>>>> On 28/10/2019 09:06, Jan Beulich wrote:
>>>>>> On 28.10.2019 09:56, Sergey Dyasli wrote:
>>>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>>>>>>> less memory, which may confuse guest's balloon driver. This happens
>>>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>>>>>>> the fact that it's really a usable RAM in PVH boot mode.
>>>>>> This is a PVH property according to your description and according
>>>>>> to my understanding. Why would you then ...
>>>>>>
>>>>>>> Fix this by skipping the region type change for pv-shim mode.
>>>>>> ... alter behavior for shim mode only, rather than when booted in
>>>>>> PVH mode in general?
>>>>> This is just me being cautious.
>>>>>
>>>>> My original email does have this text after ---
>>>>> "The condition can be relaxed to be just !pvh_boot if there are no
>>>>> plans to support VGA MMIO / ROMs for PVH guests."
>>>> I doubt emulated ones are intended to be supported, but of
>>>> course a graphics card could be passed through. Iirc passthrough
>>>> is a pending work item still anyway for PVH, so dealing with the
>>>> case right now may not even be necessary.
>>> The bug is Xen blindly presuming that a missing hole "must be a firmware
>>> bug".
>>>
>>> I can believe that there may have been systems decades ago which got
>>> this wrong, but tbh I doubt it applies to 64bit-capable systems,
>>> considering how standardised things were by then.
>>>
>>> I'd just delete this whole piece of logic and call it done.
>> Well, I could see use being less aggressive here, but firmware (if
>> there is firmware, i.e. everything except PVH) claiming there to
>> be RAM immediately below the 1M boundary is a pretty good sign of
>> something being wrong. There _ought_ to be ROM at that address.
>> Otoh there were even ways in older chipsets to make RAM appear at
>> address ranges not used by option ROMs, so the logic we currently
>> have goes too far potentially even on bare metal.
>>
>> So while I'm all for relaxing, I don't think I can support
>> outright deletion.
> 
> For now, how about cpu_has_hypervisor ?
> 
> Whatever the virtual environment, we should trust the provided memory map.

Hmm, yes, this sounds reasonable.

Jan
Sergey Dyasli Oct. 29, 2019, 10:17 a.m. UTC | #8
On 28/10/2019 11:33, Andrew Cooper wrote:
> On 28/10/2019 11:30, Jan Beulich wrote:
>> On 28.10.2019 12:15, Andrew Cooper wrote:
>>> On 28/10/2019 11:11, Jan Beulich wrote:
>>>> On 28.10.2019 12:08, Sergey Dyasli wrote:
>>>>> On 28/10/2019 09:06, Jan Beulich wrote:
>>>>>> On 28.10.2019 09:56, Sergey Dyasli wrote:
>>>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>>>>>>> less memory, which may confuse guest's balloon driver. This happens
>>>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>>>>>>> the fact that it's really a usable RAM in PVH boot mode.
>>>>>> This is a PVH property according to your description and according
>>>>>> to my understanding. Why would you then ...
>>>>>>
>>>>>>> Fix this by skipping the region type change for pv-shim mode.
>>>>>> ... alter behavior for shim mode only, rather than when booted in
>>>>>> PVH mode in general?
>>>>> This is just me being cautious.
>>>>>
>>>>> My original email does have this text after ---
>>>>> "The condition can be relaxed to be just !pvh_boot if there are no
>>>>> plans to support VGA MMIO / ROMs for PVH guests."
>>>> I doubt emulated ones are intended to be supported, but of
>>>> course a graphics card could be passed through. Iirc passthrough
>>>> is a pending work item still anyway for PVH, so dealing with the
>>>> case right now may not even be necessary.
>>> The bug is Xen blindly presuming that a missing hole "must be a firmware
>>> bug".
>>>
>>> I can believe that there may have been systems decades ago which got
>>> this wrong, but tbh I doubt it applies to 64bit-capable systems,
>>> considering how standardised things were by then.
>>>
>>> I'd just delete this whole piece of logic and call it done.
>> Well, I could see use being less aggressive here, but firmware (if
>> there is firmware, i.e. everything except PVH) claiming there to
>> be RAM immediately below the 1M boundary is a pretty good sign of
>> something being wrong. There _ought_ to be ROM at that address.
>> Otoh there were even ways in older chipsets to make RAM appear at
>> address ranges not used by option ROMs, so the logic we currently
>> have goes too far potentially even on bare metal.
>>
>> So while I'm all for relaxing, I don't think I can support
>> outright deletion.
> 
> For now, how about cpu_has_hypervisor ?
> 
> Whatever the virtual environment, we should trust the provided memory map.

Unfortunately, this plan has failed: init_e820() is called way before
early_cpu_init() and testing cpu_has_hypervisor is meaningless there.

I guess I'll go for !pvh_boot check for now.

--
Thanks,
Sergey
Andrew Cooper Oct. 29, 2019, 10:19 a.m. UTC | #9
On 29/10/2019 10:17, Sergey Dyasli wrote:
> On 28/10/2019 11:33, Andrew Cooper wrote:
>> On 28/10/2019 11:30, Jan Beulich wrote:
>>> On 28.10.2019 12:15, Andrew Cooper wrote:
>>>> On 28/10/2019 11:11, Jan Beulich wrote:
>>>>> On 28.10.2019 12:08, Sergey Dyasli wrote:
>>>>>> On 28/10/2019 09:06, Jan Beulich wrote:
>>>>>>> On 28.10.2019 09:56, Sergey Dyasli wrote:
>>>>>>>> Converting a guest from PV to PV-in-PVH makes the guest to have 384k
>>>>>>>> less memory, which may confuse guest's balloon driver. This happens
>>>>>>>> because Xen unconditionally reserves 640k - 1M region in E820 despite
>>>>>>>> the fact that it's really a usable RAM in PVH boot mode.
>>>>>>> This is a PVH property according to your description and according
>>>>>>> to my understanding. Why would you then ...
>>>>>>>
>>>>>>>> Fix this by skipping the region type change for pv-shim mode.
>>>>>>> ... alter behavior for shim mode only, rather than when booted in
>>>>>>> PVH mode in general?
>>>>>> This is just me being cautious.
>>>>>>
>>>>>> My original email does have this text after ---
>>>>>> "The condition can be relaxed to be just !pvh_boot if there are no
>>>>>> plans to support VGA MMIO / ROMs for PVH guests."
>>>>> I doubt emulated ones are intended to be supported, but of
>>>>> course a graphics card could be passed through. Iirc passthrough
>>>>> is a pending work item still anyway for PVH, so dealing with the
>>>>> case right now may not even be necessary.
>>>> The bug is Xen blindly presuming that a missing hole "must be a firmware
>>>> bug".
>>>>
>>>> I can believe that there may have been systems decades ago which got
>>>> this wrong, but tbh I doubt it applies to 64bit-capable systems,
>>>> considering how standardised things were by then.
>>>>
>>>> I'd just delete this whole piece of logic and call it done.
>>> Well, I could see use being less aggressive here, but firmware (if
>>> there is firmware, i.e. everything except PVH) claiming there to
>>> be RAM immediately below the 1M boundary is a pretty good sign of
>>> something being wrong. There _ought_ to be ROM at that address.
>>> Otoh there were even ways in older chipsets to make RAM appear at
>>> address ranges not used by option ROMs, so the logic we currently
>>> have goes too far potentially even on bare metal.
>>>
>>> So while I'm all for relaxing, I don't think I can support
>>> outright deletion.
>> For now, how about cpu_has_hypervisor ?
>>
>> Whatever the virtual environment, we should trust the provided memory map.
> Unfortunately, this plan has failed: init_e820() is called way before
> early_cpu_init() and testing cpu_has_hypervisor is meaningless there.
>
> I guess I'll go for !pvh_boot check for now.

Opencode it with cpuid_ecx(1) for now, as we've done elsewhere.

I've got a patch in my not-progressing-very-quickly series to remove the
boot time mapping at 0 which cleans up early CPUID handling so this can
turn into cpu_has_hypervisor, but that is definitely post-4.13 material
right now.

~Andrew
Sergey Dyasli Oct. 29, 2019, 10:40 a.m. UTC | #10
On 29/10/2019 10:19, Andrew Cooper wrote:
> On 29/10/2019 10:17, Sergey Dyasli wrote:
>> On 28/10/2019 11:33, Andrew Cooper wrote:
>>> For now, how about cpu_has_hypervisor ?
>>>
>>> Whatever the virtual environment, we should trust the provided memory map.
>> Unfortunately, this plan has failed: init_e820() is called way before
>> early_cpu_init() and testing cpu_has_hypervisor is meaningless there.
>>
>> I guess I'll go for !pvh_boot check for now.
> 
> Opencode it with cpuid_ecx(1) for now, as we've done elsewhere.
> 
> I've got a patch in my not-progressing-very-quickly series to remove the
> boot time mapping at 0 which cleans up early CPUID handling so this can
> turn into cpu_has_hypervisor, but that is definitely post-4.13 material
> right now.

Looking closely at early_cpu_init(), I don't see anything there that
would prevent placing it near the top of __start_xen(). It works for
pv-shim, but I need to also do some bare metal testing with this change.

Do you know of anything that would prevent it?

--
Thanks,
Sergey
Andrew Cooper Oct. 29, 2019, 10:41 a.m. UTC | #11
On 29/10/2019 10:40, Sergey Dyasli wrote:
> On 29/10/2019 10:19, Andrew Cooper wrote:
>> On 29/10/2019 10:17, Sergey Dyasli wrote:
>>> On 28/10/2019 11:33, Andrew Cooper wrote:
>>>> For now, how about cpu_has_hypervisor ?
>>>>
>>>> Whatever the virtual environment, we should trust the provided memory map.
>>> Unfortunately, this plan has failed: init_e820() is called way before
>>> early_cpu_init() and testing cpu_has_hypervisor is meaningless there.
>>>
>>> I guess I'll go for !pvh_boot check for now.
>> Opencode it with cpuid_ecx(1) for now, as we've done elsewhere.
>>
>> I've got a patch in my not-progressing-very-quickly series to remove the
>> boot time mapping at 0 which cleans up early CPUID handling so this can
>> turn into cpu_has_hypervisor, but that is definitely post-4.13 material
>> right now.
> Looking closely at early_cpu_init(), I don't see anything there that
> would prevent placing it near the top of __start_xen(). It works for
> pv-shim, but I need to also do some bare metal testing with this change.
>
> Do you know of anything that would prevent it?

Yes - microcode loading.  That sequence of things in __start_xen() is
very position sensitive.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 8e8a2c4e1b..54c2ae70c4 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -318,9 +318,10 @@  static int __init copy_e820_map(struct e820entry * biosmap, unsigned int nr_map)
 
         /*
          * Some BIOSes claim RAM in the 640k - 1M region.
-         * Not right. Fix it up.
+         * Not right. Fix it up, except for pv-shim case.
          */
-        if (type == E820_RAM) {
+        if ( !(pv_shim && pvh_boot) && type == E820_RAM )
+        {
             if (start < 0x100000ULL && end > 0xA0000ULL) {
                 if (start < 0xA0000ULL)
                     add_memory_region(start, 0xA0000ULL-start, type);