diff mbox series

[v3,for,4.13] x86/e820: fix 640k - 1M region reservation logic

Message ID 20191031105609.21819-1-sergey.dyasli@citrix.com (mailing list archive)
State New, archived
Headers show
Series [v3,for,4.13] x86/e820: fix 640k - 1M region reservation logic | expand

Commit Message

Sergey Dyasli Oct. 31, 2019, 10:56 a.m. UTC
Converting a guest from PV to PV-in-PVH makes the guest to have 384k
less memory, which may confuse guest's balloon driver. This happens
because Xen unconditionally reserves 640k - 1M region in E820 despite
the fact that it's really a usable RAM in PVH boot mode.

Fix this by skipping region type change in virtualised environments,
trusting whatever memory map our hypervisor has provided.

Take a refactoring opportunity to introduce early_cpu_has_hypervisor().

Signed-off-by: Sergey Dyasli <sergey.dyasli@citrix.com>
---
v2 --> v3:
- early_cpu_has_hypervisor() is added

CC: Jan Beulich <jbeulich@suse.com>
CC: Andrew Cooper <andrew.cooper3@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: "Roger Pau Monné" <roger.pau@citrix.com>
CC: Juergen Gross <jgross@suse.com>
---
 xen/arch/x86/cpu/common.c   | 9 +++++++++
 xen/arch/x86/e820.c         | 6 ++++--
 xen/arch/x86/guest/xen.c    | 4 ++--
 xen/arch/x86/mm.c           | 2 +-
 xen/include/asm-x86/setup.h | 1 +
 5 files changed, 17 insertions(+), 5 deletions(-)

Comments

Jan Beulich Oct. 31, 2019, 11:07 a.m. UTC | #1
On 31.10.2019 11:56, Sergey Dyasli wrote:
> --- a/xen/arch/x86/cpu/common.c
> +++ b/xen/arch/x86/cpu/common.c
> @@ -274,6 +274,15 @@ static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
>  	return _phys_pkg_id(get_apic_id(), index_msb);
>  }
>  
> +/*
> + * Sometimes it's too early to use cpu_has_hypervisor which is available only
> + * after early_cpu_init().
> + */
> +bool __init early_cpu_has_hypervisor(void)
> +{
> +	return cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR);
> +}

OOI, did you consider introducing a more general early_cpu_has(),
with X86_FEATURE_* passed as an argument?

> @@ -318,9 +319,10 @@ static int __init copy_e820_map(struct e820entry * biosmap, unsigned int nr_map)
>  
>          /*
>           * Some BIOSes claim RAM in the 640k - 1M region.
> -         * Not right. Fix it up.
> +         * Not right. Fix it up, but only when running on bare metal.
>           */
> -        if (type == E820_RAM) {
> +        if ( !early_cpu_has_hypervisor() && type == E820_RAM )
> +        {
>              if (start < 0x100000ULL && end > 0xA0000ULL) {
>                  if (start < 0xA0000ULL)
>                      add_memory_region(start, 0xA0000ULL-start, type);

Seeing original line and lower context - aren't you corrupting
previously reasonably consistent (Linux) style here? (This could
be easily taken care of while committing, but I'd first like the
point below be clarified.)

> --- a/xen/arch/x86/mm.c
> +++ b/xen/arch/x86/mm.c
> @@ -5943,7 +5943,7 @@ const struct platform_bad_page *__init get_platform_badpages(unsigned int *array
>      case 0x000806e0: /* erratum KBL??? */
>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>          *array_size = (cpuid_eax(0) >= 7 &&
> -                       !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) &&
> +                       !early_cpu_has_hypervisor() &&

Strictly speaking this makes clear that in early_cpu_has_hypervisor()
you should also check cpuid_eax(0) >= 1. We don't currently seem to
object to running on a system with only basic leaf 0 available (we
do object to there not being extended leaf 1). Andrew, do you have
any clear opinion one way or the other?

Jan
Sergey Dyasli Oct. 31, 2019, 11:45 a.m. UTC | #2
On 31/10/2019 11:07, Jan Beulich wrote:
> On 31.10.2019 11:56, Sergey Dyasli wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -274,6 +274,15 @@ static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
>>  	return _phys_pkg_id(get_apic_id(), index_msb);
>>  }
>>  
>> +/*
>> + * Sometimes it's too early to use cpu_has_hypervisor which is available only
>> + * after early_cpu_init().
>> + */
>> +bool __init early_cpu_has_hypervisor(void)
>> +{
>> +	return cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR);
>> +}
> 
> OOI, did you consider introducing a more general early_cpu_has(),
> with X86_FEATURE_* passed as an argument?

I couldn't find any other users of cpuid_e{c,d}x(0x1), so I don't see
the point in doing it now.

>> @@ -318,9 +319,10 @@ static int __init copy_e820_map(struct e820entry * biosmap, unsigned int nr_map)
>>  
>>          /*
>>           * Some BIOSes claim RAM in the 640k - 1M region.
>> -         * Not right. Fix it up.
>> +         * Not right. Fix it up, but only when running on bare metal.
>>           */
>> -        if (type == E820_RAM) {
>> +        if ( !early_cpu_has_hypervisor() && type == E820_RAM )
>> +        {
>>              if (start < 0x100000ULL && end > 0xA0000ULL) {
>>                  if (start < 0xA0000ULL)
>>                      add_memory_region(start, 0xA0000ULL-start, type);
> 
> Seeing original line and lower context - aren't you corrupting
> previously reasonably consistent (Linux) style here? (This could
> be easily taken care of while committing, but I'd first like the
> point below be clarified.)

This file has mixed coding style and I don't have any preference here.

>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5943,7 +5943,7 @@ const struct platform_bad_page *__init get_platform_badpages(unsigned int *array
>>      case 0x000806e0: /* erratum KBL??? */
>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>          *array_size = (cpuid_eax(0) >= 7 &&
>> -                       !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) &&
>> +                       !early_cpu_has_hypervisor() &&
> 
> Strictly speaking this makes clear that in early_cpu_has_hypervisor()
> you should also check cpuid_eax(0) >= 1. We don't currently seem to
> object to running on a system with only basic leaf 0 available (we
> do object to there not being extended leaf 1). Andrew, do you have
> any clear opinion one way or the other?

What gave you that impression? early_cpu_init() and generic_identify()
do cpuid(0x1) unconditionally.

--
Thanks,
Sergey
Jan Beulich Oct. 31, 2019, 12:38 p.m. UTC | #3
On 31.10.2019 12:45, Sergey Dyasli wrote:
> On 31/10/2019 11:07, Jan Beulich wrote:
>> On 31.10.2019 11:56, Sergey Dyasli wrote:
>>> --- a/xen/arch/x86/mm.c
>>> +++ b/xen/arch/x86/mm.c
>>> @@ -5943,7 +5943,7 @@ const struct platform_bad_page *__init get_platform_badpages(unsigned int *array
>>>      case 0x000806e0: /* erratum KBL??? */
>>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>>          *array_size = (cpuid_eax(0) >= 7 &&
>>> -                       !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) &&
>>> +                       !early_cpu_has_hypervisor() &&
>>
>> Strictly speaking this makes clear that in early_cpu_has_hypervisor()
>> you should also check cpuid_eax(0) >= 1. We don't currently seem to
>> object to running on a system with only basic leaf 0 available (we
>> do object to there not being extended leaf 1). Andrew, do you have
>> any clear opinion one way or the other?
> 
> What gave you that impression? early_cpu_init() and generic_identify()
> do cpuid(0x1) unconditionally.

"do cpuid(0x1) unconditionally" != "object to running on a system
with only basic leaf 0 available", i.e. that code is simply buggy as
long as we don't elsewhere panic() if there's only leaf 0 available.

Jan
Andrew Cooper Oct. 31, 2019, 12:46 p.m. UTC | #4
On 31/10/2019 11:07, Jan Beulich wrote:
> On 31.10.2019 11:56, Sergey Dyasli wrote:
>> --- a/xen/arch/x86/cpu/common.c
>> +++ b/xen/arch/x86/cpu/common.c
>> @@ -274,6 +274,15 @@ static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
>>  	return _phys_pkg_id(get_apic_id(), index_msb);
>>  }
>>  
>> +/*
>> + * Sometimes it's too early to use cpu_has_hypervisor which is available only
>> + * after early_cpu_init().
>> + */
>> +bool __init early_cpu_has_hypervisor(void)
>> +{
>> +	return cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR);
>> +}

Static inline alongside the other cpuid helpers please.  This assembles
to smaller than the call required to get to a separate translation unit.

> OOI, did you consider introducing a more general early_cpu_has(),
> with X86_FEATURE_* passed as an argument?

This is interim code.  Personally, I'm still go with what I said at the
very beginning, of opencoding it just in the hunk below.

>
>> @@ -318,9 +319,10 @@ static int __init copy_e820_map(struct e820entry * biosmap, unsigned int nr_map)
>>  
>>          /*
>>           * Some BIOSes claim RAM in the 640k - 1M region.
>> -         * Not right. Fix it up.
>> +         * Not right. Fix it up, but only when running on bare metal.
>>           */
>> -        if (type == E820_RAM) {
>> +        if ( !early_cpu_has_hypervisor() && type == E820_RAM )
>> +        {
>>              if (start < 0x100000ULL && end > 0xA0000ULL) {
>>                  if (start < 0xA0000ULL)
>>                      add_memory_region(start, 0xA0000ULL-start, type);
> Seeing original line and lower context - aren't you corrupting
> previously reasonably consistent (Linux) style here? (This could
> be easily taken care of while committing, but I'd first like the
> point below be clarified.)
>
>> --- a/xen/arch/x86/mm.c
>> +++ b/xen/arch/x86/mm.c
>> @@ -5943,7 +5943,7 @@ const struct platform_bad_page *__init get_platform_badpages(unsigned int *array
>>      case 0x000806e0: /* erratum KBL??? */
>>      case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
>>          *array_size = (cpuid_eax(0) >= 7 &&
>> -                       !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) &&
>> +                       !early_cpu_has_hypervisor() &&
> Strictly speaking this makes clear that in early_cpu_has_hypervisor()
> you should also check cpuid_eax(0) >= 1. We don't currently seem to
> object to running on a system with only basic leaf 0 available (we
> do object to there not being extended leaf 1). Andrew, do you have
> any clear opinion one way or the other?

By the time we are running in C, we know the CPU actually has long mode
and therefore max_leaf > 4(?) and max_extd_leaf >= 0x80000008

I don't see any value in not relying on this property, and a cost (extra
branches) to not relying on it.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/cpu/common.c b/xen/arch/x86/cpu/common.c
index 6c6bd63301..adedb557df 100644
--- a/xen/arch/x86/cpu/common.c
+++ b/xen/arch/x86/cpu/common.c
@@ -274,6 +274,15 @@  static inline u32 phys_pkg_id(u32 cpuid_apic, int index_msb)
 	return _phys_pkg_id(get_apic_id(), index_msb);
 }
 
+/*
+ * Sometimes it's too early to use cpu_has_hypervisor which is available only
+ * after early_cpu_init().
+ */
+bool __init early_cpu_has_hypervisor(void)
+{
+	return cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR);
+}
+
 /* Do minimum CPU detection early.
    Fields really needed: vendor, cpuid_level, family, model, mask, cache alignment.
    The others are not touched to avoid unwanted side effects.
diff --git a/xen/arch/x86/e820.c b/xen/arch/x86/e820.c
index 8e8a2c4e1b..dbbe6cac0a 100644
--- a/xen/arch/x86/e820.c
+++ b/xen/arch/x86/e820.c
@@ -10,6 +10,7 @@ 
 #include <asm/mtrr.h>
 #include <asm/msr.h>
 #include <asm/guest.h>
+#include <asm/setup.h>
 
 /*
  * opt_mem: Limit maximum address of physical RAM.
@@ -318,9 +319,10 @@  static int __init copy_e820_map(struct e820entry * biosmap, unsigned int nr_map)
 
         /*
          * Some BIOSes claim RAM in the 640k - 1M region.
-         * Not right. Fix it up.
+         * Not right. Fix it up, but only when running on bare metal.
          */
-        if (type == E820_RAM) {
+        if ( !early_cpu_has_hypervisor() && type == E820_RAM )
+        {
             if (start < 0x100000ULL && end > 0xA0000ULL) {
                 if (start < 0xA0000ULL)
                     add_memory_region(start, 0xA0000ULL-start, type);
diff --git a/xen/arch/x86/guest/xen.c b/xen/arch/x86/guest/xen.c
index 7b7a5badab..48ea3224ea 100644
--- a/xen/arch/x86/guest/xen.c
+++ b/xen/arch/x86/guest/xen.c
@@ -31,6 +31,7 @@ 
 #include <asm/guest.h>
 #include <asm/msr.h>
 #include <asm/processor.h>
+#include <asm/setup.h>
 
 #include <public/arch-x86/cpuid.h>
 #include <public/hvm/params.h>
@@ -72,8 +73,7 @@  void __init probe_hypervisor(void)
     if ( xen_guest )
         return;
 
-    /* Too early to use cpu_has_hypervisor */
-    if ( !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) )
+    if ( !early_cpu_has_hypervisor() )
         return;
 
     find_xen_leaves();
diff --git a/xen/arch/x86/mm.c b/xen/arch/x86/mm.c
index 99816fc67c..df1641634c 100644
--- a/xen/arch/x86/mm.c
+++ b/xen/arch/x86/mm.c
@@ -5943,7 +5943,7 @@  const struct platform_bad_page *__init get_platform_badpages(unsigned int *array
     case 0x000806e0: /* erratum KBL??? */
     case 0x000906e0: /* errata KBL??? / KBW114 / CFW103 */
         *array_size = (cpuid_eax(0) >= 7 &&
-                       !(cpuid_ecx(1) & cpufeat_mask(X86_FEATURE_HYPERVISOR)) &&
+                       !early_cpu_has_hypervisor() &&
                        (cpuid_count_ebx(7, 0) & cpufeat_mask(X86_FEATURE_HLE)));
         return &hle_bad_page;
     }
diff --git a/xen/include/asm-x86/setup.h b/xen/include/asm-x86/setup.h
index 861d46d6ac..ddd37907f1 100644
--- a/xen/include/asm-x86/setup.h
+++ b/xen/include/asm-x86/setup.h
@@ -16,6 +16,7 @@  extern unsigned long xenheap_initial_phys_start;
 
 void early_cpu_init(void);
 void early_time_init(void);
+bool early_cpu_has_hypervisor(void);
 
 void set_nr_cpu_ids(unsigned int max_cpus);