diff mbox

[v3,01/10] KVM: MMU: fix decoding cache type from MTRR

Message ID 1431499348-25188-2-git-send-email-guangrong.xiao@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Xiao Guangrong May 13, 2015, 6:42 a.m. UTC
There are some bugs in current get_mtrr_type();
1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
   IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
   that means other bits are ignored if it is cleared

2: the fixed MTRR ranges are controlled by bit 0 of
   mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)

3: if MTRR is disabled, UC is applied to all of physical memory rather
   than mtrr_state->def_type

Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
---
 arch/x86/kvm/mmu.c | 14 ++++++--------
 1 file changed, 6 insertions(+), 8 deletions(-)

Comments

Wanpeng Li May 13, 2015, 8:09 a.m. UTC | #1
On Wed, May 13, 2015 at 02:42:19PM +0800, Xiao Guangrong wrote:
>There are some bugs in current get_mtrr_type();
>1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>   IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>   that means other bits are ignored if it is cleared
>
>2: the fixed MTRR ranges are controlled by bit 0 of
>   mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>
>3: if MTRR is disabled, UC is applied to all of physical memory rather
>   than mtrr_state->def_type
>
>Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>

Reviewed-by: Wanpeng Li <wanpeng.li@linux.intel.com>

>---
> arch/x86/kvm/mmu.c | 14 ++++++--------
> 1 file changed, 6 insertions(+), 8 deletions(-)
>
>diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>index b78e83f..d00cebd 100644
>--- a/arch/x86/kvm/mmu.c
>+++ b/arch/x86/kvm/mmu.c
>@@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
> static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
> 			 u64 start, u64 end)
> {
>-	int i;
> 	u64 base, mask;
> 	u8 prev_match, curr_match;
>-	int num_var_ranges = KVM_NR_VAR_MTRR;
>+	int i, num_var_ranges = KVM_NR_VAR_MTRR;
> 
>-	if (!mtrr_state->enabled)
>-		return 0xFF;
>+	/* MTRR is completely disabled, use UC for all of physical memory. */
>+	if (!(mtrr_state->enabled & 0x2))
>+		return MTRR_TYPE_UNCACHABLE;
> 
> 	/* Make end inclusive end, instead of exclusive */
> 	end--;
> 
> 	/* Look in fixed ranges. Just return the type as per start */
>-	if (mtrr_state->have_fixed && (start < 0x100000)) {
>+	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
>+	      (start < 0x100000)) {
> 		int idx;
> 
> 		if (start < 0x80000) {
>@@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
> 	 * Look of multiple ranges matching this address and pick type
> 	 * as per MTRR precedence
> 	 */
>-	if (!(mtrr_state->enabled & 2))
>-		return mtrr_state->def_type;
>-
> 	prev_match = 0xFF;
> 	for (i = 0; i < num_var_ranges; ++i) {
> 		unsigned short start_state, end_state;
>-- 
>2.1.0
>
>--
>To unsubscribe from this list: send the line "unsubscribe kvm" in
>the body of a message to majordomo@vger.kernel.org
>More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Alex Williamson July 12, 2015, 5:33 p.m. UTC | #2
On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
> There are some bugs in current get_mtrr_type();
> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>    IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>    that means other bits are ignored if it is cleared
> 
> 2: the fixed MTRR ranges are controlled by bit 0 of
>    mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
> 
> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>    than mtrr_state->def_type
> 
> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
> ---
>  arch/x86/kvm/mmu.c | 14 ++++++--------
>  1 file changed, 6 insertions(+), 8 deletions(-)


I'm seeing a significant regression in boot performance on Intel
hardware with assigned devices that bisects back to this patch.  There's
a long delay with Seabios between the version splash and execution of
option ROMs, and a _very_ long delay with OVMF before the display is
initialized.  The delay is long enough that users are reporting their
previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,

Alex

> 
> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
> index b78e83f..d00cebd 100644
> --- a/arch/x86/kvm/mmu.c
> +++ b/arch/x86/kvm/mmu.c
> @@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
>  static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>  			 u64 start, u64 end)
>  {
> -	int i;
>  	u64 base, mask;
>  	u8 prev_match, curr_match;
> -	int num_var_ranges = KVM_NR_VAR_MTRR;
> +	int i, num_var_ranges = KVM_NR_VAR_MTRR;
>  
> -	if (!mtrr_state->enabled)
> -		return 0xFF;
> +	/* MTRR is completely disabled, use UC for all of physical memory. */
> +	if (!(mtrr_state->enabled & 0x2))
> +		return MTRR_TYPE_UNCACHABLE;
>  
>  	/* Make end inclusive end, instead of exclusive */
>  	end--;
>  
>  	/* Look in fixed ranges. Just return the type as per start */
> -	if (mtrr_state->have_fixed && (start < 0x100000)) {
> +	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
> +	      (start < 0x100000)) {
>  		int idx;
>  
>  		if (start < 0x80000) {
> @@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>  	 * Look of multiple ranges matching this address and pick type
>  	 * as per MTRR precedence
>  	 */
> -	if (!(mtrr_state->enabled & 2))
> -		return mtrr_state->def_type;
> -
>  	prev_match = 0xFF;
>  	for (i = 0; i < num_var_ranges; ++i) {
>  		unsigned short start_state, end_state;



--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong July 12, 2015, 6:59 p.m. UTC | #3
On 07/13/2015 01:33 AM, Alex Williamson wrote:
> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>> There are some bugs in current get_mtrr_type();
>> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>>     IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>>     that means other bits are ignored if it is cleared
>>
>> 2: the fixed MTRR ranges are controlled by bit 0 of
>>     mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>>
>> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>>     than mtrr_state->def_type
>>
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>   arch/x86/kvm/mmu.c | 14 ++++++--------
>>   1 file changed, 6 insertions(+), 8 deletions(-)
>
>
> I'm seeing a significant regression in boot performance on Intel
> hardware with assigned devices that bisects back to this patch.  There's
> a long delay with Seabios between the version splash and execution of
> option ROMs, and a _very_ long delay with OVMF before the display is
> initialized.  The delay is long enough that users are reporting their
> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>

Alex, thanks for your report. I will try to reproduce and fix it asap.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Bandan Das July 12, 2015, 7:12 p.m. UTC | #4
Alex Williamson <alex.williamson@redhat.com> writes:

> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>> There are some bugs in current get_mtrr_type();
>> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>>    IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>>    that means other bits are ignored if it is cleared
>> 
>> 2: the fixed MTRR ranges are controlled by bit 0 of
>>    mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>> 
>> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>>    than mtrr_state->def_type
>> 
>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>> ---
>>  arch/x86/kvm/mmu.c | 14 ++++++--------
>>  1 file changed, 6 insertions(+), 8 deletions(-)
>
>
> I'm seeing a significant regression in boot performance on Intel
> hardware with assigned devices that bisects back to this patch.  There's
> a long delay with Seabios between the version splash and execution of
> option ROMs, and a _very_ long delay with OVMF before the display is
> initialized.  The delay is long enough that users are reporting their
> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>
> Alex
>
>> 
>> diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
>> index b78e83f..d00cebd 100644
>> --- a/arch/x86/kvm/mmu.c
>> +++ b/arch/x86/kvm/mmu.c
>> @@ -2393,19 +2393,20 @@ EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
>>  static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>>  			 u64 start, u64 end)
>>  {
>> -	int i;
>>  	u64 base, mask;
>>  	u8 prev_match, curr_match;
>> -	int num_var_ranges = KVM_NR_VAR_MTRR;
>> +	int i, num_var_ranges = KVM_NR_VAR_MTRR;
>>  
>> -	if (!mtrr_state->enabled)
>> -		return 0xFF;
>> +	/* MTRR is completely disabled, use UC for all of physical memory. */
>> +	if (!(mtrr_state->enabled & 0x2))
>> +		return MTRR_TYPE_UNCACHABLE;

Setting this to UC if MTRR is disabled is too restrictive maybe..

Bandan

>>  	/* Make end inclusive end, instead of exclusive */
>>  	end--;
>>  
>>  	/* Look in fixed ranges. Just return the type as per start */
>> -	if (mtrr_state->have_fixed && (start < 0x100000)) {
>> +	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
>> +	      (start < 0x100000)) {
>>  		int idx;
>>  
>>  		if (start < 0x80000) {
>> @@ -2428,9 +2429,6 @@ static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
>>  	 * Look of multiple ranges matching this address and pick type
>>  	 * as per MTRR precedence
>>  	 */
>> -	if (!(mtrr_state->enabled & 2))
>> -		return mtrr_state->def_type;
>> -
>>  	prev_match = 0xFF;
>>  	for (i = 0; i < num_var_ranges; ++i) {
>>  		unsigned short start_state, end_state;
>
>
>
> --
> To unsubscribe from this list: send the line "unsubscribe kvm" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini July 13, 2015, 7:32 a.m. UTC | #5
On 12/07/2015 20:59, Xiao Guangrong wrote:
> 
> 
> On 07/13/2015 01:33 AM, Alex Williamson wrote:
>> On Wed, 2015-05-13 at 14:42 +0800, Xiao Guangrong wrote:
>>> There are some bugs in current get_mtrr_type();
>>> 1: bit 1 of mtrr_state->enabled is corresponding bit 11 of
>>>     IA32_MTRR_DEF_TYPE MSR which completely control MTRR's enablement
>>>     that means other bits are ignored if it is cleared
>>>
>>> 2: the fixed MTRR ranges are controlled by bit 0 of
>>>     mtrr_state->enabled (bit 10 of IA32_MTRR_DEF_TYPE)
>>>
>>> 3: if MTRR is disabled, UC is applied to all of physical memory rather
>>>     than mtrr_state->def_type
>>>
>>> Signed-off-by: Xiao Guangrong <guangrong.xiao@linux.intel.com>
>>> ---
>>>   arch/x86/kvm/mmu.c | 14 ++++++--------
>>>   1 file changed, 6 insertions(+), 8 deletions(-)
>>
>>
>> I'm seeing a significant regression in boot performance on Intel
>> hardware with assigned devices that bisects back to this patch.  There's
>> a long delay with Seabios between the version splash and execution of
>> option ROMs, and a _very_ long delay with OVMF before the display is
>> initialized.  The delay is long enough that users are reporting their
>> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>>
> 
> Alex, thanks for your report. I will try to reproduce and fix it asap.

The code that Bandan pointed out

+	/* MTRR is completely disabled, use UC for all of physical memory. */
+	if (!(mtrr_state->enabled & 0x2))
+		return MTRR_TYPE_UNCACHABLE;

actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
kvm_mtrr_get_guest_memory_type, 2015-06-15).  Should mtrr_default_type
actually be something like this:

static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
{
        if (mtrr_is_enabled(mtrr_state))
                return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
        else
                return MTRR_TYPE_UNCACHABLE;
}

?  Then it's easy to add a quirk that makes the default WRITEBACK until
MTRRs are enabled.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong July 13, 2015, 2:45 p.m. UTC | #6
On 07/13/2015 03:32 PM, Paolo Bonzini wrote:

>>> I'm seeing a significant regression in boot performance on Intel
>>> hardware with assigned devices that bisects back to this patch.  There's
>>> a long delay with Seabios between the version splash and execution of
>>> option ROMs, and a _very_ long delay with OVMF before the display is
>>> initialized.  The delay is long enough that users are reporting their
>>> previously working VM is hung with 100% CPU usage on v4.2-rc1.  Thanks,
>>>
>>
>> Alex, thanks for your report. I will try to reproduce and fix it asap.
>

I have reproduced the issue and I think Bandan and Paolo is correct.

> The code that Bandan pointed out
>
> +	/* MTRR is completely disabled, use UC for all of physical memory. */
> +	if (!(mtrr_state->enabled & 0x2))
> +		return MTRR_TYPE_UNCACHABLE;
>
> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
> kvm_mtrr_get_guest_memory_type, 2015-06-15).  Should mtrr_default_type
> actually be something like this:

:(

Based on the SDM, UC is applied to all memory rather than default-type
if MTRR is disabled.

If i changed the code to:
   	if (!(mtrr_state->enabled & 0x2))
  		return mtrr_state->def_type;
the result is the same as before.

However, fast boot came back if "return 0xFF" here. So fast boot expects
that the memory type is WB.

>
> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
> {
>          if (mtrr_is_enabled(mtrr_state))
>                  return mtrr_state->deftype & IA32_MTRR_DEF_TYPE_TYPE_MASK;
>          else
>                  return MTRR_TYPE_UNCACHABLE;
> }
>
> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
> MTRRs are enabled.

It is the wrong configure in OVMF... shall we need to adjust KVM to satisfy
OVMF?

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini July 13, 2015, 3:13 p.m. UTC | #7
On 13/07/2015 16:45, Xiao Guangrong wrote:
>> +    /* MTRR is completely disabled, use UC for all of physical
>> memory. */
>> +    if (!(mtrr_state->enabled & 0x2))
>> +        return MTRR_TYPE_UNCACHABLE;
>>
>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
> 
> :(
> 
> Based on the SDM, UC is applied to all memory rather than default-type
> if MTRR is disabled.

There are two issues I think.  One is that I cannot find in the current
code that "UC is applied to all memory rather than default-type if MTRR
is disabled".  mtrr_default_type unconditionally looks at
mtrr_state->deftype.

> However, fast boot came back if "return 0xFF" here. So fast boot expects
> that the memory type is WB.

Yes.

>>
>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>> {
>>          if (mtrr_is_enabled(mtrr_state))
>>                  return mtrr_state->deftype &
>> IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>          else
>>                  return MTRR_TYPE_UNCACHABLE;
>> }
>>
>> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
>> MTRRs are enabled.
> 
> It is the wrong configure in OVMF... shall we need to adjust KVM to satisfy
> OVMF?

That's what quirks are for...  The firmware should still be fixed of course.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong July 13, 2015, 3:15 p.m. UTC | #8
On 07/13/2015 11:13 PM, Paolo Bonzini wrote:
> On 13/07/2015 16:45, Xiao Guangrong wrote:
>>> +    /* MTRR is completely disabled, use UC for all of physical
>>> memory. */
>>> +    if (!(mtrr_state->enabled & 0x2))
>>> +        return MTRR_TYPE_UNCACHABLE;
>>>
>>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
>>
>> :(
>>
>> Based on the SDM, UC is applied to all memory rather than default-type
>> if MTRR is disabled.
>
> There are two issues I think.  One is that I cannot find in the current
> code that "UC is applied to all memory rather than default-type if MTRR
> is disabled".  mtrr_default_type unconditionally looks at
> mtrr_state->deftype.

Yes... Will fix.

>
>> However, fast boot came back if "return 0xFF" here. So fast boot expects
>> that the memory type is WB.
>
> Yes.
>
>>>
>>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>>> {
>>>           if (mtrr_is_enabled(mtrr_state))
>>>                   return mtrr_state->deftype &
>>> IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>>           else
>>>                   return MTRR_TYPE_UNCACHABLE;
>>> }
>>>
>>> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
>>> MTRRs are enabled.
>>
>> It is the wrong configure in OVMF... shall we need to adjust KVM to satisfy
>> OVMF?
>
> That's what quirks are for...  The firmware should still be fixed of course.

I see, will do it.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laszlo Ersek July 14, 2015, 9:12 p.m. UTC | #9
Cross-posting to edk2-devel.

Original sub-thread starts here:
http://thread.gmane.org/gmane.linux.kernel/1952205/focus=1994315

On 07/13/15 17:15, Xiao Guangrong wrote:
> 
> 
> On 07/13/2015 11:13 PM, Paolo Bonzini wrote:
>> On 13/07/2015 16:45, Xiao Guangrong wrote:
>>>> +    /* MTRR is completely disabled, use UC for all of physical
>>>> memory. */
>>>> +    if (!(mtrr_state->enabled & 0x2))
>>>> +        return MTRR_TYPE_UNCACHABLE;
>>>>
>>>> actually disappears in commit fa61213746a7 (KVM: MTRR: simplify
>>>> kvm_mtrr_get_guest_memory_type, 2015-06-15).
>>>
>>> :(
>>>
>>> Based on the SDM, UC is applied to all memory rather than default-type
>>> if MTRR is disabled.
>>
>> There are two issues I think.  One is that I cannot find in the current
>> code that "UC is applied to all memory rather than default-type if MTRR
>> is disabled".  mtrr_default_type unconditionally looks at
>> mtrr_state->deftype.
> 
> Yes... Will fix.
> 
>>
>>> However, fast boot came back if "return 0xFF" here. So fast boot expects
>>> that the memory type is WB.
>>
>> Yes.
>>
>>>>
>>>> static u8 mtrr_default_type(struct kvm_mtrr *mtrr_state)
>>>> {
>>>>           if (mtrr_is_enabled(mtrr_state))
>>>>                   return mtrr_state->deftype &
>>>> IA32_MTRR_DEF_TYPE_TYPE_MASK;
>>>>           else
>>>>                   return MTRR_TYPE_UNCACHABLE;
>>>> }
>>>>
>>>> ?  Then it's easy to add a quirk that makes the default WRITEBACK until
>>>> MTRRs are enabled.
>>>
>>> It is the wrong configure in OVMF... shall we need to adjust KVM to
>>> satisfy
>>> OVMF?
>>
>> That's what quirks are for...  The firmware should still be fixed of
>> course.
> 
> I see, will do it.

The long delay that Alex reported (for the case when all guest memory
was set to UC up-front) is due to the fact that the SEC phase of OVMF
decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
drivers -- and this decompression is extremely memory-intensive.

(Refer to "Firmware image structure" in the OVMF whitepaper,
<http://www.linux-kvm.org/page/OVMF>.)

Perhaps also significant, with this initial MTRR change: the x86_64
reset vector builds some page tables too. (Refer to "Select features |
X64-specific reset vector for OVMF" in the OVMF whitepaper.)

(When Jordan implemented that reset vector first, we saw similar
performance degradation on AMD hosts (albeit not due to MTRR but due to
page attributes). See
<https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
it here because it makes me appreciate the current problem report.)

Anyway, the reset vector's page table building is implemented in
"OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
(It is recommended to refer heavily to the OVMF whitepaper, or at least
to drink heavily.)

In the PEI phase, we do set up MTRRs sensibly, see
"OvmfPkg/PlatformPei/MemDetect.c", function QemuInitializeRam().
However, that's too late with regard this problem report, because PEI
modules run *from* one of the firmware volumes that SEC expands with LZMA.

Anyway, the logic in QemuInitializeRam() relies on the MtrrLib library
class, which OVMF resolves to the
"UefiCpuPkg/Library/MtrrLib/MtrrLib.inf" instance. The latter has no
client module type restriction (it's BASE), so it could be used in SEC
too, if someone were to write patches.

I'm sorry but I really can't take this on now. So, respectfully:
- please quirk it in KVM for now (SeaBIOS is also affected),
- "patches welcome" for OVMF, as always.

Thanks
Laszlo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Paolo Bonzini July 14, 2015, 9:15 p.m. UTC | #10
> The long delay that Alex reported (for the case when all guest memory
> was set to UC up-front) is due to the fact that the SEC phase of OVMF
> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
> drivers -- and this decompression is extremely memory-intensive.
> 
> (When Jordan implemented that reset vector first, we saw similar
> performance degradation on AMD hosts (albeit not due to MTRR but due to
> page attributes). See
> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
> it here because it makes me appreciate the current problem report.)
> 
> Anyway, the reset vector's page table building is implemented in
> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().

Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
and set the default type to writeback.

In any case we're going to have to quirk it, because of the broken
guests in the wild.

Paolo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laszlo Ersek July 14, 2015, 9:29 p.m. UTC | #11
On 07/14/15 23:15, Paolo Bonzini wrote:
>> The long delay that Alex reported (for the case when all guest memory
>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>> drivers -- and this decompression is extremely memory-intensive.
>>
>> (When Jordan implemented that reset vector first, we saw similar
>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>> page attributes). See
>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>> it here because it makes me appreciate the current problem report.)
>>
>> Anyway, the reset vector's page table building is implemented in
>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
> 
> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?

That's an idea, yes, if someone feels sufficiently drawn to writing
assembly. Complications:
- the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
  build
- it needs to be determined what memory to cover.

> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> and set the default type to writeback.

Seems safe to me, off the top of my head (and testing could confirm /
disprove quickly).

> In any case we're going to have to quirk it, because of the broken
> guests in the wild.

Thanks.
Laszlo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Jordan Justen July 14, 2015, 10:37 p.m. UTC | #12
On 2015-07-14 14:29:11, Laszlo Ersek wrote:
> On 07/14/15 23:15, Paolo Bonzini wrote:
> >> The long delay that Alex reported (for the case when all guest memory
> >> was set to UC up-front) is due to the fact that the SEC phase of OVMF
> >> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
> >> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
> >> drivers -- and this decompression is extremely memory-intensive.
> >>
> >> (When Jordan implemented that reset vector first, we saw similar
> >> performance degradation on AMD hosts (albeit not due to MTRR but due to
> >> page attributes). See
> >> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
> >> it here because it makes me appreciate the current problem report.)
> >>
> >> Anyway, the reset vector's page table building is implemented in
> >> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
> >> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
> > 
> > Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
> 
> That's an idea, yes, if someone feels sufficiently drawn to writing
> assembly.

Maybe we can use MtrrLib in the SEC C code?

-Jordan

> Complications:
> - the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
>   build
> - it needs to be determined what memory to cover.
> 
> > I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> > and set the default type to writeback.
> 
> Seems safe to me, off the top of my head (and testing could confirm /
> disprove quickly).
> 
> > In any case we're going to have to quirk it, because of the broken
> > guests in the wild.
> 
> Thanks.
> Laszlo
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Fan, Jeff July 15, 2015, 12:14 a.m. UTC | #13
Actually, MMIO will be used in OVMF SEC phase if local APIC is consumed. (SecPeiDebugAgentLib will consume local APIC timer for communication between debugger TARGET/HOST).

So, I suggest to keep MTRR default value to UC and set the code range to WB, or set default value to WB and set Local APIC space range to UC. (gUefiCpuPkgTokenSpaceGuid.PcdCpuLocalApicBaseAddress|0xfee00000)

As Jordan's suggestion, you could make use of MTRR lib in UefiCpuPkg.

Jeff

-----Original Message-----
From: Paolo Bonzini [mailto:pbonzini@redhat.com] 
Sent: Wednesday, July 15, 2015 5:16 AM
To: Laszlo Ersek
Cc: edk2-devel list; Xiao Guangrong; kvm@vger.kernel.org; gleb@kernel.org; mtosatti@redhat.com; linux-kernel@vger.kernel.org
Subject: Re: [edk2] MTRR setup in OVMF [was: PATCH v3 01/10 KVM: MMU: fix decoding cache type from MTRR]

> The long delay that Alex reported (for the case when all guest memory 
> was set to UC up-front) is due to the fact that the SEC phase of OVMF 
> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to 
> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI 
> drivers -- and this decompression is extremely memory-intensive.
> 
> (When Jordan implemented that reset vector first, we saw similar 
> performance degradation on AMD hosts (albeit not due to MTRR but due 
> to page attributes). See 
> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only 
> mentioning it here because it makes me appreciate the current problem 
> report.)
> 
> Anyway, the reset vector's page table building is implemented in 
> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC 
> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().

Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs and set the default type to writeback.

In any case we're going to have to quirk it, because of the broken guests in the wild.

Paolo

------------------------------------------------------------------------------
Don't Limit Your Business. Reach for the Cloud.
GigeNET's Cloud Solutions provide you with the tools and support that you need to offload your IT needs and focus on growing your business.
Configured For All Businesses. Start Your Cloud Today.
https://www.gigenetcloud.com/
Laszlo Ersek July 15, 2015, 9:57 a.m. UTC | #14
On 07/15/15 00:37, Jordan Justen wrote:
> On 2015-07-14 14:29:11, Laszlo Ersek wrote:
>> On 07/14/15 23:15, Paolo Bonzini wrote:
>>>> The long delay that Alex reported (for the case when all guest memory
>>>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>>>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>>>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>>>> drivers -- and this decompression is extremely memory-intensive.
>>>>
>>>> (When Jordan implemented that reset vector first, we saw similar
>>>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>>>> page attributes). See
>>>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>>>> it here because it makes me appreciate the current problem report.)
>>>>
>>>> Anyway, the reset vector's page table building is implemented in
>>>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>>>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>>>
>>> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
>>
>> That's an idea, yes, if someone feels sufficiently drawn to writing
>> assembly.
> 
> Maybe we can use MtrrLib in the SEC C code?

If the page table building in the reset vector is not too costly with
UC, then yes, it should suffice if MtrrLib was put to use first just
before the decompression in SEC.

Thanks
Laszlo

> -Jordan
> 
>> Complications:
>> - the reset vector is specific to OvmfPkg only in the OvmfPkgX64.dsc
>>   build
>> - it needs to be determined what memory to cover.
>>
>>> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
>>> and set the default type to writeback.
>>
>> Seems safe to me, off the top of my head (and testing could confirm /
>> disprove quickly).
>>
>>> In any case we're going to have to quirk it, because of the broken
>>> guests in the wild.
>>
>> Thanks.
>> Laszlo

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Xiao Guangrong July 15, 2015, 7:30 p.m. UTC | #15
Hi,

I have posted the pachset to make OVMF happy and have CCed you guys,
could you please check it if it works for you?


On 07/15/2015 05:15 AM, Paolo Bonzini wrote:
>> The long delay that Alex reported (for the case when all guest memory
>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>> drivers -- and this decompression is extremely memory-intensive.
>>
>> (When Jordan implemented that reset vector first, we saw similar
>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>> page attributes). See
>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>> it here because it makes me appreciate the current problem report.)
>>
>> Anyway, the reset vector's page table building is implemented in
>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>
> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
> and set the default type to writeback.
>
> In any case we're going to have to quirk it, because of the broken
> guests in the wild.
>
> Paolo
>
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Laszlo Ersek July 15, 2015, 7:41 p.m. UTC | #16
On 07/15/15 21:30, Xiao Guangrong wrote:
> 
> Hi,
> 
> I have posted the pachset to make OVMF happy and have CCed you guys,
> could you please check it if it works for you?

Sorry, I can't check it; I don't have an environment that exposes the
issue in the first place. Perhaps Alex can check it, or refer some of
the users that reported the problem to him to this patchset? (Those
users were using bleeding edge v4.2-rc1 anyway, unlike conservative me.)

Thanks!
Laszlo

> On 07/15/2015 05:15 AM, Paolo Bonzini wrote:
>>> The long delay that Alex reported (for the case when all guest memory
>>> was set to UC up-front) is due to the fact that the SEC phase of OVMF
>>> decompresses an approximately 1712 KB sized, LZMA-compressed blob, to
>>> approx. 896 KB worth of PEI drivers and 8192 KB worth of DXE and UEFI
>>> drivers -- and this decompression is extremely memory-intensive.
>>>
>>> (When Jordan implemented that reset vector first, we saw similar
>>> performance degradation on AMD hosts (albeit not due to MTRR but due to
>>> page attributes). See
>>> <https://github.com/tianocore/edk2/commit/98f378a7>. I'm only mentioning
>>> it here because it makes me appreciate the current problem report.)
>>>
>>> Anyway, the reset vector's page table building is implemented in
>>> "OvmfPkg/ResetVector/Ia32/PageTables64.asm". The decompression in SEC
>>> can be found in "OvmfPkg/Sec/SecMain.c", function DecompressMemFvs().
>>
>> Perhaps the OVMF reset vector should initialize the MTRRs for the BSP?
>> I think SEC doesn't do any MMIO, so it should be enough to enable MTRRs
>> and set the default type to writeback.
>>
>> In any case we're going to have to quirk it, because of the broken
>> guests in the wild.
>>
>> Paolo
>>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/mmu.c b/arch/x86/kvm/mmu.c
index b78e83f..d00cebd 100644
--- a/arch/x86/kvm/mmu.c
+++ b/arch/x86/kvm/mmu.c
@@ -2393,19 +2393,20 @@  EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
 static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
 			 u64 start, u64 end)
 {
-	int i;
 	u64 base, mask;
 	u8 prev_match, curr_match;
-	int num_var_ranges = KVM_NR_VAR_MTRR;
+	int i, num_var_ranges = KVM_NR_VAR_MTRR;
 
-	if (!mtrr_state->enabled)
-		return 0xFF;
+	/* MTRR is completely disabled, use UC for all of physical memory. */
+	if (!(mtrr_state->enabled & 0x2))
+		return MTRR_TYPE_UNCACHABLE;
 
 	/* Make end inclusive end, instead of exclusive */
 	end--;
 
 	/* Look in fixed ranges. Just return the type as per start */
-	if (mtrr_state->have_fixed && (start < 0x100000)) {
+	if (mtrr_state->have_fixed && (mtrr_state->enabled & 0x1) &&
+	      (start < 0x100000)) {
 		int idx;
 
 		if (start < 0x80000) {
@@ -2428,9 +2429,6 @@  static int get_mtrr_type(struct mtrr_state_type *mtrr_state,
 	 * Look of multiple ranges matching this address and pick type
 	 * as per MTRR precedence
 	 */
-	if (!(mtrr_state->enabled & 2))
-		return mtrr_state->def_type;
-
 	prev_match = 0xFF;
 	for (i = 0; i < num_var_ranges; ++i) {
 		unsigned short start_state, end_state;