diff mbox

commit 3c2e7f7de3 (KVM use NPT page attributes) causes boot failures

Message ID 55E6BEB1.6080106@linux.intel.com (mailing list archive)
State New, archived
Headers show

Commit Message

Xiao Guangrong Sept. 2, 2015, 9:17 a.m. UTC
On 09/02/2015 11:50 AM, Markus Trippelsdorf wrote:
> On 2015.09.02 at 06:31 +0800, Xiao Guangrong wrote:
>>
>>
>> On 09/01/2015 09:56 PM, Markus Trippelsdorf wrote:
>>> On 2015.09.01 at 21:00 +0800, Xiao Guangrong wrote:
>>>>
>>>> Did it trigger the BUG()/BUG_ON() in mtrr2protval()/fallback_mtrr_type()?
>>>> If yes, could you please print the actual value out?
>>>
>>> It is the BUG() in fallback_mtrr_type(). I changed it to a printk and
>>> it prints 1 for the value of mtrr.
>>>
>>>    MTRR_TYPE_WRCOMB     1
>>>
>>
>> Then I suspect pat is not enabled in your box, could you please check
>> CONFIG_X86_PAT is selected in your .config file, pat is shown in
>> /proc/cpuid, "nopat" kernel parameter is used, and dmesg | grep PAT.
>
> No. PAT is of course enabled and booting is successful sometimes even
> with the BUG() in allback_mtrr_type(). I suspect a setup (timing) issue.

Thanks for your confirmation.

>
> markus@x4 linux % cat .config | grep  X86_PAT
> CONFIG_X86_PAT=y
> markus@x4 linux % dmesg | grep PAT
> [    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT

Strange, BP processor has already set WC to PAT1, however KVM does not read it out
from PAT MSR on its local CPU.

Hmm... PAT default values do not include WC, it seems initing PAT on SP has not
finished after module_init()?

Could please apply this diff and test it again?

         }
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

Comments

Markus Trippelsdorf Sept. 2, 2015, 9:38 a.m. UTC | #1
On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
> >
> > No. PAT is of course enabled and booting is successful sometimes even
> > with the BUG() in allback_mtrr_type(). I suspect a setup (timing) issue.
> 
> Thanks for your confirmation.
> 
> >
> > markus@x4 linux % cat .config | grep  X86_PAT
> > CONFIG_X86_PAT=y
> > markus@x4 linux % dmesg | grep PAT
> > [    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> 
> Strange, BP processor has already set WC to PAT1, however KVM does not read it out
> from PAT MSR on its local CPU.
> 
> Hmm... PAT default values do not include WC, it seems initing PAT on SP has not
> finished after module_init()?
> 
> Could please apply this diff and test it again?

(Your patch was malformed.)

[    2.138098] kvm: Nested Virtualization enabled
[    2.138153] kvm: Nested Paging enabled
[    2.138204] KVM PAT: 0x7040600070406.
[    2.138255] mtrr2protval[0]:18.
[    2.138306] mtrr2protval[1]:ff.
[    2.138356] mtrr2protval[2]:0.
[    2.138408] mtrr2protval[3]:0.
[    2.138459] mtrr2protval[4]:8.
[    2.138510] mtrr2protval[5]:ff.
[    2.138561] mtrr2protval[6]:0.
[    2.138612] mtrr2protval[7]:10.
[    2.138662] BUG in fallback_mtrr_type, mtrr = 1.
Xiao Guangrong Sept. 2, 2015, 10:27 a.m. UTC | #2
On 09/02/2015 05:38 PM, Markus Trippelsdorf wrote:
> On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
>>>
>>> No. PAT is of course enabled and booting is successful sometimes even
>>> with the BUG() in allback_mtrr_type(). I suspect a setup (timing) issue.
>>
>> Thanks for your confirmation.
>>
>>>
>>> markus@x4 linux % cat .config | grep  X86_PAT
>>> CONFIG_X86_PAT=y
>>> markus@x4 linux % dmesg | grep PAT
>>> [    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
>>
>> Strange, BP processor has already set WC to PAT1, however KVM does not read it out
>> from PAT MSR on its local CPU.
>>
>> Hmm... PAT default values do not include WC, it seems initing PAT on SP has not
>> finished after module_init()?
>>
>> Could please apply this diff and test it again?
>
> (Your patch was malformed.)
>
> [    2.138098] kvm: Nested Virtualization enabled
> [    2.138153] kvm: Nested Paging enabled
> [    2.138204] KVM PAT: 0x7040600070406.

So the PAT is the value after CPU reset, it's likely PAT is not initialized on
the local CPU.

Maybe something is escaped from stop_machine() called in native_smp_cpus_done(),
Ingo?
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Markus Trippelsdorf Sept. 2, 2015, 10:54 a.m. UTC | #3
On 2015.09.02 at 18:27 +0800, Xiao Guangrong wrote:
> 
> 
> On 09/02/2015 05:38 PM, Markus Trippelsdorf wrote:
> > On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
> >>>
> >>> No. PAT is of course enabled and booting is successful sometimes even
> >>> with the BUG() in allback_mtrr_type(). I suspect a setup (timing) issue.
> >>
> >> Thanks for your confirmation.
> >>
> >>>
> >>> markus@x4 linux % cat .config | grep  X86_PAT
> >>> CONFIG_X86_PAT=y
> >>> markus@x4 linux % dmesg | grep PAT
> >>> [    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
> >>
> >> Strange, BP processor has already set WC to PAT1, however KVM does not read it out
> >> from PAT MSR on its local CPU.
> >>
> >> Hmm... PAT default values do not include WC, it seems initing PAT on SP has not
> >> finished after module_init()?
> >>
> >> Could please apply this diff and test it again?
> >
> > (Your patch was malformed.)
> >
> > [    2.138098] kvm: Nested Virtualization enabled
> > [    2.138153] kvm: Nested Paging enabled
> > [    2.138204] KVM PAT: 0x7040600070406.
> 
> So the PAT is the value after CPU reset, it's likely PAT is not initialized on
> the local CPU.

Could it be a simple AMD/INTEL difference. I'm running a AMD CPU and I
see many !use_intel() in if statements in arch/x86/kernel/cpu/mtrr/main.c...
Xiao Guangrong Sept. 2, 2015, 10:57 a.m. UTC | #4
On 09/02/2015 06:54 PM, Markus Trippelsdorf wrote:
> On 2015.09.02 at 18:27 +0800, Xiao Guangrong wrote:
>>
>>
>> On 09/02/2015 05:38 PM, Markus Trippelsdorf wrote:
>>> On 2015.09.02 at 17:17 +0800, Xiao Guangrong wrote:
>>>>>
>>>>> No. PAT is of course enabled and booting is successful sometimes even
>>>>> with the BUG() in allback_mtrr_type(). I suspect a setup (timing) issue.
>>>>
>>>> Thanks for your confirmation.
>>>>
>>>>>
>>>>> markus@x4 linux % cat .config | grep  X86_PAT
>>>>> CONFIG_X86_PAT=y
>>>>> markus@x4 linux % dmesg | grep PAT
>>>>> [    0.000000] x86/PAT: Configuration [0-7]: WB  WC  UC- UC  WB  WC  UC- WT
>>>>
>>>> Strange, BP processor has already set WC to PAT1, however KVM does not read it out
>>>> from PAT MSR on its local CPU.
>>>>
>>>> Hmm... PAT default values do not include WC, it seems initing PAT on SP has not
>>>> finished after module_init()?
>>>>
>>>> Could please apply this diff and test it again?
>>>
>>> (Your patch was malformed.)
>>>
>>> [    2.138098] kvm: Nested Virtualization enabled
>>> [    2.138153] kvm: Nested Paging enabled
>>> [    2.138204] KVM PAT: 0x7040600070406.
>>
>> So the PAT is the value after CPU reset, it's likely PAT is not initialized on
>> the local CPU.
>
> Could it be a simple AMD/INTEL difference. I'm running a AMD CPU and I
> see many !use_intel() in if statements in arch/x86/kernel/cpu/mtrr/main.c...
>

#define use_intel()	(mtrr_if && mtrr_if->use_intel_if == 1)

And i checked your CPU supports "mtrr" /proc/info, so it should use
generic_mtrr_ops and generic_mtrr_ops.use_intel_if = 1. That means AMD CPU
also use "intel" way. :)

Please refer to the initiation of "mtrr_if" in mtrr_bp_init().
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 189e464..d9d3a30 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -884,6 +884,7 @@  static u8 fallback_mtrr_type(int mtrr)
         case MTRR_TYPE_WRPROT:
                 return MTRR_TYPE_UC_MINUS;
         default:
+               printk("BUG in %s, mtrr = %d.\n", __FUNCTION__, mtrr);
                 BUG();
         }
  }
@@ -907,6 +908,8 @@  static void build_mtrr2protval(void)
          * guest.
          */
         rdmsrl(MSR_IA32_CR_PAT, pat);
+       printk("KVM PAT: 0x%llx.\n", pat);
+
         for (i = 0; i < 8; i++) {
                 u8 mtrr = pat >> (8 * i);

@@ -914,10 +917,17 @@  static void build_mtrr2protval(void)
                         mtrr2protval[mtrr] = __cm_idx2pte(i);
         }

+       for (i = 0; i < 8; i++)
+               printk("mtrr2protval[%d]:%x.\n", i, mtrr2protval[i]);
+
+
         for (i = 0; i < 8; i++) {
                 if (mtrr2protval[i] == MTRR2PROTVAL_INVALID) {
                         u8 fallback = fallback_mtrr_type(i);
                         mtrr2protval[i] = mtrr2protval[fallback];
+                       if (mtrr2protval[i] == MTRR2PROTVAL_INVALID)
+                               printk("BUG in %s, mtrr2protval[%d] = %x.\n", __FUNCTION__, i, 
mtrr2protval[i]);
+
                         BUG_ON(mtrr2protval[i] == MTRR2PROTVAL_INVALID);
                 }