diff mbox series

[v2] KVM: SVM: Override default MMIO mask if memory encryption is enabled

Message ID d741b3a58769749b7873fea703c027a68b8e2e3d.1577462279.git.thomas.lendacky@amd.com (mailing list archive)
State New, archived
Headers show
Series [v2] KVM: SVM: Override default MMIO mask if memory encryption is enabled | expand

Commit Message

Tom Lendacky Dec. 27, 2019, 3:58 p.m. UTC
The KVM MMIO support uses bit 51 as the reserved bit to cause nested page
faults when a guest performs MMIO. The AMD memory encryption support uses
a CPUID function to define the encryption bit position. Given this, it is
possible that these bits can conflict.

Use svm_hardware_setup() to override the MMIO mask if memory encryption
support is enabled. When memory encryption support is enabled the physical
address width is reduced and the first bit after the last valid reduced
physical address bit will always be reserved. Use this bit as the MMIO
mask.

Fixes: 28a1f3ac1d0c ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
---
 arch/x86/kvm/svm.c | 26 ++++++++++++++++++++++++++
 1 file changed, 26 insertions(+)

Comments

Sean Christopherson Jan. 6, 2020, 10:49 p.m. UTC | #1
On Fri, Dec 27, 2019 at 09:58:00AM -0600, Tom Lendacky wrote:
> The KVM MMIO support uses bit 51 as the reserved bit to cause nested page
> faults when a guest performs MMIO. The AMD memory encryption support uses
> a CPUID function to define the encryption bit position. Given this, it is
> possible that these bits can conflict.
> 
> Use svm_hardware_setup() to override the MMIO mask if memory encryption
> support is enabled. When memory encryption support is enabled the physical
> address width is reduced and the first bit after the last valid reduced
> physical address bit will always be reserved. Use this bit as the MMIO
> mask.
> 
> Fixes: 28a1f3ac1d0c ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/kvm/svm.c | 26 ++++++++++++++++++++++++++
>  1 file changed, 26 insertions(+)
> 
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index 122d4ce3b1ab..2cb834b5982a 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1361,6 +1361,32 @@ static __init int svm_hardware_setup(void)
>  		}
>  	}
>  
> +	/*
> +	 * The default MMIO mask is a single bit (excluding the present bit),
> +	 * which could conflict with the memory encryption bit. Check for
> +	 * memory encryption support and override the default MMIO masks if
> +	 * it is enabled.
> +	 */
> +	if (cpuid_eax(0x80000000) >= 0x8000001f) {
> +		u64 msr, mask;
> +
> +		rdmsrl(MSR_K8_SYSCFG, msr);
> +		if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)  {
> +			/*
> +			 * The physical addressing width is reduced. The first
> +			 * bit above the new physical addressing limit will
> +			 * always be reserved. Use this bit and the present bit
> +			 * to generate a page fault with PFER.RSV = 1.
> +			 */
> +			mask = BIT_ULL(boot_cpu_data.x86_phys_bits);

This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
on a future processor, i.e. x86_phys_bits==52.

After staring at things for a while, I think we can handle this issue with
minimal fuss by special casing MKTME in kvm_set_mmio_spte_mask().  I'll
send a patch, I have a related bug fix for kvm_set_mmio_spte_mask() that
touches the same code.

> +			mask |= BIT_ULL(0);
> +
> +			kvm_mmu_set_mmio_spte_mask(mask, mask,
> +						   PT_WRITABLE_MASK |
> +						   PT_USER_MASK);
> +		}
> +	}
> +
>  	for_each_possible_cpu(cpu) {
>  		r = svm_cpu_init(cpu);
>  		if (r)
> -- 
> 2.17.1
>
Tom Lendacky Jan. 6, 2020, 11:14 p.m. UTC | #2
On 1/6/20 4:49 PM, Sean Christopherson wrote:
> On Fri, Dec 27, 2019 at 09:58:00AM -0600, Tom Lendacky wrote:
>> The KVM MMIO support uses bit 51 as the reserved bit to cause nested page
>> faults when a guest performs MMIO. The AMD memory encryption support uses
>> a CPUID function to define the encryption bit position. Given this, it is
>> possible that these bits can conflict.
>>
>> Use svm_hardware_setup() to override the MMIO mask if memory encryption
>> support is enabled. When memory encryption support is enabled the physical
>> address width is reduced and the first bit after the last valid reduced
>> physical address bit will always be reserved. Use this bit as the MMIO
>> mask.
>>
>> Fixes: 28a1f3ac1d0c ("kvm: x86: Set highest physical address bits in non-present/reserved SPTEs")
>> Suggested-by: Sean Christopherson <sean.j.christopherson@intel.com>
>> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
>> ---
>>  arch/x86/kvm/svm.c | 26 ++++++++++++++++++++++++++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> index 122d4ce3b1ab..2cb834b5982a 100644
>> --- a/arch/x86/kvm/svm.c
>> +++ b/arch/x86/kvm/svm.c
>> @@ -1361,6 +1361,32 @@ static __init int svm_hardware_setup(void)
>>  		}
>>  	}
>>  
>> +	/*
>> +	 * The default MMIO mask is a single bit (excluding the present bit),
>> +	 * which could conflict with the memory encryption bit. Check for
>> +	 * memory encryption support and override the default MMIO masks if
>> +	 * it is enabled.
>> +	 */
>> +	if (cpuid_eax(0x80000000) >= 0x8000001f) {
>> +		u64 msr, mask;
>> +
>> +		rdmsrl(MSR_K8_SYSCFG, msr);
>> +		if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)  {
>> +			/*
>> +			 * The physical addressing width is reduced. The first
>> +			 * bit above the new physical addressing limit will
>> +			 * always be reserved. Use this bit and the present bit
>> +			 * to generate a page fault with PFER.RSV = 1.
>> +			 */
>> +			mask = BIT_ULL(boot_cpu_data.x86_phys_bits);
> 
> This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
> on a future processor, i.e. x86_phys_bits==52.

Not sure I follow. If MSR_K8_SYSCFG_MEM_ENCRYPT is set then there will
always be a reduction in physical addressing (so I'm told). And if
MSR_K8_SYSCFG_MEM_ENCRYPT isn't set and x86_phys_bits == 52, then
everything should be fine with the existing kvm_set_mmio_spte_mask()
function where bit 51 is set but the present bit isn't, correct?

Thanks,
Tom

> 
> After staring at things for a while, I think we can handle this issue with
> minimal fuss by special casing MKTME in kvm_set_mmio_spte_mask().  I'll
> send a patch, I have a related bug fix for kvm_set_mmio_spte_mask() that
> touches the same code.
> 
>> +			mask |= BIT_ULL(0);
>> +
>> +			kvm_mmu_set_mmio_spte_mask(mask, mask,
>> +						   PT_WRITABLE_MASK |
>> +						   PT_USER_MASK);
>> +		}
>> +	}
>> +
>>  	for_each_possible_cpu(cpu) {
>>  		r = svm_cpu_init(cpu);
>>  		if (r)
>> -- 
>> 2.17.1
>>
Sean Christopherson Jan. 6, 2020, 11:38 p.m. UTC | #3
On Mon, Jan 06, 2020 at 05:14:04PM -0600, Tom Lendacky wrote:
> On 1/6/20 4:49 PM, Sean Christopherson wrote:
> > This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
> > on a future processor, i.e. x86_phys_bits==52.
> 
> Not sure I follow. If MSR_K8_SYSCFG_MEM_ENCRYPT is set then there will
> always be a reduction in physical addressing (so I'm told).

Hmm, I'm going off APM Vol 2, which states, or at least strongly implies,
that reducing the PA space is optional.  Section 7.10.2 is especially
clear on this:

  In implementations where the physical address size of the processor is
  reduced when memory encryption features are enabled, software must
  ensure it is executing from addresses where these upper physical address
  bits are 0 prior to setting SYSCFG[MemEncryptionModEn].

But, hopefully the other approach I have in mind actually works, as it's
significantly less special-case code and would naturally handle either
case, i.e. make this a moot point.


Entry on SYSCFG:

  3.2.1 System Configuration Register (SYSCFG)

  ...

  MemEncryptionMode. Bit 23.  Setting this bit to 1 enables the SME and
  SEV memory encryption features.

The SME entry the above links to says:

  7.10.1 Determining Support for Secure Memory Encryption

  ...

  Additionally, in some implementations, the physical address size of the
  processor may be reduced when memory encryption features are enabled, for
  example from 48 to 43 bits. In this case the upper physical address bits are
  treated as reserved when the feature is enabled except where otherwise
  indicated. When memory encryption is supported in an implementation, CPUID
  Fn8000_001F[EBX] reports any physical address size reduction present. Bits
  reserved in this mode are treated the same as other page table reserved bits,
  and will generate a page fault if found to be non-zero when used for address
  translation.

  ...

  7.10.2 Enabling Memory Encryption Extensions

  Prior to using SME, memory encryption features must be enabled by setting
  SYSCFG MSR bit 23 (MemEncryptionModEn) to 1. In implementations where the
  physical address size of the processor is reduced when memory encryption
  features are enabled, software must ensure it is executing from addresses where
  these upper physical address bits are 0 prior to setting
  SYSCFG[MemEncryptionModEn]. Memory encryption is then further controlled via
  the page tables.
Tom Lendacky Jan. 7, 2020, 8:16 p.m. UTC | #4
On 1/6/20 5:38 PM, Sean Christopherson wrote:
> On Mon, Jan 06, 2020 at 05:14:04PM -0600, Tom Lendacky wrote:
>> On 1/6/20 4:49 PM, Sean Christopherson wrote:
>>> This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
>>> on a future processor, i.e. x86_phys_bits==52.
>>
>> Not sure I follow. If MSR_K8_SYSCFG_MEM_ENCRYPT is set then there will
>> always be a reduction in physical addressing (so I'm told).
> 
> Hmm, I'm going off APM Vol 2, which states, or at least strongly implies,
> that reducing the PA space is optional.  Section 7.10.2 is especially
> clear on this:
> 
>   In implementations where the physical address size of the processor is
>   reduced when memory encryption features are enabled, software must
>   ensure it is executing from addresses where these upper physical address
>   bits are 0 prior to setting SYSCFG[MemEncryptionModEn].

It's probably not likely, but given what is stated, I can modify my patch
to check for a x86_phys_bits == 52 and skip the call to set the mask, eg:

	if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT &&
	    boot_cpu_data.x86_phys_bits < 52) {

> 
> But, hopefully the other approach I have in mind actually works, as it's
> significantly less special-case code and would naturally handle either
> case, i.e. make this a moot point.

I'll hold off on the above and wait for your patch.

Thanks,
Tom

> 
> 
> Entry on SYSCFG:
> 
>   3.2.1 System Configuration Register (SYSCFG)
> 
>   ...
> 
>   MemEncryptionMode. Bit 23.  Setting this bit to 1 enables the SME and
>   SEV memory encryption features.
> 
> The SME entry the above links to says:
> 
>   7.10.1 Determining Support for Secure Memory Encryption
> 
>   ...
> 
>   Additionally, in some implementations, the physical address size of the
>   processor may be reduced when memory encryption features are enabled, for
>   example from 48 to 43 bits. In this case the upper physical address bits are
>   treated as reserved when the feature is enabled except where otherwise
>   indicated. When memory encryption is supported in an implementation, CPUID
>   Fn8000_001F[EBX] reports any physical address size reduction present. Bits
>   reserved in this mode are treated the same as other page table reserved bits,
>   and will generate a page fault if found to be non-zero when used for address
>   translation.
> 
>   ...
> 
>   7.10.2 Enabling Memory Encryption Extensions
> 
>   Prior to using SME, memory encryption features must be enabled by setting
>   SYSCFG MSR bit 23 (MemEncryptionModEn) to 1. In implementations where the
>   physical address size of the processor is reduced when memory encryption
>   features are enabled, software must ensure it is executing from addresses where
>   these upper physical address bits are 0 prior to setting
>   SYSCFG[MemEncryptionModEn]. Memory encryption is then further controlled via
>   the page tables.
>
Sean Christopherson Jan. 7, 2020, 10:28 p.m. UTC | #5
On Tue, Jan 07, 2020 at 02:16:37PM -0600, Tom Lendacky wrote:
> On 1/6/20 5:38 PM, Sean Christopherson wrote:
> > On Mon, Jan 06, 2020 at 05:14:04PM -0600, Tom Lendacky wrote:
> >> On 1/6/20 4:49 PM, Sean Christopherson wrote:
> >>> This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
> >>> on a future processor, i.e. x86_phys_bits==52.
> >>
> >> Not sure I follow. If MSR_K8_SYSCFG_MEM_ENCRYPT is set then there will
> >> always be a reduction in physical addressing (so I'm told).
> > 
> > Hmm, I'm going off APM Vol 2, which states, or at least strongly implies,
> > that reducing the PA space is optional.  Section 7.10.2 is especially
> > clear on this:
> > 
> >   In implementations where the physical address size of the processor is
> >   reduced when memory encryption features are enabled, software must
> >   ensure it is executing from addresses where these upper physical address
> >   bits are 0 prior to setting SYSCFG[MemEncryptionModEn].
> 
> It's probably not likely, but given what is stated, I can modify my patch
> to check for a x86_phys_bits == 52 and skip the call to set the mask, eg:
> 
> 	if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT &&
> 	    boot_cpu_data.x86_phys_bits < 52) {
> 
> > 
> > But, hopefully the other approach I have in mind actually works, as it's
> > significantly less special-case code and would naturally handle either
> > case, i.e. make this a moot point.
> 
> I'll hold off on the above and wait for your patch.

Sorry for the delay, this is a bigger mess than originally thought.  Or
I'm completely misunderstanding the issue, which is also a distinct
possibility :-)

Due to KVM activating its L1TF mitigation irrespective of whether the CPU
is whitelisted as not being vulnerable to L1TF, simply using 86_phys_bits
to avoid colliding with the C-bit isn't sufficient as the L1TF mitigation
uses those first five reserved PA bits to store the MMIO GFN.  Setting
BIT(x86_phys_bits) for all MMIO sptes would cause it to be interpreted as
a GFN bit when the L1TF mitigation is active and lead to bogus MMIO.

The only sane approach I can think of is to activate the L1TF mitigation
based on whether the CPU is vulnerable to L1TF, as opposed to activating
the mitigation purely based on the max PA of the CPU.  Since all CPUs that
support SME/SEV are whitelisted as NO_L1TF, the L1TF mitigation and C-bit
should never be active at the same time.

Patch should be incoming soon...
Tom Lendacky Jan. 7, 2020, 10:54 p.m. UTC | #6
On 1/7/20 4:28 PM, Sean Christopherson wrote:
> On Tue, Jan 07, 2020 at 02:16:37PM -0600, Tom Lendacky wrote:
>> On 1/6/20 5:38 PM, Sean Christopherson wrote:
>>> On Mon, Jan 06, 2020 at 05:14:04PM -0600, Tom Lendacky wrote:
>>>> On 1/6/20 4:49 PM, Sean Christopherson wrote:
>>>>> This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
>>>>> on a future processor, i.e. x86_phys_bits==52.
>>>>
>>>> Not sure I follow. If MSR_K8_SYSCFG_MEM_ENCRYPT is set then there will
>>>> always be a reduction in physical addressing (so I'm told).
>>>
>>> Hmm, I'm going off APM Vol 2, which states, or at least strongly implies,
>>> that reducing the PA space is optional.  Section 7.10.2 is especially
>>> clear on this:
>>>
>>>   In implementations where the physical address size of the processor is
>>>   reduced when memory encryption features are enabled, software must
>>>   ensure it is executing from addresses where these upper physical address
>>>   bits are 0 prior to setting SYSCFG[MemEncryptionModEn].
>>
>> It's probably not likely, but given what is stated, I can modify my patch
>> to check for a x86_phys_bits == 52 and skip the call to set the mask, eg:
>>
>> 	if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT &&
>> 	    boot_cpu_data.x86_phys_bits < 52) {
>>
>>>
>>> But, hopefully the other approach I have in mind actually works, as it's
>>> significantly less special-case code and would naturally handle either
>>> case, i.e. make this a moot point.
>>
>> I'll hold off on the above and wait for your patch.
> 
> Sorry for the delay, this is a bigger mess than originally thought.  Or
> I'm completely misunderstanding the issue, which is also a distinct
> possibility :-)
> 
> Due to KVM activating its L1TF mitigation irrespective of whether the CPU
> is whitelisted as not being vulnerable to L1TF, simply using 86_phys_bits
> to avoid colliding with the C-bit isn't sufficient as the L1TF mitigation
> uses those first five reserved PA bits to store the MMIO GFN.  Setting
> BIT(x86_phys_bits) for all MMIO sptes would cause it to be interpreted as
> a GFN bit when the L1TF mitigation is active and lead to bogus MMIO.

The L1TF mitigation only gets applied when:
  boot_cpu_data.x86_cache_bits < 52 - shadow_nonpresent_or_rsvd_mask_len

  and with shadow_nonpresent_or_rsvd_mask_len = 5, that means that means
  boot_cpu_data.x86_cache_bits < 47.

On AMD processors that support memory encryption, the x86_cache_bits value
is not adjusted, just the x86_phys_bits. So for AMD processors that have
memory encryption support, this value will be at least 48 and therefore
not activate the L1TF mitigation.

> 
> The only sane approach I can think of is to activate the L1TF mitigation
> based on whether the CPU is vulnerable to L1TF, as opposed to activating> the mitigation purely based on the max PA of the CPU.  Since all CPUs that
> support SME/SEV are whitelisted as NO_L1TF, the L1TF mitigation and C-bit
> should never be active at the same time.

There is still the issue of setting a single bit that can conflict with
the C-bit. As it is today, if the C-bit were to be defined as bit 51, then
KVM would not take a nested page fault and MMIO would be broken.

Thanks,
Tom

> 
> Patch should be incoming soon...
>
Sean Christopherson Jan. 7, 2020, 11:31 p.m. UTC | #7
On Tue, Jan 07, 2020 at 04:54:34PM -0600, Tom Lendacky wrote:
> On 1/7/20 4:28 PM, Sean Christopherson wrote:
> > On Tue, Jan 07, 2020 at 02:16:37PM -0600, Tom Lendacky wrote:
> >> On 1/6/20 5:38 PM, Sean Christopherson wrote:
> >>> On Mon, Jan 06, 2020 at 05:14:04PM -0600, Tom Lendacky wrote:
> >>>> On 1/6/20 4:49 PM, Sean Christopherson wrote:
> >>>>> This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
> >>>>> on a future processor, i.e. x86_phys_bits==52.
> >>>>
> >>>> Not sure I follow. If MSR_K8_SYSCFG_MEM_ENCRYPT is set then there will
> >>>> always be a reduction in physical addressing (so I'm told).
> >>>
> >>> Hmm, I'm going off APM Vol 2, which states, or at least strongly implies,
> >>> that reducing the PA space is optional.  Section 7.10.2 is especially
> >>> clear on this:
> >>>
> >>>   In implementations where the physical address size of the processor is
> >>>   reduced when memory encryption features are enabled, software must
> >>>   ensure it is executing from addresses where these upper physical address
> >>>   bits are 0 prior to setting SYSCFG[MemEncryptionModEn].
> >>
> >> It's probably not likely, but given what is stated, I can modify my patch
> >> to check for a x86_phys_bits == 52 and skip the call to set the mask, eg:
> >>
> >> 	if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT &&
> >> 	    boot_cpu_data.x86_phys_bits < 52) {
> >>
> >>>
> >>> But, hopefully the other approach I have in mind actually works, as it's
> >>> significantly less special-case code and would naturally handle either
> >>> case, i.e. make this a moot point.
> >>
> >> I'll hold off on the above and wait for your patch.
> > 
> > Sorry for the delay, this is a bigger mess than originally thought.  Or
> > I'm completely misunderstanding the issue, which is also a distinct
> > possibility :-)
> > 
> > Due to KVM activating its L1TF mitigation irrespective of whether the CPU
> > is whitelisted as not being vulnerable to L1TF, simply using 86_phys_bits
> > to avoid colliding with the C-bit isn't sufficient as the L1TF mitigation
> > uses those first five reserved PA bits to store the MMIO GFN.  Setting
> > BIT(x86_phys_bits) for all MMIO sptes would cause it to be interpreted as
> > a GFN bit when the L1TF mitigation is active and lead to bogus MMIO.
> 
> The L1TF mitigation only gets applied when:
>   boot_cpu_data.x86_cache_bits < 52 - shadow_nonpresent_or_rsvd_mask_len
> 
>   and with shadow_nonpresent_or_rsvd_mask_len = 5, that means that means
>   boot_cpu_data.x86_cache_bits < 47.
> 
> On AMD processors that support memory encryption, the x86_cache_bits value
> is not adjusted, just the x86_phys_bits. So for AMD processors that have
> memory encryption support, this value will be at least 48 and therefore
> not activate the L1TF mitigation.

Ah.  Hrm.  I'd prefer to clean that code up to make the interactions more
explicit, but may be we can separate that out.

> > The only sane approach I can think of is to activate the L1TF mitigation
> > based on whether the CPU is vulnerable to L1TF, as opposed to activating> the mitigation purely based on the max PA of the CPU.  Since all CPUs that
> > support SME/SEV are whitelisted as NO_L1TF, the L1TF mitigation and C-bit
> > should never be active at the same time.
> 
> There is still the issue of setting a single bit that can conflict with
> the C-bit. As it is today, if the C-bit were to be defined as bit 51, then
> KVM would not take a nested page fault and MMIO would be broken.

Wouldn't Paolo's patch to use the raw "cpuid_eax(0x80000008) & 0xff" for
shadow_phys_bits fix that particular collision by causing
kvm_set_mmio_spte_mask() to clear the present bit?  Or am I misundertanding
how the PA reduction interacts with the C-Bit?

AIUI, using phys_bits=48, then the standard scenario is Cbit=47 and some
additional bits 46:M are reserved.  Applying that logic to phys_bits=52,
then Cbit=51 and bits 50:M are reserved, so there's a collision but it's
mostly benign because shadow_phys_bits==52, which triggers this:

	if (IS_ENABLED(CONFIG_X86_64) && shadow_phys_bits == 52)
		mask &= ~1ull;

In other words, Paolo's patch fixes the fatal bug, but unnecessarily
disables optimized MMIO page faults.  To remedy that, your idea is to rely
on the (undocumented?) behavior that there are always additional reserved
bits between Cbit and the reduced x86_phys_bits.
Tom Lendacky Jan. 7, 2020, 11:51 p.m. UTC | #8
On 1/7/20 5:31 PM, Sean Christopherson wrote:
> On Tue, Jan 07, 2020 at 04:54:34PM -0600, Tom Lendacky wrote:
>> On 1/7/20 4:28 PM, Sean Christopherson wrote:
>>> On Tue, Jan 07, 2020 at 02:16:37PM -0600, Tom Lendacky wrote:
>>>> On 1/6/20 5:38 PM, Sean Christopherson wrote:
>>>>> On Mon, Jan 06, 2020 at 05:14:04PM -0600, Tom Lendacky wrote:
>>>>>> On 1/6/20 4:49 PM, Sean Christopherson wrote:
>>>>>>> This doesn't handle the case where x86_phys_bits _isn't_ reduced by SME/SEV
>>>>>>> on a future processor, i.e. x86_phys_bits==52.
>>>>>>
>>>>>> Not sure I follow. If MSR_K8_SYSCFG_MEM_ENCRYPT is set then there will
>>>>>> always be a reduction in physical addressing (so I'm told).
>>>>>
>>>>> Hmm, I'm going off APM Vol 2, which states, or at least strongly implies,
>>>>> that reducing the PA space is optional.  Section 7.10.2 is especially
>>>>> clear on this:
>>>>>
>>>>>   In implementations where the physical address size of the processor is
>>>>>   reduced when memory encryption features are enabled, software must
>>>>>   ensure it is executing from addresses where these upper physical address
>>>>>   bits are 0 prior to setting SYSCFG[MemEncryptionModEn].
>>>>
>>>> It's probably not likely, but given what is stated, I can modify my patch
>>>> to check for a x86_phys_bits == 52 and skip the call to set the mask, eg:
>>>>
>>>> 	if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT &&
>>>> 	    boot_cpu_data.x86_phys_bits < 52) {
>>>>
>>>>>
>>>>> But, hopefully the other approach I have in mind actually works, as it's
>>>>> significantly less special-case code and would naturally handle either
>>>>> case, i.e. make this a moot point.
>>>>
>>>> I'll hold off on the above and wait for your patch.
>>>
>>> Sorry for the delay, this is a bigger mess than originally thought.  Or
>>> I'm completely misunderstanding the issue, which is also a distinct
>>> possibility :-)
>>>
>>> Due to KVM activating its L1TF mitigation irrespective of whether the CPU
>>> is whitelisted as not being vulnerable to L1TF, simply using 86_phys_bits
>>> to avoid colliding with the C-bit isn't sufficient as the L1TF mitigation
>>> uses those first five reserved PA bits to store the MMIO GFN.  Setting
>>> BIT(x86_phys_bits) for all MMIO sptes would cause it to be interpreted as
>>> a GFN bit when the L1TF mitigation is active and lead to bogus MMIO.
>>
>> The L1TF mitigation only gets applied when:
>>   boot_cpu_data.x86_cache_bits < 52 - shadow_nonpresent_or_rsvd_mask_len
>>
>>   and with shadow_nonpresent_or_rsvd_mask_len = 5, that means that means
>>   boot_cpu_data.x86_cache_bits < 47.
>>
>> On AMD processors that support memory encryption, the x86_cache_bits value
>> is not adjusted, just the x86_phys_bits. So for AMD processors that have
>> memory encryption support, this value will be at least 48 and therefore
>> not activate the L1TF mitigation.
> 
> Ah.  Hrm.  I'd prefer to clean that code up to make the interactions more
> explicit, but may be we can separate that out.
> 
>>> The only sane approach I can think of is to activate the L1TF mitigation
>>> based on whether the CPU is vulnerable to L1TF, as opposed to activating> the mitigation purely based on the max PA of the CPU.  Since all CPUs that
>>> support SME/SEV are whitelisted as NO_L1TF, the L1TF mitigation and C-bit
>>> should never be active at the same time.
>>
>> There is still the issue of setting a single bit that can conflict with
>> the C-bit. As it is today, if the C-bit were to be defined as bit 51, then
>> KVM would not take a nested page fault and MMIO would be broken.
> 
> Wouldn't Paolo's patch to use the raw "cpuid_eax(0x80000008) & 0xff" for
> shadow_phys_bits fix that particular collision by causing
> kvm_set_mmio_spte_mask() to clear the present bit?  Or am I misundertanding
> how the PA reduction interacts with the C-Bit?
> 
> AIUI, using phys_bits=48, then the standard scenario is Cbit=47 and some
> additional bits 46:M are reserved.  Applying that logic to phys_bits=52,
> then Cbit=51 and bits 50:M are reserved, so there's a collision but it's

There's no requirement that the C-bit correspond to phys_bits. So, for
example, you can have C-bit=51 and phys_bits=48 and so 47:M are reserved.

Thanks,
Tom

> mostly benign because shadow_phys_bits==52, which triggers this:
> 
> 	if (IS_ENABLED(CONFIG_X86_64) && shadow_phys_bits == 52)
> 		mask &= ~1ull;
> 
> In other words, Paolo's patch fixes the fatal bug, but unnecessarily
> disables optimized MMIO page faults.  To remedy that, your idea is to rely
> on the (undocumented?) behavior that there are always additional reserved
> bits between Cbit and the reduced x86_phys_bits.
>
Sean Christopherson Jan. 8, 2020, 12:04 a.m. UTC | #9
On Tue, Jan 07, 2020 at 05:51:51PM -0600, Tom Lendacky wrote:
> On 1/7/20 5:31 PM, Sean Christopherson wrote:
> > AIUI, using phys_bits=48, then the standard scenario is Cbit=47 and some
> > additional bits 46:M are reserved.  Applying that logic to phys_bits=52,
> > then Cbit=51 and bits 50:M are reserved, so there's a collision but it's
> 
> There's no requirement that the C-bit correspond to phys_bits. So, for
> example, you can have C-bit=51 and phys_bits=48 and so 47:M are reserved.

But then using blindly using x86_phys_bits would break if the PA bits
aren't reduced, e.g. C-bit=47 and phys_bits=47. AFAICT, there's no
requirement that there be reduced PA bits when there is a C-bit.  I'm
guessing there aren't plans to ship such CPUs, but I don't see anything
in the APM to prevent such a scenario.

Maybe the least painful approach would be to go with a version of this
patch and add a check that there are indeeded reserved/reduced bits?
Probably with a WARN_ON_ONCE if the check fails.
Tom Lendacky Jan. 8, 2020, 1:57 p.m. UTC | #10
On 1/7/20 6:04 PM, Sean Christopherson wrote:
> On Tue, Jan 07, 2020 at 05:51:51PM -0600, Tom Lendacky wrote:
>> On 1/7/20 5:31 PM, Sean Christopherson wrote:
>>> AIUI, using phys_bits=48, then the standard scenario is Cbit=47 and some
>>> additional bits 46:M are reserved.  Applying that logic to phys_bits=52,
>>> then Cbit=51 and bits 50:M are reserved, so there's a collision but it's
>>
>> There's no requirement that the C-bit correspond to phys_bits. So, for
>> example, you can have C-bit=51 and phys_bits=48 and so 47:M are reserved.
> 
> But then using blindly using x86_phys_bits would break if the PA bits
> aren't reduced, e.g. C-bit=47 and phys_bits=47. AFAICT, there's no
> requirement that there be reduced PA bits when there is a C-bit.  I'm
> guessing there aren't plans to ship such CPUs, but I don't see anything
> in the APM to prevent such a scenario.

I can add in extra checks to see if C-bit == phys_bits, etc. and adjust
with appropriate limit checking. It's in the init path, so the extra
checks aren't a big deal.

Thanks,
Tom

> 
> Maybe the least painful approach would be to go with a version of this
> patch and add a check that there are indeeded reserved/reduced bits?
> Probably with a WARN_ON_ONCE if the check fails.
>
Tom Lendacky Jan. 8, 2020, 6:41 p.m. UTC | #11
On 1/8/20 7:57 AM, Tom Lendacky wrote:
> On 1/7/20 6:04 PM, Sean Christopherson wrote:
>> On Tue, Jan 07, 2020 at 05:51:51PM -0600, Tom Lendacky wrote:
>>> On 1/7/20 5:31 PM, Sean Christopherson wrote:
>>>> AIUI, using phys_bits=48, then the standard scenario is Cbit=47 and some
>>>> additional bits 46:M are reserved.  Applying that logic to phys_bits=52,
>>>> then Cbit=51 and bits 50:M are reserved, so there's a collision but it's
>>>
>>> There's no requirement that the C-bit correspond to phys_bits. So, for
>>> example, you can have C-bit=51 and phys_bits=48 and so 47:M are reserved.
>>
>> But then using blindly using x86_phys_bits would break if the PA bits
>> aren't reduced, e.g. C-bit=47 and phys_bits=47. AFAICT, there's no
>> requirement that there be reduced PA bits when there is a C-bit.  I'm
>> guessing there aren't plans to ship such CPUs, but I don't see anything
>> in the APM to prevent such a scenario.
> 
> I can add in extra checks to see if C-bit == phys_bits, etc. and adjust
> with appropriate limit checking. It's in the init path, so the extra
> checks aren't a big deal.

Just sent V3 of the patch. I believe I have all the areas we discussed
covered. I also went back to using rsvd_bits() as was used before the
L1TF changes. Let me know what you think.

Thanks,
Tom

> 
> Thanks,
> Tom
> 
>>
>> Maybe the least painful approach would be to go with a version of this
>> patch and add a check that there are indeeded reserved/reduced bits?
>> Probably with a WARN_ON_ONCE if the check fails.
>>
diff mbox series

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 122d4ce3b1ab..2cb834b5982a 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1361,6 +1361,32 @@  static __init int svm_hardware_setup(void)
 		}
 	}
 
+	/*
+	 * The default MMIO mask is a single bit (excluding the present bit),
+	 * which could conflict with the memory encryption bit. Check for
+	 * memory encryption support and override the default MMIO masks if
+	 * it is enabled.
+	 */
+	if (cpuid_eax(0x80000000) >= 0x8000001f) {
+		u64 msr, mask;
+
+		rdmsrl(MSR_K8_SYSCFG, msr);
+		if (msr & MSR_K8_SYSCFG_MEM_ENCRYPT)  {
+			/*
+			 * The physical addressing width is reduced. The first
+			 * bit above the new physical addressing limit will
+			 * always be reserved. Use this bit and the present bit
+			 * to generate a page fault with PFER.RSV = 1.
+			 */
+			mask = BIT_ULL(boot_cpu_data.x86_phys_bits);
+			mask |= BIT_ULL(0);
+
+			kvm_mmu_set_mmio_spte_mask(mask, mask,
+						   PT_WRITABLE_MASK |
+						   PT_USER_MASK);
+		}
+	}
+
 	for_each_possible_cpu(cpu) {
 		r = svm_cpu_init(cpu);
 		if (r)