[v4,0/2] kvm/cpuid: set proper GuestPhysBits in CPUID.0x80000008

Message ID: 20240313125844.912415-1-kraxel@redhat.com

Message

Gerd Hoffmann March 13, 2024, 12:58 p.m. UTC
Use the GuestPhysBits field (EAX[23:16]) to communicate the max
addressable GPA to the guest.  Typically this is identical to the max
effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
not support 5-level TDP.

See commit messages and source code comments for details.
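
In rough strokes, patch 2 does the following when building the
CPUID.0x80000008 entry.  This is a simplified sketch, not the exact
patch; kvm_mmu_get_max_tdp_level() is the small accessor added to
mmu.h/mmu.c:

  /* Sketch of the CPUID.0x80000008 handling with this series applied. */
  unsigned int virt_as = max((entry->eax >> 8) & 0xff, 48U);
  unsigned int phys_as, g_phys_as;

  if (!tdp_enabled) {
          /*
           * Shadow paging: the guest shares the host's page tables,
           * so the guest can address exactly what the host can.
           */
          phys_as = boot_cpu_data.x86_phys_bits;
          g_phys_as = 0;
  } else {
          /*
           * EAX[7:0] keeps the raw MAXPHYADDR; EAX[23:16] advertises
           * the max *mappable* GPA.  Without 5-level TDP, KVM cannot
           * map GPAs above 48 bits even when MAXPHYADDR > 48.
           */
          phys_as = entry->eax & 0xff;
          g_phys_as = phys_as;
          if (kvm_mmu_get_max_tdp_level() < 5)
                  g_phys_as = min(g_phys_as, 48U);
  }

  entry->eax = phys_as | (virt_as << 8) | (g_phys_as << 16);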

v4 changes:
 - comment fixups.
 - picked up Reviewed-by tags.

Gerd Hoffmann (2):
  kvm/cpuid: remove GuestPhysBits code.
  kvm/cpuid: set proper GuestPhysBits in CPUID.0x80000008

 arch/x86/kvm/mmu.h     |  2 ++
 arch/x86/kvm/cpuid.c   | 41 +++++++++++++++++++++++++++++++----------
 arch/x86/kvm/mmu/mmu.c |  5 +++++
 3 files changed, 38 insertions(+), 10 deletions(-)

Comments

Sean Christopherson April 10, 2024, 12:19 a.m. UTC | #1
On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
> Use the GuestPhysBits field (EAX[23:16]) to communicate the max
> addressable GPA to the guest.  Typically this is identical to the max
> effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
> not support 5-level TDP.
> 
> See commit messages and source code comments for details.
> 
> [...]

Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
describing the impact of each change, e.g. to call out that patch 2 isn't an
urgent fix because guest firmware can simply limit itself to using GPAs that
can be addressed with 4-level paging.

I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
as the raw/real MAXPHYADDR.

Please holler if you (or anyone) disagrees with the changes or my analysis on
the KVM-on-KVM issue.

Thanks!

[1/2] KVM: x86: Don't advertise guest.MAXPHYADDR as host.MAXPHYADDR in CPUID
      https://github.com/kvm-x86/linux/commit/6f5c9600621b
[2/2] KVM: x86: Advertise max mappable GPA in CPUID.0x80000008.GuestPhysBits
      https://github.com/kvm-x86/linux/commit/b628cb523c65

--
https://github.com/kvm-x86/linux/tree/next

Xiaoyao Li April 12, 2024, 7:03 a.m. UTC | #2
On 4/10/2024 8:19 AM, Sean Christopherson wrote:
> On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
>> Use the GuestPhysBits field (EAX[23:16]) to communicate the max
>> addressable GPA to the guest.  Typically this is identical to the max
>> effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
>> not support 5-level TDP.
>>
>> See commit messages and source code comments for details.
>>
>> [...]
> 
> Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
> describing the impact of each change, e.g. to call out that patch 2 isn't an
> urgent fix because guest firmware can simply limit itself to using GPAs that
> can be addressed with 4-level paging.
> 
> I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
> patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
> as the raw/real MAXPHYADDR.

you mean old KVM on new KVM?

As far as I can see, it seems harmless. E.g., suppose the userspace and 
L0 KVM have the new implementation: on an Intel SRF platform, L1 KVM 
sees EAX[23:16]=48 and EAX[7:0]=52. And when L1 KVM is old, it reports 
EAX[7:0]=48 to L1 userspace.

Right, 48 is not the raw/real MAXPHYADDR. But I don't think KVM makes 
any statement that the CPUID.0x8000_0008.EAX[7:0] reported by 
KVM_GET_SUPPORTED_CPUID is the raw/real MAXPHYADDR.
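
For illustration, a firmware/guest consumer of the leaf could look like 
the sketch below. This is only a sketch: __get_cpuid() is the GCC/clang 
helper from <cpuid.h>, and the function name and the 36-bit fallback 
are my assumptions, not code from this series or from any firmware:

  #include <cpuid.h>

  /*
   * Prefer GuestPhysBits (EAX[23:16]) as the limit for mappable GPAs
   * when the hypervisor fills it in, and fall back to MAXPHYADDR
   * (EAX[7:0]) otherwise.
   */
  static unsigned int guest_addressable_phys_bits(void)
  {
          unsigned int eax, ebx, ecx, edx;

          if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
                  return 36;      /* leaf absent: assume a legacy baseline */

          unsigned int maxphyaddr = eax & 0xff;
          unsigned int guest_phys_bits = (eax >> 16) & 0xff;

          return guest_phys_bits ? guest_phys_bits : maxphyaddr;
  }

Either way such a consumer limits itself to 48 bits in the scenario 
above, which is why the old-L1-KVM report looks harmless to me.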

> Please holler if you (or anyone) disagrees with the changes or my analysis on
> the KVM-on-KVM issue.
> 
> Thanks!
> 
> [1/2] KVM: x86: Don't advertise guest.MAXPHYADDR as host.MAXPHYADDR in CPUID
>        https://github.com/kvm-x86/linux/commit/6f5c9600621b
> [2/2] KVM: x86: Advertise max mappable GPA in CPUID.0x80000008.GuestPhysBits
>        https://github.com/kvm-x86/linux/commit/b628cb523c65
> 
> --
> https://github.com/kvm-x86/linux/tree/next
>

Sean Christopherson April 12, 2024, 3:48 p.m. UTC | #3
On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> On 4/10/2024 8:19 AM, Sean Christopherson wrote:
> > On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
> > > Use the GuestPhysBits field (EAX[23:16]) to communicate the max
> > > addressable GPA to the guest.  Typically this is identical to the max
> > > effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
> > > not support 5-level TDP.
> > > 
> > > See commit messages and source code comments for details.
> > > 
> > > [...]
> > 
> > Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
> > describing the impact of each change, e.g. to call out that patch 2 isn't an
> > urgent fix because guest firmware can simply limit itself to using GPAs that
> > can be addressed with 4-level paging.
> > 
> > I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
> > patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
> > as the raw/real MAXPHYADDR.
> 
> you mean old KVM on new KVM?

Yep.

> As far as I can see, it seems harmless. E.g., suppose the userspace and
> L0 KVM have the new implementation: on an Intel SRF platform, L1 KVM sees
> EAX[23:16]=48 and EAX[7:0]=52. And when L1 KVM is old, it reports
> EAX[7:0]=48 to L1 userspace.

Yep.

> Right, 48 is not the raw/real MAXPHYADDR. But I don't think KVM makes any
> statement that the CPUID.0x8000_0008.EAX[7:0] reported by
> KVM_GET_SUPPORTED_CPUID is the raw/real MAXPHYADDR.

If we go deep enough, it becomes a functional problem.  It's not even _that_
ridiculous/contrived :-)

L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
issues with reserved bits at that level.

But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
and so L2 KVM will incorrectly think bits 51:48 are reserved.  If both L0 and L1
are using TDP, neither L0 nor L1 will intercept #PF.  And because L1 userspace
was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
allow_smaller_maxphyaddr=true in order for the setup to function correctly.

If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD #PF
to accelerate emulated MMIO.  The GPA with bits 51:48!=0 created by L2 generates
an EPT violation in L1.  Because L1 doesn't have allow_smaller_maxphyaddr, L1
installs an EPT mapping for the wrong GPA (effectively drops bits 51:48), and
L3 hangs because L1 will keep doing nothing on the resulting EPT violation (L1
thinks there's already a valid mapping).

With patch 1 and the OVMF fixes backported, L1 KVM will enumerate MAXPHYADDR=52,
L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF restricts its mappings to
bits 47:0.

At least, I think that's what will happen.
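
To make the bit-dropping concrete, here is a standalone illustration 
(plain userspace C, not KVM code) of the effective truncation when an 
L1 that was told MAXPHYADDR=48 installs the mapping:

  #include <inttypes.h>
  #include <stdio.h>

  int main(void)
  {
          /* GPA crafted by L2 with bits 51:48 set to trigger a RSVD #PF. */
          uint64_t gpa = (0xfULL << 48) | 0xfee00000ULL;

          /*
           * Without allow_smaller_maxphyaddr, L1 effectively ignores
           * bits 51:48 and installs a mapping for the wrong GPA; the
           * original access then faults forever.
           */
          uint64_t mapped = gpa & ((1ULL << 48) - 1);

          printf("L2 accesses GPA 0x%016" PRIx64 "\n", gpa);
          printf("L1 maps     GPA 0x%016" PRIx64 "\n", mapped);
          return 0;
  }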

Xiaoyao Li April 15, 2024, 6:17 a.m. UTC | #4
On 4/12/2024 11:48 PM, Sean Christopherson wrote:
> On Fri, Apr 12, 2024, Xiaoyao Li wrote:
>> On 4/10/2024 8:19 AM, Sean Christopherson wrote:
>>> On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
>>>> Use the GuestPhysBits field (EAX[23:16]) to communicate the max
>>>> addressable GPA to the guest.  Typically this is identical to the max
>>>> effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
>>>> not support 5-level TDP.
>>>>
>>>> See commit messages and source code comments for details.
>>>>
>>>> [...]
>>>
>>> Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
>>> describing the impact of each change, e.g. to call out that patch 2 isn't an
>>> urgent fix because guest firmware can simply limit itself to using GPAs that
>>> can be addressed with 4-level paging.
>>>
>>> I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
>>> patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
>>> as the raw/real MAXPHYADDR.
>>
>> you mean old KVM on new KVM?
> 
> Yep.
> 
>> As far as I can see, it seems harmless. E.g., suppose the userspace and
>> L0 KVM have the new implementation: on an Intel SRF platform, L1 KVM sees
>> EAX[23:16]=48 and EAX[7:0]=52. And when L1 KVM is old, it reports
>> EAX[7:0]=48 to L1 userspace.
> 
> Yep.
> 
>> Right, 48 is not the raw/real MAXPHYADDR. But I don't think KVM makes any
>> statement that the CPUID.0x8000_0008.EAX[7:0] reported by
>> KVM_GET_SUPPORTED_CPUID is the raw/real MAXPHYADDR.
> 
> If we go deep enough, it becomes a functional problem.  It's not even _that_
> ridiculous/contrived :-)
> 
> L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
> issues with reserved bits at that level.
> 
> But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
> and so L2 KVM will incorrectly think bits 51:48 are reserved.  If both L0 and L1
> are using TDP, neither L0 nor L1 will intercept #PF.  And because L1 userspace
> was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
> allow_smaller_maxphyaddr=true in order for the setup to function correctly.

In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR=48 
via KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR=52 by 
executing CPUID itself.

So if L1 userspace decides to configure MAXPHYADDR=48 for L2 according 
to a), it is supposed to check whether KVM is configured with 
allow_smaller_maxphyaddr=y; otherwise, it cannot expect the setup to 
function correctly.

> If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD #PF
> to accelerate emulated MMIO.  The GPA with bits 51:48!=0 created by L2 generates
> an EPT violation in L1.  Because L1 doesn't have allow_smaller_maxphyaddr, L1
> installs an EPT mapping for the wrong GPA (effectively drops bits 51:48), and
> L3 hangs because L1 will keep doing nothing on the resulting EPT violation (L1
> thinks there's already a valid mapping).
> 
> With patch 1 and the OVMF fixes backported, L1 KVM will enumerate MAXPHYADDR=52,
> L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF restricts its mappings to
> bits 47:0.
> 
> At least, I think that's what will happen.

Sean Christopherson April 15, 2024, 2:58 p.m. UTC | #5
On Mon, Apr 15, 2024, Xiaoyao Li wrote:
> On 4/12/2024 11:48 PM, Sean Christopherson wrote:
> > On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> > If we go deep enough, it becomes a functional problem.  It's not even _that_
> > ridiculous/contrived :-)
> > 
> > L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
> > issues with reserved bits at that level.
> > 
> > But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
> > and so L2 KVM will incorrectly think bits 51:48 are reserved.  If both L0 and L1
> > are using TDP, neither L0 nor L1 will intercept #PF.  And because L1 userspace
> > was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
> > allow_smaller_maxphyaddr=true in order for the setup to function correctly.
> 
> In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR=48 via
> KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR=52 by
> executing CPUID itself.

KVM can't assume userspace will do raw CPUID.

Xiaoyao Li April 16, 2024, 8:47 a.m. UTC | #6
On 4/15/2024 10:58 PM, Sean Christopherson wrote:
> On Mon, Apr 15, 2024, Xiaoyao Li wrote:
>> On 4/12/2024 11:48 PM, Sean Christopherson wrote:
>>> On Fri, Apr 12, 2024, Xiaoyao Li wrote:
>>> If we go deep enough, it becomes a functional problem.  It's not even _that_
>>> ridiculous/contrived :-)
>>>
>>> L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
>>> issues with reserved bits at that level.
>>>
>>> But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
>>> and so L2 KVM will incorrectly think bits 51:48 are reserved.  If both L0 and L1
>>> are using TDP, neither L0 nor L1 will intercept #PF.  And because L1 userspace
>>> was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
>>> allow_smaller_maxphyaddr=true in order for the setup to function correctly.
>>
>> In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR=48 via
>> KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR=52 by
>> executing CPUID itself.
> 
> KVM can't assume userspace will do raw CPUID.

So the KVM ABI is that KVM_GET_SUPPORTED_CPUID always reports the 
host's MAXPHYADDR, and if userspace wants to configure a smaller one 
for the guest and expects it to function, it needs to set 
kvm_intel.allow_smaller_maxphyaddr?

Sean Christopherson April 16, 2024, 2:14 p.m. UTC | #7
On Tue, Apr 16, 2024, Xiaoyao Li wrote:
> On 4/15/2024 10:58 PM, Sean Christopherson wrote:
> > On Mon, Apr 15, 2024, Xiaoyao Li wrote:
> > > On 4/12/2024 11:48 PM, Sean Christopherson wrote:
> > > > On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> > > > If we go deep enough, it becomes a functional problem.  It's not even _that_
> > > > ridiculous/contrived :-)
> > > > 
> > > > L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
> > > > issues with reserved bits at that level.
> > > > 
> > > > But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
> > > > and so L2 KVM will incorrectly think bits 51:48 are reserved.  If both L0 and L1
> > > > are using TDP, neither L0 nor L1 will intercept #PF.  And because L1 userspace
> > > > was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
> > > > allow_smaller_maxphyaddr=true in order for the setup to function correctly.
> > > 
> > > In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR=48 via
> > > KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR=52 by
> > > executing CPUID itself.
> > 
> > KVM can't assume userspace will do raw CPUID.
> 
> So the KVM ABI is that KVM_GET_SUPPORTED_CPUID always reports the host's
> MAXPHYADDR,

Not precisely, because KVM will report a reduced value when something, e.g. MKTME,
is stealing physical address bits and KVM is using shadow paging.  I.e. when the
host's effective address width is also the guest's effective address width.

> and if userspace wants to configure a smaller one for the guest and expects
> it to function, it needs to set kvm_intel.allow_smaller_maxphyaddr?

Yep.  The interaction with allow_smaller_maxphyaddr is what I want to get "right",
in that I don't want KVM_GET_SUPPORTED_CPUID to report a MAXPHYADDR value that
won't work for KVM's default configuration.
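
Roughly, as an illustrative sketch (the function name is made up, and 
this paraphrases the rule rather than quoting actual KVM code):

  /*
   * KVM_GET_SUPPORTED_CPUID rule for CPUID.0x80000008.EAX[7:0]: report
   * the raw MAXPHYADDR when TDP is in use, but the host's effective
   * width (already reduced by e.g. MKTME key-ID bits) with shadow
   * paging, where host and guest share an effective address width.
   */
  static unsigned int supported_maxphyaddr(void)
  {
          if (!tdp_enabled)
                  return boot_cpu_data.x86_phys_bits;

          return cpuid_eax(0x80000008) & 0xff;
  }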