Message ID: 20240313125844.912415-1-kraxel@redhat.com (mailing list archive)
Series: kvm/cpuid: set proper GuestPhysBits in CPUID.0x80000008
On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
> Use the GuestPhysBits field (EAX[23:16]) to communicate the max
> addressable GPA to the guest. Typically this is identical to the max
> effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
> not support 5-level TDP.
>
> See commit messages and source code comments for details.
>
> [...]

Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
describing the impact of each change, e.g. to call out that patch 2 isn't an
urgent fix because guest firmware can simply limit itself to using GPAs that
can be addressed with 4-level paging.

I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
as the raw/real MAXPHYADDR.

Please holler if you (or anyone) disagree with the changes or my analysis of
the KVM-on-KVM issue.

Thanks!

[1/2] KVM: x86: Don't advertise guest.MAXPHYADDR as host.MAXPHYADDR in CPUID
      https://github.com/kvm-x86/linux/commit/6f5c9600621b
[2/2] KVM: x86: Advertise max mappable GPA in CPUID.0x80000008.GuestPhysBits
      https://github.com/kvm-x86/linux/commit/b628cb523c65

--
https://github.com/kvm-x86/linux/tree/next
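For reference, the two fields under discussion can be decoded with a few lines
of C. This is a minimal sketch (not part of the series) using GCC/Clang's
<cpuid.h> on x86-64; it treats GuestPhysBits == 0 as "same as MAXPHYADDR",
matching the semantics the series relies on:

    #include <cpuid.h>
    #include <stdio.h>

    int main(void)
    {
        unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;

        /* Leaf 0x80000008: EAX[7:0] = MAXPHYADDR, EAX[23:16] = GuestPhysBits. */
        if (!__get_cpuid(0x80000008, &eax, &ebx, &ecx, &edx))
            return 1;

        unsigned int maxphyaddr = eax & 0xff;
        unsigned int guest_phys_bits = (eax >> 16) & 0xff;

        /* GuestPhysBits == 0 means the max addressable GPA is MAXPHYADDR. */
        printf("MAXPHYADDR=%u, max addressable GPA bits=%u\n",
               maxphyaddr, guest_phys_bits ? guest_phys_bits : maxphyaddr);
        return 0;
    }

On the Sierra Forest example discussed below, this would print MAXPHYADDR=52
with 48 addressable GPA bits, since the CPU lacks 5-level TDP.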
On 4/10/2024 8:19 AM, Sean Christopherson wrote:
> On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
>> Use the GuestPhysBits field (EAX[23:16]) to communicate the max
>> addressable GPA to the guest. Typically this is identical to the max
>> effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
>> not support 5-level TDP.
>>
>> See commit messages and source code comments for details.
>>
>> [...]
>
> Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
> describing the impact of each change, e.g. to call out that patch 2 isn't an
> urgent fix because guest firmware can simply limit itself to using GPAs that
> can be addressed with 4-level paging.
>
> I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
> patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
> as the raw/real MAXPHYADDR.

You mean old KVM on new KVM?

As far as I can see, it seems harmless. E.g., if the userspace and L0 KVM have
the new implementation, then on an Intel SRF platform L1 KVM sees
EAX[23:16]=48, EAX[7:0]=52. And when L1 KVM is old, it reports EAX[7:0]=48 to
L1 userspace.

Right, 48 is not the raw/real MAXPHYADDR. But I think KVM makes no statement
that CPUID.0x8000_0008.EAX[7:0] of KVM_GET_SUPPORTED_CPUID reports the
raw/real MAXPHYADDR.

> Please holler if you (or anyone) disagree with the changes or my analysis of
> the KVM-on-KVM issue.
>
> Thanks!
>
> [1/2] KVM: x86: Don't advertise guest.MAXPHYADDR as host.MAXPHYADDR in CPUID
>       https://github.com/kvm-x86/linux/commit/6f5c9600621b
> [2/2] KVM: x86: Advertise max mappable GPA in CPUID.0x80000008.GuestPhysBits
>       https://github.com/kvm-x86/linux/commit/b628cb523c65
>
> --
> https://github.com/kvm-x86/linux/tree/next
On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> On 4/10/2024 8:19 AM, Sean Christopherson wrote:
> > On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
> > > Use the GuestPhysBits field (EAX[23:16]) to communicate the max
> > > addressable GPA to the guest. Typically this is identical to the max
> > > effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
> > > not support 5-level TDP.
> > >
> > > See commit messages and source code comments for details.
> > >
> > > [...]
> >
> > Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
> > describing the impact of each change, e.g. to call out that patch 2 isn't an
> > urgent fix because guest firmware can simply limit itself to using GPAs that
> > can be addressed with 4-level paging.
> >
> > I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
> > patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
> > as the raw/real MAXPHYADDR.
>
> You mean old KVM on new KVM?

Yep.

> As far as I can see, it seems harmless. E.g., if the userspace and L0 KVM have
> the new implementation, then on an Intel SRF platform L1 KVM sees
> EAX[23:16]=48, EAX[7:0]=52. And when L1 KVM is old, it reports EAX[7:0]=48 to
> L1 userspace.

Yep.

> Right, 48 is not the raw/real MAXPHYADDR. But I think KVM makes no statement
> that CPUID.0x8000_0008.EAX[7:0] of KVM_GET_SUPPORTED_CPUID reports the
> raw/real MAXPHYADDR.

If we go deep enough, it becomes a functional problem. It's not even _that_
ridiculous/contrived :-)

L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
issues with reserved bits at that level.

But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
and so L2 KVM will incorrectly think bits 51:48 are reserved. If both L0 and L1
are using TDP, neither L0 nor L1 will intercept #PF. And because L1 userspace
was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
allow_smaller_maxphyaddr=true in order for the setup to function correctly.

If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD #PF
to accelerate emulated MMIO. The GPA with bits 51:48!=0 created by L2 generates
an EPT violation in L1. Because L1 doesn't have allow_smaller_maxphyaddr, L1
installs an EPT mapping for the wrong GPA (effectively drops bits 51:48), and
L3 hangs because L1 will keep doing nothing on the resulting EPT violation (L1
thinks there's already a valid mapping).

With patch 1 and the OVMF fixes backported, L1 KVM will enumerate MAXPHYADDR=52,
L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF restricts its mappings to
bits 47:0.

At least, I think that's what will happen.
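To make the reserved-bit arithmetic in this scenario concrete, here is a small
standalone C sketch using the same mask formula as KVM's rsvd_bits() helper in
arch/x86/kvm/mmu.h (the scenario values are illustrative): with a guest
MAXPHYADDR of 48, any GPA with bits 51:48 set looks reserved to L2 KVM, even
though the hardware MAXPHYADDR is 52.

    #include <stdint.h>
    #include <stdio.h>

    /* Mask of bits s..e, inclusive -- same formula as KVM's rsvd_bits(). */
    static uint64_t rsvd_bits(int s, int e)
    {
        return ((2ULL << (e - s)) - 1) << s;
    }

    int main(void)
    {
        int guest_maxphyaddr = 48;     /* what L1 userspace handed to L2 */
        uint64_t rsvd = rsvd_bits(guest_maxphyaddr, 51);
        uint64_t gpa = 1ULL << 50;     /* legal on the real CPU (MAXPHYADDR=52) */

        printf("GPA 0x%llx %s reserved to L2 KVM\n",
               (unsigned long long)gpa,
               (gpa & rsvd) ? "looks" : "doesn't look");
        return 0;
    }

rsvd_bits(48, 51) yields 0xf000000000000, so the 1 << 50 GPA trips the check:
that is the mismatch that lets L2 "accelerate" MMIO with a RSVD #PF that L1
then maps to the wrong GPA.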
On 4/12/2024 11:48 PM, Sean Christopherson wrote:
> On Fri, Apr 12, 2024, Xiaoyao Li wrote:
>> On 4/10/2024 8:19 AM, Sean Christopherson wrote:
>>> On Wed, 13 Mar 2024 13:58:41 +0100, Gerd Hoffmann wrote:
>>>> Use the GuestPhysBits field (EAX[23:16]) to communicate the max
>>>> addressable GPA to the guest. Typically this is identical to the max
>>>> effective GPA, except in case the CPU supports MAXPHYADDR > 48 but does
>>>> not support 5-level TDP.
>>>>
>>>> See commit messages and source code comments for details.
>>>>
>>>> [...]
>>>
>>> Applied to kvm-x86 misc, with massaged changelogs to be more verbose when
>>> describing the impact of each change, e.g. to call out that patch 2 isn't an
>>> urgent fix because guest firmware can simply limit itself to using GPAs that
>>> can be addressed with 4-level paging.
>>>
>>> I also tagged patch 1 for stable@, as KVM-on-KVM will do the wrong thing when
>>> patch 2 lands, i.e. KVM will incorrectly advertise the addressable MAXPHYADDR
>>> as the raw/real MAXPHYADDR.
>>
>> You mean old KVM on new KVM?
>
> Yep.
>
>> As far as I can see, it seems harmless. E.g., if the userspace and L0 KVM have
>> the new implementation, then on an Intel SRF platform L1 KVM sees
>> EAX[23:16]=48, EAX[7:0]=52. And when L1 KVM is old, it reports EAX[7:0]=48 to
>> L1 userspace.
>
> Yep.
>
>> Right, 48 is not the raw/real MAXPHYADDR. But I think KVM makes no statement
>> that CPUID.0x8000_0008.EAX[7:0] of KVM_GET_SUPPORTED_CPUID reports the
>> raw/real MAXPHYADDR.
>
> If we go deep enough, it becomes a functional problem. It's not even _that_
> ridiculous/contrived :-)
>
> L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
> issues with reserved bits at that level.
>
> But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
> and so L2 KVM will incorrectly think bits 51:48 are reserved. If both L0 and L1
> are using TDP, neither L0 nor L1 will intercept #PF. And because L1 userspace
> was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
> allow_smaller_maxphyaddr=true in order for the setup to function correctly.

In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR = 48 via
KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR = 52 by executing
CPUID itself.

So if L1 userspace decides to configure MAXPHYADDR to 48 for L2, per a), it is
supposed to check whether KVM is configured with allow_smaller_maxphyaddr=y.
Otherwise, it cannot expect the setup to function correctly.

> If L2 runs an L3, and does not use EPT, L2 will think it can generate a RSVD #PF
> to accelerate emulated MMIO. The GPA with bits 51:48!=0 created by L2 generates
> an EPT violation in L1. Because L1 doesn't have allow_smaller_maxphyaddr, L1
> installs an EPT mapping for the wrong GPA (effectively drops bits 51:48), and
> L3 hangs because L1 will keep doing nothing on the resulting EPT violation (L1
> thinks there's already a valid mapping).
>
> With patch 1 and the OVMF fixes backported, L1 KVM will enumerate MAXPHYADDR=52,
> L1 userspace creates L2 with MAXPHYADDR=52, and L2 OVMF restricts its mappings to
> bits 47:0.
>
> At least, I think that's what will happen.
On Mon, Apr 15, 2024, Xiaoyao Li wrote:
> On 4/12/2024 11:48 PM, Sean Christopherson wrote:
> > On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> > If we go deep enough, it becomes a functional problem. It's not even _that_
> > ridiculous/contrived :-)
> >
> > L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
> > issues with reserved bits at that level.
> >
> > But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
> > and so L2 KVM will incorrectly think bits 51:48 are reserved. If both L0 and L1
> > are using TDP, neither L0 nor L1 will intercept #PF. And because L1 userspace
> > was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
> > allow_smaller_maxphyaddr=true in order for the setup to function correctly.
>
> In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR = 48 via
> KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR = 52 by executing
> CPUID itself.

KVM can't assume userspace will do raw CPUID.
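The sanctioned path for userspace is the ioctl, not raw CPUID. A minimal
userspace sketch (illustrative, not lifted from QEMU or the kernel selftests)
that queries KVM_GET_SUPPORTED_CPUID on the /dev/kvm fd and pulls out leaf
0x80000008; the entry count of 128 is an arbitrary assumption:

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <sys/ioctl.h>
    #include <linux/kvm.h>

    int main(void)
    {
        int kvm = open("/dev/kvm", O_RDWR | O_CLOEXEC);
        if (kvm < 0)
            return 1;

        /* 128 entries is a guess; robust code retries on E2BIG. */
        int nent = 128;
        struct kvm_cpuid2 *cpuid =
            calloc(1, sizeof(*cpuid) + nent * sizeof(struct kvm_cpuid_entry2));
        if (!cpuid)
            return 1;
        cpuid->nent = nent;

        if (ioctl(kvm, KVM_GET_SUPPORTED_CPUID, cpuid) < 0)
            return 1;

        for (unsigned int i = 0; i < cpuid->nent; i++) {
            if (cpuid->entries[i].function != 0x80000008)
                continue;
            unsigned int eax = cpuid->entries[i].eax;
            printf("KVM-supported MAXPHYADDR=%u, GuestPhysBits=%u\n",
                   eax & 0xff, (eax >> 16) & 0xff);
        }
        return 0;
    }

A userspace that sizes the guest from this output (rather than from executing
CPUID itself) is exactly the consumer the ABI question below is about.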
On 4/15/2024 10:58 PM, Sean Christopherson wrote:
> On Mon, Apr 15, 2024, Xiaoyao Li wrote:
>> On 4/12/2024 11:48 PM, Sean Christopherson wrote:
>>> On Fri, Apr 12, 2024, Xiaoyao Li wrote:
>>> If we go deep enough, it becomes a functional problem. It's not even _that_
>>> ridiculous/contrived :-)
>>>
>>> L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
>>> issues with reserved bits at that level.
>>>
>>> But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
>>> and so L2 KVM will incorrectly think bits 51:48 are reserved. If both L0 and L1
>>> are using TDP, neither L0 nor L1 will intercept #PF. And because L1 userspace
>>> was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
>>> allow_smaller_maxphyaddr=true in order for the setup to function correctly.
>>
>> In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR = 48 via
>> KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR = 52 by executing
>> CPUID itself.
>
> KVM can't assume userspace will do raw CPUID.

So the KVM ABI is that KVM_GET_SUPPORTED_CPUID always reports the host's
MAXPHYADDR, and if userspace wants to configure a smaller value for the guest
and expects it to function, it needs to set kvm_intel.allow_smaller_maxphyaddr?
On Tue, Apr 16, 2024, Xiaoyao Li wrote:
> On 4/15/2024 10:58 PM, Sean Christopherson wrote:
> > On Mon, Apr 15, 2024, Xiaoyao Li wrote:
> > > On 4/12/2024 11:48 PM, Sean Christopherson wrote:
> > > > On Fri, Apr 12, 2024, Xiaoyao Li wrote:
> > > > If we go deep enough, it becomes a functional problem. It's not even _that_
> > > > ridiculous/contrived :-)
> > > >
> > > > L1 KVM is still aware that the real MAXPHYADDR=52, and so there are no immediate
> > > > issues with reserved bits at that level.
> > > >
> > > > But L1 userspace will unintentionally configure L2 with CPUID.0x8000_0008.EAX[7:0]=48,
> > > > and so L2 KVM will incorrectly think bits 51:48 are reserved. If both L0 and L1
> > > > are using TDP, neither L0 nor L1 will intercept #PF. And because L1 userspace
> > > > was told MAXPHYADDR=48, it won't know that KVM needs to be configured with
> > > > allow_smaller_maxphyaddr=true in order for the setup to function correctly.
> > >
> > > In this case, a) L1 userspace was told by L1 KVM that MAXPHYADDR = 48 via
> > > KVM_GET_SUPPORTED_CPUID, but b) L1 userspace gets MAXPHYADDR = 52 by executing
> > > CPUID itself.
> >
> > KVM can't assume userspace will do raw CPUID.
>
> So the KVM ABI is that KVM_GET_SUPPORTED_CPUID always reports the host's
> MAXPHYADDR,

Not precisely, because KVM will report a reduced value when something, e.g. MKTME,
is stealing physical address bits and KVM is using shadow paging, i.e. when the
host's effective address width is also the guest's effective address width.

> and if userspace wants to configure a smaller value for the guest and expects
> it to function, it needs to set kvm_intel.allow_smaller_maxphyaddr?

Yep. The interaction with allow_smaller_maxphyaddr is what I want to get "right",
in that I don't want KVM_GET_SUPPORTED_CPUID to report a MAXPHYADDR value that
won't work for KVM's default configuration.
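Pulling the thread's statements together, the resulting KVM_GET_SUPPORTED_CPUID
behavior for leaf 0x80000008 can be sketched roughly as below. This is a
paraphrase of the policy described above, not the kernel's actual code; the
function and parameter names are illustrative:

    #include <stdbool.h>

    /* Rough paraphrase of the advertised-CPUID policy discussed in this
     * thread; all names here are illustrative, not kernel identifiers. */
    unsigned int supported_cpuid_0x80000008_eax(bool tdp_enabled,
                                                unsigned int raw_maxphyaddr,
                                                unsigned int host_effective_bits,
                                                unsigned int max_mappable_bits)
    {
        unsigned int maxphyaddr, guest_phys_bits = 0;

        if (tdp_enabled) {
            /* Patch 1: advertise the raw MAXPHYADDR, not a reduced value.
             * Patch 2: if KVM can't map GPAs all the way up (e.g.
             * MAXPHYADDR=52 without 5-level TDP), say so via GuestPhysBits. */
            maxphyaddr = raw_maxphyaddr;
            if (max_mappable_bits < raw_maxphyaddr)
                guest_phys_bits = max_mappable_bits;
        } else {
            /* Shadow paging: the host's effective width (e.g. after MKTME
             * steals bits) is also the guest's effective width. */
            maxphyaddr = host_effective_bits;
        }

        return maxphyaddr | (guest_phys_bits << 16);
    }

On the SRF example, this yields EAX[7:0]=52 and EAX[23:16]=48 with TDP, which
is exactly the combination that works for KVM's default configuration without
allow_smaller_maxphyaddr.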