[RFC,04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features

Message ID	381b25a0dc0ed3e4579d50efb3634329132a2c02.1609890536.git.kai.huang@intel.com (mailing list archive)
State	New, archived
Headers	show Return-Path: <linux-sgx-owner@kernel.org> IronPort-SDR: BdnvU3zu7Sv+dHSSevITYnO7XxMLpZEg/e/iLiapy+ouey5Vi+Bf1JbyMU03qJFb3NsQLZS3tc +1BvEBrAXgGw== IronPort-SDR: 7Ps4N4I8rk04pBsrE4wagmi+xI9hpAs7xZ+hgZPwzMK4WxlpRfCmOcbK30FRf20+4c6xjK+w+a 8Wp9TnrHZnaQ== From: Kai Huang <kai.huang@intel.com> To: linux-sgx@vger.kernel.org, kvm@vger.kernel.org, x86@kernel.org Cc: seanjc@google.com, jarkko@kernel.org, luto@kernel.org, dave.hansen@intel.com, haitao.huang@intel.com, pbonzini@redhat.com, bp@alien8.de, tglx@linutronix.de, mingo@redhat.com, hpa@zytor.com, Kai Huang <kai.huang@intel.com> Subject: [RFC PATCH 04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features Date: Wed, 6 Jan 2021 14:55:21 +1300 Message-Id: <381b25a0dc0ed3e4579d50efb3634329132a2c02.1609890536.git.kai.huang@intel.com> In-Reply-To: <cover.1609890536.git.kai.huang@intel.com> References: <cover.1609890536.git.kai.huang@intel.com> MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Precedence: bulk
Series	KVM SGX virtualization support \| expand [RFC,00/23] KVM SGX virtualization support [RFC,01/23] x86/sgx: Split out adding EPC page to free list to separate helper [RFC,02/23] x86/sgx: Add enum for SGX_CHILD_PRESENT error code [RFC,03/23] x86/sgx: Introduce virtual EPC for use by KVM guests [RFC,04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features [RFC,05/23] x86/cpu/intel: Allow SGX virtualization without Launch Control support [RFC,06/23] x86/sgx: Expose SGX architectural definitions to the kernel [RFC,07/23] x86/sgx: Move ENCLS leaf definitions to sgx_arch.h [RFC,08/23] x86/sgx: Add SGX2 ENCLS leaf definitions (EAUG, EMODPR and EMODT) [RFC,09/23] x86/sgx: Add encls_faulted() helper [RFC,10/23] x86/sgx: Add helper to update SGX_LEPUBKEYHASHn MSRs [RFC,11/23] x86/sgx: Add helpers to expose ECREATE and EINIT to KVM [RFC,12/23] x86/sgx: Move provisioning device creation out of SGX driver [RFC,13/23] KVM: VMX: Convert vcpu_vmx.exit_reason to a union [RFC,14/23] KVM: x86: Export kvm_mmu_gva_to_gpa_{read,write}() for SGX (VMX) [RFC,15/23] KVM: x86: Define new #PF SGX error code bit [RFC,16/23] KVM: x86: Add SGX feature leaf to reverse CPUID lookup [RFC,17/23] KVM: VMX: Add basic handling of VM-Exit from SGX enclave [RFC,18/23] KVM: VMX: Frame in ENCLS handler for SGX virtualization [RFC,19/23] KVM: VMX: Add SGX ENCLS[ECREATE] handler to enforce CPUID restrictions [RFC,20/23] KVM: VMX: Add emulation of SGX Launch Control LE hash MSRs [RFC,21/23] KVM: VMX: Add ENCLS[EINIT] handler to support SGX Launch Control (LC) [RFC,22/23] KVM: VMX: Enable SGX virtualization for SGX1, SGX2 and LC [RFC,23/23] KVM: x86: Add capability to grant VM access to privileged SGX attribute

Huang, Kai Jan. 6, 2021, 1:55 a.m. UTC

From: Sean Christopherson <sean.j.christopherson@intel.com>

Add a feature word to hold SGX features enumerated via CPUID.0x12.0x0,
along with flags for SGX1 and SGX2. As part of virtualizing SGX, KVM
needs to expose the SGX CPUID leafs to its guest. SGX1 and SGX2 need to
be in a dedicated feature word so that they can be queried via KVM's
reverse CPUID lookup to properly emulate the expected guest behavior.

Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
 [Kai: Also clear SGX1 and SGX2 bits in clear_sgx_caps().]
Signed-off-by: Kai Huang <kai.huang@intel.com>
---
 arch/x86/include/asm/cpufeature.h        | 5 +++--
 arch/x86/include/asm/cpufeatures.h       | 6 +++++-
 arch/x86/include/asm/disabled-features.h | 7 ++++++-
 arch/x86/include/asm/required-features.h | 2 +-
 arch/x86/kernel/cpu/common.c             | 4 ++++
 arch/x86/kernel/cpu/feat_ctl.c           | 2 ++
 6 files changed, 21 insertions(+), 5 deletions(-)

Dave Hansen Jan. 6, 2021, 7:39 p.m. UTC | #1

On 1/5/21 5:55 PM, Kai Huang wrote:
> --- a/arch/x86/kernel/cpu/feat_ctl.c
> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
>  {
>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
>  }

Logically, I think you want this *after* the "Allow SGX virtualization
without Launch Control support" patch.  As it stands, this will totally
disable SGX (including virtualization) if launch control is unavailable.

Huang, Kai Jan. 6, 2021, 10:12 p.m. UTC | #2

On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> On 1/5/21 5:55 PM, Kai Huang wrote:
> > --- a/arch/x86/kernel/cpu/feat_ctl.c
> > +++ b/arch/x86/kernel/cpu/feat_ctl.c
> > @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> >  {
> >  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> >  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> > +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> > +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> >  }
> 
> Logically, I think you want this *after* the "Allow SGX virtualization
> without Launch Control support" patch.  As it stands, this will totally
> disable SGX (including virtualization) if launch control is unavailable.

To me it is better to be here, since clear_sgx_caps(), which disables SGX
totally, should logically clear all SGX feature bits, no matter later patch's
behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
them too. Otherwise the logic of this patch (adding new SGX feature bits) is
not complete IMHO.

And actually in later patch "Allow SGX virtualization without Launch Control
support", a new clear_sgx_lc() is added, and is called when LC is not
available but SGX virtualization is enabled, to make sure only SGX_LC bit is
cleared in this case. I don't quite understand why we need to clear SGX1 and
SGX2 in clear_sgx_caps() after the later patch.

Borislav Petkov Jan. 6, 2021, 10:15 p.m. UTC | #3

On Wed, Jan 06, 2021 at 02:55:21PM +1300, Kai Huang wrote:
> +/* Intel-defined SGX features, CPUID level 0x00000012:0 (EAX), word 19 */
> +#define X86_FEATURE_SGX1		(19*32+ 0) /* SGX1 leaf functions */
> +#define X86_FEATURE_SGX2		(19*32+ 1) /* SGX2 leaf functions */

Is anything else from that leaf going to be added later? Bit 5 is
"supports ENCLV instruction leaves", 6 is ENCLS insn leaves... are those
going to be used in the kernel too eventually?

Rest of them is reserved in the SDM which probably means internal only
for now.

Dave Hansen Jan. 6, 2021, 10:21 p.m. UTC | #4

On 1/6/21 2:12 PM, Kai Huang wrote:
> On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
>> On 1/5/21 5:55 PM, Kai Huang wrote:
>>> --- a/arch/x86/kernel/cpu/feat_ctl.c
>>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
>>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
>>>  {
>>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
>>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
>>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
>>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
>>>  }
>> Logically, I think you want this *after* the "Allow SGX virtualization
>> without Launch Control support" patch.  As it stands, this will totally
>> disable SGX (including virtualization) if launch control is unavailable.
> To me it is better to be here, since clear_sgx_caps(), which disables SGX
> totally, should logically clear all SGX feature bits, no matter later patch's
> behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> not complete IMHO.
> 
> And actually in later patch "Allow SGX virtualization without Launch Control
> support", a new clear_sgx_lc() is added, and is called when LC is not
> available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> cleared in this case. I don't quite understand why we need to clear SGX1 and
> SGX2 in clear_sgx_caps() after the later patch.

I was talking about patch ordering.  It could be argued that this goes
after the content of patch 05/23.  Please _consider_ changing the ordering.

If that doesn't work for some reason, please at least call out in the
changelog that it leaves a temporarily funky situation.

Huang, Kai Jan. 6, 2021, 10:56 p.m. UTC | #5

On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> On 1/6/21 2:12 PM, Kai Huang wrote:
> > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> >> On 1/5/21 5:55 PM, Kai Huang wrote:
> >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> >>>  {
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> >>>  }
> >> Logically, I think you want this *after* the "Allow SGX virtualization
> >> without Launch Control support" patch.  As it stands, this will totally
> >> disable SGX (including virtualization) if launch control is unavailable.
> > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > totally, should logically clear all SGX feature bits, no matter later patch's
> > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > not complete IMHO.
> > 
> > And actually in later patch "Allow SGX virtualization without Launch Control
> > support", a new clear_sgx_lc() is added, and is called when LC is not
> > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > SGX2 in clear_sgx_caps() after the later patch.
> 
> I was talking about patch ordering.  It could be argued that this goes
> after the content of patch 05/23.  Please _consider_ changing the ordering.
> 
> If that doesn't work for some reason, please at least call out in the
> changelog that it leaves a temporarily funky situation.
> 

The later patch currently uses SGX1 bit, which is the reason that this patch
needs be before later patch.

Sean,

I think it is OK to remove SGX1 bit check in later patch, since I have
never seen a machine with SGX bit in CPUID, but w/o SGX1. If we remove SGX1 bit
check in later, we can put this patch after the later patch.

Do you have comment here? If you are OK, I'll remove SGX1 bit check in later
patch and reorder the patch.

Huang, Kai Jan. 6, 2021, 11:09 p.m. UTC | #6

On Wed, 6 Jan 2021 23:15:27 +0100 Borislav Petkov wrote:
> On Wed, Jan 06, 2021 at 02:55:21PM +1300, Kai Huang wrote:
> > +/* Intel-defined SGX features, CPUID level 0x00000012:0 (EAX), word 19 */
> > +#define X86_FEATURE_SGX1		(19*32+ 0) /* SGX1 leaf functions */
> > +#define X86_FEATURE_SGX2		(19*32+ 1) /* SGX2 leaf functions */
> 
> Is anything else from that leaf going to be added later? Bit 5 is
> "supports ENCLV instruction leaves", 6 is ENCLS insn leaves... are those
> going to be used in the kernel too eventually?

Bit 5 and Bit 6 are related to reclaiming EPC page from SGX guest, and the
mechanism behind the two bits are only supposed to be used by KVM.

There's no urgent request to support them for now (and given basic SGX
virtualization is not in upstream), but I don't know whether they need to be
supported in the future.

> 
> Rest of them is reserved in the SDM which probably means internal only
> for now.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson Jan. 6, 2021, 11:19 p.m. UTC | #7

On Thu, Jan 07, 2021, Kai Huang wrote:
> On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> > On 1/6/21 2:12 PM, Kai Huang wrote:
> > > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> > >> On 1/5/21 5:55 PM, Kai Huang wrote:
> > >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> > >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> > >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> > >>>  {
> > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> > >>>  }
> > >> Logically, I think you want this *after* the "Allow SGX virtualization
> > >> without Launch Control support" patch.  As it stands, this will totally
> > >> disable SGX (including virtualization) if launch control is unavailable.
> > >>
> > > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > > totally, should logically clear all SGX feature bits, no matter later patch's
> > > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > > not complete IMHO.
> > > 
> > > And actually in later patch "Allow SGX virtualization without Launch Control
> > > support", a new clear_sgx_lc() is added, and is called when LC is not
> > > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > > SGX2 in clear_sgx_caps() after the later patch.
> > 
> > I was talking about patch ordering.  It could be argued that this goes
> > after the content of patch 05/23.  Please _consider_ changing the ordering.
> > 
> > If that doesn't work for some reason, please at least call out in the
> > changelog that it leaves a temporarily funky situation.
> > 
> 
> The later patch currently uses SGX1 bit, which is the reason that this patch
> needs be before later patch.
> 
> Sean,
> 
> I think it is OK to remove SGX1 bit check in later patch, since I have
> never seen a machine with SGX bit in CPUID, but w/o SGX1.

The SGX1 check is "needed" to handle the case where SGX is supported but was
soft-disabled, e.g. because software disable a machine check bank by writing an
MCi_CTL MSR.

> If we remove SGX1 bit check in later, we can put this patch after the later
> patch.
> 
> Do you have comment here? If you are OK, I'll remove SGX1 bit check in later
> patch and reorder the patch.

Hmm, I'm not sure why the SGX driver was merged without explicitly checking for
SGX1 support.  I'm pretty sure we had an explicit SGX1 check in the driver path
at some point.  My guess is that the SGX1 change ended up in the KVM series
through a mishandled rebase.

Moving the check later won't break anything that's not already broken.  But,
arguably checking SGX1 is a bug fix of sorts, e.g. to guard against broken
firmware, and should go in as a standalone patch destined for stable.  The
kernel can't prevent SGX from being soft-disabled after boot, but IMO it should
cleanly handle the case where SGX was soft-disabled _before_ boot.

Dave Hansen Jan. 6, 2021, 11:33 p.m. UTC | #8

On 1/6/21 3:19 PM, Sean Christopherson wrote:
>> If we remove SGX1 bit check in later, we can put this patch after the later
>> patch.
>>
>> Do you have comment here? If you are OK, I'll remove SGX1 bit check in later
>> patch and reorder the patch.
> Hmm, I'm not sure why the SGX driver was merged without explicitly checking for
> SGX1 support.  I'm pretty sure we had an explicit SGX1 check in the driver path
> at some point.  My guess is that the SGX1 change ended up in the KVM series
> through a mishandled rebase.

There was one, but I think it got removed when I asked that the
X86_FEATURE_SGX1/2 bits be removed.  I actually even mentioned checking
the CPUID leaf directly with cpuid...() in initialization.  But I missed
when that was never done.

It's not a practical problem, but I do agree we should fix it up for
5.10 stable.

Huang, Kai Jan. 6, 2021, 11:40 p.m. UTC | #9

On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> On 1/6/21 2:12 PM, Kai Huang wrote:
> > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> >> On 1/5/21 5:55 PM, Kai Huang wrote:
> >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> >>>  {
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> >>>  }
> >> Logically, I think you want this *after* the "Allow SGX virtualization
> >> without Launch Control support" patch.  As it stands, this will totally
> >> disable SGX (including virtualization) if launch control is unavailable.
> > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > totally, should logically clear all SGX feature bits, no matter later patch's
> > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > not complete IMHO.
> > 
> > And actually in later patch "Allow SGX virtualization without Launch Control
> > support", a new clear_sgx_lc() is added, and is called when LC is not
> > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > SGX2 in clear_sgx_caps() after the later patch.
> 
> I was talking about patch ordering.  It could be argued that this goes
> after the content of patch 05/23.  Please _consider_ changing the ordering.
> 
> If that doesn't work for some reason, please at least call out in the
> changelog that it leaves a temporarily funky situation.
> 
> 

Hi Dave,

After second thinking, if I understand you correctly, the "funky situation" you
are talking about is, w/o patch "Allow SGX virtualization without Launch
Control Support", SGX virtualization is disabled too if LC is not available in
hardware, but in previous patches (basically patch 3 "Introduce virtual EPC
for use by KVM guests"), we have been treating SGX virtualization can be
enabled?

In this case, clearing SGX1 and SGX2 bits before or after "Allow SGX
virtualization without Launch Control support" patch doesn't make difference,
since KVM should always check SGX bit first.

So a better way is to put "Allow SGX virtualization without Launch Control
Support" at the beginning of this series? If so, the Kconfig
X86_SGX_VIRTUALIZATION needs to be in separate patch at the very beginning.

Does above make sense?

Dave Hansen Jan. 6, 2021, 11:43 p.m. UTC | #10

On 1/6/21 3:40 PM, Kai Huang wrote:
> So a better way is to put "Allow SGX virtualization without Launch Control
> Support" at the beginning of this series? If so, the Kconfig
> X86_SGX_VIRTUALIZATION needs to be in separate patch at the very beginning.
> 
> Does above make sense? 

I think it's worth trying.  No promises that anyone will like the end
result, but give it a shot and I'll take a look.

Huang, Kai Jan. 6, 2021, 11:56 p.m. UTC | #11

On Wed, 6 Jan 2021 15:19:41 -0800 Sean Christopherson wrote:
> On Thu, Jan 07, 2021, Kai Huang wrote:
> > On Wed, 6 Jan 2021 14:21:39 -0800 Dave Hansen wrote:
> > > On 1/6/21 2:12 PM, Kai Huang wrote:
> > > > On Wed, 6 Jan 2021 11:39:46 -0800 Dave Hansen wrote:
> > > >> On 1/5/21 5:55 PM, Kai Huang wrote:
> > > >>> --- a/arch/x86/kernel/cpu/feat_ctl.c
> > > >>> +++ b/arch/x86/kernel/cpu/feat_ctl.c
> > > >>> @@ -97,6 +97,8 @@ static void clear_sgx_caps(void)
> > > >>>  {
> > > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX);
> > > >>>  	setup_clear_cpu_cap(X86_FEATURE_SGX_LC);
> > > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX1);
> > > >>> +	setup_clear_cpu_cap(X86_FEATURE_SGX2);
> > > >>>  }
> > > >> Logically, I think you want this *after* the "Allow SGX virtualization
> > > >> without Launch Control support" patch.  As it stands, this will totally
> > > >> disable SGX (including virtualization) if launch control is unavailable.
> > > >>
> > > > To me it is better to be here, since clear_sgx_caps(), which disables SGX
> > > > totally, should logically clear all SGX feature bits, no matter later patch's
> > > > behavior. So when new SGX bits are introduced, clear_sgx_caps() should clear
> > > > them too. Otherwise the logic of this patch (adding new SGX feature bits) is
> > > > not complete IMHO.
> > > > 
> > > > And actually in later patch "Allow SGX virtualization without Launch Control
> > > > support", a new clear_sgx_lc() is added, and is called when LC is not
> > > > available but SGX virtualization is enabled, to make sure only SGX_LC bit is
> > > > cleared in this case. I don't quite understand why we need to clear SGX1 and
> > > > SGX2 in clear_sgx_caps() after the later patch.
> > > 
> > > I was talking about patch ordering.  It could be argued that this goes
> > > after the content of patch 05/23.  Please _consider_ changing the ordering.
> > > 
> > > If that doesn't work for some reason, please at least call out in the
> > > changelog that it leaves a temporarily funky situation.
> > > 
> > 
> > The later patch currently uses SGX1 bit, which is the reason that this patch
> > needs be before later patch.
> > 
> > Sean,
> > 
> > I think it is OK to remove SGX1 bit check in later patch, since I have
> > never seen a machine with SGX bit in CPUID, but w/o SGX1.
> 
> The SGX1 check is "needed" to handle the case where SGX is supported but was
> soft-disabled, e.g. because software disable a machine check bank by writing an
> MCi_CTL MSR.
> 
> > If we remove SGX1 bit check in later, we can put this patch after the later
> > patch.
> > 
> > Do you have comment here? If you are OK, I'll remove SGX1 bit check in later
> > patch and reorder the patch.
> 
> Hmm, I'm not sure why the SGX driver was merged without explicitly checking for
> SGX1 support.  I'm pretty sure we had an explicit SGX1 check in the driver path
> at some point.  My guess is that the SGX1 change ended up in the KVM series
> through a mishandled rebase.
> 
> Moving the check later won't break anything that's not already broken.  But,
> arguably checking SGX1 is a bug fix of sorts, e.g. to guard against broken
> firmware, and should go in as a standalone patch destined for stable.  The
> kernel can't prevent SGX from being soft-disabled after boot, but IMO it should
> cleanly handle the case where SGX was soft-disabled _before_ boot.

It seems I need to dig some history. Thanks Sean for the info!

Huang, Kai Jan. 6, 2021, 11:56 p.m. UTC | #12

On Wed, 6 Jan 2021 15:43:54 -0800 Dave Hansen wrote:
> On 1/6/21 3:40 PM, Kai Huang wrote:
> > So a better way is to put "Allow SGX virtualization without Launch Control
> > Support" at the beginning of this series? If so, the Kconfig
> > X86_SGX_VIRTUALIZATION needs to be in separate patch at the very beginning.
> > 
> > Does above make sense? 
> 
> I think it's worth trying.  No promises that anyone will like the end
> result, but give it a shot and I'll take a look.

OK I'll try in next version. Thanks.

Borislav Petkov Jan. 7, 2021, 6:41 a.m. UTC | #13

On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
> There's no urgent request to support them for now (and given basic SGX
> virtualization is not in upstream), but I don't know whether they need to be
> supported in the future.

If that is the case, then wasting a whole leaf for two bits doesn't make
too much sense. And it looks like the kvm reverse lookup can be taught
to deal with composing that leaf dynamically when needed instead.

Thx.

Huang, Kai Jan. 8, 2021, 2 a.m. UTC | #14

On Thu, 7 Jan 2021 07:41:25 +0100 Borislav Petkov wrote:
> On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
> > There's no urgent request to support them for now (and given basic SGX
> > virtualization is not in upstream), but I don't know whether they need to be
> > supported in the future.
> 
> If that is the case, then wasting a whole leaf for two bits doesn't make
> too much sense. And it looks like the kvm reverse lookup can be taught
> to deal with composing that leaf dynamically when needed instead.

I am not sure changing reverse lookup to handle dynamic would be acceptable. To
me it is ugly, and I don't have a first glance on how to do it. KVM can query
host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
straightforward as having X86_FEATURE_SGX1/2.

And as Sean pointed out, SGX1 bit is also needed by both SGX driver and
init_ia32_feat_ctl():

	https://www.spinics.net/lists/kvm/msg231973.html

So having it would make things easier.

And regarding to other bits of this leaf, to me: 1) we cannot rule out
possibility that bit 5 and bit 6 will be supported in the future; 2) I cannot
talk more but we cannot rule out the possibility that there will be other bits
introduced in the future.

Sean, what do you think?

> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Dave Hansen Jan. 8, 2021, 5:10 a.m. UTC | #15

On 1/7/21 6:00 PM, Kai Huang wrote:
> On Thu, 7 Jan 2021 07:41:25 +0100 Borislav Petkov wrote:
>> On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
>>> There's no urgent request to support them for now (and given basic SGX
>>> virtualization is not in upstream), but I don't know whether they need to be
>>> supported in the future.
>>
>> If that is the case, then wasting a whole leaf for two bits doesn't make
>> too much sense. And it looks like the kvm reverse lookup can be taught
>> to deal with composing that leaf dynamically when needed instead.
> 
> I am not sure changing reverse lookup to handle dynamic would be acceptable. To
> me it is ugly, and I don't have a first glance on how to do it. KVM can query
> host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
> straightforward as having X86_FEATURE_SGX1/2.

So, Boris was pretty direct here.  Could you please go spend a bit of
time to see what it would take to make these dynamic?  You can check
what our (Intel) plans are for this leaf, but if it's going to remain
sparsely-used, we need to look into making the leaves a bit more dynamic.

> And regarding to other bits of this leaf, to me: 1) we cannot rule out
> possibility that bit 5 and bit 6 will be supported in the future; 2) I cannot
> talk more but we cannot rule out the possibility that there will be other bits
> introduced in the future.

From the Intel side, let's go look at the features that are coming.  We
have a list of CPUID bits that have been dedicated to future CPU
features.  Let's look if 1 of these bits or 30 is coming.  I don't think
it's exactly a state secret approximately how many CPUID bits we *think*
will get used in this leaf.

We can't exactly put together a roadmap of bits, microarchitectures and
chip release dates.  But, we can at least say, "we have immediate plans
for most of the leaf" or "we don't plan to fill up the leaf any time soon."

Huang, Kai Jan. 8, 2021, 7:03 a.m. UTC | #16

On Thu, 7 Jan 2021 21:10:29 -0800 Dave Hansen wrote:
> On 1/7/21 6:00 PM, Kai Huang wrote:
> > On Thu, 7 Jan 2021 07:41:25 +0100 Borislav Petkov wrote:
> >> On Thu, Jan 07, 2021 at 12:09:46PM +1300, Kai Huang wrote:
> >>> There's no urgent request to support them for now (and given basic SGX
> >>> virtualization is not in upstream), but I don't know whether they need to be
> >>> supported in the future.
> >>
> >> If that is the case, then wasting a whole leaf for two bits doesn't make
> >> too much sense. And it looks like the kvm reverse lookup can be taught
> >> to deal with composing that leaf dynamically when needed instead.
> > 
> > I am not sure changing reverse lookup to handle dynamic would be acceptable. To
> > me it is ugly, and I don't have a first glance on how to do it. KVM can query
> > host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
> > straightforward as having X86_FEATURE_SGX1/2.
> 
> So, Boris was pretty direct here.  Could you please go spend a bit of
> time to see what it would take to make these dynamic?  You can check
> what our (Intel) plans are for this leaf, but if it's going to remain
> sparsely-used, we need to look into making the leaves a bit more dynamic.

I don't think reverse lookup can be made dyanmic, but like I said if we don't
have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.

The purpose of reverse lookup is to simplify KVM to have one common helper to
check whether guest's CPUID has particular hardware feature bit or not. For
instance, it changes guest_cpuid_has_xxx(cpuid) to guest_cpuid_has(cpuid,
X86_FEATURE_xxx), so KVM can get rid of bunch of dedicated
guest_cpuid_has_xxx() for each feature, but just use X86_FEATURE_xxx with one
function. W/o X86_FEATURE_SGX1/2, when dealing with them, KVM needs to have
dedicated functions but cannot use common one. That is a drawback for KVM.

Btw, one thing I forgot to say is with X86_FEATURE_SGX1/2, "sgx1" and "sgx2"
will be in /proc/cpuinfo auatomatically. I think showing "sgx2" (and other
future SGX features) in /proc/cpuinfo is helpful. W/o X86_FEATURE_SGX1/2, we
need specific handling, if we want to show them in /proc/cpuinfo.

That being said, if all those doesn't convince Boris and you guys, and Sean has
no say here, I'll remove X86_FEATURE_SGX1/2 in next version.

> 
> > And regarding to other bits of this leaf, to me: 1) we cannot rule out
> > possibility that bit 5 and bit 6 will be supported in the future; 2) I cannot
> > talk more but we cannot rule out the possibility that there will be other bits
> > introduced in the future.
> 
> From the Intel side, let's go look at the features that are coming.  We
> have a list of CPUID bits that have been dedicated to future CPU
> features.  Let's look if 1 of these bits or 30 is coming.  I don't think
> it's exactly a state secret approximately how many CPUID bits we *think*
> will get used in this leaf.
> 
> We can't exactly put together a roadmap of bits, microarchitectures and
> chip release dates.  But, we can at least say, "we have immediate plans
> for most of the leaf" or "we don't plan to fill up the leaf any time soon."

Borislav Petkov Jan. 8, 2021, 7:17 a.m. UTC | #17

On Fri, Jan 08, 2021 at 08:03:50PM +1300, Kai Huang wrote:
> > > I am not sure changing reverse lookup to handle dynamic would be acceptable. To
> > > me it is ugly, and I don't have a first glance on how to do it. KVM can query
> > > host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
> > > straightforward as having X86_FEATURE_SGX1/2.
> > 
> > So, Boris was pretty direct here.  Could you please go spend a bit of
> > time to see what it would take to make these dynamic?  You can check
> > what our (Intel) plans are for this leaf, but if it's going to remain
> > sparsely-used, we need to look into making the leaves a bit more dynamic.
> 
> I don't think reverse lookup can be made dyanmic, but like I said if we don't
> have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.

How about before you go and say that "it is ugly" and "don't think can
be made" you actually go and *really* try it first? Because actually
trying is sometimes faster than trying to find arguments against it. :)

Because I just did it and unless I'm missing something obvious - I
haven't actually tested it - this is not ugly at all and in the long run
it will become one big switch-case, which is perfectly fine.

---
diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 59bf91c57aa8..0bf5cb5441f8 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -30,6 +30,7 @@ enum cpuid_leafs
 	CPUID_7_ECX,
 	CPUID_8000_0007_EBX,
 	CPUID_7_EDX,
+	CPUID_12_EAX,	/* used only by KVM for now */
 };
 
 #ifdef CONFIG_X86_FEATURE_NAMES
diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 84b887825f12..1bc1ade64489 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -292,6 +292,8 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_SGX1		(11*32+ 8) /* SGX1 leaf functions */
+#define X86_FEATURE_SGX2		(11*32+ 9) /* SGX2 leaf functions */
 
 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dc921d76e42e..33c53a7411a1 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -63,8 +63,27 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
 	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
+	[CPUID_12_EAX]        = {      0x12, 0, CPUID_EAX},
 };
 
+/*
+ * Map a synthetic X86_FEATURE bit definition to the corresponding bit in the
+ * hardware CPUID leaf.
+ */
+static int map_synthetic_leaf(int x86_feature)
+{
+	switch (x86_feature) {
+	case X86_FEATURE_SGX1:	return BIT(0);
+	case X86_FEATURE_SGX2:	return BIT(1);
+	default:
+		break;
+	}
+
+	WARN_ON_ONCE(1);
+
+	return 0;
+}
+
 /*
  * Reverse CPUID and its derivatives can only be used for hardware-defined
  * feature words, i.e. words whose bits directly correspond to a CPUID leaf.
@@ -78,7 +97,6 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
 	BUILD_BUG_ON(x86_leaf == CPUID_LNX_1);
 	BUILD_BUG_ON(x86_leaf == CPUID_LNX_2);
 	BUILD_BUG_ON(x86_leaf == CPUID_LNX_3);
-	BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
 	BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
 	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
 }
@@ -91,8 +109,14 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
  */
 static __always_inline u32 __feature_bit(int x86_feature)
 {
-	reverse_cpuid_check(x86_feature / 32);
-	return 1 << (x86_feature & 31);
+	int leaf = x86_feature / 32;
+
+	reverse_cpuid_check(leaf);
+
+	if (leaf == CPUID_LNX_4)
+		return map_synthetic_leaf(x86_feature);
+
+	return BIT(x86_feature & 31);
 }
 
 #define feature_bit(name)  __feature_bit(X86_FEATURE_##name)

Huang, Kai Jan. 8, 2021, 8:06 a.m. UTC | #18

On Fri, 8 Jan 2021 08:17:22 +0100 Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 08:03:50PM +1300, Kai Huang wrote:
> > > > I am not sure changing reverse lookup to handle dynamic would be acceptable. To
> > > > me it is ugly, and I don't have a first glance on how to do it. KVM can query
> > > > host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
> > > > straightforward as having X86_FEATURE_SGX1/2.
> > > 
> > > So, Boris was pretty direct here.  Could you please go spend a bit of
> > > time to see what it would take to make these dynamic?  You can check
> > > what our (Intel) plans are for this leaf, but if it's going to remain
> > > sparsely-used, we need to look into making the leaves a bit more dynamic.
> > 
> > I don't think reverse lookup can be made dyanmic, but like I said if we don't
> > have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.
> 
> How about before you go and say that "it is ugly" and "don't think can
> be made" you actually go and *really* try it first? Because actually
> trying is sometimes faster than trying to find arguments against it. :)

THanks. Lesson learned :)

> 
> Because I just did it and unless I'm missing something obvious - I
> haven't actually tested it - this is not ugly at all and in the long run
> it will become one big switch-case, which is perfectly fine.

No offence, but using synthetic bits is a little bit hack to me,given they are
actually hardware feature bits. And using synthetic leaf in reverse lookup is
against current KVM code. 

I'll try my own  way in next version, but thank you for the insight! :)

> 
> ---
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 59bf91c57aa8..0bf5cb5441f8 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -30,6 +30,7 @@ enum cpuid_leafs
>  	CPUID_7_ECX,
>  	CPUID_8000_0007_EBX,
>  	CPUID_7_EDX,
> +	CPUID_12_EAX,	/* used only by KVM for now */
>  };
>  
>  #ifdef CONFIG_X86_FEATURE_NAMES
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 84b887825f12..1bc1ade64489 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -292,6 +292,8 @@
>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
>  #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
>  #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
> +#define X86_FEATURE_SGX1		(11*32+ 8) /* SGX1 leaf functions */
> +#define X86_FEATURE_SGX2		(11*32+ 9) /* SGX2 leaf functions */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dc921d76e42e..33c53a7411a1 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -63,8 +63,27 @@ static const struct cpuid_reg reverse_cpuid[] = {
>  	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
>  	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>  	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
> +	[CPUID_12_EAX]        = {      0x12, 0, CPUID_EAX},
>  };
>  
> +/*
> + * Map a synthetic X86_FEATURE bit definition to the corresponding bit in the
> + * hardware CPUID leaf.
> + */
> +static int map_synthetic_leaf(int x86_feature)
> +{
> +	switch (x86_feature) {
> +	case X86_FEATURE_SGX1:	return BIT(0);
> +	case X86_FEATURE_SGX2:	return BIT(1);
> +	default:
> +		break;
> +	}
> +
> +	WARN_ON_ONCE(1);
> +
> +	return 0;
> +}
> +
>  /*
>   * Reverse CPUID and its derivatives can only be used for hardware-defined
>   * feature words, i.e. words whose bits directly correspond to a CPUID leaf.
> @@ -78,7 +97,6 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
>  	BUILD_BUG_ON(x86_leaf == CPUID_LNX_1);
>  	BUILD_BUG_ON(x86_leaf == CPUID_LNX_2);
>  	BUILD_BUG_ON(x86_leaf == CPUID_LNX_3);
> -	BUILD_BUG_ON(x86_leaf == CPUID_LNX_4);
>  	BUILD_BUG_ON(x86_leaf >= ARRAY_SIZE(reverse_cpuid));
>  	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
>  }
> @@ -91,8 +109,14 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
>   */
>  static __always_inline u32 __feature_bit(int x86_feature)
>  {
> -	reverse_cpuid_check(x86_feature / 32);
> -	return 1 << (x86_feature & 31);
> +	int leaf = x86_feature / 32;
> +
> +	reverse_cpuid_check(leaf);
> +
> +	if (leaf == CPUID_LNX_4)
> +		return map_synthetic_leaf(x86_feature);
> +
> +	return BIT(x86_feature & 31);
>  }
>  
>  #define feature_bit(name)  __feature_bit(X86_FEATURE_##name)
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Borislav Petkov Jan. 8, 2021, 8:13 a.m. UTC | #19

On Fri, Jan 08, 2021 at 09:06:47PM +1300, Kai Huang wrote:
> No offence, but using synthetic bits is a little bit hack to me,given
> they are actually hardware feature bits.

Why?

Perhaps you need to have a look at Documentation/x86/cpuinfo.rst first.

> And using synthetic leaf in reverse lookup is against current KVM
> code.

You know how the kernel gets improved each day and old limitations are
not valid anymore?

> I'll try my own  way in next version, but thank you for the insight! :)

Feel free but remember to keep it simple. You can use mine too, if you
want to, as long as you attribute it with a Suggested-by or so.

Huang, Kai Jan. 8, 2021, 9 a.m. UTC | #20

On Fri, 8 Jan 2021 09:13:14 +0100 Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 09:06:47PM +1300, Kai Huang wrote:
> > No offence, but using synthetic bits is a little bit hack to me,given
> > they are actually hardware feature bits.
> 
> Why?
> 
> Perhaps you need to have a look at Documentation/x86/cpuinfo.rst first.

Will take a look. Thanks.

> 
> > And using synthetic leaf in reverse lookup is against current KVM
> > code.
> 
> You know how the kernel gets improved each day and old limitations are
> not valid anymore?
> 
> > I'll try my own  way in next version, but thank you for the insight! :)
> 
> Feel free but remember to keep it simple. You can use mine too, if you
> want to, as long as you attribute it with a Suggested-by or so.

OK. Thanks. 

> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Sean Christopherson Jan. 8, 2021, 11:55 p.m. UTC | #21

On Fri, Jan 08, 2021, Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 08:03:50PM +1300, Kai Huang wrote:
> > > > I am not sure changing reverse lookup to handle dynamic would be acceptable. To
> > > > me it is ugly, and I don't have a first glance on how to do it. KVM can query
> > > > host CPUID when dealing with SGX w/o X86_FEATURE_SGX1/2, but it is not as
> > > > straightforward as having X86_FEATURE_SGX1/2.
> > > 
> > > So, Boris was pretty direct here.  Could you please go spend a bit of
> > > time to see what it would take to make these dynamic?  You can check
> > > what our (Intel) plans are for this leaf, but if it's going to remain
> > > sparsely-used, we need to look into making the leaves a bit more dynamic.

To be fair, this is the third time we've got conflicting, direct feedback on
this exact issue.  I do agree that it doesn't make sense to burn a whole word
for just two features, I guess I just feel like whining.

[*] https://lore.kernel.org/kvm/20180828102140.GA31102@nazgul.tnic/
[*] https://lore.kernel.org/linux-sgx/20190924162520.GJ19317@zn.tnic/

> > I don't think reverse lookup can be made dyanmic, but like I said if we don't
> > have X86_FEATURE_SGX1/2, KVM needs to query raw CPUID when dealing with SGX.
> 
> How about before you go and say that "it is ugly" and "don't think can
> be made" you actually go and *really* try it first? Because actually
> trying is sometimes faster than trying to find arguments against it. :)
> 
> Because I just did it and unless I'm missing something obvious - I
> haven't actually tested it - this is not ugly at all and in the long run
> it will become one big switch-case, which is perfectly fine.
>
> ---
> diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
> index 59bf91c57aa8..0bf5cb5441f8 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -30,6 +30,7 @@ enum cpuid_leafs
>  	CPUID_7_ECX,
>  	CPUID_8000_0007_EBX,
>  	CPUID_7_EDX,
> +	CPUID_12_EAX,	/* used only by KVM for now */
>  };
>  
>  #ifdef CONFIG_X86_FEATURE_NAMES
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index 84b887825f12..1bc1ade64489 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -292,6 +292,8 @@
>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
>  #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
>  #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
> +#define X86_FEATURE_SGX1		(11*32+ 8) /* SGX1 leaf functions */
> +#define X86_FEATURE_SGX2		(11*32+ 9) /* SGX2 leaf functions */
>  
>  /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
>  #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dc921d76e42e..33c53a7411a1 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -63,8 +63,27 @@ static const struct cpuid_reg reverse_cpuid[] = {
>  	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
>  	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
>  	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
> +	[CPUID_12_EAX]        = {      0x12, 0, CPUID_EAX},
>  };

As is, this won't build (if KVM uses the features) because KVM cares about where
the feature actually lives in Linux's words.  The addition of CPUID_12_EAX is
unnecessary, and the new entry in reverse_cpuid would need to be
s/CPUID_12_EAX/CPUID_LNX_4.

That being said, I dislike this approach as it introduces fragility into KVM's
CPUID shenanigans.  E.g. fixing the above will make guest_cpuid_has() functional,
but kvm_cpu_cap_mask() and cpuid_entry_override() will not work as expected.

Here's a more involved approach that I believe will work (compile tested only)
and retains KVM's build magic.  Idea is to allocate a word kvm_cpu_caps for the
hardware-defined, Linux-scattered features, and use boot_cpu_has() to bridge the
gap when populating kvm_cpu_caps.

Another alternative would be to have KVM use boot_cpu_has() for everything, and
omit the memcpy from boot_cpu_data.x86_capability -> kvm_cpu_caps.  That would
eliminate some of the special logic for scattered features, but it adds nearly
3k bytes to kvm_set_cpu_caps(), which is hard to stomach even though it's
effectively one-and-done code.


From 6bdd61e23f1c0bd7519a3a6391c95cde5456f79d Mon Sep 17 00:00:00 2001
From: Sean Christopherson <seanjc@google.com>
Date: Fri, 8 Jan 2021 15:46:11 -0800
Subject: [PATCH] KVM: x86: Add support for reverse CPUID lookup of scattered
 features

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/include/asm/cpufeatures.h |  2 ++
 arch/x86/kvm/cpuid.c               | 36 ++++++++++++++++++---
 arch/x86/kvm/cpuid.h               | 50 +++++++++++++++++++++++++++---
 3 files changed, 78 insertions(+), 10 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index 9f9e9511f7cd..2fe57736d644 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -291,6 +291,8 @@
 #define X86_FEATURE_FENCE_SWAPGS_KERNEL	(11*32+ 5) /* "" LFENCE in kernel entry SWAPGS path */
 #define X86_FEATURE_SPLIT_LOCK_DETECT	(11*32+ 6) /* #AC for split lock */
 #define X86_FEATURE_PER_THREAD_MBA	(11*32+ 7) /* "" Per-thread Memory Bandwidth Allocation */
+#define X86_FEATURE_SGX1                (11*32+ 8) /* SGX1 leafs */
+#define X86_FEATURE_SGX2        	(11*32+ 9) /* SGX2 leafs */

 /* Intel-defined CPU features, CPUID level 0x00000007:1 (EAX), word 12 */
 #define X86_FEATURE_AVX512_BF16		(12*32+ 5) /* AVX512 BFLOAT16 instructions */
diff --git a/arch/x86/kvm/cpuid.c b/arch/x86/kvm/cpuid.c
index 13036cf0b912..4e647524f302 100644
--- a/arch/x86/kvm/cpuid.c
+++ b/arch/x86/kvm/cpuid.c
@@ -28,7 +28,7 @@
  * Unlike "struct cpuinfo_x86.x86_capability", kvm_cpu_caps doesn't need to be
  * aligned to sizeof(unsigned long) because it's not accessed via bitops.
  */
-u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
+u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 EXPORT_SYMBOL_GPL(kvm_cpu_caps);

 static u32 xstate_required_size(u64 xstate_bv, bool compacted)
@@ -53,6 +53,7 @@ static u32 xstate_required_size(u64 xstate_bv, bool compacted)
 }

 #define F feature_bit
+#define SF(name) (boot_cpu_has(X86_FEATURE_##name) ? F(name) : 0)

 static inline struct kvm_cpuid_entry2 *cpuid_entry2_find(
 	struct kvm_cpuid_entry2 *entries, int nent, u32 function, u32 index)
@@ -331,13 +332,13 @@ int kvm_vcpu_ioctl_get_cpuid2(struct kvm_vcpu *vcpu,
 	return r;
 }

-static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+/* Mask kvm_cpu_caps for @leaf with the raw CPUID capabilities of this CPU. */
+static __always_inline void __kvm_cpu_cap_mask(enum cpuid_leafs leaf)
 {
 	const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
 	struct kvm_cpuid_entry2 entry;

 	reverse_cpuid_check(leaf);
-	kvm_cpu_caps[leaf] &= mask;

 	cpuid_count(cpuid.function, cpuid.index,
 		    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
@@ -345,6 +346,26 @@ static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
 	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
 }

+static __always_inline void kvm_cpu_cap_mask(enum cpuid_leafs leaf, u32 mask)
+{
+	/* Use the "init" variant for scattered leafs. */
+	BUILD_BUG_ON(leaf >= NCAPINTS);
+
+	kvm_cpu_caps[leaf] &= mask;
+
+	__kvm_cpu_cap_mask(leaf);
+}
+
+static __always_inline void kvm_cpu_cap_init(enum cpuid_leafs leaf, u32 mask)
+{
+	/* Use the "mask" variant for hardwared-defined leafs. */
+	BUILD_BUG_ON(leaf < NCAPINTS);
+
+	kvm_cpu_caps[leaf] = mask;
+
+	__kvm_cpu_cap_mask(leaf);
+}
+
 void kvm_set_cpu_caps(void)
 {
 	unsigned int f_nx = is_efer_nx() ? F(NX) : 0;
@@ -355,12 +376,13 @@ void kvm_set_cpu_caps(void)
 	unsigned int f_gbpages = 0;
 	unsigned int f_lm = 0;
 #endif
+	memset(kvm_cpu_caps, 0, sizeof(kvm_cpu_caps));

-	BUILD_BUG_ON(sizeof(kvm_cpu_caps) >
+	BUILD_BUG_ON(sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)) >
 		     sizeof(boot_cpu_data.x86_capability));

 	memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
-	       sizeof(kvm_cpu_caps));
+	       sizeof(kvm_cpu_caps) - (NKVMCAPINTS * sizeof(*kvm_cpu_caps)));

 	kvm_cpu_cap_mask(CPUID_1_ECX,
 		/*
@@ -503,6 +525,10 @@ void kvm_set_cpu_caps(void)
 		F(ACE2) | F(ACE2_EN) | F(PHE) | F(PHE_EN) |
 		F(PMM) | F(PMM_EN)
 	);
+
+	kvm_cpu_cap_init(CPUID_12_EAX,
+		SF(SGX1) | SF(SGX2)
+	);
 }
 EXPORT_SYMBOL_GPL(kvm_set_cpu_caps);

diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
index dc921d76e42e..21f92d81d5a5 100644
--- a/arch/x86/kvm/cpuid.h
+++ b/arch/x86/kvm/cpuid.h
@@ -7,7 +7,25 @@
 #include <asm/processor.h>
 #include <uapi/asm/kvm_para.h>

-extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
+/*
+ * Hardware-defined CPUID leafs that are scattered in the kernel, but need to
+ * be directly by KVM.  Note, these word values conflict with the kernel's
+ * "bug" caps, but KVM doesn't use those.
+ */
+enum kvm_only_cpuid_leafs {
+	CPUID_12_EAX	 = NCAPINTS,
+	NR_KVM_CPU_CAPS,
+
+	NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+};
+
+#define X86_KVM_FEATURE(w, f)		((w)*32 + (f))
+
+/* Intel-defined SGX sub-features, CPUID level 0x12 (EAX). */
+#define __X86_FEATURE_SGX1		X86_KVM_FEATURE(CPUID_12_EAX, 0)
+#define __X86_FEATURE_SGX2		X86_KVM_FEATURE(CPUID_12_EAX, 1)
+
+extern u32 kvm_cpu_caps[NR_KVM_CPU_CAPS] __read_mostly;
 void kvm_set_cpu_caps(void);

 void kvm_update_cpuid_runtime(struct kvm_vcpu *vcpu);
@@ -63,6 +81,7 @@ static const struct cpuid_reg reverse_cpuid[] = {
 	[CPUID_8000_0007_EBX] = {0x80000007, 0, CPUID_EBX},
 	[CPUID_7_EDX]         = {         7, 0, CPUID_EDX},
 	[CPUID_7_1_EAX]       = {         7, 1, CPUID_EAX},
+	[CPUID_12_EAX]        = {0x00000012, 0, CPUID_EAX},
 };

 /*
@@ -83,6 +102,25 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
 	BUILD_BUG_ON(reverse_cpuid[x86_leaf].function == 0);
 }

+/*
+ * A handful of feature bits are scattered in the kernel's cpufeatures word,
+ * translate them to KVM features that align with the hardware definitions.
+ */
+static __always_inline u32 __feature_translate(int x86_feature)
+{
+	if (x86_feature == X86_FEATURE_SGX1)
+		return __X86_FEATURE_SGX1;
+	else if (x86_feature == X86_FEATURE_SGX2)
+		return __X86_FEATURE_SGX2;
+
+	return x86_feature;
+}
+
+static __always_inline u32 __feature_leaf(int x86_feature)
+{
+	return __feature_translate(x86_feature) / 32;
+}
+
 /*
  * Retrieve the bit mask from an X86_FEATURE_* definition.  Features contain
  * the hardware defined bit number (stored in bits 4:0) and a software defined
@@ -91,6 +129,8 @@ static __always_inline void reverse_cpuid_check(unsigned int x86_leaf)
  */
 static __always_inline u32 __feature_bit(int x86_feature)
 {
+	x86_feature = __feature_translate(x86_feature);
+
 	reverse_cpuid_check(x86_feature / 32);
 	return 1 << (x86_feature & 31);
 }
@@ -99,7 +139,7 @@ static __always_inline u32 __feature_bit(int x86_feature)

 static __always_inline struct cpuid_reg x86_feature_cpuid(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	return reverse_cpuid[x86_leaf];
@@ -291,7 +331,7 @@ static inline bool cpuid_fault_enabled(struct kvm_vcpu *vcpu)

 static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] &= ~__feature_bit(x86_feature);
@@ -299,7 +339,7 @@ static __always_inline void kvm_cpu_cap_clear(unsigned int x86_feature)

 static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	kvm_cpu_caps[x86_leaf] |= __feature_bit(x86_feature);
@@ -307,7 +347,7 @@ static __always_inline void kvm_cpu_cap_set(unsigned int x86_feature)

 static __always_inline u32 kvm_cpu_cap_get(unsigned int x86_feature)
 {
-	unsigned int x86_leaf = x86_feature / 32;
+	unsigned int x86_leaf = __feature_leaf(x86_feature);

 	reverse_cpuid_check(x86_leaf);
 	return kvm_cpu_caps[x86_leaf] & __feature_bit(x86_feature);
--
2.30.0.284.gd98b1dd5eaa7-goog

Borislav Petkov Jan. 9, 2021, 12:35 a.m. UTC | #22

On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> To be fair, this is the third time we've got conflicting, direct feedback on
> this exact issue.  I do agree that it doesn't make sense to burn a whole word
> for just two features, I guess I just feel like whining.
> 
> [*] https://lore.kernel.org/kvm/20180828102140.GA31102@nazgul.tnic/
> [*] https://lore.kernel.org/linux-sgx/20190924162520.GJ19317@zn.tnic/

Well, sorry that I confused you guys but in hindsight we probably should
have stopped you right then and there from imposing kvm requirements on
the machinery behind *_cpu_has() and kvm should have been a regular user
of those interfaces like the rest of the kernel code - nothing more.

And if you'd like to do your own X86_FEATURE_* querying but then extend
it with its own functionality, then that should have been decoupled.

And I will look at your patch later when brain is actually awake but
I strongly feel that in order to avoid such situations in the future,
*_cpu_has() internal functionality should be separate from kvm's
respective CPUID leafs representation. For obvious reasons.

And if there should be some partial sharing - if that makes sense at all
- then that should be first agreed upon.

Thx.

Sean Christopherson Jan. 9, 2021, 1:01 a.m. UTC | #23

On Sat, Jan 09, 2021, Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> > To be fair, this is the third time we've got conflicting, direct feedback on
> > this exact issue.  I do agree that it doesn't make sense to burn a whole word
> > for just two features, I guess I just feel like whining.
> > 
> > [*] https://lore.kernel.org/kvm/20180828102140.GA31102@nazgul.tnic/
> > [*] https://lore.kernel.org/linux-sgx/20190924162520.GJ19317@zn.tnic/
> 
> Well, sorry that I confused you guys but in hindsight we probably should
> have stopped you right then and there from imposing kvm requirements on
> the machinery behind *_cpu_has() and kvm should have been a regular user
> of those interfaces like the rest of the kernel code - nothing more.
> 
> And if you'd like to do your own X86_FEATURE_* querying but then extend
> it with its own functionality, then that should have been decoupled.
> 
> And I will look at your patch later when brain is actually awake but
> I strongly feel that in order to avoid such situations in the future,
> *_cpu_has() internal functionality should be separate from kvm's
> respective CPUID leafs representation. For obvious reasons.

I kinda agree, but I'd prefer not to fully decouple KVM's CPUID stuff.  The more
manual definitions/translations we have to create, the more likely it is that
we'll screw something up.

> And if there should be some partial sharing - if that makes sense at all
> - then that should be first agreed upon.

Assuming the code I wrote actually works, I think that gets KVM to the point
where handling scattered features isn't awful, which should eliminate most of
the friction.  KVM would still be relying on the internals of *_cpu_has(), but
there are quite a few build-time assertions that help keep things aligned.  And,
what's best for the kernel will be what's best for KVM the vast majority of the
time, e.g. I don't anticipate the kernel scattering densely populated words just
for giggles.

Borislav Petkov Jan. 9, 2021, 1:19 a.m. UTC | #24

On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> index dc921d76e42e..21f92d81d5a5 100644
> --- a/arch/x86/kvm/cpuid.h
> +++ b/arch/x86/kvm/cpuid.h
> @@ -7,7 +7,25 @@
>  #include <asm/processor.h>
>  #include <uapi/asm/kvm_para.h>
> 
> -extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
> +/*
> + * Hardware-defined CPUID leafs that are scattered in the kernel, but need to
> + * be directly by KVM.  Note, these word values conflict with the kernel's
> + * "bug" caps, but KVM doesn't use those.

This feels like another conflict waiting to happen if KVM decides to use
them at some point...

So let me get this straight: KVM wants to use X86_FEATURE_* which
means, those numbers must map to the respective words in its CPUID caps
representation kvm_cpu_caps, AFAICT.

Then, it wants the leafs to correspond to the hardware leafs layout so
that it can do:

	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);

which comes straight from CPUID.

So lemme look at one word:

        kvm_cpu_cap_mask(CPUID_1_EDX,
                F(FPU) | F(VME) | F(DE) | F(PSE) |
                F(TSC) | F(MSR) | F(PAE) | F(MCE) |
		...

it would build the bitmask of the CPUID leaf using X86_FEATURE_* bits
and then mask it out with the hardware leaf read from CPUID.

But why?

Why doesn't it simply build those leafs in kvm_cpu_caps from the leafs
we've already queried?

Oh it does so a bit earlier:

        memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
               sizeof(kvm_cpu_caps));

and that kvm_cpu_cap_mask() call is to clear some bits in kvm_cpu_caps
which is kvm-specific thing (not supported stuff etc).

But then why does kvm_cpu_cap_mask() does cpuid_count()? Didn't it just
read the bits from boot_cpu_data.x86_capability? And those bits we do
query and massage extensively during boot. So why does KVM needs to
query CPUID again instead of using what we've already queried?

Maybe I'm missing something kvm-specific.

In any case, this feels somewhat weird: you have *_cpu_has() on
baremetal abstracting almost completely from CPUID by collecting all
feature bits it needs into its own structure - x86_capability[] along
with accessors for it - and then you want to "abstract back" to CPUID
leafs from that interface. I wonder why.

Anyway, more questions tomorrow.

Gnight and good luck. :)

Sean Christopherson Jan. 11, 2021, 5:54 p.m. UTC | #25

On Sat, Jan 09, 2021, Borislav Petkov wrote:
> On Fri, Jan 08, 2021 at 03:55:52PM -0800, Sean Christopherson wrote:
> > diff --git a/arch/x86/kvm/cpuid.h b/arch/x86/kvm/cpuid.h
> > index dc921d76e42e..21f92d81d5a5 100644
> > --- a/arch/x86/kvm/cpuid.h
> > +++ b/arch/x86/kvm/cpuid.h
> > @@ -7,7 +7,25 @@
> >  #include <asm/processor.h>
> >  #include <uapi/asm/kvm_para.h>
> > 
> > -extern u32 kvm_cpu_caps[NCAPINTS] __read_mostly;
> > +/*
> > + * Hardware-defined CPUID leafs that are scattered in the kernel, but need to
> > + * be directly by KVM.  Note, these word values conflict with the kernel's
> > + * "bug" caps, but KVM doesn't use those.
> 
> This feels like another conflict waiting to happen if KVM decides to use
> them at some point...

Yes, but KVM including the bug caps in kvm_cpu_caps is extremely unlikely, and
arguably flat out wrong.  Currently, kvm_cpu_caps includes only CPUID-based
features that can be exposed direcly to the guest.  I could see a scenario where
KVM exposed "bug" capabilities to the guest via a paravirt interface, but I
would expect that KVM would either filter and expose the kernel's bug caps
without userspace input, or would add a KVM-defined paravirt CPUID leaf to
enumerate the caps and track _that_ in kvm_cpu_caps.

Anyways, I agree that overlapping the bug caps it's a bit of unnecessary
cleverness.  I'm not opposed to incorporating NBUGINTS into KVM, but that would
mean explicitly pulling in even more x86_capability implementation details.

> So let me get this straight: KVM wants to use X86_FEATURE_* which
> means, those numbers must map to the respective words in its CPUID caps
> representation kvm_cpu_caps, AFAICT.

That part is deliberate and isn't a dependency so much as how things are
implemented.  The true dependency is on the bit offsets within each word.  The
kernel could completely rescramble the word numbering and KVM would chug along
happily.  What KVM won't play nice with is if the kernel broke up a hardware-
defined, gathered CPUID leaf/word into scattered features spread out amongst
multiple Linux-defined words.

> Then, it wants the leafs to correspond to the hardware leafs layout so
> that it can do:
> 
> 	kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
> 
> which comes straight from CPUID.
> 
> So lemme look at one word:
> 
>         kvm_cpu_cap_mask(CPUID_1_EDX,
>                 F(FPU) | F(VME) | F(DE) | F(PSE) |
>                 F(TSC) | F(MSR) | F(PAE) | F(MCE) |
> 		...
> 
> 
> it would build the bitmask of the CPUID leaf using X86_FEATURE_* bits
> and then mask it out with the hardware leaf read from CPUID.
> 
> But why?
> 
> Why doesn't it simply build those leafs in kvm_cpu_caps from the leafs
> we've already queried?
> 
> Oh it does so a bit earlier:
> 
>         memcpy(&kvm_cpu_caps, &boot_cpu_data.x86_capability,
>                sizeof(kvm_cpu_caps));
> 
> and that kvm_cpu_cap_mask() call is to clear some bits in kvm_cpu_caps
> which is kvm-specific thing (not supported stuff etc).
> 
> But then why does kvm_cpu_cap_mask() does cpuid_count()? Didn't it just
> read the bits from boot_cpu_data.x86_capability? And those bits we do
> query and massage extensively during boot. So why does KVM needs to
> query CPUID again instead of using what we've already queried?

It's mostly historical; before the kvm_cpu_caps concept was introduced, the code
had grown organically to include both boot_cpu_data and raw CPUID info.  The
vast, vast majority of the time, doing CPUID is likely redundant.  But, as noted
in commit d8577a4c238f ("KVM: x86: Do host CPUID at load time to mask KVM cpu
caps"), the code is quite cheap and runs once at KVM load.  My argument back
then was, and still is, that an extra bit of paranoia is justified since the
code and operations are quite nearly free.

This particular dependency can be broken, and quite easily at that.  Rather than
memcpy() boot_cpu_data.x86_capability, it's trivially easy to redefine the F()
macro to invoke boot_cpu_has(), which would allow dropping the memcpy().  The
big downside, and why I didn't post the code, is that doing so means every
feature routed through F() requires some form of BT+Jcc (or CMOVcc) sequence,
whereas the mempcy() approach allows the F() features to be encoded as a single
literal by the compiler.

From a latency perspective, the extra code is negligible.  The big issue is that
all those extra checks add 2k+ bytes of code.  Eliminating the mempcy() doesn't
actually break KVM's dependency on the bit offsets, so we'd be bloating kvm.ko
by a noticeable amount without providing substantial value.

And, this behavior is mostly opportunistic; the true justification/motiviation
for taking a dependency on the X86_FEATURE_* bit offsets is for communication
with userspace, querying the guest CPU model, and runtime checks.

> Maybe I'm missing something kvm-specific.
> 
> In any case, this feels somewhat weird: you have *_cpu_has() on
> baremetal abstracting almost completely from CPUID by collecting all
> feature bits it needs into its own structure - x86_capability[] along
> with accessors for it - and then you want to "abstract back" to CPUID
> leafs from that interface. I wonder why.

It's effectively for communication with userspace.  Userspace, via ioctl(),
dictates the vCPU model to KVM, including the exact CPUID results.  to properly
virtualize/emulate the defined vCPU model, KVM must query the dictated CPUID
results to determine what features are supported, what guest operations
should fault, etc...  E.g. if the vCPU model, via CPUID, states that SMEP isn't
supported then KVM needs to inject a #GP if the guest attempts to set CR4.SMEP.

KVM also uses the hardware-defined CPUID ABI to advertise which features are
supported by both hardware and KVM.  This is the kvm_cpu_cap stuff, where KVM
reads boot_cpu_data to see what features were enabled by the kernel.

It would be possible for KVM to break the dependency on X86_FEATURE_* bit
offsets by defining a translation layer, but I strongly feel that adding manual
translations will do more harm than good as it increases the odds of us botching
a translation or using the wrong feature flag, creates potential namespace
conflicts, etc...

Borislav Petkov Jan. 11, 2021, 7:09 p.m. UTC | #26

On Mon, Jan 11, 2021 at 09:54:17AM -0800, Sean Christopherson wrote:
> Yes, but KVM including the bug caps in kvm_cpu_caps is extremely unlikely, and
> arguably flat out wrong.  Currently, kvm_cpu_caps includes only CPUID-based
> features that can be exposed direcly to the guest.  I could see a scenario where
> KVM exposed "bug" capabilities to the guest via a paravirt interface, but I
> would expect that KVM would either filter and expose the kernel's bug caps
> without userspace input, or would add a KVM-defined paravirt CPUID leaf to
> enumerate the caps and track _that_ in kvm_cpu_caps.
> 
> Anyways, I agree that overlapping the bug caps it's a bit of unnecessary
> cleverness.  I'm not opposed to incorporating NBUGINTS into KVM, but that would
> mean explicitly pulling in even more x86_capability implementation details.

Also, the kernel and kvm being part of it :) kinda tries to fix those
bugs and not expose them to the guest so exposing a bug would probably
be only for testing purposes...

> That part is deliberate and isn't a dependency so much as how things are
> implemented.  The true dependency is on the bit offsets within each word. 

Right.

> The kernel could completely rescramble the word numbering and KVM
> would chug along happily. What KVM won't play nice with is if the
> kernel broke up a hardware- defined, gathered CPUID leaf/word into
> scattered features spread out amongst multiple Linux-defined words.

Yes, kvm wants the bits just as they are in the CPUID leafs from the hw.

> It's mostly historical; before the kvm_cpu_caps concept was introduced, the code
> had grown organically to include both boot_cpu_data and raw CPUID info.  The
> vast, vast majority of the time, doing CPUID is likely redundant.  But, as noted
> in commit d8577a4c238f ("KVM: x86: Do host CPUID at load time to mask KVM cpu
> caps"), the code is quite cheap and runs once at KVM load.  My argument back
> then was, and still is, that an extra bit of paranoia is justified since the
> code and operations are quite nearly free.

Ok.

> This particular dependency can be broken, and quite easily at that.  Rather than
> memcpy() boot_cpu_data.x86_capability, it's trivially easy to redefine the F()
> macro to invoke boot_cpu_has(), which would allow dropping the memcpy().  The
> big downside, and why I didn't post the code, is that doing so means every
> feature routed through F() requires some form of BT+Jcc (or CMOVcc) sequence,
> whereas the mempcy() approach allows the F() features to be encoded as a single
> literal by the compiler.
> 
> From a latency perspective, the extra code is negligible.  The big issue is that
> all those extra checks add 2k+ bytes of code.  Eliminating the mempcy() doesn't
> actually break KVM's dependency on the bit offsets, so we'd be bloating kvm.ko
> by a noticeable amount without providing substantial value.
> 
> And, this behavior is mostly opportunistic; the true justification/motiviation
> for taking a dependency on the X86_FEATURE_* bit offsets is for communication
> with userspace, querying the guest CPU model, and runtime checks.

Ok, I guess we'll try to find a middle ground here and not let stuff
grow too ugly to live.

> It's effectively for communication with userspace.  Userspace, via ioctl(),
> dictates the vCPU model to KVM, including the exact CPUID results. 

And using the CPUID leafs with the exact bit positions is sort of an
"interface" there, I see.

> to properly
> virtualize/emulate the defined vCPU model, KVM must query the dictated CPUID
> results to determine what features are supported, what guest operations
> should fault, etc...  E.g. if the vCPU model, via CPUID, states that SMEP isn't
> supported then KVM needs to inject a #GP if the guest attempts to set CR4.SMEP.
> 
> KVM also uses the hardware-defined CPUID ABI to advertise which features are
> supported by both hardware and KVM.  This is the kvm_cpu_cap stuff, where KVM
> reads boot_cpu_data to see what features were enabled by the kernel.

Right.

> It would be possible for KVM to break the dependency on X86_FEATURE_* bit
> offsets by defining a translation layer, but I strongly feel that adding manual
> translations will do more harm than good as it increases the odds of us botching
> a translation or using the wrong feature flag, creates potential namespace
> conflicts, etc...

Ok, lemme see if we might encounter more issues down the road...

+enum kvm_only_cpuid_leafs {
+       CPUID_12_EAX     = NCAPINTS,
+       NR_KVM_CPU_CAPS,
+
+       NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
+};
+

What happens when we decide to allocate a separate leaf for CPUID_12_EAX
down the road?

You do it already here

Subject: [PATCH 04/13] x86/cpufeatures: Assign dedicated feature word for AMD mem encryption

for the AMD leaf.

I'm thinking this way around - from scattered to a hw one - should be ok
because that should work easily. The other way around, taking a hw leaf
and scattering it around x86_capability[] array elems would probably be
nasty but with your change that should work too.

Yah, I'm just hypothesizing here - I don't think this "other way around"
will ever happen...

Hmm, yap, I can cautiously say that with your change we should be ok...

Thx.

Sean Christopherson Jan. 11, 2021, 7:20 p.m. UTC | #27

On Mon, Jan 11, 2021, Borislav Petkov wrote:
> On Mon, Jan 11, 2021 at 09:54:17AM -0800, Sean Christopherson wrote:
> > It would be possible for KVM to break the dependency on X86_FEATURE_* bit
> > offsets by defining a translation layer, but I strongly feel that adding manual
> > translations will do more harm than good as it increases the odds of us botching
> > a translation or using the wrong feature flag, creates potential namespace
> > conflicts, etc...
> 
> Ok, lemme see if we might encounter more issues down the road...
> 
> +enum kvm_only_cpuid_leafs {
> +       CPUID_12_EAX     = NCAPINTS,
> +       NR_KVM_CPU_CAPS,
> +
> +       NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> +};
> +
> 
> What happens when we decide to allocate a separate leaf for CPUID_12_EAX
> down the road?

Well, mechanically, that would generate a build failure if the kernel does the
obvious things and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
be an obvious clue that KVM should be updated.

If the kernel named the enum entry something different, and we botched the code
review, KVM would continue to work, but would unnecessarily copy the bits it
cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
__X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
terrible.

> You do it already here
> 
> Subject: [PATCH 04/13] x86/cpufeatures: Assign dedicated feature word for AMD mem encryption
> 
> for the AMD leaf.
> 
> I'm thinking this way around - from scattered to a hw one - should be ok
> because that should work easily. The other way around, taking a hw leaf
> and scattering it around x86_capability[] array elems would probably be
> nasty but with your change that should work too.
> 
> Yah, I'm just hypothesizing here - I don't think this "other way around"
> will ever happen...
> 
> Hmm, yap, I can cautiously say that with your change we should be ok...
> 
> Thx.
> 
> -- 
> Regards/Gruss,
>     Boris.
> 
> https://people.kernel.org/tglx/notes-about-netiquette

Jarkko Sakkinen Jan. 11, 2021, 11:39 p.m. UTC | #28

On Wed, Jan 06, 2021 at 02:55:21PM +1300, Kai Huang wrote:
> From: Sean Christopherson <sean.j.christopherson@intel.com>
> 
> Add a feature word to hold SGX features enumerated via CPUID.0x12.0x0,
> along with flags for SGX1 and SGX2. As part of virtualizing SGX, KVM
> needs to expose the SGX CPUID leafs to its guest. SGX1 and SGX2 need to
> be in a dedicated feature word so that they can be queried via KVM's
> reverse CPUID lookup to properly emulate the expected guest behavior.
> 
> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com>
>  [Kai: Also clear SGX1 and SGX2 bits in clear_sgx_caps().]
> Signed-off-by: Kai Huang <kai.huang@intel.com>

Acked-by: Jarkko Sakkinen <jarkko@kernel.org>

/Jarkko

Huang, Kai Jan. 12, 2021, 2:01 a.m. UTC | #29

On Mon, 11 Jan 2021 11:20:11 -0800 Sean Christopherson wrote:
> On Mon, Jan 11, 2021, Borislav Petkov wrote:
> > On Mon, Jan 11, 2021 at 09:54:17AM -0800, Sean Christopherson wrote:
> > > It would be possible for KVM to break the dependency on X86_FEATURE_* bit
> > > offsets by defining a translation layer, but I strongly feel that adding manual
> > > translations will do more harm than good as it increases the odds of us botching
> > > a translation or using the wrong feature flag, creates potential namespace
> > > conflicts, etc...
> > 
> > Ok, lemme see if we might encounter more issues down the road...
> > 
> > +enum kvm_only_cpuid_leafs {
> > +       CPUID_12_EAX     = NCAPINTS,
> > +       NR_KVM_CPU_CAPS,
> > +
> > +       NKVMCAPINTS = NR_KVM_CPU_CAPS - NCAPINTS,
> > +};
> > +
> > 
> > What happens when we decide to allocate a separate leaf for CPUID_12_EAX
> > down the road?
> 
> Well, mechanically, that would generate a build failure if the kernel does the
> obvious things and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
> be an obvious clue that KVM should be updated.
> 
> If the kernel named the enum entry something different, and we botched the code
> review, KVM would continue to work, but would unnecessarily copy the bits it
> cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
> __X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
> terrible.

Should we add a dedicated, i.e. kvm_scattered_cpu_caps[], instead of using
existing kvm_cpu_cap[NCAPINTS]? If so this issue can be avoided??

> 
> > You do it already here
> > 
> > Subject: [PATCH 04/13] x86/cpufeatures: Assign dedicated feature word for AMD mem encryption
> > 
> > for the AMD leaf.
> > 
> > I'm thinking this way around - from scattered to a hw one - should be ok
> > because that should work easily. The other way around, taking a hw leaf
> > and scattering it around x86_capability[] array elems would probably be
> > nasty but with your change that should work too.
> > 
> > Yah, I'm just hypothesizing here - I don't think this "other way around"
> > will ever happen...
> > 
> > Hmm, yap, I can cautiously say that with your change we should be ok...
> > 
> > Thx.
> > 
> > -- 
> > Regards/Gruss,
> >     Boris.
> > 
> > https://people.kernel.org/tglx/notes-about-netiquette

Borislav Petkov Jan. 12, 2021, 12:13 p.m. UTC | #30

On Mon, Jan 11, 2021 at 11:20:11AM -0800, Sean Christopherson wrote:
> Well, mechanically, that would generate a build failure if the kernel does the
> obvious things and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
> be an obvious clue that KVM should be updated.

Then we need to properly document that whenever someone does that
change, someone needs to touch the proper places.

> If the kernel named the enum entry something different, and we botched the code
> review, KVM would continue to work, but would unnecessarily copy the bits it
> cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
> __X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
> terrible.

Right, which reminds me: with your prototype patch, we would have:

static __always_inline void __kvm_cpu_cap_mask(enum cpuid_leafs leaf)
{
        const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
        struct kvm_cpuid_entry2 entry;

        reverse_cpuid_check(leaf);

        cpuid_count(cpuid.function, cpuid.index,
                    &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);

        kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
}

which does read CPUID from the hw and kvm_cpu_caps[] has already the
copied bits from boot_cpu_data.x86_capability.

Now you said that reading the CPUID is mostly redundant but we're
paranoid so we do it anyway, just in case, so how about we remove the
copying of boot_cpu_data.x86_capability? That's one less dependency
on the baremetal implementation.

Practically, nothing changes for kvm because it will read CPUID which is
the canonical thing anyway. And this should simplify the deal more and
keep it simple(r).

Hmmm.

Sean Christopherson Jan. 12, 2021, 5:15 p.m. UTC | #31

On Tue, Jan 12, 2021, Borislav Petkov wrote:
> On Mon, Jan 11, 2021 at 11:20:11AM -0800, Sean Christopherson wrote:
> > Well, mechanically, that would generate a build failure if the kernel does the
> > obvious things and names the 'enum cpuid_leafs' entry CPUID_12_EAX.  That would
> > be an obvious clue that KVM should be updated.
> 
> Then we need to properly document that whenever someone does that
> change, someone needs to touch the proper places.
> 
> > If the kernel named the enum entry something different, and we botched the code
> > review, KVM would continue to work, but would unnecessarily copy the bits it
> > cares about to its own word.   E.g. the boot_cpu_has() checks and translation to
> > __X86_FEATURE_* would still be valid.  As far as failure modes go, that's not
> > terrible.
> 
> Right, which reminds me: with your prototype patch, we would have:
> 
> static __always_inline void __kvm_cpu_cap_mask(enum cpuid_leafs leaf)
> {
>         const struct cpuid_reg cpuid = x86_feature_cpuid(leaf * 32);
>         struct kvm_cpuid_entry2 entry;
> 
>         reverse_cpuid_check(leaf);
> 
>         cpuid_count(cpuid.function, cpuid.index,
>                     &entry.eax, &entry.ebx, &entry.ecx, &entry.edx);
> 
>         kvm_cpu_caps[leaf] &= *__cpuid_entry_get_reg(&entry, cpuid.reg);
> }
> 
> which does read CPUID from the hw and kvm_cpu_caps[] has already the
> copied bits from boot_cpu_data.x86_capability.
> 
> Now you said that reading the CPUID is mostly redundant but we're
> paranoid so we do it anyway, just in case, so how about we remove the
> copying of boot_cpu_data.x86_capability? That's one less dependency
> on the baremetal implementation.
>
> Practically, nothing changes for kvm because it will read CPUID which is
> the canonical thing anyway. And this should simplify the deal more and
> keep it simple(r).

We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
support for features that are intentionally disabled in the kernel, e.g. via
kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
features in the guest if they're disabled in the host, even if the features are
supported in hardware.

For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
features in the guest won't break anything.  But for other features, e.g. XSAVE
based features, enabling them in the guest without proper support in the host
will corrupt guest and/or host state.

So it's really the CPUID read that is (mostly) superfluous.

Borislav Petkov Jan. 12, 2021, 5:51 p.m. UTC | #32

On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> support for features that are intentionally disabled in the kernel, e.g. via
> kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> features in the guest if they're disabled in the host, even if the features are
> supported in hardware.
> 
> For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> features in the guest won't break anything.  But for other features, e.g. XSAVE
> based features, enabling them in the guest without proper support in the host
> will corrupt guest and/or host state.

Ah ok, that is an important point.
 
> So it's really the CPUID read that is (mostly) superfluous.

Yeah, but that is cheap, as we established.

Ok then, I don't see anything that might be a problem and I guess we can
try that handling of scattered bits in kvm and see how far we'll get.

Thx.

Huang, Kai Jan. 12, 2021, 9:07 p.m. UTC | #33

On Tue, 2021-01-12 at 18:51 +0100, Borislav Petkov wrote:
> On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> > We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> > support for features that are intentionally disabled in the kernel, e.g. via
> > kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> > features in the guest if they're disabled in the host, even if the features are
> > supported in hardware.
> > 
> > For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> > respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> > features in the guest won't break anything.  But for other features, e.g. XSAVE
> > based features, enabling them in the guest without proper support in the host
> > will corrupt guest and/or host state.
> 
> Ah ok, that is an important point.
>  
> 
> 
> 
> > So it's really the CPUID read that is (mostly) superfluous.
> 
> Yeah, but that is cheap, as we established.
> 
> Ok then, I don't see anything that might be a problem and I guess we can
> try that handling of scattered bits in kvm and see how far we'll get.

Hi Sean, Boris,

Thanks for all  your feedback.

Sean,

Do you want to send me your patch (so that with your SoB), or do you want me to copy
& paste the code you posted in this series, plus Suggested-by you? Or how do you want
to proceed?

Also to me it is better to separate X86_FEATURE_SGX1/2 with rest of KVM changes?

And do you think adding a dedicated, i.e. kvm_scattered_cpu_caps[], instead of using
existing kvm_cpu_cap[NCAPINTS] would be helpful to solve the problem caused by adding
new leaf to x86 core (see my another reply in this thread)?

> 
> Thx.
>

Sean Christopherson Jan. 12, 2021, 11:17 p.m. UTC | #34

On Wed, Jan 13, 2021, Kai Huang wrote:
> On Tue, 2021-01-12 at 18:51 +0100, Borislav Petkov wrote:
> > On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> > > We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> > > support for features that are intentionally disabled in the kernel, e.g. via
> > > kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> > > features in the guest if they're disabled in the host, even if the features are
> > > supported in hardware.
> > > 
> > > For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> > > respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> > > features in the guest won't break anything.  But for other features, e.g. XSAVE
> > > based features, enabling them in the guest without proper support in the host
> > > will corrupt guest and/or host state.
> > 
> > Ah ok, that is an important point.
> > 
> > > So it's really the CPUID read that is (mostly) superfluous.
> > 
> > Yeah, but that is cheap, as we established.
> > 
> > Ok then, I don't see anything that might be a problem and I guess we can
> > try that handling of scattered bits in kvm and see how far we'll get.
> 
> Hi Sean, Boris,
> 
> Thanks for all  your feedback.
> 
> Sean,
> 
> Do you want to send me your patch (so that with your SoB), or do you want me to copy
> & paste the code you posted in this series, plus Suggested-by you? Or how do you want
> to proceed?
> 
> Also to me it is better to separate X86_FEATURE_SGX1/2 with rest of KVM changes?

Hmm, I'll split the changes into two proper patches and send them to you off list.

> And do you think adding a dedicated, i.e. kvm_scattered_cpu_caps[], instead of using
> existing kvm_cpu_cap[NCAPINTS] would be helpful to solve the problem caused by adding
> new leaf to x86 core (see my another reply in this thread)?

Probably not, because then we'd have to add new helpers to deal with the new
array, or change all the helpers to take the array as a pointer.  Blasting past
NCAPINTS is a little evil, but it does slot in nicely to the existing code.

Huang, Kai Jan. 13, 2021, 1:05 a.m. UTC | #35

On Tue, 2021-01-12 at 15:17 -0800, Sean Christopherson wrote:
> On Wed, Jan 13, 2021, Kai Huang wrote:
> > On Tue, 2021-01-12 at 18:51 +0100, Borislav Petkov wrote:
> > > On Tue, Jan 12, 2021 at 09:15:52AM -0800, Sean Christopherson wrote:
> > > > We want the boot_cpu_data.x86_capability memcpy() so that KVM doesn't advertise
> > > > support for features that are intentionally disabled in the kernel, e.g. via
> > > > kernel params.  Except for a few special cases, e.g. LA57, KVM doesn't enable
> > > > features in the guest if they're disabled in the host, even if the features are
> > > > supported in hardware.
> > > > 
> > > > For some features, e.g. SMEP and SMAP, honoring boot_cpu_data is mostly about
> > > > respecting the kernel's wishes, i.e. barring hardware bugs, enabling such
> > > > features in the guest won't break anything.  But for other features, e.g. XSAVE
> > > > based features, enabling them in the guest without proper support in the host
> > > > will corrupt guest and/or host state.
> > > 
> > > Ah ok, that is an important point.
> > > 
> > > > So it's really the CPUID read that is (mostly) superfluous.
> > > 
> > > Yeah, but that is cheap, as we established.
> > > 
> > > Ok then, I don't see anything that might be a problem and I guess we can
> > > try that handling of scattered bits in kvm and see how far we'll get.
> > 
> > Hi Sean, Boris,
> > 
> > Thanks for all  your feedback.
> > 
> > Sean,
> > 
> > Do you want to send me your patch (so that with your SoB), or do you want me to copy
> > & paste the code you posted in this series, plus Suggested-by you? Or how do you want
> > to proceed?
> > 
> > Also to me it is better to separate X86_FEATURE_SGX1/2 with rest of KVM changes?
> 
> Hmm, I'll split the changes into two proper patches and send them to you off list.

Thanks.

> 
> > And do you think adding a dedicated, i.e. kvm_scattered_cpu_caps[], instead of using
> > existing kvm_cpu_cap[NCAPINTS] would be helpful to solve the problem caused by adding
> > new leaf to x86 core (see my another reply in this thread)?
> 
> Probably not, because then we'd have to add new helpers to deal with the new
> array, or change all the helpers to take the array as a pointer.  Blasting past
> NCAPINTS is a little evil, but it does slot in nicely to the existing code.

Sure. Thanks.

[RFC,04/23] x86/cpufeatures: Add SGX1 and SGX2 sub-features

Commit Message

Comments

Patch