diff mbox series

[v3,06/27] x86/cea: Export per CPU variable cea_exception_stacks

Message ID 20241001050110.3643764-7-xin@zytor.com (mailing list archive)
State New
Headers show
Series Enable FRED with KVM VMX | expand

Commit Message

Xin Li Oct. 1, 2024, 5 a.m. UTC
The per CPU variable cea_exception_stacks contains per CPU stacks for
NMI, #DB and #DF, which is referenced in KVM to set host FRED RSP[123]
each time a vCPU is loaded onto a CPU, thus it needs to be exported.

Signed-off-by: Xin Li (Intel) <xin@zytor.com>
Tested-by: Shan Kang <shan.kang@intel.com>
---
 arch/x86/mm/cpu_entry_area.c | 1 +
 1 file changed, 1 insertion(+)

Comments

Dave Hansen Oct. 1, 2024, 4:12 p.m. UTC | #1
On 9/30/24 22:00, Xin Li (Intel) wrote:
> The per CPU variable cea_exception_stacks contains per CPU stacks for
> NMI, #DB and #DF, which is referenced in KVM to set host FRED RSP[123]
> each time a vCPU is loaded onto a CPU, thus it needs to be exported.

Nit: It's not obvious how 'cea_exception_stacks' get used in this
series.  It's never referenced explicitly.

I did figure it out by looking for 'RSP[123]' references, but a much
better changelog would be something like:

	The per CPU array 'cea_exception_stacks' points to per CPU
	stacks for NMI, #DB and #DF. It is normally referenced via the
	#define: __this_cpu_ist_top_va().

	FRED introduced new fields in the host-state area of the VMCS
	for stack levels 1->3 (HOST_IA32_FRED_RSP[123]). KVM must
	populate these each time a vCPU is loaded onto a CPU.

See how that explicitly gives the reader greppable strings for
"__this_cpu_ist_top_va" and "HOST_IA32_FRED_RSP"?  That makes it much
easier to figure out what is going on.

I was also momentarily confused about why these loads need to be done on
_every_ vCPU load.  I think it's because the host state can change as
the vCPU moves around to different physical CPUs and
__this_cpu_ist_top_va() can and will change.  But it's a detail that I
think deserves to be explained in the changelog.  There is also this
note in vmx_vcpu_load_vmcs():

>                 /*
>                  * Linux uses per-cpu TSS and GDT, so set these when switching
>                  * processors.  See 22.2.4.
>                  */

which makes me think that it might not be bad to pull *all* of the
per-cpu VMCS field population code out into a helper since the reasoning
of why these need to be repopulated is identical.

Also, what's the purpose of clearing GUEST_IA32_FRED_RSP[123] at
init_vmcs() time?  I would have thought that those values wouldn't
matter until the VMCS gets loaded at vmx_vcpu_load_vmcs() when they are
overwritten anyway.  Or, I could be just totally misunderstanding how
KVM consumes the VMCS. :)
Xin Li Oct. 1, 2024, 5:51 p.m. UTC | #2
On 10/1/2024 9:12 AM, Dave Hansen wrote:
> On 9/30/24 22:00, Xin Li (Intel) wrote:
>> The per CPU variable cea_exception_stacks contains per CPU stacks for
>> NMI, #DB and #DF, which is referenced in KVM to set host FRED RSP[123]
>> each time a vCPU is loaded onto a CPU, thus it needs to be exported.
> 
> Nit: It's not obvious how 'cea_exception_stacks' get used in this
> series.  It's never referenced explicitly.
> 
> I did figure it out by looking for 'RSP[123]' references, but a much
> better changelog would be something like:
> 
> 	The per CPU array 'cea_exception_stacks' points to per CPU
> 	stacks for NMI, #DB and #DF. It is normally referenced via the
> 	#define: __this_cpu_ist_top_va().
> 
> 	FRED introduced new fields in the host-state area of the VMCS
> 	for stack levels 1->3 (HOST_IA32_FRED_RSP[123]). KVM must
> 	populate these each time a vCPU is loaded onto a CPU.
> 

Yeah, this is way clearer.

> See how that explicitly gives the reader greppable strings for
> "__this_cpu_ist_top_va" and "HOST_IA32_FRED_RSP"?  That makes it much
> easier to figure out what is going on.

Nice for a maintainer in 20 years :)

> 
> I was also momentarily confused about why these loads need to be done on
> _every_ vCPU load.  I think it's because the host state can change as
> the vCPU moves around to different physical CPUs and
> __this_cpu_ist_top_va() can and will change.  But it's a detail that I
> think deserves to be explained in the changelog.  There is also this
> note in vmx_vcpu_load_vmcs():

Makes sense to me.

> 
>>                  /*
>>                   * Linux uses per-cpu TSS and GDT, so set these when switching
>>                   * processors.  See 22.2.4.
>>                   */
> 
> which makes me think that it might not be bad to pull *all* of the
> per-cpu VMCS field population code out into a helper since the reasoning
> of why these need to be repopulated is identical.
> 
> Also, what's the purpose of clearing GUEST_IA32_FRED_RSP[123] at
> init_vmcs() time?  I would have thought that those values wouldn't
> matter until the VMCS gets loaded at vmx_vcpu_load_vmcs() when they are
> overwritten anyway.  Or, I could be just totally misunderstanding how
> KVM consumes the VMCS. :)

I don't see any misunderstanding.  However we just do what the SDM
claims, even it seems that it's not a must *logically*.

FRED spec says:
The RESET state of each of the new MSRs is zero. INIT does not change
the value of the new MSRs

Thanks!
     Xin
Dave Hansen Oct. 1, 2024, 6:18 p.m. UTC | #3
On 10/1/24 10:51, Xin Li wrote:
...>> Also, what's the purpose of clearing GUEST_IA32_FRED_RSP[123] at
>> init_vmcs() time?  I would have thought that those values wouldn't
>> matter until the VMCS gets loaded at vmx_vcpu_load_vmcs() when they are
>> overwritten anyway.  Or, I could be just totally misunderstanding how
>> KVM consumes the VMCS. 
diff mbox series

Patch

diff --git a/arch/x86/mm/cpu_entry_area.c b/arch/x86/mm/cpu_entry_area.c
index 575f863f3c75..b8af71b67d9a 100644
--- a/arch/x86/mm/cpu_entry_area.c
+++ b/arch/x86/mm/cpu_entry_area.c
@@ -17,6 +17,7 @@  static DEFINE_PER_CPU_PAGE_ALIGNED(struct entry_stack_page, entry_stack_storage)
 #ifdef CONFIG_X86_64
 static DEFINE_PER_CPU_PAGE_ALIGNED(struct exception_stacks, exception_stacks);
 DEFINE_PER_CPU(struct cea_exception_stacks*, cea_exception_stacks);
+EXPORT_PER_CPU_SYMBOL(cea_exception_stacks);
 
 static DEFINE_PER_CPU_READ_MOSTLY(unsigned long, _cea_offset);