x86/svm: Merge hsa and host_vmcb to reduce memory overhead

Message ID 20201026135043.15560-1-andrew.cooper3@citrix.com (mailing list archive)
State New
Series x86/svm: Merge hsa and host_vmcb to reduce memory overhead

Commit Message

Andrew Cooper Oct. 26, 2020, 1:50 p.m. UTC
The format of the Host State Area is, and has always been, a VMCB.  It is
explicitly safe to put the host VMSAVE data in it.

This removes 4k of memory allocation per pCPU.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
---
 xen/arch/x86/hvm/svm/svm.c | 27 ++++-----------------------
 1 file changed, 4 insertions(+), 23 deletions(-)

Comments

Jan Beulich Oct. 27, 2020, 3:24 p.m. UTC | #1
On 26.10.2020 14:50, Andrew Cooper wrote:
> The format of the Host State Area is, and has always been, a VMCB.  It is
> explicitly safe to put the host VMSAVE data in it.

Nit: The PM calls this "Host Save Area" or "Host State Save Area"
afaics.

I recall us discussing this option in the past, and not right
away pursuing it because of it not having been explicit at the
time. What place in the doc has made this explicit? The main
uncertainty (without any explicit statement) on my part would be
the risk of VMSAVE writing (for performance reasons) e.g. full
cache lines, i.e. more than exactly the bits holding the state
to be saved, without first bringing old contents in from memory.

Of course, with the VMSAVE gone from svm_ctxt_switch_to() the
only one left is in _svm_cpu_up(), i.e. long before the first
VM exit, but then the same consideration could apply the other
way around (VM exit writing full cache lines, assuming the
other parts of the HSA are unused).

And then of course, if in fact this isn't spelled out anywhere,
there's the forward compatibility question.

> --- a/xen/arch/x86/hvm/svm/svm.c
> +++ b/xen/arch/x86/hvm/svm/svm.c
> @@ -72,11 +72,10 @@ static void svm_update_guest_efer(struct vcpu *);
>  static struct hvm_function_table svm_function_table;
>  
>  /*
> - * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen)
> - * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in
> - * guest vcpu context.
> + * Host State Area.  This area is used by the processor in non-root mode, and
> + * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to
> + * leave guest vcpu context.
>   */
> -static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa);
>  static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb);

The comment now applies to host_vmcb, so making the dual purpose
more obvious would imo be helpful. This would in particular mean
not starting with (only) "Host State Area" (unless here you really
mean to use a term different from the PM, to express the superset,
but then one less easy to mix up with Host Save Area would likely
be better), and not solely referring to non-root mode. Albeit
maybe you mean root mode by saying "contains Xen's"? If so,
perhaps "..., and also contains"?

Jan
Andrew Cooper Oct. 27, 2020, 7:30 p.m. UTC | #2
On 27/10/2020 15:24, Jan Beulich wrote:
> On 26.10.2020 14:50, Andrew Cooper wrote:
>> The format of the Host State Area is, and has always been, a VMCB.  It is
>> explicitly safe to put the host VMSAVE data in it.
> Nit: The PM calls this "Host Save Area" or "Host State Save Area"
> afaics.
>
> I recall us discussing this option in the past, and not right
> away pursuing it because of it not having been explicit at the
> time. What place in the doc has made this explicit?

Sadly still not yet, but the pestering has happened.

> The main
> uncertainty (without any explicit statement) on my part would be
> the risk of VMSAVE writing (for performance reasons) e.g. full
> cache lines, i.e. more than exactly the bits holding the state
> to be saved, without first bringing old contents in from memory.

SEV-ES now requires the hypervisor to program desired exit state in the
VMCB, due to differences in how the VMRUN instruction works.  See
Vol3 15.35.8.  (And yes - this does contradict the earlier statement
that the hypervisor must not write directly into the host state area.)

I have had it confirmed by AMD that it is safe to use in this fashion,
but if you want more evidence, KVM has had this behaviour on AMD for its
entire lifetime.
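
The dual use being discussed can be illustrated with a toy model (hypothetical C, not Xen code: wrmsrl() and vmsave() below are stubs standing in for the real privileged operations, and all names other than MSR_K8_VM_HSAVE_PA and host_vmcb are made up for the sketch):

```c
#include <assert.h>
#include <stdint.h>

#define MSR_K8_VM_HSAVE_PA 0xc0010117

static uint64_t msr_hsave_pa;    /* models the per-core VM_HSAVE_PA MSR */
static uint64_t vmsave_target;   /* models the last VMSAVE destination  */

/* Stub: the real wrmsrl() writes a model-specific register. */
static void wrmsrl(uint32_t msr, uint64_t val)
{
    if ( msr == MSR_K8_VM_HSAVE_PA )
        msr_hsave_pa = val;
}

/* Stub: the real VMSAVE stores fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER
 * state into the VMCB-format page at the given physical address. */
static void vmsave(uint64_t pa)
{
    vmsave_target = pa;
}

static uint64_t host_vmcb;       /* per-CPU in the real code */

/* After the patch, a single 4k page serves both consumers: VMRUN's
 * host save area (via the MSR) and Xen's VMSAVE state. */
static void svm_cpu_up_sketch(uint64_t page_maddr)
{
    host_vmcb = page_maddr;
    wrmsrl(MSR_K8_VM_HSAVE_PA, host_vmcb);
    vmsave(host_vmcb);
}
```

Before the patch, two separate pages (hsa and host_vmcb) held these two halves of host state; since the HSA's format is itself a VMCB, one page suffices, which is where the 4k-per-pCPU saving comes from.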

>> --- a/xen/arch/x86/hvm/svm/svm.c
>> +++ b/xen/arch/x86/hvm/svm/svm.c
>> @@ -72,11 +72,10 @@ static void svm_update_guest_efer(struct vcpu *);
>>  static struct hvm_function_table svm_function_table;
>>  
>>  /*
>> - * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen)
>> - * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in
>> - * guest vcpu context.
>> + * Host State Area.  This area is used by the processor in non-root mode, and
>> + * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to
>> + * leave guest vcpu context.
>>   */
>> -static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa);
>>  static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb);
> The comment now applies to host_vmcb, so making the dual purpose
> more obvious would imo be helpful.

But it isn't dual purpose.  It *is* host state, both the half which
VMRUN deals with, and the half which VMLOAD/SAVE deals with (separately,
to optimise VMRUN).

I suppose technically it gets a little complicated with whose state
fs/gs/ldtr/gsbase actually is, given the PV VMLOAD-optimised path, but
the state relevant to Xen's security is tr/syscall/sysenter, which will
remain correct from the start of day.

~Andrew
Jan Beulich Oct. 28, 2020, 8:06 a.m. UTC | #3
On 27.10.2020 20:30, Andrew Cooper wrote:
> On 27/10/2020 15:24, Jan Beulich wrote:
>> On 26.10.2020 14:50, Andrew Cooper wrote:
>>> The format of the Host State Area is, and has always been, a VMCB.  It is
>>> explicitly safe to put the host VMSAVE data in it.
>> Nit: The PM calls this "Host Save Area" or "Host State Save Area"
>> afaics.
>>
>> I recall us discussing this option in the past, and not right
>> away pursuing it because of it not having been explicit at the
>> time. What place in the doc has made this explicit?
> 
> Sadly still not yet, but the pestering has happened.
> 
>> The main
>> uncertainty (without any explicit statement) on my part would be
>> the risk of VMSAVE writing (for performance reasons) e.g. full
>> cache lines, i.e. more than exactly the bits holding the state
>> to be saved, without first bringing old contents in from memory.
> 
> SEV-ES now requires the hypervisor to program desired exit state in the
> VMCB, due to differences in how the VMRUN instruction works.  See
> Vol3 15.35.8.  (And yes - this does contradict the earlier statement
> that the hypervisor must not write directly into the host state area.)
> 
> I have had it confirmed by AMD that it is safe to use in this fashion,
> but if you want more evidence, KVM has had this behaviour on AMD for its
> entire lifetime.

Ah, interesting.

>>> --- a/xen/arch/x86/hvm/svm/svm.c
>>> +++ b/xen/arch/x86/hvm/svm/svm.c
>>> @@ -72,11 +72,10 @@ static void svm_update_guest_efer(struct vcpu *);
>>>  static struct hvm_function_table svm_function_table;
>>>  
>>>  /*
>>> - * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen)
>>> - * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in
>>> - * guest vcpu context.
>>> + * Host State Area.  This area is used by the processor in non-root mode, and
>>> + * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to
>>> + * leave guest vcpu context.
>>>   */
>>> -static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa);
>>>  static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb);
>> The comment now applies to host_vmcb, so making the dual purpose
>> more obvious would imo be helpful.
> 
> But it isn't dual purpose.  It *is* host state, both the half which
> VMRUN deals with, and the half which VMLOAD/SAVE deals with (separately,
> to optimise VMRUN).

It is host state, yes, and if you would spell it "host state area"
(and perhaps even omit "area") it would look less dual purpose,
because the use of capitals (to me at least) suggests you refer to
HSA (as used e.g. in the MSR name). The dual purpose really is
that (a) the address gets put in the respective MSR and (b) the
thing also gets directly accessed as a VMCB. Just like the comment
originally said. Yes, in the end it's cumulative host state.

Is there any reason why you can't mostly keep the original comment,
merely starting with singular "Physical address of ..."? (Of course
I'd then still prefer if "Host State Area" was changed to either of
the two terms actually used by the PM.)

Jan

Patch

diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c
index cfea5b5523..9ec9ad0646 100644
--- a/xen/arch/x86/hvm/svm/svm.c
+++ b/xen/arch/x86/hvm/svm/svm.c
@@ -72,11 +72,10 @@  static void svm_update_guest_efer(struct vcpu *);
 static struct hvm_function_table svm_function_table;
 
 /*
- * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen)
- * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in
- * guest vcpu context.
+ * Host State Area.  This area is used by the processor in non-root mode, and
+ * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to
+ * leave guest vcpu context.
  */
-static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa);
 static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb);
 #ifdef CONFIG_PV
 static DEFINE_PER_CPU(struct vmcb_struct *, host_vmcb_va);
@@ -1436,15 +1435,8 @@  static bool svm_event_pending(const struct vcpu *v)
 
 static void svm_cpu_dead(unsigned int cpu)
 {
-    paddr_t *this_hsa = &per_cpu(hsa, cpu);
     paddr_t *this_vmcb = &per_cpu(host_vmcb, cpu);
 
-    if ( *this_hsa )
-    {
-        free_domheap_page(maddr_to_page(*this_hsa));
-        *this_hsa = 0;
-    }
-
 #ifdef CONFIG_PV
     if ( per_cpu(host_vmcb_va, cpu) )
     {
@@ -1462,7 +1454,6 @@  static void svm_cpu_dead(unsigned int cpu)
 
 static int svm_cpu_up_prepare(unsigned int cpu)
 {
-    paddr_t *this_hsa = &per_cpu(hsa, cpu);
     paddr_t *this_vmcb = &per_cpu(host_vmcb, cpu);
     nodeid_t node = cpu_to_node(cpu);
     unsigned int memflags = 0;
@@ -1471,16 +1462,6 @@  static int svm_cpu_up_prepare(unsigned int cpu)
     if ( node != NUMA_NO_NODE )
         memflags = MEMF_node(node);
 
-    if ( !*this_hsa )
-    {
-        pg = alloc_domheap_page(NULL, memflags);
-        if ( !pg )
-            goto err;
-
-        clear_domain_page(page_to_mfn(pg));
-        *this_hsa = page_to_maddr(pg);
-    }
-
     if ( !*this_vmcb )
     {
         pg = alloc_domheap_page(NULL, memflags);
@@ -1597,7 +1578,7 @@  static int _svm_cpu_up(bool bsp)
     write_efer(read_efer() | EFER_SVME);
 
     /* Initialize the HSA for this core. */
-    wrmsrl(MSR_K8_VM_HSAVE_PA, per_cpu(hsa, cpu));
+    wrmsrl(MSR_K8_VM_HSAVE_PA, per_cpu(host_vmcb, cpu));
 
     /* check for erratum 383 */
     svm_init_erratum_383(c);