Message ID | 20201026135043.15560-1-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | x86/svm: Merge hsa and host_vmcb to reduce memory overhead | expand |
On 26.10.2020 14:50, Andrew Cooper wrote: > The format of the Host State Area is, and has always been, a VMCB. It is > explicitly safe to put the host VMSAVE data in. Nit: The PM calls this "Host Save Area" or "Host State Save Area" afaics. I recall us discussing this option in the past, and not right away pursuing it because of it not having been explicit at the time. What place in the doc has made this explicit? The main uncertainty (without any explicit statement) on my part would be the risk of VMSAVE writing (for performance reasons) e.g. full cache lines, i.e. more than exactly the bits holding the state to be saved, without first bringing old contents in from memory. Of course, with the VMSAVE gone from svm_ctxt_switch_to() the only one left is in _svm_cpu_up(), i.e. long before the first VM exit, but then the same consideration could apply the other way around (VM exit writing full cache lines, assuming the other parts of the HSA are unused). And then of course, if in fact this isn't spelled out anywhere, there's the forward compatibility question. > --- a/xen/arch/x86/hvm/svm/svm.c > +++ b/xen/arch/x86/hvm/svm/svm.c > @@ -72,11 +72,10 @@ static void svm_update_guest_efer(struct vcpu *); > static struct hvm_function_table svm_function_table; > > /* > - * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen) > - * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in > - * guest vcpu context. > + * Host State Area. This area is used by the processor in non-root mode, and > + * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to > + * leave guest vcpu context. > */ > -static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa); > static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb); The comment now applies to host_vmcb, so making the dual purpose more obvious would imo be helpful. This would in particular mean not starting with (only) "Host State Area" (unless here you really mean to use a term different from the PM, to express the superset, but then one less easy to mix up with Host Save Area would likely be better), and not solely referring to non-root mode. Albeit maybe you mean root mode by saying "contains Xen's"? If so, perhaps "..., and also contains"? Jan
On 27/10/2020 15:24, Jan Beulich wrote: > On 26.10.2020 14:50, Andrew Cooper wrote: >> The format of the Host State Area is, and has always been, a VMCB. It is >> explicitly safe to put the host VMSAVE data in. > Nit: The PM calls this "Host Save Area" or "Host State Save Area" > afaics. > > I recall us discussing this option in the past, and not right > away pursuing it because of it not having been explicit at the > time. What place in the doc has made this explicit? Sadly still not yet, but the pestering has happened. > The main > uncertainty (without any explicit statement) on my part would be > the risk of VMSAVE writing (for performance reasons) e.g. full > cache lines, i.e. more than exactly the bits holding the state > to be saved, without first bringing old contents in from memory. SEV-ES now requires the hypervisor to program desired exit state in in the VMCB, due to differences in how the VMRUN instruction works. See Vol3 15.35.8. (And yes - this does contradict the earlier statement in that a the hypervisor must not write directly into the host state area). I have had it confirmed by AMD that it is safe to use in this fashion, but if you want more evidence, KVM has had this behaviour on AMD for its entire lifetime. >> --- a/xen/arch/x86/hvm/svm/svm.c >> +++ b/xen/arch/x86/hvm/svm/svm.c >> @@ -72,11 +72,10 @@ static void svm_update_guest_efer(struct vcpu *); >> static struct hvm_function_table svm_function_table; >> >> /* >> - * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen) >> - * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in >> - * guest vcpu context. >> + * Host State Area. This area is used by the processor in non-root mode, and >> + * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to >> + * leave guest vcpu context. >> */ >> -static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa); >> static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb); > The comment now applies to host_vmcb, so making the dual purpose > more obvious would imo be helpful. But it isn't dual purpose. It *is* host state, both the half which VMRUN deals with, and the half which VMLOAD/SAVE deals with (separately, to optimise VMRUN). I suppose technically it gets a little complicated with whose state fs/gs/ldtr/gsbase actually is, given the PV VMLOAD-optimised path, but the state relevant to Xen's security is tr/syscall/sysenter, which will remain correct from the start of day. ~Andrew
On 27.10.2020 20:30, Andrew Cooper wrote: > On 27/10/2020 15:24, Jan Beulich wrote: >> On 26.10.2020 14:50, Andrew Cooper wrote: >>> The format of the Host State Area is, and has always been, a VMCB. It is >>> explicitly safe to put the host VMSAVE data in. >> Nit: The PM calls this "Host Save Area" or "Host State Save Area" >> afaics. >> >> I recall us discussing this option in the past, and not right >> away pursuing it because of it not having been explicit at the >> time. What place in the doc has made this explicit? > > Sadly still not yet, but the pestering has happened. > >> The main >> uncertainty (without any explicit statement) on my part would be >> the risk of VMSAVE writing (for performance reasons) e.g. full >> cache lines, i.e. more than exactly the bits holding the state >> to be saved, without first bringing old contents in from memory. > > SEV-ES now requires the hypervisor to program desired exit state in in > the VMCB, due to differences in how the VMRUN instruction works. See > Vol3 15.35.8. (And yes - this does contradict the earlier statement in > that a the hypervisor must not write directly into the host state area). > > I have had it confirmed by AMD that it is safe to use in this fashion, > but if you want more evidence, KVM has had this behaviour on AMD for its > entire lifetime. Ah, interesting. >>> --- a/xen/arch/x86/hvm/svm/svm.c >>> +++ b/xen/arch/x86/hvm/svm/svm.c >>> @@ -72,11 +72,10 @@ static void svm_update_guest_efer(struct vcpu *); >>> static struct hvm_function_table svm_function_table; >>> >>> /* >>> - * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen) >>> - * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in >>> - * guest vcpu context. >>> + * Host State Area. This area is used by the processor in non-root mode, and >>> + * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to >>> + * leave guest vcpu context. >>> */ >>> -static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa); >>> static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb); >> The comment now applies to host_vmcb, so making the dual purpose >> more obvious would imo be helpful. > > But it isn't dual purpose. It *is* host state, both the half which > VMRUN deals with, and the half which VMLOAD/SAVE deals with (separately, > to optimise VMRUN). It is host state, yes, and if you would spell it "host state area" (and perhaps even omit "area") it would look less dual purpose, because the use of capitals (to me at least) suggests you refer to HSA (as used e.g. in the MSR name). The dual purpose really is that (a) the address gets put in the respective MSR and (b) the thing also gets directly accessed as a VMCB. Just like the comment originally said. Yes, in the end it's cumulative host state. Is there any reason why you can't mostly keep the original comment, merely starting with singular "Physical address of ..."? (Of course I'd then still prefer if "Host State Area" was changed to either of the two terms actually used by the PM.) Jan
diff --git a/xen/arch/x86/hvm/svm/svm.c b/xen/arch/x86/hvm/svm/svm.c index cfea5b5523..9ec9ad0646 100644 --- a/xen/arch/x86/hvm/svm/svm.c +++ b/xen/arch/x86/hvm/svm/svm.c @@ -72,11 +72,10 @@ static void svm_update_guest_efer(struct vcpu *); static struct hvm_function_table svm_function_table; /* - * Physical addresses of the Host State Area (for hardware) and vmcb (for Xen) - * which contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state when in - * guest vcpu context. + * Host State Area. This area is used by the processor in non-root mode, and + * contains Xen's fs/gs/tr/ldtr and GSBASE/STAR/SYSENTER state required to + * leave guest vcpu context. */ -static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, hsa); static DEFINE_PER_CPU_READ_MOSTLY(paddr_t, host_vmcb); #ifdef CONFIG_PV static DEFINE_PER_CPU(struct vmcb_struct *, host_vmcb_va); @@ -1436,15 +1435,8 @@ static bool svm_event_pending(const struct vcpu *v) static void svm_cpu_dead(unsigned int cpu) { - paddr_t *this_hsa = &per_cpu(hsa, cpu); paddr_t *this_vmcb = &per_cpu(host_vmcb, cpu); - if ( *this_hsa ) - { - free_domheap_page(maddr_to_page(*this_hsa)); - *this_hsa = 0; - } - #ifdef CONFIG_PV if ( per_cpu(host_vmcb_va, cpu) ) { @@ -1462,7 +1454,6 @@ static void svm_cpu_dead(unsigned int cpu) static int svm_cpu_up_prepare(unsigned int cpu) { - paddr_t *this_hsa = &per_cpu(hsa, cpu); paddr_t *this_vmcb = &per_cpu(host_vmcb, cpu); nodeid_t node = cpu_to_node(cpu); unsigned int memflags = 0; @@ -1471,16 +1462,6 @@ static int svm_cpu_up_prepare(unsigned int cpu) if ( node != NUMA_NO_NODE ) memflags = MEMF_node(node); - if ( !*this_hsa ) - { - pg = alloc_domheap_page(NULL, memflags); - if ( !pg ) - goto err; - - clear_domain_page(page_to_mfn(pg)); - *this_hsa = page_to_maddr(pg); - } - if ( !*this_vmcb ) { pg = alloc_domheap_page(NULL, memflags); @@ -1597,7 +1578,7 @@ static int _svm_cpu_up(bool bsp) write_efer(read_efer() | EFER_SVME); /* Initialize the HSA for this core. */ - wrmsrl(MSR_K8_VM_HSAVE_PA, per_cpu(hsa, cpu)); + wrmsrl(MSR_K8_VM_HSAVE_PA, per_cpu(host_vmcb, cpu)); /* check for erratum 383 */ svm_init_erratum_383(c);
The format of the Host State Area is, and has always been, a VMCB. It is explicitly safe to put the host VMSAVE data in. This removes 4k of memory allocation per pCPU. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> --- xen/arch/x86/hvm/svm/svm.c | 27 ++++----------------------- 1 file changed, 4 insertions(+), 23 deletions(-)