Message ID | 20231219215751.9445-3-alexey.makhalov@broadcom.com (mailing list archive)
---|---
State | Superseded
Series | VMware hypercalls enhancements
On Tue, Dec 19, 2023 at 01:57:47PM -0800, Alexey Makhalov wrote:
> +static inline
> +unsigned long vmware_hypercall1(unsigned long cmd, unsigned long in1)
...
> +static inline
> +unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
> +				uint32_t *out1, uint32_t *out2)
...
> +static inline
> +unsigned long vmware_hypercall4(unsigned long cmd, unsigned long in1,
> +				uint32_t *out1, uint32_t *out2,
> +				uint32_t *out3)
...
> +static inline
> +unsigned long vmware_hypercall5(unsigned long cmd, unsigned long in1,
> +				unsigned long in3, unsigned long in4,
> +				unsigned long in5, uint32_t *out2)
...
> +static inline
> +unsigned long vmware_hypercall6(unsigned long cmd, unsigned long in1,
> +				unsigned long in3, uint32_t *out2,
> +				uint32_t *out3, uint32_t *out4,
> +				uint32_t *out5)
...
> +static inline
> +unsigned long vmware_hypercall7(unsigned long cmd, unsigned long in1,
> +				unsigned long in3, unsigned long in4,
> +				unsigned long in5, uint32_t *out1,
> +				uint32_t *out2, uint32_t *out3)

Naming is weird. The number in the name doesn't help much as there seems
to be no system on how many of the parameters are ins and outs.

Why are these combinations of ins/outs supported?

And as an outsider, I'm curious where in2 got lost :P
On 12/19/23 3:20 PM, kirill.shutemov@linux.intel.com wrote:
> On Tue, Dec 19, 2023 at 01:57:47PM -0800, Alexey Makhalov wrote:
>> +static inline
>> +unsigned long vmware_hypercall1(unsigned long cmd, unsigned long in1)
> ...
>> +static inline
>> +unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
>> +				uint32_t *out1, uint32_t *out2)
> ...
>> +static inline
>> +unsigned long vmware_hypercall4(unsigned long cmd, unsigned long in1,
>> +				uint32_t *out1, uint32_t *out2,
>> +				uint32_t *out3)
> ...
>> +static inline
>> +unsigned long vmware_hypercall5(unsigned long cmd, unsigned long in1,
>> +				unsigned long in3, unsigned long in4,
>> +				unsigned long in5, uint32_t *out2)
> ...
>> +static inline
>> +unsigned long vmware_hypercall6(unsigned long cmd, unsigned long in1,
>> +				unsigned long in3, uint32_t *out2,
>> +				uint32_t *out3, uint32_t *out4,
>> +				uint32_t *out5)
> ...
>> +static inline
>> +unsigned long vmware_hypercall7(unsigned long cmd, unsigned long in1,
>> +				unsigned long in3, unsigned long in4,
>> +				unsigned long in5, uint32_t *out1,
>> +				uint32_t *out2, uint32_t *out3)
>
> Naming is weird. The number in the name doesn't help much as there seems
> to be no system on how many of the parameters are ins and outs.

There was an internal discussion on hypercall API naming. One of the
proposals was to use 2 digits - the number of input and the number of
output arguments. That definitely looked weird, so we agreed on a single
number - the total number of arguments excluding cmd.

>
> Why are these combinations of ins/outs supported?

VMware hypercalls can use up to 6 ins and 6 outs for LB calls, and up to
7 ins and 7 outs for HB calls. The mapping to x86 registers is below:

in0/out0 - rax
in1/out1 - rbx
in2/out2 - rcx
in3/out3 - rdx
in4/out4 - rsi
in5/out5 - rdi
in6/out6 - rbp (only used in high bandwidth hypercalls)

Args 0, 2 and 6 are remapped to r12, r13 and r14 for TDX.

Some arguments are standardized across hypercalls: cmd is in in2, the
magic is in in0, and the output value is in out0. The other arguments
are not standardized. Theoretically, the maximal hypercall in terms of
number of arguments would be:

vmware_hypercall9(cmd, in1, in3, in4, in5, *out1, *out2, *out3, *out4, *out5)

But there is no such caller in the Linux kernel. The current combination
of hypercalls covers all current and future (not yet upstreamed) callers,
rounding up to the next number in some cases.

>
> And as an outsider, I'm curious where in2 got lost :P

'Lost' arguments:
in0 - indirectly initialized inside the hypercall function.
out0 - the return value from the hypercall.
[LB hypercalls] in2 <- input cmd
[HB hypercalls] in1 <- input cmd

Regards,
--Alexey
On Tue, Dec 19, 2023 at 04:17:40PM -0800, Alexey Makhalov wrote:
>
>
> On 12/19/23 3:20 PM, kirill.shutemov@linux.intel.com wrote:
> > On Tue, Dec 19, 2023 at 01:57:47PM -0800, Alexey Makhalov wrote:
> > > +static inline
> > > +unsigned long vmware_hypercall1(unsigned long cmd, unsigned long in1)
> > ...
> > > +static inline
> > > +unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
> > > +				uint32_t *out1, uint32_t *out2)
> > ...
> > > +static inline
> > > +unsigned long vmware_hypercall4(unsigned long cmd, unsigned long in1,
> > > +				uint32_t *out1, uint32_t *out2,
> > > +				uint32_t *out3)
> > ...
> > > +static inline
> > > +unsigned long vmware_hypercall5(unsigned long cmd, unsigned long in1,
> > > +				unsigned long in3, unsigned long in4,
> > > +				unsigned long in5, uint32_t *out2)
> > ...
> > > +static inline
> > > +unsigned long vmware_hypercall6(unsigned long cmd, unsigned long in1,
> > > +				unsigned long in3, uint32_t *out2,
> > > +				uint32_t *out3, uint32_t *out4,
> > > +				uint32_t *out5)
> > ...
> > > +static inline
> > > +unsigned long vmware_hypercall7(unsigned long cmd, unsigned long in1,
> > > +				unsigned long in3, unsigned long in4,
> > > +				unsigned long in5, uint32_t *out1,
> > > +				uint32_t *out2, uint32_t *out3)
> >
> > Naming is weird. The number in the name doesn't help much as there seems
> > to be no system on how many of the parameters are ins and outs.
>
> There was an internal discussion on hypercall API naming. One of the
> proposals was to use 2 digits - the number of input and the number of
> output arguments. That definitely looked weird, so we agreed on a single
> number - the total number of arguments excluding cmd.

Have you considered naming them by the number of input parameters, with
the number of output parameters as demanded by the users?

So vmware_hypercall4() would become vmware_hypercall1(), and the current
vmware_hypercall1() and vmware_hypercall3() would go away.

It is still awful, but /maybe/ better than this, I dunno.
On 12/19/23 4:51 PM, kirill.shutemov@linux.intel.com wrote:
> On Tue, Dec 19, 2023 at 04:17:40PM -0800, Alexey Makhalov wrote:
>>
>>
>> On 12/19/23 3:20 PM, kirill.shutemov@linux.intel.com wrote:
>>> On Tue, Dec 19, 2023 at 01:57:47PM -0800, Alexey Makhalov wrote:
>>>> +static inline
>>>> +unsigned long vmware_hypercall1(unsigned long cmd, unsigned long in1)
>>> ...
>>>> +static inline
>>>> +unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
>>>> +				uint32_t *out1, uint32_t *out2)
>>> ...
>>>> +static inline
>>>> +unsigned long vmware_hypercall4(unsigned long cmd, unsigned long in1,
>>>> +				uint32_t *out1, uint32_t *out2,
>>>> +				uint32_t *out3)
>>> ...
>>>> +static inline
>>>> +unsigned long vmware_hypercall5(unsigned long cmd, unsigned long in1,
>>>> +				unsigned long in3, unsigned long in4,
>>>> +				unsigned long in5, uint32_t *out2)
>>> ...
>>>> +static inline
>>>> +unsigned long vmware_hypercall6(unsigned long cmd, unsigned long in1,
>>>> +				unsigned long in3, uint32_t *out2,
>>>> +				uint32_t *out3, uint32_t *out4,
>>>> +				uint32_t *out5)
>>> ...
>>>> +static inline
>>>> +unsigned long vmware_hypercall7(unsigned long cmd, unsigned long in1,
>>>> +				unsigned long in3, unsigned long in4,
>>>> +				unsigned long in5, uint32_t *out1,
>>>> +				uint32_t *out2, uint32_t *out3)
>>>
>>> Naming is weird. The number in the name doesn't help much as there seems
>>> to be no system on how many of the parameters are ins and outs.
>>
>> There was an internal discussion on hypercall API naming. One of the
>> proposals was to use 2 digits - the number of input and the number of
>> output arguments. That definitely looked weird, so we agreed on a single
>> number - the total number of arguments excluding cmd.
>
> Have you considered naming them by the number of input parameters, with
> the number of output parameters as demanded by the users?
>
> So vmware_hypercall4() would become vmware_hypercall1(), and the current
> vmware_hypercall1() and vmware_hypercall3() would go away.
>
> It is still awful, but /maybe/ better than this, I dunno.

Deprecating vmware_hypercall1 and vmware_hypercall3 in favor of
vmware_hypercall4 would generate less efficient code for the callers of
the former. Using the current vmware_hypercall4 instead of
vmware_hypercall1 forces the caller to allocate additional variables
(registers or stack memory) for the hypercall asm inline to store the
additional output registers in. And specifically for the use of *out3,
the compiler will unnecessarily clobber the useful rdx even though the
hypervisor keeps it unchanged.

Unfortunately the VMware hypercall ABI is not as beautiful as the KVM
one, especially in the number of output arguments and their ordering.
The use of the rbp register as an argument is a separate bummer. So we
have to work with what we have.

The current set includes only 6 functions (for LB), which is the optimum
balance between readability, maintainability and performance. It covers
all current kernel callers and all new callers from yet-to-be-upstreamed
patches that we have in Photon OS, including 2 patches for x86 and arm64
guest support.

Regards,
--Alexey
diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
index 3636faa8b4fe..719e41260ece 100644
--- a/arch/x86/include/asm/vmware.h
+++ b/arch/x86/include/asm/vmware.h
@@ -40,69 +40,219 @@ extern u8 vmware_hypercall_mode;
 
-/* The low bandwidth call. The low word of edx is presumed clear. */
-#define VMWARE_HYPERCALL \
-	ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT) ", %%dx; " \
-		      "inl (%%dx), %%eax", \
-		      "vmcall", X86_FEATURE_VMCALL, \
-		      "vmmcall", X86_FEATURE_VMW_VMMCALL)
-
 /*
- * The high bandwidth out call. The low word of edx is presumed to have the
- * HB and OUT bits set.
+ * The low bandwidth call. The low word of edx is presumed to have OUT bit
+ * set. The high word of edx may contain input data from the caller.
  */
-#define VMWARE_HYPERCALL_HB_OUT \
-	ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \
-		      "rep outsb", \
+#define VMWARE_HYPERCALL \
+	ALTERNATIVE_3("cmpb $" \
+		      __stringify(CPUID_VMWARE_FEATURES_ECX_VMMCALL) \
+		      ", %[mode]\n\t" \
+		      "jg 2f\n\t" \
+		      "je 1f\n\t" \
+		      "movw %[port], %%dx\n\t" \
+		      "inl (%%dx), %%eax\n\t" \
+		      "jmp 3f\n\t" \
+		      "1: vmmcall\n\t" \
+		      "jmp 3f\n\t" \
+		      "2: vmcall\n\t" \
+		      "3:\n\t", \
+		      "movw %[port], %%dx\n\t" \
+		      "inl (%%dx), %%eax", X86_FEATURE_HYPERVISOR, \
 		      "vmcall", X86_FEATURE_VMCALL, \
 		      "vmmcall", X86_FEATURE_VMW_VMMCALL)
 
+static inline
+unsigned long vmware_hypercall1(unsigned long cmd, unsigned long in1)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (0)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
+				uint32_t *out1, uint32_t *out2)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=b" (*out1), "=c" (*out2)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (0)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall4(unsigned long cmd, unsigned long in1,
+				uint32_t *out1, uint32_t *out2,
+				uint32_t *out3)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=b" (*out1), "=c" (*out2), "=d" (*out3)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (0)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall5(unsigned long cmd, unsigned long in1,
+				unsigned long in3, unsigned long in4,
+				unsigned long in5, uint32_t *out2)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=c" (*out2)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (in3),
+		  "S" (in4),
+		  "D" (in5)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall6(unsigned long cmd, unsigned long in1,
+				unsigned long in3, uint32_t *out2,
+				uint32_t *out3, uint32_t *out4,
+				uint32_t *out5)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=c" (*out2), "=d" (*out3), "=S" (*out4),
+		  "=D" (*out5)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (in3)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall7(unsigned long cmd, unsigned long in1,
+				unsigned long in3, unsigned long in4,
+				unsigned long in5, uint32_t *out1,
+				uint32_t *out2, uint32_t *out3)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=b" (*out1), "=c" (*out2), "=d" (*out3)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (in3),
+		  "S" (in4),
+		  "D" (in5)
+		: "cc", "memory");
+	return out0;
+}
+
+
+#ifdef CONFIG_X86_64
+#define VMW_BP_REG "%%rbp"
+#define VMW_BP_CONSTRAINT "r"
+#else
+#define VMW_BP_REG "%%ebp"
+#define VMW_BP_CONSTRAINT "m"
+#endif
+
 /*
- * The high bandwidth in call. The low word of edx is presumed to have the
- * HB bit set.
+ * High bandwidth calls are not supported on encrypted memory guests.
+ * The caller should check cc_platform_has(CC_ATTR_MEM_ENCRYPT) and use
+ * the low bandwidth hypercall if memory encryption is set.
+ * This assumption simplifies HB hypercall implementation to just an I/O
+ * port based approach without alternative patching.
  */
-#define VMWARE_HYPERCALL_HB_IN \
-	ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \
-		      "rep insb", \
-		      "vmcall", X86_FEATURE_VMCALL, \
-		      "vmmcall", X86_FEATURE_VMW_VMMCALL)
+static inline
+unsigned long vmware_hypercall_hb_out(unsigned long cmd, unsigned long in2,
+				      unsigned long in3, unsigned long in4,
+				      unsigned long in5, unsigned long in6,
+				      uint32_t *out1)
+{
+	unsigned long out0;
+
+	asm_inline volatile (
+		UNWIND_HINT_SAVE
+		"push " VMW_BP_REG "\n\t"
+		UNWIND_HINT_UNDEFINED
+		"mov %[in6], " VMW_BP_REG "\n\t"
+		"rep outsb\n\t"
+		"pop " VMW_BP_REG "\n\t"
+		UNWIND_HINT_RESTORE
+		: "=a" (out0), "=b" (*out1)
+		: "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (cmd),
+		  "c" (in2),
+		  "d" (in3 | VMWARE_HYPERVISOR_PORT_HB),
+		  "S" (in4),
+		  "D" (in5),
+		  [in6] VMW_BP_CONSTRAINT (in6)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall_hb_in(unsigned long cmd, unsigned long in2,
+				     unsigned long in3, unsigned long in4,
+				     unsigned long in5, unsigned long in6,
+				     uint32_t *out1)
+{
+	unsigned long out0;
 
-#define VMWARE_PORT(cmd, eax, ebx, ecx, edx) \
-	__asm__("inl (%%dx), %%eax" : \
-		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) : \
-		"a"(VMWARE_HYPERVISOR_MAGIC), \
-		"c"(VMWARE_CMD_##cmd), \
-		"d"(VMWARE_HYPERVISOR_PORT), "b"(UINT_MAX) : \
-		"memory")
-
-#define VMWARE_VMCALL(cmd, eax, ebx, ecx, edx) \
-	__asm__("vmcall" : \
-		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) : \
-		"a"(VMWARE_HYPERVISOR_MAGIC), \
-		"c"(VMWARE_CMD_##cmd), \
-		"d"(0), "b"(UINT_MAX) : \
-		"memory")
-
-#define VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx) \
-	__asm__("vmmcall" : \
-		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) : \
-		"a"(VMWARE_HYPERVISOR_MAGIC), \
-		"c"(VMWARE_CMD_##cmd), \
-		"d"(0), "b"(UINT_MAX) : \
-		"memory")
-
-#define VMWARE_CMD(cmd, eax, ebx, ecx, edx) do { \
-	switch (vmware_hypercall_mode) { \
-	case CPUID_VMWARE_FEATURES_ECX_VMCALL: \
-		VMWARE_VMCALL(cmd, eax, ebx, ecx, edx); \
-		break; \
-	case CPUID_VMWARE_FEATURES_ECX_VMMCALL: \
-		VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx); \
-		break; \
-	default: \
-		VMWARE_PORT(cmd, eax, ebx, ecx, edx); \
-		break; \
-	} \
-	} while (0)
+	asm_inline volatile (
+		UNWIND_HINT_SAVE
+		"push " VMW_BP_REG "\n\t"
+		UNWIND_HINT_UNDEFINED
+		"mov %[in6], " VMW_BP_REG "\n\t"
+		"rep insb\n\t"
+		"pop " VMW_BP_REG "\n\t"
+		UNWIND_HINT_RESTORE
+		: "=a" (out0), "=b" (*out1)
+		: "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (cmd),
+		  "c" (in2),
+		  "d" (in3 | VMWARE_HYPERVISOR_PORT_HB),
+		  "S" (in4),
+		  "D" (in5),
+		  [in6] VMW_BP_CONSTRAINT (in6)
+		: "cc", "memory");
+	return out0;
+}
+#undef VMW_BP_REG
+#undef VMW_BP_CONSTRAINT
+#undef VMWARE_HYPERCALL
 
 #endif
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 4db8e1daa4a1..3aa1adaed18f 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -67,9 +67,10 @@ EXPORT_SYMBOL_GPL(vmware_hypercall_mode);
 
 static inline int __vmware_platform(void)
 {
-	uint32_t eax, ebx, ecx, edx;
-	VMWARE_CMD(GETVERSION, eax, ebx, ecx, edx);
-	return eax != (uint32_t)-1 && ebx == VMWARE_HYPERVISOR_MAGIC;
+	uint32_t eax, ebx, ecx;
+
+	eax = vmware_hypercall3(VMWARE_CMD_GETVERSION, 0, &ebx, &ecx);
+	return eax != UINT_MAX && ebx == VMWARE_HYPERVISOR_MAGIC;
 }
 
 static unsigned long vmware_get_tsc_khz(void)
@@ -121,21 +122,12 @@ static void __init vmware_cyc2ns_setup(void)
 	pr_info("using clock offset of %llu ns\n", d->cyc2ns_offset);
 }
 
-static int vmware_cmd_stealclock(uint32_t arg1, uint32_t arg2)
+static int vmware_cmd_stealclock(uint32_t addr_hi, uint32_t addr_lo)
 {
-	uint32_t result, info;
-
-	asm volatile (VMWARE_HYPERCALL :
-		"=a"(result),
-		"=c"(info) :
-		"a"(VMWARE_HYPERVISOR_MAGIC),
-		"b"(0),
-		"c"(VMWARE_CMD_STEALCLOCK),
-		"d"(0),
-		"S"(arg1),
-		"D"(arg2) :
-		"memory");
-	return result;
+	uint32_t info;
+
+	return vmware_hypercall5(VMWARE_CMD_STEALCLOCK, 0, 0, addr_hi, addr_lo,
+				 &info);
 }
 
 static bool stealclock_enable(phys_addr_t pa)
@@ -344,10 +336,10 @@ static void __init vmware_set_capabilities(void)
 
 static void __init vmware_platform_setup(void)
 {
-	uint32_t eax, ebx, ecx, edx;
+	uint32_t eax, ebx, ecx;
 	uint64_t lpj, tsc_khz;
 
-	VMWARE_CMD(GETHZ, eax, ebx, ecx, edx);
+	eax = vmware_hypercall3(VMWARE_CMD_GETHZ, UINT_MAX, &ebx, &ecx);
 
 	if (ebx != UINT_MAX) {
 		lpj = tsc_khz = eax | (((uint64_t)ebx) << 32);
@@ -429,8 +421,9 @@ static uint32_t __init vmware_platform(void)
 /* Checks if hypervisor supports x2apic without VT-D interrupt remapping. */
 static bool __init vmware_legacy_x2apic_available(void)
 {
-	uint32_t eax, ebx, ecx, edx;
-	VMWARE_CMD(GETVCPU_INFO, eax, ebx, ecx, edx);
+	uint32_t eax;
+
+	eax = vmware_hypercall1(VMWARE_CMD_GETVCPU_INFO, 0);
 	return !(eax & BIT(VCPU_RESERVED)) &&
 		(eax & BIT(VCPU_LEGACY_X2APIC));
 }