[v9,3/8] x86/vmware: Introduce VMware hypercall API

Message ID 20240506215305.30756-4-alexey.makhalov@broadcom.com (mailing list archive)
State New, archived
Series [v9,1/8] x86/vmware: Move common macros to vmware.h

Commit Message

Alexey Makhalov May 6, 2024, 9:53 p.m. UTC
Introduce the vmware_hypercall family of functions. It is a common
implementation to be used by the VMware guest code and virtual
device drivers in an architecture-independent manner.

The API consists of the vmware_hypercallX and vmware_hypercall_hb_{out,in}
sets of functions, by analogy with the KVM hypercall API. The
architecture-specific implementation is hidden inside.

It will simplify future enhancements to VMware hypercalls, such
as SEV-ES and TDX related changes, without the need to modify
callers in device driver code.

The current implementation extends an idea from commit bac7b4e84323
("x86/vmware: Update platform detection code for VMCALL/VMMCALL
hypercalls") to have a slow but safe path in VMWARE_HYPERCALL
early during boot, when alternatives are not yet applied.
This logic was inherited from VMWARE_CMD in the commit mentioned
above. The default alternative code was optimized for size to reduce
excessive NOP padding once alternatives are applied. The total
default code size is 26 bytes; in the worst case (a 3-byte alternative)
the remaining 23 bytes will be padded with only 3 long NOP instructions.

Signed-off-by: Alexey Makhalov <alexey.makhalov@broadcom.com>
Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
Reviewed-by: Jeff Sipek <jsipek@vmware.com>
---
 arch/x86/include/asm/vmware.h           | 288 +++++++++++++++++++-----
 arch/x86/kernel/cpu/vmware.c            |  35 ++-
 drivers/gpu/drm/vmwgfx/vmwgfx_msg_x86.h |   6 +-
 drivers/input/mouse/vmmouse.c           |   2 +
 drivers/ptp/ptp_vmw.c                   |   2 +
 5 files changed, 252 insertions(+), 81 deletions(-)

Comments

Borislav Petkov May 7, 2024, 9:58 a.m. UTC | #1
On Mon, May 06, 2024 at 02:53:00PM -0700, Alexey Makhalov wrote:
> +#define VMWARE_HYPERCALL						\
> +	ALTERNATIVE_3("cmpb $"						\
> +			__stringify(CPUID_VMWARE_FEATURES_ECX_VMMCALL)	\
> +			", %[mode]\n\t"					\
> +		      "jg 2f\n\t"					\
> +		      "je 1f\n\t"					\
> +		      "movw %[port], %%dx\n\t"				\
> +		      "inl (%%dx), %%eax\n\t"				\
> +		      "jmp 3f\n\t"					\
> +		      "1: vmmcall\n\t"					\
> +		      "jmp 3f\n\t"					\
> +		      "2: vmcall\n\t"					\
> +		      "3:\n\t",						\
> +		      "movw %[port], %%dx\n\t"				\
> +		      "inl (%%dx), %%eax", X86_FEATURE_HYPERVISOR,	\

That's a bunch of insns and their size would inadvertently go into the final
image.

What you should try to do is something like this:

ALTERNATIVE_3("jmp .Lend_legacy_call", "", X86_FEATURE_HYPERVISOR,
	      "vmcall; jmp .Lend_legacy_call", X86_FEATURE_VMCALL,
	      "vmmcall; jmp .Lend_legacy_call", X86_FEATURE_VMW_VMMCALL)

		/* bunch of conditional branches and INs and V*MCALLs, etc go here */

		.Lend_legacy_call:

so that you don't have these 26 bytes, as you say, of alternatives to patch but
only the JMPs and the VM*CALLs.

See for an example the macros in arch/x86/entry/calling.h which simply jump
over the code when not needed.

Also, you could restructure the alternative differently so that that bunch of
insns call is completely out-of-line because all current machines support
VM*CALL so you won't even need to patch. You only get to patch when running on
some old rust and there you can just as well go completely out-of-line.

Something along those lines, anyway.

> - * The high bandwidth in call. The low word of edx is presumed to have the
> - * HB bit set.
> + * High bandwidth calls are not supported on encrypted memory guests.
> + * The caller should check cc_platform_has(CC_ATTR_MEM_ENCRYPT) and use
> + * low bandwidth hypercall it memory encryption is set.

s/it/if/

> -#define VMWARE_PORT(cmd, eax, ebx, ecx, edx)				\
> -	__asm__("inl (%%dx), %%eax" :					\
> -		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
> -		"a"(VMWARE_HYPERVISOR_MAGIC),				\
> -		"c"(VMWARE_CMD_##cmd),					\
> -		"d"(VMWARE_HYPERVISOR_PORT), "b"(UINT_MAX) :		\
> -		"memory")
> -
> -#define VMWARE_VMCALL(cmd, eax, ebx, ecx, edx)				\
> -	__asm__("vmcall" :						\
> -		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
> -		"a"(VMWARE_HYPERVISOR_MAGIC),				\
> -		"c"(VMWARE_CMD_##cmd),					\
> -		"d"(0), "b"(UINT_MAX) :					\
> -		"memory")
> -
> -#define VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx)				\
> -	__asm__("vmmcall" :						\
> -		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
> -		"a"(VMWARE_HYPERVISOR_MAGIC),				\
> -		"c"(VMWARE_CMD_##cmd),					\
> -		"d"(0), "b"(UINT_MAX) :					\
> -		"memory")
> -
> -#define VMWARE_CMD(cmd, eax, ebx, ecx, edx) do {		\
> -	switch (vmware_hypercall_mode) {			\
> -	case CPUID_VMWARE_FEATURES_ECX_VMCALL:			\
> -		VMWARE_VMCALL(cmd, eax, ebx, ecx, edx);		\
> -		break;						\
> -	case CPUID_VMWARE_FEATURES_ECX_VMMCALL:			\
> -		VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx);	\
> -		break;						\
> -	default:						\
> -		VMWARE_PORT(cmd, eax, ebx, ecx, edx);		\
> -		break;						\
> -	}							\
> -	} while (0)

You're kidding, right?

You went to all that trouble in patch 1 to move those to the header only to
*remove* them here?

You do realize that that is unnecessary churn for no good reason, right?

So that set needs to be restructured differently.

* first patch introduces those new API calls.

* follow-on patches convert the callers to the new API

* last patch removes the old API.

Ok?

And when you redo them, make sure you drop all Reviewed-by tags because the new
versions are not reviewed anymore.

Thx.
Alexey Makhalov May 9, 2024, 11:42 p.m. UTC | #2
On 5/7/24 2:58 AM, Borislav Petkov wrote:
> On Mon, May 06, 2024 at 02:53:00PM -0700, Alexey Makhalov wrote:
>> +#define VMWARE_HYPERCALL						\
>> +	ALTERNATIVE_3("cmpb $"						\
>> +			__stringify(CPUID_VMWARE_FEATURES_ECX_VMMCALL)	\
>> +			", %[mode]\n\t"					\
>> +		      "jg 2f\n\t"					\
>> +		      "je 1f\n\t"					\
>> +		      "movw %[port], %%dx\n\t"				\
>> +		      "inl (%%dx), %%eax\n\t"				\
>> +		      "jmp 3f\n\t"					\
>> +		      "1: vmmcall\n\t"					\
>> +		      "jmp 3f\n\t"					\
>> +		      "2: vmcall\n\t"					\
>> +		      "3:\n\t",						\
>> +		      "movw %[port], %%dx\n\t"				\
>> +		      "inl (%%dx), %%eax", X86_FEATURE_HYPERVISOR,	\
> 
> That's a bunch of insns and their size would inadvertently go into the final
> image.
> 
> What you should try to do is something like this:
> 
> ALTERNATIVE_3("jmp .Lend_legacy_call", "", X86_FEATURE_HYPERVISOR,
> 	      "vmcall; jmp .Lend_legacy_call", X86_FEATURE_VMCALL,
> 	      "vmmcall; jmp .Lend_legacy_call", X86_FEATURE_VMW_VMMCALL)
> 
> 		/* bunch of conditional branches and INs and V*MCALLs, etc go here */
> 
> 		.Lend_legacy_call:
> 
> so that you don't have these 26 bytes, as you say, of alternatives to patch but
> only the JMPs and the VM*CALLs.
> 
> See for an example the macros in arch/x86/entry/calling.h which simply jump
> over the code when not needed.
Good idea!

> 
> Also, you could restructure the alternative differently so that that bunch of
> insns call is completely out-of-line because all current machines support
> VM*CALL so you won't even need to patch. You only get to patch when running on
> some old rust and there you can just as well go completely out-of-line.
> 
Alternatives patching has not been performed at platform detection time.
And platform detection hypercalls should work on all machines.
That is the reason we have IN as a default hypercall behavior.
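
A minimal userspace sketch of that default path, assuming the mode
constants are ordered so that VMCALL compares greater than VMMCALL (as the
cmpb/jg/je sequence in the patch implies) and that the mode is still zero
before detection has run. The macro, enum and function names are
hypothetical; the code only models which instruction would be chosen and
issues no hypercall:

```c
#include <stdint.h>

/*
 * Assumed stand-ins for CPUID_VMWARE_FEATURES_ECX_{VMMCALL,VMCALL};
 * 0 models "mode not detected yet", i.e. very early boot.
 */
#define MODE_UNKNOWN	0
#define MODE_VMMCALL	1
#define MODE_VMCALL	2

/* Hypothetical labels for the three possible hypercall instructions. */
enum vmw_insn { VMW_INSN_IN, VMW_INSN_VMMCALL, VMW_INSN_VMCALL };

/*
 * Models the unpatched default sequence: compare the mode with VMMCALL,
 * take VMCALL if greater, VMMCALL if equal, and fall back to the
 * I/O port IN instruction otherwise.
 */
static enum vmw_insn vmw_select_insn(uint8_t mode)
{
	if (mode > MODE_VMMCALL)
		return VMW_INSN_VMCALL;
	if (mode == MODE_VMMCALL)
		return VMW_INSN_VMMCALL;
	return VMW_INSN_IN;
}
```

With the mode still unset during platform detection, the selection falls
through to IN, which is why that path has to remain the default.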

> Something along those lines, anyway.
> 
>> - * The high bandwidth in call. The low word of edx is presumed to have the
>> - * HB bit set.
>> + * High bandwidth calls are not supported on encrypted memory guests.
>> + * The caller should check cc_platform_has(CC_ATTR_MEM_ENCRYPT) and use
>> + * low bandwidth hypercall it memory encryption is set.
> 
> s/it/if/
Acked.

> 
>> -#define VMWARE_PORT(cmd, eax, ebx, ecx, edx)				\
>> -	__asm__("inl (%%dx), %%eax" :					\
>> -		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
>> -		"a"(VMWARE_HYPERVISOR_MAGIC),				\
>> -		"c"(VMWARE_CMD_##cmd),					\
>> -		"d"(VMWARE_HYPERVISOR_PORT), "b"(UINT_MAX) :		\
>> -		"memory")
>> -
>> -#define VMWARE_VMCALL(cmd, eax, ebx, ecx, edx)				\
>> -	__asm__("vmcall" :						\
>> -		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
>> -		"a"(VMWARE_HYPERVISOR_MAGIC),				\
>> -		"c"(VMWARE_CMD_##cmd),					\
>> -		"d"(0), "b"(UINT_MAX) :					\
>> -		"memory")
>> -
>> -#define VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx)				\
>> -	__asm__("vmmcall" :						\
>> -		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
>> -		"a"(VMWARE_HYPERVISOR_MAGIC),				\
>> -		"c"(VMWARE_CMD_##cmd),					\
>> -		"d"(0), "b"(UINT_MAX) :					\
>> -		"memory")
>> -
>> -#define VMWARE_CMD(cmd, eax, ebx, ecx, edx) do {		\
>> -	switch (vmware_hypercall_mode) {			\
>> -	case CPUID_VMWARE_FEATURES_ECX_VMCALL:			\
>> -		VMWARE_VMCALL(cmd, eax, ebx, ecx, edx);		\
>> -		break;						\
>> -	case CPUID_VMWARE_FEATURES_ECX_VMMCALL:			\
>> -		VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx);	\
>> -		break;						\
>> -	default:						\
>> -		VMWARE_PORT(cmd, eax, ebx, ecx, edx);		\
>> -		break;						\
>> -	}							\
>> -	} while (0)
> 
> You're kidding, right?
> 
> You went to all that trouble in patch 1 to move those to the header only to
> *remove* them here?
> 
> You do realize that that is a unnecessary churn for no good reason, right?
> 
> So that set needs to be restructured differently.
> 
> * first patch introduces those new API calls.
> 
> * follow-on patches convert the callers to the new API
> 
> * last patch removes the old API.
> 
> Ok?
My intention was to transform the implementation step by step: from local
macros, through common macros, to a common API.

What you are suggesting will eliminate unnecessary patches. It makes sense.

Will perform this restructuring in v10.

> 
> And when you redo them, make sure you drop all Reviewed-by tags because the new
> versions are not reviewed anymore.
Noted.

Thanks again,
--Alexey
kernel test robot May 10, 2024, 5:24 a.m. UTC | #3
Hi Alexey,

kernel test robot noticed the following build errors:

[auto build test ERROR on drm-misc/drm-misc-next]
[also build test ERROR on dtor-input/next dtor-input/for-linus linus/master v6.9-rc7 next-20240509]
[cannot apply to tip/x86/vmware]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Alexey-Makhalov/x86-vmware-Move-common-macros-to-vmware-h/20240507-055606
base:   git://anongit.freedesktop.org/drm/drm-misc drm-misc-next
patch link:    https://lore.kernel.org/r/20240506215305.30756-4-alexey.makhalov%40broadcom.com
patch subject: [PATCH v9 3/8] x86/vmware: Introduce VMware hypercall API
config: x86_64-buildonly-randconfig-003-20240510 (https://download.01.org/0day-ci/archive/20240510/202405101333.vdlWwpgr-lkp@intel.com/config)
compiler: gcc-11 (Ubuntu 11.4.0-4ubuntu1) 11.4.0
reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240510/202405101333.vdlWwpgr-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202405101333.vdlWwpgr-lkp@intel.com/

All errors (new ones prefixed by >>):

   ld: drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: in function `vmw_close_channel':
>> vmwgfx_msg.c:(.text+0xaf): undefined reference to `vmware_hypercall_mode'
   ld: drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: in function `vmw_port_hb_in':
   vmwgfx_msg.c:(.text+0x2c4): undefined reference to `vmware_hypercall_mode'
   ld: drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: in function `vmw_port_hb_out':
   vmwgfx_msg.c:(.text+0x604): undefined reference to `vmware_hypercall_mode'
   ld: drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: in function `vmw_send_msg':
   vmwgfx_msg.c:(.text+0x8b0): undefined reference to `vmware_hypercall_mode'
   ld: drivers/gpu/drm/vmwgfx/vmwgfx_msg.o: in function `vmw_open_channel.constprop.0':
   vmwgfx_msg.c:(.text+0x9e8): undefined reference to `vmware_hypercall_mode'
   ld: drivers/gpu/drm/vmwgfx/vmwgfx_msg.o:vmwgfx_msg.c:(.text+0xc3c): more undefined references to `vmware_hypercall_mode' follow
Simon Horman May 11, 2024, 3:02 p.m. UTC | #4
On Mon, May 06, 2024 at 02:53:00PM -0700, Alexey Makhalov wrote:
> Introduce vmware_hypercall family of functions. It is a common
> implementation to be used by the VMware guest code and virtual
> device drivers in architecture independent manner.
> 
> The API consists of vmware_hypercallX and vmware_hypercall_hb_{out,in}
> set of functions by analogy with KVM hypercall API. Architecture
> specific implementation is hidden inside.
> 
> It will simplify future enhancements in VMware hypercalls such
> as SEV-ES and TDX related changes without needs to modify a
> caller in device drivers code.
> 
> Current implementation extends an idea from commit bac7b4e84323
> ("x86/vmware: Update platform detection code for VMCALL/VMMCALL
> hypercalls") to have a slow, but safe path in VMWARE_HYPERCALL
> earlier during the boot when alternatives are not yet applied.
> This logic was inherited from VMWARE_CMD from the commit mentioned
> above. Default alternative code was optimized by size to reduce
> excessive nop alignment once alternatives are applied. Total
> default code size is 26 bytes, in worse case (3 bytes alternative)
> remaining 23 bytes will be aligned by only 3 long NOP instructions.
> 
> Signed-off-by: Alexey Makhalov <alexey.makhalov@broadcom.com>
> Reviewed-by: Nadav Amit <nadav.amit@gmail.com>
> Reviewed-by: Jeff Sipek <jsipek@vmware.com>

...

> diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h

...

> +static inline
> +unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
> +				uint32_t *out1, uint32_t *out2)

nit: u32 is preferred over uint32_t.
     Likewise elsewhere in this patch-set.
...

>  /*
> - * The high bandwidth in call. The low word of edx is presumed to have the
> - * HB bit set.
> + * High bandwidth calls are not supported on encrypted memory guests.
> + * The caller should check cc_platform_has(CC_ATTR_MEM_ENCRYPT) and use
> + * low bandwidth hypercall it memory encryption is set.
> + * This assumption simplifies HB hypercall impementation to just I/O port

nit: implementation

     checkpatch.pl --codespell is your friend

> + * based approach without alternative patching.
>   */

...
Alexey Makhalov May 22, 2024, 11:39 p.m. UTC | #5
Hi Simon, apologies for the long delay.

On 5/11/24 8:02 AM, Simon Horman wrote:
>> diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
> 
> ...
> 
>> +static inline
>> +unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
>> +				uint32_t *out1, uint32_t *out2)
> 
> nit: u32 is preferred over uint32_t.
>       Likewise elsewhere in this patch-set.
Good to know. Can you please shed some light on the reason?
I still see a bunch of stdint-style uint32_t in arch/x86.


> ...
> 
>>   /*
>> - * The high bandwidth in call. The low word of edx is presumed to have the
>> - * HB bit set.
>> + * High bandwidth calls are not supported on encrypted memory guests.
>> + * The caller should check cc_platform_has(CC_ATTR_MEM_ENCRYPT) and use
>> + * low bandwidth hypercall it memory encryption is set.
>> + * This assumption simplifies HB hypercall impementation to just I/O port
> 
> nit: implementation
> 
>       checkpatch.pl --codespell is your friend
Thanks, that is useful!

> 
>> + * based approach without alternative patching.
>>    */
> 
> ...
Simon Horman May 23, 2024, 12:52 p.m. UTC | #6
[ resending as I mangled the previous attempt , sorry ]

+ Joe Perches

On Wed, May 22, 2024 at 04:39:57PM -0700, Alexey Makhalov wrote:
> Hi Simon, apologize for long delay
> 
> On 5/11/24 8:02 AM, Simon Horman wrote:
> > > diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
> > 
> > ...
> > 
> > > +static inline
> > > +unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
> > > +				uint32_t *out1, uint32_t *out2)
> > 
> > nit: u32 is preferred over uint32_t.
> >       Likewise elsewhere in this patch-set.
> Good to know. Can you please shed a light on the reason?
> I still see bunch of stdint style uint32_t in arch/x86.

Perhaps there is a document on this that I should know about.
But AFAIK, u32 and so on are Linux kernel types,
while uint32_t are C99-standard types.

Joe, are you able to shed any further light on this?
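
As a rough illustration of the nit, assuming a userspace build where u32
has to be supplied by hand (kernel code gets it from <linux/types.h>): both
spellings name a 32-bit unsigned type of the same size, so switching a
prototype from uint32_t to u32 is a style change rather than an ABI change.
The function below is a hypothetical stand-in, not the real
vmware_hypercall3():

```c
#include <stdint.h>

/* Userspace stand-in; kernel code gets u32 from <linux/types.h>. */
typedef uint32_t u32;

_Static_assert(sizeof(u32) == sizeof(uint32_t), "u32 and uint32_t must match");

/*
 * Hypothetical stub with the kernel-style prototype. It just echoes its
 * inputs so the example can run without a hypervisor.
 */
static unsigned long hypercall3_stub(unsigned long cmd, unsigned long in1,
				     u32 *out1, u32 *out2)
{
	*out1 = (u32)cmd;
	*out2 = (u32)in1;
	return 0;
}
```

A caller that still declares its variables as uint32_t can pass their
addresses to such a prototype unchanged, so the conversion can be done one
file at a time.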

...
Patch

diff --git a/arch/x86/include/asm/vmware.h b/arch/x86/include/asm/vmware.h
index de2533337611..2ac87068184a 100644
--- a/arch/x86/include/asm/vmware.h
+++ b/arch/x86/include/asm/vmware.h
@@ -7,14 +7,37 @@ 
 #include <linux/stringify.h>
 
 /*
- * The hypercall definitions differ in the low word of the %edx argument
+ * VMware hypercall ABI.
+ *
+ * - Low bandwidth (LB) hypercalls (I/O port based, vmcall and vmmcall)
+ * have up to 6 input and 6 output arguments passed and returned using
+ * registers: %eax (arg0), %ebx (arg1), %ecx (arg2), %edx (arg3),
+ * %esi (arg4), %edi (arg5).
+ * The following input arguments must be initialized by the caller:
+ * arg0 - VMWARE_HYPERVISOR_MAGIC
+ * arg2 - Hypercall command
+ * arg3 bits [15:0] - Port number, LB and direction flags
+ *
+ * - High bandwidth (HB) hypercalls are I/O port based only. They have
+ * up to 7 input and 7 output arguments passed and returned using
+ * registers: %eax (arg0), %ebx (arg1), %ecx (arg2), %edx (arg3),
+ * %esi (arg4), %edi (arg5), %ebp (arg6).
+ * The following input arguments must be initialized by the caller:
+ * arg0 - VMWARE_HYPERVISOR_MAGIC
+ * arg1 - Hypercall command
+ * arg3 bits [15:0] - Port number, HB and direction flags
+ *
+ * For compatibility purposes, x86_64 systems use only lower 32 bits
+ * for input and output arguments.
+ *
+ * The hypercall definitions differ in the low word of the %edx (arg3)
  * in the following way: the old I/O port based interface uses the port
  * number to distinguish between high- and low bandwidth versions, and
  * uses IN/OUT instructions to define transfer direction.
  *
  * The new vmcall interface instead uses a set of flags to select
  * bandwidth mode and transfer direction. The flags should be loaded
- * into %dx by any user and are automatically replaced by the port
+ * into arg3 by any user and are automatically replaced by the port
  * number if the I/O port method is used.
  */
 
@@ -37,69 +60,218 @@ 
 
 extern u8 vmware_hypercall_mode;
 
-/* The low bandwidth call. The low word of edx is presumed clear. */
-#define VMWARE_HYPERCALL						\
-	ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT) ", %%dx; " \
-		      "inl (%%dx), %%eax",				\
-		      "vmcall", X86_FEATURE_VMCALL,			\
-		      "vmmcall", X86_FEATURE_VMW_VMMCALL)
-
 /*
- * The high bandwidth out call. The low word of edx is presumed to have the
- * HB and OUT bits set.
+ * The low bandwidth call. The low word of %edx is presumed to have OUT bit
+ * set. The high word of %edx may contain input data from the caller.
  */
-#define VMWARE_HYPERCALL_HB_OUT						\
-	ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \
-		      "rep outsb",					\
+#define VMWARE_HYPERCALL						\
+	ALTERNATIVE_3("cmpb $"						\
+			__stringify(CPUID_VMWARE_FEATURES_ECX_VMMCALL)	\
+			", %[mode]\n\t"					\
+		      "jg 2f\n\t"					\
+		      "je 1f\n\t"					\
+		      "movw %[port], %%dx\n\t"				\
+		      "inl (%%dx), %%eax\n\t"				\
+		      "jmp 3f\n\t"					\
+		      "1: vmmcall\n\t"					\
+		      "jmp 3f\n\t"					\
+		      "2: vmcall\n\t"					\
+		      "3:\n\t",						\
+		      "movw %[port], %%dx\n\t"				\
+		      "inl (%%dx), %%eax", X86_FEATURE_HYPERVISOR,	\
 		      "vmcall", X86_FEATURE_VMCALL,			\
 		      "vmmcall", X86_FEATURE_VMW_VMMCALL)
 
+static inline
+unsigned long vmware_hypercall1(unsigned long cmd, unsigned long in1)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (0)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall3(unsigned long cmd, unsigned long in1,
+				uint32_t *out1, uint32_t *out2)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=b" (*out1), "=c" (*out2)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (0)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall4(unsigned long cmd, unsigned long in1,
+				uint32_t *out1, uint32_t *out2,
+				uint32_t *out3)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=b" (*out1), "=c" (*out2), "=d" (*out3)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (0)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall5(unsigned long cmd, unsigned long in1,
+				unsigned long in3, unsigned long in4,
+				unsigned long in5, uint32_t *out2)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=c" (*out2)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (in3),
+		  "S" (in4),
+		  "D" (in5)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall6(unsigned long cmd, unsigned long in1,
+				unsigned long in3, uint32_t *out2,
+				uint32_t *out3, uint32_t *out4,
+				uint32_t *out5)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=c" (*out2), "=d" (*out3), "=S" (*out4),
+		  "=D" (*out5)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (in3)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall7(unsigned long cmd, unsigned long in1,
+				unsigned long in3, unsigned long in4,
+				unsigned long in5, uint32_t *out1,
+				uint32_t *out2, uint32_t *out3)
+{
+	unsigned long out0;
+
+	asm_inline volatile (VMWARE_HYPERCALL
+		: "=a" (out0), "=b" (*out1), "=c" (*out2), "=d" (*out3)
+		: [port] "i" (VMWARE_HYPERVISOR_PORT),
+		  [mode] "m" (vmware_hypercall_mode),
+		  "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (in1),
+		  "c" (cmd),
+		  "d" (in3),
+		  "S" (in4),
+		  "D" (in5)
+		: "cc", "memory");
+	return out0;
+}
+
+
+#ifdef CONFIG_X86_64
+#define VMW_BP_REG "%%rbp"
+#define VMW_BP_CONSTRAINT "r"
+#else
+#define VMW_BP_REG "%%ebp"
+#define VMW_BP_CONSTRAINT "m"
+#endif
+
 /*
- * The high bandwidth in call. The low word of edx is presumed to have the
- * HB bit set.
+ * High bandwidth calls are not supported on encrypted memory guests.
+ * The caller should check cc_platform_has(CC_ATTR_MEM_ENCRYPT) and use
+ * low bandwidth hypercall it memory encryption is set.
+ * This assumption simplifies HB hypercall impementation to just I/O port
+ * based approach without alternative patching.
  */
-#define VMWARE_HYPERCALL_HB_IN						\
-	ALTERNATIVE_2("movw $" __stringify(VMWARE_HYPERVISOR_PORT_HB) ", %%dx; " \
-		      "rep insb",					\
-		      "vmcall", X86_FEATURE_VMCALL,			\
-		      "vmmcall", X86_FEATURE_VMW_VMMCALL)
+static inline
+unsigned long vmware_hypercall_hb_out(unsigned long cmd, unsigned long in2,
+				      unsigned long in3, unsigned long in4,
+				      unsigned long in5, unsigned long in6,
+				      uint32_t *out1)
+{
+	unsigned long out0;
+
+	asm_inline volatile (
+		UNWIND_HINT_SAVE
+		"push " VMW_BP_REG "\n\t"
+		UNWIND_HINT_UNDEFINED
+		"mov %[in6], " VMW_BP_REG "\n\t"
+		"rep outsb\n\t"
+		"pop " VMW_BP_REG "\n\t"
+		UNWIND_HINT_RESTORE
+		: "=a" (out0), "=b" (*out1)
+		: "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (cmd),
+		  "c" (in2),
+		  "d" (in3 | VMWARE_HYPERVISOR_PORT_HB),
+		  "S" (in4),
+		  "D" (in5),
+		  [in6] VMW_BP_CONSTRAINT (in6)
+		: "cc", "memory");
+	return out0;
+}
+
+static inline
+unsigned long vmware_hypercall_hb_in(unsigned long cmd, unsigned long in2,
+				     unsigned long in3, unsigned long in4,
+				     unsigned long in5, unsigned long in6,
+				     uint32_t *out1)
+{
+	unsigned long out0;
 
-#define VMWARE_PORT(cmd, eax, ebx, ecx, edx)				\
-	__asm__("inl (%%dx), %%eax" :					\
-		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
-		"a"(VMWARE_HYPERVISOR_MAGIC),				\
-		"c"(VMWARE_CMD_##cmd),					\
-		"d"(VMWARE_HYPERVISOR_PORT), "b"(UINT_MAX) :		\
-		"memory")
-
-#define VMWARE_VMCALL(cmd, eax, ebx, ecx, edx)				\
-	__asm__("vmcall" :						\
-		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
-		"a"(VMWARE_HYPERVISOR_MAGIC),				\
-		"c"(VMWARE_CMD_##cmd),					\
-		"d"(0), "b"(UINT_MAX) :					\
-		"memory")
-
-#define VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx)				\
-	__asm__("vmmcall" :						\
-		"=a"(eax), "=c"(ecx), "=d"(edx), "=b"(ebx) :		\
-		"a"(VMWARE_HYPERVISOR_MAGIC),				\
-		"c"(VMWARE_CMD_##cmd),					\
-		"d"(0), "b"(UINT_MAX) :					\
-		"memory")
-
-#define VMWARE_CMD(cmd, eax, ebx, ecx, edx) do {		\
-	switch (vmware_hypercall_mode) {			\
-	case CPUID_VMWARE_FEATURES_ECX_VMCALL:			\
-		VMWARE_VMCALL(cmd, eax, ebx, ecx, edx);		\
-		break;						\
-	case CPUID_VMWARE_FEATURES_ECX_VMMCALL:			\
-		VMWARE_VMMCALL(cmd, eax, ebx, ecx, edx);	\
-		break;						\
-	default:						\
-		VMWARE_PORT(cmd, eax, ebx, ecx, edx);		\
-		break;						\
-	}							\
-	} while (0)
+	asm_inline volatile (
+		UNWIND_HINT_SAVE
+		"push " VMW_BP_REG "\n\t"
+		UNWIND_HINT_UNDEFINED
+		"mov %[in6], " VMW_BP_REG "\n\t"
+		"rep insb\n\t"
+		"pop " VMW_BP_REG "\n\t"
+		UNWIND_HINT_RESTORE
+		: "=a" (out0), "=b" (*out1)
+		: "a" (VMWARE_HYPERVISOR_MAGIC),
+		  "b" (cmd),
+		  "c" (in2),
+		  "d" (in3 | VMWARE_HYPERVISOR_PORT_HB),
+		  "S" (in4),
+		  "D" (in5),
+		  [in6] VMW_BP_CONSTRAINT (in6)
+		: "cc", "memory");
+	return out0;
+}
+#undef VMW_BP_REG
+#undef VMW_BP_CONSTRAINT
 
 #endif
diff --git a/arch/x86/kernel/cpu/vmware.c b/arch/x86/kernel/cpu/vmware.c
index 9d804d60a11f..3ec14a5fa4ac 100644
--- a/arch/x86/kernel/cpu/vmware.c
+++ b/arch/x86/kernel/cpu/vmware.c
@@ -67,9 +67,10 @@  EXPORT_SYMBOL_GPL(vmware_hypercall_mode);
 
 static inline int __vmware_platform(void)
 {
-	uint32_t eax, ebx, ecx, edx;
-	VMWARE_CMD(GETVERSION, eax, ebx, ecx, edx);
-	return eax != (uint32_t)-1 && ebx == VMWARE_HYPERVISOR_MAGIC;
+	uint32_t eax, ebx, ecx;
+
+	eax = vmware_hypercall3(VMWARE_CMD_GETVERSION, 0, &ebx, &ecx);
+	return eax != UINT_MAX && ebx == VMWARE_HYPERVISOR_MAGIC;
 }
 
 static unsigned long vmware_get_tsc_khz(void)
@@ -121,21 +122,12 @@  static void __init vmware_cyc2ns_setup(void)
 	pr_info("using clock offset of %llu ns\n", d->cyc2ns_offset);
 }
 
-static int vmware_cmd_stealclock(uint32_t arg1, uint32_t arg2)
+static int vmware_cmd_stealclock(uint32_t addr_hi, uint32_t addr_lo)
 {
-	uint32_t result, info;
-
-	asm volatile (VMWARE_HYPERCALL :
-		"=a"(result),
-		"=c"(info) :
-		"a"(VMWARE_HYPERVISOR_MAGIC),
-		"b"(0),
-		"c"(VMWARE_CMD_STEALCLOCK),
-		"d"(0),
-		"S"(arg1),
-		"D"(arg2) :
-		"memory");
-	return result;
+	uint32_t info;
+
+	return vmware_hypercall5(VMWARE_CMD_STEALCLOCK, 0, 0, addr_hi, addr_lo,
+				 &info);
 }
 
 static bool stealclock_enable(phys_addr_t pa)
@@ -344,10 +336,10 @@  static void __init vmware_set_capabilities(void)
 
 static void __init vmware_platform_setup(void)
 {
-	uint32_t eax, ebx, ecx, edx;
+	uint32_t eax, ebx, ecx;
 	uint64_t lpj, tsc_khz;
 
-	VMWARE_CMD(GETHZ, eax, ebx, ecx, edx);
+	eax = vmware_hypercall3(VMWARE_CMD_GETHZ, UINT_MAX, &ebx, &ecx);
 
 	if (ebx != UINT_MAX) {
 		lpj = tsc_khz = eax | (((uint64_t)ebx) << 32);
@@ -429,8 +421,9 @@  static uint32_t __init vmware_platform(void)
 /* Checks if hypervisor supports x2apic without VT-D interrupt remapping. */
 static bool __init vmware_legacy_x2apic_available(void)
 {
-	uint32_t eax, ebx, ecx, edx;
-	VMWARE_CMD(GETVCPU_INFO, eax, ebx, ecx, edx);
+	uint32_t eax;
+
+	eax = vmware_hypercall1(VMWARE_CMD_GETVCPU_INFO, 0);
 	return !(eax & GETVCPU_INFO_VCPU_RESERVED) &&
 		(eax & GETVCPU_INFO_LEGACY_X2APIC);
 }
diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_msg_x86.h b/drivers/gpu/drm/vmwgfx/vmwgfx_msg_x86.h
index 23899d743a90..e040ee21ea1a 100644
--- a/drivers/gpu/drm/vmwgfx/vmwgfx_msg_x86.h
+++ b/drivers/gpu/drm/vmwgfx/vmwgfx_msg_x86.h
@@ -68,6 +68,8 @@ 
                 "=d"(edx),			\
                 "=S"(si),			\
                 "=D"(di) :			\
+         [port] "i" (VMWARE_HYPERVISOR_PORT),	\
+         [mode] "m" (vmware_hypercall_mode),	\
                 "a"(magic),			\
                 "b"(in_ebx),			\
                 "c"(cmd),			\
@@ -110,7 +112,7 @@ 
 		"push %%rbp;"				\
 		UNWIND_HINT_UNDEFINED			\
                 "mov %12, %%rbp;"			\
-                VMWARE_HYPERCALL_HB_OUT			\
+                "rep outsb;"				\
                 "pop %%rbp;"				\
 		UNWIND_HINT_RESTORE :			\
                 "=a"(eax),				\
@@ -139,7 +141,7 @@ 
 		"push %%rbp;"				\
 		UNWIND_HINT_UNDEFINED			\
                 "mov %12, %%rbp;"			\
-                VMWARE_HYPERCALL_HB_IN			\
+                "rep insb;"				\
                 "pop %%rbp;"				\
 		UNWIND_HINT_RESTORE :			\
                 "=a"(eax),				\
diff --git a/drivers/input/mouse/vmmouse.c b/drivers/input/mouse/vmmouse.c
index ea9eff7c8099..ad94c835ee66 100644
--- a/drivers/input/mouse/vmmouse.c
+++ b/drivers/input/mouse/vmmouse.c
@@ -91,6 +91,8 @@  struct vmmouse_data {
 		"=d"(out4),				\
 		"=S"(__dummy1),				\
 		"=D"(__dummy2) :			\
+         [port] "i" (VMWARE_HYPERVISOR_PORT),		\
+         [mode] "m" (vmware_hypercall_mode),		\
 		"a"(VMMOUSE_PROTO_MAGIC),		\
 		"b"(in1),				\
 		"c"(VMMOUSE_PROTO_CMD_##cmd),		\
diff --git a/drivers/ptp/ptp_vmw.c b/drivers/ptp/ptp_vmw.c
index 27c5547aa8a9..279d191d2df9 100644
--- a/drivers/ptp/ptp_vmw.c
+++ b/drivers/ptp/ptp_vmw.c
@@ -29,6 +29,8 @@  static int ptp_vmw_pclk_read(u64 *ns)
 	asm volatile (VMWARE_HYPERCALL :
 		"=a"(ret), "=b"(nsec_hi), "=c"(nsec_lo), "=d"(unused1),
 		"=S"(unused2), "=D"(unused3) :
+		[port] "i" (VMWARE_HYPERVISOR_PORT),
+		[mode] "m" (vmware_hypercall_mode),
 		"a"(VMWARE_MAGIC), "b"(0),
 		"c"(VMWARE_CMD_PCLK_GETTIME), "d"(0) :
 		"memory");
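
The converted __vmware_platform() in the arch/x86/kernel/cpu/vmware.c hunk
above can be tried out in userspace with a mock. This is only a sketch: the
constant values are assumptions made for illustration, and
mock_hypercall3() is a hypothetical replacement for vmware_hypercall3()
that fakes a hypervisor reply instead of executing IN/VMCALL/VMMCALL:

```c
#include <limits.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration; the real ones live in asm/vmware.h. */
#define VMWARE_HYPERVISOR_MAGIC	0x564D5868U
#define VMWARE_CMD_GETVERSION	10

/*
 * Hypothetical mock: pretends a hypervisor answered GETVERSION by
 * returning a version in "eax" and the magic in "ebx" (out1). Any other
 * command is treated as unanswered.
 */
static unsigned long mock_hypercall3(unsigned long cmd, unsigned long in1,
				     uint32_t *out1, uint32_t *out2)
{
	(void)in1;
	*out2 = 0;
	if (cmd == VMWARE_CMD_GETVERSION) {
		*out1 = VMWARE_HYPERVISOR_MAGIC;
		return 6;	/* arbitrary made-up version number */
	}
	*out1 = 0;
	return UINT_MAX;	/* models "no answer": eax reads as all ones */
}

/* Same shape as the converted __vmware_platform() in this patch. */
static bool mock_vmware_platform(void)
{
	uint32_t eax, ebx, ecx;

	eax = (uint32_t)mock_hypercall3(VMWARE_CMD_GETVERSION, 0, &ebx, &ecx);
	return eax != UINT_MAX && ebx == VMWARE_HYPERVISOR_MAGIC;
}
```

The caller no longer names registers or picks an instruction; it passes a
command plus inputs and reads the outputs back through pointers, which is
what lets the transport change later without touching callers.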