
[6/6] KVM: VMX: enable IPI virtualization

Message ID 20210716064808.14757-7-guang.zeng@intel.com
State New, archived
Series IPI virtualization support for VM

Commit Message

Zeng Guang July 16, 2021, 6:48 a.m. UTC
From: Gao Chao <chao.gao@intel.com>

With IPI virtualization enabled, the processor emulates writes
to APIC registers that would send IPIs. The processor sets the
bit corresponding to the vector in the target vCPU's PIR and may send
a notification (IPI) specified by the NDST and NV fields in the target
vCPU's PID. This is similar to what the IOMMU engine does when dealing
with posted interrupts from devices.
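The PIR update and notification decision described above can be sketched as a userspace model. `struct pid_model` and `pid_model_post()` are hypothetical names. The kernel's real `struct pi_desc` packs ON, SN, NV and NDST into bit fields after the 256-bit PIR, and the check order here is an assumption modeled on VT-d posting, not a transcription of the SDM.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, unpacked model of a posted-interrupt descriptor (PID). */
struct pid_model {
	uint64_t pir[4]; /* 256-bit Posted Interrupt Requests bitmap */
	bool on;         /* Outstanding Notification */
	bool sn;         /* Suppress Notification */
	uint8_t nv;      /* Notification Vector */
	uint32_t ndst;   /* Notification Destination (physical APIC ID) */
};

/*
 * Post @vector into @pid: set the PIR bit, then decide whether a
 * notification IPI (vector pid->nv to CPU pid->ndst) would be sent.
 * Returns true if the notification would be sent.
 */
static bool pid_model_post(struct pid_model *pid, uint8_t vector)
{
	assert(pid != NULL);

	/* Record the request: one bit per vector in the 256-bit PIR. */
	pid->pir[vector / 64] |= UINT64_C(1) << (vector % 64);

	/* A notification is already outstanding: no new IPI is needed. */
	if (pid->on)
		return false;
	/* Notifications are suppressed (e.g. the target vCPU was preempted). */
	if (pid->sn)
		return false;

	pid->on = true;
	return true;
}
```

In this model a second post before the target processes the PIR only sets another bit, because ON is already set.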

A PID-pointer table is used by the processor to locate the PID of a
vCPU, indexed by the vCPU's APIC ID.
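A lookup in the PID-pointer table might be modeled as follows. This is a sketch under assumptions: the 8-byte entry format with bit 0 as the valid bit follows the patch's `install_pid()`, the table is bounded by a last index (as with `LAST_PID_POINTER_INDEX`), and `pid_table_lookup()` and `pid_table_install()` are hypothetical helpers. What the hardware does on an out-of-range or invalid entry is not modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 0 of an entry is the valid bit, as in the patch's install_pid(). */
#define PID_ENTRY_VALID UINT64_C(1)

/*
 * @table holds @last_index + 1 eight-byte entries. Returns the PID address
 * for @apic_id, or 0 if the index is out of range or the entry is invalid.
 */
static uint64_t pid_table_lookup(const uint64_t *table, uint16_t last_index,
				 uint32_t apic_id)
{
	assert(table != NULL);

	if (apic_id > last_index)
		return 0;
	if (!(table[apic_id] & PID_ENTRY_VALID))
		return 0;
	return table[apic_id] & ~PID_ENTRY_VALID;
}

/* Entry encoding mirroring the patch: PID physical address | valid bit. */
static void pid_table_install(uint64_t *table, uint32_t vcpu_id,
			      uint64_t pid_pa)
{
	assert(table != NULL);
	table[vcpu_id] = pid_pa | PID_ENTRY_VALID;
}
```

Like `install_pid()`, the install helper writes the entry at the vCPU ID, so the sketch implicitly treats the vCPU ID and the APIC ID as equal.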

Like VT-d PI, if a vCPU goes to the blocked state, the VMM needs to switch
its notification vector to the wakeup vector. This ensures that when an
IPI for a blocked vCPU arrives, the VMM gets control and can wake up the
blocked vCPU. And if a vCPU is preempted, its posted interrupt
notification is suppressed.
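The notification-vector switch for blocked vCPUs and the suppression for preempted ones can be sketched as a small state update. The vector values are placeholders chosen for this sketch, and `pid_ctl_update()` and the state enum are hypothetical. In the patch, the corresponding logic lives in `pi_pre_block()` and `vmx_vcpu_pi_put()` in posted_intr.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder vector numbers for this sketch only. */
#define SKETCH_NOTIFICATION_VECTOR 0xf2
#define SKETCH_WAKEUP_VECTOR       0xf1

enum vcpu_sched_state { VCPU_RUNNING, VCPU_PREEMPTED, VCPU_BLOCKED };

/* Only the PID control fields that change with scheduling state. */
struct pid_ctl_model {
	bool sn;    /* Suppress Notification */
	uint8_t nv; /* Notification Vector */
};

static void pid_ctl_update(struct pid_ctl_model *ctl,
			   enum vcpu_sched_state st)
{
	assert(ctl != NULL);

	switch (st) {
	case VCPU_RUNNING:
		/* Notifications go to the physical CPU running the vCPU. */
		ctl->sn = false;
		ctl->nv = SKETCH_NOTIFICATION_VECTOR;
		break;
	case VCPU_PREEMPTED:
		/* Still runnable: PIR bits accumulate until it runs again. */
		ctl->sn = true;
		break;
	case VCPU_BLOCKED:
		/* Route notifications to the VMM's wakeup handler. */
		ctl->sn = false;
		ctl->nv = SKETCH_WAKEUP_VECTOR;
		break;
	}
}
```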

Note that IPI virtualization can only virtualize physical-addressing,
flat mode, unicast IPIs. Sending other IPIs still causes a VM exit and
needs to be handled by the VMM.
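A rough filter for which ICR writes fall into the virtualizable class might look like this. It assumes x2APIC mode (a single 64-bit ICR write with the destination in bits 63:32) and uses the standard ICR field layout. Requiring fixed delivery mode is an added assumption, the exact conditions are defined in the Intel SDM, and `icr_is_virtualizable()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard local APIC ICR fields (low 32 bits). */
#define ICR_DELIVERY_MASK  0x00000700u /* bits 10:8, 000 = fixed */
#define ICR_DEST_LOGICAL   0x00000800u /* bit 11, 1 = logical */
#define ICR_SHORTHAND_MASK 0x000c0000u /* bits 19:18, 00 = no shorthand */

#define X2APIC_BROADCAST   0xffffffffu

/* Approximate classification of a 64-bit x2APIC ICR write. */
static bool icr_is_virtualizable(uint64_t icr)
{
	uint32_t lo = (uint32_t)icr;
	uint32_t dest = (uint32_t)(icr >> 32);

	if (lo & ICR_SHORTHAND_MASK)  /* self, all, or all-but-self */
		return false;
	if (lo & ICR_DEST_LOGICAL)    /* logical destination mode */
		return false;
	if (lo & ICR_DELIVERY_MASK)   /* NMI, INIT, SIPI, ... (assumed) */
		return false;
	if (dest == X2APIC_BROADCAST) /* broadcast, not unicast */
		return false;
	return true;
}
```

Anything this filter rejects would, per the text, still exit to the VMM.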

Signed-off-by: Gao Chao <chao.gao@intel.com>
Signed-off-by: Zeng Guang <guang.zeng@intel.com>
---
 arch/x86/include/asm/vmx.h         |  8 ++++
 arch/x86/include/asm/vmxfeatures.h |  2 +
 arch/x86/kvm/vmx/capabilities.h    |  1 +
 arch/x86/kvm/vmx/posted_intr.c     | 22 ++++++---
 arch/x86/kvm/vmx/vmx.c             | 72 ++++++++++++++++++++++++++++--
 arch/x86/kvm/vmx/vmx.h             |  6 +++
 6 files changed, 102 insertions(+), 9 deletions(-)

Comments

Paolo Bonzini July 16, 2021, 9:52 a.m. UTC | #1
On 16/07/21 08:48, Zeng Guang wrote:
>  
> +	if (!(_cpu_based_3rd_exec_control & TERTIARY_EXEC_IPI_VIRT))
> +		enable_ipiv = 0;
> +
>   	}

Please move this to hardware_setup(), using a new function 
cpu_has_vmx_ipiv() in vmx/capabilities.h.

>  	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
> -		u64 opt3 = 0;
> +		u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
>  		u64 min3 = 0;

I like the idea of changing opt3, but it's different from how 
setup_vmcs_config works for the other execution controls.  Let me think 
if it makes sense to clean this up, and move the handling of other 
module parameters from hardware_setup() to setup_vmcs_config().

> +
> +	if (vmx->ipiv_active)
> +		install_pid(vmx);

This should be if (enable_ipiv) instead, I think.

In fact, in all other places that are using vmx->ipiv_active, you can 
actually replace it with enable_ipiv; they are all reached only with 
kvm_vcpu_apicv_active(vcpu) == true.

> +	if (!enable_apicv) {
> +		enable_ipiv = 0;
> +		vmcs_config.cpu_based_3rd_exec_ctrl &= ~TERTIARY_EXEC_IPI_VIRT;
> +	}

The assignment to vmcs_config.cpu_based_3rd_exec_ctrl should not be 
necessary; kvm_vcpu_apicv_active will always be false in that case and 
IPI virtualization would never be enabled.

Paolo
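One possible shape for the `cpu_has_vmx_ipiv()` helper suggested above, modeled in userspace so it can be compiled and tested. Only the helper's name comes from the review. Its body, the `vmcs_config_model` stand-in and `hardware_setup_ipiv()` are assumptions. `TERTIARY_EXEC_IPI_VIRT` is bit 4, matching `VMX_FEATURE_IPI_VIRT` (word 3, bit 4) in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* VMCS_CONTROL_BIT(IPI_VIRT) == BIT((3*32 + 4) & 31) == BIT(4). */
#define TERTIARY_EXEC_IPI_VIRT (UINT64_C(1) << 4)

/* Userspace stand-ins for the kernel globals; names mirror the patch. */
struct vmcs_config_model {
	uint64_t cpu_based_3rd_exec_ctrl;
};
static struct vmcs_config_model vmcs_config;
static bool enable_apicv = true;
static bool enable_ipiv = true;

/* Hypothetical helper in the style of vmx/capabilities.h. */
static inline bool cpu_has_vmx_ipiv(void)
{
	return vmcs_config.cpu_based_3rd_exec_ctrl & TERTIARY_EXEC_IPI_VIRT;
}

/* Only the IPIv-related slice of hardware_setup(). */
static void hardware_setup_ipiv(void)
{
	if (!enable_apicv || !cpu_has_vmx_ipiv())
		enable_ipiv = false;
}
```

Folding the `!enable_apicv` check into the same condition leaves `vmcs_config` untouched, in line with the remark above about the `cpu_based_3rd_exec_ctrl` assignment.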
Zeng Guang July 17, 2021, 3:55 a.m. UTC | #2
On 7/16/2021 5:52 PM, Paolo Bonzini wrote:
> On 16/07/21 08:48, Zeng Guang wrote:
>>
>> +    if (!(_cpu_based_3rd_exec_control & TERTIARY_EXEC_IPI_VIRT))
>> +        enable_ipiv = 0;
>> +
>>       }
>
> Please move this to hardware_setup(), using a new function 
> cpu_has_vmx_ipiv() in vmx/capabilities.h.
>
OK, we will change it to follow the current framework.
>>      if (_cpu_based_exec_control & 
>> CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
>> -        u64 opt3 = 0;
>> +        u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
>>          u64 min3 = 0;
>
> I like the idea of changing opt3, but it's different from how 
> setup_vmcs_config works for the other execution controls.  Let me 
> think if it makes sense to clean this up, and move the handling of 
> other module parameters from hardware_setup() to setup_vmcs_config().
>
Maybe an exception for the ipiv feature?
>> +
>> +    if (vmx->ipiv_active)
>> +        install_pid(vmx);
>
> This should be if (enable_ipiv) instead, I think.
>
> In fact, in all other places that are using vmx->ipiv_active, you can 
> actually replace it with enable_ipiv; they are all reached only with 
> kvm_vcpu_apicv_active(vcpu) == true.
>
enable_ipiv, as a global variable, indicates the hardware capability to
enable IPIv. Each VM may have a different IPIv configuration depending on
its kvm_vcpu_apicv_active status, so we use the per-VM ipiv_active to
guard IPIv-related operations.
>> +    if (!enable_apicv) {
>> +        enable_ipiv = 0;
>> +        vmcs_config.cpu_based_3rd_exec_ctrl &= ~TERTIARY_EXEC_IPI_VIRT;
>> +    }
>
> The assignment to vmcs_config.cpu_based_3rd_exec_ctrl should not be 
> necessary; kvm_vcpu_apicv_active will always be false in that case and 
> IPI virtualization would never be enabled.
>
We originally intended to keep vmcs_config consistent with the actual IPIv
capability and decoupled from other factors. As you mentioned, it's not
necessary to update vmcs_config.cpu_based_3rd_exec_ctrl in this case. We
will remove it.

Thanks.

> Paolo
>
Paolo Bonzini July 18, 2021, 8:32 p.m. UTC | #3
On 17/07/21 05:55, Zeng Guang wrote:
>>>      if (_cpu_based_exec_control & 
>>> CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
>>> -        u64 opt3 = 0;
>>> +        u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
>>>          u64 min3 = 0;
>>
>> I like the idea of changing opt3, but it's different from how 
>> setup_vmcs_config works for the other execution controls.  Let me 
>> think if it makes sense to clean this up, and move the handling of 
>> other module parameters from hardware_setup() to setup_vmcs_config().
>>
> May be an exception for ipiv feature ?

Yes, possibly.  I'll look into using this idea for other parameters.

>>> +    if (vmx->ipiv_active)
>>> +        install_pid(vmx);
>>
>> This should be if (enable_ipiv) instead, I think.
>>
>> In fact, in all other places that are using vmx->ipiv_active, you can 
>> actually replace it with enable_ipiv; they are all reached only with 
>> kvm_vcpu_apicv_active(vcpu) == true.
>>
> enable_ipiv as a global variable indicates the hardware capability to 
> enable IPIv. Each VM may have different IPIv configuration according to 
> kvm_vcpu_apicv_active status. So we use ipiv_active per VM to enclose 
> IPIv related operations.

Understood, but in practice all uses of vmx->ipiv_active are guarded by 
kvm_vcpu_apicv_active so they are always reached with vmx->ipiv_active 
== enable_ipiv.

The one above instead seems wrong and should just use enable_ipiv.

Paolo
Zeng Guang July 19, 2021, 12:38 p.m. UTC | #4
On 7/19/2021 4:32 AM, Paolo Bonzini wrote:
> On 17/07/21 05:55, Zeng Guang wrote:
>>>> +    if (vmx->ipiv_active)
>>>> +        install_pid(vmx);
>>> This should be if (enable_ipiv) instead, I think.
>>>
>>> In fact, in all other places that are using vmx->ipiv_active, you can
>>> actually replace it with enable_ipiv; they are all reached only with
>>> kvm_vcpu_apicv_active(vcpu) == true.
>>>
>> enable_ipiv as a global variable indicates the hardware capability to
>> enable IPIv. Each VM may have different IPIv configuration according to
>> kvm_vcpu_apicv_active status. So we use ipiv_active per VM to enclose
>> IPIv related operations.
> Understood, but in practice all uses of vmx->ipiv_active are guarded by
> kvm_vcpu_apicv_active so they are always reached with vmx->ipiv_active
> == enable_ipiv.
>
> The one above instead seems wrong and should just use enable_ipiv.

enable_ipiv is associated with the "IPI virtualization" setting in the
tertiary exec controls and with enable_apicv, which depends on
cpu_has_vmx_apicv(). kvm_vcpu_apicv_active can still be false even if
enable_ipiv is true, e.g. when the irqchip is not emulated in the kernel.

It's OK to use enable_ipiv here. The VMCS setup for IPIv succeeds, but it
won't take effect, because a false kvm_vcpu_apicv_active disables "IPI
virtualization" in this case.

> Paolo
>
Zeng Guang July 19, 2021, 1:16 p.m. UTC | #5
On 7/19/2021 4:32 AM, Paolo Bonzini wrote:
> On 17/07/21 05:55, Zeng Guang wrote:
>>>> +    if (vmx->ipiv_active)
>>>> +        install_pid(vmx);
>>> This should be if (enable_ipiv) instead, I think.
>>>
>>> In fact, in all other places that are using vmx->ipiv_active, you can
>>> actually replace it with enable_ipiv; they are all reached only with
>>> kvm_vcpu_apicv_active(vcpu) == true.
>>>
>> enable_ipiv as a global variable indicates the hardware capability to
>> enable IPIv. Each VM may have different IPIv configuration according to
>> kvm_vcpu_apicv_active status. So we use ipiv_active per VM to enclose
>> IPIv related operations.
> Understood, but in practice all uses of vmx->ipiv_active are guarded by
> kvm_vcpu_apicv_active so they are always reached with vmx->ipiv_active
> == enable_ipiv.
>
> The one above instead seems wrong and should just use enable_ipiv.
enable_ipiv is associated with the "IPI virtualization" setting in the
tertiary exec controls and with enable_apicv, which depends on
cpu_has_vmx_apicv(). kvm_vcpu_apicv_active can still be false even if
enable_ipiv is true, e.g. when the irqchip is not emulated in the kernel.

Though it's OK to use enable_ipiv here, and the VMCS setup for IPIv
succeeds, IPIv won't take effect: a false kvm_vcpu_apicv_active disables
"IPI virtualization" for the VM at runtime in this case, which makes
vmx->ipiv_active false as well. So vmx->ipiv_active reflects the runtime
IPIv status more accurately.
> Paolo
>
Paolo Bonzini July 19, 2021, 1:58 p.m. UTC | #6
On 19/07/21 14:38, Zeng Guang wrote:
>> Understood, but in practice all uses of vmx->ipiv_active are
>> guarded by kvm_vcpu_apicv_active so they are always reached with
>> vmx->ipiv_active == enable_ipiv.
>> 
>> The one above instead seems wrong and should just use enable_ipiv.
> 
> enable_ipiv associate with "IPI virtualization" setting in tertiary
> exec controls and enable_apicv which depends on cpu_has_vmx_apicv().
> kvm_vcpu_apicv_active still can be false even if enable_ipiv is true,
> e.g. in case irqchip not emulated in kernel.

Right, kvm_vcpu_apicv_active *is* set in init_vmcs.  But there's an
"if (kvm_vcpu_apicv_active(&vmx->vcpu))" above.  You can just stick

	if (enable_ipiv)
		install_pid(vmx);

inside there.  As to the other occurrences of vmx->ipiv_active, look here:

> +	if (!kvm_vcpu_apicv_active(vcpu))
> +		return;
> +
> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +		!irq_remapping_cap(IRQ_POSTING_CAP)) &&
> +		!to_vmx(vcpu)->ipiv_active)
>  		return;
>  

This one can be enable_ipiv because APICv must be active.

> +	if (!kvm_vcpu_apicv_active(vcpu))
> +		return 0;
> +
> +	/* Put vCPU into a list and set NV to wakeup vector if it is
> +	 * one of the following cases:
> +	 * 1. any assigned device is in use.
> +	 * 2. IPI virtualization is enabled.
> +	 */
> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
> +		!irq_remapping_cap(IRQ_POSTING_CAP)) && !to_vmx(vcpu)->ipiv_active)
>  		return 0;

This one can be !enable_ipiv because APICv must be active.

> 
> @@ -3870,6 +3877,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
>  		vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
>  		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
>  		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
> +		vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR),
> +				MSR_TYPE_RW, !to_vmx(vcpu)->ipiv_active);
>  	}
>  }

This one is inside "if (mode & MSR_BITMAP_MODE_X2APIC_APICV)", so APICv
must be active; it can be enable_ipiv as well.

In conclusion, you do not need vmx->ipiv_active.

Paolo
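The argument above can be checked exhaustively with a small model: compute `ipiv_active` the way the patch's `vmx_compute_tertiary_exec_control()` does, then compare the early-return guard of `pi_pre_block()` using `ipiv_active` against the same guard using `enable_ipiv`, over all 16 input combinations. The model assumes the invariant established at setup time, namely that the config's IPI_VIRT bit is set exactly when `enable_ipiv` is true. All function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TERTIARY_EXEC_IPI_VIRT (UINT64_C(1) << 4)

/* Model of vmx_compute_tertiary_exec_control() as far as IPIv goes. */
static bool model_ipiv_active(uint64_t config_3rd, bool apicv_active)
{
	uint64_t exec_control = config_3rd;

	if (!apicv_active)
		exec_control &= ~TERTIARY_EXEC_IPI_VIRT;
	return exec_control & TERTIARY_EXEC_IPI_VIRT;
}

/* True if pi_pre_block() would return early (no wakeup handling). */
static bool pre_block_skips(bool apicv_active, bool assigned_dev,
			    bool posting_cap, bool ipiv)
{
	if (!apicv_active)
		return true;
	return (!assigned_dev || !posting_cap) && !ipiv;
}

/* Does replacing ipiv_active with enable_ipiv ever change the outcome? */
static bool substitution_is_safe(void)
{
	for (unsigned int m = 0; m < 16; m++) {
		bool apicv = m & 1, dev = m & 2, cap = m & 4, enable_ipiv = m & 8;
		/* Setup-time invariant: IPI_VIRT is in the config iff enable_ipiv. */
		uint64_t config_3rd = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
		bool ipiv_active = model_ipiv_active(config_3rd, apicv);

		if (pre_block_skips(apicv, dev, cap, ipiv_active) !=
		    pre_block_skips(apicv, dev, cap, enable_ipiv))
			return false;
	}
	return true;
}
```

`model_ipiv_active(TERTIARY_EXEC_IPI_VIRT, false)` returns false, which is the case Zeng Guang raised; in the guard it never matters, because the `kvm_vcpu_apicv_active()` test returns first.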
Zeng Guang July 20, 2021, 1:07 a.m. UTC | #7
On 7/19/2021 9:58 PM, Paolo Bonzini wrote:
> On 19/07/21 14:38, Zeng Guang wrote:
>>> Understood, but in practice all uses of vmx->ipiv_active are
>>> guarded by kvm_vcpu_apicv_active so they are always reached with
>>> vmx->ipiv_active == enable_ipiv.
>>>
>>> The one above instead seems wrong and should just use enable_ipiv.
>> enable_ipiv associate with "IPI virtualization" setting in tertiary
>> exec controls and enable_apicv which depends on cpu_has_vmx_apicv().
>> kvm_vcpu_apicv_active still can be false even if enable_ipiv is true,
>> e.g. in case irqchip not emulated in kernel.
> Right, kvm_vcpu_apicv_active *is* set in init_vmcs.  But there's an
> "if (kvm_vcpu_apicv_active(&vmx->vcpu))" above.  You can just stick
>
> 	if (enable_ipiv)
> 		install_pid(vmx);
OK, I see your point now. I will revise the patch to remove vmx->ipiv_active. Thanks.
> inside there.  As to the other occurrences of vmx->ipiv_active, look here:
>
>> +	if (!kvm_vcpu_apicv_active(vcpu))
>> +		return;
>> +
>> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> +		!irq_remapping_cap(IRQ_POSTING_CAP)) &&
>> +		!to_vmx(vcpu)->ipiv_active)
>>   		return;
>>   
> This one can be enable_ipiv because APICv must be active.
>
>> +	if (!kvm_vcpu_apicv_active(vcpu))
>> +		return 0;
>> +
>> +	/* Put vCPU into a list and set NV to wakeup vector if it is
>> +	 * one of the following cases:
>> +	 * 1. any assigned device is in use.
>> +	 * 2. IPI virtualization is enabled.
>> +	 */
>> +	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
>> +		!irq_remapping_cap(IRQ_POSTING_CAP)) && !to_vmx(vcpu)->ipiv_active)
>>   		return 0;
> This one can be !enable_ipiv because APICv must be active.
>
>> @@ -3870,6 +3877,8 @@ static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
>>   		vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
>>   		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
>>   		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
>> +		vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR),
>> +				MSR_TYPE_RW, !to_vmx(vcpu)->ipiv_active);
>>   	}
>>   }
> Is inside "if (mode & MSR_BITMAP_MODE_X2APIC_APICV)" so APICv must be
> active; so it can be enable_ipiv as well.
>
> In conclusion, you do not need vmx->ipiv_active.
>
> Paolo
>

Patch

diff --git a/arch/x86/include/asm/vmx.h b/arch/x86/include/asm/vmx.h
index 15652047f2db..e97cf7b9ff12 100644
--- a/arch/x86/include/asm/vmx.h
+++ b/arch/x86/include/asm/vmx.h
@@ -76,6 +76,11 @@ 
 #define SECONDARY_EXEC_ENABLE_USR_WAIT_PAUSE	VMCS_CONTROL_BIT(USR_WAIT_PAUSE)
 #define SECONDARY_EXEC_BUS_LOCK_DETECTION	VMCS_CONTROL_BIT(BUS_LOCK_DETECTION)
 
+/*
+ * Definitions of Tertiary Processor-Based VM-Execution Controls.
+ */
+#define TERTIARY_EXEC_IPI_VIRT			VMCS_CONTROL_BIT(IPI_VIRT)
+
 #define PIN_BASED_EXT_INTR_MASK                 VMCS_CONTROL_BIT(INTR_EXITING)
 #define PIN_BASED_NMI_EXITING                   VMCS_CONTROL_BIT(NMI_EXITING)
 #define PIN_BASED_VIRTUAL_NMIS                  VMCS_CONTROL_BIT(VIRTUAL_NMIS)
@@ -159,6 +164,7 @@  static inline int vmx_misc_mseg_revid(u64 vmx_misc)
 enum vmcs_field {
 	VIRTUAL_PROCESSOR_ID            = 0x00000000,
 	POSTED_INTR_NV                  = 0x00000002,
+	LAST_PID_POINTER_INDEX		= 0x00000008,
 	GUEST_ES_SELECTOR               = 0x00000800,
 	GUEST_CS_SELECTOR               = 0x00000802,
 	GUEST_SS_SELECTOR               = 0x00000804,
@@ -224,6 +230,8 @@  enum vmcs_field {
 	TSC_MULTIPLIER_HIGH             = 0x00002033,
 	TERTIARY_VM_EXEC_CONTROL	= 0x00002034,
 	TERTIARY_VM_EXEC_CONTROL_HIGH	= 0x00002035,
+	PID_POINTER_TABLE		= 0x00002042,
+	PID_POINTER_TABLE_HIGH		= 0x00002043,
 	GUEST_PHYSICAL_ADDRESS          = 0x00002400,
 	GUEST_PHYSICAL_ADDRESS_HIGH     = 0x00002401,
 	VMCS_LINK_POINTER               = 0x00002800,
diff --git a/arch/x86/include/asm/vmxfeatures.h b/arch/x86/include/asm/vmxfeatures.h
index 27e76eeca05b..e821e8126097 100644
--- a/arch/x86/include/asm/vmxfeatures.h
+++ b/arch/x86/include/asm/vmxfeatures.h
@@ -86,4 +86,6 @@ 
 #define VMX_FEATURE_ENCLV_EXITING	( 2*32+ 28) /* "" VM-Exit on ENCLV (leaf dependent) */
 #define VMX_FEATURE_BUS_LOCK_DETECTION	( 2*32+ 30) /* "" VM-Exit when bus lock caused */
 
+/* Tertiary Processor-Based VM-Execution Controls, word 3 */
+#define VMX_FEATURE_IPI_VIRT		( 3*32 + 4) /* "" Enable IPI virtualization */
 #endif /* _ASM_X86_VMXFEATURES_H */
diff --git a/arch/x86/kvm/vmx/capabilities.h b/arch/x86/kvm/vmx/capabilities.h
index 38d414f64e61..9e9710c3ee51 100644
--- a/arch/x86/kvm/vmx/capabilities.h
+++ b/arch/x86/kvm/vmx/capabilities.h
@@ -12,6 +12,7 @@  extern bool __read_mostly enable_ept;
 extern bool __read_mostly enable_unrestricted_guest;
 extern bool __read_mostly enable_ept_ad_bits;
 extern bool __read_mostly enable_pml;
+extern bool __read_mostly enable_ipiv;
 extern int __read_mostly pt_mode;
 
 #define PT_MODE_SYSTEM		0
diff --git a/arch/x86/kvm/vmx/posted_intr.c b/arch/x86/kvm/vmx/posted_intr.c
index 5f81ef092bd4..d817331bfb05 100644
--- a/arch/x86/kvm/vmx/posted_intr.c
+++ b/arch/x86/kvm/vmx/posted_intr.c
@@ -81,9 +81,12 @@  void vmx_vcpu_pi_put(struct kvm_vcpu *vcpu)
 {
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 
-	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
-		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
-		!kvm_vcpu_apicv_active(vcpu))
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return;
+
+	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP)) &&
+		!to_vmx(vcpu)->ipiv_active)
 		return;
 
 	/* Set SN when the vCPU is preempted */
@@ -141,9 +144,16 @@  int pi_pre_block(struct kvm_vcpu *vcpu)
 	struct pi_desc old, new;
 	struct pi_desc *pi_desc = vcpu_to_pi_desc(vcpu);
 
-	if (!kvm_arch_has_assigned_device(vcpu->kvm) ||
-		!irq_remapping_cap(IRQ_POSTING_CAP)  ||
-		!kvm_vcpu_apicv_active(vcpu))
+	if (!kvm_vcpu_apicv_active(vcpu))
+		return 0;
+
+	/* Put vCPU into a list and set NV to wakeup vector if it is
+	 * one of the following cases:
+	 * 1. any assigned device is in use.
+	 * 2. IPI virtualization is enabled.
+	 */
+	if ((!kvm_arch_has_assigned_device(vcpu->kvm) ||
+		!irq_remapping_cap(IRQ_POSTING_CAP)) && !to_vmx(vcpu)->ipiv_active)
 		return 0;
 
 	WARN_ON(irqs_disabled());
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 820fc131d258..8a45f45b263c 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -104,6 +104,9 @@  module_param(fasteoi, bool, S_IRUGO);
 
 module_param(enable_apicv, bool, S_IRUGO);
 
+bool __read_mostly enable_ipiv = 1;
+module_param(enable_ipiv, bool, S_IRUGO);
+
 /*
  * If nested=1, nested virtualization is supported, i.e., guests may use
  * VMX and be a hypervisor for its own guests. If nested=0, guests may not
@@ -225,6 +228,7 @@  static const struct {
 };
 
 #define L1D_CACHE_ORDER 4
+#define PID_TABLE_ORDER get_order(KVM_MAX_VCPU_ID << 3)
 static void *vmx_l1d_flush_pages;
 
 static int vmx_setup_l1d_flush(enum vmx_l1d_flush_state l1tf)
@@ -2514,7 +2518,7 @@  static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 	}
 
 	if (_cpu_based_exec_control & CPU_BASED_ACTIVATE_TERTIARY_CONTROLS) {
-		u64 opt3 = 0;
+		u64 opt3 = enable_ipiv ? TERTIARY_EXEC_IPI_VIRT : 0;
 		u64 min3 = 0;
 
 		if (adjust_vmx_controls_64(min3, opt3,
@@ -2523,6 +2527,9 @@  static __init int setup_vmcs_config(struct vmcs_config *vmcs_conf,
 			return -EIO;
 	}
 
+	if (!(_cpu_based_3rd_exec_control & TERTIARY_EXEC_IPI_VIRT))
+		enable_ipiv = 0;
+
 	min = VM_EXIT_SAVE_DEBUG_CONTROLS | VM_EXIT_ACK_INTR_ON_EXIT;
 #ifdef CONFIG_X86_64
 	min |= VM_EXIT_HOST_ADDR_SPACE_SIZE;
@@ -3870,6 +3877,8 @@  static void vmx_update_msr_bitmap_x2apic(struct kvm_vcpu *vcpu, u8 mode)
 		vmx_enable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_TMCCT), MSR_TYPE_RW);
 		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_EOI), MSR_TYPE_W);
 		vmx_disable_intercept_for_msr(vcpu, X2APIC_MSR(APIC_SELF_IPI), MSR_TYPE_W);
+		vmx_set_intercept_for_msr(vcpu, X2APIC_MSR(APIC_ICR),
+				MSR_TYPE_RW, !to_vmx(vcpu)->ipiv_active);
 	}
 }
 
@@ -4138,14 +4147,21 @@  static void vmx_refresh_apicv_exec_ctrl(struct kvm_vcpu *vcpu)
 
 	pin_controls_set(vmx, vmx_pin_based_exec_ctrl(vmx));
 	if (cpu_has_secondary_exec_ctrls()) {
-		if (kvm_vcpu_apicv_active(vcpu))
+		if (kvm_vcpu_apicv_active(vcpu)) {
 			secondary_exec_controls_setbit(vmx,
 				      SECONDARY_EXEC_APIC_REGISTER_VIRT |
 				      SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
-		else
+			if (cpu_has_tertiary_exec_ctrls() && enable_ipiv)
+				tertiary_exec_controls_setbit(vmx,
+					TERTIARY_EXEC_IPI_VIRT);
+		} else {
 			secondary_exec_controls_clearbit(vmx,
 					SECONDARY_EXEC_APIC_REGISTER_VIRT |
 					SECONDARY_EXEC_VIRTUAL_INTR_DELIVERY);
+			if (cpu_has_tertiary_exec_ctrls())
+				tertiary_exec_controls_clearbit(vmx,
+					TERTIARY_EXEC_IPI_VIRT);
+		}
 	}
 
 	if (cpu_has_vmx_msr_bitmap())
@@ -4236,8 +4252,14 @@  vmx_adjust_secondary_exec_control(struct vcpu_vmx *vmx, u32 *exec_control,
 
 static void vmx_compute_tertiary_exec_control(struct vcpu_vmx *vmx)
 {
+	struct kvm_vcpu *vcpu = &vmx->vcpu;
 	u32 exec_control = vmcs_config.cpu_based_3rd_exec_ctrl;
 
+	if (!kvm_vcpu_apicv_active(vcpu))
+		exec_control &= ~TERTIARY_EXEC_IPI_VIRT;
+
+	vmx->ipiv_active = (exec_control & TERTIARY_EXEC_IPI_VIRT) ? true : false;
+
 	vmx->tertiary_exec_control = exec_control;
 }
 
@@ -4332,6 +4354,17 @@  static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
 
 #define VMX_XSS_EXIT_BITMAP 0
 
+static void install_pid(struct vcpu_vmx *vmx)
+{
+	struct kvm_vmx *kvm_vmx = to_kvm_vmx(vmx->vcpu.kvm);
+
+	BUG_ON(vmx->vcpu.vcpu_id > kvm_vmx->pid_last_index);
+	/* Bit 0 is the valid bit */
+	kvm_vmx->pid_table[vmx->vcpu.vcpu_id] = __pa(&vmx->pi_desc) | 1;
+	vmcs_write64(PID_POINTER_TABLE, __pa(kvm_vmx->pid_table));
+	vmcs_write16(LAST_PID_POINTER_INDEX, kvm_vmx->pid_last_index);
+}
+
 /*
  * Noting that the initialization of Guest-state Area of VMCS is in
  * vmx_vcpu_reset().
@@ -4430,6 +4463,9 @@  static void init_vmcs(struct vcpu_vmx *vmx)
 		vmx->pt_desc.guest.output_mask = 0x7F;
 		vmcs_write64(GUEST_IA32_RTIT_CTL, 0);
 	}
+
+	if (vmx->ipiv_active)
+		install_pid(vmx);
 }
 
 static void vmx_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
@@ -6969,6 +7005,22 @@  static int vmx_vm_init(struct kvm *kvm)
 			break;
 		}
 	}
+
+	if (enable_ipiv) {
+		struct page *pages;
+
+		/* Allocate pages for PID table in order of PID_TABLE_ORDER
+		 * depending on KVM_MAX_VCPU_ID. Each PID entry is 8 bytes.
+		 */
+		pages = alloc_pages(GFP_KERNEL | __GFP_ZERO, PID_TABLE_ORDER);
+
+		if (!pages)
+			return -ENOMEM;
+
+		to_kvm_vmx(kvm)->pid_table = (void *)page_address(pages);
+		to_kvm_vmx(kvm)->pid_last_index = KVM_MAX_VCPU_ID;
+	}
+
 	return 0;
 }
 
@@ -7579,6 +7631,14 @@  static bool vmx_check_apicv_inhibit_reasons(ulong bit)
 	return supported & BIT(bit);
 }
 
+static void vmx_vm_destroy(struct kvm *kvm)
+{
+	struct kvm_vmx *kvm_vmx = to_kvm_vmx(kvm);
+
+	if (kvm_vmx->pid_table)
+		free_pages((unsigned long)kvm_vmx->pid_table, PID_TABLE_ORDER);
+}
+
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
 	.hardware_unsetup = hardware_unsetup,
 
@@ -7589,6 +7649,7 @@  static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
 	.vm_size = sizeof(struct kvm_vmx),
 	.vm_init = vmx_vm_init,
+	.vm_destroy = vmx_vm_destroy,
 
 	.vcpu_create = vmx_create_vcpu,
 	.vcpu_free = vmx_free_vcpu,
@@ -7828,6 +7889,11 @@  static __init int hardware_setup(void)
 		vmx_x86_ops.sync_pir_to_irr = NULL;
 	}
 
+	if (!enable_apicv) {
+		enable_ipiv = 0;
+		vmcs_config.cpu_based_3rd_exec_ctrl &= ~TERTIARY_EXEC_IPI_VIRT;
+	}
+
 	if (cpu_has_vmx_tsc_scaling()) {
 		kvm_has_tsc_control = true;
 		kvm_max_tsc_scaling_ratio = KVM_VMX_TSC_MULTIPLIER_MAX;
diff --git a/arch/x86/kvm/vmx/vmx.h b/arch/x86/kvm/vmx/vmx.h
index c356ceebe84c..0dee1e4c628c 100644
--- a/arch/x86/kvm/vmx/vmx.h
+++ b/arch/x86/kvm/vmx/vmx.h
@@ -344,6 +344,9 @@  struct vcpu_vmx {
 		DECLARE_BITMAP(read, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 		DECLARE_BITMAP(write, MAX_POSSIBLE_PASSTHROUGH_MSRS);
 	} shadow_msr_intercept;
+
+	/* IPI virtualization status */
+	bool ipiv_active;
 };
 
 struct kvm_vmx {
@@ -352,6 +355,9 @@  struct kvm_vmx {
 	unsigned int tss_addr;
 	bool ept_identity_pagetable_done;
 	gpa_t ept_identity_map_addr;
+	/* PID table for IPI virtualization */
+	u64 *pid_table;
+	u16 pid_last_index;
 };
 
 bool nested_vmx_allowed(struct kvm_vcpu *vcpu);
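`PID_TABLE_ORDER` in the patch is `get_order(KVM_MAX_VCPU_ID << 3)`: 8 bytes per entry, rounded up to a power-of-two number of pages. The arithmetic can be checked with a userspace sketch, assuming 4 KiB pages; `sketch_get_order()` and `pid_table_order()` are hypothetical stand-ins for the kernel's `get_order()`. Since `install_pid()` accepts indices up to `pid_last_index` inclusive, the entry count worth plugging in is `pid_last_index + 1`.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12 /* assumes 4 KiB pages */

/* Smallest order such that (1 << order) pages cover @size bytes. */
static unsigned int sketch_get_order(uint64_t size)
{
	unsigned int order = 0;

	while ((UINT64_C(1) << (order + SKETCH_PAGE_SHIFT)) < size)
		order++;
	return order;
}

/* Allocation order for a PID-pointer table of @entries 8-byte entries. */
static unsigned int pid_table_order(uint64_t entries)
{
	return sketch_get_order(entries << 3);
}
```

For example, 1023 or 1024 entries (8184 or 8192 bytes) fit in an order-1 allocation (two pages), while 1025 entries already need order 2.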