diff mbox series

[v5,5/8] KVM: x86: Support interrupt dispatch in x2APIC mode with APIC-write VM exit

Message ID 20211231142849.611-6-guang.zeng@intel.com (mailing list archive)
State New, archived
Headers show
Series IPI virtualization support for VM | expand

Commit Message

Zeng Guang Dec. 31, 2021, 2:28 p.m. UTC
In VMX non-root operation, new behavior applies to
virtualize WRMSR to vICR in x2APIC mode. Depending
on settings of the VM-execution controls, CPU would
produce APIC-write VM-exit following the 64-bit value
written to offset 300H on the virtual-APIC page(vICR).
KVM needs to retrieve the value written by CPU and
emulate the vICR write to deliver an interrupt.

Current KVM doesn't consider to handle the 64-bit setting
on vICR in trap-like APIC-write VM-exit. Because using
kvm_lapic_reg_write() to emulate writes to APIC_ICR requires
the APIC_ICR2 is already programmed correctly. But in the
above APIC-write VM-exit, CPU writes the whole 64 bits to
APIC_ICR rather than program higher 32 bits and lower 32
bits to APIC_ICR2 and APIC_ICR respectively. So, KVM needs
to retrieve the whole 64-bit value and program higher 32 bits
to APIC_ICR2 first.

Signed-off-by: Zeng Guang <guang.zeng@intel.com>
---
 arch/x86/kvm/lapic.c | 12 +++++++++---
 arch/x86/kvm/lapic.h |  5 +++++
 2 files changed, 14 insertions(+), 3 deletions(-)

Comments

Sean Christopherson Jan. 13, 2022, 9:29 p.m. UTC | #1
On Fri, Dec 31, 2021, Zeng Guang wrote:
> In VMX non-root operation, new behavior applies to

"new behavior" is ambiguous, it's not clear if it refers to new hardware behavior,
new KVM behavior, etc...

> virtualize WRMSR to vICR in x2APIC mode. Depending

Please wrap at ~75 chars, this is too narrow.

> on settings of the VM-execution controls, CPU would
> produce APIC-write VM-exit following the 64-bit value
> written to offset 300H on the virtual-APIC page(vICR).
> KVM needs to retrieve the value written by CPU and
> emulate the vICR write to deliver an interrupt.
> 
> Current KVM doesn't consider to handle the 64-bit setting
> on vICR in trap-like APIC-write VM-exit. Because using
> kvm_lapic_reg_write() to emulate writes to APIC_ICR requires
> the APIC_ICR2 is already programmed correctly. But in the
> above APIC-write VM-exit, CPU writes the whole 64 bits to
> APIC_ICR rather than program higher 32 bits and lower 32
> bits to APIC_ICR2 and APIC_ICR respectively. So, KVM needs
> to retrieve the whole 64-bit value and program higher 32 bits
> to APIC_ICR2 first.

I think this is simply saying:

  Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR,
  i.e. will trap and generate an APIC-write VM-Exit instead of intercepting
  the WRMSR.  Add support for handling "nodecode" x2APIC writes, which were
  previously impossible.

  Note, x2APIC MSR writes are 64 bits wide.

and then the shortlog can be:

  KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode

The "interrupt dispatch" part is quite confusing because it's not really germane
to the change; yes, the vICR write does (eventually) dispatch an IRQ, but that
has nothing to do with the code being modified.

> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
> ---
>  arch/x86/kvm/lapic.c | 12 +++++++++---
>  arch/x86/kvm/lapic.h |  5 +++++
>  2 files changed, 14 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index f206fc35deff..3ce7142ba00e 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2186,15 +2186,21 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
>  /* emulate APIC access in a trap manner */
>  void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>  {
> -	u32 val = 0;
> +	struct kvm_lapic *apic = vcpu->arch.apic;
> +	u64 val = 0;
>  
>  	/* hw has done the conditional check and inst decode */
>  	offset &= 0xff0;
>  
> -	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
> +	/* exception dealing with 64bit data on vICR in x2apic mode */
> +	if ((offset == APIC_ICR) && apic_x2apic_mode(apic)) {

Sorry, I failed to reply to your response in the previous version.  I suggested
a WARN_ON(offset != APIC_ICR), but you were concerned that apic_x2apic_mode()
would be expensive to check before @offset.  I don't think that's a valid concern
as apic_x2apic_mode() is simply:

	apic->vcpu->arch.apic_base & X2APIC_ENABLE

And is likely well-predicted by the CPU, especially in single tenant or pinned
scenarios where the pCPU is running a single VM/vCPU, i.e. will amost never see
X2APIC_ENABLE toggling.

So I stand behind my previous feedback[*] that we should split on x2APIC.

> +		val = kvm_lapic_get_reg64(apic, offset);
> +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(val>>32));
> +	} else
> +		kvm_lapic_reg_read(apic, offset, 4, &val);

Needs curly braces.  But again, I stand behind my previous feedback that this
would be better written as:

        if (apic_x2apic_mode(apic)) {
                if (WARN_ON_ONCE(offset != APIC_ICR))
                        return 1;

                kvm_lapic_reg_read(apic, offset, 8, &val);
                kvm_lapic_reg_write64(apic, offset, val);
        } else {
                kvm_lapic_reg_read(apic, offset, 4, &val);
                kvm_lapic_reg_write(apic, offset, val);
        }

after a patch (provided in earlier feedback) to introduce kvm_lapic_reg_write64().

[*] https://lore.kernel.org/all/YTvcJZSd1KQvNmaz@google.com

>  	/* TODO: optimize to just emulate side effect w/o one more write */
> -	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
> +	kvm_lapic_reg_write(apic, offset, (u32)val);
>  }
>  EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
>
Zeng Guang Jan. 14, 2022, 7:52 a.m. UTC | #2
On 1/14/2022 5:29 AM, Sean Christopherson wrote:
> On Fri, Dec 31, 2021, Zeng Guang wrote:
>> In VMX non-root operation, new behavior applies to
> "new behavior" is ambiguous, it's not clear if it refers to new hardware behavior,
> new KVM behavior, etc...
>
>> virtualize WRMSR to vICR in x2APIC mode. Depending
> Please wrap at ~75 chars, this is too narrow.
>
>> on settings of the VM-execution controls, CPU would
>> produce APIC-write VM-exit following the 64-bit value
>> written to offset 300H on the virtual-APIC page(vICR).
>> KVM needs to retrieve the value written by CPU and
>> emulate the vICR write to deliver an interrupt.
>>
>> Current KVM doesn't consider to handle the 64-bit setting
>> on vICR in trap-like APIC-write VM-exit. Because using
>> kvm_lapic_reg_write() to emulate writes to APIC_ICR requires
>> the APIC_ICR2 is already programmed correctly. But in the
>> above APIC-write VM-exit, CPU writes the whole 64 bits to
>> APIC_ICR rather than program higher 32 bits and lower 32
>> bits to APIC_ICR2 and APIC_ICR respectively. So, KVM needs
>> to retrieve the whole 64-bit value and program higher 32 bits
>> to APIC_ICR2 first.
> I think this is simply saying:
>
>    Upcoming Intel CPUs will support virtual x2APIC MSR writes to the vICR,
>    i.e. will trap and generate an APIC-write VM-Exit instead of intercepting
>    the WRMSR.  Add support for handling "nodecode" x2APIC writes, which were
>    previously impossible.
>
>    Note, x2APIC MSR writes are 64 bits wide.
>
> and then the shortlog can be:
>
>    KVM: x86: Add support for vICR APIC-write VM-Exits in x2APIC mode
>
> The "interrupt dispatch" part is quite confusing because it's not really germane
> to the change; yes, the vICR write does (eventually) dispatch an IRQ, but that
> has nothing to do with the code being modified.

I would take commit message as you suggested. Thanks.

>> Signed-off-by: Zeng Guang <guang.zeng@intel.com>
>> ---
>>   arch/x86/kvm/lapic.c | 12 +++++++++---
>>   arch/x86/kvm/lapic.h |  5 +++++
>>   2 files changed, 14 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>> index f206fc35deff..3ce7142ba00e 100644
>> --- a/arch/x86/kvm/lapic.c
>> +++ b/arch/x86/kvm/lapic.c
>> @@ -2186,15 +2186,21 @@ EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
>>   /* emulate APIC access in a trap manner */
>>   void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
>>   {
>> -	u32 val = 0;
>> +	struct kvm_lapic *apic = vcpu->arch.apic;
>> +	u64 val = 0;
>>   
>>   	/* hw has done the conditional check and inst decode */
>>   	offset &= 0xff0;
>>   
>> -	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
>> +	/* exception dealing with 64bit data on vICR in x2apic mode */
>> +	if ((offset == APIC_ICR) && apic_x2apic_mode(apic)) {
> Sorry, I failed to reply to your response in the previous version.  I suggested
> a WARN_ON(offset != APIC_ICR), but you were concerned that apic_x2apic_mode()
> would be expensive to check before @offset.  I don't think that's a valid concern
> as apic_x2apic_mode() is simply:
>
> 	apic->vcpu->arch.apic_base & X2APIC_ENABLE
>
> And is likely well-predicted by the CPU, especially in single tenant or pinned
> scenarios where the pCPU is running a single VM/vCPU, i.e. will amost never see
> X2APIC_ENABLE toggling.
>
> So I stand behind my previous feedback[*] that we should split on x2APIC.
>
>> +		val = kvm_lapic_get_reg64(apic, offset);
>> +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(val>>32));
>> +	} else
>> +		kvm_lapic_reg_read(apic, offset, 4, &val);
> Needs curly braces.  But again, I stand behind my previous feedback that this
> would be better written as:
>
>          if (apic_x2apic_mode(apic)) {
>                  if (WARN_ON_ONCE(offset != APIC_ICR))
>                          return 1;
>
>                  kvm_lapic_reg_read(apic, offset, 8, &val);
>                  kvm_lapic_reg_write64(apic, offset, val);
>          } else {
>                  kvm_lapic_reg_read(apic, offset, 4, &val);
>                  kvm_lapic_reg_write(apic, offset, val);
>          }
>
> after a patch (provided in earlier feedback) to introduce kvm_lapic_reg_write64().
>
> [*] https://lore.kernel.org/all/YTvcJZSd1KQvNmaz@google.com

kvm_lapic_reg_read() is limited to read up to 4 bytes. It needs extension to support 64bit
read. And another concern is here getting reg value only specific from vICR(no other regs
need take care), going through whole path on kvm_lapic_reg_read() could be time-consuming
unnecessarily. Is it proper that calling kvm_lapic_get_reg64() to retrieve vICR value directly?

The change could be like follows:

         if (apic_x2apic_mode(apic)) {
                 if (WARN_ON_ONCE(offset != APIC_ICR))
                         return 1;

                 val = kvm_lapic_get_reg64(apic, offset);
                 kvm_lapic_reg_write64(apic, offset, val);
         } else {
                 kvm_lapic_reg_read(apic, offset, 4, &val);
                 kvm_lapic_reg_write(apic, offset, val);
         }

  

>>   	/* TODO: optimize to just emulate side effect w/o one more write */
>> -	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
>> +	kvm_lapic_reg_write(apic, offset, (u32)val);
>>   }
>>   EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
>>
Sean Christopherson Jan. 14, 2022, 5:34 p.m. UTC | #3
On Fri, Jan 14, 2022, Zeng Guang wrote:
> kvm_lapic_reg_read() is limited to read up to 4 bytes. It needs extension to
> support 64bit read.

Ah, right.

> And another concern is here getting reg value only specific from vICR(no
> other regs need take care), going through whole path on kvm_lapic_reg_read()
> could be time-consuming unnecessarily. Is it proper that calling
> kvm_lapic_get_reg64() to retrieve vICR value directly?

Hmm, no, I don't think that's proper.  Retrieving a 64-bit value really is unique
to vICR.  Yes, the code does WARN on that, but if future architectural extensions
even generate APIC-write exits on other registers, then using kvm_lapic_get_reg64()
would be wrong and this code would need to be updated again.

What about tweaking my prep patch from before to the below?  That would yield:

	if (apic_x2apic_mode(apic)) {
		if (WARN_ON_ONCE(offset != APIC_ICR))
			return 1;

		kvm_lapic_msr_read(apic, offset, &val);
		kvm_lapic_msr_write(apic, offset, val);
	} else {
		kvm_lapic_reg_read(apic, offset, 4, &val);
		kvm_lapic_reg_write(apic, offset, val);
	}

I like that the above has "msr" in the low level x2apic helpers, and it maximizes
code reuse.  Compile tested only...

From: Sean Christopherson <seanjc@google.com>
Date: Fri, 14 Jan 2022 09:29:34 -0800
Subject: [PATCH] KVM: x86: Add helpers to handle 64-bit APIC MSR read/writes

Add helpers to handle 64-bit APIC read/writes via MSRs to deduplicate the
x2APIC and Hyper-V code needed to service reads/writes to ICR.  Future
support for IPI virtualization will add yet another path where KVM must
handle 64-bit APIC MSR reads/write (to ICR).

Opportunistically fix the comment in the write path; ICR2 holds the
destination (if there's no shorthand), not the vector.

No functional change intended.

Signed-off-by: Sean Christopherson <seanjc@google.com>
---
 arch/x86/kvm/lapic.c | 59 ++++++++++++++++++++++----------------------
 1 file changed, 29 insertions(+), 30 deletions(-)

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f206fc35deff..cc4531eb448f 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2787,6 +2787,30 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
 	return 0;
 }

+static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
+{
+	u32 low, high = 0;
+
+	if (kvm_lapic_reg_read(apic, reg, 4, &low))
+		return 1;
+
+	if (reg == APIC_ICR &&
+	    WARN_ON_ONCE(kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high)))
+		return 1;
+
+	*data = (((u64)high) << 32) | low;
+
+	return 0;
+}
+
+static int kvm_lapic_msr_write(struct kvm_lapic *apic, u32 reg, u64 data)
+{
+	/* For 64-bit ICR writes, set ICR2 (dest) before ICR (command). */
+	if (reg == APIC_ICR)
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
+	return kvm_lapic_reg_write(apic, reg, (u32)data);
+}
+
 int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
@@ -2798,16 +2822,13 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
 	if (reg == APIC_ICR2)
 		return 1;

-	/* if this is ICR write vector before command */
-	if (reg == APIC_ICR)
-		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return kvm_lapic_reg_write(apic, reg, (u32)data);
+	return kvm_lapic_msr_write(apic, reg, data);
 }

 int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 {
 	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 reg = (msr - APIC_BASE_MSR) << 4, low, high = 0;
+	u32 reg = (msr - APIC_BASE_MSR) << 4;

 	if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(apic))
 		return 1;
@@ -2815,45 +2836,23 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
 	if (reg == APIC_DFR || reg == APIC_ICR2)
 		return 1;

-	if (kvm_lapic_reg_read(apic, reg, 4, &low))
-		return 1;
-	if (reg == APIC_ICR)
-		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
-
-	*data = (((u64)high) << 32) | low;
-
-	return 0;
+	return kvm_lapic_msr_read(apic, reg, data);
 }

 int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
 {
-	struct kvm_lapic *apic = vcpu->arch.apic;
-
 	if (!lapic_in_kernel(vcpu))
 		return 1;

-	/* if this is ICR write vector before command */
-	if (reg == APIC_ICR)
-		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
-	return kvm_lapic_reg_write(apic, reg, (u32)data);
+	return kvm_lapic_msr_write(vcpu->arch.apic, reg, data);
 }

 int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
 {
-	struct kvm_lapic *apic = vcpu->arch.apic;
-	u32 low, high = 0;
-
 	if (!lapic_in_kernel(vcpu))
 		return 1;

-	if (kvm_lapic_reg_read(apic, reg, 4, &low))
-		return 1;
-	if (reg == APIC_ICR)
-		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
-
-	*data = (((u64)high) << 32) | low;
-
-	return 0;
+	return kvm_lapic_msr_read(vcpu->arch.apic, reg, data);
 }

 int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len)
--
Zeng Guang Jan. 15, 2022, 2:08 a.m. UTC | #4
On 1/15/2022 1:34 AM, Sean Christopherson wrote:
> On Fri, Jan 14, 2022, Zeng Guang wrote:
>> kvm_lapic_reg_read() is limited to read up to 4 bytes. It needs extension to
>> support 64bit read.
> Ah, right.
>
>> And another concern is here getting reg value only specific from vICR(no
>> other regs need take care), going through whole path on kvm_lapic_reg_read()
>> could be time-consuming unnecessarily. Is it proper that calling
>> kvm_lapic_get_reg64() to retrieve vICR value directly?
> Hmm, no, I don't think that's proper.  Retrieving a 64-bit value really is unique
> to vICR.  Yes, the code does WARN on that, but if future architectural extensions
> even generate APIC-write exits on other registers, then using kvm_lapic_get_reg64()
> would be wrong and this code would need to be updated again.
Split on x2apic and WARN on (offset != APIC_ICR) already limit register 
read to vICR only. Actually
we just need consider to deal with 64bit data specific to vICR in 
APIC-write exits. From this point of
view, previous design can be compatible on handling other registers even 
if future architectural
extensions changes. :)
>
> What about tweaking my prep patch from before to the below?  That would yield:
>
> 	if (apic_x2apic_mode(apic)) {
> 		if (WARN_ON_ONCE(offset != APIC_ICR))
> 			return 1;
>
> 		kvm_lapic_msr_read(apic, offset, &val);

I think it's problematic to use kvm_lapic_msr_read() in this case. It 
premises the high 32bit value
already valid at APIC_ICR2, while in handling "nodecode" x2APIC writes 
we need get continuous 64bit
data from offset 300H first and prepare emulation of APIC_ICR2 write. At 
this time, APIC_ICR2 is not
ready yet.

> 		kvm_lapic_msr_write(apic, offset, val);
> 	} else {
> 		kvm_lapic_reg_read(apic, offset, 4, &val);
> 		kvm_lapic_reg_write(apic, offset, val);
> 	}
>
> I like that the above has "msr" in the low level x2apic helpers, and it maximizes
> code reuse.  Compile tested only...
>
> From: Sean Christopherson <seanjc@google.com>
> Date: Fri, 14 Jan 2022 09:29:34 -0800
> Subject: [PATCH] KVM: x86: Add helpers to handle 64-bit APIC MSR read/writes
>
> Add helpers to handle 64-bit APIC read/writes via MSRs to deduplicate the
> x2APIC and Hyper-V code needed to service reads/writes to ICR.  Future
> support for IPI virtualization will add yet another path where KVM must
> handle 64-bit APIC MSR reads/write (to ICR).
>
> Opportunistically fix the comment in the write path; ICR2 holds the
> destination (if there's no shorthand), not the vector.
>
> No functional change intended.
>
> Signed-off-by: Sean Christopherson <seanjc@google.com>
> ---
>   arch/x86/kvm/lapic.c | 59 ++++++++++++++++++++++----------------------
>   1 file changed, 29 insertions(+), 30 deletions(-)
>
> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> index f206fc35deff..cc4531eb448f 100644
> --- a/arch/x86/kvm/lapic.c
> +++ b/arch/x86/kvm/lapic.c
> @@ -2787,6 +2787,30 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
>   	return 0;
>   }
>
> +static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
> +{
> +	u32 low, high = 0;
> +
> +	if (kvm_lapic_reg_read(apic, reg, 4, &low))
> +		return 1;
> +
> +	if (reg == APIC_ICR &&
> +	    WARN_ON_ONCE(kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high)))
> +		return 1;
> +
> +	*data = (((u64)high) << 32) | low;
> +
> +	return 0;
> +}
> +
> +static int kvm_lapic_msr_write(struct kvm_lapic *apic, u32 reg, u64 data)
> +{
> +	/* For 64-bit ICR writes, set ICR2 (dest) before ICR (command). */
> +	if (reg == APIC_ICR)
> +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> +	return kvm_lapic_reg_write(apic, reg, (u32)data);
> +}
> +
>   int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>   {
>   	struct kvm_lapic *apic = vcpu->arch.apic;
> @@ -2798,16 +2822,13 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>   	if (reg == APIC_ICR2)
>   		return 1;
>
> -	/* if this is ICR write vector before command */
> -	if (reg == APIC_ICR)
> -		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> -	return kvm_lapic_reg_write(apic, reg, (u32)data);
> +	return kvm_lapic_msr_write(apic, reg, data);
>   }
>
>   int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
>   {
>   	struct kvm_lapic *apic = vcpu->arch.apic;
> -	u32 reg = (msr - APIC_BASE_MSR) << 4, low, high = 0;
> +	u32 reg = (msr - APIC_BASE_MSR) << 4;
>
>   	if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(apic))
>   		return 1;
> @@ -2815,45 +2836,23 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
>   	if (reg == APIC_DFR || reg == APIC_ICR2)
>   		return 1;
>
> -	if (kvm_lapic_reg_read(apic, reg, 4, &low))
> -		return 1;
> -	if (reg == APIC_ICR)
> -		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
> -
> -	*data = (((u64)high) << 32) | low;
> -
> -	return 0;
> +	return kvm_lapic_msr_read(apic, reg, data);
>   }
>
>   int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
>   {
> -	struct kvm_lapic *apic = vcpu->arch.apic;
> -
>   	if (!lapic_in_kernel(vcpu))
>   		return 1;
>
> -	/* if this is ICR write vector before command */
> -	if (reg == APIC_ICR)
> -		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> -	return kvm_lapic_reg_write(apic, reg, (u32)data);
> +	return kvm_lapic_msr_write(vcpu->arch.apic, reg, data);
>   }
>
>   int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
>   {
> -	struct kvm_lapic *apic = vcpu->arch.apic;
> -	u32 low, high = 0;
> -
>   	if (!lapic_in_kernel(vcpu))
>   		return 1;
>
> -	if (kvm_lapic_reg_read(apic, reg, 4, &low))
> -		return 1;
> -	if (reg == APIC_ICR)
> -		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
> -
> -	*data = (((u64)high) << 32) | low;
> -
> -	return 0;
> +	return kvm_lapic_msr_read(vcpu->arch.apic, reg, data);
>   }
>
>   int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len)
> --
Yuan Yao Jan. 18, 2022, 12:44 a.m. UTC | #5
On Sat, Jan 15, 2022 at 10:08:10AM +0800, Zeng Guang wrote:
> On 1/15/2022 1:34 AM, Sean Christopherson wrote:
> > On Fri, Jan 14, 2022, Zeng Guang wrote:
> > > kvm_lapic_reg_read() is limited to read up to 4 bytes. It needs extension to
> > > support 64bit read.
> > Ah, right.
> >
> > > And another concern is here getting reg value only specific from vICR(no
> > > other regs need take care), going through whole path on kvm_lapic_reg_read()
> > > could be time-consuming unnecessarily. Is it proper that calling
> > > kvm_lapic_get_reg64() to retrieve vICR value directly?
> > Hmm, no, I don't think that's proper.  Retrieving a 64-bit value really is unique
> > to vICR.  Yes, the code does WARN on that, but if future architectural extensions
> > even generate APIC-write exits on other registers, then using kvm_lapic_get_reg64()
> > would be wrong and this code would need to be updated again.
> Split on x2apic and WARN on (offset != APIC_ICR) already limit register read
> to vICR only. Actually
> we just need consider to deal with 64bit data specific to vICR in APIC-write
> exits. From this point of
> view, previous design can be compatible on handling other registers even if
> future architectural
> extensions changes. :)
> >
> > What about tweaking my prep patch from before to the below?  That would yield:
> >
> > 	if (apic_x2apic_mode(apic)) {
> > 		if (WARN_ON_ONCE(offset != APIC_ICR))
> > 			return 1;
> >
> > 		kvm_lapic_msr_read(apic, offset, &val);
>
> I think it's problematic to use kvm_lapic_msr_read() in this case. It
> premises the high 32bit value
> already valid at APIC_ICR2, while in handling "nodecode" x2APIC writes we
> need get continuous 64bit
> data from offset 300H first and prepare emulation of APIC_ICR2 write. At
> this time, APIC_ICR2 is not
> ready yet.

How about combine them, then you can handle the ICR write vmexit for
IPI virtualization and Sean's patch can still work with code reusing,
like below:

	if (apic_x2apic_mode(apic)) {
		if (WARN_ON_ONCE(offset != APIC_ICR))
			kvm_lapic_msr_read(apic, offset, &val);
		else
			kvm_lapic_get_reg64(apic, offset, &val);

		kvm_lapic_msr_write(apic, offset, val);
	} else {
		kvm_lapic_reg_read(apic, offset, 4, &val);
		kvm_lapic_reg_write(apic, offset, val);
	}

>
> > 		kvm_lapic_msr_write(apic, offset, val);
> > 	} else {
> > 		kvm_lapic_reg_read(apic, offset, 4, &val);
> > 		kvm_lapic_reg_write(apic, offset, val);
> > 	}
> >
> > I like that the above has "msr" in the low level x2apic helpers, and it maximizes
> > code reuse.  Compile tested only...
> >
> > From: Sean Christopherson <seanjc@google.com>
> > Date: Fri, 14 Jan 2022 09:29:34 -0800
> > Subject: [PATCH] KVM: x86: Add helpers to handle 64-bit APIC MSR read/writes
> >
> > Add helpers to handle 64-bit APIC read/writes via MSRs to deduplicate the
> > x2APIC and Hyper-V code needed to service reads/writes to ICR.  Future
> > support for IPI virtualization will add yet another path where KVM must
> > handle 64-bit APIC MSR reads/write (to ICR).
> >
> > Opportunistically fix the comment in the write path; ICR2 holds the
> > destination (if there's no shorthand), not the vector.
> >
> > No functional change intended.
> >
> > Signed-off-by: Sean Christopherson <seanjc@google.com>
> > ---
> >   arch/x86/kvm/lapic.c | 59 ++++++++++++++++++++++----------------------
> >   1 file changed, 29 insertions(+), 30 deletions(-)
> >
> > diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
> > index f206fc35deff..cc4531eb448f 100644
> > --- a/arch/x86/kvm/lapic.c
> > +++ b/arch/x86/kvm/lapic.c
> > @@ -2787,6 +2787,30 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
> >   	return 0;
> >   }
> >
> > +static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
> > +{
> > +	u32 low, high = 0;
> > +
> > +	if (kvm_lapic_reg_read(apic, reg, 4, &low))
> > +		return 1;
> > +
> > +	if (reg == APIC_ICR &&
> > +	    WARN_ON_ONCE(kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high)))
> > +		return 1;
> > +
> > +	*data = (((u64)high) << 32) | low;
> > +
> > +	return 0;
> > +}
> > +
> > +static int kvm_lapic_msr_write(struct kvm_lapic *apic, u32 reg, u64 data)
> > +{
> > +	/* For 64-bit ICR writes, set ICR2 (dest) before ICR (command). */
> > +	if (reg == APIC_ICR)
> > +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> > +	return kvm_lapic_reg_write(apic, reg, (u32)data);
> > +}
> > +
> >   int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> >   {
> >   	struct kvm_lapic *apic = vcpu->arch.apic;
> > @@ -2798,16 +2822,13 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
> >   	if (reg == APIC_ICR2)
> >   		return 1;
> >
> > -	/* if this is ICR write vector before command */
> > -	if (reg == APIC_ICR)
> > -		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> > -	return kvm_lapic_reg_write(apic, reg, (u32)data);
> > +	return kvm_lapic_msr_write(apic, reg, data);
> >   }
> >
> >   int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
> >   {
> >   	struct kvm_lapic *apic = vcpu->arch.apic;
> > -	u32 reg = (msr - APIC_BASE_MSR) << 4, low, high = 0;
> > +	u32 reg = (msr - APIC_BASE_MSR) << 4;
> >
> >   	if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(apic))
> >   		return 1;
> > @@ -2815,45 +2836,23 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
> >   	if (reg == APIC_DFR || reg == APIC_ICR2)
> >   		return 1;
> >
> > -	if (kvm_lapic_reg_read(apic, reg, 4, &low))
> > -		return 1;
> > -	if (reg == APIC_ICR)
> > -		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
> > -
> > -	*data = (((u64)high) << 32) | low;
> > -
> > -	return 0;
> > +	return kvm_lapic_msr_read(apic, reg, data);
> >   }
> >
> >   int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
> >   {
> > -	struct kvm_lapic *apic = vcpu->arch.apic;
> > -
> >   	if (!lapic_in_kernel(vcpu))
> >   		return 1;
> >
> > -	/* if this is ICR write vector before command */
> > -	if (reg == APIC_ICR)
> > -		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
> > -	return kvm_lapic_reg_write(apic, reg, (u32)data);
> > +	return kvm_lapic_msr_write(vcpu->arch.apic, reg, data);
> >   }
> >
> >   int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
> >   {
> > -	struct kvm_lapic *apic = vcpu->arch.apic;
> > -	u32 low, high = 0;
> > -
> >   	if (!lapic_in_kernel(vcpu))
> >   		return 1;
> >
> > -	if (kvm_lapic_reg_read(apic, reg, 4, &low))
> > -		return 1;
> > -	if (reg == APIC_ICR)
> > -		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
> > -
> > -	*data = (((u64)high) << 32) | low;
> > -
> > -	return 0;
> > +	return kvm_lapic_msr_read(vcpu->arch.apic, reg, data);
> >   }
> >
> >   int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len)
> > --
Zeng Guang Jan. 18, 2022, 3:06 a.m. UTC | #6
On 1/18/2022 8:44 AM, Yuan Yao wrote:
> On Sat, Jan 15, 2022 at 10:08:10AM +0800, Zeng Guang wrote:
>> On 1/15/2022 1:34 AM, Sean Christopherson wrote:
>>> On Fri, Jan 14, 2022, Zeng Guang wrote:
>>>> kvm_lapic_reg_read() is limited to read up to 4 bytes. It needs extension to
>>>> support 64bit read.
>>> Ah, right.
>>>
>>>> And another concern is here getting reg value only specific from vICR(no
>>>> other regs need take care), going through whole path on kvm_lapic_reg_read()
>>>> could be time-consuming unnecessarily. Is it proper that calling
>>>> kvm_lapic_get_reg64() to retrieve vICR value directly?
>>> Hmm, no, I don't think that's proper.  Retrieving a 64-bit value really is unique
>>> to vICR.  Yes, the code does WARN on that, but if future architectural extensions
>>> even generate APIC-write exits on other registers, then using kvm_lapic_get_reg64()
>>> would be wrong and this code would need to be updated again.
>> Split on x2apic and WARN on (offset != APIC_ICR) already limit register read
>> to vICR only. Actually
>> we just need consider to deal with 64bit data specific to vICR in APIC-write
>> exits. From this point of
>> view, previous design can be compatible on handling other registers even if
>> future architectural
>> extensions changes. :)
>>> What about tweaking my prep patch from before to the below?  That would yield:
>>>
>>> 	if (apic_x2apic_mode(apic)) {
>>> 		if (WARN_ON_ONCE(offset != APIC_ICR))
>>> 			return 1;
>>>
>>> 		kvm_lapic_msr_read(apic, offset, &val);
>> I think it's problematic to use kvm_lapic_msr_read() in this case. It
>> premises the high 32bit value
>> already valid at APIC_ICR2, while in handling "nodecode" x2APIC writes we
>> need get continuous 64bit
>> data from offset 300H first and prepare emulation of APIC_ICR2 write. At
>> this time, APIC_ICR2 is not
>> ready yet.
> How about combine them, then you can handle the ICR write vmexit for
> IPI virtualization and Sean's patch can still work with code reusing,
> like below:
>
> 	if (apic_x2apic_mode(apic)) {
> 		if (WARN_ON_ONCE(offset != APIC_ICR))
> 			kvm_lapic_msr_read(apic, offset, &val);
> 		else
> 			kvm_lapic_get_reg64(apic, offset, &val);
>
> 		kvm_lapic_msr_write(apic, offset, val);
> 	} else {
> 		kvm_lapic_reg_read(apic, offset, 4, &val);
> 		kvm_lapic_reg_write(apic, offset, val);
> 	}

Alternatively we can merge as this if Sean think it ok to call 
kvm_lapic_get_reg64() retrieving 64bit data from vICR directly.

>>> 		kvm_lapic_msr_write(apic, offset, val);
>>> 	} else {
>>> 		kvm_lapic_reg_read(apic, offset, 4, &val);
>>> 		kvm_lapic_reg_write(apic, offset, val);
>>> 	}
>>>
>>> I like that the above has "msr" in the low level x2apic helpers, and it maximizes
>>> code reuse.  Compile tested only...
>>>
>>> From: Sean Christopherson <seanjc@google.com>
>>> Date: Fri, 14 Jan 2022 09:29:34 -0800
>>> Subject: [PATCH] KVM: x86: Add helpers to handle 64-bit APIC MSR read/writes
>>>
>>> Add helpers to handle 64-bit APIC read/writes via MSRs to deduplicate the
>>> x2APIC and Hyper-V code needed to service reads/writes to ICR.  Future
>>> support for IPI virtualization will add yet another path where KVM must
>>> handle 64-bit APIC MSR reads/write (to ICR).
>>>
>>> Opportunistically fix the comment in the write path; ICR2 holds the
>>> destination (if there's no shorthand), not the vector.
>>>
>>> No functional change intended.
>>>
>>> Signed-off-by: Sean Christopherson <seanjc@google.com>
>>> ---
>>>    arch/x86/kvm/lapic.c | 59 ++++++++++++++++++++++----------------------
>>>    1 file changed, 29 insertions(+), 30 deletions(-)
>>>
>>> diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
>>> index f206fc35deff..cc4531eb448f 100644
>>> --- a/arch/x86/kvm/lapic.c
>>> +++ b/arch/x86/kvm/lapic.c
>>> @@ -2787,6 +2787,30 @@ int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr)
>>>    	return 0;
>>>    }
>>>
>>> +static int kvm_lapic_msr_read(struct kvm_lapic *apic, u32 reg, u64 *data)
>>> +{
>>> +	u32 low, high = 0;
>>> +
>>> +	if (kvm_lapic_reg_read(apic, reg, 4, &low))
>>> +		return 1;
>>> +
>>> +	if (reg == APIC_ICR &&
>>> +	    WARN_ON_ONCE(kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high)))
>>> +		return 1;
>>> +
>>> +	*data = (((u64)high) << 32) | low;
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int kvm_lapic_msr_write(struct kvm_lapic *apic, u32 reg, u64 data)
>>> +{
>>> +	/* For 64-bit ICR writes, set ICR2 (dest) before ICR (command). */
>>> +	if (reg == APIC_ICR)
>>> +		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
>>> +	return kvm_lapic_reg_write(apic, reg, (u32)data);
>>> +}
>>> +
>>>    int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>>>    {
>>>    	struct kvm_lapic *apic = vcpu->arch.apic;
>>> @@ -2798,16 +2822,13 @@ int kvm_x2apic_msr_write(struct kvm_vcpu *vcpu, u32 msr, u64 data)
>>>    	if (reg == APIC_ICR2)
>>>    		return 1;
>>>
>>> -	/* if this is ICR write vector before command */
>>> -	if (reg == APIC_ICR)
>>> -		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
>>> -	return kvm_lapic_reg_write(apic, reg, (u32)data);
>>> +	return kvm_lapic_msr_write(apic, reg, data);
>>>    }
>>>
>>>    int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
>>>    {
>>>    	struct kvm_lapic *apic = vcpu->arch.apic;
>>> -	u32 reg = (msr - APIC_BASE_MSR) << 4, low, high = 0;
>>> +	u32 reg = (msr - APIC_BASE_MSR) << 4;
>>>
>>>    	if (!lapic_in_kernel(vcpu) || !apic_x2apic_mode(apic))
>>>    		return 1;
>>> @@ -2815,45 +2836,23 @@ int kvm_x2apic_msr_read(struct kvm_vcpu *vcpu, u32 msr, u64 *data)
>>>    	if (reg == APIC_DFR || reg == APIC_ICR2)
>>>    		return 1;
>>>
>>> -	if (kvm_lapic_reg_read(apic, reg, 4, &low))
>>> -		return 1;
>>> -	if (reg == APIC_ICR)
>>> -		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
>>> -
>>> -	*data = (((u64)high) << 32) | low;
>>> -
>>> -	return 0;
>>> +	return kvm_lapic_msr_read(apic, reg, data);
>>>    }
>>>
>>>    int kvm_hv_vapic_msr_write(struct kvm_vcpu *vcpu, u32 reg, u64 data)
>>>    {
>>> -	struct kvm_lapic *apic = vcpu->arch.apic;
>>> -
>>>    	if (!lapic_in_kernel(vcpu))
>>>    		return 1;
>>>
>>> -	/* if this is ICR write vector before command */
>>> -	if (reg == APIC_ICR)
>>> -		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(data >> 32));
>>> -	return kvm_lapic_reg_write(apic, reg, (u32)data);
>>> +	return kvm_lapic_msr_write(vcpu->arch.apic, reg, data);
>>>    }
>>>
>>>    int kvm_hv_vapic_msr_read(struct kvm_vcpu *vcpu, u32 reg, u64 *data)
>>>    {
>>> -	struct kvm_lapic *apic = vcpu->arch.apic;
>>> -	u32 low, high = 0;
>>> -
>>>    	if (!lapic_in_kernel(vcpu))
>>>    		return 1;
>>>
>>> -	if (kvm_lapic_reg_read(apic, reg, 4, &low))
>>> -		return 1;
>>> -	if (reg == APIC_ICR)
>>> -		kvm_lapic_reg_read(apic, APIC_ICR2, 4, &high);
>>> -
>>> -	*data = (((u64)high) << 32) | low;
>>> -
>>> -	return 0;
>>> +	return kvm_lapic_msr_read(vcpu->arch.apic, reg, data);
>>>    }
>>>
>>>    int kvm_lapic_set_pv_eoi(struct kvm_vcpu *vcpu, u64 data, unsigned long len)
>>> --
Sean Christopherson Jan. 18, 2022, 6:17 p.m. UTC | #7
On Sat, Jan 15, 2022, Zeng Guang wrote:
> > What about tweaking my prep patch from before to the below?  That would yield:
> > 
> > 	if (apic_x2apic_mode(apic)) {
> > 		if (WARN_ON_ONCE(offset != APIC_ICR))
> > 			return 1;
> > 
> > 		kvm_lapic_msr_read(apic, offset, &val);
> 
> I think it's problematic to use kvm_lapic_msr_read() in this case. It
> premises the high 32bit value already valid at APIC_ICR2, while in handling
> "nodecode" x2APIC writes we need get continuous 64bit data from offset 300H
> first and prepare emulation of APIC_ICR2 write.

Ah, I read this part of the spec:

  All 64 bits of the ICR are written by using WRMSR to access the MSR with index 830H.
  If ECX = 830H, WRMSR writes the 64-bit value in EDX:EAX to the ICR, causing the APIC
  to send an IPI. If any of bits 13, 17:16, or 31:20 are set in EAX, WRMSR detects a
  reserved-bit violation and causes a general-protection exception (#GP).

but not the part down below that explicit says

  VICR refers the 64-bit field at offset 300H on the virtual-APIC page. When the
  “virtualize x2APIC mode” VM-execution control is 1 (indicating virtualization of
  x2APIC mode), this field is used to virtualize the entire ICR.

But that's indicative of an existing KVM problem.  KVM's emulation of x2APIC is
broken.  The SDM, in section 10.12.9 ICR Operation in x2APIC Mode, clearly states
that the ICR is extended to 64-bits.  ICR2 does not exist in x2APIC mode, full stop.
KVM botched things by effectively aliasing ICR[63:32] to ICR2.

We can and should fix that issue before merging IPIv suport, that way we don't
further propagate KVM's incorrect behavior.  KVM will need to manipulate the APIC
state in KVM_{G,S}ET_LAPIC so as not to "break" migration, "break" in quotes because
I highly doubt any kernel reads ICR[63:32] for anything but debug purposes.  But
we'd need to do that anyways for IPIv, otherwise migration from an IPIv host to
a non-IPIv host would suffer the same migration bug.

I'll post a series this week, in theory we should be able to reduce the patch for
IPIv support to just having to only touching kvm_apic_write_nodecode().
Zeng Guang Jan. 19, 2022, 2:48 a.m. UTC | #8
On 1/19/2022 2:17 AM, Sean Christopherson wrote:
> On Sat, Jan 15, 2022, Zeng Guang wrote:
>>> What about tweaking my prep patch from before to the below?  That would yield:
>>>
>>> 	if (apic_x2apic_mode(apic)) {
>>> 		if (WARN_ON_ONCE(offset != APIC_ICR))
>>> 			return 1;
>>>
>>> 		kvm_lapic_msr_read(apic, offset, &val);
>> I think it's problematic to use kvm_lapic_msr_read() in this case. It
>> premises the high 32bit value already valid at APIC_ICR2, while in handling
>> "nodecode" x2APIC writes we need get continuous 64bit data from offset 300H
>> first and prepare emulation of APIC_ICR2 write.
> Ah, I read this part of the spec:
>
>    All 64 bits of the ICR are written by using WRMSR to access the MSR with index 830H.
>    If ECX = 830H, WRMSR writes the 64-bit value in EDX:EAX to the ICR, causing the APIC
>    to send an IPI. If any of bits 13, 17:16, or 31:20 are set in EAX, WRMSR detects a
>    reserved-bit violation and causes a general-protection exception (#GP).
>
> but not the part down below that explicit says
>
>    VICR refers the 64-bit field at offset 300H on the virtual-APIC page. When the
>    “virtualize x2APIC mode” VM-execution control is 1 (indicating virtualization of
>    x2APIC mode), this field is used to virtualize the entire ICR.
>
> But that's indicative of an existing KVM problem.  KVM's emulation of x2APIC is
> broken.  The SDM, in section 10.12.9 ICR Operation in x2APIC Mode, clearly states
> that the ICR is extended to 64-bits.  ICR2 does not exist in x2APIC mode, full stop.
> KVM botched things by effectively aliasing ICR[63:32] to ICR2.
>
> We can and should fix that issue before merging IPIv suport, that way we don't
> further propagate KVM's incorrect behavior.  KVM will need to manipulate the APIC
> state in KVM_{G,S}ET_LAPIC so as not to "break" migration, "break" in quotes because
> I highly doubt any kernel reads ICR[63:32] for anything but debug purposes.  But
> we'd need to do that anyways for IPIv, otherwise migration from an IPIv host to
> a non-IPIv host would suffer the same migration bug.
>
> I'll post a series this week, in theory we should be able to reduce the patch for
> IPIv support to just having to only touching kvm_apic_write_nodecode().
OK, I'll adapt patch after your fix is ready. Suppose 
kvm_lapic_msr_{write,read} needn't emulate ICR2 write/read anymore.
diff mbox series

Patch

diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c
index f206fc35deff..3ce7142ba00e 100644
--- a/arch/x86/kvm/lapic.c
+++ b/arch/x86/kvm/lapic.c
@@ -2186,15 +2186,21 @@  EXPORT_SYMBOL_GPL(kvm_lapic_set_eoi);
 /* emulate APIC access in a trap manner */
 void kvm_apic_write_nodecode(struct kvm_vcpu *vcpu, u32 offset)
 {
-	u32 val = 0;
+	struct kvm_lapic *apic = vcpu->arch.apic;
+	u64 val = 0;
 
 	/* hw has done the conditional check and inst decode */
 	offset &= 0xff0;
 
-	kvm_lapic_reg_read(vcpu->arch.apic, offset, 4, &val);
+	/* exception dealing with 64bit data on vICR in x2apic mode */
+	if ((offset == APIC_ICR) && apic_x2apic_mode(apic)) {
+		val = kvm_lapic_get_reg64(apic, offset);
+		kvm_lapic_reg_write(apic, APIC_ICR2, (u32)(val>>32));
+	} else
+		kvm_lapic_reg_read(apic, offset, 4, &val);
 
 	/* TODO: optimize to just emulate side effect w/o one more write */
-	kvm_lapic_reg_write(vcpu->arch.apic, offset, val);
+	kvm_lapic_reg_write(apic, offset, (u32)val);
 }
 EXPORT_SYMBOL_GPL(kvm_apic_write_nodecode);
 
diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h
index 2b44e533fc8d..91864e401a64 100644
--- a/arch/x86/kvm/lapic.h
+++ b/arch/x86/kvm/lapic.h
@@ -158,6 +158,11 @@  static inline u32 kvm_lapic_get_reg(struct kvm_lapic *apic, int reg_off)
 	return *((u32 *) (apic->regs + reg_off));
 }
 
+static inline u64 kvm_lapic_get_reg64(struct kvm_lapic *apic, int reg_off)
+{
+	return *((u64 *) (apic->regs + reg_off));
+}
+
 static inline void __kvm_lapic_set_reg(char *regs, int reg_off, u32 val)
 {
 	*((u32 *) (regs + reg_off)) = val;