diff mbox

[PART1,RFC,v2,10/10] svm: Manage vcpu load/unload when enable AVIC

Message ID 1457124368-2025-11-git-send-email-Suravee.Suthikulpanit@amd.com (mailing list archive)
State New, archived
Headers show

Commit Message

Suravee Suthikulpanit March 4, 2016, 8:46 p.m. UTC
From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

When a vcpu is loaded/unloaded to a physical core, we need to update
information in the Physical APIC-ID table accordingly.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
 arch/x86/kvm/svm.c | 146 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 146 insertions(+)

Comments

Radim Krčmář March 9, 2016, 9:46 p.m. UTC | #1
2016-03-04 14:46-0600, Suravee Suthikulpanit:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> 
> When a vcpu is loaded/unloaded to a physical core, we need to update
> information in the Physical APIC-ID table accordingly.
> 
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> @@ -1508,6 +1510,146 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
> +static inline int
> +avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
> +{
> +	if (!kvm_arch_has_assigned_device(vcpu->kvm))
> +		return 0;
> +
> +	/* TODO: We will hook up with IOMMU API at later time */

(It'd be best to drop avic_update_iommu from this series and introduce
 it when merging iommu support.)

> +	return 0;
> +}
> +
> +static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)

This function does a lot and there is only one thing that must be done
in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
The rest can be done elsewhere:
 - is_running when blocking.
 - kb_pg_ptr when the pointer changes = only on initialization?
 - valid when the kb_pg_ptr is valid = always for existing VCPUs?

> +static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)

avic_set_running should get address of the entry and write is_run to it.
(No need to touch avic_bk_page, h_phy_apic_id or do bound checks.)

I think you can cache the physical apic id table entry in *vcpu, so both
functions are going to be very simple.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 10, 2016, 2:01 p.m. UTC | #2
2016-03-09 22:46+0100, Radim Kr?má?:
> 2016-03-04 14:46-0600, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> 
>> When a vcpu is loaded/unloaded to a physical core, we need to update
>> information in the Physical APIC-ID table accordingly.
>> 
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> +static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
> 
> This function does a lot and there is only one thing that must be done
> in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
> The rest can be done elsewhere:
>  - is_running when blocking.

Well, we haven't reached an agreement on is_running yet.  The situation:
if we don't unset vcpu1.is_running when vcpu1 is scheduled out and vcpu2
gets scheduled on vcpu1's physical core, then vcpu2 would receive a
doorbell intended to vcpu1.

We'd like to keep is_running set when there is no reason to vmexit, but
not if a guest can negatively affect other guests.
How does receiving a stray doorbell affect the performance?

Thanks.

(Toggling is_running on load/put is definitely safer, so it's a good
 choice for first version.)
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Suravee Suthikulpanit March 14, 2016, 11:48 a.m. UTC | #3
Hi,

On 03/10/2016 04:46 AM, Radim Kr?má? wrote:
> 2016-03-04 14:46-0600, Suravee Suthikulpanit:
>> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>
>> When a vcpu is loaded/unloaded to a physical core, we need to update
>> information in the Physical APIC-ID table accordingly.
>>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>> @@ -1508,6 +1510,146 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
>> +static inline int
>> +avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
>> +{
>> +	if (!kvm_arch_has_assigned_device(vcpu->kvm))
>> +		return 0;
>> +
>> +	/* TODO: We will hook up with IOMMU API at later time */
>
> (It'd be best to drop avic_update_iommu from this series and introduce
>   it when merging iommu support.)
>

I just kept it there to make code merging b/w part1 and 2 that I have 
been testing simpler. I didn't think it would cause much confusion. But 
if you think that might be the case, I can drop it for now.

>> +	return 0;
>> +}
>> +
>> +static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
>
> This function does a lot and there is only one thing that must be done
> in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
> The rest can be done elsewhere:
>   - is_running when blocking.

I added the logic here to track if the is_running is set when unloading 
since I noticed the case when the vCPU is busy doing some work for the 
guest, then get unloaded and later on get loaded w/o 
blocking/unblocking. So, in theory, it should be be set to running 
during unloaded period, and it should restore this flag if it is loaded 
again.

>   - kb_pg_ptr when the pointer changes = only on initialization?

The reason I put this here mainly because it is a convenient place to 
set the vAPIC bakcing page address since we already have to set up the 
host physical APIC id. I guess I can try setting this separately during 
vcpu create. But I don't think it would make much difference.

>   - valid when the kb_pg_ptr is valid = always for existing VCPUs?

According to the programming manual, the valid bit is set when we set 
the host physical APIC ID. However, in theory, the vAPIC backing page 
address is required for AVIC hardware to set bits in IRR register, while 
the host physical APIC ID is needed for sending doorbell to the target 
physical core. So, I would actually interpret the valid bit as it should 
be set when the vAPIC backing address is set.

In the current implementation, the valid bit is set during vcpu load, 
but is not unset it when unload. This actually reflect the 
interpretation of the description above.

If we decide to move the setting of vAPIC backing page address to the 
vcpu create phrase, we would set the valid bit at that point as well.

Please let me know if you think differently.

>> +static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
>
> avic_set_running should get address of the entry and write is_run to it.
> (No need to touch avic_bk_page, h_phy_apic_id or do bound checks.)
>
> I think you can cache the physical apic id table entry in *vcpu, so both
> functions are going to be very simple.
>

I was actually thinking about doing this. Lemme try to come up with the 
logic to cache this.

Thanks,
Suravee
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Suravee Suthikulpanit March 14, 2016, 11:58 a.m. UTC | #4
Hi,

On 03/10/2016 09:01 PM, Radim Kr?má? wrote:
> Well, we haven't reached an agreement on is_running yet.  The situation:
> if we don't unset vcpu1.is_running when vcpu1 is scheduled out and vcpu2
> gets scheduled on vcpu1's physical core, then vcpu2 would receive a
> doorbell intended to vcpu1.

That's why, in V2, I added the logic to check if the is_running bit is 
set for the current vcpu (e.g. vcpu1) when unloaded, then restore the 
bit during loading later of if it was set during previous unloaded. This 
way, when we load the new vcpu (e.g. vcpu2), the is_running will be set 
as it was before unloading.

> We'd like to keep is_running set when there is no reason to vmexit, but
> not if a guest can negatively affect other guests.

Not sure how this can affect other guests?

> How does receiving a stray doorbell affect the performance?

As far as I know, the doorbell only affects the CPU during vmrun. Once 
received, it will check the IRR in vAPIC backing page.  So, I think if 
IRR bit is not set, the affect should be rather minimal.

Thanks,
Suravee
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 14, 2016, 4:40 p.m. UTC | #5
2016-03-14 18:48+0700, Suravee Suthikulpanit:
> On 03/10/2016 04:46 AM, Radim Kr?má? wrote:
>>2016-03-04 14:46-0600, Suravee Suthikulpanit:
>>>From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>>
>>>When a vcpu is loaded/unloaded to a physical core, we need to update
>>>information in the Physical APIC-ID table accordingly.
>>>
>>>Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>>>---
>>>diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
>>>@@ -1508,6 +1510,146 @@ static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
>>>+static inline int
>>>+avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
>>>+{
>>>+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
>>>+		return 0;
>>>+
>>>+	/* TODO: We will hook up with IOMMU API at later time */
>>
>>(It'd be best to drop avic_update_iommu from this series and introduce
>>  it when merging iommu support.)
>>
> 
> I just kept it there to make code merging b/w part1 and 2 that I have been
> testing simpler. I didn't think it would cause much confusion. But if you
> think that might be the case, I can drop it for now.

The iommu part might end up having different requirements for this
function, so this husk can only add work when compared to waiting.
And avic_update_iommu is logically separable, so it would be nicer as a
short separate patch anyway.  (It's not a problem if you leave it.)

>>>+static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
>>
>>This function does a lot and there is only one thing that must be done
>>in svm_vcpu_load:  change host physical APIC ID if the CPU changed.
>>The rest can be done elsewhere:
>>  - is_running when blocking.
> 
> I added the logic here to track if the is_running is set when unloading
> since I noticed the case when the vCPU is busy doing some work for the
> guest, then get unloaded and later on get loaded w/o blocking/unblocking.
> So, in theory, it should be be set to running during unloaded period, and it
> should restore this flag if it is loaded again.

(A second mail will be related to this.)

>>  - kb_pg_ptr when the pointer changes = only on initialization?
> 
> The reason I put this here mainly because it is a convenient place to set
> the vAPIC bakcing page address since we already have to set up the host
> physical APIC id. I guess I can try setting this separately during vcpu
> create. But I don't think it would make much difference.

vcpu_load isn't as hot vcpu_run, but it's still called often and the
most useful optimization is to avoid unnecessary operations ...
(I think the split code is going to be easier to understand as well.)

>>  - valid when the kb_pg_ptr is valid = always for existing VCPUs?
> 
> According to the programming manual, the valid bit is set when we set the
> host physical APIC ID.

(Physical APIC ID doesn't affect the valid bit at all.)

>                        However, in theory, the vAPIC backing page address is
> required for AVIC hardware to set bits in IRR register, while the host
> physical APIC ID is needed for sending doorbell to the target physical core.
> So, I would actually interpret the valid bit as it should be set when the
> vAPIC backing address is set.

Yes, APM (rev 3.23, vol 2, table 15-24):
  Valid bit. When set, indicates that this entry contains a valid vAPIC
  backing page pointer. If cleared, this table entry contains no
  information.

> In the current implementation, the valid bit is set during vcpu load, but is
> not unset it when unload. This actually reflect the interpretation of the
> description above.
> 
> If we decide to move the setting of vAPIC backing page address to the vcpu
> create phrase, we would set the valid bit at that point as well.
> 
> Please let me know if you think differently.

I agree with your analysis and setting the backing page and valid bit on
LAPIC creation seems better to me.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Radim Krčmář March 14, 2016, 4:54 p.m. UTC | #6
2016-03-14 18:58+0700, Suravee Suthikulpanit:
> On 03/10/2016 09:01 PM, Radim Kr?má? wrote:
>>Well, we haven't reached an agreement on is_running yet.  The situation:
>>if we don't unset vcpu1.is_running when vcpu1 is scheduled out and vcpu2
>>gets scheduled on vcpu1's physical core, then vcpu2 would receive a
>>doorbell intended to vcpu1.
> 
> That's why, in V2, I added the logic to check if the is_running bit is set
> for the current vcpu (e.g. vcpu1) when unloaded, then restore the bit during
> loading later of if it was set during previous unloaded. This way, when we
> load the new vcpu (e.g. vcpu2), the is_running will be set as it was before
> unloading.

Yes, that's a good solution and I'm leaning towards it.  The downside is
that IPIs from other VCPUs exit, even though KVM can't do anything,
because the vCPU is already going to run as soon as it can.
Keeping is_running set during unload would prevent meaningless exits.

>>We'd like to keep is_running set when there is no reason to vmexit, but
>>not if a guest can negatively affect other guests.
> 
> Not sure how this can affect other guests?

If is_running is set, then the doorbell is sent to a physical core, so
any running task/vCPU will receive it.  This is safe, but a difference
can be seen in performance.

>>How does receiving a stray doorbell affect the performance?
> 
> As far as I know, the doorbell only affects the CPU during vmrun.

Yeah, I guess that receiving a doorbell outside of vmrun has no
overhead.

>                                                                   Once
> received, it will check the IRR in vAPIC backing page.  So, I think if IRR
> bit is not set, the affect should be rather minimal.

Even empty IRR still needs to be rescanned every time a doorbell
arrives, which might affect the execution pipeline.

After re-reading all relevant quotes, I think that hardware wasn't
designed with this use in mind, so it's safer to assume an adverse
effect and go with solution we have now.  (It'd be hard to measure
anyway.)

Sorry for the tangent.
--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 5142861..ebcade0 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -35,6 +35,7 @@ 
 #include <linux/trace_events.h>
 #include <linux/slab.h>
 
+#include <asm/apic.h>
 #include <asm/perf_event.h>
 #include <asm/tlbflush.h>
 #include <asm/desc.h>
@@ -175,6 +176,7 @@  struct vcpu_svm {
 
 	struct page *avic_bk_page;
 	void *in_kernel_lapic_regs;
+	bool avic_was_running;
 };
 
 struct __attribute__ ((__packed__))
@@ -1508,6 +1510,146 @@  static int avic_vcpu_init(struct kvm *kvm, struct vcpu_svm *svm, int id)
 	return 0;
 }
 
+static inline int
+avic_update_iommu(struct kvm_vcpu *vcpu, int cpu, phys_addr_t pa, bool r)
+{
+	if (!kvm_arch_has_assigned_device(vcpu->kvm))
+		return 0;
+
+	/* TODO: We will hook up with IOMMU API at later time */
+	return 0;
+}
+
+static int avic_vcpu_load(struct kvm_vcpu *vcpu, int cpu, bool is_load)
+{
+	int g_phy_apic_id, h_phy_apic_id;
+	struct svm_avic_phy_ait_entry *entry, new_entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	int ret = 0;
+
+	if (!avic)
+		return 0;
+
+	if (!svm)
+		return -EINVAL;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	g_phy_apic_id = vcpu->vcpu_id;
+	h_phy_apic_id = __default_cpu_present_to_apicid(cpu);
+
+	if ((g_phy_apic_id >= AVIC_PHY_APIC_ID_MAX) ||
+	    (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX))
+		return -EINVAL;
+
+	entry = avic_get_phy_ait_entry(vcpu, g_phy_apic_id);
+	if (!entry)
+		return -EINVAL;
+
+	if (is_load) {
+		/* Handle vcpu load */
+		phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+
+		new_entry = READ_ONCE(*entry);
+
+		BUG_ON(new_entry.is_running);
+
+		new_entry.bk_pg_ptr = (pa >> 12) & 0xffffffffff;
+		new_entry.valid = 1;
+		new_entry.host_phy_apic_id = h_phy_apic_id;
+
+		if (svm->avic_was_running) {
+			/**
+			 * Restore AVIC running flag if it was set during
+			 * vcpu unload.
+			 */
+			new_entry.is_running = 1;
+		}
+		ret = avic_update_iommu(vcpu, h_phy_apic_id, pa,
+					   svm->avic_was_running);
+		WRITE_ONCE(*entry, new_entry);
+
+	} else {
+		/* Handle vcpu unload */
+		new_entry = READ_ONCE(*entry);
+		if (new_entry.is_running) {
+			phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+
+			/**
+			 * This handles the case when vcpu is scheduled out
+			 * and has not yet not called blocking. We save the
+			 * AVIC running flag so that we can restore later.
+			 */
+			svm->avic_was_running = true;
+
+			/**
+			 * We need to also clear the AVIC running flag
+			 * and communicate the changes to IOMMU.
+			 */
+			ret = avic_update_iommu(vcpu, h_phy_apic_id, pa, 0);
+
+			new_entry.is_running = 0;
+			WRITE_ONCE(*entry, new_entry);
+		} else {
+			svm->avic_was_running = false;
+		}
+	}
+
+	return ret;
+}
+
+/**
+ * This function is called during VCPU halt/unhalt.
+ */
+static int avic_set_running(struct kvm_vcpu *vcpu, bool is_run)
+{
+	int ret = 0;
+	int g_phy_apic_id, h_phy_apic_id;
+	struct svm_avic_phy_ait_entry *entry, new_entry;
+	struct vcpu_svm *svm = to_svm(vcpu);
+	phys_addr_t pa = PFN_PHYS(page_to_pfn(svm->avic_bk_page));
+
+	if (!avic)
+		return 0;
+
+	/* Note: APIC ID = 0xff is used for broadcast.
+	 *       APIC ID > 0xff is reserved.
+	 */
+	g_phy_apic_id = vcpu->vcpu_id;
+	h_phy_apic_id = __default_cpu_present_to_apicid(vcpu->cpu);
+
+	if ((g_phy_apic_id >= AVIC_PHY_APIC_ID_MAX) ||
+	    (h_phy_apic_id >= AVIC_PHY_APIC_ID_MAX))
+		return -EINVAL;
+
+	entry = avic_get_phy_ait_entry(vcpu, g_phy_apic_id);
+	if (!entry)
+		return -EINVAL;
+
+	if (is_run) {
+		/**
+		 * Handle vcpu unblocking after HLT
+		 */
+		new_entry = READ_ONCE(*entry);
+		new_entry.is_running = is_run;
+		WRITE_ONCE(*entry, new_entry);
+
+		ret = avic_update_iommu(vcpu, h_phy_apic_id, pa, is_run);
+	} else {
+		/**
+		 * Handle vcpu blocking due to HLT
+		 */
+		ret = avic_update_iommu(vcpu, h_phy_apic_id, pa, is_run);
+
+		new_entry = READ_ONCE(*entry);
+		new_entry.is_running = is_run;
+		WRITE_ONCE(*entry, new_entry);
+	}
+
+	return ret;
+}
+
 static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool init_event)
 {
 	struct vcpu_svm *svm = to_svm(vcpu);
@@ -1648,6 +1790,8 @@  static void svm_vcpu_load(struct kvm_vcpu *vcpu, int cpu)
 	/* This assumes that the kernel never uses MSR_TSC_AUX */
 	if (static_cpu_has(X86_FEATURE_RDTSCP))
 		wrmsrl(MSR_TSC_AUX, svm->tsc_aux);
+
+	avic_vcpu_load(vcpu, cpu, true);
 }
 
 static void svm_vcpu_put(struct kvm_vcpu *vcpu)
@@ -1655,6 +1799,8 @@  static void svm_vcpu_put(struct kvm_vcpu *vcpu)
 	struct vcpu_svm *svm = to_svm(vcpu);
 	int i;
 
+	avic_vcpu_load(vcpu, 0, false);
+
 	++vcpu->stat.host_state_reload;
 	kvm_load_ldt(svm->host.ldt);
 #ifdef CONFIG_X86_64