
[v3,4/7] x86/hyper-v: allocate and use Virtual Processor Assist Pages

Message ID 20180309140249.2840-5-vkuznets@redhat.com (mailing list archive)
State New, archived

Commit Message

Vitaly Kuznetsov March 9, 2018, 2:02 p.m. UTC
Virtual Processor Assist Pages usage allows us to do optimized EOI
processing for APIC, enable Enlightened VMCS support in KVM and more.
struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes since v1:
  move HV_X64_ENLIGHTENED_VMCS_RECOMMENDED definition to this patch
---
 arch/x86/hyperv/hv_init.c          | 33 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h | 13 +++++++++++++
 arch/x86/include/asm/mshyperv.h    | 10 ++++++++++
 3 files changed, 56 insertions(+)

Comments

Michael Kelley (EOSG) March 13, 2018, 11:08 p.m. UTC | #1
> -----Original Message-----
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Friday, March 9, 2018 6:03 AM
> To: kvm@vger.kernel.org
> Cc: x86@kernel.org; Paolo Bonzini <pbonzini@redhat.com>; Radim Krčmář
> <rkrcmar@redhat.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; Michael
> Kelley (EOSG) <Michael.H.Kelley@microsoft.com>; Mohammed Gamal
> <mmorsy@redhat.com>; Cathy Avery <cavery@redhat.com>; Bandan Das <bsd@redhat.com>;
> linux-kernel@vger.kernel.org
> Subject: [PATCH v3 4/7] x86/hyper-v: allocate and use Virtual Processor Assist Pages
> 
> Virtual Processor Assist Pages usage allows us to do optimized EOI
> processing for APIC, enable Enlightened VMCS support in KVM and more.
> struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b.
> 
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
Thomas Gleixner March 14, 2018, 3:15 p.m. UTC | #2
On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
>  	struct hv_reenlightenment_control re_ctrl;
>  	unsigned int new_cpu;
>  
> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> +		vfree(hv_vp_assist_page[cpu]);
> +		hv_vp_assist_page[cpu] = NULL;

So this is freed before the CPU is actually dead, and this runs in
preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0) enough to
prevent eventual users of the assist page on the outgoing CPU from
accessing it?

>  	if (hv_reenlightenment_cb == NULL)
>  		return 0;
>  
> @@ -241,6 +267,13 @@ void hyperv_init(void)
>  	if (!hv_vp_index)
>  		return;
>  
> +	hv_vp_assist_page = kcalloc(num_possible_cpus(),
> +				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
> +	if (!hv_vp_assist_page) {
> +		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> +		return;
> +	}
> +
>  	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>  			      hv_cpu_init, hv_cpu_die) < 0)
>  		goto free_vp_index;

Shouldn't you free hv_vp_assist_page in the error path?

> +extern struct hv_vp_assist_page **hv_vp_assist_page;
> +
> +static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
> +{
> +	return hv_vp_assist_page[cpu];

Shouldn't that check hv_vp_assist_page != NULL?

Thanks,

	tglx
Vitaly Kuznetsov March 15, 2018, 10:10 a.m. UTC | #3
Thomas Gleixner <tglx@linutronix.de> writes:

> On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
>> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
>>  	struct hv_reenlightenment_control re_ctrl;
>>  	unsigned int new_cpu;
>>  
>> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
>> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
>> +		vfree(hv_vp_assist_page[cpu]);
>> +		hv_vp_assist_page[cpu] = NULL;
>
> So this is freed before the CPU is actually dead, and this runs in
> preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0) enough to
> prevent eventual users of the assist page on the outgoing CPU from
> accessing it?
>

After we do wrmsrl() the page is no longer 'magic', so any eventual
users trying to use it will most likely misbehave -- changing the
shutdown order won't help.

The only user of these pages is currently KVM. Can we still have vCPUs
running on the outgoing CPU at this point? If we can, we're in
trouble and we need to somehow kick them out first.

>>  	if (hv_reenlightenment_cb == NULL)
>>  		return 0;
>>  
>> @@ -241,6 +267,13 @@ void hyperv_init(void)
>>  	if (!hv_vp_index)
>>  		return;
>>  
>> +	hv_vp_assist_page = kcalloc(num_possible_cpus(),
>> +				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
>> +	if (!hv_vp_assist_page) {
>> +		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
>> +		return;
>> +	}
>> +
>>  	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>>  			      hv_cpu_init, hv_cpu_die) < 0)
>>  		goto free_vp_index;
>
> Shouldn't you free hv_vp_assist_page in the error path?
>

Yep, will do.
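The agreed fix extends hyperv_init()'s error path so the freshly allocated array is also released when cpuhp_setup_state() fails. A kernel-style sketch of the shape it could take (the new label name is hypothetical, not the final upstream code):

```
	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
			      hv_cpu_init, hv_cpu_die) < 0)
		goto free_vp_assist_page;
	...
free_vp_assist_page:
	kfree(hv_vp_assist_page);
	hv_vp_assist_page = NULL;
free_vp_index:
	kfree(hv_vp_index);
```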

>> +extern struct hv_vp_assist_page **hv_vp_assist_page;
>> +
>> +static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
>> +{
>> +	return hv_vp_assist_page[cpu];
>
> Shouldn't that check hv_vp_assist_page != NULL?
>

Not strictly required, as we clear HV_X64_ENLIGHTENED_VMCS_RECOMMENDED
above so KVM won't use it, but I can add the check to make the API
more robust.

Thanks,
Thomas Gleixner March 15, 2018, 11:45 a.m. UTC | #4
On Thu, 15 Mar 2018, Vitaly Kuznetsov wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
> >> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
> >>  	struct hv_reenlightenment_control re_ctrl;
> >>  	unsigned int new_cpu;
> >>  
> >> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
> >> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> >> +		vfree(hv_vp_assist_page[cpu]);
> >> +		hv_vp_assist_page[cpu] = NULL;
> >
> > So this is freed before the CPU is actually dead, and this runs in
> > preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0) enough to
> > prevent eventual users of the assist page on the outgoing CPU from
> > accessing it?
> >
> 
> After we do wrmsrl() the page is no longer 'magic', so any eventual
> users trying to use it will most likely misbehave -- changing the
> shutdown order won't help.
> 
> The only user of these pages is currently KVM. Can we still have vCPUs
> running on the outgoing CPU at this point? If we can, we're in
> trouble and we need to somehow kick them out first.

The first thing we do in unplug is to mark the CPU inactive, but I'm not
sure whether that prevents a task which was on the CPU before (perhaps
preempted, or affine to that CPU) from being scheduled in again. Peter?

Thanks,

	tglx
Peter Zijlstra March 15, 2018, 1:48 p.m. UTC | #5
On Thu, Mar 15, 2018 at 12:45:03PM +0100, Thomas Gleixner wrote:
> On Thu, 15 Mar 2018, Vitaly Kuznetsov wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
> > > On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
> > >> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
> > >>  	struct hv_reenlightenment_control re_ctrl;
> > >>  	unsigned int new_cpu;
> > >>  
> > >> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
> > >> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> > >> +		vfree(hv_vp_assist_page[cpu]);
> > >> +		hv_vp_assist_page[cpu] = NULL;
> > >
> > > So this is freed before the CPU is actually dead, and this runs in
> > > preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0) enough to
> > > prevent eventual users of the assist page on the outgoing CPU from
> > > accessing it?
> > >
> > 
> > After we do wrmsrl() the page is no longer 'magic', so any eventual
> > users trying to use it will most likely misbehave -- changing the
> > shutdown order won't help.
> > 
> > The only user of these pages is currently KVM. Can we still have vCPUs
> > running on the outgoing CPU at this point? If we can, we're in
> > trouble and we need to somehow kick them out first.
> 
> The first thing we do in unplug is to mark the CPU inactive, but I'm not
> sure whether that prevents a task which was on the CPU before (perhaps
> preempted, or affine to that CPU) from being scheduled in again. Peter?

I think we can still have tasks running at this point.

AP_ACTIVE (sched_cpu_deactivate) simply takes the CPU out of the active
mask, which guarantees no new tasks will land on the CPU.

We'll then proceed all the way to TEARDOWN_CPU as 'normal', at which
point we'll call stop_machine() which does the old DYING callbacks.

It sounds like we want this done here, although possibly we can't do
vfree() from that context, in which case it needs to store the pointer
and do that from a BP callback (what used to be the OFFLINE callbacks or
something).
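In cpuhp terms, the suggestion above splits the teardown in two: the MSR write moves into a DYING callback (which runs on the outgoing CPU inside stop_machine(), where sleeping and hence vfree() are not allowed), and any freeing happens from a BP-side callback. A rough kernel-style sketch, with hypothetical callback names -- not the final upstream code:

```
/* Sketch only: hypothetical split of hv_cpu_die() per the discussion. */

/* DYING callback: runs on the outgoing CPU inside stop_machine(),
 * so only the atomic MSR write goes here. */
static int hv_cpu_dying(unsigned int cpu)
{
	if (hv_vp_assist_page && hv_vp_assist_page[cpu])
		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
	return 0;
}

/* BP-side (former OFFLINE) callback: process context, vfree() is fine.
 * Alternatively, keep the page allocated and let hv_cpu_init() reuse
 * it when the CPU comes back online. */
static int hv_cpu_offline(unsigned int cpu)
{
	vfree(hv_vp_assist_page[cpu]);
	hv_vp_assist_page[cpu] = NULL;
	return 0;
}
```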
Thomas Gleixner March 15, 2018, 1:57 p.m. UTC | #6
On Thu, 15 Mar 2018, Peter Zijlstra wrote:
> On Thu, Mar 15, 2018 at 12:45:03PM +0100, Thomas Gleixner wrote:
> > On Thu, 15 Mar 2018, Vitaly Kuznetsov wrote:
> > > The only user of these pages is currently KVM. Can we still have vCPUs
> > > running on the outgoing CPU at this point? If we can, we're in
> > > trouble and we need to somehow kick them out first.
> > 
> > The first thing we do in unplug is to mark the CPU inactive, but I'm not
> > sure whether that prevents a task which was on the CPU before (perhaps
> > preempted, or affine to that CPU) from being scheduled in again. Peter?
> 
> I think we can still have tasks running at this point.
> 
> AP_ACTIVE (sched_cpu_deactivate) simply takes the CPU out of the active
> mask, which guarantees no new tasks will land on the CPU.
> 
> We'll then proceed all the way to TEARDOWN_CPU as 'normal', at which
> point we'll call stop_machine() which does the old DYING callbacks.
> 
> It sounds like we want this done here, although possibly we can't do
> vfree() from that context, in which case it needs to store the pointer
> and do that from a BP callback (what used to be the OFFLINE callbacks or
> something).

So the wrmsr() wants to be in the dying range. The vfree() is questionable
anyway because the re-onlining of that CPU will just allocate it again. So
it could very well stay around.

Thanks,

	tglx

Patch

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 4b82bc206929..2e0c0351c5f8 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -88,6 +88,9 @@  EXPORT_SYMBOL_GPL(hyperv_cs);
 u32 *hv_vp_index;
 EXPORT_SYMBOL_GPL(hv_vp_index);
 
+struct hv_vp_assist_page **hv_vp_assist_page;
+EXPORT_SYMBOL_GPL(hv_vp_assist_page);
+
 u32 hv_max_vp_index;
 
 static int hv_cpu_init(unsigned int cpu)
@@ -101,6 +104,23 @@  static int hv_cpu_init(unsigned int cpu)
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
 
+	if (!hv_vp_assist_page)
+		return 0;
+
+	if (!hv_vp_assist_page[smp_processor_id()])
+		hv_vp_assist_page[smp_processor_id()] =
+			__vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+
+	if (hv_vp_assist_page[smp_processor_id()]) {
+		u64 val;
+
+		val = vmalloc_to_pfn(hv_vp_assist_page[smp_processor_id()]);
+		val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
+			HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
+
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
+	}
+
 	return 0;
 }
 
@@ -198,6 +218,12 @@  static int hv_cpu_die(unsigned int cpu)
 	struct hv_reenlightenment_control re_ctrl;
 	unsigned int new_cpu;
 
+	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
+		vfree(hv_vp_assist_page[cpu]);
+		hv_vp_assist_page[cpu] = NULL;
+	}
+
 	if (hv_reenlightenment_cb == NULL)
 		return 0;
 
@@ -241,6 +267,13 @@  void hyperv_init(void)
 	if (!hv_vp_index)
 		return;
 
+	hv_vp_assist_page = kcalloc(num_possible_cpus(),
+				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
+	if (!hv_vp_assist_page) {
+		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
+		return;
+	}
+
 	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
 			      hv_cpu_init, hv_cpu_die) < 0)
 		goto free_vp_index;
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 5c0d8fab87a3..3d0fba18ab4c 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -156,6 +156,9 @@ 
 /* Recommend using the newer ExProcessorMasks interface */
 #define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED	(1 << 11)
 
+/* Recommend using enlightened VMCS */
+#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED    (1 << 14)
+
 /*
  * Crash notification flag.
  */
@@ -469,6 +472,16 @@  struct hv_timer_message_payload {
 	__u64 delivery_time;	/* When the message was delivered */
 };
 
+/* Define virtual processor assist page structure. */
+struct hv_vp_assist_page {
+	__u32 apic_assist;
+	__u32 reserved;
+	__u64 vtl_control[2];
+	__u64 nested_enlightenments_control[2];
+	__u32 enlighten_vmentry;
+	__u64 current_nested_vmcs;
+};
+
 #define HV_STIMER_ENABLE		(1ULL << 0)
 #define HV_STIMER_PERIODIC		(1ULL << 1)
 #define HV_STIMER_LAZY			(1ULL << 2)
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 38cfbe9a5794..dcb40567b35e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -218,6 +218,12 @@  static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
  */
 extern u32 *hv_vp_index;
 extern u32 hv_max_vp_index;
+extern struct hv_vp_assist_page **hv_vp_assist_page;
+
+static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
+{
+	return hv_vp_assist_page[cpu];
+}
 
 /**
  * hv_cpu_number_to_vp_number() - Map CPU to VP.
@@ -254,6 +260,10 @@  static inline void hyperv_setup_mmu_ops(void) {}
 static inline void set_hv_tscchange_cb(void (*cb)(void)) {}
 static inline void clear_hv_tscchange_cb(void) {}
 static inline void hyperv_stop_tsc_emulation(void) {};
+static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
+{
+	return NULL;
+}
 #endif /* CONFIG_HYPERV */
 
 #ifdef CONFIG_HYPERV_TSCPAGE