| Message ID | 20180309140249.2840-5-vkuznets@redhat.com (mailing list archive) |
|---|---|
| State | New, archived |
> -----Original Message-----
> From: Vitaly Kuznetsov <vkuznets@redhat.com>
> Sent: Friday, March 9, 2018 6:03 AM
> To: kvm@vger.kernel.org
> Cc: x86@kernel.org; Paolo Bonzini <pbonzini@redhat.com>; Radim Krčmář
> <rkrcmar@redhat.com>; KY Srinivasan <kys@microsoft.com>; Haiyang Zhang
> <haiyangz@microsoft.com>; Stephen Hemminger <sthemmin@microsoft.com>; Michael
> Kelley (EOSG) <Michael.H.Kelley@microsoft.com>; Mohammed Gamal
> <mmorsy@redhat.com>; Cathy Avery <cavery@redhat.com>; Bandan Das <bsd@redhat.com>;
> linux-kernel@vger.kernel.org
> Subject: [PATCH v3 4/7] x86/hyper-v: allocate and use Virtual Processor Assist Pages
>
> Virtual Processor Assist Pages usage allows us to do optimized EOI
> processing for APIC, enable Enlightened VMCS support in KVM and more.
> struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b.
>
> Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>

Reviewed-by: Michael Kelley <mikelley@microsoft.com>
On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
> 	struct hv_reenlightenment_control re_ctrl;
> 	unsigned int new_cpu;
>
> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> +		vfree(hv_vp_assist_page[cpu]);
> +		hv_vp_assist_page[cpu] = NULL;

So this is freed before the CPU is actually dead. And this runs in
preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); enough to
prevent eventual users of the assist page on the outgoing CPU from
accessing it?

> 	if (hv_reenlightenment_cb == NULL)
> 		return 0;
>
> @@ -241,6 +267,13 @@ void hyperv_init(void)
> 	if (!hv_vp_index)
> 		return;
>
> +	hv_vp_assist_page = kcalloc(num_possible_cpus(),
> +				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
> +	if (!hv_vp_assist_page) {
> +		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
> +		return;
> +	}
> +
> 	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
> 			      hv_cpu_init, hv_cpu_die) < 0)
> 		goto free_vp_index;

Shouldn't you free hv_vp_assist_page in the error path?

> +extern struct hv_vp_assist_page **hv_vp_assist_page;
> +
> +static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
> +{
> +	return hv_vp_assist_page[cpu];

Shouldn't that check hv_vp_assist_page != NULL?

Thanks,

	tglx
Thomas Gleixner <tglx@linutronix.de> writes:

> On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
>> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
>> 	struct hv_reenlightenment_control re_ctrl;
>> 	unsigned int new_cpu;
>>
>> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
>> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
>> +		vfree(hv_vp_assist_page[cpu]);
>> +		hv_vp_assist_page[cpu] = NULL;
>
> So this is freed before the CPU is actually dead. And this runs in
> preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); enough to
> prevent eventual users of the assist page on the outgoing CPU from
> accessing it?

After we do wrmsrl() the page is no longer 'magic', so if eventual
users try using it they'll most likely misbehave -- changing the
shutdown order won't help.

The only user of these pages is currently KVM. Can we still have vCPUs
running on the outgoing CPU at this point? If we can, we're in trouble
and we need to somehow kick them out first.

>> 	if (hv_reenlightenment_cb == NULL)
>> 		return 0;
>>
>> @@ -241,6 +267,13 @@ void hyperv_init(void)
>> 	if (!hv_vp_index)
>> 		return;
>>
>> +	hv_vp_assist_page = kcalloc(num_possible_cpus(),
>> +				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
>> +	if (!hv_vp_assist_page) {
>> +		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
>> +		return;
>> +	}
>> +
>> 	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
>> 			      hv_cpu_init, hv_cpu_die) < 0)
>> 		goto free_vp_index;
>
> Shouldn't you free hv_vp_assist_page in the error path?

Yep, will do.

>> +extern struct hv_vp_assist_page **hv_vp_assist_page;
>> +
>> +static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
>> +{
>> +	return hv_vp_assist_page[cpu];
>
> Shouldn't that check hv_vp_assist_page != NULL?

Not strictly required, as we clear HV_X64_ENLIGHTENED_VMCS_RECOMMENDED
above so KVM won't use it, but I can add the check to make the API
better.

Thanks,
On Thu, 15 Mar 2018, Vitaly Kuznetsov wrote:
> Thomas Gleixner <tglx@linutronix.de> writes:
> > On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
> >> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
> >> 	struct hv_reenlightenment_control re_ctrl;
> >> 	unsigned int new_cpu;
> >>
> >> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
> >> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> >> +		vfree(hv_vp_assist_page[cpu]);
> >> +		hv_vp_assist_page[cpu] = NULL;
> >
> > So this is freed before the CPU is actually dead. And this runs in
> > preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); enough to
> > prevent eventual users of the assist page on the outgoing CPU from
> > accessing it?
>
> After we do wrmsrl() the page is no longer 'magic' so in case eventual
> users try using it they'll most likely misbehave -- so changing the
> shutdown order won't help.
>
> The only user of these pages is currently KVM. Can we still have vCPUs
> running on the outgoing CPU at this point? If we can, we're in trouble
> and we need to somehow kick them out first.

The first thing we do in unplug is to mark the CPU inactive, but I'm not
sure whether that prevents something which was on the CPU before and
perhaps preempted or is affine to that CPU to be scheduled in
again. Peter????

Thanks,

	tglx
On Thu, Mar 15, 2018 at 12:45:03PM +0100, Thomas Gleixner wrote:
> On Thu, 15 Mar 2018, Vitaly Kuznetsov wrote:
> > Thomas Gleixner <tglx@linutronix.de> writes:
> > > On Fri, 9 Mar 2018, Vitaly Kuznetsov wrote:
> > >> @@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
> > >> 	struct hv_reenlightenment_control re_ctrl;
> > >> 	unsigned int new_cpu;
> > >>
> > >> +	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
> > >> +		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
> > >> +		vfree(hv_vp_assist_page[cpu]);
> > >> +		hv_vp_assist_page[cpu] = NULL;
> > >
> > > So this is freed before the CPU is actually dead. And this runs in
> > > preemptible context. Is the wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0); enough to
> > > prevent eventual users of the assist page on the outgoing CPU from
> > > accessing it?
> >
> > After we do wrmsrl() the page is no longer 'magic' so in case eventual
> > users try using it they'll most likely misbehave -- so changing the
> > shutdown order won't help.
> >
> > The only user of these pages is currently KVM. Can we still have vCPUs
> > running on the outgoing CPU at this point? If we can, we're in trouble
> > and we need to somehow kick them out first.
>
> The first thing we do in unplug is to mark the CPU inactive, but I'm not
> sure whether that prevents something which was on the CPU before and
> perhaps preempted or is affine to that CPU to be scheduled in
> again. Peter????

I think we can still have tasks running at this point.

AP_ACTIVE (sched_cpu_deactivate) simply takes the CPU out of the active
mask, which guarantees no new tasks will land on the CPU.

We'll then proceed all the way to TEARDOWN_CPU as 'normal', at which
point we'll call stop_machine() which does the old DYING callbacks.

It sounds like we want this done here, although possibly we can't do
vfree() from that context, in which case it needs to store the pointer
and do that from a BP callback (what used to be the OFFLINE callbacks or
something).
On Thu, 15 Mar 2018, Peter Zijlstra wrote:
> On Thu, Mar 15, 2018 at 12:45:03PM +0100, Thomas Gleixner wrote:
> > On Thu, 15 Mar 2018, Vitaly Kuznetsov wrote:
> > > The only user of these pages is currently KVM. Can we still have vCPUs
> > > running on the outgoing CPU at this point? If we can, we're in trouble
> > > and we need to somehow kick them out first.
> >
> > The first thing we do in unplug is to mark the CPU inactive, but I'm not
> > sure whether that prevents something which was on the CPU before and
> > perhaps preempted or is affine to that CPU to be scheduled in
> > again. Peter????
>
> I think we can still have tasks running at this point.
>
> AP_ACTIVE (sched_cpu_deactivate) simply takes the CPU out of the active
> mask, which guarantees no new tasks will land on the CPU.
>
> We'll then proceed all the way to TEARDOWN_CPU as 'normal', at which
> point we'll call stop_machine() which does the old DYING callbacks.
>
> It sounds like we want this done here, although possibly we can't do
> vfree() from that context, in which case it needs to store the pointer
> and do that from a BP callback (what used to be the OFFLINE callbacks or
> something).

So the wrmsr() wants to be in the dying range. The vfree() is
questionable anyway because the re-onlining of that CPU will just
allocate it again. So it could very well stay around.

Thanks,

	tglx
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 4b82bc206929..2e0c0351c5f8 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -88,6 +88,9 @@ EXPORT_SYMBOL_GPL(hyperv_cs);
 u32 *hv_vp_index;
 EXPORT_SYMBOL_GPL(hv_vp_index);
 
+struct hv_vp_assist_page **hv_vp_assist_page;
+EXPORT_SYMBOL_GPL(hv_vp_assist_page);
+
 u32 hv_max_vp_index;
 
 static int hv_cpu_init(unsigned int cpu)
@@ -101,6 +104,23 @@ static int hv_cpu_init(unsigned int cpu)
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
 
+	if (!hv_vp_assist_page)
+		return 0;
+
+	if (!hv_vp_assist_page[smp_processor_id()])
+		hv_vp_assist_page[smp_processor_id()] =
+			__vmalloc(PAGE_SIZE, GFP_KERNEL, PAGE_KERNEL);
+
+	if (hv_vp_assist_page[smp_processor_id()]) {
+		u64 val;
+
+		val = vmalloc_to_pfn(hv_vp_assist_page[smp_processor_id()]);
+		val = (val << HV_X64_MSR_VP_ASSIST_PAGE_ADDRESS_SHIFT) |
+			HV_X64_MSR_VP_ASSIST_PAGE_ENABLE;
+
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, val);
+	}
+
 	return 0;
 }
 
@@ -198,6 +218,12 @@ static int hv_cpu_die(unsigned int cpu)
 	struct hv_reenlightenment_control re_ctrl;
 	unsigned int new_cpu;
 
+	if (hv_vp_assist_page && hv_vp_assist_page[cpu]) {
+		wrmsrl(HV_X64_MSR_VP_ASSIST_PAGE, 0);
+		vfree(hv_vp_assist_page[cpu]);
+		hv_vp_assist_page[cpu] = NULL;
+	}
+
 	if (hv_reenlightenment_cb == NULL)
 		return 0;
 
@@ -241,6 +267,13 @@ void hyperv_init(void)
 	if (!hv_vp_index)
 		return;
 
+	hv_vp_assist_page = kcalloc(num_possible_cpus(),
+				    sizeof(*hv_vp_assist_page), GFP_KERNEL);
+	if (!hv_vp_assist_page) {
+		ms_hyperv.hints &= ~HV_X64_ENLIGHTENED_VMCS_RECOMMENDED;
+		return;
+	}
+
 	if (cpuhp_setup_state(CPUHP_AP_ONLINE_DYN, "x86/hyperv_init:online",
 			      hv_cpu_init, hv_cpu_die) < 0)
 		goto free_vp_index;
diff --git a/arch/x86/include/asm/hyperv-tlfs.h b/arch/x86/include/asm/hyperv-tlfs.h
index 5c0d8fab87a3..3d0fba18ab4c 100644
--- a/arch/x86/include/asm/hyperv-tlfs.h
+++ b/arch/x86/include/asm/hyperv-tlfs.h
@@ -156,6 +156,9 @@
 /* Recommend using the newer ExProcessorMasks interface */
 #define HV_X64_EX_PROCESSOR_MASKS_RECOMMENDED	(1 << 11)
 
+/* Recommend using enlightened VMCS */
+#define HV_X64_ENLIGHTENED_VMCS_RECOMMENDED	(1 << 14)
+
 /*
  * Crash notification flag.
  */
@@ -469,6 +472,16 @@ struct hv_timer_message_payload {
 	__u64 delivery_time;	/* When the message was delivered */
 };
 
+/* Define virtual processor assist page structure. */
+struct hv_vp_assist_page {
+	__u32 apic_assist;
+	__u32 reserved;
+	__u64 vtl_control[2];
+	__u64 nested_enlightenments_control[2];
+	__u32 enlighten_vmentry;
+	__u64 current_nested_vmcs;
+};
+
 #define HV_STIMER_ENABLE		(1ULL << 0)
 #define HV_STIMER_PERIODIC		(1ULL << 1)
 #define HV_STIMER_LAZY			(1ULL << 2)
diff --git a/arch/x86/include/asm/mshyperv.h b/arch/x86/include/asm/mshyperv.h
index 38cfbe9a5794..dcb40567b35e 100644
--- a/arch/x86/include/asm/mshyperv.h
+++ b/arch/x86/include/asm/mshyperv.h
@@ -218,6 +218,12 @@ static inline u64 hv_do_rep_hypercall(u16 code, u16 rep_count, u16 varhead_size,
  */
 extern u32 *hv_vp_index;
 extern u32 hv_max_vp_index;
+extern struct hv_vp_assist_page **hv_vp_assist_page;
+
+static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
+{
+	return hv_vp_assist_page[cpu];
+}
 
 /**
  * hv_cpu_number_to_vp_number() - Map CPU to VP.
@@ -254,6 +260,10 @@ static inline void hyperv_setup_mmu_ops(void) {}
 static inline void set_hv_tscchange_cb(void (*cb)(void)) {}
 static inline void clear_hv_tscchange_cb(void) {}
 static inline void hyperv_stop_tsc_emulation(void) {};
+static inline struct hv_vp_assist_page *hv_get_vp_assist_page(unsigned int cpu)
+{
+	return NULL;
+}
 #endif /* CONFIG_HYPERV */
 
 #ifdef CONFIG_HYPERV_TSCPAGE
Virtual Processor Assist Pages usage allows us to do optimized EOI
processing for APIC, enable Enlightened VMCS support in KVM and more.
struct hv_vp_assist_page is defined according to the Hyper-V TLFS v5.0b.

Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com>
---
Changes since v1: move HV_X64_ENLIGHTENED_VMCS_RECOMMENDED definition to
this patch
---
 arch/x86/hyperv/hv_init.c          | 33 +++++++++++++++++++++++++++++++++
 arch/x86/include/asm/hyperv-tlfs.h | 13 +++++++++++++
 arch/x86/include/asm/mshyperv.h    | 10 ++++++++++
 3 files changed, 56 insertions(+)