Message ID | 20190611212003.26382-1-dima@arista.com (mailing list archive) |
---|---|
State | New, archived |
Series | x86/hyperv: Disable preemption while setting reenlightenment vector |
On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
>
> KVM: vmx: using Hyper-V Enlightened VMCS
> BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466
> caller is debug_smp_processor_id+0x17/0x19
> CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
> Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
> Call Trace:
>  dump_stack+0x61/0x7e
>  check_preemption_disabled+0xd4/0xe6
>  debug_smp_processor_id+0x17/0x19
>  set_hv_tscchange_cb+0x1b/0x89
>  kvm_arch_init+0x14a/0x163 [kvm]
>  kvm_init+0x30/0x259 [kvm]
>  vmx_init+0xed/0x3db [kvm_intel]
>  do_one_initcall+0x89/0x1bc
>  do_init_module+0x5f/0x207
>  load_module+0x1b34/0x209b
>  __ia32_sys_init_module+0x17/0x19
>  do_fast_syscall_32+0x121/0x1fa
>  entry_SYSENTER_compat+0x7f/0x91
>
> The easiest solution seems to be disabling preemption while setting up
> reenlightenment MSRs. While at it, fix hv_cpu_*() callbacks.
>
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications support")
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>         u64 msr_vp_index;
> -       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>         void **input_arg;
>         struct page *pg;
>
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>
>         hv_get_vp_index(msr_vp_index);
>
> -       hv_vp_index[smp_processor_id()] = msr_vp_index;
> +       hv_vp_index[cpu] = msr_vp_index;
>
>         if (msr_vp_index > hv_max_vp_index)
>                 hv_max_vp_index = msr_vp_index;
> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>         struct hv_reenlightenment_control re_ctrl = {
>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
>                 .enabled = 1,
> -               .target_vp = hv_vp_index[smp_processor_id()]
>         };
>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>         /* Make sure callback is registered before we write to MSRs */
>         wmb();
>
> +       preempt_disable();
> +       re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>         wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +       preempt_enable();
> +
>         wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);

This looks bogus: MSRs are a per-CPU resource; you had better know which
CPU you're on and be stuck to it when you do wrmsr. This just fudges the
code to make the warning go away and doesn't fix the actual problem,
afaict.
Dmitry Safonov <dima@arista.com> writes:

> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
>
> KVM: vmx: using Hyper-V Enlightened VMCS
> BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466
> caller is debug_smp_processor_id+0x17/0x19
> CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
> Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
> Call Trace:
>  dump_stack+0x61/0x7e
>  check_preemption_disabled+0xd4/0xe6
>  debug_smp_processor_id+0x17/0x19
>  set_hv_tscchange_cb+0x1b/0x89
>  kvm_arch_init+0x14a/0x163 [kvm]
>  kvm_init+0x30/0x259 [kvm]
>  vmx_init+0xed/0x3db [kvm_intel]
>  do_one_initcall+0x89/0x1bc
>  do_init_module+0x5f/0x207
>  load_module+0x1b34/0x209b
>  __ia32_sys_init_module+0x17/0x19
>  do_fast_syscall_32+0x121/0x1fa
>  entry_SYSENTER_compat+0x7f/0x91

Hm, I never noticed this one; you probably need something like
CONFIG_PREEMPT enabled to see it.

>
> The easiest solution seems to be disabling preemption while setting up
> reenlightenment MSRs. While at it, fix hv_cpu_*() callbacks.
>
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications support")
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>         u64 msr_vp_index;
> -       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>         void **input_arg;
>         struct page *pg;
>
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>
>         hv_get_vp_index(msr_vp_index);
>
> -       hv_vp_index[smp_processor_id()] = msr_vp_index;
> +       hv_vp_index[cpu] = msr_vp_index;
>
>         if (msr_vp_index > hv_max_vp_index)
>                 hv_max_vp_index = msr_vp_index;

The above is unrelated cleanup (as cpu == smp_processor_id() for
CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can't be
preempted.

> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>         struct hv_reenlightenment_control re_ctrl = {
>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
>                 .enabled = 1,
> -               .target_vp = hv_vp_index[smp_processor_id()]
>         };
>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>         /* Make sure callback is registered before we write to MSRs */
>         wmb();
>
> +       preempt_disable();
> +       re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>         wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +       preempt_enable();
> +

My personal preference would be to do something like

        int cpu = get_cpu();

        ... set things up ...

        put_cpu();

instead; there are no long-running things in the whole function. But
what you've done should work too, so

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

>         wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
>> KVM support may be compiled as dynamic module, which triggers the
>> following splat on modprobe:
>>
>> KVM: vmx: using Hyper-V Enlightened VMCS
>> BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466
>> caller is debug_smp_processor_id+0x17/0x19
>> CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>> Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
>> Call Trace:
>>  dump_stack+0x61/0x7e
>>  check_preemption_disabled+0xd4/0xe6
>>  debug_smp_processor_id+0x17/0x19
>>  set_hv_tscchange_cb+0x1b/0x89
>>  kvm_arch_init+0x14a/0x163 [kvm]
>>  kvm_init+0x30/0x259 [kvm]
>>  vmx_init+0xed/0x3db [kvm_intel]
>>  do_one_initcall+0x89/0x1bc
>>  do_init_module+0x5f/0x207
>>  load_module+0x1b34/0x209b
>>  __ia32_sys_init_module+0x17/0x19
>>  do_fast_syscall_32+0x121/0x1fa
>>  entry_SYSENTER_compat+0x7f/0x91
>>
>> The easiest solution seems to be disabling preemption while setting up
>> reenlightenment MSRs. While at it, fix hv_cpu_*() callbacks.
>>
>> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications support")
>>
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Cathy Avery <cavery@redhat.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
>> Cc: Mohammed Gamal <mmorsy@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Roman Kagan <rkagan@virtuozzo.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>>
>> Cc: devel@linuxdriverproject.org
>> Cc: kvm@vger.kernel.org
>> Cc: linux-hyperv@vger.kernel.org
>> Cc: x86@kernel.org
>> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>> ---
>>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>>
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 1608050e9df9..0bdd79ecbff8 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>>  static int hv_cpu_init(unsigned int cpu)
>>  {
>>         u64 msr_vp_index;
>> -       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> +       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>>         void **input_arg;
>>         struct page *pg;
>>
>> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>>
>>         hv_get_vp_index(msr_vp_index);
>>
>> -       hv_vp_index[smp_processor_id()] = msr_vp_index;
>> +       hv_vp_index[cpu] = msr_vp_index;
>>
>>         if (msr_vp_index > hv_max_vp_index)
>>                 hv_max_vp_index = msr_vp_index;
>> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>         struct hv_reenlightenment_control re_ctrl = {
>>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>                 .enabled = 1,
>> -               .target_vp = hv_vp_index[smp_processor_id()]
>>         };
>>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>
>> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>         /* Make sure callback is registered before we write to MSRs */
>>         wmb();
>>
>> +       preempt_disable();
>> +       re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>>         wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
>> +       preempt_enable();
>> +
>>         wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>>  }
>>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
>
> This looks bogus, MSRs are a per-cpu resource, you had better know what
> CPU you're on and be stuck to it when you do wrmsr. This just fudges
> the code to make the warning go away and doesn't fix the actual problem
> afaict.

Actually, we don't care which CPU will receive the reenlightenment
notification, and TSC Emulation in Hyper-V is, of course, global. We
have code which re-assigns the notification to some other CPU in case
the one it's currently assigned to goes away (see hv_cpu_die()).
On Wed, 12 Jun 2019, Vitaly Kuznetsov wrote:
> Dmitry Safonov <dima@arista.com> writes:
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 1608050e9df9..0bdd79ecbff8 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >         u64 msr_vp_index;
> > -       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> > +       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> >         void **input_arg;
> >         struct page *pg;
> >
> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
> >
> >         hv_get_vp_index(msr_vp_index);
> >
> > -       hv_vp_index[smp_processor_id()] = msr_vp_index;
> > +       hv_vp_index[cpu] = msr_vp_index;
> >
> >         if (msr_vp_index > hv_max_vp_index)
> >                 hv_max_vp_index = msr_vp_index;
>
> The above is unrelated cleanup (as cpu == smp_processor_id() for
> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can't be
> preempted.

They can be preempted, but they are guaranteed to run on the upcoming CPU,
i.e. smp_processor_id() is allowed even in preemptible context as the task
cannot migrate.

Thanks,

        tglx
Thomas Gleixner <tglx@linutronix.de> writes:

> On Wed, 12 Jun 2019, Vitaly Kuznetsov wrote:
>> Dmitry Safonov <dima@arista.com> writes:
>> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> > index 1608050e9df9..0bdd79ecbff8 100644
>> > --- a/arch/x86/hyperv/hv_init.c
>> > +++ b/arch/x86/hyperv/hv_init.c
>> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>> >  static int hv_cpu_init(unsigned int cpu)
>> >  {
>> >         u64 msr_vp_index;
>> > -       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> > +       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>> >         void **input_arg;
>> >         struct page *pg;
>> >
>> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>> >
>> >         hv_get_vp_index(msr_vp_index);
>> >
>> > -       hv_vp_index[smp_processor_id()] = msr_vp_index;
>> > +       hv_vp_index[cpu] = msr_vp_index;
>> >
>> >         if (msr_vp_index > hv_max_vp_index)
>> >                 hv_max_vp_index = msr_vp_index;
>>
>> The above is unrelated cleanup (as cpu == smp_processor_id() for
>> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can't be
>> preempted.
>
> They can be preempted, but they are guaranteed to run on the upcoming CPU,
> i.e. smp_processor_id() is allowed even in preemptible context as the task
> cannot migrate.
>

Ah, right, thanks! The guarantee that they don't migrate should be enough.
On Wed, Jun 12, 2019 at 12:17:24PM +0200, Vitaly Kuznetsov wrote:
> Dmitry Safonov <dima@arista.com> writes:
>
> > KVM support may be compiled as dynamic module, which triggers the
> > following splat on modprobe:
> >
> > KVM: vmx: using Hyper-V Enlightened VMCS
> > BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466
> > caller is debug_smp_processor_id+0x17/0x19
> > CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
> > Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
> > Call Trace:
> >  dump_stack+0x61/0x7e
> >  check_preemption_disabled+0xd4/0xe6
> >  debug_smp_processor_id+0x17/0x19
> >  set_hv_tscchange_cb+0x1b/0x89
> >  kvm_arch_init+0x14a/0x163 [kvm]
> >  kvm_init+0x30/0x259 [kvm]
> >  vmx_init+0xed/0x3db [kvm_intel]
> >  do_one_initcall+0x89/0x1bc
> >  do_init_module+0x5f/0x207
> >  load_module+0x1b34/0x209b
> >  __ia32_sys_init_module+0x17/0x19
> >  do_fast_syscall_32+0x121/0x1fa
> >  entry_SYSENTER_compat+0x7f/0x91
>
> Hm, I never noticed this one; you probably need something like
> CONFIG_PREEMPT enabled to see it.

CONFIG_DEBUG_PREEMPT

> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >         u64 msr_vp_index;
> > -       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> > +       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> >         void **input_arg;
> >         struct page *pg;
> >
> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
> >
> >         hv_get_vp_index(msr_vp_index);
> >
> > -       hv_vp_index[smp_processor_id()] = msr_vp_index;
> > +       hv_vp_index[cpu] = msr_vp_index;
> >
> >         if (msr_vp_index > hv_max_vp_index)
> >                 hv_max_vp_index = msr_vp_index;
>
> The above is unrelated cleanup (as cpu == smp_processor_id() for
> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can't be
> preempted.

Yeah, makes sense though.

> > @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >         struct hv_reenlightenment_control re_ctrl = {
> >                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
> >                 .enabled = 1,
> > -               .target_vp = hv_vp_index[smp_processor_id()]
> >         };
> >         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
> >
> > @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >         /* Make sure callback is registered before we write to MSRs */
> >         wmb();
> >
> > +       preempt_disable();
> > +       re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
> >         wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> > +       preempt_enable();
> > +
>
> My personal preference would be to do something like
>
>         int cpu = get_cpu();
>
>         ... set things up ...
>
>         put_cpu();

If it doesn't matter, how about this then?

---
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..e58c693a9fce 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
        u64 msr_vp_index;
-       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
        void **input_arg;
        struct page *pg;
 
@@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
 
        hv_get_vp_index(msr_vp_index);
 
-       hv_vp_index[smp_processor_id()] = msr_vp_index;
+       hv_vp_index[cpu] = msr_vp_index;
 
        if (msr_vp_index > hv_max_vp_index)
                hv_max_vp_index = msr_vp_index;
 
@@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
        struct hv_reenlightenment_control re_ctrl = {
                .vector = HYPERV_REENLIGHTENMENT_VECTOR,
                .enabled = 1,
-               .target_vp = hv_vp_index[smp_processor_id()]
+               .target_vp = hv_vp_index[raw_smp_processor_id()]
        };
        struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
Peter Zijlstra <peterz@infradead.org> writes:

> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>         struct hv_reenlightenment_control re_ctrl = {
>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
>                 .enabled = 1,
> -               .target_vp = hv_vp_index[smp_processor_id()]
> +               .target_vp = hv_vp_index[raw_smp_processor_id()]
>         };
>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>

Yes, this should do, thanks! I'd also suggest to leave a comment like:

        /*
         * This function can get preempted and migrate to a different CPU
         * but this doesn't matter. We just need to assign
         * reenlightenment notification to some online CPU. In case this
         * CPU goes offline, hv_cpu_die() will re-assign it to some
         * other online CPU.
         */
On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
>
>> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>         struct hv_reenlightenment_control re_ctrl = {
>>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>                 .enabled = 1,
>> -               .target_vp = hv_vp_index[smp_processor_id()]
>> +               .target_vp = hv_vp_index[raw_smp_processor_id()]
>>         };
>>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>
>
> Yes, this should do, thanks! I'd also suggest to leave a comment like
> /*
>  * This function can get preempted and migrate to a different CPU
>  * but this doesn't matter. We just need to assign
>  * reenlightenment notification to some online CPU. In case this
>  * CPU goes offline, hv_cpu_die() will re-assign it to some
>  * other online CPU.
>  */

What if the cpu goes down just before wrmsrl()?
I mean, hv_cpu_die() will reassign another cpu, but this thread will be
resumed on some other cpu and will write the number of a cpu which is at
that moment already down?

(probably I'm missing something)

And I presume it's guaranteed that during hv_cpu_die() no other cpu may
go down:
: new_cpu = cpumask_any_but(cpu_online_mask, cpu);
: re_ctrl.target_vp = hv_vp_index[new_cpu];
: wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
On Fri, Jun 14, 2019 at 12:50:51PM +0100, Dmitry Safonov wrote:
> On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
> > Peter Zijlstra <peterz@infradead.org> writes:
> >
> >> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >>         struct hv_reenlightenment_control re_ctrl = {
> >>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
> >>                 .enabled = 1,
> >> -               .target_vp = hv_vp_index[smp_processor_id()]
> >> +               .target_vp = hv_vp_index[raw_smp_processor_id()]
> >>         };
> >>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
> >>
> >
> > Yes, this should do, thanks! I'd also suggest to leave a comment like
> > /*
> >  * This function can get preempted and migrate to a different CPU
> >  * but this doesn't matter. We just need to assign
> >  * reenlightenment notification to some online CPU. In case this
> >  * CPU goes offline, hv_cpu_die() will re-assign it to some
> >  * other online CPU.
> >  */
>
> What if the cpu goes down just before wrmsrl()?
> I mean, hv_cpu_die() will reassign another cpu, but this thread will be
> resumed on some other cpu and will write the number of a cpu which is at
> that moment already down?
>
> (probably I'm missing something)
>
> And I presume it's guaranteed that during hv_cpu_die() no other cpu may
> go down:
> : new_cpu = cpumask_any_but(cpu_online_mask, cpu);
> : re_ctrl.target_vp = hv_vp_index[new_cpu];
> : wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));

Then cpus_read_lock() is the right interface, not preempt_disable().

I know you probably can't change the HV interface, but I'm thinking it's
rather daft you have to specify a CPU at all for this. The HV can just
pick one and send the notification there, who cares.
On 6/14/19 1:27 PM, Peter Zijlstra wrote:
> On Fri, Jun 14, 2019 at 12:50:51PM +0100, Dmitry Safonov wrote:
>> On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
>>> Peter Zijlstra <peterz@infradead.org> writes:
>>>
>>>> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>>>         struct hv_reenlightenment_control re_ctrl = {
>>>>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>>>                 .enabled = 1,
>>>> -               .target_vp = hv_vp_index[smp_processor_id()]
>>>> +               .target_vp = hv_vp_index[raw_smp_processor_id()]
>>>>         };
>>>>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>>>
>>>
>>> Yes, this should do, thanks! I'd also suggest to leave a comment like
>>> /*
>>>  * This function can get preempted and migrate to a different CPU
>>>  * but this doesn't matter. We just need to assign
>>>  * reenlightenment notification to some online CPU. In case this
>>>  * CPU goes offline, hv_cpu_die() will re-assign it to some
>>>  * other online CPU.
>>>  */
>>
>> What if the cpu goes down just before wrmsrl()?
>> I mean, hv_cpu_die() will reassign another cpu, but this thread will be
>> resumed on some other cpu and will write the number of a cpu which is at
>> that moment already down?
>>
>> (probably I'm missing something)
>>
>> And I presume it's guaranteed that during hv_cpu_die() no other cpu may
>> go down:
>> : new_cpu = cpumask_any_but(cpu_online_mask, cpu);
>> : re_ctrl.target_vp = hv_vp_index[new_cpu];
>> : wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
>
> Then cpus_read_lock() is the right interface, not preempt_disable().
>
> I know you probably can't change the HV interface, but I'm thinking it's
> rather daft you have to specify a CPU at all for this. The HV can just
> pick one and send the notification there, who cares.

Heh, I thought cpus_read_lock() is more "internal" API and
preempt_disable() is preferred ;-)

Will send v2 with the suggested comment and cpus_read_lock().
Dmitry Safonov <dima@arista.com> writes:

> On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>>
>>> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>>         struct hv_reenlightenment_control re_ctrl = {
>>>                 .vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>>                 .enabled = 1,
>>> -               .target_vp = hv_vp_index[smp_processor_id()]
>>> +               .target_vp = hv_vp_index[raw_smp_processor_id()]
>>>         };
>>>         struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>>
>>
>> Yes, this should do, thanks! I'd also suggest to leave a comment like
>> /*
>>  * This function can get preempted and migrate to a different CPU
>>  * but this doesn't matter. We just need to assign
>>  * reenlightenment notification to some online CPU. In case this
>>  * CPU goes offline, hv_cpu_die() will re-assign it to some
>>  * other online CPU.
>>  */
>
> What if the cpu goes down just before wrmsrl()?
> I mean, hv_cpu_die() will reassign another cpu, but this thread will be
> resumed on some other cpu and will write the number of a cpu which is at
> that moment already down?
>

Right you are, we need to guarantee wrmsr() happens before the CPU gets
a chance to go offline: we don't save the cpu number anywhere, we just
read it with rdmsr() in hv_cpu_die().

>
> And I presume it's guaranteed that during hv_cpu_die() no other cpu may
> go down:
> : new_cpu = cpumask_any_but(cpu_online_mask, cpu);
> : re_ctrl.target_vp = hv_vp_index[new_cpu];
> : wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));

I *think* I got convinced that CPUs don't go offline simultaneously when
I was writing this.
Peter Zijlstra <peterz@infradead.org> writes:

>
> I know you probably can't change the HV interface, but I'm thinking it's
> rather daft you have to specify a CPU at all for this. The HV can just
> pick one and send the notification there, who cares.

Generally speaking, the hypervisor can't know if a CPU is offline (or,
e.g., 'isolated') from the guest's perspective, so I think having an
option to specify affinity for the reenlightenment notification is
rather a good thing, not bad.

(Actually, I don't remember if I tried specifying 'HV_ANY' (U32_MAX-1)
here to see what happens. But then I doubt it'll notice the fact that
we offlined some CPU, so we may get a totally unexpected IRQ there.)
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..0bdd79ecbff8 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
        u64 msr_vp_index;
-       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+       struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
        void **input_arg;
        struct page *pg;
 
@@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
 
        hv_get_vp_index(msr_vp_index);
 
-       hv_vp_index[smp_processor_id()] = msr_vp_index;
+       hv_vp_index[cpu] = msr_vp_index;
 
        if (msr_vp_index > hv_max_vp_index)
                hv_max_vp_index = msr_vp_index;
 
@@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
        struct hv_reenlightenment_control re_ctrl = {
                .vector = HYPERV_REENLIGHTENMENT_VECTOR,
                .enabled = 1,
-               .target_vp = hv_vp_index[smp_processor_id()]
        };
        struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
 
@@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
        /* Make sure callback is registered before we write to MSRs */
        wmb();
 
+       preempt_disable();
+       re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
        wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+       preempt_enable();
+
        wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
 }
 EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
KVM support may be compiled as dynamic module, which triggers the
following splat on modprobe:

KVM: vmx: using Hyper-V Enlightened VMCS
BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466
caller is debug_smp_processor_id+0x17/0x19
CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007 06/02/2017
Call Trace:
 dump_stack+0x61/0x7e
 check_preemption_disabled+0xd4/0xe6
 debug_smp_processor_id+0x17/0x19
 set_hv_tscchange_cb+0x1b/0x89
 kvm_arch_init+0x14a/0x163 [kvm]
 kvm_init+0x30/0x259 [kvm]
 vmx_init+0xed/0x3db [kvm_intel]
 do_one_initcall+0x89/0x1bc
 do_init_module+0x5f/0x207
 load_module+0x1b34/0x209b
 __ia32_sys_init_module+0x17/0x19
 do_fast_syscall_32+0x121/0x1fa
 entry_SYSENTER_compat+0x7f/0x91

The easiest solution seems to be disabling preemption while setting up
reenlightenment MSRs. While at it, fix hv_cpu_*() callbacks.

Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications support")

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>

Cc: devel@linuxdriverproject.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/hyperv/hv_init.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)