diff mbox series

x86/hyperv: Disable preemption while setting reenlightenment vector

Message ID 20190611212003.26382-1-dima@arista.com (mailing list archive)
State New, archived
Headers show
Series x86/hyperv: Disable preemption while setting reenlightenment vector | expand

Commit Message

Dmitry Safonov June 11, 2019, 9:20 p.m. UTC
KVM support may be compiled as dynamic module, which triggers the
following splat on modprobe:

 KVM: vmx: using Hyper-V Enlightened VMCS
 BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
 CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
 Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
 Call Trace:
  dump_stack+0x61/0x7e
  check_preemption_disabled+0xd4/0xe6
  debug_smp_processor_id+0x17/0x19
  set_hv_tscchange_cb+0x1b/0x89
  kvm_arch_init+0x14a/0x163 [kvm]
  kvm_init+0x30/0x259 [kvm]
  vmx_init+0xed/0x3db [kvm_intel]
  do_one_initcall+0x89/0x1bc
  do_init_module+0x5f/0x207
  load_module+0x1b34/0x209b
  __ia32_sys_init_module+0x17/0x19
  do_fast_syscall_32+0x121/0x1fa
  entry_SYSENTER_compat+0x7f/0x91

The easiest solution seems to be disabling preemption while setting up
reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.

Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
support")

Cc: Andy Lutomirski <luto@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Cathy Avery <cavery@redhat.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "K. Y. Srinivasan" <kys@microsoft.com>
Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
Cc: Mohammed Gamal <mmorsy@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Radim Krčmář <rkrcmar@redhat.com>
Cc: Roman Kagan <rkagan@virtuozzo.com>
Cc: Sasha Levin <sashal@kernel.org>
Cc: Stephen Hemminger <sthemmin@microsoft.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vitaly Kuznetsov <vkuznets@redhat.com>

Cc: devel@linuxdriverproject.org
Cc: kvm@vger.kernel.org
Cc: linux-hyperv@vger.kernel.org
Cc: x86@kernel.org
Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
---
 arch/x86/hyperv/hv_init.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

Comments

Peter Zijlstra June 12, 2019, 9:35 a.m. UTC | #1
On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
> 
>  KVM: vmx: using Hyper-V Enlightened VMCS
>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>  Call Trace:
>   dump_stack+0x61/0x7e
>   check_preemption_disabled+0xd4/0xe6
>   debug_smp_processor_id+0x17/0x19
>   set_hv_tscchange_cb+0x1b/0x89
>   kvm_arch_init+0x14a/0x163 [kvm]
>   kvm_init+0x30/0x259 [kvm]
>   vmx_init+0xed/0x3db [kvm_intel]
>   do_one_initcall+0x89/0x1bc
>   do_init_module+0x5f/0x207
>   load_module+0x1b34/0x209b
>   __ia32_sys_init_module+0x17/0x19
>   do_fast_syscall_32+0x121/0x1fa
>   entry_SYSENTER_compat+0x7f/0x91
> 
> The easiest solution seems to be disabling preemption while setting up
> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
> 
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
> support")
> 
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
> 
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	void **input_arg;
>  	struct page *pg;
>  
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> +	hv_vp_index[cpu] = msr_vp_index;
>  
>  	if (msr_vp_index > hv_max_vp_index)
>  		hv_max_vp_index = msr_vp_index;
> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	/* Make sure callback is registered before we write to MSRs */
>  	wmb();
>  
> +	preempt_disable();
> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +	preempt_enable();
> +
>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);

This looks bogus, MSRs are a per-cpu resource, you had better know what
CPUs you're on and be stuck to it when you do wrmsr. This just fudges
the code to make the warning go away and doesn't fix the actual problem
afaict.
Vitaly Kuznetsov June 12, 2019, 10:17 a.m. UTC | #2
Dmitry Safonov <dima@arista.com> writes:

> KVM support may be compiled as dynamic module, which triggers the
> following splat on modprobe:
>
>  KVM: vmx: using Hyper-V Enlightened VMCS
>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>  Call Trace:
>   dump_stack+0x61/0x7e
>   check_preemption_disabled+0xd4/0xe6
>   debug_smp_processor_id+0x17/0x19
>   set_hv_tscchange_cb+0x1b/0x89
>   kvm_arch_init+0x14a/0x163 [kvm]
>   kvm_init+0x30/0x259 [kvm]
>   vmx_init+0xed/0x3db [kvm_intel]
>   do_one_initcall+0x89/0x1bc
>   do_init_module+0x5f/0x207
>   load_module+0x1b34/0x209b
>   __ia32_sys_init_module+0x17/0x19
>   do_fast_syscall_32+0x121/0x1fa
>   entry_SYSENTER_compat+0x7f/0x91

Hm, I never noticed this one, you probably need something like
CONFIG_PREEMPT enabled so see it.

>
> The easiest solution seems to be disabling preemption while setting up
> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
>
> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
> support")
>
> Cc: Andy Lutomirski <luto@kernel.org>
> Cc: Borislav Petkov <bp@alien8.de>
> Cc: Cathy Avery <cavery@redhat.com>
> Cc: Haiyang Zhang <haiyangz@microsoft.com>
> Cc: "H. Peter Anvin" <hpa@zytor.com>
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
> Cc: Mohammed Gamal <mmorsy@redhat.com>
> Cc: Paolo Bonzini <pbonzini@redhat.com>
> Cc: Radim Krčmář <rkrcmar@redhat.com>
> Cc: Roman Kagan <rkagan@virtuozzo.com>
> Cc: Sasha Levin <sashal@kernel.org>
> Cc: Stephen Hemminger <sthemmin@microsoft.com>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>
> Cc: devel@linuxdriverproject.org
> Cc: kvm@vger.kernel.org
> Cc: linux-hyperv@vger.kernel.org
> Cc: x86@kernel.org
> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
> Signed-off-by: Dmitry Safonov <dima@arista.com>
> ---
>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>  1 file changed, 6 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> index 1608050e9df9..0bdd79ecbff8 100644
> --- a/arch/x86/hyperv/hv_init.c
> +++ b/arch/x86/hyperv/hv_init.c
> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>  static int hv_cpu_init(unsigned int cpu)
>  {
>  	u64 msr_vp_index;
> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>  	void **input_arg;
>  	struct page *pg;
>  
> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>  
>  	hv_get_vp_index(msr_vp_index);
>  
> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> +	hv_vp_index[cpu] = msr_vp_index;
>  
>  	if (msr_vp_index > hv_max_vp_index)
>  		hv_max_vp_index = msr_vp_index;

The above is unrelated cleanup (as cpu == smp_processor_id() for
CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
preempted.

> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  
> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	/* Make sure callback is registered before we write to MSRs */
>  	wmb();
>  
> +	preempt_disable();
> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> +	preempt_enable();
> +

My personal preference would be to do something like
   int cpu = get_cpu();

   ... set things up ...

   put_cpu();

instead, there are no long-running things in the whole function. But
what you've done should work too, so

Reviewed-by: Vitaly Kuznetsov <vkuznets@redhat.com>

>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>  }
>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
Vitaly Kuznetsov June 12, 2019, 10:25 a.m. UTC | #3
Peter Zijlstra <peterz@infradead.org> writes:

> On Tue, Jun 11, 2019 at 10:20:03PM +0100, Dmitry Safonov wrote:
>> KVM support may be compiled as dynamic module, which triggers the
>> following splat on modprobe:
>> 
>>  KVM: vmx: using Hyper-V Enlightened VMCS
>>  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
>>  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
>>  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
>>  Call Trace:
>>   dump_stack+0x61/0x7e
>>   check_preemption_disabled+0xd4/0xe6
>>   debug_smp_processor_id+0x17/0x19
>>   set_hv_tscchange_cb+0x1b/0x89
>>   kvm_arch_init+0x14a/0x163 [kvm]
>>   kvm_init+0x30/0x259 [kvm]
>>   vmx_init+0xed/0x3db [kvm_intel]
>>   do_one_initcall+0x89/0x1bc
>>   do_init_module+0x5f/0x207
>>   load_module+0x1b34/0x209b
>>   __ia32_sys_init_module+0x17/0x19
>>   do_fast_syscall_32+0x121/0x1fa
>>   entry_SYSENTER_compat+0x7f/0x91
>> 
>> The easiest solution seems to be disabling preemption while setting up
>> reenlightment MSRs. While at it, fix hv_cpu_*() callbacks.
>> 
>> Fixes: 93286261de1b4 ("x86/hyperv: Reenlightenment notifications
>> support")
>> 
>> Cc: Andy Lutomirski <luto@kernel.org>
>> Cc: Borislav Petkov <bp@alien8.de>
>> Cc: Cathy Avery <cavery@redhat.com>
>> Cc: Haiyang Zhang <haiyangz@microsoft.com>
>> Cc: "H. Peter Anvin" <hpa@zytor.com>
>> Cc: Ingo Molnar <mingo@redhat.com>
>> Cc: "K. Y. Srinivasan" <kys@microsoft.com>
>> Cc: "Michael Kelley (EOSG)" <Michael.H.Kelley@microsoft.com>
>> Cc: Mohammed Gamal <mmorsy@redhat.com>
>> Cc: Paolo Bonzini <pbonzini@redhat.com>
>> Cc: Radim Krčmář <rkrcmar@redhat.com>
>> Cc: Roman Kagan <rkagan@virtuozzo.com>
>> Cc: Sasha Levin <sashal@kernel.org>
>> Cc: Stephen Hemminger <sthemmin@microsoft.com>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Vitaly Kuznetsov <vkuznets@redhat.com>
>> 
>> Cc: devel@linuxdriverproject.org
>> Cc: kvm@vger.kernel.org
>> Cc: linux-hyperv@vger.kernel.org
>> Cc: x86@kernel.org
>> Reported-by: Prasanna Panchamukhi <panchamukhi@arista.com>
>> Signed-off-by: Dmitry Safonov <dima@arista.com>
>> ---
>>  arch/x86/hyperv/hv_init.c | 9 ++++++---
>>  1 file changed, 6 insertions(+), 3 deletions(-)
>> 
>> diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> index 1608050e9df9..0bdd79ecbff8 100644
>> --- a/arch/x86/hyperv/hv_init.c
>> +++ b/arch/x86/hyperv/hv_init.c
>> @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>>  static int hv_cpu_init(unsigned int cpu)
>>  {
>>  	u64 msr_vp_index;
>> -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>>  	void **input_arg;
>>  	struct page *pg;
>>  
>> @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>>  
>>  	hv_get_vp_index(msr_vp_index);
>>  
>> -	hv_vp_index[smp_processor_id()] = msr_vp_index;
>> +	hv_vp_index[cpu] = msr_vp_index;
>>  
>>  	if (msr_vp_index > hv_max_vp_index)
>>  		hv_max_vp_index = msr_vp_index;
>> @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>  	struct hv_reenlightenment_control re_ctrl = {
>>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>  		.enabled = 1,
>> -		.target_vp = hv_vp_index[smp_processor_id()]
>>  	};
>>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>  
>> @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>  	/* Make sure callback is registered before we write to MSRs */
>>  	wmb();
>>  
>> +	preempt_disable();
>> +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
>>  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
>> +	preempt_enable();
>> +
>>  	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
>>  }
>>  EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);
>
> This looks bogus, MSRs are a per-cpu resource, you had better know what
> CPUs you're on and be stuck to it when you do wrmsr. This just fudges
> the code to make the warning go away and doesn't fix the actual problem
> afaict.

Actually, we don't care which CPU will receive the reenlightenment
notification and TSC Emulation in Hyper-V is, of course, global. We have
code which re-assignes the notification to some other CPU in case the
one it's currently assigned to goes away (see hv_cpu_die()).
Thomas Gleixner June 13, 2019, 7 p.m. UTC | #4
On Wed, 12 Jun 2019, Vitaly Kuznetsov wrote:
> Dmitry Safonov <dima@arista.com> writes:
> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
> > index 1608050e9df9..0bdd79ecbff8 100644
> > --- a/arch/x86/hyperv/hv_init.c
> > +++ b/arch/x86/hyperv/hv_init.c
> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >  	u64 msr_vp_index;
> > -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> > +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> >  	void **input_arg;
> >  	struct page *pg;
> >  
> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
> >  
> >  	hv_get_vp_index(msr_vp_index);
> >  
> > -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> > +	hv_vp_index[cpu] = msr_vp_index;
> >  
> >  	if (msr_vp_index > hv_max_vp_index)
> >  		hv_max_vp_index = msr_vp_index;
> 
> The above is unrelated cleanup (as cpu == smp_processor_id() for
> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
> preempted.

They can be preempted, but they are guaranteed to run on the upcoming CPU,
i.e. smp_processor_id() is allowed even in preemptible context as the task
cannot migrate.

Thanks,

	tglx
Vitaly Kuznetsov June 14, 2019, 8:06 a.m. UTC | #5
Thomas Gleixner <tglx@linutronix.de> writes:

> On Wed, 12 Jun 2019, Vitaly Kuznetsov wrote:
>> Dmitry Safonov <dima@arista.com> writes:
>> > diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
>> > index 1608050e9df9..0bdd79ecbff8 100644
>> > --- a/arch/x86/hyperv/hv_init.c
>> > +++ b/arch/x86/hyperv/hv_init.c
>> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
>> >  static int hv_cpu_init(unsigned int cpu)
>> >  {
>> >  	u64 msr_vp_index;
>> > -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
>> > +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
>> >  	void **input_arg;
>> >  	struct page *pg;
>> >  
>> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
>> >  
>> >  	hv_get_vp_index(msr_vp_index);
>> >  
>> > -	hv_vp_index[smp_processor_id()] = msr_vp_index;
>> > +	hv_vp_index[cpu] = msr_vp_index;
>> >  
>> >  	if (msr_vp_index > hv_max_vp_index)
>> >  		hv_max_vp_index = msr_vp_index;
>> 
>> The above is unrelated cleanup (as cpu == smp_processor_id() for
>> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
>> preempted.
>
> They can be preempted, but they are guaranteed to run on the upcoming CPU,
> i.e. smp_processor_id() is allowed even in preemptible context as the task
> cannot migrate.
>

Ah, right, thanks! The guarantee that they don't migrate should be enough.
Peter Zijlstra June 14, 2019, 8:28 a.m. UTC | #6
On Wed, Jun 12, 2019 at 12:17:24PM +0200, Vitaly Kuznetsov wrote:
> Dmitry Safonov <dima@arista.com> writes:
> 
> > KVM support may be compiled as dynamic module, which triggers the
> > following splat on modprobe:
> >
> >  KVM: vmx: using Hyper-V Enlightened VMCS
> >  BUG: using smp_processor_id() in preemptible [00000000] code: modprobe/466 caller is debug_smp_processor_id+0x17/0x19
> >  CPU: 0 PID: 466 Comm: modprobe Kdump: loaded Not tainted 4.19.43 #1
> >  Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090007  06/02/2017
> >  Call Trace:
> >   dump_stack+0x61/0x7e
> >   check_preemption_disabled+0xd4/0xe6
> >   debug_smp_processor_id+0x17/0x19
> >   set_hv_tscchange_cb+0x1b/0x89
> >   kvm_arch_init+0x14a/0x163 [kvm]
> >   kvm_init+0x30/0x259 [kvm]
> >   vmx_init+0xed/0x3db [kvm_intel]
> >   do_one_initcall+0x89/0x1bc
> >   do_init_module+0x5f/0x207
> >   load_module+0x1b34/0x209b
> >   __ia32_sys_init_module+0x17/0x19
> >   do_fast_syscall_32+0x121/0x1fa
> >   entry_SYSENTER_compat+0x7f/0x91
> 
> Hm, I never noticed this one, you probably need something like
> CONFIG_PREEMPT enabled so see it.

CONFIG_DEBUG_PREEMPT

> > @@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
> >  static int hv_cpu_init(unsigned int cpu)
> >  {
> >  	u64 msr_vp_index;
> > -	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
> > +	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
> >  	void **input_arg;
> >  	struct page *pg;
> >  
> > @@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
> >  
> >  	hv_get_vp_index(msr_vp_index);
> >  
> > -	hv_vp_index[smp_processor_id()] = msr_vp_index;
> > +	hv_vp_index[cpu] = msr_vp_index;
> >  
> >  	if (msr_vp_index > hv_max_vp_index)
> >  		hv_max_vp_index = msr_vp_index;
> 
> The above is unrelated cleanup (as cpu == smp_processor_id() for
> CPUHP_AP_ONLINE_DYN callbacks), right? As I'm pretty sure these can'd be
> preempted.

Yeah, makes sense though.

> > @@ -182,7 +182,6 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >  	struct hv_reenlightenment_control re_ctrl = {
> >  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
> >  		.enabled = 1,
> > -		.target_vp = hv_vp_index[smp_processor_id()]
> >  	};
> >  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
> >  
> > @@ -196,7 +195,11 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >  	/* Make sure callback is registered before we write to MSRs */
> >  	wmb();
> >  
> > +	preempt_disable();
> > +	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
> >  	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> > +	preempt_enable();
> > +
> 
> My personal preference would be to do something like
>    int cpu = get_cpu();
> 
>    ... set things up ...
> 
>    put_cpu();

If it doesn't matter, how about this then?

---
diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..e58c693a9fce 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -91,7 +91,7 @@ EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 	void **input_arg;
 	struct page *pg;
 
@@ -103,7 +103,7 @@ static int hv_cpu_init(unsigned int cpu)
 
 	hv_get_vp_index(msr_vp_index);
 
-	hv_vp_index[smp_processor_id()] = msr_vp_index;
+	hv_vp_index[cpu] = msr_vp_index;
 
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
@@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
 	struct hv_reenlightenment_control re_ctrl = {
 		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
 		.enabled = 1,
-		.target_vp = hv_vp_index[smp_processor_id()]
+		.target_vp = hv_vp_index[raw_smp_processor_id()]
 	};
 	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
Vitaly Kuznetsov June 14, 2019, 10:08 a.m. UTC | #7
Peter Zijlstra <peterz@infradead.org> writes:

> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>  	struct hv_reenlightenment_control re_ctrl = {
>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>  		.enabled = 1,
> -		.target_vp = hv_vp_index[smp_processor_id()]
> +		.target_vp = hv_vp_index[raw_smp_processor_id()]
>  	};
>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>  

Yes, this should do, thanks! I'd also suggest to leave a comment like
	/* 
         * This function can get preemted and migrate to a different CPU
	 * but this doesn't matter. We just need to assign
	 * reenlightenment notification to some online CPU. In case this
         * CPU goes offline, hv_cpu_die() will re-assign it to some
 	 * other online CPU.
	 */
Dmitry Safonov June 14, 2019, 11:50 a.m. UTC | #8
On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
>> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>  	struct hv_reenlightenment_control re_ctrl = {
>>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>  		.enabled = 1,
>> -		.target_vp = hv_vp_index[smp_processor_id()]
>> +		.target_vp = hv_vp_index[raw_smp_processor_id()]
>>  	};
>>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>  
> 
> Yes, this should do, thanks! I'd also suggest to leave a comment like
> 	/* 
>          * This function can get preemted and migrate to a different CPU
> 	 * but this doesn't matter. We just need to assign
> 	 * reenlightenment notification to some online CPU. In case this
>          * CPU goes offline, hv_cpu_die() will re-assign it to some
>  	 * other online CPU.
> 	 */

What if the cpu goes down just before wrmsrl()?
I mean, hv_cpu_die() will reassign another cpu, but this thread will be
resumed on some other cpu and will write cpu number which is at that
moment already down?

(probably I miss something)

And I presume it's guaranteed that during hv_cpu_die() no other cpu may
go down:
:	new_cpu = cpumask_any_but(cpu_online_mask, cpu);
:	re_ctrl.target_vp = hv_vp_index[new_cpu];
:	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
Peter Zijlstra June 14, 2019, 12:27 p.m. UTC | #9
On Fri, Jun 14, 2019 at 12:50:51PM +0100, Dmitry Safonov wrote:
> On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
> > Peter Zijlstra <peterz@infradead.org> writes:
> > 
> >> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
> >>  	struct hv_reenlightenment_control re_ctrl = {
> >>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
> >>  		.enabled = 1,
> >> -		.target_vp = hv_vp_index[smp_processor_id()]
> >> +		.target_vp = hv_vp_index[raw_smp_processor_id()]
> >>  	};
> >>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
> >>  
> > 
> > Yes, this should do, thanks! I'd also suggest to leave a comment like
> > 	/* 
> >          * This function can get preemted and migrate to a different CPU
> > 	 * but this doesn't matter. We just need to assign
> > 	 * reenlightenment notification to some online CPU. In case this
> >          * CPU goes offline, hv_cpu_die() will re-assign it to some
> >  	 * other online CPU.
> > 	 */
> 
> What if the cpu goes down just before wrmsrl()?
> I mean, hv_cpu_die() will reassign another cpu, but this thread will be
> resumed on some other cpu and will write cpu number which is at that
> moment already down?
> 
> (probably I miss something)
> 
> And I presume it's guaranteed that during hv_cpu_die() no other cpu may
> go down:
> :	new_cpu = cpumask_any_but(cpu_online_mask, cpu);
> :	re_ctrl.target_vp = hv_vp_index[new_cpu];
> :	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));

Then cpus_read_lock() is the right interface, not preempt_disable().

I know you probably can't change the HV interface, but I'm thinking its
rather daft you have to specify a CPU at all for this. The HV can just
pick one and send the notification there, who cares.
Dmitry Safonov June 14, 2019, 2:28 p.m. UTC | #10
On 6/14/19 1:27 PM, Peter Zijlstra wrote:
> On Fri, Jun 14, 2019 at 12:50:51PM +0100, Dmitry Safonov wrote:
>> On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
>>> Peter Zijlstra <peterz@infradead.org> writes:
>>>
>>>> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>>>  	struct hv_reenlightenment_control re_ctrl = {
>>>>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>>>  		.enabled = 1,
>>>> -		.target_vp = hv_vp_index[smp_processor_id()]
>>>> +		.target_vp = hv_vp_index[raw_smp_processor_id()]
>>>>  	};
>>>>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>>>  
>>>
>>> Yes, this should do, thanks! I'd also suggest to leave a comment like
>>> 	/* 
>>>          * This function can get preemted and migrate to a different CPU
>>> 	 * but this doesn't matter. We just need to assign
>>> 	 * reenlightenment notification to some online CPU. In case this
>>>          * CPU goes offline, hv_cpu_die() will re-assign it to some
>>>  	 * other online CPU.
>>> 	 */
>>
>> What if the cpu goes down just before wrmsrl()?
>> I mean, hv_cpu_die() will reassign another cpu, but this thread will be
>> resumed on some other cpu and will write cpu number which is at that
>> moment already down?
>>
>> (probably I miss something)
>>
>> And I presume it's guaranteed that during hv_cpu_die() no other cpu may
>> go down:
>> :	new_cpu = cpumask_any_but(cpu_online_mask, cpu);
>> :	re_ctrl.target_vp = hv_vp_index[new_cpu];
>> :	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
> 
> Then cpus_read_lock() is the right interface, not preempt_disable().
> 
> I know you probably can't change the HV interface, but I'm thinking its
> rather daft you have to specify a CPU at all for this. The HV can just
> pick one and send the notification there, who cares.

Heh, I thought cpus_read_lock() is more "internal" api and
preempt_diable() is prefered ;-)

Will send v2 with the suggested comment and cpus_read_lock().
Vitaly Kuznetsov June 14, 2019, 9:36 p.m. UTC | #11
Dmitry Safonov <dima@arista.com> writes:

> On 6/14/19 11:08 AM, Vitaly Kuznetsov wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>>> @@ -182,7 +182,7 @@ void set_hv_tscchange_cb(void (*cb)(void))
>>>  	struct hv_reenlightenment_control re_ctrl = {
>>>  		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
>>>  		.enabled = 1,
>>> -		.target_vp = hv_vp_index[smp_processor_id()]
>>> +		.target_vp = hv_vp_index[raw_smp_processor_id()]
>>>  	};
>>>  	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
>>>  
>> 
>> Yes, this should do, thanks! I'd also suggest to leave a comment like
>> 	/* 
>>          * This function can get preemted and migrate to a different CPU
>> 	 * but this doesn't matter. We just need to assign
>> 	 * reenlightenment notification to some online CPU. In case this
>>          * CPU goes offline, hv_cpu_die() will re-assign it to some
>>  	 * other online CPU.
>> 	 */
>
> What if the cpu goes down just before wrmsrl()?
> I mean, hv_cpu_die() will reassign another cpu, but this thread will be
> resumed on some other cpu and will write cpu number which is at that
> moment already down?
>

Right you are, we need to guarantee wrmsr() happens before the CPU gets
a chance to go offline: we don't save the cpu number anywhere, we just
read it with rdmsr() in hv_cpu_die().

>
> And I presume it's guaranteed that during hv_cpu_die() no other cpu may
> go down:
> :	new_cpu = cpumask_any_but(cpu_online_mask, cpu);
> :	re_ctrl.target_vp = hv_vp_index[new_cpu];
> :	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));

I *think* I got convinced that CPUs don't go offline simultaneously when
I was writing this.
Vitaly Kuznetsov June 14, 2019, 9:44 p.m. UTC | #12
Peter Zijlstra <peterz@infradead.org> writes:

>
> I know you probably can't change the HV interface, but I'm thinking its
> rather daft you have to specify a CPU at all for this. The HV can just
> pick one and send the notification there, who cares.

Generally speaking, hypervisor can't know if the CPU is offline (or
e.g. 'isolated') from guest's perspective so I think having an option to
specify affinity for reenlightenment notification is rather a good
thing, not bad.

(Actually, I don't remember if I tried specifying 'HV_ANY' (U32_MAX-1)
here to see what happens. But then I doubt it'll notice the fact that we 
offlined some CPU so we may get a totally unexpected IRQ there).
diff mbox series

Patch

diff --git a/arch/x86/hyperv/hv_init.c b/arch/x86/hyperv/hv_init.c
index 1608050e9df9..0bdd79ecbff8 100644
--- a/arch/x86/hyperv/hv_init.c
+++ b/arch/x86/hyperv/hv_init.c
@@ -91,7 +91,7 @@  EXPORT_SYMBOL_GPL(hv_max_vp_index);
 static int hv_cpu_init(unsigned int cpu)
 {
 	u64 msr_vp_index;
-	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[smp_processor_id()];
+	struct hv_vp_assist_page **hvp = &hv_vp_assist_page[cpu];
 	void **input_arg;
 	struct page *pg;
 
@@ -103,7 +103,7 @@  static int hv_cpu_init(unsigned int cpu)
 
 	hv_get_vp_index(msr_vp_index);
 
-	hv_vp_index[smp_processor_id()] = msr_vp_index;
+	hv_vp_index[cpu] = msr_vp_index;
 
 	if (msr_vp_index > hv_max_vp_index)
 		hv_max_vp_index = msr_vp_index;
@@ -182,7 +182,6 @@  void set_hv_tscchange_cb(void (*cb)(void))
 	struct hv_reenlightenment_control re_ctrl = {
 		.vector = HYPERV_REENLIGHTENMENT_VECTOR,
 		.enabled = 1,
-		.target_vp = hv_vp_index[smp_processor_id()]
 	};
 	struct hv_tsc_emulation_control emu_ctrl = {.enabled = 1};
 
@@ -196,7 +195,11 @@  void set_hv_tscchange_cb(void (*cb)(void))
 	/* Make sure callback is registered before we write to MSRs */
 	wmb();
 
+	preempt_disable();
+	re_ctrl.target_vp = hv_vp_index[smp_processor_id()];
 	wrmsrl(HV_X64_MSR_REENLIGHTENMENT_CONTROL, *((u64 *)&re_ctrl));
+	preempt_enable();
+
 	wrmsrl(HV_X64_MSR_TSC_EMULATION_CONTROL, *((u64 *)&emu_ctrl));
 }
 EXPORT_SYMBOL_GPL(set_hv_tscchange_cb);