diff mbox

kvm: Fix kvmclock initialization on !CONFIG_KVM_GUEST

Message ID 87393o1c8q.fsf@devron.myhome.or.jp (mailing list archive)
State New, archived
Headers show

Commit Message

OGAWA Hirofumi Aug. 15, 2012, 2:05 p.m. UTC
If !CONFIG_KVM_GUEST, kvm_smp_prepare_boot_cpu() is not defined. So,
kvm_register_clock("primary cpu clock") in kvm_smp_prepare_boot_cpu()
is not called.

The detail of problem is hv_clock percpu usage. hv_clock is percpu
variable, but kvmclock_init() is called _before_ initializing percpu
area, and doesn't update address after initialized percpu area.

So, host kvm modify the memory area _before_ initializing percpu. This
became the cause of strange memory corruption on guest OS.


This fixes it by adding kvm_smp_prepare_boot_cpu().  [we might be
better to kill the usage before percpu initialization.]

Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---

 arch/x86/kernel/kvmclock.c |   17 +++++++++++++++++
 1 file changed, 17 insertions(+)

Comments

Marcelo Tosatti Aug. 16, 2012, 7:57 p.m. UTC | #1
On Wed, Aug 15, 2012 at 11:05:57PM +0900, OGAWA Hirofumi wrote:
> 
> If !CONFIG_KVM_GUEST, kvm_smp_prepare_boot_cpu() is not defined. So,
> kvm_register_clock("primary cpu clock") in kvm_smp_prepare_boot_cpu()
> is not called.
> 
> The detail of problem is hv_clock percpu usage. hv_clock is percpu
> variable, but kvmclock_init() is called _before_ initializing percpu
> area, and doesn't update address after initialized percpu area.
> 
> So, host kvm modify the memory area _before_ initializing percpu. This
> became the cause of strange memory corruption on guest OS.
> 
> 
> This fixes it by adding kvm_smp_prepare_boot_cpu().  [we might be
> better to kill the usage before percpu initialization.]
> 
> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is 
not so clear anymore, as this bug demonstrates.

There is no point in having a separate config option, therefore i
propose to merge the two (see other reply) instead.

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi Aug. 17, 2012, 12:10 a.m. UTC | #2
Marcelo Tosatti <mtosatti@redhat.com> writes:

> On Wed, Aug 15, 2012 at 11:05:57PM +0900, OGAWA Hirofumi wrote:
>> 
>> If !CONFIG_KVM_GUEST, kvm_smp_prepare_boot_cpu() is not defined. So,
>> kvm_register_clock("primary cpu clock") in kvm_smp_prepare_boot_cpu()
>> is not called.
>> 
>> The detail of problem is hv_clock percpu usage. hv_clock is percpu
>> variable, but kvmclock_init() is called _before_ initializing percpu
>> area, and doesn't update address after initialized percpu area.
>> 
>> So, host kvm modify the memory area _before_ initializing percpu. This
>> became the cause of strange memory corruption on guest OS.
>> 
>> 
>> This fixes it by adding kvm_smp_prepare_boot_cpu().  [we might be
>> better to kill the usage before percpu initialization.]
>> 
>> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
>
> The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is 
> not so clear anymore, as this bug demonstrates.
>
> There is no point in having a separate config option, therefore i
> propose to merge the two (see other reply) instead.

Yes, it was an another option to fix this. As note, the wrong percpu
usage (use it before initialization) is still true even if merged
KVM_CLOCK.

Thanks.
Marcelo Tosatti Aug. 17, 2012, 5:54 p.m. UTC | #3
On Fri, Aug 17, 2012 at 09:10:42AM +0900, OGAWA Hirofumi wrote:
> Marcelo Tosatti <mtosatti@redhat.com> writes:
> 
> > On Wed, Aug 15, 2012 at 11:05:57PM +0900, OGAWA Hirofumi wrote:
> >> 
> >> If !CONFIG_KVM_GUEST, kvm_smp_prepare_boot_cpu() is not defined. So,
> >> kvm_register_clock("primary cpu clock") in kvm_smp_prepare_boot_cpu()
> >> is not called.
> >> 
> >> The detail of problem is hv_clock percpu usage. hv_clock is percpu
> >> variable, but kvmclock_init() is called _before_ initializing percpu
> >> area, and doesn't update address after initialized percpu area.
> >> 
> >> So, host kvm modify the memory area _before_ initializing percpu. This
> >> became the cause of strange memory corruption on guest OS.
> >> 
> >> 
> >> This fixes it by adding kvm_smp_prepare_boot_cpu().  [we might be
> >> better to kill the usage before percpu initialization.]
> >> 
> >> Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
> >
> > The distinction between CONFIG_KVM_CLOCK and CONFIG_KVM_GUEST is 
> > not so clear anymore, as this bug demonstrates.
> >
> > There is no point in having a separate config option, therefore i
> > propose to merge the two (see other reply) instead.
> 
> Yes, it was an another option to fix this. As note, the wrong percpu
> usage (use it before initialization) is still true even if merged
> KVM_CLOCK.

Its fine, i believe, because there is a percpu area for the "boot
processor" (see __per_cpu_offset at arch/x86/kernel/setup_percpu.c) 
before proper initialization.

Can you please confirm the proposed config merge fixes the problem 
for you?

> Thanks.
> -- 
> OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>

--
To unsubscribe from this list: send the line "unsubscribe kvm" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
OGAWA Hirofumi Aug. 18, 2012, 12:17 a.m. UTC | #4
Marcelo Tosatti <mtosatti@redhat.com> writes:

>> Yes, it was an another option to fix this. As note, the wrong percpu
>> usage (use it before initialization) is still true even if merged
>> KVM_CLOCK.
>
> Its fine, i believe, because there is a percpu area for the "boot
> processor" (see __per_cpu_offset at arch/x86/kernel/setup_percpu.c) 
> before proper initialization.

Yes. I thought to use early_per_cpu() or similar though, finally I
decided to make a patch with minimum logic change.

> Can you please confirm the proposed config merge fixes the problem 
> for you?

kvm-clock: Using msrs 4b564d01 and 4b564d00
kvm-clock: cpu 0, msr 0:1e80f01, boot clock
...
kvm-clock: cpu 0, msr 0:3e3d2f01, primary cpu clock
...
kvm-clock: cpu 1, msr 0:3e5d2f01, secondary cpu clock

I confirmed kvm-clock of boot-cpu switched address properly with your
patch (msr xxx:yyy is gpa). This fix the problem.

-	  hypervisor.
+	  hypervisor. It includes a paravirtualized clock, so that instead 

BTW, in the patch, above line is having the trailing space - "instead ".

+	  of relying on a PIT (or probably other) emulation by the
+	  underlying device model, the host provides the guest with
+	  timing infrastructure such as time of day, and system time


Thanks.
diff mbox

Patch

diff -puN arch/x86/kernel/kvmclock.c~kvm-fix-kvmclock-init arch/x86/kernel/kvmclock.c
--- tux3fs/arch/x86/kernel/kvmclock.c~kvm-fix-kvmclock-init	2012-08-15 22:15:22.000000000 +0900
+++ tux3fs-hirofumi/arch/x86/kernel/kvmclock.c	2012-08-15 22:16:14.000000000 +0900
@@ -145,6 +145,14 @@  static void kvm_restore_sched_clock_stat
 	kvm_register_clock("primary cpu clock, resume");
 }
 
+#if defined(CONFIG_SMP) && !defined(CONFIG_KVM_GUEST)
+static void __init kvm_smp_prepare_boot_cpu(void)
+{
+	WARN_ON(kvm_register_clock("primary cpu clock"));
+	native_smp_prepare_boot_cpu();
+}
+#endif
+
 #ifdef CONFIG_X86_LOCAL_APIC
 static void __cpuinit kvm_setup_secondary_clock(void)
 {
@@ -194,6 +202,12 @@  void __init kvmclock_init(void)
 	printk(KERN_INFO "kvm-clock: Using msrs %x and %x",
 		msr_kvm_system_time, msr_kvm_wall_clock);
 
+	/*
+	 * This is temporary register until percpu is initialized.
+	 * After percpu was initialized, we register real hv_clock via
+	 * kvm_smp_prepare_boot_cpu().
+	 * FIXME: it wouldn't be good to use before percpu is initialized.
+	 */
 	if (kvm_register_clock("boot clock"))
 		return;
 	pv_time_ops.sched_clock = kvm_clock_read;
@@ -210,6 +224,9 @@  void __init kvmclock_init(void)
 #ifdef CONFIG_KEXEC
 	machine_ops.crash_shutdown  = kvm_crash_shutdown;
 #endif
+#if defined(CONFIG_SMP) && !defined(CONFIG_KVM_GUEST)
+	smp_ops.smp_prepare_boot_cpu = kvm_smp_prepare_boot_cpu;
+#endif
 	kvm_get_preset_lpj();
 	clocksource_register_hz(&kvm_clock, NSEC_PER_SEC);
 	pv_info.paravirt_enabled = 1;