Message ID | bug-59521-28872@https.bugzilla.kernel.org/ (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
https://bugzilla.kernel.org/show_bug.cgi?id=59521 --- Comment #1 from Eugene Batalov <eabatalov89@gmail.com> 2013-06-11 16:03:55 --- I have reconstructed the uninitialized pvclock read backtrace. References to file lines are for Ubuntu-raring kernel git://kernel.ubuntu.com/ubuntu/ubuntu-raring.git tag is Ubuntu-3.8.0-19.30 bp: 0xf3ccbe68 ip: 0xc103cfbd arch/x86/include/asm/pvclock.h:78 arch/x86/kernel/pvclock.c:74 bp: 0xf3ccbe70 ip: 0xc103c057 arch/x86/kernel/kvmclock.c:91 bp: 0xf3ccbe78 ip: 0xc1017598 arch/x86/kernel/tsc.c:58 bp: 0xf3ccbea8 ip: 0xc107e98d kernel/sched/clock.c:248 bp: 0xf3ccbeb8 ip: 0xc107ea35 kernel/sched/clock.c:342 bp: 0xf3ccbf08 ip: 0xc104ad85 kernel/printk.c:356 bp: 0xf3ccbf50 ip: 0xc104c4e1 kernel/printk.c:1607 bp: 0xf3ccbf70 ip: 0xc1609bb6 kernel/printk.c:1688 bp: 0xf3ccbf90 ip: 0xc1600a51 arch/x86/include/asm/bitops.h:321 arch/x86/kernel/cpu/common.c:1325 bp: 0xf3ccbfb4 ip: 0xc1604000 ?? bp: 0x00000000 kernel/printk.c:356 calls local_clock() calls sched_clock_cpu() calls sched_clock() calls paravirt_sched_clock() calls indirectly kvm_clock_read() unintialized pv_clock is read here vcpu kvmclock initialization is performed in kvm_register_clock. kvm_register_clock is called from static void __init kvm_smp_prepare_boot_cpu(void) called form ./init/main.c:524 as smp_prepare_boot_cpu I'll think about proper fix soon. We probably should fix cpu initialization stages order or disable usage of pvclock before it initialized.
https://bugzilla.kernel.org/show_bug.cgi?id=59521 --- Comment #2 from Eugene Batalov <eabatalov89@gmail.com> 2013-06-15 17:17:18 --- KVM pv_clock initialization for SMP CPU is called very early on SMP CPU boot stage: arch/x86/kernel/smpboot.c: __cpuinit start_secondary(void *unused) ... cpu_init(); x86_cpuinit.early_percpu_clock_init(); early_percpu_clock_init is set as virtual function in ./arch/x86/kernel/kvmclock.c: #ifdef CONFIG_X86_LOCAL_APIC x86_cpuinit.early_percpu_clock_init = kvm_setup_secondary_clock; #endif The problem is in cpu_init() which is called earlier. cpu_init() calls printk and possibly other stuff which can use timestamps. printk calls local_clock() to obtain a timestamp of a log message. On KVM guests call sequence usually ends up in kvm_clock_read but needed rdmsr is executed only in x86_cpuinit.early_percpu_clock_init(). I consider two approaches to fix the problem: 1. Swap cpu_init(); and x86_cpuinit.early_percpu_clock_init(); + Simple - We will get excessive restrictions on operations which allowed to be performed in early_percpu_clock_init() because percpu specific data is initialized only in cpu_init(). 2. Return 0ULL from kvm_clock_read untill it is initialized. + Simple too - Additional if statement inside kvm_clock_read (not serious even for performance paranoiacs) - Returning 0ULL looks ok because it is the same thing which kernel bootstrap CPU does on early boot stages. But I am not quite sure. Better to ask guys who maintain the needed relevant subsystem. I prefer the second way. It doesn't add complex restrictions to CPU bootup code. I'll send a patch soon which fixes the problem in the second way. I don't propagate such a logic to levels higher then KVM clocksource (pv_time_ops level for example) because of the following code: void __init kvmclock_init(void) ... 263 pv_time_ops.sched_clock = kvm_clock_read; 264 x86_platform.calibrate_tsc = kvm_get_tsc_khz; 265 x86_platform.get_wallclock = kvm_get_wallclock; 266 x86_platform.set_wallclock = kvm_set_wallclock; 267 #ifdef CONFIG_X86_LOCAL_APIC 268 x86_cpuinit.early_percpu_clock_init = 269 kvm_setup_secondary_clock; 270 #endif 271 x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; 272 x86_platform.restore_sched_clock_state = kvm_restore_sched_clock_state; To propagate the logic I need to make changes both in x86_platform and pv_time_ops also I should make a similar fix for ia64 arch. It needs some subsystems refactoring to make the changes clean. Dont' think that its worth to fix the bug. Better to make a simple fix.
Il 15/06/2013 19:17, bugzilla-daemon@bugzilla.kernel.org ha scritto: > The problem is in cpu_init() which is called earlier. > cpu_init() calls printk and possibly other stuff which can use timestamps. > printk calls local_clock() to obtain a timestamp of a log message. On KVM > guests call sequence usually ends up in kvm_clock_read but needed rdmsr is > executed only in x86_cpuinit.early_percpu_clock_init(). > > I consider two approaches to fix the problem: > 1. Swap cpu_init(); and x86_cpuinit.early_percpu_clock_init(); > + Simple > - We will get excessive restrictions on operations which allowed to be > performed in early_percpu_clock_init() because percpu specific data is > initialized only in cpu_init(). Considering how simple kvm_register_clock is, I think this is preferrable if it works. Ironically, commit 7069ed6 (x86: kvmclock: allocate pvclock shared memory area, 2012-11-27), which introduced the regression, is what should make this simpler fix possible. Paolo > 2. Return 0ULL from kvm_clock_read untill it is initialized. > + Simple too > - Additional if statement inside kvm_clock_read (not serious even for > performance paranoiacs) > - Returning 0ULL looks ok because it is the same thing which kernel bootstrap > CPU does on early boot stages. But I am not quite sure. Better to ask guys who > maintain the needed relevant subsystem. > > I prefer the second way. It doesn't add complex restrictions to CPU bootup > code. I'll send a patch soon which fixes the problem in the second way. > > I don't propagate such a logic to levels higher then KVM clocksource > (pv_time_ops level for example) because of the following code: > void __init kvmclock_init(void) > ... > 263 pv_time_ops.sched_clock = kvm_clock_read; > 264 x86_platform.calibrate_tsc = kvm_get_tsc_khz; > 265 x86_platform.get_wallclock = kvm_get_wallclock; > 266 x86_platform.set_wallclock = kvm_set_wallclock; > 267 #ifdef CONFIG_X86_LOCAL_APIC > 268 x86_cpuinit.early_percpu_clock_init = > 269 kvm_setup_secondary_clock; > 270 #endif > 271 x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; > 272 x86_platform.restore_sched_clock_state = > kvm_restore_sched_clock_state; > > To propagate the logic I need to make changes both in x86_platform and > pv_time_ops also I should make a similar fix for ia64 arch. It needs some > subsystems refactoring to make the changes clean. Dont' think that its worth to > fix the bug. Better to make a simple fix. > -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
https://bugzilla.kernel.org/show_bug.cgi?id=59521 --- Comment #3 from Anonymous Emailer <anonymous@kernel-bugs.osdl.org> 2013-06-17 15:50:27 --- Reply-To: pbonzini@redhat.com Il 15/06/2013 19:17, bugzilla-daemon@bugzilla.kernel.org ha scritto: > The problem is in cpu_init() which is called earlier. > cpu_init() calls printk and possibly other stuff which can use timestamps. > printk calls local_clock() to obtain a timestamp of a log message. On KVM > guests call sequence usually ends up in kvm_clock_read but needed rdmsr is > executed only in x86_cpuinit.early_percpu_clock_init(). > > I consider two approaches to fix the problem: > 1. Swap cpu_init(); and x86_cpuinit.early_percpu_clock_init(); > + Simple > - We will get excessive restrictions on operations which allowed to be > performed in early_percpu_clock_init() because percpu specific data is > initialized only in cpu_init(). Considering how simple kvm_register_clock is, I think this is preferrable if it works. Ironically, commit 7069ed6 (x86: kvmclock: allocate pvclock shared memory area, 2012-11-27), which introduced the regression, is what should make this simpler fix possible. Paolo > 2. Return 0ULL from kvm_clock_read untill it is initialized. > + Simple too > - Additional if statement inside kvm_clock_read (not serious even for > performance paranoiacs) > - Returning 0ULL looks ok because it is the same thing which kernel bootstrap > CPU does on early boot stages. But I am not quite sure. Better to ask guys who > maintain the needed relevant subsystem. > > I prefer the second way. It doesn't add complex restrictions to CPU bootup > code. I'll send a patch soon which fixes the problem in the second way. > > I don't propagate such a logic to levels higher then KVM clocksource > (pv_time_ops level for example) because of the following code: > void __init kvmclock_init(void) > ... > 263 pv_time_ops.sched_clock = kvm_clock_read; > 264 x86_platform.calibrate_tsc = kvm_get_tsc_khz; > 265 x86_platform.get_wallclock = kvm_get_wallclock; > 266 x86_platform.set_wallclock = kvm_set_wallclock; > 267 #ifdef CONFIG_X86_LOCAL_APIC > 268 x86_cpuinit.early_percpu_clock_init = > 269 kvm_setup_secondary_clock; > 270 #endif > 271 x86_platform.save_sched_clock_state = kvm_save_sched_clock_state; > 272 x86_platform.restore_sched_clock_state = > kvm_restore_sched_clock_state; > > To propagate the logic I need to make changes both in x86_platform and > pv_time_ops also I should make a similar fix for ia64 arch. It needs some > subsystems refactoring to make the changes clean. Dont' think that its worth to > fix the bug. Better to make a simple fix. >
https://bugzilla.kernel.org/show_bug.cgi?id=59521 --- Comment #4 from Eugene Batalov <eabatalov89@gmail.com> 2013-06-17 21:29:08 --- (In reply to comment #3) > Reply-To: pbonzini@redhat.com > > Il 15/06/2013 19:17, bugzilla-daemon@bugzilla.kernel.org ha scritto: > > The problem is in cpu_init() which is called earlier. > > cpu_init() calls printk and possibly other stuff which can use timestamps. > > printk calls local_clock() to obtain a timestamp of a log message. On KVM > > guests call sequence usually ends up in kvm_clock_read but needed rdmsr is > > executed only in x86_cpuinit.early_percpu_clock_init(). > > > > I consider two approaches to fix the problem: > > 1. Swap cpu_init(); and x86_cpuinit.early_percpu_clock_init(); > > + Simple > > - We will get excessive restrictions on operations which allowed to be > > performed in early_percpu_clock_init() because percpu specific data is > > initialized only in cpu_init(). > > Considering how simple kvm_register_clock is, I think this is > preferrable if it works. Ironically, commit 7069ed6 (x86: kvmclock: > allocate pvclock shared memory area, 2012-11-27), which introduced the > regression, is what should make this simpler fix possible. > > Paolo Understood your point. I'll test this fix and report the results.
Fixes regression introduced by 7069ed6 x86: kvmclock: allocate pvclock shared memory area and reported by Eugene: https://bugzilla.kernel.org/show_bug.cgi?id=59521 v2: - "[1/2] x86: kvmclock: zero initialize pvclock shared memory area" added to commit messaged another reason why it's clearing is necessary. - added "[2/2] x86: kvmclock: register per-cpu kvmclock at earliest possible time" to register clock earlier. Igor Mammedov (2): x86: kvmclock: zero initialize pvclock shared memory area x86: kvmclock: register per-cpu kvmclock at earliest possible time arch/x86/kernel/kvmclock.c | 12 ++++++++++-- arch/x86/kernel/smpboot.c | 2 +- arch/x86/mm/pageattr.c | 4 ++-- 3 files changed, 13 insertions(+), 5 deletions(-) -- To unsubscribe from this list: send the line "unsubscribe kvm" in the body of a message to majordomo@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
diff --git a/arch/x86/kernel/kvmclock.c b/arch/x86/kernel/kvmclock.c index 5bedbdd..44aaf25 100644 --- a/arch/x86/kernel/kvmclock.c +++ b/arch/x86/kernel/kvmclock.c @@ -219,6 +222,7 @@ void __init kvmclock_init(void) { unsigned long mem; int size; + int i; size = PAGE_ALIGN(sizeof(struct pvclock_vsyscall_time_info)*NR_CPUS); @@ -238,6 +242,9 @@ void __init kvmclock_init(void) if (!mem) return; hv_clock = __va(mem); + for (i = 0; i < NR_CPUS; ++i) { + hv_clock[i].pvti.version = 0xDEADBEAF; + } if (kvm_register_clock("boot clock")) { hv_clock = NULL;