Message ID | 20220224215049.2845-1-dongli.zhang@oracle.com (mailing list archive) |
---|---|
Headers | show |
Series | xen: fix HVM kexec kernel panic | expand |
On 2/24/22 4:50 PM, Dongli Zhang wrote: > This is the v3 of the patch to fix xen kexec kernel panic issue when the > kexec is triggered on VCPU >= 32. > > PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20 > [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1 > [ 0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020 > [ 0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0 > ... ... > [ 0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: 0000000000000000 > [ 0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000002 > [ 0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 0000000000000020 > [ 0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001 > [ 0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 0000000040000004 > [ 0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 0000000000000000 > [ 0.000000] FS: 0000000000000000(0000) GS:ffffffffab588000(0000) knlGS:0000000000000000 > [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 00000000000406a0 > [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 0.000000] Call Trace: > [ 0.000000] <TASK> > [ 0.000000] ? xen_clocksource_read+0x24/0x40 This is done to set xen_sched_clock_offset which I think will not be used for a while, until sched_clock is called (and the other two uses are for suspend/resume) Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until after all vcpu areas are properly set? Or are there other uses of xen_clocksource_read() before ? -boris > [ 0.000000] ? xen_init_time_common+0x5/0x49 > [ 0.000000] ? xen_hvm_init_time_ops+0x23/0x45 > [ 0.000000] ? xen_hvm_guest_init+0x221/0x25c > [ 0.000000] ? 0xffffffffa9600000 > [ 0.000000] ? setup_arch+0x440/0xbd6 > [ 0.000000] ? start_kernel+0x6c/0x695 > [ 0.000000] ? secondary_startup_64_no_verify+0xd5/0xdb > [ 0.000000] </TASK> > > > Changed since v1: > - Add commit message to explain why xen_hvm_init_time_ops() is delayed > for any vcpus. (Suggested by Boris Ostrovsky) > - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related > code in xen_hvm_guest_init(). (suggested by Juergen Gross) > Changed since v2: > - Delay for all VCPUs. (Suggested by Boris Ostrovsky) > - Add commit message that why PVM is not supported by this patch > - Test if kexec/kdump works with mainline xen (HVM and PVM) > > > I have delayed the xen_hvm_init_time_ops() for all VCPUs. Unfortunately, > now I am able to reproduce the clock backward as shown below on some old > versions of xen. I am not able to reproduce on most recent mainline xen. > > [ 0.359687] pcpu-alloc: [0] 16 17 18 19 20 21 22 23 [0] 24 25 26 27 28 29 30 31 > [ 0.359694] pcpu-alloc: [0] 32 33 34 35 36 37 38 39 [0] 40 41 42 43 44 45 46 47 > [ 0.359701] pcpu-alloc: [0] 48 49 50 51 52 53 54 55 [0] 56 57 58 59 60 61 62 63 > > ... clock backward after the clocksource is switched from native to xen... > > [ 0.000004] Fallback order for Node 0: 0 > [ 0.002967] Built 1 zonelists, mobility grouping on. Total pages: 3527744 > [ 0.007129] Policy zone: Normal > [ 0.008937] Kernel command line: BOOT_IMAGE=/boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty root=UUID=2a5975ab-a059-4697-9aee-7a53ddfeea21 ro text console=ttyS0,115200n8 console=tty1 earlyprintk=tty S0,115200n8 loglevel=7 no_timer_check reboot=s32 splash crashkernel=512M-:192M vt.handoff=1 > [ 0.023880] Unknown kernel command line parameters "text splash BOOT_IMAGE=/boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty", will be passed to user space. > [ 0.032647] printk: log_buf_len individual max cpu contribution: 4096 bytes > [ 0.036828] printk: log_buf_len total cpu_extra contributions: 258048 bytes > [ 0.041049] printk: log_buf_len min size: 262144 bytes > [ 0.044481] printk: log_buf_len: 524288 bytes > > > Since now I am able to reproduce the clock backward on old xen version, > please let me know if I should re-use the v2 of this patch, as it has been > running well in our env well for very long time. > > https://lore.kernel.org/all/20211028012543.8776-1-dongli.zhang@oracle.com/ > > > BTW, I have tested that 'soft_reset' does not work with mainline xen, even > when I directly trigger kexec with below commands. > > # kexec -l /boot/vmlinuz-5.17.0-rc4xen-00054-gf71077a4d84b-dirty \ > --initrd=/boot/initrd.img-5.17.0-rc4xen-00054-gf71077a4d84b-dirty \ > --reuse-cmdline > # kexec -e > > > Thank you very much! > > Dongli Zhang > >
Hi Boris, On 2/25/22 2:39 PM, Boris Ostrovsky wrote: > > On 2/24/22 4:50 PM, Dongli Zhang wrote: >> This is the v3 of the patch to fix xen kexec kernel panic issue when the >> kexec is triggered on VCPU >= 32. >> >> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20 >> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1 >> [ 0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020 >> [ 0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0 >> ... ... >> [ 0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: >> 0000000000000000 >> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000002 >> [ 0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 0000000000000020 >> [ 0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001 >> [ 0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 0000000040000004 >> [ 0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 0000000000000000 >> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffffab588000(0000) >> knlGS:0000000000000000 >> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 00000000000406a0 >> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >> [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >> [ 0.000000] Call Trace: >> [ 0.000000] <TASK> >> [ 0.000000] ? xen_clocksource_read+0x24/0x40 > > > This is done to set xen_sched_clock_offset which I think will not be used for a > while, until sched_clock is called (and the other two uses are for suspend/resume) > > > Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until > after all vcpu areas are properly set? Or are there other uses of > xen_clocksource_read() before ? > I have tested that below patch will panic kdump kernel. diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c index 6ff3c887e0b9..6a0c99941ae1 100644 --- a/arch/x86/xen/smp_hvm.c +++ b/arch/x86/xen/smp_hvm.c @@ -19,6 +19,8 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void) */ xen_vcpu_setup(0); + xen_init_sched_clock_offset(); + /* * The alternative logic (which patches the unlock/lock) runs before * the smp bootup up code is activated. Hence we need to set this up diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index d9c945ee1100..8a2eafa0c215 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -520,9 +520,14 @@ static void __init xen_time_init(void) pvclock_gtod_register_notifier(&xen_pvclock_gtod_notifier); } -static void __init xen_init_time_common(void) +void xen_init_sched_clock_offset(void) { xen_sched_clock_offset = xen_clocksource_read(); +} + +static void __init xen_init_time_common(void) +{ + xen_sched_clock_offset = 0; static_call_update(pv_steal_clock, xen_steal_clock); paravirt_set_sched_clock(xen_sched_clock); diff --git a/arch/x86/xen/xen-ops.h b/arch/x86/xen/xen-ops.h index fd0fec6e92f4..9f7656214dfb 100644 --- a/arch/x86/xen/xen-ops.h +++ b/arch/x86/xen/xen-ops.h @@ -69,6 +69,7 @@ void xen_teardown_timer(int cpu); void xen_setup_cpu_clockevents(void); void xen_save_time_memory_area(void); void xen_restore_time_memory_area(void); +void xen_init_sched_clock_offset(void); void xen_init_time_ops(void); void xen_hvm_init_time_ops(void); Unfortunately, I am not able to obtain the panic callstack from kdump time this time. I have only below. PANIC: early exception 0x0e IP 10:ffffffffa6c679b6 error 0 cr2 0x20 PANIC: early exception 0x0e IP 10:ffffffffa6c679b6 error 0 cr2 0x20 The sched_clock() can be used very early since commit 857baa87b642 ("sched/clock: Enable sched clock early"). Any printk should use sched_clock() to obtain the timestamp. vprintk_store() -> local_clock() -> sched_clock() -> paravirt_sched_clock() -> xen_sched_clock() -> xen_clocksource_read() AFAIR, we started to encounter the issue since commit 857baa87b642 ("sched/clock: Enable sched clock early"). kdump used to work well before that commit. Thank you very much! Dongli Zhang
On 2/25/22 8:17 PM, Dongli Zhang wrote: > Hi Boris, > > On 2/25/22 2:39 PM, Boris Ostrovsky wrote: >> >> On 2/24/22 4:50 PM, Dongli Zhang wrote: >>> This is the v3 of the patch to fix xen kexec kernel panic issue when the >>> kexec is triggered on VCPU >= 32. >>> >>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20 >>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1 >>> [ 0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020 >>> [ 0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0 >>> ... ... >>> [ 0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: >>> 0000000000000000 >>> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: 0000000000000002 >>> [ 0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: 0000000000000020 >>> [ 0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: 0000000000000001 >>> [ 0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: 0000000040000004 >>> [ 0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: 0000000000000000 >>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffffab588000(0000) >>> knlGS:0000000000000000 >>> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>> [ 0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: 00000000000406a0 >>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 >>> [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 >>> [ 0.000000] Call Trace: >>> [ 0.000000] <TASK> >>> [ 0.000000] ? xen_clocksource_read+0x24/0x40 >> >> >> This is done to set xen_sched_clock_offset which I think will not be used for a >> while, until sched_clock is called (and the other two uses are for suspend/resume) >> >> >> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until >> after all vcpu areas are properly set? Or are there other uses of >> xen_clocksource_read() before ? >> > > I have tested that below patch will panic kdump kernel. > Oh well, so much for that then. Yes, sched_clock() is at least called from printk path. I guess we will have to go with v2 then, we don't want to start seeing time going back, even if only with older hypervisors. The only thing I might ask is that you roll the logic inside xen_hvm_init_time_ops(). Something like xen_hvm_init_time_ops() { /* * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu) */ if (__this_cpu_read(xen_vcpu_nr(0)) == NULL) return; ... -boris
Hi Boris, On 2/28/22 12:45 PM, Boris Ostrovsky wrote: > > > On 2/25/22 8:17 PM, Dongli Zhang wrote: >> Hi Boris, >> >> On 2/25/22 2:39 PM, Boris Ostrovsky wrote: >>> >>> On 2/24/22 4:50 PM, Dongli Zhang wrote: >>>> This is the v3 of the patch to fix xen kexec kernel panic issue when the >>>> kexec is triggered on VCPU >= 32. >>>> >>>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20 >>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >>>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1 >>>> [ 0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020 >>>> [ 0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0 >>>> ... ... >>>> [ 0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: >>>> 0000000000000000 >>>> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: >>>> 0000000000000002 >>>> [ 0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: >>>> 0000000000000020 >>>> [ 0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: >>>> 0000000000000001 >>>> [ 0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: >>>> 0000000040000004 >>>> [ 0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: >>>> 0000000000000000 >>>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffffab588000(0000) >>>> knlGS:0000000000000000 >>>> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>> [ 0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: >>>> 00000000000406a0 >>>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>> 0000000000000000 >>>> [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >>>> 0000000000000400 >>>> [ 0.000000] Call Trace: >>>> [ 0.000000] <TASK> >>>> [ 0.000000] ? xen_clocksource_read+0x24/0x40 >>> >>> >>> This is done to set xen_sched_clock_offset which I think will not be used for a >>> while, until sched_clock is called (and the other two uses are for >>> suspend/resume) >>> >>> >>> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until >>> after all vcpu areas are properly set? Or are there other uses of >>> xen_clocksource_read() before ? >>> >> >> I have tested that below patch will panic kdump kernel. >> > > > > Oh well, so much for that then. Yes, sched_clock() is at least called from > printk path. > > > I guess we will have to go with v2 then, we don't want to start seeing time > going back, even if only with older hypervisors. The only thing I might ask is > that you roll the logic inside xen_hvm_init_time_ops(). Something like > > > xen_hvm_init_time_ops() > { > /* > * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen > * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu) > */ > if (__this_cpu_read(xen_vcpu_nr(0)) == NULL) > return; > I think you meant __this_cpu_read(xen_vcpu). I will call xen_hvm_init_time_ops() at both places, and move the logic into xen_hvm_init_time_ops(). Thank you very much! Dongli Zhang
Hi Boris, On 2/28/22 5:18 PM, Dongli Zhang wrote: > Hi Boris, > > On 2/28/22 12:45 PM, Boris Ostrovsky wrote: >> >> >> On 2/25/22 8:17 PM, Dongli Zhang wrote: >>> Hi Boris, >>> >>> On 2/25/22 2:39 PM, Boris Ostrovsky wrote: >>>> >>>> On 2/24/22 4:50 PM, Dongli Zhang wrote: >>>>> This is the v3 of the patch to fix xen kexec kernel panic issue when the >>>>> kexec is triggered on VCPU >= 32. >>>>> >>>>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20 >>>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >>>>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1 >>>>> [ 0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020 >>>>> [ 0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0 >>>>> ... ... >>>>> [ 0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: >>>>> 0000000000000000 >>>>> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: >>>>> 0000000000000002 >>>>> [ 0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: >>>>> 0000000000000020 >>>>> [ 0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: >>>>> 0000000000000001 >>>>> [ 0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: >>>>> 0000000040000004 >>>>> [ 0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: >>>>> 0000000000000000 >>>>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffffab588000(0000) >>>>> knlGS:0000000000000000 >>>>> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>> [ 0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: >>>>> 00000000000406a0 >>>>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>>> 0000000000000000 >>>>> [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >>>>> 0000000000000400 >>>>> [ 0.000000] Call Trace: >>>>> [ 0.000000] <TASK> >>>>> [ 0.000000] ? xen_clocksource_read+0x24/0x40 >>>> >>>> >>>> This is done to set xen_sched_clock_offset which I think will not be used for a >>>> while, until sched_clock is called (and the other two uses are for >>>> suspend/resume) >>>> >>>> >>>> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until >>>> after all vcpu areas are properly set? Or are there other uses of >>>> xen_clocksource_read() before ? >>>> >>> >>> I have tested that below patch will panic kdump kernel. >>> >> >> >> >> Oh well, so much for that then. Yes, sched_clock() is at least called from >> printk path. >> >> >> I guess we will have to go with v2 then, we don't want to start seeing time >> going back, even if only with older hypervisors. The only thing I might ask is >> that you roll the logic inside xen_hvm_init_time_ops(). Something like >> >> >> xen_hvm_init_time_ops() >> { >> /* >> * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen >> * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu) >> */ >> if (__this_cpu_read(xen_vcpu_nr(0)) == NULL) >> return; >> > > I think you meant __this_cpu_read(xen_vcpu). > > I will call xen_hvm_init_time_ops() at both places, and move the logic into > xen_hvm_init_time_ops(). > > Thank you very much! > > Dongli Zhang > How about we do not move the logic into xen_hvm_init_time_ops()? Suppose we move the logic into xen_hvm_init_time_ops() line 573, the line line 570 might be printed twice. 559 void __init xen_hvm_init_time_ops(void) 560 { 561 /* 562 * vector callback is needed otherwise we cannot receive interrupts 563 * on cpu > 0 and at this point we don't know how many cpus are 564 * available. 565 */ 566 if (!xen_have_vector_callback) 567 return; 568 569 if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { 570 pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); 571 return; 572 } 573 574 xen_init_time_common(); 575 576 x86_init.timers.setup_percpu_clockev = xen_time_init; 577 x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; 578 579 x86_platform.set_wallclock = xen_set_wallclock; 580 } I feel the code looks better if we keep the logic at caller side. Would you mind letting me know your feedback? Thank you very much! Dongli Zhang
On 2/28/22 11:56 PM, Dongli Zhang wrote: > Hi Boris, > > On 2/28/22 5:18 PM, Dongli Zhang wrote: >> Hi Boris, >> >> On 2/28/22 12:45 PM, Boris Ostrovsky wrote: >>> >>> On 2/25/22 8:17 PM, Dongli Zhang wrote: >>>> Hi Boris, >>>> >>>> On 2/25/22 2:39 PM, Boris Ostrovsky wrote: >>>>> On 2/24/22 4:50 PM, Dongli Zhang wrote: >>>>>> This is the v3 of the patch to fix xen kexec kernel panic issue when the >>>>>> kexec is triggered on VCPU >= 32. >>>>>> >>>>>> PANIC: early exception 0x0e IP 10:ffffffffa96679b6 error 0 cr2 0x20 >>>>>> [ 0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted >>>>>> 5.17.0-rc4xen-00054-gf71077a4d84b-dirty #1 >>>>>> [ 0.000000] Hardware name: Xen HVM domU, BIOS 4.4.4OVM 12/15/2020 >>>>>> [ 0.000000] RIP: 0010:pvclock_clocksource_read+0x6/0xb0 >>>>>> ... ... >>>>>> [ 0.000000] RSP: 0000:ffffffffaae03e10 EFLAGS: 00010082 ORIG_RAX: >>>>>> 0000000000000000 >>>>>> [ 0.000000] RAX: 0000000000000000 RBX: 0000000000010000 RCX: >>>>>> 0000000000000002 >>>>>> [ 0.000000] RDX: 0000000000000003 RSI: ffffffffaac37515 RDI: >>>>>> 0000000000000020 >>>>>> [ 0.000000] RBP: 0000000000011000 R08: 0000000000000000 R09: >>>>>> 0000000000000001 >>>>>> [ 0.000000] R10: ffffffffaae03df8 R11: ffffffffaae03c68 R12: >>>>>> 0000000040000004 >>>>>> [ 0.000000] R13: ffffffffaae03e50 R14: 0000000000000000 R15: >>>>>> 0000000000000000 >>>>>> [ 0.000000] FS: 0000000000000000(0000) GS:ffffffffab588000(0000) >>>>>> knlGS:0000000000000000 >>>>>> [ 0.000000] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >>>>>> [ 0.000000] CR2: 0000000000000020 CR3: 00000000ea410000 CR4: >>>>>> 00000000000406a0 >>>>>> [ 0.000000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: >>>>>> 0000000000000000 >>>>>> [ 0.000000] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: >>>>>> 0000000000000400 >>>>>> [ 0.000000] Call Trace: >>>>>> [ 0.000000] <TASK> >>>>>> [ 0.000000] ? xen_clocksource_read+0x24/0x40 >>>>> >>>>> This is done to set xen_sched_clock_offset which I think will not be used for a >>>>> while, until sched_clock is called (and the other two uses are for >>>>> suspend/resume) >>>>> >>>>> >>>>> Can we simply defer 'xen_sched_clock_offset = xen_clocksource_read();' until >>>>> after all vcpu areas are properly set? Or are there other uses of >>>>> xen_clocksource_read() before ? >>>>> >>>> I have tested that below patch will panic kdump kernel. >>>> >>> >>> >>> Oh well, so much for that then. Yes, sched_clock() is at least called from >>> printk path. >>> >>> >>> I guess we will have to go with v2 then, we don't want to start seeing time >>> going back, even if only with older hypervisors. The only thing I might ask is >>> that you roll the logic inside xen_hvm_init_time_ops(). Something like >>> >>> >>> xen_hvm_init_time_ops() >>> { >>> /* >>> * Wait until per_cpu(xen_vcpu, 0) is initialized which may happen >>> * later (e.g. when kdump kernel runs on >=MAX_VIRT_CPUS vcpu) >>> */ >>> if (__this_cpu_read(xen_vcpu_nr(0)) == NULL) >>> return; >>> >> I think you meant __this_cpu_read(xen_vcpu). >> >> I will call xen_hvm_init_time_ops() at both places, and move the logic into >> xen_hvm_init_time_ops(). >> >> Thank you very much! >> >> Dongli Zhang >> > > How about we do not move the logic into xen_hvm_init_time_ops()? > > Suppose we move the logic into xen_hvm_init_time_ops() line 573, the line line > 570 might be printed twice. You would need to make sure the routine is executed only once so something like a local static variable would be needed. > > > 559 void __init xen_hvm_init_time_ops(void) > 560 { > 561 /* > 562 * vector callback is needed otherwise we cannot receive interrupts > 563 * on cpu > 0 and at this point we don't know how many cpus are > 564 * available. > 565 */ > 566 if (!xen_have_vector_callback) > 567 return; > 568 > 569 if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { > 570 pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); > 571 return; > 572 } > 573 > 574 xen_init_time_common(); > 575 > 576 x86_init.timers.setup_percpu_clockev = xen_time_init; > 577 x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; > 578 > 579 x86_platform.set_wallclock = xen_set_wallclock; > 580 } > > I feel the code looks better if we keep the logic at caller side. Would you mind > letting me know your feedback? My preference is to keep logic concentrated in one place whenever possible. -boris