Message ID | 20220302164032.14569-3-dongli.zhang@oracle.com (mailing list archive) |
---|---|
State | Accepted |
Commit | c147a3a0efbdfac7f4b1fb029772ec5e9e1ba39b |
Headers | show |
Series | xen: fix HVM kexec kernel panic | expand |
On 3/2/22 11:40 AM, Dongli Zhang wrote: > void __init xen_hvm_init_time_ops(void) > { > + static bool hvm_time_initialized; > + > + if (hvm_time_initialized) > + return; > + > /* > * vector callback is needed otherwise we cannot receive interrupts > * on cpu > 0 and at this point we don't know how many cpus are > * available. > */ > if (!xen_have_vector_callback) > - return; > + goto exit; Why not just return? Do we expect the value of xen_have_vector_callback to change? -boris > > if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { > pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); > + goto exit; > + } > + > + /* > + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. > + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest > + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access > + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. > + * > + * The xen_hvm_init_time_ops() should be called again later after > + * __this_cpu_read(xen_vcpu) is available. > + */ > + if (!__this_cpu_read(xen_vcpu)) { > + pr_info("Delay xen_init_time_common() as kernel is running on vcpu=%d\n", > + xen_vcpu_nr(0)); > return; > } > > @@ -577,6 +597,9 @@ void __init xen_hvm_init_time_ops(void) > x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; > > x86_platform.set_wallclock = xen_set_wallclock; > + > +exit: > + hvm_time_initialized = true; > } > #endif >
Hi Boris, On 3/2/22 4:20 PM, Boris Ostrovsky wrote: > > On 3/2/22 11:40 AM, Dongli Zhang wrote: >> void __init xen_hvm_init_time_ops(void) >> { >> + static bool hvm_time_initialized; >> + >> + if (hvm_time_initialized) >> + return; >> + >> /* >> * vector callback is needed otherwise we cannot receive interrupts >> * on cpu > 0 and at this point we don't know how many cpus are >> * available. >> */ >> if (!xen_have_vector_callback) >> - return; >> + goto exit; > > > Why not just return? Do we expect the value of xen_have_vector_callback to change? I just want to keep above sync with .... > > > -boris > > >> if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { >> pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); >> + goto exit; >> + } ... here. That is, I want the main logic of xen_hvm_init_time_ops() to run for at most once. Both of above two if statements will "go to exit". Thank you very much! Dongli Zhang >> + >> + /* >> + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. >> + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest >> + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access >> + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. >> + * >> + * The xen_hvm_init_time_ops() should be called again later after >> + * __this_cpu_read(xen_vcpu) is available. >> + */ >> + if (!__this_cpu_read(xen_vcpu)) { >> + pr_info("Delay xen_init_time_common() as kernel is running on >> vcpu=%d\n", >> + xen_vcpu_nr(0)); >> return; >> } >> @@ -577,6 +597,9 @@ void __init xen_hvm_init_time_ops(void) >> x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; >> x86_platform.set_wallclock = xen_set_wallclock; >> + >> +exit: >> + hvm_time_initialized = true; >> } >> #endif >>
On 3/2/22 7:31 PM, Dongli Zhang wrote: > Hi Boris, > > On 3/2/22 4:20 PM, Boris Ostrovsky wrote: >> On 3/2/22 11:40 AM, Dongli Zhang wrote: >>> void __init xen_hvm_init_time_ops(void) >>> { >>> + static bool hvm_time_initialized; >>> + >>> + if (hvm_time_initialized) >>> + return; >>> + >>> /* >>> * vector callback is needed otherwise we cannot receive interrupts >>> * on cpu > 0 and at this point we don't know how many cpus are >>> * available. >>> */ >>> if (!xen_have_vector_callback) >>> - return; >>> + goto exit; >> >> Why not just return? Do we expect the value of xen_have_vector_callback to change? > I just want to keep above sync with .... > >> >> -boris >> >> >>> if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { >>> pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); >>> + goto exit; >>> + } > ... here. > > That is, I want the main logic of xen_hvm_init_time_ops() to run for at most > once. Both of above two if statements will "go to exit". I didn't notice this actually. I think both of them should return early, there is no reason to set hvm_time_initialized to true when, in fact, we have not initialized anything. And to avoid printing the warning twice we can just replace it with pr_info_once(). I can fix it up when committing so no need to resend. So unless you disagree Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> > Thank you very much! > > Dongli Zhang > >>> + >>> + /* >>> + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. >>> + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest >>> + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access >>> + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. >>> + * >>> + * The xen_hvm_init_time_ops() should be called again later after >>> + * __this_cpu_read(xen_vcpu) is available. >>> + */ >>> + if (!__this_cpu_read(xen_vcpu)) { >>> + pr_info("Delay xen_init_time_common() as kernel is running on >>> vcpu=%d\n", >>> + xen_vcpu_nr(0)); >>> return; >>> } >>> @@ -577,6 +597,9 @@ void __init xen_hvm_init_time_ops(void) >>> x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; >>> x86_platform.set_wallclock = xen_set_wallclock; >>> + >>> +exit: >>> + hvm_time_initialized = true; >>> } >>> #endif >>>
Hi Boris, On 3/2/22 6:11 PM, Boris Ostrovsky wrote: > > On 3/2/22 7:31 PM, Dongli Zhang wrote: >> Hi Boris, >> >> On 3/2/22 4:20 PM, Boris Ostrovsky wrote: >>> On 3/2/22 11:40 AM, Dongli Zhang wrote: >>>> void __init xen_hvm_init_time_ops(void) >>>> { >>>> + static bool hvm_time_initialized; >>>> + >>>> + if (hvm_time_initialized) >>>> + return; >>>> + >>>> /* >>>> * vector callback is needed otherwise we cannot receive interrupts >>>> * on cpu > 0 and at this point we don't know how many cpus are >>>> * available. >>>> */ >>>> if (!xen_have_vector_callback) >>>> - return; >>>> + goto exit; >>> >>> Why not just return? Do we expect the value of xen_have_vector_callback to >>> change? >> I just want to keep above sync with .... >> >>> >>> -boris >>> >>> >>>> if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { >>>> pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); >>>> + goto exit; >>>> + } >> ... here. >> >> That is, I want the main logic of xen_hvm_init_time_ops() to run for at most >> once. Both of above two if statements will "go to exit". > > > I didn't notice this actually. > > > I think both of them should return early, there is no reason to set > hvm_time_initialized to true when, in fact, we have not initialized anything. > And to avoid printing the warning twice we can just replace it with pr_info_once(). > > > I can fix it up when committing so no need to resend. So unless you disagree Thank you very much for fixing it during committing. Dongli Zhang
On 3/2/22 11:40 AM, Dongli Zhang wrote: > The sched_clock() can be used very early since commit 857baa87b642 > ("sched/clock: Enable sched clock early"). In addition, with commit > 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump > kernel in Xen HVM guest may panic at very early stage when accessing > &__this_cpu_read(xen_vcpu)->time as in below: > > setup_arch() > -> init_hypervisor_platform() > -> x86_init.hyper.init_platform = xen_hvm_guest_init() > -> xen_hvm_init_time_ops() > -> xen_clocksource_read() > -> src = &__this_cpu_read(xen_vcpu)->time; > > This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' > embedded inside 'shared_info' during early stage until xen_vcpu_setup() is > used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address. > > However, when Xen HVM guest panic on vcpu >= 32, since > xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when > vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic. > > This patch calls xen_hvm_init_time_ops() again later in > xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is > registered when the boot vcpu is >= 32. > > This issue can be reproduced on purpose via below command at the guest > side when kdump/kexec is enabled: > > "taskset -c 33 echo c > /proc/sysrq-trigger" > > The bugfix for PVM is not implemented due to the lack of testing > environment. > > Cc: Joe Jin <joe.jin@oracle.com> > Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Applied to for-linus-5.18 (with return path adjusted)
diff --git a/arch/x86/xen/smp_hvm.c b/arch/x86/xen/smp_hvm.c index 6ff3c887e0b9..b70afdff419c 100644 --- a/arch/x86/xen/smp_hvm.c +++ b/arch/x86/xen/smp_hvm.c @@ -19,6 +19,12 @@ static void __init xen_hvm_smp_prepare_boot_cpu(void) */ xen_vcpu_setup(0); + /* + * Called again in case the kernel boots on vcpu >= MAX_VIRT_CPUS. + * Refer to comments in xen_hvm_init_time_ops(). + */ + xen_hvm_init_time_ops(); + /* * The alternative logic (which patches the unlock/lock) runs before * the smp bootup up code is activated. Hence we need to set this up diff --git a/arch/x86/xen/time.c b/arch/x86/xen/time.c index 55b3407358a9..dcf292cc859e 100644 --- a/arch/x86/xen/time.c +++ b/arch/x86/xen/time.c @@ -558,16 +558,36 @@ static void xen_hvm_setup_cpu_clockevents(void) void __init xen_hvm_init_time_ops(void) { + static bool hvm_time_initialized; + + if (hvm_time_initialized) + return; + /* * vector callback is needed otherwise we cannot receive interrupts * on cpu > 0 and at this point we don't know how many cpus are * available. */ if (!xen_have_vector_callback) - return; + goto exit; if (!xen_feature(XENFEAT_hvm_safe_pvclock)) { pr_info("Xen doesn't support pvclock on HVM, disable pv timer"); + goto exit; + } + + /* + * Only MAX_VIRT_CPUS 'vcpu_info' are embedded inside 'shared_info'. + * The __this_cpu_read(xen_vcpu) is still NULL when Xen HVM guest + * boots on vcpu >= MAX_VIRT_CPUS (e.g., kexec), To access + * __this_cpu_read(xen_vcpu) via xen_clocksource_read() will panic. + * + * The xen_hvm_init_time_ops() should be called again later after + * __this_cpu_read(xen_vcpu) is available. + */ + if (!__this_cpu_read(xen_vcpu)) { + pr_info("Delay xen_init_time_common() as kernel is running on vcpu=%d\n", + xen_vcpu_nr(0)); return; } @@ -577,6 +597,9 @@ void __init xen_hvm_init_time_ops(void) x86_cpuinit.setup_percpu_clockev = xen_hvm_setup_cpu_clockevents; x86_platform.set_wallclock = xen_set_wallclock; + +exit: + hvm_time_initialized = true; } #endif
The sched_clock() can be used very early since commit 857baa87b642 ("sched/clock: Enable sched clock early"). In addition, with commit 38669ba205d1 ("x86/xen/time: Output xen sched_clock time from 0"), kdump kernel in Xen HVM guest may panic at very early stage when accessing &__this_cpu_read(xen_vcpu)->time as in below: setup_arch() -> init_hypervisor_platform() -> x86_init.hyper.init_platform = xen_hvm_guest_init() -> xen_hvm_init_time_ops() -> xen_clocksource_read() -> src = &__this_cpu_read(xen_vcpu)->time; This is because Xen HVM supports at most MAX_VIRT_CPUS=32 'vcpu_info' embedded inside 'shared_info' during early stage until xen_vcpu_setup() is used to allocate/relocate 'vcpu_info' for boot cpu at arbitrary address. However, when Xen HVM guest panic on vcpu >= 32, since xen_vcpu_info_reset(0) would set per_cpu(xen_vcpu, cpu) = NULL when vcpu >= 32, xen_clocksource_read() on vcpu >= 32 would panic. This patch calls xen_hvm_init_time_ops() again later in xen_hvm_smp_prepare_boot_cpu() after the 'vcpu_info' for boot vcpu is registered when the boot vcpu is >= 32. This issue can be reproduced on purpose via below command at the guest side when kdump/kexec is enabled: "taskset -c 33 echo c > /proc/sysrq-trigger" The bugfix for PVM is not implemented due to the lack of testing environment. Cc: Joe Jin <joe.jin@oracle.com> Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> --- Changed since v1: - Add commit message to explain why xen_hvm_init_time_ops() is delayed for any vcpus. (Suggested by Boris Ostrovsky) - Add a comment in xen_hvm_smp_prepare_boot_cpu() referencing the related code in xen_hvm_guest_init(). (suggested by Juergen Gross) Changed since v2: - Delay for all VCPUs. (Suggested by Boris Ostrovsky) - Add commit message that why PVM is not supported by this patch - Test if kexec/kdump works with mainline xen (HVM and PVM) Changed since v3: - Re-use v2 but move the login into xen_hvm_init_time_ops() (Suggested by Boris Ostrovsky) arch/x86/xen/smp_hvm.c | 6 ++++++ arch/x86/xen/time.c | 25 ++++++++++++++++++++++++- 2 files changed, 30 insertions(+), 1 deletion(-)