Message ID | 1427953216-11737-3-git-send-email-takahiro.akashi@linaro.org (mailing list archive) |
---|---|
State | New, archived |
On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
> The current kvm implementation keeps EL2 vector table installed even
> when the system is shut down. This prevents kexec from putting the system
> with kvm back into EL2 when starting a new kernel.
>
> This patch resolves this issue by calling a cpu tear-down function via
> reboot notifier, kvm_reboot_notify(), which is invoked by
> kernel_restart_prepare() in kernel_kexec().
> While kvm has a generic hook, kvm_reboot(), we can't use it here because
> a cpu teardown function will not be invoked, under current implementation,
> if no guest vm has been created by kvm_create_vm().
> Please note that kvm_usage_count is zero in this case.
>
> We'd better, in the future, implement cpu hotplug support and put the
> arch-specific initialization into kvm_arch_hardware_enable/disable().
> This way, we would be able to revert this patch.

Why can't we use kvm_arch_hardware_enable/disable() currently?

>
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  arch/arm/kvm/arm.c     | 21 +++++++++++++++++++++
>  arch/arm64/kvm/Kconfig |  1 -
>  2 files changed, 21 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 39df694..f64713e 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -25,6 +25,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/fs.h>
>  #include <linux/mman.h>
> +#include <linux/reboot.h>
>  #include <linux/sched.h>
>  #include <linux/kvm.h>
>  #include <trace/events/kvm.h>
> @@ -1100,6 +1101,23 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
>  	return NULL;
>  }
>
> +static int kvm_reboot_notify(struct notifier_block *nb,
> +			     unsigned long val, void *v)
> +{
> +	/*
> +	 * Reset each CPU in EL2 to initial state.
> +	 */
> +	on_each_cpu(kvm_cpu_reset, NULL, 1);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block kvm_reboot_nb = {
> +	.notifier_call = kvm_reboot_notify,
> +	.next = NULL,
> +	.priority = 0, /* FIXME */

It would be helpful for the comment to explain why this is wrong, and
what needs fixing.

Mark.

Mark,

On 04/08/2015 10:05 PM, Mark Rutland wrote:
> On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
>> The current kvm implementation keeps EL2 vector table installed even
>> when the system is shut down. This prevents kexec from putting the system
>> with kvm back into EL2 when starting a new kernel.
>>
>> This patch resolves this issue by calling a cpu tear-down function via
>> reboot notifier, kvm_reboot_notify(), which is invoked by
>> kernel_restart_prepare() in kernel_kexec().
>> While kvm has a generic hook, kvm_reboot(), we can't use it here because
>> a cpu teardown function will not be invoked, under current implementation,
>> if no guest vm has been created by kvm_create_vm().
>> Please note that kvm_usage_count is zero in this case.
>>
>> We'd better, in the future, implement cpu hotplug support and put the
>> arch-specific initialization into kvm_arch_hardware_enable/disable().
>> This way, we would be able to revert this patch.
>
> Why can't we use kvm_arch_hardware_enable/disable() currently?

IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being
created *and* cpus have not been initialized yet. kvm_usage_count==0
indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever
a guest is being terminated (i.e. kvm_usage_count != 0).
Therefore if kvm_arch_hardware_enable/disable() also handle EL2 vector table
initialization, we don't have to have any particular operations, as my patch
does, for kexec case.
(a long-term solution)

Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why),
I'm trying to fix the problem by adding a minimum tear-down function, kvm_cpu_reset,
and invoking it via a reboot hook.
(an interim fix)

This scheme of an interim fix and a long-term solution, I heard, has been
agreed by Marc and Geoff in LCU14. I just followed it.
Is this clear?

>>
>> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> ---
>>  arch/arm/kvm/arm.c     | 21 +++++++++++++++++++++
>>  arch/arm64/kvm/Kconfig |  1 -
>>  2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 39df694..f64713e 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -25,6 +25,7 @@
>>  #include <linux/vmalloc.h>
>>  #include <linux/fs.h>
>>  #include <linux/mman.h>
>> +#include <linux/reboot.h>
>>  #include <linux/sched.h>
>>  #include <linux/kvm.h>
>>  #include <trace/events/kvm.h>
>> @@ -1100,6 +1101,23 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
>>  	return NULL;
>>  }
>>
>> +static int kvm_reboot_notify(struct notifier_block *nb,
>> +			     unsigned long val, void *v)
>> +{
>> +	/*
>> +	 * Reset each CPU in EL2 to initial state.
>> +	 */
>> +	on_each_cpu(kvm_cpu_reset, NULL, 1);
>> +
>> +	return NOTIFY_DONE;
>> +}
>> +
>> +static struct notifier_block kvm_reboot_nb = {
>> +	.notifier_call = kvm_reboot_notify,
>> +	.next = NULL,
>> +	.priority = 0, /* FIXME */
>
> It would be helpful for the comment to explain why this is wrong, and
> what needs fixing.

Thanks for reminding me of this.

*priority* enforces a calling order of registered hook functions.
If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
(Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
notifier_call_chain().)

So we should make sure that kvm_reboot_notify() be called
  1) after any hook functions which may depend on kvm, and
  2) before any hook functions which kvm may depend on, and
  3) before any hook functions that may return NOTIFY_STOP_MASK

But how can we guarantee this and determine a priority of kvm_reboot_notify()?
Looking into all the occurrences of register_reboot_notifier(),
  1) => nothing
  2) => virt/kvm/kvm_main.c (priority: 0)
  3) => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0)
        drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)

So a priority higher than zero might be safe and better, but exactly what?
Some hooks use "INT_MAX."

Thanks,
-Takahiro AKASHI

> Mark.
>

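The ordering rules described in the mail above can be modelled in a few lines of user-space C. This is only a sketch under stated assumptions, not kernel code: `chain_register()`, `chain_call()`, `stopper_hook()`, `kvm_like_hook()` and `demo_kvm_hook_runs()` are hypothetical names invented for illustration. Only the `notifier_block` layout and the `NOTIFY_DONE` / `NOTIFY_STOP_MASK` values mirror the kernel's definitions. The model assumes the chain is kept sorted by descending priority, with equal priorities called in registration order.

```c
#include <stddef.h>

#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long val, void *v);
	struct notifier_block *next;
	int priority;
};

/* Hypothetical helper: insert nb so the list stays sorted, highest priority first. */
static void chain_register(struct notifier_block **link,
			   struct notifier_block *nb)
{
	/* Skip entries with priority >= ours, so ties keep registration order. */
	while (*link && nb->priority <= (*link)->priority)
		link = &(*link)->next;
	nb->next = *link;
	*link = nb;
}

/* Hypothetical helper: call hooks in list order, stop once a hook sets NOTIFY_STOP_MASK. */
static int chain_call(struct notifier_block *nb, unsigned long val, void *v)
{
	int ret = NOTIFY_DONE;

	while (nb) {
		struct notifier_block *next = nb->next;

		ret = nb->notifier_call(nb, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;
		nb = next;
	}
	return ret;
}

static int kvm_called;

/* Stands in for an unrelated hook that ends the chain early. */
static int stopper_hook(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_STOP_MASK;
}

/* Stands in for kvm_reboot_notify(); it only records that it ran. */
static int kvm_like_hook(struct notifier_block *nb, unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	kvm_called = 1;
	return NOTIFY_DONE;
}

/* Returns 1 if the kvm-like hook ran when registered after a priority-0 stopper. */
int demo_kvm_hook_runs(int kvm_priority)
{
	struct notifier_block *head = NULL;
	struct notifier_block stopper = {
		.notifier_call = stopper_hook, .next = NULL, .priority = 0,
	};
	struct notifier_block kvm = {
		.notifier_call = kvm_like_hook, .next = NULL,
		.priority = kvm_priority,
	};

	kvm_called = 0;
	chain_register(&head, &stopper);
	chain_register(&head, &kvm);
	chain_call(head, 0, NULL);
	return kvm_called;
}
```

In this model, a kvm-like hook registered at priority 0 after a stopping hook of the same priority is never reached, while the same hook at priority 1 runs first. That is the situation behind the suggestion that a priority higher than zero might be safer.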
On Thu, Apr 09, 2015 at 05:53:33AM +0100, AKASHI Takahiro wrote:
> Mark,
>
> On 04/08/2015 10:05 PM, Mark Rutland wrote:
> > On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
> >> The current kvm implementation keeps EL2 vector table installed even
> >> when the system is shut down. This prevents kexec from putting the system
> >> with kvm back into EL2 when starting a new kernel.
> >>
> >> This patch resolves this issue by calling a cpu tear-down function via
> >> reboot notifier, kvm_reboot_notify(), which is invoked by
> >> kernel_restart_prepare() in kernel_kexec().
> >> While kvm has a generic hook, kvm_reboot(), we can't use it here because
> >> a cpu teardown function will not be invoked, under current implementation,
> >> if no guest vm has been created by kvm_create_vm().
> >> Please note that kvm_usage_count is zero in this case.
> >>
> >> We'd better, in the future, implement cpu hotplug support and put the
> >> arch-specific initialization into kvm_arch_hardware_enable/disable().
> >> This way, we would be able to revert this patch.
> >
> > Why can't we use kvm_arch_hardware_enable/disable() currently?
>
> IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being
> created *and* cpus have not been initialized yet. kvm_usage_count==0
> indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever
> a guest is being terminated (i.e. kvm_usage_count != 0).
> Therefore if kvm_arch_hardware_enable/disable() also handle EL2 vector table
> initialization, we don't have to have any particular operations, as my patch
> does, for kexec case.
> (a long-term solution)
>
> Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why),
> I'm trying to fix the problem by adding a minimum tear-down function, kvm_cpu_reset,
> and invoking it via a reboot hook.
> (an interim fix)

What I don't understand is why we can't move the init and tear-down
functions into kvm_arch_hardware_enable/disable(). They seem to be for
precisely what you are implementing, with the only difference being the
time that they are called.

Either I'm missing something, or we can simply implement the existing
hooks. I assume I'm missing something.

> >> +static struct notifier_block kvm_reboot_nb = {
> >> +	.notifier_call = kvm_reboot_notify,
> >> +	.next = NULL,
> >> +	.priority = 0, /* FIXME */
> >
> > It would be helpful for the comment to explain why this is wrong, and
> > what needs fixing.
>
> Thanks for reminding me of this.
>
> *priority* enforces a calling order of registered hook functions.
> If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
> (Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
> notifier_call_chain().)
>
> So we should make sure that kvm_reboot_notify() be called
>   1) after any hook functions which may depend on kvm, and

Which hooks depend on KVM?

>   2) before any hook functions which kvm may depend on, and

Which other hooks does KVM depend on?

>   3) before any hook functions that may return NOTIFY_STOP_MASK

I think this would be solved by using kvm_arch_hardware_enable/disable.
As far as I can tell, the VMs would be destroyed earlier (and hence KVM
disabled) before we got to the final teardown.

Thanks,
Mark.

Mark
Cc: Marc, Geoff

On 04/10/2015 12:02 AM, Mark Rutland wrote:
> On Thu, Apr 09, 2015 at 05:53:33AM +0100, AKASHI Takahiro wrote:
>> Mark,
>>
>> On 04/08/2015 10:05 PM, Mark Rutland wrote:
>>> On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
>>>> The current kvm implementation keeps EL2 vector table installed even
>>>> when the system is shut down. This prevents kexec from putting the system
>>>> with kvm back into EL2 when starting a new kernel.
>>>>
>>>> This patch resolves this issue by calling a cpu tear-down function via
>>>> reboot notifier, kvm_reboot_notify(), which is invoked by
>>>> kernel_restart_prepare() in kernel_kexec().
>>>> While kvm has a generic hook, kvm_reboot(), we can't use it here because
>>>> a cpu teardown function will not be invoked, under current implementation,
>>>> if no guest vm has been created by kvm_create_vm().
>>>> Please note that kvm_usage_count is zero in this case.
>>>>
>>>> We'd better, in the future, implement cpu hotplug support and put the
>>>> arch-specific initialization into kvm_arch_hardware_enable/disable().
>>>> This way, we would be able to revert this patch.
>>>
>>> Why can't we use kvm_arch_hardware_enable/disable() currently?
>>
>> IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being
>> created *and* cpus have not been initialized yet. kvm_usage_count==0
>> indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever
>> a guest is being terminated (i.e. kvm_usage_count != 0).
>> Therefore if kvm_arch_hardware_enable/disable() also handle EL2 vector table
>> initialization, we don't have to have any particular operations, as my patch
>> does, for kexec case.
>> (a long-term solution)
>>
>> Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why),
>> I'm trying to fix the problem by adding a minimum tear-down function, kvm_cpu_reset,
>> and invoking it via a reboot hook.
>> (an interim fix)
>
> What I don't understand is why we can't move the init and tear-down
> functions into kvm_arch_hardware_enable/disable(). They seem to be for
> precisely what you are implementing, with the only difference being the
> time that they are called.

I don't know either. I just followed the discussions between Marc and Geoff,
and their conclusion. I guessed that *refactoring* might be more complicated than
expected.

FYI, I gave a quick try to kvm_arch_hardware_enable() approach by removing
cpu_init_hyp_mode() from init_hyp_mode() and putting it into kvm_arch_hardware_enable(),
and it seems to work, at least, in my environment:
    boot => start a kvm guest => kexec reboot => start a kvm guest

> Either I'm missing something, or we can simply implement the existing
> hooks. I assume I'm missing something.

Marc, Geoff, any comments?

>>>> +static struct notifier_block kvm_reboot_nb = {
>>>> +	.notifier_call = kvm_reboot_notify,
>>>> +	.next = NULL,
>>>> +	.priority = 0, /* FIXME */
>>>
>>> It would be helpful for the comment to explain why this is wrong, and
>>> what needs fixing.
>>
>> Thanks for reminding me of this.
>>
>> *priority* enforces a calling order of registered hook functions.
>> If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
>> (Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
>> notifier_call_chain().)
>>
>> So we should make sure that kvm_reboot_notify() be called
>>   1) after any hook functions which may depend on kvm, and
>
> Which hooks depend on KVM?

I think I answered this question below:

>> But how can we guarantee this and determine a priority of kvm_reboot_notify()?
>> Looking into all the occurrences of register_reboot_notifier(),
>>   1) => nothing
>>   2) => virt/kvm/kvm_main.c (priority: 0)
>>   3) => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0)
>>         drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)
>>
>> So a priority higher than zero might be safe and better, but exactly what?
>> Some hooks use "INT_MAX."

Thanks,
-Takahiro AKASHI

>>   2) before any hook functions which kvm may depend on, and
>
> Which other hooks does KVM depend on?
>
>>   3) before any hook functions that may return NOTIFY_STOP_MASK
>
> I think this would be solved by using kvm_arch_hardware_enable/disable.
> As far as I can tell, the VMs would be destroyed earlier (and hence KVM
> disabled) before we got to the final teardown.
>
> Thanks,
> Mark.
>

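The two schemes discussed in this thread (EL2 setup done once at init time versus setup and teardown done in the hardware enable/disable hooks) can be compared with a small user-space model. This is a simplified sketch, not the kernel's code: every name below (`struct kvm_model`, `model_arch_init()`, `model_hw_enable()`, `model_hw_disable()`, `model_create_vm()`, `model_generic_reboot()`, `el2_left_installed_at_reboot()`) is hypothetical. It models a single CPU, and it assumes the generic reboot path only disables hardware that was enabled because at least one guest exists, as the patch description states.

```c
#include <stdbool.h>

struct kvm_model {
	bool init_in_hw_enable; /* true: setup/teardown live in the enable/disable hooks */
	int usage_count;        /* stands in for kvm_usage_count */
	bool el2_installed;     /* "EL2 vector table installed" on the one modelled CPU */
};

/* Init-time setup: installs the EL2 vectors only in the init-time scheme. */
static void model_arch_init(struct kvm_model *m)
{
	if (!m->init_in_hw_enable)
		m->el2_installed = true;
}

/* Enable hook: does real work only in the hook-based scheme. */
static void model_hw_enable(struct kvm_model *m)
{
	if (m->init_in_hw_enable)
		m->el2_installed = true;
}

/* Disable hook: does real work only in the hook-based scheme. */
static void model_hw_disable(struct kvm_model *m)
{
	if (m->init_in_hw_enable)
		m->el2_installed = false;
}

/* The first guest takes the usage count from 0 to 1 and triggers the enable hook. */
static void model_create_vm(struct kvm_model *m)
{
	if (++m->usage_count == 1)
		model_hw_enable(m);
}

/* Generic reboot path: tears down only what was enabled for a guest. */
static void model_generic_reboot(struct kvm_model *m)
{
	if (m->usage_count > 0)
		model_hw_disable(m);
}

/* Returns true if the EL2 vectors are still installed after the generic reboot path. */
bool el2_left_installed_at_reboot(bool init_in_hw_enable, int live_vms)
{
	struct kvm_model m = {
		.init_in_hw_enable = init_in_hw_enable,
		.usage_count = 0,
		.el2_installed = false,
	};
	int i;

	model_arch_init(&m);
	for (i = 0; i < live_vms; i++)
		model_create_vm(&m);
	model_generic_reboot(&m);
	return m.el2_installed;
}
```

Under these assumptions the init-time scheme leaves the vectors installed at reboot whether or not a guest was ever created, which is the case the reboot notifier in the patch is meant to cover. The hook-based scheme leaves nothing installed in either case.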
> > What I don't understand is why we can't move the init and tear-down
> > functions into kvm_arch_hardware_enable/disable(). They seem to be for
> > precisely what you are implementing, with the only difference being the
> > time that they are called.
>
> I don't know either. I just followed the discussions between Marc and Geoff,
> and their conclusion. I guessed that *refactoring* might be more complicated than
> expected.
>
> FYI, I gave a quick try to kvm_arch_hardware_enable() approach by removing
> cpu_init_hyp_mode() from init_hyp_mode() and putting it into kvm_arch_hardware_enable(),
> and it seems to work, at least, in my environment:
>     boot => start a kvm guest => kexec reboot => start a kvm guest

That sounds pretty convincing to me, assuming you wired the teardown
into kvm_arch_hardware_disable()?

> > Either I'm missing something, or we can simply implement the existing
> > hooks. I assume I'm missing something.
>
> Marc, Geoff, any comments?
>
>
> >>>> +static struct notifier_block kvm_reboot_nb = {
> >>>> +	.notifier_call = kvm_reboot_notify,
> >>>> +	.next = NULL,
> >>>> +	.priority = 0, /* FIXME */
> >>>
> >>> It would be helpful for the comment to explain why this is wrong, and
> >>> what needs fixing.
> >>
> >> Thanks for reminding me of this.
> >>
> >> *priority* enforces a calling order of registered hook functions.
> >> If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
> >> (Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
> >> notifier_call_chain().)
> >>
> >> So we should make sure that kvm_reboot_notify() be called
> >>   1) after any hook functions which may depend on kvm, and
> >
> > Which hooks depend on KVM?
>
> I think I answered this question below:
>
> >> But how can we guarantee this and determine a priority of kvm_reboot_notify()?
> >> Looking into all the occurrences of register_reboot_notifier(),
> >>   1) => nothing
> >>   2) => virt/kvm/kvm_main.c (priority: 0)
> >>   3) => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0)
> >>         drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)
> >>
> >> So a priority higher than zero might be safe and better, but exactly what?
> >> Some hooks use "INT_MAX."

I can't see anything listed which has a dependency on KVM. The KVM
notifier would be superseded by kvm_arch_hardware_{disable,enable}, and
the cpufreq instances don't seem to have any relationship to KVM.

Other architectures use kvm_arch_hardware_{enable,disable}(), so I
imagine the core KVM code has no problem with the ordering of these.

Mark.

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 39df694..f64713e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -25,6 +25,7 @@
 #include <linux/vmalloc.h>
 #include <linux/fs.h>
 #include <linux/mman.h>
+#include <linux/reboot.h>
 #include <linux/sched.h>
 #include <linux/kvm.h>
 #include <trace/events/kvm.h>
@@ -1100,6 +1101,23 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 	return NULL;
 }
 
+static int kvm_reboot_notify(struct notifier_block *nb,
+			     unsigned long val, void *v)
+{
+	/*
+	 * Reset each CPU in EL2 to initial state.
+	 */
+	on_each_cpu(kvm_cpu_reset, NULL, 1);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_reboot_nb = {
+	.notifier_call = kvm_reboot_notify,
+	.next = NULL,
+	.priority = 0, /* FIXME */
+};
+
 /**
  * Initialize Hyp-mode and memory mappings on all CPUs.
  */
@@ -1138,6 +1156,9 @@ int kvm_arch_init(void *opaque)
 	hyp_cpu_pm_init();
 	kvm_coproc_table_init();
+
+	register_reboot_notifier(&kvm_reboot_nb);
+
+	return 0;
 out_err:
 	cpu_notifier_register_done();
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 30ae7a7..f5590c8 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -18,7 +18,6 @@ if VIRTUALIZATION
 config KVM
 	bool "Kernel-based Virtual Machine (KVM) support"
-	depends on !KEXEC
 	select MMU_NOTIFIER
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES

The current kvm implementation keeps EL2 vector table installed even
when the system is shut down. This prevents kexec from putting the system
with kvm back into EL2 when starting a new kernel.

This patch resolves this issue by calling a cpu tear-down function via
reboot notifier, kvm_reboot_notify(), which is invoked by
kernel_restart_prepare() in kernel_kexec().
While kvm has a generic hook, kvm_reboot(), we can't use it here because
a cpu teardown function will not be invoked, under current implementation,
if no guest vm has been created by kvm_create_vm().
Please note that kvm_usage_count is zero in this case.

We'd better, in the future, implement cpu hotplug support and put the
arch-specific initialization into kvm_arch_hardware_enable/disable().
This way, we would be able to revert this patch.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm/kvm/arm.c     | 21 +++++++++++++++++++++
 arch/arm64/kvm/Kconfig |  1 -
 2 files changed, 21 insertions(+), 1 deletion(-)