
[v3,2/5] arm64: kvm: allow EL2 context to be reset on shutdown

Message ID 1427953216-11737-3-git-send-email-takahiro.akashi@linaro.org (mailing list archive)
State New, archived

Commit Message

AKASHI Takahiro April 2, 2015, 5:40 a.m. UTC
The current kvm implementation keeps EL2 vector table installed even
when the system is shut down. This prevents kexec from putting the system
with kvm back into EL2 when starting a new kernel.

This patch resolves this issue by calling a cpu tear-down function via
reboot notifier, kvm_reboot_notify(), which is invoked by
kernel_restart_prepare() in kernel_kexec().
While kvm has a generic hook, kvm_reboot(), we can't use it here because
a cpu teardown function will not be invoked, under current implementation,
if no guest vm has been created by kvm_create_vm().
Please note that kvm_usage_count is zero in this case.

We'd better, in the future, implement cpu hotplug support and put the
arch-specific initialization into kvm_arch_hardware_enable/disable().
This way, we would be able to revert this patch.

Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
---
 arch/arm/kvm/arm.c     |   21 +++++++++++++++++++++
 arch/arm64/kvm/Kconfig |    1 -
 2 files changed, 21 insertions(+), 1 deletion(-)

Comments

Mark Rutland April 8, 2015, 1:05 p.m. UTC | #1
On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
> The current kvm implementation keeps EL2 vector table installed even
> when the system is shut down. This prevents kexec from putting the system
> with kvm back into EL2 when starting a new kernel.
> 
> This patch resolves this issue by calling a cpu tear-down function via
> reboot notifier, kvm_reboot_notify(), which is invoked by
> kernel_restart_prepare() in kernel_kexec().
> While kvm has a generic hook, kvm_reboot(), we can't use it here because
> a cpu teardown function will not be invoked, under current implementation,
> if no guest vm has been created by kvm_create_vm().
> Please note that kvm_usage_count is zero in this case.
> 
> We'd better, in the future, implement cpu hotplug support and put the
> arch-specific initialization into kvm_arch_hardware_enable/disable().
> This way, we would be able to revert this patch.

Why can't we use kvm_arch_hardware_enable/disable() currently?

> 
> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
> ---
>  arch/arm/kvm/arm.c     |   21 +++++++++++++++++++++
>  arch/arm64/kvm/Kconfig |    1 -
>  2 files changed, 21 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
> index 39df694..f64713e 100644
> --- a/arch/arm/kvm/arm.c
> +++ b/arch/arm/kvm/arm.c
> @@ -25,6 +25,7 @@
>  #include <linux/vmalloc.h>
>  #include <linux/fs.h>
>  #include <linux/mman.h>
> +#include <linux/reboot.h>
>  #include <linux/sched.h>
>  #include <linux/kvm.h>
>  #include <trace/events/kvm.h>
> @@ -1100,6 +1101,23 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
>  	return NULL;
>  }
>  
> +static int kvm_reboot_notify(struct notifier_block *nb,
> +			     unsigned long val, void *v)
> +{
> +	/*
> +	 * Reset each CPU in EL2 to initial state.
> +	 */
> +	on_each_cpu(kvm_cpu_reset, NULL, 1);
> +
> +	return NOTIFY_DONE;
> +}
> +
> +static struct notifier_block kvm_reboot_nb = {
> +	.notifier_call		= kvm_reboot_notify,
> +	.next			= NULL,
> +	.priority		= 0, /* FIXME */

It would be helpful for the comment to explain why this is wrong, and
what needs fixing.

Mark.
AKASHI Takahiro April 9, 2015, 4:53 a.m. UTC | #2
Mark,

On 04/08/2015 10:05 PM, Mark Rutland wrote:
> On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
>> The current kvm implementation keeps EL2 vector table installed even
>> when the system is shut down. This prevents kexec from putting the system
>> with kvm back into EL2 when starting a new kernel.
>>
>> This patch resolves this issue by calling a cpu tear-down function via
>> reboot notifier, kvm_reboot_notify(), which is invoked by
>> kernel_restart_prepare() in kernel_kexec().
>> While kvm has a generic hook, kvm_reboot(), we can't use it here because
>> a cpu teardown function will not be invoked, under current implementation,
>> if no guest vm has been created by kvm_create_vm().
>> Please note that kvm_usage_count is zero in this case.
>>
>> We'd better, in the future, implement cpu hotplug support and put the
>> arch-specific initialization into kvm_arch_hardware_enable/disable().
>> This way, we would be able to revert this patch.
>
> Why can't we use kvm_arch_hardware_enable/disable() currently?

IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being
created *and* cpus have not been initialized yet. kvm_usage_count==0
indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever
a guest is being terminated (i.e. kvm_usage_count != 0).
Therefore if kvm_arch_hardware_enable/disable() also handle EL2 vector table
initialization, we don't have to have any particular operations, as my patch
does, for kexec case.
(a long-term solution)
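
To make the difference between the two schemes concrete, here is a minimal user-space sketch. It is only a model, not kernel code: struct kvm_model and the model_*() helpers are hypothetical names invented for illustration, and it assumes that the generic reboot hook acts only on CPUs that generic kvm enabled itself (i.e. after the first guest was created).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model state; none of these names exist in the kernel. */
struct kvm_model {
	int usage_count;          /* stands in for kvm_usage_count */
	bool hw_enabled;          /* generic kvm has called the enable hook */
	bool el2_installed;       /* EL2 vector table currently installed */
	bool init_in_enable_hook; /* true: long-term scheme, false: current code */
};

static void model_boot(struct kvm_model *m, bool init_in_enable_hook)
{
	m->usage_count = 0;
	m->hw_enabled = false;
	m->init_in_enable_hook = init_in_enable_hook;
	/* The current code installs the EL2 vectors once, at init time. */
	m->el2_installed = !init_in_enable_hook;
}

static void model_create_vm(struct kvm_model *m)
{
	/* Only the first guest triggers the enable hook. */
	if (++m->usage_count == 1) {
		m->hw_enabled = true;
		if (m->init_in_enable_hook)
			m->el2_installed = true;
	}
}

static void model_reboot(struct kvm_model *m)
{
	/* Assumed: the generic hook does nothing if it never enabled the CPUs. */
	if (!m->hw_enabled)
		return;
	m->hw_enabled = false;
	if (m->init_in_enable_hook)
		m->el2_installed = false;
}

/* Reports whether EL2 vectors are still installed when kexec would run. */
static bool el2_installed_at_kexec(bool init_in_enable_hook, int guests)
{
	struct kvm_model m;

	model_boot(&m, init_in_enable_hook);
	for (int i = 0; i < guests; i++)
		model_create_vm(&m);
	model_reboot(&m);
	return m.el2_installed;
}
```

In this model, el2_installed_at_kexec(false, 0) reports that the vectors are still installed (the case this patch addresses), while el2_installed_at_kexec(true, 0) reports a clean EL2 without any extra reboot notifier.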

Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why),
I'm trying to fix the problem by adding a minimum tear-down function, kvm_cpu_reset,
and invoking it via a reboot hook.
(an interim fix)

I heard that this scheme of an interim fix and a long-term solution was agreed on
by Marc and Geoff at LCU14. I just followed it.

Is this clear?

>>
>> Signed-off-by: AKASHI Takahiro <takahiro.akashi@linaro.org>
>> ---
>>   arch/arm/kvm/arm.c     |   21 +++++++++++++++++++++
>>   arch/arm64/kvm/Kconfig |    1 -
>>   2 files changed, 21 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>> index 39df694..f64713e 100644
>> --- a/arch/arm/kvm/arm.c
>> +++ b/arch/arm/kvm/arm.c
>> @@ -25,6 +25,7 @@
>>   #include <linux/vmalloc.h>
>>   #include <linux/fs.h>
>>   #include <linux/mman.h>
>> +#include <linux/reboot.h>
>>   #include <linux/sched.h>
>>   #include <linux/kvm.h>
>>   #include <trace/events/kvm.h>
>> @@ -1100,6 +1101,23 @@ struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
>>   	return NULL;
>>   }
>>
>> +static int kvm_reboot_notify(struct notifier_block *nb,
>> +			     unsigned long val, void *v)
>> +{
>> +	/*
>> +	 * Reset each CPU in EL2 to initial state.
>> +	 */
>> +	on_each_cpu(kvm_cpu_reset, NULL, 1);
>> +
>> +	return NOTIFY_DONE;
>> +}
>> +
>> +static struct notifier_block kvm_reboot_nb = {
>> +	.notifier_call		= kvm_reboot_notify,
>> +	.next			= NULL,
>> +	.priority		= 0, /* FIXME */
>
> It would be helpful for the comment to explain why this is wrong, and
> what needs fixing.

Thanks for reminding me of this.

*priority* enforces a calling order of registered hook functions.
If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
(Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
notifier_call_chain().)

So we should make sure that kvm_reboot_notify() be called
1) after any hook functions which may depend on kvm, and
2) before any hook functions which kvm may depend on, and
3) before any hook functions that may return NOTIFY_STOP_MASK

But how can we guarantee this and determine a priority of kvm_reboot_notify()?
Looking into all the occurrences of register_reboot_notifier(),
1) => nothing
2) => virt/kvm/kvm_main.c (priority: 0)
3) => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0)
       drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)

So a priority higher than zero might be safe and better, but exactly what?
Some hooks use "INT_MAX."
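
The ordering concern can be illustrated with a small user-space sketch of a priority-sorted notifier chain. This is a simplified model under stated assumptions, not the kernel implementation: chain_register(), chain_call() and the demo_*() hooks are hypothetical names, the constants are redefined locally, and the chain is assumed to be sorted by descending priority with equal priorities kept in registration order.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-ins for the kernel's notifier return values. */
#define NOTIFY_DONE      0x0000
#define NOTIFY_STOP_MASK 0x8000

struct notifier_block {
	int (*notifier_call)(struct notifier_block *nb,
			     unsigned long val, void *v);
	struct notifier_block *next;
	int priority;
};

/* Insert n before the first entry with a strictly lower priority. */
static void chain_register(struct notifier_block **head,
			   struct notifier_block *n)
{
	while (*head != NULL) {
		if (n->priority > (*head)->priority)
			break;
		head = &(*head)->next;
	}
	n->next = *head;
	*head = n;
}

/* Call hooks in chain order; stop once one sets NOTIFY_STOP_MASK. */
static int chain_call(struct notifier_block *nb, unsigned long val, void *v)
{
	int ret = NOTIFY_DONE;

	while (nb != NULL) {
		struct notifier_block *next = nb->next;

		ret = nb->notifier_call(nb, val, v);
		if (ret & NOTIFY_STOP_MASK)
			break;
		nb = next;
	}
	return ret;
}

static int demo_kvm_calls;

/* Hypothetical stand-in for kvm_reboot_notify(). */
static int demo_kvm_notify(struct notifier_block *nb,
			   unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	demo_kvm_calls++;
	return NOTIFY_DONE;
}

/* Hypothetical hook that stops the chain, registered at priority 0. */
static int demo_stopper_notify(struct notifier_block *nb,
			       unsigned long val, void *v)
{
	(void)nb; (void)val; (void)v;
	return NOTIFY_STOP_MASK;
}

/* Returns how often the kvm hook ran for a given kvm priority. */
static int demo_kvm_hook_runs(int kvm_priority)
{
	struct notifier_block stopper = { demo_stopper_notify, NULL, 0 };
	struct notifier_block kvm = { demo_kvm_notify, NULL, kvm_priority };
	struct notifier_block *head = NULL;

	demo_kvm_calls = 0;
	chain_register(&head, &stopper);	/* registered first */
	chain_register(&head, &kvm);
	chain_call(head, 0, NULL);
	return demo_kvm_calls;
}
```

In this model, a kvm hook registered at priority 0 after a stopping hook is never called, whereas any priority above 0 places it ahead of the stopper.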

Thanks,
-Takahiro AKASHI

> Mark.
>
Mark Rutland April 9, 2015, 3:02 p.m. UTC | #3
On Thu, Apr 09, 2015 at 05:53:33AM +0100, AKASHI Takahiro wrote:
> Mark,
> 
> On 04/08/2015 10:05 PM, Mark Rutland wrote:
> > On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
> >> The current kvm implementation keeps EL2 vector table installed even
> >> when the system is shut down. This prevents kexec from putting the system
> >> with kvm back into EL2 when starting a new kernel.
> >>
> >> This patch resolves this issue by calling a cpu tear-down function via
> >> reboot notifier, kvm_reboot_notify(), which is invoked by
> >> kernel_restart_prepare() in kernel_kexec().
> >> While kvm has a generic hook, kvm_reboot(), we can't use it here because
> >> a cpu teardown function will not be invoked, under current implementation,
> >> if no guest vm has been created by kvm_create_vm().
> >> Please note that kvm_usage_count is zero in this case.
> >>
> >> We'd better, in the future, implement cpu hotplug support and put the
> >> arch-specific initialization into kvm_arch_hardware_enable/disable().
> >> This way, we would be able to revert this patch.
> >
> > Why can't we use kvm_arch_hardware_enable/disable() currently?
> 
> IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being
> created *and* cpus have not been initialized yet. kvm_usage_count==0
> indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever
> a guest is being terminated (i.e. kvm_usage_count != 0).
> Therefore if kvm_arch_hardware_enable/disable() also handle EL2 vector table
> initialization, we don't have to have any particular operations, as my patch
> does, for kexec case.
> (a long-term solution)
> 
> Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why),
> I'm trying to fix the problem by adding a minimum tear-down function, kvm_cpu_reset,
> and invoking it via a reboot hook.
> (an interim fix)

What I don't understand is why we can't move the init and tear-down
functions into kvm_arch_hardware_enable/disable(). They seem to be for
precisely what you are implementing, with the only difference being the
time that they are called.

Either I'm missing something, or we can simply implement the existing
hooks. I assume I'm missing something.

> >> +static struct notifier_block kvm_reboot_nb = {
> >> +	.notifier_call		= kvm_reboot_notify,
> >> +	.next			= NULL,
> >> +	.priority		= 0, /* FIXME */
> >
> > It would be helpful for the comment to explain why this is wrong, and
> > what needs fixing.
> 
> Thanks for reminding me of this.
> 
> *priority* enforces a calling order of registered hook functions.
> If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
> (Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
> notifier_call_chain().)
> 
> So we should make sure that kvm_reboot_notify() be called
> 1) after any hook functions which may depend on kvm, and

Which hooks depend on KVM?

> 2) before any hook functions which kvm may depend on, and

Which other hooks does KVM depend on?

> 3) before any hook functions that may return NOTIFY_STOP_MASK

I think this would be solved by using kvm_arch_hardware_enable/disable.
As far as I can tell, the VMs would be destroyed earlier (and hence KVM
disabled) before we got to the final teardown.

Thanks,
Mark.
AKASHI Takahiro April 10, 2015, 6:15 a.m. UTC | #4
Mark
Cc: Marc, Geoff

On 04/10/2015 12:02 AM, Mark Rutland wrote:
> On Thu, Apr 09, 2015 at 05:53:33AM +0100, AKASHI Takahiro wrote:
>> Mark,
>>
>> On 04/08/2015 10:05 PM, Mark Rutland wrote:
>>> On Thu, Apr 02, 2015 at 06:40:13AM +0100, AKASHI Takahiro wrote:
>>>> The current kvm implementation keeps EL2 vector table installed even
>>>> when the system is shut down. This prevents kexec from putting the system
>>>> with kvm back into EL2 when starting a new kernel.
>>>>
>>>> This patch resolves this issue by calling a cpu tear-down function via
>>>> reboot notifier, kvm_reboot_notify(), which is invoked by
>>>> kernel_restart_prepare() in kernel_kexec().
>>>> While kvm has a generic hook, kvm_reboot(), we can't use it here because
>>>> a cpu teardown function will not be invoked, under current implementation,
>>>> if no guest vm has been created by kvm_create_vm().
>>>> Please note that kvm_usage_count is zero in this case.
>>>>
>>>> We'd better, in the future, implement cpu hotplug support and put the
>>>> arch-specific initialization into kvm_arch_hardware_enable/disable().
>>>> This way, we would be able to revert this patch.
>>>
>>> Why can't we use kvm_arch_hardware_enable/disable() currently?
>>
>> IIUC, kvm will call kvm_arch_hardware_enable() iff a new guest is being
>> created *and* cpus have not been initialized yet. kvm_usage_count==0
>> indicates this. Similarly, kvm will call kvm_arch_hardware_disable() whenever
>> a guest is being terminated (i.e. kvm_usage_count != 0).
>> Therefore if kvm_arch_hardware_enable/disable() also handle EL2 vector table
>> initialization, we don't have to have any particular operations, as my patch
>> does, for kexec case.
>> (a long-term solution)
>>
>> Since arm64 doesn't implement kvm_arch_hardware_enable() (I don't know why),
>> I'm trying to fix the problem by adding a minimum tear-down function, kvm_cpu_reset,
>> and invoking it via a reboot hook.
>> (an interim fix)
>
> What I don't understand is why we can't move the init and tear-down
> functions into kvm_arch_hardware_enable/disable(). They seem to be for
> precisely what you are implementing, with the only difference being the
> time that they are called.

I don't know either. I just followed the discussions between Marc and Geoff,
and their conclusion. I guessed that *refactoring* might be more complicated than
expected.

FYI, I gave the kvm_arch_hardware_enable() approach a quick try by removing
cpu_init_hyp_mode() from init_hyp_mode() and putting it into kvm_arch_hardware_enable(),
and it seems to work, at least in my environment:
    boot => start a kvm guest => kexec reboot => start a kvm guest

> Either I'm missing something, or we can simply implement the existing
> hooks. I assume I'm missing something.

Marc, Geoff, any comments?


>>>> +static struct notifier_block kvm_reboot_nb = {
>>>> +	.notifier_call		= kvm_reboot_notify,
>>>> +	.next			= NULL,
>>>> +	.priority		= 0, /* FIXME */
>>>
>>> It would be helpful for the comment to explain why this is wrong, and
>>> what needs fixing.
>>
>> Thanks for reminding me of this.
>>
>> *priority* enforces a calling order of registered hook functions.
>> If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
>> (Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
>> notifier_call_chain().)
>>
>> So we should make sure that kvm_reboot_notify() be called
>> 1) after any hook functions which may depend on kvm, and
>
> Which hooks depend on KVM?

I think I answered this question below:
 >> But how can we guarantee this and determine a priority of kvm_reboot_notify()?
 >> Looking into all the occurrences of register_reboot_notifier(),
 >> 1) => nothing
 >> 2) => virt/kvm/kvm_main.c (priority: 0)
 >> 3) => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0)
 >>        drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)
 >>
 >> So a priority higher than zero might be safe and better, but exactly what?
 >> Some hooks use "INT_MAX."

Thanks,
-Takahiro AKASHI

>> 2) before any hook functions which kvm may depend on, and
>
> Which other hooks does KVM depend on?
>
>> 3) before any hook functions that may return NOTIFY_STOP_MASK
>
> I think this would be solved by using kvm_arch_hardware_enable/disable.
> As far as I can tell, the VMs would be destroyed earlier (and hence KVM
> disabled) before we got to the final teardown.
>
> Thanks,
> Mark.
>
Mark Rutland April 15, 2015, 12:49 p.m. UTC | #5
> > What I don't understand is why we can't move the init and tear-down
> > functions into kvm_arch_hardware_enable/disable(). They seem to be for
> > precisely what you are implementing, with the only difference being the
> > time that they are called.
> 
> I don't know either. I just followed the discussions between Marc and Geoff,
> and their conclusion. I guessed that *refactoring* might be more complicated than
> expected.
> 
> FYI, I gave the kvm_arch_hardware_enable() approach a quick try by removing
> cpu_init_hyp_mode() from init_hyp_mode() and putting it into kvm_arch_hardware_enable(),
> and it seems to work, at least in my environment:
>     boot => start a kvm guest => kexec reboot => start a kvm guest

That sounds pretty convincing to me, assuming you wired the teardown
into kvm_arch_hardware_disable()?

> > Either I'm missing something, or we can simply implement the existing
> > hooks. I assume I'm missing something.
> 
> Marc, Geoff, any comments?
> 
> 
> >>>> +static struct notifier_block kvm_reboot_nb = {
> >>>> +	.notifier_call		= kvm_reboot_notify,
> >>>> +	.next			= NULL,
> >>>> +	.priority		= 0, /* FIXME */
> >>>
> >>> It would be helpful for the comment to explain why this is wrong, and
> >>> what needs fixing.
> >>
> >> Thanks for reminding me of this.
> >>
> >> *priority* enforces a calling order of registered hook functions.
> >> If some hook returns NOTIFY_STOP_MASK, subsequent hooks won't be called.
> >> (Nevertheless, reboot sequence will go ahead. See kernel_restart_prepare()/
> >> notifier_call_chain().)
> >>
> >> So we should make sure that kvm_reboot_notify() be called
> >> 1) after any hook functions which may depend on kvm, and
> >
> > Which hooks depend on KVM?
> 
> I think I answered this question below:
>  >> But how can we guarantee this and determine a priority of kvm_reboot_notify()?
>  >> Looking into all the occurrences of register_reboot_notifier(),
>  >> 1) => nothing
>  >> 2) => virt/kvm/kvm_main.c (priority: 0)
>  >> 3) => drivers/cpufreq/s3c2416-cpufreq.c (priority: 0)
>  >>        drivers/cpufreq/s5pv210-cpufreq.c (priority: 0)
>  >>
>  >> So a priority higher than zero might be safe and better, but exactly what?
>  >> Some hooks use "INT_MAX."

I can't see anything listed which has a dependency on KVM.

The KVM notifier would be superseded by
kvm_arch_hardware_{disable,enable}, and the cpufreq instances don't seem
to have any relationship to KVM.

Other architectures use kvm_arch_hardware_{enable,disable}(), so I
imagine the core KVM code has no problem with the ordering of these.

Mark.

Patch

diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
index 39df694..f64713e 100644
--- a/arch/arm/kvm/arm.c
+++ b/arch/arm/kvm/arm.c
@@ -25,6 +25,7 @@ 
 #include <linux/vmalloc.h>
 #include <linux/fs.h>
 #include <linux/mman.h>
+#include <linux/reboot.h>
 #include <linux/sched.h>
 #include <linux/kvm.h>
 #include <trace/events/kvm.h>
@@ -1100,6 +1101,23 @@  struct kvm_vcpu *kvm_mpidr_to_vcpu(struct kvm *kvm, unsigned long mpidr)
 	return NULL;
 }
 
+static int kvm_reboot_notify(struct notifier_block *nb,
+			     unsigned long val, void *v)
+{
+	/*
+	 * Reset each CPU in EL2 to initial state.
+	 */
+	on_each_cpu(kvm_cpu_reset, NULL, 1);
+
+	return NOTIFY_DONE;
+}
+
+static struct notifier_block kvm_reboot_nb = {
+	.notifier_call		= kvm_reboot_notify,
+	.next			= NULL,
+	.priority		= 0, /* FIXME */
+};
+
 /**
  * Initialize Hyp-mode and memory mappings on all CPUs.
  */
@@ -1138,6 +1156,9 @@  int kvm_arch_init(void *opaque)
 	hyp_cpu_pm_init();
 
 	kvm_coproc_table_init();
+
+	register_reboot_notifier(&kvm_reboot_nb);
+
 	return 0;
 out_err:
 	cpu_notifier_register_done();
diff --git a/arch/arm64/kvm/Kconfig b/arch/arm64/kvm/Kconfig
index 30ae7a7..f5590c8 100644
--- a/arch/arm64/kvm/Kconfig
+++ b/arch/arm64/kvm/Kconfig
@@ -18,7 +18,6 @@  if VIRTUALIZATION
 
 config KVM
 	bool "Kernel-based Virtual Machine (KVM) support"
-	depends on !KEXEC
 	select MMU_NOTIFIER
 	select PREEMPT_NOTIFIERS
 	select ANON_INODES