diff mbox series

[RFC,13/16] RISC-V: KVM: Add timer functionality

Message ID 20190729115544.17895-14-anup.patel@wdc.com (mailing list archive)
State New, archived
Headers show
Series KVM RISC-V Support | expand

Commit Message

Anup Patel July 29, 2019, 11:57 a.m. UTC
From: Atish Patra <atish.patra@wdc.com>

The RISC-V hypervisor specification doesn't have any virtual timer
feature.

Due to this, the guest VCPU timer will be programmed via SBI calls.
The host will use a separate hrtimer event for each guest VCPU to
provide timer functionality. We inject a virtual timer interrupt to
the guest VCPU whenever the guest VCPU hrtimer event expires.

The following features are not supported yet and will be added in
future:
1. A time offset to adjust guest time from host time
2. A saved next event in guest vcpu for vm migration

Signed-off-by: Atish Patra <atish.patra@wdc.com>
Signed-off-by: Anup Patel <anup.patel@wdc.com>
---
 arch/riscv/include/asm/kvm_host.h       |   4 +
 arch/riscv/include/asm/kvm_vcpu_timer.h |  32 +++++++
 arch/riscv/kvm/Makefile                 |   2 +-
 arch/riscv/kvm/vcpu.c                   |   6 ++
 arch/riscv/kvm/vcpu_timer.c             | 106 ++++++++++++++++++++++++
 drivers/clocksource/timer-riscv.c       |   6 ++
 include/clocksource/timer-riscv.h       |  14 ++++
 7 files changed, 169 insertions(+), 1 deletion(-)
 create mode 100644 arch/riscv/include/asm/kvm_vcpu_timer.h
 create mode 100644 arch/riscv/kvm/vcpu_timer.c
 create mode 100644 include/clocksource/timer-riscv.h

Comments

Andreas Schwab July 29, 2019, 2:40 p.m. UTC | #1
On Jul 29 2019, Anup Patel <Anup.Patel@wdc.com> wrote:

> From: Atish Patra <atish.patra@wdc.com>
>
> The RISC-V hypervisor specification doesn't have any virtual timer
> feature.
>
> Due to this, the guest VCPU timer will be programmed via SBI calls.
> The host will use a separate hrtimer event for each guest VCPU to
> provide timer functionality. We inject a virtual timer interrupt to
> the guest VCPU whenever the guest VCPU hrtimer event expires.
>
> The following features are not supported yet and will be added in
> future:
> 1. A time offset to adjust guest time from host time
> 2. A saved next event in guest vcpu for vm migration

I'm getting this error:

In file included from <command-line>:
./include/clocksource/timer-riscv.h:12:30: error: unknown type name ‘u32’
   12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
      |                              ^~~
./include/clocksource/timer-riscv.h:12:41: error: unknown type name ‘u32’
   12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
      |                                         ^~~
make[1]: *** [scripts/Makefile.build:301: include/clocksource/timer-riscv.h.s] Error 1

Andreas.
Atish Patra July 29, 2019, 6:02 p.m. UTC | #2
On Mon, 2019-07-29 at 16:40 +0200, Andreas Schwab wrote:
> On Jul 29 2019, Anup Patel <Anup.Patel@wdc.com> wrote:
> 
> > From: Atish Patra <atish.patra@wdc.com>
> > 
> > The RISC-V hypervisor specification doesn't have any virtual timer
> > feature.
> > 
> > Due to this, the guest VCPU timer will be programmed via SBI calls.
> > The host will use a separate hrtimer event for each guest VCPU to
> > provide timer functionality. We inject a virtual timer interrupt to
> > the guest VCPU whenever the guest VCPU hrtimer event expires.
> > 
> > The following features are not supported yet and will be added in
> > future:
> > 1. A time offset to adjust guest time from host time
> > 2. A saved next event in guest vcpu for vm migration
> 
> I'm getting this error:
> 
> In file included from <command-line>:
> ./include/clocksource/timer-riscv.h:12:30: error: unknown type name
> ‘u32’
>    12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
>       |                              ^~~
> ./include/clocksource/timer-riscv.h:12:41: error: unknown type name
> ‘u32’
>    12 | void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
>       |                                         ^~~
> make[1]: *** [scripts/Makefile.build:301: include/clocksource/timer-
> riscv.h.s] Error 1
> 
> Andreas.
> 

Strange. We never saw this error. But I think we should add this one to
the header file (include/clocksource/timer-riscv.h) 

#include <linux/types.h>

Can you try it at your end and confirm please ?

Regards,
Atish
Andreas Schwab July 30, 2019, 6:51 a.m. UTC | #3
On Jul 29 2019, Atish Patra <Atish.Patra@wdc.com> wrote:

> Strange. We never saw this error.

It is part of CONFIG_KERNEL_HEADER_TEST.  Everyone developing a driver
should enable it.

> #include <linux/types.h>
>
> Can you try it at your end and confirm please ?

Confirmed.

Andreas.
Atish Patra July 30, 2019, 7 a.m. UTC | #4
On 7/29/19 11:51 PM, Andreas Schwab wrote:
> On Jul 29 2019, Atish Patra <Atish.Patra@wdc.com> wrote:
> 
>> Strange. We never saw this error.
> 
> It is part of CONFIG_KERNEL_HEADER_TEST.  Everyone developing a driver
> should enable it.
> 
>> #include <linux/types.h>
>>
>> Can you try it at your end and confirm please ?
> 
> Confirmed.
> 

Thanks. I will update the patch in v2.

> Andreas.
>
Paolo Bonzini July 30, 2019, 11:26 a.m. UTC | #5
On 29/07/19 13:57, Anup Patel wrote:
> +	if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
> +		hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(), delta_ns),

I think the guest would prefer if you saved the time before enabling
interrupts on the host, and use that here instead of ktime_get().
Otherwise the timer could be delayed arbitrarily by host interrupts.

(Because the RISC-V SBI timer is relative only---which is
unfortunate---guests will already pay a latency price due to the extra
cost of the SBI call compared to a bare metal implementation.  Sooner or
later you may want to implement something like x86's heuristic to
advance the timer deadline by a few hundred nanoseconds; perhaps add a
TODO now).

Paolo

> +				HRTIMER_MODE_ABS);
> +		t->is_set = true;
> +	} else
> +		kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
> +
Atish Patra July 31, 2019, 1:55 a.m. UTC | #6
On Tue, 2019-07-30 at 13:26 +0200, Paolo Bonzini wrote:
> On 29/07/19 13:57, Anup Patel wrote:
> > +	if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
> > +		hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(),
> > delta_ns),
> 
> I think the guest would prefer if you saved the time before enabling
> interrupts on the host, and use that here instead of ktime_get().
> Otherwise the timer could be delayed arbitrarily by host interrupts.
> 
> (Because the RISC-V SBI timer is relative only---which is
> unfortunate---

Just to clarify: RISC-V SBI timer call passes absolute time.

https://elixir.bootlin.com/linux/v5.3-rc2/source/drivers/clocksource/timer-riscv.c#L32

That's why we compute a delta between absolute time passed via SBI and
current time. hrtimer is programmed to trigger only after the delta
time from now.


> guests will already pay a latency price due to the extra
> cost of the SBI call compared to a bare metal implementation. 

Yes. There are ongoing discussions to remove this SBI call completely. 
Hopefully, that will happen before any real hardware with
virtualization support shows up :).

>  Sooner or
> later you may want to implement something like x86's heuristic to
> advance the timer deadline by a few hundred nanoseconds; perhaps add
> a
> TODO now).
> 

I am not aware of this approach. I will take a look. Thanks.

Regards,
Atish
> Paolo
> 
> > +				HRTIMER_MODE_ABS);
> > +		t->is_set = true;
> > +	} else
> > +		kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
> > +
> 
> _______________________________________________
> linux-riscv mailing list
> linux-riscv@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv
Paolo Bonzini July 31, 2019, 6:58 a.m. UTC | #7
On 31/07/19 03:55, Atish Patra wrote:
> On Tue, 2019-07-30 at 13:26 +0200, Paolo Bonzini wrote:
>> On 29/07/19 13:57, Anup Patel wrote:
>>> +	if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
>>> +		hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(),
>>> delta_ns),
>>
>> I think the guest would prefer if you saved the time before enabling
>> interrupts on the host, and use that here instead of ktime_get().
>> Otherwise the timer could be delayed arbitrarily by host interrupts.
>>
>> (Because the RISC-V SBI timer is relative only---which is
>> unfortunate---
> 
> Just to clarify: RISC-V SBI timer call passes absolute time.
> 
> https://elixir.bootlin.com/linux/v5.3-rc2/source/drivers/clocksource/timer-riscv.c#L32
> 
> That's why we compute a delta between absolute time passed via SBI and
> current time. hrtimer is programmed to trigger only after the delta
> time from now.

Nevermind, I got lost in all the conversions.

One important issue is the lack of ability to program a delta between
HS/HU-mode cycles and VS/VU-mode cycles.  Without this, it's impossible
to do virtual machine migration (except with hcounteren
trap-and-emulate, which I think we agree is not acceptable).  I found
the open issue at https://github.com/riscv/riscv-isa-manual/issues/298
and commented on it.

Paolo
Anup Patel July 31, 2019, 7:18 a.m. UTC | #8
On Wed, Jul 31, 2019 at 12:28 PM Paolo Bonzini <pbonzini@redhat.com> wrote:
>
> On 31/07/19 03:55, Atish Patra wrote:
> > On Tue, 2019-07-30 at 13:26 +0200, Paolo Bonzini wrote:
> >> On 29/07/19 13:57, Anup Patel wrote:
> >>> +   if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
> >>> +           hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(),
> >>> delta_ns),
> >>
> >> I think the guest would prefer if you saved the time before enabling
> >> interrupts on the host, and use that here instead of ktime_get().
> >> Otherwise the timer could be delayed arbitrarily by host interrupts.
> >>
> >> (Because the RISC-V SBI timer is relative only---which is
> >> unfortunate---
> >
> > Just to clarify: RISC-V SBI timer call passes absolute time.
> >
> > https://elixir.bootlin.com/linux/v5.3-rc2/source/drivers/clocksource/timer-riscv.c#L32
> >
> > That's why we compute a delta between absolute time passed via SBI and
> > current time. hrtimer is programmed to trigger only after the delta
> > time from now.
>
> Nevermind, I got lost in all the conversions.
>
> One important issue is the lack of ability to program a delta between
> HS/HU-mode cycles and VS/VU-mode cycles.  Without this, it's impossible
> to do virtual machine migration (except with hcounteren
> trap-and-emulate, which I think we agree is not acceptable).  I found
> the open issue at https://github.com/riscv/riscv-isa-manual/issues/298
> and commented on it.

This Github issue is open since quite some time now.

Thanks for commenting. I have pinged RISC-V spec maintainers as well.

Regards,
Anup
diff mbox series

Patch

diff --git a/arch/riscv/include/asm/kvm_host.h b/arch/riscv/include/asm/kvm_host.h
index 58f61ce28461..193a7ff0eb31 100644
--- a/arch/riscv/include/asm/kvm_host.h
+++ b/arch/riscv/include/asm/kvm_host.h
@@ -12,6 +12,7 @@ 
 #include <linux/types.h>
 #include <linux/kvm.h>
 #include <linux/kvm_types.h>
+#include <asm/kvm_vcpu_timer.h>
 
 #ifdef CONFIG_64BIT
 #define KVM_MAX_VCPUS			(1U << 16)
@@ -158,6 +159,9 @@  struct kvm_vcpu_arch {
 	raw_spinlock_t irqs_lock;
 	unsigned long irqs_pending;
 
+	/* VCPU Timer */
+	struct kvm_vcpu_timer timer;
+
 	/* MMIO instruction details */
 	struct kvm_mmio_decode mmio_decode;
 
diff --git a/arch/riscv/include/asm/kvm_vcpu_timer.h b/arch/riscv/include/asm/kvm_vcpu_timer.h
new file mode 100644
index 000000000000..df67ea86988e
--- /dev/null
+++ b/arch/riscv/include/asm/kvm_vcpu_timer.h
@@ -0,0 +1,32 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Atish Patra <atish.patra@wdc.com>
+ */
+
+#ifndef __KVM_VCPU_RISCV_TIMER_H
+#define __KVM_VCPU_RISCV_TIMER_H
+
+#include <linux/hrtimer.h>
+
+#define VCPU_TIMER_PROGRAM_THRESHOLD_NS		1000
+
+struct kvm_vcpu_timer {
+	bool init_done;
+	/* Check if the timer is programmed */
+	bool is_set;
+	struct hrtimer hrt;
+	/* Mult & Shift values to get nanosec from cycles */
+	u32 mult;
+	u32 shift;
+};
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu);
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
+				    unsigned long ncycles);
+
+#endif
diff --git a/arch/riscv/kvm/Makefile b/arch/riscv/kvm/Makefile
index c0f57f26c13d..3e0c7558320d 100644
--- a/arch/riscv/kvm/Makefile
+++ b/arch/riscv/kvm/Makefile
@@ -9,6 +9,6 @@  ccflags-y := -Ivirt/kvm -Iarch/riscv/kvm
 kvm-objs := $(common-objs-y)
 
 kvm-objs += main.o vm.o vmid.o tlb.o mmu.o
-kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o
+kvm-objs += vcpu.o vcpu_exit.o vcpu_switch.o vcpu_timer.o
 
 obj-$(CONFIG_KVM)	+= kvm.o
diff --git a/arch/riscv/kvm/vcpu.c b/arch/riscv/kvm/vcpu.c
index f3b0cadc1973..ed1f06b17953 100644
--- a/arch/riscv/kvm/vcpu.c
+++ b/arch/riscv/kvm/vcpu.c
@@ -52,6 +52,8 @@  static void kvm_riscv_reset_vcpu(struct kvm_vcpu *vcpu)
 
 	memcpy(cntx, reset_cntx, sizeof(*cntx));
 
+	kvm_riscv_vcpu_timer_reset(vcpu);
+
 	raw_spin_lock_irqsave(&vcpu->arch.irqs_lock, f);
 	vcpu->arch.irqs_pending = 0;
 	raw_spin_unlock_irqrestore(&vcpu->arch.irqs_lock, f);
@@ -125,6 +127,9 @@  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 	csr->hideleg |= SIE_STIE;
 	csr->hideleg |= SIE_SEIE;
 
+	/* Setup VCPU timer */
+	kvm_riscv_vcpu_timer_init(vcpu);
+
 	/* Reset VCPU */
 	kvm_riscv_reset_vcpu(vcpu);
 
@@ -133,6 +138,7 @@  int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
 
 void kvm_arch_vcpu_destroy(struct kvm_vcpu *vcpu)
 {
+	kvm_riscv_vcpu_timer_deinit(vcpu);
 	kvm_riscv_stage2_flush_cache(vcpu);
 	kmem_cache_free(kvm_vcpu_cache, vcpu);
 }
diff --git a/arch/riscv/kvm/vcpu_timer.c b/arch/riscv/kvm/vcpu_timer.c
new file mode 100644
index 000000000000..a45ca06e1aa6
--- /dev/null
+++ b/arch/riscv/kvm/vcpu_timer.c
@@ -0,0 +1,106 @@ 
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *     Atish Patra <atish.patra@wdc.com>
+ */
+
+#include <linux/errno.h>
+#include <linux/err.h>
+#include <linux/kvm_host.h>
+#include <clocksource/timer-riscv.h>
+#include <asm/csr.h>
+#include <asm/kvm_vcpu_timer.h>
+
+static enum hrtimer_restart kvm_riscv_vcpu_hrtimer_expired(struct hrtimer *h)
+{
+	struct kvm_vcpu_timer *t = container_of(h, struct kvm_vcpu_timer, hrt);
+	struct kvm_vcpu *vcpu = container_of(t, struct kvm_vcpu, arch.timer);
+
+	t->is_set = false;
+	kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
+
+	return HRTIMER_NORESTART;
+}
+
+static u64 kvm_riscv_delta_cycles2ns(u64 cycles, struct kvm_vcpu_timer *t)
+{
+	unsigned long flags;
+	u64 cycles_now, cycles_delta, delta_ns;
+
+	local_irq_save(flags);
+	cycles_now = get_cycles64();
+	if (cycles_now < cycles)
+		cycles_delta = cycles - cycles_now;
+	else
+		cycles_delta = 0;
+	delta_ns = (cycles_delta * t->mult) >> t->shift;
+	local_irq_restore(flags);
+
+	return delta_ns;
+}
+
+static int kvm_riscv_vcpu_timer_cancel(struct kvm_vcpu_timer *t)
+{
+	if (!t->init_done || !t->is_set)
+		return -EINVAL;
+
+	hrtimer_cancel(&t->hrt);
+	t->is_set = false;
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_next_event(struct kvm_vcpu *vcpu,
+				    unsigned long ncycles)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+	u64 delta_ns = kvm_riscv_delta_cycles2ns(ncycles, t);
+
+	if (!t->init_done)
+		return -EINVAL;
+
+	kvm_riscv_vcpu_unset_interrupt(vcpu, IRQ_S_TIMER);
+
+	if (delta_ns > VCPU_TIMER_PROGRAM_THRESHOLD_NS) {
+		hrtimer_start(&t->hrt, ktime_add_ns(ktime_get(), delta_ns),
+				HRTIMER_MODE_ABS);
+		t->is_set = true;
+	} else
+		kvm_riscv_vcpu_set_interrupt(vcpu, IRQ_S_TIMER);
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_init(struct kvm_vcpu *vcpu)
+{
+	struct kvm_vcpu_timer *t = &vcpu->arch.timer;
+
+	if (t->init_done)
+		return -EINVAL;
+
+	hrtimer_init(&t->hrt, CLOCK_MONOTONIC, HRTIMER_MODE_ABS);
+	t->hrt.function = kvm_riscv_vcpu_hrtimer_expired;
+	t->init_done = true;
+	t->is_set = false;
+
+	riscv_cs_get_mult_shift(&t->mult, &t->shift);
+
+	return 0;
+}
+
+int kvm_riscv_vcpu_timer_deinit(struct kvm_vcpu *vcpu)
+{
+	int ret;
+
+	ret = kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+	vcpu->arch.timer.init_done = false;
+
+	return ret;
+}
+
+int kvm_riscv_vcpu_timer_reset(struct kvm_vcpu *vcpu)
+{
+	return kvm_riscv_vcpu_timer_cancel(&vcpu->arch.timer);
+}
diff --git a/drivers/clocksource/timer-riscv.c b/drivers/clocksource/timer-riscv.c
index 09e031176bc6..749b25876cad 100644
--- a/drivers/clocksource/timer-riscv.c
+++ b/drivers/clocksource/timer-riscv.c
@@ -80,6 +80,12 @@  static int riscv_timer_dying_cpu(unsigned int cpu)
 	return 0;
 }
 
+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift)
+{
+	*mult = riscv_clocksource.mult;
+	*shift = riscv_clocksource.shift;
+}
+
 /* called directly from the low-level interrupt handler */
 void riscv_timer_interrupt(void)
 {
diff --git a/include/clocksource/timer-riscv.h b/include/clocksource/timer-riscv.h
new file mode 100644
index 000000000000..ecb9f70e2f98
--- /dev/null
+++ b/include/clocksource/timer-riscv.h
@@ -0,0 +1,14 @@ 
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Copyright (C) 2019 Western Digital Corporation or its affiliates.
+ *
+ * Authors:
+ *	Atish Patra <atish.patra@wdc.com>
+ */
+
+#ifndef __KVM_TIMER_RISCV_H
+#define __KVM_TIMER_RISCV_H
+
+void riscv_cs_get_mult_shift(u32 *mult, u32 *shift);
+
+#endif