diff mbox series

[RFC,v2,11/58] KVM: arm64: pkvm: Add pkvm_udelay()

Message ID 20241212180423.1578358-12-smostafa@google.com (mailing list archive)
State New
Headers show
Series KVM: Arm SMMUv3 driver for pKVM | expand

Commit Message

Mostafa Saleh Dec. 12, 2024, 6:03 p.m. UTC
From: Jean-Philippe Brucker <jean-philippe@linaro.org>

Add a simple delay loop for drivers.

This could use more work. It should be possible to insert a wfe and save
power, but I haven't studied whether it is safe to do so with the host
in control of the event stream. The SMMU driver will use wfe anyway for
frequent waits (provided the implementation can send command queue
events).

Signed-off-by: Mostafa Saleh <smostafa@google.com>
Signed-off-by: Jean-Philippe Brucker <jean-philippe@linaro.org>
---
 arch/arm64/kvm/hyp/include/nvhe/pkvm.h |  3 ++
 arch/arm64/kvm/hyp/nvhe/setup.c        |  4 +++
 arch/arm64/kvm/hyp/nvhe/timer-sr.c     | 42 ++++++++++++++++++++++++++
 3 files changed, 49 insertions(+)

Comments

Quentin Perret Dec. 19, 2024, 11:14 a.m. UTC | #1
On Thursday 12 Dec 2024 at 18:03:35 (+0000), Mostafa Saleh wrote:
> From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> 
> Add a simple delay loop for drivers.
> 
> This could use more work. It should be possible to insert a wfe and save
> power, but I haven't studied whether it is safe to do so with the host
> in control of the event stream. The SMMU driver will use wfe anyway for
> frequent waits (provided the implementation can send command queue
> events).

Mooh, I'm thoroughly hating that we need this -- pKVM is non preemptible
so we better not wait for too long.

I can surely figure it out from the following patches, but could you
please expand on the use-case?

On a side note I'm not too worried about the power impact of not having
a wfe in there, again we better not be spinning for long enough that
power starts to be noticeable.
Mostafa Saleh Dec. 19, 2024, 11:21 a.m. UTC | #2
On Thu, Dec 19, 2024 at 11:14:23AM +0000, Quentin Perret wrote:
> On Thursday 12 Dec 2024 at 18:03:35 (+0000), Mostafa Saleh wrote:
> > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > 
> > Add a simple delay loop for drivers.
> > 
> > This could use more work. It should be possible to insert a wfe and save
> > power, but I haven't studied whether it is safe to do so with the host
> > in control of the event stream. The SMMU driver will use wfe anyway for
> > frequent waits (provided the implementation can send command queue
> > events).
> 
> Mooh, I'm thoroughly hating that we need this -- pKVM is non preemptible
> so we better not wait for too long.
> 
> I can surely figure it out from the following patches, but could you
> please expand on the use-case?

The driver needs to poll some SMMU MMIO, so it needs to measure time
in terms of udelay to timeout, at the moment its arbitrary set to 100ms.

Thanks,
Mostafa
> 
> On a side note I'm not too worried about the power impact of not having
> a wfe in there, again we better not be spinning for long enough that
> power starts to be noticeable.
Quentin Perret Dec. 19, 2024, 11:28 a.m. UTC | #3
On Thursday 19 Dec 2024 at 11:21:40 (+0000), Mostafa Saleh wrote:
> On Thu, Dec 19, 2024 at 11:14:23AM +0000, Quentin Perret wrote:
> > On Thursday 12 Dec 2024 at 18:03:35 (+0000), Mostafa Saleh wrote:
> > > From: Jean-Philippe Brucker <jean-philippe@linaro.org>
> > > 
> > > Add a simple delay loop for drivers.
> > > 
> > > This could use more work. It should be possible to insert a wfe and save
> > > power, but I haven't studied whether it is safe to do so with the host
> > > in control of the event stream. The SMMU driver will use wfe anyway for
> > > frequent waits (provided the implementation can send command queue
> > > events).
> > 
> > Mooh, I'm thoroughly hating that we need this -- pKVM is non preemptible
> > so we better not wait for too long.
> > 
> > I can surely figure it out from the following patches, but could you
> > please expand on the use-case?
> 
> The driver needs to poll some SMMU MMIO, so it needs to measure time
> in terms of udelay to timeout, at the moment its arbitrary set to 100ms.

OK, I'll look at the patches to see how hard it'd be to return to the
host for scheduling while this is happening. It's probably hard because
we're presumably in a funky state, but it's worth trying to figure it
out. Any EL2 section longer than 100us or so is problematic in my view,
so a 100*ms* timeout is scary!
diff mbox series

Patch

diff --git a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
index 3b515ce4c433..8a5554615e40 100644
--- a/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
+++ b/arch/arm64/kvm/hyp/include/nvhe/pkvm.h
@@ -143,4 +143,7 @@  int pkvm_load_pvmfw_pages(struct pkvm_hyp_vm *vm, u64 ipa, phys_addr_t phys,
 			  u64 size);
 void pkvm_poison_pvmfw_pages(void);
 
+int pkvm_timer_init(void);
+void pkvm_udelay(unsigned long usecs);
+
 #endif /* __ARM64_KVM_NVHE_PKVM_H__ */
diff --git a/arch/arm64/kvm/hyp/nvhe/setup.c b/arch/arm64/kvm/hyp/nvhe/setup.c
index 46dd68161979..9d09f5f471b9 100644
--- a/arch/arm64/kvm/hyp/nvhe/setup.c
+++ b/arch/arm64/kvm/hyp/nvhe/setup.c
@@ -356,6 +356,10 @@  void __noreturn __pkvm_init_finalise(void)
 	if (ret)
 		goto out;
 
+	ret = pkvm_timer_init();
+	if (ret)
+		goto out;
+
 	ret = fix_host_ownership();
 	if (ret)
 		goto out;
diff --git a/arch/arm64/kvm/hyp/nvhe/timer-sr.c b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
index 3aaab20ae5b4..732beb5fe24b 100644
--- a/arch/arm64/kvm/hyp/nvhe/timer-sr.c
+++ b/arch/arm64/kvm/hyp/nvhe/timer-sr.c
@@ -11,6 +11,10 @@ 
 #include <asm/kvm_hyp.h>
 #include <asm/kvm_mmu.h>
 
+#include <nvhe/pkvm.h>
+
+static u32 timer_freq;
+
 void __kvm_timer_set_cntvoff(u64 cntvoff)
 {
 	write_sysreg(cntvoff, cntvoff_el2);
@@ -60,3 +64,41 @@  void __timer_enable_traps(struct kvm_vcpu *vcpu)
 
 	sysreg_clear_set(cnthctl_el2, clr, set);
 }
+
+static u64 pkvm_ticks_get(void)
+{
+	return __arch_counter_get_cntvct();
+}
+
+#define SEC_TO_US 1000000
+
+int pkvm_timer_init(void)
+{
+	timer_freq = read_sysreg(cntfrq_el0);
+	/*
+	 * TODO: The highest privileged level is supposed to initialize this
+	 * register. But on some systems (which?), this information is only
+	 * contained in the device-tree, so we'll need to find it out some other
+	 * way.
+	 */
+	if (!timer_freq || timer_freq < SEC_TO_US)
+		return -ENODEV;
+	return 0;
+}
+
+#define pkvm_time_us_to_ticks(us) ((u64)(us) * timer_freq / SEC_TO_US)
+
+void pkvm_udelay(unsigned long usecs)
+{
+	u64 ticks = pkvm_time_us_to_ticks(usecs);
+	u64 start = pkvm_ticks_get();
+
+	while (true) {
+		u64 cur = pkvm_ticks_get();
+
+		if ((cur - start) >= ticks || cur < start)
+			break;
+		/* TODO wfe */
+		cpu_relax();
+	}
+}