diff mbox series

[RFC,v2,2/4] KVM: arm64: Add PSCI SYSTEM_OFF2 function for hibernation

Message ID 20240318164646.1010092-3-dwmw2@infradead.org (mailing list archive)
State Handled Elsewhere, archived
Headers show
Series arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation | expand

Commit Message

David Woodhouse March 18, 2024, 4:14 p.m. UTC
From: David Woodhouse <dwmw@amazon.co.uk>

The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
which is analogous to ACPI S4 state. This will allow hosting environments
to determine that a guest is hibernated rather than just powered off, and
ensure that they preserve the virtual environment appropriately to allow
the guest to resume safely (or bump the hardware_signature in the FACS to
trigger a clean reboot instead).

The beta version will be changed to say that PSCI_FEATURES returns a bit
mask of the supported hibernate types, which is implemented here.

Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
---
 Documentation/virt/kvm/api.rst    | 11 +++++++++
 arch/arm64/include/asm/kvm_host.h |  2 ++
 arch/arm64/include/uapi/asm/kvm.h |  6 +++++
 arch/arm64/kvm/arm.c              |  5 +++++
 arch/arm64/kvm/psci.c             | 37 +++++++++++++++++++++++++++++++
 include/uapi/linux/kvm.h          |  1 +
 6 files changed, 62 insertions(+)

Comments

Marc Zyngier March 18, 2024, 5:29 p.m. UTC | #1
On Mon, 18 Mar 2024 16:14:24 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
> 
> From: David Woodhouse <dwmw@amazon.co.uk>
> 
> The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function
> which is analogous to ACPI S4 state. This will allow hosting environments
> to determine that a guest is hibernated rather than just powered off, and
> ensure that they preserve the virtual environment appropriately to allow
> the guest to resume safely (or bump the hardware_signature in the FACS to
> trigger a clean reboot instead).
> 
> The beta version will be changed to say that PSCI_FEATURES returns a bit
> mask of the supported hibernate types, which is implemented here.
> 
> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
> ---
>  Documentation/virt/kvm/api.rst    | 11 +++++++++
>  arch/arm64/include/asm/kvm_host.h |  2 ++
>  arch/arm64/include/uapi/asm/kvm.h |  6 +++++
>  arch/arm64/kvm/arm.c              |  5 +++++
>  arch/arm64/kvm/psci.c             | 37 +++++++++++++++++++++++++++++++
>  include/uapi/linux/kvm.h          |  1 +
>  6 files changed, 62 insertions(+)
> 
> diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
> index 0b5a33ee71ee..ff061b6a2393 100644
> --- a/Documentation/virt/kvm/api.rst
> +++ b/Documentation/virt/kvm/api.rst
> @@ -6761,6 +6761,10 @@ the first `ndata` items (possibly zero) of the data array are valid.
>     the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
>     specification.
>  
> + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
> +   if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
> +   specification.
> +
>   - for RISC-V, data[0] is set to the value of the second argument of the
>     ``sbi_system_reset`` call.
>  
> @@ -6794,6 +6798,13 @@ either:
>   - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
>     "Caller responsibilities" for possible return values.
>  
> +Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the
> +KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI
> +SYSTEM_OFF2 function, KVM will exit to userspace with the
> +KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to
> +KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate
> +type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0).

Again, I really oppose this way of doing things. We already have an
infrastructure for selecting PSCI levels. You may not like it, but it
exists, and I'm not going entertain supporting yet another bike-shed
model. Adding an orthogonal cap for a feature that is specific to a
new PSCI version is just awful.

Please make PSCI 1.3 the only version of PSCI supporting suspend in a
non-optional way, and be done with it.

	M.
David Woodhouse March 18, 2024, 5:54 p.m. UTC | #2
On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote:
> 
> Again, I really oppose this way of doing things. We already have an
> infrastructure for selecting PSCI levels. You may not like it, but it
> exists, and I'm not going entertain supporting yet another bike-shed
> model. Adding an orthogonal cap for a feature that is specific to a
> new PSCI version is just awful.

Huh? This isn't a "new bike-shed model". This is a straight copy of
what we *already* have for SYSTEM_RESET2.

If I were bike-shedding, I wouldn't do separate caps for them; I'd have
done it as a *bitmask* of the optional PSCI calls that should be
enabled.

The *mandatory* ones should obviously come from the PSCI version alone,
but I can't see how that makes sense for the optional ones...

> Please make PSCI 1.3 the only version of PSCI supporting suspend in a
> non-optional way, and be done with it.

SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are
CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES.

Are you suggesting that enabling v1.3 should automatically enable *all*
of the optional features that were defined in that version (and
previous versions) of the spec?
Marc Zyngier March 18, 2024, 6:07 p.m. UTC | #3
On Mon, 18 Mar 2024 17:54:06 +0000,
David Woodhouse <dwmw2@infradead.org> wrote:
> 
> [1  <text/plain; UTF-8 (quoted-printable)>]
> On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote:
> > 
> > Again, I really oppose this way of doing things. We already have an
> > infrastructure for selecting PSCI levels. You may not like it, but it
> > exists, and I'm not going entertain supporting yet another bike-shed
> > model. Adding an orthogonal cap for a feature that is specific to a
> > new PSCI version is just awful.
> 
> Huh? This isn't a "new bike-shed model". This is a straight copy of
> what we *already* have for SYSTEM_RESET2.

There is no KVM capability for SYSTEM_RESET2. It is directly
advertised to the guest when PSCI 1.1 is supported.

> If I were bike-shedding, I wouldn't do separate caps for them; I'd have
> done it as a *bitmask* of the optional PSCI calls that should be
> enabled.
>
> The *mandatory* ones should obviously come from the PSCI version alone,
> but I can't see how that makes sense for the optional ones...

The guest is in a position to probe for what is supported or not with
the PSCI_FEATURES call.  Why would you add anything else?

> 
> > Please make PSCI 1.3 the only version of PSCI supporting suspend in a
> > non-optional way, and be done with it.
> 
> SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are
> CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES.
> 
> Are you suggesting that enabling v1.3 should automatically enable *all*
> of the optional features that were defined in that version (and
> previous versions) of the spec?

No. We have everything we need to incrementally *add* features. So you
can perfectly implement PSCI 1.3 with only SYSTEM_OFF2, and only later
on add the rest, if ever.

	M.
David Woodhouse March 18, 2024, 6:17 p.m. UTC | #4
On Mon, 2024-03-18 at 18:07 +0000, Marc Zyngier wrote:
> On Mon, 18 Mar 2024 17:54:06 +0000,
> David Woodhouse <dwmw2@infradead.org> wrote:
> > 
> > [1  <text/plain; UTF-8 (quoted-printable)>]
> > On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote:
> > > 
> > > Again, I really oppose this way of doing things. We already have an
> > > infrastructure for selecting PSCI levels. You may not like it, but it
> > > exists, and I'm not going entertain supporting yet another bike-shed
> > > model. Adding an orthogonal cap for a feature that is specific to a
> > > new PSCI version is just awful.
> > 
> > Huh? This isn't a "new bike-shed model". This is a straight copy of
> > what we *already* have for SYSTEM_RESET2.
> 
> There is no KVM capability for SYSTEM_RESET2. It is directly
> advertised to the guest when PSCI 1.1 is supported.

Apologies, I got that wrong. It's SYSTEM_SUSPEND and the corresponding
KVM_CAP_ARM_SYSTEM_SUSPEND that I was thinking of. Not SYSTEM_RESET2.I
mixed those up.
 
> > If I were bike-shedding, I wouldn't do separate caps for them; I'd have
> > done it as a *bitmask* of the optional PSCI calls that should be
> > enabled.
> > 
> > The *mandatory* ones should obviously come from the PSCI version alone,
> > but I can't see how that makes sense for the optional ones...
> 
> The guest is in a position to probe for what is supported or not with
> the PSCI_FEATURES call.  Why would you add anything else?

Because we don't want to silently *change* what's advertised to the
guest with the VMM explicitly opting in. 

> > > Please make PSCI 1.3 the only version of PSCI supporting suspend in a
> > > non-optional way, and be done with it.
> > 
> > SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are
> > CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES.
> > 
> > Are you suggesting that enabling v1.3 should automatically enable *all*
> > of the optional features that were defined in that version (and
> > previous versions) of the spec?
> 
> No. We have everything we need to incrementally *add* features. So you
> can perfectly implement PSCI 1.3 with only SYSTEM_OFF2, and only later
> on add the rest, if ever.

OK. It's still awful, but I suppose can live with that since existing
VMMs will just see the same KVM_SYSTEM_EVENT_SHUTDOWN as before, and
hopefully just won't understand the flag (and won't notice) the extra
flag which says it's a hibernate.

A VMM might *perhaps* check for flags it doesn't understand and
complain about them, which is why we shouldn't really do that. But
where PSCI is concerned it seems we've left best practice behind a long
time ago, so I'll let it go.
diff mbox series

Patch

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 0b5a33ee71ee..ff061b6a2393 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6761,6 +6761,10 @@  the first `ndata` items (possibly zero) of the data array are valid.
    the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI
    specification.
 
+ - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2
+   if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI
+   specification.
+
  - for RISC-V, data[0] is set to the value of the second argument of the
    ``sbi_system_reset`` call.
 
@@ -6794,6 +6798,13 @@  either:
  - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2
    "Caller responsibilities" for possible return values.
 
+Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the
+KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI
+SYSTEM_OFF2 function, KVM will exit to userspace with the
+KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to
+KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate
+type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0).
+
 ::
 
 		/* KVM_EXIT_IOAPIC_EOI */
diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h
index 6883963bbc3a..6ed05791afdb 100644
--- a/arch/arm64/include/asm/kvm_host.h
+++ b/arch/arm64/include/asm/kvm_host.h
@@ -299,6 +299,8 @@  struct kvm_arch {
 #define KVM_ARCH_FLAG_ID_REGS_INITIALIZED		7
 	/* Fine-Grained UNDEF initialised */
 #define KVM_ARCH_FLAG_FGU_INITIALIZED			8
+	/* PSCI SYSTEM_OFF2 (hibernate) enabled for the guest */
+#define KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED		9
 	unsigned long flags;
 
 	/* VM-wide vCPU feature set */
diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h
index 964df31da975..66736ff04011 100644
--- a/arch/arm64/include/uapi/asm/kvm.h
+++ b/arch/arm64/include/uapi/asm/kvm.h
@@ -484,6 +484,12 @@  enum {
  */
 #define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2	(1ULL << 0)
 
+/*
+ * Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call.
+ * Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN.
+ */
+#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2	(1ULL << 0)
+
 /* run->fail_entry.hardware_entry_failure_reason codes. */
 #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED	(1ULL << 0)
 
diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c
index 3dee5490eea9..47d23064d6f7 100644
--- a/arch/arm64/kvm/arm.c
+++ b/arch/arm64/kvm/arm.c
@@ -98,6 +98,10 @@  int kvm_vm_ioctl_enable_cap(struct kvm *kvm,
 		r = 0;
 		set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags);
 		break;
+	case KVM_CAP_ARM_SYSTEM_OFF2:
+		r = 0;
+		set_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags);
+		break;
 	case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE:
 		new_cap = cap->args[0];
 
@@ -243,6 +247,7 @@  int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
 	case KVM_CAP_VCPU_ATTRIBUTES:
 	case KVM_CAP_PTP_KVM:
 	case KVM_CAP_ARM_SYSTEM_SUSPEND:
+	case KVM_CAP_ARM_SYSTEM_OFF2:
 	case KVM_CAP_IRQFD_RESAMPLE:
 	case KVM_CAP_COUNTER_OFFSET:
 		r = 1;
diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c
index 1f69b667332b..59570eea8aa7 100644
--- a/arch/arm64/kvm/psci.c
+++ b/arch/arm64/kvm/psci.c
@@ -194,6 +194,12 @@  static void kvm_psci_system_off(struct kvm_vcpu *vcpu)
 	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0);
 }
 
+static void kvm_psci_system_off2(struct kvm_vcpu *vcpu)
+{
+	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN,
+				 KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2);
+}
+
 static void kvm_psci_system_reset(struct kvm_vcpu *vcpu)
 {
 	kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0);
@@ -353,6 +359,11 @@  static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
 			if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags))
 				val = 0;
 			break;
+		case PSCI_1_3_FN_SYSTEM_OFF2:
+		case PSCI_1_3_FN64_SYSTEM_OFF2:
+			if (test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
+				val = 1UL << PSCI_1_3_HIBERNATE_TYPE_OFF;
+			break;
 		case PSCI_1_1_FN_SYSTEM_RESET2:
 		case PSCI_1_1_FN64_SYSTEM_RESET2:
 			if (minor >= 1)
@@ -374,6 +385,32 @@  static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor)
 			return 0;
 		}
 		break;
+	case PSCI_1_3_FN_SYSTEM_OFF2:
+		kvm_psci_narrow_to_32bit(vcpu);
+		fallthrough;
+	case PSCI_1_3_FN64_SYSTEM_OFF2:
+		if (!test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags))
+			break;
+
+		arg = smccc_get_arg1(vcpu);
+		if (arg != PSCI_1_3_HIBERNATE_TYPE_OFF) {
+			val = PSCI_RET_INVALID_PARAMS;
+			break;
+		}
+		kvm_psci_system_off2(vcpu);
+		/*
+		 * We shouldn't be going back to guest VCPU after
+		 * receiving SYSTEM_OFF2 request.
+		 *
+		 * If user space accidentally/deliberately resumes
+		 * guest VCPU after SYSTEM_OFF request then guest
+		 * VCPU should see internal failure from PSCI return
+		 * value. To achieve this, we preload r0 (or x0) with
+		 * PSCI return value INTERNAL_FAILURE.
+		 */
+		val = PSCI_RET_INTERNAL_FAILURE;
+		ret = 0;
+		break;
 	case PSCI_1_1_FN_SYSTEM_RESET2:
 		kvm_psci_narrow_to_32bit(vcpu);
 		fallthrough;
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 2190adbe3002..caa3845b80d6 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -917,6 +917,7 @@  struct kvm_enable_cap {
 #define KVM_CAP_MEMORY_ATTRIBUTES 233
 #define KVM_CAP_GUEST_MEMFD 234
 #define KVM_CAP_VM_TYPES 235
+#define KVM_CAP_ARM_SYSTEM_OFF2 236
 
 struct kvm_irq_routing_irqchip {
 	__u32 irqchip;