Message ID | 20240318164646.1010092-3-dwmw2@infradead.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: Add PSCI v1.3 SYSTEM_OFF2 support for hibernation | expand |
On Mon, 18 Mar 2024 16:14:24 +0000, David Woodhouse <dwmw2@infradead.org> wrote: > > From: David Woodhouse <dwmw@amazon.co.uk> > > The PSCI v1.3 specification (alpha) adds support for a SYSTEM_OFF2 function > which is analogous to ACPI S4 state. This will allow hosting environments > to determine that a guest is hibernated rather than just powered off, and > ensure that they preserve the virtual environment appropriately to allow > the guest to resume safely (or bump the hardware_signature in the FACS to > trigger a clean reboot instead). > > The beta version will be changed to say that PSCI_FEATURES returns a bit > mask of the supported hibernate types, which is implemented here. > > Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> > --- > Documentation/virt/kvm/api.rst | 11 +++++++++ > arch/arm64/include/asm/kvm_host.h | 2 ++ > arch/arm64/include/uapi/asm/kvm.h | 6 +++++ > arch/arm64/kvm/arm.c | 5 +++++ > arch/arm64/kvm/psci.c | 37 +++++++++++++++++++++++++++++++ > include/uapi/linux/kvm.h | 1 + > 6 files changed, 62 insertions(+) > > diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst > index 0b5a33ee71ee..ff061b6a2393 100644 > --- a/Documentation/virt/kvm/api.rst > +++ b/Documentation/virt/kvm/api.rst > @@ -6761,6 +6761,10 @@ the first `ndata` items (possibly zero) of the data array are valid. > the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI > specification. > > + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 > + if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI > + specification. > + > - for RISC-V, data[0] is set to the value of the second argument of the > ``sbi_system_reset`` call. > > @@ -6794,6 +6798,13 @@ either: > - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2 > "Caller responsibilities" for possible return values. > > +Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the > +KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI > +SYSTEM_OFF2 function, KVM will exit to userspace with the > +KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to > +KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate > +type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0). Again, I really oppose this way of doing things. We already have an infrastructure for selecting PSCI levels. You may not like it, but it exists, and I'm not going entertain supporting yet another bike-shed model. Adding an orthogonal cap for a feature that is specific to a new PSCI version is just awful. Please make PSCI 1.3 the only version of PSCI supporting suspend in a non-optional way, and be done with it. M.
On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote: > > Again, I really oppose this way of doing things. We already have an > infrastructure for selecting PSCI levels. You may not like it, but it > exists, and I'm not going entertain supporting yet another bike-shed > model. Adding an orthogonal cap for a feature that is specific to a > new PSCI version is just awful. Huh? This isn't a "new bike-shed model". This is a straight copy of what we *already* have for SYSTEM_RESET2. If I were bike-shedding, I wouldn't do separate caps for them; I'd have done it as a *bitmask* of the optional PSCI calls that should be enabled. The *mandatory* ones should obviously come from the PSCI version alone, but I can't see how that makes sense for the optional ones... > Please make PSCI 1.3 the only version of PSCI supporting suspend in a > non-optional way, and be done with it. SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES. Are you suggesting that enabling v1.3 should automatically enable *all* of the optional features that were defined in that version (and previous versions) of the spec?
On Mon, 18 Mar 2024 17:54:06 +0000, David Woodhouse <dwmw2@infradead.org> wrote: > > [1 <text/plain; UTF-8 (quoted-printable)>] > On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote: > > > > Again, I really oppose this way of doing things. We already have an > > infrastructure for selecting PSCI levels. You may not like it, but it > > exists, and I'm not going entertain supporting yet another bike-shed > > model. Adding an orthogonal cap for a feature that is specific to a > > new PSCI version is just awful. > > Huh? This isn't a "new bike-shed model". This is a straight copy of > what we *already* have for SYSTEM_RESET2. There is no KVM capability for SYSTEM_RESET2. It is directly advertised to the guest when PSCI 1.1 is supported. > If I were bike-shedding, I wouldn't do separate caps for them; I'd have > done it as a *bitmask* of the optional PSCI calls that should be > enabled. > > The *mandatory* ones should obviously come from the PSCI version alone, > but I can't see how that makes sense for the optional ones... The guest is in a position to probe for what is supported or not with the PSCI_FEATURES call. Why would you add anything else? > > > Please make PSCI 1.3 the only version of PSCI supporting suspend in a > > non-optional way, and be done with it. > > SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are > CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES. > > Are you suggesting that enabling v1.3 should automatically enable *all* > of the optional features that were defined in that version (and > previous versions) of the spec? No. We have everything we need to incrementally *add* features. So you can perfectly implement PSCI 1.3 with only SYSTEM_OFF2, and only later on add the rest, if ever. M.
On Mon, 2024-03-18 at 18:07 +0000, Marc Zyngier wrote: > On Mon, 18 Mar 2024 17:54:06 +0000, > David Woodhouse <dwmw2@infradead.org> wrote: > > > > [1 <text/plain; UTF-8 (quoted-printable)>] > > On Mon, 2024-03-18 at 17:29 +0000, Marc Zyngier wrote: > > > > > > Again, I really oppose this way of doing things. We already have an > > > infrastructure for selecting PSCI levels. You may not like it, but it > > > exists, and I'm not going entertain supporting yet another bike-shed > > > model. Adding an orthogonal cap for a feature that is specific to a > > > new PSCI version is just awful. > > > > Huh? This isn't a "new bike-shed model". This is a straight copy of > > what we *already* have for SYSTEM_RESET2. > > There is no KVM capability for SYSTEM_RESET2. It is directly > advertised to the guest when PSCI 1.1 is supported. Apologies, I got that wrong. It's SYSTEM_SUSPEND and the corresponding KVM_CAP_ARM_SYSTEM_SUSPEND that I was thinking of. Not SYSTEM_RESET2.I mixed those up. > > If I were bike-shedding, I wouldn't do separate caps for them; I'd have > > done it as a *bitmask* of the optional PSCI calls that should be > > enabled. > > > > The *mandatory* ones should obviously come from the PSCI version alone, > > but I can't see how that makes sense for the optional ones... > > The guest is in a position to probe for what is supported or not with > the PSCI_FEATURES call. Why would you add anything else? Because we don't want to silently *change* what's advertised to the guest with the VMM explicitly opting in. > > > Please make PSCI 1.3 the only version of PSCI supporting suspend in a > > > non-optional way, and be done with it. > > > > SYSTEM_OFF2 is an *optional* feature in PSCI v1.3. As are > > CLEAR_INV_MEMREGION and CLEAR_INV_MEMREGION_ATTRIBUTES. > > > > Are you suggesting that enabling v1.3 should automatically enable *all* > > of the optional features that were defined in that version (and > > previous versions) of the spec? > > No. We have everything we need to incrementally *add* features. So you > can perfectly implement PSCI 1.3 with only SYSTEM_OFF2, and only later > on add the rest, if ever. OK. It's still awful, but I suppose can live with that since existing VMMs will just see the same KVM_SYSTEM_EVENT_SHUTDOWN as before, and hopefully just won't understand the flag (and won't notice) the extra flag which says it's a hibernate. A VMM might *perhaps* check for flags it doesn't understand and complain about them, which is why we shouldn't really do that. But where PSCI is concerned it seems we've left best practice behind a long time ago, so I'll let it go.
diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst index 0b5a33ee71ee..ff061b6a2393 100644 --- a/Documentation/virt/kvm/api.rst +++ b/Documentation/virt/kvm/api.rst @@ -6761,6 +6761,10 @@ the first `ndata` items (possibly zero) of the data array are valid. the guest issued a SYSTEM_RESET2 call according to v1.1 of the PSCI specification. + - for arm64, data[0] is set to KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 + if the guest issued a SYSTEM_OFF2 call according to v1.3 of the PSCI + specification. + - for RISC-V, data[0] is set to the value of the second argument of the ``sbi_system_reset`` call. @@ -6794,6 +6798,13 @@ either: - Deny the guest request to suspend the VM. See ARM DEN0022D.b 5.19.2 "Caller responsibilities" for possible return values. +Hibernation using the PSCI SYSTEM_OFF2 call is enabled with the +KVM_CAP_ARM_SYSTEM_OFF2 VM capability. If a guest invokes the PSCI +SYSTEM_OFF2 function, KVM will exit to userspace with the +KVM_SYSTEM_EVENT_SHUTDOWN event type and with data[0] set to +KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2. The only supported hibernate +type for the SYSTEM_OFF2 function is HIBERNATE_OFF (0x0). + :: /* KVM_EXIT_IOAPIC_EOI */ diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index 6883963bbc3a..6ed05791afdb 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -299,6 +299,8 @@ struct kvm_arch { #define KVM_ARCH_FLAG_ID_REGS_INITIALIZED 7 /* Fine-Grained UNDEF initialised */ #define KVM_ARCH_FLAG_FGU_INITIALIZED 8 + /* PSCI SYSTEM_OFF2 (hibernate) enabled for the guest */ +#define KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED 9 unsigned long flags; /* VM-wide vCPU feature set */ diff --git a/arch/arm64/include/uapi/asm/kvm.h b/arch/arm64/include/uapi/asm/kvm.h index 964df31da975..66736ff04011 100644 --- a/arch/arm64/include/uapi/asm/kvm.h +++ b/arch/arm64/include/uapi/asm/kvm.h @@ -484,6 +484,12 @@ enum { */ #define KVM_SYSTEM_EVENT_RESET_FLAG_PSCI_RESET2 (1ULL << 0) +/* + * Shutdown caused by a PSCI v1.3 SYSTEM_OFF2 call. + * Valid only when the system event has a type of KVM_SYSTEM_EVENT_SHUTDOWN. + */ +#define KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2 (1ULL << 0) + /* run->fail_entry.hardware_entry_failure_reason codes. */ #define KVM_EXIT_FAIL_ENTRY_CPU_UNSUPPORTED (1ULL << 0) diff --git a/arch/arm64/kvm/arm.c b/arch/arm64/kvm/arm.c index 3dee5490eea9..47d23064d6f7 100644 --- a/arch/arm64/kvm/arm.c +++ b/arch/arm64/kvm/arm.c @@ -98,6 +98,10 @@ int kvm_vm_ioctl_enable_cap(struct kvm *kvm, r = 0; set_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags); break; + case KVM_CAP_ARM_SYSTEM_OFF2: + r = 0; + set_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags); + break; case KVM_CAP_ARM_EAGER_SPLIT_CHUNK_SIZE: new_cap = cap->args[0]; @@ -243,6 +247,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext) case KVM_CAP_VCPU_ATTRIBUTES: case KVM_CAP_PTP_KVM: case KVM_CAP_ARM_SYSTEM_SUSPEND: + case KVM_CAP_ARM_SYSTEM_OFF2: case KVM_CAP_IRQFD_RESAMPLE: case KVM_CAP_COUNTER_OFFSET: r = 1; diff --git a/arch/arm64/kvm/psci.c b/arch/arm64/kvm/psci.c index 1f69b667332b..59570eea8aa7 100644 --- a/arch/arm64/kvm/psci.c +++ b/arch/arm64/kvm/psci.c @@ -194,6 +194,12 @@ static void kvm_psci_system_off(struct kvm_vcpu *vcpu) kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, 0); } +static void kvm_psci_system_off2(struct kvm_vcpu *vcpu) +{ + kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_SHUTDOWN, + KVM_SYSTEM_EVENT_SHUTDOWN_FLAG_PSCI_OFF2); +} + static void kvm_psci_system_reset(struct kvm_vcpu *vcpu) { kvm_prepare_system_event(vcpu, KVM_SYSTEM_EVENT_RESET, 0); @@ -353,6 +359,11 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) if (test_bit(KVM_ARCH_FLAG_SYSTEM_SUSPEND_ENABLED, &kvm->arch.flags)) val = 0; break; + case PSCI_1_3_FN_SYSTEM_OFF2: + case PSCI_1_3_FN64_SYSTEM_OFF2: + if (test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags)) + val = 1UL << PSCI_1_3_HIBERNATE_TYPE_OFF; + break; case PSCI_1_1_FN_SYSTEM_RESET2: case PSCI_1_1_FN64_SYSTEM_RESET2: if (minor >= 1) @@ -374,6 +385,32 @@ static int kvm_psci_1_x_call(struct kvm_vcpu *vcpu, u32 minor) return 0; } break; + case PSCI_1_3_FN_SYSTEM_OFF2: + kvm_psci_narrow_to_32bit(vcpu); + fallthrough; + case PSCI_1_3_FN64_SYSTEM_OFF2: + if (!test_bit(KVM_ARCH_FLAG_SYSTEM_OFF2_ENABLED, &kvm->arch.flags)) + break; + + arg = smccc_get_arg1(vcpu); + if (arg != PSCI_1_3_HIBERNATE_TYPE_OFF) { + val = PSCI_RET_INVALID_PARAMS; + break; + } + kvm_psci_system_off2(vcpu); + /* + * We shouldn't be going back to guest VCPU after + * receiving SYSTEM_OFF2 request. + * + * If user space accidentally/deliberately resumes + * guest VCPU after SYSTEM_OFF request then guest + * VCPU should see internal failure from PSCI return + * value. To achieve this, we preload r0 (or x0) with + * PSCI return value INTERNAL_FAILURE. + */ + val = PSCI_RET_INTERNAL_FAILURE; + ret = 0; + break; case PSCI_1_1_FN_SYSTEM_RESET2: kvm_psci_narrow_to_32bit(vcpu); fallthrough; diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h index 2190adbe3002..caa3845b80d6 100644 --- a/include/uapi/linux/kvm.h +++ b/include/uapi/linux/kvm.h @@ -917,6 +917,7 @@ struct kvm_enable_cap { #define KVM_CAP_MEMORY_ATTRIBUTES 233 #define KVM_CAP_GUEST_MEMFD 234 #define KVM_CAP_VM_TYPES 235 +#define KVM_CAP_ARM_SYSTEM_OFF2 236 struct kvm_irq_routing_irqchip { __u32 irqchip;