[RFC,0/5] KVM: arm64: Pass PSCI to userspace

Message ID 20210608154805.216869-1-jean-philippe@linaro.org (mailing list archive)

Message

Jean-Philippe Brucker June 8, 2021, 3:48 p.m. UTC
Allow userspace to request handling PSCI calls from guests. Our goal is
to enable a vCPU hot-add solution for Arm where the VMM presents
possible resources to the guest at boot, and controls which vCPUs can be
brought up by allowing or denying PSCI CPU_ON calls. Passing HVC and
PSCI to userspace has been discussed on the list in the context of vCPU
hot-add [1,2] but it can also be useful for implementing other SMCCC and
vendor hypercalls [3,4,5].
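
To illustrate the allow/deny idea, here is a minimal sketch of how a VMM might answer a forwarded CPU_ON call. The return codes are the ones defined by the PSCI specification; struct vmm_cpu_state, MAX_VCPUS and vmm_psci_cpu_on() are hypothetical names that do not appear in this series, and the target is simplified to a vCPU index rather than an MPIDR.

```c
#include <stdint.h>

/* PSCI return codes, as defined by the PSCI specification. */
#define PSCI_RET_SUCCESS          0
#define PSCI_RET_INVALID_PARAMS  (-2)
#define PSCI_RET_DENIED          (-3)
#define PSCI_RET_ALREADY_ON      (-4)

#define MAX_VCPUS 64 /* hypothetical limit for this sketch */

/* Hypothetical VMM-side bookkeeping, not part of the series. */
struct vmm_cpu_state {
	uint64_t allowed; /* bit n set: vCPU n may be brought up */
	uint64_t online;  /* bit n set: vCPU n is already running */
};

/* Decide what to return to the guest for CPU_ON on vCPU 'cpu'. */
static int32_t vmm_psci_cpu_on(struct vmm_cpu_state *s, unsigned int cpu)
{
	if (cpu >= MAX_VCPUS)
		return PSCI_RET_INVALID_PARAMS;
	if (!(s->allowed & (1ULL << cpu)))
		return PSCI_RET_DENIED; /* not hot-added yet */
	if (s->online & (1ULL << cpu))
		return PSCI_RET_ALREADY_ON;

	s->online |= 1ULL << cpu;
	return PSCI_RET_SUCCESS;
}
```

Hot-adding a vCPU would then amount to setting its bit in 'allowed' before notifying the guest.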

Patches 1-3 allow userspace to request WFI to be executed in KVM. That
way the VMM can easily implement the PSCI CPU_SUSPEND function, which is
mandatory from PSCI v0.2 onwards (even if it doesn't have a more useful
implementation than WFI, natively available to the guest).

Patch 4 lets userspace request any HVC that isn't handled by KVM, and
patch 5 lets userspace request PSCI calls, disabling in-kernel PSCI
handling.

I'm focusing on the PSCI bits, but a complete prototype of vCPU hot-add
for arm64 on Linux and QEMU, most of it from Salil and James, is
available at [6].

[1] https://lore.kernel.org/kvmarm/82879258-46a7-a6e9-ee54-fc3692c1cdc3@arm.com/
[2] https://lore.kernel.org/linux-arm-kernel/20200625133757.22332-1-salil.mehta@huawei.com/
    (Followed by KVM forum and Linaro Open discussions)
[3] https://lore.kernel.org/linux-arm-kernel/f56cf420-affc-35f0-2355-801a924b8a35@arm.com/
[4] https://lore.kernel.org/kvm/bf7e83f1-c58e-8d65-edd0-d08f27b8b766@arm.com/
[5] https://lore.kernel.org/kvm/1569338454-26202-2-git-send-email-guoheyi@huawei.com/
[6] https://jpbrucker.net/git/linux/log/?h=cpuhp/devel
    https://jpbrucker.net/git/qemu/log/?h=cpuhp/devel    

Jean-Philippe Brucker (5):
  KVM: arm64: Replace power_off with mp_state in struct kvm_vcpu_arch
  KVM: arm64: Move WFI execution to check_vcpu_requests()
  KVM: arm64: Allow userspace to request WFI
  KVM: arm64: Pass hypercalls to userspace
  KVM: arm64: Pass PSCI calls to userspace

 Documentation/virt/kvm/api.rst      | 46 +++++++++++++++----
 Documentation/virt/kvm/arm/psci.rst |  1 +
 arch/arm64/include/asm/kvm_host.h   | 10 +++-
 include/kvm/arm_hypercalls.h        |  1 +
 include/kvm/arm_psci.h              |  4 ++
 include/uapi/linux/kvm.h            |  3 ++
 arch/arm64/kvm/arm.c                | 71 +++++++++++++++++++++--------
 arch/arm64/kvm/handle_exit.c        |  3 +-
 arch/arm64/kvm/hypercalls.c         | 28 +++++++++++-
 arch/arm64/kvm/psci.c               | 69 ++++++++++++++--------------
 10 files changed, 170 insertions(+), 66 deletions(-)

Comments

Alexandru Elisei July 19, 2021, 3:29 p.m. UTC | #1
Hi Jean-Philippe,

I'm not really familiar with this part of KVM, and I'm still trying to get my head
around how this works, so please bear with me if I ask silly questions.

This is how I understand this will work:

1. VMM opts in to forward HVC calls not handled by KVM.

2. VMM opts in to forward PSCI calls, other than
PSCI_1_0_FN_PSCI_FEATURES(ARM_SMCCC_VERSION_FUNC_ID).

3. Guest emulates PSCI calls (and all the other HVC calls).

    3.a For CPU_SUSPEND coming from VCPU A, userspace does a
KVM_SET_MP_STATE(KVM_MP_STATE_HALTED) ioctl on the VCPU fd which sets the request
KVM_REQ_SUSPEND.

    3.b The next time the VCPU is run, KVM blocks the VCPU as a result of the
request. kvm_vcpu_block() does a schedule() in a loop until it decides that the
CPU must unblock.

    3.c The VCPU will run as normal after kvm_vcpu_block() returns.

Please correct me if I got something wrong.
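
Step 3.a might look roughly like this on the VMM side. This is only a sketch: KVM_SET_MP_STATE, struct kvm_mp_state and KVM_MP_STATE_HALTED are existing KVM UAPI names, but accepting HALTED on arm64 is assumed to depend on patch 3 of this series, and vmm_vcpu_suspend() is a hypothetical helper.

```c
#include <linux/kvm.h>
#include <sys/ioctl.h>

/*
 * Ask KVM to block the vCPU the next time it is run.
 * Assumption: arm64 accepts KVM_MP_STATE_HALTED as proposed in this series.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int vmm_vcpu_suspend(int vcpu_fd)
{
	struct kvm_mp_state mp = { .mp_state = KVM_MP_STATE_HALTED };

	return ioctl(vcpu_fd, KVM_SET_MP_STATE, &mp);
}
```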

I have a few general questions. It doesn't mean there's something wrong with your
approach, I'm just trying to understand it better.

1. Why does forwarding PSCI calls to userspace depend on enabling forwarding for other
HVC calls? As I understand from the patches, those handle distinct function IDs.

2. HVC call forwarding to userspace also forwards PSCI functions which are defined
in ARM DEN 0022D, but not (yet) implemented by KVM. What happens if KVM's PSCI
implementation gets support for one of those functions? How does userspace know
that now it also needs to enable PSCI call forwarding to be able to handle that
function?

It looks to me like the boundary between the functions that are forwarded when HVC
call forwarding is enabled and the functions that are forwarded when PSCI call
forwarding is enabled is based on what Linux v5.13 handles. Have you considered
choosing this boundary based on something less arbitrary, like the function types
specified in ARM DEN 0028C, table 2-1?

In my opinion, setting the MP state to HALTED looks like a sensible approach to
implementing PSCI_SUSPEND. I'll take a closer look at the patches after I get a
better understanding about what is going on.

On 6/8/21 4:48 PM, Jean-Philippe Brucker wrote:
> Allow userspace to request handling PSCI calls from guests. Our goal is
> to enable a vCPU hot-add solution for Arm where the VMM presents
> possible resources to the guest at boot, and controls which vCPUs can be
> brought up by allowing or denying PSCI CPU_ON calls. Passing HVC and
> PSCI to userspace has been discussed on the list in the context of vCPU
> hot-add [1,2] but it can also be useful for implementing other SMCCC and
> vendor hypercalls [3,4,5].
>
> Patches 1-3 allow userspace to request WFI to be executed in KVM. That

I don't understand this. KVM, in kvm_vcpu_block(), does not execute a WFI.
PSCI_SUSPEND is documented as being indistinguishable from a WFI from the guest's
point of view, but its implementation is not architecturally defined.

Thanks,

Alex

Jean-Philippe Brucker July 19, 2021, 6:02 p.m. UTC | #2
Hi Alex,

I'm not planning to resend this work at the moment, because it looks like
vcpu hot-add will go a different way so I don't have a user. But I'll
probably address the feedback so far and park it on some branch, in case
anyone else needs it.

On Mon, Jul 19, 2021 at 04:29:18PM +0100, Alexandru Elisei wrote:
> 1. Why does forwarding PSCI calls to userspace depend on enabling forwarding for other
> HVC calls? As I understand from the patches, those handle distinct function IDs.

The HVC cap from patch 4 enables returning from the KVM_RUN ioctl with
KVM_EXIT_HYPERCALL, for any HVC not handled by KVM. This one should
definitely be improved, either by letting userspace choose the ranges of
HVC it wants, or at least by reporting ranges reserved by KVM to
userspace.

The PSCI cap from patch 5 disables the in-kernel PSCI implementation. As a
result those HVCs are forwarded to userspace.
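
On the userspace side, handling such an exit could be sketched as follows. The exit reason and the hypercall member of struct kvm_run are existing UAPI; how this series fills them on arm64 (function ID in nr, guest return value in ret) is an assumption here, and vmm_handle_hypercall_exit() is a hypothetical helper.

```c
#include <linux/kvm.h>
#include <stdbool.h>
#include <stdint.h>

#define SMCCC_RET_NOT_SUPPORTED (-1) /* value defined by SMCCC */

/*
 * Returns true if the exit was a forwarded hypercall and was handled.
 * Assumption: the function ID arrives in run->hypercall.nr and the value
 * to return to the guest goes in run->hypercall.ret.
 */
static bool vmm_handle_hypercall_exit(struct kvm_run *run)
{
	if (run->exit_reason != KVM_EXIT_HYPERCALL)
		return false;

	/* A real VMM would dispatch on run->hypercall.nr here. */
	run->hypercall.ret = (uint64_t)(int64_t)SMCCC_RET_NOT_SUPPORTED;
	return true;
}
```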

It was suggested that other users will want to handle HVC calls (SDEI for
example [1]), hence splitting into two capabilities rather than just the
PSCI cap. In v5.14 x86 added KVM_CAP_EXIT_HYPERCALL [2], which lets
userspace receive specific hypercalls. We could reuse that and have PSCI
be one bit of that capability's parameter.

[1] https://lore.kernel.org/linux-arm-kernel/20170808164616.25949-12-james.morse@arm.com/
[2] https://lore.kernel.org/kvm/90778988e1ee01926ff9cac447aacb745f954c8c.1623174621.git.ashish.kalra@amd.com/

> 2. HVC call forwarding to userspace also forwards PSCI functions which are defined
> in ARM DEN 0022D, but not (yet) implemented by KVM. What happens if KVM's PSCI
> implementation gets support for one of those functions? How does userspace know
> that now it also needs to enable PSCI call forwarding to be able to handle that
> function?

We forward the whole PSCI function range, so it's either KVM or userspace.
If KVM manages PSCI and the guest calls an unimplemented function, that
returns directly to the guest without going to userspace.

The concern is valid for any other range, though. If userspace enables the
HVC cap it receives function calls that at some point KVM might need to
handle itself. So we need some negotiation between user and KVM about the
specific HVC ranges that userspace can and will handle.

> It looks to me like the boundary between the functions that are forwarded when HVC
> call forwarding is enabled and the functions that are forwarded when PSCI call
> forwarding is enabled is based on what Linux v5.13 handles. Have you considered
> choosing this boundary based on something less arbitrary, like the function types
> specified in ARM DEN 0028C, table 2-1?

For PSCI I've used the range 0-0x1f as the boundary, which is reserved for
PSCI by SMCCC (table 6-4 in that document).
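
As a sketch of that boundary check: PSCI function IDs are fast calls owned by the Standard Secure Service range, so they fall in 0x84000000-0x8400001f for SMC32 and 0xc4000000-0xc400001f for SMC64. The helper name smccc_fn_is_psci() is hypothetical and is not taken from the patches.

```c
#include <stdbool.h>
#include <stdint.h>

#define SMCCC_SMC64_BIT   (1u << 30)   /* calling convention: 0 = SMC32, 1 = SMC64 */
#define PSCI_FN_BASE      0x84000000u  /* fast call, standard secure service owner */
#define PSCI_FN_LAST      0x8400001fu  /* function numbers 0-0x1f */

/* True if 'fn' lies in the function ID range reserved for PSCI. */
static bool smccc_fn_is_psci(uint32_t fn)
{
	/* Fold the SMC64 variant onto the SMC32 range before comparing. */
	uint32_t base = fn & ~SMCCC_SMC64_BIT;

	return base >= PSCI_FN_BASE && base <= PSCI_FN_LAST;
}
```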

> 
> In my opinion, setting the MP state to HALTED looks like a sensible approach to
> implementing PSCI_SUSPEND. I'll take a closer look at the patches after I get a
> better understanding about what is going on.
> 
> On 6/8/21 4:48 PM, Jean-Philippe Brucker wrote:
> > Allow userspace to request handling PSCI calls from guests. Our goal is
> > to enable a vCPU hot-add solution for Arm where the VMM presents
> > possible resources to the guest at boot, and controls which vCPUs can be
> > brought up by allowing or denying PSCI CPU_ON calls. Passing HVC and
> > PSCI to userspace has been discussed on the list in the context of vCPU
> > hot-add [1,2] but it can also be useful for implementing other SMCCC and
> > vendor hypercalls [3,4,5].
> >
> > Patches 1-3 allow userspace to request WFI to be executed in KVM. That
> 
> I don't understand this. KVM, in kvm_vcpu_block(), does not execute a WFI.
> PSCI_SUSPEND is documented as being indistinguishable from a WFI from the guest's
> point of view, but its implementation is not architecturally defined.

Yes, that was an oversimplification on my part.

Thanks,
Jean

Oliver Upton July 19, 2021, 7:37 p.m. UTC | #3
On Mon, Jul 19, 2021 at 11:02 AM Jean-Philippe Brucker
<jean-philippe@linaro.org> wrote:
> We forward the whole PSCI function range, so it's either KVM or userspace.
> If KVM manages PSCI and the guest calls an unimplemented function, that
> returns directly to the guest without going to userspace.
>
> The concern is valid for any other range, though. If userspace enables the
> HVC cap it receives function calls that at some point KVM might need to
> handle itself. So we need some negotiation between user and KVM about the
> specific HVC ranges that userspace can and will handle.

Are we going to use KVM_CAPs for every interesting HVC range that
userspace may want to trap? I wonder if a more generic interface for
hypercall filtering would have merit to handle the aforementioned
cases, and whatever else a VMM will want to intercept down the line.

For example, x86 has the concept of 'MSR filtering', wherein userspace
can specify a set of registers that it wants to intercept. Doing
something similar for HVCs would avoid the need for a kernel change
each time a VMM wishes to intercept a new hypercall.

--
Thanks,
Oliver

Jean-Philippe Brucker July 21, 2021, 5:46 p.m. UTC | #4
On Mon, Jul 19, 2021 at 12:37:52PM -0700, Oliver Upton wrote:
> On Mon, Jul 19, 2021 at 11:02 AM Jean-Philippe Brucker
> <jean-philippe@linaro.org> wrote:
> > We forward the whole PSCI function range, so it's either KVM or userspace.
> > If KVM manages PSCI and the guest calls an unimplemented function, that
> > returns directly to the guest without going to userspace.
> >
> > The concern is valid for any other range, though. If userspace enables the
> > HVC cap it receives function calls that at some point KVM might need to
> > handle itself. So we need some negotiation between user and KVM about the
> > specific HVC ranges that userspace can and will handle.
> 
> Are we going to use KVM_CAPs for every interesting HVC range that
> userspace may want to trap? I wonder if a more generic interface for
> hypercall filtering would have merit to handle the aforementioned
> cases, and whatever else a VMM will want to intercept down the line.
> 
> For example, x86 has the concept of 'MSR filtering', wherein userspace
> can specify a set of registers that it wants to intercept. Doing
> something similar for HVCs would avoid the need for a kernel change
> each time a VMM wishes to intercept a new hypercall.

Yes we could introduce a VM device group for this:
* User reads attribute KVM_ARM_VM_HVC_NR_SLOTS, which defines the number
  of available HVC ranges.
* User writes attribute KVM_ARM_VM_HVC_SET_RANGE with one range
  struct kvm_arm_hvc_range {
          __u32 slot;
  #define KVM_ARM_HVC_USER (1 << 0) /* Enable range. 0 disables it */
          __u16 flags;
          __u16 imm;
          __u32 fn_start;
          __u32 fn_end;
  };
* KVM forwards any HVC within this range to userspace.
* If one of the ranges is PSCI functions, disable KVM PSCI.
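
The lookup KVM would need for this could be sketched as below. None of this exists: the struct mirrors the proposal above with stdint types, fn_end is assumed to be inclusive, imm is assumed to be matched exactly against the HVC immediate, and hvc_goes_to_user() is a hypothetical name.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KVM_ARM_HVC_USER (1 << 0) /* Enable range. 0 disables it */

/* Userspace-style copy of the proposed structure (hypothetical). */
struct kvm_arm_hvc_range {
	uint32_t slot;
	uint16_t flags;
	uint16_t imm;
	uint32_t fn_start;
	uint32_t fn_end; /* assumed inclusive */
};

/* True if an HVC with immediate 'imm' and function ID 'fn' hits an enabled range. */
static bool hvc_goes_to_user(const struct kvm_arm_hvc_range *ranges,
			     size_t nr_slots, uint16_t imm, uint32_t fn)
{
	for (size_t i = 0; i < nr_slots; i++) {
		const struct kvm_arm_hvc_range *r = &ranges[i];

		if (!(r->flags & KVM_ARM_HVC_USER))
			continue; /* slot disabled */
		if (r->imm == imm && fn >= r->fn_start && fn <= r->fn_end)
			return true;
	}
	return false;
}
```

A linear scan is enough for a handful of slots; the cost Jean mentions is mostly in validating and storing the ranges, not in the lookup.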

Since it's more work for KVM to keep track of ranges, I didn't include it
in the RFC, and I'm going to leave it to the next person dealing with this
stuff :)

Thanks,
Jean

Jean-Philippe Brucker July 21, 2021, 5:56 p.m. UTC | #5
On Tue, Jun 08, 2021 at 05:48:01PM +0200, Jean-Philippe Brucker wrote:
> Allow userspace to request handling PSCI calls from guests. Our goal is
> to enable a vCPU hot-add solution for Arm where the VMM presents
> possible resources to the guest at boot, and controls which vCPUs can be
> brought up by allowing or denying PSCI CPU_ON calls.

Since it looks like vCPU hot-add will be implemented differently, I don't
intend to resend this series at the moment. But since some of it could be
useful for other projects, and to avoid the helpful review effort going to
waste, I fixed it up and will leave it on this branch:
https://jpbrucker.net/git/linux/log/?h=kvm/psci-to-userspace
It now only uses KVM_CAP_EXIT_HYPERCALL introduced in v5.14.

Thanks,
Jean